UROP Proceedings 2024-25

School of Engineering
Department of Computer Science and Engineering

Logical Inference and Rationales in Large Foundation Models
Supervisor: SONG Yangqiu / CSE
Student: LIU Jiayu / COMP
         WANG Rui / COMP
Course: UROP 2100, Summer
        UROP 1100, Summer

Prospect Theory (PT) models how humans make decisions under uncertainty, while epistemic markers (e.g., "maybe") communicate uncertainty through language. Whether PT applies to modern large language models (LLMs), and how these linguistic cues influence their choices, remain underexplored. We design a three-phase experiment using economic choice tasks, introducing uncertainty through empirically estimated probabilities for common epistemic markers. Integrating these into a PT-based framework, we find that PT does not consistently account for LLM behavior, especially when uncertainty is expressed in diverse linguistic forms.

Mental State Reasoning for Knowledge Graph
Supervisor: SONG Yangqiu / CSE
Student: DING Yuyi / COMP
Course: UROP 1100, Fall

Large Language Models (LLMs) have sparked great interest in their abilities for mental state reasoning and a possible machine Theory of Mind (ToM). Several studies have explored this topic, attempting to evaluate the ToM abilities of LLMs; however, they share common limitations. In this project, I summarize the results of previous studies and present a new possible way to evaluate machine ToM in LLMs in dynamic environments through negotiation games.

Mental State Reasoning for Large Language Models
Supervisor: SONG Yangqiu / CSE
Student: AO Yuzhuo / DSCT
Course: UROP 1100, Fall
        UROP 2100, Spring

This UROP project focused primarily on the construction of a benchmark, named DesireBench, to evaluate Large Language Models' capabilities in recognizing human desires, a crucial component of the Theory of Mind. The project was conducted under the guidance of a PhD mentor. The benchmark is grounded in two prominent theories of human motivation:
1. Maslow's Hierarchy of Needs
2. Reiss's theory of 16 basic desires (Reiss, 2004)
DesireBench consists of narrative stories featuring characters with complex desires. The benchmark's task is to identify the desires of the characters within these stories according to Maslow's theory (at a 5-level granularity) and Reiss's 16 basic desires theory (at a 16-level granularity). Existing research on LLMs in the field of Theory of Mind has extensively covered aspects such as belief, intention, and emotion. However, there is a notable lack of systematic study of the recognition of human desires, particularly under conditions of high complexity. This benchmark is designed to fill this gap in the current research landscape.
