UROP Proceedings 2024-25

School of Engineering
Department of Computer Science and Engineering

Mental State Reasoning for Large Language Models
Supervisor: SONG Yangqiu / CSE
Student: CHEN Hongyu / MATH-GM
Course: UROP 1100, Spring

This project explores AI's ability to assess and emulate psychological states through the framework of belief, desire, and intention. We began with classic false-belief experiments from the Theory of Mind (ToM) domain, which revealed challenges in adapting them for AI evaluation. Shifting focus to knowledge boundaries, we aimed to enhance AI's comprehension of complex phenomena using ToM methodologies. We also examined practical applications in fields such as financial markets and recommendation systems, where understanding human behavior is crucial. Our research contributes to a nuanced understanding of AI's potential to navigate human-like reasoning, improving its effectiveness in real-world applications and fostering more intuitive human-AI interactions.

Mental State Reasoning for Large Language Models
Supervisor: SONG Yangqiu / CSE
Student: CHOY Ping Hang / COGBM
Course: UROP 1100, Fall

In this project, Mental State Reasoning for Large Language Models, I was introduced to the concept of Theory of Mind (ToM) and to the question of whether modern large language models (LLMs) possess such an ability. This final report provides a literature review of ToM, surveys several benchmarks for evaluating LLMs' ToM ability, and reviews analyses of recent LLM advancements. Following this theoretical groundwork, the report details practical work on data processing for a future multi-language ToM benchmark and highlights the difficulty of producing a high-quality dataset for this use case.

Mental State Reasoning for Large Language Models
Supervisor: SONG Yangqiu / CSE
Student: FU Yixuan / COSC
Course: UROP 1100, Spring

As Large Language Models (LLMs) become increasingly integrated into everyday applications, their ability to comprehend human mental states, the core of Theory of Mind (ToM), is critical for enabling trustworthy and socially aware interactions. However, current evaluations of LLMs' ToM capabilities remain fragmented, with inconsistencies across existing benchmarks and concerns about their validity in capturing nuanced reasoning. I reviewed recent literature on ToM research in LLMs, which informed the development of methodologies for constructing evaluation benchmarks. Furthermore, I systematically investigated the theoretical framework of 16 fundamental human desires (Reiss, 2004) and applied it to ongoing research that aims to develop a novel benchmark assessing LLMs' ToM capabilities through these 16 desires mapped to Maslow's hierarchy of needs.
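To make the false-belief experiments mentioned in the first abstract concrete, the Python sketch below shows one way a classic Sally-Anne-style item can be posed to an LLM and scored. This is a minimal illustration under stated assumptions, not the project's actual evaluation harness; the query_model callable and the exact item wording are hypothetical.

    # Minimal sketch of a Sally-Anne-style false-belief item for LLM evaluation.
    # query_model is a hypothetical stand-in for any text-in, text-out model call.
    STORY = (
        "Sally puts her marble in the basket and leaves the room. "
        "While she is away, Anne moves the marble to the box. "
        "Sally comes back."
    )
    QUESTION = "Where will Sally look for her marble first?"
    CORRECT = "basket"  # Sally holds a false belief; the marble is really in the box.

    def passes_false_belief(query_model) -> bool:
        """Return True if the model answers from Sally's belief, not from reality."""
        answer = query_model(STORY + " " + QUESTION + " Answer in one word.")
        return CORRECT in answer.lower()

A model that tracks Sally's belief state answers "basket"; a model that merely reports the world state answers "box", which is exactly the distinction such items are designed to probe.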
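Similarly, the desire-to-needs mapping described in the third abstract could be prototyped as a simple lookup table, as sketched below. The specific assignments shown are assumptions for illustration only, covering a few of Reiss's 16 desires, and are not the mapping adopted by the ongoing benchmark research.

    # Illustrative mapping of a few of Reiss's 16 basic desires (Reiss, 2004)
    # onto levels of Maslow's hierarchy; these assignments are assumptions.
    DESIRE_TO_MASLOW = {
        "eating":         "physiological",
        "tranquility":    "safety",
        "social contact": "belonging",
        "status":         "esteem",
        "curiosity":      "self-actualization",
    }

    def maslow_level(desire: str) -> str:
        """Look up the Maslow level a benchmark item's desire is tagged with."""
        return DESIRE_TO_MASLOW.get(desire, "unmapped")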
