UROP Proceedings 2024-25

School of Engineering
Department of Computer Science and Engineering

Reasoning with Large Foundation Models
Supervisor: SONG Yangqiu / CSE
Student: WU Yuetong / DSCT
Course: UROP 2100, Fall

With the recent advancements in large language models, enhancing the commonsense reasoning capabilities of these models has become a crucial task in natural language processing. Numerous new models and datasets have been proposed to address this challenge, reflecting the growing importance of understanding human-like reasoning in AI systems. In this paper, we provide a comprehensive overview of several commonly used models and datasets designed for commonsense reasoning. We also present the methods we explored and tested, supported by detailed ablation experiments and visualization analyses. These efforts yielded noteworthy findings on how the reasoning capabilities of such models can be improved, ultimately advancing the state of the art in this vital area of research.

Reasoning with Large Foundation Models
Supervisor: SONG Yangqiu / CSE
Student: ZHANG Xiaoan / MATH-CS
Course: UROP 1100, Fall

Food safety is a critical concern worldwide, necessitating effective detection methods for potential hazards in food products. This report presents the development of a machine learning model for identifying food hazards. The project uses advanced algorithms and a large dataset to train the model for accurate detection. By employing techniques such as data preprocessing, feature extraction, and model training, we aim to enhance food safety and minimize health risks. The results demonstrate the model's ability to classify food hazards and show that the trained model can be applied in a variety of settings to detect them.
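The report itself does not include code; purely as an illustration, the preprocessing → feature extraction → model training pipeline described above might be sketched as a toy perceptron over bag-of-words features. All example texts, labels, and names below are hypothetical and not taken from the project's dataset:

```python
from collections import Counter

# Hypothetical toy data (label 1 = hazard, 0 = no hazard);
# the project's actual dataset is not specified in the report.
TRAIN = [
    ("glass fragments found in jar", 1),
    ("metal shavings detected in cereal", 1),
    ("undeclared peanut allergen in snack", 1),
    ("routine inspection passed no issues", 0),
    ("label updated with new packaging", 0),
    ("product meets all quality standards", 0),
]

def featurize(text):
    """Preprocessing + feature extraction: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def train(data, epochs=10):
    """Model training: a simple perceptron over sparse word-count features."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in data:
            feats = featurize(text)
            score = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
            pred = 1 if score > 0 else 0
            if pred != label:  # perceptron rule: update weights only on mistakes
                step = 1 if label == 1 else -1
                for w, c in feats.items():
                    weights[w] = weights.get(w, 0.0) + step * c
                bias += step
    return weights, bias

def predict(model, text):
    """Classify a new text with the trained weights."""
    weights, bias = model
    feats = featurize(text)
    score = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return 1 if score > 0 else 0

model = train(TRAIN)
```

A production system would of course replace this with a learned model on a large labelled corpus, but the same three stages (preprocess, featurize, train) apply.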
Towards Better Multi-Agent Workflow Construction
Supervisor: SONG Yangqiu / CSE
Student: YANG Sijie / COMP
Course: UROP 1100, Summer

This progress report details UROP 1100 research on multimodal agents for long-video reasoning. We began by selecting multimodal LLMs as our research focus, then built a comprehensive understanding of the field through a systematic review of the literature and recent influential work. Our focus narrowed to long-video deep reasoning, a key multimodal challenge, and we identified multimodal agents combining VLMs and RAG as a promising approach. We chose MMR-V and Video-Holmes as evaluation benchmarks and DeepVideoDiscovery (DVD) as the foundation model, deployed via its open-source repository and the HKUST Azure API. After deploying DVD, we evaluated it on the MMR-V benchmark. Our observations revealed issues that hinder correct reasoning, prompting us to propose solutions and establish an iterative "testing → analysis → modification" improvement loop.
