UROP Proceeding 2024-25

School of Engineering Department of Electronic and Computer Engineering 163 Visual-Language Large Foundation Models and Their Applications in Medical Image Analysis Supervisor: LI Xiaomeng / ECE Student: REN Haochong / DSCT Course: UROP 1000, Summer This report compiles the findings from the Visual-Language Large Foundation Models for Medical Image Analysis undergraduate research project. Learning the basics of neural networks (such as CNN architectures), deep learning frameworks (such as PyTorch), and linear algebra (such as matrix operations) was the main goal of the study’s first phase. Real-world experiments were conducted using datasets such as MNIST to understand image classification, in addition to studies of specialized models like U-Net for medical image segmentation. The project also required reading literature and using online resources to comprehend key concepts in disease segmentation, classification, and multimodal learning. The skills acquired during this preparatory phase pave the way for future research endeavors, such as the use of advanced models for clinical applications. Visual-Language Large Foundation Models and Their Applications in Medical Image Analysis Supervisor: LI Xiaomeng / ECE Student: TSOI Ming Yu / DSCT Course: UROP 1000, Summer During this UROP experience, I primarily focused on fine-tuning Visual-LLM models, specifically Qwen visual models to train the medical videos understanding ability of the models under the ms-swift framework. Under the guidance of Dr. Lan, my work involved generating video question-answer pairs for training and using msswift framework to train the models. In addition to this, I also have been reading research papers related to the application of reinforcement learning in large language models, exploring how these methods can improve model training efficiency. Furthermore, I have also studied the fMRI-image reconstruction model papers, learning the state-of-the-art techniques to infer images based on fMRI signals. Visual-Language Large Foundation Models and Their Applications in Medical Image Analysis Supervisor: LI Xiaomeng / ECE Student: WANG Anbang / DSCT Course: UROP 1100, Fall UROP 2100, Spring Large Language Models (LLMs) have shown significant progress, yet enhancing their complex reasoning remains a critical challenge. This report synthesizes recent advancements in improving model reasoning, focusing on foundational techniques such as Chain-of-Thought (CoT) prompting, Supervised Fine-Tuning (SFT) on reasoning traces, and Reinforcement Learning (RL). We specifically analyze the application and adaptation of these methods within the demanding domains of visual understanding and medical informatics, drawing insights from five representative studies. The review highlights common trends, compares the efficacy and limitations of these approaches, and identifies persistent challenges and promising future research directions towards developing more robust and reliable reasoning systems.

Made with FlippingBook

RkJQdWJsaXNoZXIy NDk5Njg=