School of Engineering
Department of Computer Science and Engineering

Multimodal Learning for Cancer Diagnosis and Prognosis
Supervisor: CHEN Hao / CSE
Student: CHEN Yuanzhong / COSC
Course: UROP 1100, Fall; UROP 2100, Spring

The Whole Slide Image (WSI) is one of the most important tools for cancer diagnosis. However, due to its large size and high complexity, manual examination by pathologists can be time-consuming, so developing a fast and accurate method for analyzing WSIs is essential. Given the enormous size of WSIs, traditional computer vision (CV) approaches are insufficient for direct processing; currently, most state-of-the-art models rely on Multiple Instance Learning (MIL) and Transformer-based architectures to handle this challenge. In this project, the ultimate objective is to generate a pathology report from WSI data. A key challenge in this task is the sheer size of WSIs, which makes it infeasible to feed an entire WSI directly into most large language models (LLMs). To address this issue, an effective feature extraction method must be developed to condense the WSI into a manageable representation while preserving critical diagnostic information.

Multimodal Learning for Cancer Diagnosis and Prognosis
Supervisor: CHEN Hao / CSE
Student: CHOW Chung Yan / CPEG
Course: UROP 4100, Fall

The poor performance of the previous model may stem from the pipeline not being trained end-to-end, leading to non-convergence. The revised solution is to train the image and text transformers jointly, using deep prompting in both the vision and the text transformer.

Multimodal Learning for Cancer Diagnosis and Prognosis
Supervisor: CHEN Hao / CSE
Student: HOU Jingcheng / COMP
Course: UROP 2100, Fall

Although large vision-language models can broaden medical knowledge and certain reasoning skills, the lack of precise control over and interpretation of their behaviours may have both academic and ethical implications, owing to their unsupervised training stage. One of the most frequently used remedies is fine-tuning the pretrained model with preferences from experts or target users, i.e., reinforcement learning from human feedback (RLHF). This process, however, is complex and unstable: a reward model must first be fitted to human annotators' preferences, and the fine-tuned model must then be kept from drifting too far from the original during reward maximization. In this report, we examine several recently proposed methods that improve the stability and reliability of RLHF.
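The MIL-based feature condensation described in the first project can be illustrated with a minimal attention-based MIL pooling sketch. This is not the project's actual architecture: the module, the gated-attention formulation (in the style of ABMIL), and all dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    # Minimal gated attention-based MIL pooling (ABMIL-style sketch).
    # A WSI is tiled into patches, each patch is embedded by a (frozen)
    # feature extractor, and the bag of patch embeddings is condensed
    # into one slide-level vector via learned attention weights.
    # All dimensions below are illustrative assumptions.
    def __init__(self, feat_dim=1024, hidden_dim=256, n_classes=2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):
        # patch_feats: (num_patches, feat_dim), one bag per slide
        scores = self.attn_w(self.attn_v(patch_feats) * self.attn_u(patch_feats))  # (N, 1)
        weights = torch.softmax(scores, dim=0)                                      # (N, 1)
        slide_repr = (weights * patch_feats).sum(dim=0)                             # (feat_dim,)
        return self.classifier(slide_repr), slide_repr, weights

# Usage: 5,000 pre-extracted patch embeddings from one slide.
bag = torch.randn(5000, 1024)
logits, slide_repr, attn = GatedAttentionMIL()(bag)
print(logits.shape, slide_repr.shape, attn.shape)

The resulting slide-level vector is the kind of condensed representation that could then be paired with a text decoder or LLM for report generation.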
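Deep prompting, as proposed in the second project's revised pipeline, prepends fresh learnable prompt tokens at every transformer layer rather than only at the input. The sketch below illustrates that idea under assumed layer counts and dimensions; it does not reproduce the actual vision-text pipeline.

import torch
import torch.nn as nn

class DeepPromptedEncoder(nn.Module):
    # Illustrative deep (layer-wise) prompting for a transformer encoder:
    # a separate set of learnable prompt tokens is prepended at each layer,
    # so only the prompts (plus any task head) need training while the
    # backbone can stay frozen. Sizes here are assumptions.
    def __init__(self, dim=512, depth=6, n_heads=8, n_prompts=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
            for _ in range(depth)
        )
        self.prompts = nn.ParameterList(
            nn.Parameter(torch.randn(n_prompts, dim) * 0.02) for _ in range(depth)
        )
        self.n_prompts = n_prompts

    def forward(self, tokens):
        # tokens: (batch, seq_len, dim) -- image patch or text token embeddings
        for layer, prompt in zip(self.layers, self.prompts):
            batch_prompts = prompt.unsqueeze(0).expand(tokens.size(0), -1, -1)
            x = layer(torch.cat([batch_prompts, tokens], dim=1))  # prepend prompts
            tokens = x[:, self.n_prompts:, :]                     # drop them before the next layer
        return tokens

# Usage: deep-prompt an encoder over 196 image tokens.
out = DeepPromptedEncoder()(torch.randn(2, 196, 512))
print(out.shape)  # torch.Size([2, 196, 512])

In the joint setting described above, both the vision and the text transformer would carry such prompts and be optimised under a single end-to-end objective.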
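The two difficulties highlighted in the third project map onto the two standard stages of RLHF; for reference, they can be written as the usual preference (Bradley-Terry) reward-model loss and the KL-regularized reward-maximization objective. The notation is the generic one from the RLHF literature, not taken from the report: r_phi is the reward model, pi_theta the fine-tuned policy, pi_ref the original (reference) model, and beta the strength of the drift penalty.

\[
\mathcal{L}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(r_\phi(x, y_w) - r_\phi(x, y_l)\right)\right]
\]

\[
\max_{\pi_\theta}\;\mathbb{E}_{x\sim\mathcal{D},\,y\sim\pi_\theta(\cdot\mid x)}\!\left[r_\phi(x, y)\right] \;-\; \beta\,\mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta(y\mid x)\,\big\|\,\pi_{\mathrm{ref}}(y\mid x)\right]
\]

Here y_w and y_l are the preferred and rejected responses in an annotated pair. The first objective fits the reward model to human preferences; the second maximizes that reward while the KL term keeps the fine-tuned model close to the reference, which is exactly the tension that the recently proposed methods aim to stabilize.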