School of Engineering
Department of Computer Science and Engineering

The Future of Medical Imaging: Advancements in Analysis through Vision Language and Large Models
Supervisor: CHEN Hao / CSE
Student: CHIU Tiffany Ke Shuen / DSCT
Course: UROP 1100, Summer

Precise identification of medical imaging types is essential for streamlining clinical processes and ensuring accurate disease diagnosis. This study assesses the effectiveness of ResNet50 in automatically classifying imaging modalities, using a diverse dataset of 10,000 annotated images from multiple sources. With pretrained weights and fine-tuning, the ResNet50 model achieved a strong micro accuracy of 0.89 on the test set. However, a notable gap between the micro and macro accuracy metrics indicated weaker performance on less common modalities, underscoring the issue of class imbalance. These results suggest that while ResNet50 performs well for medical image modality classification, future efforts should focus on addressing class imbalance and investigating alternative models to improve performance across all modality types.

The Future of Medical Imaging: Advancements in Analysis through Vision Language and Large Models
Supervisor: CHEN Hao / CSE
Student: LIN Qianwei / COMP
Course: UROP 1100, Fall

Our project team has been focused on developing MedDr (Generalist), an advanced AI medical generalist designed to function like a “medical GPT.” The multimodal language model can answer questions, generate medical reports, and provide initial diagnoses based on medical images. My tasks mainly consist of two parts:
1. Refining the annotation interface created during the UROP 1000 project to improve usability and efficiency for doctors.
2. Leveraging Llama 3.1 8B Instruct to convert references to images in medical literature into detailed image descriptions and to match these descriptions with the corresponding images.
After several group meetings, the model has been successfully completed and formally released, with plans to continue testing and refining it in the future.

The Future of Medical Imaging: Advancements in Analysis through Vision Language and Large Models
Supervisor: CHEN Hao / CSE
Student: LIU Runtong / DSCT
Course: UROP 1100, Spring

Since its release in 2021, CLIP (Contrastive Language-Image Pretraining) has accelerated the development of large-scale pretrained vision-language foundation models, enabling efficient transfer to diverse downstream tasks. Among these, CONCH (CONtrastive learning from Captions for Histopathology) specializes in pathology images and demonstrates remarkable zero-shot classification performance at both the ROI and slide levels, outperforming existing general-purpose models and domain-specific CLIP-based counterparts. By leveraging contrastive image-text pretraining, CONCH achieves strong balanced accuracy and direct transferability on multiple downstream computational pathology tasks and demonstrates surprising zero-shot capability on large-scale whole-slide images (WSIs). This report documents the replication of CONCH’s framework and validates its performance, highlighting its potential for real-world applications in histopathology.
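
As an illustration of the approach in the first abstract, below is a minimal PyTorch/torchvision sketch of fine-tuning an ImageNet-pretrained ResNet50 for modality classification, together with the micro versus macro accuracy comparison that exposes class imbalance. The class count, optimizer settings, and metric choices are assumptions made for the example, not the project's actual configuration.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import accuracy_score, balanced_accuracy_score

NUM_MODALITIES = 10  # hypothetical number of imaging modality classes

# Start from ImageNet-pretrained weights and replace the classification head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_MODALITIES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of images and modality labels."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def evaluate(y_true, y_pred):
    """Micro accuracy counts every image equally; macro (balanced) accuracy
    averages per-class recall, so it drops when rare modalities are missed."""
    return accuracy_score(y_true, y_pred), balanced_accuracy_score(y_true, y_pred)
```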
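
For the literature-processing task in the second abstract, the following sketch shows one way to prompt Llama 3.1 8B Instruct through the Hugging Face transformers chat pipeline to turn a figure reference and its surrounding passage into a standalone image description. The prompt wording and function name are illustrative assumptions, not the project's actual prompts.

```python
from transformers import pipeline

# Chat pipeline around the instruct model (requires access to the gated checkpoint).
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

def describe_referenced_image(reference: str, context: str) -> str:
    """Turn an in-text figure reference plus its surrounding passage into a
    standalone description that can later be matched to the figure itself."""
    messages = [
        {"role": "system",
         "content": ("You rewrite references to figures in medical literature "
                     "as detailed, standalone descriptions of the referenced image.")},
        {"role": "user",
         "content": (f"Reference: {reference}\n\nSurrounding text: {context}\n\n"
                     "Write a detailed description of the image being referenced.")},
    ]
    output = generator(messages, max_new_tokens=256)
    return output[0]["generated_text"][-1]["content"]
```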
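
For the third abstract, the sketch below shows the generic CLIP-style zero-shot classification mechanism that CONCH applies to pathology regions of interest: class names become text prompts, and the class whose text embedding is most similar to the image embedding is predicted. It uses open_clip with a general-purpose checkpoint as a stand-in; the actual CONCH weights, prompt templates, and slide-level aggregation differ.

```python
import torch
import open_clip
from PIL import Image

# A general-purpose CLIP checkpoint stands in for the CONCH model here.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

class_names = ["lung adenocarcinoma", "lung squamous cell carcinoma"]  # example labels
text_tokens = tokenizer([f"an H&E image of {name}." for name in class_names])
image = preprocess(Image.open("roi.png")).unsqueeze(0)  # hypothetical ROI crop

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text_tokens)
    # Cosine similarity between the image embedding and each class-prompt embedding.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(class_names[probs.argmax().item()])
```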