UROP Proceedings 2024-25

School of Engineering
Department of Computer Science and Engineering

Data-Efficient, Domain Generalizable and Interpretable Deep Learning
Supervisor: CHEN Hao / CSE
Student: LUI Hiu Pang / AE
Course: UROP 1100, Fall

This study proposes a novel semi-supervised learning workflow, the voted pseudo-labeling method. It improves model performance using unlabeled data, which is valuable for tasks with expensive data annotation. Serving as a preliminary test of the concept, this paper evaluates the proposed method's performance on image classification tasks with convolutional neural networks (CNNs) before its application to knee articular cartilage segmentation for osteoarthritis. Our experiments on the CIFAR-10 dataset show that the proposed semi-supervised workflow outperforms a model trained solely on the labeled subset by 17.2% in test precision, reaching 0.673, compared with 0.768 for a model trained on the fully labeled dataset. This approach can be particularly beneficial for applications where high model accuracy is not the primary concern but minimizing labeling cost is. Overall, this study demonstrates the potential of the voted pseudo-labeling method to optimize the usage of unlabeled data, thereby paving the way for its adoption in image segmentation tasks with high labeling costs, such as medical imaging. A toy sketch of the voting step appears after the abstracts below. Code is open-sourced for verification at: https://github.com/VYPang/CV-Ensemble-Learning

Data-Efficient, Domain Generalizable and Interpretable Deep Learning
Supervisor: CHEN Hao / CSE
Student: WAN Hanzhe / COSC
Course: UROP 2100, Fall; UROP 3100, Spring

The Transformer architecture revolutionized natural language processing with its self-attention mechanism. Its adaptation to computer vision led to the Vision Transformer (ViT), which treats images as sequences of patches, demonstrating competitive performance against convolutional networks. However, ViT's global self-attention is computationally expensive for high-resolution images. The Swin Transformer addresses this limitation through hierarchical feature maps and shifted window-based self-attention, improving efficiency while maintaining strong performance. This report explores the evolution from ViT to the Swin Transformer, highlighting key innovations in attention mechanisms, inductive biases, and architectural improvements that enhance scalability for vision tasks, bridging the gap between CNNs and pure attention-based models. A minimal sketch of shifted-window partitioning appears below.
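To illustrate the voting step described in the first abstract, the following Python sketch shows one plausible way to derive pseudo-labels by majority vote over an ensemble's predictions. It is a hypothetical toy example, not the authors' implementation (see the linked repository for that); the function name `voted_pseudo_labels`, the `min_agree` threshold, and the use of NumPy are all assumptions.

```python
# Hypothetical toy sketch of majority-vote pseudo-labeling; the function
# name and min_agree threshold are assumptions, not the authors' code.
import numpy as np

def voted_pseudo_labels(member_preds: np.ndarray, min_agree: int):
    """member_preds: (n_models, n_unlabeled) integer class predictions.
    Keep an unlabeled sample only when at least `min_agree` ensemble
    members vote for the same class; return its index and that label."""
    keep_idx, labels = [], []
    for i in range(member_preds.shape[1]):
        votes = np.bincount(member_preds[:, i])
        top = votes.argmax()
        if votes[top] >= min_agree:
            keep_idx.append(i)
            labels.append(top)
    return np.array(keep_idx), np.array(labels)

# Three CNNs predicting classes for four unlabeled images.
preds = np.array([[0, 1, 2, 3],
                  [0, 1, 2, 1],
                  [0, 2, 2, 0]])
idx, lab = voted_pseudo_labels(preds, min_agree=2)
print(idx, lab)  # [0 1 2] [0 1 2]; sample 3 has no majority and is dropped
```

In general, raising the agreement threshold trades pseudo-label coverage for pseudo-label precision, which is the lever such a workflow would tune when annotation budgets are tight.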

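To illustrate the shifted-window mechanism described in the second abstract, the following sketch shows window partitioning and the cyclic shift that lets attention windows straddle the borders of the previous block's windows. It is a minimal, hypothetical rendering of the published Swin design, not code from the report; the helper name `window_partition` and the tensor shapes are assumptions.

```python
# Minimal, hypothetical sketch of Swin-style window partitioning with a
# cyclic shift; helper name and shapes are assumptions, not report code.
import torch

def window_partition(x: torch.Tensor, window: int) -> torch.Tensor:
    """(B, H, W, C) feature map -> (num_windows * B, window * window, C),
    so self-attention runs only inside each local window."""
    B, H, W, C = x.shape
    x = x.view(B, H // window, window, W // window, window, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, C)

B, H, W, C, window = 1, 8, 8, 96, 4
x = torch.randn(B, H, W, C)
regular = window_partition(x, window)  # non-overlapping local windows
# Cyclic shift by half a window lets the next block's windows span the
# previous block's window borders, connecting neighboring windows.
shifted = torch.roll(x, shifts=(-window // 2, -window // 2), dims=(1, 2))
cross = window_partition(shifted, window)
print(regular.shape, cross.shape)  # torch.Size([4, 16, 96]) twice
```

Because attention is computed only within fixed-size windows, its cost grows linearly with image size rather than quadratically as in ViT's global self-attention, which is the efficiency gain the abstract refers to.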