School of Engineering
Department of Computer Science and Engineering

Diffusion Model for Classification Tasks
Supervisor: CHEN Long / CSE
Student: XU Minrui / DSCT
Course: UROP 1100, Summer

This progress report examines the potential and challenges of applying diffusion models to Visual Relation Detection (VRD) in computer vision. VRD requires identifying the subjects and objects in an image and recognizing their relationships, typically expressed as subject-relation-object triplets. Vision-Language models such as CLIP have advanced the field considerably, enabling zero-shot learning and open-vocabulary tasks. Despite these advances, CLIP struggles with low-resolution images and with fine-grained understanding of object relationships. Diffusion models, by contrast, are best known for generative tasks such as image synthesis, but are now being explored for discriminative tasks such as classification and object detection. Their robustness across image resolutions and their reasoning ability in complex scenes make them particularly promising for VRD. Our objective is to advance VRD research by exploring how diffusion models and CLIP can complement each other, with a focus on mitigating their current limitations and proposing strategies for building more accurate and robust VRD systems. Future work will likely concentrate on handling high-resolution images, strengthening reasoning capabilities, and improving generalization to real-world applications.

Parameter-Efficient and Memory-Efficient Fine-Tuning
Supervisor: CHEN Long / CSE
Student: CHAU Yu Foon Darin / MATH-PMA
Course: UROP 1100, Spring

In recent years, Parameter-Efficient Transfer Learning (PETL) has served as a crucial strategy for adapting pretrained large language models to downstream language tasks. In particular, side tuning, which attaches a lightweight adapter network to a frozen backbone model, has gained popularity due to its ease of use. However, modern deep transformer models often have billions of parameters, requiring massive memory to train and large computational resources at inference time. In this paper, inspired by LayerDrop structural dropout and UniPT side tuning, we demonstrate that, by combining side tuning with information about the dataset and the dropped layers, we can drop around 75% of an LLM's original parameters with minimal loss of accuracy compared to standard side tuning. This allows us to fine-tune LLMs on downstream tasks with competitive results while incurring less compute and memory overhead.
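To make the side-tuning-with-layer-dropping idea concrete, the following is a minimal sketch, not the report's actual method: it assumes a PyTorch backbone whose transformer blocks map hidden states to hidden states (nn.TransformerEncoderLayer is used as a stand-in for real LLM layers), and the names drop_backbone_layers and SideTunedClassifier, the evenly spaced layer selection, and the bottleneck adapter design are all illustrative assumptions.

    import torch
    import torch.nn as nn

    def drop_backbone_layers(blocks, keep_ratio=0.25):
        # LayerDrop-style structural dropping: keep an evenly spaced subset of
        # transformer blocks, discarding roughly (1 - keep_ratio) of the backbone.
        n = len(blocks)
        keep = max(1, round(n * keep_ratio))
        idx = torch.linspace(0, n - 1, steps=keep).round().long().tolist()
        return nn.ModuleList(blocks[i] for i in idx)

    class SideTunedClassifier(nn.Module):
        # Frozen (pruned) backbone plus a small trainable side network and head.
        def __init__(self, embed, kept_blocks, hidden_dim, bottleneck_dim, num_classes):
            super().__init__()
            self.embed = embed
            self.blocks = kept_blocks
            for p in self.parameters():        # freeze embedding + kept backbone blocks
                p.requires_grad_(False)
            self.adapters = nn.ModuleList(      # one tiny trainable adapter per kept block
                nn.Sequential(
                    nn.Linear(hidden_dim, bottleneck_dim),
                    nn.GELU(),
                    nn.Linear(bottleneck_dim, hidden_dim),
                )
                for _ in kept_blocks
            )
            self.head = nn.Linear(hidden_dim, num_classes)

        def forward(self, input_ids):
            h = self.embed(input_ids)           # [batch, seq, hidden_dim]
            side = torch.zeros_like(h)
            for block, adapter in zip(self.blocks, self.adapters):
                with torch.no_grad():           # no backbone activations kept for backprop
                    h = block(h)
                side = side + adapter(h)        # gradients flow only through the side path
            return self.head((h + side).mean(dim=1))

    # Illustrative usage with generic encoder blocks standing in for LLM layers.
    blocks = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
        for _ in range(12)
    )
    model = SideTunedClassifier(
        embed=nn.Embedding(30522, 256),
        kept_blocks=drop_backbone_layers(blocks, keep_ratio=0.25),
        hidden_dim=256,
        bottleneck_dim=64,
        num_classes=2,
    )
    logits = model(torch.randint(0, 30522, (8, 16)))   # shape [8, 2]

Because the backbone runs under torch.no_grad() and only the adapters and head are trainable, both the parameter count (after dropping layers) and the training memory footprint shrink, which is the combination of parameter and memory efficiency the abstract describes.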