School of Engineering
Department of Computer Science and Engineering

Open World Understanding based on Large Vision-Language Models
Supervisor: XU Dan / CSE
Student: ZHANG Li / COMP
Course: UROP 1000, Summer

Open-world understanding is an important research area in computer vision: the number of object classes in our environment is effectively unbounded, and it is impossible to define a label for each of them, so models must be prepared to deal with unfamiliar classes. This report introduces several cutting-edge models and their techniques, together with the research procedure reflected in the corresponding papers. Although I am a beginner with no prior research experience and found many of the concepts in these advanced projects hard to understand, I try to explain what I have learned in an organized way.

Scene Depth Diffusion
Supervisor: XU Dan / CSE
Student: XU Minrui / DSCT
Course: UROP 1100, Fall

Monocular depth estimation is a critical and challenging task in computer vision. With the development of deep learning, researchers now mainly use regression or classification methods for depth estimation. However, monocular depth estimation still suffers from unfamiliar image styles and camera limitations. A recent breakthrough is to use diffusion models and reformulate depth estimation as a denoising diffusion process. In this article, we present a general pipeline and continually fine-tune Stable Diffusion V2 on NYU Depth V2. The trained model successfully identifies the overall structure of indoor images and captures a sense of distance. We will continue to investigate how to integrate more techniques, such as VAE and semantic segmentation, to further improve performance.

Scene Depth Diffusion
Supervisor: XU Dan / CSE
Student: YIP Pak To Paco / CPEG
Course: UROP 1100, Fall

With the recent rise of diffusion models in image generation and the wide application of monocular depth estimation in autonomous driving and 3D scene reconstruction, this report explores the application of diffusion models to scene depth estimation. It focuses on two models, DepthGen and DiffusionDepth, which use diffusion processes for iterative denoising refinement, aiming to provide insights for future advancements in the field and related tasks. The report compares their similar strategies for handling noisy and sparse training data, as well as the differences in their loss functions, network architectures, and training strategies. It also summarizes the current project progress and reflects on the state of the project.