School of Engineering
Department of Computer Science and Engineering

Generative AI
Supervisor: CHEN Qifeng / CSE
Student: ZHAO Chendi / QFIN
Course: UROP 1100, Spring

In an era where the confluence of artificial intelligence and multimedia content has revolutionized the way we interact with digital environments, the integration of AI into video production and translation represents a groundbreaking advance of growing importance. The project undertaken in our research group explores the deployment of generative AI technologies akin to those used in conversational models such as ChatGPT, extending their application to the more complex field of video creation. Our objective is to develop an interactive system that, upon receiving video prompts from users, can generate vivid and creative video content frame by frame. The significance of this endeavor lies in its potential to democratize content creation, making it accessible to users without extensive video production skills or linguistic capabilities. During this project, I performed multiple tasks, including data labeling, data preprocessing, and reproducing an existing model.

Generative AI
Supervisor: CHEN Qifeng / CSE
Student: ZHENG Chi Him / COMP
Course: UROP 1000, Summer

In UROP 1000, I learned everything from scratch and gained valuable experience and exposure to AI. Beginning with only the Python programming language learned in Year 1, I gradually expanded my knowledge to Linux commands, PyTorch modules, the use of GitHub, and the basic concepts and theories behind AI. Most importantly, I gained experience in deploying AI models, including text-to-image models and object detection models such as YOLO, as well as experience in reading papers. Throughout this program, I developed a fundamental understanding of the AI research field, covering topics such as computer vision, machine learning, neural networks, and recent developments.
Generative AI
Supervisor: CHEN Qifeng / CSE
Student: ZHOU Yukai / MATH-PMA
Course: UROP 1100, Fall

MVDream is a state-of-the-art text-to-3D generation model that leverages 2D multi-view diffusion and score distillation sampling. It utilizes 2D-lifting methods while partly solving the multi-face Janus issue and the content drifting problem, which ensures that the model can produce consistent multi-view content. The report delves into the detailed implementation behind MVDream and explains some of the critical components of such a model (i.e., the transformer and NeRF). It also demonstrates a few potential applications of the model with some visual examples. Possible ways of refining the MVDream diffusion model are briefly discussed at the end of the report.
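For context on the score distillation sampling mentioned above: in the DreamFusion formulation that MVDream builds on, SDS optimizes the parameters θ of a 3D representation rendered as x = g(θ) by backpropagating a weighted denoising residual from a frozen diffusion model ε̂_φ conditioned on the text prompt y. The sketch below follows the original DreamFusion notation as a general reference, not MVDream's exact multi-view variant:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\, x = g(\theta)\bigr)
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,
    \bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta} \right]
```

Here x_t is the rendered image noised to timestep t, ε is the injected Gaussian noise, and w(t) is a timestep-dependent weighting; intuitively, the gradient pushes each rendered view toward images the 2D diffusion prior finds likely for the prompt.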