
UROP Proceedings 2022-23

School of Engineering
Department of Computer Science and Engineering

Generative AI
Supervisor: CHEN, Qifeng / CSE
Student: MA, Mingfei / SENG
Course: UROP1000, Summer

In this report, I introduce ControlVideo, an emerging technology that generates videos from the text we type in. It spares us the time and energy otherwise spent on tedious video clipping, and it improves our understanding of and control over video production. However, the tool's complexity and users' lack of experience still pose difficulties. I therefore give a general overview of ControlVideo and then guide the reader through creating a video with this AI-powered tool. To make the tool more accessible, I demonstrate the steps using the Hugging Face Space at https://huggingface.co/spaces/Yabo/ControlVideo.

Generative AI
Supervisor: CHEN, Qifeng / CSE
Student: MAHAPATRA, Amadika / MATH-CS
Course: UROP1000, Summer

In the space of generative models for imaging, GANs took the lead over the diffusion architecture when first introduced, producing realistic and detailed imagery. However, recent state-of-the-art generative models such as DALL-E and Midjourney are based on diffusion. The drawbacks of GANs include a tendency towards mode collapse, which leads to repetitive and constricted outputs. GANs are faster to train than diffusion models, but more research is now turning towards diffusion models and their optimization. Diffusion models are similar to VAEs in that they pair an encoding mechanism with a decoding mechanism, but a diffusion model generates a predictive output at every step of the decoding process. Stable Diffusion is a latent diffusion model, a class named for its use of latent variables. Diffusion models are also similar to the transformers used for text generation in their use of latent coding.
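
The encode-then-decode picture above can be made concrete with a small sketch. It assumes the standard DDPM formulation (a linear noise schedule and a network that predicts the added noise), which the abstract itself does not specify. All function names are hypothetical, and the true noise stands in for a trained network's prediction.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (assumed range, as in DDPM)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t]: running product of (1 - beta), the signal variance kept at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Forward ("encoding") direction: sample a noisy x_t directly from clean x0."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * e
          for v, e in zip(x0, eps)]
    return xt, eps

def predict_x0(xt, eps_hat, alpha_bar):
    """Reverse ("decoding") direction: clean-sample estimate implied by a noise prediction."""
    return [(v - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for v, e in zip(xt, eps_hat)]

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

x0 = [0.5, -1.0, 2.0]  # toy "image" with three pixels
t = 500
xt, eps = add_noise(x0, alpha_bars[t], rng)

# A trained network would predict eps from (xt, t); the true noise is used
# here purely as a stand-in, so the estimate recovers x0 up to rounding.
x0_hat = predict_x0(xt, eps, alpha_bars[t])
```

In a real sampler this estimate is recomputed at every decoding step from the network's current noise prediction, which is one way to read the "predictive output at every step" described above.
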
This paper discusses the structure and concepts behind diffusion models, offering a preliminary glance at the mathematics involved. We will then look at a particular implementation of diffusion models, VideoCrafter. As part of this project, I am also maintaining this webpage.

Generative AI
Supervisor: CHEN, Qifeng / CSE
Student: YANG, Ruoping / ELEC
Course: UROP1000, Summer

Generative AI has recently become a hot topic, and many related AI tools have been developed. Our team wanted to build a website that demonstrates some of the latest models. The website also provides online demos where visitors can try these models themselves, along with guidance on how to use them. I focused on one of the models, TextDiffuser, explored some of its features, and wrote a tutorial on how to use it. This report contains the tutorial and describes how some parameters affect the generated images.

