School of Engineering
Department of Computer Science and Engineering

Generative AI
Supervisor: CHEN Qifeng / CSE
Student: JAMALBEKOV Sultan / CPEG
Course: UROP 1100, Spring

When the UROP project began, I tried to understand model distillation and the transformer architecture. I quickly discovered that I lacked fundamental knowledge of machine learning, so I decided to spend the rest of the semester filling those gaps. At first, I built a multilayer perceptron and a simple language model that could generate human-name-like text. I then switched to a more comprehensive study of the mathematical preliminaries and of deep learning, starting from linear regression. To apply my newly acquired knowledge, I implemented handwritten digit classification on the famous MNIST dataset. The model labeled the training data very accurately, but it failed to generalize to digits I drew myself, likely because of overfitting.

Generative AI
Supervisor: CHEN Qifeng / CSE
Student: KONG Lingcheng / COSC
Course: UROP 1100, Fall
UROP 2100, Spring

This report focuses on the research I conducted this semester with Yunkang Tao under the direction of Chenyang Lei. Our project features precise camera control in video generation. My work was to perform experiments and evaluate metrics that quantify the quality of generated videos. In this report, I introduce our model and the experiments I carried out.

Generative AI
Supervisor: CHEN Qifeng / CSE
Student: LEE Pak Nin / COMP
Course: UROP 1100, Summer

This report presents two novel approaches to 4K video rescaling: a memory-efficient Variational Autoencoder (VAE)-based method and an extended Self-Conditioned Probabilistic (SelfC) model. The VAE approach incorporates residual connections and attention mechanisms to preserve high-frequency details and spatial-temporal coherence, and it can run on a single GPU. The extended SelfC model adapts a probabilistic framework to handle 4K details by conditioning on high-frequency information and compressing low-frequency components with a VAE. Experimental results on a 4K video dataset demonstrate comparable performance, with the VAE approach achieving a PSNR of 34.4 dB, an SSIM of 0.979, and an LPIPS of 0.122, while the extended SelfC model is still under training. These methods advance video rescaling by offering scalable, high-quality solutions for ultra-high-resolution content on resource-constrained hardware.
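To illustrate how a fidelity metric like PSNR quantifies reconstruction quality, here is a minimal sketch of the standard PSNR formula for 8-bit frames. This is a generic illustration, not the project's actual evaluation code, and the toy frames below are made up for demonstration.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example (hypothetical data): an 8-bit frame and a lightly corrupted copy.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noise = rng.integers(-5, 6, size=frame.shape)
noisy = np.clip(frame.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(frame, noisy):.1f} dB")
```

Higher values indicate closer reconstructions; lossy video pipelines in roughly the 30-40 dB range are generally considered good quality, which is the regime the reported 34.4 dB falls into.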