School of Engineering
Department of Computer Science and Engineering

Generative AI
Supervisor: CHEN Qifeng / CSE
Student: FEI Yang / COSC
Course: UROP 3200, Fall

Efficient video generation and compression rely on robust video Variational Autoencoders (VAEs), yet applying image VAEs to individual frames often leads to temporal inconsistencies and suboptimal compression due to the lack of temporal modeling. To address these issues, we present a novel video autoencoder that achieves high-fidelity video encoding. Unlike traditional 3D VAEs, which entangle spatial and temporal compression and risk introducing motion blur and detail loss, we propose a temporal-aware spatial compression framework to better encode spatial information. Additionally, we incorporate a lightweight motion compression module for enhanced temporal compression. By leveraging text guidance from text-to-video datasets, our model improves detail preservation and temporal stability. Extensive evaluations demonstrate our method's superiority over recent baselines.

Generative AI
Supervisor: CHEN Qifeng / CSE
Student: GAO Jincheng / COMP
Course: UROP 1000, Summer

In this report, we explore the material taught in cs231n, a course on deep learning for computer vision. The content covers some of the state-of-the-art techniques in computer vision, starting from the most basic topics, including KNN, softmax, fully connected networks, backpropagation, batch normalization, RNNs, image captioning, and more. Following the review of the lecture videos, I worked through the assignments available on GitHub. In the second month, I conducted an evaluation of the FramePack model, focusing on video generation. My findings reveal notable improvements in model accuracy and efficiency, thereby contributing valuable insights to the broader field of computer vision.
Generative AI
Supervisor: CHEN Qifeng / CSE
Student: GUO Yuxuan / COMP
Course: UROP 1100, Fall; UROP 2100, Spring

Inappropriate prompt detection has consistently been a prominent research focus in generative AI, concerning AI safety and its real-world deployment. However, problems such as noise in the training dataset may lead to misinterpretation of prompt semantics in models such as Latent Guard (LG). In this UROP 2100 project, we address these problems through dataset optimization and refined training strategies, achieving comprehensive performance gains, including improvements across six evaluation AUC metrics. Meanwhile, we introduce Llama Guard, a specialized LLM safety model for AI chat, for comparative analysis, revealing domain gaps between different AI-related tasks. Our work not only enhances Latent Guard's reliability but also clarifies its specialization boundaries as a robust safety framework for multimodal generative AI systems.