UROP Proceeding 2024-25

School of Engineering
Department of Computer Science and Engineering

Deep Video Super-Resolution
Supervisor: CHEN Qifeng / CSE
Student: JIA Yifei / COSC; CHEN Ying / CPEG
Course: UROP 1100, Summer; UROP 1000, Summer

In recent years, AI agents have emerged as a transformative paradigm in artificial intelligence, and multimodal agents show particular promise for complex real-world applications. These systems can understand and generate content across multiple modalities, including text, images, and videos. In this work, we present a video generation agent enhanced with Retrieval-Augmented Generation (RAG) for creating educational content. Our system integrates a rollout mechanism, inspired by recent advances in iterative optimization techniques, which enables automatic quality assessment and iterative refinement of generated videos. Through this approach, we demonstrate significant improvements in both content quality and production efficiency, showcasing the potential of multimodal agents in automated content creation pipelines.

Deep Video Super-Resolution
Supervisor: CHEN Qifeng / CSE
Student: LU Chenyu / COSC
Course: UROP 1100, Summer

Large language model (LLM)-based computer-using agents hold potential for automating scientific workflows but remain unreliable in complex multi-software tasks, with a 15% success rate in prior evaluations. This project aims to improve their performance across professional software by curating diverse instructions and manually recording correct execution trajectories during interactions with tools such as LibreOffice Calc, Chrome, and LibreOffice Impress on a Docker-deployed server. These trajectories are then incorporated into model training.
Key results show significant improvements in the model's performance on software tasks (Ubuntu) compared with the model before trajectory-based training, validating trajectory-based training for enhancing scientific workflow automation.

Deep Video Super-Resolution
Supervisor: CHEN Qifeng / CSE
Student: RIZVI Syed Momin Ahmed / COMP
Course: UROP 1100, Spring

Modern image resampling algorithms focus on embedding high-resolution (HR) images within low-resolution (LR) thumbnails while preserving enough information for efficient HR reconstruction. Qi et al. proposed a framework called HyperThumbnail for real-time, rate-distortion-aware rescaling of 6K images. It employs an encoder with a quantization prediction mechanism to encode the HR image in a JPEG LR thumbnail, trading file size against reconstruction quality with negligible quality loss. Extensive experiments reveal that this method outperforms existing image rescaling techniques in rate-distortion efficiency and makes real-time 6K image reconstruction possible. In this work, we experiment with a different encoder architecture. We replace the original RDBUnet encoder with a Transformer encoder and measure the change in bpp, LPIPS, and PSNR. The modified architecture tends to achieve higher degrade PSNR, higher LPIPS Y, lower LPIPS RGB, and similar bpp, suggesting potentially better pixel-level accuracy, less distortion, and potentially less perceptual similarity in the luminance channel compared to the original architecture.
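
For readers unfamiliar with two of the metrics named above, the following is a minimal sketch of how PSNR and bpp are commonly computed. It is not the project's evaluation code. The function names `psnr` and `bits_per_pixel` are hypothetical, pixels are assumed to be 8-bit values in flat sequences, and which resolution's pixel count bpp is normalized by is left to the caller. LPIPS is omitted because it requires a learned network.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumption: pixels are given as flat sequences of numbers in [0, max_value].
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(file_size_bytes, width, height):
    """Compressed size in bits divided by the pixel count of a width x height image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return file_size_bytes * 8 / (width * height)


if __name__ == "__main__":
    ref = [0, 64, 128, 255]
    rec = [0, 66, 126, 255]
    print(f"PSNR: {psnr(ref, rec):.2f} dB")
    print(f"bpp:  {bits_per_pixel(1000, 100, 80):.2f}")
```

A higher PSNR means a smaller mean squared error against the reference, and a lower bpp means a smaller file for the same number of pixels, so comparing encoders at similar bpp, as in the abstract, isolates the difference in reconstruction quality.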
