UROP Proceeding 2024-25

School of Engineering
Department of Computer Science and Engineering

Knowledge Discovery over Database
Supervisor: WONG Raymond Chi Wing / CSE
Student: ZHANG Zongmin / COMP
Course: UROP 2100, Fall

The rapid advancement of Large Language Models (LLMs), exemplified by ChatGPT, Claude, and Gemini, has catalyzed the widespread deployment of LLM-based AI agents across various domains. Although AI agents excel at tasks such as summarizing and writing, they still exhibit notable limitations in others, such as arithmetic calculation and logical reasoning. In this UROP project, we explore two innovative and interesting applications of AI agents and investigate three novel prompt engineering methods to enhance AI agents' reasoning abilities. Additionally, we replace the traditional linear-list memory structure with a Tree Memory, which optimizes memory organization and improves the response accuracy of AI agents.

On-Skin Interaction Design for Smart Watches with Friction Sounds
Supervisor: XIE Wentao / CSE
Students: LIU Yixuan / ELEC, QIN Zhengyan Lambo / CPEG
Course: UROP 1100, Spring

This project addresses smartwatch interaction limitations by proposing an on-skin trackpad. It aims to turn the back of the hand into an input surface by analyzing the friction sounds generated by finger sliding, captured via dual-channel microphones on the watch. The core objective is to reconstruct finger-sliding trajectories in real time using deep learning models. For initial feasibility, the scope is simplified to straight-line gestures. A robust data processing pipeline has been developed, using an LED ring for ground-truth gesture-angle capture and scripts to segment gestures from video, calculate their orientation, and synchronize the corresponding audio data, ultimately producing labeled time-frequency audio slices for model training.
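The last step of that pipeline can be illustrated with a minimal sketch: cutting annotated gesture intervals out of one microphone channel and turning each into a magnitude spectrogram paired with its ground-truth angle. The function name, sampling rate, and window/hop sizes below are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def stft_slices(audio, segments, sr=16000, win=256, hop=128):
    """Cut labeled time-frequency slices from a friction-sound recording.

    audio    : 1-D float array (one microphone channel)
    segments : list of (start_s, end_s, angle_deg) gesture annotations,
               with angles taken from the LED-ring ground truth
    Returns a list of (spectrogram, angle) training pairs.
    (Hypothetical helper; parameters are assumed, not from the project.)
    """
    window = np.hanning(win)
    pairs = []
    for start_s, end_s, angle in segments:
        seg = audio[int(start_s * sr):int(end_s * sr)]
        frames = []
        for i in range(0, len(seg) - win + 1, hop):
            # magnitude of the windowed short-time FFT frame
            spec = np.fft.rfft(seg[i:i + win] * window)
            frames.append(np.abs(spec))
        if frames:
            # stack to a (freq_bins, time_frames) spectrogram
            pairs.append((np.stack(frames, axis=1), angle))
    return pairs
```

Each pair is then a fixed-frequency-resolution image-like input that a convolutional model can map to a sliding direction.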
Audio-Driven Gaussian Avatars with Generalizable Capability
Supervisor: XU Dan / CSE
Student: HUNG Ming Kin / DSCT
Course: UROP 1100, Summer

This UROP project explored encoder-based inversion of 3D-aware generative adversarial networks (GANs) and its integration with Gaussian Splatting for real-time single-image 3D head avatar generation. Building upon the Geometry and Occlusion-Aware Encoder (GOAE) and the UV-aligned generator architecture of GGHead, we investigated how canonical latent-space encoding can be adapted to predict multichannel UV maps that parameterize anisotropic 3D Gaussians. Throughout the project, I studied the technical foundations of GAN inversion in both 2D and 3D settings, StyleGAN's latent spaces (W and W+), multi-view consistency losses, and UV-space Gaussian attribute regularization. Preliminary experiments show some limitations in fine-detail recovery and hair rendering. The project deepened my understanding of 3D GAN architectures, encoder design, and differentiable Gaussian rasterization, providing a strong technical foundation for future research in real-time 3D avatar synthesis.
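To make "multichannel UV maps that parameterize anisotropic 3D Gaussians" concrete, the sketch below unpacks a predicted UV map into per-Gaussian attributes with standard activations (exp for positive scales, sigmoid for opacity, normalization for rotations). The 14-channel layout and the function name are assumptions for illustration; they are not GGHead's actual convention.

```python
import numpy as np

def unpack_uv_gaussians(uv_map):
    """Split a predicted multichannel UV map into 3D Gaussian attributes.

    uv_map : (H, W, 14) array; assumed channel layout (hypothetical):
             0-2 position offset, 3-5 log-scale, 6-9 rotation quaternion,
             10 opacity logit, 11-13 RGB colour.
    Returns a dict of flat per-Gaussian attribute arrays (N = H * W).
    """
    h, w, c = uv_map.shape
    assert c == 14
    flat = uv_map.reshape(-1, c)
    quat = flat[:, 6:10]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)  # unit rotations
    return {
        "xyz": flat[:, 0:3],                              # positions (offsets)
        "scale": np.exp(flat[:, 3:6]),                    # strictly positive
        "rotation": quat,
        "opacity": 1.0 / (1.0 + np.exp(-flat[:, 10])),    # sigmoid to (0, 1)
        "rgb": flat[:, 11:14],
    }
```

Aligning every Gaussian with a fixed UV texel is what lets a single 2D convolutional generator output a full, consistently ordered set of 3D Gaussians for splatting.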
