UROP Proceedings 2024-25

School of Engineering
Department of Computer Science and Engineering

Honor ML Sys
Supervisor: YUAN Binhang / CSE
Student: LIANG Yan / COMP
Course: UROP 1000, Summer

This report presents my UROP 1000 research on optimizing the training of long-context large language models (LLMs) in heterogeneous GPU environments. As context lengths continue to grow, distributed training strategies such as Context Parallelism (CP) and Head Parallelism (HP) have become increasingly essential. However, existing systems such as Megatron assume homogeneous GPU clusters, overlooking the heterogeneous configurations commonly found in real-world deployments. This research proposes an efficient framework that exploits heterogeneous GPUs by integrating CP and HP. In addition, we design a scheduler capable of assigning variable workloads and group partitions across devices with differing capabilities. This report covers theoretical analysis, system-level design, and key implementation challenges, representing a step toward more flexible and efficient long-context LLM training in heterogeneous environments.

Honor ML Sys
Supervisor: YUAN Binhang / CSE
Student: SUN Zhuotao / COMP
Course: UROP 1000, Summer

LLMs demonstrate remarkable capabilities but demand considerable operational costs. While LLM training prefers state-of-the-art accelerators for optimal computation/communication performance, inference presents different priorities: latency tolerance enables deployment on cost-efficient legacy hardware. Moreover, not all user queries require the most powerful models; smaller LLMs suffice for adequate responses in most cases. In this context, this project investigates economical LLM operation through a heterogeneous-accelerator cluster that deploys strong and weak models with API routing. Our study leverages the state-of-the-art distributed inference engine SGLang and the DeepSeek-V3 MoE architecture as a reference model.
This in-progress report details foundational studies of DeepSeek-V3's architecture and SGLang's optimizations, establishing the basis for our cost-optimized system.

Advanced Analytics on Domain-Specific Knowledge Graphs
Supervisor: ZHOU Xiaofang / CSE
Student: LAM Hoi Kei / COSC
Course: UROP 1100, Spring

Graph-based methods for fraud detection and anomaly spotting are key to understanding complex connections in specialized knowledge graphs. This report examines two state-of-the-art approaches, SEFraud and ARC, analyzing their techniques and strengths. These advances improve detection accuracy, clarity, and adaptability, laying the groundwork for robust knowledge graph analytics across various fields. We also examine a real-world application in anti-money laundering (AML) using the consisGAD algorithm. While consisGAD handles shifting data distributions well, it struggles with overfitting and false positives, pointing to a need for better control measures. This suggests that further effort is needed to address consisGAD's issues and build dependable, scalable fraud detection solutions for real-world challenges.
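To make the idea of graph-based anomaly spotting concrete, the following is a minimal didactic sketch: it flags nodes whose feature value deviates strongly from their neighborhood average, a much-simplified stand-in for the learned detectors (SEFraud, ARC, consisGAD) discussed above. The graph, feature values, and threshold are all hypothetical illustrations, not data from the project.

```python
# A toy neighborhood-deviation anomaly scorer for a small graph.
# This is NOT SEFraud/ARC/consisGAD; it only illustrates the general
# principle of comparing a node against its graph neighborhood.

def neighborhood_anomaly_scores(adj, features):
    """adj: {node: set of neighbors}; features: {node: float}.
    Returns {node: |feature - mean of neighbors' features|}."""
    scores = {}
    for node, neighbors in adj.items():
        if not neighbors:
            scores[node] = 0.0
            continue
        mean = sum(features[n] for n in neighbors) / len(neighbors)
        scores[node] = abs(features[node] - mean)
    return scores

# Hypothetical transaction graph: node "d" transacts far above its peers.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
feats = {"a": 1.0, "b": 1.2, "c": 0.9, "d": 50.0}

scores = neighborhood_anomaly_scores(adj, feats)
flagged = [n for n, s in scores.items() if s > 20.0]  # arbitrary threshold
```

Note that node "c", a direct neighbor of the anomaly, also receives an elevated score; this contamination of honest neighbors is one source of the false positives that the abstract identifies as a weakness of neighborhood-consistency methods.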
