School of Engineering
Department of Computer Science and Engineering

Automatic and Scalable Data Processing for LLM
Supervisor: ZHOU Xiaofang / CSE
Student: CHEN Zhenhong / COMP
Course: UROP 2100, Fall

Automated Machine Learning (AutoML) has recently been revolutionizing the field of machine learning by simplifying the model development process and allowing non-experts to apply machine learning techniques effectively. AutoGNN applies this idea to graph-related tasks, where it performs well. One of the key components of AutoGNN is its search space, because its basic mechanism is to select, from within that space, the configuration that performs best on a specific task. We have been adding new design dimensions and implementing more GNN architectures so that AutoGNN has more candidates to choose from. This report focuses on how we implemented the search space and tested its performance on different datasets with specific tasks.

Automatic and Scalable Data Processing for LLM
Supervisor: ZHOU Xiaofang / CSE
Student: LEI Yicong / DSCT
Course: UROP 1100, Fall

With the impressive performance demonstrated by large language models (LLMs) such as ChatGPT, developing LLMs for specific domain needs has become popular. This trend encourages us to delve deeper into LLM development technologies. This study focuses on key issues in LLM training, including the concepts of neural network training and the types of optimizer, with a particular emphasis on memory consumption. Building on an analysis of how memory consumption is calculated, it presents techniques such as mixed precision training and ZeRO stages 1 to 3 for reducing memory overhead. Practical tools such as DeepSpeed and FSDP are also used to optimize memory consumption. The study aims to offer memory optimization strategies for LLM developers, promote the development of LLM technology, and support efficient, low-cost LLM creation.
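As a rough illustration of the memory calculation mentioned in the abstract above, the sketch below estimates per-GPU memory for model states under mixed precision training with an Adam-style optimizer. It assumes the commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for fp32 optimizer states (master weights, momentum, variance), with ZeRO stages 1, 2 and 3 partitioning optimizer states, gradients, and parameters in turn across GPUs. The function name `model_state_memory_gb` is hypothetical and is not taken from the report; activations, temporary buffers, and fragmentation are ignored.

```python
def model_state_memory_gb(num_params, num_gpus=1, zero_stage=0):
    """Approximate per-GPU memory for model states, in GB (10**9 bytes).

    Assumption: mixed precision training with an Adam-style optimizer,
    i.e. 2 + 2 + 12 bytes per parameter before any partitioning.
    """
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2 or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    param_bytes = 2.0 * num_params   # fp16 parameters
    grad_bytes = 2.0 * num_params    # fp16 gradients
    optim_bytes = 12.0 * num_params  # fp32 master weights, momentum, variance

    # Each ZeRO stage partitions one more group of states across the GPUs.
    if zero_stage >= 1:
        optim_bytes /= num_gpus
    if zero_stage >= 2:
        grad_bytes /= num_gpus
    if zero_stage >= 3:
        param_bytes /= num_gpus

    return (param_bytes + grad_bytes + optim_bytes) / 1e9


if __name__ == "__main__":
    # Illustrative sizes only: a 7.5-billion-parameter model on 64 GPUs.
    for stage in range(4):
        gb = model_state_memory_gb(7.5e9, num_gpus=64, zero_stage=stage)
        print(f"ZeRO stage {stage}: {gb:.2f} GB per GPU")
```

Under these assumptions, a 7.5-billion-parameter model on 64 GPUs needs about 120 GB per GPU for model states without partitioning and about 1.9 GB with stage 3, which is why partitioning matters for low-cost training.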

Automatic and Scalable Data Processing for LLM
Supervisor: ZHOU Xiaofang / CSE
Student: YANG He / COMP
Course: UROP 2100, Fall

In computing, especially on Linux and Unix-like systems, command-line tools play a vital role in file management, data transfer, and automation. This report explores the use of basic command-line tools such as wget, the process of uploading and downloading files between local and remote systems, and the management of files on platforms like GitHub. It also covers the application of Large Language Models (LLMs) to inference tasks and provides an overview of these topics.