School of Engineering
Department of Computer Science and Engineering

Automatic and Scalable Data Processing for LLM
Supervisor: ZHOU Xiaofang / CSE
Student: HU Yutong / DSCT
Course: UROP 2100, Spring

The LLM (Large Language Model) is a powerful artificial intelligence model capable of generating human-like text based on the patterns and information it has learned from vast amounts of training data. The Math LLM extends the capabilities of the LLM by specializing in understanding and generating mathematical text, providing valuable assistance in mathematical problem-solving, concept explanation, and mathematical research. This report presents a replication of the 'AutoMathText' experiments (Zhang et al., 2024), covering experimental details on both CPUs and GPUs with different models. A brief synopsis of the PyTorch course study and an overview of structured-text techniques are also presented.

Automatic and Scalable Data Processing for LLM
Supervisor: ZHOU Xiaofang / CSE
Student: YANG Haolin / DSCT
Course: UROP 2100, Spring; UROP 3100, Summer

Large language models (LLMs) have demonstrated remarkable capabilities in understanding and generating text. However, their proficiency diminishes significantly when interpreting and reasoning about tabular data. Our work aims to bridge this gap by fine-tuning a large language model specifically to understand tables and answer questions related to tabular information, catering to real-world office scenarios. We followed and compared the training processes of StructLM and TableLLM, modifying the training dataset based on their methods to enhance the model's performance. We also improved the evaluation code to reduce false-negative errors. We release the model weights and training dataset to the community, along with the relevant code on GitHub.

Automatic and Scalable Data Processing for LLM
Supervisor: ZHOU Xiaofang / CSE
Student: YANG He / COMP
Course: UROP 1100, Summer

This paper briefly introduces the principles of LLM training, takes several common optimizers used in LLM training as examples, and surveys efficient approaches to reducing memory consumption during training. Specifically, it compares Adam and SGD to examine how memory consumption is mainly divided into four parts (parameters, gradients, optimizer states, and data loading) and how efficient methods can be adopted for each part accordingly. In particular, we introduce data parallelism, model parallelism, pipeline parallelism, ZeRO, and other techniques such as activation checkpointing. We briefly illustrate how they are applied in model training, compare their strengths and weaknesses, and conclude what they enable us to do.
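As a rough illustration of how the first three memory components scale with the choice of optimizer, the sketch below estimates per-parameter memory for SGD with momentum versus Adam under mixed-precision training. The byte counts are common rule-of-thumb assumptions (fp16 parameters and gradients, fp32 master weights and optimizer states), not measurements from this project, and the helper name is purely illustrative.

```python
def estimate_training_memory_gb(num_params: float, optimizer: str = "adam") -> dict:
    """Rough per-component memory estimate for mixed-precision training.

    Assumptions (rule of thumb, not measured):
      - fp16 parameters and gradients: 2 bytes each per parameter
      - fp32 master weights kept by the optimizer: 4 bytes per parameter
      - SGD with momentum: 1 extra fp32 state (momentum) = 4 bytes per parameter
      - Adam: 2 extra fp32 states (momentum + variance) = 8 bytes per parameter
    Activations and data-loading buffers are excluded; they depend on batch
    size and sequence length rather than on parameter count.
    """
    GB = 1024 ** 3
    params_bytes = 2 * num_params            # fp16 weights
    grads_bytes = 2 * num_params             # fp16 gradients
    if optimizer == "adam":
        opt_bytes = (4 + 8) * num_params     # fp32 master weights + m and v
    elif optimizer == "sgd":
        opt_bytes = (4 + 4) * num_params     # fp32 master weights + momentum
    else:
        raise ValueError(f"unknown optimizer: {optimizer}")
    return {
        "parameters_gb": params_bytes / GB,
        "gradients_gb": grads_bytes / GB,
        "optimizer_states_gb": opt_bytes / GB,
        "total_gb": (params_bytes + grads_bytes + opt_bytes) / GB,
    }

if __name__ == "__main__":
    for opt in ("sgd", "adam"):
        est = estimate_training_memory_gb(7e9, opt)   # e.g. a 7B-parameter model
        print(opt, {k: round(v, 1) for k, v in est.items()})
```

Under these assumptions the optimizer states dominate for Adam, which is why techniques such as ZeRO, which shard optimizer states (and optionally gradients and parameters) across data-parallel workers, yield large savings.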
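Activation checkpointing, also mentioned above, addresses the activation component rather than the parameter-related components. The following sketch uses the standard PyTorch API (torch.utils.checkpoint.checkpoint); the toy MLP and its sizes are illustrative assumptions, not the configuration studied in this project.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    """Toy stack of MLP blocks whose activations are recomputed in backward."""

    def __init__(self, dim: int = 1024, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Intermediate activations inside `block` are not stored during the
            # forward pass; they are recomputed during backward, trading extra
            # compute for lower peak memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
x = torch.randn(16, 1024, requires_grad=True)
loss = model(x).sum()
loss.backward()   # triggers recomputation of each checkpointed block
```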