School of Engineering
Department of Computer Science and Engineering

Large-Scale Spatiotemporal Data Analytics and Learning
Supervisor: ZHOU Xiaofang / CSE
Student: BIE Jiarui / COMP
Course: UROP 2100, Fall; UROP 3200, Spring

In the logistics industry, complex physical systems handle thousands of orders daily and generate millions of IoT data flows. These data are correlated in space and time rather than being independent of one another. While efficiency is always key in such an industry, the system can run into congestion from time to time, so it is important to be able to anticipate efficiency problems in advance. By analyzing and learning from the data produced by these complex systems, we develop a prediction model for estimating the delivery time of orders in a system.

Large-Scale Spatiotemporal Data Analytics and Learning
Supervisor: ZHOU Xiaofang / CSE
Student: CHEN Sihan / COMP
Course: UROP 2100, Fall

This report discusses research efforts on data collection and pruning for a large language model (LLM). An LLM is a type of artificial intelligence (AI) algorithm that employs deep learning techniques and vast data sets to understand, summarize, generate, and predict new content. As part of the data preparation and evaluation team, the responsibilities included working through Dolma's Rust-based deduplication code and reading papers to gain a better understanding of advanced LLM technology. Specifically, the completed tasks were learning how to use Anaconda to create a virtual environment, understanding the RedPajama code on GitHub, and using Google BigQuery to retrieve data. During the process, tools including VS Code and the terminal were used to extract data from various sources in support of improving large language models' performance.

Large-Scale Spatiotemporal Data Analytics and Learning
Supervisor: ZHOU Xiaofang / CSE
Student: CHEN Yixiang / DSCT
Course: UROP 1100, Summer

Point cloud registration (PCR) is the problem of finding a rigid transformation that aligns two point clouds, and it plays a significant role in computer vision applications. The development of deep learning (DL)-based methods has enhanced registration robustness. In this report, we focus on the performance of PointNetLK, a DL-based PCR method, across various datasets. First, the report defines the point cloud and the PCR problem and introduces a classical PCR algorithm. Second, it presents the architectures of PointNet, PointNet++ and PointNetLK. Finally, it reports the performance of PointNetLK across various types of datasets, with corresponding analyses, and proposes potential improvement strategies.
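
To make the PCR problem statement concrete, the following is a minimal illustrative sketch of estimating a rigid transformation between two point sets. It is not PointNetLK and is not necessarily the classical algorithm covered in the report. It assumes 2D points and known one-to-one correspondences (source[i] matches target[i]), which real PCR methods generally do not have. The function names estimate_rigid_2d and apply_rigid_2d are hypothetical and chosen only for this example.

```python
import math


def estimate_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source onto target.

    Assumption: source[i] corresponds to target[i] (known correspondences).
    """
    n = len(source)
    if n == 0 or n != len(target):
        raise ValueError("point sets must be non-empty and of equal length")

    # Centroids of both point sets.
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n

    # Accumulate dot and cross products of the centred corresponding pairs.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy
        bx, by = qx - tx, qy - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    # Optimal rotation angle in the least-squares sense.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation = target centroid minus rotated source centroid.
    t = (tx - (c * sx - s * sy), ty - (s * sx + c * sy))
    return theta, t


def apply_rigid_2d(points, theta, t):
    """Rotate points by theta (radians) about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


if __name__ == "__main__":
    source = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    target = apply_rigid_2d(source, math.pi / 6, (2.0, -1.0))
    theta, t = estimate_rigid_2d(source, target)
    print(theta, t)
```

When correspondences are unknown, classical approaches typically alternate between guessing correspondences and re-solving an alignment step of this kind, whereas learning-based methods such as PointNetLK operate on learned features instead.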