UROP Proceedings 2022-23

School of Engineering
Department of Computer Science and Engineering

Using Large Language Models (LLMs) for Software Development
Supervisor: CHEUNG, Shing-Chi / CSE
Student: AUNG, Ye Moe / DSCT
Course: UROP1000, Summer

In software development, testing is one of the most vital activities. Testing is normally concerned with checking whether the actual software product matches the expected requirements without any failures or defects. Detecting software failures is an important but challenging software engineering task; the process is akin to finding needles in a vast open sea. These needles are failure-inducing tests, each containing an input that triggers a software fault and an oracle that asserts the incorrect execution, and finding them is extremely tedious. With the recent advent of large language models (LLMs) such as ChatGPT, we are motivated to study how this challenge can be tackled with a measure of ingenuity and creativity using LLMs.

Using Large Language Models (LLMs) for Software Development
Supervisor: CHEUNG, Shing-Chi / CSE
Student: CHO, Hoon / COMP
Course: UROP1100, Spring

This research project aims to explore the application of Large Language Models (LLMs) to fuzzing deep-learning libraries such as TensorFlow and PyTorch, as described in the research paper "Fuzzing Deep-Learning Libraries via Large Language Models". Specifically, the project focuses on initial seed generation and mutation operators, each of significant importance in the overall fuzzing algorithm. The report discusses findings from replicating and implementing these components, with emphasis on the approaches taken, the problems encountered, and the corresponding solutions. Preliminary results indicate successful replication and implementation of the proposed techniques; future work would involve implementing the remainder of the overall fuzzing algorithm.

Using Large Language Models (LLMs) for Software Development
Supervisor: CHEUNG, Shing-Chi / CSE
Student: CHUI, Man Yin Edward / COMP
Course: UROP1000, Summer

As large language models (LLMs) advance, their potential for assisting with software engineering tasks such as automatic program repair is promising yet underexplored. This study investigates the capabilities of different LLMs on automatic program repair tasks. Using the QuixBugs benchmark, models including Alpaca, Vicuna, Guanaco, ChatGPT-3.5, and Claude 2 are evaluated on their ability to infer program intentions and generate corrected code. Experiments reveal that larger models such as ChatGPT-3.5 and Claude 2 significantly outperform smaller models in intention inference and in generating viable reference versions. However, challenges remain in handling defective logic during intention inference. Overall, the findings provide valuable insights into leveraging LLMs for software repair and reveal current limitations to address.
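
To make the notion of a failure-inducing test in the first abstract concrete, a minimal Python sketch follows; the buggy function and its fault are invented for illustration and are not drawn from the project.

# Illustrative only: a hypothetical buggy implementation that mishandles one input.
def buggy_abs(x):
    # Bug (contrived): negative even numbers are returned unchanged.
    if x < 0 and x % 2 == 0:
        return x
    return abs(x)

def test_failure_inducing():
    failing_input = -4   # input that triggers the fault
    expected = 4         # oracle: what a correct implementation must return
    assert buggy_abs(failing_input) == expected, "oracle detects the incorrect execution"

test_failure_inducing()  # raises AssertionError, i.e. this test is failure-inducing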
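
For the fuzzing project in the second abstract, the sketch below outlines how LLM-driven seed generation and mutation operators might fit into a fuzzing loop; query_llm is a hypothetical placeholder for the model API, and the prompts, placeholder snippet, and mutation strategy are assumptions rather than the cited paper's exact method.

import random

def query_llm(prompt):
    # Hypothetical stand-in for the project's LLM call; a real version would send
    # `prompt` to a model endpoint. A fixed placeholder snippet is returned here so
    # the control flow of the sketch can be followed end to end.
    return "import torch\nx = torch.ones(3, 3)\nprint(torch.matmul(x, x))"

def generate_seed(api_name):
    # Initial seed generation: ask the LLM for a small program exercising one API.
    return query_llm(f"Write a short Python snippet that calls {api_name} on a small tensor.")

def mutate(seed):
    # Mutation operator: ask the LLM to vary an existing seed program.
    return query_llm("Mutate this snippet by changing its arguments or the APIs it calls:\n" + seed)

def fuzz(api_names, iterations=50):
    corpus = [generate_seed(name) for name in api_names]
    crashes = []
    for _ in range(iterations):
        candidate = mutate(random.choice(corpus))
        try:
            exec(candidate, {})           # run the generated test program against the library
            corpus.append(candidate)      # programs that execute cleanly become new seeds
        except Exception as err:          # crashes and unexpected errors are potential bugs
            crashes.append((candidate, err))
    return crashes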
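
For the program-repair study in the third abstract, the sketch below shows one possible shape of an evaluation harness over QuixBugs-style programs; query_llm, the prompt wording, and the entry-point handling are assumptions made for illustration, not the study's actual harness.

def query_llm(prompt):
    # Hypothetical helper standing in for whichever model is under evaluation
    # (e.g. Alpaca, Vicuna, Guanaco, ChatGPT-3.5, or Claude 2 in the study).
    raise NotImplementedError("call the model's API client here")

def repair(buggy_source):
    # Ask the model to state the program's intention, then emit a corrected version.
    prompt = ("Describe what this program is intended to do, then output a corrected "
              "version of the code only:\n" + buggy_source)
    return query_llm(prompt)

def passes_tests(fixed_source, entry_point, test_cases):
    # Load the candidate fix and check it against the benchmark's test cases.
    namespace = {}
    exec(fixed_source, namespace)
    func = namespace[entry_point]   # e.g. "gcd" for the QuixBugs gcd program
    return all(func(*args) == expected for args, expected in test_cases)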
