UROP Proceedings 2022-23

School of Engineering
Department of Computer Science and Engineering

Using Large Language Models (LLMs) for Software Development
Supervisor: CHEUNG, Shing-Chi / CSE
Student: LI, Dongze / COMP
Course: UROP1000, Summer

With the launch of ChatGPT, researchers continue to explore its uses across many fields. Building on its strong ability to write code from natural-language prompts, we applied it to testing two popular machine learning frameworks, TensorFlow and PyTorch. Following standard testing principles, our methodology combines black-box and white-box testing of the APIs provided by these ML libraries. In black-box testing, we prompted ChatGPT to generate tests that reach more branches and edge cases, leveraging chain-of-thought prompting and extracting the properties of the target APIs (the first sketch below illustrates this workflow). In white-box testing, we analyzed the source code in both Python and C++, aiming to uncover the binding rules between the external APIs and the internal implementation.

Using Large Language Models (LLMs) for Software Development
Supervisor: CHEUNG, Shing-Chi / CSE
Student: LO, Hau Ching / COMP
Course: UROP1100, Spring

Test oracle automation has been sought after for many years in the software testing community. Its success could lead to an enormous reduction in workload during software development. With the rise and improvement of Large Language Models (LLMs) such as ChatGPT in recent years, more research sees the opportunity to utilize pretrained models to achieve test oracle automation. However, since source code is valuable intellectual property for many companies and should remain confidential to outsiders, we would like to hide implementation details and private information from LLM providers such as OpenAI. Hence, we plan to design a neural network that generates masked code to serve as ChatGPT queries, so that test oracles can be generated securely (the second sketch below illustrates the masking step).

Using Large Language Models (LLMs) for Software Development
Supervisor: CHEUNG, Shing-Chi / CSE
Student: TSANG, Pui Kin / COMP
Course: UROP1000, Summer

Since the introduction of large language models (LLMs), the field of software engineering has adopted them in daily development activities. This project explores the performance of two recent LLMs, gpt-3.5-turbo and gpt-4, in assisting software development through code generation. The generated code is evaluated using various techniques and metrics that aim to expose the LLMs' strengths and areas for improvement (the third sketch below illustrates the correctness harness). Experiments show that gpt-4 is more robust and effective at writing functionally correct code, but neither model anticipates or reliably handles edge cases. These findings reveal current limitations of LLMs in assisting software development.
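First sketch. A minimal illustration of the black-box workflow from the first project, assuming the official openai Python client (v1.x) with an OPENAI_API_KEY in the environment; the prompt wording and the target API (torch.nn.functional.conv2d) are illustrative choices, not the project's exact setup.

# Sketch only: prompt, target API, and harness details are assumptions.
import subprocess
import sys
import tempfile

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chain-of-thought style prompt: reason about input properties first,
# then emit a test that exercises an edge case.
PROMPT = """You are testing the PyTorch API torch.nn.functional.conv2d.
Think step by step: first list the input properties (shape, dtype, stride,
padding) and the edge cases they imply, then write a self-contained Python
test that exercises one edge case and asserts on the output shape.
Return only the final Python code."""

reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": PROMPT}],
)
test_code = reply.choices[0].message.content
# Model replies often wrap code in markdown fences; strip them if present.
test_code = test_code.strip().removeprefix("```python").removesuffix("```")

# Run the generated test in a separate process so a crash inside the ML
# library (e.g. an aborting C++ assertion) cannot take down the harness.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(test_code)
result = subprocess.run([sys.executable, f.name], capture_output=True, text=True)
print("test passed" if result.returncode == 0 else result.stderr)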
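Second sketch. The second project proposes a learned (neural) masking model; the stand-in below is a deliberately simplified rule-based version, using Python's ast module, that shows the intended effect: proprietary identifiers are replaced with neutral placeholders before the code leaves the company, and the mapping is kept locally. The sample source and placeholder scheme are assumptions for illustration.

# Sketch only: a rule-based stand-in for the project's learned masking model.
import ast

class IdentifierMasker(ast.NodeTransformer):
    def __init__(self):
        self.mapping = {}  # original name -> neutral placeholder

    def _mask(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"VAR_{len(self.mapping)}"
        return self.mapping[name]

    def visit_Name(self, node):      # variable reads/writes
        node.id = self._mask(node.id)
        return node

    def visit_arg(self, node):       # function parameters
        node.arg = self._mask(node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self._mask(node.name)
        self.generic_visit(node)     # still mask the parameters and body
        return node

source = """
def compute_billing_rate(customer_tier):
    return internal_rate_table[customer_tier] * 1.17
"""
masker = IdentifierMasker()
masked = ast.unparse(masker.visit(ast.parse(source)))  # needs Python 3.9+
print(masked)           # masked code, safe to embed in a ChatGPT query
print(masker.mapping)   # kept locally, so returned oracles can be un-masked

The mapping never leaves the local machine, so the generated test oracle can be translated back to the original identifiers after the LLM responds.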
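Third sketch. A minimal functional-correctness harness in the spirit of the third project's evaluation: each generated candidate is executed against reference tests that include edge cases, and the fraction of candidates passing is reported. The task (median), the tests, and the candidate shown are illustrative assumptions, not the project's benchmark.

# Sketch only: task, tests, and candidate are illustrative.
import multiprocessing

def run_candidate(candidate_src, queue):
    """Exec a candidate `median` implementation and test it, edge cases included."""
    try:
        scope = {}
        exec(candidate_src, scope)
        median = scope["median"]
        assert median([1, 3, 2]) == 2        # happy path
        assert median([4, 1, 3, 2]) == 2.5   # even length
        assert median([7]) == 7              # single element: an edge case
        queue.put(True)
    except Exception:
        queue.put(False)

def passes(candidate_src, timeout=5):
    # A subprocess isolates the harness from hangs and crashes in generated code.
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=run_candidate, args=(candidate_src, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        return False
    return queue.get() if not queue.empty() else False

if __name__ == "__main__":
    candidates = [
        "def median(xs):\n"
        "    xs = sorted(xs)\n"
        "    n = len(xs)\n"
        "    m = n // 2\n"
        "    return xs[m] if n % 2 else (xs[m - 1] + xs[m]) / 2\n",
    ]
    print(sum(passes(c) for c in candidates) / len(candidates))  # pass rate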
