UROP Proceedings 2024-25

School of Engineering
Department of Computer Science and Engineering

Efficient Queries over Database
Supervisor: WONG Raymond Chi Wing / CSE
Student: LIAO Junyu / COSC
Course: UROP 2100, Fall; UROP 3100, Spring; UROP 3200, Summer

Given a database with millions of options, picking out the ones a user prefers is a meaningful task, commonly formulated as the k-regret minimization problem. In an age of information overload, users are likely to face a large number of attributes when making decisions. To address this, we propose a novel framework designed for high-dimensional datasets, where the number of dimensions may be 100 or more. The framework improves regret minimization through user interaction: it iteratively identifies a small set of significant dimensions, the coordinates most influential to the user. We present the user with carefully chosen true tuples and ask for their favorite among them. Our method also handles partial feedback gracefully, allowing users to skip some questions. Depending on the information the user provides, our algorithm returns either (1) the optimal (favorite) tuple for the user among all options, or (2) a set of tuples that contains some the user prefers, minimizing the user's regret level. Experiments on both synthetic and real datasets show that the proposed algorithm outperforms existing ones in both quality and efficiency.

Efficient Queries over Database
Supervisor: WONG Raymond Chi Wing / CSE
Student: LUI Ka Kit / COMP
Course: UROP 3100, Fall

Declarative visualization languages (DVLs) such as Vega-Lite and ggplot2 are used to specify data visualizations (DVs) so that knowledge discovery can be performed effectively on a dataset. However, learning these DVLs is difficult and unapproachable for non-technical users, for whom natural language queries are more convenient. These natural language queries can be translated into DVL specifications, which then generate the corresponding visualizations. This translation is known as the NL2VIS problem. This report proposes applying generative large language models (LLMs) to the translation task. Specifically, the Vicuna-7B-v1.5 16K model is used for fine-tuning. This approach is expected to perform better than existing solutions. Extending the work from the previous semester, some experiments on model fine-tuning have been carried out.
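
To make the NL2VIS task concrete, the sketch below pairs one natural language query with a Vega-Lite specification it could translate to, and packs the two into a prompt/completion record of the kind often used for fine-tuning. It is only an illustration under stated assumptions: the dataset file `cars.json`, the field names `Origin` and `Miles_per_Gallon`, the helper names, and the prompt template are all hypothetical and are not taken from the report.

```python
import json


def build_bar_chart_spec(data_url, category_field, value_field, aggregate="mean"):
    """Return a Vega-Lite bar-chart specification as a plain dict.

    The data URL and field names are supplied by the caller; the values
    used in the example below are hypothetical.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},
        "mark": "bar",
        "encoding": {
            "x": {"field": category_field, "type": "nominal"},
            "y": {
                "aggregate": aggregate,
                "field": value_field,
                "type": "quantitative",
            },
        },
    }


def make_training_pair(query, spec):
    """Pack a query and its target specification into one training record.

    The prompt template is an assumption made for illustration, not the
    format used in the report.
    """
    prompt = (
        "Translate the request into a Vega-Lite specification.\n"
        f"Request: {query}\n"
        "Specification:"
    )
    return {"prompt": prompt, "completion": json.dumps(spec, sort_keys=True)}


# Hypothetical natural language query and the specification it maps to.
query = "Show the average miles per gallon for each country of origin as a bar chart."
spec = build_bar_chart_spec("cars.json", "Origin", "Miles_per_Gallon")
pair = make_training_pair(query, spec)

print(pair["prompt"])
print(pair["completion"])
```

A fine-tuned model would be expected to produce the completion string given only the prompt; parsing the output with `json.loads` is a cheap first check that the generated specification is at least well-formed JSON.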
