UROP Proceeding 2024-25

School of Engineering
Department of Computer Science and Engineering

Efficient Queries over Database
Supervisor: WONG Raymond Chi Wing / CSE
Student: SO Chun Hin / COGBM
Course: UROP 1100, Spring; UROP 2100, Summer

Large datasets are increasingly common. Our project focuses on the banking industry, in which millions of transactions are processed daily. Discovering conditional functional dependencies (CFDs) can help with consistency checking and fraud detection. Existing algorithms for mining CFDs, such as CFDMiner, CTANE, and FastCFD, are not efficient enough for our project, so we have been exploring ways to use or extend CFDMiner in the hope of finding a sufficiently efficient approach. Here, we outline the tools we have implemented to aid our work and discuss the approaches we have tried, together with our observations on them. These observations help pinpoint directions for future work.

Efficient Queries over Database
Supervisor: WONG Raymond Chi Wing / CSE
Students: TANG Chiu Yeung / DSCT; NGUYEN Kim Hue Nam / COGBM
Course: UROP 1100, Spring

In the database research literature, extensive work explores methods for finding a user's utility function, which captures the user's preferences. This typically involves iterative interaction: the user is asked a series of questions, each of which presents a pair of data points and asks the user to select the preferred one. The goal of such approaches is to identify the user's optimal tuple in a large database efficiently. In real-world scenarios, however, the user may be unable to provide useful information for some questions, for example when they are unsure of their preference between the two given points. In that case, the user may pick a point at random, which can cause an incorrect utility function to be learned and, consequently, an undesirable tuple to be output.
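To make the pairwise interaction concrete, the following Python sketch narrows the range of a user's utility in two dimensions. It assumes a linear utility u(p) = w·p[0] + (1 - w)·p[1] with an unknown weight w in [0, 1], a fixed list of questions, and the midpoint of the final range as the estimate of w. These choices, and all function names, are illustrative assumptions rather than the project's algorithm. Each answer cuts the feasible range of w at the weight where the user would be indifferent between the two points, so a single random answer can cut away the user's true weight.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def utility(w: float, p: Point) -> float:
    """Assumed (hypothetical) linear utility with weight w on the first attribute."""
    return w * p[0] + (1 - w) * p[1]


def update_range(lo: float, hi: float, p: Point, q: Point,
                 prefers_p: bool) -> Tuple[float, float]:
    """Shrink the feasible range [lo, hi] of w after the user compares p and q.

    u(p) - u(q) = w * (a - b) + b, where a = p[0] - q[0] and b = p[1] - q[1].
    """
    a, b = p[0] - q[0], p[1] - q[1]
    slope = a - b
    if slope == 0:
        return lo, hi  # the preference does not depend on w: no information
    threshold = -b / slope  # weight at which the user is indifferent
    # u(p) >= u(q) holds for w >= threshold when slope > 0, else for w <= threshold
    if prefers_p == (slope > 0):
        lo = max(lo, threshold)
    else:
        hi = min(hi, threshold)
    return lo, hi


def best_tuple(points: List[Point], w: float) -> Point:
    """Return the point with the highest utility under weight w."""
    return max(points, key=lambda p: utility(w, p))


def interact(points: List[Point], questions: List[Tuple[Point, Point]],
             answer: Callable[[Point, Point], bool]) -> Point:
    """Ask each question in turn, then report the best tuple at the range midpoint."""
    lo, hi = 0.0, 1.0
    for p, q in questions:
        lo, hi = update_range(lo, hi, p, q, answer(p, q))
    return best_tuple(points, (lo + hi) / 2)


if __name__ == "__main__":
    points = [(1.0, 0.0), (0.6, 0.6), (0.0, 1.0)]
    questions = [((1.0, 0.0), (0.6, 0.6)), ((0.0, 1.0), (0.6, 0.6))]

    def truthful(p: Point, q: Point) -> bool:
        # Simulated user whose hidden weight is 0.5
        return utility(0.5, p) >= utility(0.5, q)

    print(interact(points, questions, truthful))  # (0.6, 0.6)

    # Same user, but the first answer is a random guess that happens to be wrong
    guesses = iter([True, False])
    print(interact(points, questions, lambda p, q: next(guesses)))  # (1.0, 0.0)
```

With the hidden weight 0.5, the truthful answers confine w to [0.4, 0.6] and the sketch returns (0.6, 0.6). Flipping only the first answer changes the cut from w ≤ 0.6 to w ≥ 0.6, the estimate moves to 0.8, and the sketch returns (1.0, 0.0), illustrating how one uninformative answer can lead to an undesirable output.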
To address this problem, we propose a new problem: finding the most preferred tuple via interaction in which the user may “skip” questions they do not want to answer. We are currently working on an algorithm that finds the best tuple for two-dimensional datasets, and we will then work on algorithms for datasets with d ≥ 2 dimensions.

This report also includes Newt’s current progress on exploring and implementing session-based recommender systems. Session-based recommendation is an extensively studied field concerned with using the limited information available in a single user session to make good recommendations. Our current direction is to investigate data leakage in previously published papers by implementing earlier models and testing their Recall@20 and MRR@20.
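Recall@20 and MRR@20 are commonly used accuracy metrics in session-based recommendation. The Python sketch below assumes that each test case consists of a model's ranked list of recommended items and the single held-out next item of a session. Recall@K is the fraction of test cases whose target appears in the top K; MRR@K averages the reciprocal rank of the target, counting 0 when it falls outside the top K. The function names and data layout are illustrative assumptions, and the sketch does not model how the train/test split is built, which is where data leakage would arise.

```python
from typing import Hashable, List, Sequence, Tuple


def hit_at_k(ranked: Sequence[Hashable], target: Hashable, k: int = 20) -> float:
    """1.0 if the held-out next item is among the top-k recommendations, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def reciprocal_rank_at_k(ranked: Sequence[Hashable], target: Hashable,
                         k: int = 20) -> float:
    """1 / rank of the target within the top k (ranks start at 1), else 0.0."""
    top_k = list(ranked[:k])
    return 1.0 / (top_k.index(target) + 1) if target in top_k else 0.0


def evaluate(predictions: List[Sequence[Hashable]], targets: List[Hashable],
             k: int = 20) -> Tuple[float, float]:
    """Return (Recall@k, MRR@k) averaged over all test cases."""
    pairs = list(zip(predictions, targets))
    recall = sum(hit_at_k(r, t, k) for r, t in pairs) / len(pairs)
    mrr = sum(reciprocal_rank_at_k(r, t, k) for r, t in pairs) / len(pairs)
    return recall, mrr


if __name__ == "__main__":
    # Ranked recommendations for three test prefixes, and their true next items
    predictions = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
    targets = ["b", "d", "b"]
    print(evaluate(predictions, targets))  # (0.6666666666666666, 0.5)
```

Here the target is found in two of three top-20 lists (Recall@20 = 2/3) at ranks 2 and 1, giving MRR@20 = (1/2 + 0 + 1) / 3 = 0.5.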
