UROP Proceeding 2024-25

School of Business and Management
Department of Finance

Applications of Machine Learning to Financial Data
Supervisor: NOH Don / FINA
Student: LIN Zhenyi / ECOF
         YANG Long Chun / QFIN
Course: UROP 1100, Fall

The goal of this project is to learn how programming tools can be applied in financial economics. We read a research paper and used Python to perform statistical analyses, including replicating some of the paper's figures and regression tables. We studied how the retail trading proportion (RTP) can explain the large gap between overnight and intraday returns commonly found in stock markets. We gained knowledge of stock market mechanisms, such as opening and closing auctions and the calculation of the return gap, as well as econometric methods including panel regression, Fama-MacBeth regression, and causal inference.

Applications of Machine Learning to Financial Data
Supervisor: NOH Don / FINA
Student: TSE Ka Ming / ECOF
         ZHANG Jiayin / COMP
         ZHANG Ruoyan / QFIN
Course: UROP 1100, Fall

Our project replicates the study "Overnight-Intraday Return Gap and the Retail Ebb and Flow" (Ahn, Fan, Noh, & Park, 2024), focusing on the processes involved in empirical asset pricing research. Analysis of KRX data reveals that overnight returns are high, while intraday returns are negative. To draw causal inferences about the relationship between the retail trading proportion and this return gap, we employed several techniques: portfolio sorting, panel regression, Fama-MacBeth regression, and instrumental-variable (IV) regression. Additionally, we propose further investigation into the "overnight risk" explanation, rather than the "price pressure" narrative emphasized in the original paper.

Applications of Machine Learning to Financial Data
Supervisor: NOH Don / FINA
Student: XIE Hangcen / MATH-FAM
Course: UROP 2100, Summer

This report presents the workflow and findings of a summer UROP project on empirical asset pricing with machine learning in the Korean stock market. The study begins with a literature review on systematic noise and asset pricing, followed by a replication of the methodology of Gu, Kelly, and Xiu (2020) using an extensive dataset of stock-level and macroeconomic characteristics from the Korean stock market. Multiple machine learning models, including several linear models and XGBoost, are trained and evaluated to assess their predictive performance. The key finding is that incorporating retail trading features significantly enhances predictive accuracy, with XGBoost achieving an out-of-sample R² of 0.8255% for all stocks. In addition, the low-cap and high-retail-attention market segments exhibit greater predictability.
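
For readers unfamiliar with the metric, the out-of-sample R² quoted above can be illustrated with a minimal Python sketch. It assumes the convention of Gu, Kelly, and Xiu (2020), in which the denominator is the sum of squared realized returns without demeaning, so the benchmark is a forecast of zero. The abstract does not state the exact formula used, and the function name `r2_oos` and the toy numbers are hypothetical, not taken from the project.

```python
import numpy as np


def r2_oos(actual, predicted):
    """Out-of-sample R^2 against a zero-forecast benchmark.

    Assumed convention (Gu, Kelly, and Xiu, 2020): the denominator is the
    sum of squared realized returns, not squared deviations from the mean.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((actual - predicted) ** 2)  # squared forecast errors
    sst = np.sum(actual ** 2)                # squared realized returns
    return 1.0 - sse / sst


# Hypothetical toy data, for illustration only (not project results).
realized = np.array([0.02, -0.01, 0.03, -0.02])
forecast = np.array([0.01, 0.00, 0.01, -0.01])
print(f"Out-of-sample R^2: {100 * r2_oos(realized, forecast):.2f}%")
```

Under this convention a forecast of zero for every stock scores exactly 0, so a small positive value such as the one reported means the model beats that naive benchmark.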
