UROP Proceedings 2022-23

School of Business and Management
Department of Accounting

Deep Learning in Natural Language Processing
Supervisor: HUANG, Allen / ACCT
Co-supervisor: YANG, Yi / ISOM
Student: WANG, Yueying / DSCT
Course: UROP1100, Fall

In this UROP project, we aim to use deep learning NLP algorithms to detect financial fraud earlier and more accurately. We focused on data from the Hong Kong market over the past 20 years and went through a complete pipeline of comparing datasets, collecting and cleaning data, and finally training models. As no ready-to-use dataset exists for the target market, we spent the majority of our time collecting and cleaning data. We compared the learning results of three Bag-of-Words models and then applied LSTM and GRU models. Both the LSTM and GRU models showed negligible improvement in accuracy over the baseline model, which may imply that the input text (mostly MD&A files) carries very little information relevant to predicting whether a firm is fraudulent. Finally, we offer insights for possible further research, targeting the input data and the problem of class imbalance.

Deep Learning in Natural Language Processing
Supervisor: HUANG, Allen / ACCT
Co-supervisor: YANG, Yi / ISOM
Student: WONG, Hoi Tin / FINA
Course: UROP1100, Spring

The BERT model is a Large Language Model (LLM) able to perform a wide range of tasks, in contrast to long short-term memory networks or convolutional neural networks, which perform certain specific tasks (Christopher D., 2022; Jacob et al., 2019). This project used the pre-trained bert-base-cased model to perform entity-level sentiment analysis in a financial news context. The implementation of the project included data annotation and model tuning. The resulting model, trained on about 3,000 examples, was able to classify neutral entities with 0.45 recall. However, it performed poorly in classifying positive or negative entities.
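The per-class recall reported above can be sketched as follows. This is a minimal illustration of how recall for one sentiment class is computed; the label names and toy predictions are assumptions for illustration, not the project's actual annotation scheme or results.

```python
# Hypothetical sketch: per-class recall for entity-level sentiment labels.
# Recall for a class = true positives / (true positives + false negatives).
def recall(y_true, y_pred, cls):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp / (tp + fn) if tp + fn else 0.0

# toy gold labels and model predictions (illustrative only)
y_true = ["neutral", "positive", "neutral", "negative", "neutral"]
y_pred = ["neutral", "neutral", "negative", "negative", "neutral"]
print(recall(y_true, y_pred, "neutral"))  # 2 of 3 neutral entities recovered
```

The same function applied per class makes the reported asymmetry concrete: a model can reach moderate recall on the majority (neutral) class while recall on the rarer positive and negative classes stays near zero.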
Possible improvements in terms of data annotation and model choice are discussed at the end of this report.

Deep Learning in Natural Language Processing
Supervisor: HUANG, Allen / ACCT
Co-supervisor: YANG, Yi / ISOM
Student: XIAO, Yicong / QFIN
Course: UROP2100, Fall

This project constructed StackMLP, an ensemble learning model that uses a multilayer perceptron as its meta-learner for the task of financial-report fraud detection. Compared with individual models such as logistic regression, StackMLP appends the predictions of its base learners to the embedding space of the excerpts and produces its own output from a "weighted vote" over both the original word-level embeddings and the base learners' predictions, where the weights are trainable. An experiment on predicting fraudulent text segments in the financial reports of Hong Kong Exchange-listed companies was carried out to compare StackMLP with the individual base learners and with a more common implementation of ensemble learning; the results show that StackMLP performed significantly better. (Code: https://github.com/yxiaoaz/UROP_StackMLP)
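The stacking idea described in the last abstract, where a meta-learner sees both the base learners' predictions and the original features, can be sketched with scikit-learn's `StackingClassifier` and `passthrough=True`. This is a simplified stand-in, not the StackMLP implementation from the linked repository: the base learners, the synthetic data, and all hyperparameters here are illustrative assumptions.

```python
# Hypothetical sketch of the stacking scheme: base learners' predictions are
# concatenated with the original feature vectors (passthrough=True) and fed
# to an MLP meta-learner. Not the actual StackMLP code.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# stand-in for excerpt embeddings labelled fraudulent / non-fraudulent
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB())],
    final_estimator=MLPClassifier(hidden_layer_sizes=(16,),
                                  max_iter=1000, random_state=0),
    passthrough=True,  # meta-learner also sees the original embeddings
    cv=5,
)
stack.fit(X, y)
print(round(stack.score(X, y), 2))
```

With `passthrough=False` the meta-learner would see only the base learners' outputs; setting it to `True` is what lets the MLP weight the original embeddings alongside those predictions, mirroring the "weighted vote" described above.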
