UROP Proceedings 2021-22

School of Business and Management
Department of Accounting

Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: SHE Fong Wing / RMBI
Course: UROP1100, Summer

This paper evaluates the performance of FinBERT, a large language model adapted to the financial domain, on multi-class classification of ESG categories based on the MSCI framework. The ESG sentences used in this research were extracted from the CSR reports of companies in 11 different industries. FinBERT is compared with 11 other deep learning and machine learning algorithms, including Naïve Bayes, Logistic Regression, Linear SVM, Random Forest, MLP, CNN, LSTM, bi-directional LSTM, GRU, and BERT. After fine-tuning all of the models mentioned, FinBERT is found to give the best performance on the task, but its performance is extremely close to that of BERT, and its per-class precisions are less stable than BERT's.

Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: XIAO Yicong / QFIN
Course: UROP1100, Summer

This project fine-tuned a pre-trained FinBERT model to classify financial text in an ESG context into 9 predefined categories. Its accuracy was then compared with that of 4 Bag-of-Words based algorithms (Logistic Regression, Random Forest, Linear Support Vector Classification, and Naive Bayes) and 2 deep learning models (BERT and LSTM), all trained and tested on the same data set. The project found that the selected deep learning models, which process text sequences instead of word counts, consistently outperformed the Bag-of-Words based algorithms. BERT and FinBERT also outperformed LSTM, and their accuracy remained consistent when the size of the training set was reduced. No significant difference in performance was found between BERT and FinBERT.
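To make concrete what "word counts" means for the Bag-of-Words baselines in the abstract above, the following is a minimal sketch of a multinomial Naive Bayes classifier written with the Python standard library only. It is not the project's code: the class `NaiveBayesBoW`, the helper `bag_of_words`, the toy sentences, and the two labels are all hypothetical and were invented for illustration. The projects worked with real CSR report sentences and a larger category set, and most likely used an established library.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(sentence):
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(sentence.lower().split())


class NaiveBayesBoW:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # number of sentences per class
        self.word_counts = defaultdict(Counter)  # per-class token counts
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            counts = bag_of_words(sentence)
            self.word_counts[label].update(counts)
            self.vocab.update(counts)
        return self

    def predict(self, sentence):
        counts = bag_of_words(sentence)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word, n in counts.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)  # log likelihood of the word counts
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy sentences and labels, invented for illustration only.
train_sentences = [
    "we reduced carbon emissions at our plants",
    "emissions and water use fell this year",
    "the board appointed two independent directors",
    "the audit committee reports to the board",
]
train_labels = ["Environmental", "Environmental", "Governance", "Governance"]

model = NaiveBayesBoW().fit(train_sentences, train_labels)
prediction = model.predict("carbon emissions fell")
```

Because `bag_of_words` keeps only token counts, two sentences containing the same words in a different order are indistinguishable to this classifier. Sequence models such as LSTM, BERT, and FinBERT do see word order, which is the distinction the abstract draws.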
Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: ZOU Xinying / ACCT
Course: UROP1100, Summer

In this UROP project, we collected data by gathering from the internet the ESG reports of companies of different sizes across various industries, and then matched the sentences in those reports to ESG labels. As a preliminary analysis of the matched data, we compared the labels assigned by two different students and, through discussion in the weekly meetings, adjusted the data to arrive at more reasonable labels. Finally, we fine-tuned FinBERT (BERT for Financial Text Mining) to classify ESG discussions.
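The comparison of labels assigned by two students can be quantified with an agreement statistic. The abstract does not say which measure, if any, was used, so the sketch below assumes Cohen's kappa as one common choice. The function names `cohens_kappa` and `disagreements` and the toy label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def disagreements(labels_a, labels_b):
    """Indices of items the annotators labeled differently, e.g. for discussion."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Toy labels, invented for illustration only (E, S, G as placeholder categories).
student_1 = ["E", "E", "S", "G", "S", "G"]
student_2 = ["E", "S", "S", "G", "S", "E"]

kappa = cohens_kappa(student_1, student_2)
to_review = disagreements(student_1, student_2)
```

The indices returned by `disagreements` correspond to the sentences that would be brought to a discussion such as the weekly meetings described above, and kappa summarizes how far the two labelers agree beyond what chance alone would produce.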
