UROP Proceedings 2021-22

School of Business and Management
Department of Accounting

Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: CHENG Bo / IS
Course: UROP1000, Summer

This study investigates the accuracy of the FinBERT model, a domain-specific pre-trained deep learning NLP model, in classifying sentences from companies' ESG reports or sustainability reports into Environmental, Social, and Governance (ESG) categories. Sentences from the annual reports are manually labeled with their respective ESG categories based on the MSCI ESG rating framework, and 4,500 sentences from 55 companies in 11 industries are used as samples for model fine-tuning and testing. Two Bidirectional Encoder Representations from Transformers (BERT) models are trained and compared: the base BERT model (cased) and the FinBERT model. Overall, the fine-tuned FinBERT model achieves the higher ESG classification accuracy, although the alternative pre-trained BERT model produces qualitatively similar results.

Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: GONG Xiuyi / FINA
Course: UROP1100, Summer

BERT is widely recognized as a state-of-the-art pre-trained model for natural language processing. FinBERT, a version of BERT adapted to the financial domain, has also performed strongly on tasks involving financial texts. In this report, FinBERT is further fine-tuned on an ESG-themed dataset to check its performance. The dataset consists of sentences randomly selected from the ESG reports of S&P 500 companies, each labeled with one ESG theme. The NLP task is multi-class text classification with 9 classes, each representing a different ESG focus. FinBERT is also compared with five other models: BERT, Random Forest, Support Vector Machine, Naive Bayes, and Long Short-Term Memory (LSTM).

Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: HE Zhiyuan / MATH-SF
Course: UROP1000, Summer

In this paper, I use a dataset of sentences from ESG reports to test the performance of the pre-trained FinBERT model in classifying finance-related sentences. The paper is structured as follows. First, I test the performance of the FinBERT model under different values of four hyperparameters and discuss possible reasons for the model's behaviour. I then combine the best-performing values of all four hyperparameters to produce the best FinBERT model for this classification task. Finally, I test the performance of the FinBERT model when the training dataset is severely unbalanced.
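
As a rough illustration of why a severely unbalanced training set needs care, the sketch below computes inverse-frequency class weights and compares plain accuracy with macro-averaged F1 on a toy label set. It is not taken from the report: the function names `inverse_frequency_weights` and `macro_f1`, the toy labels, and the choice of inverse-frequency weighting (the same heuristic as the "balanced" option found in common libraries) are all assumptions made for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: one dominant class and two rare ones.
y_true = ["E"] * 8 + ["S"] + ["G"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["E"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
weights = inverse_frequency_weights(y_true)

print("accuracy:", accuracy)
print("macro F1:", round(macro_f1(y_true, y_pred), 3))
print("class weights:", weights)
```

On this toy data the majority-class predictor looks good on accuracy but poor on macro F1, which is why a per-class metric is often reported alongside accuracy when classes are unbalanced. The weights could, under the same assumptions, be passed to a weighted loss during fine-tuning.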
