UROP Proceedings 2021-22

School of Business and Management
Department of Accounting

Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: MORSI Mohamed Sobhy Mohamed Hassan / SENG
Course: UROP1000, Summer

This study aims to fine-tune the domain-specific pre-trained model FinBERT to create a multi-class text classifier for the finance domain. The results were then compared against various other NLP models, namely the uncased base BERT model, the RoBERTa model, Support Vector Machine classifiers, Decision Trees, Random Forests, Logistic Regression, and Naive Bayes classifiers. Experiments showed that the BERT family of models outperformed the other methods, with minor differences in test accuracy and robustness. The main task given to these models was to classify ESG sentences into 9 predefined categories according to their meanings. The code and datasets used in this report are available on the following GitHub fork: https://github.com/mohamedsobhi777/FinBERT

Deep Learning in Natural Language Processing
Supervisor: HUANG Allen Hao / ACCT
Co-supervisor: YANG Yi / ISOM
Student: OUYANG Xiangting / SBM
Course: UROP1000, Summer

In recent years, ESG ratings have attracted the attention of investors and driven their need to acquire related information. Because manually analyzing financial information can be labor-intensive, a number of studies have applied deep learning NLP algorithms to financial sentiment analysis. To analyze ESG-related sentences effectively, we apply FinBERT to a classification task and compare its performance with that of BERT. We prepared a sample of sentences from the ESG reports of different firms and labeled them according to the MSCI ESG Ratings Methodology. Contrary to our prediction, BERT performs slightly better than FinBERT in ESG classification. This suggests that deep learning NLP algorithms incorporating domain-specific knowledge (FinBERT) may not necessarily outperform those that rely on general text (BERT) within a specific domain.
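As a rough illustration of the sentence-level ESG classification task, the following is a minimal sketch of one of the classical baselines, a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. It is not the authors' code (see the GitHub fork for that). The class name `NaiveBayesTextClassifier`, the three category labels ("environment", "social", "governance", rather than the 9 categories used in the studies), and all training and test sentences are hypothetical, and tokenization is a simple lowercase whitespace split rather than the subword tokenizers used by BERT-family models.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase whitespace split. BERT uses WordPiece and
    # RoBERTa uses byte-level BPE subword tokenizers instead.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, sentence):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(self.class_counts[c] / n_docs)
            for token in tokenize(sentence):
                if token not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[c][token]
                score += math.log((count + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, sentences):
        return [self.predict_one(s) for s in sentences]


def accuracy(y_true, y_pred):
    # Fraction of sentences whose predicted category matches the label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data for illustration only.
train_sentences = [
    "we reduced carbon emissions across our factories",
    "water usage and waste recycling improved this year",
    "employee safety training hours increased",
    "we expanded health benefits for our workers",
    "the board added two independent directors",
    "executive pay is reviewed by the audit committee",
]
train_labels = ["environment", "environment", "social", "social",
                "governance", "governance"]

test_sentences = [
    "carbon emissions fell after recycling upgrades",
    "independent directors now chair the audit committee",
    "workers received additional safety training",
]
test_labels = ["environment", "governance", "social"]

model = NaiveBayesTextClassifier().fit(train_sentences, train_labels)
predictions = model.predict(test_sentences)
print(predictions)  # → ['environment', 'governance', 'social']
print(f"test accuracy: {accuracy(test_labels, predictions):.2f}")  # → test accuracy: 1.00
```

On a toy set this small the baseline looks perfect; the point of the sketch is only the shape of the pipeline (fit on labeled sentences, predict a category per sentence, report test accuracy), which is the same comparison the studies ran between the classical classifiers and the fine-tuned BERT-family models.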
