國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,評估公司持續經營狀況:使用預測模型與文字探勘技術,Evaluation of the going-concern status for companies:Using prediction model and text mining technique

論文名稱 Title	評估公司持續經營狀況:使用預測模型與文字探勘技術 Evaluation of the going-concern status for companies:Using prediction model and text mining technique
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	109 學年度第 2 學期 The spring semester of Academic Year 109	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	72
研究生 Author	溫騏馹 Qi-Ri Wen
指導教授 Advisor	許育峯, 李偉柏 Yu-Feng Hsu; Lee,Wei-Po
召集委員 Convenor	楊新章 Hsin-Chang Yang
口試委員 Advisory Committee	王明昌 Ming-Chang Wang
口試日期 Date of Exam	2021-08-18	繳交日期 Date of Submission	2021-08-26
關鍵字 Keywords	Going-Concern, Red-Flag, BERT, lime, Random Forest, Tokenizer, LDA, TF-IDF

中文摘要
會計師要評估公司Going-Concern意見是困難且複雜的工作，會計師在審核公司Going-Concern時，為了在審核過程中使用相關資訊並做出正確的決定，必須考慮到公司不同的關鍵因素，如財務報表的關鍵因素、財務指標的關鍵因素…等等。為了支持會計師審核意見，我們採取一系列的實驗方法。首先，使用traditional Machine learning(ML)對財務數據預測Going-Concern，接下來是使用traditional ML與Deep learning(DL)對文字數據預測Going-Concern，並找最佳預測模型。為了探討頭條新聞與MD&A的模型最佳績效，我們分別使用TF-IDF、LDA、BERT、Tokenizer四種模型轉換文字的資料型態，將TF-IDF和LDA轉變的資料型態的數據輸入traditional ML模型並將BERT和Tokenizer轉變的資料型態的數據輸入DL預測Going-Concern。在實驗前，我們會把財務數據與頭條新聞數據分成三個年間，分別是2001~2006、2007~2008、2009~2019為了比較模型在不同年間的績效並了解模型預測效果在金融風暴前後有何不同。為了增加模型預測Going-Concern的可解釋性，最後我們蒐集2001~2019的頭條新聞與MD&A，使用BERT結合Random Forest(RF)和Local Interpretable Model-agnostic Explanations(lime)從MD&A與頭條新聞中找到’Red-Flag’，即被模型判定為Going-Concern的公司，探討MD&A、頭條新聞會有哪些字詞或句子會有問題？在比較模型績效的部分，由於數據在不同區間有數據分佈不平均的問題，我們使用ROC curve , Kappa value, Precision, Recall及F1_score五個評量指標比較模型的表現。
Abstract
Evaluating a company's Going-Concern status is a difficult and complex task for accountants. To reach a correct opinion from the information gathered during an audit, the accountant must weigh many critical factors of the company, such as key items in the financial statements and key financial indicators. To support the accountant's opinion, we carry out a series of experiments. First, we predict Going-Concern status from financial data using traditional machine learning (ML) models. Next, we predict Going-Concern status from text data using both traditional ML and deep learning (DL) models, and identify the best predictive model. To find the best-performing models for headline news and MD&A, we use four methods (TF-IDF, LDA, BERT, and Tokenizer) to convert the text into model inputs: the TF-IDF and LDA representations are fed into traditional ML models, and the BERT and Tokenizer representations are fed into DL models. Before the experiments, we divide the financial data and the headline news into three periods, 2001~2006, 2007~2008, and 2009~2019, in order to compare model performance across periods and to understand how prediction differs before and after the financial crisis. Finally, to increase the interpretability of the predictions, we collect headline news and MD&A from 2001~2019 and combine BERT with Random Forest (RF) and Local Interpretable Model-agnostic Explanations (lime) to find 'Red-Flags': for companies that the model judges to be Going-Concern cases, we examine which words or sentences in the MD&A and headline news are problematic. These words and sentences can serve as a reference for accountants and external investors. For model comparison, because the data are imbalanced across the periods, we use five performance indicators: ROC curve, Kappa value, Precision, Recall, and F1-score.
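The abstract motivates its choice of metrics by data imbalance. The following is a minimal illustrative sketch, not the thesis's code, of how four of the five indicators (Precision, Recall, F1-score, and Cohen's Kappa) can be computed from a binary confusion matrix in plain Python; the ROC curve is omitted because it requires predicted scores rather than hard labels. The function names, the label convention (1 = Going-Concern case), and the toy data of 5 positives among 100 companies are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1 and Cohen's kappa as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Agreement expected by chance, from the marginal label frequencies.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical toy data: 5 Going-Concern cases among 100 companies.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100                      # never flags any company
some_skill = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93  # 3 hits, 2 false alarms

print(imbalance_metrics(y_true, always_negative))
print(imbalance_metrics(y_true, some_skill))
```

On this toy data, a classifier that never flags a company reaches 0.95 accuracy yet scores 0 on recall, F1, and Kappa, which is why accuracy alone would be misleading on imbalanced periods.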

目次 Table of Contents
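Thesis Approval Form
Authorization for Public Release
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1. Research Background and Motivation
1.2. Research Objectives
1.3. Research Methods and Process
1.4. Research Contributions
Chapter 2 Literature Review
Chapter 3 Research Methods and Procedures
3.1. Research Methods
3.2. Traditional Machine Learning Models
3.3. Text Mining Models
3.4. Methods for Handling Data Imbalance
Chapter 4 Results and Discussion
4.1. Dataset
4.2. Evaluation Metrics
4.3. Effect of Different Periods on Going-Concern Prediction
4.4. Model Design and Performance Comparison for Financial Data, 2001~2006
4.4.1. Financial Data, 2001~2006
4.4.2. Downsampled Financial Data, 2001~2006
4.4.3. Upsampled Financial Data, 2001~2006
4.5. Model Design and Comparison for Financial Data, 2007~2008
4.5.1. Financial Data, 2007~2008
4.5.2. Downsampled Financial Data, 2007~2008
4.5.3. Upsampled Financial Data, 2007~2008
4.6. Model Design and Comparison, 2009~2019
4.6.1. Financial Data, 2009~2019
4.6.2. Downsampled Financial Data, 2009~2019
4.6.3. Upsampled Financial Data, 2009~2019
4.7. Model Design and Comparison for Headline News, 2001~2006
4.7.1. Headline News, 2001~2006
4.8. Model Design and Comparison for Headline News, 2007~2008
4.8.1. Headline News, 2007~2008
4.8.2. Downsampled Headline News, 2007~2008
4.8.3. Upsampled Headline News, 2007~2008
4.9. Model Design and Comparison for Headline News, 2009~2019
4.9.1. Headline News, 2009~2019
4.9.2. Downsampled Headline News, 2009~2019
4.9.3. Upsampled Headline News, 2009~2019
4.10. Comparison of the Best Models for Headline News and Financial Data
4.11. Model Design and Comparison for Headline News and MD&A, 2001~2019
4.11.1. Model Design and Comparison for Headline News, 2001~2019
4.11.2. Model Design and Comparison for MD&A and Headline News, 2001~2019
4.11.3. Red-Flag Analysis of Headline News and MD&A, 2001~2019
4.11.4. Red-Flag Comparison of Headline News and MD&A, 2001~2019
Chapter 5 Conclusion
5.1. Summary
5.2. Suggestions and Limitations
5.3. Future Work
References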


電子全文 Fulltext
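This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release date. Available: Campus: available; Off-campus: available. etd-0726121-161547.pdf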
紙本論文 Printed copies
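Public-availability information for printed copies is relatively complete from academic year 102 onward. For printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Availability: available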
