國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,以機器學習預測臺灣上市個股績效並建立投資組合,Forecast Performance of Listed Stocks in Taiwan Market by Machine Learning and Construct the Portfolio

論文名稱 Title	以機器學習預測臺灣上市個股績效並建立投資組合 Forecast Performance of Listed Stocks in Taiwan Market by Machine Learning and Construct the Portfolio
系所名稱 Department	管理學院國際經營管理碩士學程 Master of Business Administration Program in International Business
畢業學年期 Year, semester	109 學年度第 2 學期 The spring semester of Academic Year 109	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	91
研究生 Author	譚　莉 Li Tan
指導教授 Advisor	鄭義 Jeng,Yih
召集委員 Convenor	張世賢 JHANG,SHIH-SIAN
口試委員 Advisory Committee	林信惠 Lin,Hsin-hui
口試日期 Date of Exam	2021-08-07	繳交日期 Date of Submission	2021-08-19
關鍵字 Keywords	投資組合、機器學習、Black-Litterman 模型、特徵工程、投資者觀點預測、監督式學習 Portfolio, Machine Learning, Black-Litterman Model, Feature Engineering, Investor View Forecast, Supervised Learning

中文摘要
隨著近年金融科技愈受歡迎及進步，愈來愈多新穎的數學方法及模型被應用於金融產業已用於增加獲利。在眾多方法中，以機器學習最為出名且經常用於相關文獻中以研究是否可以在最有效的時間內預測經濟及市場情勢。本文將利用相關特徵嘗試增進機器學習模型之精準度及準確度以預測臺灣市場狀況，再進一步結合Black-Litterman模型以建立有效且可獲利之投資組合。同時，針對Black-Litterman模型的觀點定義也將被進一步改進，使模型更加具有使用性。研究結果顯示機器學習模型確實可為一預測市場狀況及標籤之有效方法，然而，資料庫之品質及量可能對精準度及準確度的影響極大，且當年分遭遇特殊事件時，精準度及準確率仍可維持並獲得報酬。
Abstract
With the growing popularity of financial technology (fintech), an increasing number of methods are being applied in the financial industry to improve profitability. Machine learning is among the best known of these and is frequently used in research that attempts to predict market conditions. In this paper, machine learning algorithms are combined with the Black-Litterman model. The goal is to improve the accuracy of the machine learning predictions and to supply suitable input parameters to the Black-Litterman model. The research adds macroeconomic indicators and industry dummy variables as features to test whether machine learning can identify better-performing stocks in Taiwan and form a new, outperforming portfolio. At the same time, the definition of the view in the Black-Litterman model is refined to make the model more practical. The results show that machine learning is a feasible tool for predicting the label of each stock. However, the quality and quantity of the data affect the accuracy. In addition, when specific incidents took place, accuracy and precision were maintained and the portfolio still earned profits.
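As background for the model named in the abstract, the following is a minimal sketch of the textbook Black-Litterman combination of equilibrium returns with investor views. It is not the thesis's implementation. The function name, the risk-aversion value `delta`, the scaling factor `tau`, the toy covariance matrix, the market weights, and the single hand-picked view are all assumptions made for illustration; in a setting like the one the abstract describes, the view inputs would come from a classifier rather than being typed in by hand.

```python
import numpy as np

def black_litterman_posterior(sigma, w_mkt, P, Q, omega, delta=2.5, tau=0.05):
    """Return (pi, posterior_mean) under the standard Black-Litterman formulas.

    sigma : (n, n) covariance matrix of asset excess returns
    w_mkt : (n,)   market-capitalization weights
    P     : (k, n) pick matrix, one row per view
    Q     : (k,)   expected return stated by each view
    omega : (k, k) covariance of view errors (smaller = more confident)
    delta, tau : illustrative defaults, not values from the thesis
    """
    # Implied equilibrium excess returns by reverse optimization.
    pi = delta * sigma @ w_mkt
    tau_sigma_inv = np.linalg.inv(tau * sigma)
    omega_inv = np.linalg.inv(omega)
    # Precision-weighted blend of the equilibrium prior and the views.
    precision = tau_sigma_inv + P.T @ omega_inv @ P
    rhs = tau_sigma_inv @ pi + P.T @ omega_inv @ Q
    return pi, np.linalg.solve(precision, rhs)

# Toy three-asset example (all numbers hypothetical).
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_mkt = np.array([0.5, 0.3, 0.2])
P = np.array([[1.0, 0.0, 0.0]])   # one absolute view on the first asset
Q = np.array([0.15])              # the view: 15% expected excess return
omega = np.array([[0.001]])       # uncertainty attached to that view

pi, posterior = black_litterman_posterior(sigma, w_mkt, P, Q, omega)
print("equilibrium returns:", pi)
print("posterior returns:  ", posterior)
```

The posterior return of the first asset lands between its equilibrium value and the view, and moves closer to the view as `omega` shrinks, which is why the definition and confidence of the views matter so much for the resulting portfolio.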

目次 Table of Contents
論文審定書 Thesis Approval Form i
摘要 Abstract (Chinese) ii
Abstract iii
I. Introduction 1
  1. General Background Information 1
  2. Research Purpose 3
II. Literature Review 5
  1. Black-Litterman Model 5
  2. Factors in Model 7
  3. Machine Learning 10
    3.1 Machine Learning 10
    3.2 Decision Tree 10
    3.3 Feature Engineering and Feature Selection 11
  4. Finance with Machine Learning 12
  5. Performance Evaluation 14
    5.1 Sharpe Ratio 15
    5.2 Max Drawdown 16
    5.3 Accuracy 16
    5.4 Precision 17
III. Methodology 18
  1. Data Description 19
    1.1 Period of Data 19
    1.2 Data Source 19
  2. Tool 20
  3. Black-Litterman Model 20
    3.1 Covariance Matrix 20
    3.2 Implied Excess Equilibrium Return 21
    3.3 View of Investors 21
    3.4 New Combined Return 23
  4. Machine Learning 23
    4.1 Feature Engineering 24
    4.2 Classifier 26
    4.3 Supervised Learning 28
    4.4 Missing Data 30
    4.5 Ensemble Model 30
  5. Portfolio Construction 33
    5.1 Portfolio Components 33
    5.2 Weight 33
IV. Empirical Result 35
  1. Benchmark 35
  2. Descriptive Statistics 35
  3. Results and Performances 35
    3.1 Performance Analysis 36
    3.2 Accuracy and Precision 39
    3.3 Features Selection 44
  4. Performance During Specific Period 48
    4.1 Pandemic Crisis 48
    4.2 Bullish Market After Bearish Market 56
V. Conclusion 61
References 66
Appendix 72

Figure
Figure 2-1 The structure of new combined return forming 6
Figure 2-2 Example of decision tree 11
Figure 3-1 Research framework 18
Figure 3-2 Training period example 19
Figure 4-1 Cumulative return of BL-Portfolio and benchmarks 36
Figure 4-2 Cumulative return of Adj. BL-Portfolio and benchmarks 37
Figure 4-3 Cumulative return of BL-Portfolio and benchmark during 2020 49
Figure 4-4 Cumulative return of Adj. BL-Portfolio and benchmark during 2020 50
Figure 4-5 Cumulative return of BL-Portfolio and benchmark during 2009 56
Figure 4-6 Cumulative return of Adj. BL-Portfolio and benchmark during 2009 57

Table
Table 4-1 Performance of BL-Portfolio and benchmark 36
Table 4-2 Performance of the Adj. BL-Portfolio 37
Table 4-3 Classifier A’s accuracy and precision 39
Table 4-4 BL-Portfolio classifier A’s performance of prediction 40
Table 4-5 Adj. BL-Portfolio classifier A’s performance of prediction 41
Table 4-6 Classifier B’s accuracy and precision 41
Table 4-7 Classifier B’s performance of prediction 42
Table 4-8 Top 5 features* 44
Table 4-9 Classifier A’s feature proportion 46
Table 4-10 Classifier B’s feature proportion 47
Table 4-11 Performance of Adj. BL-Portfolio and benchmarks during 2020 49
Table 4-12 Performance of Adj. BL-Portfolio and benchmarks during 2020 50
Table 4-13 Classifier A’s accuracy and precision during 2020 51
Table 4-14 Classifier B’s accuracy and precision during 2020 51
Table 4-15 Classifier A’s performance of prediction of portfolio during 2020 52
Table 4-16 Classifiers’ performance of prediction during 2020 52
Table 4-17 Classifier A’s feature proportion during 2020 54
Table 4-18 Top 5 features during 2020* 55
Table 4-19 Performance of BL-Portfolio and benchmark during 2009 57
Table 4-20 Performance of BL-Portfolio and benchmark during 2009 58
Table 4-21 Classifier A’s accuracy and precision during 2009 58
Table 4-22 Classifier A’s performance of prediction of portfolio during 2009 59


電子全文 Fulltext
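This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: unrestricted (fully open on and off campus). Available: Campus: available; Off-campus: available. etd-0719121-135648.pdf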
