National Sun Yat-sen University, thesis/dissertation, Genetic Programming-based Construction of Alpha Factors: Evidence from Taiwan

Title	Genetic Programming-based Construction of Alpha Factors: Evidence from Taiwan
Department	Department of Finance
Year, semester	Spring semester of Academic Year 110	Language	English
Degree	Master	Number of pages	72
Author	Ya-Jun Yang (楊雅鈞)
Advisor	Chou-Wen Wang (王昭文)
Convenor	Ming-Chun Wang (王銘駿)
Advisory Committee	Chin-Wen Wu (吳錦文); Sheng-Hung Chen (陳昇鴻)
Date of Exam	2022-06-20	Date of Submission	2022-07-12
Keywords	Portfolio, Stock Factor, Genetic Programming, Automated Feature Engineering, Feature Selection, Machine Learning
Statistics	The thesis/dissertation has been browsed 458 times and downloaded 0 times.

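Abstract (translated from Chinese)
This study proposes a two-layer mechanism that combines genetic programming and machine learning to construct synthesized stock factors and portfolios. For each year from 2006 to 2020, genetic programming generates stock factors that are added to an Alpha factor pool, which accumulates 150 factors over the 15 years. Factors are then selected with an embedded method and used to predict each stock's probability of rising, and these probabilities are used to construct portfolios. In the out-of-sample results, of the three predictive models used in this study, the two Boosting models achieve higher Sharpe ratios and annualized returns than the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) Total Return Index, although their maximum drawdowns and annualized volatilities are slightly higher. Notably, as the number of stocks in the portfolio increases, eXtreme Gradient Boosting (XGBoost) approaches the TAIEX Total Return Index on both risk measures. For the information ratio, Light Gradient Boosting Machine (LightGBM) attains the highest value among all models, but XGBoost is more stable overall. Finally, the out-of-sample robustness test finds that 7 Alpha factors in the factor pool have a statistically significant correlation with returns over the following 20 days.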
Abstract
This paper proposes a double-selection method for systematically constructing synthesized stock factors and portfolios. The procedure first builds an Alpha factor pool using Genetic Programming over the period from 2006 to 2020, yielding a total of 150 Alpha factors, and then selects Alpha factors through an embedded method to predict the upward probability of individual stocks for constructing portfolios. For the out-of-sample performance, the empirical results show that the Sharpe ratios and the annualized returns of the two Boosting models are greater than those of the TAIEX Total Return Index, but their maximum drawdowns and annualized volatilities tend to be worse. In particular, XGBoost is nearly identical to the TAIEX Total Return Index on both risk indicators as the number of stocks in the portfolios increases. For the Information ratio, LightGBM attains the highest value among all models; however, XGBoost is relatively stable on the whole. Finally, in the out-of-sample robustness test, seven Alpha factors from the Alpha factor pool are statistically significantly correlated with returns over the following 20 trading days.
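The performance measures named in the abstract (annualized return, annualized volatility, Sharpe ratio, maximum drawdown, and Information ratio) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the 252-day annualization constant, and the default zero risk-free rate are assumptions made for illustration, and the thesis's own definitions in Section 3.4 may differ.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: trading days per year used for annualization


def annualized_return(daily_returns):
    """Geometric annualized return from simple daily returns."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled to one year."""
    return statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized mean excess return divided by annualized volatility."""
    excess = [r - daily_risk_free for r in daily_returns]
    return statistics.mean(excess) * TRADING_DAYS / annualized_volatility(excess)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of cumulative wealth, as a positive fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


def information_ratio(portfolio_returns, benchmark_returns):
    """Annualized mean active return divided by annualized tracking error."""
    active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    return statistics.mean(active) * TRADING_DAYS / annualized_volatility(active)
```

Under these assumptions, a portfolio would be compared with the benchmark by passing both daily return series to `information_ratio` and each series separately to the other four functions. Series with zero variance are not handled and would raise a division error.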

Table of Contents
Thesis Approval Form
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Literature Review
    2.1 Efficient-market Hypothesis
    2.2 Margin Trading and Institutional Investors
    2.3 Evolutionary Algorithms
    2.4 Asset Pricing via Machine Learning
    2.5 Summary
Chapter 3 Methodology
    3.1 Data Sources and Research Design
    3.2 Genetic Programming
    3.3 Factor Selection and Portfolio Construction
    3.4 Performance Measurement
    3.5 Alpha Factor Test
Chapter 4 Empirical Results
    4.1 Alpha Factors Construction and Selection
    4.2 Portfolio Performance
    4.3 Robustness Test of Alpha Factors
Chapter 5 Conclusion
    5.1 Conclusions
    5.2 Further Suggestions
References
Appendix A
Appendix B
Appendix C

Fulltext
The electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it. Thesis access permission: user-defined release date. Available for download on campus and off campus from 2025-07-12.
Printed copies
Available from 2025-07-12.
