國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,機器學習模型在台股期貨價格漲跌預測分析,Empirical Study on TAIEX Futures Price Prediction with Machine Learning Models

論文名稱 Title	機器學習模型在台股期貨價格漲跌預測分析 Empirical Study on TAIEX Futures Price Prediction with Machine Learning Models
系所名稱 Department	財務管理學系 Department of Finance
畢業學年期 Year, semester	110 學年度第 2 學期 The spring semester of Academic Year 110	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	66
研究生 Author	林晏如 Yen-Ju Lin
指導教授 Advisor	王昭文 Wang, Chou-Wen
召集委員 Convenor	洪志興 Hung, Chih-Hsing
口試委員 Advisory Committee	蘇玄啟, 林萍珍, 吳錦文 Su, Xuan-Qi; Lin, Ping-Chen; Wu, Chin-Wen
口試日期 Date of Exam	2022-07-05	繳交日期 Date of Submission	2022-07-25
關鍵字 Keywords	台股期貨、技術指標、多元羅吉斯迴歸模型、極限梯度提升模型、輕量梯度提升模型 TAIEX futures, Technical indicators, Multinomial logistic regression model, Extreme gradient boosting model, Light gradient boosting model

中文摘要
本研究使用2019 年1 月至2021 年6 月之台股期貨5 分鐘頻資料，以簡單移動平均、隨機指標、相對強弱指標以及指數平滑異同移動平均等技術指標做為特徵輸入多元羅吉斯迴歸模型、極限梯度提升模型以及輕量梯度提升模型預測台股期貨每20 分鐘價格為上漲、下跌或者持平，並嘗試將訓練資料分群進而比較不同模型間、不同訓練期間和預測不同漲跌點數之預測差異。就整體預測結果來看，當預測漲跌點數由超過0 點增至5 點再至10 點時，精確度會隨之下降，且所有預測上漲的精確度都比預測下跌的精確度來得高。至於模型間之預測能力比較，發現在預測不同漲跌點數的情況下，三種模型預測精確度雖互有高下，但之間並無大幅的差距，可謂不同模型並無明顯優劣之分。另為比較不同訓練期間是否影響預測結果，本文使用移動窗格法與定錨式移動窗格法將訓練期間分為12、15、18、21、24 及27 個月，實證發現無論是上漲或下跌的預測精確度皆無顯著差異，顯示訓練期間越長對模型預測效果影響並不顯著。最後分別選用模型預測結果中最佳的訓練期來訓練投資模型，投資模型共設有多元羅吉斯迴歸模型、極限梯度提升模型與輕量梯度提升模型三種單一模型，再加上結合任兩個單一模型與結合三個單一模型之兩種多模型組合，並以2021 年7 月至2021 年9 月每5 分鐘資料進行樣本外回溯投資，在預測漲跌點數超過0 點、5 點和10 點的情境下分別使用五種投資模型做多與放空，總共得到三十種投資績效。而其中僅以多元羅吉斯迴歸單一模型預測台股期貨價格未來會下跌超過10 點時進場放空操作能獲得正報酬，勝率有55.56%，平均每次操作獲利高達1361 元。
Abstract
This study uses 5-minute TAIEX futures data from January 2019 to June 2021 and feeds technical indicators, namely the simple moving average (SMA), the stochastic oscillator (KD), the relative strength index (RSI), and moving average convergence divergence (MACD), as features into three models: multinomial logistic regression, extreme gradient boosting (XGBoost), and light gradient boosting (LightGBM). The models predict whether the TAIEX futures price will rise, fall, or remain flat over each 20-minute interval. The training data are also grouped to compare predictions across models, across training periods, and across the size of the predicted move in points. Overall, as the predicted move increases from more than 0 points to more than 5 points and then to more than 10 points, precision falls, and precision for predicted rises is higher than for predicted falls in every case. Across the three models, precision varies somewhat from one point threshold to another, but the gaps are small, so no model is clearly superior. To test whether the length of the training period affects the results, the walk-forward method and the anchored walk-forward method are applied with training periods of 12, 15, 18, 21, 24, and 27 months. Precision for both rises and falls shows no significant difference across these periods, which indicates that a longer training period does not significantly improve prediction. Finally, the best training period from the prediction results is used to train the investment models. There are five investment models: the three single models (multinomial logistic regression, extreme gradient boosting, and light gradient boosting), a combination of any two single models, and a combination of all three. Each is backtested out of sample on 5-minute data from July 2021 to September 2021. For predicted moves of more than 0, 5, and 10 points, the five investment models take long and short positions in TAIEX futures, giving thirty sets of investment results in total. Among these, only one earns a positive return: entering a short position when the single multinomial logistic regression model predicts that TAIEX futures will fall by more than 10 points. This strategy has a win rate of 55.56% and an average profit of NT$1,361 per trade.
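The labelling and training-window setup described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `label_moves` and `training_windows`, the rule that a move counts as a rise or fall only when it strictly exceeds the threshold, and the use of month indices for the windows are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def label_moves(closes: List[float], horizon: int = 4,
                threshold: float = 0.0) -> List[str]:
    """Label each bar by the price change `horizon` bars ahead.

    With 5-minute bars, horizon=4 is a 20-minute look-ahead.
    Assumed rule: 'up' if the change exceeds +threshold points,
    'down' if it is below -threshold, otherwise 'flat'.
    """
    labels = []
    for i in range(len(closes) - horizon):
        change = closes[i + horizon] - closes[i]
        if change > threshold:
            labels.append("up")
        elif change < -threshold:
            labels.append("down")
        else:
            labels.append("flat")
    return labels


def training_windows(n_months: int, train_len: int,
                     anchored: bool = False) -> Iterator[Tuple[range, int]]:
    """Yield (training month indices, test month index) pairs.

    Rolling walk-forward: the window has a fixed length and slides forward.
    Anchored walk-forward: the window always starts at month 0 and grows.
    """
    for test_month in range(train_len, n_months):
        start = 0 if anchored else test_month - train_len
        yield range(start, test_month), test_month


if __name__ == "__main__":
    # Hypothetical closing prices, not data from the thesis.
    closes = [100.0, 101.0, 99.0, 100.0, 106.0, 101.0, 88.0, 100.0]
    print(label_moves(closes, horizon=4, threshold=5.0))
    for train, test in training_windows(5, 3, anchored=True):
        print(list(train), "->", test)
```

Raising `threshold` from 0 to 5 or 10 moves more bars into the 'flat' class, which is one way to read the three prediction scenarios in the abstract. Setting `anchored=True` keeps the earliest data in every training window, whereas the rolling variant drops old months as it advances.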

目次 Table of Contents
Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Research Structure
Chapter 2 Literature Review
  2.1 Research on Technical Analysis of Stock Markets
  2.2 Predicting Stock Market Movements with Machine Learning
Chapter 3 Methodology
  3.1 Data Sources
  3.2 Models
    3.2.1 Multinomial Logistic Regression
    3.2.2 Extreme Gradient Boosting (XGBoost)
    3.2.3 Light Gradient Boosting (LightGBM)
  3.3 Features
    3.3.1 Simple Moving Average (SMA)
    3.3.2 Moving Average Convergence and Divergence (MACD)
    3.3.3 Stochastic Oscillator (KD)
    3.3.4 Relative Strength Index (RSI)
    3.3.5 Open, High, Low, and Close Prices
  3.4 Research Design
Chapter 4 Empirical Results
  4.1 Descriptive Statistics of the Sample
  4.2 Analysis of Model Prediction Results
  4.3 Backtested Investment
    4.3.1 Backtest Data
    4.3.2 Investment Setup
    4.3.3 Backtest Results
Chapter 5 Conclusions and Suggestions
  5.1 Conclusions
  5.2 Limitations and Suggestions
References


電子全文 Fulltext
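This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission: 校內校外完全公開 unrestricted
開放時間 Available: 校內 Campus: 已公開 available; 校外 Off-campus: 已公開 available
etd-0625122-184455.pdf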
紙本論文 Printed copies
開放時間 Available: 已公開 available
