博碩士論文 etd-0626121-103643 詳細資訊
Title page for etd-0626121-103643
論文名稱
Title
新聞情緒與投資人情緒建構台股機器學習交易策略
Taiwan Stock Machine Learning Trading Strategies with Financial News Sentiment and Investor Sentiment
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
52
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2020-06-18
繳交日期
Date of Submission
2021-07-26
關鍵字
Keywords
投資人情緒、財經新聞、情緒分析、機器學習、股市預測
Investor Sentiment, Financial News, Sentiment analysis, Machine Learning, Stock Prediction
統計
Statistics
本論文已被瀏覽 187 次,被下載 0
This thesis/dissertation has been browsed 187 times and downloaded 0 times.
中文摘要
投資人在股票市場總是尋求新的方式尋找獲利機會,並且使用不同指標與因子建構交易策略,而投資人主要在建構交易策略時,會朝著技術面、基本面、籌碼面以及消息面來做為策略的依據,而本研究主要建立在籌碼面投資人情緒與消息面新聞情緒的策略研究。
將Anue鉅亨網2013/01/02至2019/12/31共7年的臺灣股市新聞蒐集,挑選出新聞內文有提到五十檔股票標的的新聞,並以年的方式做為切割,隨機篩選50%的新聞,以人為的方式,進行情緒標籤,分成「正面」、「負面」、「中立」。接著使用臉書所開發的fastText文本分類模型,訓練及預測2014年至2019共6年全部新聞的新聞情緒。
另一方面使用投資人情緒特徵,輸入XGBoost演算法模型訓練,預測報酬,並挑選適當的預測值與標的數目,作為進場依據的原始交易策略。再探討運用文本分類模型所建置的新聞情緒加入特徵,成為新的情緒加強策略,是否能夠有效的提升策略績效。回測期間2015年至2019年共五年。
本研究結果顯示(一) fastText分類模型,在樣本內情緒分類上,在3-gram有最好的表現,二元模型最高有92%、三元模型有75%的準確度,而移動窗格下平均有88%及77%。(二)投資人情緒特徵所建立的交易策略以及新聞情緒的交易策略,在挑選越少標的時會有較好的報酬與績效表現,顯示模型是可以預測出實際上真的好的報酬,故可以當作選股的依據。(三)在策略績效部分,新聞情緒的加入使得績效在5檔標的績效提升20%及10檔標的提升8%,同時要有較好的贏率與較低最大回撤。
Abstract
Investors continually look for new ways to find profitable opportunities in the stock market, combining different indicators and factors into trading strategies. These strategies typically draw on four kinds of information: technical, fundamental, chip (ownership and trading-flow), and news. This study focuses on strategies built on investor sentiment from the chip side and news sentiment from the news side.
This study collects seven years of Taiwan stock market news from Anue (鉅亨網), from 2013/01/02 to 2019/12/31, and keeps the articles whose text mentions any of the fifty target stocks. The news is split by year, and a random 50% is labeled manually as "positive", "negative", or "neutral". The fastText text classification model developed by Facebook is then trained and used to predict the sentiment of all news from 2014 to 2019. Separately, investor sentiment features are fed into an XGBoost model that is trained to predict returns, and a suitable predicted value and number of stocks are chosen as the entry rule of the original trading strategy. The study then examines whether adding the news sentiment produced by the text classification model as a feature, which gives a sentiment-enhanced strategy, effectively improves strategy performance. The backtest covers the five years from 2015 to 2019.
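The entry rule described above (rank stocks by predicted return, then hold only a small number of them) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `select_top_n`, the `threshold` parameter, and the sample predictions are assumptions made for illustration, and the predicted returns would in practice come from a trained model such as XGBoost.

```python
from typing import Dict, List


def select_top_n(predicted_returns: Dict[str, float], n: int,
                 threshold: float = 0.0) -> List[str]:
    """Return up to n tickers with the highest predicted return above threshold."""
    # Keep only stocks whose predicted return clears the (assumed) entry threshold.
    candidates = [(ticker, r) for ticker, r in predicted_returns.items()
                  if r > threshold]
    # Rank by predicted return, highest first; break ties by ticker for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [ticker for ticker, _ in candidates[:n]]


# Hypothetical model outputs for one rebalancing date (not data from the thesis).
predictions = {"2330": 0.021, "2317": -0.004, "2454": 0.013,
               "2412": 0.002, "1301": 0.009}
picks = select_top_n(predictions, n=3)
print(picks)
```

Lowering `n` concentrates the portfolio in the stocks with the highest predicted returns, which is the setting the abstract reports on for 5 and 10 stocks.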
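The results show that (1) in in-sample sentiment classification, the fastText model performs best with 3-grams: accuracy reaches up to 92% for the binary model and 75% for the ternary model, and averages 88% and 77% respectively under a moving window. (2) Both the investor-sentiment strategy and the news-sentiment strategy deliver better returns and performance when fewer stocks are selected, which indicates that the model's top predictions correspond to genuinely higher realized returns and can serve as a basis for stock selection. (3) Adding news sentiment improves strategy performance by 20% for the 5-stock portfolio and by 8% for the 10-stock portfolio, together with a higher win ratio and a lower maximum drawdown.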
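The two risk measures named in result (3) can be made concrete with a short sketch. It uses common textbook definitions, which are an assumption here because the abstract does not state the exact formulas used in the thesis: win ratio as the share of trades with a positive return, and maximum drawdown as the largest peak-to-trough decline of the compounded equity curve. The sample returns are made up for illustration.

```python
from typing import List


def win_ratio(trade_returns: List[float]) -> float:
    """Fraction of trades with a strictly positive return (0.0 if there are no trades)."""
    if not trade_returns:
        return 0.0
    wins = sum(1 for r in trade_returns if r > 0)
    return wins / len(trade_returns)


def max_drawdown(period_returns: List[float]) -> float:
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity = 1.0   # equity curve starts at 1.0
    peak = 1.0     # highest equity seen so far
    worst = 0.0    # largest drawdown seen so far
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Hypothetical per-period strategy returns (not results from the thesis).
returns = [0.10, -0.20, 0.05, -0.10, 0.30]
print(f"win ratio: {win_ratio(returns):.2f}")
print(f"max drawdown: {max_drawdown(returns):.3f}")
```

A strategy that improves on both measures wins more often and loses less from its previous high, which is the sense in which the sentiment-enhanced strategy is described as better.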
目次 Table of Contents

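Thesis Approval Form i
Abstract (Chinese) ii
Abstract iii
Table of Contents iv
List of Figures vi
List of Tables vii
Chapter 1. Introduction 8
Section 1. Research Motivation and Objectives 8
Section 2. Research Framework 9
Chapter 2. Literature Review 10
Section 1. Investor Sentiment 10
Section 2. Sentiment Analysis in Finance 11
Section 3. Predicting Stock Returns with Machine Learning 15
Section 4. Summary 18
Chapter 3. Methodology 19
Section 1. Research Process 19
Section 2. Research Data 20
Section 3. News Data Processing and Classification Model 24
Section 4. News Sentiment Score 27
Section 5. Machine Learning Trading Strategies 30
Chapter 4. Empirical Results 35
Section 1. Classification Model Results 35
Section 2. Strategy Performance 38
Chapter 5. Conclusions 45
References 47
Chinese-Language References 47
English-Language References 48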
參考文獻 References
中文文獻
王力弘(2015),「社群媒體新詞偵測系統以PTT八卦版為例」,國立政治大學資訊科學系碩士論文。
王釗東(2017),「以大數據探究財經新聞對台灣股票市場表現之影響」,碩士論文,國立臺灣大學新聞研究所。
王彥鈞(2017),「不同市場狀態下新聞情緒的預測能力:以台灣五十指數為例」,碩士論文,國立中央大學財務金融系研究所。
田高銘(2019),「新聞文本情緒分類之實證研究-以鉅亨網新聞為例」,碩士論文,國立中山大學財務管理學系研究所。
李哲宇(2017),「以遞歸卷積神經網路擷取財經新聞知識預測股價」, 碩士論文,國立清華大學資訊系統與應用研究所。
李昱穎(2019),「新聞文本情緒分類之實證研究-以鉅亨網新聞為例」,碩士論文,國立政治大學金融學系研究所。
林宜萱(2013),「財經領域情緒辭典之建置與其有效性之驗證-以財經新聞為元件」,碩士論文,國立台灣大學會計學研究所。
周賓凰、張宇志、林美珍(2007),「投資人情緒與股票報酬互動關係」,證券市場發展季刊;行為財務學特別專刊,153-190。
張偉德(2018),「應用情感分析從媒體評論推測企業聲譽之研究」,國立中央大學企業管理學系碩士論文。
陳建宏(2018),「新聞輿情、報酬與投資人交易行為」,碩士論文,國立中山大學財務管理學系研究所。
蔡佩蓉、王元章、張眾卓(2009),「投資人情緒、公司特徵與台灣股票報酬之研究」,經濟研究 (Taipei Economic Inquiry),45(2),273-322。
謝委霖(2015),「從財金新聞預測公司財報之營收走勢」,碩士論文,國立中山大學資訊管理學系研究所。

英文文獻
Alessa, A., Faezipour, M., & Alhassan, Z. (2018). Text classification of flu-related tweets using fastText with sentiment and keyword features. 2018 International Conference on Healthcare Informatics, 366-367.
Baker, M. & Wurgler, J. (2006). Investor sentiment and the cross-section of stock returns. Journal of Finance, 61(4), 1645-1680.
Baker, M. & Wurgler, J. (2007). Investor sentiment in the stock market. Journal of Economic Perspectives, 21(2), 129-151.
Ballings, M., Van den Poel, D., Hespeels, N., & Gryp, R. (2015). Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42(20), 7046-7056.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.
Brown, G. W. & Cliff, M. T. (2004). Investor sentiment and the near-term stock market. Journal of Empirical Finance, 11(1), 1-27.
Cavnar, W. B., & Trenkle, J. M. (1994). N-gram-based text categorization. In Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval.
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794, ACM.
Chowdhury, S. G., Routh, S., & Chakrabarti, S. (2014). News analytics and sentiment analysis to predict stock price trends. International Journal of Computer Science and Information Technologies, 5(3), 3595-3604.
Chue, T. K., Gul, F. A., & Mian, G. M. (2019). Aggregate investor sentiment and stock return synchronicity. Journal of Banking & Finance, 105628.
Dörre, J., Gerstl, P., & Seiffert, R. (1999). Text mining: finding nuggets in mountains of textual data. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, 398-401.
Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 1189-1232.
Hagenau, M., Liebmann, M., & Neumann, D. (2013). Automated news reading: Stock price prediction based on financial news using context-capturing features. Decision Support Systems, 55, 685-697.
John, V., & Vechtomova, O. (2017). UW-FinSent at SemEval-2017 Task 5: Sentiment Analysis on Financial News Headlines using Training Dataset Augmentation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 869–873.
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., & Mikolov, T. (2016). fastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651.
Kumar, B. S., & Ravi, V. (2016). A survey of the applications of text mining in financial domain. Knowledge-Based Systems, 114, 128-147.

Loughran, T. & McDonald, B. (2011). When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks, Journal of Finance, 66(1), 35-65.
Ma, W. Y. & Chen, K. J. (2003). Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff. Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, 168–171.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Mittermayer, M. A. (2004). Forecasting intraday stock price trends with text mining techniques. In Proceedings of the 37th annual Hawaii international conference on system sciences,10.
Nagel, S. (2005). Short sales, institutional investors and the cross-section of stock returns. Journal of Financial Economics, 78, 277-305.
Obeidat, S., Shapiro, D., Lemay, M., MacPherson, M. K., & Bolic, M. (2018). Adaptive Portfolio Asset Allocation Optimization with Deep Learning. International Journal on Advances in Intelligent Systems, 11(1), 2534.
Prusa, J. D., Khoshgoftaar, T. M., & Dittman, D. J. (2015). Impact of feature selection techniques for tweet sentiment classification. In The Twenty-Eighth International Flairs Conference.
Song, Q., Liu, A., & Yang, S. Y. (2017). Stock portfolio selection using learning-to-rank algorithms with news sentiment. Neurocomputing, 264, 20-28.
Suryoday, B., Saibal, K., Snehanshu, S., Luckyson, K., & Sudeepa, R. (2019). Predicting the direction of stock market prices using tree-based classifiers. The North American Journal of Economics and Finance, 47, 552–567.

Tan, A. H. (1999). Text mining: The state of the art and the challenges. In Proceedings of the PAKDD 1999 Workshop on Knowledge Discovery from Advanced Databases, Vol. 8, 65-70.
Usmani, M., Syed H. A., Kamran R., & Syed S. A. A. (2016). Stock Market Prediction Using Machine Learning Techniques. In 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), 322–327.
Vu, T. T., Chang, S., Ha, Q. T., & Collier, N. (2012). An experiment in integrating sentiment features for tech stock prediction in twitter. In Proceedings of the workshop on information extraction and entity analytics on social media data, 23-38.
Yadav, R., Kumar, A. V., & Kumar, A. (2019). News-based supervised sentiment analysis for prediction of futures buying behaviour. IIMB Management Review, 31, 157–166.
Zhou, F., Zhang, Q., Sornette, D., & Jiang, L. (2019). Cascading logistic regression onto gradient boosted decision trees for forecasting and trading stock indices. Applied Soft Computing Journal, 84, 105747.
Zhang, X., & Chen, W. (2019). Stock Selection Based on Extreme Gradient Boosting. 2019 Chinese Control Conference (CCC).
Zouaoui, M., Nouyrigat, G., & Beer, F. (2011). How does investor sentiment affect stock market crises? Evidence from panel data. Financial Review, 46(4), 723-747
電子全文 Fulltext
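This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
論文使用權限 Thesis access permission:自定論文開放時間 User-defined release date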
開放時間 Available:
校內 Campus:開放下載的時間 available 2024-07-26
校外 Off-campus:開放下載的時間 available 2024-07-26

紙本論文 Printed copies
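Public information on printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.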
開放時間 available 2024-07-26
