Thesis access permission: User-defined release date
Available:
Campus: available for download from 2033-08-10
Off-campus: available for download from 2033-08-10
Title: Constructing a Taiwan Index Prediction Model by Integrating News Sentiment Analysis and International Derivatives: A Case Study of the Steel Market
Department:
Year, semester:
Language:
Degree:
Number of pages: 95
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2023-07-21
Date of Submission: 2023-08-10
Keywords: Sentiment analysis, Automatic text summarization, Steel index, Stock price movement prediction, Machine learning
Abstract |
This study treats investor sentiment as one of the factors in a momentum prediction model for the Taiwan Steel Index (TSI), whose level rose sharply during the Covid-19 pandemic as prices of the steel industry's raw materials surged. Two types of data are collected: daily price data related to the steel supply chain, and news articles, which provide timely information and are readily accessible to the public. Both data sets cover the same sample period, January 2018 to August 2022. The empirical results first show that most of the variables are highly correlated with the TSI. We select the share price of BHP, coke futures prices, the Taiwan OTC Steel Index, China Steel Corp preference shares, and the share price of SAN SHING FASTECH CORP as feature variables, and ordinary least squares (OLS) regression shows that all of them have significant positive effects on the TSI. We then use machine learning models to predict the direction of the TSI and find that logistic regression outperforms XGBoost and the support vector machine (SVM), achieving an accuracy of 60%. However, investor sentiment scores extracted from news do not significantly improve the accuracy of TSI forecasts.
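A minimal sketch of the kind of pipeline the abstract describes: an OLS regression with t-statistics to check whether the features have significant positive coefficients, followed by a logistic-regression classifier for the next-day direction of the index, scored by accuracy. It assumes NumPy only, daily features aligned so that day t's values are used to predict the move from day t to day t+1, an 80/20 chronological split, and logistic regression fitted by plain gradient descent; XGBoost and SVM are omitted to keep it dependency-free. All function names, parameters, and the synthetic data are hypothetical and are not the thesis's code, data, or results.

```python
import numpy as np

# Hypothetical helpers; not the thesis's implementation.

def direction_labels(levels):
    """1 if the index closes higher on the next day, else 0 (one label per day except the last)."""
    levels = np.asarray(levels, dtype=float)
    return (levels[1:] > levels[:-1]).astype(int)

def chronological_split(X, y, train_frac=0.8):
    """Split by time without shuffling, so the test period follows the training period."""
    cut = int(len(y) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

def standardize(train, test):
    """Scale both sets with training-set mean and std only, to avoid look-ahead bias."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd[sd == 0] = 1.0
    return (train - mu) / sd, (test - mu) / sd

def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])

def ols_with_t_stats(X, y):
    """OLS coefficients and classical (homoskedastic) t-statistics; index 0 is the intercept."""
    Xb = add_intercept(X)
    xtx_inv = np.linalg.inv(Xb.T @ Xb)
    beta = xtx_inv @ Xb.T @ y
    resid = y - Xb @ beta
    sigma2 = resid @ resid / (Xb.shape[0] - Xb.shape[1])  # residual variance with n - k dof
    se = np.sqrt(np.diag(sigma2 * xtx_inv))
    return beta, beta / se

def fit_logistic(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """Logistic regression by batch gradient descent on the L2-penalized log loss."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        grad = Xb.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]  # do not penalize the intercept
        w -= lr * grad
    return w

def predict_direction(w, X):
    """Predict 1 (up) when the fitted probability exceeds 0.5, i.e. when the logit is positive."""
    return (add_intercept(X) @ w > 0).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Usage on synthetic data (hypothetical stand-ins, not the thesis's data).
rng = np.random.default_rng(0)
n_days, n_features = 500, 5
X = rng.normal(size=(n_days, n_features))  # five hypothetical feature series, known on day t
next_day_ret = 0.01 * (0.2 * X.sum(axis=1) + rng.normal(size=n_days))  # log-return from t to t+1
tsi = 100 * np.exp(np.concatenate([[0.0], np.cumsum(next_day_ret)]))  # n_days + 1 index levels
y = direction_labels(tsi)  # label for day t compares day t+1 with day t

beta, t_stats = ols_with_t_stats(X, next_day_ret)
print("OLS slopes:", np.round(beta[1:], 4), "t-stats:", np.round(t_stats[1:], 2))

X_tr, X_te, y_tr, y_te = chronological_split(X, y)
X_tr, X_te = standardize(X_tr, X_te)
w = fit_logistic(X_tr, y_tr)
test_acc = accuracy(y_te, predict_direction(w, X_te))
baseline_acc = accuracy(y_te, np.full_like(y_te, int(y_tr.mean() >= 0.5)))  # majority class
print(f"logistic accuracy {test_acc:.2f} vs. majority-class baseline {baseline_acc:.2f}")
```

Two design choices matter for directional accuracy in a setup like this: the split is chronological and the scaler is fitted on the training period only, since shuffling or full-sample scaling would leak future information into training; and the majority-class baseline shows whether a figure such as 60% actually beats always predicting the more common direction.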
Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Motivation
1.2 Research Methodology and Framework
Chapter 2. Literature Review
2.1 Investor Sentiment
2.1.1 Overview of investor sentiment
2.1.2 Data for sentiment analysis
2.1.3 Method of sentiment analysis
2.1.4 Methods for generating summary
2.2 Prediction model
2.2.1 Logistic Regression
2.2.2 SVM
2.2.3 XGBoost
2.3 Steel Market
2.3.1 Steel market overview
2.3.2 Application of sentiment analysis to steel market
2.3.3 Factors affecting the steel market
Chapter 3. Data
3.1 Introduction of Daily Price Data
3.2 Time Series of TSI and the Number of News
Chapter 4. Method
4.1 Research Framework for Predicting Taiwan Steel Index
4.2 Sentiment Analysis of Taiwan Steel Industry News
4.2.1 Cleaning of news text noise
4.2.2 Generation of news text summaries
4.2.3 Daily news sentiment analysis and aggregation
4.3 Numerical Data Processing and Filtering
4.3.1 Currency conversion
4.3.2 Pre-processing of various price data
4.3.3 Data normalization
4.3.4 Feature variables filtering
4.3.5 Parameter importance evaluation
4.3.6 Lead and lag analysis between TSI and important parameters
4.4 Taiwan Steel Index Predictive Momentum Models Construction
4.4.1 Categorizing the TSI
4.4.2 Various parameter data conversion
4.4.3 Feature scaling
4.4.4 Data segmentation
4.4.5 Construction of TSI forecasting model
4.4.6 Model Accuracy Evaluation
Chapter 5. Result
5.1 Sentiment Values
5.2 Feature Select
5.3 Importance Features
5.4 Lead and Lag Relationships
5.5 Return
5.6 Prediction Result
Chapter 6. Conclusion
References
Appendix
Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Printed copies
Available: 2028-08-10