Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available
Title | Predicting Trends of COVID-19 with Data Augmentation and Deep Learning Methods - The Case of US (original title: 利用資料增強方法搭配深度學習技術預測新冠肺炎疫情發展趨勢-以美國為例) |
Department | |
Year, semester | |
Language | |
Degree | |
Number of pages | 66 |
Author | |
Advisor | |
Convenor | |
Advisory Committee | |
Date of Exam | 2021-07-22 |
Date of Submission | 2021-08-06 |
Keywords | 2019 novel coronavirus infection, Data augmentation, Deep learning, Long short-term memory model, Many-to-one architecture, ARIMA |
Statistics | This thesis has been browsed 660 times and downloaded 666 times. |
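Abstract (translated from Chinese) |
Since the COVID-19 outbreak on December 31, 2019, governments worldwide have continued to monitor the overall trend of the pandemic so that they can respond promptly to emergencies that may arise. This study collects and consolidates open-dataset variables for the United States, including confirmed cases, deaths, the number of people tested by PCR, and population mobility, and groups them into seven categories of independent variables: "medical testing", "airport arrivals and departures", "domestic travel distance", "medical testing + airport arrivals and departures", "medical testing + domestic travel distance", "airport arrivals and departures + domestic travel distance", and "all variables". The selected variables are augmented with the Box-Cox and Loess-based decomposition bootstrap method and then passed to an LSTM deep learning model with a many-to-one architecture to predict the trend of the COVID-19 pandemic in the United States. The ARIMA model, the Local and Global Trend (LGT) statistical model, a standard LSTM model, and a many-to-one LSTM serve as the comparison group. The results show that the prediction error for daily confirmed cases in the United States is smallest when the independent variables are the "all variables" combination and the sliding window is 7 days, with a MAPE of 9.14% and an sMAPE of 8.95%, clearly better than the predictions of the ARIMA model, the LGT model, the standard LSTM model, and the many-to-one LSTM. In addition, Grad-CAM analysis found that the number of people screened at airports per unit, the number of US domestic commercial flights per unit, and the number of people traveling more than 100 miles have a significant positive influence on the predictions of the model designed in this study. This study proposes a method for improving the accuracy of time series forecasting and consolidates the variables used by earlier researchers. Besides giving government agencies a reference for future epidemic-prevention policy, it offers researchers a time series analysis method for forecasting from small amounts of data. |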
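The 7-day sliding window and many-to-one setup mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function name `make_windows`, the toy data, and the choice of the next day's value as the prediction target are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int = 7,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Slice a multivariate daily series into many-to-one samples.

    Each sample holds `window` consecutive days of feature rows and is
    paired with the target value of the day right after the window
    (an assumed forecasting horizon of one day).
    """
    if len(features) != len(target):
        raise ValueError("features and target must cover the same days")
    if window < 1 or window >= len(target):
        raise ValueError("window must be between 1 and len(target) - 1")
    inputs, outputs = [], []
    for start in range(len(target) - window):
        inputs.append([list(row) for row in features[start:start + window]])
        outputs.append(target[start + window])
    return inputs, outputs


# Toy data: 10 days with two hypothetical features per day.
days = [[float(d), float(d) * 2.0] for d in range(10)]
cases = [float(100 + d) for d in range(10)]
X, y = make_windows(days, cases, window=7)
print(len(X), len(X[0]), len(X[0][0]), y)
```

Each element of `X` has shape (window, features), which is the layout a sequence model such as an LSTM typically consumes, while each element of `y` is a single value, hence "many-to-one".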
Abstract |
To respond rapidly to the COVID-19 pandemic, governments around the world have monitored its dynamic trends and searched for more accurate forecasting methods since its outbreak in December 2019. Our data come mainly from the US. The dependent variable is daily confirmed cases, and the independent variables are divided into seven categories: medical testing, airport arrivals and departures, domestic travel distance, medical testing plus airport arrivals and departures, medical testing plus domestic travel distance, airport arrivals and departures plus domestic travel distance, and all variables. We preprocess the raw data with the "Box-Cox and Loess-based decomposition bootstrap" method to generate a large amount of training data without changing its pattern, and we combine this data augmentation with a Many-to-one LSTM model to forecast the trend of the COVID-19 pandemic in the US. Compared with the four baselines (ARIMA, LGT, LSTM, and Many-to-one LSTM), our method achieves lower error in MSE, RMSE, MAE, MAPE, and sMAPE. Furthermore, we use Grad-CAM to identify the features that matter most to the model, and thus the independent variables most worth consulting. |
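For readers unfamiliar with the five error measures named in the abstract, the sketch below shows one common way to compute them. The record does not give the thesis formulas, so the definitions here are assumptions; sMAPE in particular has several variants, and this sketch uses the form that divides by the mean of the absolute actual and predicted values. The function names and sample numbers are illustrative only.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant, bounded by 200)."""
    _check(actual, predicted)
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Treat a pair of exact zeros as a perfect prediction.
        total += 0.0 if denom == 0 else 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


# Illustrative daily case counts, not data from the thesis.
actual_cases = [100.0, 200.0, 400.0]
predicted_cases = [110.0, 180.0, 400.0]
for metric in (mse, rmse, mae, mape, smape):
    print(metric.__name__, round(metric(actual_cases, predicted_cases), 4))
```

MAPE and sMAPE are scale-free, which makes them convenient for comparing models across periods with very different case counts; MSE, RMSE, and MAE stay in the units of the series.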
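Table of Contents |
Thesis approval form
Abstract (Chinese)
Abstract
1. Introduction
  Section 1. Research background
  Section 2. Research motivation and objectives
  Section 3. Research process
2. Literature review
  Section 1. Research related to COVID-19
  Section 2. Applications of deep learning to COVID-19 or infectious disease outbreaks
  Section 3. Data augmentation
    1. Decomposition-based Methods
    2. Model-based Methods
  Section 4. Time series models
    1. Autoregressive integrated moving average model (ARIMA)
    2. Long short-term memory model (LSTM)
    3. Encoder-Decoder LSTM
    4. Sliding window algorithm
3. Research methods
  Section 1. Data collection and variable selection
  Section 2. Data augmentation
  Section 3. Data normalization and sliding window processing
    1. Data normalization
    2. Sliding window
  Section 4. Model construction and validation
    1. Long short-term memory model (LSTM)
    2. Many-to-one LSTM
    3. Optimizer and EarlyStopping mechanism
    4. Model evaluation criteria
4. Analysis
  Section 1. Model prediction analysis
  Section 2. Variable importance analysis
5. Conclusions
  Section 1. Research findings
  Section 2. Research contributions
6. Research limitations and future recommendations
References |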
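Fulltext |
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law. |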
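Printed copies |
Public availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available |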