National Sun Yat-sen University, thesis/dissertation, eGFR Forecasting using Temporal Pattern Recognition: An Empirical Study on the Value of Patient History

Title	eGFR Forecasting using Temporal Pattern Recognition: An Empirical Study on the Value of Patient History
Department	Online Master of Information Management in Electronic Commerce and Business Analytics
Year, semester	Spring semester of Academic Year 112	Language	Chinese
Degree	Master	Number of pages	51
Author	Ya-Li Wang (王雅莉)
Advisor	Kang, Yi-Huang (康藝晃)
Convenor	Lee, Pei-Ju (李珮如)
Advisory Committee	Yang, Huei-Fang (楊惠芳)
Date of Exam	2024-07-11	Date of Submission	2024-07-23
Keywords	Random Forest, One-Dimensional Convolutional Neural Network, Large Language Model, Regression Tree, Chronic Kidney Disease, Estimated Glomerular Filtration Rate

Abstract
Chronic kidney disease (CKD) is a common chronic illness whose prevention and treatment receive international attention, and Taiwan has devoted substantial resources to it in recent years. CKD results from declining kidney function, which leaves the body unable to carry out normal metabolism. Treatment, care, and prognosis depend on the patient's CKD stage, which is determined by the estimated glomerular filtration rate (eGFR). Accurate eGFR prediction would therefore allow earlier preparation and intervention. eGFR is currently predicted with the moving average method, which offers reasonable accuracy but has certain limitations. This thesis applies machine learning to eGFR forecasting. It uses three data-splitting strategies to examine how individual patients and CKD stages relate to eGFR, and it applies four methods: regression trees, random forests, one-dimensional convolutional neural networks, and large language models.

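The moving average baseline mentioned in the abstract, together with the three error metrics named in the thesis outline (MAE, NRMSE, MAPE), can be illustrated with a minimal sketch. This is not the thesis's implementation: the eGFR readings, the window size of 3, and the choice to normalize RMSE by the range of the observed values are all assumptions made for illustration.

```python
from math import sqrt


def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    if not history:
        raise ValueError("history must contain at least one value")
    recent = history[-window:]
    return sum(recent) / len(recent)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the range of the actual values (an assumed normalization)."""
    rmse = sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
    return rmse / (max(actual) - min(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical eGFR readings for one patient, in visit order (illustrative only).
egfr = [62.0, 60.0, 58.0, 59.0, 55.0, 54.0]
window = 3

# Forecast each visit from the visits that precede it.
actual = egfr[window:]
predicted = [moving_average_forecast(egfr[:i], window) for i in range(window, len(egfr))]

print("MAE:", round(mae(actual, predicted), 3))
print("NRMSE:", round(nrmse(actual, predicted), 3))
print("MAPE (%):", round(mape(actual, predicted), 3))
```

Each forecast uses only earlier visits, so the evaluation never looks ahead in a patient's history. A learned model could be scored with the same three functions for a like-for-like comparison against this baseline.
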
Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation
  1.3 Research Objectives
Chapter 2 Literature Review
  2.1 Chronic Kidney Disease
    2.1.1 Chronic kidney disease (CKD)
    2.1.2 Estimated glomerular filtration rate (eGFR)
  2.2 Machine Learning Methods
    2.2.1 Regression Tree
    2.2.2 Ensemble Learning
    2.2.3 Random Forest
    2.2.4 One-Dimensional Convolutional Neural Network
  2.3 Large Language Model (LLM)
Chapter 3 Research Design and Methods
  3.1 Research Process
  3.2 Research Methods
  3.3 Data Splitting
    3.3.1 Holdout by patient visit order
    3.3.2 Holdout by CKD stage
    3.3.3 Prequential blocks
  3.4 Model Building and Prediction
  3.5 Model Evaluation
    3.5.1 Mean absolute error (MAE)
    3.5.2 Normalized root mean square error (NRMSE)
    3.5.3 Mean absolute percentage error (MAPE)
Chapter 4 Experimental Setup and Results
  4.1 Data Preprocessing
    4.1.1 Overview of the raw data
    4.1.2 Preprocessing procedure
    4.1.3 Comparison of the data before and after preprocessing
  4.2 Data Splitting
    4.2.1 Prequential blocks splitting
  4.3 Experimental Results
    4.3.1 Explanation of the results
    4.3.2 Presentation of the results
Chapter 5 Conclusions
  5.1 Conclusions
  5.2 Research Limitations
Chapter 6 References
Appendix A Detailed Experimental Results
Appendix B Model Building Parameters


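Fulltext
This electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it at will, to avoid violating the law. Thesis access permission: unrestricted (fully open on and off campus). Available: Campus: available; Off-campus: available. etd-0623124-212758.pdf
Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. To look up printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Available: available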


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS