論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
基於決策樹隱半馬可夫模型的借貸審核流程探勘 Loan Process Model based on Tree-based Hidden Semi-Markov Models |
系所名稱 Department |
畢業學年期 Year, semester |
語文別 Language |
學位類別 Degree |
頁數 Number of pages |
33 |
研究生 Author |
指導教授 Advisor |
召集委員 Convenor |
口試委員 Advisory Committee |
口試日期 Date of Exam |
2022-07-29 |
繳交日期 Date of Submission |
2022-08-24 |
關鍵字 Keywords |
分類及迴歸樹、隱半馬可夫模型、流程模型、時間性資料探勘、可解釋的機器學習、借貸審核 Classification and Regression Tree, Hidden Semi-Markov Models, Process Model, Temporal Data Mining, Interpretable Machine Learning, Loan Evaluation |
統計 Statistics |
本論文已被瀏覽 504 次,被下載 87 次 This thesis has been viewed 504 times and downloaded 87 times. |
中文摘要 |
在大數據的時代浪潮下,各領域的專家們應用流程探勘方法分析資訊系統事件日誌並建構流程模型,進而探索、監控以及改進實際流程。然而,傳統的流程模型無法有效處理機率式的流程序列推論等問題。本文中,我們嘗試應用結合決策樹及隱半馬可夫模型的概率式流程模型-分類樹隱半馬可夫模型,挖掘現實中銀行借貸審核流程可能發生的延遲現象及影響因素,實驗結果亦驗證此模型可以在模擬現實生活中時間可變性及不確定性的假設前提下,能協助我們了解「在給定觀察序列的條件下,最有可能的隱藏狀態序列為何?」以及「影響各種隱藏狀態的特徵條件為何?」。最終,我們能透過所定義的特徵條件推論實際借貸審核過程及導致借貸流程中延遲的因素。例如:某個借貸審核流程在持續32天內,間斷性的發生了3次延遲現象,推論其影響因素可能是借貸金額龐大及申請文件不完整所造成。 |
Abstract |
In the era of big data, experts in many fields apply process mining to analyze event logs recorded by information systems and build process models in order to explore, monitor, and improve actual processes. However, traditional process models cannot effectively handle problems such as probabilistic inference over process sequences. In this thesis, we apply a probabilistic process model that combines decision trees with a hidden semi-Markov model, the classification tree hidden semi-Markov model (CTHSMM), to mine the delays that may occur in real-life bank loan review processes and the factors behind them. The experimental results verify that, under assumptions that reflect real-life temporal variability and uncertainty, the model helps answer two questions: "What is the most likely sequence of hidden states given the sequence of observations?" and "What feature conditions affect each hidden state?" Finally, from the feature conditions defined by the model, we can infer the actual loan review process and the factors that cause delays in it. For example, for a loan review process that lasted 32 days and was intermittently delayed 3 times, the inferred causes are a large loan amount and incomplete application documents. |
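The first question in the abstract, finding the most likely hidden-state sequence for a given observation sequence, is the decoding problem usually solved with the Viterbi algorithm (Viterbi, 1967). The block below is a minimal sketch of Viterbi decoding for a generic explicit-duration hidden semi-Markov model. It is not the thesis's CTHSMM implementation: the state names, observation symbols, and every probability are hypothetical values chosen for illustration, the emissions are plain lookup tables rather than classification-tree outputs, and the last segment is assumed to end exactly at the final observation.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def hsmm_viterbi(obs, states, start, trans, dur, emit):
    """Most likely state path for an explicit-duration HSMM.

    start[s], trans[r][s], dur[s][d] and emit[s][o] are probabilities;
    missing dictionary entries count as probability zero.
    Returns (log_probability, path) with one state label per observation.
    """
    T = len(obs)
    # delta[t][s]: best log-probability of obs[:t] such that a segment
    # spent in state s ends exactly at position t.
    delta = [{s: NEG_INF for s in states} for _ in range(T + 1)]
    back = [{s: None for s in states} for _ in range(T + 1)]

    for t in range(1, T + 1):
        for s in states:
            emit_lp = 0.0
            for d in range(1, min(t, max(dur[s])) + 1):
                # Grow the candidate segment backwards; it covers obs[t-d:t].
                emit_lp += safe_log(emit[s].get(obs[t - d], 0.0))
                dur_lp = safe_log(dur[s].get(d, 0.0))
                if t == d:
                    # The segment starts the sequence.
                    prev_lp, prev = safe_log(start.get(s, 0.0)), None
                else:
                    # Best predecessor segment; self-transitions are excluded.
                    prev_lp, prev = max(
                        ((delta[t - d][r] + safe_log(trans[r].get(s, 0.0)), r)
                         for r in states if r != s),
                        default=(NEG_INF, None),
                    )
                cand = prev_lp + dur_lp + emit_lp
                if cand > delta[t][s]:
                    delta[t][s] = cand
                    back[t][s] = (d, prev)

    best = max(states, key=lambda s: delta[T][s])
    if delta[T][best] == NEG_INF:
        raise ValueError("observation sequence has zero probability")

    # Walk the back-pointers segment by segment to rebuild the path.
    path, t, s = [], T, best
    while t > 0:
        d, prev = back[t][s]
        path[:0] = [s] * d
        t, s = t - d, prev
    return delta[T][best], path


# Hypothetical two-state example: one observation per day of a loan case.
STATES = ["normal", "delayed"]
START = {"normal": 1.0, "delayed": 0.0}
TRANS = {"normal": {"delayed": 1.0}, "delayed": {"normal": 1.0}}
DUR = {"normal": {1: 0.5, 2: 0.5}, "delayed": {2: 0.5, 3: 0.5}}
EMIT = {"normal": {"event": 0.9, "idle": 0.1},
        "delayed": {"event": 0.2, "idle": 0.8}}
OBS = ["event", "event", "idle", "idle", "idle", "event"]

best_lp, best_path = hsmm_viterbi(OBS, STATES, START, TRANS, DUR, EMIT)
print(best_path)
# ['normal', 'normal', 'delayed', 'delayed', 'delayed', 'normal']
```

Unlike an ordinary HMM, each candidate here is a whole segment (state plus duration), which is what lets the decoded path report how long a delay lasted as well as when it occurred.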
目次 Table of Contents |
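Thesis Approval Certificate
Abstract (Chinese)
Abstract
Chapter 1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
Chapter 2. Literature Review
2-1 Local Interpretable Model-Agnostic Explanations (LIME)
2-2 Markov Model
2-3 Hidden Markov Model (HMM)
2-4 Hidden Semi-Markov Model (HSMM)
2-5 Classification Tree Hidden Semi-Markov Model (CTHSMM)
Chapter 3. Methodology
3-1 Building the CART Classification Tree
3-2 Estimating State Transition Probabilities
3-3 Evaluating Viterbi Path Accuracy
3-4 Model Selection for the Classification Tree Hidden Semi-Markov Model
Chapter 4. Experimental Results and Discussion
4-1 Definition of the Observations and the Problem
4-2 Constructing Multiple Candidate Classification Tree Hidden Semi-Markov Models
4-3 Selecting the Best Classification Tree Hidden Semi-Markov Model
4-4 Which States Are More Likely to Be Delayed, and Which Factors Drive the Delay?
4-5 What Changes in the Viterbi Accuracy Curve Imply
4-6 Predicting the Viterbi Path Given an Observation Sequence and Its Durations
4-7 Inferring the Loan Review Process and Delay Factors from the Viterbi Path
4-8 Comparison with LIME Explanations of an LSTM Model
Chapter 5. Conclusion
References |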
參考文獻 References |
Bahl, L., Brown, P., de Souza, P., & Mercer, R. (1986). Maximum mutual information estimation of hidden Markov model parameters for speech recognition. ICASSP ’86. IEEE International Conference on Acoustics, Speech, and Signal Processing, 11, 49–52. https://doi.org/10.1109/ICASSP.1986.1169179
Cassandras, C. G., & Lafortune, S. (Eds.). (2008). Introduction to Discrete Event Systems. Springer US. https://doi.org/10.1007/978-0-387-68612-7
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Kang, Y., & Zadorozhny, V. (2016). Process Discovery Using Classification Tree Hidden Semi-Markov Model. 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI), 361–368. https://doi.org/10.1109/IRI.2016.55
Loh, W. (2011). Classification and regression trees. WIREs Data Mining and Knowledge Discovery, 1(1), 14–23. https://doi.org/10.1002/widm.8
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2018). Learning under Concept Drift: A Review. IEEE Transactions on Knowledge and Data Engineering, 1–1. https://doi.org/10.1109/TKDE.2018.2876857
Lukashin, A. (1998). GeneMark.hmm: New solutions for gene finding. Nucleic Acids Research, 26(4), 1107–1115. https://doi.org/10.1093/nar/26.4.1107
Moon, T. K. (1996). The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6), 47–60. https://doi.org/10.1109/79.543975
O’Connell, J., & Højsgaard, S. (2011). Hidden Semi Markov Models for Multiple Observation Sequences: The mhsmm Package for R. Journal of Statistical Software, 39(4). https://doi.org/10.18637/jss.v039.i04
O’Connell, J., Tøgersen, F. A., Friggens, N. C., Løvendahl, P., & Højsgaard, S. (2011). Combining Cattle Activity and Progesterone Measurements Using Hidden Semi-Markov Models. Journal of Agricultural, Biological, and Environmental Statistics, 16(1), 1–16. https://doi.org/10.1007/s13253-010-0033-7
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286. https://doi.org/10.1109/5.18626
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016a). Model-Agnostic Interpretability of Machine Learning. https://doi.org/10.48550/ARXIV.1606.05386
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016b). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. https://doi.org/10.1145/2939672.2939778
Silverman, B. W. (2018). Density Estimation for Statistics and Data Analysis (1st ed.). Routledge. https://doi.org/10.1201/9781315140919
Therneau, T. M., Atkinson, E. J., & others. (n.d.). An introduction to recursive partitioning using the RPART routines.
van der Aalst, W. (2012). Process Mining: Overview and Opportunities. ACM Transactions on Management Information Systems, 3(2), 1–17. https://doi.org/10.1145/2229156.2229157
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269. https://doi.org/10.1109/TIT.1967.1054010
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Computation, 31(7), 1235–1270. https://doi.org/10.1162/neco_a_01199 |
電子全文 Fulltext |
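This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to break the law. |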
紙本論文 Printed copies |
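Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available |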