National Sun Yat-sen University (國立中山大學), thesis/dissertation, Loan Process Model based on Tree-based Hidden Semi-Markov Models

Title	Loan Process Model based on Tree-based Hidden Semi-Markov Models (基於決策樹隱半馬可夫模型的借貸審核流程探勘)
Department	Online Master of Business Administration in Electronic Commerce and Business Analytics
Year, semester	Spring semester of Academic Year 110	Language	Chinese
Degree	Master	Number of pages	33
Author	Wen-Chieh Hsu (許文杰)
Advisor	KANG, YI-HUANG (康藝晃)
Convenor	LEE, PEI-JU (李珮如)
Advisory Committee	Yang, Huei-Fang (楊惠芳)
Date of Exam	2022-07-29	Date of Submission	2022-08-24
Keywords	Classification and Regression Tree, Hidden Semi-Markov Models, Process Model, Temporal Data Mining, Interpretable Machine Learning, Loan Evaluation
Statistics	The thesis/dissertation has been browsed 657 times and downloaded 89 times.

Abstract
In the era of big data, experts in many fields apply process mining methods to analyze the event logs collected by information systems and to build process models, which they then use to explore, monitor, and improve actual processes. Traditional process models, however, cannot effectively handle problems such as probabilistic inference over process sequences. In this thesis, we apply a probabilistic process model that combines decision trees with hidden semi-Markov models, the classification tree hidden semi-Markov model (CTHSMM), to mine the delays that may occur in a real-life bank loan review process and the factors behind them. The experimental results verify that, under assumptions that simulate real-life temporal variability and uncertainty, the model helps answer two questions: "What is the most likely sequence of hidden states given the sequence of observations?" and "What are the feature conditions that affect each hidden state?" Finally, from the defined feature conditions we can infer the actual loan review process and the factors that cause delays in it. For example, a loan review process that lasted 32 days was intermittently delayed three times, and the inferred contributing factors are a large loan amount and incomplete application documents.
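The first question in the abstract, the most likely hidden state sequence given the observations, is the decoding problem usually solved with the Viterbi algorithm. The following is a minimal sketch of Viterbi decoding for an explicit-duration hidden semi-Markov model, not the thesis's implementation. The state names, observation symbols, and every probability in it are hypothetical toy values, and it assumes that the last segment ends exactly at the final observation and that a state never transitions to itself. The tree-based part of the CTHSMM is not modeled here.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Return log(p), mapping zero probability to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def hsmm_viterbi(obs, states, init, trans, dur, emit):
    """Most likely state path for an explicit-duration HSMM.

    dur[j][d - 1] is the probability that a visit to state j lasts d steps.
    Returns (path, log_probability).
    """
    T = len(obs)
    # delta[t][j]: best log-probability of obs[:t] when a segment
    # in state j ends exactly at step t.
    delta = [{j: NEG_INF for j in states} for _ in range(T + 1)]
    back = [{j: None for j in states} for _ in range(T + 1)]
    for t in range(1, T + 1):
        for j in states:
            emit_sum = 0.0
            for d in range(1, min(len(dur[j]), t) + 1):
                # Extend the candidate segment one step to the left.
                emit_sum += safe_log(emit[j].get(obs[t - d], 0.0))
                start = t - d
                if start == 0:
                    prev_score, prev_state = safe_log(init.get(j, 0.0)), None
                else:
                    prev_score, prev_state = NEG_INF, None
                    for i in states:
                        if i == j:
                            continue  # no self-transitions in an HSMM
                        cand = delta[start][i] + safe_log(trans[i].get(j, 0.0))
                        if cand > prev_score:
                            prev_score, prev_state = cand, i
                score = prev_score + safe_log(dur[j][d - 1]) + emit_sum
                if score > delta[t][j]:
                    delta[t][j] = score
                    back[t][j] = (d, prev_state)
    state = max(states, key=lambda j: delta[T][j])
    best = delta[T][state]
    if best == NEG_INF:
        raise ValueError("observation sequence has zero probability")
    # Walk the back-pointers segment by segment.
    path, t = [], T
    while t > 0:
        d, prev_state = back[t][state]
        path = [state] * d + path
        t -= d
        state = prev_state
    return path, best


# Hypothetical toy model: a review is either progressing or delayed.
states = ["normal", "delayed"]
init = {"normal": 1.0, "delayed": 0.0}
trans = {"normal": {"delayed": 1.0}, "delayed": {"normal": 1.0}}
dur = {"normal": [0.2, 0.5, 0.3], "delayed": [0.1, 0.3, 0.6]}
emit = {
    "normal": {"step": 0.9, "wait": 0.1},
    "delayed": {"step": 0.2, "wait": 0.8},
}
obs = ["step", "step", "wait", "wait", "wait", "step"]

path, log_prob = hsmm_viterbi(obs, states, init, trans, dur, emit)
print(path, math.exp(log_prob))
```

Working in log space avoids numeric underflow on long sequences, and storing the chosen duration with each back-pointer lets the path be rebuilt one segment at a time.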

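Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Chapter 1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
Chapter 2. Literature Review
2-1 Local Interpretable Model-Agnostic Explanations (LIME)
2-2 Markov Model
2-3 Hidden Markov Model (HMM)
2-4 Hidden Semi-Markov Model (HSMM)
2-5 Classification Tree Hidden Semi-Markov Model (CTHSMM)
Chapter 3. Research Method
3-1 Building the CART Classification Tree
3-2 State Transition Probability Estimation Method
3-3 Viterbi Path Accuracy Evaluation
3-4 Model Selection for the Classification Tree Hidden Semi-Markov Model
Chapter 4. Experimental Results and Discussion
4-1 Definition of Observations and Problems
4-2 Constructing Multiple Candidate Classification Tree Hidden Semi-Markov Models
4-3 Selecting the Best Classification Tree Hidden Semi-Markov Model
4-4 Which States Are More Likely to Be Delayed, and Inferring the Factors behind the Delay
4-5 What the Changes in the Viterbi Accuracy Curve Imply
4-6 Predicting the Viterbi Path Given an Observation Sequence and Its Durations
4-7 Inferring the Loan Review Process and Delay Factors along the Viterbi Path
4-8 Comparison with the LIME Explanations of an LSTM Model
Chapter 5. Conclusion
References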
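Sections 3-2 and 3-3 of the outline name state transition probability estimation and Viterbi path accuracy evaluation. This record does not describe how the thesis defines either one, so the sketch below rests on two assumptions: transitions are estimated by counting state changes in labeled state paths, and accuracy is the fraction of time steps at which the decoded state matches the reference state. The function names and the toy state labels are hypothetical.

```python
from collections import Counter, defaultdict


def to_segments(path):
    """Collapse a per-step state path into (state, duration) runs."""
    segments = []
    for state in path:
        if segments and segments[-1][0] == state:
            segments[-1][1] += 1
        else:
            segments.append([state, 1])
    return [(state, length) for state, length in segments]


def estimate_transitions(paths):
    """Relative-frequency estimate of transitions between distinct states."""
    counts = defaultdict(Counter)
    for path in paths:
        segments = to_segments(path)
        for (src, _), (dst, _) in zip(segments, segments[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def viterbi_accuracy(true_path, decoded_path):
    """Fraction of time steps where the decoded state equals the true state."""
    if not true_path or len(true_path) != len(decoded_path):
        raise ValueError("paths must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_path, decoded_path))
    return matches / len(true_path)


# Hypothetical labeled paths for two loan applications.
paths = [
    ["submitted", "submitted", "review", "approved"],
    ["submitted", "review", "review", "submitted", "approved"],
]
trans_est = estimate_transitions(paths)
accuracy = viterbi_accuracy(
    ["submitted", "submitted", "review", "approved"],
    ["submitted", "review", "review", "approved"],
)
print(trans_est, accuracy)
```

Counting transitions between runs rather than between individual steps keeps self-transitions out of the estimate, which fits a semi-Markov model where the time spent in a state is described by a separate duration distribution.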


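Fulltext
This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: unrestricted (fully open on and off campus)
Available: Campus: available; Off-campus: available
etd-0724122-185640.pdf
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.
Available: available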
