Title page for etd-0724122-185640 (thesis/dissertation detailed information)
Title
基於決策樹隱半馬可夫模型的借貸審核流程探勘
Loan Process Model based on Tree-based Hidden Semi-Markov Models
Department
Year, semester
Language
Degree
Number of pages
33
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2022-07-29
Date of Submission
2022-08-24
Keywords
Classification and Regression Tree, Hidden Semi-Markov Models, Process Model, Temporal Data Mining, Interpretable Machine Learning, Loan Evaluation
Statistics
This thesis/dissertation has been browsed 374 times and downloaded 82 times.
Abstract
In the era of big data, experts in many fields apply process mining methods to analyze event logs collected by information systems and build process models in order to explore, monitor, and improve actual processes. However, traditional process models cannot effectively handle problems such as probabilistic inference over process sequences. In this thesis, we apply a probabilistic process model that combines decision trees with the hidden semi-Markov model, the classification tree hidden semi-Markov model (CTHSMM), to mine the delays that may occur in a real-life bank loan review process and the factors behind them. The experimental results verify that, under assumptions that reflect real-life temporal variability and uncertainty, this model helps answer two questions: "What is the most likely sequence of hidden states given the sequence of observations?" and "What are the feature conditions that affect each hidden state?" Finally, from the defined feature conditions we can infer the actual loan review process and the factors that cause delays in it. For example, for a loan review process that lasted 32 days and was intermittently delayed 3 times, the inferred causes may be a large loan amount and incomplete application documents.
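The first question in the abstract, finding the most likely hidden-state sequence for a given observation sequence, is the decoding problem solved by the Viterbi algorithm. The following is a minimal sketch under simplifying assumptions, not the thesis's implementation: it decodes a plain hidden Markov model, whereas a hidden semi-Markov model would additionally model how long each state lasts. The state names, observation symbols, and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for obs and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    # back[t][s]: predecessor of state s on that best path
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state loan-review model; every number here is made up.
states = ["on_time", "delayed"]
start_p = {"on_time": 0.8, "delayed": 0.2}
trans_p = {
    "on_time": {"on_time": 0.7, "delayed": 0.3},
    "delayed": {"on_time": 0.4, "delayed": 0.6},
}
emit_p = {
    "on_time": {"short_wait": 0.9, "long_wait": 0.1},
    "delayed": {"short_wait": 0.2, "long_wait": 0.8},
}
observations = ["short_wait", "long_wait", "long_wait", "short_wait"]

path, log_prob = viterbi(observations, states, start_p, trans_p, emit_p)
print(path)  # ['on_time', 'delayed', 'delayed', 'on_time']
```

Working in log-probabilities avoids numerical underflow on long sequences. In this toy model the two consecutive long waits are decoded as a delayed stretch between two on-time steps.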
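Table of Contents
Thesis Approval Form i
Abstract (Chinese) ii
Abstract (English) iii
Chapter 1. Introduction 1
1-1 Research Background 1
1-2 Research Motivation 1
1-3 Research Objectives 2
Chapter 2. Literature Review 3
2-1 Local Interpretable Model-Agnostic Explanations (LIME) 4
2-2 Markov Model 5
2-3 Hidden Markov Model (HMM) 5
2-4 Hidden Semi-Markov Model (HSMM) 7
2-5 Classification Tree Hidden Semi-Markov Model (CTHSMM) 7
Chapter 3. Methodology 9
3-1 Building the CART Classification Tree 9
3-2 Estimating State Transition Probabilities 10
3-3 Evaluating Viterbi Path Accuracy 11
3-4 Model Selection for the Classification Tree Hidden Semi-Markov Model 11
Chapter 4. Experimental Results and Discussion 13
4-1 Definition of the Observations and the Problem 14
4-2 Constructing Multiple Candidate Classification Tree Hidden Semi-Markov Models 15
4-3 Selecting the Best Classification Tree Hidden Semi-Markov Model 16
4-4 Which States Are More Likely to Be Delayed, and Which Factors Influence Those Delays? 17
4-5 Interpreting Changes in the Viterbi Accuracy Curve 18
4-6 Predicting the Viterbi Path Given an Observation Sequence and Its Durations 20
4-7 Inferring the Loan Review Process and Delay Factors from the Viterbi Path 21
4-8 Comparison with LIME Explanations of an LSTM Model 23
Chapter 5. Conclusion 23
References 24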
References
Bahl, L., Brown, P., de Souza, P., & Mercer, R. (1986). Maximum mutual information
   estimation of hidden Markov model parameters for speech recognition. ICASSP
   ’86. IEEE International Conference on Acoustics, Speech, and Signal Processing,
   11, 49–52. https://doi.org/10.1109/ICASSP.1986.1169179
Cassandras, C. G., & Lafortune, S. (Eds.). (2008). Introduction to Discrete Event Systems.
   Springer US. https://doi.org/10.1007/978-0-387-68612-7
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural   
   Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Kang, Y., & Zadorozhny, V. (2016). Process Discovery Using Classification Tree Hidden  
   Semi-Markov Model. 2016 IEEE 17th International Conference on Information   
   Reuse and Integration (IRI), 361–368. https://doi.org/10.1109/IRI.2016.55
Loh, W. (2011). Classification and regression trees. WIREs Data Mining and Knowledge
   Discovery, 1(1), 14–23. https://doi.org/10.1002/widm.8
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2018). Learning under Concept
   Drift: A Review. IEEE Transactions on Knowledge and Data Engineering, 1–1.  
   https://doi.org/10.1109/TKDE.2018.2876857
Lukashin, A. (1998). GeneMark.hmm: New solutions for gene finding. Nucleic Acids
   Research, 26(4), 1107–1115. https://doi.org/10.1093/nar/26.4.1107
Moon, T. K. (1996). The expectation-maximization algorithm. IEEE Signal Processing
   Magazine, 13(6), 47–60. https://doi.org/10.1109/79.543975
O’Connell, J., & Højsgaard, S. (2011). Hidden Semi Markov Models for Multiple
   Observation Sequences: The mhsmm Package for R. Journal of Statistical
   Software, 39(4). https://doi.org/10.18637/jss.v039.i04
O’Connell, J., Tøgersen, F. A., Friggens, N. C., Løvendahl, P., & Højsgaard, S. (2011).
   Combining Cattle Activity and Progesterone Measurements Using Hidden Semi-
   Markov Models. Journal of Agricultural, Biological, and Environmental
   Statistics, 16(1), 1–16. https://doi.org/10.1007/s13253-010-0033-7
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in
   speech recognition. Proceedings of the IEEE, 77(2), 257–286.
   https://doi.org/10.1109/5.18626
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016a). Model-Agnostic Interpretability of
   Machine Learning. https://doi.org/10.48550/ARXIV.1606.05386
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016b). “Why Should I Trust You?”: Explaining
   the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD
   International Conference on Knowledge Discovery and Data Mining, 1135–1144.
   https://doi.org/10.1145/2939672.2939778
Silverman, B. W. (2018). Density Estimation for Statistics and Data Analysis (1st ed.).
   Routledge. https://doi.org/10.1201/9781315140919
Therneau, T. M., Atkinson, E. J., & others. (n.d.). An introduction to recursive
   partitioning using the RPART routines.
van der Aalst, W. (2012). Process Mining: Overview and Opportunities. ACM
   Transactions on Management Information Systems, 3(2), 1–17.
   https://doi.org/10.1145/2229156.2229157
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum
   decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.
   https://doi.org/10.1109/TIT.1967.1054010
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A Review of Recurrent Neural Networks: LSTM
   Cells and Network Architectures. Neural Computation, 31(7), 1235–1270.
   https://doi.org/10.1162/neco_a_01199
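Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available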


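Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available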
