Title page for etd-0804124-203910
Title
Enhanced Cluster Interpretation and Subgroup Analysis using Deep Unsupervised Rule Forest
Department
Year, semester
Language
Degree
Number of pages
43
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2024-08-22
Date of Submission
2024-09-04
Keywords
Clustering, Interpretability, Deep Unsupervised Rule Forest, Representation Learning, Deep Architecture, Deep Learning
Statistics
The thesis/dissertation has been browsed 395 times and downloaded 17 times.
Chinese Abstract
Clustering is an unsupervised learning technique that aims to group similar data without requiring labels. Traditional clustering algorithms often face challenges when handling mixed data types and typically lack interpretability. In this paper, we propose a novel method, Deep Unsupervised Rule Forest (DURF), which overcomes the limitations of traditional clustering methods and provides interpretable rules for the clustering results. In addition, DURF's deep structure further improves clustering performance by learning complex relationships within the data.
Abstract
Clustering is an unsupervised learning technique that aims to group similar data points without requiring labeled data. Traditional clustering algorithms often encounter challenges, particularly when dealing with mixed data types, and they typically lack interpretability. In this paper, we propose Deep Unsupervised Rule Forest (DURF), a novel approach designed to handle mixed data types while providing interpretable rules for the clustering results. Moreover, DURF's deep structure further improves clustering performance by capturing complex patterns within the data.
Table of Contents
Thesis Verification Letter…………………………………………………………………i
Acknowledgments…………………………………………………………………………ii
Chinese Abstract…………………………………………………………………………iii
Abstract………………………………………………………………………………….iv
Table of Contents………………………………………………………………………...v
List of Figures…………………………………………………………………………..vii
List of Tables…………………………………………………………………………..viii
1. Introduction…………………………………………………………………………....1
2. Background…………………………………………………………………………....3
2.1 Addressing Challenges in Mixed Data and Interpretability in Clustering…………….3
2.2 Tree-Based Models…………………………………………………………………..3
2.2.1 Decision Trees……………………………………………………………………...3
2.2.2 Random Forests…………………………………………………………………...6
2.2.3 Unsupervised Random Forest……………………………………………………...6
2.3 Rule Learning………………………………………………………………………..8
2.4 Representation Learning in Clustering……………………………………………..10
2.5 Combining Deep Learning with Tree-Based Models for Enhanced Data Exploration……………………………………………………………………………..11
2.6 Data Processing Inequality (DPI)……………………………………………………12
3. Deep Unsupervised Rule Forest………………………………………………………13
3.1 Building DURF………………………………………………………………………14
3.2 Clustering with NNSAE……………………………………………………………..16
4. Experiment……………………………………………………………………………18
4.1 Clustering Results Comparisons…………………………………………………….18
4.2 Extracting Cluster Properties from DURF………………………………………….23
4.3 Cluster Interpretation on Adult Dataset…………………………………………….25
4.4 Assessing Hyperparameter Sensitivity in Model Performance……………………..27
4.5 Discussion………………………………………………………………………….30
5. Conclusion…………………………………………………………………………...31
References………………………………………………………………………………..32
References
Kelly, M., Longjohn, R., & Nottingham, K. The UCI Machine Learning Repository. https://archive.ics.uci.edu
Barry Becker, R. K. (1996). Adult [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5XW20
Basak, J., & Krishnapuram, R. (2005). Interpretable hierarchical clustering by constructing an unsupervised decision tree. IEEE Transactions on Knowledge and Data Engineering, 17(1), 121–132. https://doi.org/10.1109/TKDE.2005.11
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. Taylor & Francis.
Citation—UCI Machine Learning Repository. (n.d.). Retrieved June 19, 2024, from https://archive.ics.uci.edu/citation
Cover, T. M., & Thomas, J. A. (2012). Elements of Information Theory. John Wiley & Sons.
Fürnkranz, J., Gamberger, D., & Lavrač, N. (2012). Foundations of Rule Learning. Springer. https://doi.org/10.1007/978-3-540-75197-7
Goodman, B., & Flaxman, S. (2017). European Union Regulations on Algorithmic Decision Making and a “Right to Explanation.” AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
Gower, J. C. (1971). A General Coefficient of Similarity and Some of Its Properties. Biometrics, 27(4), 857–871. https://doi.org/10.2307/2528823
Huang, P., Huang, Y., Wang, W., & Wang, L. (2014). Deep Embedding Network for Clustering. 2014 22nd International Conference on Pattern Recognition, 1532–1537. https://doi.org/10.1109/ICPR.2014.272
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666. https://doi.org/10.1016/j.patrec.2009.09.011
Jiao, L., Yang, H., Liu, Z., & Pan, Q. (2022). Interpretable fuzzy clustering using unsupervised fuzzy decision trees. Information Sciences, 611, 540–563. https://doi.org/10.1016/j.ins.2022.08.077
Kang, Y., Huang, S.-T., & Wu, P.-H. (2021). Detection of Drug–Drug and Drug–Disease Interactions Inducing Acute Kidney Injury Using Deep Rule Forests. SN Computer Science, 2(4), 299. https://doi.org/10.1007/s42979-021-00670-0
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), Article 7553. https://doi.org/10.1038/nature14539
Lemme, A., Reinhart, R. F., & Steil, J. J. (2012). Online learning and generalization of parts-based image representations by non-negative sparse autoencoders. Neural Networks, 33, 194–203. https://doi.org/10.1016/j.neunet.2012.05.003
McInnes, L., Healy, J., & Astels, S. (2017). hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, 2(11), 205. https://doi.org/10.21105/joss.00205
Miller, K., Hettinger, C., Humpherys, J., Jarvis, T., & Kartchner, D. (2017). Forward Thinking: Building Deep Random Forests (arXiv:1705.07366). arXiv. https://doi.org/10.48550/arXiv.1705.07366
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
Quinlan, J. R. (2014). C4.5: Programs for Machine Learning. Elsevier.
Fisher, R. A. (1936). Iris [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76
Shi, T., & Horvath, S. (2006). Unsupervised Learning With Random Forest Predictors. Journal of Computational and Graphical Statistics, 15(1), 118–138. https://doi.org/10.1198/106186006X94072
Song, C., Liu, F., Huang, Y., Wang, L., & Tan, T. (2013). Auto-encoder Based Data Clustering. In J. Ruiz-Shulcloper & G. Sanniti di Baja (Eds.), Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (pp. 117–124). Springer. https://doi.org/10.1007/978-3-642-41822-8_15
Ward Jr., J. H. (1963). Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845
Zhou, Z.-H., & Feng, J. (2020). Deep Forest (arXiv:1702.08835). arXiv. http://arxiv.org/abs/1702.08835
Fulltext
This electronic full text is licensed only for individual, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: fully open on and off campus (unrestricted)
Available:
Campus: available
Off-campus: available


Printed copies
Public access information for printed theses is relatively complete from academic year 102 (2013–2014) onward. To look up access information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
