論文使用權限 Thesis access permission: 校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title: 可解釋的聚類與次群組分析: 使用深度無監督規則森林 Enhanced Cluster Interpretation and Subgroup Analysis using Deep Unsupervised Rule Forest
系所名稱 Department:
畢業學年期 Year, semester:
語文別 Language:
學位類別 Degree:
頁數 Number of pages: 43
研究生 Author:
指導教授 Advisor:
召集委員 Convenor:
口試委員 Advisory Committee:
口試日期 Date of Exam: 2024-08-22
繳交日期 Date of Submission: 2024-09-04
關鍵字 Keywords: 聚類、可解釋性、深度無監督規則森林、表徵學習、深度架構、深度學習 Clustering, Interpretability, Deep Unsupervised Rule Forest, Representation Learning, Deep Architecture, Deep Learning
統計 Statistics: 本論文已被瀏覽 395 次,被下載 17 次 The thesis has been browsed 395 times and downloaded 17 times.
中文摘要
聚類是一種無監督學習技術,旨在無需標籤的情況下將相似的資料分群。傳統的聚類演算法在處理混合數據類型時往往面臨挑戰,且通常缺乏可解釋性。在本文中,我們提出了一種新方法——深度無監督規則森林(DURF),可以克服傳統聚類方法的局限性,並為聚類結果提供可解釋的規則。此外,DURF 的深層結構通過學習資料中的複雜關係,進一步提高聚類的性能。
Abstract
Clustering is an unsupervised learning technique that aims to group similar data points without requiring labeled data. Traditional clustering algorithms often encounter challenges, particularly when dealing with mixed data types, and they typically lack interpretability. In this paper, we propose Deep Unsupervised Rule Forest (DURF), a novel approach designed to handle mixed data types while providing interpretable rules for the clustering results. Moreover, DURF's deep structure further improves clustering performance by capturing complex patterns within the data.
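Neither abstract includes implementation details, but the unsupervised-random-forest idea that DURF builds on (Shi & Horvath, 2006; Section 2.2.3 of the thesis) can be illustrated compactly. The Python sketch below is an assumption-laden illustration, not the thesis's implementation: it trains a forest to distinguish real rows from a column-permuted synthetic copy, clusters on the resulting tree-proximity matrix, and prints one tree's splits as a stand-in for rule extraction. The scikit-learn calls, the Iris data (Fisher, 1936), and all parameter values are illustrative choices.

```python
# Minimal sketch of unsupervised-random-forest clustering with rule
# readout (Shi & Horvath, 2006). Illustrative only; not the thesis's DURF.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

rng = np.random.default_rng(0)
iris = load_iris()
X = iris.data

# Synthetic contrast sample: permute each feature column independently,
# preserving marginal distributions but destroying dependence structure.
X_syn = np.column_stack([rng.permutation(col) for col in X.T])
X_all = np.vstack([X, X_syn])
y_all = np.r_[np.ones(len(X)), np.zeros(len(X_syn))]  # 1 = real, 0 = synthetic

# A forest separating real from synthetic rows implicitly learns the
# feature dependencies that characterize the real data.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_all, y_all)

# Proximity of two real rows = fraction of trees in which they share a leaf.
leaves = forest.apply(X)  # shape (n_rows, n_trees), leaf index per tree
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Cluster on 1 - proximity as a precomputed distance
# (scikit-learn >= 1.2 API: `metric`, formerly `affinity`).
labels = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1.0 - prox)
print(np.bincount(labels))  # cluster sizes

# Each tree is already a set of human-readable split rules; print one
# as a stand-in for a rule-extraction step.
print(export_text(forest.estimators_[0], feature_names=iris.feature_names))
```

Per the abstract and table of contents, DURF goes beyond this single layer: it stacks rule learning into a deep architecture and clusters the learned representation with a non-negative sparse autoencoder (NNSAE; Lemme et al., 2012), neither of which the sketch reproduces.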
目次 Table of Contents
論文審定書 (Thesis certification)
誌謝 (Acknowledgements)
摘要 (Abstract in Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Background
2.1 Addressing Challenges in Mixed Data and Interpretability in Clustering
2.2 Tree-Based Models
2.2.1 Decision Trees
2.2.2 Random Forests
2.2.3 Unsupervised Random Forest
2.3 Rule Learning
2.4 Representation Learning in Clustering
2.5 Combining Deep Learning with Tree-Based Models for Enhanced Data Exploration
2.6 Data Processing Inequality (DPI)
3. Deep Unsupervised Rule Forest
3.1 Building DURF
3.2 Clustering with NNSAE
4. Experiment
4.1 Clustering Results Comparisons
4.2 Extracting Cluster Properties from DURF
4.3 Cluster Interpretation on Adult Dataset
4.4 Assessing Hyperparameter Sensitivity in Model Performance
4.5 Discussion
5. Conclusion
References
參考文獻 References
Basak, J., & Krishnapuram, R. (2005). Interpretable hierarchical clustering by constructing an unsupervised decision tree. IEEE Transactions on Knowledge and Data Engineering, 17(1), 121–132. https://doi.org/10.1109/TKDE.2005.11
Becker, B., & Kohavi, R. (1996). Adult [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5XW20
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. Taylor & Francis.
Citation—UCI Machine Learning Repository. (n.d.). Retrieved June 19, 2024, from https://archive.ics.uci.edu/citation
Cover, T. M., & Thomas, J. A. (2012). Elements of Information Theory. John Wiley & Sons.
Fisher, R. A. (1936). Iris [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C56C76
Fürnkranz, J., Gamberger, D., & Lavrač, N. (2012). Foundations of Rule Learning. Springer. https://doi.org/10.1007/978-3-540-75197-7
Goodman, B., & Flaxman, S. (2017). European Union regulations on algorithmic decision making and a “right to explanation.” AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857–871. https://doi.org/10.2307/2528823
Huang, P., Huang, Y., Wang, W., & Wang, L. (2014). Deep embedding network for clustering. 2014 22nd International Conference on Pattern Recognition, 1532–1537. https://doi.org/10.1109/ICPR.2014.272
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666. https://doi.org/10.1016/j.patrec.2009.09.011
Jiao, L., Yang, H., Liu, Z., & Pan, Q. (2022). Interpretable fuzzy clustering using unsupervised fuzzy decision trees. Information Sciences, 611, 540–563. https://doi.org/10.1016/j.ins.2022.08.077
Kang, Y., Huang, S.-T., & Wu, P.-H. (2021). Detection of drug–drug and drug–disease interactions inducing acute kidney injury using Deep Rule Forests. SN Computer Science, 2(4), 299. https://doi.org/10.1007/s42979-021-00670-0
Kelly, M., Longjohn, R., & Nottingham, K. (n.d.). The UCI Machine Learning Repository. https://archive.ics.uci.edu
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lemme, A., Reinhart, R. F., & Steil, J. J. (2012). Online learning and generalization of parts-based image representations by non-negative sparse autoencoders. Neural Networks, 33, 194–203. https://doi.org/10.1016/j.neunet.2012.05.003
McInnes, L., Healy, J., & Astels, S. (2017). hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, 2(11), 205. https://doi.org/10.21105/joss.00205
Miller, K., Hettinger, C., Humpherys, J., Jarvis, T., & Kartchner, D. (2017). Forward thinking: Building deep random forests (arXiv:1705.07366). arXiv. https://doi.org/10.48550/arXiv.1705.07366
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
Quinlan, J. R. (2014). C4.5: Programs for Machine Learning. Elsevier.
Shi, T., & Horvath, S. (2006). Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1), 118–138. https://doi.org/10.1198/106186006X94072
Song, C., Liu, F., Huang, Y., Wang, L., & Tan, T. (2013). Auto-encoder based data clustering. In J. Ruiz-Shulcloper & G. Sanniti di Baja (Eds.), Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (pp. 117–124). Springer. https://doi.org/10.1007/978-3-642-41822-8_15
Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845
Zhou, Z.-H., & Feng, J. (2020). Deep forest (arXiv:1702.08835). arXiv. http://arxiv.org/abs/1702.08835
電子全文 Fulltext |
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。 The electronic full text is licensed for personal, non-commercial retrieval, reading, and printing for academic research purposes only. Please observe the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
紙本論文 Printed copies |
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。 Availability information for printed copies is relatively complete from academic year 102 (2013–2014) onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.