Title page for etd-0217122-131544
Title: Ensemble Classification Using Multi-split Deep Rule Forests
Department:
Year, semester:
Language:
Degree:
Number of pages: 47
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2021-09-10
Date of Submission: 2022-03-17
Keywords: Deep Architecture, Interpretability, Random Forest, Representation Learning, Rule Learning, Deep Rule Forest
Statistics: This thesis/dissertation has been viewed 569 times and downloaded 104 times.
Abstract
In recent years, neural networks have achieved outstanding performance and drawn wide attention, yet their lack of interpretability keeps practitioners from applying them in high-stakes domains such as medicine and criminal justice. Decision tree algorithms offer transparent reasoning through rules, but their model capacity is too low to handle complex problems. Extending the idea behind the tree-based Deep Rule Forest (DRF) algorithm, we propose the multi-split DRF, which raises model capacity by stacking multiple layers of rule forests to generate more complex rules. Experimental results show that the multi-split DRF learns better data representations with rules and uses those representations to classify data more accurately. Owing to its multi-split rule forests, the proposed model also performs competitively with fewer trees than DRF. More importantly, the generated rules present the prediction-making process in a human-comprehensible way, making the proposed approach an interpretable representation learning algorithm.
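The record itself contains no code, but the layered rule-forest pipeline the abstract describes can be sketched concretely. Below is a minimal, illustrative R sketch (R is suggested by the thesis's references to ranger and RStudio), not the author's implementation: each layer fits a shallow forest, every sample is re-encoded by the terminal node (i.e., the rule) it reaches in each tree, and the one-hot rule encoding becomes the input of the next layer. The names deep_rule_forest and rule_encode and all hyper-parameter values are hypothetical; only ranger's "terminalNodes" prediction type is an actual package API.

    # A minimal, illustrative sketch (not the thesis implementation) of the
    # layered rule-forest idea: each layer fits a shallow forest, each sample
    # is re-encoded by the terminal node ("rule") it reaches in every tree,
    # and that one-hot rule encoding becomes the input of the next layer.
    library(ranger)

    # Encode samples as one-hot indicators of the leaf (rule) they fall into.
    rule_encode <- function(forest, data) {
      leaves <- predict(forest, data, type = "terminalNodes")$predictions
      onehot <- lapply(seq_len(ncol(leaves)), function(t)
        model.matrix(~ leaf - 1, data.frame(leaf = factor(leaves[, t]))))
      m <- do.call(cbind, onehot)
      colnames(m) <- paste0("rule_", seq_len(ncol(m)))  # unique feature names
      as.data.frame(m)
    }

    # Hypothetical hyper-parameters: two layers of five depth-3 trees each.
    deep_rule_forest <- function(x, y, n_layers = 2, num_trees = 5, depth = 3) {
      layers <- vector("list", n_layers)
      feats <- x
      for (i in seq_len(n_layers)) {
        d <- cbind(feats, .y = y)
        layers[[i]] <- ranger(.y ~ ., data = d,
                              num.trees = num_trees, max.depth = depth)
        feats <- rule_encode(layers[[i]], feats)  # representation for next layer
      }
      # Top-level classifier trained on the final rule representation.
      top <- ranger(.y ~ ., data = cbind(feats, .y = y), num.trees = 50)
      list(layers = layers, top = top)
    }

    # Example: learn a rule-based representation of iris and classify it.
    fit <- deep_rule_forest(iris[, 1:4], iris$Species)

In this sketch the leaf indicators play the role of the rule-encoding described in Section 3.3; backtracking a prediction (Section 4.3) then amounts to reading off the root-to-leaf conditions behind each active indicator.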
Table of Contents
Thesis Certification ..... i
Chinese Abstract ..... ii
Abstract ..... iii
List of Figures ..... v
List of Tables ..... vi
1 Introduction ..... 1
2 Background and Related Work ..... 2
2.1 Tree-based Model ..... 2
2.2 Two-level Rule in Tree ..... 6
2.3 Representation Learning ..... 9
2.4 Deep Architecture ..... 11
2.5 Explainable AI ..... 13
3 Building Multi-split Deep Rule Forests ..... 15
3.1 Building the Multi-split DRF ..... 15
3.2 Interpretability of Multi-split DRF ..... 18
3.3 Rule-encoding for Data ..... 20
3.4 Hyper-parameter Tuning in Multi-split DRF ..... 22
4 Experiment and Discussion ..... 24
4.1 Experiment Setup ..... 24
4.2 Prediction Accuracy ..... 25
4.3 Backtracking Rules ..... 29
4.4 Influence of Hyper-parameters ..... 31
5 Conclusion ..... 35
6 References ..... 36
References
Jain, A. K., & Mao, J. (1996). Artificial neural networks: A tutorial. IEEE Computer, 29(3), 31–44.
Arnould, L., Boyer, C., & Scornet, E. (2021). Analyzing the tree-layer structure of Deep Forests. ArXiv:2010.15690 [Cs, Math, Stat]. http://arxiv.org/abs/2010.15690
Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1–127.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
Bengio, Y., & Delalleau, O. (2011). On the Expressive Power of Deep Architectures. In J. Kivinen, C. Szepesvári, E. Ukkonen, & T. Zeugmann (Eds.), Algorithmic Learning Theory (Vol. 6925, pp. 18–36). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-24412-4_3
Bengio, Y., Delalleau, O., & Simard, C. (2010). Decision trees do not generalize to new variations. Computational Intelligence, 26(4), 449–467. https://doi.org/10.1111/j.1467-8640.2010.00366.x
Bergmeir, C., & Benítez, J. M. (2012). Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS. Journal of Statistical Software, 46(7). https://doi.org/10.18637/jss.v046i07
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 144–152. https://doi.org/10.1145/130385.130401
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655
Breiman, L. (2017). Classification and Regression Trees. https://doi.org/10.1201/9781315139470
Bunn, A., & Korpela, M. (n.d.). An introduction to dplR.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
Comon, P. (1994). Independent component analysis, A new concept? Signal Processing, 36(3), 287–314. https://doi.org/10.1016/0165-1684(94)90029-9
Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley-Interscience.
DARPA. (2016). Defense Advanced Research Projects Agency. Broad Agency Announcement,Explainable Artificial Intelligence(XAI). DARPA-BAA-16-53. https://www.darpa.mil/attachments/DARPA-BAA-16-53.pdf
Dietterich, T. G. (2002). Ensemble learning. The Handbook of Brain Theory and Neural Networks, 2, 110–125.
Dua, D., & Graff, C. (2019). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Erhan, D., Courville, A., & Bengio, Y. (2010). Understanding Representations Learned in Deep Architectures. Technical report, Université de Montréal.
Freund, Y., & Schapire, R. E. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780.
Fürnkranz, J., Gamberger, D., & Lavrač, N. (2012). Foundations of Rule Learning. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-75197-7
Geirhos, R., Jacobsen, J.-H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020). Shortcut Learning in Deep Neural Networks. Nature Machine Intelligence, 2(11), 665–673. https://doi.org/10.1038/s42256-020-00257-z
Hinton, G. E. (2006). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507. https://doi.org/10.1126/science.1127647
Kang, Y., Huang, S.-T., & Wu, P.-H. (2021). Detection of Drug–Drug and Drug–Disease Interactions Inducing Acute Kidney Injury Using Deep Rule Forests. SN Computer Science, 2(4), 299. https://doi.org/10.1007/s42979-021-00670-0
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.
Kohavi, R., John, G., Long, R., Manley, D., & Pfleger, K. (1994). MLC++: A machine learning library in C++. In Proceedings of the Sixth International Conference on Tools with Artificial Intelligence.
Koitka, S., & Friedrich, C. M. (2016). nmfgpu4R: GPU-Accelerated Computation of the Non-Negative Matrix Factorization (NMF) Using CUDA Capable Hardware. The R Journal, 8(2), 382. https://doi.org/10.32614/RJ-2016-053
Kuhn, M., Weston, S., & Culp, M. (2014). C50: C5.0 decision trees and rule-based models. R package version 0.1.5. https://cran.r-project.org/web/packages/C50/C50.pdf
Kukreja, S. L., Löfberg, J., & Brenner, M. J. (2006). A least absolute shrinkage and selection operator (LASSO) for nonlinear system identification. IFAC Proceedings Volumes, 39(1), 814–819. https://doi.org/10.3182/20060329-3-AU-2901.00128
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research [Best of the Web]. IEEE Signal Processing Magazine, 29(6), 141–142. https://doi.org/10.1109/MSP.2012.2211477
Lipton, Z. C. (2018). The mythos of model interpretability. Queue, 16(3), 31–57.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
Miller, K., Hettinger, C., Humpherys, J., Jarvis, T., & Kartchner, D. (2017). Forward Thinking: Building Deep Random Forests. ArXiv:1705.07366 [Cs, Stat]. http://arxiv.org/abs/1705.07366
Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. https://doi.org/10.1016/j.artint.2018.07.007
Molnar, C. (2019). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Radley-Gardner, O., Beale, H., & Zimmermann, R. (Eds.). (2016). Fundamental Texts On European Private Law. Hart Publishing. https://doi.org/10.5040/9781782258674
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. ArXiv:1602.04938 [Cs, Stat]. http://arxiv.org/abs/1602.04938
RStudio Team. (2015). RStudio: Integrated Development Environment for R. http://www.rstudio.com/
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x
Samek, W. (2020). Learning with explainable trees. Nature Machine Intelligence, 2(1), 16–17. https://doi.org/10.1038/s42256-019-0142-0
Shlens, J. (2014). A Tutorial on Principal Component Analysis. ArXiv:1404.1100 [Cs, Stat]. http://arxiv.org/abs/1404.1100
Su, G., Wei, D., Varshney, K. R., & Malioutov, D. M. (2016). Interpretable Two-level Boolean Rule Learning for Classification. ArXiv:1606.05798 [Cs, Stat]. http://arxiv.org/abs/1606.05798
Therneau, T. M., & Atkinson, E. J. (n.d.). An Introduction to Recursive Partitioning Using the RPART Routines. Mayo Foundation.
Trigeorgis, G., Bousmalis, K., Zafeiriou, S., & Schuller, B. W. (2015). A deep matrix factorization method for learning attribute representations. ArXiv:1509.03248 [Cs, Stat]. http://arxiv.org/abs/1509.03248
Wright, M. N., & Ziegler, A. (2015). ranger: A fast implementation of random forests for high dimensional data in C++ and R.
Zhong, G., Wang, L.-N., Ling, X., & Dong, J. (2016). An overview on data representation learning: From traditional feature learning to recent deep learning. The Journal of Finance and Data Science, 2(4), 265–278. https://doi.org/10.1016/j.jfds.2017.05.001
Zhou, Z.-H., & Feng, J. (2020). Deep Forest. ArXiv:1702.08835 [Cs, Stat]. http://arxiv.org/abs/1702.08835
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
Fulltext
The electronic fulltext is licensed to users solely for personal, non-profit retrieval, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: unrestricted (open both on and off campus)
Available:
Campus: available
Off-campus: available


Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (2013–2014) onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
