Electronic Theses and Dissertations: detailed record for etd-0717121-235800



Name: 蘇亭方 (Ting-Fang SU)  E-mail: not publicly available
Department: Department of Information Management
Degree: Master  Graduation: academic year 109 (2020–2021), 2nd semester
Title (Chinese): 基於嵌入向量規則森林的可解釋的極度多標籤學習
Title (English): Embedding-based Rule Forests for Interpretable Extreme Multi-label Learning
Files
  • etd-0717121-235800.pdf
  • This electronic full text is licensed to users only for individual, non-profit retrieval, reading, and printing for the purpose of academic research.
    Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
    Access permissions

    Print copy: public after 2 years (released 2023-08-17)

    Electronic copy: user-defined permissions: public on campus after 2 years, off campus after 2 years

    Language / Pages: English / 32
    Statistics: this thesis has been viewed 67 times and downloaded 0 times
    Abstract (Chinese): Extreme multi-label learning is an extension of multi-label learning: a classifier is trained on data carrying multiple labels and used to predict the most relevant subset of labels for an input instance. Because of the huge number of labels, extreme multi-label data cannot be processed by ordinary multi-label algorithms, and many algorithms capable of handling such data have therefore been developed. However, although these methods achieve excellent predictive accuracy, they cannot explain their predictions; models that do not expose their decision process are called "black-box models". Black-box models give rise to many problems, for example that a model may discriminate or that users may not trust its predictions, and interpretable models have consequently attracted attention. Although researchers have developed many explainers and interpretable algorithms, no existing algorithm can handle extreme multi-label data while providing native explanations for its results. We therefore combine the autoencoder, an embedding method commonly used in extreme multi-label learning, with decision-tree methods capable of rule learning, and propose an algorithm that simultaneously learns data representations and extracts interpretable rules: the interpretable extreme multi-label forest.
    Abstract (English): Extreme multi-label learning is an extension of multi-label learning in which a classifier is trained on instances carrying multiple labels from the same domain and used to predict the most relevant subset of labels for new instances. Because of the enormous number of labels, extreme multi-label data cannot be handled by general multi-label learning algorithms, and many algorithms designed specifically for extreme multi-label learning have been developed. However, although these methods achieve high performance, they are "black-box" models that cannot explain their predictions without further interpretation. Black-box models raise several problems, such as implicit model discrimination and user-trust issues, and have therefore drawn public attention to explainable models. Although researchers have devoted considerable effort to developing explainers and interpretable algorithms, there is still no method that provides inherent explanations for extreme multi-label learning. Consequently, we combine an embedding-based method, the autoencoder, with a tree-based method that learns rules from data, and propose an algorithm that can both learn representations and generate interpretable rules: the interpretable extreme multi-label forest.
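    The pipeline described in the abstract (embed the label space, fit a rule-learning tree ensemble from features to embeddings, then decode) can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the thesis implementation: the linear TruncatedSVD stands in for the autoencoder, scikit-learn's RandomForestRegressor with exported if/then rules stands in for the rule forest, and all data, names, and parameters are assumptions.

    ```python
    # Hypothetical sketch: (1) compress the high-dimensional label matrix into a
    # low-dimensional embedding, (2) fit an interpretable tree ensemble from raw
    # features to the embeddings, (3) decode predicted embeddings back to label
    # scores and read off the learned rules.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD      # stand-in for the autoencoder
    from sklearn.ensemble import RandomForestRegressor  # stand-in for the rule forest
    from sklearn.tree import export_text

    rng = np.random.default_rng(0)
    n, d, L, k = 200, 5, 40, 4                 # instances, features, labels, embedding dim

    X = rng.normal(size=(n, d))
    Y = (rng.random((n, L)) < 0.05).astype(float)  # sparse multi-label matrix
    Y[X[:, 0] > 0, 0] = 1.0                        # inject a learnable rule: x0 > 0 -> label 0

    svd = TruncatedSVD(n_components=k, random_state=0)
    Z = svd.fit_transform(Y)                   # "encoder": labels -> k-dim embedding

    forest = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0)
    forest.fit(X, Z)                           # rule forest: features -> embedding

    scores = svd.inverse_transform(forest.predict(X))  # "decoder": embedding -> label scores
    pred = (scores > 0.5).astype(float)

    # Each tree is a set of human-readable if/then rules over the raw features,
    # which is where the interpretability of the forest comes from.
    rules = export_text(forest.estimators_[0], feature_names=[f"x{i}" for i in range(d)])
    print(rules.splitlines()[0])
    ```

    A nonlinear autoencoder would replace the SVD encode/decode pair, but the overall shape of the method (embed, fit trees, decode, extract rules) stays the same.
    
    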
    Keywords (Chinese)
  • 規則學習
  • 隨機森林
  • 模型可解釋性
  • 多標籤分類
  • 極度多標籤分類

    Keywords (English)
  • Rule Learning
  • Random Forest
  • Model Interpretability
  • Multi-label Classification
  • Extreme Multi-label Classification
    Table of Contents
    Thesis approval form i
    Abstract (Chinese) ii
    Abstract (English) iii
    List of Figures v
    List of Tables vi
    1. Introduction 1
    2. Background and Related Work 2
    2.1 Multi-label Learning 2
    2.2 Extreme Multi-label Learning 4
    2.3 Explainable AI 5
    2.4 Rule Learning 6
    3. Methodology 7
    3.1 Model Structure of Interpretable Extreme Multi-label Forest 7
    3.2 Interpretability of Interpretable Extreme Multi-label Forest 9
    4. Experiment and Discussion 13
    4.1 Experiment Setup 13
    4.2 Performance Evaluation 13
    4.3 Prediction Interpretation 17
    4.4 Discussion 20
    5. Conclusion 21
    6. References 22
    References
    Bengio, Y., & Delalleau, O. (2011). On the expressive power of deep architectures. International Conference on Algorithmic Learning Theory, 18–36.
    Bengio, Y., Delalleau, O., & Simard, C. (2010). Decision trees do not generalize to new variations. Computational Intelligence, 26(4), 449–467.
    Bhatia, K., Dahiya, K., Jain, H., Kar, P., Mittal, A., Prabhu, Y., & Varma, M. (2016). The extreme classification repository: Multi-label datasets and code. http://manikvarma.org/downloads/XC/XMLRepository.html
    Bhatia, K., Jain, H., Kar, P., Varma, M., & Jain, P. (2015). Sparse Local Embeddings for Extreme Multi-label Classification. Advances in Neural Information Processing Systems, 28, 730–738.
    Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
    Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.
    Carter, C., Renuart, E., Saunders, M., & Wu, C. C. (2006). The credit card market and regulation: In need of repair. HeinOnline, 10, 23–56.
    Deng, J., Berg, A. C., Li, K., & Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In K. Daniilidis, P. Maragos, & N. Paragios (Eds.), European conference on computer vision (pp. 71–84). Springer.
    Denton, E., Weston, J., Paluri, M., Bourdev, L., & Fergus, R. (2015). User conditional hashtag prediction for images. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1731–1740.
    Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. ArXiv Preprint ArXiv:1702.08608.
    Falkowski, B. J. (1999). A note on the polynomial form of Boolean functions and related topics. IEEE Transactions on Computers, 48(8), 860–864.
    Freitas, A. A. (2014). Comprehensible Classification Models – a position paper. ACM SIGKDD Explorations Newsletter, 15(1), 1–10.
    Fürnkranz, J., Gamberger, D., & Lavrač, N. (2012). Foundations of Rule Learning. Springer Berlin Heidelberg. http://link.springer.com/10.1007/978-3-540-75197-7
    Geirhos, R., Jacobsen, J.-H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665–673.
    Goodman, B., & Flaxman, S. (2017). European Union regulations on algorithmic decision-making and a “right to explanation.” AI Magazine, 38(3), 50–57.
    Jain, H., Prabhu, Y., & Varma, M. (2016). Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 935–944.
    Khandagale, S., Xiao, H., & Babbar, R. (2020). Bonsai: Diverse and shallow trees for extreme multi-label classification. Machine Learning, 109(11), 2099–2119.
    Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
    Liu, H., Xu, X., & Li, J. J. (2020). A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models. Statistica Sinica.
    Liu, J., Chang, W.-C., Wu, Y., & Yang, Y. (2017). Deep learning for extreme multi-label text classification. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 115–124.
    Lowry, S., & Macpherson, G. (1988). A blot on the profession. British Medical Journal, 296(6623), 657–658.
    Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
    Mittal, A., Dahiya, K., Agrawal, S., Saini, D., Agarwal, S., Kar, P., & Varma, M. (2021). DECAF: Deep Extreme Classification with Label Features. Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 49–57.
    Papenmeier, A., Englebienne, G., & Seifert, C. (2019). How model accuracy and explanation fidelity influence user trust. ArXiv Preprint ArXiv:1907.12652.
    Prabhu, Y., & Varma, M. (2014). Fastxml: A fast, accurate and stable tree-classifier for extreme multi-label learning. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 263–272.
    Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should i trust you?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144.
    Saini, D., Jain, A. K., Dave, K., Jiao, J., Singh, A., Zhang, R., & Varma, M. (2021). GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification.
    Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
    Tagami, Y. (2017). AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 455–464.
    The Yelp dataset challenge—Multilabel Classification of Yelp reviews into relevant categories. (2013). https://www.ics.uci.edu/~vpsaini/
    Weston, J., Makadia, A., & Yee, H. (2013). Label partitioning for sublinear ranking. International Conference on Machine Learning, 181–189.
    Wu, Q., Tan, M., Song, H., Chen, J., & Ng, M. K. (2016). ML-FOREST: A multi-label tree ensemble method for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 28(10), 2665–2680.
    You, R., Zhang, Z., Wang, Z., Dai, S., Mamitsuka, H., & Zhu, S. (2018). AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification.
    Zhang, M.-L., Li, Y.-K., Liu, X.-Y., & Geng, X. (2018). Binary relevance for multi-label learning: An overview. Frontiers of Computer Science, 12(2), 191–202.
    Oral defense committee
  • 林耕霈 - Convener
  • 李珮如 - Member
  • 康藝晃 - Advisor

    Defense date: 2021-07-02  Submission date: 2021-08-17
