論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title: 基於機率論之可解釋性特徵擷取人工智慧 Explainable Feature Extraction Artificial Intelligence Based on Probability Theory
系所名稱 Department:
畢業學年期 Year, semester:
語文別 Language:
學位類別 Degree:
頁數 Number of pages: 147
研究生 Author:
指導教授 Advisor:
召集委員 Convenor:
口試委員 Advisory Committee:
口試日期 Date of Exam: 2020-07-30
繳交日期 Date of Submission: 2020-08-18
關鍵字 Keywords: 可解釋性人工智慧、醫療影像、卷積神經網絡、高斯混成模型、前饋神經網絡、淺層機器學習、算法透明性 eXplainable Artificial Intelligence, Medical Imaging, Convolutional Neural Network, Gaussian Mixture Model, Feedforward Neural Network, Shallow Machine Learning, Algorithm Transparency
統計 Statistics
本論文已被瀏覽 5758 次,被下載 42 次 This thesis has been browsed 5758 times and downloaded 42 times.
中文摘要 Chinese Abstract
The boom of deep learning has energized academia and carried AI into a new era. Deep learning offers architectural flexibility and superior performance, yet it struggles to explain how an AI learns and what it has learned; against this backdrop, eXplainable Artificial Intelligence (XAI) has emerged. This thesis is devoted to developing XAI and to unpacking the popular convolutional neural network, whose greatest strength lies in feature extraction. Using probability theory as its foundation, the thesis builds an alternative framework for extracting features; unlike the Saab transform, which is grounded in linear algebra, the proposed method aligns more closely with human intuition. The results are established through theoretical proofs and implementation, and are applied in the medical domain.

The thesis proposes the eXplainable Gaussian Kernel (XGK), which uses a Gaussian mixture model to represent regional features and whose convergence conditions are proved via probability theory. An algorithm is designed to distill from the XGK an explainable feature-extraction kernel, the eXplainable Gaussian Convolution Filter (XGCF), which maps images into a deterministic feature space to solve classification problems. Experiments comparing the feature-extraction ability of the XGCF against CNNs and the Saab transform confirm that the XGCF holds an advantage on small datasets. Moreover, because the XGCF is algorithmically transparent, its hyperparameters can be bounded via the Chebyshev inequality rather than searched for blindly. Beyond the deterministic feature space, the XGK can also map images into stochastic feature spaces to extract additional information; results from different probability measures are integrated, the gaps between theory and implementation are analyzed and corrected, and the two mapping schemes are finally combined.

With the XGK theory in place, it is confidently applied to identifying cell mitosis. By integrating XGK with SHAP, domain experts can both participate in machine learning and receive texture feedback from the AI, and the model surpasses many large architectures, such as RAM, VGG19, and LeNet-5, in F1-score.
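The abstract notes that, thanks to the XGCF's algorithmic transparency, its hyperparameters can be bounded by the Chebyshev inequality rather than searched for blindly. The sketch below is an illustrative reconstruction of that idea only, not the thesis's actual algorithm; the function name `chebyshev_range` and its parameters are assumptions made for this example.

```python
import math

# Chebyshev inequality: for ANY distribution with mean mu and standard
# deviation sigma,  P(|X - mu| >= k * sigma) <= 1 / k**2.
# Hence a hyperparameter (e.g. a threshold on a feature statistic) kept
# within mu +/- k*sigma covers at least 1 - 1/k**2 of the probability
# mass, distribution-free.

def chebyshev_range(mu, sigma, coverage):
    """Return (lo, hi) guaranteed to contain at least `coverage` of the
    mass of any distribution with this mean and standard deviation."""
    k = math.sqrt(1.0 / (1.0 - coverage))
    return mu - k * sigma, mu + k * sigma

# Example: bound a search range that covers at least 95% of the mass.
lo, hi = chebyshev_range(mu=0.0, sigma=1.0, coverage=0.95)
print(lo, hi)  # +/- sqrt(20) ~ 4.472, since k = sqrt(1/0.05)
```

Such a bound turns hyperparameter selection into a search over a provably sufficient interval instead of an unbounded one.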
Abstract
The bloom of deep learning has driven AI into a new era and set off an upsurge in academia. Although deep learning offers a flexible architecture and superior performance, it is difficult to explain how such an AI learns and what it has learned. Thus, eXplainable Artificial Intelligence (XAI) has come into being. This thesis focuses on developing XAI to replace popular convolutional neural networks, whose most powerful function is their ability to extract features. It introduces an interpretable feature-extraction kernel that is based on probability theory and operated by a feedforward neural network.

The thesis proposes the eXplainable Gaussian Kernel (XGK), which uses a Gaussian mixture model to express the characteristics of image regions, and proves the conditions for its convergence through probability theory. An algorithm is then developed to obtain the eXplainable Gaussian Convolution Filter (XGCF) from the XGK; the XGCF replaces the convolution kernel in a CNN. Experiments comparing the feature-extraction capabilities of the XGCF, CNNs, and the Saab transform show that the XGCF has an advantage when data are scarce. The XGCF is also superior to CNNs and the Saab transform in algorithmic transparency: the values of its hyperparameters can be restricted to a range by the Chebyshev inequality. The XGK not only maps an image to deterministic feature maps through the XGCF, but also to stochastic feature maps through the original probability distribution, extracting additional information from the image. Because probability theory offers many measures, the thesis also examines their pros and cons experimentally and integrates all feature maps into an ensemble model.

The XGK has been applied to the classification of cell mitosis. Through the integration of XGK and SHAP, domain experts can not only participate in machine learning but also obtain the textures learned by the AI. More surprisingly, the proposed shallow machine-learning model beats many neural networks with large, complicated architectures, such as RAM, VGG19, and LeNet-5, in F1-score.
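The abstract's core idea, expressing regional features with a Gaussian mixture model and deriving deterministic convolution filters from it, can be sketched as follows. This is a minimal illustration under assumed details (patch size, component count, the use of scikit-learn's `GaussianMixture`, and treating component means as filters); it is not the thesis's actual XGK/XGCF construction.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # stand-in for a grayscale image

# Collect all overlapping 5x5 patches, flattened to 25-dim vectors,
# so local regions become samples of a patch distribution.
k = 5
patches = np.array([
    image[i:i + k, j:j + k].ravel()
    for i in range(image.shape[0] - k + 1)
    for j in range(image.shape[1] - k + 1)
])

# Fit a Gaussian mixture over the patch distribution; each component
# summarizes one mode of local appearance.
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(patches)

# Treat each component mean as one 5x5 filter; correlating the image
# with it yields one deterministic feature map, in the spirit of the
# XGCF replacing a learned CNN convolution kernel.
filters = gmm.means_.reshape(-1, k, k)
feature_maps = np.stack([
    correlate2d(image, f, mode="valid") for f in filters
])
print(feature_maps.shape)  # (4, 24, 24)
```

Because every filter is an explicit mixture-component mean, the mapping from image to feature map stays inspectable, which is the transparency property the abstract emphasizes.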
目次 Table of Contents
Thesis Verification Letter i
Chinese Abstract iii
Abstract iv
Table of Contents vi
List of Figures xi
List of Tables xiv
List of Symbols xvi
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Literature Review 2
1.3 Thesis Organization 3
Chapter 2 Background 5
2.1 eXplainable Artificial Intelligence 5
2.1.1 Transparent Models 6
2.1.2 Post-hoc Explainability 7
2.2 Probability Theory 8
2.2.1 Gaussian Mixture Model 9
2.2.2 Expectation-Maximization Algorithm 10
2.2.3 Chebyshev Inequality 12
2.3 SHapley Additive exPlanations 12
2.3.1 Shapley Values in Cooperative Games 13
2.3.2 Additive Explainability 14
2.4 Convolutional Neural Network 16
2.4.1 Convolutional Layer 17
2.4.2 Pooling Layer 20
2.4.3 Fully Connected Layer 21
2.5 Interpreting Convolutional Neural Networks via Feedforward Networks 24
2.5.1 Principal Component Analysis 25
2.5.2 Subspace Approximation with Adjusted Bias (Saab) 27
2.5.3 Bias Selection 28
Chapter 3 eXplainable Gaussian Kernel 30
3.1 eXplainable Gaussian Kernel 30
3.1.1 Single-Channel eXplainable Gaussian Kernel 31
3.1.2 Comparison of Two Multi-Channel eXplainable Gaussian Kernels 35
3.1.3 XGK Generation Algorithm 40
3.2 Probability Distribution Error Experiments 41
3.2.1 Probability Error Distribution Test Datasets 41
3.2.2 Experimental Design and Parameters 42
Chapter 4 Feature Space Mapping 47
4.1 Deterministic Feature Representation 47
4.1.1 eXplainable Gaussian Convolution Filter 47
4.1.2 Normalization 51
4.1.3 Deterministic Feature Representation Algorithm 53
4.2 Stochastic Feature Representation 53
4.2.1 Varieties of Stochastic Feature Representation 54
4.2.2 Stochastic Feature Representation Algorithm 57
4.3 Experimental Environment 58
4.3.1 Test Datasets 58
4.3.2 Environment and Module Versions 61
4.3.3 Module Parameters 61
4.4 Deterministic Feature Space Experiments 63
4.4.1 Summary of Feature Space Mappings 63
4.4.2 Deterministic Feature Space Normalization Tests 65
4.4.3 All-Channel vs. Per-Channel Performance Comparison 70
4.4.4 Summary of Deterministic Feature Space Experiments 73
4.5 Stochastic Feature Space Experiments 73
4.5.1 Comparison of Probability Mappings 74
4.5.2 Normalized Stochastic Feature Space Tests 76
4.5.3 Summary of Stochastic Feature Space Experiments 80
Chapter 5 Feature Space Ensembles 81
5.1 Relations among Feature Representations 81
5.2 Feature Space Ensemble Experiments 83
5.2.1 Stochastic vs. Deterministic Feature Spaces 83
5.2.2 Ensemble Methods 84
5.2.3 Ensemble Method Tests 84
Chapter 6 Extended Discussion 87
6.1 Kernel Performance Comparison 87
6.1.1 XGCF vs. CNN 87
6.1.2 XGCF vs. Saab Transform 92
6.1.3 Overall Comparison of XGCF, Saab Transform, and CNN 95
6.2 Medical Imaging Application and Comparison 96
6.2.1 Shapley-Value Feature Feedback Model 97
6.2.2 Experimental Design and Parameters 98
6.2.3 Experimental Results and Performance Measures 99
6.2.4 Comparison of XGCF, RAM, VGG-19, and LeNet-5 103
Chapter 7 Conclusion 106
7.1 Contributions 106
7.2 Future Work 107
References 108
Appendix 112
電子全文 Fulltext
This electronic full text is authorized only for personal, non-profit retrieval, reading, and printing for academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
紙本論文 Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up access information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available