Thesis/dissertation etd-0804120-103028: details


Name: 許紘齊 (Hung-Chi Hsu); E-mail: not publicly available
Department: Department of Information Management (資訊管理學系研究所)
Degree: Master; Graduation period: academic year 109 (ROC calendar, i.e. 2020–2021), first semester
Title (Chinese): 探索性多模態機器學習模型—以房產鑑價為例
Title (English): Exploration of Multimodal Machine Learning Model - Findings from Real Estate Valuation
  • etd-0804120-103028.pdf
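  • Use of this electronic full text is authorized only for personal, non-commercial searching, reading, and printing for academic research purposes.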



    Language / pages: English / 46
    Statistics: this thesis has been viewed 5,729 times and downloaded 61 times.
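    Abstract (Chinese, in translation): Machine learning and deep learning have been widely applied in many fields in recent years. However, many models sacrifice interpretability in pursuit of predictive performance, which leaves them as hard to understand as a black box. In this thesis, we propose using the concept of a multimodal model so that a model has both predictive accuracy and interpretability. Model interpretability refers to whether we can understand how a model produces its predictions, or which features it bases its predictions on. We verify this idea on real estate valuation with our proposed multimodal model architecture, so that the model achieves good predictive performance while also having a degree of explanatory power. For interpretability, we further produce global explanations from the features the model has learned and local explanations with a local explainer.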
    Abstract (English): Machine learning and deep learning have been widely used in various fields in recent years. However, many models sacrifice interpretability in pursuit of predictive performance, which makes them as difficult to understand as a black box. In this thesis, we propose using multimodal models so that a model has both predictive performance and interpretability. The interpretability of a model refers to whether we can understand how the model produces its predictions. We take the real estate valuation task as an example and apply our proposed method to verify this idea, so that the model achieves good predictive performance while also having a degree of explanatory power. For interpretability, we further use the features learned by the model to produce global explanations and a local explainer to produce local explanations.
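The abstract does not say how the two modalities are combined. One common option, assumed here purely for illustration, is early fusion: concatenate a listing's structured features with an image embedding (for example, one taken from a pretrained CNN) and fit a single prediction head. The sketch below uses a linear head so that each prediction splits into per-modality contributions. The function names (`fuse`, `fit_linear`, `modality_contributions`) and the toy numbers are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence, Tuple


def fuse(tabular: Sequence[float], image_embedding: Sequence[float]) -> List[float]:
    """Early fusion (assumed): concatenate structured features with an image embedding."""
    return list(tabular) + list(image_embedding)


def _solve(M: List[List[float]], v: List[float]) -> List[float]:
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_linear(X: List[List[float]], y: List[float], ridge: float = 1e-9) -> Tuple[List[float], float]:
    """Least-squares linear head y ~ w.x + b, solved via the normal equations."""
    A = [list(row) + [1.0] for row in X]  # extra column of ones models the bias b
    d = len(A[0])
    # (A^T A + ridge * I) beta = A^T y; the tiny ridge term only guards against singularity.
    M = [[sum(a[i] * a[j] for a in A) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    v = [sum(a[i] * t for a, t in zip(A, y)) for i in range(d)]
    beta = _solve(M, v)
    return beta[:-1], beta[-1]


def modality_contributions(w: Sequence[float], b: float, x: Sequence[float],
                           n_tabular: int) -> Dict[str, float]:
    """Split a linear prediction into bias, tabular, and image contributions."""
    tab = sum(wj * xj for wj, xj in zip(w[:n_tabular], x[:n_tabular]))
    img = sum(wj * xj for wj, xj in zip(w[n_tabular:], x[n_tabular:]))
    return {"bias": b, "tabular": tab, "image": img, "prediction": b + tab + img}


# Invented toy data: one structured feature and a 2-dimensional image embedding per listing.
tabular_rows = [[0.0], [1.0], [0.0], [0.0], [1.0]]
image_rows = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prices = [5.0, 8.0, 7.0, 4.0, 9.0]

X = [fuse(t, e) for t, e in zip(tabular_rows, image_rows)]
w, b = fit_linear(X, prices)
print(modality_contributions(w, b, X[4], n_tabular=1))
```

The toy prices were generated as 3·tabular + 2·e0 − 1·e1 + 5, so the fit recovers those weights, and the last listing's prediction of about 9 splits into roughly 3 from the structured feature and roughly 1 from the image embedding. With a nonlinear head such as a neural network this additive split no longer holds directly, which is one reason a post-hoc local explainer is useful.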
  • Keywords
  • Convolutional neural network
  • Real estate value evaluation
  • Multi-modal model
  • Model interpretability
  • Table of contents: Thesis approval form
    Acknowledgements
    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Background & Related Work
    2.1. Explainable AI
    2.2. Housing Price Estimation
    2.3. Representation Learning
    2.4. Multi-modal Model
    2.5. LIME explainer
    3. Methodology
    3.1. Real estate transaction dataset
    3.2. Boosting model’s predictive performance by image features
    3.3. Interpretability of model
    3.3.1. Explaining the models by sets of labels
    3.3.2. Explaining the models by LIME explainer
    4. Experimental Results
    4.1. Experiment environment
    4.2. Data pre-processing
    4.3. The base real estate value evaluation model
    4.4. The effect of images’ embeddings
    4.5. Model interpretability
    4.5.1. Explain the models by images’ labels
    4.5.2. Explain the models by LIME explainer
    5. Conclusion
    6. References
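Sections 2.5 and 3.3.2 refer to the LIME explainer (Ribeiro et al., 2016). Its core idea, applied to a regression model, can be sketched as follows: perturb the instance being explained, query the black-box model on the perturbed points, weight each point by its proximity to the instance, and fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified, dependency-free sketch under stated assumptions (Gaussian perturbations, an exponential proximity kernel, plain weighted least squares); it is not the `lime` package's API, and `lime_style_explain`, its parameters, and `black_box` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def _solve(M: List[List[float]], v: List[float]) -> List[float]:
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lime_style_explain(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    n_samples: int = 500,
    sigma: float = 0.1,
    kernel_width: float = 0.25,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit a proximity-weighted linear surrogate to `predict` around the point x."""
    rng = random.Random(seed)
    d = len(x)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Perturb every feature with Gaussian noise (an assumption; LIME's tabular
        # explainer samples differently, e.g. using training-data statistics).
        offset = [rng.gauss(0.0, sigma) for _ in range(d)]
        z = [xi + oi for xi, oi in zip(x, offset)]
        dist2 = sum(o * o for o in offset)
        rows.append(offset + [1.0])  # last column is the surrogate's intercept
        targets.append(predict(z))  # query the black-box model
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer samples count more
    k = d + 1
    # Weighted normal equations: (R^T W R) beta = R^T W t
    M = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
         for i in range(k)]
    v = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(k)]
    beta = _solve(M, v)
    return beta[:d], beta[d]  # per-feature local effects, surrogate value at x


def black_box(z: Sequence[float]) -> float:
    """Hypothetical valuation model: linear in z[0], quadratic in z[1]."""
    return 2.0 * z[0] + z[1] ** 2


coefs, intercept = lime_style_explain(black_box, [1.0, 3.0])
print(coefs, intercept)
```

For this toy model the coefficients should land near the local gradient (2, 6) at the point (1, 3), and the intercept near the model's value there. That is what "local" means in this setting: the linear surrogate is only trusted in the neighbourhood controlled by `sigma` and `kernel_width`.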
    References
    Baltrušaitis, T., Ahuja, C., & Morency, L.-P. (2019). Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443. https://doi.org/10.1109/TPAMI.2018.2798607
    Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1–127. https://doi.org/10.1561/2200000006
    Bengio, Y., Courville, A., & Vincent, P. (2014). Representation Learning: A Review and New Perspectives. ArXiv:1206.5538 [Cs]. http://arxiv.org/abs/1206.5538
    Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016). Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. ArXiv:1412.7062 [Cs]. http://arxiv.org/abs/1412.7062
    Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. https://doi.org/10.1109/CVPR.2009.5206848
    Detect Labels | Cloud Vision API. (n.d.). Google Cloud. Retrieved August 17, 2020, from https://cloud.google.com/vision/docs/labels?hl=zh-tw
    D’Mello, S. K., & Kory, J. (2015). A Review and Meta-Analysis of Multimodal Affect Detection Systems. ACM Computing Surveys, 47(3), 1–36. https://doi.org/10.1145/2682899
    Dubey, A., Naik, N., Parikh, D., Raskar, R., & Hidalgo, C. A. (2016). Deep Learning the City: Quantifying Urban Perception At A Global Scale. ArXiv:1608.01769 [Cs]. http://arxiv.org/abs/1608.01769
    Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
    Fu, X., Jia, T., Zhang, X., Li, S., & Zhang, Y. (2019). Do street-level scene perceptions affect housing prices in Chinese megacities? An analysis using open access datasets and deep learning. PLOS ONE, 14(5), e0217505. https://doi.org/10.1371/journal.pone.0217505
    Girshick, R. (2015). Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), 1440–1448. https://doi.org/10.1109/ICCV.2015.169
    Goodman, B., & Flaxman, S. (2017). European Union Regulations on Algorithmic Decision-Making and a “Right to Explanation.” AI Magazine, 38(3), 50–57. https://doi.org/10.1609/aimag.v38i3.2741
    He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. ArXiv:1512.03385 [Cs]. http://arxiv.org/abs/1512.03385
    Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29(6), 82–97. https://doi.org/10.1109/MSP.2012.2205597
    Hodosh, M., Young, P., & Hockenmaier, J. (n.d.). Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics (Extended Abstract).
    K-means clustering. (2020). In Wikipedia. https://en.wikipedia.org/w/index.php?title=K-means_clustering&oldid=973148926
    Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25 (pp. 1097–1105). Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
    Law, S., Paige, B., & Russell, C. (2019). Take a Look Around: Using Street View and Satellite Images to Estimate House Prices. ACM Transactions on Intelligent Systems and Technology, 10(5), 1–19. https://doi.org/10.1145/3342240
    Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26 (pp. 3111–3119). Curran Associates, Inc. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
    受限玻尔兹曼机 (Restricted Boltzmann machine). (2019). In Wikipedia, the free encyclopedia (Chinese edition). https://zh.wikipedia.org/w/index.php?title=%E5%8F%97%E9%99%90%E7%8E%BB%E5%B0%94%E5%85%B9%E6%9B%BC%E6%9C%BA&oldid=57289227
    Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative Adversarial Text to Image Synthesis. ArXiv:1605.05396 [Cs]. http://arxiv.org/abs/1605.05396
    Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. ArXiv:1602.04938 [Cs, Stat]. http://arxiv.org/abs/1602.04938
    RStudio | Open source & professional software for data science teams. (n.d.). Retrieved July 15, 2020, from https://rstudio.com/
    Samek, W., Wiegand, T., & Müller, K.-R. (2017). Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. ArXiv:1708.08296 [Cs, Stat]. http://arxiv.org/abs/1708.08296
    Seresinhe, C. I., Preis, T., & Moat, H. S. (2017). Using deep learning to quantify the beauty of outdoor places. Royal Society Open Science, 4(7), 170170. https://doi.org/10.1098/rsos.170170
    Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961
    Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. ArXiv:1409.1556 [Cs]. http://arxiv.org/abs/1409.1556
    Srivastava, N., & Salakhutdinov, R. (n.d.). Multimodal Learning with Deep Boltzmann Machines.
    Therneau, T. M., Atkinson, E. J., & Mayo Foundation. (n.d.). An Introduction to Recursive Partitioning Using the RPART Routines.
    Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1). https://doi.org/10.18637/jss.v077.i01
    Yuhas, B. P., Goldstein, M. H., & Sejnowski, T. J. (1989). Integration of acoustic and visual speech signals using neural networks. IEEE Communications Magazine, 27(11), 65–71. https://doi.org/10.1109/35.41402
  • 林耕霈 - Committee convener
  • 簡士鎰 - Committee member
  • 康藝晃 - Advisor
  • Oral defense date: 2020-08-28; Submission date: 2020-09-04
