Thesis/Dissertation etd-0719120-224032: Detailed Information



Name 洪紹銘 (Shao-Min Hung) E-mail Not disclosed
Department Department of Information Management (資訊管理學系研究所)
Degree Master Graduation term Academic year 108, 2nd semester
Thesis title (Chinese) 基於在線式深度非負自編碼的主題演進及分散度探索
Thesis title (English) Topic Evolution and Diffusion Discovery based on Online Deep Non-negative Autoencoder
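Files
  • etd-0719120-224032.pdf
  • This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
    Thesis access permissions

    Print copy: available immediately

    Electronic copy: fully open on and off campus

    Language / pages English / 45
    Statistics This thesis has been viewed 5668 times and downloaded 24 times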
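    Abstract (Chinese) As data becomes ever easier to store and retrieve, we can conveniently read all kinds of content online. With this much information, reading and understanding everything is impractical, so we usually rely on categories or keyword search to find what we want. Because of this need for quick lookup, most websites provide keyword search and detailed categories. As data grows, however, continuing to categorize content by hand becomes increasingly difficult, and using machine learning to cluster and classify content will be the trend. For text data, the best-known technique is the topic model, which converts large volumes of documents into topics either by approximating the distribution of the documents or by matrix factorization. Although mature topic models help us classify documents and produce topics, real-world topics appear and disappear over time. How to give a sound account of topics as they change is the topic-modeling problem this thesis investigates.
    This thesis proposes a novel topic-modeling technique, called the Deep Non-negative Autoencoder, and combines it with an online model to explore how topics change over time. The text corpus consists of machine learning papers. The experimental results show that the proposed method can quickly find the topics at each point in time. We also propose network diagrams, heat maps, and distance calculations as ways to explain and examine topic evolution.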
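    The idea of a non-negative autoencoder trained online can be illustrated with a minimal sketch. This is not the thesis's DNAE: it is a one-layer, tied-weight simplification, and the function names, the squared-error loss, the projected-gradient update, and the random count matrices are all assumptions made for illustration. Each time slice is fitted starting from the previous slice's weights, which is one simple way to carry topics forward in time.

```python
import numpy as np

def reconstruction_loss(X, W):
    """Squared error of the tied-weight reconstruction X ~ (X W) W^T."""
    R = X @ W @ W.T - X
    return 0.5 * float(np.sum(R * R))

def fit_nonnegative_autoencoder(X, n_topics, W_init=None, n_steps=200, seed=0):
    """Fit a term-by-topic matrix W >= 0 by projected gradient descent.

    X is a non-negative document-by-term matrix. Because X and W are both
    non-negative, the code X W is non-negative without an activation function.
    """
    rng = np.random.default_rng(seed)
    if W_init is None:
        W = rng.uniform(0.0, 0.1, size=(X.shape[1], n_topics))
    else:
        W = W_init.copy()  # warm start from an earlier time slice
    losses = [reconstruction_loss(X, W)]
    step = 1e-2
    for _ in range(n_steps):
        R = X @ W @ W.T - X
        # Gradient of the loss with respect to the shared (tied) weights.
        grad = X.T @ R @ W + R.T @ X @ W
        accepted = False
        while step > 1e-12:
            # Clip at zero to keep the weights non-negative.
            W_new = np.maximum(W - step * grad, 0.0)
            loss_new = reconstruction_loss(X, W_new)
            if loss_new <= losses[-1]:
                W = W_new
                losses.append(loss_new)
                step *= 2.0
                accepted = True
                break
            step *= 0.5  # backtrack until the loss does not increase
        if not accepted:
            break
    return W, losses

# Hypothetical data: two time slices of 30 documents over 12 terms.
rng = np.random.default_rng(42)
X_t1 = rng.poisson(1.0, size=(30, 12)).astype(float)
X_t2 = rng.poisson(1.0, size=(30, 12)).astype(float)

W_t1, losses_t1 = fit_nonnegative_autoencoder(X_t1, n_topics=3)
# Online step: the second slice starts from the first slice's topics.
W_t2, losses_t2 = fit_nonnegative_autoencoder(X_t2, n_topics=3, W_init=W_t1)
```

    In this sketch each column of W can be read as a topic's term weights, and comparing the columns of W_t1 and W_t2 gives a rough view of how a topic shifts between slices.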
    Abstract (English) Books, newspapers, and magazines have moved from paper to digital documents, and a large number of documents are now stored on the Internet. It is therefore infeasible to review all of this material to find what we need; instead, we rely on keywords or well-defined topics. Unfortunately, topics change over time in the real world, so classifying these documents correctly has become an increasingly important problem. Our approach aims to improve topic models that take time into account. Because inference of the posterior probability is complicated, we instead use an autoencoder variant, called the Deep Non-negative Autoencoder (DNAE), to build a topic model whose weights are shared across time periods. The model has a multi-layer structure, and the evolution of topics in each layer is also a focus of this thesis. In addition, we use the generalized Jensen-Shannon divergence to measure topic diffusion and network diagrams to observe the evolution of topics.
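    For readers unfamiliar with the measure, the generalized Jensen-Shannon divergence of distributions P_1..P_n with weights w_1..w_n is H(sum_i w_i P_i) - sum_i w_i H(P_i), where H is Shannon entropy. The sketch below computes it in plain Python. How the thesis weights the distributions and which distributions it compares are not stated in this record, so the uniform default weights and the function names here are assumptions.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)

def generalized_jsd(distributions, weights=None):
    """Generalized Jensen-Shannon divergence of several discrete distributions.

    distributions: list of equal-length probability vectors (each sums to 1).
    weights: mixture weights summing to 1; uniform if omitted (an assumption).
    """
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n
    size = len(distributions[0])
    # Weighted mixture of all distributions.
    mixture = [
        sum(w * d[i] for w, d in zip(weights, distributions))
        for i in range(size)
    ]
    # Entropy of the mixture minus the weighted mean of the entropies.
    return entropy(mixture) - sum(
        w * entropy(d) for w, d in zip(weights, distributions)
    )
```

    With uniform weights the value is 0 when all distributions are identical and reaches log2(n) bits when the n distributions have disjoint support, so a larger value indicates that the compared topic distributions are more spread apart.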
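    Keywords (Chinese)
  • 網路分析 (network analysis)
  • 主題擴散 (topic diffusion)
  • 主題演進 (topic evolution)
  • 主題模型 (topic modeling)
  • 自編碼器 (autoencoder)
  • 深度學習 (deep learning)
    Keywords (English)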
  • Network Analysis
  • Autoencoder
  • Deep learning
  • Topic Diffusion
  • Topic Modeling
  • Topic Evolution
    Table of contents Thesis approval form i
    Abstract (Chinese) ii
    ABSTRACT iii
    1. Introduction 1
    2. Background and related work 2
    2.1 Topic model 3
    2.2 Time series topic model 4
    2.3 Multi-layer topic model 6
    2.4 Deep Learning 7
    2.5 Online Learning 8
    3. Methodology 9
    3.1 Topic model based on Autoencoder 11
    3.2 Online Deep Non-negative Autoencoder 13
    3.3 Evaluation of topic diffusion 15
    3.4 Visualization of topic evolution 16
    3.5 Topic Evolution and Diffusion Discovery based on online DNAE 18
    4. Experiment 19
    4.1 Online topic model with DNAE 21
    4.2 Topic evolution and diffusion with DNAE 22
    4.3 Term evolution with DNAE 24
    5. Discussion 27
    6. Conclusion 29
    7. Reference 30
    Appendix A 35
    Appendix B 37
    Oral defense committee
  • 黃三益 - Committee chair
  • 李珮如 - Committee member
  • 康藝晃 - Advisor
    Oral defense date 2020-07-30 Submission date 2020-08-19
