Thesis/Dissertation etd-0118121-163425: Detailed Record



Name: Chan-Tung Ku (古展東)    E-mail: not publicly available
Department: Department of Information Management (資訊管理學系研究所)
Degree: Master    Graduation term: academic year 109, semester 1 (fall 2020)
Title (Chinese): 基於在線式深度非負變分自編碼的主題演進探索
Title (English): Topic Diffusion Discovery based on Online Deep Non-negative Variational Autoencoder
Files
  • etd-0118121-163425.pdf
  • This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
    Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
    Thesis access permissions

    Print copy: publicly available immediately

    Electronic copy: fully open access on and off campus

    Language / pages: Chinese / 55
    Statistics: this thesis has been viewed 90 times and downloaded 53 times
    Abstract (Chinese): Information technology has changed how people live. The ubiquity of computers and handheld mobile devices lets us draw vast amounts of information from the network at any time, but this change also means that people face more data every day than they can possibly digest, and they cannot fully understand all of it. Classification and keyword search can filter out the material a user wants; yet as the volume of data keeps growing and its content is updated day after day, clustering and classifying it purely by hand becomes not only harder but infeasible, so machine-learning methods are increasingly used to assist with this work. For text, topic models are a well-known approach: using approximate document distributions or matrix factorization, they convert large collections of documents into topics and have matured into an effective way to organize document content. In reality, however, data and topics appear, change, and disappear as time advances; how to fully explain the process by which topics change is the topic-modeling problem this thesis investigates.
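The abstract's mention of converting large document collections into topics via matrix factorization can be made concrete with the non-negative matrix factorization of Lee and Seung (1999), cited in the references below. The following is a minimal sketch only: the `nmf` helper, the toy document-term matrix, and all parameter choices are ours, not the thesis's.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Factor a non-negative matrix V (docs x terms) into W (docs x topics)
    and H (topics x terms) using Lee & Seung's multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-9  # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy corpus: 4 documents over 6 terms, with two underlying "topics"
# (documents 1-2 use terms 0-2, documents 3-4 use terms 2-5).
V = np.array([[3, 2, 1, 0, 0, 0],
              [2, 3, 1, 0, 0, 0],
              [0, 0, 1, 2, 3, 2],
              [0, 0, 0, 3, 2, 3]], dtype=float)
W, H = nmf(V, k=2)
print(np.round(W @ H, 1))  # approximately reconstructs V
```

Each row of `H` is then read as a topic (a weighting over vocabulary terms), and each row of `W` tells how strongly a document expresses each topic.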
    This thesis proposes the Deep Non-negative Variational Autoencoder (DNVAE) algorithm, combined with an online model, to discover topics that change over time. The corpus consists of papers whose subject matter is machine learning. Experimental results show that the proposed method quickly finds the topics at each time point, and that topic network diagrams, heat maps, and distance calculations further serve the goal of explaining and analyzing topic evolution.
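The topic network diagrams mentioned above connect related topics across adjacent time points. As a hypothetical illustration only (the thesis measures diffusion with the generalized Jensen-Shannon divergence; the cosine-similarity `topic_links` helper and the toy topic-word vectors here are invented for this sketch), consecutive time slices might be linked like this:

```python
import numpy as np

def topic_links(topics_prev, topics_next, threshold=0.3):
    """Link each topic at time t to topics at time t+1 whose topic-word
    vectors are close in cosine similarity; returns (i, j, sim) edges."""
    edges = []
    for i, p in enumerate(topics_prev):
        for j, q in enumerate(topics_next):
            sim = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
            if sim >= threshold:
                edges.append((i, j, round(float(sim), 3)))
    return edges

# Two toy topic-word distributions per time slice over a 4-term vocabulary.
t1 = np.array([[0.7, 0.3, 0.0, 0.0],   # topic about terms 0-1
               [0.0, 0.0, 0.6, 0.4]])  # topic about terms 2-3
t2 = np.array([[0.6, 0.4, 0.0, 0.0],   # persists
               [0.1, 0.1, 0.4, 0.4]])  # drifts slightly
print(topic_links(t1, t2))  # [(0, 0, 0.983), (1, 1, 0.951)]
```

Drawing these edges for every pair of consecutive time slices yields a network in which persisting, splitting, merging, and vanishing topics are visible.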
    Abstract (English): Today, the storage of books, newspapers, and magazines has shifted from tangible paper to digital documents. A large number of documents are stored digitally, and classifying them manually is time-consuming. Consequently, topic modeling techniques are commonly used to deal with this problem. However, topics change over time, so how to properly classify documents as topics diffuse has become an important issue in recent years.
    In this thesis, we propose a topic diffusion discovery approach that can handle the evolution of topics. Because exact inference of the posterior probability is too complicated, for simplicity we use a variational autoencoder variant, called the Deep Non-negative Variational Autoencoder (DNVAE), to build a topic model with shared weights across time points. The multi-layer structure of the proposed model makes the evolution of topics easier to understand. The generalized Jensen-Shannon divergence is used to measure the magnitude of topic diffusion, and we present topic network diagrams to help visualize topic evolution.
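The generalized Jensen-Shannon divergence used to measure topic diffusion is the entropy of a weighted mixture of several distributions minus the weighted average of their individual entropies; it is zero when the distributions coincide and grows as they diverge. A minimal sketch (the function names and toy distributions are ours, not the thesis's code):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def generalized_jsd(dists, weights=None):
    """Generalized Jensen-Shannon divergence:
    H(sum_i w_i p_i) - sum_i w_i H(p_i), with weights summing to 1."""
    dists = [np.asarray(d, dtype=float) for d in dists]
    if weights is None:
        weights = np.full(len(dists), 1.0 / len(dists))
    mixture = sum(w * d for w, d in zip(weights, dists))
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(round(generalized_jsd([p, p]), 6))  # 0.0 (identical distributions)
print(round(generalized_jsd([p, q]), 6))  # 0.346574 (= ln(2)/2)
```

Applied to the word distributions of a topic at two time points, a larger divergence indicates a greater degree of topic diffusion.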
    Keywords (Chinese)
  • 網路分析 (network analysis)
  • 主題演進 (topic evolution)
  • 主題模型 (topic model)
  • 主題擴散 (topic diffusion)
  • 深度學習 (deep learning)
  • 變分自編碼器 (variational autoencoder)
    Keywords (English)
  • Network Analysis
  • Topic Evolution
  • Topic Modeling
  • Topic Diffusion
  • Deep Learning
  • Variational Autoencoder
    Table of Contents
    Thesis Approval Form i
    Acknowledgments ii
    Abstract (Chinese) iii
    Abstract (English) iv
    List of Figures vii
    List of Tables viii
    Chapter 1 Introduction 1
    1.1 Research Background 1
    1.2 Research Motivation 1
    1.3 Research Objectives 2
    Chapter 2 Literature Review 3
    2.1 Topic Models 3
    2.1.1 Time-series Topic Models 3
    2.1.2 Non-negative Matrix Factorization (NMF) 4
    2.1.3 Multi-layer Topic Models 5
    2.2 Deep Learning 5
    2.3 Online Learning 7
    Chapter 3 Research Methods and Procedures 8
    3.1 Research Methods 8
    3.1.1 Topic Model Based on Variational Autoencoders 8
    3.1.2 Online Deep Non-negative Variational Autoencoder (DNVAE) 11
    3.2 Evaluation Criteria 12
    3.2.1 Measuring the Degree of Term Diffusion 12
    3.2.2 Visualizing Topic Relationships 13
    3.3 Research Framework 14
    Chapter 4 Experimental Results and Discussion 17
    4.1 Data Preparation 17
    4.2 Research Workflow 18
    4.3 Research Procedure 18
    4.3.1 Raw Data: Predicting Topics and Terms 19
    4.3.2 Visualization of Topic Relationships and Evolution 21
    4.3.3 Term Evolution with DNVAE 23
    4.4 Analysis 25
    Chapter 5 Conclusions and Suggestions 28
    5.1 Conclusions 28
    Chapter 6 References 29
    References
    Berthelot, D., Raffel, C., Roy, A., & Goodfellow, I. (2018). Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer. ArXiv:1807.07543 [Cs, Stat]. http://arxiv.org/abs/1807.07543
    Blei, D. M. (2011). Introduction to Probabilistic Topic Models.
    Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.
    Blei, D. M., & Lafferty, J. D. (2006). Dynamic topic models. Proceedings of the 23rd International Conference on Machine Learning - ICML ’06, 113–120. https://doi.org/10.1145/1143844.1143859
    Falbel, D., et al. (2019). keras: R Interface to “Keras”.
    Doersch, C. (2016). Tutorial on Variational Autoencoders. ArXiv:1606.05908 [Cs, Stat]. http://arxiv.org/abs/1606.05908
    Dubey, A., Hefny, A., Williamson, S., & Xing, E. P. (2012). A non-parametric mixture model for topic modeling over time. ArXiv:1208.4411 [Stat]. http://arxiv.org/abs/1208.4411
    Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
    Greene, D., O’Callaghan, D., & Cunningham, P. (2014). How Many Topics? Stability Analysis for Topic Models. ArXiv:1404.4606 [Cs]. http://arxiv.org/abs/1404.4606
    Griffiths, T. L., Jordan, M. I., Tenenbaum, J. B., & Blei, D. M. (2004). Hierarchical Topic Models and the Nested Chinese Restaurant Process. Advances in Neural Information Processing Systems, 16.
    Grosse, I., Bernaola-Galván, P., Carpena, P., Román-Roldán, R., Oliver, J., & Stanley, H. E. (2002). Analysis of symbolic sequences using the Jensen-Shannon divergence. Physical Review E, 65(4), 041905.
    Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag.
    Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
    Hoi, S. C. H., Sahoo, D., Lu, J., & Zhao, P. (2018). Online Learning: A Comprehensive Survey. ArXiv:1802.02871 [Cs]. http://arxiv.org/abs/1802.02871
    Hung, S. (2020). Topic Evolution and Diffusion Discovery based on Online Deep Non-negative Autoencoder.
    Ram, K., & Broman, K. (2019). aRxiv: Interface to the arXiv API.
    Kang, Y., Cheng, I.-L., Mao, W., Kuo, B., & Lee, P.-J. (2019). Towards Interpretable Deep Extreme Multi-label Learning. ArXiv:1907.01723 [Cs, Stat]. http://arxiv.org/abs/1907.01723
    Kang, Y., Lin, K.-P., & Cheng, I.-L. (2018). Topic Diffusion Discovery based on Sparseness-constrained Non-negative Matrix Factorization. ArXiv:1807.04386 [Cs, Stat]. http://arxiv.org/abs/1807.04386
    Kang, Y., & Zadorozhny, V. (2016). Process Monitoring Using Maximum Sequence Divergence. Knowledge and Information Systems, 48(1), 81–109. https://doi.org/10.1007/s10115-015-0858-z
    Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. ArXiv:1312.6114 [Cs, Stat]. http://arxiv.org/abs/1312.6114
    Landauer, T. K. (Ed.). (2007). Handbook of latent semantic analysis. Erlbaum.
    LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.
    Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791. https://doi.org/10.1038/44565
    McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24, 109–165.
    Ognyanova, K. (n.d.). Network Visualization with R.
    Oring, A., Yakhini, Z., & Hel-Or, Y. (2020). Autoencoder Image Interpolation by Shaping the Latent Space. ArXiv:2008.01487 [Cs, Stat]. http://arxiv.org/abs/2008.01487
    Paatero, P., & Tapper, U. (1994). Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), 111–126. https://doi.org/10.1002/env.3170050203
    Qin, Z., Yu, F., Liu, C., & Chen, X. (2018). How convolutional neural network see the world—A survey of convolutional neural network visualization methods. ArXiv:1804.11191 [Cs]. http://arxiv.org/abs/1804.11191
    Roger, V., Farinas, J., & Pinquier, J. (2020). Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data. ArXiv:2003.04241 [Cs, Eess, Stat]. http://arxiv.org/abs/2003.04241
    Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach (First edition). O’Reilly.
    Song, H. A., & Lee, S.-Y. (2013). Hierarchical Representation Using NMF. In M. Lee, A. Hirose, Z.-G. Hou, & R. M. Kil (Eds.), Neural Information Processing (Vol. 8226, pp. 466–473). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-42054-2_58
    Srivastava, A., & Sutton, C. (2017). Autoencoding Variational Inference For Topic Models. ArXiv:1703.01488 [Stat]. http://arxiv.org/abs/1703.01488
    Stevens, K., Kegelmeyer, P., Andrzejewski, D., & Buttler, D. (2012). Exploring topic coherence over many models and many topics. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 952–961.
    R Core Team. (2019). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
    Theis, L., Oord, A. van den, & Bethge, M. (2016). A note on the evaluation of generative models. ArXiv:1511.01844 [Cs, Stat]. http://arxiv.org/abs/1511.01844
    Torfi, A., Shirvani, R. A., Keneshloo, Y., Tavaf, N., & Fox, E. A. (2020). Natural Language Processing Advancements By Deep Learning: A Survey. ArXiv:2003.01200 [Cs]. http://arxiv.org/abs/2003.01200
    Tu, D., Chen, L., Lv, M., Shi, H., & Chen, G. (2018). Hierarchical online NMF for detecting and tracking topic hierarchies in a text stream. Pattern Recognition, 76, 203–214. https://doi.org/10.1016/j.patcog.2017.11.002
    Wang, C., Blei, D., & Heckerman, D. (2015). Continuous Time Dynamic Topic Models. ArXiv:1206.3298 [Cs, Stat]. http://arxiv.org/abs/1206.3298
    Wang, W., Gan, Z., Xu, H., Zhang, R., Wang, G., Shen, D., Chen, C., & Carin, L. (2019). Topic-Guided Variational Autoencoders for Text Generation. ArXiv:1903.07137 [Cs]. http://arxiv.org/abs/1903.07137
    Wang, X., & McCallum, A. (2006). Topics over time: A non-Markov continuous-time model of topical trends. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’06, 424. https://doi.org/10.1145/1150402.1150450
    Oral Defense Committee
  • 楊惠芳 - Committee Chair
  • 李珮如 - Committee Member
  • 康藝晃 - Advisor
    Defense date: 2021-01-28    Submission date: 2021-02-18


