Title page for etd-0724121-095301
Title
跨語言文字分析之研究
A Study on Text Analysis Across Languages
Department
Year, semester
Language
Degree
Number of pages
130
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2021-07-26
Date of Submission
2021-08-24
Keywords
multilingual textual data, cross-lingual text analysis, cross-lingual sentiment lexicon induction, cross-lingual topic model, neural network
Statistics
This thesis/dissertation has been viewed 828 times and downloaded 8 times.
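Chinese Abstract
With the rapid development of the Internet, information now spreads across national borders, and the same events, products, and brands are increasingly discussed online by users in different countries. Cross-lingual text analysis is the core technology for analyzing such multilingual textual data. However, previous studies mostly rely on costly cross-lingual resources, such as parallel corpora, machine translation, and cross-lingual knowledge bases, which are difficult to obtain and ill-suited to domain-specific corpora. This dissertation therefore focuses on two techniques, cross-lingual sentiment analysis and cross-lingual topic modeling, and proposes methods that achieve high accuracy while using low-cost cross-lingual resources. To this end, we introduce the cross-lingual word space, which we regard as a low-cost cross-lingual resource because it requires only a small amount of cross-lingual supervision, and use it as the input reference for two concrete methods: multistep bilingual sentiment lexicon induction (MS-BSLI) and the center-based cross-lingual topic model (Cb-CLTM). MS-BSLI propagates the semantic resources of a dominant language (e.g., English) to a low-resource language through the cross-lingual word space, and then adjusts the space to produce a higher-quality bilingual sentiment lexicon. Cb-CLTM uses the cross-lingual word space to extend Latent Dirichlet Allocation so that it can identify latent cross-lingual topics in multilingual corpora. Finally, given the rapid progress of neural network techniques, we further investigate cross-lingual topic models built with neural networks, in two directions: (1) proposing two neural-network-based cross-lingual topic models, xETM and cProdLDA, and (2) comparing them with the existing neural cross-lingual topic model ZeroShotTM to assess how well xETM and cProdLDA extract cross-lingual topics.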
Abstract
The rapid development of the Internet facilitates the dissemination of information worldwide. People in different countries express opinions on the same entities, events, and products, which creates demand for analyzing texts across languages. In previous studies, analyzing such multilingual textual data required expensive interlingual resources, such as parallel corpora, machine translators, and knowledge bases, to link the extracted information across languages. In this dissertation, we address sentiment analysis and topic modeling in a cross-lingual context, aiming to achieve high accuracy while requiring fewer resources. Specifically, the dissertation proposes two resource-light methods: multistep bilingual sentiment lexicon induction (MS-BSLI) and the center-based cross-lingual topic model (Cb-CLTM). Both methods rely on cross-lingual word embeddings to bridge the languages and minimize the need for interlingual resources. MS-BSLI propagates the lexical resources of a dominant language (e.g., English) to a low-resource language to generate a better bilingual sentiment lexicon. Cb-CLTM extends the generative process of Latent Dirichlet Allocation (LDA) with cross-lingual word embeddings to identify the common hidden topics in a multilingual corpus. Given the growing adoption of neural networks (NNs), we further investigate cross-lingual topic models implemented with NNs. The investigation includes (1) proposing two NN-based topic models, xETM and cProdLDA, and (2) comparing the performance of three NN-based cross-lingual topic models: xETM, ZeroShotTM, and cProdLDA.
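As a rough illustration of why a cross-lingual word embedding counts as a low-cost resource, the sketch below aligns two monolingual vector spaces using only a small seed dictionary. It uses the standard orthogonal Procrustes solution, which is an assumption chosen for illustration and not necessarily the alignment method used in the dissertation. The toy data, the function names `learn_orthogonal_map` and `nearest_target`, and the seed size of 20 pairs are all hypothetical.

```python
import numpy as np

def learn_orthogonal_map(src_vecs, tgt_vecs):
    """Return the orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F."""
    # Closed-form Procrustes solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

def nearest_target(query_vec, tgt_matrix):
    """Return the row index of tgt_matrix most cosine-similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    t = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    return int(np.argmax(t @ q))

# Toy data (hypothetical): the "target language" space is a rotated copy
# of the "source language" space, so a perfect orthogonal map exists.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 8))                        # 50 words, 8 dimensions
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
tgt = src @ true_rotation

# Only the first 20 word pairs act as the seed dictionary (the supervision).
seed = slice(0, 20)
W = learn_orthogonal_map(src[seed], tgt[seed])

# Words outside the seed dictionary are mapped into the shared space too.
mapped = src @ W
```

In this idealized setting the map learned from 20 pairs also places the remaining 30 words next to their counterparts. Real embedding spaces are only approximately related by a rotation, so retrieval quality would be lower in practice.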
Table of Contents
Thesis Approval Form
Acknowledgments
Chinese Abstract
English Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
DECLARATION
Chapter 2 The Development of Cross-lingual Word Embedding
2.1 CONSTRUCTION OF MONOLINGUAL WORD EMBEDDING
2.2 METHODS FOR CROSS-LINGUAL WORD EMBEDDING ALIGNMENT
2.3 SUMMARY
Chapter 3 A Multistep Approach for Cross-lingual Sentiment Lexicon Construction
3.1 RELATED WORK
3.1.1 Sentiment Analysis for Online Reviews
3.1.2 Bilingual Sentiment Lexicon Induction
3.2 THE MULTISTEP APPROACH
3.2.1 Step 1: Generate a Monolingual Word Vector Space
3.2.2 Step 2: Determine the Language Transformation
3.2.3 Step 3: Produce a Specialized Word Vector Space Using Lexical Resources
3.2.4 Step 4: Postmap the Word Vector Space for Unseen Words
3.2.5 Margin-Based Similarity Search Method
3.3 EVALUATION
3.3.1 Experimental Setups
3.3.2 Experiment 1: Comparison Between Existing Methods and Lexicons
3.3.3 Experiment 2: Comparison Between Variants of MS-BSLI
3.3.4 Experiment 3: Sensitivity Analysis of the MSS
3.4 SUMMARY
Chapter 4 A Word Embedding-based Approach to Cross-lingual Topic Model
4.1 RELATED WORKS
4.1.1 Cross-lingual LDA
4.1.2 Continuous LDA
4.2 OUR APPROACH
4.2.1 Background
4.2.2 Preparing the Cross-lingual Word Embedding
4.2.3 Center-Based Cross-lingual Topic Model
4.3 EXPERIMENTAL RESULTS
4.3.1 Description of Datasets
4.3.2 Performance Metrics
4.3.3 Parameter Settings
4.3.4 Coherence Performance
4.3.5 Diversity Performance
4.3.6 Performance in Cross-lingual Document Representation
4.3.7 Qualitative Analysis
4.4 SUMMARY
Chapter 5 Neural Network Based Cross-lingual Topic Models
5.1 BACKGROUND
5.1.1 Variational Auto-Encoder
5.1.2 Auto-Encoding Variational Bayes
5.1.3 Extension to Topic Model: ProdLDA
5.2 CROSS-LINGUAL NEURAL TOPIC MODELS
5.2.1 Extended Embedded Topic Model
5.2.2 ZeroShot Topic Model
5.2.3 Contextualized ProdLDA
5.3 EXPERIMENTS
5.3.1 Experiment Settings
5.3.2 Experimental Results
5.4 SUMMARY
Chapter 6 Conclusion
6.1 FUTURE WORKS
References
Appendix A: Table of Notations

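Fulltext
The electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined availability date
Available:
On campus: available
Off campus: available

Printed copies
Public information on printed copies is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available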
