博碩士論文 etd-0601121-143125 詳細資訊
Title page for etd-0601121-143125
論文名稱
Title
食品闢謠查核輔助系統
Detection Support of Food Rumor Veracity
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
59
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2021-06-21
繳交日期
Date of Submission
2021-07-01
關鍵字
Keywords
食品謠言、分類、分群、K-medoid、PAM、詞嵌入、同義詞
food rumor, classification, cluster, K-medoid, PAM, word embedding, synonym
統計
Statistics
本論文已被瀏覽 409 次,被下載 0
The thesis/dissertation has been browsed 409 times, has been downloaded 0 times.
中文摘要
假新聞、謠言一直是全球最重大的問題之一,在台灣,社群媒體也深受假新聞其害,自從 2014 年台灣食安風暴後,台灣人對於食品健康與安全的重視度也隨之上升。食品假新聞與謠言的數量也隨著人們對於食品安全的恐懼感上升,這些食品謠言不但會影響大眾對於飲食的觀念,嚴重的情況下甚至會導致聽信偏方的患者延誤就醫,造成不可挽回的傷害。然而,假新聞危害如此嚴重,日常中所使用的 LINE 社群軟體卻還是有各種食品假新聞在傳播; 遺憾的是,在台灣食品藥物管制署 (FDA) 的闢謠資訊更新的速度還遠遠不及假新聞增長的速度。
為了解決此問題,本文提出一個系統輔助架構,利用分類、分群、詞嵌入等機器學習演算法,讓組織端可以藉由使用者查詢系統的資訊,不但能增進澄清謠言的速度,還能淘汰非謠言的查詢,降低組織端人力成本及增加闢謠效率。對於使用者端,倚靠相似度查詢以及適當的前處理,可以解決同義字食品被查詢的問題,此外,使用 K-medoid 分群演算法,降低每次查詢的複雜度,提升使用者查詢的速度。


Abstract
Fake news and rumors have long been among the world's most serious problems, and social media in Taiwan has suffered from them as well. Since Taiwan's food safety crisis in 2014, Taiwanese people have attached greater importance to food health and safety. The volume of fake food news and food rumors has risen along with public fear about food safety. These rumors not only distort the public's perception of diet; in severe cases, patients who trust folk remedies delay seeking medical treatment and suffer irreparable harm. Although fake news is this harmful, fake food news still circulates on LINE, the messaging app people use every day. Regrettably, Taiwan's Food and Drug Administration (FDA) updates its rumor-clarification information far more slowly than fake news grows.
To address this problem, this thesis proposes a support-system architecture that uses machine learning techniques such as classification, clustering, and word embedding. On the organization side, information from users' queries to the system can be used both to clarify rumors faster and to filter out non-rumor queries, which reduces labor costs and makes rumor debunking more efficient. On the user side, similarity-based querying with appropriate preprocessing handles queries that refer to the same food by synonymous names. In addition, the K-medoid clustering algorithm reduces the complexity of each query and speeds up user queries.
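As a rough illustration of how clustering can reduce per-query work, the sketch below groups food names by their embedding vectors and then searches only the cluster whose medoid is closest to the query. It is a minimal sketch under stated assumptions, not the thesis's implementation: the 2-D vectors in `FOOD_VECTORS` are hand-made stand-ins for real word embeddings, all function names are hypothetical, and the medoid update is a simple alternating scheme rather than the full PAM swap procedure named in the keywords.

```python
import math

# Hypothetical 2-D "embeddings"; a real system would use learned word vectors.
FOOD_VECTORS = {
    "tomato": (0.9, 0.1),
    "cherry tomato": (0.85, 0.2),
    "eggplant": (0.8, 0.3),
    "milk": (0.1, 0.9),
    "fresh milk": (0.15, 0.95),
    "yogurt": (0.3, 0.8),
}

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def assign(vectors, medoids):
    """Map each medoid name to the names of the items closest to it."""
    clusters = {m: [] for m in medoids}
    for name, vec in vectors.items():
        nearest = min(medoids, key=lambda m: cosine_distance(vec, vectors[m]))
        clusters[nearest].append(name)
    return clusters

def k_medoids(vectors, initial_medoids, max_iter=20):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(vectors, medoids)
        # New medoid: the member with the smallest total distance to its cluster.
        updated = [
            min(members, key=lambda c: sum(
                cosine_distance(vectors[c], vectors[o]) for o in members))
            for members in clusters.values() if members
        ]
        if updated == medoids:
            break
        medoids = updated
    return assign(vectors, medoids)

def lookup(query_vec, vectors, clusters):
    """Route the query to its nearest medoid, then search only that cluster."""
    medoid = min(clusters, key=lambda m: cosine_distance(query_vec, vectors[m]))
    return min(clusters[medoid],
               key=lambda n: cosine_distance(query_vec, vectors[n]))

clusters = k_medoids(FOOD_VECTORS, ["tomato", "milk"])
best_match = lookup((0.88, 0.12), FOOD_VECTORS, clusters)
```

With k clusters over n items, a lookup compares the query against k medoids plus the members of one cluster instead of all n items, which is the kind of saving the abstract attributes to K-medoid clustering.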
目次 Table of Contents
論文審定書 (Thesis Approval Form)
Acknowledgements
摘要 (Chinese Abstract)
Abstract
Chapter 1 Introduction
1.1 Overview
1.2 Objective Of The Research
1.3 Organization Of The Research
Chapter 2 Literature Review
2.1 Fake news detection
2.2 Rumor detection
2.3 Food fake news
2.4 Text-based Vector Space Model
2.5 Classifier
2.5.1 Randomforest
2.5.2 SVM
2.5.3 XGBoosting
2.6 Partitional clustering
2.6.1 K-means
2.6.2 K-medoids
Chapter 3 Proposed Approach
3.1 Relevant word embeddings from Wiki
3.1.1 Corpus Sources
3.1.2 Word Embedding
3.2 Classification-based query matching
3.2.1 Cluster Analysis on Food
3.2.2 Query Classification
3.3 Matching results and post-processing
3.3.1 Query Detection Support
Chapter 4 Applications
4.1 Vector Space Model
4.1.1 Corpus
4.1.2 Word2Vec Representations
4.2 Food Clusters and Query Classification
4.2.1 Food Clusters
4.2.2 Query Classifiers
4.3 New Query Detection
4.3.1 Query Detection Results
Chapter 5 Validation
5.1 Validation Process
5.2 Validation Results
Chapter 6 Conclusions
6.1 Concluding Remarks
6.2 Managerial Implications
6.3 Research Limitations and Future Works
References
參考文獻 References
[1] M. G. Vestergaard and L. M. Nielsen, “The Danish veterinary and food administration’s fight against fake nutrition news on digital media,” Tidsskrift for Medier, Erkendelse Og Formidling, vol. 7, no. 2, pp. 21–21, 2019.
[2] N. Grinberg, K. Joseph, L. Friedland, B. Swire-Thompson, and D. Lazer, “Fake news on Twitter during the 2016 US presidential election,” Science, vol. 363, no. 6425, pp. 374–378, 2019.
[3] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, “Fake news detection on social media: A data mining perspective,” ACM SIGKDD explorations newsletter, vol. 19, no. 1, pp. 22–36, 2017.
[4] C. Zhang, A. Gupta, C. Kauten, A. V. Deokar, and X. Qin, “Detecting fake news for reducing misinformation risks using analytics approaches,” European Journal of Operational Research, vol. 279, no. 3, pp. 1036–1052, 2019.
[5] H. Ahmed, I. Traore, and S. Saad, “Detection of online fake news using n-gram analysis and machine learning techniques,” in International conference on intelligent, secure, and dependable systems in distributed and cloud environments. Springer, 2017, pp. 127–138.
[6] K. Demestichas, K. Remoundou, and E. Adamopoulou, “Food for thought: Fighting fake news and online disinformation,” IT Professional, vol. 22, no. 2, pp. 28–34, 2020.
[7] J. Sampson, F. Morstatter, L. Wu, and H. Liu, “Leveraging the implicit structure within social media for emergent rumor detection,” in Proceedings of the 25th ACM international on conference on information and knowledge management, 2016, pp. 2377–2382.
[8] J. Ma, W. Gao, and K.-F. Wong, “Rumor detection on Twitter with tree-structured recursive neural networks,” Association for Computational Linguistics, 2018.
[9] A. Habib, S. Akbar, M. Z. Asghar, A. M. Khattak, R. Ali, and U. Batool, “Rumor detection in business reviews using supervised machine learning,” in 2018 5th International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC). IEEE, 2018, pp. 233–237.
[10] S. B. Rowe and N. Alexander, “On post-truth, fake news, and trust,” Nutrition Today, vol. 52, no. 4, pp. 179–182, 2017.
[11] S. Abnar, R. Ahmed, M. Mijnheer, and W. Zuidema, “Experiential, distributional and dependency-based word embeddings have complementary roles in decoding brain activity,” arXiv preprint arXiv:1711.09285, 2017.
[12] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[13] A. Bakarov, “A survey of word embeddings evaluation methods,” arXiv preprint arXiv:1801.09536, 2018.
[14] T. K. Ho, “Random decision forests,” in Proceedings of 3rd international conference on document analysis and recognition, vol. 1. IEEE, 1995, pp. 278–282.
[15] P. Vora, M. Khara, and K. Kelkar, “Classification of tweets based on emotions using word embedding and random forest classifiers,” International Journal of Computer Applications, vol. 178, no. 3, pp. 1–7, 2017.
[16] Z.-Q. Wang, X. Sun, D.-X. Zhang, and X. Li, “An optimal SVM-based text classification algorithm,” in 2006 International Conference on Machine Learning and Cybernetics. IEEE, 2006, pp. 1378–1381.
[17] J. Chen, D. Liang, Z. Zhu, X. Zhou, Z. Ye, and X. Mo, “Social media popularity prediction based on visual-textual features with XGBoost,” in Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 2692–2696.
[18] E. Sherkat, J. Velcin, and E. E. Milios, “Fast and simple deterministic seeding of k-means for text document clustering,” in International conference of the cross-language evaluation forum for European languages. Springer, 2018, pp. 76–88.
[19] L. Jing, M. K. Ng, J. Xu, and J. Z. Huang, “Subspace clustering of text documents with feature weighting k-means algorithm,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 2005, pp. 802–812.
[20] A. Rangrej, S. Kulkarni, and A. V. Tendulkar, “Comparative study of clustering techniques for short text documents,” in Proceedings of the 20th international conference companion on World wide web, 2011, pp. 111–112.
[21] A. Onan, “A k-medoids based clustering scheme with an application to document clustering,” in 2017 international conference on computer science and engineering (UBMK). IEEE, 2017, pp. 354–359.
[22] F. Liu and L. Xiong, “Survey on text clustering algorithm,” in 2011 IEEE 2nd International Conference on Software Engineering and Service Science. IEEE, 2011, pp. 901–904.
[23] N. K. Kaur, U. Kaur, and D. D. Singh, “K-medoid clustering algorithm: a review,” International Journal of Computer Application and Technology (IJCAT), vol. 1, no. 1, pp. 2349–1841, 2014.
電子全文 Fulltext
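This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.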
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus:開放下載的時間 available 2026-07-01
校外 Off-campus:開放下載的時間 available 2026-07-01

紙本論文 Printed copies
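Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.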
開放時間 available 2026-07-01
