Interactive and Interpretable Topic Refinement for Analyzing Online Vaccine-Related Narratives
COVID-19, Anti-vaccination, Narrative analysis, Topic modeling, Data visualization, Interactive system, Human-in-the-loop
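Since the outbreak of COVID-19, governments and researchers worldwide have sought to understand public opinion about the pandemic and the trends in anti-vaccine and pro-vaccine discourse in order to control the pandemic more effectively. However, few visual analytics systems target the pandemic, and most do not let users correct data labels themselves to improve the analysis results. To address this, this study uses a deep learning model and a semi-supervised clustering model to build a topic model that distinguishes anti-vaccine posts from pro-vaccine posts and produces easily interpretable topic analysis results. Combined with interactive charts, the result is an interactive and interpretable system through which users can gain an in-depth understanding of the similar or distinct topics and narratives of anti-vaccine and pro-vaccine posts. The system also incorporates a constrained clustering algorithm that supports a human-in-the-loop process through the system interface: users can explore the relationships among topics via visualizations, verify whether each post's label is correct, correct labels that may be wrong, and then rebuild the topic model and observe the results, repeating this process to refine the topic analysis. This study uses social media posts related to COVID-19 vaccines as a case study to test the system's ability to identify anti-vaccine and pro-vaccine narratives. The experimental results show that the system helps improve the topic model's evaluation metrics, such as entropy and purity, giving users a more precise understanding of the relationship between anti-vaccine and pro-vaccine topics.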
This research aims to develop highly interpretable models that generate easy-to-explain data representations of social media texts, thereby enhancing the interpretability of online measurements extracted from user-generated social media texts. Such a capacity can benefit our research seeking to measure online engagement and its connection to collective decision-making on societal changes. In this research, we develop an interactive and interpretable framework that allows analysts to identify texts with similar or distinct narratives. We use social media text related to the Coronavirus disease 2019 (COVID-19) vaccines as a case study and test the capability of our framework in identifying the Anti-vaccine and Pro-vaccine narratives. Our framework offers two major advantages. First, it leverages semi-supervised topic modeling with a deep learning architecture to identify topics that distinguish between Anti-vaccine and Pro-vaccine posts. Second, it incorporates a constrained hierarchical clustering method that allows human-in-the-loop topic refinement through the system interface, where analysts can explore the relationships among topics via visual representations, verify the labels of post instances, and update labels that are likely to be incorrect or are uncertain. Our evaluation shows that refinement significantly improves the topics' coherence and allows analysts to explore the relationship between Anti-vaccine and Pro-vaccine topics.
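
The abstract names entropy and purity as evaluation metrics but does not state their formulas. As a rough illustration only, the sketch below uses the common clustering definitions: purity is the fraction of posts whose stance label matches the majority label of their topic, and entropy is the size-weighted average of the label entropy within each topic (base-2 logarithm). The function names, the toy topic assignments, and the stance labels are hypothetical, and the definitions used in this work may differ.

```python
from collections import Counter, defaultdict
from math import log2


def label_counts_per_topic(topics, labels):
    """Count how often each stance label occurs inside each topic."""
    counts = defaultdict(Counter)
    for topic, label in zip(topics, labels):
        counts[topic][label] += 1
    return counts


def purity(topics, labels):
    """Fraction of posts that carry the majority label of their topic (higher is better)."""
    counts = label_counts_per_topic(topics, labels)
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


def entropy(topics, labels):
    """Size-weighted average label entropy over topics, in bits (lower is better)."""
    counts = label_counts_per_topic(topics, labels)
    n_posts = len(labels)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        # Shannon entropy of the label distribution inside this topic.
        h = -sum((k / size) * log2(k / size) for k in c.values())
        total += (size / n_posts) * h
    return total


# Hypothetical toy data: six posts assigned to two topics.
topics = [0, 0, 0, 1, 1, 1]
labels_before = ["anti", "anti", "pro", "pro", "pro", "pro"]
# Suppose an analyst corrects the label of the third post (an assumed refinement step).
labels_after = ["anti", "anti", "anti", "pro", "pro", "pro"]

print(purity(topics, labels_before), entropy(topics, labels_before))
print(purity(topics, labels_after), entropy(topics, labels_after))
```

In this toy case the single label correction makes both topics homogeneous, so purity rises and entropy falls. In the described system the topic model would be rebuilt after such corrections, which this sketch does not attempt to reproduce.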
