National Sun Yat-sen University, thesis/dissertation: Attack Trend Analysis Based on Incident Logs (基於事件紀錄分析攻擊趨勢)

Title	Attack Trend Analysis Based on Incident Logs (基於事件紀錄分析攻擊趨勢)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 111
Language	Chinese
Degree	Master
Number of pages	74
Author	簡哲銘 (Chien, Che-Ming)
Advisor	陳嘉玫 (Chen, Chia-Mei)
Convenor	鄭炳強 (Jeng, Bing-Chiang)
Advisory Committee	林耕霈 (Lin, Keng-Pei); 韓毅 (Han, Yi); 吳東興 (Wu, Dong-Shing)
Date of Exam	2023-06-29
Date of Submission	2023-07-16
Keywords	Attack Graph, NLP, CTI, Incident Log, Clustering
Statistics	This thesis has been browsed 304 times and downloaded 0 times.

Abstract
As network technology has developed, enterprises have deployed large numbers of network devices to reduce communication and management costs. This also exposes many network assets and increases the organization's exposure to cyber attacks. The defensive systems that enterprises deploy, such as intrusion detection systems (IDS) and security operations centers (SOC), generate massive volumes of incident logs in inconsistent formats, which makes forensic analysis difficult for security personnel. Using incident logs effectively to summarize attack trends and to surface the intrusion indicators they contain has therefore become an important research topic. This study proposes LoFA (Log Forensics Analysis), an incident log forensics system. LoFA takes real incident logs from large-scale network environments (ISP, SOC), applies natural language processing (NLP) to cluster the intrusion indicators in the logs, and produces the associations among the resulting clusters. It also provides attack graphs of the intrusion indicators found in the logs, together with threat intelligence on those indicators. The results show that, among the word-embedding and clustering combinations evaluated, Word2Vec with Hierarchical Clustering is best suited to clustering incident logs. LoFA can also be applied to real security cases, where its attack graphs and cyber threat intelligence system improve the efficiency of forensic work.
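The embed-then-cluster idea in the abstract can be illustrated with a small standard-library sketch. This is not the LoFA implementation: to stay self-contained it swaps Word2Vec for plain token-count vectors, and it uses average-linkage agglomerative (hierarchical) clustering over cosine distance. The sample log lines, the function names, and the choice of two clusters are all hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(log_line, vocab):
    """Token-count vector for one log line (stand-in for a Word2Vec embedding)."""
    counts = Counter(log_line.lower().split())
    return [counts[token] for token in vocab]


def cosine_distance(u, v):
    """1 - cosine similarity; treats an all-zero vector as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative(vectors, n_clusters):
    """Average-linkage hierarchical clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(
                    cosine_distance(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


# Hypothetical incident-log lines: source IP followed by an attack description.
logs = [
    "10.0.0.5 ssh brute force",
    "10.0.0.6 ssh brute force",
    "192.168.1.9 sql injection web",
    "192.168.1.7 sql injection web",
]
vocab = sorted({token for line in logs for token in line.lower().split()})
vectors = [embed(line, vocab) for line in logs]
clusters = agglomerative(vectors, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

The two SSH brute-force lines and the two SQL-injection lines end up in separate clusters because lines within each pair share three of four tokens, while lines across pairs share none.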

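Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation
Chapter 2 Literature Review
  2.1 Cyber Threat Intelligence
  2.2 Attack Graph Research
  2.3 Natural Language Processing
    2.3.1 Word2Vec
    2.3.2 IP2vec
    2.3.3 Bag of Words
    2.3.4 AutoEncoder
  2.4 Clustering Algorithms
    2.4.1 HDBSCAN
    2.4.2 Affinity Propagation
    2.4.3 Hierarchical Clustering
Chapter 3 Methodology
  3.1 Data Preprocessing Module
  3.2 Attack Graph Generation Module
  3.3 Feature Selection Module
    3.3.1 IP Selection Module
    3.3.2 Attack Technique Selection Module
  3.4 Feature Embedding Module
  3.5 Attack Clustering Module
  3.6 CTI Info Module
Chapter 4 System Evaluation
  Experimental Dataset
  Experimental Environment
  Evaluation Metrics
  4.1 Experiment 1: Feature Adjustment with Different Word Embedding and Clustering Algorithms
    Experiment 1.1 IP (Full / Level C / Level B) + BoW + HC
    Experiment 1.2 IP (Full / Level C / Level B) + W2V + HDBSCAN
    Experiment 1.3 IP (Full / Level C / Level B) + W2V + HC
    Experiment 1.4 IP (Full / Level C / Level B) + W2V + AP
    Experiment 1.5 IP (Full / Level C / Level B) + (W2V + AE) + HDBSCAN
    Experiment 1.6 IP (Full / Level C / Level B) + (W2V + AE) + HC
    Experiment 1.7 IP (Full / Level C / Level B) + (W2V + AE) + AP
    Experiment 1.8 IP (Full / Level C / Level B) + (Bag of Words + AE) + HDBSCAN
    Experiment 1.9 IP (Full / Level C / Level B) + (Bag of Words + AE) + HC
    Experiment 1.10 IP (Full / Level C / Level B) + (Bag of Words + AE) + AP
    Summary of Experiment 1: Comparison of Evaluation Metrics across Word Embedding and Clustering Algorithms
  4.2 Experiment 2: Long-Term Log Analysis
    Experiment 2.1 Clustering of 2022 Attack Incident Logs
    Experiment 2.2 Clustering of Attack Incident Logs, October-December 2022 (Q4)
  4.3 Experiment 3: Generating Attack Correlation Graphs
    Experiment 3.1 Query by Time Interval
    Experiment 3.2 Query by IP
    Experiment 3.3 Query by Incident ID
  4.4 Experiment 4: Threat Intelligence System (CTI Info) and Real Security Cases
    Case 1: Attack by the APT Group MuddyWater
    Case 2: Attack on a Campus Surveillance Camera Monitoring Server
Chapter 5 Contributions and Future Work
References
Appendix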
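The experiment titles above vary the IP feature across "Full", "Level C", and "Level B". This record does not define those levels; the sketch below assumes they mean the full address, the /24 network (classful "C"), and the /16 network (classful "B"). The function name `ip_feature` and the sample address are hypothetical.

```python
import ipaddress

# Assumed mapping from feature level to prefix length (not stated in the record).
PREFIX_BY_LEVEL = {"full": 32, "c": 24, "b": 16}


def ip_feature(ip, level="full"):
    """Return the IP feature string at the requested granularity."""
    if level == "full":
        # Keep the full address; parsing validates and normalizes it.
        return str(ipaddress.ip_address(ip))
    prefix = PREFIX_BY_LEVEL[level]
    # strict=False lets a host address be reduced to its containing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network)


for level in ("full", "c", "b"):
    print(level, ip_feature("203.0.113.77", level))
```

Coarser levels make addresses from the same network collapse to one token, which is one plausible reason to compare the three settings before embedding and clustering.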


Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined availability date
Campus: available for download from 2028-07-16
Off-campus: available for download from 2028-07-16
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. Available: 2028-07-16

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2453, Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS