Thesis access permission: user-defined release date
Available:
Campus: available for download from 2028-07-16
Off-campus: available for download from 2028-07-16
Title |
Attack Trend Analysis Based on Incident Logs |
||
Department |
|||
Year, semester |
Language |
||
Degree |
Number of pages |
74 |
|
Author |
|||
Advisor |
|||
Convenor |
|||
Advisory Committee |
|||
Date of Exam |
2023-06-29 |
Date of Submission |
2023-07-16 |
Keywords |
Attack Graph, NLP, CTI, Incident Log, Clustering |
||
Abstract |
As Internet technology has advanced, enterprises have deployed large numbers of network devices to reduce communication and management costs. This also exposes many network assets to risk and makes enterprises more vulnerable to cyber attacks. To defend against these threats, enterprises deploy security defenses such as Intrusion Detection Systems (IDS) and Security Operations Centers (SOC). These defenses generate huge volumes of incident logs in inconsistent formats, which makes forensic analysis difficult for security personnel. Effectively using incident logs to summarize attack trends and to surface the intrusion indicators they contain has therefore become an important research topic.
This study proposes LoFA (Log Forensics Analysis), an incident log forensics system. Using real incident logs from large-scale network environments (ISPs and SOCs), LoFA applies Natural Language Processing (NLP) to cluster the intrusion indicators in the logs and to reveal the relationships among the resulting clusters. It also provides attack graphs and threat intelligence for the intrusion indicators found in the logs. The experimental results show that, among the word embedding and clustering combinations evaluated, Word2Vec with Hierarchical Clustering is the best suited to incident log clustering. LoFA can also be applied to real-world security cases, where its attack graphs and cyber threat intelligence system improve the efficiency of security personnel performing forensic tasks. |
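The clustering step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the LoFA implementation: to stay within the Python standard library it swaps Word2Vec for plain bag-of-words counts (one of the embeddings the thesis compares against) and implements agglomerative clustering by hand. The sample log lines, the helper names (`bow_vector`, `cosine_distance`, `hierarchical_clusters`), the cosine distance, and the average-linkage rule are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def bow_vector(tokens, vocab):
    """Count how often each vocabulary word occurs in one log record."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; all-zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def hierarchical_clusters(vectors, n_clusters):
    """Agglomerative clustering with average linkage (an assumed linkage rule).

    Starts with one cluster per record and repeatedly merges the two
    clusters with the smallest mean pairwise distance.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    cosine_distance(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                mean = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean < best[0]:
                    best = (mean, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical incident log records (documentation-range IPs, made-up signatures).
logs = [
    "203.0.113.7 ssh brute force",
    "203.0.113.9 ssh brute force",
    "198.51.100.4 sql injection web",
    "198.51.100.8 sql injection web",
]
tokenized = [line.split() for line in logs]
vocab = sorted({tok for toks in tokenized for tok in toks})
vectors = [bow_vector(toks, vocab) for toks in tokenized]
clusters = hierarchical_clusters(vectors, n_clusters=2)
print(clusters)
```

On this toy input the two SSH records and the two SQL injection records end up in separate clusters. A real pipeline would more likely rely on an established Word2Vec implementation and a library clustering routine, and would choose the number of clusters with a cluster-quality metric rather than fixing it by hand.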
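Table of Contents |
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
Chapter 2 Literature Review
2.1 Cyber Threat Intelligence
2.2 Attack Graph Research
2.3 Natural Language Processing
2.3.1 Word2Vec
2.3.2 IP2vec
2.3.3 Bag of Words
2.3.4 AutoEncoder
2.4 Clustering Algorithms
2.4.1 HDBSCAN
2.4.2 Affinity Propagation
2.4.3 Hierarchical Clustering
Chapter 3 Methodology
3.1 Data Preprocessing Module
3.2 Attack Graph Generation Module
3.3 Feature Selection Module
3.3.1 IP Selection Module
3.3.2 Attack Technique Selection Module
3.4 Feature Embedding Module
3.5 Attack Clustering Module
3.6 CTI Info Module
Chapter 4 System Evaluation
Experimental Dataset
Experimental Environment
Evaluation Metrics
4.1 Experiment 1: Adjusting Features and Combining Different Word Embedding and Clustering Algorithms
Experiment 1.1 IP (Full/Level C/Level B) + BoW + HC
Experiment 1.2 IP (Full/Level C/Level B) + W2V + HDBSCAN
Experiment 1.3 IP (Full/Level C/Level B) + W2V + HC
Experiment 1.4 IP (Full/Level C/Level B) + W2V + AP
Experiment 1.5 IP (Full/Level C/Level B) + (W2V + AE) + HDBSCAN
Experiment 1.6 IP (Full/Level C/Level B) + (W2V + AE) + HC
Experiment 1.7 IP (Full/Level C/Level B) + (W2V + AE) + AP
Experiment 1.8 IP (Full/Level C/Level B) + (Bag of Words + AE) + HDBSCAN
Experiment 1.9 IP (Full/Level C/Level B) + (Bag of Words + AE) + HC
Experiment 1.10 IP (Full/Level C/Level B) + (Bag of Words + AE) + AP
Experiment 1 Summary: Comparing Evaluation Metrics Across Word Embedding and Clustering Algorithms
4.2 Experiment 2: Long-Term Log Analysis
Experiment 2.1 Clustering of 2022 Attack Incident Logs
Experiment 2.2 Clustering of Attack Incident Logs for October to December 2022 (Q4)
4.3 Experiment 3: Generating Attack Correlation Graphs
Experiment 3.1 Query by Time Range
Experiment 3.2 Query by IP
Experiment 3.3 Query by Incident ID
4.4 Experiment 4: Threat Intelligence System (CTI Info) and Real-World Security Cases
4.1 Case 1: Attack by the APT Group MuddyWater
4.2 Case 2: Attack on a Campus Camera Surveillance Server
Chapter 5 Contributions and Future Work
References
Appendix |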