National Sun Yat-sen University, thesis/dissertation, Abstractive Summarization of Target Attacks Based on Transfer Learning

Title	Abstractive Summarization of Target Attacks Based on Transfer Learning (資安事件摘要萃取)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 109	Language	Chinese
Degree	Master	Number of pages	64
Author	Yu-Xuan Wang (王妤瑄)
Advisor	Chia-Mei Chen (陳嘉玫)
Convenor	Bo-Chao Cheng (鄭伯炤)
Advisory Committee	Chung-Nan Lee (李宗南); Hui-Tang Lin (林輝堂); Gu Hsin Lai (賴谷鑫)
Date of Exam	2021-08-17	Date of Submission	2021-10-16
Keywords	CTI, APT Events, NLP, Automatic Summarization System, Neural Network
Statistics	Browsed 529 times; downloaded 0 times

Abstract
The rapid development of information and communication technology (ICT), in both hardware and software, has made life more convenient for enterprises and individuals, but it has also increased information security risk. With the emergence of Advanced Persistent Threat (APT) groups, cyber-attacks have grown in frequency and complexity, and attacks that target a single organization or industry keep appearing. To respond to APT attacks, enterprises and organizations need to use Cyber Threat Intelligence (CTI) effectively: by learning the past behavior of hacker groups in advance, they can replace conventional passive defense with proactive deployment. CTI has flourished in recent years, and many well-known threat intelligence sharing platforms now exist. However, the volume of CTI they produce has grown into big data, and collecting and analyzing it manually takes a great deal of time. Quickly filtering out the information an organization actually needs has therefore become a crucial issue. To address it, this study proposes TISUM (Threat Intelligence Summarizer), an automated summarization system dedicated to information security threat events. TISUM gathers a large number of security incident news articles and APT reports and uses Natural Language Processing (NLP) and neural networks to generate summaries of information security incidents automatically. The proposed system reaches a ROUGE score of 70%, which allows enterprises and organizations to grasp the key points of cyber threat intelligence quickly.
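The abstract reports a ROUGE score without stating which variant was used. As a rough illustration of what such a score measures, the sketch below computes ROUGE-N as clipped n-gram overlap between a generated summary and a reference summary. It is a minimal sketch under stated assumptions, not the thesis's evaluation code: the function name `rouge_n`, the lowercase whitespace tokenization, and both example sentences are hypothetical.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Return ROUGE-N precision, recall and F1 for two strings.

    Assumption: tokens are lowercase, whitespace-separated words.
    Real evaluations usually apply a proper tokenizer and stemming.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sentences, invented purely for illustration.
reference = "the apt group used spear phishing to deliver malware"
candidate = "the apt group delivered malware through spear phishing"
scores = rouge_n(candidate, reference, n=1)
print(scores)
```

Here six of the eight candidate unigrams also occur among the nine reference unigrams, so precision is 6/8, recall is 6/9, and F1 is 12/17 (about 0.71). A reported score of 70% is a value on this kind of scale, though the exact variant and preprocessing behind the thesis figure are not given on this page.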

Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation
Chapter 2 Literature Review
  2.1 Background and Related Work
  2.2 Cyber Threat Intelligence
  2.3 Machine Learning and Neural Networks
  2.4 Summarization Techniques
    2.4.1 Threat Action Extraction
    2.4.2 Entity Extraction
    2.4.3 Relation Extraction
Chapter 3 Methodology
  3.1 Data Collection
  3.2 Text Annotation
    3.2.1 Annotation Tools
  3.3 Threat Entity Extraction
  3.4 Threat Event Summary Extraction
Chapter 4 System Evaluation
  4.1 Experiment 1: Comparison and selection of annotation tools, annotation rules, and annotation quantity
  4.2 Experiment 2: Effect of different BERT optimizers and parameter settings on system performance
  4.3 Experiment 3: Comparison of three neural networks in the threat entity extraction module
  4.4 Experiment 4: Comparison with related work on threat entity extraction
  4.5 Experiment 5: Security event summary extraction
Chapter 5 Contributions and Future Work
References


Fulltext
The electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.
Thesis access permission: user-defined release date
Available: Campus: 2026-10-16; Off-campus: 2026-10-16
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Available: 2026-10-16



Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2453 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS