National Sun Yat-sen University, thesis/dissertation: Robustness and Defense of Anomaly Detection Model Against Adversarial Attack

論文名稱 Title	對抗式攻擊擾動異常偵測模型的穩健性與防禦 Robustness and Defense of Anomaly Detection Model Against Adversarial Attack
系所名稱 Department	資訊管理學系 Department of Information Management
畢業學年期 Year, semester	110 學年度第 2 學期 The spring semester of Academic Year 110	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	70
研究生 Author	劉晉瑋 Jin-Wei Liu
指導教授 Advisor	陳嘉玫 Chen, Chia-Mei
召集委員 Convenor	賴谷鑫 Lai, Gu-Hsin
口試委員 Advisory Committee	楊惠芳, 林孝忠, 歐雅惠 Yang, Huei-Fang; Lin, Hsiao-Chung; Ou, Ya-Hui
口試日期 Date of Exam	2022-07-19	繳交日期 Date of Submission	2022-07-30
關鍵字 Keywords	對抗式攻擊、黑箱攻擊、表格式資料、超參數調整演算法、異常事件偵測系統 Adversarial Attack, Black-box Attack, Tabular Data, Hyperparameter Tuning Algorithm, Outlier Detection
統計 Statistics	本論文已被瀏覽 327 次，被下載 0 次 The thesis/dissertation has been browsed 327 times, has been downloaded 0 times.

中文摘要
隨著資料量增長開啟大數據時代，機器學習與深度學習等人工智慧方法受到重視，並在資料探勘、自然語言處理、電腦視覺與異常事件偵測各領域開啟廣泛的研究及深入的應用。人工智慧方法比起人類專家更具優勢解決複雜且高重複性的任務，但根據研究[1, 2]指出此類模型容易受到對抗式攻擊影響，攻擊者能擾動偵測結果甚至操弄預測標籤，一旦瞄準目標系統中關鍵且脆弱的核心演算法發起對抗式攻擊，便會威脅整體系統的完整性與資訊安全。系統開發人員開發基於機器學習的資訊系統時，為求加速開發時程並具有完整的理論證明方法的有效性，因而採用其他研究者公開的標記資料集、預訓練模型、套件及文獻程式碼。倘若上述開源專案受到網路駭客汙染、植入後門或本身存在漏洞，開發人員誤用後該系統將暴露於威脅中。滲透測試(Penetration Test, PT)模擬攻擊者方法，為測試系統安全性最直接的做法，利用各種手法測試目標系統弱點，協助資訊系統開發人員強化目標系統的抗擾動性。本研究提出基於真實企業 Active Directory(AD)事件記錄檔的可循環對抗式樣本訓練方法，屬於黑箱攻擊。所設計的對抗式樣本訓練方法能確保擾動性品質並合乎事件紀錄檔的規範，目的在於挑戰有效且複雜的異常事件偵測系統，藉由訓練並產生的對抗式樣本找出目標偵測模型的潛在弱點。實驗結果證明本方法訓練出的對抗式樣本能成功攻擊複雜的異常事件偵測系統，且造成的擾動性優於其他研究提出的生成對抗式樣本擾動方法，並在最後依據攻擊結果實地提出目標系統提升抵禦對抗式攻擊擾動的方法。
Abstract
As data volumes have continued to grow, the era of big data has arrived. Artificial intelligence (AI) technologies, including machine learning, deep learning, and natural language processing, have been applied to anomaly detection and many other fields, where they provide efficient solutions. Compared with human experts, AI approaches are better suited to complicated and highly repetitive tasks. However, previous research [1, 2] shows that such models are vulnerable to adversarial attacks, in which an adversary manipulates the output of a detection model by inserting adversarial samples. Once an adversary exploits a vulnerability in the core algorithm of the target model, the integrity and correctness of the model may be at risk. To accelerate development and to rely on methods with theoretical support, system developers tend to use open resources published by other researchers, including labeled datasets, pre-trained models, libraries, and code. If these open resources have been contaminated by attackers, the security of the systems built on them is compromised. Penetration testing simulates cyber-attacks against a target system and is the most direct way to help developers find exploitable vulnerabilities and protect the target system from threats. This study proposes a cyclic adversarial sample training method based on real-world Active Directory event logs; the method is a black-box attack. To challenge well-designed anomaly detection systems and uncover potential weaknesses of the target system, the proposed method trains strongly perturbing adversarial samples that conform to the event log specification. The experimental results show that the trained adversarial samples successfully attack the target systems and cause stronger perturbation than the adversarial sample generation methods proposed in other studies. Finally, based on the attack results, this study proposes a method for improving the robustness of the anomaly detection system against adversarial perturbation.

目次 Table of Contents
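Thesis Approval Form i
Abstract (Chinese) ii
Abstract iii
Table of Contents iv
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
  1.1 Research Background 1
  1.2 Research Motivation 2
  1.3 Research Objectives 5
Chapter 2 Literature Review 6
  2.1 Adversarial Attacks 6
  2.2 Generating Adversarial Samples 8
    2.2.1 Sampling Methods 9
    2.2.2 Tabular Data Synthesis Methods 10
    2.2.3 Hyperparameters and Hyperparameter Tuning Algorithms 11
    2.2.4 Adversarial Perturbation Methods and Substitute Models 13
  2.3 Anomalous Event Detection Methods 15
  2.4 Evaluation Methods 16
    2.4.1 Defender's Assessment of the Attacker's Capability 16
    2.4.2 Assessment of the Attacker's Knowledge of the Target Model 18
    2.4.3 Assessment of the Target Model's Robustness to Perturbation 18
Chapter 3 Research Method 19
  3.1 System Architecture 20
    1. Sampling and Perturbation Module 23
    2. Attack Sample Training Module 23
    3. Attack Sample Evaluation Submodule 23
    4. Target Attack Module 23
  3.2 Sampling and Perturbation Module 24
  3.3 Attack Sample Training Module 26
  3.4 Attack Sample Evaluation Submodule 27
  3.5 Target Attack Module 30
Chapter 4 System Evaluation 32
  4.1 Sampling and Preprocessing of Training Data 36
    4.1.1 Sampling Ratio Ps Experiment 36
    4.1.2 Flip Perturbation Degree Experiment 38
    4.1.3 Sequence Perturbation Experiment 39
  4.2 Adversarial Sample Training 41
    4.2.1 Training Evaluation Method Performance Experiment 41
    4.2.2 Genetic Algorithm Hyperparameter Tuning Experiment 42
  4.3 Adversarial Attack Experiments on Target Models 46
    4.3.1 Huang System Perturbation Experiment 46
    4.3.2 Kang System Perturbation Experiment 48
  4.4 Comparison with Adversarial Attack Methods from the Literature 49
    4.4.1 Artificial adversary Method 49
    4.4.2 FGSM Method 50
    4.4.3 LowProFool Method 51
  4.5 Improving the Detection System's Robustness to Perturbation 52
Chapter 5 Contributions and Future Work 55
References 57
Appendix A Training Time 61
List of Figures
  Figure 1-1 Interference Attack 4
  Figure 2-1 Classification of Perturbation Effects [1] 6
  Figure 2-2 Poisoning Attack and Evasion Attack [14] 7
  Figure 2-3 CTGAN Architecture [27] 11
  Figure 2-4 Genetic Algorithm 12
  Figure 2-5 Attack Strategy Diagram [49] 17
  Figure 3-1 System Overview 20
  Figure 3-2 System Architecture Diagram 21
  Figure 4-1 Comparison of Sampling Method Results (Recall) 37
  Figure 4-2 Pf Results at 0% Perturbation 39
  Figure 4-3 Pf Results at 30% Perturbation 39
  Figure 4-4 Pf Results at 70% Perturbation 39
  Figure 4-5 Pf Results at 100% Perturbation 39
  Figure 4-6 Sequence Perturbation Results (Recall) 41
  Figure 4-7 Sequence Perturbation Results (Precision) 41
  Figure 4-8 Convergence of the Hyperparameter Tuning Algorithms 45
  Figure 4-9 Huang System Perturbation Results (Scenario 1) 47
  Figure 4-10 Kang System Perturbation Results (Scenario 1) 48
  Figure 4-11 Kang System Robustness Experiment with Multiples of the Original Detection Dataset 53
  Figure 4-12 Huang System Robustness Experiment with Multiples of the Original Detection Dataset 54
  Figure A-1 Training Time Complexity 61
  Figure A-2 Relationship Between Data Volume and Training Time 61
List of Tables
  Table 3-1 Algorithm Variables and Descriptions 21
  Table 3-2 General Event and High-Risk Event IDs 28
  Table 3-3 Imperceptibility Rules Defined in This Study 28
  Table 4-1 Confusion Matrix 32
  Table 4-2 Summary of Experiments 34
  Table 4-3 Experimental Equipment 35
  Table 4-4 Sampling Strategies 37
  Table 4-5 Results for the Best Sampling Ratio Ps 38
  Table 4-6 Event Log Sequences 40
  Table 4-7 Training Evaluation Method Experiment Results 42
  Table 4-8 Hyperparameter Setting Combinations for the Genetic Tuning Algorithm 43
  Table 4-9 Composite Evaluation Items and Weights for the Hyperparameter Tuning Experiment 44
  Table 4-10 Hyperparameter Tuning Algorithm Experiment Results 45
  Table 4-11 Hyperparameters Proposed by the Hyperparameter Tuning Algorithms 45
  Table 4-12 Huang System Perturbation Results (Scenario 2) 47
  Table 4-13 Kang System Perturbation Results (Scenario 2) 49
  Table 4-14 Comparison with the Artificial adversary Method 50
  Table 4-15 Comparison with the FGSM Method 51
  Table 4-16 Comparison with the LowProFool Method 52
  Table A-1 Training Time Experiment Results 61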

參考文獻 References
[1] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.
[2] C. Szegedy et al., "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
[3] Cisco, "2020 全球網路趨勢報告," 2020. [Online]. Available: https://www.cisco.com/c/dam/m/zh_tw/solutions/enterprise-networks/networkingreport/files/Cisco_BlockBuster_2020-Global-Networking-TrendsReport_ZHTW.pdf
[4] McKinsey, "The state of AI in 2020," McKinsey, Ed., ed. https://www.mckinsey.com/business-functions/mckinsey-analytics/ourinsights/global-survey-the-state-of-ai-in-2020: mckinsey.com, 2020.
[5] "2020 年網路資安威脅偵測數量成長 20%，突破 626 億," vol. 2021, ed: Trend Micro, 2021.
[6] "Exploiting AI: How Cybercriminals Misuse and Abuse AI and ML," UNICRI, Trend Micro, Europol, 2020, vol. 2022. [Online]. Available: https://www.trendmicro.com/vinfo/us/security/news/cybercrime-and-digitalthreats/exploiting-ai-how-cybercriminals-misuse-abuse-ai-and-ml
[7] mitre.org. "CVE-2021-44228." https://cve.mitre.org/cgibin/cvename.cgi?name=CVE-2021-44228 (accessed November 22, 2022).
[8] T. L. 趨勢科技全球技術支援與研發中心, "Apache Log4j 爆十年來最嚴重的漏洞，而且人人都有危險，Google、Apple、Amazon、Netflix 等等也都無法倖免," ed, 2021.
[9] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, "Detecting adversarial samples from artifacts," arXiv preprint arXiv:1703.00410, 2017.
[10] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay, "Adversarial attacks and defences: A survey," arXiv preprint arXiv:1810.00069, 2018.
[11] A. Aldahdooh, W. Hamidouche, S. A. Fezza, and O. Déforges, "Adversarial example detection for DNN models: A review and experimental comparison," Artificial Intelligence Review, pp. 1-60, 2022.
[12] S. Wang, S. Nepal, C. Rudolph, M. Grobler, S. Chen, and T. Chen, "Backdoor attacks against transfer learning with pre-trained deep learning models," IEEE Transactions on Services Computing, 2020.
[13] B. Dickson, "Adversarial AI: Blocking the hidden backdoor in neural networks," vol. 2020, ed. bdtechtalks.com, 2020.
[14] Y. Deldjoo, T. D. Noia, and F. A. Merra, "A survey on adversarial recommender systems: from attack/defense strategies to generative adversarial networks," ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1-38, 2021.
[15] V. Ballet, X. Renard, J. Aigrain, T. Laugel, P. Frossard, and M. Detyniecki, "Imperceptible adversarial attacks on tabular data," arXiv preprint arXiv:1911.03274, 2019.
[16] B. Biggio et al., "Evasion attacks against machine learning at test time," in Joint European conference on machine learning and knowledge discovery in databases, 2013: Springer, pp. 387-402.
[17] G. L. Wittel and S. F. Wu, "On Attacking Statistical Spam Filters," in CEAS, 2004.
[18] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, "Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition," in Proceedings of the 2016 acm sigsac conference on computer and communications security, 2016, pp. 1528-1540.
[19] K. D. Gupta and D. Dasgupta, "Using Negative Detectors for Identifying Adversarial Data Manipulation in Machine Learning," in 2021 International Joint Conference on Neural Networks (IJCNN), 2021: IEEE, pp. 1-8.
[20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[21] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," 2009.
[22] A. I. Newaz, N. I. Haque, A. K. Sikder, M. A. Rahman, and A. S. Uluagac, "Adversarial attacks to machine learning-based smart healthcare systems," in GLOBECOM 2020-2020 IEEE Global Communications Conference, 2020: IEEE, pp. 1-6.
[23] F. Cartella, O. Anunciacao, Y. Funabiki, D. Yamaguchi, T. Akishita, and O. Elshocht, "Adversarial attacks for tabular data: Application to fraud detection and imbalanced data," arXiv preprint arXiv:2101.08030, 2021.
[24] I. Goodfellow et al., "Generative adversarial nets," Advances in neural information processing systems, vol. 27, 2014.
[25] N. Park, M. Mohammadi, K. Gorde, S. Jajodia, H. Park, and Y. Kim, "Data synthesis based on generative adversarial networks," arXiv preprint arXiv:1806.03384, 2018.
[26] E. Choi, S. Biswal, B. Malin, J. Duke, W. F. Stewart, and J. Sun, "Generating multilabel discrete patient records using generative adversarial networks," in Machine learning for healthcare conference, 2017: PMLR, pp. 286-305.
[27] L. Xu, M. Skoularidou, A. Cuesta-Infante, and K. Veeramachaneni, "Modeling tabular data using conditional gan," arXiv preprint arXiv:1907.00503, 2019.
[28] R. Agarwal, T. Thapliyal, and S. K. Shukla, "Detecting malicious accounts showing adversarial behavior in permissionless blockchains," arXiv preprint arXiv:2101.11915, 2021.
[29] Y. Mathov, E. Levy, Z. Katzir, A. Shabtai, and Y. Elovici, "Not all datasets are born equal: On heterogeneous data and adversarial examples," arXiv preprint arXiv:2010.03180, 2020.
[30] M. Chalé and N. D. Bastian, "Challenges and opportunities for generative methods in the cyber domain," in 2021 Winter Simulation Conference (WSC), 2021: IEEE, pp. 1-12.
[31] T. Yu and H. Zhu, "Hyper-parameter optimization: A review of algorithms and applications," arXiv preprint arXiv:2003.05689, 2020.
[32] J. Bergstra and Y. Bengio, "Random search for hyper-parameter optimization," Journal of machine learning research, vol. 13, no. 2, 2012.
[33] J. Snoek, H. Larochelle, and R. P. Adams, "Practical bayesian optimization of machine learning algorithms," Advances in neural information processing systems, vol. 25, 2012.
[34] J. Luketina, M. Berglund, K. Greff, and T. Raiko, "Scalable gradient-based tuning of continuous regularization hyperparameters," in International conference on machine learning, 2016: PMLR, pp. 2952-2960.
[35] S. R. Young, D. C. Rose, T. P. Karnowski, S.-H. Lim, and R. M. Patton, "Optimizing deep learning hyper-parameters through an evolutionary algorithm," in Proceedings of the workshop on machine learning in high-performance computing environments, 2015, pp. 1-5.
[36] N. Gorgolis, I. Hatzilygeroudis, Z. Istenes, and L. G. Gyenne, "Hyperparameter optimization of LSTM network models through genetic algorithm," in 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), 2019: IEEE, pp. 1-4.
[37] T. Elsken, J. H. Metzen, and F. Hutter, "Neural architecture search: A survey," The Journal of Machine Learning Research, vol. 20, no. 1, pp. 1997-2017, 2019.
[38] C. Liu et al., "Progressive neural architecture search," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 19-34.
[39] C. Liu et al., "Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 82-92.
[40] Z. Guo et al., "Single path one-shot neural architecture search with uniform sampling," in European conference on computer vision, 2020: Springer, pp. 544-560.
[41] D. Soni. "artificial-adversary." https://github.com/airbnb/artificial-adversary (accessed May 20, 2022).
[42] W. Wang et al., "Delving into data: Effectively substitute training for black-box attack," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4761-4770.
[43] A. Botta, "Getting to know a black-box model: A two-dimensional example of Jacobian-based adversarial attacks and Jacobian-based data augmentation," ed. towardsdatascience, 2018.
[44] W. Matsuda, M. Fujimoto, and T. Mitsunaga, "Detecting apt attacks against active directory using machine leaning," in 2018 IEEE Conference on Application, Information and Network Security (AINS), 2018: IEEE, pp. 60-65.
[45] Q. Cao, Y. Qiao, and Z. Lyu, "Machine learning to detect anomalies in web log analysis," in 2017 3rd IEEE International Conference on Computer and Communications (ICCC), 2017: IEEE, pp. 519-523.
[46] R. Chen et al., "Logtransfer: Cross-system log anomaly detection for software systems with transfer learning," in 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 2020: IEEE, pp. 37-47.
[47] 黃嵩育, "基於 Active Directory 事件紀錄偵測系統," Master's thesis, Department of Information Management, National Sun Yat-sen University, 2021.
[48] 康為傑, "以非監督式分群及風險分析偵測暴力破解攻擊," Master's thesis, Department of Information Management, National Sun Yat-sen University, 2021.
[49] R. Bhargava and C. Clifton, "Anomaly detection under poisoning attacks," in Proceedings of the ODD v5.0: Outlier Detection De-constructed Workshop, 24th ACM SIGKDD international conference on Knowledge Discovery and Data Mining (KDD), 2018.
[50] P. J. Huber, Robust statistical procedures. SIAM, 1996.
[51] L. Perini, C. Galvin, and V. Vercruyssen, "A Ranking Stability Measure for Quantifying the Robustness of Anomaly Detection Methods," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2020: Springer, pp. 397-408.

電子全文 Fulltext
This electronic full text is licensed to users solely for academic research, for personal and non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. Thesis access permission: user-defined availability. Available: Campus: available for download from 2027-07-30. Off-campus: available for download from 2027-07-30.
紙本論文 Printed copies
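Public information on printed copies is relatively complete from academic year 102 onward. For public information on printed copies from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: 2027-07-30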

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS