National Sun Yat-sen University, thesis/dissertation, Automatic Firmware Vulnerability Detection Based on Deep Learning with Semantic Extraction

Title	Automatic Firmware Vulnerability Detection Based on Deep Learning with Semantic Extraction (基於深度學習與語義提取之自動化韌體漏洞檢測系統)
Department	Department of Information Management
Year, semester	Spring semester of Academic Year 110	Language	Chinese
Degree	Master	Number of pages	85
Author	Tzu-Ching Wang (王子菁)
Advisor	Chia-Mei Chen (陳嘉玫)
Convenor	Gu Hsin Lai (賴谷鑫)
Advisory Committee	Huei-Fang Yang (楊惠芳); Ya-Hui Ou (歐雅惠); Hsiao-Chung Lin (林孝忠)
Date of Exam	2022-07-19	Date of Submission	2022-08-24
Keywords	firmware, automation, cross-architecture, binary programs, similarity detection, deep learning, natural language processing

Abstract
As the Internet of Things (IoT) has become widespread, the number of security attacks against it has grown, and IoT security has become a significant concern. According to a report released by Synopsys, IoT device vendors often incorporate open-source and third-party code to extend the functionality of their products. Because the security risk of that code is hard to control, a vulnerability in a library often affects a whole series of products and makes them targets for attackers. Researchers who test a device usually start with its firmware and typically need supporting tools. The work requires familiarity with the device architecture and depends on experience accumulated over a long time. The variety of firmware architectures and the complex structure of firmware, including the large number of files in its file system, make analysis difficult and increase its cost in time and money. This study consolidates the methods of previous work and proposes an automated cross-architecture firmware vulnerability detection system. The system uses natural language processing and several neural network models to perform similarity detection against vulnerable code, and it also searches the firmware file system for sensitive information. After the analysis, the system generates a corresponding result report that helps researchers make an initial identification of the risks that may exist in the target device and reduces the time required.
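The similarity detection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes each function has already been turned into a fixed-length embedding vector, and it compares a firmware function against known vulnerable functions by cosine similarity, keeping the top-K matches above a threshold. The names `cosine_similarity`, `top_k_matches`, and `vuln_func_a` through `vuln_func_c`, the vectors, and the threshold and K values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(query, vulnerable, threshold=0.8, k=3):
    """Return up to k (name, score) pairs with score >= threshold, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vulnerable.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:k]


# Hypothetical embeddings of known vulnerable functions (illustrative values only).
vulnerable = {
    "vuln_func_a": [1.0, 0.0, 0.0],
    "vuln_func_b": [0.0, 1.0, 0.0],
    "vuln_func_c": [0.9, 0.1, 0.0],
}

# Hypothetical embedding of a function extracted from a firmware image.
matches = top_k_matches([1.0, 0.05, 0.0], vulnerable, threshold=0.8, k=2)
for name, score in matches:
    print(f"{name}: {score:.4f}")
```

A higher threshold reduces false matches at the cost of missing modified copies of vulnerable code, while K limits how many candidates a researcher has to review per function.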

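Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation
Chapter 2 Literature Review
  2.1 Firmware
  2.2 Similarity Detection
    2.2.1 Similarity Detection Workflow
    2.2.2 Challenges in Binary Code Detection
  2.3 Dynamic Analysis
  2.4 Static Analysis
    2.4.1 Text-Based Detection
    2.4.2 Attribute-Based Detection
    2.4.3 Program-Logic-Based Detection
    2.4.4 Semantics-Based Detection
  2.5 Firmware Analysis Tools
    2.5.1 Unpacking Tools
    2.5.2 Detection Tools
    2.5.3 Binary Analysis Tools
  2.6 Neural Network Models
    2.6.1 Deep Neural Networks
    2.6.2 Recurrent Neural Networks
    2.6.3 Siamese Networks
  2.7 Natural Language Processing Models
Chapter 3 Methodology
  3.1 Preprocessing Module
  3.2 Function2Vector Model
    3.2.1 Instruction Embedding Vector Extraction
    3.2.2 Function Semantic Feature Vector Extraction
    3.2.3 Model Training
  3.3 Vulnerability Detection Module
  3.4 Sensitive Information Retrieval Module
Chapter 4 System Evaluation
  4.1 Experiment 1: Model Evaluation
    4.1.1 Optimizer and Data Volume Experiment
    4.1.2 Similarity Threshold and Top-K Experiment
  4.2 Experiment 2: Single-Platform and Cross-Platform Experiments
  4.3 Experiment 3: Comparison of Results with Semantic Extraction
    4.3.1 Comparison of Semantic Extraction Methods
    4.3.2 Comparison with Lin's System
  4.4 Experiment 4: System Effectiveness
    4.4.1 Preprocessing Module
    4.4.2 Sensitive Information Retrieval Module
    4.4.3 Vulnerability Detection Module
    4.4.4 Validation with the IoT Inspector System
    4.4.5 Final Result Report
  4.5 Summary
Chapter 5 Contributions and Future Work
References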


Fulltext
The electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it. Thesis access permission: user-defined release date. Available on campus: 2027-08-24. Available off campus: 2027-08-24.
Printed copies
Available: 2027-08-24. Public information on printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, contact the printed thesis service counter of the Office of Library and Information Services.
