論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: 開放下載的時間 available from 2029-08-25
校外 Off-campus: 開放下載的時間 available from 2029-08-25
論文名稱 Title: Vision Transformer神經網路硬體加速器設計 Hardware Accelerator Design for Vision Transformer Neural Network
系所名稱 Department:
畢業學年期 Year, semester:
語文別 Language:
學位類別 Degree:
頁數 Number of pages: 88
研究生 Author:
指導教授 Advisor:
召集委員 Convenor:
口試委員 Advisory Committee:
口試日期 Date of Exam: 2024-08-23
繳交日期 Date of Submission: 2024-08-25
關鍵字 Keywords
Transformer, Self-attention, Vision Transformer (ViT), Systolic Array, deep neural network hardware accelerator, Softmax, Layer Normalization
統計 Statistics
The thesis/dissertation has been browsed 178 times and downloaded 0 times.
中文摘要 Chinese Abstract
Vision Transformer (ViT) is the first model to apply the Transformer, a model originally developed for natural language processing (NLP), to the vision domain. Compared with traditional convolutional neural networks (CNNs), ViT performs better on classification and other tasks. Although ViT achieves high accuracy, its computational load is enormous, and its demand for hardware resources grows accordingly. This thesis investigates the design of a hardware accelerator for ViT. For ViT's matrix multiplications, we analyze and compare the Multiplier-Adder-Tree (MAT) design and the Systolic Array design. We also analyze how a Systolic Array's computation time and memory-access requirements differ across matrix sizes. In the end, this thesis adopts a weight-stationary Systolic Array design with low memory access and a high rate of resource reuse. In addition, we design dedicated processing units for Softmax and Layer Normalization, the nonlinear functions that appear in every encoder layer of ViT. In convolutional neural networks, the Softmax function is usually applied only in the last layer, so it has little effect on overall model accuracy; in a Vision Transformer, however, Softmax appears in every layer, so precision must be preserved during computation, and we design a high-precision Softmax hardware unit. Because the computation of Layer Normalization occupies a large share of the area in a hardware design, we adopt a hardware-friendly Layer Normalization algorithm to reduce hardware area.
Abstract
Vision Transformer (ViT) is the first model to apply the Transformer, originally used in natural language processing (NLP), to the visual domain. Compared with traditional convolutional neural networks (CNNs), ViT has shown excellent performance on classification and other tasks. Although ViT offers high accuracy, it requires significant computation, which increases the demand on hardware resources. This thesis explores the design of hardware accelerators for ViT. We analyze and compare the differences between the Multiplier-Adder-Tree (MAT) design and the Systolic Array design for ViT's matrix multiplication operations. In addition, we analyze the differences in computation time and memory-access requirements of Systolic Arrays under different matrix sizes. Finally, this thesis adopts a weight-stationary Systolic Array design with low memory access and a high resource-reuse rate. Moreover, we design independent processing units for the nonlinear functions Softmax and Layer Normalization, which appear in each layer of the ViT encoder. In CNNs, the Softmax function is typically used only in the final layer, so it does not significantly impact the overall model's accuracy. However, in Vision Transformers, the Softmax function appears in every layer, making it crucial to preserve accuracy during computation; we therefore design a high-accuracy Softmax hardware unit. Since Layer Normalization consumes a large amount of area in hardware design, we adopt a hardware-friendly Layer Normalization algorithm to reduce hardware area consumption.
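To make the weight-stationary dataflow mentioned above concrete, the following Python sketch is a cycle-level behavioral model, purely illustrative: all function and variable names are ours, and the thesis's actual design is register-transfer-level hardware. Each PE(k, n) keeps weight W[k, n] in place, activations enter from the left with a one-cycle skew per array row and move right, and partial sums move down and drain from the bottom row.

```python
import numpy as np

def ws_systolic_matmul(X, W):
    """Cycle-level model of Y = X @ W on a weight-stationary systolic array.

    PE(k, n) permanently holds the weight W[k, n]. Activations enter
    row k from the left, skewed by k cycles, and move one PE to the
    right per cycle; partial sums move one PE down per cycle and drain
    from the bottom row.
    """
    M, K = X.shape
    Kw, N = W.shape
    assert K == Kw, "inner dimensions must match"

    a = np.zeros((K, N))        # activation register inside each PE
    p = np.zeros((K, N))        # partial-sum register inside each PE
    Y = np.zeros((M, N))

    for t in range(M + K + N - 2):          # cycles until the array drains
        new_a = np.empty_like(a)
        new_p = np.empty_like(p)
        for k in range(K):
            for n in range(N):
                if n == 0:                  # left edge: inject X, skewed by k
                    m = t - k
                    a_in = X[m, k] if 0 <= m < M else 0.0
                else:                       # otherwise take the left neighbour
                    a_in = a[k, n - 1]
                p_in = p[k - 1, n] if k > 0 else 0.0
                new_a[k, n] = a_in
                new_p[k, n] = p_in + a_in * W[k, n]
        a, p = new_a, new_p
        for n in range(N):                  # bottom row holds finished sums
            m = t - (K - 1) - n
            if 0 <= m < M:
                Y[m, n] = p[K - 1, n]
    return Y

X = np.random.rand(6, 4)
W = np.random.rand(4, 5)
assert np.allclose(ws_systolic_matmul(X, W), X @ W)
```

Each weight is fetched once and then reused for every row of the input matrix, which is the low-memory-access, high-reuse property the abstract attributes to the weight-stationary choice.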
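The accuracy concern around Softmax comes from exponentiating large attention scores in limited precision. Below is a minimal sketch of two standard safeguards, assuming nothing about the circuit of Chapter 4: max subtraction before exponentiation, and the one-pass online normalizer of [25], which suits streaming hardware because the running maximum and the normalizer are updated together.

```python
import numpy as np

def softmax_stable(x):
    """Reference softmax with the standard max-subtraction trick:
    exp(x - max(x)) never overflows, and the result is unchanged."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_online(x):
    """One-pass running normalizer in the style of [25]: the maximum m
    and the sum s of exponentials are maintained together, so the score
    vector is streamed once before the final normalizing division."""
    m, s = -np.inf, 0.0
    for v in x:
        m_new = max(m, v)
        s = s * np.exp(m - m_new) + np.exp(v - m_new)
        m = m_new
    return np.exp(np.asarray(x) - m) / s

x = np.array([12.0, -3.0, 7.5, 9.0])
assert np.allclose(softmax_stable(x), softmax_online(x))
```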
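The abstract does not spell out the hardware-friendly Layer Normalization algorithm, so the sketch below is only a representative assumption, not the thesis's circuit: the variance comes from a single streaming pass via E[x^2] - E[x]^2, and the reciprocal square root is refined by Newton-Raphson iterations from a power-of-two seed instead of using a full divider and square-root unit.

```python
import numpy as np

def layernorm_hw_friendly(x, gamma, beta, eps=1e-5, nr_iters=5):
    """Illustrative hardware-oriented LayerNorm (an assumption, not the
    thesis's design).

    One streaming pass accumulates sum(x) and sum(x^2), giving
    Var[x] = E[x^2] - E[x]^2 without a second read of the data, and
    1/sqrt(var + eps) is refined from a power-of-two seed by the
    Newton-Raphson recurrence y <- y * (3 - v * y^2) / 2.
    """
    n = x.size
    s1 = s2 = 0.0
    for xi in x:                        # one pass over the feature vector
        s1 += xi
        s2 += xi * xi
    mean = s1 / n
    var = s2 / n - mean * mean          # E[x^2] - E[x]^2
    v = var + eps

    e = np.floor(np.log2(v))            # in RTL: read off the exponent /
    y = 2.0 ** (-e / 2.0)               # leading-zero count of v as the seed
    for _ in range(nr_iters):           # quadratic convergence to 1/sqrt(v)
        y = y * (3.0 - v * y * y) / 2.0

    return gamma * (x - mean) * y + beta

x = np.random.randn(64)
ref = (x - x.mean()) / np.sqrt(x.var() + 1e-5)
out = layernorm_hw_friendly(x, np.ones(64), np.zeros(64))
assert np.allclose(out, ref, rtol=1e-4)
```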
目次 Table of Contents
Thesis Approval Form i
Abstract (Chinese) ii
Abstract iii
Table of Contents iv
Table of Figures vii
Table of Tables x
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Contributions 2
1.3 Thesis Outline 2
Chapter 2 Transformer and Vision Transformer Models and Related Hardware Accelerators 4
2.1 The Transformer Model and Its Internal Operations 4
2.1.1 Encoder and Decoder 5
2.1.2 Self-Attention 5
2.1.3 Multi-Head Self-Attention 5
2.1.4 Positional Encoding 6
2.1.5 Feed-Forward Networks 6
2.1.6 Layer Normalization 6
2.1.7 Softmax 7
2.2 Applications of the Transformer Model and Vision Transformer 8
2.2.1 BERT [3] 8
2.2.2 Generative Pre-Trained Transformer (GPT) [4] 8
2.2.3 Vision Transformer (ViT) [5] 9
2.2.4 Swin Transformer [7] 11
2.2.5 DeiT [8] 12
2.3 Neural Network Hardware Accelerators for Transformer and Vision Transformer 13
2.3.1 Row-wise Accelerator for Vision Transformer [9] 13
2.3.2 Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer [10] 14
2.3.3 Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization [11] 15
2.3.4 ViTA: A Vision Transformer Inference Accelerator for Edge Applications [12] 16
2.3.5 HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers [13] 17
2.3.6 ViA: A Novel Vision Transformer Accelerator Based on FPGA [14] 18
Chapter 3 Analysis of Neural Network Hardware Accelerator Design 19
3.1 Vision Transformer Model Analysis 20
3.2 Systolic Array 24
3.2.1 Weight Stationary and Input Stationary 25
3.2.2 Output Stationary 26
3.2.3 Row Stationary 27
3.3 Choice of Systolic Array Stationary Dataflow 28
3.4 Relationship Between Systolic Array Shape, SRAM Access, and Computation Cycles 32
3.4.1 32×32 Systolic Array 32
3.4.2 64×64 Systolic Array 39
3.4.3 128×64 and 64×128 Systolic Arrays 46
3.5 On-Chip Buffer Design 47
3.6 Related Papers on Softmax Function Hardware Design 49
3.6.1 Related Papers on Softmax Function Hardware Design 49
3.6.1.1 A High-Speed and Low-Complexity Architecture for Softmax Function in Deep Learning [22] 49
3.6.1.2 Aggressive Approximation of the SoftMax Function for Power-Efficient Hardware Implementations [23] 50
3.6.1.3 Base-2 Softmax Function: Suitability for Training and Efficient Hardware Implementation [24] 51
3.6.2 Main Reference Paper for the Softmax Function Hardware Design 52
Chapter 4 Neural Network Hardware Accelerator Design and Specifications 55
4.1 Overall Hardware Architecture 55
4.2 Systolic Array Hardware Design 56
4.3 On-Chip Buffer Design 57
4.4 Softmax Function Hardware Design 59
4.5 Layer Normalization Hardware Design 64
4.6 Control Unit Design 68
Chapter 5 Computation Acceleration Analysis and Comparison with Related Work 70
5.1 Hardware Accelerator Specifications 70
5.2 Accuracy Tests of the Nonlinear Functions 72
5.3 Comparison with Related Work 73
Chapter 6 Conclusion and Future Work 75
6.1 Conclusion 75
6.2 Future Work 75
References 76
參考文獻 References
[1] A. Vaswani et al., "Attention Is All You Need," Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017).
[2] I. Sutskever et al., "Sequence to Sequence Learning with Neural Networks," Advances in Neural Information Processing Systems 27 (2014).
[3] J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint, https://arxiv.org/abs/1810.04805 (2018).
[4] P.-P. Ray, "ChatGPT: A Comprehensive Review on Background, Applications, Key Challenges, Bias, Ethics, Limitations and Future Scope," Internet of Things and Cyber-Physical Systems (2023).
[5] A. Dosovitskiy et al., "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale," International Conference on Learning Representations (ICLR 2021).
[6] K. He et al., "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).
[7] Z. Liu et al., "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021).
[8] H. Touvron et al., "Training Data-Efficient Image Transformers & Distillation Through Attention," arXiv preprint, https://arxiv.org/abs/2012.12877 (2021).
[9] H.-Y. Wang and T.-S. Chang, "Row-wise Accelerator for Vision Transformer," 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022).
[10] S. Lu et al., "Hardware Accelerator for Multi-Head Attention and Position-Wise Feed-Forward in the Transformer," 2020 IEEE 33rd International System-on-Chip Conference (SOCC 2020), pp. 84-89.
[11] Z. Li et al., "Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization," 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL 2022).
[12] S. Nag et al., "ViTA: A Vision Transformer Inference Accelerator for Edge Applications," arXiv preprint arXiv:2302.09108 (2023).
[13] P. Dong et al., "HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers," arXiv preprint, https://arxiv.org/abs/2211.08110 (2023).
[14] T. Wang et al., "ViA: A Novel Vision-Transformer Accelerator Based on FPGA," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022).
[15] 趙子賢, "MobileViT深度神經網路模型硬體加速器設計 (Hardware Accelerator Design for the MobileViT Deep Neural Network Model)," National Sun Yat-sen University, Kaohsiung, 2023.
[16] H. T. Kung, "Why Systolic Architectures?" Computer, vol. 15, no. 1 (1982), pp. 37-46.
[17] Y.-H. Chen et al., "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," ACM SIGARCH Computer Architecture News, vol. 44, no. 3 (2016), pp. 367-379.
[18] H. You et al., "ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design," IEEE International Symposium on High-Performance Computer Architecture (HPCA 2023).
[19] B. Keller et al., "A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm," IEEE Symposium on VLSI Technology and Circuits (2022).
[20] Y.-F. Li et al., "DiVIT: Algorithm and Architecture Co-design of Differential Attention in Vision Transformer," Journal of Systems Architecture, vol. 128 (2022), 102520.
[21] W.-H. Ye et al., "Accelerating Attention Mechanism on FPGAs Based on Efficient Reconfigurable Systolic Array," ACM Transactions on Embedded Computing Systems (TECS) (2022).
[22] M. Wang et al., "A High-Speed and Low-Complexity Architecture for Softmax Function in Deep Learning," IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2018).
[23] F. Spagnolo et al., "Aggressive Approximation of the SoftMax Function for Power-Efficient Hardware Implementations," IEEE Transactions on Circuits and Systems II (2022).
[24] Y. Zhang et al., "Base-2 Softmax Function: Suitability for Training and Efficient Hardware Implementation," IEEE Transactions on Circuits and Systems I (2022).
[25] M. Milakov et al., "Online Normalizer Calculation for Softmax," arXiv preprint, https://arxiv.org/abs/1805.02867 (2018).
[26] J.-R. Stevens et al., "Softermax: Hardware/Software Co-design of an Efficient Softmax for Transformers," Design Automation Conference (DAC 2021).
[27] W. Wang et al., "SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference," IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2023).
電子全文 Fulltext
This electronic full text is licensed to users only for personal, non-commercial searching, reading, and printing for the purpose of academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization. This thesis will be available for download on 2029-08-25.
紙本論文 Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (2013) onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience. Available from 2029-08-25.
QR Code |