論文使用權限 Thesis access permission: 自定論文開放時間 user-defined
開放時間 Available:
校內 Campus: available for download from 2025-11-13
校外 Off-campus: available for download from 2025-11-13
論文名稱 Title
應用於神經網路之基於平行計數器的高準確度與低延遲混合型隨機運算架構設計
High Accuracy and Low Latency Hybrid Stochastic Computing for Neural Networks by Using Parallel Counter
系所名稱 Department
畢業學年期 Year, semester
語文別 Language
學位類別 Degree
頁數 Number of pages
90
研究生 Author
指導教授 Advisor
召集委員 Convenor
口試委員 Advisory Committee
口試日期 Date of Exam
2022-10-27
繳交日期 Date of Submission
2022-11-13
關鍵字 Keywords
隨機運算、平行計數器、神經網路、混合隨機運算乘累加、資料表示、乙狀函數、線性整流函數
Stochastic Computing, Parallel Counter, Neural Network, Hybrid Stochastic Computing, Data Representation, Sigmoid, Rectified Linear Unit
統計 Statistics
本論文已被瀏覽 78 次,被下載 0 次 The thesis has been browsed 78 times and downloaded 0 times.
中文摘要 Chinese Abstract
With the rapid advance of technology, artificial intelligence has brought many benefits to human life. Deep neural networks (DNNs) have demonstrated their superiority in image recognition, speech recognition, natural language processing, and autonomous driving, and have become one of the most popular research topics in recent years. Although DNNs bring many advantages in a wide range of applications, their architectures have grown increasingly complex in the pursuit of high accuracy and multi-task designs. This high computational complexity results in excessive hardware area, and the accompanying power consumption prevents DNNs from being realized on edge devices. Approximate computing has therefore attracted much attention: it trades computational accuracy for hardware resources to achieve high-performance computing while keeping accuracy at an acceptable level. Among such techniques, stochastic computing (SC) has been shown to be an effective way to reduce hardware cost, and many SC-based artificial neural networks have been proposed in recent years. Conventional SC implements addition with a multiplexer (MUX). Because the MUX scales its output, conventional SC neural network designs suffer from low computational accuracy and cannot even realize large-scale network architectures. To solve this problem, we apply a parallel counter (PC) to eliminate the scaling effect of conventional SC and propose a PC-based hybrid stochastic computing multiply-accumulate (MAC) design. However, applying the hybrid MAC to neural networks causes a data representation problem that prevents the hardware from being pipelined. We therefore propose hybrid-SC sigmoid and rectified linear unit (ReLU) functions that solve the data representation problem while realizing the nonlinear activation functions. Moreover, because SC represents continuous values with random bit-streams, it spends a great deal of time on computation, so we propose a pipelined architecture suited to continuous-input tasks. Experimental results show that, compared with previous neural networks based on conventional SC, the proposed method improves accuracy by 78.4%; compared with a hybrid neural network based on the accumulative parallel counter, it reduces computation latency by 80%; and compared with related SC-based work, it improves area efficiency by 105%-197% and power efficiency by 27.5%-58.3%.
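To make the scaling effect concrete, here is a minimal Python sketch of unipolar stochastic computing; it is not taken from the thesis, and the function names (`encode`, `sc_multiply`, `mux_add`) and the software-level modeling are illustrative assumptions. A value in [0, 1] is encoded as the probability of 1s in a bit-stream, a bitwise AND multiplies two independent streams, and a MUX adds n streams at the cost of scaling the result by 1/n.

```python
import random

def encode(value, length, rng):
    """Encode a value in [0, 1] as a unipolar stochastic bit-stream:
    each bit is 1 with probability equal to the value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Unipolar SC multiplication: bitwise AND of two independent streams."""
    return [x & y for x, y in zip(a, b)]

def mux_add(streams, rng):
    """MUX-based SC addition: randomly select one input bit per cycle,
    so the output encodes the sum scaled by 1/n."""
    n = len(streams)
    return [streams[rng.randrange(n)][t] for t in range(len(streams[0]))]

rng = random.Random(0)
L = 4096
a, b = encode(0.8, L, rng), encode(0.6, L, rng)
print(decode(sc_multiply(a, b)))   # ~0.48 = 0.8 * 0.6
print(decode(mux_add([a, b], rng)))  # ~0.70 = (0.8 + 0.6) / 2
```

The MUX output encodes 0.7 rather than 1.4; in a deep network these repeated 1/n scalings shrink signals toward the resolution floor of the bit-stream, which is the accuracy loss the thesis attacks with a parallel counter.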
Abstract
With the rapid development of technology, artificial intelligence has brought many benefits to human life. Deep neural networks (DNNs) have shown their superiority in image recognition, speech recognition, natural language processing, and autonomous driving, and have become one of the most popular research topics in recent years. Although DNNs bring advantages to many applications, their architectures are becoming more complex due to high-precision and multi-task designs, and their power consumption makes them impractical on power-limited edge devices. Approximate computing has therefore attracted much attention in recent years: it trades computing accuracy against hardware resources to achieve high-performance computing while maintaining acceptable accuracy. Among such techniques, stochastic computing (SC) has been proven an effective method to increase hardware efficiency, and many SC-based artificial neural networks have been proposed. Conventional SC uses a multiplexer (MUX) to implement addition. However, since the MUX scales the output, conventional SC designs suffer from low calculation accuracy and cannot realize large-scale neural networks. We therefore use a parallel counter (PC) to remove the scaling effect of the MUX and propose a PC-based hybrid stochastic computing multiply-accumulate (MAC) design. However, because the PC consists of binary adders, the hybrid MAC introduces a data representation problem in neural networks, which prevents the architecture from being pipelined. We therefore propose a binary-in-series-out sigmoid function (BISO Sigmoid) and a binary-in-series-out rectified linear unit (BISO ReLU) to convert data from binary back to bit-streams. Furthermore, since SC encodes each value as a long bit-stream, it incurs long computing latency, so we propose a pipelined architecture that supports continuous-input tasks. Experimental results show that, compared with the MUX-based approach, the proposed method improves average accuracy by 78.4%; compared with the APC-based approach, it reduces latency by 80%; and compared with related SC-based approaches, it improves area efficiency by 105%-197% and power efficiency by 27.5%-58.3%.
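As a rough software model of the direction the abstract describes, and not the thesis's actual hardware design, the sketch below sums each cycle's n AND-gate product bits with a parallel counter (modeled as a population count) into a binary accumulator, avoiding the MUX's 1/n scaling. A binary-in, stream-out stage, the role the proposed BISO units play, then re-encodes the binary result as a bit-stream for the next layer. All names and the ReLU-plus-normalize conversion are assumptions made for illustration.

```python
import random

def encode(value, length, rng):
    """Unipolar SC encoding: P(bit = 1) = value, for value in [0, 1]."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def pc_mac(x_streams, w_streams):
    """Hybrid MAC sketch: each cycle, AND gates form the n product bits and a
    parallel counter (modeled here as a population count) adds them into a
    binary accumulator, so no 1/n scaling occurs."""
    length = len(x_streams[0])
    acc = 0
    for t in range(length):
        acc += sum(x[t] & w[t] for x, w in zip(x_streams, w_streams))
    return acc / length          # binary result ~ sum_i x_i * w_i

def binary_to_stream(value, scale, length, rng):
    """Binary-in, stream-out stage (the role the BISO units play): apply ReLU,
    normalize into [0, 1], and re-encode as a fresh unipolar bit-stream."""
    v = min(max(value, 0.0) / scale, 1.0)
    return [1 if rng.random() < v else 0 for _ in range(length)]

rng = random.Random(1)
L = 8192
xs = [encode(v, L, rng) for v in (0.9, 0.5, 0.3)]
ws = [encode(v, L, rng) for v in (0.6, 0.8, 0.2)]
dot = pc_mac(xs, ws)                    # ~1.00 = 0.54 + 0.40 + 0.06
out = binary_to_stream(dot, len(xs), L, rng)
print(dot, sum(out) / L)                # re-encoded stream ~ 1.00 / 3
```

Note that the dot product here exceeds 1.0, which a MUX-based adder could never represent without scaling; the binary accumulator holds it exactly, and only the conversion back to a stream reintroduces a normalization.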
目次 Table of Contents
論文審定書 (Thesis Certification) i
公開授權書 (Letter of Authorization) ii
誌謝 (Acknowledgements) iii
摘要 (Chinese Abstract) iv
Abstract vi
Chapter 1 Introduction 1
  1.1 Introduction of Neural Network (NN) 1
  1.2 Current DNN Accelerator on Edge Device 6
  1.3 Introduction of Approximate Arithmetic Circuits 7
  1.4 Design Problems 9
  1.5 Thesis Contributions 10
  1.6 Thesis Organization 12
Chapter 2 Background of Stochastic Computing 13
  2.1 Introduction to Stochastic Computing 13
  2.2 Complex Functions in Stochastic Computing 18
Chapter 3 Review of the Related Works 20
  3.1 Neural Network Designs Based on SC 20
  3.2 The Nonlinear Activation Function in Stochastic Computing 23
Chapter 4 Proposed Hybrid Binary-Stochastic Multiply-Accumulate Unit 26
  4.1 Conventional MUX Based Multiply-Accumulate Unit 27
  4.2 Accumulative Parallel Counter Based Multiply-Accumulate Unit 28
  4.3 Proposed Parallel Counter Based Multiply-Accumulate Unit 30
Chapter 5 Proposed Binary-Input-Series-Output Activation Function for Hybrid Stochastic Computing 32
  5.1 Activation Function in Neural Networks 32
  5.2 Series-Input-Series-Output Hyperbolic Tangent Function 34
  5.3 Proposed Binary-Input-Series-Output Sigmoid Activation Function 38
  5.4 Proposed Binary-Input-Series-Output ReLU Activation Function 44
Chapter 6 Experimental Results 48
  6.1 Simulation Setup 48
  6.2 Evaluation Results of the Proposed BISO Sigmoid Function 50
  6.3 Evaluation Results of the Proposed BISO ReLU Function 52
  6.4 Architecture Analysis of the Proposed PC-based NN Model with the BISO Sigmoid Function and the BISO ReLU Function 54
Chapter 7 Architecture Design 59
  7.1 The LeNet Architecture of the Proposed PC-based MAC Unit 59
  7.2 Hardware Performance Analysis 68
Chapter 8 Conclusion and Future Work 70
  8.1 Conclusion 70
  8.2 Future Work 72
Reference 73
電子全文 Fulltext
The electronic fulltext is licensed to users for personal, non-profit searching, reading, and printing for the purpose of academic research only. Please observe the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
紙本論文 Printed copies
Availability information for printed copies is relatively complete from academic year 102 (ROC calendar) onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 2025-11-13