論文使用權限 Thesis access permission: 自定論文開放時間 user-defined embargo
開放時間 Available:
校內 Campus: available 2026-08-14
校外 Off-campus: available 2026-08-14
論文名稱 Title: 基於偽標籤、知識蒸餾與生成重放之持續性學習物件偵測 Continual Learning in Object Detection Based on Pseudo-Labeling, Knowledge Distillation, and Generative Replay
系所名稱 Department:
畢業學年期 Year, semester:
語文別 Language:
學位類別 Degree:
頁數 Number of pages: 70
研究生 Author:
指導教授 Advisor:
召集委員 Convenor:
口試委員 Advisory Committee:
口試日期 Date of Exam: 2023-07-28
繳交日期 Date of Submission: 2023-08-14
關鍵字 Keywords: 物件偵測、深度學習、持續性學習、偽標籤、知識蒸餾、生成重放 object detection, deep learning, continual learning, pseudo-labeling, knowledge distillation, generative replay
統計 Statistics: 本論文已被瀏覽 216 次,被下載 0 次 This thesis has been browsed 216 times and downloaded 0 times.
中文摘要 Chinese Abstract
With the rise of artificial intelligence, deep learning has become extremely popular in recent years and is now the mainstream approach to realizing AI. Most AI applications rely on sufficient training data and supervised learning methods. However, from the standpoint of sustainable deployment, the traditional supervised learning paradigm carries a serious hidden risk: once a model has been trained on a dataset, further training on new data causes it to forget what it previously learned, a phenomenon known as "catastrophic forgetting." This phenomenon is the core problem of continual learning, and most continual learning research therefore aims to mitigate or avoid it. Continual learning is undoubtedly a key technology for the future of AI, enabling machines to become more human-like, with the capacity for lifelong learning. This study applies three continual learning techniques, pseudo-labeling, knowledge distillation, and generative replay, to object detection. We propose two scenarios, continual learning over classes and continual learning over labels, and validate them on the VOC2007 and BreCaHAD datasets, respectively. In the VOC2007 experiments, we first train a GAN model for each class in the task, then begin training with only pseudo-labeling and knowledge distillation; for each class whose mAP@.5 falls below 0.5, we add 40 to 60 GAN-generated images together with their pseudo-labels for further training, realizing generative replay. Likewise, in the BreCaHAD experiments, we first train a GAN model on all training images and generate about 120 mixed images, annotate them with pseudo-labels produced by the previous task's model, and then add these images to each task's dataset for training. Using GAN-generated fake images in this way to improve training quality is a form of generative replay. Experimental results show 58.2% mAP@.5 on VOC2007 Task 4 after 50 epochs and 52.1% mAP@.5 on BreCaHAD Task 3 after 2000 epochs, confirming the strong advantage of combining generative replay with pseudo-labeling for continual learning in object detection.
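To make the replay step concrete, a minimal sketch of the data preparation described above might look as follows. This is an illustration under assumed interfaces, not the thesis' actual code: `generator`, `old_detector`, the `z_dim` attribute, the `(N, 6)` prediction layout, and the 0.5 confidence threshold are all hypothetical placeholders.

```python
import torch

@torch.no_grad()
def build_replay_set(generator, old_detector, n_images=60, conf_thresh=0.5):
    """Generate fake images with a trained GAN and pseudo-label them with
    the detector kept from the previous task (generative replay)."""
    images, labels = [], []
    for _ in range(n_images):
        z = torch.randn(1, generator.z_dim)          # random latent code (assumed attribute)
        fake = generator(z)                          # fake image, e.g. shape (1, 3, H, W)
        preds = old_detector(fake)                   # assumed shape (N, 6): x, y, w, h, conf, cls
        confident = preds[preds[:, 4] > conf_thresh] # keep confident boxes as pseudo-labels
        images.append(fake)
        labels.append(confident)
    return images, labels
```

The returned image/pseudo-label pairs would then be appended to the new task's training set alongside the real annotated data.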
Abstract
With the rise of artificial intelligence (AI), deep learning has become extremely popular in recent years and has become the primary approach to achieving AI. Most AI applications rely on an adequate amount of training data and supervised learning methods. However, considering sustainable applications, the traditional supervised learning paradigm leads to a significant issue known as "catastrophic forgetting": a model trained on certain data forgets previously learned data after being trained on new data. This issue is the core problem in continual learning (CL), and much CL research aims to mitigate or avoid it. CL is undoubtedly an important technology for the future of artificial intelligence, enabling machines to become more human-like, with the ability for lifelong learning. This study applies three CL techniques, namely pseudo-labeling, knowledge distillation, and generative replay, to the problem of object detection. We propose two CL scenarios, a class-updating scenario and a label-updating scenario, and validate them on the VOC2007 and BreCaHAD datasets, respectively. In the VOC2007 experiment, we first train GAN models for all classes of the task. We then train using only pseudo-labeling and knowledge distillation. Subsequently, for each class with mAP@.5 lower than 0.5, we incorporate 40 to 60 GAN-generated fake images together with their pseudo-labels for further training, thereby realizing generative replay. Similarly, in the BreCaHAD experiment, we first train a GAN model on all training images and generate around 120 mixed images for generative replay, annotating them with pseudo-labels produced by the previous task's model. These images are then added to the datasets of the respective tasks for training. Experimental results demonstrate that on Task 4 of VOC2007, 58.2% mAP@.5 is achieved after 50 epochs, and on Task 3 of BreCaHAD, 52.1% mAP@.5 is attained after 2000 epochs. This confirms the substantial advantage of combining generative replay and pseudo-labeling in CL for object detection.
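As a complement to the replay sketch above, the following is a minimal sketch of how a knowledge distillation term might be combined with the detector's own training loss during a continual learning step. The helper `detection_loss`, the L2 form of the distillation term, and the weight `lam` are assumptions for illustration; the thesis does not specify this exact recipe.

```python
import torch
import torch.nn.functional as F

def continual_step(student, teacher, images, targets, detection_loss, lam=0.5):
    """One training step: supervised detection loss on the current task plus
    an L2 distillation term pulling the student toward the frozen teacher."""
    student_out = student(images)          # raw prediction tensor from the new model
    with torch.no_grad():
        teacher_out = teacher(images)      # old model's predictions (no gradients)
    task_loss = detection_loss(student_out, targets)     # supervision on the new task
    distill_loss = F.mse_loss(student_out, teacher_out)  # retain old-task knowledge
    return task_loss + lam * distill_loss                # lam is an assumed weighting
```

In the thesis' setup this term would work alongside pseudo-labels for old classes, so that the student receives both explicit boxes and soft guidance from the previous model.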
目次 Table of Contents |
Thesis Approval Sheet i
Abstract (Chinese) ii
Abstract iii
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Motivation and Objectives 2
1.3 Contributions 3
1.4 Thesis Outline 3
Chapter 2 Literature Review 4
2.1 Continual Learning 4
2.1.1 Catastrophic Forgetting 6
2.1.2 Knowledge Distillation 8
2.1.3 Pseudo-Labeling 9
2.1.4 Generative Replay 10
2.2 The YOLO Series 11
2.3 GAN 15
2.4 Medical Imaging 17
Chapter 3 Experimental Methods and Analysis 19
3.1 Continual Learning Scenario 1 19
3.1.1 The VOC2007 Dataset 20
3.1.2 Experimental Flowchart 1 22
3.2 Continual Learning Scenario 2 23
3.2.1 The BreCaHAD Dataset 24
3.2.2 Experimental Flowchart 2 25
3.3 The YOLOv7 Model 26
3.4 The StyleGAN2 Model 30
3.5 Experimental Settings 33
3.6 Evaluation Metrics 34
Chapter 4 Experimental Results 36
4.1 Experimental Environment 36
4.2 GAN-Generated Images 37
4.3 Generated Mixed Images for BreCaHAD 40
4.4 Results and Analysis: VOC2007 41
4.5 Results and Analysis: BreCaHAD 48
4.6 Ablation Study 52
Chapter 5 Conclusions and Future Work 56
References 58
參考文獻 References |
[1] B. Liu, "Lifelong machine learning: a paradigm for continuous learning," Frontiers of Computer Science, vol. 11, pp. 359-361, 2017.
[2] S. Thrun and T. M. Mitchell, "Lifelong robot learning," Robotics and Autonomous Systems, vol. 15, no. 1-2, pp. 25-46, 1995.
[3] G. M. van de Ven and A. S. Tolias, "Three scenarios for continual learning," arXiv preprint arXiv:1904.07734, 2019.
[4] S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, "iCaRL: Incremental classifier and representation learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2001-2010.
[5] F. M. Castro, M. J. Marín-Jiménez, N. Guil, C. Schmid, and K. Alahari, "End-to-end incremental learning," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 233-248.
[6] Y. Wu et al., "Large scale incremental learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 374-382.
[7] C. V. Nguyen, Y. Li, T. D. Bui, and R. E. Turner, "Variational continual learning," arXiv preprint arXiv:1710.10628, 2017.
[8] H. Shin, J. K. Lee, J. Kim, and J. Kim, "Continual learning with deep generative replay," Advances in Neural Information Processing Systems, vol. 30, 2017.
[9] P. Singh, P. Mazumder, P. Rai, and V. P. Namboodiri, "Rectification-based knowledge retention for continual learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15282-15291.
[10] S. Yan, J. Xie, and X. He, "DER: Dynamically expandable representation for class incremental learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3014-3023.
[11] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 535-541.
[12] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
[13] D.-H. Lee, "Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks," in Workshop on Challenges in Representation Learning, ICML, 2013, vol. 3, no. 2, p. 896.
[14] Y. Teng, A. Choromanska, M. Campbell, S. Lu, P. Ram, and L. Horesh, "Overcoming catastrophic forgetting via direction-constrained optimization," in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.
[15] N. Kamra, U. Gupta, and Y. Liu, "Deep generative dual memory network for continual learning," arXiv preprint arXiv:1710.10368, 2017.
[16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779-788.
[17] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464-7475.
[18] I. Goodfellow et al., "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.
[19] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125-1134.
[20] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401-4410.
[21] W.-C. Kuo, "非侵入式生醫斷層影像簡介 (An introduction to non-invasive biomedical tomographic imaging)," 物理雙月刊 (Physics Bimonthly), vol. 28, no. 4, pp. 698-703, 2006.
[22] R. Weissleder, "Molecular imaging: exploring the next frontier," Radiology, vol. 212, no. 3, pp. 609-614, 1999.
[23] M. L. Mather and C. Baldock, "Ultrasound tomography imaging of radiation dose distributions in polymer gel dosimeters: Preliminary study," Medical Physics, vol. 30, no. 8, pp. 2140-2148, 2003.
[24] D. G. Gadian, NMR and Its Applications to Living Systems. Oxford University Press, 1995.
[25] J. Li et al., "Signet ring cell detection with a semi-supervised learning framework," in Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, June 2-7, 2019, Proceedings, Springer, 2019, pp. 842-854.
[26] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, pp. 303-338, 2010.
[27] The PASCAL VOC2007 dataset, http://host.robots.ox.ac.uk/pascal/VOC/voc2007/
[28] A. Aksac, D. J. Demetrick, T. Ozyer, and R. Alhajj, "BreCaHAD: a dataset for breast cancer histopathological annotation and diagnosis," BMC Research Notes, vol. 12, no. 1, pp. 1-3, 2019.
[29] R. Mehta and C. Ozturk, "Object detection at 200 frames per second," in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[30] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, "Analyzing and improving the image quality of StyleGAN," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8110-8119.
[31] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," arXiv preprint arXiv:1710.10196, 2017.
電子全文 Fulltext |
This electronic fulltext is licensed solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
紙本論文 Printed copies |
Public access information for printed theses is relatively complete from academic year 102 (2013-2014) onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 2026-08-14