Title page for etd-0523123-175504 (Master's/Doctoral thesis detailed record)
論文名稱 Title
使用多任務學習和對抗式神經網路資料擴增建立真實人物的二次創作圖像角色辨識系統
Derivative Work of Real-World Characters Recognition With Multi-task Learning and GAN Data Augmentation
系所名稱 Department
畢業學年期 Year, semester
語文別 Language
學位類別 Degree
頁數 Number of pages
45
研究生 Author
指導教授 Advisor
召集委員 Convenor
口試委員 Advisory Committee
口試日期 Date of Exam
2023-06-08
繳交日期 Date of Submission
2023-06-23
關鍵字 Keywords
角色辨識、電腦視覺、圖像分類、多任務學習、少樣本、對抗式網路資料擴增
Character recognition, Computer Vision, Image Classification, Multi-Task Learning, Data Scarcity, GAN-based Data Augmentation
統計 Statistics
This thesis has been viewed 93 times and downloaded 0 times.
中文摘要 Abstract (Chinese)
This study aims to solve the problem of recognizing real-world people in images of derivative artworks, a problem that is widespread on creative-sharing platforms and has grown in importance because derivative works use the likenesses of real people, raising portrait-rights concerns. To address it, we designed and implemented a computer vision system that accurately and efficiently identifies the characters in such images, and we ran a series of experiments to find the best model. We also built a new dataset of real-world derivative artwork images and evaluated several deep neural network and machine learning models, including ResNet, Vision Transformer, EfficientNet, and SVM. In addition, we applied multi-task learning and generative adversarial networks to strengthen the model's classification ability while mitigating the few-shot learning problem. The experiments draw on relevant theory and literature, including deep learning, multi-task learning, and generative adversarial networks. The proposed models and methods were evaluated on our dataset via cross-validation, with character recognition as the main task and, as an auxiliary task, classifying the input image as a real photograph, a GAN-generated image, or an image of a derivative artwork. We show that our method achieves high accuracy in character recognition and surpasses the results of prior related research in this area.
Abstract
This study tackles the problem of identifying real-world figures depicted in derivative artwork images, a problem that is prevalent on creative-sharing platforms and has gained importance with the rise of derivative works featuring real-life figures and the portrait-rights issues they raise. To address this problem, we designed and implemented a computer vision system that can accurately and efficiently recognize characters in such images. We also created a novel dataset of real-life derivative artwork images and evaluated different deep neural network and machine learning models for image classification, including ResNet, Vision Transformer, EfficientNet, and an SVM. We additionally employed multi-task learning and generative adversarial networks to enhance the model's ability to classify, while also addressing the issue of limited data availability. The experiments section reviews relevant theories and literature, including deep learning, multi-task learning, and generative adversarial networks. The proposed model undergoes performance assessment on a designated subset of our dataset, with character recognition as the main task and, as an auxiliary task, classifying the input image as a real photograph, a GAN-generated image, or a drawing. We demonstrate that our approach achieves high accuracy in character recognition and outperforms previous studies in this area.
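The multi-task setup the abstract describes, a shared backbone whose features feed a main character-recognition head and an auxiliary image-type head (real photograph / GAN-generated / drawing), can be sketched as a weighted sum of two cross-entropy losses. This is a minimal illustration, not the thesis's implementation: random vectors stand in for backbone embeddings, and the head sizes and the auxiliary weight `lambda_aux` are assumed values for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true class
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

# Stand-in for shared backbone features (e.g. ResNet/ViT embeddings)
n, d = 8, 16
features = rng.normal(size=(n, d))

# Two task-specific linear heads on the shared features
n_characters, n_image_types = 5, 3          # assumed head sizes
W_main = rng.normal(size=(d, n_characters))  # main: character identity
W_aux = rng.normal(size=(d, n_image_types))  # aux: real / GAN / drawing

y_char = rng.integers(0, n_characters, size=n)
y_type = rng.integers(0, n_image_types, size=n)

loss_main = cross_entropy(softmax(features @ W_main), y_char)
loss_aux = cross_entropy(softmax(features @ W_aux), y_type)

# Weighted multi-task objective; lambda_aux is a hypothetical hyperparameter
lambda_aux = 0.5
total_loss = loss_main + lambda_aux * loss_aux
print(f"main={loss_main:.3f} aux={loss_aux:.3f} total={total_loss:.3f}")
```

Both heads backpropagate through the same backbone during training, so the auxiliary image-type signal acts as a regularizer for the main character-recognition task.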
目次 Table of Contents
論文審定書 Thesis Approval Letter i
摘要 Chinese Abstract ii
Abstract iii
TABLE OF CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES vii
CHAPTER 1 Introduction 1
CHAPTER 2 Related Work 5
2.1 Related Character Recognition Research 5
2.2 Painting and Computer Vision 6
2.3 Data Scarcity 8
2.4 Models 10
CHAPTER 3 Methodology 15
3.1 Problem Formulation 15
3.2 Training Stage 16
3.3 Testing Stage 19
3.4 General Outline of The Proposed Approach 19
CHAPTER 4 Experimental Setup 21
4.1 Dataset Description 21
4.2 Data Pre-Processing 22
4.3 Implementation Details of Our Models 22
4.4 Implementation Details of Compared Models 23
4.5 Evaluation Metrics 24
CHAPTER 5 Experiments 25
5.1 Additional Datasets 25
5.2 Multi-Task Learning 27
5.3 Replacing Classifier with SVM 27
5.4 Comparison with Models Proposed in Related Tasks 28
5.4.1 Result of the Vision Transformer 28
5.4.2 Result of the ResNet 30
5.4.3 Result of the EfficientNet 30
5.5 Experimental Results Discussion 30
CHAPTER 6 Conclusion 33
REFERENCES 34
參考文獻 References
Abbas, A., M. M. Abdelsamea and M. M. Gaber (2021). "Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network." Applied Intelligence 51(2): 854-864.

Bansal, M. A., D. R. Sharma and D. M. Kathuria (2022). "A systematic review on data scarcity problem in deep learning: solution and applications." ACM computing surveys (CSUR) 54(10s): 1-29.

Bird, J. J., C. M. Barnes, L. J. Manso, A. Ekárt and D. R. Faria (2022). "Fruit quality and defect image classification with conditional GAN data augmentation." Scientia Horticulturae 293: 110684.

bryandlee (2021). "AnimeGANv2." Retrieved May 6, 2023, from https://github.com/bryandlee/animegan2-pytorch.

Cao, Z., J. C. Principe, B. Ouyang, F. Dalgleish and A. Vuorenkoski (2015). Marine animal classification using combined CNN and hand-designed image features. OCEANS 2015-MTS/IEEE Washington, IEEE.

Castellano, G. and G. Vessio (2021). "Deep learning approaches to pattern extraction and recognition in paintings and drawings: An overview." Neural Computing and Applications 33(19): 12263-12282.

Chavda, A., J. Dsouza, S. Badgujar and A. Damani (2021). Multi-stage CNN architecture for face mask detection. 2021 6th International Conference for Convergence in Technology (i2ct), IEEE.

Chen, S., Y. Zhang and Q. Yang (2021). "Multi-task learning in natural language processing: An overview." arXiv preprint arXiv:2109.09138.

Devlin, J., M.-W. Chang, K. Lee and K. Toutanova (2018). "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805.

Dosovitskiy, A., et al. (2020). "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929.

Geirhos, R., P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann and W. Brendel (2018). "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness." arXiv preprint arXiv:1811.12231.

He, K., X. Zhang, S. Ren and J. Sun (2016). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition.

Hijazi, S., R. Kumar and C. Rowen (2015). "Using convolutional neural networks for image recognition." Cadence Design Systems Inc.: San Jose, CA, USA 9.

Huang, Z., M. Dong, Q. Mao and Y. Zhan (2014). Speech emotion recognition using CNN. Proceedings of the 22nd ACM international conference on Multimedia.

Jignesh Chowdary, G., N. S. Punn, S. K. Sonbhadra and S. Agarwal (2020). Face mask detection using transfer learning of inceptionv3. International Conference on Big Data Analytics, Springer.

Khan, S., M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan and M. Shah (2022). "Transformers in vision: A survey." ACM computing surveys (CSUR) 54(10s): 1-41.

Krizhevsky, A., I. Sutskever and G. E. Hinton (2017). "ImageNet classification with deep convolutional neural networks." Communications of the ACM 60(6): 84-90.

Kurt, Z. and K. Özkan (2017). An image-based recommender system based on feature extraction techniques. 2017 International Conference on Computer Science and Engineering (UBMK), IEEE.

LeCun, Y., Y. Bengio and G. Hinton (2015). "Deep learning." Nature 521(7553): 436-444.

Li, B., Y. Zhu, Y. Wang, C.-W. Lin, B. Ghanem and L. Shen (2021). "AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation." IEEE Transactions on Multimedia.

Li, H. (2015). The research of intelligent image recognition technology based on neural network. 2015 International Conference on Intelligent Systems Research and Mechatronics Engineering, Atlantis Press.

Naftali, M. G., J. S. Sulistyawan, K. Julian and F. I. Kurniadi (2022). "AniWho: A Quick and Accurate Way to Classify Anime Character Faces in Images." arXiv preprint arXiv:2208.11012.

Ochoa, T. T. (2000). "Introduction: Tiger Woods and the First Amendment." Whittier L. Rev. 22: 381.

Parmar, N., et al. (2018). Image transformer. International conference on machine learning, PMLR.

Rios, E. A., W.-H. Cheng and B.-C. Lai (2021). "DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition." arXiv preprint arXiv:2101.08674.

Russakovsky, O., et al. (2015). "Imagenet large scale visual recognition challenge." International journal of computer vision 115(3): 211-252.

Soni, B., D. Thakuria, N. Nath, N. Das and B. Boro (2023). "RikoNet: A Novel Anime Recommendation Engine." Multimedia Tools and Applications: 1-20.

Tan, M. and Q. Le (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. International conference on machine learning, PMLR.

Tang, Y. (2013). "Deep learning using linear support vector machines." arXiv preprint arXiv:1306.0239.

Vaswani, A., et al. (2017). "Attention is all you need." Advances in neural information processing systems 30.

Weiss, K., T. M. Khoshgoftaar and D. Wang (2016). "A survey of transfer learning." Journal of Big data 3(1): 1-40.

Yi, R., Y.-J. Liu, Y.-K. Lai and P. L. Rosin (2019). APDrawingGAN: Generating artistic portrait drawings from face photos with hierarchical GANs. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Zhang, H., S. Liu, C. Zhang, W. Ren, R. Wang and X. Cao (2016). SketchNet: Sketch classification with web images. Proceedings of the IEEE conference on computer vision and pattern recognition.

Zhang, X., J. Zhou, W. Sun and S. K. Jha (2022). "A lightweight CNN based on transfer learning for COVID-19 diagnosis." Computers, Materials and Continua: 1123-1137.
電子全文 Fulltext
The electronic full text is licensed for personal, non-profit retrieval, reading, and printing for academic research purposes only. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it, to avoid infringement.
論文使用權限 Thesis access permission: user-defined release date
開放時間 Available:
校內 Campus: available for download from 2026-06-23
校外 Off-campus: available for download from 2026-06-23


紙本論文 Printed copies
Availability information for printed copies is relatively complete for academic year 102 (2013-14) and later. To check the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 2026-06-23
