Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available
Title |
透過潛在向量生成老化與年輕化的人臉圖像的分析 Analysis of Generative Facial Aging and De-aging Images via Latent Vectors |
||
Department |
|||
Year, semester |
Language |
||
Degree |
Number of pages |
90 |
|
Author |
|||
Advisor |
|||
Convenor |
|||
Advisory Committee |
|||
Date of Exam |
2026-03-12 |
Date of Submission |
2026-03-12 |
Keywords |
Generative AI, Face Image, Face Aging and De-Aging, Latent Vectors, Learned Perceptual Image Patch Similarity (LPIPS) |
||
Statistics |
This thesis has been viewed 116 times and downloaded 3 times. |
Abstract |
Most commercially available generative artificial intelligence (GAI) systems, such as ChatGPT and Gemini, do not release open-source code for research. Age Synthesis (AS) is open source, but it generates face images at different ages by manipulating latent vectors and relies on an iteration-based optimization process to produce each age-transformed image. This approach gives no clear criterion for choosing the number of iterations, and it tends to result in excessive execution time. To address these issues, this thesis first traces the code of the AS software in detail. After an in-depth investigation, we organize its architecture into five functional modules and analyze the workflow of each module and the parameters passed between them. In place of the original iteration-based termination strategy, we propose a threshold-based method that ends the optimization once the Learned Perceptual Image Patch Similarity (LPIPS) reaches a given threshold. Finally, we compare the proposed threshold-based method with the original iteration-based method in terms of execution time and generated image quality, and we further evaluate the effects of different LPIPS thresholds and different target ages on the generated results. The experimental results show that, compared with the iteration-based approach, the threshold-based method preserves the identity features of the original face image during age transformation and substantially reduces the average execution time of the AS software. |
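The difference between the two termination strategies described in the abstract can be illustrated with a minimal sketch. It is not the AS software or the thesis code: the function names, the learning rate, and the toy target vector are all hypothetical, and a mean squared difference stands in for LPIPS so that the example runs without any image model. Only the stopping logic is the point here.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def toy_distance(a: Vector, b: Vector) -> float:
    """Mean squared difference. A stand-in for LPIPS, NOT the real metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def optimize_latent(
    target: Vector,
    start: Vector,
    threshold: Optional[float] = None,
    max_steps: int = 200,
    lr: float = 0.5,
) -> Tuple[Vector, int, float]:
    """Gradient descent on a toy latent vector.

    threshold=None  -> iteration-based: always runs max_steps updates.
    threshold=value -> threshold-based: stops once distance <= threshold
                       (max_steps remains as a safety cap).
    Returns (latent, steps_used, final_distance).
    """
    latent = list(start)
    n = len(latent)
    steps = 0
    current = toy_distance(latent, target)
    while steps < max_steps:
        if threshold is not None and current <= threshold:
            break  # good enough: stop early instead of finishing the loop
        # Analytic gradient of the mean squared difference w.r.t. each entry.
        latent = [w - lr * 2.0 * (w - t) / n for w, t in zip(latent, target)]
        steps += 1
        current = toy_distance(latent, target)
    return latent, steps, current


if __name__ == "__main__":
    target = [1.0, -0.5, 0.25, 0.0]   # hypothetical "target image" features
    start = [0.0, 0.0, 0.0, 0.0]      # hypothetical initial latent vector

    _, fixed_steps, fixed_dist = optimize_latent(target, start)
    _, thr_steps, thr_dist = optimize_latent(target, start, threshold=0.01)

    print(f"iteration-based: steps={fixed_steps}, distance={fixed_dist:.6f}")
    print(f"threshold-based: steps={thr_steps}, distance={thr_dist:.6f}")
```

In this toy setting the threshold-based run stops as soon as the distance is small enough, while the iteration-based run always spends the full step budget. In a real pipeline the distance would be computed between a generated image and a reference image, and a suitable threshold would have to be chosen experimentally.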
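Table of Contents |
Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Method
  1.3 Contributions
  1.4 Thesis Organization
Chapter 2 Image Generation Models and Face Age
  2.1 Generative Adversarial Models
  2.2 Age Synthesis
  2.3 StyleGAN2 and StyleGAN3
  2.4 Projector
  2.5 Related Work
Chapter 3 Face Aging and De-aging with Latent Vectors
  3.1 Face Aging and De-aging Workflow
  3.2 Latent Vector Optimization
  3.3 Generating Face Images at a Specified Age
  3.4 Pseudocode for Face Aging and De-aging
    3.4.1 Pseudocode for Age Synthesis
    3.4.2 Pseudocode for Projector
    3.4.3 Pseudocode for Age Vector
    3.4.4 Pseudocode for Latent Edit
Chapter 4 Results and Analysis
  4.1 Parameter Settings
  4.2 Generating Images with Iterations and with Thresholds
    4.2.1 Generating Images with Different Methods
    4.2.2 Generating Images at Different Ages with Different Methods
  4.3 Aging and De-aging with a Fixed Threshold
  4.4 Results of Varying the Threshold
  4.5 Different Termination Methods with Real Face Images
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Difficulties Encountered in This Thesis
  5.3 Future Work
References
Appendix 1 Computing LPIPS
Appendix 2 Computing the Age Offset
Acronyms
Index |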
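Full text |
This electronic full text is licensed to users solely for academic research purposes, for personal, non-profit searching, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. |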
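Printed copies |
Availability information for printed theses is relatively complete from ROC academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the library and information office. We apologize for any inconvenience. Available: available |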