Title page for thesis/dissertation etd-0804124-113840
Title
Histopathology Tissue Semantic Segmentation Using Image-level Labels and Synthetic Images
Department
Year, semester
Language
Degree
Number of pages
56
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2024-07-24
Date of Submission
2024-09-04
Keywords
weakly-supervised semantic segmentation, histopathology tissue image segmentation, semantic segmentation, deep supervision, synthetic images
Statistics
The thesis/dissertation has been browsed 84 times and downloaded 6 times.
Abstract
Semantic segmentation of histopathology tissue images aims to delineate tumor or cancer-cell regions in tissue slide images, helping doctors diagnose patients' conditions quickly and accurately. This process is a crucial component of computer-assisted diagnosis (CAD). Achieving reliable segmentation results typically requires training a segmentation model on a large set of fully annotated images, which entails considerable manual annotation effort and cost. Recently, interest in using image-level labels for weakly supervised semantic segmentation of histopathology tissue has been on the rise, as such labels are much cheaper to obtain. The most common approach first trains a classifier with image-level labels and then generates pixel-level pseudo-labels from class activation maps (CAMs) to supervise the training of the segmentation model. Because image-level labels carry no tissue-boundary information, however, the resulting CAMs often fail to delineate the contours of the target accurately. To remedy this problem, this thesis presents a method that exploits the characteristics of histopathology tissue images to create synthetic images with pixel-level labels, providing more detailed supervision. In addition, the method replaces the traditional convolutional neural network backbone with a transformer-based architecture, allowing the model to learn from relationships among image regions, and applies deep supervision to ensure that the intermediate network layers are trained more thoroughly. Experimental results on the LUAD-HistoSeg, BCSS-WSSS, and GlaS datasets show that the proposed method significantly improves segmentation accuracy, demonstrating the effectiveness of generating synthetic images and of using a transformer architecture with deep supervision.
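For readers unfamiliar with the weakly supervised pipeline summarized above, the following is a minimal PyTorch sketch of the two generic techniques it names: CAM-based pseudo-labeling [18] and deep supervision. Everything here (function names, tensor shapes, the 0.5 threshold) is an illustrative assumption, not the thesis's actual implementation.

    import torch
    import torch.nn.functional as F

    def cam_pseudo_labels(features, fc_weight, image_labels, threshold=0.5):
        # features:     (B, K, H, W) feature maps before global average pooling
        # fc_weight:    (C, K) weights of the classifier's final fully connected layer
        # image_labels: (B, C) binary multi-label image-level annotations
        # Class activation map: per-class weighted sum of feature maps [18].
        cams = F.relu(torch.einsum("ck,bkhw->bchw", fc_weight, features))
        # Normalize each map to [0, 1] so one threshold works for every class.
        cams = cams / cams.amax(dim=(2, 3), keepdim=True).clamp(min=1e-6)
        # Zero out classes that the image-level label marks as absent.
        cams = cams * image_labels[:, :, None, None]
        scores, labels = cams.max(dim=1)   # per-pixel best class and its score
        labels[scores < threshold] = 255   # 255 = ignore: low-confidence pixels
        return labels                      # (B, H, W) pixel-level pseudo-labels

    def deep_supervision_loss(aux_logits, pseudo_labels):
        # Deep supervision: apply the segmentation loss to several intermediate
        # outputs, not only the final head, so middle layers get a direct signal.
        loss = 0.0
        for logits in aux_logits:          # each (B, C, h, w) at its own scale
            up = F.interpolate(logits, size=pseudo_labels.shape[-2:],
                               mode="bilinear", align_corners=False)
            loss = loss + F.cross_entropy(up, pseudo_labels, ignore_index=255)
        return loss

In the two-phase method the abstract describes, pseudo-labels of this kind would supervise the real images, while the synthetic images carry exact pixel-level labels by construction.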
Table of Contents
Thesis Certification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract (in Chinese) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2 Related Work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Semantic Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Weakly Supervised Semantic Segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Histopathology Tissue Semantic Segmentation using Image-level Labels . . . . 9
2.4 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Chapter 3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1 Synthetic Images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Phase 1: Enhancing Feature Learning in Intermediate Network Layers with Deep Supervision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Generate Pseudo-labels for Real Images through the Fused Outputs from Intermediate Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Phase 2: Combining Real and Synthetic Images for Training a Semantic Segmentation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Chapter 4 Experiments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Experiment Details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 Main Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.4 Ablation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.5 Qualitative Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Chapter 5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
References
[1] C. Han, J. Lin, J. Mai, Y. Wang, Q. Zhang, B. Zhao, X. Chen, X. Pan, Z. Shi, Z. Xu, S. Yao, L. Yan, H. Lin, X. Huang, C. Liang, G. Han, and Z. Liu, "Multi-layer pseudo-supervision for histopathology tissue semantic segmentation using patch-level classification labels," Medical Image Analysis, p. 102487, 2022.
[2] S. Zhang, J. Zhang, Y. Xie, and Y. Xia, "TPRO: Text-prompting-based weakly supervised histopathology tissue segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2023, pp. 109–118.
[3] Z. Fang, Y. Chen, Y. Wang, Z. Wang, X. Ji, and Y. Zhang, "Weakly-supervised semantic segmentation for histopathology images based on dataset synthesis and feature consistency constraint," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023, pp. 606–613.
[4] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[5] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[6] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[7] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," in Advances in Neural Information Processing Systems, 2021.
[8] J. Gu, H. Kwon, D. Wang, W. Ye, M. Li, Y.-H. Chen, L. Lai, V. Chandra, and D. Z. Pan, "Multi-scale high-resolution vision transformer for semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12094–12103.
[9] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in Proceedings of the International Conference on Learning Representations, 2021.
[10] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang et al., "Deep high-resolution representation learning for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3349–3364, 2020.
[11] Z. Huang, X. Wang, J. Wang, W. Liu, and J. Wang, "Weakly-supervised semantic segmentation network with deep seeded region growing," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7014–7023.
[12] A. Kolesnikov and C. H. Lampert, "Seed, expand and constrain: Three principles for weakly-supervised image segmentation," in Proceedings of the European Conference on Computer Vision. Springer, 2016, pp. 695–711.
[13] A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei, "What's the point: Semantic segmentation with point supervision," in Proceedings of the European Conference on Computer Vision. Springer, 2016, pp. 549–565.
[14] R. Qian, Y. Wei, H. Shi, J. Li, J. Liu, and T. Huang, "Weakly supervised scene parsing with point-based distance metric learning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8843–8850.
[15] J. Dai, K. He, and J. Sun, "BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1635–1643.
[16] A. Khoreva, R. Benenson, J. Hosang, M. Hein, and B. Schiele, "Simple does it: Weakly supervised instance and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 876–885.
[17] D. Lin, J. Dai, J. Jia, K. He, and J. Sun, "ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3159–3167.
[18] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2921–2929.
[19] L. Ru, Y. Zhan, B. Yu, and B. Du, "Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 16846–16855.
[20] L. Ru, H. Zheng, Y. Zhan, and B. Du, "Token contrast for weakly-supervised semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3093–3102.
[21] Z. Qian, K. Li, M. Lai, E. I. Chang, B. Wei, Y. Fan, Y. Xu et al., "Transformer based multiple instance learning for weakly supervised histopathology image segmentation," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2022.
[22] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
[23] L. Chan, M. S. Hosseini, C. Rowsell, K. N. Plataniotis, and S. Damaskinos, "HistoSegNet: Semantic segmentation of histological tissue type in whole slide images," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 10662–10671.
[24] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.
[25] Y. Li, Y. Yu, Y. Zou, T. Xiang, and X. Li, "Online easy example mining for weakly-supervised gland segmentation from histology images," in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022, pp. 578–587.
[26] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., "Learning transferable visual models from natural language supervision," in Proceedings of the International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.
[27] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 4171–4186.
[28] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, "GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification," Neurocomputing, vol. 321, pp. 321–331, 2018.
[29] A. Zhao, G. Balakrishnan, F. Durand, J. V. Guttag, and A. V. Dalca, "Data augmentation using learned transformations for one-shot medical image segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8543–8553.
[30] K. Sirinukunwattana, J. P. Pluim, H. Chen, X. Qi, P.-A. Heng, Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez et al., "Gland segmentation in colon histology images: The GlaS challenge contest," Medical Image Analysis, vol. 35, pp. 489–502, 2017.
[31] Y.-T. Chang, Q. Wang, W.-C. Hung, R. Piramuthu, Y.-H. Tsai, and M.-H. Yang, "Weakly-supervised semantic segmentation via sub-category exploration," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8991–9000.
[32] Y. Wang, J. Zhang, M. Kan, S. Shan, and X. Chen, "Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12275–12284.
[33] Z. Chen, Z. Tian, J. Zhu, C. Li, and S. Du, "C-CAM: Causal CAM for weakly supervised semantic segmentation on medical image," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11676–11685.
[34] J. Lee, E. Kim, and S. Yoon, "Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4071–4080.
Fulltext
This electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: unrestricted (fully open both on and off campus)
Available:
Campus: available
Off-campus: available


Printed copies
Availability information for printed theses is relatively complete from ROC academic year 102 (2013–2014) onward. To inquire about the availability of printed theses from academic year 101 (2012–2013) or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
