Tongue Image Segmentation Method Based on LAE-DeepLabv3+
DOI: 10.12677/sea.2026.152019. Supported by research project funding.
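Authors: Li Yueyue (李月月), Shi Wen (石文)*, Zeng Qinghong (曾庆红), Wei Chuting (魏楚婷), Wang Haoyu (王皓宇), Sun Yaxing (孙雅星), Zheng Tao (郑涛): School of Information Engineering, Tianjin University of Commerce, Tianjin; Zhang Jingyu (张静宇): Tianjin Nankai Hospital, Tianjin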
Keywords: Tongue Image Segmentation, DeepLabv3+, Lightweight Network, EMA Module, Multi-Scale Feature Fusion
Abstract: Tongue image segmentation models tend to have large parameter counts, and they struggle with blurred tongue edges and background interference. To address these problems, a lightweight, high-precision segmentation model, LAE-DeepLabv3+, is proposed. First, EfficientViT replaces the original backbone network, substantially reducing computational complexity. Second, an EMA (Efficient Multi-scale Attention) module is introduced into the low-level feature branch to strengthen the network's perception of tongue boundaries and local details. Finally, horizontal and vertical strip pooling branches are added to the ASPP (Atrous Spatial Pyramid Pooling) module to strengthen the modeling of directional context and long-range dependencies. Experiments on a self-built tongue image dataset show that the model achieves an IoU of 96.47% and a Dice coefficient of 98.20% with only 12.12 M parameters. The method balances segmentation accuracy against model size and provides reliable support for the objective analysis of tongue images.
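The IoU and Dice figures reported in the abstract are standard overlap metrics between a predicted mask and a ground-truth mask. The following is a minimal sketch of how they are commonly computed for binary masks, not the authors' evaluation code. The function names `iou_and_dice` and `dice_from_iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; masks must have the same shape.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled foreground (tongue) in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


def dice_from_iou(iou):
    """For a single mask pair, Dice = 2 * IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


# Toy 3x3 example: 3 overlapping pixels, 4 predicted, 4 in the ground truth.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 0],
          [0, 1, 0]]
iou, dice = iou_and_dice(pred, target)
print(iou, dice)
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU), so an IoU of 0.9647 corresponds to a Dice of about 0.9820. Averages taken over a whole dataset need not satisfy this identity exactly.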
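Cite this article: Li, Y., Shi, W., Zeng, Q., Zhang, J., Wei, C., Wang, H., Sun, Y. and Zheng, T. (2026) Tongue Image Segmentation Method Based on LAE-DeepLabv3+. Software Engineering and Applications, 15(2), 190-204. https://doi.org/10.12677/sea.2026.152019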
