Attention Fusion-Based Method for OCT Retinal Disease Classification
Abstract: Optical Coherence Tomography (OCT) is a non-invasive, high-resolution medical imaging technique that clearly visualizes retinal layer structures and pathological features, and it plays an important role in the computer-aided diagnosis of ophthalmic diseases. For OCT image classification, most existing methods model the raw image with a single-stream convolutional neural network. While such models achieve reasonable performance, they remain limited in jointly representing global structural information and local detail, and therefore struggle to capture key discriminative cues such as lesion boundaries, texture abnormalities, and disruptions of the layered retinal structure. To address this, this paper proposes an OCT retinal disease classification method based on dual-stream input and bidirectional attention fusion. First, a dual-stream input is constructed: one stream is the raw OCT image, which preserves global structural information; the other is an auxiliary representation image that emphasizes edges and local abnormalities, so that the two streams form complementary representations. Second, a dual-branch feature extraction network encodes the two inputs separately, and a bidirectional attention fusion module enables cross-branch information interaction, letting the two feature streams guide each other and strengthening the representation of key lesion regions. Finally, the fused features are fed into a classification head to predict the disease category. Experimental results on the OCT2017 dataset show that the proposed method outperforms multiple baseline approaches in accuracy, recall, and F1-score, confirming the effectiveness of the dual-stream structure and the bidirectional attention mechanism. The method offers an effective multi-feature fusion modeling approach for OCT image-based lesion classification.
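As a rough illustration of the pipeline described in the abstract, the following PyTorch sketch assembles a dual-stream classifier with bidirectional cross-attention fusion. It is a minimal sketch under stated assumptions, not the authors' implementation: the Sobel-based edge map for the auxiliary stream, the small convolutional encoders, multi-head cross-attention as the fusion operator, and all names (`sobel_edge_map`, `ConvEncoder`, `BidirectionalAttentionFusion`, `DualStreamOCTClassifier`) are hypothetical choices, since the abstract does not specify these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_edge_map(x: torch.Tensor) -> torch.Tensor:
    """Auxiliary edge-emphasizing stream from a grayscale OCT image.
    The paper does not specify the operator; a Sobel gradient magnitude
    is one plausible choice (assumption)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)


class ConvEncoder(nn.Module):
    """Small CNN branch; in practice any backbone could serve as the encoder."""
    def __init__(self, in_ch: int = 1, dim: int = 128):
        super().__init__()
        layers, prev = [], in_ch
        for c in (32, 64, 128, dim):
            layers += [nn.Conv2d(prev, c, 3, stride=2, padding=1),
                       nn.BatchNorm2d(c), nn.ReLU(inplace=True)]
            prev = c
        self.net = nn.Sequential(*layers)

    def forward(self, x):                      # (B, in_ch, H, W) -> (B, dim, H/16, W/16)
        return self.net(x)


class BidirectionalAttentionFusion(nn.Module):
    """Cross-branch fusion: each stream attends to the other in both directions,
    so structural and edge features can guide one another."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.a2b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)

    def forward(self, fa, fb):
        ta = fa.flatten(2).transpose(1, 2)     # (B, HW, C) tokens of the raw-image branch
        tb = fb.flatten(2).transpose(1, 2)     # (B, HW, C) tokens of the edge branch
        ta_new, _ = self.b2a(ta, tb, tb)       # raw branch queries edge features
        tb_new, _ = self.a2b(tb, ta, ta)       # edge branch queries raw features
        ta = self.norm_a(ta + ta_new)
        tb = self.norm_b(tb + tb_new)
        return torch.cat([ta.mean(1), tb.mean(1)], dim=-1)   # (B, 2*C) pooled fused feature


class DualStreamOCTClassifier(nn.Module):
    def __init__(self, num_classes: int = 4, dim: int = 128):
        super().__init__()
        self.enc_raw = ConvEncoder(1, dim)
        self.enc_edge = ConvEncoder(1, dim)
        self.fusion = BidirectionalAttentionFusion(dim)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, x):                      # x: (B, 1, H, W) grayscale OCT image
        fa = self.enc_raw(x)
        fb = self.enc_edge(sobel_edge_map(x))
        return self.head(self.fusion(fa, fb))


if __name__ == "__main__":
    # OCT2017 contains four classes (CNV, DME, DRUSEN, NORMAL).
    model = DualStreamOCTClassifier(num_classes=4)
    logits = model(torch.randn(2, 1, 224, 224))
    print(logits.shape)                        # torch.Size([2, 4])
```

Token-wise mean pooling followed by concatenation is only one possible readout after the bidirectional attention step; gated or weighted fusion of the two branches would fit the same overall structure.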
Article citation: Hao, Y.D. (2026) Attention Fusion-Based Method for OCT Retinal Disease Classification. Computer Science and Application, 16(5), 33-40. https://doi.org/10.12677/csa.2026.165161
