Semantic Segmentation Based on Multi-Channel All-Optical Convolutional Neural Networks
DOI: 10.12677/mos.2025.145393
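Authors: Xiaodong Xie (谢小东), Yuchao Zhang (张雨超)*: Institute of Photonic Chips, University of Shanghai for Science and Technology, Shanghai; School of Artificial Intelligence Science and Technology, Centre for Artificial-Intelligence Nanophotonics, University of Shanghai for Science and Technology, Shanghai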
Keywords: All-Optical Convolutional Neural Network; Optical Convolution; Semantic Segmentation; Multi-Channel
Abstract: Optical neural networks (ONNs), a new artificial-intelligence technology, have attracted extensive research and application owing to their high computing power and low energy consumption. However, conventional ONNs have simple architectures and a limited ability to handle complex tasks. Here, we propose a novel multi-channel optical convolutional neural network (MC-OCNN) architecture for semantic segmentation in machine vision. The MC-OCNN is built from three kinds of phase profiles: lens phases, which implement the optical convolution; random phases, which form the optical convolution kernels; and grating phases, which realize the multi-channel architecture. The MC-OCNN has a multi-layer structure formed by cascading optical convolutional layers and optical fully connected layers, and it performs convolutions in parallel at the speed of light. It extracts different features of an object across multiple channels and extracts object features hierarchically across multiple layers. We apply the MC-OCNN to portrait segmentation and aircraft segmentation tasks and obtain promising results, laying a foundation for its further application in machine vision, autonomous driving, smart healthcare, and related fields.
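To make the abstract's description of optical convolution more concrete, the following NumPy sketch models an idealized 4f system. It is an illustrative assumption, not the authors' MC-OCNN simulator: each lens is treated as an ideal Fourier transform instead of explicitly simulating the thin-lens quadratic phase, -π(x²+y²)/(λf), followed by free-space propagation; a random phase placed in the Fourier plane acts as the convolution kernel; and a linear grating phase added to that mask shifts the output laterally, which is one plausible way separate channels could be routed to different detector regions. All function names (`grating_phase`, `optical_conv_4f`, `psf_from_phase`, `circular_conv2d`) are hypothetical.

```python
import numpy as np


def grating_phase(shape, shift):
    """Linear phase ramp ("grating") for the Fourier plane (hypothetical helper).

    By the Fourier shift theorem, multiplying the spectrum by
    exp(i * ramp) translates the output image by shift = (dy, dx)
    pixels (with wrap-around, since the discrete FFT is periodic).
    """
    ny, nx = shape
    fy = np.fft.fftfreq(ny)[:, None]  # spatial frequency along y, cycles/pixel
    fx = np.fft.fftfreq(nx)[None, :]  # spatial frequency along x, cycles/pixel
    return -2 * np.pi * (fy * shift[0] + fx * shift[1])


def optical_conv_4f(field, kernel_phase, shift=(0, 0)):
    """Idealized 4f convolution: lens 1 = FFT, phase mask, lens 2 = inverse FFT."""
    mask = np.exp(1j * (kernel_phase + grating_phase(field.shape, shift)))
    spectrum = np.fft.fft2(field)         # lens 1: image plane -> Fourier plane
    return np.fft.ifft2(spectrum * mask)  # lens 2: Fourier plane -> output plane


def psf_from_phase(kernel_phase):
    """Spatial kernel (point-spread function) equivalent to a Fourier-plane phase mask."""
    return np.fft.ifft2(np.exp(1j * kernel_phase))


def circular_conv2d(a, b):
    """Direct circular 2-D convolution, used only to check the optical result."""
    out = np.zeros(a.shape, dtype=complex)
    for m in range(a.shape[0]):
        for n in range(a.shape[1]):
            # copy of b shifted by (m, n), weighted by a[m, n]
            out += a[m, n] * np.roll(b, (m, n), axis=(0, 1))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    field = rng.random((8, 8))                    # input amplitude image
    phi = rng.uniform(0, 2 * np.pi, size=(8, 8))  # random phase acting as the kernel
    out = optical_conv_4f(field, phi)
    print(np.allclose(out, circular_conv2d(field, psf_from_phase(phi))))  # True
    # A grating phase moves the same result to another detector position
    shifted = optical_conv_4f(field, phi, shift=(2, 3))
    print(np.allclose(shifted, np.roll(out, (2, 3), axis=(0, 1))))       # True
    intensity = np.abs(shifted) ** 2              # a camera records |u|^2
    print(intensity.shape)                        # (8, 8)
```

Because the mask has unit modulus, this idealized convolution conserves optical energy, and the only nonlinearity in the sketch is square-law detection. Cascading layers, as the abstract describes, would correspond to feeding one layer's output field into the next `optical_conv_4f` call; how the actual MC-OCNN combines multiple kernels and gratings on one element is not specified here.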
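Citation: Xie, X. and Zhang, Y. (2025) Semantic Segmentation Based on Multi-Channel All-Optical Convolutional Neural Networks. Modeling and Simulation, 14(5), 282-292. https://doi.org/10.12677/mos.2025.145393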
