U-Shaped Low-Illumination Image Enhancement Algorithm Based on Transformer
DOI: 10.12677/mos.2024.133318. Supported by the National Natural Science Foundation of China.
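Authors: Miao Tianheng (缪天恒), Wang Fei (王飞), Ding Derui (丁德锐): School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai; Liang Yan (梁艳): School of Management, University of Shanghai for Science and Technology, Shanghai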
Keywords: Low Illumination Enhancement, Transformer, Attention Mechanism, Feature Fusion
Abstract: Illumination-adaptive algorithms have inherent shortcomings under non-uniform lighting, and CNN-based image enhancement models are limited by the convolution operation itself: the receptive field is restricted and long-range global dependencies cannot be established. To address these problems, this paper proposes RT-UNet, a U-shaped network that integrates CA-Transformer modules and global attention fusion modules. The CA-Transformer module, built on an axial multi-head self-attention mechanism, serves as the basic module for feature extraction and reconstruction; it combines the strengths of CNN and Transformer structures while greatly reducing computational complexity. To enable information interaction and fusion between feature maps of different scales and resolutions, a global attention fusion module replaces the earlier residual connections, helping the network attend to regions of greater interest and learn more useful, finer-grained features. Experimental results show that the proposed algorithm is highly competitive with mainstream image enhancement algorithms of recent years on both subjective and objective evaluation metrics.
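The abstract attributes the reduced computational cost to axial multi-head self-attention. As a rough illustration of that general idea only, the NumPy sketch below attends along each row of a feature map and then along each column, instead of over all H×W positions at once. It is not the authors' CA-Transformer module: the function names, the residual additions, the random projection weights, and the head count are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_1d(x, wq, wk, wv, num_heads):
    """Multi-head self-attention along the second-to-last axis of x: (..., L, C)."""
    *lead, L, C = x.shape
    d = C // num_heads  # per-head channel width (C must divide evenly)

    def split(t):
        # (..., L, C) -> (..., heads, L, d)
        return np.swapaxes(t.reshape(*lead, L, num_heads, d), -2, -3)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)  # (..., heads, L, L)
    out = softmax(scores, axis=-1) @ v                # (..., heads, L, d)
    return np.swapaxes(out, -2, -3).reshape(*lead, L, C)

def axial_attention(x, params, num_heads=4):
    """x: (H, W, C). Attend within rows, then within columns (hypothetical layout)."""
    x = x + attention_1d(x, *params["row"], num_heads)   # sequences of length W
    xt = np.swapaxes(x, 0, 1)                            # (W, H, C)
    col = attention_1d(xt, *params["col"], num_heads)    # sequences of length H
    return x + np.swapaxes(col, 0, 1)

def score_entries(H, W):
    """Attention-score entries per head: full 2-D attention vs. axial attention."""
    return (H * W) ** 2, H * W * (H + W)

# Usage with random (untrained) projection weights, for shape checking only.
rng = np.random.default_rng(0)
H, W, C = 8, 6, 16
params = {
    axis: tuple(rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    for axis in ("row", "col")
}
y = axial_attention(rng.standard_normal((H, W, C)), params)
```

Counting attention-score entries per head shows where the saving comes from: full self-attention over an H×W map needs (HW)² entries, while the row and column passes together need HW(H+W). For a 64×64 map that is 16,777,216 versus 524,288, a 32-fold reduction.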
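Citation: Miao Tianheng, Liang Yan, Wang Fei, Ding Derui. U-Shaped Low-Illumination Image Enhancement Algorithm Based on Transformer [J]. Modeling and Simulation, 2024, 13(3): 3491-3506. https://doi.org/10.12677/mos.2024.133318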
