Research on Lightweight Super-Resolution Network Based on CNN and Transformer
DOI: 10.12677/CSA.2023.131010. Supported by research project funding.
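Authors: Guangming Li, Jin Jin, Jia He*: School of Computer Science, Chengdu University of Information Technology, Chengdu, Sichuan; Qian Zhang: Active Network (Chengdu) Co., Ltd., Chengdu, Sichuan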
Keywords: Single Image Super-Resolution, Convolutional Neural Network, Swin Transformer, Attention Mechanism, Dynamic Convolution
Abstract: With the development of deep learning, single image super-resolution (SISR) has made great progress. However, most existing work relies on convolutional neural networks and builds ever deeper models with a large number of layers. Such models are difficult to deploy in real-world scenarios because their complex operations inevitably bring high computational and memory costs. To address this, we propose a lightweight hybrid model for single image super-resolution reconstruction: a lightweight fused CNN-Swin Transformer network. Specifically, we use Swin Transformer blocks with shifted windows to learn the long-range dependencies of the image, and we build a CNN-based local feature extraction block to extract local feature details effectively. In addition, a multi-path dynamic convolution block is designed to learn the edge features of the image. Experimental results show that the proposed model achieves better results than Transformer-based single image super-resolution models.
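The shifted-window idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It only shows, on a toy 4x4 grid, how a regular window partition and a cyclically shifted partition group different positions together, which is what lets information flow across window borders in a Swin Transformer block. The helper names `cyclic_shift` and `window_partition`, the grid size, and the window size of 2 are all assumptions chosen for illustration.

```python
def cyclic_shift(grid, shift):
    """Roll a 2-D grid up and to the left by `shift` positions, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split a 2-D grid into non-overlapping window x window tiles (row-major)."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            tiles.append([row[left:left + window]
                          for row in grid[top:top + window]])
    return tiles


# Toy 4x4 "feature map" whose entries are position indices 0..15 (hypothetical data).
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
window = 2  # assumed window size, for illustration only

# Regular partition: self-attention would be computed inside each tile separately.
regular = window_partition(feature_map, window)

# Shifted partition: roll by half a window first, so the new tiles straddle
# the borders of the regular tiles.
shifted = window_partition(cyclic_shift(feature_map, window // 2), window)

print(regular[0])  # [[0, 1], [4, 5]]
print(shifted[0])  # [[5, 6], [9, 10]]
```

In the shifted partition, the first tile contains positions 5, 6, 9 and 10, one from each of the four regular tiles. The last shifted tile wraps around the grid edges, which is why the original Swin Transformer design additionally masks attention between positions that are not actually adjacent; that masking step is omitted here.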
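Cite this article: Li, G.M., Zhang, Q., Jin, J. and He, J. (2023) Research on Lightweight Super-Resolution Network Based on CNN and Transformer. Computer Science and Application, 13(1), 93-103. https://doi.org/10.12677/CSA.2023.131010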
