Lightweight Super-Resolution Network Based on Multi-Scale Local and Global Feature Fusion

DOI: 10.12677/jisp.2026.152020
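Authors: Junjie Feng (冯俊杰), Zhiyong Hong* (洪智勇), Liping Xiong* (熊利平), Xueying Lao (劳雪颖): School of Electronics and Information Engineering, Wuyi University, Jiangmen, Guangdong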
Keywords: Lightweight Super-Resolution; Large Kernel Convolution; Feature Fusion; Multi-Scale

Abstract: Current research in image super-resolution (SR) focuses on lightweight models that balance reconstruction performance against resource consumption. Transformers excel at global feature modeling, but they incur very high computational cost and parameter counts and are weak at capturing fine-grained local detail. Convolutional neural networks (CNNs), by contrast, consume fewer resources but lack the ability to capture global features. To address these issues, this paper proposes a pure CNN-based model, the Multi-scale Local and Global Feature Fusion Network (MLGFFN). MLGFFN uses a dual-branch architecture in which multi-scale large kernel convolution blocks capture global feature information and local feature extraction blocks capture local feature information, and it improves the way the two are fused. In addition, a Feature Enhancement Feed-forward Network (FEFN) further refines the feature maps from the preceding stage, emphasizing important feature information while suppressing redundant features. Extensive experiments show that MLGFFN outperforms existing lightweight SR models and achieves a good balance between lightweight design and reconstruction performance. In particular, on the Set14 dataset at a ×4 upscaling factor, MLGFFN achieves a PSNR 0.21 dB higher than that of the SMSR model while using only 37.38% of SMSR's parameters and 40% of its FLOPs.
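To make the 0.21 dB PSNR figure in the abstract more concrete, the following is a minimal sketch of the standard PSNR definition, PSNR = 10·log10(MAX² / MSE), together with the mean-squared-error ratio that a given dB gain implies. It is an illustration only, not the authors' evaluation code: the function names `psnr` and `mse_ratio_for_gain`, the 8-bit peak value of 255, and the toy pixel values are all assumptions, and the abstract does not state the evaluation protocol (color channel, border cropping) used in the paper.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value=255.0 assumes 8-bit images (an assumption, not from the source).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical toy pixels: every value is off by exactly 1, so MSE = 1.
    ref = [10, 20, 30, 40]
    est = [11, 19, 31, 39]
    print(f"toy PSNR: {psnr(ref, est):.2f} dB")
    # A 0.21 dB gain corresponds to an MSE roughly 4.7% lower.
    print(f"MSE ratio for +0.21 dB: {mse_ratio_for_gain(0.21):.4f}")
```

Because PSNR is logarithmic, a 0.21 dB difference at a fixed peak value corresponds to roughly a 4.7% lower mean squared error, which is why gains of a few tenths of a dB are considered meaningful when comparing lightweight SR models.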

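Cite this article: Feng, J., Hong, Z., Xiong, L. and Lao, X. Lightweight Super-Resolution Network Based on Multi-Scale Local and Global Feature Fusion [J]. Journal of Image and Signal Processing, 2026, 15(2): 235-247. https://doi.org/10.12677/jisp.2026.152020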

