Face Super-Resolution Reconstruction Method Based on Attention Mechanism
Abstract: Face super-resolution aims to recover a high-resolution face image from a low-resolution input. Its central difficulty is the severe loss of high-frequency detail at large upscaling factors. Existing methods incorporate structured facial priors (such as landmarks and parsing maps) to provide structural constraints, but they commonly suffer from high model complexity and insufficient feature utilization. This paper proposes a two-stage face super-resolution network that fuses Efficient Channel Attention (ECA) with structured priors. Its core contributions are: 1) facial edge maps and facial parsing maps are used as complementary structured priors that jointly guide reconstruction, enabling precise recovery of facial geometry; 2) ECA modules are inserted at key positions in the network, strengthening the use of informative features at nearly zero computational overhead and thereby improving the recovery of texture detail. Experiments on the CelebAMask-HQ and Helen datasets show that the method outperforms existing mainstream methods in both subjective visual quality and objective metrics (PSNR/SSIM). In particular, systematic ablation studies confirm that ECA outperforms other attention mechanisms (e.g., SENet, CBAM) in both accuracy and efficiency, and that combining it with the structured priors yields a synergistic improvement.
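To make the "nearly zero computational overhead" claim concrete, the following is a minimal, framework-free sketch of an ECA block as described in ECA-Net (Wang et al., CVPR 2020). It is not the authors' implementation: the adaptive kernel-size rule with gamma=2 and b=1 is the ECA-Net default and is assumed here, where the block sits in this paper's network is not specified, and all function names are illustrative. The SE comparison counts only the weights of SE's two fully connected layers with the SENet default reduction r=16, ignoring biases.

```python
import math
from typing import List

# Feature map layout assumed for this sketch: [channel][row][col]
Tensor3D = List[List[List[float]]]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net's adaptive kernel size: |(log2(C) + b) / gamma|, forced to be odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def eca_forward(x: Tensor3D, weights: List[float]) -> Tensor3D:
    """Rescale each channel of x by an ECA attention weight.

    weights: the k taps of a single 1-D convolution shared by all channels
    (hypothetical learned values; no bias term).
    """
    channels = len(x)
    k = len(weights)
    pad = (k - 1) // 2
    # 1) Squeeze: global average pooling yields one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # 2) Local cross-channel interaction: zero-padded 1-D convolution over
    #    the channel axis, with no dimensionality reduction (unlike SE).
    attn = []
    for c in range(channels):
        s = 0.0
        for j in range(k):
            idx = c + j - pad
            if 0 <= idx < channels:
                s += weights[j] * desc[idx]
        attn.append(sigmoid(s))
    # 3) Excite: scale every spatial position of channel c by attn[c].
    return [[[v * attn[c] for v in row] for row in x[c]] for c in range(channels)]


def se_param_count(channels: int, reduction: int = 16) -> int:
    """Weights in an SE block's two FC layers (C -> C/r -> C), biases ignored."""
    return 2 * channels * (channels // reduction)


if __name__ == "__main__":
    for c in (64, 128, 256, 512):
        k = eca_kernel_size(c)
        print(f"C={c}: ECA weights={k}, SE weights (r=16)={se_param_count(c)}")
```

Running the script compares the two blocks' parameter budgets; at C=64, for example, ECA needs 3 weights while SE needs 512. The design choice is that ECA replaces SE's bottleneck MLP with one short convolution over neighbouring channels, so its cost grows with log2(C) rather than with C².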
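Citation: Bao Xianjing, Wu Chenghong. Face Super-Resolution Reconstruction Method Based on Attention Mechanism[J]. Artificial Intelligence and Robotics Research, 2026, 15(1): 254-266. https://doi.org/10.12677/airr.2026.151025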
