Human Pose Transfer Method Based on a Nested Residual Attention Module
DOI: 10.12677/CSA.2021.114090
Author: Zhong Xiaojing (钟晓静): School of Computers, Guangdong University of Technology, Guangzhou, Guangdong
Keywords: Deep Learning, Attention Mechanism, Pose Transfer
Abstract: In recent years, human image synthesis has attracted considerable research, and pose transfer is one of its tasks. Pose information supplied as a conditional input offers only limited guidance, so generative models struggle to handle complex appearance features of a person when the viewpoint changes. Attention mechanisms can effectively extract the important parts of an image. The proposed method embeds the residual blocks used for feature extraction into a residual attention module and uses short skip connections to learn pose correlations step by step. It adaptively selects spatial pixels and makes full use of the global spatial information in the pose transfer process, which improves the representational capacity of the generative network and produces high-quality person images in the target pose. Tests on DeepFashion, a large multi-category clothing dataset, verify the effectiveness of the proposed algorithm.
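The nesting described in the abstract can be illustrated with a minimal sketch. It is not the authors' network: the single-channel feature maps, the 3x3 kernels, the function names, and the choice to derive the attention mask from a separate pose feature map are all assumptions made here for illustration. The sketch only shows the general pattern of residual blocks (short skip connections) stacked inside an attention module that has its own outer skip connection and a sigmoid mask weighting each spatial position.

```python
import math

def conv3x3(x, kernel):
    """Zero-padded 3x3 convolution over a single-channel 2D feature map."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def sigmoid(x):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in x]

def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def mul(a, b):
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def residual_block(x, k1, k2):
    """Inner unit: x + conv(relu(conv(x))), i.e. a short skip connection."""
    return add(x, conv3x3(relu(conv3x3(x, k1)), k2))

def nested_residual_attention(image_feat, pose_feat, trunk_kernels, mask_kernel):
    """Residual blocks nested inside a residual attention module (hypothetical).

    trunk: residual blocks applied in sequence to the image features.
    mask:  sigmoid spatial attention in (0, 1), computed here from the pose
           features (an assumption of this sketch, not stated in the abstract).
    out:   image_feat + mask * trunk, the outer skip connection.
    """
    trunk = image_feat
    for k1, k2 in trunk_kernels:
        trunk = residual_block(trunk, k1, k2)
    mask = sigmoid(conv3x3(pose_feat, mask_kernel))
    return add(image_feat, mul(mask, trunk))

# Tiny usage example with all-zero kernels: the trunk reduces to the identity
# and the mask is 0.5 everywhere, so out == [[1.5, 3.0], [4.5, 6.0]].
zero = [[0.0] * 3 for _ in range(3)]
image_feat = [[1.0, 2.0], [3.0, 4.0]]
pose_feat = [[0.0, 1.0], [1.0, 0.0]]
out = nested_residual_attention(image_feat, pose_feat, [(zero, zero)], zero)
```

Because the mask multiplies the trunk output element by element, positions with a mask value near 0 pass the input through almost unchanged, while positions near 1 receive the full trunk update. This is one simple way to read "adaptively selecting spatial pixels".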
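Cite this article: Zhong Xiaojing, Tan Taizhe. Human Pose Transfer Method Based on a Nested Residual Attention Module [J]. Computer Science and Application, 2021, 11(4): 876-884. https://doi.org/10.12677/CSA.2021.114090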
