Unsupervised Person Re-Identification Based on Multi-Dimensional Attention and Part Focus Network
Abstract: Person re-identification (Re-ID) aims to find a person of interest in an image gallery by comparing a probe image of that person with every gallery image. Most Re-ID algorithms are trained with supervision on small labeled datasets, so deploying the trained models directly in large real-world camera networks may yield poor performance due to underfitting. Models therefore need to be trained autonomously, without explicit supervision. To this end, we propose an unsupervised Re-ID method that jointly learns a Multi-Dimensional Attention Network (MDAN) and a Part Focus Network (PFN). First, MDAN models and exploits the complex higher-order statistical information in the attention mechanism, capturing subtle differences among pedestrians and producing discriminative attention proposals. Second, the PFN is deployed in an improved spatial transformer network (STN) so that each branch focuses on a different part of the pedestrian. Finally, a series of loss functions guides the PFN to learn part features on unlabeled datasets. Experiments on two public datasets, Market-1501 and DukeMTMC-reID, show that the proposed method is effective and performs well on both.
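The retrieval step described in the abstract's first sentence, comparing a probe image with every gallery image, can be sketched as a similarity ranking over feature vectors. This is a minimal illustration only, not the paper's method: the cosine-similarity metric, the function names, the image ids, and the toy 3-dimensional features are all assumptions made for the example, and in practice the features would come from a trained network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_gallery(probe, gallery):
    """Return gallery ids sorted from most to least similar to the probe.

    `gallery` maps a (hypothetical) image id to its feature vector.
    """
    scores = {gid: cosine_similarity(probe, feat) for gid, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy features chosen for illustration; the ids are made up.
gallery = {
    "cam2_img07": [0.9, 0.1, 0.0],
    "cam3_img12": [0.0, 1.0, 0.0],
    "cam5_img03": [0.6, 0.6, 0.2],
}
probe = [1.0, 0.0, 0.0]
ranking = rank_gallery(probe, gallery)
```

The first entry of `ranking` is the gallery image judged most likely to show the probe identity; Re-ID benchmarks typically score a method by how early the correct identity appears in such a list.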
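Cite this article: 麻可可, 薛丽霞, 汪荣贵, 杨娟. Unsupervised Person Re-Identification Based on Multi-Dimensional Attention and Part Focus Network [J]. Computer Science and Application (计算机科学与应用), 2021, 11(5): 1301-1304. https://doi.org/10.12677/CSA.2021.115132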
