Multi-Object Tracking Algorithm Based on Improved FairMOT Feature Decoupling

DOI: 10.12677/CSA.2022.128196
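Authors: Wenqiang Liu, Yang Li, Jiabao Wang, Cailing Wang, Zhuang Miao, Hangping Qiu*: College of Command and Control Engineering, Army Engineering University, Nanjing, Jiangsu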
Keywords: Multi-Object Tracking; Object Re-Identification; Feature Decoupling; Attention Mechanism

Abstract: In joint detection and embedding (JDE) tracking models, the detection and re-identification sub-tasks require conflicting features, and extracting re-identification features at the object center point makes it difficult to obtain effective features for occluded objects. As a result, the re-identification features extracted in complex environments become less reliable, which causes data association errors. To address the conflict between the object detection and re-identification tasks, this paper proposes a feature decoupling module based on the FairMOT tracking algorithm. The module uses coordinate attention (CA) to perform an initial decoupling of the multi-scale feature maps output by the backbone network, and then fuses re-identification feature maps of different resolutions in a bottom-up manner. To extract effective information from occluded objects, the paper also proposes a strategy that adjusts the variance of the Gaussian kernel according to object visibility when constructing the supervision heatmap for object center points, which increases the attention paid during training to occluded objects and their surrounding regions. The proposed algorithm is evaluated on the MOT17 dataset. The experimental results verify the effectiveness of each module and indicate that the algorithm handles occlusion effectively and achieves stable tracking.
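The visibility-dependent Gaussian kernel described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the scaling used here (`sigma = base_sigma * (1 + alpha * (1 - visibility))`), the function names, and the `alpha` parameter are all assumptions made for illustration and are not the authors' formula. The sketch only shows the general idea: a less visible object receives a wider kernel, so more of its surrounding region carries non-zero supervision in the center-point heatmap.

```python
import math


def adaptive_sigma(base_sigma, visibility, alpha=1.0):
    """Hypothetical rule: widen the Gaussian kernel as visibility drops.

    visibility is clamped to [0, 1]; 1.0 means fully visible.
    """
    visibility = min(max(visibility, 0.0), 1.0)
    return base_sigma * (1.0 + alpha * (1.0 - visibility))


def draw_center_heatmap(height, width, objects, alpha=1.0):
    """Build a center-point supervision heatmap as a list of rows.

    objects: list of (cx, cy, base_sigma, visibility) in heatmap coordinates.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy, base_sigma, visibility in objects:
        sigma = adaptive_sigma(base_sigma, visibility, alpha)
        for y in range(height):
            for x in range(width):
                dist_sq = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Where kernels overlap, keep the element-wise maximum.
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


# Same object center and base size, different visibility.
visible = draw_center_heatmap(9, 9, [(4, 4, 1.0, 1.0)])
occluded = draw_center_heatmap(9, 9, [(4, 4, 1.0, 0.2)])

# Two pixels away from the center, the occluded object keeps a stronger response.
print(visible[4][6], occluded[4][6])
```

Under this assumed rule both heatmaps peak at 1.0 on the center pixel, while the occluded object's response decays more slowly with distance. Any monotonically decreasing mapping from visibility to variance would produce the same qualitative effect.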

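Citation: Liu, W., Li, Y., Wang, J., Wang, C., Miao, Z. and Qiu, H. (2022) Multi-Object Tracking Algorithm Based on Improved FairMOT Feature Decoupling. Computer Science and Application, 12(8), 1952-1963. https://doi.org/10.12677/CSA.2022.128196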

