[1]
|
Bemelmans, R., Gelderblom, G.J., Jonker, P. and de Witte, L. (2012) Socially Assistive Robots in Elderly Care: A Systematic Review into Effects and Effectiveness. Journal of the American Medical Directors Association, 13, 114-120.E1. https://doi.org/10.1016/j.jamda.2010.10.002
|
[2]
|
Bolme, D., Beveridge, J.R., Draper, B.A. and Lui, Y.M. (2010) Visual Object Tracking Using Adaptive Correlation Filters. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, 13-18 June 2010, 2544-2550. https://doi.org/10.1109/cvpr.2010.5539960
|
[3]
|
Dee, H.M. and Velastin, S.A. (2007) How Close Are We to Solving the Problem of Automated Visual Surveillance? Machine Vision and Applications, 19, 329-343. https://doi.org/10.1007/s00138-007-0077-z
|
[4]
|
Feichtenhofer, C., Pinz, A. and Wildes, R.P. (2017) Spatiotemporal Multiplier Networks for Video Action Recognition. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 21-26 July 2017, 7445-7454. https://doi.org/10.1109/cvpr.2017.787
|
[5]
|
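Li, B.Z., Zhang, J., Wang, B.L., et al. (2022) Human-Object Interaction Recognition Fusing Multi-Level Visual Information. Computer Science, 49(S2), 643-650. (In Chinese)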
|
[6]
|
Wu, W. and Liu, Z.Y. (2021) Graph-Based Human-Object Interaction Recognition. Computer Engineering and Applications, 57, 175-181. (In Chinese)
|
[7]
|
Wang, T., Anwer, R.M., Khan, M.H., Khan, F.S., Pang, Y., Shao, L., et al. (2019) Deep Contextual Attention for Human-Object Interaction Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 5693-5701. https://doi.org/10.1109/iccv.2019.00579
|
[8]
|
Wan, B., Zhou, D., Liu, Y., Li, R. and He, X. (2019) Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 9468-9477. https://doi.org/10.1109/iccv.2019.00956
|
[9]
|
Kim, B., Choi, T., Kang, J. and Kim, H.J. (2020) UnionDet: Union-Level Detector towards Real-Time Human-Object Interaction Detection. Computer Vision-ECCV 2020, Glasgow, 23-28 August 2020, 498-514. https://doi.org/10.1007/978-3-030-58555-6_30
|
[10]
|
Liao, Y., Liu, S., Wang, F., Chen, Y., Qian, C. and Feng, J. (2020) PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 479-487. https://doi.org/10.1109/cvpr42600.2020.00056
|
[11]
|
Wang, T., Yang, T., Danelljan, M., Khan, F.S., Zhang, X. and Sun, J. (2020) Learning Human-Object Interaction Detection Using Interaction Points. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 4115-4124. https://doi.org/10.1109/cvpr42600.2020.00417
|
[12]
|
Zhong, X., Qu, X., Ding, C. and Tao, D. (2021) Glance and Gaze: Inferring Action-Aware Points for One-Stage Human-Object Interaction Detection. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 13229-13238. https://doi.org/10.1109/cvpr46437.2021.01303
|
[13]
|
Newell, A., Yang, K. and Deng, J. (2016) Stacked Hourglass Networks for Human Pose Estimation. Computer Vision-ECCV 2016, Amsterdam, 11-14 October 2016, 483-499. https://doi.org/10.1007/978-3-319-46484-8_29
|
[14]
|
Yu, F., Wang, D., Shelhamer, E. and Darrell, T. (2018) Deep Layer Aggregation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 2403-2412. https://doi.org/10.1109/cvpr.2018.00255
|
[15]
|
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) Attention Is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 6000-6010.
|
[16]
|
Girshick, R. (2015) Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 7-13 December 2015, 1440-1448. https://doi.org/10.1109/iccv.2015.169
|
[17]
|
Redmon, J., Divvala, S., Girshick, R. and Farhadi, A. (2016) You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 779-788. https://doi.org/10.1109/cvpr.2016.91
|
[18]
|
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q. and Tian, Q. (2019) CenterNet: Keypoint Triplets for Object Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 6568-6577. https://doi.org/10.1109/iccv.2019.00667
|
[19]
|
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A. and Zagoruyko, S. (2020) End-to-End Object Detection with Transformers. Computer Vision-ECCV 2020, Glasgow, 23-28 August 2020, 213-229. https://doi.org/10.1007/978-3-030-58452-8_13
|
[20]
|
Tamura, M., Ohashi, H. and Yoshinaga, T. (2021) QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 10405-10414. https://doi.org/10.1109/cvpr46437.2021.01027
|
[21]
|
Zou, C., Wang, B., Hu, Y., Liu, J., Wu, Q., Zhao, Y., et al. (2021) End-to-End Human Object Interaction Detection with HOI Transformer. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 11820-11829. https://doi.org/10.1109/cvpr46437.2021.01165
|
[22]
|
Kim, B., Lee, J., Kang, J., Kim, E. and Kim, H.J. (2021) HOTR: End-to-End Human-Object Interaction Detection with Transformers. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 74-83. https://doi.org/10.1109/cvpr46437.2021.00014
|
[23]
|
Chen, M., Liao, Y., Liu, S., Chen, Z., Wang, F. and Qian, C. (2021) Reformulating HOI Detection as Adaptive Set Prediction. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 9000-9009. https://doi.org/10.1109/cvpr46437.2021.00889
|
[24]
|
Zhang, A., Liao, Y., Liu, S., et al. (2021) Mining the Benefits of Two-Stage and One-Stage HOI Detection. Advances in Neural Information Processing Systems, 34, 17209-17220.
|
[25]
|
Qu, X., Ding, C., Li, X., Zhong, X. and Tao, D. (2022) Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 19536-19545. https://doi.org/10.1109/cvpr52688.2022.01895
|
[26]
|
Li, F., Zhang, H., Liu, S., Guo, J., Ni, L.M. and Zhang, L. (2022) DN-DETR: Accelerate DETR Training by Introducing Query DeNoising. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 13609-13617. https://doi.org/10.1109/cvpr52688.2022.01325
|
[27]
|
Chen, J., Wang, Y. and Yanai, K. (2023) Focusing on What to Decode and What to Train: Efficient Training with HOI Split Decoders and Specific Target Guided DeNoising. arXiv:2307.02291.
|
[28]
|
Gao, P., Zheng, M., Wang, X., Dai, J. and Li, H. (2021) Fast Convergence of DETR with Spatially Modulated Co-Attention. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 3601-3610. https://doi.org/10.1109/iccv48922.2021.00360
|
[29]
|
Zhu, X., Su, W., Lu, L., et al. (2020) Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv:2010.04159. https://doi.org/10.48550/arXiv.2010.04159
|
[30]
|
Chen, J. and Yanai, K. (2023) QAHOI: Query-Based Anchors for Human-Object Interaction Detection. 2023 18th International Conference on Machine Vision and Applications (MVA), Hamamatsu, 23-25 July 2023, 1-5. https://doi.org/10.23919/mva57639.2023.10215534
|
[31]
|
Ma, S., Wang, Y., Wang, S. and Wei, Y. (2024) FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46, 2415-2429. https://doi.org/10.1109/tpami.2023.3331738
|
[32]
|
Kim, B., Mun, J., On, K., Shin, M., Lee, J. and Kim, E. (2022) MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 19556-19565. https://doi.org/10.1109/cvpr52688.2022.01897
|
[33]
|
He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 770-778. https://doi.org/10.1109/cvpr.2016.90
|
[34]
|
Tan, M. and Le, Q. (2021) EfficientNetV2: Smaller Models and Faster Training. arXiv:2104.00298. https://doi.org/10.48550/arXiv.2104.00298
|
[35]
|
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929.
|
[36]
|
Park, J., Park, J. and Lee, J. (2023) ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 17152-17162. https://doi.org/10.1109/cvpr52729.2023.01645
|
[37]
|
Lim, J., Baskaran, V.M., Lim, J.M., Wong, K., See, J. and Tistarelli, M. (2023) ERNet: An Efficient and Reliable Human-Object Interaction Detection Network. IEEE Transactions on Image Processing, 32, 964-979. https://doi.org/10.1109/tip.2022.3231528
|
[38]
|
Kamath, A., Singh, M., LeCun, Y., Synnaeve, G., Misra, I. and Carion, N. (2021) MDETR - Modulated Detection for End-to-End Multi-Modal Understanding. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 1760-1770. https://doi.org/10.1109/iccv48922.2021.00180
|
[39]
|
Cai, Z., Kwon, G., Ravichandran, A., Bas, E., Tu, Z., Bhotika, R., et al. (2022) X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks. Computer Vision-ECCV 2022, Tel Aviv, 23-27 October 2022, 290-308. https://doi.org/10.1007/978-3-031-20059-5_17
|
[40]
|
Li, L.H., Zhang, P., Zhang, H., Yang, J., Li, C., Zhong, Y., et al. (2022) Grounded Language-Image Pre-Training. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 10955-10965. https://doi.org/10.1109/cvpr52688.2022.01069
|
[41]
|
Yao, L., Han, J., Wen, Y., et al. (2022) DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-Training for Open-World Detection. Advances in Neural Information Processing Systems, 35, 9125-9138.
|
[42]
|
Liao, Y., Zhang, A., Lu, M., Wang, Y., Li, X. and Liu, S. (2022) GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 20091-20100. https://doi.org/10.1109/cvpr52688.2022.01949
|
[43]
|
Radford, A., Kim, J.W., Hallacy, C., et al. (2021) Learning Transferable Visual Models from Natural Language Supervision. arXiv:2103.00020. https://doi.org/10.48550/arXiv.2103.00020
|
[44]
|
Yuan, H., Jiang, J., Albanie, S., et al. (2022) RLIP: Relational Language-Image Pre-Training for Human-Object Interaction Detection. Advances in Neural Information Processing Systems, 35, 37416-37431.
|
[45]
|
Yuan, H., Zhang, S., Wang, X., Albanie, S., Pan, Y., Feng, T., et al. (2023) RLIPv2: Fast Scaling of Relational Language-Image Pre-training. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, 1-6 October 2023, 21592-21604. https://doi.org/10.1109/iccv51070.2023.01979
|
[46]
|
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., et al. (2020) The Open Images Dataset V4. International Journal of Computer Vision, 128, 1956-1981. https://doi.org/10.1007/s11263-020-01316-z
|
[47]
|
Shao, S., Li, Z., Zhang, T., Peng, C., Yu, G., Zhang, X., et al. (2019) Objects365: A Large-Scale, High-Quality Dataset for Object Detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 8429-8438. https://doi.org/10.1109/iccv.2019.00852
|
[48]
|
Li, J., Li, D., Xiong, C., et al. (2022) BLIP: Bootstrapping Language-Image Pre-Training for Unified Vision-Language Understanding and Generation. arXiv:2201.12086. https://doi.org/10.48550/arXiv.2201.12086
|
[49]
|
Ning, S., Qiu, L., Liu, Y. and He, X. (2023) HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 23507-23517. https://doi.org/10.1109/cvpr52729.2023.02251
|
[50]
|
Zhang, F.Z., Campbell, D. and Gould, S. (2021) Spatially Conditioned Graphs for Detecting Human-Object Interactions. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 13299-13307. https://doi.org/10.1109/iccv48922.2021.01307
|
[51]
|
Zhang, F.Z., Campbell, D. and Gould, S. (2022) Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 20072-20080. https://doi.org/10.1109/cvpr52688.2022.01947
|
[52]
|
Zhang, Y., Pan, Y., Yao, T., Huang, R., Mei, T. and Chen, C. (2022) Exploring Structure-Aware Transformer over Interaction Proposals for Human-Object Interaction Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 19526-19535. https://doi.org/10.1109/cvpr52688.2022.01894
|
[53]
|
Zhang, F.Z., Yuan, Y., Campbell, D., Zhong, Z. and Gould, S. (2023) Exploring Predicate Visual Context in Detecting of Human-Object Interactions. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, 1-6 October 2023, 10377-10387. https://doi.org/10.1109/iccv51070.2023.00955
|
[54]
|
Gupta, S. and Malik, J. (2015) Visual Semantic Role Labeling. arXiv:1505.04474.
|
[55]
|
Chao, Y., Liu, Y., Liu, X., Zeng, H. and Deng, J. (2018) Learning to Detect Human-Object Interactions. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, 12-15 March 2018, 381-389. https://doi.org/10.1109/wacv.2018.00048
|
[56]
|
Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., et al. (2014) Microsoft COCO: Common Objects in Context. Computer Vision-ECCV 2014, Zurich, 6-12 September 2014, 740-755. https://doi.org/10.1007/978-3-319-10602-1_48
|