|
[1]
|
Judd, T., Ehinger, K., Durand, F., et al. (2009) Learning to Predict Where Humans Look. 2009 IEEE 12th International Conference on Computer Vision, Kyoto, 29 September-02 October 2009, 2106-2113.
|
|
[2]
|
Recasens, A., Khosla, A., Vondrick, C., et al. (2015) Where Are They Looking? Advances in Neural Information Processing Systems, 28, 199-207.
|
|
[3]
|
Chong, E., Ruiz, N., Wang, Y., et al. (2018) Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency. In: Ferrari, V., Hebert, M., Sminchisescu, C. and Weiss, Y., Eds., Computer Vision – ECCV 2018, Lecture Notes in Computer Science, Vol. 11209, Springer, Cham, 383-398.
|
|
[4]
|
Bao, J., Liu, B. and Yu, J. (2022) ESCNet: Gaze Target Detection with the Understanding of 3D Scenes. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, 18-24 June 2022, 14126-14135.
|
|
[5]
|
Chong, E., Wang, Y., Ruiz, N., et al. (2020) Detecting Attended Visual Targets in Video. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, 13-19 June 2020, 5396-5406.
|
|
[6]
|
Lian, D., Yu, Z. and Gao, S. (2018) Believe It or Not, We Know What You Are Looking at! In: Jawahar, C., Li, H., Mori, G. and Schindler, K., Eds., Computer Vision – ACCV 2018, Lecture Notes in Computer Science, Vol. 11363, Springer, Cham, 35-50.
|
|
[7]
|
Recasens, A., Vondrick, C., Khosla, A., et al. (2017) Following Gaze in Video. Proceedings of the IEEE International Conference on Computer Vision, Venice, 22-29 October 2017, 1435-1443.
|
|
[8]
|
Fan, L., Chen, Y., Wei, P., et al. (2018) Inferring Shared Attention in Social Scene Videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 6460-6468.
|
|
[9]
|
Zhou, Q., Li, X., He, L., et al. (2022) TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 7853-7869.
|
|
[10]
|
Dai, J., Qi, H., Xiong, Y., et al. (2017) Deformable Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, 22-29 October 2017, 764-773.
|
|
[11]
|
Miao, Q., Hoai, M. and Samaras, D. (2023) Patch-Level Gaze Distribution Prediction for Gaze Following. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, 2-7 January 2023, 880-889.
|
|
[12]
|
Fang, Y., Tang, J., Shen, W., et al. (2021) Dual Attention Guided Gaze Target Detection in the Wild. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, 20-25 June 2021, 11390-11399.
|
|
[13]
|
Jin, T., Yu, Q., Zhu, S., et al. (2022) Depth-Aware Gaze-Following via Auxiliary Networks for Robotics. Engineering Applications of Artificial Intelligence, 113, Article 104924.
|
|
[14]
|
Tu, D., Min, X., Duan, H., et al. (2022) End-to-End Human-Gaze-Target Detection with Transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 2192-2200.
|
|
[15]
|
Tonini, F., Dall’Asen, N., Beyan, C., et al. (2023) Object-Aware Gaze Target Detection. Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, 1-6 October 2023, 21860-21869.
|
|
[16]
|
Tonini, F., Beyan, C. and Ricci, E. (2022) Multimodal across Domains Gaze Target Detection. Proceedings of the 2022 International Conference on Multimodal Interaction, Bengaluru, 7-11 November 2022, 420-431.
|
|
[17]
|
Long, F., Qiu, Z., Pan, Y., et al. (2022) Stand-Alone Inter-Frame Attention in Video Models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 3192-3201.
|
|
[18]
|
Zhu, X., Su, W., Lu, L., et al. (2020) Deformable DETR: Deformable Transformers for End-to-End Object Detection.
|
|
[19]
|
Saran, A., Majumdar, S., Short, E.S., et al. (2018) Human Gaze Following for Human-Robot Interaction. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, 1-5 October 2018, 8615-8621.
|
|
[20]
|
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) Attention Is All You Need. Advances in Neural Information Processing Systems, 30, 5998-6008.
|
|
[21]
|
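Tian, Y.L., Wang, Y.T., Wang, J.G., et al. (2022) Key Problems and Progress of Vision Transformers: The State of the Art and Prospects. Acta Automatica Sinica, 48, 957-979. (In Chinese)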
|
|
[22]
|
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale.
|
|
[23]
|
Carion, N., Massa, F., Synnaeve, G., et al. (2020) End-to-End Object Detection with Transformers. In: Vedaldi, A., Bischof, H., Brox, T. and Frahm, J.M., Eds., Computer Vision – ECCV 2020, Lecture Notes in Computer Science, Vol. 12346, Springer, Cham, 213-229.
|
|
[24]
|
Cheng, Y. and Lu, F. (2022) Gaze Estimation Using Transformer. 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, 21-25 August 2022, 3341-3347.
|
|
[25]
|
He, K., Zhang, X., Ren, S., et al. (2016) Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 770-778.
|
|
[26]
|
Glorot, X. and Bengio, Y. (2010) Understanding the Difficulty of Training Deep Feedforward Neural Networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 9, 249-256.
|
|
[27]
|
Pan, J., Sayrol, E., Giro-i-Nieto, X., et al. (2016) Shallow and Deep Convolutional Networks for Saliency Prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 598-606.
|