[1] Li, J., Dong, S. and Adelson, E. (2018) Slip Detection with Combined Tactile and Visual Information. 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, 21-25 May 2018, 7772-7777.
[2] Cui, S., Wang, R., Wei, J., et al. (2020) Grasp State Assessment of Deformable Objects Using Visual-Tactile Fusion Perception. 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, 31 May-31 August 2020, 538-544.
[3] Zhang, W., Sun, F., Wu, H., et al. (2017) A Framework for the Fusion of Visual and Tactile Modalities for Improving Robot Perception. Science China Information Sciences, 60, Article No. 12201.
[4] Francomano, M.T., Accoto, D. and Guglielmelli, E. (2013) Artificial Sense of Slip—A Review. IEEE Sensors Journal, 13, 2489-2498.
[5] Yan, G., Schmitz, A., Tomo, T.P., et al. (2022) Detection of Slip from Vision and Touch. 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, 23-27 May 2022, 3537-3543.
[6] Huang, Z., Gao, J., Tang, Z., et al. Robot Grasp Slip Detection Based on Attention Mechanism and Visual-Tactile Fusion [J/OL]. Information and Control: 1-9. 2024-04-06. (In Chinese)
[7] Bahdanau, D., Cho, K. and Bengio, Y. (2014) Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473.
[8] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) Attention Is All You Need. In: Guyon, I., Von Luxburg, U., et al., Eds., Advances in Neural Information Processing Systems 30, Long Beach, 4-9 December 2017, 1-15.
[9] Cui, S., Wang, R., Wei, J., et al. (2020) Self-Attention Based Visual-Tactile Fusion Learning for Predicting Grasp Outcomes. IEEE Robotics and Automation Letters, 5, 5827-5834.
[10] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929.
[11] Bertasius, G., Wang, H. and Torresani, L. (2021) Is Space-Time Attention All You Need for Video Understanding? ICML, 2, 1-12.
[12] Arnab, A., Dehghani, M., Heigold, G., et al. (2021) ViViT: A Video Vision Transformer. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 6836-6846.
[13] Cao, G., Zhou, Y., Bollegala, D., et al. (2020) Spatio-Temporal Attention Model for Tactile Texture Recognition. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, 24 October 2020-24 January 2021, 9896-9902.
[14] Kim, H., Ohmura, Y. and Kuniyoshi, Y. (2021) Transformer-Based Deep Imitation Learning for Dual-Arm Robot Manipulation. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, 27 September-1 October 2021, 8965-8972.
[15] Li, J., Selvaraju, R., Gotmare, A., et al. (2021) Align Before Fuse: Vision and Language Representation Learning with Momentum Distillation. Advances in Neural Information Processing Systems, 34, 9694-9705.
[16] Bao, H., Wang, W., Dong, L., et al. (2022) VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts. Advances in Neural Information Processing Systems, 35, 32897-32912.
[17] Cui, S., Wei, J., Li, X., et al. (2020) Generalized Visual-Tactile Transformer Network for Slip Detection. IFAC-PapersOnLine, 53, 9529-9534.