[1] Mao, J.G., Shi, S.S., Wang, X.G., et al. (2023) 3D Object Detection for Autonomous Driving: A Comprehensive Survey. International Journal of Computer Vision, 131, 1909-1963.
[2] Roddick, T., Kendall, A. and Cipolla, R. (2019) Orthographic Feature Transform for Monocular 3D Object Detection. Proceedings of the British Machine Vision Conference, Cardiff, 9-12 September 2019, Article No. 285.
[3] Philion, J. and Fidler, S. (2020) Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D. In: Vedaldi, A., et al., Eds., Proceedings of the European Conference on Computer Vision, Springer International Publishing, 194-210.
[4] Huang, J.J., Huang, G., Zhu, Z., et al. (2021) BEVDet: High-Performance Multi-Camera 3D Object Detection in Bird-Eye-View. https://arxiv.org/abs/2112.11790
[5] Li, Y., Ge, Z., Yu, G., Yang, J., Wang, Z., Shi, Y., et al. (2023) BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 1477-1485.
[6] Li, Y., Bao, H., Ge, Z., Yang, J., Sun, J. and Li, Z. (2023) BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 1486-1494.
[7] Huang, J.J. and Huang, G. (2022) BEVDet4D: Exploit Temporal Cues in Multi-Camera 3D Object Detection. https://arxiv.org/abs/2203.17054
[8] Li, Z.Q., Wang, W.H., Li, H.Y., et al. (2022) BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers. In: Avidan, S., et al., Eds., Proceedings of the European Conference on Computer Vision, Springer, 1-18.
[9] Simonyan, K., Vedaldi, A. and Zisserman, A. (2014) Deep inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. Workshop at International Conference on Learning Representations, Banff, 14-16 April 2014, 1-8.
[10] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D. and Batra, D. (2017) Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 618-626.
[11] Sundararajan, M., Taly, A. and Yan, Q.Q. (2017) Axiomatic Attribution for Deep Networks. Proceedings of the 34th International Conference on Machine Learning, Sydney, 6-11 August 2017, 3319-3328.
[12] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. and Samek, W. (2015) On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLOS ONE, 10, e0130140.
[13] Shrikumar, A., Greenside, P. and Kundaje, A. (2017) Learning Important Features through Propagating Activation Differences. Proceedings of the 34th International Conference on Machine Learning, Sydney, 6-11 August 2017, 3145-3153.
[14] Ribeiro, M.T., Singh, S. and Guestrin, C. (2016) “Why Should I Trust You?” Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 13-17 August 2016, 1135-1144.
[15] Lundberg, S.M. and Lee, S.I. (2017) A Unified Approach to Interpreting Model Predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates, 4765-4774.
[16] Xu, K., Ba, J., Kiros, R., et al. (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Proceedings of the 32nd International Conference on Machine Learning, Lille, 6-11 July 2015, 2048-2057.
[17] Choi, E., Bahadori, M.T., Sun, J., et al. (2016) RETAIN: An Interpretable Predictive Model for Healthcare Using Reverse Time Attention Mechanism. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, Curran Associates, 3504-3512.
[18] Jain, S. and Wallace, B.C. (2019) Attention Is Not Explanation. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, 3543-3556.
[19] Abnar, S. and Zuidema, W. (2020) Quantifying Attention Flow in Transformers. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 4190-4197.
[20] Chefer, H., Gur, S. and Wolf, L. (2021) Transformer Interpretability Beyond Attention Visualization. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 19-25 June 2021, 782-791.
[21] Chefer, H., Gur, S. and Wolf, L. (2021) Generic Attention-Model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 11-17 October 2021, 397-406.
[22] Ali, A., Schnake, T., Eberle, O., et al. (2022) XAI for Transformers: Better Explanations through Conservative Propagation. Proceedings of the 39th International Conference on Machine Learning, Baltimore, 17-23 July 2022, 435-451.
[23] Ferrando, J., Gállego, G.I., Tsiamas, I. and Costa-Jussà, M.R. (2023) Explaining How Transformers Use Context to Build Predictions. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 1, 5486-5513.
[24] Achtibat, R., Hatefi, S.M.V., Dreyer, M., et al. (2024) AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers. Proceedings of the 41st International Conference on Machine Learning, Vienna, 21-27 July 2024, 135-168.
[25] Arras, L., Puri, B., Kahardipraja, P., et al. (2025) A Close Look at Decomposition-Based XAI-Methods for Transformer Language Models. https://arxiv.org/abs/2502.15886
[26] Petsiuk, V., Jain, R., Manjunatha, V., Morariu, V.I., Mehra, A., Ordonez, V., et al. (2021) Black-Box Explanation of Object Detectors via Saliency Maps. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 19-25 June 2021, 11443-11452.