[1] Yang, X., Dong, J., Cao, Y., Wang, X., Wang, M. and Chua, T. (2020) Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 25-30 July 2020, 1339-1348.
[2] Wang, Z., Zhong, Y., Miao, Y., et al. (2022) Contrastive Video-Language Learning with Fine-Grained Frame Sampling. arXiv: 2210.05039.
[3] Chen, S., Zhao, Y., Jin, Q. and Wu, Q. (2020) Fine-Grained Video-Text Retrieval with Hierarchical Graph Reasoning. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 10635-10644.
[4] Bar-Shalom, G., Leifman, G. and Elad, M. (2024) Weakly-Supervised Representation Learning for Video Alignment and Analysis. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, 3-8 January 2024, 6895-6904.
[5] Luo, H., Ji, L., Zhong, M., Chen, Y., Lei, W., Duan, N., et al. (2022) CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval and Captioning. Neurocomputing, 508, 293-304.
[6] Ma, Y., Xu, G., Sun, X., Yan, M., Zhang, J. and Ji, R. (2022) X-CLIP: End-To-End Multi-Grained Contrastive Learning for Video-Text Retrieval. Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, 10-14 October 2022, 638-647.
[7] Gorti, S.K., Vouitsis, N., Ma, J., Golestan, K., Volkovs, M., Garg, A., et al. (2022) X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 4996-5005.
[8] Zhang, H., Zeng, P., Gao, L., Song, J. and Shen, H.T. (2024) MPT: Multi-Grained Prompt Tuning for Text-Video Retrieval. Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, 28 October-1 November 2024, 1206-1214.
[9] Wang, Z., Sung, Y., Cheng, F., Bertasius, G. and Bansal, M. (2023) Unified Coarse-To-Fine Alignment for Video-Text Retrieval. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, 1-6 October 2023, 2804-2815.
[10] Bain, M., Nagrani, A., Varol, G. and Zisserman, A. (2021) Frozen in Time: A Joint Video and Image Encoder for End-To-End Retrieval. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 1708-1718.
[11] Wang, J., Wang, P., Sun, G., Liu, D., Dianat, S., Rao, R., et al. (2024) Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 16-22 June 2024, 16551-16560.
[12] Li, H., Song, J., Gao, L., et al. (2023) Prototype-Based Aleatoric Uncertainty Quantification for Cross-Modal Retrieval. Advances in Neural Information Processing Systems, 36, 24564-24585.