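[1] Zhou, J.T., Yang, H.T., Liu, D., et al. (2017) Technical Foundations and Development Directions of Video Coding. Telecommunications Science, 33(8), 16-25. (In Chinese)
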
[2] Farahani, R., Timmerer, C. and Hellwagner, H. (2024) Towards Low-Latency and Energy-Efficient Hybrid P2P-CDN Live Video Streaming. arXiv: 2403.16985. https://arxiv.org/abs/2403.16985

[3] Dalal, N. and Triggs, B. (2005) Histograms of Oriented Gradients for Human Detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, 20-25 June 2005, 886-893.

[4] Girshick, R., Donahue, J., Darrell, T., et al. (2014) Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 23-28 June 2014, 580-587.

[5] Redmon, J., Divvala, S., Girshick, R., et al. (2016) You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 779-788.

[6] Cheng, T., Song, L., Ge, Y., et al. (2024) YOLO-World: Real-Time Open-Vocabulary Object Detection. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 16-22 June 2024, 16901-16911.

[7] Kirillov, A., Mintun, E., et al. (2023) Segment Anything. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, 1-6 October 2023, 3992-4003.

[8] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv: 2010.11929.

[9] Radford, A., Kim, J.W., Hallacy, C., et al. (2021) Learning Transferable Visual Models from Natural Language Supervision. arXiv: 2103.00020. https://api.semanticscholar.org/CorpusID:231591445

[10] Yang, A., Yang, B., Zhang, B., et al. (2024) Qwen2.5 Technical Report. arXiv: 2412.15115. https://arxiv.org/abs/2412.15115

[11] Zhu, D., Chen, J., Shen, X., Li, X. and Elhoseiny, M. (2023) MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arXiv: 2304.10592. https://api.semanticscholar.org/CorpusID:258291930

[12] Zhao, P., Zhang, H., Yu, Q., et al. (2024) Retrieval-Augmented Generation for AI-Generated Content: A Survey. arXiv: 2402.19473.

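[13] Han, J.T. and Zhang, S. (2013) Research on a Quality-of-Service Evaluation Model for Video Communication Services Based on Smart Terminals. Telecommunications Science, 29(4), 27-32. (In Chinese)
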
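[14] Hu, M.D., Xu, Z.H. and Yang, D.P. (2025) Analysis of the Development of Telecom Operators' Video Internet of Things Industry: Current Status, Challenges and Future Paths. Communication Enterprise Management, No. 1, 46-48. (In Chinese)
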
[15] Hao, P. (2025) Challenges and Opportunities in Video Internet of Things Data Management. China Strategic Emerging Industry, No. 5, 120-122. (In Chinese)