Research and Implementation of AI Voice Call Analysis Agent Based on Large Model
Abstract: As China's national “Data Elements×” strategy advances, unlocking the latent value of enterprises' massive voice call data has become a key problem. Enterprises currently face three bottlenecks in using voice data: fragmented collection channels, inefficient processing, and shallow semantic mining. To address them, this paper proposes and implements an AI call analysis agent built on compliant telecom operator data. Relying on the China Unicom Yunxi platform and the Yuanjing large model, the work constructs an “Agent Real-time Dynamic Scheduling CoE (Collaboration of Experts)” engine and the AI call analysis agent itself. Through task planning and hybrid multi-model scheduling, the CoE engine automates the full pipeline from voice collection and high-precision transcription, through multi-view semantic understanding, to structured value output. The agent implements a multi-view tagging system grounded in stakeholder theory so that it can serve the analytical needs of different business roles. Comparative experiments and empirical analysis show that the proposed method significantly outperforms traditional baseline models in F1 score on key intent recognition. In a deployment at a logistics company, it raised the order pickup rate from 62% to 92% and cut complaint handling time from 48 hours to 15 minutes. The method offers a replicable technical paradigm for turning enterprise communication data elements into assets and circulating them on the market.
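The idea of routing one transcript to several experts and collecting tags per stakeholder view can be illustrated with a minimal sketch. It is not the paper's CoE engine: the `Expert` class, the `plan_experts` and `analyze` functions, the role names, and the keyword rules are all hypothetical, and simple keyword matching stands in for the large-model calls that the actual system would make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Every name here is a hypothetical illustration, not the paper's interface.

@dataclass
class Expert:
    name: str
    roles: List[str]                 # stakeholder views this expert serves
    run: Callable[[str], List[str]]  # maps a transcript to a list of tags


def keyword_expert(rules: Dict[str, str]) -> Callable[[str], List[str]]:
    """Build a toy expert that emits a tag when its keyword appears."""
    def run(transcript: str) -> List[str]:
        text = transcript.lower()
        return [tag for keyword, tag in rules.items() if keyword in text]
    return run


EXPERTS = [
    Expert("intent", ["sales", "service"],
           keyword_expert({"pick up": "pickup_request"})),
    Expert("complaint", ["service", "management"],
           keyword_expert({"complain": "complaint", "late": "delivery_delay"})),
]


def plan_experts(role: str) -> List[Expert]:
    """Task planning step: select the experts relevant to one role."""
    return [expert for expert in EXPERTS if role in expert.roles]


def analyze(transcript: str, roles: List[str]) -> Dict[str, List[str]]:
    """Run the planned experts for each role and merge their tags in order."""
    result: Dict[str, List[str]] = {}
    for role in roles:
        tags: List[str] = []
        for expert in plan_experts(role):
            for tag in expert.run(transcript):
                if tag not in tags:  # drop duplicates, keep first-seen order
                    tags.append(tag)
        result[role] = tags
    return result


transcript = "My parcel is late, I want to complain. Also please pick up a new order."
views = analyze(transcript, ["sales", "service", "management"])
for role, tags in views.items():
    print(role, tags)
```

In this sketch each role sees only the tags produced by the experts assigned to it, which is one simple way to realize a multi-view label set over a single call transcript.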
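Cite this article: Liao Honghong, Zhao Wenbo, Huang Limei, Xu Jianjun, Guo Haosong, Liu Jianbo. Research and Implementation of AI Voice Call Analysis Agent Based on Large Model [J]. Computer Science and Application, 2025, 15(12): 265-273. https://doi.org/10.12677/csa.2025.1512342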
