Evaluating the Quality of English-to-Chinese Subtitle Translation by Large Language Models: A Case Study of ERNIE Bot and Zhipu Qingyan
Abstract: As demand for multilingual subtitles on global streaming platforms surges, large language models (LLMs) have become important tools for subtitle translation. However, their performance in cultural adaptation, colloquial expression, and context comprehension has yet to be systematically evaluated, and existing automated metrics (e.g., BLEU) do not adequately reflect real acceptability in multimodal contexts. Combining Skopos theory with the notion of "natural equivalence" drawn from the Turing test, this study constructs a three-dimensional English-to-Chinese translation quality assessment model (3D-ECTQA Model) that quantifies subtitle quality along three dimensions: cultural adaptability, linguistic fluency, and translation accuracy. Nine subtitle segments from six film and television genres were selected, and the human translation was compared with the output of ERNIE Bot (文心一言) and Zhipu Qingyan (智谱清言). The empirical analysis draws on 201 questionnaires and statistical tests. The results indicate that ERNIE Bot performs best overall across the three dimensions, with a significant advantage in linguistic fluency, and that respondents' core demands for AI translation center on improving accuracy and context understanding. The study provides an operable framework and empirical evidence for translation quality assessment adapted to multimodal contexts.