A Study on Quantitative Features of Chinese-to-English Translationese and LLM-Based Identification
Abstract: This study aims to establish a core set of stylistic features for Chinese-to-English translationese and to evaluate how well mainstream Chinese large language models identify it. Using a corpus of 106 texts comprising native English texts and Chinese-to-English translations, it examines four quantitative indicators: lexical richness (type-token ratio, TTR), average sentence length, function word ratio, and punctuation density. It then compares the text identification performance of Doubao and DeepSeek. The results show that Chinese-to-English translationese is characterized by lower lexical richness, slightly shorter average sentence length, a lower function word ratio, and higher punctuation density. Of the four indicators, lexical richness distinguishes the two text types most consistently, a pattern in line with the translation universals hypothesis and with regularities specific to Chinese-to-English transfer. In the model evaluation, DeepSeek reaches an overall identification accuracy of 97.17%, with errors confined to very short texts of up to 80 words, where feature density is insufficient. Doubao reaches 83.02% and shows systematic bias: it relies on a single feature dimension and adapts poorly across domains. The study thus specifies a core quantitative feature set for Chinese-to-English translationese, clarifies how the two models differ in identification ability and where their errors originate, and offers empirical evidence and a technical reference for quantitative translationese research, machine translation optimization, and corpus construction.
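The four indicators named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' operational definitions, so everything here is an assumption: the regex tokenizer, the naive sentence splitter, the small `FUNCTION_WORDS` list, and the choice to normalize function words and punctuation marks by token count are all hypothetical. The `accuracy_percent` helper only shows how a percentage over 106 texts is formed; the counts used in the usage note are inferred from the reported percentages and are not stated in the source.

```python
import re
import string

# Hypothetical, deliberately small function-word list (an assumption;
# the source does not state which inventory was used).
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by",
    "and", "or", "but", "if", "that", "which", "is", "are", "was", "were",
    "be", "it", "this", "as", "from", "not",
})


def tokenize(text):
    """Return lowercased word tokens (letters, optional internal apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def split_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stylometric_features(text):
    """Compute the four indicators for one text under the assumptions above."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    n = len(tokens)
    if n == 0:
        raise ValueError("text must contain at least one word")
    punct = sum(1 for ch in text if ch in string.punctuation)
    return {
        # Distinct word forms divided by total word tokens.
        "ttr": len(set(tokens)) / n,
        # Word tokens per sentence.
        "avg_sentence_length": n / len(sentences),
        # Share of tokens found in the function-word list.
        "function_word_ratio": sum(t in FUNCTION_WORDS for t in tokens) / n,
        # ASCII punctuation marks per word token.
        "punctuation_density": punct / n,
    }


def accuracy_percent(correct, total):
    """Identification accuracy as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)
```

For example, `stylometric_features("The cat sat on the mat. The dog barked!")` yields a TTR of 7/9 and an average sentence length of 4.5 words. With 106 texts, 103 correct identifications would round to 97.17% and 88 would round to 83.02%, which matches the reported figures. Raw TTR tends to fall as text length grows, so a comparison across texts of different lengths would probably need length-matched samples or a standardized variant.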
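Cite this article: Zhang Minlu, Li Huadong. A Study on Quantitative Features of Chinese-to-English Translationese and LLM-Based Identification [J]. Modern Linguistics, 2026, 14(5): 824-830. https://doi.org/10.12677/ml.2026.145464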
