在线语料库与AI辅助翻译的“成本–收益”再评估
Reassessment of the “Cost-Benefit” of Online Corpora and AI-Assisted Translation
Abstract: Online document repositories have greatly improved access to source materials for translation research, but mismatches between page images and transcriptions, OCR errors, and interface and copyright restrictions mean that they cannot be used directly as mature corpora. Text engineering (version authentication, cleaning, alignment, and normalization) remains a prerequisite for academic reliability. Meanwhile, neural machine translation and large language models markedly improve efficiency in terminology retrieval, candidate generation, alignment verification, and style harmonization, but they also carry risks of factual error and genre drift. This paper proposes a writing workflow organized around three elements, “authoritative text, controlled AI, verifiable report”: online repositories are treated as evidence stores; the United Nations Official Document System (UN ODS) and the United Nations terminology database (UNTERM) serve as reference benchmarks; evaluation metrics and frameworks such as COMET, BLEURT, and MQM are used to build a verification consensus; and AI tools are confined to a traceable, assistive role. The findings indicate that when authoritative texts, transparent parameters, and minimal verifiable units jointly constrain the workflow, the person-hours required for retrieval, alignment, and verification can be kept under control while the verifiability and cross-linguistic comparability of the final conclusions improve.
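Cite this article: Sun Ling, Li Shengwei. Reassessment of the “Cost-Benefit” of Online Corpora and AI-Assisted Translation [J]. Modern Linguistics (现代语言学), 2026, 14(1): 563-574. https://doi.org/10.12677/ml.2026.141073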
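The abstract's notions of normalization, transparent parameters, and a minimal verifiable unit can be illustrated with a small sketch. This is only one possible reading, not the authors' implementation: the record layout, the field names, and the helper functions (normalize, digest, make_unit, to_report_line) are all hypothetical, and the choice of NFKC normalization plus SHA-256 hashing is an assumption made for illustration.

```python
import hashlib
import json
import unicodedata
from dataclasses import asdict, dataclass


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse whitespace runs."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of the normalized text (UTF-8)."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class VerifiableUnit:
    """Hypothetical record for one aligned segment pair in a report."""
    source_id: str          # illustrative identifier for the source document
    source_sha256: str      # fingerprint of the normalized source segment
    target_sha256: str      # fingerprint of the normalized target segment
    tool: str               # free-text label for the AI tool that was used
    parameters: dict        # settings disclosed alongside the result
    reviewer_checked: bool  # whether a human has verified the pair


def make_unit(source_id, source, target, tool, parameters,
              reviewer_checked=False):
    """Build a record from raw segment text and the disclosed settings."""
    return VerifiableUnit(source_id, digest(source), digest(target),
                          tool, dict(parameters), reviewer_checked)


def to_report_line(unit: VerifiableUnit) -> str:
    """Serialize the record as one JSON line with stable key order."""
    return json.dumps(asdict(unit), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    unit = make_unit("DOC-1", "Hello  world", "你好 世界",
                     "demo-tool", {"temperature": 0})
    print(to_report_line(unit))
```

Because the fingerprints are computed on normalized text, two readers who start from the same authoritative source and apply the same normalization should obtain the same digest, which is what makes a reported segment checkable without redistributing the text itself.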

References

[1] Baker, M. (1995) Corpora in Translation Studies: An Overview and Some Suggestions for Future Research. Target. International Journal of Translation Studies, 7, 223-243.
[2] McEnery, T. and Hardie, A. (2012) Corpus Linguistics: Method, Theory and Practice. Cambridge University Press.
[3] Lommel, A., Uszkoreit, H. and Burchardt, A. (2014) Multidimensional Quality Metrics (MQM): A Framework for Declaring and Describing Translation Quality Metrics. Tradumàtica tecnologies de la traducció, 12, 455-463.
[4] 李晓倩. 中国翻译学知识体系的构建: 主要议题与未来发展[J]. 中国翻译, 2025, 46(3): 27-33.
[5] Koehn, P. (2020) Neural Machine Translation. Cambridge University Press.
[6] 王金铨. 计算机辅助翻译评价系统中的翻译质量评估[J]. 上海翻译, 2023(6): 52-57.
[7] 王巍巍. 中国语言服务行业应用人工智能辅助机器翻译工具的现状调研[J]. 外语电化教学, 2025(2): 25-30, 100.
[8] House, J. (2015) Translation Quality Assessment: Past and Present. Routledge.
[9] 耿芳, 胡健. 人工智能辅助译后编辑新方向——基于ChatGPT的翻译实例研究[J]. 中国外语, 2023, 20(3): 41-47.
[10] 周兴华, 王传英. 人工智能技术在计算机辅助翻译软件中的应用与评价[J]. 中国翻译, 2020, 41(5): 121-129.
[11] Post, M. (2018) A Call for Clarity in Reporting BLEU Scores. Proceedings of the Third Conference on Machine Translation: Research Papers, Brussels, 31 October-1 November 2018, 186-191.
[12] Baker, M. (1993) Corpus Linguistics and Translation Studies—Implications and Applications. In: Baker, M., Francis, G. and Tognini-Bonelli, E., Eds., Text and Technology, John Benjamins Publishing Company, 233-250.
[13] Bowker, L. and Pearson, J. (2002) Working with Specialized Language: A Practical Guide to Using Corpora. Routledge.
[14] Olohan, M. (2004) Introducing Corpora in Translation Studies. Routledge.
[15] Olohan, M. (2016) Scientific and Technical Translation. Routledge.
[16] Zanettin, F. (2012) Translation-Driven Corpora: Corpus Resources for Descriptive and Applied Translation Studies. Routledge.
[17] Zanettin, F. and Rundle, C. (2022) The Routledge Handbook of Translation and Methodology. Routledge.
[18] Koehn, P. (2010) Statistical Machine Translation. Cambridge University Press.
[19] Toral, A. and Way, A. (2018) What Level of Quality Can Neural Machine Translation Attain on Literary Text? In: Moorkens, J., Castilho, S., Gaspari, F. and Doherty, S., Eds., Translation Quality Assessment, Springer International Publishing, 263-287.
[20] Castilho, S., Moorkens, J., Gaspari, F., Calixto, I., Tinsley, J. and Way, A. (2017) Is Neural Machine Translation the New State of the Art? The Prague Bulletin of Mathematical Linguistics, 108, 109-120.
[21] Castilho, S., Moorkens, J., Way, A. and Gaspari, F. (2020) Machine Translation and Post-Editing in Practice. Springer.
[22] Reiß, K. and Vermeer, H.J. (1984) Grundlegung Einer Allgemeinen Translationstheorie. Niemeyer.
[23] Toury, G. (1995) Descriptive Translation Studies—And Beyond. John Benjamins Publishing Company.
[24] Toury, G. (2012) Descriptive Translation Studies—And Beyond. 2nd Edition, John Benjamins Publishing Company.
[25] Venuti, L. (1995) The Translator’s Invisibility: A History of Translation. Routledge.
[26] Venuti, L. (2017) The Translator’s Invisibility. 2nd Edition, Routledge.
[27] Papineni, K., Roukos, S., Ward, T. and Zhu, W. (2002) BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL’02), Philadelphia, 7-12 July 2002, 311-318.
[28] Popović, M. (2015) chrF: Character N-Gram F-Score for Automatic MT Evaluation. Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, 17-18 September 2015, 392-395.
[29] Rei, R., Stewart, C., Farinha, A.C. and Lavie, A. (2020) COMET: A Neural Framework for MT Evaluation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 16-20 November 2020, 2685-2702.
[30] Sellam, T., Das, D. and Parikh, A.P. (2020) BLEURT: Learning Robust Metrics for Text Generation. Proceedings of ACL 2020, 5-10 July 2020, 7881-7892. https://aclanthology.org/2020.acl-main.704/
[31] 侯林平. 语料库辅助的翻译认知过程研究模式: 特征与趋势[J]. 外语研究, 2019, 36(6): 69-75.
[32] 刘晓东. 认知导向的翻译语料库研制与评析[J]. 外语学刊, 2023(4): 52-60.
[33] 王华树, 刘世界. 中国语言服务企业机器翻译与译后编辑应用调查研究[J]. 北京第二外国语学院学报, 2021, 43(5): 23-37.
[34] 刘济超, Ömer Sahin Ganiyusufoglu, 许文胜. 计算机辅助同声传译系统的设计、开发与验证[J]. 外语教学与研究, 2025, 57(3): 463-475.
[35] Läubli, S., Sennrich, R. and Volk, M. (2018) A Case for Document-Level Evaluation in Machine Translation. Proceedings of the Third Conference on Machine Translation (WMT 2018), Brussels, 31 October-1 November, 1134-1144.