Fine-Tuning Pre-Trained Language Models for Chinese-English Translation of the DJI Osmo Pocket 3 User Manual
Abstract: This study uses pre-trained language models (PLMs) to improve Chinese-English translation of the DJI Osmo Pocket 3 user manual. A parallel corpus for the consumer electronics domain was constructed, and the model was fine-tuned with Translation Language Modeling (TLM) and Cross-attention Masked Language Modeling (CAMLM) objectives to improve translation accuracy and terminology consistency in specialized technical text. To handle syntactic differences between the two languages during data processing, Chinese text was segmented with a rule-based method and English text with a statistical-model-based method, yielding 130 high-quality aligned sentence pairs (N1 = 130), which data augmentation expanded to 260 (N2 = 260). Experimental results show that the fine-tuned model improves BLEU and METEOR by 12.3% and 9.7%, respectively, and reduces TER by 8.9%. Human evaluation also indicates marked improvements in terminology accuracy and semantic consistency. The findings suggest that targeted fine-tuning on domain-specific data can effectively improve PLM performance on technical-text translation.
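The abstract mentions rule-based sentence segmentation for the Chinese side of the corpus but does not state the rules. The following is a minimal Python sketch of one common approach, not the author's implementation. It assumes that sentences end at the full-width marks 。!? and that closing quotes or brackets directly after such a mark belong to the same sentence. The function name `split_chinese_sentences` and the sample text are hypothetical.

```python
import re

# Assumed rule set: a sentence ends at one or more of 。!? and any closing
# quotes/brackets that immediately follow stay attached to that sentence.
_SENT_END = re.compile(r'([。!?]+[”’」』)]*)')


def split_chinese_sentences(text: str) -> list[str]:
    """Split Chinese text into sentences at sentence-final punctuation."""
    # With a capturing group, re.split returns
    # [body, delimiter, body, delimiter, ..., tail].
    parts = _SENT_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    # Keep any trailing fragment that has no closing punctuation.
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    # Hypothetical sample text, not quoted from the manual.
    for s in split_chinese_sentences("长按电源键开机。轻点屏幕开始录像!"):
        print(s)
```

A real pipeline would likely need further rules, for example for numbered steps, ellipses, or sentences that span table cells; those are outside this sketch.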
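Cite this article: Zeng Bingfeng (曾炳凤). Fine-Tuning Pre-Trained Language Models for Chinese-English Translation of the DJI Osmo Pocket 3 User Manual [J]. Modern Linguistics (现代语言学), 2026, 14(5): 570-576. https://doi.org/10.12677/ml.2026.145435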
