The Impact of Prompts on AI-Assisted Cleaning of English-Chinese Mixed Maritime Corpora from the Perspective of the Cooperative Principle: A Case Study of DeepSeek Cleaning Maritime Accident Reports
DOI: 10.12677/ml.2026.145377
Author: Zhang Xitian (张熙田), College of Foreign Languages, Shanghai Maritime University, Shanghai
Keywords: Corpus Cleaning; Prompt; Maritime Discourse; Cooperative Principle
Abstract: This study examines how prompt design affects the performance of large language models in cleaning English-Chinese mixed maritime corpora. Because maritime texts are highly specialized and mix English with Chinese, the study designs four prompt frameworks: “Simple Instruction,” “Role + Instruction,” “Role + Instruction + Constraint,” and “Role + Context + Instruction + Constraint.” With the DeepSeek model as the experimental tool, a controlled cleaning experiment was run on a collected corpus of maritime accident investigation reports. The “Role + Instruction + Constraint” prompt produced the results closest to expectations, removing formatting noise while preserving the original professional content and structure as far as possible. The study then interprets the results through H.P. Grice’s Cooperative Principle, arguing that the best-performing framework satisfies the maxims of quantity, quality, relation, and manner that govern effective human-machine interaction. The study offers a reusable prompt design framework for processing domain-specific mixed corpora with large language models and may serve as a reference for combining artificial intelligence with linguistic research methods.
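The four prompt frameworks named in the abstract can be pictured as the same instruction wrapped in progressively more components. The Python sketch below is illustrative only: the helper `build_prompt`, the section labels, and every piece of prompt wording are hypothetical placeholders, since the abstract does not reproduce the prompts actually used in the experiment.

```python
from typing import List, Optional


def build_prompt(
    instruction: str,
    role: Optional[str] = None,
    context: Optional[str] = None,
    constraints: Optional[List[str]] = None,
) -> str:
    """Assemble a prompt from optional components in a fixed order."""
    parts = []
    if role:
        parts.append(f"Role: {role}")
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Instruction: {instruction}")
    if constraints:
        bullet_lines = "\n".join(f"- {c}" for c in constraints)
        parts.append("Constraints:\n" + bullet_lines)
    return "\n".join(parts)


# Hypothetical wording, NOT the prompts used in the study.
ROLE = "You are a corpus linguist who specializes in maritime texts."
CONTEXT = "The text comes from maritime accident investigation reports."
INSTRUCTION = "Clean the following English-Chinese mixed text."
CONSTRAINTS = [
    "Remove formatting noise only.",
    "Do not alter terminology, wording, or paragraph structure.",
]

# One prompt per framework listed in the abstract.
FRAMEWORKS = {
    "Simple Instruction": build_prompt(INSTRUCTION),
    "Role + Instruction": build_prompt(INSTRUCTION, role=ROLE),
    "Role + Instruction + Constraint": build_prompt(
        INSTRUCTION, role=ROLE, constraints=CONSTRAINTS
    ),
    "Role + Context + Instruction + Constraint": build_prompt(
        INSTRUCTION, role=ROLE, context=CONTEXT, constraints=CONSTRAINTS
    ),
}

if __name__ == "__main__":
    for name, prompt in FRAMEWORKS.items():
        print(f"=== {name} ===\n{prompt}\n")
```

Each resulting string would be sent to the model together with the raw text to be cleaned. Keeping the instruction fixed and varying only the surrounding components mirrors the controlled comparison the abstract describes.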
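Citation: Zhang, X.T. (2026) The Impact of Prompts on AI-Assisted Cleaning of English-Chinese Mixed Maritime Corpora from the Perspective of the Cooperative Principle: A Case Study of DeepSeek Cleaning Maritime Accident Reports. Modern Linguistics, 14(5), 85-91. https://doi.org/10.12677/ml.2026.145377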
