Black-Box Text Watermarking Algorithm towards Large Language Models
DOI: 10.12677/mos.2025.144276, supported by research project funding
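Authors: Yichen Yao (姚奕辰), Cheng Xiong (熊成), Yuanding Zhou (周元鼎), Yanfang Han (韩彦芳), School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai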
Keywords: AI-Generated Text Detection; Text Watermarking; Reinforcement Learning; Text Quality
Abstract: The rapid development of large language models and the everyday use of their APIs have made detecting AI-generated text an increasingly pressing problem. To address it, this paper proposes a black-box watermarking algorithm for text generated through large language model APIs. The algorithm uses reinforcement learning to train watermark embedding through three designed components: a watermark embedding state, a watermark embedding agent, and a watermark embedding reward. This design yields strong performance in watermark detectability, semantic consistency of the text, and robustness against conventional text watermarking attacks. Compared with other algorithms, the proposed method achieves an average detection accuracy of 94.22% while maintaining better text quality. The algorithm not only improves the embedding quality of text watermarks but also supports the protection of model usage and intellectual property rights.
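The abstract does not state the form of the watermark embedding reward, so the following is only a minimal sketch of how a reward could trade the three stated goals against one another: a detector's confidence on the watermarked text (detectability), the detector's confidence after a simulated attack (robustness), and the cosine similarity between sentence embeddings of the original and watermarked text (semantic consistency). The function names, the additive form, and the weights `w_detect`, `w_robust` and `w_semantic` are hypothetical assumptions, not the authors' method; the detector probabilities and embeddings are taken as given inputs rather than computed here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def watermark_reward(
    detect_prob: float,            # hypothetical: detector's probability that the watermarked text carries the mark
    attacked_detect_prob: float,   # hypothetical: the same probability after a simulated attack (e.g. paraphrasing)
    emb_original: Sequence[float],     # sentence embedding of the original API output
    emb_watermarked: Sequence[float],  # sentence embedding of the watermarked text
    w_detect: float = 1.0,         # hypothetical weights, not taken from the paper
    w_robust: float = 0.5,
    w_semantic: float = 1.0,
) -> float:
    """Hypothetical scalar reward: weighted sum of detectability, robustness and semantic consistency."""
    semantic = cosine_similarity(emb_original, emb_watermarked)
    return w_detect * detect_prob + w_robust * attacked_detect_prob + w_semantic * semantic


# Identical embeddings (semantics fully preserved) versus orthogonal ones (meaning lost):
print(watermark_reward(0.9, 0.8, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(watermark_reward(0.9, 0.8, [1.0, 0.0], [0.0, 1.0]))
```

With equal detector scores, the second call receives a lower reward because the watermarked text drifted semantically; in an RL setup of the kind the abstract describes, a policy-gradient agent would be pushed toward edits that keep all three terms high. A real reward would likely need the weights tuned, or the terms normalized, so that no single goal dominates.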
Citation: Yao, Y., Xiong, C., Zhou, Y. and Han, Y. (2025) Black-Box Text Watermarking Algorithm towards Large Language Models. Modeling and Simulation, 14(4), 172-180. https://doi.org/10.12677/mos.2025.144276
