Style Change-Oriented Text Plagiarism Detection Method
Abstract: The rapid development of large language models (LLMs) in recent years poses unprecedented challenges for plagiarism detection. To address this problem, this paper proposes a detection model oriented toward writing style change. The model combines BERT with a graph attention network to learn the style features of a text and classify its style. It also introduces a contrastive learning mechanism that strengthens the representation of text style features and thereby improves the detection of writing style changes. Experiments on the PAN 2022 writing style change detection dataset show that the proposed model outperforms existing representative methods. Ablation experiments further verify the effectiveness of the style enhancement mechanism and show the advantage of the graph attention network in capturing writing style features. The proposed method improves the accuracy of style change detection and provides a prerequisite step for subsequent plagiarism detection tasks.
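The abstract does not give the exact form of the contrastive objective, so the following is only a minimal sketch of one common choice: a supervised contrastive loss over paragraph embeddings. It assumes each paragraph already has an embedding (for example from BERT) and an author label. The function name `supervised_contrastive_loss`, the `temperature` parameter, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over paragraph embeddings (illustrative).

    Paragraphs with the same author label are pulled together and
    paragraphs with different labels are pushed apart. Assumes every
    embedding is a non-zero vector.
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n = z.shape[0]
    if n < 2:
        return 0.0

    # Cosine similarity between all pairs, scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each paragraph's similarity with itself.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over the other paragraphs.
    row_max = sim.max(axis=1, keepdims=True)
    exp_sum = np.exp(sim - row_max).sum(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(exp_sum)

    # Positives: other paragraphs that share the anchor's label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy example: two style clusters in a 2-D embedding space.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
print(loss_matching, loss_mismatched)
```

In this toy setup the loss is lower when the labels agree with the geometric clusters, which is the behavior a style enhancement objective of this kind is meant to encourage.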
Cite this article: Luo Chulin, Zhou Yuanding, Han Yanfang. Style Change-Oriented Text Plagiarism Detection Method [J]. Modeling and Simulation, 2025, 14(5): 1051-1063. https://doi.org/10.12677/mos.2025.145456
