A Study on the Proper Use of Training Data in Generative Artificial Intelligence
DOI: 10.12677/ojls.2025.1312403
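Author: Zhang Chenchen (张陈晨), School of Law and Business (School of Intellectual Property), Wuhan Institute of Technology, Wuhan, Hubei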
Keywords: Generative AI, Training Data, Fair Use
Abstract: Generative AI, a prominent representative of new quality productive forces, depends heavily on training with massive amounts of data. How to lawfully use copyrighted works for training while protecting the legitimate rights and interests of copyright holders has become a core legal bottleneck constraining its development. Although China’s Copyright Law and the related judicial interpretations set out a framework for the fair use system, that framework runs into serious difficulties of application when confronted with Generative AI, a new type of application that is both technologically innovative and commercially driven. On the one hand, the fair use system typically applies to uses with non-profit purposes, whereas Generative AI service providers are mostly commercial entities whose use of training data has an evident commercial purpose, so the system is difficult to apply directly. On the other hand, applying the “three-step test” of the fair use system to Generative AI is also challenging: questions such as how to define “affecting the normal exploitation of the work” and “unreasonably prejudicing the legitimate interests of the copyright holder” leave the application of the law uncertain. Furthermore, as Generative AI technology continues to evolve, the sources of training data and the ways in which they are used keep changing, and traditional legal rules struggle to adapt fully to the new technological environment. It is therefore necessary to analyze systematically, at the level of the legal system, the legal dilemmas surrounding the fair use of training data for Generative AI, and to propose corresponding pathways for institutional improvement, so as to promote the coordinated development of artificial intelligence and the intellectual property legal framework.
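Cite this article: Zhang Chenchen. A Study on the Proper Use of Training Data in Generative Artificial Intelligence [J]. Open Journal of Legal Science (法学), 2025, 13(12): 2969-2975. https://doi.org/10.12677/ojls.2025.1312403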
