生成式人工智能数据训练的著作权法因应
Copyright Law Responses to Generative Artificial Intelligence Training Data
Abstract: Generative artificial intelligence has become a research focus within the field of artificial intelligence, and the quality of its output can sometimes surpass that of human works. This capability depends on training generative models on massive amounts of data, using machine learning to optimize model performance and enhance creative capacity. The training data may come from open-source databases in the public domain, but it inevitably also includes works protected by copyright, which raises the problem of infringing the copyright of the original works at the data input stage. The existing licensed-use and statutory licensing rules would greatly increase development costs for AI operators and hinder innovation in the artificial intelligence industry. To protect this emerging business model, at the input stage China should establish a fair use rule for AI training data and clarify the four-factor test and the preconditions for applying it; at the output stage, it should make clear that AI operators bear fault-based liability when generated content infringes copyright, and should urge AI operators to build reasonable preventive and remedial mechanisms, such as keyword filtering and reporting and complaint channels. These proposals are intended to promote the development of the artificial intelligence industry and to strike a balance between copyright protection and the public interest.
Cite this article: 胡衍惠. 生成式人工智能数据训练的著作权法因应[J]. 法学, 2025, 13(11): 2457-2463. https://doi.org/10.12677/ojls.2025.1311336
