Academic paper · Justice (저스티스) · Published 2025.08

A Study of the Dependence Requirement Ⅱ: Generative AI Outputs and Dependence

Rethinking Dependence (Copying) Requirement, Part Ⅱ: Do Generative AI Models Copy Training Data?

류시원 (Chonnam National University)

Vol. 209, pp. 472-526

Abstract (Korean)

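Artificial intelligence (AI) technologies are bringing about major transformations across society. Among them, generative AI has a particular impact on the landscape of copyright law. In Korea, discussion of copyright infringement at the output stage of generative AI has focused mainly on applying the fair use provision or introducing a TDM exception. Dependence, a basic requirement for copyright infringement, has received little extended attention. Many commentators assume that dependence can be presumed as a matter of course from two appearances: access at the input (training) stage, and similarity in the results at the output (generation) stage. These conventional views, however, give little consideration to the generative AI model that sits between input and output. That model is a link in the chain, and the connection between the two indirect facts, each observed at a different stage, may be diluted or severed there. Moreover, while these views insist that the presumption doctrine applies as a matter of course, they do not look closely at the workings of generative AI technology that must be considered in applying it. This article examines whether AI outputs depend on the training data fed into a generative AI model, drawing on the legal structure of the dependence relationship and on the characteristics of generative AI technology. It characterizes the dependence requirement as an objective fact whose core is legal causation. It then shows that generative AI models, which lie on the path connecting training data to AI output, differ in the strength of that causal connection depending on the model technology. The article focuses on diffusion models and large language models (LLMs), which have become the dominant forms of contemporary generative AI. Drawing on the memorization phenomena reported in the technical literature, it examines why an AI model is difficult to regard as a copy of its training data. It further argues that, if the model is not such a copy, its outputs must be regarded as depending on the ideas rather than the expression of the training data, so a dependence relationship based on indirect dependence is difficult to establish. On the proof of dependence, the article explains why the traditional presumption method is hard to reconcile with how generative AI learns and operates. That method was developed from rules of experience about how humans use works. The article proposes raising the level of similarity required to presume dependence. It also proposes considering additional indirect facts, such as how frequently memorized content is reproduced. Finally, it addresses the attribution of legal effects in a generative AI ecosystem involving many stakeholders: to whom should the consequences of a finding of dependence be attributed? On this question, it argues that two kinds of factors may be taken into account in assessing dependence. Factors that weaken the causal link include technical measures applied to suppress memorization. Factors that strengthen it include prompting by a user who intends to output a particular work.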

Abstract

Among artificial intelligence (AI) technologies, generative AI (GenAI) has a particular impact on copyright law. Until now, Korean scholarship on GenAI’s copyright infringement issues has focused mainly on the application of fair use provisions or the introduction of TDM exceptions. It has paid little attention to ‘dependence’, one of the basic requirements for copyright infringement. It is commonly assumed that two facts are sufficient to presume dependence: access to a prior work as training data at the input stage, and that work’s similarity to the AI output generated at the output stage. However, these views have not sufficiently considered the GenAI model that sits between the two stages. The connection between the two indirect facts, observed at the input (access) and output (similarity) stages, may be diluted or severed by that model. Moreover, while asserting that the presumption doctrine applies, these views have not closely examined the characteristics of GenAI technology that bear on its application. This article examines the issue of dependence between the training data input and the AI output, based on the legal structure of the dependence requirement and the specific features of GenAI technology. It reconceptualizes dependence as an objective fact centered on normative causation, and examines how the strength of the causal connection differs across GenAI model types. Focusing on diffusion models and large language models (LLMs), it also addresses why AI models are difficult to evaluate as copies of their training data, drawing on the memorization phenomena reported in the technical literature. It further argues that, where the model cannot be regarded as a copy of the training data, indirect dependence is difficult to recognize. In that case, the model’s output is based on the ideas contained in the training data rather than on their expression.
This article also examines why the traditional method of presuming dependence is difficult to reconcile with the learning and operational mechanisms of GenAI models, given that the method is based on the typical process by which a natural person copies works. Among its proposals are raising the level of similarity required to presume dependence and treating the frequency with which memorized content is reproduced as an additional indirect fact. Finally, it addresses a question that arises in a GenAI ecosystem involving multiple stakeholders: to whom should the legal effects of a finding of dependence be attributed? The article reviews how two kinds of factors should be taken into account in determining whether an AI output depends on its training data. Weakening factors are introduced by GenAI developers or service providers, such as technical measures to suppress memorization. Strengthening factors are introduced by users, such as prompts intended to output a particular work.
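The abstract proposes two adjustments to the presumption of dependence. First, a higher degree of similarity should be required before dependence is presumed. Second, the frequency with which memorized training content is reproduced should be weighed as an additional indirect fact. The Python sketch below is a hypothetical illustration only. It is not the author's method and not a measure taken from any particular technical study. It counts how many sampled outputs share at least `min_run` consecutive words with a training passage. The function names `longest_shared_run` and `reproduction_rate`, the word-level tokenization, the threshold values and the sample strings are all assumptions made for illustration.

```python
from typing import List


def longest_shared_run(output: str, training: str) -> int:
    """Return the length, in whitespace-separated words, of the longest
    contiguous word sequence occurring in both texts (a longest common
    substring computed over words instead of characters)."""
    a, b = output.split(), training.split()
    best = 0
    # prev[j]: length of the shared run ending at the previous word of `a`
    # and at b[j - 1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def reproduction_rate(samples: List[str], training: str, min_run: int) -> float:
    """Hypothetical metric: fraction of sampled outputs that reproduce at
    least `min_run` consecutive words of the training text verbatim."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if longest_shared_run(s, training) >= min_run)
    return hits / len(samples)


# Toy, made-up data: one "training" passage and four hypothetical outputs.
TRAINING = "the quick brown fox jumps over the lazy dog near the river bank"
SAMPLES = [
    "a quick brown fox jumps over the lazy dog today",  # 8-word overlap
    "the cat sleeps near the river",  # 3-word overlap
    "completely unrelated sentence here",  # no overlap
    "the quick brown fox jumps over the lazy dog near the river bank",  # 13
]

# A higher min_run plays the role of a raised similarity threshold.
for threshold in (3, 8, 10):
    print(threshold, reproduction_rate(SAMPLES, TRAINING, threshold))
```

With the toy data, the loop prints `3 0.75`, `8 0.5` and `10 0.25`. The stricter the threshold, the fewer outputs count as reproducing the passage. Word-level verbatim overlap is only one of many possible similarity measures, chosen here for simplicity. It says nothing about non-literal similarity, which copyright law also recognizes.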

Publisher: 한국법학원
DOI: http://dx.doi.org/10.29305/tj.2025.8.209.472
Classification: Other fields of law
