Academic paper, The Journal of Intellectual Property (지식재산연구), published March 2026

A Study on a Patent Document Retrieval Model Using Contrastive Learning on Similar Patent Technologies Based on KorPatSTS

A Contrastive Learning Patent Document Retrieval Model for Similar Patent Technologies Based on KorPatSTS

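민재옥 (Korea Institute of Patent Information); 황솔빈 (Korea Institute of Patent Information); 전영훈 (Korea Institute of Patent Information); 채송아 (Korea Institute of Patent Information); 이봉건 (Korea Institute of Patent Information)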

Vol. 21, No. 1, pp. 181-207

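Abstract (translated from the Korean)

To judge more precisely the grounds for rejection that arise from technical overlap during patent filing, this study proposes NCE-KorPat, an advanced deep-learning patent retrieval model that recommends cited patent documents based on the semantic and technical similarity between a filed patent and prior patents, together with KorPatSTS (Korean Patent Semantic Textual Similarity), a high-quality training dataset. KorPatSTS is a dataset of similar patent-technology sentence pairs built on the expertise of the AI Examiner Advisory Group (「AI 심사관 자문단」) of the Korea Ministry of Intellectual Property. For the technical elements of an application's claims and the sentences of the detailed description that explain them, the corresponding content of the prior patents actually cited as grounds for rejection is mapped precisely at the sentence level. Starting from KorPatBERT, a language model specialized for the patent domain, the authors fine-tuned for classification at the CPC subgroup level and then applied contrastive learning and training optimization with the KorPatSTS dataset to develop NCE-KorPat. In Korean patent retrieval experiments, the model outperformed both the previously best-performing Korean embedding model and the latest global embedding models. This is the first study to systematically construct similar-technology sentence pairs by combining the expertise and domain knowledge of patent examiners at the Korea Ministry of Intellectual Property and to apply them to an actual retrieval model, and it is expected to contribute materially to raising both the accuracy and the efficiency of patent examination.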

Abstract

This study proposes NCE-KorPat, an advanced deep-learning-based patent retrieval model, and KorPatSTS (Korean Patent Semantic Textual Similarity), a high-quality training dataset, to assess more precisely the grounds for rejection that arise from technological redundancy during the patent application process. The model recommends cited patent documents based on the semantic and technical similarity between an application and prior art patents. KorPatSTS is a sentence-level dataset of similar patent-technology sentence pairs, built on the expertise of the Korea Ministry of Intellectual Property (MOIP) AI Examiner Advisory Group. The dataset maps the technical elements of an application's claims, and the sentences of the detailed description that explain those elements, to the corresponding passages of the prior art patents actually cited as grounds for rejection, yielding precise sentence-level correspondence pairs. NCE-KorPat was developed in two stages: KorPatBERT, a patent-domain-specific language model, was first fine-tuned for CPC subgroup-level classification, and contrastive learning and training optimization were then applied using the KorPatSTS dataset. In Korean patent retrieval experiments, the proposed model outperformed both the previously best-performing Korean embedding models and state-of-the-art global embedding models. To the best of our knowledge, this study is the first to construct similar-technology sentence pairs systematically by integrating the domain expertise of Korean patent examiners and to apply them directly to a practical patent retrieval model. The proposed approach is expected to contribute substantially to improving both the accuracy and the efficiency of patent examination.
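The contrastive-learning stage mentioned above can be illustrated with a minimal sketch. The abstract does not specify the loss function or its hyperparameters, so everything here is an assumption: the sketch uses an InfoNCE-style objective with in-batch negatives, which is one common choice for sentence-pair training, and the function names `l2_normalize` and `info_nce_loss` and the `temperature` value are hypothetical. Each application sentence embedding is pulled toward the embedding of its paired prior-art sentence and pushed away from the other prior-art sentences in the batch.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(queries, positives, temperature=0.05):
    """InfoNCE-style loss with in-batch negatives (illustrative assumption,
    not the loss confirmed by the paper).

    queries[i] and positives[i] are embeddings of a matched sentence pair;
    every positives[j] with j != i serves as a negative for queries[i].
    """
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity of query i to every candidate.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matched pair as the correct class.
        total += log_denominator - logits[i]
    return total / len(q)
```

Under this assumed objective, a batch whose pairs are correctly aligned yields a loss near zero, while a batch whose pairs are swapped yields a large loss. In practice the embeddings would come from the encoder being trained rather than from fixed vectors.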

Publisher: Korea Institute of Intellectual Property
Classification: Intellectual property law
