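Academic article | 차세대컨버전스정보서비스기술논문지 (Journal of Next-generation Convergence Information Services Technology) | Published 2024.02 | KCI citations: 3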

자연어처리와 기계학습을 활용한 기술 특허 분류

Classification of Technology Patents Using Natural Language Processing and Machine Learning Models

이우식 (Gyeongsang National University); 이예진 (Gyeongsang National University)

Vol. 13, No. 1, pp. 93-102

Abstract (Korean)

최근 빅데이터 시대의 도래로 인공신경망을 포함한 기계학습 모델들이 의학, 유전체 연구, 기업 경영 등 다양한 분야에 광범위한 영향을 미치고 있음에도 불구하고, 기술 특허 분석에 자연어 처리와 기계학습을 적용한 국내 리걸테크 연구는 충분히 발전하지 못한 상황이다. 본 연구는 이산화탄소 포집·활용에 대한 특허 데이터, 자연어 전처리 기법 그리고 기계학습모형 기반의 기술 특허 분류 시스템을 설계하고, 정확도, 카파 상관계수 그리고 F1-점수를 비교·분석하였다. 주요 결과를 요약·정리하면 다음과 같다. 첫째, 다섯 가지 이산화탄소 포집 및 활용 기술 분류에서 그래디언트 부스팅, 랜덤 포레스트, 의사결정나무 순으로 성능이 나타났다. 이를 통해 단일 결정 나무보다 배깅과 부스팅 기법을 적용한 랜덤포레스트 모형과 그래디언트 부스팅 모형이 더 우수한 학습 성능을 제공함을 확인할 수 있었다. 둘째, 특허의 요약과 제1 청구항을 활용한 기술 분류에서 비슷한 성능이 관찰되었다. 이는 자연어 처리 과정에서 중요한 키워드를 명사로만 추출한 것이 주요 요인으로 보인다. 본 연구는 자연어 전처리와 기계학습 모형을 이산화탄소 포집 및 활용 기술 특허 분류에 처음으로 적용한 의미 있는 연구로 사무 로봇 기술을 통해 반복적인 업무를 자동화하는 데 응용될 수 있는 가능성을 제시한다.

Abstract

With the advent of the big data era, machine learning models, including artificial neural networks, have had a wide-ranging impact on fields such as medicine, genomics research, and corporate management. Even so, domestic (Korean) legal tech research that applies natural language processing and machine learning to technology patent analysis remains underdeveloped. This study designs a technology patent classification system built on patent data for Carbon Dioxide Capture and Utilization (CCU), natural language pre-processing techniques, and machine learning models, and compares the models by accuracy, kappa coefficient, and F1-score. The main findings are as follows. First, in classifying five types of CCU technologies, gradient boosting performed best, followed by random forest and then the decision tree. This confirms that the random forest and gradient boosting models, which apply bagging and boosting respectively, provide better learning performance than a single decision tree. Second, classification based on the patent abstract and classification based on the first claim showed similar performance. The main reason appears to be that only nouns were extracted as important keywords during natural language processing. This study is meaningful as the first to apply natural language pre-processing and machine learning models to the classification of CCU technology patents, and it points to the potential for automating repetitive tasks through robotic automation technology.
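The three evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the kappa coefficient is Cohen's kappa and that the F1-score is macro-averaged over classes, neither of which the abstract states, and the class labels `A`, `B`, `C` and the toy predictions are hypothetical stand-ins for the five CCU categories.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa (assumed variant): agreement corrected for chance.

    Undefined (division by zero) when chance agreement p_e equals 1,
    e.g. when both label lists contain a single identical class.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)            # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected agreement if labels were assigned independently
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: true technology classes vs. a classifier's output
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:   ", cohen_kappa(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Reporting kappa alongside accuracy is useful when classes are imbalanced, since kappa discounts the agreement a classifier would reach by chance. In practice, scikit-learn's `accuracy_score`, `cohen_kappa_score`, and `f1_score` compute the same quantities.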

Publisher: 차세대컨버전스정보서비스학회
DOI: http://dx.doi.org/10.29056/jncist.2024.02.09
Classification: Interdisciplinary studies
