Bigbird-Pegasus 기반의 청구범위 생성요약을 통한 특허분류 방법론
A Methodology for Patent Classification through Bigbird-Pegasus Based Claim Abstractive Summarization
이영재(고려대학교); 김지호(고려대학교); 이홍철(고려대학교)
51권 1호, 61~72쪽
초록
Patent classification is a crucial process in the examination procedure, matching the invention technology of the application with technical classification codes, and manually classifying is significant time and cost. To automate this, various machine learning-based AI methods have been researched, and recently, Transformer-based patent classification models have shown excellent performance. However, Transformer models are limited to a maximum of 512 tokens for input, there is a possibility of information loss. This study proposes a method to improve performance by using Bigbird-Pegasus and PatentSBERTa to summarize the entire text data of the claims into a fixed size before inputting it into the classification model. Experimental results show that the F1 score achieved up to 67.554% in a small-scale patent data environment, representing a 4% point performance improvement over existing methods. Additionally, this study suggests an effective patent automatic classification method through the optimal combination of summarized text and other patent items.
Abstract
Patent classification is a crucial process in the examination procedure, matching the invention technology of the application with technical classification codes, and manually classifying is significant time and cost. To automate this, various machine learning-based AI methods have been researched, and recently, Transformer-based patent classification models have shown excellent performance. However, Transformer models are limited to a maximum of 512 tokens for input, there is a possibility of information loss. This study proposes a method to improve performance by using Bigbird-Pegasus and PatentSBERTa to summarize the entire text data of the claims into a fixed size before inputting it into the classification model. Experimental results show that the F1 score achieved up to 67.554% in a small-scale patent data environment, representing a 4% point performance improvement over existing methods. Additionally, this study suggests an effective patent automatic classification method through the optimal combination of summarized text and other patent items.
- 발행기관:
- 대한산업공학회
- 분류:
- 산업공학