Academic paper, Journal of The Korea Society of Computer and Information, published 2025.08

Optimization of a Hybrid RAG System for Korean Legal QA

서준원 (Inha Technical College); 민정혜 (Inha Technical College)

Vol. 30, No. 8, pp. 53-63

Abstract

Legal question-answering systems demand high reliability and accuracy, and large language models (LLMs) have recently been actively studied as a way to meet these requirements. However, pretrained LLMs have difficulty reflecting the most recent case law and detailed statutory provisions, which can lead to so-called “hallucination”, the generation of factually incorrect content. To compensate for this, Retrieval-Augmented Generation (RAG), which generates responses grounded in external documents, has received growing attention. This study builds a RAG system specialized for the Korean legal domain by combining document chunking methods, embedding models, and retrieval strategies to design the best-performing configuration and analyze its performance. Experimental results show that, on semantically chunked documents, a hybrid retrieval strategy combining BM25 with an E5 embedding model fine-tuned on Korean legal data yields the best performance, exceeding existing methods in both retrieval accuracy and the factuality of the generated answers.
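The hybrid retrieval idea in the abstract, lexical BM25 scores combined with dense embedding similarity, can be illustrated with a minimal sketch. The abstract does not state how the two signals are fused, so the min-max normalization, the weighted sum, and the `alpha` parameter below are assumptions for illustration only. Whitespace tokenization and the hand-written three-dimensional vectors are also hypothetical stand-ins for a real Korean tokenizer and for the output of a fine-tuned E5 model.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for the query."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Rank documents by an assumed weighted sum of dense and BM25 scores."""
    sparse = min_max(bm25_scores(query_tokens, docs_tokens))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * d + (1 - alpha) * s for s, d in zip(sparse, dense)]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order, fused


# Toy corpus: each string stands in for one semantically coherent chunk.
chunks = [
    "lease deposit return obligation landlord",
    "traffic accident negligence compensation",
    "lease contract termination notice period",
]
docs_tokens = [c.split() for c in chunks]
# Hypothetical embeddings; a real system would compute these with an encoder.
doc_vecs = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.1], [0.7, 0.2, 0.6]]
query_tokens = "lease deposit return".split()
query_vec = [1.0, 0.0, 0.1]

ranking, fused = hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs)
print(ranking)
```

In this toy setup the chunk that matches the query both lexically and semantically ranks first, and the chunk that matches on neither signal ranks last. Other fusion schemes, such as reciprocal rank fusion, would fit the same interface.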

Publisher: The Korea Society of Computer and Information
DOI: http://dx.doi.org/10.9708/jksci.2025.30.08.053
Classification: Computer Science
