법률상담 도메인의 자연어이해 모델 학습을 위한 언어자원 구축 방법론
A Methodology for Building Linguistic Resources for Natural Language Understanding Model Training in a Legal Counseling Domain
황창회 (Hankuk University of Foreign Studies); 남지순 (Hankuk University of Foreign Studies)
Vol. 29, No. 4, pp. 181-212
Abstract
This study proposes a methodology for constructing linguistic resources to train Natural Language Understanding (NLU) models for legal counseling services. Datasets built from the proposed resources are essential for developing non-face-to-face legal services that provide information on legal problems. The resources were constructed through a bottom-up analysis of the linguistic patterns of legal expressions, background descriptions, and discourse types found in online legal counseling texts. In addition, we analyzed the hierarchical keyword classifications used in existing legal service systems and defined 20 new keywords belonging to 4 representative legal categories. Local Grammar Graphs (LGGs), which are effective for describing local linguistic phenomena, were adopted to describe the varied linguistic patterns of this domain. These patterns, modularized in LGG format, are converted into Finite State Transducers (FSTs), which generate the datasets required to train an NLU model. To evaluate this process, we trained the NLU model of the open-source chatbot framework Rasa on our dataset. The model achieved an F1-score of 0.91, which indicates that the linguistic resources and methodology proposed in this study can be applied in practice to the development of legal counseling chatbot systems.
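The step from graph-encoded patterns to NLU training data can be pictured with a minimal sketch. It is an illustration only, not the authors' implementation: the graphs are tiny acyclic state graphs written as Python dictionaries rather than real LGG files, the English phrases and the intent names `ask_deposit_return` and `ask_unpaid_wages` are hypothetical placeholders (they are not the paper's 20 keywords), and the output only imitates the layout of Rasa's YAML training data.

```python
from typing import Dict, List, Tuple

# A graph maps a state name to its outgoing edges: (phrase, next_state).
# Hypothetical stand-in for a compiled local grammar; not an actual LGG file.
Graph = Dict[str, List[Tuple[str, str]]]

GRAPHS: Dict[str, Graph] = {
    "ask_deposit_return": {  # hypothetical intent name
        "start": [("my landlord", "verb"), ("the owner", "verb")],
        "verb": [("will not return", "obj"), ("refuses to give back", "obj")],
        "obj": [("my deposit", "end"), ("the security deposit", "end")],
    },
    "ask_unpaid_wages": {  # hypothetical intent name
        "start": [("my employer", "verb")],
        "verb": [("has not paid", "obj"), ("is withholding", "obj")],
        "obj": [("my wages", "end"), ("my overtime pay", "end")],
    },
}


def expand(graph: Graph, state: str = "start", final: str = "end") -> List[str]:
    """Enumerate every phrase sequence on a path from `state` to `final`.

    Assumes the graph is acyclic, so the recursion terminates.
    """
    if state == final:
        return [""]
    results: List[str] = []
    for phrase, next_state in graph.get(state, []):
        for tail in expand(graph, next_state, final):
            results.append((phrase + " " + tail).strip())
    return results


def to_rasa_nlu(graphs: Dict[str, Graph]) -> str:
    """Render the expanded utterances in a Rasa-style YAML layout."""
    lines = ['version: "3.1"', "nlu:"]
    for intent, graph in graphs.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        for utterance in expand(graph):
            lines.append(f"    - {utterance}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_rasa_nlu(GRAPHS))
```

Each path through a graph yields one labeled utterance, so a compact set of modular patterns multiplies into a much larger training set.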
- Publisher: The Korean Association of Language Sciences (한국언어과학회)
- Category: Linguistics