Enhancing Korean–Chinese Legal Translation in Low-Resource Scenarios Using Back Translation and Transfer Learning
장아남 (Department of Computer and Information Engineering, Youngsan University); 소길자 (Youngsan University)
Vol. 6, No. 3, pp. 132-144
Abstract
Legal translation between Korean and Chinese is difficult because of complex legal terminology, divergent linguistic structures, and a scarcity of high-quality bilingual corpora. This study proposes an approach to improving neural legal translation in low-resource scenarios by combining back-translation-based data augmentation with transfer learning. Specifically, the multilingual pre-trained mBART model is fine-tuned in two stages: initial fine-tuning on authentic Korean–Chinese legal parallel data, followed by further fine-tuning on pseudo-parallel data generated through back translation and enriched with legal terminology annotations. Experiments on domain-specific datasets show substantial improvements over baseline Transformer and fine-tuned mBART models, with the proposed system achieving a BLEU score of 34.5 and a TER of 0.42. Human evaluation by bilingual legal experts further confirms improved fluency, adequacy, and legal consistency. Beyond advancing Korean–Chinese legal neural machine translation in low-resource contexts, the study discusses legal implications, including accountability, compliance, and the potential of blockchain for translation traceability. The proposed framework provides a practical foundation for developing reliable AI-assisted legal translation systems.
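The two-stage data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `<term:...>` tag format, the glossary, and the choice to back-translate Chinese monolingual text into Korean are all assumptions made for illustration, and the `zh_to_ko` callable stands in for whatever reverse-direction model would actually be used.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (Korean source, Chinese target)


def annotate_terms(text: str, glossary: Dict[str, str]) -> str:
    """Tag each known Korean legal term with its Chinese equivalent.

    The "<term:...>" format is hypothetical; the abstract does not
    specify how terminology annotations are encoded.
    """
    for ko_term, zh_term in glossary.items():
        text = text.replace(ko_term, f"{ko_term}<term:{zh_term}>")
    return text


def build_pseudo_parallel(
    mono_zh: List[str],
    zh_to_ko: Callable[[str], str],
    glossary: Dict[str, str],
) -> List[Pair]:
    """Back-translate Chinese monolingual sentences into synthetic Korean sources."""
    pairs: List[Pair] = []
    for zh in mono_zh:
        ko = zh_to_ko(zh)  # assumed reverse-direction translator
        if not ko.strip():
            continue  # drop sentences the reverse model could not translate
        pairs.append((annotate_terms(ko, glossary), zh))
    return pairs


def training_schedule(
    authentic: List[Pair], pseudo: List[Pair]
) -> Tuple[List[Pair], List[Pair]]:
    """Return the data for stage 1 (authentic) and stage 2 (pseudo-parallel)."""
    return list(authentic), list(pseudo)


# Toy usage with a dictionary lookup in place of a real reverse model.
toy_reverse = {"合同无效。": "계약은 무효이다."}
pseudo_pairs = build_pseudo_parallel(
    ["合同无效。", "未知"],
    lambda s: toy_reverse.get(s, ""),
    {"계약": "合同"},
)
stage1, stage2 = training_schedule([("법원", "法院")], pseudo_pairs)
```

In practice the stage 2 set is often mixed with the authentic pairs to limit drift toward synthetic sources; whether the study does so is not stated in the abstract.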
- Publisher:
- 한국인공지능교육학회 (Korean Association of Artificial Intelligence Education)
- Classification:
- Education