Analysis of Korean-English Patent Terminology Translation Errors in Neural Machine Translation: A Case Study Based on Google Translation Results
최효은 (Ewha Womans University)
Vol. 27, No. 1, pp. 217-244
Abstract
Terminology translation plays a key role in technical machine translation. This paper conducts a quantitative and qualitative evaluation and presents a comprehensive error analysis of terminology translation from Korean to English. For this study, 637 terms were selected from a patent-domain corpus. Of these, 425 terms were translated correctly, while the other 212 terms contained one or more errors. The identified errors are classified into seven categories: addition, omission, lexical errors, order errors, morphological errors, partial errors, and miscellaneous errors. Partial errors and morphological errors are the two most frequent categories, and lexical errors and omissions are also common. The study also presents n-gram-wise error rates. The n-gram-wise terminology translation error rate for the Google Translation results rises as n increases, which suggests that neural machine translation can be vulnerable when translating higher-order n-gram terms. This study has implications for Korean-English machine translation research in that it classifies the error cases found in machine translation of terminology, which is critical to ensuring translation accuracy.
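The abstract does not spell out how the n-gram-wise error rate is computed, so the following is only a minimal sketch under stated assumptions: n is taken to be the number of whitespace-separated tokens in a term, and each term carries a single correct/incorrect judgment. The function name `ngram_error_rates` and the toy annotations are hypothetical; only the totals 637, 425, and 212 come from the abstract.

```python
from collections import defaultdict


def ngram_error_rates(terms):
    """Return {n: error_rate} for an iterable of (term, is_correct) pairs.

    Assumption: n is the number of whitespace-separated tokens in the term.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for term, is_correct in terms:
        n = len(term.split())
        totals[n] += 1
        if not is_correct:
            errors[n] += 1
    # One rate per observed n, in ascending order of n
    return {n: errors[n] / totals[n] for n in sorted(totals)}


# Overall figures reported in the abstract: 637 terms, 425 translated correctly
TOTAL_TERMS, CORRECT_TERMS = 637, 425
overall_error_rate = (TOTAL_TERMS - CORRECT_TERMS) / TOTAL_TERMS  # 212 / 637

# Hypothetical toy annotations, NOT data from the study
toy_terms = [
    ("semiconductor", True),
    ("display device", True),
    ("display panel", False),
    ("thin film transistor", False),
]
rates = ngram_error_rates(toy_terms)

print(f"overall error rate: {overall_error_rate:.1%}")
for n, rate in rates.items():
    print(f"{n}-gram error rate: {rate:.1%}")
```

With real annotations in place of the toy list, an upward trend in the per-n rates would correspond to the pattern the abstract describes for longer terms.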
- Publisher: 한국언어연구학회
- Classification: Linguistics