
Enhancing the performance of Vietnamese-Korean Neural Machine Translation using Contextual Embedding

Abstract
Since the introduction of deep learning, a series of advances have been reported in the field of machine translation (MT). However, Vietnamese-Korean MT systems face many challenges because of a lack of data. In this research, we built an open, large-scale Vietnamese-Korean parallel corpus of over 412 thousand sentence pairs for training MT models.
In addition, words can take different meanings depending on their context, which makes the corpus harder for MT systems to interpret. This dissertation discusses a method of applying a linguistic annotation, Part-of-Speech (POS) tagging, to Vietnamese sentences to improve the performance of Vietnamese-Korean MT systems. The experimental results indicate that POS-tagging Vietnamese sentences can improve the quality of Vietnamese-Korean Neural MT (NMT) in terms of Bilingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores. After POS tagging was applied to the Vietnamese corpus, our Vietnamese-Korean MT system improved by 1.07 BLEU points and 2.96 TER points, respectively.
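As a rough illustration of what POS-annotating a source sentence can look like before it is passed to an NMT system, the sketch below attaches a tag to each word-segmented token. The function name `annotate_with_pos`, the `|` separator, and the example tags are assumptions made for illustration; the abstract does not state how the tags were encoded in the corpus.

```python
def annotate_with_pos(tagged_tokens, sep="|"):
    """Join (word, tag) pairs into a single source-side training line.

    The "word|TAG" layout is a hypothetical encoding, not the
    dissertation's confirmed format.
    """
    return " ".join(f"{word}{sep}{tag}" for word, tag in tagged_tokens)


# Illustrative input: a word-segmented Vietnamese sentence with
# made-up tags (P = pronoun, V = verb, N = noun).
example = [("Tôi", "P"), ("học", "V"), ("tiếng_Hàn", "N")]
print(annotate_with_pos(example))  # Tôi|P học|V tiếng_Hàn|N
```

Attaching the tag to the token keeps the sentence length unchanged while letting the model tell apart identical surface forms that carry different tags.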
Furthermore, in recent years BERT, a state-of-the-art contextual embedding model introduced by Google, has been incorporated into MT models in various ways to boost translation accuracy. A BERT model for Vietnamese has been built and has significantly improved results on natural language processing (NLP) tasks such as POS tagging, named entity recognition (NER), dependency parsing, and natural language inference. This dissertation discusses a method for applying POS tagging, itself built on the BERT model, to Vietnamese sentences to improve the performance of Vietnamese-Korean MT systems. Moreover, our experiments injected the Vietnamese BERT into the NMT model, with the BERT model connected concurrently to both the encoder layers and the decoder layers. The MT results show that using the contextual embedding model significantly enhances the performance of Vietnamese-Korean MT, by 2.78 BLEU points at the sentence level and 3.01 BLEU points at the document level.
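One common way to connect a BERT model to an NMT layer is to let the layer attend to the BERT output in parallel with its usual self-attention and then average the two results. The sketch below shows that idea in plain Python. It is an assumption about the general technique, not the dissertation's confirmed architecture. The names `attend` and `fused_layer` are hypothetical, keys double as values, and learned projections, residual connections, and normalization are all omitted, so the hidden states and BERT states must share the same dimension.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, memory):
    """Scaled dot-product attention; memory rows act as both keys and values."""
    dim = len(memory[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in memory]
        weights = softmax(scores)
        out.append([sum(w * row[j] for w, row in zip(weights, memory))
                    for j in range(dim)])
    return out


def fused_layer(hidden, bert_states):
    """Average self-attention with attention over BERT states (simplified)."""
    self_out = attend(hidden, hidden)
    bert_out = attend(hidden, bert_states)
    return [[0.5 * (s + b) for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(self_out, bert_out)]


# Toy inputs: two source positions and one BERT state, all of dimension 2.
hidden = [[1.0, 0.0], [0.0, 1.0]]
bert_states = [[0.5, 0.5]]
fused = fused_layer(hidden, bert_states)
```

In a decoder layer the same pattern would sit alongside the encoder-decoder attention, which is one way a single BERT model can feed both sides of the network.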
Author(s)
부 반 하이
Issued Date
2022
Awarded Date
2022-08
Type
dissertation
URI
https://oak.ulsan.ac.kr/handle/2021.oak/10039
http://ulsan.dcollection.net/common/orgView/200000632938
Affiliation
University of Ulsan
Department
Graduate School, Department of Electrical, Electronic and Computer Engineering
Advisor
옥철영
Degree
Doctor
Publisher
University of Ulsan, Graduate School, Department of Electrical, Electronic and Computer Engineering
Language
eng
Rights
Theses of the University of Ulsan are protected by copyright.
Appears in Collections:
Computer Engineering & Information Technology > 2. Theses (Ph.D)
Access and License
  • Access type: Open
File List

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.