Design and Implementation of Morpheme-Based Bi-Directional Machine Translation: The Case of Ge’ez to Tigrigna
ABSTRACT
Ge’ez and Tigrigna, both native Ethiopian languages, are morphologically rich, which makes bi-directional machine translation between them difficult. To address this problem, this study explored the effect of using the morpheme as the translation unit for bi-directional Ge’ez and Tigrigna translation. The corpus was taken from ten books of the Bible, comprising 384 chapters and 9189 verses, and was used both for training the model and for validation. To train Morfessor, 12173 simple Ge’ez words and 16708 Tigrigna words were taken from an SQLite database. Of the total of 9189 verses, roughly 80% (7290 verses) were used to train the model and roughly 20% (1899 verses) were used for testing and validation. We used Moses for the translation process, MGIZA++ for the alignment of words and morphemes, Morfessor for morphological segmentation, and IRSTLM for language modeling. After the prototype and the corpus were prepared, different experiments were conducted. The BLEU score, a standard metric for automatic machine translation evaluation, was used to measure how closely the system output matches the reference translations. Experimental results showed better performance with the morpheme-based unit, with BLEU scores of 9.23% from Ge’ez to Tigrigna and 8.67% from Tigrigna to Ge’ez. The system thus produced valid output in both directions, and the BLEU evaluation tool gave proper validation scores. Many-to-many alignment was the major alignment challenge, so further research is needed to handle it.
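The BLEU evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes one reference translation per verse, input that is already split into tokens (for example, morphemes separated by spaces), n-grams up to length 4, and no smoothing. The function names `ngrams` and `corpus_bleu` and the sample tokens are hypothetical and are not taken from the study; this is not the evaluation script the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis and no smoothing.

    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    matched = [0] * max_n  # clipped n-gram matches, one slot per n-gram order
    total = [0] * max_n    # n-grams produced by the system, per order
    ref_len = hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize system output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Hypothetical, placeholder tokens standing in for segmented morphemes.
reference = ["m1", "m2", "m3", "m4", "m5"]
hypothesis = ["m1", "m2", "m3", "m4"]
score = corpus_bleu([reference], [hypothesis])
```

Because the score is computed over whatever tokens it is given, BLEU on morpheme-segmented output is not directly comparable to BLEU on whole words unless both are scored at the same token level.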