Machine Translation Customization Project

This spring, my classmates and I worked on a hands-on machine translation customization project. We learned how to train an MT engine according to industry best practices.

For this project, my group and I used Microsoft Custom Translator, which lets users build customized neural machine translation (NMT) systems. The project was originally intended to estimate the time and costs required to train an NMT engine to translate ocean-related conservation reports.

After our client meeting, where we discussed how to adjust the way we planned to use the data we had found, our goal shifted to estimating the time and costs required to train an NMT engine to translate UN reports on the Sustainable Development Goals.

The main components of our process were:
– Data research
– Data alignment
– Troubleshooting Microsoft Custom Translator
– Data cleaning
– More data research
– More data cleaning
– File conversions
– Data substitution
– Switching data between Tuning and Training
– Glossary creation for official translations of committee names, publications, etc.
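The data cleaning and alignment steps above can be sketched in a few lines of code. This is a minimal illustration, not the workflow we actually used: it assumes the source and target texts have already been split into two line-aligned lists (one sentence per line per language). The function name `clean_parallel` and the length-ratio threshold of 3.0 are hypothetical choices and are not part of Microsoft Custom Translator.

```python
from typing import List, Tuple


def clean_parallel(source: List[str], target: List[str],
                   max_ratio: float = 3.0) -> List[Tuple[str, str]]:
    """Return cleaned sentence pairs from two line-aligned lists.

    Hypothetical helper: the filters below are common heuristics,
    not rules imposed by Microsoft Custom Translator.
    """
    if len(source) != len(target):
        # Line counts must match, or the pairs are misaligned.
        raise ValueError("source and target must have the same number of lines")

    seen = set()
    pairs = []
    for src, tgt in zip(source, target):
        # Collapse runs of whitespace and trim the ends.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        if not src or not tgt:
            continue  # drop pairs where one side is empty
        if src == tgt:
            continue  # identical sides often mean an untranslated segment
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # a large length mismatch suggests misalignment
        if (src, tgt) in seen:
            continue  # drop exact duplicate pairs
        seen.add((src, tgt))
        pairs.append((src, tgt))
    return pairs
```

Filtering on the character-length ratio is a cheap way to catch misaligned pairs without a full alignment tool. The threshold is a judgment call and would need tuning for each language pair.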

You can find the deliverables of our MT Customization Project below:

Proposal
Lessons Learned
Updated Proposal