Translation Technology Simulations

Overall Intro

My teammates Aleria Amaral, Naomi Stock, and Albert Phan and I undertook two simulations related to translation technology. In this blog post, I will outline both and describe the processes we followed and the challenges we faced, as well as our overall takeaways.

As these are simulations, we refer to real businesses and websites, as well as our own made-up creation for our group. Please understand that we have no official relation to any of these and used them only for these exercises.

Part 1

Neural Machine Translation Engine Training Proposal

First, we created a project proposal for Neural Machine Translation (NMT) engine training. Our objectives were to take an NMT engine and improve it with data sets so that, in practice, post-edited machine translation (PEMT) would be 40% more efficient than human translation, cost 40% less than human translation, and stay within an acceptable quality score for PEMT.

You can see our initial proposal just below, and there is also a download link for the PDF version.

Our project was based on the premise that we were working to improve the Japanese-to-English machine translations for the “American-English division” of the Kyoto Tourism Federation. You can find the bilingual corpora of Wikipedia’s Kyoto Articles that we used here.

Tools and Process

To train the NMT engine, we used Microsoft Custom Translator to put together our tuning, testing, and training data. We also took articles from MATCHA, a Japanese tourism website, to align in Trados/memoQ and use for testing and tuning.

A major part of the process was cleaning XML tags and extra segments out of our corpora using Notepad++. It took over five days to clean the fourteen thousand documents. While this was a huge drain on our time and resources, we found that the cleaner our training data, the better our engine performed in Microsoft Custom Translator.
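We did this cleanup by hand in Notepad++, but the same idea can be scripted. The snippet below is only a minimal sketch of that kind of cleanup, not the process we actually used: the tag names in the sample are made up for illustration, and the function names `clean_segment` and `clean_segments` are hypothetical. It strips anything that looks like an XML tag, decodes entities, collapses whitespace, and drops segments that end up empty.

```python
import re
from html import unescape

# Anything between angle brackets is treated as a tag (a simplifying assumption).
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_segment(text: str) -> str:
    """Remove tags, decode entities such as &amp;, and normalize whitespace."""
    without_tags = TAG_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", unescape(without_tags)).strip()


def clean_segments(lines):
    """Clean every line and drop the ones that contain no text afterwards."""
    cleaned = (clean_segment(line) for line in lines)
    return [segment for segment in cleaned if segment]


# Illustrative input only; the element names are hypothetical.
sample = [
    '<sen id="1"><e type="trans">Kyoto is a city in Japan.</e></sen>',
    "<tit></tit>",
    "  Kinkaku-ji   is a Zen temple. ",
]
print(clean_segments(sample))
```

A regex like this is fine for a quick pass over simple markup, but a real XML parser would be the safer choice for nested or malformed files.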

To measure the quality of our PEMT, we revised the Multidimensional Quality Metrics (MQM) Framework slightly for our purposes. We reviewed our testing data for accuracy and fluency and set the score on a 0-100% scale, with 82% being a passing grade. The types of errors we encountered were divided into Minor (1 or 2 points), Major (5 points), and Critical (9 points); we created a set of examples to show which errors fell into which category.
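To make the scoring more concrete, here is a rough sketch of how penalty points like these could be turned into a percentage. The point values and the 82% threshold come from our scheme above, but the formula itself (penalty points subtracted per 100 words reviewed) is an assumption made for illustration, as are the names `quality_score` and `passes`.

```python
# Point values per severity, as described in the post.
ALLOWED_POINTS = {"minor": {1, 2}, "major": {5}, "critical": {9}}
PASS_THRESHOLD = 82.0


def quality_score(errors, word_count):
    """Return a 0-100 score from a list of (severity, points) errors.

    ASSUMPTION: penalties are normalized by the number of words reviewed.
    This normalization is illustrative, not the exact formula we used.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total_penalty = 0
    for severity, points in errors:
        if points not in ALLOWED_POINTS[severity]:
            raise ValueError(f"{points} is not a valid penalty for {severity}")
        total_penalty += points
    score = 100.0 - 100.0 * total_penalty / word_count
    return max(0.0, score)  # never report a negative score


def passes(score):
    return score >= PASS_THRESHOLD


# Hypothetical review of a 100-word sample.
errors = [("minor", 1), ("minor", 2), ("major", 5), ("critical", 9)]
score = quality_score(errors, word_count=100)
print(score, passes(score))
```

With a scheme like this, a single Critical error weighs as much as several Minor ones, which is why agreeing on example errors for each category mattered so much.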

These tools are detailed more in our updated proposal below.

Updated Proposal and Lessons Learned

You can view our updated proposal (and its download link) below, as well as our video detailing our lessons learned.

Part 2

Translation Management System Proposal

For this simulation, our objective was to advise a company on which translation management system (TMS) to purchase. To that end, we chose a company, Daikin Industries, and two popular TMSs to compare: Trados SDL Worldserver and XTM International.

Process

We outlined our process as follows:

  1. Identify ten key business requirements for the TMS
  2. Weight these requirements in a scorecard
  3. Research each TMS and score them against the requirements
  4. Summarize the process and present our recommendation
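The scoring in steps 2 and 3 can be sketched as a weighted average. Everything in the example below is hypothetical: the requirement names, the weights, the scores, and the placeholder names "TMS A" and "TMS B" are illustrative only and are not our actual findings. It also uses three requirements instead of ten to keep it short.

```python
def weighted_total(weights, scores):
    """Weighted average of one candidate's scores across all requirements."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[req] * scores[req] for req in weights) / total_weight


# Hypothetical requirements with importance weights.
weights = {"connector support": 5, "vendor management": 3, "pricing": 2}

# Hypothetical scores on a 1-5 scale for two placeholder systems.
candidates = {
    "TMS A": {"connector support": 4, "vendor management": 3, "pricing": 2},
    "TMS B": {"connector support": 3, "vendor management": 5, "pricing": 4},
}

totals = {name: weighted_total(weights, scores) for name, scores in candidates.items()}
best = max(totals, key=totals.get)
for name, total in totals.items():
    print(f"{name}: {total:.2f}")
print("Highest weighted score:", best)
```

Dividing by the total weight keeps the result on the same scale as the individual scores, which makes the two candidates easy to compare at a glance.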

Findings and Proposal

We created documents for our key business requirements, scorecard, and project charter, which you can see and download PDFs of below.

We also compiled our findings and proposal into a video:

Overall Challenges and Takeaways

What seemed to hinder us most in both projects was our initial inexperience with the tools. While many of our ideas on how to proceed were “correct” and logical, we often underestimated how much time and effort would have to go into each step of the process. However, at the end of these two simulations, we have all come away with much more experience than we had at the beginning.

Another hindrance we faced was a lack of resources. As this was a simulation, we lacked the funding to properly try out our proposed TMSs (Microsoft Custom Translator includes a free trial for a certain number of words), so we had to rely heavily on general research. We expect that we would have more funding and resources in a real business setting.

Do you have any questions or comments about our simulations? Please feel free to reach out. We’d love to hear from you.