This semester in Advanced CAT, we learned several advanced CAT and MT technologies, including Microsoft Custom Translator, regular expressions (Regex), and some utility tools.

SMT Training Project

The first technology we learned was Microsoft Custom Translator, which we used to train a Statistical Machine Translation (SMT) engine so that it could produce better translations and save time and cost. This became our first big project: SMT training.

Microsoft Custom Translator

Original Pilot Proposal

Before we started the project, we wanted to convince our client to adopt SMT, so we drew up a pilot project proposal. The pilot project aimed to estimate the work involved in training an SMT engine. We chose political speeches as our subject field, especially speeches given by President Xi Jinping. Our source content came from the official website of the China Academy of Translation, which provides the Chinese government’s speeches in both Chinese and English.

In the proposal, we estimated what it would take to meet our Post-Edited Machine Translation (PEMT) goals for efficiency, cost, and quality. We laid out the data we would use to train the engine (training, tuning, and testing data) and the process we would follow to reach those goals. We set a timeline with specific dates for each stage, estimated the time and cost of each task, and listed the deliverables due at the end of the project.

Updated Proposal

After we went through the pilot project, we had actual results from our training and needed to update the original proposal. We took a sample of the MT output and compared it with the human translation we pulled from the website. The two turned out to be pretty close. Where the MT fell short, the problems were typically with numbers, proper nouns, verb tenses, capitalization, and plurality.

Based on the comparison, we calculated the actual efficiency, cost, and quality results and compared them with the original goals. We revised the original objectives and updated the costs based on these calculations, and we gave our recommendation based on the outcome of the project. At the end of the proposal, we attached the MT sample and the details of each training round.

Lessons Learned

After finishing the project, we gave a presentation on what we had learned from it. There were some things we did right and some problems we ran into. All in all, we met our goals, and the training results suggest that using this SMT engine for government documents like political speeches could save a lot of time and cost.

Based on the training results, we summarized what the SMT engine is good at translating and what it is not. Finally, we made our recommendation.

Regular Expression

Later in the semester, we learned some basic Regex and used it to implement QA settings in Trados. First, we experimented with it on regex101.com, where we learned to match simple patterns such as dates, numbers, letters, and email addresses. Then we brought what we had learned into Trados by adding rules to the QA Checker.
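To give a feel for the kind of patterns we tried on regex101.com, here is a minimal Python sketch. The patterns and the `find_all` helper are my own illustrative assumptions, not the exact expressions from class, and they are deliberately simple (the date pattern, for example, only covers ISO-style dates).

```python
import re

# Illustrative patterns only (assumptions, not the exact ones we used in class).
PATTERNS = {
    # ISO-style dates such as 2019-05-01
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    # Whole numbers or decimals such as 12 or 12.5
    "number": re.compile(r"\b\d+(?:\.\d+)?\b"),
    # Any single ASCII letter
    "letter": re.compile(r"[A-Za-z]"),
    # A simplified email shape: name@domain.tld
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_all(kind, text):
    """Return every match of the named pattern in the text (hypothetical helper)."""
    return PATTERNS[kind].findall(text)


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com before 2019-05-01."
    print(find_all("email", sample))
    print(find_all("date", sample))
```

Real-world dates and email addresses vary far more than this, so patterns like these are a starting point to refine on regex101.com rather than a complete solution.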

Trados QA Checker

After that, we each came up with some Regex rules specific to our own language pair that we could use to check translation quality.
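As an example of the idea, the sketch below checks that every number in a source segment also appears in the target segment, since numbers were one of the things the MT engine tended to get wrong. It is written in Python rather than as a Trados rule, the Chinese to English segments are made up, and `missing_numbers` is a hypothetical helper, so treat it as an illustration of the logic only.

```python
import re

# Digits with optional decimal or thousands separators, e.g. 35, 6.1, 1,000
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def missing_numbers(source, target):
    """Return numbers found in the source segment but not in the target (hypothetical helper)."""
    target_numbers = set(NUMBER.findall(target))
    return [n for n in NUMBER.findall(source) if n not in target_numbers]


if __name__ == "__main__":
    # Made-up segments for illustration
    print(missing_numbers("2020年增长了6.1%", "It grew by 6.1% in 2020"))
    print(missing_numbers("共有35个国家", "There are 53 countries"))
```

A check like this only compares digits as written, so it would not catch numbers that are spelled out in one language and written as digits in the other.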

Utility Tools

At the end of the semester, we learned about some useful utility tools. We each chose one that interested us and made a demo video explaining how to use it and how it can be useful in localization.