During my Spring 2019 semester at the Middlebury Institute of International Studies at Monterey, I took a course on advanced computer-assisted translation (Advanced CAT). This course built upon the knowledge we had acquired in our Intro to CAT course the previous semester. Much of the focus was on learning how to use the advanced features of various CAT tools, such as creating filter configurations for XML files and using regular expressions (regex) for quality assurance.
We also experimented with tools from the Okapi suite, such as Olifant and CheckMate, to maintain and clean up TMs and MT corpora.

Language-specific Regex for Trados QA Checker

For this particular exercise, I had the chance to work with my teammates on coming up with some regular expressions to identify potential errors and verify translations in Chinese, a language I cannot read or speak. We used a handy online tool, Regex101, to test our expressions, and then we implemented them in SDL Trados QA Checker.

Here are some of the expressions we came up with:

Regex to verify date format in Chinese (year/month/day)
Regex to verify that parentheses have more space around them in zh-CN and zh-TW
Regex to test quotation marks used in zh-CN
Regex implementation in SDL Trados QA Checker
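
As a rough illustration of the kinds of checks listed above, here is a minimal Python sketch. The patterns and the names `ZH_DATE`, `CHECKS`, and `find_issues` are hypothetical stand-ins written for this example; they are not the expressions from the exercise, and the syntax may need adjusting for the regex engine used by Trados QA Checker. The sketch assumes that zh-CN text should use the 年/月/日 date format and curly quotation marks, and that a half-width parenthesis touching a Chinese character is worth flagging.

```python
import re
from typing import List

# Hypothetical pattern: a date written as year/month/day with 年, 月, 日.
ZH_DATE = re.compile(r"\d{4}年\d{1,2}月\d{1,2}日")

# Hypothetical QA checks: each pattern flags a *potential* error in a
# Chinese target segment. A match means "review this", not "this is wrong".
CHECKS = {
    # Numeric date such as 03/15/2019 or 15-03-19 left in the target.
    "numeric_date": re.compile(r"(?<!\d)\d{1,2}[/-]\d{1,2}[/-]\d{2,4}(?!\d)"),
    # Straight ASCII double quote instead of the curly quotes “ ”.
    "straight_quote": re.compile(r'"'),
    # Half-width parenthesis directly touching a CJK character (no space).
    "tight_parenthesis": re.compile(r"[\u4e00-\u9fff]\(|\)[\u4e00-\u9fff]"),
}


def find_issues(target: str) -> List[str]:
    """Return the names of all checks whose pattern matches the segment."""
    return [name for name, pattern in CHECKS.items() if pattern.search(target)]


if __name__ == "__main__":
    for segment in ["会议于2019年3月15日举行。", "会议于03/15/2019举行。"]:
        print(segment, find_issues(segment))
```

Testing candidate patterns like these against a handful of known-good and known-bad segments first, as we did in Regex101, helps catch false positives before the expressions go into a QA profile.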

SMT Engine Training

Microsoft Custom Translator

Our most challenging project by far was training a statistical machine translation (SMT) engine using Microsoft Custom Translator. The goal was to design and implement a pilot project for a custom-trained SMT engine that translates the Annual Reports published by the European Court of Auditors (ECA) from English into French. The project involved many tasks. Some of them, such as document alignment and cleaning, were time-consuming, especially because we were working mainly with PDFs. Other tasks included file conversion, cost calculation, post-editing, manual quality evaluation (inspired by the DQF and MQM error typologies), and of course continuous troubleshooting and multiple iterations for each round of training. Our key takeaway from this project was the importance of risk management. Our timeline did not account for the setbacks we ran into, which together pushed us beyond our estimated time budget, and the scale of the project greatly exceeded our initial expectations.

To look at our project in detail, you can check out both our initial and updated proposal and presentation on lessons learned here.  

We also experimented a bit individually with Microsoft Translator Hub, the predecessor of Microsoft Custom Translator, before its retirement on May 17, 2019. I am grateful that I got the chance to see what training an SMT engine looked like before Custom Translator came into existence.

Project Fluent

We also had a look at Project Fluent, an interactive software localization system developed by Mozilla. Project Fluent is designed to help translators produce natural-sounding translations by enabling them to use the entire expressive power of their language without asking developers for permission. Fluent makes it possible to cater to the grammar and style of many languages, independently of the source language, which in most cases is English.

For this exercise, I played around with Fluent to come up with customized rules for the gender and plural categories in Arabic. In addition to the masculine and feminine genders, Arabic has singular, dual, and plural forms of pronouns, nouns, verbs, adjectives, etc. The dual form is used to refer to two people or two things. In the rule below, one line of code was included to account for the dual form. Also, in most cases, possessive pronouns in Arabic do not appear in an isolated form. They are attached to the nouns they modify, which is why the rule was customized, as shown below, by repeating the Arabic translations of the two words “comment” and “post” three times each, with the corresponding pronoun attached to each instance.

Project Fluent Editor
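
To make the plural categories more concrete, here is a small Python sketch of how a count maps to an Arabic plural category. It follows the CLDR plural rules for Arabic integers, which define the categories that Fluent selectors match against. The function name `arabic_plural_category` is hypothetical, and this is not the Fluent rule from the exercise, only an illustration of why the dual needs its own branch.

```python
def arabic_plural_category(n: int) -> str:
    """Return the CLDR plural category for a non-negative integer in Arabic."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    if n == 0:
        return "zero"
    if n == 1:
        return "one"   # singular
    if n == 2:
        return "two"   # dual: exactly two people or things
    if 3 <= n % 100 <= 10:
        return "few"
    if 11 <= n % 100 <= 99:
        return "many"
    return "other"     # e.g. 100, 101, 102


if __name__ == "__main__":
    for count in (0, 1, 2, 3, 11, 100, 103):
        print(count, arabic_plural_category(count))
```

In a Fluent message, each of these categories can become a variant of a select expression, so the translator decides which ones need distinct wording.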

Utility Training Video

One of the final mini projects was to record an instructional video for a utility tool of our choice that could help in a localization workflow. I chose to do mine on ClipMate, a clipboard manager that could potentially save hours of copying and pasting content. It also helps users relocate their saved files and folders by keeping a clipboard history. You can check out my instructional video here.