Adv CAT: QA Model, Pseudo Translation, Machine Translation

The Advanced CAT course has several main focuses. One is using Trados to configure the filter settings of XML files for pseudo localization; another is training a machine translation engine. I can say without hesitation that MT training is one of the most strenuous projects I have done!

Here is a brief summary of what I have gained in this course:

Pseudo localization with Trados:

I learned to set filter configurations for XML files that exclude unnecessary content, such as hyperlinks, while including encoded content, image descriptions, titles and post bodies. To do that, we need to download the XML file from WordPress, import it into Trados, and then add the tags and elements to exclude. Finally, don't forget to enable the option to include embedded content, as it is easily overlooked. I used a WordPress site as my testing website for practice. Click here to view the website with the pseudo translation uploaded.
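The exclusion list and the pseudo translation itself are configured in Trados. Purely to illustrate what they do to the file, here is a minimal Python sketch outside Trados. It assumes a stripped-down WordPress export in which the post body sits in `content:encoded` and the permalink in `link`; the sample post, the accent map and the padding rule (about a third extra length) are illustrative choices, not Trados behavior. Real exports wrap HTML inside `content:encoded`, which is what the embedded-content option parses; this sketch treats that text as plain text.

```python
import xml.etree.ElementTree as ET

# Namespace WordPress exports use for the post body (<content:encoded>).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)  # keep the prefix on output

# Hypothetical sample export: one post with a title, a link and a body.
SAMPLE_EXPORT = f"""<rss xmlns:content="{CONTENT_NS}"><channel><item>
<title>Patent basics</title>
<link>https://example.com/patent-basics</link>
<content:encoded>Claims define the scope.</content:encoded>
</item></channel></rss>"""

# Elements to leave untouched, mirroring an exclusion list in the filter.
EXCLUDED_TAGS = {"link"}

# Illustrative accent map: each listed ASCII letter gets an accented look-alike.
ACCENTS = str.maketrans("aeiouAEIOUcnyCNY", "àéîöûÀÉÎÖÛçñýÇÑÝ")


def pseudo_translate(text: str) -> str:
    """Accent the letters, pad by about a third and bracket the string."""
    padding = "~" * (len(text) // 3)
    return "[" + text.translate(ACCENTS) + padding + "]"


def pseudo_localize(xml_text: str) -> ET.Element:
    """Pseudo-translate the text of every element except excluded tags."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Skip excluded elements and whitespace-only text between tags.
        if elem.tag in EXCLUDED_TAGS or not (elem.text and elem.text.strip()):
            continue
        elem.text = pseudo_translate(elem.text)
    return root


if __name__ == "__main__":
    print(ET.tostring(pseudo_localize(SAMPLE_EXPORT), encoding="unicode"))
```

Brackets make truncated strings easy to spot, the accents reveal hard-coded or mis-encoded text, and the padding hints at whether layouts survive text expansion; the untouched link shows the exclusion working.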

Quality Assurance models:

The model we explored the most is Multidimensional Quality Metrics (MQM). I learned the structure and concepts behind this model, as well as how flexible and adaptable its metrics are. Click here to read an article I wrote to learn more about MQM and its customizable features. Other topics in this theme included the TAUS Dynamic Quality Framework (DQF) and the LISA QA Model developed by the Localization Industry Standards Association (LISA).

These QA metrics can be integrated into the QA functions of CAT tools or used to build a QA scoring system, such as the MQM Scorecard. On this platform, you can tag target or source segments with MQM error types, calculate scores and generate quality reports.
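The arithmetic behind such a scorecard is simple. The Python sketch below assumes one commonly used convention: penalty weights of 1, 5 and 10 points for minor, major and critical errors, and a score of 100 × (1 − penalty points / evaluated words). MQM is designed to be customized, so real projects may use different weights, formulas and pass marks; the 95-point threshold and the error categories in the example are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed severity weights; adjust to match your own MQM profile.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class Issue:
    category: str  # MQM error type, e.g. "Accuracy/Mistranslation"
    severity: str  # one of the keys in SEVERITY_WEIGHTS


def penalty_points(issues: list[Issue]) -> int:
    return sum(SEVERITY_WEIGHTS[issue.severity] for issue in issues)


def quality_score(issues: list[Issue], word_count: int) -> float:
    """Score out of 100; 100 means no penalties in the evaluated sample."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100 * (1 - penalty_points(issues) / word_count)


def report(issues: list[Issue], word_count: int, pass_mark: float = 95.0) -> str:
    """Count issues per top-level dimension and add a pass/fail verdict."""
    dimensions = Counter(issue.category.split("/")[0] for issue in issues)
    score = quality_score(issues, word_count)
    verdict = "PASS" if score >= pass_mark else "FAIL"
    lines = [f"{dim}: {n}" for dim, n in sorted(dimensions.items())]
    lines.append(f"Score: {score:.1f} ({verdict}, threshold {pass_mark})")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Issue("Accuracy/Mistranslation", "major"),
        Issue("Fluency/Spelling", "minor"),
        Issue("Terminology/Inconsistent", "minor"),
    ]
    print(report(sample, word_count=250))
```

The three sample issues add up to 7 penalty points, so a 250-word sample scores 97.2. Normalizing by word count lets reviewers compare samples of different lengths.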

Tools to help with quality checks:

Okapi Olifant: with this tool, you can customize filters to show only the translations that meet certain rules. For example, you can view target segments that are identical to their source, or use the operation options to build your own rule. Below is a screenshot of the app interface:

 
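Olifant itself is a GUI application, so the following is not its API. It is a small Python sketch of the same filtering idea, assuming the translations are stored in a TMX file: load source/target pairs, then keep only the pairs that satisfy a rule. The sample memory, the language codes and the length-ratio rule are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical three-unit memory; the last target is deliberately bloated.
SAMPLE_TMX = """<tmx version="1.4"><header srclang="en-US"/><body>
<tu><tuv xml:lang="en-US"><seg>Claim 1</seg></tuv>
    <tuv xml:lang="fr-FR"><seg>Revendication 1</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>USPTO</seg></tuv>
    <tuv xml:lang="fr-FR"><seg>USPTO</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Prior art</seg></tuv>
    <tuv xml:lang="fr-FR"><seg>État de la technique antérieure déjà connue du public</seg></tuv></tu>
</body></tmx>"""

Rule = Callable[[str, str], bool]


def load_pairs(tmx_text: str, src: str, trg: str) -> list[tuple[str, str]]:
    """Return (source, target) text for each unit that has both languages."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {tuv.get(XML_LANG): "".join(tuv.find("seg").itertext())
                for tuv in tu.iter("tuv")}
        if src in segs and trg in segs:
            pairs.append((segs[src], segs[trg]))
    return pairs


def same_as_source(src: str, trg: str) -> bool:
    """Built-in style rule: target identical to source."""
    return src.strip() == trg.strip()


def much_longer(ratio: float) -> Rule:
    """Custom rule: target more than `ratio` times longer than the source."""
    return lambda src, trg: len(trg) > ratio * len(src)


def apply_filter(pairs: list[tuple[str, str]], rule: Rule) -> list[tuple[str, str]]:
    return [pair for pair in pairs if rule(*pair)]


if __name__ == "__main__":
    pairs = load_pairs(SAMPLE_TMX, "en-US", "fr-FR")
    print(apply_filter(pairs, same_as_source))
    print(apply_filter(pairs, much_longer(3.0)))
```

Treating each rule as a plain function keeps the filter open-ended, which is roughly what building your own rule from operations gives you in the GUI.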

Okapi CheckMate: a tool for customizing a QA model and running checks on target texts. It provides a concise interface listing the flagged issues, as the image below shows.

 
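CheckMate's checks are configured in its own interface, and its real check set is much larger. As a rough illustration of the kind of rules it applies, the Python sketch below runs three simple checks over source/target pairs and returns a list of flagged issues. The check names, message wording and sample segments are hypothetical.

```python
import re

# Hypothetical segment pairs (source, target); segments 2 and 3 contain errors.
SAMPLE_SEGMENTS = [
    ("The device comprises 3 layers.", "Le dispositif comprend 3 couches."),
    ("See claim 12.", "Voir la revendication 21."),
    ("A method for coating.", "Procédé  de revêtement"),
]

DIGITS = re.compile(r"\d+")
END_PUNCTUATION = ".!?:;"


def ending_punctuation(text: str) -> str:
    """Return the final punctuation mark, or "" if the text has none."""
    text = text.rstrip()
    return text[-1] if text and text[-1] in END_PUNCTUATION else ""


def check_pair(seg_id: int, src: str, trg: str) -> list[str]:
    issues = []
    if "  " in trg:
        issues.append(f"{seg_id}: double space in target")
    if ending_punctuation(src) != ending_punctuation(trg):
        issues.append(f"{seg_id}: ending punctuation differs")
    # Compares digit runs only, so 10000 vs 10 000 would be flagged
    # even though both may be correct for their locales.
    if sorted(DIGITS.findall(src)) != sorted(DIGITS.findall(trg)):
        issues.append(f"{seg_id}: numbers differ")
    return issues


def run_checks(segments: list[tuple[str, str]]) -> list[str]:
    """Number segments from 1 and collect every flagged issue."""
    issues = []
    for seg_id, (src, trg) in enumerate(segments, start=1):
        issues.extend(check_pair(seg_id, src, trg))
    return issues


if __name__ == "__main__":
    for issue in run_checks(SAMPLE_SEGMENTS):
        print(issue)
```

Each check is independent, so adding or removing one is how a QA profile gets customized; the digit check also shows why such checks need reviewer judgment to rule out false positives.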

Utilities for localization: 

I also made a video introduction to Greenshot, a light and handy screenshot tool that can help with your daily work. It integrates with commonly used applications such as the Microsoft Office suite and can be linked with Adobe Creative Suite, which makes editing and sharing screenshots more convenient. Click here to view my introduction and comments on this tool!

Machine Translation Engine Training:

In this course, I learned about three types of machine translation: rule-based machine translation, statistical machine translation (of which phrase-based MT is the best-known variant), and neural machine translation.

The most rewarding experience was the machine translation engine training project. It was really exciting and also very challenging. The rest of this post introduces the project my teammates and I did and the lessons we learned from it. I have also published a detailed post on the lessons learned from the MT pilot project.

For the MT training project, we were asked to use Microsoft Translator Hub for a test project. I proposed patent law translation as the topic, because open source data is easier to find in this field and my teammates and I are all familiar with patent content.
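Parallel data collected this way usually needs light cleaning, plus a split into training, tuning and test sets, before it is fed to an engine. The Python sketch below is a generic illustration, not Translator Hub's own procedure: it drops empty, duplicate and badly length-mismatched sentence pairs, then shuffles with a fixed seed and holds out tuning and test sets. The 3:1 length-ratio cutoff, the seed and the set sizes are hypothetical.

```python
import random


def clean_pairs(pairs: list[tuple[str, str]], max_ratio: float = 3.0) -> list[tuple[str, str]]:
    """Drop empty, duplicate and badly length-mismatched sentence pairs."""
    seen = set()
    kept = []
    for src, trg in pairs:
        src, trg = src.strip(), trg.strip()
        if not src or not trg or (src, trg) in seen:
            continue
        # A large length gap often signals a misaligned pair.
        if max(len(src), len(trg)) / min(len(src), len(trg)) > max_ratio:
            continue
        seen.add((src, trg))
        kept.append((src, trg))
    return kept


def split_corpus(pairs, tune_size: int, test_size: int, seed: int = 42):
    """Shuffle reproducibly, then carve off tuning and test sets."""
    if tune_size + test_size >= len(pairs):
        raise ValueError("not enough pairs left for training")
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:test_size]
    tune = shuffled[test_size:test_size + tune_size]
    train = shuffled[test_size + tune_size:]
    return train, tune, test


if __name__ == "__main__":
    raw = [("Claim 1", "Revendication 1"), ("Claim 1", "Revendication 1"),
           ("", "Vide")] + [(f"Sentence {i}", f"Phrase {i}") for i in range(20)]
    train, tune, test = split_corpus(clean_pairs(raw), tune_size=3, test_size=3)
    print(len(train), len(tune), len(test))
```

Keeping the tuning and test sentences out of the training set matters because scores measured on sentences the engine has already seen would overstate its quality.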

We conducted a pilot project for MT training and are proud of the result: the MT engine we deployed produced satisfactory translations. To train the engine, we needed to gather a large amount of data to feed it, and then had human translators test the output quality. Below is the workflow of our pilot project:

I played several roles in this project: project manager, engineer and reviewer. I proposed the topic and won my teammates' support, evaluated our progress and made adjustments to meet our goal, and carried out the cost and time analysis. As the reviewer, I also reviewed the translated strings and graded them using the MQM quality model.
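A cost and time analysis of this kind typically scales the pilot's measurements up to the full volume. The Python sketch below assumes that review effort grows linearly with word count while engineering setup is a one-off cost, and adds an optional contingency buffer. The `Pilot` fields and every figure in the example are hypothetical, not numbers from our pilot.

```python
from dataclasses import dataclass


@dataclass
class Pilot:
    words: int                # words machine-translated and reviewed in the pilot
    review_hours: float       # reviewer time spent on those words
    engineering_hours: float  # one-off setup: data prep, training, deployment
    hourly_rate: float        # blended cost per hour


def project(pilot: Pilot, total_words: int, contingency: float = 0.0) -> tuple[float, float]:
    """Return (hours, cost) for the full project.

    Review time scales linearly with volume; engineering is counted once.
    `contingency` is a fractional buffer, e.g. 0.15 for 15%.
    """
    review = pilot.review_hours * total_words / pilot.words
    hours = (review + pilot.engineering_hours) * (1 + contingency)
    return hours, hours * pilot.hourly_rate


if __name__ == "__main__":
    pilot = Pilot(words=2_000, review_hours=8, engineering_hours=4, hourly_rate=40)
    hours, cost = project(pilot, total_words=50_000)
    print(f"{hours:.0f} hours, ${cost:,.0f}")
```

With these made-up inputs the projection is 204 hours and $8,160. Separating fixed from volume-driven effort is the main design choice here: scaling the whole pilot linearly would multiply the one-off setup time as well and overstate the estimate.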

From this project, I learned the importance of running a pilot project when the picture is not yet clear, and how to use the pilot data to predict the cost and time of the whole project. Below are the original proposal for the pilot project and the proposal we updated based on our experience. You can also find the presentation on the lessons we learned from this trial project.

Original Proposal

Updated Proposal

Lessons Learned

 

Sites DOT MIIS: The Middlebury Institute site network.