In the advanced CAT class, I learned more about how machine translation works by carrying out machine translation (MT) training and MT quality evaluation. Building on my existing knowledge of SDL Trados Studio, I also gained more insight into its filter configuration and its QA (Quality Assurance) checker. Translation technology is developing rapidly, and being comfortable with a variety of tools is very helpful for localization projects.

NMT Training Project Using Microsoft Custom Translator

Microsoft Custom Translator enables us to build customized neural machine translation (NMT) systems. My team completed an NMT training project on news commentary. We used a parallel corpus as training data and news commentaries excerpted from The New York Times as tuning and testing data. We ran a total of 42 training rounds and tested the English-to-Chinese translation of 13 news commentaries. During the process, we cleaned up the training data using Okapi Olifant and compared the quality of different tuning data sets to achieve a better BLEU (Bilingual Evaluation Understudy) score. BLEU measures the differences between a machine-generated translation and one or more human-created reference translations of the same source sentence.
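To make the metric more concrete, here is a minimal sketch of standard corpus-level BLEU in Python: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes one reference per sentence, pre-tokenized input (splitting Chinese into characters is one common choice), and no smoothing. Custom Translator's own scoring details may differ, so this illustrates the formula rather than reproducing its scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per sentence, no smoothing."""
    clipped = [0] * max_n  # n-gram matches, clipped by reference counts
    totals = [0] * max_n   # n-grams produced by the MT output
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Penalize MT output that is shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


print(corpus_bleu([list("今天天气很好")], [list("今天天气很好")]))  # identical: 100.0
print(corpus_bleu([list("今天天气不错")], [list("今天天气很好")]))  # partial overlap
```

A translation identical to its reference scores 100, while partial overlap lands somewhere between 0 and 100. Real toolkits usually add smoothing so that short segments with no 4-gram match do not collapse to zero.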

After downloading the machine translation results, we post-edited them and ran a QA check, then calculated time and cost savings and evaluated the translation quality. In terms of return on investment, we concluded that machine translation is not recommended for translating English news commentaries into Chinese.
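The time and cost comparison comes down to simple arithmetic. The sketch below compares translating from scratch with post-editing MT output. The throughput figures, the hourly rate, and the optional `mt_overhead_hours` (for training and setup effort) are hypothetical parameters, not the figures from our project.

```python
def mt_savings(word_count, translate_wph, post_edit_wph, hourly_rate, mt_overhead_hours=0.0):
    """Compare human translation with MT + post-editing (all inputs hypothetical).

    translate_wph / post_edit_wph: words per hour for each workflow.
    Returns (hours_saved, cost_saved, percent_saved); negative values mean
    MT + post-editing took longer than translating from scratch.
    """
    human_hours = word_count / translate_wph
    mt_hours = word_count / post_edit_wph + mt_overhead_hours
    hours_saved = human_hours - mt_hours
    cost_saved = hours_saved * hourly_rate
    percent_saved = 100 * hours_saved / human_hours
    return hours_saved, cost_saved, percent_saved


# Illustrative numbers only: 1,000 words, 250 vs. 500 words/hour, 40 per hour.
print(mt_savings(1000, 250, 500, 40.0))  # (2.0, 80.0, 50.0)
```

If post-editing is slower than translating from scratch (for example, because heavy rewriting is needed), all three values turn negative. That is the situation in which MT offers no return on investment.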

Through the entire neural machine translation training process, I learned how Microsoft Custom Translator works. To start training, at least 10,000 parallel sentences (in one or more documents) must be uploaded, and either a manually selected tuning set of at least 500 sentences or a system-selected tuning set is required. Custom Translator's sentence alignment may not be able to extract and use every sentence successfully. If the model is trained on a narrow domain and the training data is consistent with the testing data, a high BLEU score can be expected.
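Before uploading, a quick local check can catch obviously unusable data. This hypothetical Python sketch assumes line-aligned source and target files (one sentence per line). It drops empty and duplicate pairs, then compares the remaining count against the thresholds mentioned above. Custom Translator performs its own alignment and filtering, so the count it reports may still be lower.

```python
MIN_TRAINING_SENTENCES = 10_000  # minimum stated above
MIN_TUNING_SENTENCES = 500       # applies only to a manually selected tuning set


def clean_pairs(source_lines, target_lines):
    """Drop empty and duplicate pairs from line-aligned source/target files."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target files are not line-aligned")
    seen = set()
    cleaned = []
    for src, tgt in zip(source_lines, target_lines):
        pair = (src.strip(), tgt.strip())
        if pair[0] and pair[1] and pair not in seen:
            seen.add(pair)
            cleaned.append(pair)
    return cleaned


def meets_minimums(training_pairs, tuning_pairs=None):
    """Check sentence counts; tuning_pairs=None stands for a system-selected tuning set."""
    if len(training_pairs) < MIN_TRAINING_SENTENCES:
        return False
    return tuning_pairs is None or len(tuning_pairs) >= MIN_TUNING_SENTENCES
```

For example, `clean_pairs(["a", "", "a", "b"], ["x", "y", "x", "z"])` keeps only `("a", "x")` and `("b", "z")`: the second pair has an empty source, and the third duplicates the first.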

According to the Microsoft Translator Hub User Guide, when picking the tuning set manually, you should choose sentences that are neither too long nor too short. In practice, sentences of 8 to 18 words produce optimal results. Before starting machine training, my team should have researched the mechanism of Custom Translator more thoroughly to improve the tuning data set.
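A manual tuning set that follows the 8-to-18-word guideline could be pre-filtered as shown below. Word counts come from the English source side by splitting on whitespace, since Chinese text is not space-delimited. The function name and its defaults are illustrative assumptions.

```python
def pick_tuning_candidates(pairs, min_words=8, max_words=18):
    """Keep (English, Chinese) pairs whose English side has 8-18 words."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if min_words <= len(src.split()) <= max_words
    ]


pairs = [
    ("Too short.", "太短。"),
    ("The central bank raised interest rates again this week.", "央行本周再次加息。"),
]
print(pick_tuning_candidates(pairs))  # keeps only the nine-word sentence
```

Filtering on length alone does not guarantee a good tuning set. The candidates should still be representative of the test material, which for us meant news commentary.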

Having a comprehensive understanding of Microsoft Custom Translator from the start makes it much easier to plan the whole process and to analyze the different results produced by each model.

Deliverables

Tips for Trados QA (Regular Expressions)

Regular expressions (RegEx) are helpful for identifying potential translation errors. For English-to-Chinese translation, there are a few simple rules we can apply.

SDL Trados Studio's QA Checker can use RegEx to identify language patterns that may indicate quality issues. To set it up, go to:
Project Settings > Verification > QA Checker 3.0 > Regular Expressions

Using RegEx in Trados QA Check
  • No four consecutive full-width punctuation marks in Chinese.
    • RegEx to check: \W\W\W\W
  • No space before full-width punctuation marks in Chinese.
    • RegEx to check: \s\W
  • No space between full-width punctuation marks in Chinese.
    • RegEx to check: \W\s\W
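  • A space between a number/letter and a Chinese character, and vice versa.
    • RegEx to check: ([\u4e00-\u9fff][A-Za-z0-9]|[A-Za-z0-9][\u4e00-\u9fff]) (\w alone cannot tell the two apart, because .NET regular expressions treat Chinese characters as word characters)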

Find out more about RegEx here.

Localization Utility: XnView

XnView is a useful tool for organizing and viewing images, deciding which images need to be translated or regenerated, and batch converting files. The following video covers all the basics you need to know.

XnView Demo