1. Introduction:
    • This semester, we built on what we had learned about CAT tools in the Intro to CAT class and explored some advanced Trados features. The major projects we worked on included training Microsoft Translator Hub (SMT) and Microsoft Custom Translator (NMT), using regex (regular expressions) to create QA filters in Trados, and creating a training video on a tool that can be useful in localization workflows. In this post, I will present the results of these projects and discuss my biggest takeaways from each of them.
  2. NMT Training Project Using Microsoft Custom Translator
    • This is absolutely my favorite project of all. Our group, Team Machine, trained an NMT model to translate AI-related TED talks from English to Simplified Chinese. I learned so much from this hands-on experience of training a neural machine translation engine. I now know how to:
      • extract multilingual SRT files from the TED talks on YouTube with this useful website
      • choose from several alignment solutions, including Trados, memoQ, TMX Mall Alignment (paid per use, intuitive user interface), and Wordfast Aligner (free and fully automatic, but the source and target texts must have exactly the same number of segments)
      • clean up files using the advanced find-and-replace feature in MS Word
      • decide what materials should be used in training, tuning, or testing:
        1. Training: general materials from the same or broader domains, covering a longer time frame
        2. Tuning: specific and recent materials of the exact same domains
        3. Testing: specific and recent materials of the exact same domains. There should be one single test set, used as the basis for comparison throughout the entire training process.
      • train an NMT model in Microsoft Custom Translator (after 11 rounds of training, I have become quite familiar with the process)
      • evaluate MT output with the quality evaluation model, weights, and threshold we established
      • streamline the post-editing and evaluation process with a specially designed Google Sheet
      • perform root cause analysis on the quality issues
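As an illustration of the extraction and clean-up steps above, here is a minimal Python sketch that strips cue numbers and timestamps from an SRT file and keeps one text segment per subtitle cue. This is not the workflow we used in class (we cleaned the files in MS Word); the function name `srt_to_segments` and the sample subtitles are my own assumptions.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_segments(srt_text):
    """Return one text segment per subtitle cue, without indices or timings."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Drop the numeric cue index if it is present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line and keep only the caption text.
        text = [line for line in lines if not TIMING.match(line)]
        if text:
            # Joining with a space suits English; Chinese would join with "".
            segments.append(" ".join(text))
    return segments


# Hypothetical sample, not taken from a real TED talk.
sample = """1
00:00:01,000 --> 00:00:04,000
Artificial intelligence is
changing translation.

2
00:00:04,500 --> 00:00:07,000
Let me show you how.
"""

segments = srt_to_segments(sample)
print(len(segments), segments)
```

Comparing `len(segments)` for the English and Chinese files is a quick way to see whether the two sides have the same number of segments before trying a fully automatic aligner.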
    • Project Deliverables (Link to Google Drive):
      1. Original Pilot Proposal
      2. Updated Proposal
      3. Presentation on Lessons Learned
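The training, tuning, and testing criteria described in this section can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Talk` record, the `split_corpus` function, the cutoff year, and the placeholder talk titles are all hypothetical, and Microsoft Custom Translator itself takes uploaded documents rather than code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Talk:
    title: str
    domain: str  # e.g. "AI" (exact domain) or "technology" (broader domain)
    year: int


def split_corpus(talks, exact_domain, recent_from, test_titles):
    """Assign each talk to the training, tuning, or testing set."""
    training, tuning, testing = [], [], []
    for talk in talks:
        if talk.title in test_titles:
            # One fixed test set, never reused for training or tuning.
            testing.append(talk)
        elif talk.domain == exact_domain and talk.year >= recent_from:
            # Specific and recent material from the exact domain.
            tuning.append(talk)
        else:
            # Broader domains and older material.
            training.append(talk)
    return training, tuning, testing


# Hypothetical placeholder data.
talks = [
    Talk("talk-a", "technology", 2012),
    Talk("talk-b", "AI", 2014),
    Talk("talk-c", "AI", 2018),
    Talk("talk-d", "AI", 2018),
]

training, tuning, testing = split_corpus(
    talks, exact_domain="AI", recent_from=2017, test_titles={"talk-d"}
)
print([t.title for t in training], [t.title for t in tuning], [t.title for t in testing])
```

Keeping `test_titles` fixed across every training round is what makes the scores from different rounds comparable.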
  3. Utility Demo/Training Video
    • I also recorded a demo for XnView, a powerful photo viewer, organizer, and converter, using Zoom and a magnifier tool called ZoomIt. XnView is especially helpful for localization project managers and DTP specialists because we often have to manage a large number of images and multimedia files, like those in user manuals, brochures, product catalogues, and commercial videos. In the video, I walk through its interface and demonstrate some of its most useful features.
    • Project Deliverable (Link to Google Drive):
      1. video
  4. Tips for Trados QA (using Regex)
    • We learned the basics of regex and applied it not only to the QA settings in Trados, but also to Olifant and Xbench (to clean up our tuning data for the NMT training project). For this small regex project, we analyzed specific punctuation and stylistic issues in Chinese and established QA checker rules to catch them.
    • Project Deliverable:
      • Rules:
        1. 4 consecutive full-width punctuation marks in Chinese: \W\W\W\W
          • In Chinese, we use full-width punctuation marks, and no more than three should appear in a row, e.g. ……)
        2. Chinese date format should be YYYY/MM/DD: \d\d(/|\.)\d\d(/|\.)\d\d\d\d
          • We write our dates with the year at the beginning, e.g. 2019/04/19. The dot is escaped so that it matches only a literal period rather than any character.
        3. There should be no double space before full-width punctuation marks: \s\s\W
          • Full-width punctuation marks should follow words directly, without any space in between, e.g. 下雨了。
        4. There should be no half-width quotation marks ‘ ’  “ ” ‘ ‘  ” “: [‘’“”]
          • Even when English appears in the target text, we enclose it in full-width quotation marks.
        5. There should be no space between full-width punctuation marks: \W\s\W
          • Consecutive full-width punctuation marks are written without any space between them, e.g. !)
      • Result:
        • The rules capture all the errors successfully!
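For readers who want to try the rules outside Trados, here is a minimal sketch using Python's `re` module. It covers rules 1, 2, 3, and 5, with the dot in the date rule written as `\.` so that it matches only a literal period. The rule labels, the `check` function, and the sample segments are my own assumptions, and Trados uses its own regex engine, so behaviour there may differ in details.

```python
import re

# Patterns from the rules above; the labels are only for this sketch.
RULES = {
    "four consecutive punctuation marks": r"\W\W\W\W",
    "date not in YYYY/MM/DD order": r"\d\d(/|\.)\d\d(/|\.)\d\d\d\d",
    "double space before punctuation": r"\s\s\W",
    "space between punctuation marks": r"\W\s\W",
}


def check(segment):
    """Return the labels of all rules that flag the given target segment."""
    return [name for name, pattern in RULES.items() if re.search(pattern, segment)]


# Hypothetical sample segments: the first four contain an error, the rest are clean.
samples = [
    "真的吗……？！",
    "19/04/2019",
    "下雨了  。",
    "太好了！ ）",
    "等等……）",
    "2019/04/19",
    "下雨了。",
]

for segment in samples:
    print(repr(segment), check(segment))
```

In Python, `\W` matches any non-word character, which includes spaces and half-width punctuation as well as full-width punctuation. As a result, a segment with a double space before a punctuation mark is flagged by the last rule as well as the third one.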