Author Archives: Amelia Wolford

Future of Translation Technology

 

If I were in charge of making a substantial financial investment in translation technology, I would invest in Statistical Machine Translation (SMT). This may seem like a boring choice because SMT is not an unbelievable, ultra-futuristic-sounding technology, but it is the most viable investment. The fact is that machine translation is still a rather primitive tool, and it cannot match the quality of human translators. That is not to say that SMT should be abandoned. On the contrary, it is here to stay and will inevitably improve, which is why I believe SMT is the smartest translation technology to invest in.

Of course, there are drawbacks to data-driven machine translation, and they are discussed in the famous debate between Daniel Marcu and Alan Melby. I find myself siding with Daniel Marcu’s defense of MT. He argues that the machine does not necessarily have to meet the same requirements as a human translator (i.e., the machine does not have to understand the source or target text); however, he also notes that there is no way of knowing whether the machine “understands.” His statement, “An airplane doesn’t bat its wings to fly,” perfectly demonstrates how an SMT system can produce worthwhile translations without having to meet the same requirements as humans [1].

I also feel that SMT in particular needs more human involvement. When you invest in an SMT system, you are not just investing in a technology. You are also investing in the people (linguists and translators) who become valuable resources as the system is trained and as it reaches specific quality goals set in advance. While massive data collection will help SMT improve in quality over time, human involvement is needed at various points to ensure quality and to adjust the system as needed.

  1. Marcu, Daniel; Melby, Alan. “Data Driven Machine Translation: A Conversation with Linguistics and Translation Studies.” AMTA, 2006. http://www.ttt.org/amta/AMTA2006wMarcuCommentsV1c.pdf

 

Introduction

This blog contains the body of work that I, Amelia Wolford, completed in the Middlebury Institute of International Studies course on Advanced Computer Translation. During this course, a group of three other students and I trained a statistical machine translation (SMT) engine using Microsoft Translator Hub. The documents used for translation in this project were taken from General Electric, and the subject matter was medical devices. Working from English into Russian, we used both bi-directional and monolingual documents to train the system. We made only one adjustment during each round of testing so that we could determine which methods worked best; these methods mainly involved converting the original PDFs to various formats and aligning the documents using different CAT tools. While this was only a pilot project, our group developed a proposal for full-scale training of an SMT engine, and this proposal (along with the original) is enclosed. This project gave me real-life experience using the scientific method to train an SMT engine and helped me determine what methods I would use if a client asked me to take on such a project. It also gave me experience working on a team and managing tasks and time as a group. The supporting materials for this project (original proposal, updated proposal, and presentation slides) can be found in subsequent posts.

This blog also contains an XML filter assignment that was completed individually. The assignment is described below. The link to the project template can be found in subsequent posts. One of the pseudo-translated pages that resulted from this assignment can be found on this blog as well; it is the only page titled with Chinese characters.

Assignment Description

The makers of the website pseudol10n.wordpress.com need you to translate the website to Simplified Chinese (PRC). Before you do that, they want you to confirm you can create the appropriate filter to handle the site.

Tasks

  1. Create a Trados Project Template that is able to do the following:
    • Filter the XML file “websitelocalization.wordpress.2015-09-22.xml” so all appropriate content is translated and other text/code is not translated
    • Pseudo-translate the file “websitelocalization.wordpress.2015-09-22.xml”
  2. Create an empty WordPress website and export all of its content to demonstrate that you know how to export from WordPress
  3. Import the pseudo-translated XML file “websitelocalization.wordpress.2015-09-22.xml” into your new WordPress website to demonstrate that your Project Template works
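
To show the idea behind the filter outside of Trados, here is a minimal Python sketch of what the first task asks for: change only the translatable text in a WordPress export and leave everything else alone. This is not the Trados project template itself. The choice of translatable elements (`title` and `content:encoded`), the inline sample XML, and the helper names `pseudo_translate` and `pseudo_translate_export` are my own assumptions for illustration, and the accented-vowel substitution is just an easy-to-spot stand-in for the Chinese pseudo-translation that the CAT tool produces.

```python
import re
import xml.etree.ElementTree as ET

# Namespace that WordPress export files use for post bodies.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Assumed set of translatable elements; a real filter would be tuned
# against the actual export file.
TRANSLATABLE_TAGS = {"title", f"{{{CONTENT_NS}}}encoded"}

# Vowel substitutions that make pseudo-translated text easy to spot.
ACCENTS = str.maketrans({
    "a": "\u00e0", "e": "\u00e9", "i": "\u00ee", "o": "\u00f5", "u": "\u00fc",
    "A": "\u00c0", "E": "\u00c9", "I": "\u00ce", "O": "\u00d5", "U": "\u00dc",
})

# Matches a single inline HTML tag such as <p> or </p>.
TAG_PATTERN = re.compile(r"(<[^>]+>)")


def pseudo_translate(text):
    """Accent the vowels in visible text and leave inline HTML tags untouched."""
    parts = TAG_PATTERN.split(text)
    converted = [
        part if TAG_PATTERN.fullmatch(part) else part.translate(ACCENTS)
        for part in parts
    ]
    return "[" + "".join(converted) + "]"


def pseudo_translate_export(xml_text):
    """Parse the export XML and pseudo-translate only the translatable elements."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in TRANSLATABLE_TAGS and element.text and element.text.strip():
            element.text = pseudo_translate(element.text)
    return root


# Tiny stand-in for an export file (not the real assignment file).
SAMPLE = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hello world</title>
      <link>https://example.com/hello-world</link>
      <content:encoded><![CDATA[<p>Welcome to the site.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

root = pseudo_translate_export(SAMPLE)
item = root.find("channel/item")
print(item.find("title").text)
print(item.find("link").text)
print(item.find(f"{{{CONTENT_NS}}}encoded").text)
```

In this sketch the title and the post body come back bracketed and accented, while the link and the HTML tags inside the body are left exactly as they were. That is the same check I would make after importing the pseudo-translated file into the new WordPress site.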