SMT Project

Statistical Machine Translation (SMT) engines help translators handle more content and shorten delivery times. An engine is trained by feeding it large amounts of data, including but not limited to parallel texts, monolingual texts, translation memories, and glossaries.

In this project, my colleagues and I trained an SMT engine on Microsoft Translator Hub specifically to translate Taiwanese laws into English. We spent approximately two weeks training the engine, and the results were surprisingly adequate for such complex content.

In the pilot proposal, we set a clear goal for how much time and money a trained SMT engine could save. We also described the timeline of the pilot and the steps we would take to train the engine. Lastly, we provided an approximate quote for the pilot project.

In the updated proposal, we factored in the results of the actual training and updated our objectives and timeline accordingly. We also recommended steps for fully training an SMT engine, along with approximate pricing for a fully trained engine.

We then presented the results of the pilot project, which can be viewed in our presentation slides.

Last but not least, we looked into another SMT engine training platform, KantanMT, and compared it with Microsoft Translator Hub. Our analysis and conclusion can be found in the "Comparing KantanMT vs Microsoft Translator Hub" post.