MT Customization – Ziqi Zhou

If 2009-2016 was the era of cloud and integration for the language industry, 2017 marked the burst of machine learning. Machine translation (MT), the most widely applied machine learning technology in the language field, has penetrated every corner of language services. Nowadays, however, it is not only language service providers that are branding themselves as AI/MT solution providers; tech companies are leading the way in MT development.

There are now more MT vendors in the field: Google, Microsoft, Amazon, DeepL, Alibaba, Tencent, and others. MT engines are also easier to train: users can customize their own engines with tools such as AutoML Translation from Google or Microsoft Translator Hub. MT aggregators such as Inten.to and translate4eu are emerging rapidly. We have also witnessed the birth and growth of LSPs such as Unbabel and Lilt, which have MT in their DNA.

Thanks to this technological growth, MT use cases can easily be found in fields such as eCommerce, travel, international law, technical support, automated subtitling, new drug releases, scientific article releases, and financial news and disclosures.

In this blog post, I’m going to give an overview of some big players in the MT industry, present some examples of how MT can be applied to improve business profits efficiently, and consider how MT projects can be managed from a project manager’s perspective.

An Overview of MT Engines with Big Names


(Source: Intento. *The market has more MT engines than are listed here.)

These MT engines fall into three groups: stock engines, which are deployed and used as they are; customizable stock engines, which allow users to adapt a pre-trained neural MT (NMT) model to a specific domain with relevant corpora; and custom NMT engines, which are built from users’ own corpora for their specific projects.

Given so many MT engines to choose from, you may be wondering which one is the best. In fact, there is NO BEST MT engine out there. The standard for the “best” varies with your budget, language pair, turnaround time, and client needs. To make this point more concrete, let’s compare some of these engines in terms of price (using pre-built engines as an example), language pair (using Chinese as an example), and the time needed to train or deploy them.

Price

Cost of Different MT Engines (USD per Million Characters)

(Source: Inten.to)

The chart above, based on statistics from Intento, shows the wide variation in monthly prices for translating with different MT engines. Note that billing is per character sent to the API for processing, including whitespace characters. These engines also offer free translation options limited by time frame or total number of characters, but the chart only shows their regular pricing. Some engines, such as Google and Amazon, charge a flat price per million characters, while others, such as Microsoft and Alibaba, charge different rates depending on volume.
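To make the two pricing models concrete, here is a minimal Python sketch for estimating a bill. It assumes, as noted above, that every character (whitespace included) is billable. The rates in the example are made-up placeholders, not any vendor’s actual prices, and the tiered function assumes graduated pricing (each tier’s rate applies only to the characters inside that tier); real volume discounts may be structured differently, so check each provider’s own pricing page.

```python
from typing import List, Optional, Tuple

# A tier is (upper bound in characters, or None for "no upper limit", USD per million characters).
Tier = Tuple[Optional[int], float]


def billable_characters(text: str) -> int:
    """Count characters as described above: whitespace is billable too."""
    return len(text)


def flat_cost(chars: int, usd_per_million: float) -> float:
    """Flat pricing: one rate for every character sent."""
    return chars * usd_per_million / 1_000_000


def graduated_cost(chars: int, tiers: List[Tier]) -> float:
    """Volume-based pricing, ASSUMING graduated tiers: each rate applies only to
    the characters that fall inside its tier. The last tier should have bound None."""
    total = 0.0
    lower = 0
    for upper, usd_per_million in tiers:
        if chars <= lower:
            break
        top = chars if upper is None else min(chars, upper)
        total += (top - lower) * usd_per_million / 1_000_000
        if upper is None:
            break
        lower = upper
    return total


if __name__ == "__main__":
    sample = "Hello world\n"
    print(billable_characters(sample))  # 12: the space and the newline are billed

    # Hypothetical rates for illustration only.
    volume = 3_000_000
    print(flat_cost(volume, 20.0))  # 60.0
    print(graduated_cost(volume, [(1_000_000, 20.0), (None, 10.0)]))  # 40.0
```

With these placeholder numbers, the flat rate costs 60 USD for three million characters while the tiered schedule costs 40 USD, which is why the cheapest engine can depend on your monthly volume.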

Language Pair (Chinese as an example)

Before choosing any MT engine, keep in mind that no engine is the best for all language pairs: different engines perform differently for different languages. Take Chinese as an example. Guess which engine translates best? Google? Microsoft? Baidu? Or Tencent?

Wait, before you blurt out the answer, ask again: “Which language to Chinese?” or “Chinese to which language?” Now you see the tricky part.

Each year, the Conference on Machine Translation (WMT) releases training data for its shared tasks and ranks the participating systems for each language pair. In 2018, Mr Translator (Tencent) topped the Chinese-to-English news task with the highest BLEU score, while GTCOM took first place in English-to-Chinese translation.

Ranking of Systems for ZH>EN Translation

(Source: http://matrix.statmt.org/matrix/systems_list/1892?metric_id=4)

Ranking of Systems for EN>ZH Translation

(Source: http://matrix.statmt.org/matrix/systems_list/1893)

Yes, the language pair matters. To give you a better idea, here are the best systems for several language pairs on the 2018 news test.

Best Systems for Different Language Pairs

(Source: http://matrix.statmt.org/?metric%5Bid%5D=5&mode=bestn)

Although Tencent’s MT engine seems to be the best at ZH>EN translation, it would be unwise to adopt it right away. At the very least, you need to consider the domain you are translating in. The table above shows that Tencent got the highest BLEU score in the news domain, but what if you are translating in another domain? And is the BLEU score really reliable? Additionally, Tencent’s MT engine is NOT customizable. If you want to tune your own MT engine, which one should you use?
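Since the rankings above rest on BLEU, it may help to see what the score actually measures. Below is a minimal, unsmoothed sentence-level BLEU in Python: the geometric mean of clipped 1- to 4-gram precisions against a single reference, multiplied by a brevity penalty. It is a simplified illustration, not the scorer WMT uses; official results are computed at the corpus level with standardized tokenization (tools such as sacreBLEU), and Chinese output would need character-level or word-segmented tokens rather than the whitespace split assumed here.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU of one candidate against one reference, uniform n-gram weights."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: a candidate n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, a single zero precision makes BLEU zero
        log_precision_sum += math.log(matches / total)
    c, r = len(candidate), len(reference)
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat is on the mat".split()
    print(sentence_bleu(reference, reference))  # 1.0
    # An acceptable paraphrase shares no 4-gram with the reference, so it scores 0.0.
    print(sentence_bleu("there is a cat on the mat".split(), reference))  # 0.0
```

The second result is one concrete reason to question how reliable BLEU is: a fluent, correct translation can score zero simply because it is worded differently from the single reference.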

Time for Training and Deployment

For those who want to customize an NMT engine, the time and cost of training and deploying it should be considered when rolling out an MT solution.

Training time for a custom NMT engine varies across providers; some (Microsoft, Google) charge for training, while others (IBM, ModernMT) don’t.

Time Needed for Training

(Source: Intento)

How MT Can Be Leveraged on the Enterprise Level

When MT comes up, even some translators and interpreters in the language industry first think of pasting source content into a web-based MT engine and copying the output back to their desktop, or of scenarios where AI robots might (or might not) take over an interpreter’s seat. At the enterprise level, however, MT shows great potential for minimizing costs, expanding into international markets, and reshaping the workflow of translation/localization projects.

For example, cross-border eCommerce companies like eBay rely heavily on MT because of the large volume of content produced globally every day. It is simply impossible to hire translators to translate all of it. With MT, a user can type the name of an item in his/her own language and get results for goods from another country, with their names shown in the local language even if they were originally written in a foreign one.

Language service providers (LSPs) are also adopting MT to lower costs and improve efficiency. Moreover, LSPs built with MT at their core are emerging and growing rapidly. Take Unbabel as an example: it uses MT for translation and contracts vendors who translate with both the original text and the suggestions produced by MT. It also hires freelance linguists for translation assessments and glossary and thesaurus reviews, mainly working with language data. In its latest funding round, Unbabel raised 23M USD from 7 investors.

Funding Rounds Info of Unbabel

(Source: Crunchbase)

Who are the Localization Project Managers in the MT Era?

Driven by advances in artificial intelligence, machine translation and emerging adaptive translation solutions continue to improve rapidly and can meet increasingly complex requirements. Anyone involved in the language business must re-examine where and how these newer options can be leveraged. As a localization student who hopes to become a machine translation solution architect, I want to end this article by discussing some new requirements for project managers who want to leverage the power of machine translation (MT PMs).

Keep in mind that the workflow of an MT project differs from that of a traditional project, depending on conditions such as MT resources, language resources, and client needs. Before a project is launched, the work must be scoped first: How many words need to be translated? MT projects usually involve much larger volumes than traditional translation projects, often millions of words. Will DTP, MT training, and human editing be included? Which MT engine should be used? Which evaluation metrics will be used for QA?

When it comes to quoting, can we simply quote by word count? Since we cannot be sure of the quality of the MT’s initial output, we may not know how much time post-editing will take. In addition, new findings or issues may emerge as the project progresses, making it even harder to give the client an accurate quote up front.
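One way to handle this uncertainty is to quote a range rather than a single figure. The Python sketch below turns a word count and two assumed post-editing speeds (a best case for good MT output and a worst case for poor output) into a range of hours and costs. The function name, throughput figures, and hourly rate are hypothetical placeholders; in practice the speeds would come from a pilot on a sample of the client’s content.

```python
from dataclasses import dataclass


@dataclass
class QuoteRange:
    low_hours: float
    high_hours: float
    low_usd: float
    high_usd: float


def post_editing_quote(words: int, fast_wph: float, slow_wph: float,
                       hourly_rate_usd: float) -> QuoteRange:
    """Estimate a post-editing quote range from words-per-hour (wph) assumptions.

    fast_wph: editing speed if the MT output turns out to be good (best case).
    slow_wph: editing speed if the MT output turns out to be poor (worst case).
    """
    if not 0 < slow_wph <= fast_wph:
        raise ValueError("expected 0 < slow_wph <= fast_wph")
    low_hours = words / fast_wph
    high_hours = words / slow_wph
    return QuoteRange(low_hours, high_hours,
                      low_hours * hourly_rate_usd, high_hours * hourly_rate_usd)


if __name__ == "__main__":
    # Hypothetical figures: one million words, 400 to 1,000 words/hour, 40 USD/hour.
    quote = post_editing_quote(1_000_000, fast_wph=1_000, slow_wph=400, hourly_rate_usd=40)
    print(quote)
    # QuoteRange(low_hours=1000.0, high_hours=2500.0, low_usd=40000.0, high_usd=100000.0)
```

The width of the range makes the risk visible: here the worst case costs two and a half times the best case, which is a strong argument for running a pilot before committing to a per-word price.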

Even the team changes in MT projects. In the past, PMs usually worked with the account manager, engineer, and vendor manager, but an MT project team may also need a machine translation solution architect who has expertise in the MT project process and is familiar with MT tools, so that he/she can pick (or train) the best engine for the specific project. Engineers in MT projects may need to spend more time connecting MT resources with CAT tools or, in some cases, command-line tools. Also, unlike traditional projects, MT projects need more reliable post-editors if the bulk of the translation is done by machine. These post-editors should be experienced linguists who are familiar with common MT error patterns and know how to stick to the style guide.

Given these possible changes, the PM should have both a relatively strong technical background and solid communication, organization, and coordination skills, since an effective project plan requires the cooperation of many parties. He/she must make sure that each team member can access the resources they need at the right time or step. When needed, he/she should also be able to take over some of the technical configuration related to machine translation.

(This blog post is inspired by Renato Beninatto’s lecture at IMUG on September 20, 2018. Special thanks to Inten.to!)


Embrace MT for a Zero-Barrier Future: MT Engines, MT’s Application, MTPMs for the future.
