Problems in quality evaluation: Human & Machine
We all know quality is an important aspect of translation. Yet whenever I was asked for my opinion of a piece of translation work, all I could offer was my personal feelings and impressions. This shows that human evaluation of translation quality can be very subjective and inconsistent.
How about letting a machine do the job? There are existing automatic methods for evaluating translation. For example, BLEU (Bilingual Evaluation Understudy) is a metric that scores a machine-generated sentence against a reference translation. It does not tell us whether a translation is good or bad. It only measures how similar the machine translation is to the human reference, based on overlapping word sequences (n-grams). As a result, a machine translation can do a decent job and still get a low score if it uses different expressions from the reference sentence.
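Here is a minimal, simplified sketch of sentence-level BLEU in Python that makes the "similarity, not quality" point concrete. It assumes lowercase whitespace tokenization, a single reference, uniform weights and no smoothing. Production implementations add proper tokenization, multiple references, corpus-level aggregation and smoothing, so their numbers will differ. The function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU of one candidate against one reference.

    Assumptions: lowercase whitespace tokenization, a single reference,
    uniform weights over n = 1..max_n and no smoothing (any n-gram order
    with zero matches makes the whole score 0).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / sum(cand_counts.values()))

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat is sitting on the mat"
    print(sentence_bleu("the cat is sitting on the mat", reference))  # 1.0
    # A reasonable paraphrase shares no 3-grams with the reference, so it scores 0.0.
    print(sentence_bleu("a cat sits on the rug", reference))  # 0.0
    print(sentence_bleu("a cat sits on the rug", reference, max_n=2))
```

The last call counts only unigrams and bigrams, so the paraphrase gets a small non-zero score. Either way, it is judged only by word overlap with the reference, not by whether it is an acceptable translation.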
MQM: Multidimensional Quality Metrics
MQM (Multidimensional Quality Metrics) is a translation quality assessment framework that enables users to customize their own quality evaluation metrics. It was developed to create a shared quality metrics system for both human and machine translation and to improve automatic translation quality estimation.
Before we talk about how to use MQM to develop metrics appropriate for different types of translation, we should understand what translation quality is.
What is translation quality?
According to Alan Melby, a quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.
As this definition shows, quality depends on the audience, the purpose and the agreed specifications, so there is no single definition of quality. Different projects have different quality requirements because the standards for translation work vary. For example, Lingotek, a company providing a cloud-based translation management system, divides content into three levels: automatic, community and professional. “Automatic” content, such as comments, tweets and some blog and forum posts, is less critical and only needs to be readable. “Professional” content, such as a legal document, needs to be highly accurate. Translation quality is therefore multidimensional, covering fluency, accuracy, verity, formatting, engineering and so forth.
MQM contains eight major dimensions: Accuracy, Fluency, Terminology, Locale convention, Style, Verity, Design, and Internationalization. Users select the dimensions that suit the nature and requirements of their project. Once the dimensions are chosen, they can select issue types. MQM has a catalog of more than 100 issue types, each with definitions and examples. Don’t be daunted by this number! MQM also has a smaller set of issue types, called MQM Core, that is suitable for most purposes. The MQM Core can be graphically represented as follows:
Multiple ways to use MQM
There are many ways to use MQM:
- Define your own metric based on dimensions:
  - Develop Specifications
  - Select Dimensions
  - Select Assessment Method
  - Select Issues
  - Set Issue Weights
  - Determine Thresholds
  - Implement a Workflow
- Use a predefined metric. This approach is useful for comparisons across similar projects.
- Mark up texts or use a scorecard.
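Several of the steps for defining your own metric can be sketched in Python: selecting issues, setting issue weights and determining thresholds, then applying them as a simple scorecard. This is a minimal illustration under stated assumptions, not the official MQM scoring model. The severity multipliers, issue weights, thresholds and the per-word scoring formula are all hypothetical. So are the names `Metric`, `Issue`, `score`, `LEGAL` and `FORUM`.

```python
from dataclasses import dataclass

# Hypothetical severity multipliers; real MQM scoring models may differ.
SEVERITY_MULTIPLIERS = {"minor": 1, "major": 5, "critical": 25}


@dataclass
class Metric:
    """A custom metric: selected issue types, their weights and a pass threshold."""
    name: str
    issue_weights: dict  # issue type -> weight; only selected issue types appear
    threshold: float     # minimum quality score (0-100) required to pass


@dataclass
class Issue:
    """One error a reviewer recorded while marking up the text."""
    issue_type: str
    severity: str  # "minor", "major" or "critical"


def score(metric, issues, word_count):
    """Return (quality score, passed) for one evaluated text.

    Assumed formula: penalty = sum of severity multiplier x issue weight,
    quality = 100 x (1 - penalty / word_count).
    """
    penalty = 0.0
    for issue in issues:
        if issue.issue_type not in metric.issue_weights:
            raise ValueError(f"{issue.issue_type!r} is not selected in metric {metric.name!r}")
        penalty += SEVERITY_MULTIPLIERS[issue.severity] * metric.issue_weights[issue.issue_type]
    quality = 100.0 * (1.0 - penalty / word_count)
    return quality, quality >= metric.threshold


# Hypothetical metric for high-stakes content: more issue types, stricter threshold.
LEGAL = Metric(
    name="legal",
    issue_weights={
        "Accuracy/Mistranslation": 2.0,
        "Accuracy/Omission": 2.0,
        "Terminology": 1.5,
        "Fluency/Grammar": 1.0,
    },
    threshold=99.0,
)

# Hypothetical metric for content that only needs to be readable.
FORUM = Metric(
    name="forum",
    issue_weights={"Accuracy/Mistranslation": 1.0, "Fluency/Grammar": 0.5},
    threshold=90.0,
)

# Issues found in a hypothetical 200-word translation.
FOUND_ISSUES = [
    Issue("Accuracy/Mistranslation", "major"),
    Issue("Fluency/Grammar", "minor"),
    Issue("Fluency/Grammar", "minor"),
]

if __name__ == "__main__":
    for metric in (LEGAL, FORUM):
        quality, passed = score(metric, FOUND_ISSUES, word_count=200)
        print(f"{metric.name}: {quality:.1f} {'PASS' if passed else 'FAIL'}")
    # legal: 94.0 FAIL
    # forum: 97.0 PASS
```

The same translation passes under the forum metric but fails under the legal one. The legal metric weights accuracy errors more heavily and sets a stricter threshold, which mirrors the idea that different content needs different quality levels.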