How Can We Make Machine Translation Better?


Neural machine translation (nMT) has recently made significant advances in quality and is being used more and more often.

At Logrus IT, we recently conducted a study of the usability and performance of MT for translating IT security texts. The results showed that using machine translation for English-to-Russian texts in this field allowed us to more than double our productivity, while the quality was just as good as that of human translation.

We also participated in a comparison of various MT engines (we performed linguistic quality assurance, or LQA, on the test results) and confirmed yet again that machine translation is a force to be reckoned with.

Needless to say, the area in which MT can be used is still quite limited, and we can hardly expect MT to perform decent translations of marketing, gaming, or literary texts. However, for translating homogeneous legal and technical documentation or reference material, MT does a respectable job (although the quality can vary drastically from one language pair to another).

When it comes to neural systems, the quality of MT output depends directly on the quality and volume of the training corpora.

Quality can also be improved by using engines with domain adaptation (i.e. engines that can supplement the training of stock models with industry-specific corpora). This makes it possible to “coach” the neural network to translate texts on a certain topic. One engine, ModernMT, allows you to adapt the output at the document level.

The problem is that domain adaptation requires training corpora of an impressive size (10,000-100,000 segments on average), and these corpora frequently aren't available. A decent solution is Google AutoML Translation, a service for training custom models that can significantly increase the quality of MT output when trained on a good TM (translation memory file). You don’t need to be a programmer to train a model: everything is done via a graphical interface.
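
Before committing to domain adaptation, it can help to check how many usable segments a TM actually contains. The sketch below is only an illustration: it assumes the TM has been exported in the TMX exchange format, it counts a translation unit as usable when at least two of its segments are non-empty, and it uses the 10,000-segment lower bound quoted above as a rough threshold. The names count_segments, is_large_enough, and SAMPLE_TMX are ours and not part of any tool mentioned in this article.

```python
import xml.etree.ElementTree as ET

# Lower bound for a domain-adaptation corpus, taken from the figure quoted above.
MIN_SEGMENTS = 10_000


def count_segments(tmx_text: str) -> int:
    """Count translation units (<tu>) with at least two non-empty <seg> elements."""
    root = ET.fromstring(tmx_text)
    count = 0
    for tu in root.iter("tu"):
        filled = [seg for seg in tu.iter("seg") if "".join(seg.itertext()).strip()]
        if len(filled) >= 2:
            count += 1
    return count


def is_large_enough(tmx_text: str, minimum: int = MIN_SEGMENTS) -> bool:
    """Return True when the TM has at least `minimum` usable segments."""
    return count_segments(tmx_text) >= minimum


# A tiny hand-written TMX sample: one complete unit and one with an empty target.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Enable the firewall.</seg></tuv>
      <tuv xml:lang="ru"><seg>Включите брандмауэр.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Untranslated segment.</seg></tuv>
      <tuv xml:lang="ru"><seg></seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    print(count_segments(SAMPLE_TMX), is_large_enough(SAMPLE_TMX))
```

A count like this says nothing about the quality of the segments, only whether the volume is in the right range.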

The quality of MT can also be improved by using a glossary. As a rule, glossaries are more readily available than training corpora. Unfortunately, this option also has its stumbling blocks — there are many nMT engines in which this feature either hasn’t been implemented or is still in the testing phase (such as Google AutoML, which we use for MT experiments).

We decided not to wait for Google to roll out its fully functional service and wrote our own utility called Glosser, which makes it possible to connect and register glossaries in the system even if you aren’t a programmer.

In general, glossaries can be useful for machine translation in the following circumstances:

  • Translating proper names, including names of brands and products. For example, Google Home shouldn’t be translated into French as “chez Google.”
  • Translating interface elements in reference materials.
  • Translating ambiguous words. For example, the word “bat” can refer either to a baseball bat or to a flying mammal, depending on context.
  • Translating loan words, archaisms, or uncommon words. For example, the word “bouillabaisse” came into English from French in the 19th century. However, for most native English speakers it doesn't mean much unless they happen to be well-versed in cooking or French culture. So in some cases it might be better to translate words such as this descriptively (as, say, “fish stew”).
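
One generic way to enforce glossary terms when an engine has no glossary support is to mask them before translation and restore the approved target terms afterwards. The sketch below illustrates that idea only; it is not a description of how Glosser or any particular engine works. The function names, the `__TERM0__` placeholder format, and the toy_engine stand-in for a real MT call are all hypothetical, and real engines may alter or drop such placeholders.

```python
import re
from typing import Callable, Dict, Tuple


def mask_terms(text: str, glossary: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Replace glossary source terms with placeholders.

    Returns the masked text and a placeholder -> target-term mapping.
    Matching is case-sensitive and whole-word only.
    """
    mapping: Dict[str, str] = {}
    if not glossary:
        return text, mapping
    # Longest terms first, so a longer term wins over a shorter one it contains.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    def replace(match: "re.Match[str]") -> str:
        token = f"__TERM{len(mapping)}__"
        mapping[token] = glossary[match.group(1)]
        return token

    return pattern.sub(replace, text), mapping


def unmask_terms(text: str, mapping: Dict[str, str]) -> str:
    """Put the approved target terms back in place of the placeholders."""
    for token, target in mapping.items():
        text = text.replace(token, target)
    return text


def translate_with_glossary(
    text: str, glossary: Dict[str, str], translate: Callable[[str], str]
) -> str:
    """Mask glossary terms, call the engine, then restore the target terms."""
    masked, mapping = mask_terms(text, glossary)
    return unmask_terms(translate(masked), mapping)


def toy_engine(text: str) -> str:
    # Hypothetical stand-in for a real MT call; it only knows one sentence pattern.
    return text.replace("Ask", "Demandez la météo à").replace(" for the weather", "")


if __name__ == "__main__":
    glossary = {"Google Home": "Google Home"}  # product name must stay untranslated
    print(translate_with_glossary("Ask Google Home for the weather.", glossary, toy_engine))
```

Masking works best for proper names and interface strings. For ambiguous or uncommon words, the surrounding grammar (gender, case, articles) often depends on the chosen term, so a simple substitution like this may produce output that still needs post-editing.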

All in all, customizing nMT engines, training them to meet your specific needs, and using glossaries seem to be a very promising approach. It’s entirely possible that translation agencies will offer this as an additional service in the near future.

In the meantime, we’re going to closely follow the development of this technology and continue testing it in an R&D context.

