Neural machine translation (nMT) has recently made significant advances in quality and is being used more and more often in translation work.
At Logrus IT we recently conducted a study of the usability and performance of MT for translating IT security texts. The results showed that using machine translation for English-to-Russian translation in this field allowed us to more than double our productivity, and the quality was just as good as human translation.
Needless to say, the area in which MT can be used is still quite limited, and we can hardly expect MT to perform decent translations of marketing, gaming, or literary texts. However, for translating homogeneous legal and technical documentation or reference material, MT does a respectable job (although the quality can vary drastically from one language pair to another).
When it comes to neural systems, the quality of MT output depends directly on the quality and volume of the training corpora.
Quality can also be improved by using engines with domain adaptation (i.e. engines that let you supplement the training of stock models with industry-specific corpora). This makes it possible to “coach” the neural network to translate texts on a certain topic. One engine, ModernMT, allows you to adapt the output on the document level.
The problem is that domain adaptation requires training corpora of an impressive size (10,000-100,000 segments on average), and these corpora frequently aren't available. A decent solution is Google AutoML Translation, a service for training custom models that can significantly increase the quality of MT output when trained on a good TM (translation memory) file. You don’t need to be a computer programmer to create a TM: everything is done via a graphical interface.
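To make the corpus-size requirement more concrete, here is a minimal sketch of how a set of segment pairs might be cleaned and exported as tab-separated text before training. The function name build_training_tsv and the min_segments parameter are hypothetical and not part of any Google tooling. The default of 10,000 simply mirrors the lower bound mentioned above, and the exact file format a given engine expects should be checked against its documentation.

```python
def build_training_tsv(pairs, min_segments=10_000):
    """Clean (source, target) segment pairs and render them as TSV text.

    Returns a tuple (tsv_text, enough), where `enough` says whether the
    cleaned corpus reaches `min_segments` (an assumed threshold).
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Collapse runs of whitespace (including tabs and newlines) so that
        # each pair fits on one tab-separated line.
        source = " ".join(source.split())
        target = " ".join(target.split())
        # Skip pairs with an empty side and exact duplicates.
        if not source or not target or (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))

    tsv_text = "".join(f"{s}\t{t}\n" for s, t in cleaned)
    return tsv_text, len(cleaned) >= min_segments


# Illustrative data only, not taken from a real translation memory.
sample_pairs = [
    ("Hello  world", "Привет, мир"),
    ("Hello world", "Привет, мир"),   # duplicate after whitespace cleanup
    ("", "x"),                        # empty source side, dropped
    ("Firewall", "Брандмауэр"),
]
sample_tsv, sample_enough = build_training_tsv(sample_pairs)
```

With only two usable segments, sample_enough comes back False, which is exactly the situation described above: the material exists, but not in the volume that domain adaptation usually needs.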
The quality of MT can also be improved by using a glossary. As a rule, glossaries are more readily available than training corpora. Unfortunately, this option also has its stumbling blocks — there are many nMT engines in which this feature either hasn’t been implemented or is still in the testing phase (such as Google AutoML, which we use for MT experiments).
We decided not to wait for Google to roll out its fully functional service and wrote our own utility called Glosser, which makes it possible to connect and register glossaries in the system even if you aren’t a programmer.
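To illustrate what a glossary buys you, here is a minimal sketch of a terminology check run on MT output. It is not how Glosser works internally (the article does not describe that). The function name find_glossary_violations and the sample glossary are hypothetical. The check is also deliberately naive: it looks for the exact target term, so inflected forms, which are common in Russian, would be reported as missing.

```python
import re


def find_glossary_violations(source, translation, glossary):
    """Return (source_term, target_term) pairs where the source term occurs
    in `source` but the required target term is absent from `translation`.

    `glossary` maps source terms to their required target terms.
    """
    violations = []
    translation_lower = translation.lower()
    for src_term, tgt_term in glossary.items():
        # Match the source term as a whole word, ignoring case.
        pattern = r"\b" + re.escape(src_term) + r"\b"
        if re.search(pattern, source, flags=re.IGNORECASE):
            # Naive check: exact (case-insensitive) substring match only.
            if tgt_term.lower() not in translation_lower:
                violations.append((src_term, tgt_term))
    return violations


# Illustrative glossary and sentences, assumed for this example.
sample_glossary = {"firewall": "брандмауэр", "malware": "вредоносное ПО"}
sample_source = "The firewall blocks malware."
sample_translation = "Межсетевой экран блокирует вредоносное ПО."
sample_violations = find_glossary_violations(
    sample_source, sample_translation, sample_glossary
)
```

Here the engine chose a perfectly valid translation for "firewall", just not the one the glossary requires, so that term is the only one flagged. This is the kind of inconsistency that glossary support in the engine is meant to prevent at translation time rather than catch afterwards.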
In general, glossaries can be useful for machine translation in the following circumstances:
All in all, customizing nMT engines, training them to meet your specific needs, and using glossaries seem to be very promising approaches. It’s entirely possible that translation agencies will offer this as an additional service in the near future.
In the meantime, we’re going to closely follow the development of this technology and continue testing it in an R&D context.