Machine Translation (MT)

Machine Translation (MT) is a highly polarizing topic. Over the years, we have heard a multitude of concerns, reservations, and strong opinions about it.

At Logrus IT, we take a pragmatic approach and consider MT technology to be one of the many factors contributing to cost and productivity optimization. MT also provides solutions in cases where human translation is out of the question due to time and/or budget restrictions.

When clients request that we avoid using MT, the cause for their concern usually has less to do with MT itself than with insufficient human editing, a lack of adequate output quality control procedures, and other process gaps that can result in substandard translations being delivered. At Logrus IT, we offset potential MT pitfalls with a comprehensive, multi-layered translation and quality assurance process, and we will gladly provide full details of this process. Numerous built-in checks and substantial human involvement prevent substandard translations from making it into the final delivered documents.

To dispel fears that our clients might have and demystify the subject, we have decided to publish our MT application protocol.

Logrus IT Machine Translation (MT) Application Protocol

  • Clients are always notified that MT is part of the translation process, and in instances where costs are significantly reduced, we share these savings with our clients.
  • We never use MT on projects for which clients have explicitly instructed us to avoid it.
  • All MT or Neural MT (NMT) suggestions and Translation Memory (TM) matches are evaluated and edited by a human translator/editor.
  • The quality of the resulting translation is not sacrificed compared to the traditional, human-only translation process.
  • Translation Memory and Termbase matches always have precedence over MT-generated suggestions; all relevant TMs and TBs are applied first.
  • MT is only used for units which lack appropriate TM matches and simply provides suggestions similar to those coming from a TM. Translators always explicitly see the source of translation and can clearly distinguish between TM- and MT-sourced suggestions, and give more attention to the latter.
  • We always strive to select the MT engine best suited for a particular language pair, subject area and project line.

Optimal results are usually achieved with parallel use of multiple MT engines (different engines may produce the best pre-translations for different document sections).

We regularly use such engines as Microsoft MT, Microsoft Neural MT, LILT (adaptive MT), Google MT, and others. Results vary, often significantly, among engines.

  • Logrus IT never uses free MT engines for business. All free MT engines add translations to the public domain, which runs contrary to the NDAs and/or contracts signed with our clients.
  • For every MT engine, we always use a subscription level that keeps accumulated translations (the corpus) private. These subscription levels also tend to support MT training, which is required for most subject areas and product lines.

Cases Where MT Works Best

MT efficiency depends on multiple factors, including the language pair (both source and target), the availability of a sufficiently large and clean corpus (cleanliness being the more important of the two), the subject area, and the document structure.

While the quality of MT suggestions for each new translation project may vary significantly, there are indicators known to produce positive outcomes, a number of which are outlined below.

  • Both source and target languages belong to the same group. For instance, translating English into languages like German, French, Spanish, Italian, etc. typically results in a significantly higher quality MT output.
  • Translation from Chinese into Germanic and other widely spoken European languages (English, German, etc.) is an MT favorite, often producing raw translations that are very “humanlike”. This is a direct consequence of the highly structured nature of Chinese, which, unlike most European languages, is not a phonetic language but a topic-prominent one built around notions and concepts. At the same time, Chinese word order very closely matches English word order.
  • The subject matter is generic enough or widely popular (indicating ample corpus size).
  • Documents have predictable, repeating, and formal structure.
  • Documents use shorter sentences or strings, and contain no complex grammar.
  • A large, high-quality TM exists for the project line or subject matter and it can be used to train the engine. (One of the most important factors contributing to TM quality is terminology and translation consistency.)

Applying MT most noticeably reduces cost and increases speed during the initial stages of a new project, when a large TM based on existing translations is not yet available.

Expected Savings

MT often boosts translator efficiency. At the same time:

  • Using MT engines in “private” mode costs money (up to 0.5 cents/word depending on the engine).
  • Proper MT application requires more human attention, with more time spent on various checks.
  • MT only saves effort for new sentences or units; it doesn’t provide any savings for recycled units from the TM.

When MT works well, suggestions generated by MT can be treated more or less like traditional low fuzzy matches from the TM. Total savings generally range from 5% to 30% for units where MT was applied.
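The interplay of these factors can be illustrated with a back-of-the-envelope calculation. The sketch below uses purely hypothetical example rates (they are not Logrus IT pricing); only the 5–30% discount range and the up-to-0.5-cents-per-word engine fee come from the figures above.

```python
# Illustrative estimate of net MT savings on a project.
# All rates below are hypothetical example values, not actual pricing.

def net_mt_savings(total_words, tm_recycled_share, base_rate,
                   mt_discount, engine_cost_per_word):
    """Return net savings as a fraction of the full human-only cost.

    tm_recycled_share    -- fraction of words covered by TM matches
                            (MT yields no savings on these)
    mt_discount          -- effective discount on MT-assisted words,
                            e.g. 0.05-0.30 per the range above
    engine_cost_per_word -- "private" engine fee, e.g. up to $0.005
    """
    new_words = total_words * (1 - tm_recycled_share)
    gross_savings = new_words * base_rate * mt_discount
    engine_cost = new_words * engine_cost_per_word
    return (gross_savings - engine_cost) / (total_words * base_rate)

# 100,000 words, 40% recycled from TM, $0.10/word base rate,
# 20% discount on MT-assisted words, $0.005/word engine fee:
print(f"{net_mt_savings(100_000, 0.4, 0.10, 0.20, 0.005):.1%}")
```

As the formula makes explicit, the engine fee and the recycled-word share both eat into the headline discount, which is why MT-assisted savings apply only to the new-word portion of a project.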

How We Integrate MT into Our Translation Process

The days of setting up and training proprietary, standalone Moses servers on local networks are long gone.

A modern, efficient translation process requires a cloud-based CAT (computer-assisted translation) system with simultaneous online access for all parties involved, including PMs, translators and editors. All translations are stored in the cloud, along with glossaries and TMs.

Within this paradigm, cloud-based MT engines are seamlessly connected to the cloud CAT system through APIs and used by translators in a manner similar to TM. MT training is also performed from within the CAT system, which is the most natural approach (all TMs are already stored there) and saves significant effort. Well-designed cloud CAT systems already have connectors to most popular MT engines; the selection continues to grow, and adding more engines upon request is neither too time-consuming nor costly.
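A minimal sketch of what such a connector layer might look like on the CAT side is shown below. The `MTConnector` interface, `Suggestion` record, and engine names are all hypothetical illustrations, not the API of any actual CAT system; the key points it demonstrates are a uniform interface across engines and origin labels that let translators distinguish suggestion sources.

```python
# Minimal sketch of a CAT-side MT connector layer (hypothetical API).
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Suggestion:
    text: str
    origin: str  # engine or TM name shown to the translator

class MTConnector(Protocol):
    name: str
    def translate(self, source: str, src_lang: str, tgt_lang: str) -> str: ...

def mt_suggestions(engines: list[MTConnector], source: str,
                   src_lang: str, tgt_lang: str) -> list[Suggestion]:
    """Query every enabled engine; each result is labeled with its
    origin so translators can tell MT output from TM matches."""
    results = []
    for engine in engines:
        try:
            text = engine.translate(source, src_lang, tgt_lang)
            results.append(Suggestion(text, origin=f"MT:{engine.name}"))
        except Exception:
            continue  # an unavailable engine should not block the unit
    return results
```

Because every engine sits behind the same interface, adding a new engine to the pool is a matter of writing one more connector, which matches the observation above that extending the engine selection is inexpensive.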

As mentioned earlier, using a single MT engine for all projects, or even within a single project, is often not optimal. Quality of MT output varies significantly from engine to engine, depending on the language pair, subject area, and text structure/specifics.

This topic has received extensive coverage, including a recent research study published by our esteemed colleagues at DFKI that compares errors made by neural MT with those made by “traditional” MT. For instance, Neural MT (NMT) typically produces translations that sound better but often make less sense than those produced by statistical and hybrid MT engines.

As a result, Logrus IT prefers translators to have access to multiple MT engines at once, giving them the opportunity to select the best translation or to discard all MT suggestions and translate the unit from scratch. We regularly use suggestions from Microsoft MT, Microsoft Neural MT, Google MT, LILT (adaptive MT), and others. Free MT options (with translations added to the public domain) are blocked in all cases.

In its entirety, the translation process works as follows:

  1. A project is created in a cloud-based CAT system.
    1. This includes attaching appropriate TMs and glossaries and allowing the use of relevant MT engines.
  2. Resources (translators and editors) are assigned to the project.
  3. For each unit, the translator sees both TM fuzzy matches (if available) and MT suggestions from multiple engines.
    1. TM matches have precedence over MT-generated suggestions, and are located at the top of the list.
    2. MT is used only for units without good TM matches.
    3. The translation origin of each suggestion is clearly visible to the translator, so more attention is devoted to MT-sourced suggestions.
  4. A special status is assigned to each unit for which an MT suggestion was applied without any editing. Unit finalization is separated from the translation process.
    1. This allows us to apply a filter, select all MT-sourced translations, and ensure that these are all edited and finalized by a human.
  5. All finalized translations are uploaded to the cloud TM.
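The core logic of steps 3 and 4 above can be sketched in a few lines. The threshold value, field names, and data shapes below are illustrative assumptions, not an actual CAT system schema; what the sketch shows is TM precedence, MT as a fallback only, and the status filter that isolates unedited MT output for a mandatory human pass.

```python
# Sketch of steps 3-4: TM matches take precedence, MT is a fallback,
# and a status filter isolates unedited MT output for human review.
# Threshold and field names are illustrative, not a real CAT schema.
from dataclasses import dataclass

TM_MATCH_THRESHOLD = 0.75  # below this, a fuzzy match is not "good"

@dataclass
class Unit:
    source: str
    target: str = ""
    origin: str = ""      # "TM", "MT:<engine>", or "human"
    edited: bool = False  # True once a human has touched the unit

def pick_suggestions(tm_matches, mt_suggestions):
    """Steps 3.1-3.2: good TM matches are listed first (best on top);
    MT suggestions are offered only when no good TM match exists."""
    good_tm = [m for m in tm_matches if m["score"] >= TM_MATCH_THRESHOLD]
    if good_tm:
        return sorted(good_tm, key=lambda m: -m["score"])
    return mt_suggestions

def needs_human_pass(units):
    """Step 4.1: select units whose MT suggestion was applied unedited,
    so they can be edited and finalized by a human."""
    return [u for u in units if u.origin.startswith("MT:") and not u.edited]
```

Separating unit finalization from translation, as in step 4, is what makes the final filter reliable: an MT-sourced unit simply cannot reach the “finalized” state without passing through a human editor.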