
Logrus IT’s Signature Quality Metrics

Each quality metric developed at Logrus IT relies on the Quality Triangle methodology. This unique approach incorporates a holistic component, which addresses the overall sentiment of the text, as well as the more technical atomistic component. Our signature quality metrics hinge on three key variables:

  • Holistic Adequacy of the text/content as a whole
  • Holistic Readability of the text/content as a whole
  • Atomistic Quality, or the overall “cleanness” of the content: the average technical quality of its constituent units (sentences, strings, etc.)

These three factors are independent and are assessed separately on a 0-10 scale that quantifies the extent to which preset expectations have been met.

In marketing texts, for example, a minimum score of 8 is expected for both Adequacy and Readability, and Atomistic Quality is expected to be no lower than 9 (or even 10, with no technical issues whatsoever).

For content at the other end of the spectrum, such as knowledge bases, the emphasis is on Holistic Adequacy: only borderline readability is required, and expectations can be substantially lowered. The Holistic Adequacy acceptance threshold can drop to 7, Readability to 5, and Atomistic Quality to 5 or 6.
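To make these two examples concrete, here is a minimal sketch of how such threshold profiles might be recorded, with Python used for illustration throughout this article. The profile names and structure are our own; the numbers come from the examples above (taking the lower bound where the text gives a range):

```python
# Acceptance thresholds on the 0-10 scale, per the two examples above.
THRESHOLD_PROFILES = {
    "marketing":      {"adequacy": 8, "readability": 8, "atomistic": 9},
    "knowledge_base": {"adequacy": 7, "readability": 5, "atomistic": 5},
}
```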

Expectations can vary dramatically, depending on factors such as the subject area, budget and timeframe of the project. Content might not “pass” when measured against the strictest set of criteria (such as the one applied to marketing materials) yet fulfill less stringent criteria (as in the knowledge base example) with flying colors.

It is important to emphasize that the definition of quality does not depend on our expectations: It is the tolerated deviation from perfection (from the highest mark of “10”) that is adjusted according to the circumstances of a specific project.

Quality Metric Components

Each quality metric we create or use consists of the following components:

An original scale, developed in-house at Logrus IT, for measuring semi-objective, holistic quality

This scale was created to measure both Holistic Adequacy and Readability. It introduces some much-needed order and predictability to holistic quality measurement, which can otherwise become extremely subjective.

A catalogue of atomistic issues (potential problems at the text unit level)

This catalogue includes a variety of language and technical issue types, such as missing or broken tags or placeholders, grammar and syntax errors, inconsistency with country standards and other locale-related issues, etc.

At Logrus IT, we’ve refrained from reinventing the wheel: Rather, we’ve taken the best publicly available issue framework, the Multidimensional Quality Metrics (MQM), and introduced custom modifications.

We can also use other issue catalogues, including proprietary error classifications utilized by our clients, as the methodology is completely neutral in this respect.

A system for assigning weight to each atomistic issue category or subcategory

Not all issues are created equal: An error in country standards or a missing placeholder in software, for example, is generally more important than a missed comma. To reflect this component of error severity in a metric, we need to assign a relative weight to each issue category or subcategory from a given catalogue. Higher weights correspond to more important issues with a stronger effect on quality and the overall sentiment. For issues that are irrelevant or were excluded for simplification reasons, we can simply apply the weight of “zero”.

For each issue category, the number of issues found is multiplied by the category’s preset weight. The sum of these weighted totals across categories is then divided by the total word count, which generates the Atomistic Quality value.
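As an illustration, here is a minimal sketch of that calculation. The category names and weights are hypothetical, and how the resulting weighted-issue density is then mapped onto the 0-10 Atomistic Quality scale is project-specific, so we leave that step out:

```python
# Hypothetical category weights (illustrative values only).
ISSUE_WEIGHTS = {
    "missing_placeholder": 10.0,
    "country_standards": 8.0,
    "grammar": 3.0,
    "punctuation": 1.0,   # e.g. a missed comma
}

def weighted_issue_density(issue_counts: dict, word_count: int) -> float:
    """Per category: issue count times preset weight; sum across
    categories; divide by the total word count."""
    weighted_sum = sum(
        ISSUE_WEIGHTS.get(category, 0.0) * count   # weight 0 = ignored
        for category, count in issue_counts.items()
    )
    return weighted_sum / word_count

# 2 grammar errors and 5 punctuation issues in a 1,000-word text:
density = weighted_issue_density({"grammar": 2, "punctuation": 5}, 1000)
# (2 * 3.0 + 5 * 1.0) / 1000 = 0.011
```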

Three acceptance thresholds:

a. One for Holistic Quality (Adequacy)

b. One for Holistic Quality (Readability)

c. One for Atomistic Quality

Each of the three thresholds depends on content type, client expectations, time and budget. Each quality assessment is measured against its respective threshold (tolerance level), and material is considered acceptable only when all three assessments satisfy their tolerance requirements.

Taken together, the collective weights assigned to atomistic issue categories and the three acceptance thresholds (preset expectations) form the Quality Vector—the only thing that differentiates one quality metric from another.
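A minimal sketch of how a Quality Vector and this acceptance rule might look in code; the class and function names are ours, and we assume all three assessments are expressed on the 0-10 scale where higher is better:

```python
from dataclasses import dataclass

@dataclass
class QualityVector:
    """Per-category issue weights plus the three acceptance thresholds:
    per the text, the only thing that differentiates one metric from
    another."""
    issue_weights: dict
    adequacy_threshold: float
    readability_threshold: float
    atomistic_threshold: float

def is_acceptable(v: QualityVector, adequacy: float,
                  readability: float, atomistic: float) -> bool:
    # Material is acceptable only when ALL three assessments
    # satisfy their respective tolerance levels.
    return (adequacy >= v.adequacy_threshold
            and readability >= v.readability_threshold
            and atomistic >= v.atomistic_threshold)

# A hypothetical “marketing” metric using the thresholds quoted earlier:
marketing = QualityVector({"grammar": 3.0}, 8, 8, 9)
is_acceptable(marketing, adequacy=9, readability=8, atomistic=9)  # True
```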

The Holistic Quality Metric: Simplified

When budget and timing constraints take priority, or a quick and inexpensive preliminary evaluation of translation quality is needed, the metric can be adjusted to account for holistic criteria alone.

This simplified approach does not provide an in-depth picture, which, for many applications, is unacceptable. At the same time, it still delivers a reliable assessment of content quality at a fraction of the cost and time required for the more elaborate, three-dimensional metric.

In cases when content is important to the general public (as with public portals or government websites) or generates significant traction within a certain group, this simplified holistic assessment can actually be crowdsourced and still yield reliable results.

By adjusting tolerance levels for the three factors described above, we are able to cover the full range of applications—from a quick, inexpensive, crowdsourced assessment to an in-depth, sophisticated product review.

Measuring Holistic Quality

There is no absolute way of calculating the Holistic Adequacy or Readability rating: It simply reflects an expert evaluation of a text’s overall impression in these areas. We have to describe the disparate categories of imperfection (represented by numbers on the scale) individually to ensure they can be effectively distinguished from one another. This is not a simple task.

Without an elaborate and detailed metric for Holistic Adequacy and Readability evaluations, a sizable degree of arbitrariness is introduced into the entire process, compromising the perceived and/or actual validity of the results. This happens because the large majority of texts are neither perfect nor terrible (evaluation is simple in both marginal cases), but fall somewhere in the middle.

At Logrus IT, we approach this crucial issue with the utmost attention and objectivity. As described in the Quality Triangle methodology article, holistic evaluations cannot be properly measured on a scale with too few gradations (0 to 4, for instance): It is essential to account for the inevitable deviations from the median evaluation value (opinions do vary from reviewer to reviewer). Without a sufficient number of gradations, slight changes, such as using different reviewers, can lead to a dramatic difference in results and significantly undermine the metric’s reliability. Conversely, an excessive number of gradations would be both laborious and inefficient—imagine adhering to a hundred quality gradations!

As a result, we’ve chosen what we believe to be the optimal scale (0-10), along with definitions for both Holistic Adequacy and Readability for most of its constituent values. In other words, we’ve created a mechanism that permits us to clearly identify and differentiate between the Holistic Readability values of 6 and 7, for example. The basis for our metric and holistic quality definitions dates back to the 1960s, when the Automatic Language Processing Advisory Committee (ALPAC) first called attention to the crucial importance of translation Adequacy and Readability in quality assessment, and introduced the original scale and definitions for measuring both factors. We had to significantly rework the original definitions and adapt them for the task.

This scale and set of definitions are a part of our competitive advantage, so we’re not quite ready to make them available to the public in their entirety (yet), but the complete holistic metric is available to all Logrus IT clients ordering a third-party LQA. We will also happily advise your company on the topic or create customized metrics to best suit your needs.

Measuring Atomistic Quality

At Logrus IT, we apply a traditional, calculation-based approach to atomistic quality. Model flexibility and extensive coverage are achieved in multiple ways:

1. Adjusting the issue catalogue that is the basis for atomistic quality evaluation.

The full MQM framework includes 150+ issue types, which is a lot to learn and makes the framework quite challenging to apply in full. Depending on the context and expectations, we can often substantially reduce the number of issues taken into account without making serious sacrifices. A weight of “zero” is given to ignored issues, making the metric easily customizable (both adjustments are sketched in code after the examples below).

For example:

a. Issues related to software or firmware problems are irrelevant for printed text, and can be ignored in a metric dealing with marketing materials.

b. In many cases, we can narrow down the full set of issue categories to a smaller set of exclusively high-level categories: for example, we can merge the subcategories of incorrect word form and incorrect word order into their parent category of grammatical error. While this takes some finesse out of the process and inhibits our ability to assign different weights to particular lower-level issues, it is an approach that works well in many circumstances and significantly simplifies the task.

c. Depending on a client’s preferences, we can substitute an MQM-based issue catalogue developed at Logrus IT with a proprietary issue catalogue utilized by the client.
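A brief sketch of adjustments (a) and (b) from the list above; the subcategory names and weights are hypothetical, and taking the maximum subweight when merging is just one possible convention:

```python
# Hypothetical MQM-style subcategory weights.
weights = {
    "grammar/word_form": 3.0,
    "grammar/word_order": 3.0,
    "software/broken_placeholder": 10.0,
}

# (a) Printed marketing materials: software issues are irrelevant,
#     so the category is switched off with a weight of zero.
weights["software/broken_placeholder"] = 0.0

# (b) Merge subcategories into their high-level parent categories.
merged = {}
for category, weight in weights.items():
    parent = category.split("/")[0]   # "grammar/word_form" -> "grammar"
    merged[parent] = max(merged.get(parent, 0.0), weight)
# merged == {"grammar": 3.0, "software": 0.0}
```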

2. Adjusting individual issue (sub)category weights.

Depending on context, issue types become more or less important, and their respective weights need to reflect this variation. Errors in national standards or units are always considered serious, but their relative weight can be further increased for reference books or repair manuals, where the consequences of an error caused by incorrect measurement units can be critical.
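Expressed as a sketch, with hypothetical base values:

```python
# Base weights; for a repair manual, the units category is promoted
# because an error in measurement units can have critical consequences.
base_weights = {"units_of_measurement": 8.0, "punctuation": 1.0}
repair_manual_weights = {**base_weights, "units_of_measurement": 20.0}
```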


3. Integrating reviewer-assigned severity into the metric.

All metrics created at Logrus IT get an additional degree of freedom in the form of error severity.

Let’s consider typos: Typically, typos have a relatively low weight compared to many other issues in the issue catalogue and, under most circumstances, cannot seriously distort meaning or prevent the reader from understanding the content correctly. However, assigning a single relative weight to the “typo” category fails to adequately describe the whole spectrum of potential errors. A typo in a home page headline, for instance, or one that significantly affects meaning or even results in pejorative language, should be treated quite differently from a comparatively inconsequential one.

Here, reviewer-assigned Severity comes into play. For most “regular” typos, this factor stays at its median or lowest level, but especially conspicuous errors increase the cumulative relative weight of an otherwise benign issue category. For such “show-stopping” errors as those mentioned above, this weight is increased by one or two orders of magnitude. Doing so guarantees that issues whose contextual negative effect seriously exceeds the “standard” weight value of that issue type are escalated and result in LQA failures.
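A minimal sketch of this mechanism, with illustrative multiplier values chosen to match the “one or two orders of magnitude” described above:

```python
# Reviewer-assigned severity multipliers (hypothetical values).
SEVERITY = {"minor": 1.0, "major": 10.0, "show_stopper": 100.0}

TYPO_WEIGHT = 1.0  # hypothetical base weight of the "typo" category

def effective_weight(base_weight: float, severity: str) -> float:
    """Effective weight of a single issue = category weight * severity."""
    return base_weight * SEVERITY[severity]

effective_weight(TYPO_WEIGHT, "minor")         # 1.0   -- a "regular" typo
effective_weight(TYPO_WEIGHT, "show_stopper")  # 100.0 -- home page headline
```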

Logrus IT has ready-made atomistic quality metrics for a variety of cases, including firmware, software, web materials, marketing and sales texts, and many more. We will also gladly develop a custom metric for you based on any issue catalogue of your choice.