PRE-EDITING AND POST-EDITING IN MACHINE TRANSLATION

The unstoppable arrival of automatic translation engines marks a paradigm shift in the sector and in translators’ work processes. Just as computer-assisted translation (CAT) tools represented a quantum leap and a change in workflow, translation engines also substantially change the way translators work, only more so. CAT tools put an end to tasks such as overwriting the original text with the translation, digging through archives to reuse previous translations, retranslating or copying and pasting phrases that are repeated within the same document, or using paper glossaries (or worse, blindly relying on human memory) to find the right terminology. Now it’s time to introduce two new concepts: “pre-editing” and “post-editing”.

As automatic translation engines become widespread, these two terms are going to become very common in the sector. They do not always go hand in hand, as the former is not necessary in all cases, while the latter is absolutely essential. It is important for all of us to be aware that in no universe or dimension can the output of an engine be considered final or of equivalent quality to that provided by human translators. Human parity is still a long way off. Having said that, let’s look at what pre-editing and post-editing a text consist of, and what kind of skills they will require from translators.

As its name suggests, “pre-editing” consists of preparing the source text so that the engine “understands” it better, which improves the end result. It’s like making an initial investment in order to reap the rewards later. The important thing is to identify when it is worth investing time and resources in it.

It is clear that the quality of an engine’s output depends directly on the quality of the source text. The problem is that translators often have to deal with originals whose quality leaves something to be desired. Fortunately, we are human and know how to intuit, extrapolate and use the context to our advantage; an engine does not, and we can’t demand that from a machine. So here it is a matter of eliminating the pitfalls that could make it difficult for the AI to understand the text.

In PowerPoint presentations this is usually less necessary because, by their nature, they have concise and direct titles, little text and short sentences in general, as they serve as visual support for the speakers, who carry the weight of the presentation. But when we handle a text document in Word, for example, there is more room for manoeuvre, and with it a greater need for pre-editing. Make sure that the sentences are grammatically well constructed and that there are no typos or omissions. It is also advisable to identify words that, for some reason, we do not want the engine to translate, either because they are proper nouns or foreign words that we want to keep. These can be replaced by several X’s or by some symbol that we can easily find later and replace with the word in question. Pay attention also to acronyms and abbreviations, as we may find ourselves with the unpleasant surprise that the engine, with its overwhelming logic, has translated WC (working capital) as “toilet”.
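
For those who prepare their texts with a script, the placeholder trick can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names `mask_terms` and `unmask_terms`, the `XX0XX` token format and the sample sentences are all invented for this example, and it takes for granted that the engine leaves the tokens untouched, which is worth checking with each engine.

```python
import re

def mask_terms(text, terms):
    """Replace each protected term with a placeholder token.

    Returns the masked text and a token -> term mapping for later restoration.
    The XX<n>XX token format is an arbitrary choice for this sketch.
    """
    mapping = {}
    # Longest terms first, so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        token = f"XX{i}XX"
        pattern = re.compile(rf"\b{re.escape(term)}\b")
        if pattern.search(text):
            text = pattern.sub(token, text)
            mapping[token] = term
    return text, mapping

def unmask_terms(text, mapping):
    """Put the original terms back in place of their placeholder tokens."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text

# Hypothetical source sentence and list of terms we do not want translated.
source = "Acme Corp reported higher WC in Q3."
masked, mapping = mask_terms(source, ["WC", "Acme Corp"])
print(masked)  # XX0XX reported higher XX1XX in Q3.

# Simulated engine output (written by hand here), then restored.
translated = "XX0XX registró un mayor XX1XX en el tercer trimestre."
print(unmask_terms(translated, mapping))
```

Keeping the mapping next to the masked file means the restoration step does not depend on anyone’s memory of which symbol stood for which word.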

Bear in mind that the work we don’t do in pre-editing will have to be done in post-editing. Here the maxim is to revise the text with such thoroughness and rigour that at the end it doesn’t look like it has been through an engine. It’s that simple… and that difficult. It’s quite a challenge. To achieve this level of excellence, it is necessary to pay attention to several aspects, and they are not the same as when we review a translation done by a human being. In other words, machine translation engines make different mistakes from those made by human translators. The most obvious is usually the copying of structures from the original text. The engines “stick” to the text and duplicate grammatical constructions that do not belong naturally to the target language. This mistake is also often made by less experienced translators, but it is a tendency that will undoubtedly be corrected over time.

Another significant area for improvement is context-related errors: the engines do not know that the same word can have several translations depending on the context, and they may not choose the most appropriate one. A classic example is the word “performance”, which in Spanish, depending on the context, can be translated as “desempeño”, “rendimiento” or “resultados”. Attention should also be paid to terminological consistency and adequacy; syntactic, orthotypographical and grammatical correctness; additions; and the need to localise and format the text.

Some engines may also randomly omit symbols (e.g. currency) and certain parts of speech. That is why it is vital to verify the integrity of the result and to see that nothing has been lost along the way. We can’t forget figures either: it’s important to check that they are correctly written, as punctuation adjustments or magnitude conversions may have to be made.
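
Part of this integrity check can be automated as a first pass. The following Python sketch compares the figures and currency symbols found in the source and in the engine’s output; the function names, the short list of currency symbols and the sample sentences are our own assumptions, chosen only for illustration. Digits are compared without their separators, so that “1,250.50” and “1.250,50” count as the same figure. Anything the script flags still needs a human eye, since a legitimate magnitude conversion would also show up as a difference.

```python
import re
from collections import Counter

# A deliberately short list; extend it for the currencies you work with.
CURRENCY = "€$£¥"

def digit_groups(text):
    """Count the numbers in a text, ignoring '.' and ',' separators."""
    numbers = re.findall(r"\d+(?:[.,]\d+)*", text)
    return Counter(re.sub(r"\D", "", n) for n in numbers)

def currency_symbols(text):
    """Count the currency symbols that appear in a text."""
    return Counter(ch for ch in text if ch in CURRENCY)

def integrity_report(source, target):
    """List figures and symbols that differ between source and target."""
    src_n, tgt_n = digit_groups(source), digit_groups(target)
    src_c, tgt_c = currency_symbols(source), currency_symbols(target)
    return {
        "missing_numbers": sorted((src_n - tgt_n).elements()),
        "extra_numbers": sorted((tgt_n - src_n).elements()),
        "missing_symbols": sorted((src_c - tgt_c).elements()),
    }

# Hypothetical sentence pair: the engine has dropped the euro sign.
source = "Revenue rose to €1,250.50 million in 2023, up 4.5%."
target = "Los ingresos aumentaron a 1.250,50 millones en 2023, un 4,5 %."
print(integrity_report(source, target))
# {'missing_numbers': [], 'extra_numbers': [], 'missing_symbols': ['€']}
```

An empty report does not prove that the figures are correctly written for the target locale; it only tells us that nothing numeric has been lost along the way.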

As the engines do not understand metaphors or idioms, you will have to be very careful if the source text contains figures of speech of this kind: the engine will probably translate them literally, producing something that makes no sense in the target language.

Over time, the eye is trained, and experienced post-editors know in advance which texts will get the best performance from machine translation engines. We may even decide not to use any engine at all. Such cases will be very exceptional, but we may judge that post-editing the engine’s output would take longer than doing the translation from scratch. And we may be right. But depending on the type of texts we translate, this will be the exception, not the rule.