2 The Trouble with Translation
Before addressing machine translation, it is important to investigate the notion of translation in itself. How do we proceed when we translate? What makes a translation a good translation? In the course of this chapter, we will see that these questions are hard to answer and have already given rise to an abundant literature. In the second part of this chapter, we will investigate why understanding a sentence— something that is easy and natural for humans—is one of the most difficult things to do with computers, despite their incredible calculation power.
What Does It Mean to Translate?
The answer to this question may seem obvious: to translate is to transpose a source-language text into a target-language text. However, one can easily see that this deceptively simple answer refers in fact to a dramatically complex problem. What does it mean to “transpose a text”? How do we go from a source language to a target language? How does one find equivalent expressions between two languages? Should the translation be based on words, chunks of words, or even sentences? And, more fundamentally, how can one determine what the meaning of a text or an expression is? Does everybody have the same understanding of a text? If not, how can this issue be handled in the translation process?
As should be clear from the previous paragraph, translation raises a large number of questions from linguistics, but also from psychology and even philosophy when the nature of meaning is at stake. Instead of addressing these highly complex questions (which have no clear answer!), it is probably more useful to take a step back and try to determine what the characteristics of a “good” translation are.
What Is a Good Translation?
A first crucial issue when addressing translation is that no one knows how to formally define what constitutes a “good” translation. We should thus not expect to make much headway from this perspective, but at least some criteria can be found in the literature.
Criteria for a “Good” Translation
The translation of a text should be faithful to the original text: it should respect the main characteristics of the original text, the tone and style, the details of the ideas as well as its overall structure. The result should be easy to read in the target language, and it should also be linguistically correct, which means that a subtle process of reformulation must be used. Ideally, the reader should not realize he is reading a translation if he does not know the origin of the text, which implies that all formulaic and idiomatic expressions should be rendered appropriately.
As a result, the translator must perfectly understand the text he has to translate, but he must also have an even better knowledge of the target language. This is the reason why professional translators usually translate only into their mother tongue, so that they have a perfect command of the expressions to be used to render the source text accurately.
The inherent subjectivity of these criteria is undeniable, however. What some readers consider a “good” translation may be a bad one according to others. This situation frequently crops up when professional translators work with authors they are not familiar with, or when the translator does not know in what context his translation will be used.
What is expected of a translation can vary radically depending on the clients, the era, the nature of the text, its intended use, or even its context. Technical texts are not translated in the same way as literary texts. A specific adaptation of the original text is necessary when it concerns a world that is remote from the world of the reader in the target language (for example, if a Japanese text from the twelfth century is translated into modern English). The translator has to choose between staying close to the original text and making use of paraphrase to ensure comprehension (especially with historical contexts, unfamiliar events, etc.). The tone and the style of a text are also highly subjective notions that are largely related to the language under consideration.
As one can easily see from this quick overview, all these subjective features make the evaluation of the task a difficult problem. Some pitfalls are, however, well known and frequently addressed in the literature on the topic. Word-for-word translation is not a good practice, since the result is often hard to understand and not idiomatic in the target language. Deceptive cognates and syntactic calques should, of course, be avoided since they lead to nonsense (the French word “achèvement” should be translated as “completion” in English, and not as “achievement,” for example). It is also well known that a translator should first read the whole text to be translated, or at least a large part of it, so as to avoid local mistranslations. A good knowledge of the clients, the context, and the future use of the translated text can also help to adjust the translation to its target audience.
Consequences for Machine Translation
From what we have seen so far, it is clear that translation is a complex process involving high-level cognitive and linguistic capabilities. A translator must be at ease with the two languages involved, and he must have the special skill of reformulating text from a source language into a target language that does not share the same wording or the same structure.
These kinds of skills are not directly available to machines. Artificial systems are still in their infancy from this point of view and are very far from the capacities of a human being when it comes to reasoning, inferring, and reformulating. To be able to reformulate a sentence, one must of course have a good command of the language itself, but one must also master the search for analogies between concepts, which is much more complicated than just finding equivalences between words and expressions.
Developers of artificial systems are aware of these limitations. Very few researchers have tried to develop machine translation systems for literary texts: nearly everybody agrees that machine translation is a difficult task that is far from being solved, and that only mundane texts (e.g., news, technical texts) should be addressed. The idea is not to replace human translators, who are the only ones able to translate novels or poetry. Even technical texts pose specific difficulties, since they employ a specialized vocabulary that must first be introduced into the system in order to obtain relevant translations. The goal of machine translation is now considered mainly to be that of providing the user with some help and, in some professional contexts, enabling him to decide whether a human translator needs to be called on. The overall quality achievable by machine translation has also been a matter of much debate. The ultimate goal is to obtain a quality of translation equivalent to that of a human being, but people agree that this is highly challenging and also hard to formalize, since the quality of a translation depends on the nature and complexity of the text to be translated.
For a long time, machine translation used local techniques that could be compared, to a certain extent, to a word-for-word translation process, even if most systems now also take more complex expressions into consideration. Information at the text level is rarely taken into account, even though it is well known that the text can provide important information for the translation process. The tonality or the style of a text, for example, is always ignored: this kind of information is in fact too hard to formalize for automatic systems.
In a way, even the sentence level is too complex for most current systems. It is generally assumed that these systems perform a sentence-by-sentence translation, which is true to a certain extent, but the translation process generally involves, in fact, fragments of sentences.[1] The translation of a full sentence then consists in assembling the translations of these local fragments. It is therefore not surprising that machine translation sometimes produces strange results and quite often utter nonsense. Morphology (the analysis of the structure of words) and syntax (the analysis of the structure of sentences) are rarely taken into account, and this has particularly dramatic consequences for some languages. For example, some languages are highly inflectional, which means that word forms change depending on the grammatical function of the word in the sentence (subject, complement, etc.). In this context, it is clear that an automatic process will not be able to provide the right word form in the target language without a proper syntactic analysis (i.e., an analysis of the relative grammatical function of the different words in the sentence).
Last but not least, one should understand why processing languages with computers is difficult, even when dealing with easier tasks than machine translation. A language has thousands of words, with different surface forms (“to dance,” “danced,” “dancing”), different meanings, and different structures. Compounds (e.g., “round table,” which generally designates an event and not an object), light verbs (e.g., “to take a shower,” where “take” has little semantic content), and idioms or frozen expressions (e.g., “kick the bucket,” the meaning of which has nothing to do with “kick” or “bucket”) make the task even more complex, since it is then necessary to spot complex expressions and not only isolated words. The following section aims at showing some of the issues at stake.
Why Is It Difficult to Analyze Natural Language with Computers?
Apart from the lack of information on the client, the context, or the style of the text under consideration for translation, the main issue is related to the task itself. Processing natural languages (as opposed to processing formal languages, such as the programming languages used by computers) is difficult in itself, mainly because at the heart of natural language lie vagueness and ambiguity.
Natural Languages and Ambiguity
Ever since the creation of computers, linguists as well as computer scientists have been interested in natural language processing, a field also called computational linguistics. Natural language processing is difficult because, by default, computers do not have any knowledge of what a language is. It is thus necessary to specify what counts as a word, a phrase, and a sentence. So far, things may not seem too difficult (although consider expressions like “isn’t it,” “won’t,” “U.S.,” or “$80”: it is not always clear what a word is and how many words are involved in such expressions), and not so different from formal languages, which are also made of words. The main difference lies in the fact that every word and every expression of a given natural language can be ambiguous.
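To make the question of what counts as a word more concrete, here is a minimal sketch (an illustration added here, not part of the original discussion) comparing naive whitespace splitting with the tokenizer shipped with the NLTK library; the exact splits depend on the tokenizer and on the NLTK version used.

```python
# Illustrative sketch: "what is a word?" is already a non-trivial question.
# Assumes the NLTK library is installed (pip install nltk) and that the
# "punkt" tokenizer models have been downloaded.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # recent NLTK versions may require "punkt_tab"

sentence = "Isn't it odd that he paid $80 in the U.S.?"

# Naive view: a word is whatever is separated by spaces.
print(sentence.split())

# A linguistically informed tokenizer makes different choices, typically
# splitting "Isn't" into "Is" + "n't" and "$80" into "$" + "80".
print(word_tokenize(sentence))
```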
Let’s take some famous examples such as “the chicken is ready to eat” or “there was not a single man at the party.” These are textbook examples and may seem a bit far-fetched. However, they illustrate some well-known problems in language processing: in the first example, should one give the chicken something to eat, or is it the chicken that is ready to be eaten? In the second example, does the speaker mean that there were no men at the party, or does he mean that all the men there were married? These sophisticated examples should not mask the fact that ambiguity is in fact pervasive and is also part of the most mundane words and expressions. Just in these two examples, we can note that “chicken” can refer to an animal or a kind of meat, but also to a coward. A party can designate (according to Wordnet[2]) “an organization to gain political power,” “a group of people gathered together for pleasure,” “a band of people associated temporarily in some activity,” “an occasion on which people can assemble for social interaction and entertainment,” or even “a person involved in legal proceedings.” “Party” can also be a verb meaning “have or participate in a party,” etc.
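The senses listed above can also be queried programmatically: the following short sketch (added here as an illustration) uses the WordNet interface provided by NLTK to print every sense it records for the word “party.”

```python
# Illustrative sketch: listing the WordNet senses of "party".
# Assumes the NLTK library is installed and the WordNet data has been downloaded.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

for synset in wn.synsets("party"):
    # Prints something like: party.n.01 (n): an organization to gain political power
    print(f"{synset.name()} ({synset.pos()}): {synset.definition()}")
```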
One answer to this problem is simply to record all these different meanings in a dictionary, and in a way this already exists: Wordnet, mentioned above, is a lexical database that can be used by humans as well as by computers. However, one quickly realizes that this alone is not a working solution: once all these meanings have been stored in the dictionary, the problem is then to find a way to choose the right meaning for each occurrence (that is to say, for each word used in context). A normal dictionary usually contains around 50,000 to 100,000 entries (i.e., different words) that can in turn generate many more surface forms, or words as they are found in texts. For example, “texts” is not a dictionary entry, since it is just a surface form of the word “text” (“texts” is the plural of “text,” and this is supposed to be known by the end user). This point of departure is assumed by nearly all dictionaries made for human perusal. In a dictionary, only the singular is stored for nouns and adjectives, and the infinitive for verbs; the dictionary form of a word is usually called a lemma. In English, the number of surface forms is limited, but the problem is worse for a language like French. For other languages like Finnish, the theoretical number of surface forms is huge and could even be considered infinite, since the language has at least 12 cases and lots of suffixes and particles that can be combined in various ways. Trying to store all these forms in a dictionary is probably not a good idea!
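As a rough illustration of the relation between surface forms and lemmas (again a sketch added here, not part of the original text), NLTK's WordNet-based lemmatizer maps inflected forms back to their dictionary entry, provided it is told the part of speech.

```python
# Illustrative sketch: mapping surface forms back to lemmas (dictionary entries).
# Assumes NLTK and its WordNet data are installed.
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()

# Nouns are stored in the singular...
print(lemmatizer.lemmatize("texts", pos="n"))    # -> text
# ...and verbs under their base form.
print(lemmatizer.lemmatize("danced", pos="v"))   # -> dance
print(lemmatizer.lemmatize("dancing", pos="v"))  # -> dance
```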
What makes things even more difficult is that to decide on the meaning of a word or an expression (does “party” here mean “an organization to gain political power” or “a group of people gathered together for pleasure”?), one has to take context into account. But the context itself is generally ambiguous, leading to a potentially unresolvable problem. Moreover, it has been demonstrated that word senses are not mutually exclusive and that “word usages often fall between dictionary definitions” (Kilgarriff, 2006). This is one of the main consequences of the pervasive vagueness of languages.
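One very simple way to see both the role of context and the limits of dictionary-based disambiguation is the Lesk algorithm, available in NLTK, which picks the sense whose dictionary definition overlaps most with the surrounding words. The sketch below (an added illustration, not an example from the original text) shows that the chosen sense changes with the context and is by no means always the one a human would pick.

```python
# Illustrative sketch: naive word sense disambiguation with the Lesk algorithm.
# Assumes NLTK, its "punkt" tokenizer models, and the WordNet data are installed.
import nltk
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)

for sentence in [
    "The party nominated a new candidate for the election.",
    "We threw a party for her birthday with music and cake.",
]:
    # Lesk selects the sense whose definition shares the most words with the context;
    # the result is often imperfect, which illustrates how hard disambiguation is.
    sense = lesk(word_tokenize(sentence), "party")
    print(sentence, "->", sense, "-", sense.definition() if sense else "no sense found")
```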
What may seem paradoxical is that humans, who cannot process numbers as fast or as accurately as computers, are in fact very good at handling these kinds of problems. Most of us do not see any ambiguity in most sentences, even when there are thousands of meanings that could possibly be considered. This aspect of language complexity was simply not grasped by most of the early researchers in the domain or, to be more exact, this complexity was largely underestimated.
The way language is processed (and more specifically the way an utterance is understood) remains largely obscure, even nowadays in the era of neuroimaging. Understanding seems to be natural, direct, and largely unconscious. It is highly doubtful that all possibilities are considered in order to obtain a semantic representation of a sentence. Thanks to the communication context, the brain probably directly activates the “right” meaning, without even considering alternate solutions. A parallel has sometimes been proposed with the Necker cube, the representation of a cube seen in perspective with no depth cue (figure 1).
The drawing is “ambiguous” in that no cue makes it possible to determine which side of the cube is in front and which side is at the back. However, it was noticed by Necker (and others before him) that humans naturally select one of the representations so that it makes sense and is coherent with the image of a cube in nature. Both interpretations, that is to say two different cubes, can be seen alternately, but they cannot be considered simultaneously, since this would violate predefined conceptions embedded in the brain. One can also think of Escher’s drawings, which take advantage of quirks of perception and perspective: these images are largely based on representations that violate our preconceived representations of space.
These examples should remind us that the brain is able to interpret (and sometimes correct) perceptions in accordance with predefined schemas. Without going into too many details, this theory is also in line with the notion of Gestalt, which refers to the idea that the brain interprets a whole from its parts and a part from the whole. Applied to language, this means that the meaning of a word is largely determined by the larger context, which itself depends on the meaning of the words it is composed of. There is a dynamic co-construction of interpretation in the brain that is absolutely natural and unconscious.[3]
Consequences for Machine Translation
We have seen in this chapter that the main issue for natural language processing is ambiguity: it is simply difficult to determine the meaning of a word. Meaning depends on context, but the notion of context is itself also vague and ambiguous.
We should add that determining the number of meanings per word (what is generally called word sense) is also an open issue, since from one dictionary to another, the number of word senses differs: some dictionaries are more precise and contain a more fine-grained description of meaning, while others prefer to limit the number of senses per word, depending on their conceptual choices and on their intended readership.
Despite these issues, it has been generally assumed that in order to produce high-quality translations, one must first provide a precise and accurate description of the meaning of sentences. Advances in machine translation were thus linked to the progress made in text understanding, which largely drove the field for several years, as we will see in the following chapters. However, these hypotheses have been called into question: statistical approaches can make use of the large quantities of text available on the web and calculate possible equivalences between languages without using any predefined dictionaries or high-level formalisms. In subsequent chapters (see, especially, chapters 9 to 12), we will discuss how accurate these models are and to what extent they avoid (or integrate) semantic information.
Artificial and Natural Systems
A much-debated question in the field of machine translation is the extent to which artificial systems should reproduce the strategies used by humans for translating. In other words, can we learn something from observing the working practices of professional translators?
This is another hard question, and the first thing to stress is that we do not know much about the cognitive processes involved in translation, even though translation studies have described in detail the different approaches considered in the domain.
Notes
1. One should note, however, that very recent advances in the field, based on deep learning, try to avoid translating isolated groups of words and consider instead the whole sentence directly.
2. https://wordnet.princeton.edu.
3. Advertising often plays with ambiguity and double meaning (for example in a slogan like “Trust Sleepy’s, for the rest of your life,” where “rest” refers both to the act of resting and to what remains of your life). Most people will not immediately see the double meaning, which means that humans are naturally prone to select one interpretation and not even consider alternate solutions.