Machine translation

Many translators regard machine translation with a mixture of disparagement and resentment: in the past its quality was poor, and translators now fear that it may put them out of work. However, it has become an inescapable feature of the translation environment, its quality is improving at a remarkable rate, and some translators find it a useful tool. Claude Piron argued that machine translation, at its best, automates the easier part of a translator’s job; the harder and more time-consuming part usually involves extensive research (see the section on machine translation in the Wikipedia article on “Translation”; see also a recent comment on adaptive machine translation here).


The history of machine translation goes back to March 1947, when Warren Weaver wrote to Norbert Wiener about the possibility of using computers to translate natural human languages. High initial expectations were not realised as soon as many people expected, and in the United States R&D in this area was cut back for some years after the ALPAC report of 1966. For many years machine translation was useful only for very restricted applications with limited demands on vocabulary and grammar, such as the translation of weather reports, although active research continued in Europe, driven by the demands of the European Commission (see also Hutchins (2000), ‘Early Years in Machine Translation: Memoirs and Biographies of Pioneers’ (John Benjamins), and Hutchins (2005), ‘The history of machine translation in a nutshell’).

One of the earliest commercial systems for machine translation used a rule-based technology developed by SYSTRAN. That technology was later supplemented or replaced by statistical machine translation, first suggested by Warren Weaver in 1949 and then reintroduced by researchers at IBM in 1991. It is based on the analysis of large corpora of bilingual texts. Google Translate converted from a SYSTRAN-based system to phrase-based statistical translation technology in 2007, and its API, along with other machine translation APIs, was incorporated into most computer-aided translation tools.
Since then, companies such as Google and Microsoft have developed neural machine translation (see below) to the point where it is capable of producing results that are significantly better than those of the statistical, corpus-based paradigm.
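The core idea of statistical machine translation can be illustrated with a toy sketch of the noisy-channel model: score each candidate translation by combining a translation model, estimated from bilingual corpora, with a language model, estimated from monolingual text. All the phrases and probabilities below are invented purely for illustration; real systems learn these values from very large corpora.

```python
import math

# Toy noisy-channel sketch (all numbers invented): pick the candidate
# translation e of a foreign phrase f that maximises P(e) * P(f | e).

translation_model = {                  # P(f | e), from a bilingual corpus
    ("la maison", "the house"): 0.7,
    ("la maison", "house the"): 0.7,   # the TM alone cannot rank word order
}
language_model = {                     # P(e), from monolingual text
    "the house": 0.01,
    "house the": 0.0001,               # the LM penalises bad word order
}

def best_translation(f, candidates):
    # argmax over e of log P(e) + log P(f | e), in log space for stability
    return max(candidates,
               key=lambda e: math.log(language_model[e])
                           + math.log(translation_model[(f, e)]))

print(best_translation("la maison", ["the house", "house the"]))
# prints: the house
```

The point of the toy numbers is that the two components play different roles: the translation model ties the output to the source text, while the language model keeps the output fluent.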

There is an impressive video on YouTube demonstrating the current potential of machine interpreting, which involves speech recognition (speech-to-text conversion), machine translation, and text-to-speech conversion. Microsoft has developed the Skype Translator app, which currently supports voice calls in 8 languages. It was demonstrated in May 2014 by a conversation between Gurdeep Pall (speaking English) and Diana Heinrichs (speaking German).
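The machine-interpreting pipeline just described can be sketched as a simple composition of three stages. The function names and the one-entry lexicon below are hypothetical stand-ins; a real system would plug in an actual speech recogniser, MT engine and speech synthesiser at each stage.

```python
# Minimal sketch of a machine-interpreting pipeline:
# speech recognition -> machine translation -> text-to-speech.
# All three stages are placeholder stubs with invented behaviour.

def recognise_speech(audio):
    # Stub recogniser: pretend the transcript is recovered perfectly.
    return audio["transcript"]

def translate(text, source, target):
    # Stub MT engine backed by a toy one-entry lexicon.
    lexicon = {("en", "de", "good morning"): "guten Morgen"}
    return lexicon.get((source, target, text), text)

def synthesise_speech(text):
    # Stub synthesiser: return a placeholder instead of real audio.
    return {"waveform": f"<audio for: {text}>"}

def interpret(audio, source, target):
    # Compose the three stages in order.
    text = recognise_speech(audio)
    translated = translate(text, source, target)
    return synthesise_speech(translated)

out = interpret({"transcript": "good morning"}, "en", "de")
print(out["waveform"])  # prints: <audio for: guten Morgen>
```

Each stage introduces its own errors in practice, which is why end-to-end quality lags behind that of text-only machine translation.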

Neural machine translation

A relatively recent development is the use of neural networks in machine translation; see for example ‘A Deep Dive into SYSTRAN’s Neural Machine Translation (NMT) Technology’. Google Translate has now switched over to what Google calls Google Neural Machine Translation (GNMT), which can translate between language pairs that the system has never seen before (‘zero-shot’ translation). For example, if the system has been trained to translate between Japanese and English and between Korean and English, it can produce a ‘reasonable’ translation between Japanese and Korean (see also my blog posts on neural machine translation and adaptive machine translation).


However, there are concerns that machine translation could lead to breaches of confidentiality. For example:

“Confidentiality is a fundamental value in translation, which is compromised by those MT tools that store information online. The information fed into an online MT tool remains stored in the engine and can be accessed and used by others. Thus, access to proprietary information could be in the hands of third parties alien to the duty of confidentiality that binds a translator with a client.” [IAPTI]

Google’s statements on confidentiality send mixed messages. With regard to their API, which is available in most translation software, they say in their ‘Translate API FAQ’ under ‘Data Confidentiality’ (last accessed 14 February 2017):

“Will Google share the text I translate with others? We will not make the content of the text that you translate available to the public, or share it with anyone else, except as necessary to provide the Translate API service. For example, sometimes we may need to use a third-party vendor to help us provide some aspect of our services, such as storage or transmission of data. We won’t share the text that you translate with any other parties, or make it public, for any other purpose.”

A slightly different impression is given by the following statement in Google’s ‘Terms of Service‘ under ‘Your Content in our Services’ (as of 14 February 2017):

“When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide licence to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes that we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights that you grant in this licence are for the limited purpose of operating, promoting and improving our Services, and to develop new ones.”

Concerns that using online machine translation could breach confidentiality are therefore understandable. However, most documents sent between clients, agencies and translators are transmitted by email. It is well known that email cannot be considered a confidential mode of communication, yet it has attracted nowhere near the same level of concern as machine translation.