Posts tagged ‘Translation’

Interesting functionalities

Admittedly the number of people who will find the following funny, or of any interest at all, is pretty limited, so it is a good bet that about 99.86% of the faithful readers of this blog can safely wait for the next post. To appreciate the present one you must be a French-speaking nerd with an interest in both Pushkin’s poetry and Bayesian language translation. Worse, you must have a sense of the comic that somehow matches mine. (Then let’s make that 99.98%.)

Anyway, if you are still with me: there is a famous Pushkin love poem entitled Я помню чудное мгновенье…: “I remember the magic moment…” (the moment when he first saw her). I have no idea why I fed it into Google Translate; sheer idleness I suppose. In the line

И снились милые черты

the poet states that the delicate features of his beloved came to him in his sleep. Literally: “And [your] gentle [face’s] features came to my dreams”.

Google renders this into French as “Et rêvé de fonctionnalités intéressantes”: And dreamed of interesting functionalities. (The direct translation to English is less fun.)

Kudos to Google for capturing the subtle mood of 21st-century  passion. When the true nerd — I know what I am talking about — feels romantic, falls asleep, and starts dreaming, we now know what he dreams of: interesting functionalities.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

Fun with Bayes

 

Try this:  go to translate.google.com, choose Russian as the source and English as the target languages. In the input field, type“Андрей Иванович мне писал” or, if you do not have a Cyrillic keyboard, the transliteration into the Latin alphabet, “Andrej Ivanovich mne pisal”, which (unless you uncheck the default option) will be automatically transcribed into Cyrillic as you go. The correct English translation appears: “Andrey wrote to me”.

Correct yes, but partial: the input did not read “Andrey” but “Andrey Ivanovich”. Russians have a first name, a last name and also a “patronymic”  based on the father’s name: our Andrey’s father is or was called Ivan. Following the characters in, say, War and Peace, would be next to impossible without patronymics (it’s hard enough with them). On the other hand, English usually omits the patronymic, so if you are translating something simpler than a Tolstoy novel it is reasonable for an automated tool to yield “Andrey” as the translation of “Andrey Ivanovich”. In some cases, depending on the context, it gives “Andrei”, and in some others the anglicized “Andrew”.

Google Translate has yet another translation for “Andrey Ivanovich”. Assume that you want to be specific; maybe you know two people called Andrey and use the patronymic to distinguish between them. You want to say, for example,

       I have in mind not Andrey Nikolaevich, but Andrey Petrovich.

You can enter this as “Ja imeju v vidu ne Andreja Nikolaevicha, no Andreja Petrovicha”, or copy-paste the Cyrillic: Я имею в виду не Андрея Николаевича, но Андрея Петровича. (Note for Russian speakers: the word expected after the comma is of course а, but Google Translate, knowing that а can mean “and” in other contexts, translates it here with the opposite of the intended meaning. This is why I use но, which sounds strange but understandable, and is correctly translated.). Try it now and see what comes out on the English side:

      I do not mean Kolmogorov, but Andrei Petrovich.

Google Translate, in other words, has another translation for “Andrey Nikolaevich”:

       Kolmogorov

The great mathematician A.N. Kolmogorov (1903-1987) indeed had this first name and this patronymic; to conclude that anyone with these names is also called Kolmogorov is not, however, a step that most of us are prepared to take, especially those of us with a (living) friend called Andrey Nikolaevich.

A favorite of Russians is “Dobroje Utro, Dmitri Anatolevich”, meaning “Good morning, Dimitri Anatolevich”, which Google translates into “Good morning, Mr. President”. I will let you figure this one out.

All the translations cited were, by the way, obtained on the date of this post; algorithms can change. For other examples, see this page, in Russian (thanks to Sergey Velder for bringing it to my attention).

What happened? Automatic translation has made great progress in recent years, largely as a result of the switch from structural, precise techniques based on linguistic theory to approximate methods based on statistics. These methods rely on an immense corpus of existing human translations, accessible on the Internet, and apply Bayesian techniques to match every text element to the most frequently encountered translation of a similar phrase in existing translations. This switch has caused a revolution in translation, making it possible to get approximate equivalents. Personally I find them most useful for a language I do not know at all: if I want to read a Web page in Korean, I can get its general idea, which I could not have done fifteen years ago without finding a native speaker. For a language that I know imperfectly, the help is less clear, because the translations are almost never entirely right; in fact they are almost always, beyond the level of simple phrases, grammatically incorrect.

With Bayesian techniques it is understandable why “Andrey Nikolaevich” sometimes comes out in English as “Kolmogorov”: he is probably the most famous of all Andrey Nikolaevichs in Google’s database of Russian-English translation pairs. If you do not know the database, the behavior is mysterious, as you cannot usually guess whether the translation in a particular context will be “Andrey, “Andrei”, “Andrew”, “Andrey N.” or “Kolmogorov”, the five variants that I have seen (try your own experiments!). Some cases are predictable once you know that the techniques are statistical: if you include the word “Teorema” (theorem) anywhere close, you are sure to get “Kolmogorov”. But usually there is no obvious clue.

Statistical techniques are great but such examples, beyond the fun, show their limits. I truly hope that in the future they can be combined with more exact techniques based on sound linguistics.

Postscript: are you bytypal?

Perhaps I should explain why I use Google Translate with Russian as the source language. I do not use it for translation, but I do need it to type texts in Russian. I could use a Cyrillic keyboard, but I don’t because I am a very fast touch typist on the English (QWERTY) keyboard. (Learning to type at a professional level early in life was one of the most useful skills I ever acquired — not as useful as grammar, set theory or axiomatic semantics, but far more useful than separation logic.) So it is convenient for me to type in Latin letters, say “Dostojeksky”, and rely on a tool that immediately transliterates into the Cyrillic equivalent, here Достоевский . Then I can copy-paste the result into, for example, an email to a Russian-speaking recipient.

I used to rely on a tool that does exactly this, Translit (www.translit.ru); I have of late found Google Translate generally more convenient because it does not just transliterate but relies on its database to correct some typos. I do not need the translation (except possibly to check that what I wrote makes sense), but I see it anyway; that is how I ran into the Bayesian fun described above.

As a matter of fact the transliteration tool is good but, as often with software from Google, only  “almost” right. Sometimes it simply refuses to transliterate what I wrote, because it insists on its own misguided idea of what I meant. The Auto Correct option of Microsoft Office has the good sense, when it wrongly corrects your input and  you retype it, to obey you the second time around; but Google Translate’s transliteration facility does not seem to have any such policy: it sticks to its own view, right or wrong. As a consequence it has occasionally taken me a good five minutes of fighting the tool to enter a single word. Such glitches might be removed over time, but at the moment they are sufficiently annoying that I am thinking of teaching myself to touch-type in Cyrillic.

Is this possible? Initially I learned to type on the French (AZERTY) keyboard and I had to unlearn it, since otherwise a Q would occasionally come out as an A, a Z as a W and a semicolon as an M. I know bilingual people, but none who have programmed themselves to touch-type on different keyboards. Anyone out there willing to comment on the experience of bytypalism?

VN:F [1.9.10_1130]
Rating: 8.4/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

PGT-PPP (Pretty Good Translation, Pretty Poor Privacy)

What’s in a URL? To someone who gains access to your computer, your browsing history will provide interesting information. More interesting, in some cases, than you might think.

Try Google Translate, for example. Say you want to translate “To be or not to be, that is the question” into the language of your choice. Go to http://translate.google.com, type your text, select the source language (actually you can skip this step, the tool will detect the language automatically) and the target language (that you will have to do, Google does not read into your mind yet). You get the translation; rather, a Pretty Good Translation, almost never quite right in my experience, but sufficient to give you a Pretty Good Idea. This is the result of modern work in computational linguistics, based on statistics and large-scale data mining rather than a traditional syntax-directed attempt at perfection.

Now look into the URL:

    http://translate.google.com/#auto|sv|To%20be%20or%20not%20to%20be%2C%20that%20is%20the%20question

Your text is encoded in it! This is true even for very long texts. Building up such complex URLs is one of the time-honored techniques for simulating state in the stateless HTTP protocol. But now anyone who sees your browsing history will know the precise texts that interested you enough to make you want to translate them. A Pretty Good Window on your personal interests! And not necessarily something that you want automatically archived.

Google Translate and other translation sites are great tools to facilitate our life, especially when dealing with languages we know superficially or not at all. But maybe there is a way to provide the service without opening such a large window on the detailed questions that occupy our minds?

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)