How Google Converted Language Translation Into a Problem of Vector Space Mathematics

To translate one language into another, find the linear transformation that maps one to the other. Simple, say a team of Google engineers.

To translate a document, existing translation services rely on dictionaries and on versions of the same document in different languages. The problem, of course, is that these dictionaries have to be compiled by human experts, and this takes significant time and effort.

Now Tomas Mikolov and a couple of colleagues at Google in Mountain View have developed a new technique that uses data mining to model the structure of a single language and represent words as vectors. It turns out that different languages share many similarities in this vector space. That means the process of converting one language into another is equivalent to finding the transformation that converts one vector space into the other. This turns the problem of translation from one of linguistics into one of mathematics. So the problem for the Google team is to find a way of accurately mapping one vector space onto the other. For this they use a small bilingual dictionary compiled by human experts: comparing the same corpus of words in two different languages gives them a ready-made linear transformation that does the trick.
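To make the idea concrete, here is a minimal sketch in Python of fitting such a linear map from a small seed dictionary and then using it to translate a held-out word. Everything in it is an assumption for illustration: the word vectors are synthetic (built so that an exact linear relation exists), the function names `fit_translation_matrix` and `translate` are hypothetical, and the closed-form least-squares solve via NumPy stands in for whatever optimisation the Google team actually used, which the article does not describe.

```python
import numpy as np

def fit_translation_matrix(src_vecs, tgt_vecs):
    """Least-squares W such that src_vecs @ W is as close as possible to tgt_vecs."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

def translate(vec, W, tgt_words, tgt_matrix):
    """Map a source vector into the target space and return the
    target word whose vector has the highest cosine similarity."""
    mapped = vec @ W
    sims = (tgt_matrix @ mapped) / (
        np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(mapped)
    )
    return tgt_words[int(np.argmax(sims))]

# Synthetic toy embeddings (assumption: real ones would be learned
# separately from large monolingual text in each language).
rng = np.random.default_rng(0)
en_words = ["one", "two", "three", "four", "five", "six"]
es_words = ["uno", "dos", "tres", "cuatro", "cinco", "seis"]
en_vecs = rng.normal(size=(6, 4))
true_map = rng.normal(size=(4, 4))
es_vecs = en_vecs @ true_map  # exact linear relation, by construction

# The first five pairs play the role of the small bilingual dictionary.
W = fit_translation_matrix(en_vecs[:5], es_vecs[:5])

# "six" was not in the seed dictionary; translate it via the learned map.
prediction = translate(en_vecs[5], W, es_words, es_vecs)
print(prediction)  # seis
```

With real embeddings the relation between the two spaces is only approximately linear, so the mapped vector lands near, not on, the correct translation, and the nearest-neighbour search does the rest.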

Mikolov and co say it works remarkably well. “Despite its simplicity, our method is surprisingly effective: we can achieve almost 90% precision for translation of words between English and Spanish,” they say. The method can also be used to extend and refine existing dictionaries, and even to spot mistakes in them. Indeed, the Google team do exactly that with an English-Czech dictionary, finding numerous mistakes.
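The article does not say how the precision figure was computed or how dictionary errors were detected. One plausible reading, sketched below in Python, is to map each source word into the target space, count how often the nearest target word is the listed translation, and flag entries whose listed translation lies far from the mapped vector. The toy vectors, the permutation matrix standing in for a learned map, the 0.5 cosine threshold, and the function names are all assumptions for illustration.

```python
import numpy as np

def unit_rows(m):
    """Scale each row to unit length so dot products are cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def precision_at_1(src_vecs, W, tgt_matrix, gold_idx):
    """Fraction of source words whose nearest target word is the gold one."""
    sims = unit_rows(src_vecs @ W) @ unit_rows(tgt_matrix).T
    return float(np.mean(np.argmax(sims, axis=1) == np.asarray(gold_idx)))

def flag_suspect_entries(src_vecs, W, tgt_matrix, listed_idx, threshold=0.5):
    """Indices of dictionary entries whose listed translation has low
    cosine similarity to the mapped source vector (threshold is hypothetical)."""
    sims = unit_rows(src_vecs @ W) @ unit_rows(tgt_matrix).T
    return [i for i, j in enumerate(listed_idx) if sims[i, j] < threshold]

# Toy English and Czech vocabularies with made-up 4-dimensional vectors.
en_words = ["dog", "water", "hand", "fish"]
cs_words = ["pes", "voda", "ruka", "ryba"]
en_vecs = np.eye(4)
W = np.roll(np.eye(4), 1, axis=1)  # stand-in for a learned linear map
cs_vecs = en_vecs @ W              # targets sit exactly where W sends sources

# Correct dictionary: the i-th English word pairs with the i-th Czech word.
correct = [0, 1, 2, 3]
# Faulty dictionary: "hand" and "fish" have their translations swapped.
faulty = [0, 1, 3, 2]

print(precision_at_1(en_vecs, W, cs_vecs, correct))  # 1.0
suspects = flag_suspect_entries(en_vecs, W, cs_vecs, faulty)
print([en_words[i] for i in suspects])  # ['hand', 'fish']
```

On real data the similarities would not be a clean 0 or 1, so any threshold would need tuning, and flagged entries would be candidates for human review, not confirmed errors.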
MIT Technology Review. See more at: