Deep Learning Will Revolutionize Machine Translation

Deep learning, a technique derived from neural networks (an early AI technology), is poised to revolutionize machine translation. Google recently open-sourced its word2vec package, which analyzes English-language texts to discover the meanings and relationships of words. The results are pretty impressive and point toward a significant advance in machine translation technology in the next few years.

Current machine translation systems rely on statistical techniques. Statistical machine translation works by training a system on a large corpus of aligned texts, in which each text is paired with its translation in the other language. The system does not understand either language in any meaningful sense, but simply learns that “Hello” is highly correlated with “Hola” in Spanish. Given enough text, it can produce decent-quality output that is suitable for comprehension, but not useful for publication without post-editing.
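To make the correlation idea concrete, here is a minimal Python sketch that counts how often words appear together across aligned sentence pairs. The three sentence pairs and the function names are made up for illustration. Real statistical systems use far larger corpora and proper alignment models, so treat this as a toy, not as how any production system works.

```python
from collections import Counter

# Hypothetical toy corpus: (English sentence, Spanish sentence) pairs.
ALIGNED = [
    ("hello friend", "hola amigo"),
    ("hello world", "hola mundo"),
    ("good friend", "buen amigo"),
]


def cooccurrence_counts(pairs):
    """Count how often each source word shares a sentence pair with each target word."""
    counts = Counter()
    for source, target in pairs:
        for s in source.split():
            for t in target.split():
                counts[(s, t)] += 1
    return counts


def best_match(word, counts):
    """Return the target word that co-occurs most often with the given source word."""
    candidates = {t: c for (s, t), c in counts.items() if s == word}
    return max(candidates, key=candidates.get)


counts = cooccurrence_counts(ALIGNED)
print(best_match("hello", counts))
print(best_match("friend", counts))
```

Raw counts like these break down quickly: a word that appears only once co-occurs equally often with every word in its paired sentence. That is one reason real systems need so much text.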

Deep learning has the potential to change all of this by enabling the construction of systems that autonomously learn how the words within each language relate to one another, including synonyms and other linguistic structures. This will enable machine translation systems to understand the material they are translating.

For example, the word2vec package enables you to find the words or phrases closest to an input phrase, along with a “distance” score for each (the score is a cosine similarity, so higher means closer). Type in “France”, and it will display:

                spain              0.678515
              belgium              0.665923
          netherlands              0.652428
                italy              0.633130
          switzerland              0.622323
           luxembourg              0.610033
             portugal              0.577154
               russia              0.571507
              germany              0.563291
            catalonia              0.534176
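A list like this can be produced by representing each word as a vector of numbers and ranking the other words by cosine similarity to the query. The Python sketch below shows the idea. Its three-dimensional vectors are invented for illustration, whereas real word2vec vectors are learned from text and have hundreds of dimensions, and the function names are my own, not part of the word2vec package.

```python
import math

# Toy vectors, made up for this example (not real word2vec output).
TOY_VECTORS = {
    "france": [0.9, 0.8, 0.1],
    "spain": [0.8, 0.9, 0.2],
    "belgium": [0.9, 0.6, 0.3],
    "banana": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(word, vectors, top_n=3):
    """Rank every other word by cosine similarity to the query word."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


for other, score in nearest_words("france", TOY_VECTORS):
    print(f"{other:>22} {score:12.6f}")
```

With these toy vectors, the two country names rank well above the unrelated word, which mirrors the pattern in the output above.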

Google is investing heavily in deep learning technology, as it outperforms traditional machine learning systems by a wide margin, and as machine learning is closely related to search. The better their system understands the queries users are making, the better the search results will be. (I suspect they have already implemented this in some of their search tools, as I’ve noticed a significant improvement in related links in search results in the past year.)

Machine translation vendors would be wise to take a close look at word2vec and related projects, as they have the potential to render current techniques obsolete. (It’s a good bet that Google’s translation team is already working on this problem.) Vendors that ignore these developments stand a good chance of being left in the dust by new systems and companies.
