Machine translation usually needs huge datasets to work its magic, preferably parallel texts that say the same thing in several languages, Rosetta Stone-style. New work from German academics breaks fresh ground.
Today, Ehsaneddin Asgari and Hinrich Schütze at Ludwig-Maximilian University of Munich in Germany say they have found a way to work with far less data. Their new approach reveals important elements of almost any language, which can then be used as a stepping stone for machine translation.
The new technique is based on a single text that has been translated into at least 2,000 different languages. That text is the Bible, and linguists have long recognized its importance in their discipline.
Consequently, linguists have created a database called the Parallel Bible Corpus, which consists of translations of the New Testament in 1,169 languages. This dataset is not big enough for the kind of industrial machine learning that Google and others perform. So Asgari and Schütze have come up with another approach based on the way tenses appear in different languages.
Computational linguistics has had a profound impact on our understanding of language, the way it varies around the world and how machines can understand it. This emerging discipline has made it possible to automatically translate many languages directly into others in written and spoken form. Indeed, the promise is that instantaneous machine translation will soon match and then outperform the ability of human interpreters.
Star Trek fans will recall the Universal Translator, which can translate between ordinary English and an obscure, never-before-seen tongue: lizardese! In the 1967 episode Arena, Captain Kirk is whisked off to a distant planet to battle a creature never before encountered by humans, a Gorn. (Skip to 45 seconds into the trailer if you are not a Trek fan.)
(Star Trek Arena trailer)
Note that the translator works immediately. Compare that to the more realistic (hah!) translator discs from Larry Niven's 1970 masterpiece Ringworld:
The tattooed one made a short speech. That was luck. The autopilot would need data before it could begin a translation...
Presently the discs were filling in words and phrases...