How does Google Translate know which words form the best translation for a particular sentence? A team of researchers from the University of Amsterdam (UvA) has developed a new method that assists computer translation systems such as Google Translate. The method makes it possible to select the right word forms in certain grammatically complex languages – so-called ‘morphologically rich languages’ – such as German. It does so by analysing the sentence structure and adjacent words in the source language, which leads to a better translation.
The researchers will present their results at the leading international conference ‘Empirical Methods in Natural Language Processing’ (EMNLP 2014) in Qatar.
Researchers are continuously working to make it easier for computer translation systems to find the right translation. In some cases, this is extremely difficult, such as when the target language is more grammatically complex than the source language.
The UvA research team focuses specifically on morphologically rich languages. These languages feature many different word forms per word group. While ‘the man’ is the only word form in its word group in English, for instance, German – the morphologically richer language – has several word forms for the same word group: ‘der Mann’, ‘des Mannes’, ‘dem Mann’ and ‘den Mann’. The correct form depends on the word’s grammatical function in the sentence. When translating a sentence from English to German, a human translator who has mastered both languages can easily make the right choice. For computer translation systems, however, these kinds of choices are far more difficult.
‘The new method developed at the University of Amsterdam uses artificial neural networks, which are models in which a computer imitates the human brain. Whereas previous translation systems generally select the most frequently occurring word forms, the new method chooses the correct word form by analysing the sentence structure in the source language. The neural network is capable of deducing grammatical functions of words on its own, without having an explicit knowledge of grammar,’ says Ke Tran, a member of the research team.
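The contrast between the two strategies can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the researchers' method: simple counts stand in for the bilingual neural network, the only context used is the English word directly before 'the man', and the training pairs and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (English word directly before "the man", German form).
# "<s>" marks the start of the sentence, where "the man" is typically the subject.
TRAINING = [
    ("<s>", "der Mann"),     # "The man sleeps."
    ("<s>", "der Mann"),     # "The man reads."
    ("<s>", "der Mann"),     # "The man laughs."
    ("see", "den Mann"),     # "I see the man."
    ("with", "dem Mann"),    # "I walk with the man."
    ("of", "des Mannes"),    # "The hat of the man."
]


def train(examples):
    """Count word forms overall and per source-side context word."""
    overall = Counter()
    by_context = defaultdict(Counter)
    for prev_word, form in examples:
        overall[form] += 1
        by_context[prev_word][form] += 1
    return overall, by_context


def most_frequent_form(overall):
    """Baseline: always pick the form seen most often, ignoring context."""
    return overall.most_common(1)[0][0]


def context_aware_form(overall, by_context, prev_word):
    """Pick the form seen most often after this source word.

    Falls back to the overall most frequent form for unseen contexts.
    """
    counts = by_context.get(prev_word)
    if not counts:
        return most_frequent_form(overall)
    return counts.most_common(1)[0][0]


overall, by_context = train(TRAINING)

# Translating "I see the man": the baseline ignores the verb "see".
baseline_choice = most_frequent_form(overall)
context_choice = context_aware_form(overall, by_context, "see")
print(baseline_choice, "|", context_choice)
```

In this toy setting the baseline returns the nominative form for every sentence, while the context-aware lookup changes its answer depending on the neighbouring English word. A neural network, as described in the quote, would learn such dependencies from far richer context rather than from a single preceding word.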
As such, the newly developed method does not rely on handwritten rules or examples for learning the grammatical functions of words, a dependence that limited many previous methods. Obtaining such handwritten resources can be difficult and costly, especially for languages with relatively few speakers.
In the future, the method will be integrated into a translation system (named Oister) that Christof Monz’s group is developing at the UvA.
The research is being conducted as part of the NWO Vidi project ‘Surface Realization in Statistical Machine Translation’.
Ke Tran, Arianna Bisazza and Christof Monz: ‘Word Translation Prediction for Morphologically Rich Languages with Bilingual Neural Networks.’ Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.