
Essays in English yield information about other languages

23 July 2014

An artistic rendition of a language similarity tree based on the researchers’ new algorithm. "The striking thing about this tree is that our system inferred it without having seen a single word in any of these languages," Yevgeni Berzak says.

Computer scientists at MIT and Israel's Technion have discovered an unexpected source of information about the world's languages: the habits of native speakers of those languages when writing in English.

The work could enable computers chewing through relatively accessible documents to approximate data that might take trained linguists months in the field to collect. But that data could in turn lead to better computational tools.

"These [linguistic] features that our system is learning are of course, on one hand, of nice theoretical interest for linguists," says Boris Katz, a principal research scientist at MIT's Computer Science and Artificial Intelligence Laboratory and one of the leaders of the new work. "But on the other, they're beginning to be used more and more often in applications. Everybody's very interested in building computational tools for world languages, but in order to build them, you need these features. So we may be able to do much more than just learn linguistic features.