For generations, we have dreamed of artificially intelligent machines with which we can have real conversations, but despite amazing technological advances, such devices seem some way off. Now researchers at Cambridge are changing the picture, by remodelling the essence of spoken dialogue systems.—Professor Steve Young
Following the death of Steve Jobs, one of many videos which started to circulate widely on the internet showed the Apple co-founder at a watershed moment, launching the very first Macintosh in 1984. After Jobs had demonstrated the machine's facility for word processing, design and even animation, the climax came when the Macintosh announced itself to the world, talking to an amazed audience with synthetic speech before handing back to Jobs and announcing that it was going to "sit back and listen". A beaming Jobs received a five-minute ovation.
How far we seem to have travelled. Modern smartphones are pocket computers that talk to us using speech recognition software, and owners of an Apple iPhone 4S can ask their device about the weather, or tell it to text a friend. Unlike the early Macintosh, this is no slick gimmick using pre-programmed speech on a floppy disk. Machines can listen to us, interpret our words, and respond.
Yet in a sense we have also come less distance than we hoped. A historian of science might argue that the illusion of self-aware, intelligent speech that Jobs created back in 1984 met with such euphoria because it tapped into a vision that is more science fiction than fact. Computing pioneers in the mid-to-late 20th century imagined conversations with far more sophisticated artificial intelligence in the future. They dreamed less of the iPhone 4S, more of HAL from 2001: A Space Odyssey.
This type of interface remains a distant prospect. Siri, the speech recognition software used in the iPhone, is a system we talk to, but not one with which we converse. Achieving that remains a complex mathematical challenge, one in which every breakthrough tends to throw up new problems. In this demanding field, researchers at Cambridge have traditionally been leaders. Today, the University's Dialogue Systems Group, in the Department of Engineering, is making more advances than most.
"Siri is a sort of personal assistant," Professor Steve Young, who leads the group, said. "If you ask it a question, it comes back with an answer, but after that you more or less have to start again. We want to develop systems with which you can have a proper conversation."
Such devices are likely to become more necessary over time. The amount of information on the internet is rapidly growing and, before long, it will take more than question-answer interfaces to cut through it. We need systems that are attuned to our needs; in short, we need computers that discuss things.
Young's group, along with an international team of collaborators, is developing one such spoken dialogue system (SDS) in a European Union (EU) project called PARLANCE. As with some of the group's earlier work, the project involves building a statistical model of a system that talks to humans and learns as it goes. Fundamentally, the idea is not dissimilar to teaching a child new vocabulary, and the shifting set of ideas the words may represent.
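The child-vocabulary analogy can be sketched in miniature: a system that keeps counts of which meaning each word has been confirmed to carry across interactions, and guesses the most probable one next time. Everything below is invented for illustration; PARLANCE's statistical models are, of course, far richer than a word-meaning tally.

```python
# A toy sketch of 'learning as it goes': accumulate evidence about what
# each word means, then guess the most probable meaning so far.
# All words and meanings here are hypothetical examples.
from collections import defaultdict

meaning_counts = defaultdict(lambda: defaultdict(int))

def observe(word, confirmed_meaning):
    """Record one interaction in which 'word' turned out to mean this."""
    meaning_counts[word][confirmed_meaning] += 1

def best_guess(word):
    """Return the most frequently confirmed meaning, or None if unseen."""
    guesses = meaning_counts[word]
    return max(guesses, key=guesses.get) if guesses else None

observe("flat", "apartment")
observe("flat", "apartment")
observe("flat", "not_bumpy")
print(best_guess("flat"))  # apartment
```

As more conversations are observed, the counts shift, and so does the system's best guess, which is the sense in which such a model "learns as it goes".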
Made marketable, PARLANCE would be far more three-dimensional than current systems. Where an existing SDS can, for instance, help house-hunters find properties for sale in a given town, PARLANCE would be able to process a request for a three-bedroomed house, with two bathrooms, near a good school and within walking distance of the local supermarket. Users would be able to ask it for one of these attributes, then add more to refine their results.
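The house-hunting example above amounts to incremental constraint refinement: each user turn adds an attribute, narrowing the previous result set rather than starting the query afresh. A minimal sketch, with an invented toy database and field names, might look like this:

```python
# Hypothetical property listings; fields are invented for illustration.
properties = [
    {"bedrooms": 3, "bathrooms": 2, "near_school": True,  "near_shop": True},
    {"bedrooms": 3, "bathrooms": 1, "near_school": True,  "near_shop": False},
    {"bedrooms": 2, "bathrooms": 2, "near_school": False, "near_shop": True},
]

def refine(results, **constraints):
    """Keep only the results matching every constraint given."""
    return [r for r in results
            if all(r.get(k) == v for k, v in constraints.items())]

# The user states attributes one turn at a time, refining earlier results.
step1 = refine(properties, bedrooms=3)                   # 2 matches
step2 = refine(step1, bathrooms=2)                       # 1 match
step3 = refine(step2, near_school=True, near_shop=True)  # 1 match
print(len(step1), len(step2), len(step3))                # 2 1 1
```

The point of the design is that each call operates on the previous answer, so the dialogue accumulates context instead of treating every question as a fresh start.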
Creating this, however, requires a reconception of how such systems work. A 'cognitive' SDS like PARLANCE has to be able to model uncertainty, or cope with the fact that humans rarely mean exactly what they say. No current SDS is able to handle this, because its modelling is too simple. In existing systems, speech is converted into data, then given to a 'dialogue manager', which tests the data's assorted attributes against an internal database of pre-programmed information, looking for what it thinks is an appropriate response.
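The traditional pipeline described above can be caricatured in a few lines: recognized speech is reduced to a single attribute-value reading, and a dialogue manager looks that reading up against pre-programmed responses. The parsing, topics and responses below are invented for illustration, and real systems are far more elaborate; but the structural limitation is the same.

```python
# Hypothetical pre-programmed (topic, time) -> response table.
RESPONSES = {
    ("weather", "today"): "It is sunny today.",
    ("weather", "tomorrow"): "Rain is expected tomorrow.",
}

def parse(utterance):
    """Toy 'speech to data' step: extract one topic and one time word."""
    text = utterance.lower()
    topic = "weather" if "weather" in text else None
    when = "tomorrow" if "tomorrow" in text else "today"
    return (topic, when)

def dialogue_manager(utterance):
    # The system commits to a single best reading; there is no model of
    # uncertainty about what the user meant, which is the limitation a
    # 'cognitive' SDS like PARLANCE is designed to overcome.
    key = parse(utterance)
    return RESPONSES.get(key, "Sorry, I did not understand.")

print(dialogue_manager("What is the weather tomorrow?"))
# Rain is expected tomorrow.
```

Because the manager matches one hard-coded reading against one table, any ambiguity or misrecognition collapses into either a wrong answer or a failure, rather than a hedged interpretation the system could refine in later turns.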