The Conversational Assistant War – part 1

Guillaume Renouard

8 years ago

The competition rages on between the giants of the American Web, who seek to build the most powerful conversational assistant. Product quality aside, the winner could well be the one who knows how to insert its artificial intelligence into a rich and coherent ecosystem.

At the last Google I/O event in Mountain View, company CEO Sundar Pichai caused a sensation by playing on stage a recorded telephone conversation between a hair salon employee and a customer. This simple phone call made headlines because the customer was not a real human, but rather an advanced conversational artificial intelligence developed by Google, called Duplex. The assistant, which is able to hold a fluid and natural conversation with humans, has a simple objective: to make appointments or reservations over the phone on behalf of its user. The program is still under development.

A few days later, Microsoft, Google’s rival in the virtual assistant market, retaliated at an event in London. The company’s CEO, Satya Nadella, presented the capabilities of its conversational intelligence, Xiaoice, currently being tested in China. The virtual assistant is able to converse both in writing and by phone, and even switch from one to the other. For example, Xiaoice can call a user she was writing to on WeChat, a popular Chinese messaging application. During the demonstration, the CEO said that the artificial intelligence had already made a million calls. Earlier this year, Alexa, Amazon’s star digital assistant, exceeded 30,000 different skills in the US market. Facebook has also planned to launch two smart speakers this summer, while Apple released its HomePod last February.

Fierce competition

For the American tech giants, the competition is heating up around conversational agents, digital assistants designed to help humans with most everyday tasks. This may involve, as Google has shown, scheduling appointments, but also launching a playlist on Spotify, ordering a taxi, starting a TV program, setting a kitchen timer, or turning off the lights. In the ideal connected home, humans could converse in a fluid and intuitive way with their digital personal assistants, asking them for a whole range of services, which the assistants would carry out by reigning over a coherent ecosystem of connected objects.

The stakes are high. As speech recognition technology improves, voice will become one of the preferred means of accessing information. And, as Google shows, access to information is an important source of power and wealth. While no one has won the jackpot just yet, the market is dominated by Amazon and Google, with Microsoft, Apple, Facebook, and Samsung (whose assistant, Bixby, is designed for the smart home) also competing. The competition is structured around two main axes: the product and its network effect.

Interview with a robot

Amazon’s product advantage with Alexa is that it offers more possibilities than its competitors. While Alexa has passed the 30,000 skills mark in the US market, Google Assistant has just under 2,000, and Cortana, Microsoft’s digital assistant, fewer than 300.

Sundar Pichai’s demonstration at the last Google I/O caused a buzz because no artificial intelligence had so far been able to converse with humans in such a natural way. In addition, Google’s search engine expertise has enabled it to design an assistant that can effectively answer general questions.

For example, in a study, marketing company Stone Temple asked the virtual assistants of Apple, Microsoft, Google and Amazon 5,000 questions. Google Assistant scored 91% correct answers, against 87% for Alexa, 82% for Cortana, and 62% for Siri (Apple). And a survey conducted by ComScore in May 2017 shows that the main use people make of their digital assistant is precisely to ask general questions, ahead of asking about the weather or launching a playlist.

But all that isn’t enough to make a natural conversation partner. As impressive as Google’s performance is, Duplex remains a prototype, and no digital assistant is currently capable of sustaining a prolonged conversation with a human. Smart speakers regularly make Internet users laugh with their out-of-context answers. Every year Amazon organizes the Alexa Prize, a contest in which the online retail giant challenges several teams of the world’s best computer science students to design an artificial intelligence capable of conversing coherently with a human for twenty minutes. On average, the teams in the running barely manage to reach three minutes. Nevertheless, thanks to massive investments by these companies, the technology is progressing rapidly, and the race for the most fluid and natural conversational intelligence possible continues to give rise to fierce competition.

Amazon and Google have also begun to differentiate themselves from their competitors by allowing their respective artificial intelligences to communicate with images, in addition to text and voice. Last year, Amazon launched the Echo Look, a smart speaker with a camera that acts as a smart mirror to help its owner choose an outfit for the day. For its part, Google has been at the forefront of image recognition research for several years and can count on that expertise to develop the capabilities of its assistant. For example, it can capitalize on its ability to identify a musician in an image and offer a list of that musician’s compositions to curious users.

What about you? What do you think of all the innovations offered by these internet giants? Major American digital companies will continue to compete to offer the most powerful digital assistant on the market by pushing their artificial intelligence research even further. We will explore the importance of setting up a coherent ecosystem of connected objects in a second article on the subject.