
Google Duplex Plants Flag on the Moon

I couldn’t let this one pass. At the Google I/O conference about a month ago, Google gave quite a demonstration of Google Duplex: a machine that calls your hair salon to book an appointment for you. Some were amazed, others skeptical, still others afraid. What was I to take away from this show of strength?


The Holy Grail of work on artificial intelligence is to pass the Turing test. The protocol of this test is precisely defined and quite different from the setting of the demonstration, but in essence, passing it means that a human is unable to tell the artificial intelligence apart from another human.

Google presented this spectacular demonstration to the general public. It was a kind of flag on the moon for artificial intelligence, which brings to mind AlphaGo's win against Lee Sedol about two years ago.

The performance level of artificial intelligence technologies used by Google

But what does this demonstration really show about the performance level of artificial intelligence technologies used by Google? Axios wonders whether the merchants the AI speaks with in the demonstration (a hairdresser and a restaurant employee) are really merchants, and whether they were really unaware of the experiment. It’s like a magic show: how do I know that my neighbor, who has been chosen to go on stage, isn’t an accomplice of the magician? There is indeed no tangible evidence either way, but in the end, it doesn’t matter very much. Whatever the case, these conversations were chosen for the demonstration, so they are by no means real evidence of performance. It would be easy, even for an artificial intelligence of very poor quality, to select 2 examples among thousands that give interesting results. In other words, if the demonstration were entirely scripted, it would look exactly like the one we saw. To get an objective measurement of the level of performance, even a qualitative one, the demonstration would have had to be performed in real time with interlocutors entirely unaware of the experiment.
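To put a rough number on the cherry-picking argument, here is a minimal sketch. The figures are purely hypothetical (an assistant that handles only 1% of calls flawlessly, and 1,000 recorded calls), and the calls are assumed to be independent; nothing here comes from Google.

```python
from math import comb


def prob_at_least_k_successes(n_calls: int, success_rate: float, k: int = 2) -> float:
    """Probability that at least k out of n_calls independent calls succeed.

    Uses the binomial distribution: 1 - P(fewer than k successes).
    """
    p_fewer_than_k = sum(
        comb(n_calls, i) * success_rate**i * (1 - success_rate) ** (n_calls - i)
        for i in range(k)
    )
    return 1.0 - p_fewer_than_k


# Hypothetical numbers, chosen only for illustration.
n_calls = 1000        # recorded calls to choose from
success_rate = 0.01   # a very poor assistant: 1 flawless call in 100

p = prob_at_least_k_successes(n_calls, success_rate, k=2)
print(f"Chance of having at least 2 showcase-worthy calls: {p:.4f}")
```

Under these assumptions, even a very poor assistant almost certainly produces two calls worth showing on stage, which is why two selected examples say so little about average performance.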

Technological feats

But let us consider that Google is presenting us with representative examples of the performance of its assistant. What technological feats would this highlight?

In my opinion, we must first distinguish the three major families of technologies that make it possible to give life to such an assistant.

Speech-to-text and text-to-speech technologies

On the one hand, we have speech-to-text and text-to-speech technologies. These generally produce the most spectacular results and are the most visible, because they’re present in our daily lives. In this particular demonstration, Google Duplex impresses with a very natural intonation, a very human ‘uhuh’, and its understanding of poor-quality audio from people with very varied accents. If the conversations shown are indeed representative of the majority of real cases the assistant deals with, this is impressive compared to what exists today, especially when it comes to understanding varied accents.

“Google Duplex impresses with a very natural intonation” @NicolasMarlier

Technologies for understanding intent and interpreting metadata

On the other hand, we have technologies for understanding intent and interpreting metadata. In other words, the part that makes it so that “the machine understands the meaning of what is said to it”. This is a subject that we know well at Julie Desk, and with a large volume of data (largely available to Google), an algorithm can conceivably give satisfactory results. For this technological element, the difficulty lies in going from an algorithm that is wrong in 10% of cases to one that is wrong in 5%, 1% or even 0.1% of cases. This is an issue we work on at Julie Desk and one of the reasons why human supervision comes into play. On this point, a single demonstration cannot really be that impressive. What would be impressive is a quantitative understanding rate, measured over many repetitions of the same test.
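To illustrate why a handful of calls cannot establish an error rate, here is a small sketch using the Wilson score interval, a standard way to bound a proportion observed over a limited number of trials. The trial counts are hypothetical and the function name is mine; this is not how Google or Julie Desk measure anything.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(errors: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an error rate observed as errors/trials.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = errors / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical scenarios: zero misunderstandings observed in each case.
for trials in (2, 100, 1000, 10000):
    low, high = wilson_interval(errors=0, trials=trials)
    print(f"{trials:>5} flawless calls -> error rate plausibly up to {high:.2%}")
```

With only two flawless calls, the plausible error rate is still very wide, while telling a 1% algorithm from a 0.1% one takes thousands of trials. That is the sense in which only repeated tests can be impressive here.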

The “central brain” of the assistant

Finally, we have the decision engine technology, or the “central brain” of the assistant. This is the technology that makes it possible to generate a relevant action in response to a request. For example, the fact that the assistant answers “9 people” when asked “How many?” as part of a restaurant reservation. On this aspect, the variety and relevance of the actions taken by the assistant in the demonstration reflect a well-performing decision engine. The real challenge for such a decision engine is to react well in the most open context possible. If we restrict ourselves to making appointments, for example, the subject is already rather complex, but the requests and actions in this context can be enumerated and understood. If, on the other hand, you want the assistant to respond appropriately in an open context, it becomes extremely difficult.

Overall, the demonstration is important and can reflect a real technological advance on the part of Google’s teams, but without any guarantees. Indeed, it is certain that within these families of technologies, the greatest difficulty is not to go from 50% performance to 80% performance, but from 80% to 90%, and beyond. In other words, there is a huge gap between this demonstration and a usable, reliable system.

“There is a huge gap between this demonstration and a usable, reliable system.” @NicolasMarlier

To conclude, this demonstration puts forward the vision of a fully digitalized world, where the virtual world can interact with the real world. It is interesting to note that at the beginning of the demonstration, the presenter explains that 60% of small companies in the United States are not equipped with an online reservation system. In the vision of a fully digitalized world, a request from the virtual world must indeed be able to book one of these restaurants, hairdressers or other businesses. I share the vision of a digitalized world, which opens up countless opportunities, but I am convinced that 95% of these companies will be equipped with online reservation systems long before an artificial intelligence assistant can reliably conduct complex conversations over the phone.

Between intrigue, fear and wonder, the launch of Google Duplex has attracted much attention! Are you going to test this technology when it is released? Tell us in the comments!

Nicolas Marlier is one of the co-founders of Julie Desk, and also its CTO.

You can find his interview here or follow him on Twitter or LinkedIn.

