Voice recognition in the cloud, along with the Artificial Intelligence (AI) to interpret and respond intelligently (which is a very different task to traditional Voice to text), is highly disruptive. The last major disruption we saw in the mobile market was Apple’s introduction of the iPhone. That was disruptive because it changed the dominant skillset from Radio Frequency (RF) expertise (the preserve of the previous cadre of suppliers – Nokia, Ericsson and Motorola) to User Experience. The iPhone won customers’ hearts because of its ease of use and what you could do with it. It can be argued that that change is what allowed Samsung to rise to its number one position in the handset market. The incumbents (Nokia, Ericsson and Motorola) were too arrogant to copy; Samsung wasn’t, and the rest is history.
In the same way that user experience changed the game in Apple’s favour, the AI behind Voice recognition is poised to change the game again. This time, the companies which will succeed are those with the cloud AI expertise. Amazon has made the running, leveraging its Amazon Web Services experience. Google is well placed to challenge, helped by its acquisition of DeepMind, which is already showing its capabilities with Google’s Neural Machine Translation. Microsoft’s recent acquisition of Maluuba shows that it intends to be one of the key players. However, this puts physical product companies like Apple and Samsung at a distinct disadvantage. Even with Siri and Viv, without the AI expertise to make the Internet of Voice (IoV) compelling, they could quickly slip from market leaders to low-margin followers.
Although Amazon, Google and Microsoft have played, or are playing, with mobile hardware, this is not a hardware play – it’s an AI play, where the company which can acquire the most Voice data (i.e. users) will be best placed to win. It’s remarkably cheap and easy for any manufacturer to incorporate the basics in their product, as all the local hardware has to do is to recognise a key word or phrase – “Alexa” in Amazon’s case, “OK Google” in Google’s – at which point it streams the Voice signal to the cloud and gives ownership of the user to Amazon or Google. It’s why the keyword is so important: the keyword, rather than the device which provides the route through to the cloud, becomes the brand.
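To make the point about how little the local hardware has to do more concrete, here is a minimal sketch of that gating logic. It is illustrative only and is not how Amazon or Google actually implement it: the function name `frames_for_cloud` and the two detector callbacks are assumptions, and plain strings stand in for audio frames.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def frames_for_cloud(
    frames: Iterable[Frame],
    is_wake_word: Callable[[Frame], bool],
    is_silence: Callable[[Frame], bool],
) -> Iterator[Frame]:
    """Yield only the frames that follow a locally detected wake word.

    Both callbacks are hypothetical stand-ins for whatever on-device
    detectors a real product would use.
    """
    streaming = False
    for frame in frames:
        if not streaming:
            # Idle: nothing leaves the device until the wake word is heard.
            streaming = is_wake_word(frame)
        elif is_silence(frame):
            # End of the request: stop streaming, go back to local listening.
            streaming = False
        else:
            # Between wake word and silence, everything goes to the cloud.
            yield frame


# Toy usage: words stand in for audio frames, "" stands in for silence.
heard = ["tv", "alexa", "what", "time", "", "chatter", "alexa", "lights", ""]
sent = list(frames_for_cloud(heard, lambda f: f == "alexa", lambda f: f == ""))
```

In this toy run only the words spoken after each “alexa” are sent on; the background chatter never leaves the device. All of the intelligence, and therefore the relationship with the user, sits on the far side of that stream.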
This is where Amazon has a clear advantage. Alexa is sufficiently divorced from the Amazon name that other brands are happy to use it – something which Amazon is actively encouraging, both through the Alexa Voice Service, which lets hardware vendors build it into their products, and Alexa Skills, which allow applications to use its AI. At CES this year Alexa was generally considered to be the star of the show, even though Amazon weren’t present.
More and more companies are jumping on the bandwagon. Ford lets you talk to your car; Huawei and LG let you talk to your phone and fridge; ADT lets you talk to your burglar alarm (“Alexa, can you tell the burglars they’re naughty people?”), while Brinks Array have it in their door lock, so your burglars can ask to be let out whilst telling all of your other Voice-activated goodies that they’re about to get new owners.
Some applications will be trivial and die, but with the proliferation of things to talk to and a growing range of Alexa skills to provide answers, everything is in position for users to change the way they interact with the internet. A further advantage is that “Alexa” does not have the implicit brand baggage of “OK Google”, making Amazon a more attractive partner for many who don’t want to water down their own brand.
Ironically, Google tells the story better. Google’s Gill is clear when he makes the point that “using speech in this way means the interface almost disappears”. Speech is so much second nature that, as long as the AI and applications respond correctly, talking to the internet becomes natural. We will interact with it in the same way we do with friends. (Although anyone interested in how much we have already lost that conversational skill should read Sherry Turkle’s “Reclaiming Conversation”.)
This is why the IoV has the potential to be so disruptive. The history of computing and telephony has always involved touch – tapping a keyboard or holding a phone. The Internet of Voice removes that constraint – we just converse via a microphone which may exist on any number of household products. Futuresource Consulting reckon that 6.3 million Voice assistants were shipped in 2016; Amazon admit that they had difficulty meeting demand for Echos in the run-up to Christmas. If we can believe CES, this is the year when we’ll start talking to (or through) tens of millions of devices.
Once we start talking rather than touching or tapping, it won’t take long to lose our connection with our smartphones. We’ll still need them for connectivity, but without a need to touch them to initiate a question, they may quickly become less relevant to our lives. Apple’s decision to remove the 3.5mm jack is inadvertently driving this transition even faster, as it encourages manufacturers to put more functionality into their wireless headsets and earbuds.
Part of that functionality will be smart microphones which can listen for the key phrases. Knowles – a manufacturer of miniature microphones for phones and hearables – have already launched their VoiceIQ, a low-power, always-listening Voice detector which connects to a voice Digital Signal Processor (DSP) for key phrase detection. Within the next year I expect to see these functions condensing onto a single chip and appearing as standard in most hearables. Makers have already demonstrated Echo functionality with Raspberry Pis and a $9 microcontroller board. For any device with a slightly better than minimum spec microcontroller and an internet connection, that’s just some additional code.
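The split between an always-on Voice detector and a separate key phrase stage can be sketched in a few lines. This is a rough illustration under stated assumptions, not a description of VoiceIQ or any real DSP firmware: a simple energy threshold stands in for the low-power detector, `key_phrase_detector` is a hypothetical callback for the costlier second stage, and the threshold value is arbitrary.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples (0.0 if empty)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def two_stage_detect(
    frames: Iterable[Sequence[float]],
    key_phrase_detector: Callable[[Sequence[float]], bool],
    energy_threshold: float = 0.1,  # arbitrary value chosen for illustration
) -> Tuple[Optional[int], int]:
    """Return (index of the frame where the phrase was found, or None,
    number of times the expensive second stage was woken up)."""
    expensive_calls = 0
    for index, frame in enumerate(frames):
        # Stage 1: cheap, always-on check. Quiet frames are dropped here,
        # so the costly stage stays asleep most of the time.
        if rms(frame) < energy_threshold:
            continue
        # Stage 2: only woken when stage 1 thinks someone may be speaking.
        expensive_calls += 1
        if key_phrase_detector(frame):
            return index, expensive_calls
    return None, expensive_calls


# Toy usage: two quiet frames, one loud frame, then the "key phrase".
frames = [[0.0, 0.0], [0.01, -0.01], [0.5, -0.5], [0.9, -0.9]]
result = two_stage_detect(frames, lambda f: max(f) > 0.8)
```

The design point is the power budget: the first stage runs continuously and must be almost free, while the second runs only on the handful of frames that pass it.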
The IoV should be good news for a range of other largely unseen companies with expertise in analytics and voice processing. Lesser-known audio processing specialists like Alango, who have been putting Voice enhancement algorithms into cars for years, are looking at how they can leverage their Intellectual Property (IP) in this new market. Other key enablers are also making their move, as demonstrated by ARM’s recent introduction of their Audio Analytic-based ai3 Artificial Audio Intelligence platform for Cortex chips and MindMeld’s announcement of a deep-domain conversational AI platform.
All of this takes functionality and user ownership away from the smartphone. So how quickly will we fall out of love with them? That’s difficult to predict. A recent survey of users about their favourite phone features suggests it may be sooner rather than later. The top three applications were GPS directions, messaging and setting alarms without the need to touch your phone. The growth of hearable devices can take care of directions and alarms. That leaves messaging, and it will be interesting to see what Voice does to that.
Incorporating Voice into its product may prove to be Facebook’s biggest challenge yet. If someone else gets Voice right for social media it could be the chink in the armour which ends Facebook’s dominance and consigns it to becoming the next MySpace. Voice may also enable new services which attract our attention. We’re already seeing them emerging on Alexa Skills. I particularly like the Earplay skill on Alexa, which lets you take part in telling a story, heralding a new level of user interaction.
It’s clear that the battle between Google and Amazon is ramping up. Amazon is not just engaging with developers to integrate Alexa and develop skills, but has set up a $100 million Alexa Fund to invest in companies that want to innovate with Voice. Google has launched its Assistant and Now and potentially has the better analytics engine, but needs to get users talking to it to build up its response AI database. Microsoft is keeping its powder dry, whilst Apple with Siri and Samsung with Viv are increasingly looking like hardware vendors whose Voice roadmap doesn’t go much beyond Voice to text.
There is little to suggest they will be contenders. Phone vendors also have a difficult choice – do they support IoV applications like Alexa and Now, which direct the Voice questions to a competitor, or do they try to block them in favour of their own options? If they block them, they risk alienating the consumer, speeding up the point at which users defect from that phone.
Amazon has an interesting advantage in terms of monetisation, as it gets a cut of revenue when users place an order using Alexa. The investment firm Mizuho reckons that could be as much as $7 billion by 2020. In contrast, Google’s revenue model focuses on advertising and it’s not clear how that can be mapped onto Voice. But Google has the cash to ignore revenue in the short term and buy customers while it improves its user experience. It should also have the better infrastructure to support smart home devices, leveraging the work it has already done with Nest and Thread. Despite that, and despite Google recently signing up NVIDIA, which has smart home aspirations, Alexa seems to be making the running.
Bye Bye Smartphone, Welcome Internet Of Voice
Although the battle is between Amazon and Google, it will not stop others trying to define their own niche in the Internet of Voice. Oakley’s Radar Pace is one example – a spectacle in all senses of the word. It uses AI as your personal trainer, allowing you to ask questions like: “OK Radar, what’s today’s workout?”, “OK Radar, what’s my power?”, but presumably not, “OK Radar, do I look like a dick when I wear this?”
There are times when you realise companies have been seduced by the promise of too much technology. As Saint-Exupéry said in his memoir “Wind, Sand and Stars”, “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away” – a maxim which many of today’s manufacturers should try to understand. It’s why the smartphone has nowhere else to go – the next step in simplicity is the Internet of Voice. Amazon’s Echo shows they have taken that maxim to heart.
So, “Alexa – I think I want to divorce my smartphone, as I don’t love it anymore. Will you marry me? I want your babies.”
This is Part 2 of a 2 part series. In part 1, Nick Hunn explains how The Internet Of Things will become ‘The Internet Of Voice’.