Siri. Alexa. Google. Voice computing is emerging as the next wave of “no UI” in the post-smartphone world. What’s the current context for this paradigm shift? What’s around the corner in the next 3-5 years? How will this change the way writers and UX people work?
Master’s voice: the rise of voice assistants
1. MASTER’S VOICE: THE RISE OF VOICE ASSISTANTS
UX LIVE
@DANCHARVEY #VOICECOMP
2. LANGUAGE AS AN INTERFACE: THE COMMAND LINE
3. LANGUAGE AS AN INTERFACE: CHATBOTS EVERYWHERE
4. WHY VOICE & WHY NOW?
• Voice kinda works
• Hardware isn’t quite so hard
• GAFA really wants voice to be the next big thing
5. THE DREAM VS THE REALITY
6. KEY PRODUCT LAUNCHES
2011 / SIRI
7. KEY PRODUCT LAUNCHES
2014 / ALEXA
8. KEY PRODUCT LAUNCHES
2016 / GOOGLE HOME
9. KEY PRODUCT LAUNCHES
2016 / AIRPODS
10. KEY PRODUCT LAUNCHES
2017 / PIXEL 2 FAMILY
11. WE LIVE IN THE FUTURE
12. NOW
• 7% of U.S. households currently have a smart speaker with an embedded voice assistant
• 10x growth in just three years or an 81% compound annual growth rate
• 63% of the US population know of Amazon Alexa or Google Home
DATA BY GARTNER
13. NEAR (Q4 2017)
• Smart speaker sales expected to reach 12m units
• 68% expected to run on Amazon’s Alexa platform
• 20% expected to run on Google Assistant
DATA BY GARTNER
14. NEXT (2020 PREDICTIONS)
• 75% of US households will have smart speakers
• 20% of households will have two devices and 5% will have three
• 94.2 million US households will have 138.2 million smart speaker devices
• 258 million U.S. consumers will have access to smart speakers
DATA BY GARTNER
15. – MICHAEL WOLF, CEO AT ACTIVATE
“Smart speakers will have the fastest growth curve
in gadget history — but it won’t last.”
IT’S THE SOFTWARE STUPID
16. USAGE DATA (ON MOBILE)
• Siri: 41.4m MAU in the U.S. 15% decline yoy. Engagement also dropped from 21% to 11%
• Alexa: 325% increase in MAU (from 0.8m to 2.6m), Engagement increased from 10% to 22%
• Cortana: 350% increase in MAU (0.2m to 0.7m), Engagement tripled from 19% to 60%
DATA BY VERTO ANALYTICS
17. USAGE DATA (ON MOBILE)
• Smartphone use in decline, speaker use on the rise
• On mobile peak usage is from 10am to 1pm and 2pm to 7pm
• 54% of usage is by women
• Older age groups dominate: 45-54 & 55+
DATA BY VERTO ANALYTICS
18. COMMON USE CASES (MOBILE)
1. Music (14.2%)
2. Alarms (12.6%)
3. Weather (12.2%)
4. Number lookup (9.4%)
5. “Fun” questions (9.1%)
6. Voicemail (8.1%)
7. Headlines (7.3%)
8. Calendar lookup (7.3%)
9. Traffic (7.0%)
10. Song lookup (6.6%)
DATA BY EMARKETER
19. COMMON USE CASES (SPEAKERS)
1. Questions (60%)
2. Weather (57%)
3. Music (54%)
4. Alarms (41%)
5. Reminders (39%)
6. Calendar (27%)
7. Home automation (27%)
8. News (22%)
9. Find local biz (16%)
10. Games (14%)
11. Order products (11%)
12. Order food/services (8%)
DATA BY COMSCORE
20. THE MOST COMMON USE CASE OF ALL
21. COMMON CONCERNS
• Voice only kinda works
• “The Alexa Generation”
• Troll fast, troll furious
• Privacy
• Sexism
22. VOICE ONLY KINDA WORKS
23. GROWING UP WITH ALEXA
WILL KIDS GROW UP TO BE LAZY JERKS?
24. TROLLVERTISING & ITS BACKLASH
MADE WITH 100% MEDIUM-SIZED CHILD & CYANIDE
25. TROLLING FOR LOLS
26. TROLLING FOR LOLS
27. PRIVACY
ALWAYS-ON LISTENING = CORPORATE SURVEILLANCE
28. SEXISM
VOICE ASSISTANTS ARE SEXIST AF
29. DESIGNING FOR VOICE
“In the last year we’ve had the opportunity to marry software and industrial design, to have my team and [Matias’s] team together. He and I really talk about the future and how these two things need to dance.”
– IVY ROSS, HEAD OF USER EXPERIENCE FOR GOOGLE’S HARDWARE PRODUCTS
30. DESIGNING FOR VOICE
YOUR BRAND VOICE MATTERS MORE THAN EVER
31. DESIGNING FOR VOICE
“When the conversation is the interface, experience design is all about crafting the right words.”
– JOHN PAVLUS, FILMMAKER / WRITER & MULTIMEDIA CONSULTANT
32. THANKS!
UX LIVE
Editor's notes
Thanks, Luke. A few things before I start:
I’ve actually left Zone recently. Now I’m head of product design at The Dots. We’re a London-based startup that Forbes has called a “LinkedIn for Creatives.” So if you want to connect with other designers or find your dream job then please check us out at the-dots.com.
This is a sequel to a talk I first gave last year at Interact London about chatbots. Today I’ll be talking about their siblings, voice assistants. While I was at Zone we got our hands dirty working on various voice platforms with clients in financial services, retail, etc. I’m here to share some of that learning with you.
I lived in New York for 20 years. So I hope you give zero fucks that I’m probably going to say fuck a fucking lot. Cool? Alright. Let’s go. We’ve got some drinking to do…
Command line interfaces were the primary mode of human-computer interaction from the 1960s to the 1980s. It’s all about language. If you know the right SYNTAX/GRAMMAR then you have a very precise and powerful way to control programs or operating systems. Obviously the command line is still alive and well for more advanced users, but for the rest of us graphical user interfaces, popularized with Apple’s Lisa in 1983, quickly became the norm. Let’s jump over that though and talk about emerging conversational interfaces for a second.
Conversational interfaces are quickly emerging as a new paradigm. As with the command line before them, language is the interface. Once you know the right SYNTAX/GRAMMAR you can do a lot of precise and powerful things. But unlike the command line, you don’t have to be a computer science whizkid. It’s a much more natural language dynamic. We’re going to be talking to everything that’s connected to the internet sooner or later. Chatbots were one expression of that, and talk about them was everywhere last year when they were “the next big thing.” Boon Yew Chew already gave a great talk about that so I’ll move on to this year’s “next big thing”: embodied voice assistants.
Since 2012, improvements in voice recognition and natural language processing have cut error rates for these tasks from 33% to under 5%.
The smartphone supply chain means that making a box with microphones, a fast-enough CPU and a wireless chip is much easier.
Google, Apple, Facebook and Amazon have more money & data than God and they have the motive and opportunity to experiment with voice as mobile winds down.
The problem is engineers & technologists are the worst at figuring out whether or not people actually WANT the stuff they’re making. There’s a difference between the art of the possible and the art of the desirable. That’s where designers and marketers come in. (And admittedly we like new shiny things too and are easily distracted.)
Siri was born out of a DARPA-funded project at SRI International’s Artificial Intelligence Center. It received praise for its voice recognition and contextual knowledge of user information, including calendar appointments, but was criticized for requiring stiff user commands and lacking flexibility. It was also criticized for lacking information on certain nearby places, and for its inability to understand certain English accents.
Tech pundits had a mixed reaction to the launch of Echo back in 2014:
You have to remember that was after the Amazon Fire phone debacle so pundits were prone to being baffled by Amazon’s hardware efforts OR likely to slag it.
Some obviously saw its potential as a control hub for the smart home.
Once people started using it however, the quality of Alexa’s voice alone made it the new standard in voice assistants. Poor Siri was left in the dust.
David Pierce of Wired compared Google Home to Amazon Echo:
“Sometimes Home feels like sci-fi magic. Sometimes it reaches beyond its grasp and falls flat. The Echo is less impressive, but more reliable.”
He praised the aesthetics saying “it feels minimalist, thoughtful, and warm in the environment.”
Pierce also raved about its speaker, saying it was "richer, brighter and more dynamic than the Echo, and loud enough to fill a room".
EAR COMPUTERS!! OMG! WHO DOESN’T WANT EAR COMPUTERS? MG Siegler rightly gushed about the AirPods, in part because they ushered in a new era for Siri. Hands-free, always-with-you interaction with Apple’s voice assistant had immediate potential value to commuters, as just one persona/use case.
Josh Topolsky: “The stuff Google showed off on October 4 was brazenly designed and strangely, invitingly touchable. These gadgets were soft, colorful… delightful? They looked human, but like something future humans had made; people who'd gotten righteously drunk with aliens.”
According to Gartner…
The research firm eMarketer estimates that 60.5 million people in the U.S.—a little less than a fifth of the population—will use a digital assistant at least once a month this year, and about 36 million will do so on a speaker-based device like Amazon Echo or Google Home.
Market researcher and tech consultant Michael Wolf says that smart speakers such as Amazon’s Echo will have the fastest growth curve in gadget history, but that they’ll peak around this time, and services such as Alexa will be built into gizmos of all sorts thereafter, obviating the need for dedicated devices. You’ve already seen that “Alexa everywhere” integration in the halls of CES this year. Single-use devices with long refresh cycles always see their sales plummet in time. Think of dedicated GPS units, e-readers, etc. Voice computing IS and will continue to be more about the software and the digital service than the hardware and the physical product.
In fact, the demographic profile of an assistant “superuser” (someone who spends twice as much time with personal assistants each month as the average user) is a 52-year-old woman spending 1.5 hours per month with assistant apps.
Instead of doing anything heavy duty, smart speaker owners are mainly using them for basic tasks like getting a weather report or playing some music, according to a recent study from ComScore.
“Fuck you, Siri” and similar variants are among the most common commands people utter. On mobile, poor affordances often trigger voice assistants by mistake. It happens less on smart speakers because the triggers are more direct, but it still happens. Poor results also drive people off the deep end.
Benedict Evans: “The trap that some voice UIs fall into is that you pretend the users are talking to HAL 9000 when actually, you've just built a better IVR, and have no idea how to get from the IVR to HAL.”
Rachel Metz, a journalist for MIT Technology Review, pondered: “It’s a little worrisome. Leaving aside the privacy implications of kids telling an Internet-connected computer all kinds of things, we don’t know much about how this kind of interaction with artificial intelligence and automation will affect how children behave and what they think about computers. Will they become lazy because it’s so easy to ask Alexa and its peers to do and buy things? Or jerks because many of these interactions compel you to order the technology around? (Or both?)”
“The Whopper is a burger, consisting of a flame-grilled patty made with 100% medium-sized child with no preservatives or fillers, topped with sliced tomatoes, onions, lettuce, cyanide, pickles, ketchup, and mayonnaise, served on a sesame-seed bun.”
And of course that ad and the news items that inspired it were fertile ground for Trey Parker and Matt Stone of South Park fame…
https://www.theatlantic.com/technology/archive/2016/05/the-privacy-problem-with-digital-assistants/483950/
Request gets sent to server farm > gets paired with Device ID > audio gets stored > gets paired with an assistant identifier = a portrait about YOU that is waiting to be purloined by hackers, abused by law enforcement and govt officials, etc. Or just used by the company in question to try to sell you shit you don’t need.
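For the more technical folks in the room, here is a minimal sketch of why that chain adds up to a portrait. Everything in it is an assumption made up for illustration (the `ServerLog` class, the field names, the device-to-account mapping); it is not how any particular vendor's backend actually works.

```python
from dataclasses import dataclass, field


@dataclass
class StoredRequest:
    """One voice request as it might sit on a server (hypothetical fields)."""
    device_id: str
    assistant_id: str
    audio: bytes
    transcript: str


@dataclass
class ServerLog:
    # Hypothetical mapping from a device to the account-level assistant identifier.
    device_to_assistant: dict = field(default_factory=dict)
    records: list = field(default_factory=list)

    def receive(self, device_id: str, audio: bytes, transcript: str) -> StoredRequest:
        """Pair the request with its device ID and assistant identifier, then store it."""
        assistant_id = self.device_to_assistant.get(device_id, "unknown")
        record = StoredRequest(device_id, assistant_id, audio, transcript)
        self.records.append(record)
        return record

    def profile(self, assistant_id: str) -> list:
        """Everything one person has said, across all of their devices."""
        return [r.transcript for r in self.records if r.assistant_id == assistant_id]
```

The point of the sketch is the `profile` method: once requests from a speaker and a phone share one identifier, a single lookup returns the whole history.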
Assistants aren’t quite smart enough to do all the necessary thinking on their own but that likely won’t always be the case. As handheld devices are stuffed with more and more powerful processors, the prospect of an artificial-intelligence assistant that lives entirely on your phone is coming within reach. Engineers at MIT announced a chip developed expressly for artificial intelligence earlier this year that would enable A.I. to live entirely on your phone. IBM just tested AI processes that can run entirely in memory and not even need processors.
Voice assistants reinforce gender stereotypes. And even in the rare instances when a male voice is offered, it’s often for sexist reasons: Siri as Daniel in the UK was based on a concern that male Apple customers wouldn’t trust directions from a female voice.
With the rise of smart speakers and other connected products, assistants are increasingly “embodied.” As a result, the best experiences depend on hardware and software teams working together. That means industrial designers and interaction designers, hardware engineers and software programmers rolling up sleeves, mucking in together and putting aside petty differences. This is especially true in the short and near term, where we still need the dedicated devices to get people comfortable with these new behaviours. It also means that as these teams come together, it’s just as important that they’re not just professionally diverse but also gender-diverse, ethnically diverse, etc. Dialects, accents, etc. continue to be a stumbling block in voice experiences, and it’s because these teams reflect their Silicon Valley biases.
The requirements of designing a successful voice experience are different than building websites or delivering apps. Figuring out the personality of the brand is key. Is your brand voice funny, smart, or authoritative? How is the assistant going to behave when someone asks an unrelated question or isn’t able to clearly communicate his/her issue? Those are questions that we’ve always asked when creating branded experiences, but now they take real prominence.
Writer John Pavlus recently said, “When the conversation is the interface, experience design is all about crafting the right words.” That’s exactly why tech companies are hiring writers with acting and improv backgrounds as their designers. Google hired writers from Pixar and The Onion to work on its Assistant.
Navigating stories and dialogue is a tricky business. Fortunately, there’s an emerging sense of best practices: avoiding rhetorical questions and gendered pronouns are both examples of advice offered by design leads. They also encourage building in “kill switches” to give users control. In their case, telling Siri or Google to “shut up” causes the assistant to retreat from the current conversation. Would be grand if we could all just be a bit more polite about it.
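If you want to prototype the kill-switch idea, a rough sketch might look like the following. The phrase list, `is_kill_switch` and the `Conversation` class are all hypothetical names invented for this example; a real assistant would lean on its platform's own intent handling rather than string matching.

```python
import re

# Hypothetical stop phrases, chosen for illustration only.
STOP_PHRASES = ("shut up", "stop", "cancel", "never mind")


def is_kill_switch(utterance: str) -> bool:
    """Return True when the utterance contains a stop phrase as whole words."""
    # Lowercase, replace punctuation with spaces, then collapse whitespace.
    text = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    text = " ".join(text.split())
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in STOP_PHRASES)


class Conversation:
    """A toy dialogue session that retreats as soon as the user asks it to."""

    def __init__(self) -> None:
        self.active = True

    def handle(self, utterance: str) -> str:
        if is_kill_switch(utterance):
            self.active = False
            return ""  # retreat quietly instead of arguing back
        return f"Heard: {utterance}"
```

Plain phrase matching like this will also fire on something like "don't stop the music", which is exactly the kind of edge case writers and designers need to test with real speech.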