Beyond Siri: the next frontier in User Interfaces
© VisionMobile 2012. Some rights reserved.
About VisionMobile
VisionMobile is a leading market analysis and strategy firm, for all things connected. We offer competitive
analysis, market due diligence, industry maps, executive training and strategy, on topics ranging from the
industry's hottest trends to under-the-radar market sectors. Our mantra: distilling market noise into market
sense.

VisionMobile Ltd.
90 Long Acre, Covent Garden,
London WC2E 9RZ
+44 845 003 8742
www.visionmobile.com/blog
Follow us: @visionmobile

About i-Free
i-Free Innovations is specialised in development, testing and implementation of venture projects, advanced
technological designs and innovative products. It has a unique team of experts and IT-specialists, and great
experience in implementation of innovative projects in high-tech industry.
For more information see: http://www.i-free.com/en/innovations

Contents
1. Virtual assistants: four generations in 20 years
2. The evolving VA technology landscape
3. The VA Competitive landscape
4. VA business models: Revenue share rather than paid app downloads
5. Leaders and challengers in the VA value chain
6. Beyond Siri: What’s in store in the VA market

Behind this report
Lead researcher: Marlène Sellebråten
Project lead: Michael Vakulenko
Marketing lead: Matos Kapetanakis
Editorial: Andreas Constantinou

Companies interviewed
Artificial Solutions AB
AT&T Labs, Inc.
Dexetra Software Solutions Private Limited
i-Free Innovations
Nuance Communications, Inc.
Speaktoit LLC
SRI International
xbrainsoft

License
Licensed under Creative Commons Attribution 3.0 license.
Any reuse or remixing of the work should be attributed to the VisionMobile “Beyond Siri: The next frontier in
User Interfaces” report.
Copyright © VisionMobile 2012

Also by VisionMobile
Mobile Industry Atlas | 5th Edition
The complete map of the mobile industry landscape, mapping 1,700+ companies across 90+ market sectors.
Available in wallchart and online version
atlas.visionmobile.com

Disclaimer
VisionMobile believes the statements contained in this publication to be based upon information that we
consider reliable, but we do not represent that it is accurate or complete, and it should not be relied upon as
such. Opinions expressed are current opinions as of the date appearing on this publication only, and the
information, including the opinions contained herein, are subject to change without notice.
Use of this publication by any third party for whatever purpose should not and does not absolve such third
party from using due diligence in verifying the publication’s contents. VisionMobile disclaims all implied
warranties, including, without limitation, warranties of merchantability or fitness for a particular purpose.
VisionMobile, its affiliates and representatives shall have no liability for any direct, incidental, special, or
consequential damages or lost profits, if any, suffered by any third party as a result of decisions made, or not
made, or actions taken, or not taken, based on this publication.
v.1.01
Key Messages
Helped by Apple’s successful launch of its Siri technology in 2011, voice-activated mobile virtual assistants
(VAs) -- applications enabling users to complete tasks like search, dialling or texting via voice commands --
have crossed the chasm into mass-market deployments. Apple’s product triggered a wave of both imitation
and innovation in the last year, including tens of smartphone applications. The most-downloaded
examples on Android and iOS today include Vlingo Virtual Assistant, Iris, Voice Actions, Skyvi, Everfriends
and Dragon Go. In this report, we profile four applications besides Siri: Dragon Go by speech recognition (SR)
specialist Nuance Communications, Everfriends by visualisation-driven i-Free Innovations, iris by AI startup
Dexetra and Speak4it by AT&T Labs.
A shift from commands to dialogue. Following advances in Artificial Intelligence (AI) -- in particular
Natural Language Processing (NLP), user profiling and search -- VA technology is moving from
understanding language to anticipating user intent. As such the focus for virtual assistant apps is shifting
from today’s command-and-control (“I ask, you answer”) towards continual dialogues of recommendations
and user actions. Established vendors such as SRI International, Apple, Google and Nuance as well as
challengers like Artificial Solutions, Dexetra and i-Free Innovations, are all working on this shift from
commands to dialogue. SRI International is to showcase a back-and-forth dialogue technology by fall 2012.
VAs are disrupting search. Delivering answers rather than search results is a core value proposition of
virtual assistants. For traditional search engines, this translates into decreasing page hits and consequently
to a fall in search advertising revenue. Google has seen declining search traffic from iPhones following the
launch of Siri, according to our sources. We expect Google to launch a free Siri alternative across multiple
smartphone platforms, hardwired to Google’s search results and advertising revenue streams.
Virtual assistants as a control point for user targeting. As a point of convergence for user profiling
data, virtual assistants establish a new control point. By amassing deep knowledge of user search terms,
they can become pivotal to third parties wanting to target users by interest.
Business models changing to service distribution deals. Today’s VA business models focus on user
acquisition, and apps are therefore distributed primarily as free downloads. The top 43 VA apps generated
under two million dollars (USD), despite having produced over 133 million cumulative downloads. Over 94
percent of downloads were on Android while nearly 86 percent of paid download revenue was on iOS.
Moving forward, we see revenue coming in from search and advertising and, increasingly, from third-party
deals and avatar customisation, rather than from paid downloads.
Virtual assistants becoming a competitive differentiation for handset makers. Integrating VAs
into the core user interface rather than as just-another-app gives OEMs better control over user experience
and service discovery. Apple got a head start by integrating Siri into the iOS 5 user interface, with other
handset manufacturers following closely. Samsung’s latest smartphone also has an integrated voice UI,
Samsung Voice. And Nokia is readying a Siri-like UI for late 2012, according to our sources. This new
UI will be taking advantage of Nokia’s Navteq capabilities.
Voice UIs a primary access point to connected screens. Voice-activated user interfaces are set to
become a key component of multimodal UIs that support touch, gesture, or text input. More importantly,
voice UIs can become a universal, cross-screen and screen-agnostic UI, starting with tablets, TVs and
desktop computers. Besides Apple and Samsung, Nuance is well-positioned for speech recognition
deployments on multiple screens.
Telcos get involved. NTT DoCoMo, which pioneered the VA concept with iConcier in 1998, and AT&T,
with a number of VA app launches in the past year, are the leading telcos in the VA space. We expect to see
more tier-1 telcos working on the deployment of VAs based on the Rich Communication Suite (RCS)
standard, hitting the market in 2014. Besides running a VA as a service discovery gateway, optimising
network access for VAs may provide operators with additional service differentiation.
VA personalities to remain in the cloud benefiting Google and Amazon. Virtual assistant
personalities will move from devices to the cloud because of the immense amounts of data that
next-generation VAs need to process. With personalities stored in the cloud, virtual assistants will become
readily and seamlessly available not only on smartphones, but also on TVs, in cars, and in smart homes. Established
cloud storage and processing companies like Google and Amazon stand to benefit the most.
Google hits it big on free speech recognition API. Google’s free speech recognition API -- and
thereby the Android platform -- is today the platform of choice for a majority of VA apps. More speech
recognition vendors are expected to head towards free API use, as VAs using licensed speech recognition
(SR) engines look into moving to free alternatives.
Patent wars can spill into the VA market. SRI International patents extensively, Nuance holds 2,000
speech recognition patents while AT&T holds 600 patents in the AI space. As virtual assistants become a
competitive asset for handset makers, we can expect patent wars to spill over from mobile handsets into
the VA domain as well.
Strong B2B vendor Nuance rising in the consumer VA market. Nuance’s speech recognition
technology is used by Apple and Google and powers a large number of VA apps available for direct
consumer download -- including two of the VA apps most often downloaded by consumers. Nuance’s
direct-to-consumer apps will help its technology improve, but also put the company in competition with its
own B2B customers.
New opportunities for targeted marketing. Context-based user profiling opens new opportunities
for contextual marketing and advertising, by letting brands push more user-relevant messages, offers and
recommendations. Mobile advertising is today the fastest growing segment within digital advertising and
mobile users’ interest in the format is proven to increase when ad relevance increases.
CHAPTER ONE
Virtual assistants: four generations in 20 years
A virtual assistant is a context-aware conversational application and interface for the delegation of tasks
such as search, dialling or texting using natural language. Large companies have in the past decade
deployed web-based VAs to complement traditional customer service agents. The launch of Siri by Apple in
2011 helped virtual assistants cross the chasm into a mass-market technology. But the journey for VAs
began long before Siri; VAs have evolved through three generations in the last 20 years, and are now
entering a fourth one, as shown in the table on the next page.
Virtual assistants were first introduced in the mid-90s by General Magic, a spin-off of Apple’s Paradigm
project, led by Marc Porat. General Magic’s Portico was a network-based virtual office assistant available to
US business users on desktop computers and PDAs. Using keyword-based voice-command and text-to-
speech, Portico could complete voicemail and email administration tasks. Despite retail deals with Sony,
AT&T and Motorola, Portico proved to be a commercial failure.
In 1998, NTT DoCoMo launched iConcier to the Japanese consumer market. This second-generation VA,
available on i-mode-enabled handsets, used artificial intelligence functions such as phrase understanding
for simple command-response dialogues with an avatar. NTT entered into content deals with over 250
third parties, in turn giving paying subscribers access to services ranging from navigation to bus timetables
and coupon deals. Initially available only on NTT’s own media platform, i-mode, iConcier was made
available to third-party Android developers in March of this year.
About a year before Siri hit the market in April 2011, Nokia deployed a voice search function using
Microsoft Tellme. Google had also made search-by-voice possible on Android a full eight months before
Siri hit the market. A few elements set Siri and those third-generation VAs apart from Portico and the
initial iConcier. Firstly, interactions between the VA and the user are more human-like, thanks to Natural
Language Processing (NLP). Humour elements help give the VA more of the feel of a human personality.
Secondly, today’s third-generation VAs perform tasks beyond traditional communications, such as dialling
and texting, as they access third-party content, web search results in particular. They also pull and push
user-specific content, like Facebook or Twitter status updates. The wide uptake of smartphones and
improved mobile connectivity contributed to this evolution.
VA players are now working on fourth-generation virtual assistants that communicate in an even more
human-like way, understand not just language but intent and, ultimately, anticipate user needs. Fourth-
generation VA personalities will live in the cloud, as large amounts of data need to be processed, giving
cloud processing companies like Google and Amazon an upper hand. Fourth-generation VAs will draw
upon advances in NLP, speech recognition, personalisation and search, from companies that include SRI,
AT&T Labs, and Nuance.
“Google and some research labs are capable of building a next generation VA. SRI
would of course love to work with Google”, says Norman Winarsky, VP Ventures at
SRI International and Visiting Scholar at Stanford University, and one of the brains
behind Siri.
SRI International will be showing an implementation of back-and-forth dialogue around fall 2012. This
June, AT&T Labs plans to let third party developers access the APIs of Watson, its artificial intelligence
platform. We understand that Apple is working on a deeper integration of Siri with its core iOS
applications, to be rolled out on iPhones and other screens. I-Free is investing in 3D character
visualisation, and Dexetra in making personal history searchable. Nokia is readying a Siri-like UI for late
2012, according to our sources, taking advantage of Nokia’s Navteq capabilities.
Four Generations Of Virtual Assistants

Attribute | 1995-1999: The virtual telephone assistant | 2000-2010: The virtual concierge | 2011: The virtual search assistant | 2012-2015: The new UI: your lifestyle buddy
Type of VA | Reactive, program-embedded | Reactive, task-centric, device-embedded | Reactive, person-centric, device-embedded | Proactive, lifestyle-centric, device-embedded
Architecture | Cellular network-based | Device- and cellular network-based | Device- and cloud-based | Mostly cloud-based
Technology | Text-to-speech | Speech recognition | Speech recognition | Natural language understanding
Technology | Keywords | Phrases | Keywords and phrases | Back-and-forth dialogue
Interface | Simple voice commands | Text-to-speech | Text-to-speech, speech-to-text | Multimodal: speech, text, gesture, touch
Languages | US English | US English; local language for locally developed VAs | US/UK English; some local languages | All
Tasks performed | Takes messages, forwards calls, reads e-mail, keeps track of tasks, schedules appointments | Delivers third-party information (weather, coupons, etc.), set alarms | Web search, navigation, set alarms using user data, open other apps and location data | Delivers context- and user-relevant third party information, recommendations
Screen | Desktop computers, PDAs | Feature phones | Smartphones and tablets | Smartphones, tablets, computers, TVs, cars
Artificial Intelligence | None | Simple command-response | Limited dialogue | Conversation
Artificial Intelligence | Keyword understanding | Phrase understanding | Humour, some intent understanding | Intent anticipation
Developer APIs | None | None-some | Some | All types of third-party APIs
Personalisation | None | Avatar | User-specific content, avatar, voice | User- and context-specific content and services, avatar, voice
Audience | US market | US market, Asia | US market, Asia, Europe | Global
Audience | Business users | B2B, consumers | Business users, B2B, consumers | B2B, B2C, B2B2C
Business model | Paid, usage-based | Paid, subscription-based | Free and paid apps, ad/search revenue share, licensing, vertical applications | Third-party content and services revenue share, licensing, vertical applications
Launched by | Telecoms operators | Telecoms operators | Handset manufacturers, developers, end-users | Handset & device manufacturers, SR and AI vendors, B2B2C, cloud companies, developers
Leading examples | Portico’s Mary (1996), Wildfire (1995) | NTT's iConcier (2008), SK Telecom's Nate | Siri, Dragon Search, Voice Actions, Vlingo, Everfriends, Iris, Speaktoit | SRI's next generation VPA, Google Glass

Source: VisionMobile research
CHAPTER TWO
The evolving VA technology landscape
Technologies today and tomorrow
Virtual assistants rely on five technological building blocks: Speech Recognition (SR), Natural Language
Processing (NLP), user profiling, search & recommendations, and avatar visualisation. These technology
blocks are very much in a state of continual evolution, leaving the field open for innovation by large
vendors and start-ups alike.
[Figure: Technology building blocks for virtual assistants -- speech recognition, natural language processing,
user profiling, search & recommendations, avatar visualisation]
Source: VisionMobile research
Speech recognition
Speech recognition (SR), also referred to as automatic speech recognition (ASR) or speech-to-text (STT),
converts spoken words into text. Text-to-speech (TTS) is required for the reverse step, converting
text into spoken words. Without it, no dialogue is possible between the user and the virtual assistant.
Voice-activated VAs use speech recognition to carry out tasks such as web search, voice dialling and
dictating text-based messages like email and SMS, or even entire documents.
Key players in the speech recognition space are Nuance, Google, iSpeech and Microsoft.
Outlook. In addition to high demand for US English-speaking VAs, speech recognition technology
vendors are experiencing growing demand for local language support and are working towards speeding
up their language production. One major challenge is the cost associated with language development;
speech recognition support for every new language must be built almost from scratch. Language
interdependency -- the fact that most languages are not self-contained -- adds to the difficulty. US English
is today the language of choice for VAs, as it is the perfect SR engine training ground: the US is a
linguistically homogeneous market, and there is a substantial amount of content and third-party APIs
accessible either in English or in the US.
Natural Language Processing -- Understanding the context
While speech recognition translates spoken words into text, Natural Language Processing (NLP) turns text into
meaning and context understanding. By understanding the user’s context -- their history, habits, tastes and
location -- the VA can return the most relevant information or recommendations, and do it in a socially
appropriate fashion.
“The next technology leap for the virtual personal assistant will be to maintain a conversation.”
Norman D. Winarsky, Ph.D., Vice President, SRI Ventures, SRI International
Key players in Natural Language Processing technology are SRI International, Nuance, AT&T Labs, Google and
Artificial Solutions.
Outlook. In order to make virtual assistants fully conversational, vendors today are working on
technology that enables back-and-forth dialogues and on understanding the rules of social interaction. We
should not forget how breaking these social interaction rules made Microsoft Office assistant Clippit (aka
“Clippy”) unpopular. An intermediate solution is to let the user set rules of interaction on a case-by-case
basis, whereby the user tells the VA their level of availability: open for chat, dialogue, recommendation or
none of those. VAs also need to learn and act upon the user’s historical data, which requires contextual
training, processing vast amounts of data and substantial server capacity. The cloud is a natural place for
this sort of “Big Data”, but for the foreseeable future, vendors favour a hybrid solution, with some data
stored in the device to allow the virtual assistant to function where connectivity is unavailable.
User profiling
User profiling involves collecting information about a user and using it to model their interests,
preferences, context and goals. User profiling is essential for having a VA able to deliver personalised
information, dialogue and recommendations.
Key players in user profiling technology are SRI International, Google, Apple, AT&T Labs, Artificial
Solutions, and Tobii (Apple).
Outlook. New techniques of user profiling promise to go beyond mere digital content tracking, by
gathering information about mood and emotion through eye-tracking, keyboard tracking, and/or
temperature tracking. Samsung’s latest smartphone, the Galaxy S III, features eye-tracking technology and
Apple, who bought parts of eye-tracking specialist Tobii in 2009, is said to be integrating its technology in
the future.
Search and product recommendation
Combined, NLP and user profiling enable personalised search results and recommendations, such as
advice on content and services. Asking the VA for restaurants can for example lead to different
recommendations based on the user profile and its context: a juice-bar for a jogger in a park, and a
gourmet restaurant for a fine food enthusiast walking in the same park.
Key players. Delivering recommendations is a matter of collaboration within the ecosystem: NLP
vendors, search engines, knowledge and Q&A platforms, content providers, social networks and ad
networks.
Outlook. Recommendations technologies are already in use, most notably by Amazon and Netflix, where
different books or movies are recommended to the user based on prior shopping or viewing history.
Content recommendations via social sharing and social networks such as Facebook, or via websites using
Facebook’s Graph API, are mainstream, and pull more user profiling data than Amazon. The missing piece
is context, which is still a work in progress.
Avatar visualisation and personalisation
Avatars -- graphic, animated representations of a person -- are also used by many VAs. Avatars are a way to
humanize the assistant, with the intent of increasing emotional attachment. Avatar visualisation is a form
of gamification that helps make interaction more fun and engaging.
Outlook. For human-looking avatars, new technologies such as 3D body-scanning and facial recognition
have the potential, when integrated with 3D graphics in devices, to take avatar visualisation to the next
level. Avatars are used by a large number of VAs, but opinions differ as to the potential of monetising
customisation. Selling customisation as an in-app purchase is one opportunity. Another is brand
placement, buying for example a branded sweater for the avatar.
CHAPTER THREE
The VA Competitive landscape
Siri is only the tip of the iceberg in what is becoming a very competitive market. Apple’s product triggered a
wave of both imitation and innovation in the last year, including tens of smartphone applications. In this
report, we profile four applications besides Siri: Dragon Go by SR specialist Nuance Communications,
Everfriends by visualisation-driven i-Free Innovations, iris by AI startup Dexetra, and Speak4it by AT&T
Labs.
We identified 43 representative virtual assistant applications available on Android or iOS. They were
collectively downloaded 133.3 million times. Some 94 percent of these VA downloads were Android apps.
Google’s Voice Search -- a true VA only once Voice Actions has been activated -- single-handedly accounted
for over 81 percent of the 133.3 million downloads we looked at. Even without Voice Search, Android apps still
represented 68 percent of the remaining 24.2 million downloads. By contrast, iPhone and iPad apps
accounted for nearly 90 percent of all paid downloads, and nearly 86 percent of paid VA app revenues.
Such revenues amounted to 1.8 million dollars (USD), overall a meagre amount considering the potential.
The top ten apps accounted for 42 percent of total revenues. Pannous’s Voice Actions alone represented
nearly 36 percent of total revenues, closely followed by QuanticApps’s Voice Assistant and True
Knowledge’s Evi. Besides Pannous’s Voice Actions, AIVC and Speaktoit were the only apps in the top ten to
generate revenue with paid downloads. What sets apart the top ten downloadable VA apps is that they are
to a large extent produced by either speech recognition vendors or start-ups working on AI research.
Google and Nuance produce the most popular SR engines in the top ten, while one app uses iSpeech. Many
VA makers admit to trying out and comparing multiple vendors ahead of a potential switch, citing
dissatisfaction with speech recognition quality or with service pricing.
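The download shares quoted in this chapter can be cross-checked from the report's own rounded figures. The Python sketch below is only an illustration and not part of the original analysis: the constant and function names are invented here, and the inputs are the rounded totals (133.3 million downloads overall, 109 million for Voice Search, a 68 percent Android share of the remainder). With rounded inputs the remainder comes out at 24.3 million rather than the 24.2 million quoted, a gap that presumably reflects unrounded source data.

```python
# Rounded figures quoted in the report, in thousands of downloads (May 2012).
TOTAL_DOWNLOADS_K = 133_300      # all 43 apps, iOS and Android combined
VOICE_SEARCH_K = 109_000         # Google Voice Search (Android only)
ANDROID_SHARE_OF_REST = 0.68     # Android share of downloads excluding Voice Search


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of all downloads attributable to Voice Search alone.
voice_search_share = share(VOICE_SEARCH_K, TOTAL_DOWNLOADS_K)

# Downloads left once Voice Search is excluded.
remaining_k = TOTAL_DOWNLOADS_K - VOICE_SEARCH_K

# Android total: Voice Search plus Android's share of the remainder.
android_k = VOICE_SEARCH_K + ANDROID_SHARE_OF_REST * remaining_k
android_share = share(android_k, TOTAL_DOWNLOADS_K)

print(f"Voice Search share of downloads: {voice_search_share:.1f}%")
print(f"Remaining downloads: {remaining_k / 1000:.1f} million")
print(f"Android share of all downloads: {android_share:.1f}%")
```

The computed Android share lands close to the "some 94 percent" stated in the text, which suggests the quoted percentages are mutually consistent.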
Looking at VA tasks and features, we identified a number of common denominators among top-ten apps.
Must-have tasks for a VA are local search & general search, including weather forecasts, voice dialling and
texting, including contact lookup and navigation. Key VA features, implemented more widely among top-
ten apps, were personality, local language, access to third-party and local content. Such features require
extended partnerships with technology vendors or investments in R&D, and local third-party content
partnerships. As speech recognition is still under heavy R&D, app production is often driven within one
company.
Giving a VA a personality makes it more human-like. Siri’s sarcastic personality has certainly contributed to
an illusion of human-like features. A number of apps are also using customisable avatars to add personality
to the VA. Another critical differentiator is access to third-party content, typically from Facebook,
YouTube, Spotify, last.fm, or to local content, which is in high demand. Local content and local language
support are two interdependent features. For example, Siri works fine in the US where there are third-party
content deals in place, but is pretty much reduced to an entertaining gizmo in other regions, due to lack
of local content.
Table: Android the virtual assistants’ platform of choice
Virtual Assistant applications (May 2012). Downloads in 1,000s; revenue in USD 1,000s.

VA application name | Publisher/Developer | Downloads iOS | Downloads Android | Downloads total | Revenue iOS | Revenue Android | Revenue total
Voice Search | Google Inc | 0 | 109,000 | 109,000.0 | 0 | 0 | 0.0
Vlingo Virtual Assistant | Vlingo Corporation (Nuance Corporation) | 5,350 | 2,860 | 8,210.0 | 0 | 0 | 0.0
iris. (alpha) | Dexetra | 0 | 4,400 | 4,400.0 | 0 | 0 | 0.0
Skyvi | BlueTornado | 0 | 1,900 | 1,900.0 | 0 | 0 | 0.0
Speaktoit Assistant | SpeaktoIt | 58 | 1,650 | 1,708.0 | 57 | 0 | 57.4
AIVC | YourApp24 | 0 | 1,416 | 1,416.1 | 0 | 58 | 57.7
Car Home | Google Inc | 0 | 1,190 | 1,190.0 | 0 | 0 | 0.0
Dragon Search | Nuance Communications | 1,080 | 0 | 1,080.0 | 0 | 0 | 0.0
Voice Actions/Jeannie | Pannous | 163 | 902 | 1,064.7 | 600 | 56 | 655.4
Everfriends | i-Free Innovations | 0 | 731 | 731.0 | 0 | 0 | 0.0
Evi | True Knowledge Ltd | 253 | 235 | 488.0 | 250 | 0 | 250.5
Andy - Siri for Android | 74 Technologies | 0 | 296 | 295.8 | 0 | 20 | 19.6
Edwin, Speech-to-Speech | neureau | 0 | 265 | 265.0 | 0 | 0 | 0.0
Dragon Go | Nuance Communications, Inc | 148 | 117 | 265.0 | 0 | 0 | 0.0
Speak4it | AT&T Interactive R&D | 231 | 0 | 231.0 | 0 | 0 | 0.0
Voice Assistant - Just use your voice | QuanticApps | 193 | 0 | 193.0 | 589 | 0 | 588.7
Pocket Blonde* | i-Free Innovations | 0 | 184 | 184.0 | 0 | 0 | 0.0
EVA - Virtual Assistant | BulletProof | 0 | 177 | 177.0 | 0 | 80 | 79.9
Ziplocal | Phone Directories Company/ZipLocal | 123 | 20 | 143.4 | 0 | 0 | 0.0
Cluzee Your Personal Assistant | Tronton LLC | 0 | 71 | 70.8 | 0 | 0 | 0.0
Voice Control | Luka Kama | 0 | 57 | 56.7 | 0 | 12 | 12.2
Risi Beta | kkTeam | 0 | 41 | 41.1 | 0 | 1 | 0.9
EVAN - Virtual Assistant | BulletProof | 0 | 28 | 28.2 | 0 | 10 | 10.0
Artificial Intelligence Dialog | AnSoft | 0 | 24 | 23.8 | 0 | 0 | 0.0
Monica | gSoft Technology Solutions | 22 | 0 | 22.0 | 0 | 0 | 0.0
Voice Control without internet | K&J Software | 0 | 20 | 20.0 | 0 | 0 | 0.0
netpeople:a | iNAGO | 0 | 16 | 16.0 | 0 | 0 | 0.0
My Virtual Assistant | Narada Robotics | 12 | 0 | 12.0 | 0 | 0 | 0.0
vokul | KulTek, LLC | 12 | 0 | 12.0 | 36 | 0 | 35.9
Vocal Search | AppSimo | 8 | 0 | 8.0 | 24 | 0 | 23.9
Serge | Logovision Inc | 8 | 0 | 8.0 | 0 | 0 | 0.0
Android Voice Xtreme | BulletProof | 0 | 8 | 7.7 | 0 | 10 | 10.0
Juke! Speech Driven Music Box | David Cheney Design | 0 | 5 | 5.1 | 0 | 0 | 0.0
Tecla Access | Inclusive Design Research Centre | 0 | 5 | 4.9 | 0 | 0 | 0.0
VoicePOD (Android 2.0 up) | MOBk | 0 | 4 | 4.4 | 0 | 5 | 4.9
Sprachsteuerung | Bytetex | 0 | 4 | 4.2 | 0 | 8 | 7.7
ScottyKnows | nSphere Mobile | 0 | 4 | 4.1 | 0 | 0 | 0.0
Talk to Eve | sparklingapps | 1 | 1 | 2.1 | 1 | 0 | 0.9
TasksEveryday Virtual Assistant | iEverydayApps | 2 | 0 | 2.0 | 0 | 0 | 0.0
Super Voice Assistant | McFly Entertainment | 2 | 0 | 2.0 | 2 | 0 | 2.0
mia powered by netpeople | iNAGO | 0 | 2 | 1.5 | 0 | 0 | 0.0
Voice Answer | Sparkling Apps | 0 | 1 | 1.0 | 0 | 4 | 3.7
Voice Ask | sparklingapps | 0 | 0 | 0.1 | 0 | 0 | 0.3
Source: VisionMobile research
Data Source: Xyologic, Revenue analysis: Vision Mobile Research, based on top 43 VPA apps by download, cumulative
from app launch to May 2012. Revenue is calculated using total estimated cumulative downloads since app launch and
paid download price for the respective apps in May 2012, on the respective platforms.
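The revenue method described in the note above amounts to multiplying cumulative downloads by the paid download price, platform by platform. A minimal Python sketch of that calculation follows. The class and function names are hypothetical, and the example download count and price are invented for illustration; they are not taken from the Xyologic data.

```python
from dataclasses import dataclass


@dataclass
class PlatformStats:
    """Hypothetical container for one app on one platform."""
    downloads: int      # estimated cumulative downloads since app launch
    price_usd: float    # paid download price in May 2012 (0.0 if free)


def estimated_revenue(stats: PlatformStats) -> float:
    """Estimate paid-download revenue as downloads times price."""
    return stats.downloads * stats.price_usd


def total_revenue(per_platform: dict) -> float:
    """Sum the estimate over all platforms an app is published on."""
    return sum(estimated_revenue(s) for s in per_platform.values())


# Invented example: a paid iOS version and a free Android version.
example_app = {
    "ios": PlatformStats(downloads=10_000, price_usd=2.99),
    "android": PlatformStats(downloads=50_000, price_usd=0.0),
}
print(f"Estimated revenue: USD {total_revenue(example_app):,.0f}")
```

Because the May 2012 price is applied to all downloads since launch, an estimate of this kind does not account for earlier price changes or for periods in which a paid app was offered for free.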
Dragon Go -- Nuance moving into the consumer space
"Nuance is investing in the direct-to-consumer category because we believe learning
about what people like is the fastest way to bring innovations to market."
Matt Revis, Vice President, Mobile and TV, Nuance
Nuance -- whose speech recognition technology powers Siri -- has traditionally derived most of its revenue
from working with verticals. In the past two years, it has nevertheless acquired a number of more directly
consumer-facing competitors, such as Vlingo and SVOX Mobile Voices. It has also launched a number of
consumer-facing applications, such as Dragon Search, Dragon Dictation and Dragon Go, the first fully-fledged
Nuance-branded virtual assistant. Cars, TVs, PCs and laptops are other screens Nuance is working
on today. Moving forward, the vendor sees its technology being deployed on any consumer electronics
device, from cameras to microwave ovens. Dragon Go’s content partnerships for music and
recommendations position it as a content distribution platform. Nuance also derives valuable consumer
insights and speech recognition engine training by extending its B2B business to consumer-facing apps. At
the same time, growing its B2C business puts it in direct competition with its customers.
The publisher | Nuance Communications, Burlington, MA, USA | The app | Dragon Go
Tagline | Solutions and technologies that help people work more intelligently | Tagline | Control your personal universe with no boundaries
Main activities | Provides speech and imaging solutions for businesses and consumers | Main tasks performed | Web search, navigation, voice-calling, third-party reservation services, reviews, play music from Spotify and last.fm and movies from Netflix, social network updates, social sharing
CEO | Paul Ricci | Total estimated downloads | 148,000 on iPhone, 117,000 on Android
Revenue | 1.4 billion USD in 2011 | Revenue streams | Third-party content and services
Founders | SRI International, of which Nuance is a spin-off, in 1992 | Platform availability | iPhone, Android
Funding | Public, traded on Nasdaq | Regional availability & main download location | US only; main download location: US
Total apps published | 9 | Languages supported | US English
Website | http://www.nuance.com/ | Avatar | No
Source for app download data: Xyologic, Data: Vision Mobile Research, Nuance Communications
Everfriends -- Third-party services and customisation
"Building conversation logic around interactive characters helps engage users in a
deeper activity than simple tasks."
Kirill Petrov, co-founder of i-Free and head of i-Free Innovations.
I-Free Innovations, publisher of VA app Everfriends, is a subsidiary of St Petersburg-based i-Free Ltd.,
which publishes apps and games for smartphones and conducts IT research and ventures. To i-Free,
today’s voice-activated assistants are only a stepping-stone to cloud-based personalised services accessible
from any device, using natural language and interactive avatars that create a deeper relationship between
user and VA. Having a mix of revenue streams, Everfriends is a good case study for VA business models: 87
percent of its revenues come from third-party partnerships, five percent from advertising, and eight
percent from avatar customisation, according to the company. Avatar customisation is a core focus for the
vendor, which expects customisation revenues to grow strongly. I-Free foresees the VA ecosystem evolving
towards open speech and natural language APIs, so as to enable third-party developers to create derivative
services. The deployment of billing capability, enabling third-party transactions, is also key to new and
growing revenue from third-party services.
The publisher: i-Free Innovations, St. Petersburg, Russia
Tagline: None
Main activities: Development and implementation of innovation projects in the mobile and digital space, B2C and B2B
CEO: Vyacheslav Ovchinnikov
Revenue: 200 million USD in 2011
Founders: Kirill Petrov, Kirill Gorynya, Sergey Shulga, in 2010 as a division of i-Free
Funding: Kirill Petrov, Kirill Gorynya, Sergey Shulga
Total apps published: 28
Website: http://www.i-free.com

The app: Everfriends
Tagline: A new generation of pocket assistants -- with personality and a sense of humor!
Main tasks performed: Search, weather forecasts, voice calls, sms, e-mails, notes, maps, alarm and news reminder setting, social network update delivery, games and jokes, music playback, shopping, hotel booking, encyclopedia lookups
Total estimated downloads: 750,000
Revenue streams: In-app purchases, third-party services (87%), customisation (8%), advertising (5%)
Platform availability: Android
Regional availability: 24 countries
Main download location: USA
Languages supported: English and Russian, plans for more
Avatar: Yes, with free and paid-for customisation add-ons
Source for app download data: Xyologic, Data: Vision Mobile Research, i-Free Innovations
Iris (alpha) – Largest independent vendor VA app
“Phones should not only respond to you, they should talk to you.”
Narayan Babu, CEO, Dexetra.
Dexetra reportedly created the first version of virtual assistant iris in just a few hours, in order to bring a
Siri-like capability to Android phones. Today, Dexetra derives most of its revenue from advertising, and
sees OEMs and telcos as key partners. Dexetra has already signed a licensing deal with Micromax, allowing
the phone manufacturer to integrate iris’s Indian version, Aisha, into devices. It also plans to co-market the
app with telcos. Dexetra plans to reach sales of 500,000 devices in the first half of 2012. The company is
also working on a private beta focusing on context recognition, and is further developing voice-based ads.
Additionally, the company is refining its NLP engine with Friday, a project in beta, which maps and makes
a user’s search history searchable and is due to be released in June 2012.
The publisher: Dexetra Software Solutions Private Limited, Bangalore, India
Tagline: Delivering surprises
Main activities: Building products at the intersection of mobile, cloud and machine intelligence
CEO: Narayan Babu
Revenue: Undisclosed
Founders: Narayan Babu, Nithin John, Eby Chembola, Binil Antony, Yaser Hameed, Aibin Varghese, in 2010
Funding: 200,000 USD from One97 Mobility Fund in 2011
Total apps published: 11
Website: http://www.dexetra.com

The app: iris (beta)
Tagline: ask.listen.
Main tasks performed: Local search, news search, voice calling and texting, music and video playback, movie reviews on request, alarm and reminder setting
Total estimated downloads: 4,400,000
Revenue streams: In-app purchases, advertising-based revenue shares, licensing with OEMs and telcos
Platform availability: Android
Regional availability: 24 countries
Main download location: USA
Languages supported: English
Avatar: No
Source for app data: Xyologic, Data: Vision Mobile Research, Dexetra
Siri -- The first mass market embedded virtual assistant
“Siri’s release made a huge impact on the virtual assistants market, first of all as the
first mass market case of natural language processing.”
Kirill Petrov, co-founder of i-Free and head of i-Free Innovations.
The launch of Siri increased awareness of voice-activated assistants, but more importantly triggered a wave
of virtual assistant innovation, both in terms of downloadable apps and industrial research. A number of
VAs are trying to piggyback on Apple’s success, with taglines such as “a friend of Siri”, “Siri for Android” or
“Siri-like”. Research published by Parks Associates in April found that about a third of the US iPhone 4S
owners it interviewed use Siri daily, to make phone calls, text or search. Overall, 87 percent use at least
one Siri feature monthly, predominantly voice dialling, texting and information search.
Siri’s speech recognition is powered by Nuance, and its search results by Wolfram Alpha. What sets Siri
apart from other VAs is that it is fully embedded in the iPhone UI and core applications, including the
contacts and calendar. With a head start on voice-activated interfaces for phones, Apple is said to be taking
the next step, by integrating a voice UI into other devices, starting with tablets and the rumoured Apple
TV. Apple is also readying support for additional languages, a key success factor provided that languages
are associated with local search and, more importantly, local content deals. We also understand that
Apple’s Siri has led to a measurable decline of search traffic to Google, which we believe will prompt
Google to launch a free, cross-platform competitor.
The publisher: Apple, Cupertino, CA, USA
Tagline: None
Main activities: Offers mobile communication and media devices, personal computing products, portable digital music players, and associated software and peripherals
CEO: Tim Cook
Revenue: 108 billion USD in fiscal year 2011
Founders: Steve Jobs, Steve Wozniak, Ronald G. Wayne, in 1976
Funding: Public, traded on Nasdaq
Total apps published: 23
Website: http://apple.com

The app: Siri
Tagline: What can I do for you?
Main tasks performed: Search, local search and maps (only US-English), weather, voice dialling, sms and email, contact lookup, setting calendar, reminders and timers, playing music (iTunes only), stock market tracking
Total estimated downloads: Undisclosed. There are as many potential Siri users as there are activated iPhone 4S users
Revenue streams: Siri drives sales of iPhone 4S, third-party content and services
Platform availability: Exclusive to iPhone 4S, not downloadable
Regional availability: Global (although without specific local content)
Main download location: Undisclosed
Languages supported: English (United States, United Kingdom, Australia), French (France), German, Japanese. Planned languages in 2012: Chinese, Korean, Italian, and Spanish
Avatar: No
Source for app download data: Xyologic, Data: Vision Mobile Research, Apple
Speak4it -- Creating the multimodal VA
“There is a huge opportunity combining gesture and voice. Speak4it is today the
only app that allows you to do multimodal understanding.”
Mazin Gilbert, Assistant Vice President of Technical Research, at AT&T Labs
Speak4it is a local search app using both voice and gesture recognition. It was developed by AT&T Labs,
and uses AT&T’s “Watson” artificial intelligence platform. The company sees virtual assistants becoming
the natural interface for all types of devices within three to five years. To realise that vision, AT&T hopes to
attract third-party developers to its AI platform engine. Only available to AT&T’s partners today, AT&T
Watson’s API library is scheduled to open to third-party developers in June 2012. AT&T will first release
libraries for speech-to-text, general and local search, voicemail and SMS. Multimodal capabilities
(combining voice and gesture recognition) will remain exclusive to AT&T’s strategic partners at this point,
though. AT&T sees its intelligent network as a key competitive advantage in the growing VA space. AT&T
Labs’ 600 patents in the AI space certainly give it a competitive edge, along with monetisation potential
from licensing deals.
The publisher: AT&T Interactive R&D, Dallas, TX, USA
Tagline: Leading Invention, Driving Innovation
Main activities: Research in communications, computing and networks
CEO: Krish Prabhu
Revenue: Undisclosed
Founders: AT&T/Bell Labs, in 1996 in its current form. Initially Bell Labs, founded in 1926
Funding: AT&T spent 1.1 billion USD on R&D in full year 2011
Total apps published: 7
Website: http://www.research.att.com/

The app: Speak4it
Tagline: Mobile local search
Main tasks performed: National and local navigation search
Total estimated downloads: 145,000 on iPhone, 86,000 on iPad
Revenue streams: Third-party services
Platform availability: iPhone, iPad
Regional availability: Global for download, US-only for tasks
Main download location: USA
Languages supported: US English
Avatar: No
Source for app download data: Xyologic, Data: Vision Mobile Research, AT&T Labs
CHAPTER FOUR
VA business models: Revenue share rather than
paid app downloads
VA business models are only starting to take shape, which is natural for a market just past the
early adopter chasm. Nearly 42 percent of the leading 43 VA apps on the market choose a paid-download
model, compared to 30 percent for top-ten VA apps. Top-ten VA apps were more likely to offer both a free
and a paid version of the same app. They were also more likely to offer in-app purchase: 50 percent do it
with free apps and 10 percent with paid apps.
These revenue numbers are consistent with the fact that many virtual assistant apps are looking at
alternative ways of generating revenue, beyond paid-per-download and subscription models. App
publishers are exploring business models like search and display advertising, third party service
distribution, avatar customisation, or white-label VA licensing. Business models vary depending on
whether the VA publisher owns the underlying technology building blocks or is licensing them.
Paid-per-download and Subscription models
Our analysis of the virtual assistant market shows that only about 42 percent of VA apps choose the paid-
per-download model. In the top ten, three VAs use this revenue model accompanied by a free download
version. Pannous, the company behind the Jeannie virtual assistant app, tops the revenue chart with over
USD 655,000 in revenue from paid downloads on iOS. Android generates most VA downloads, and most
Android VA apps are free downloads, apart from AIVC, Eva/Evan, Andy and Android Voice Extreme,
which offer free and paid versions. By contrast, most paid app revenues are generated by virtual
assistants on iOS. There are also fewer VA apps on iOS, with 40 percent of VA apps available on iOS,
against over 77 percent on Android. Siri’s presence is a major competitive obstacle on iOS, whereas
Google’s speech recognition engine API is open on Android.
Another paid VA model is used by Japanese telecom operator NTT DoCoMo, which develops iConcier, one of
the first mobile virtual assistants ever. The service, which uses NTT’s proprietary i-mode platform, reached
four million paid customers in its launch year and had by then about 250 third-party deals in place. In
March this year, NTT DoCoMo extended iConcier availability to Android. NTT DoCoMo has also
attempted, without success, to license its virtual assistant platform to other telcos outside of Japan.
Outlook. As expected, paid VA apps attract fewer downloads than their free counterparts. We expect the
number of paid VA apps to decrease after Google launches its answer to Siri, and OEMs deploy their
versions of voice UI. The expected growth in revenue derived from third-party revenue share agreements is
also likely to contribute to a decline in the number of paid VAs.
Search & Display Advertising
Advertising is by far the most widespread way for VAs to generate revenue today, be it via ad placement
within apps or via mobile search. Advertising is likely to continue being a primary VA revenue model in the
future, as VA implementations of user profiling and NLP improve targeting precision for ads and therefore
per-ad revenues. Leading mobile ad networks are Adfonic, Admob (Google), Jumptap, inMobi, iAds
(Apple), Mojiva, Millennial Media, and Vserv.
Outlook. Mobile advertising is already an established revenue model for mobile apps.
Mobile advertising is also the fastest growing segment within digital advertising. Today, performance-
based advertising on the Internet (including via mobile) is already generating twice as much revenue as
impression-based ads, with an estimated 20.4 billion dollars (USD) in 2011 for the US alone, according to
research by PwC on behalf of IAB. But at the same time, advertising is a game of scale, and smaller virtual
assistant apps might face a catch-22 situation if they fail to attract users with relevant content: No content
means no users and no users means no ad revenue.
Third party service distribution
Many VAs today, including Siri, Everfriends, Dragon Go, and Speak4it, link to restaurant booking sites, travel
booking sites or, in Dragon Go’s case, to music services such as Spotify and last.fm. With virtual assistants
becoming a mainstream form of search, third-party content and service distribution is poised to rapidly
grow, especially for local services. It is a matter of both more third-party APIs opening up and the ability
for VAs to integrate and bill for them. Today, i-Free Innovations’ Everfriends already generates 87 percent
of its revenue via linking to third-party services, with a goal to add more content and services in the future.
It is also a matter of partnering not only with global content, but more crucially with local content, closest
to users’ interests and location.
Outlook. As virtual assistants move from being apps to becoming access points for personalised service
discovery, they will become strategic distribution channels for both mainstream and niche service
providers. The major opportunity lies in using VAs to distribute local content, a digital good where demand
currently outstrips supply. At the same time, profit margins are relatively low on purchase intermediation,
making it necessary for VAs to target either a large user base or a very specialised one. There are also
restrictions on local content, and regulations in some countries, India for example, restrict or
forbid outright the billing of third-party services via the telephone bill, a payment model that has proven
successful among mobile users.
Avatar personalisation
Charging users for personalisation of the avatar gender, body, clothes, voice and personality is an
additional revenue stream for virtual assistant apps. This can be monetised through in-app items, items
marking achievements (badges) or product placements in the form of sponsored items, like a branded
jacket. Product placement is an important up-and-coming revenue model in mobile games as well. One of
the profiled apps, Everfriends, lets the user choose between three avatars and a range of clothes.
Outlook. Virtual assistant vendors do not all agree on the value of providing avatar personalisation.
Everfriends’ publisher, i-Free Innovations, already generates eight percent of its revenue from avatar
personalisation, and sees great potential in this revenue model. It is investing in 3D-animated character
generation. Avatar customisation represents a costly investment, but may pay off if it can be amortised
over a large-enough user base, i.e., in the millions of users.
White label licensing
A number of vendors including xBrainSoft and Artificial Solutions offer white-label virtual assistant
solutions for B2B2C. Not only is the model economically viable, it allows for the training of virtual
assistants within specific verticals.
Outlook. As the VA market becomes more consumer-facing, virtual assistant vendors are looking into
making their development platforms available not only to B2B customers, but to third-party developers.
Smaller or newer players developing their own AI technology may also want to consider this option as a
complement to their consumer-facing business, as it brings in revenue, and vertical training possibilities in
specific environments and across diverse languages.
CHAPTER FIVE
Leaders and challengers in the VA value chain
Building a virtual assistant is a complex undertaking, in part due to the need to assemble building blocks
across the supply chain. It requires licensing and partnership deals with technology vendors, search
engines, ad networks, third-party service providers, app marketplaces and handset makers. In this chapter
we look at the leaders and challengers across the virtual assistant market. Leading VA apps are driven by
R&D efforts. US-based companies lead the pack, with Russian and Indian companies closing in.
Virtual assistant solutions: Leaders
Apple (Cupertino, CA, USA): Apple took a leap forward when acquiring Siri, a spin-off from SRI International, and
integrating it in the iPhone 4S user interface. The Cupertino company is working on the integration of Siri
as a UI beyond the iPhone. In addition, virtual assistant apps sold via the Apple App Store generate the
majority of paid VA app revenue today. See our analysis of Siri in chapter 3.
SRI International (Menlo Park, CA, USA): The founders of Siri continue to drive AI research, focusing
their VA efforts on vertical markets. No less than three new VA implementations are underway. SRI owns a
large number of fundamental AI patents.
Google (Mountain View, CA, USA): Google has a number of apps using speech recognition, of which Voice
Search, via the activation of Voice Actions, tops our ranking. With on-going research into search
technology, speech recognition, augmented reality and translation, not to forget ownership of the fastest
growing mobile platform, Android, the company has a lot to bring to the table. Google is a favoured SR
vendor, as it offers its technology for free. Google is said to be working on a VA of its own, code named
Majel, which could even be combined with the much-talked-about augmented reality glasses, Google Glass.
Nuance (Burlington, MA, USA): With three apps in the top 20 -- Vlingo Virtual Assistant, Dragon Search
and Dragon Go -- Nuance continues to buy and extend its way into speech recognition technology, in
particular Natural Language Processing and speech-to-text. Nuance holds about 2,000 patents, and its
speech recognition engine is used by many VA apps, including six in the top ten.
Microsoft (Redmond, WA, USA): Microsoft’s speech engine Tellme powers the company’s speech-enabled
products and services. It is integrated in WP7 and Xbox Kinect, as well as an array of B2B
applications. Microsoft’s platform is open to developers with a licensing model.
AT&T Labs (Dallas, TX, USA): AT&T holds about 600 patents in the AI space, and uses its own AI engine
(“Watson”) for its top-20 virtual assistant app, Speak4it. The company plans to open its API library in
June 2012 for developers who want to build voice-enabled apps. In the virtual assistant space, AT&T Labs
focuses on multimodal UIs, using speech and gesture recognition.
Virtual assistant solutions: Challengers
Dexetra (Bangalore, India): In the top ten with Iris (Siri written backwards), the company has great
ambitions in the knowledge-enabling VA space, and is also working on voice ads. Its core project is Friday, an
app that maps a user’s personal web trails, as well as content stored in handsets, and makes it searchable.
Pannous (Hasloh, Germany): Besides having the top grossing VA app (Voice Actions -- not to be confused
with Google’s assistant function of the same name), Pannous offers R&D services in the fields of enterprise
search and artificial intelligence.
Speaktoit (Newark, DE, USA): Besides having a consumer-facing VA app in the top ten, Speaktoit
develops custom VAs for third parties. Speaktoit’s founders are from Russia.