Human-to-human electronic communication has moved from text (email) to voice (VoIP) to augmented video (Zoom/Skype). Similarly, the medium for human-to-machine conversation has moved from text (chatbots) to voice, with voice-enabled chatbots in wide use today. The next step in this evolution is a video-enabled conversational experience. Each change of medium brings its own technical challenges. Creating a good voice experience involves more than just hooking up a chatbot to text-to-speech and speech-to-text services. Vocinity has developed a platform for voice-enabled chatbots that has been in production for almost two years. We're updating our platform to support a multimedia experience where the bot communicates via video, voice, text messages, and images. Using Rasa to provide the conversational logic for the immersive multimedia bot enables us to meet the challenges of voice and video communication, because Rasa's power and flexibility let us extend it to support both.
Presented by Nathan Stratton, CTO of Vocinity, at the 2021 Rasa Summit https://rasa.com/summit/
Using Rasa to Power an Immersive Multimedia Conversational Experience | Rasa Summit
1. SPEAKER
INTRODUCTION
NATHAN STRATTON IS A PASSIONATE TECHNOLOGIST WITH A
REPUTATION FOR “DOING THE IMPOSSIBLE.” A HIGH SCHOOL
DROPOUT, HIS INTELLECT AND PASSION HAVE MADE HIM A
5-TIME STARTUP LEADER. HE STARTED HIS FIRST COMPANY IN HIS
JUNIOR YEAR IN HIGH SCHOOL AND SHORTLY THEREAFTER
HELPED LAUNCH 26 CLECS. THROUGHOUT HIS CAREER, NATHAN
HAS BEEN ON THE LEADING EDGE AT THE INTERSECTION OF
VOICE, VIDEO AND THE INTERNET, INCLUDING:
2. SPEAKER INTRODUCTION (CONTINUED)
The first video telephone company
The first to directly interconnect via SIP with two major national US providers
The first VoIP provider to expand local calling not just to the US, but to 28 countries
The first to virtualize BroadSoft’s BroadWorks telephony server
The first VoIP provider to allow users to bring their own device (BYOD)
The first 2-line carrier-less phone with VoIP and PSTN lines
The first to use generic x86 PC boxes as a core route server for internet traffic
The first to support over 120 different SIP user agents on a single network
The first VoIP provider to launch a Wi-Fi phone
NATHAN IS CURRENTLY THE FOUNDER AND CTO OF VOCINITY, WHICH HE STARTED JUST ABOUT THREE
YEARS AGO. VOCINITY IS A VOICE AND VIDEO CUSTOMER ENGAGEMENT PLATFORM. THE PLATFORM
PROVIDES VOICE AND VIDEO CONVERSATION AGENTS, SUPPORTED BY RASA TECHNOLOGY, TO ADDRESS
CUSTOMER NEEDS IN RETAIL, HEALTHCARE, FINANCIAL SERVICES AND HOSPITALITY.
3. THE NOW GENERATION CUSTOMER
EXPERIENCE
WHY ARE VOICE AND VIDEO
INCREASINGLY
IMPORTANT FOR
CUSTOMER EXPERIENCE?
Nathan Stratton, Founder, CTO
5. WHAT MARKETERS
WANT NOW!
More Engaging Experiences
Improved ROI (Print & Digital)
Personalized Journeys
Customer Loyalty
Higher Conversion Rates
6. BUT THE REALITY IS
VERY DIFFERENT
Personnel Costs Are Soaring
Omnichannel Experiences Out of Sync
Chatbots Still Require Typing (especially on
smartphones)
Contactless Is The New Norm
Budgets Are Shrinking
7. VOICE HAS ADVANTAGES OVER TEXT-BASED BOTS
5 to 7 times faster than typing
Hands-free interactions
More natural conversations
A PICTURE IS WORTH A THOUSAND WORDS
BUT VOICE BOTS HAVE CHALLENGES TOO
Real-time interactions require “engagement”
Users quickly lose focus (lots of distractions)
The average person remembers a fraction of what is said to them
Technically, it is harder to remain in context and filter out “nonsense”
8. LINKEDIN, FACEBOOK, AMAZON,
AND OTHERS ALL NOW HAVE LIVE
STREAMING OPTIONS.
TIKTOK, INSTAGRAM, YOUTUBE
HAVE DEMONSTRATED THE VALUE
AND IMPORTANCE OF VIDEO.
9. THE POWER OF VIDEO IS UNDENIABLE
82% prefer live video from a brand to social posts.
Enjoyment of video increased sales intent by 97% and brand association by 139%.
Video on a landing page can increase conversion by 80% or more.
After watching a video, 64% of users are more likely to buy a product online.
80% of audiences would rather watch live video from a brand than read a blog.
Video in an email leads to a 200-300% increase in click-through rate.
39% of executives call a vendor after viewing a video.
59% of executives would rather watch a video than read text.
73% of B2B businesses using live video report positive results to their ROI.
10. BUT REAL-TIME VIDEO INTERACTIONS
ARE HARD!
Requires broad domain expertise across SIP, NLU, NLP, Video, Broadcasting, WebRTC, etc.
Need to create and manage microsites, content and video engagement experiences
WebRTC only recently became a standard across all platforms and browsers
“Digital Humans” are still evolving (GPUs, FPS, Continuous Viewing)
11. WHAT IS A
VIDEO
ENGAGEMENT
PLATFORM?
MULTILINGUAL VIDEO AVATAR
MULTIMEDIA GATEWAY
WebRTC, SIP, gRPC, Broadcast Services
MICROSITE PROXY SERVER
HTML/CSS, Template, Interactive Media
VOICE CORE PLATFORM
ASR, NLP/NLU, TTS, Agent Builder
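The voice core components listed on this slide (ASR, NLP/NLU, TTS) can be pictured as one conversational turn flowing through three stages. The following is only a minimal sketch under stated assumptions, not Vocinity's implementation: `fake_asr`, `fake_tts`, `echo_nlu`, `BotReply` and `handle_turn` are hypothetical names, and the speech stages are stand-ins that simply treat audio bytes as UTF-8 text. In a real deployment the `nlu` callable would presumably forward the transcript to a Rasa server (for example through one of its input channels) and return the bot's text responses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BotReply:
    """One bot response, as text plus its synthesized audio."""
    text: str
    audio: bytes


def fake_asr(audio: bytes) -> str:
    # Hypothetical stand-in for a speech-to-text service:
    # the "audio" is just UTF-8 encoded text.
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    # Hypothetical stand-in for a text-to-speech service.
    return text.encode("utf-8")


def echo_nlu(sender_id: str, text: str) -> List[str]:
    # Hypothetical stand-in for the dialogue engine (e.g. a Rasa server).
    return [f"You said: {text}"]


def handle_turn(
    sender_id: str,
    audio: bytes,
    asr: Callable[[bytes], str] = fake_asr,
    nlu: Callable[[str, str], List[str]] = echo_nlu,
    tts: Callable[[str], bytes] = fake_tts,
) -> List[BotReply]:
    """Run one user utterance through ASR -> NLU -> TTS."""
    transcript = asr(audio).strip()
    if not transcript:
        # Nothing intelligible was heard; skip the dialogue engine entirely.
        return []
    return [BotReply(text=r, audio=tts(r)) for r in nlu(sender_id, transcript)]


if __name__ == "__main__":
    for reply in handle_turn("caller-1", b"hello"):
        print(reply.text)
```

Passing the three stages in as callables keeps the turn logic independent of any particular speech or dialogue vendor, which is one plausible way to swap services without touching the conversation flow.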
12. AGENT BUILDER
OVERLAY IMAGES & VIDEOS
Overlay images and videos based on time or dialogue duration
BUILT FROM SCRATCH
Much more complicated than just intents and responses
MULTIPLE ACTION TYPES
Transfers, pauses, use API data, send SMS/text, DTMF, etc.
No-code approach: on-the-fly building, compiling, containerizing and deploying
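To make the ideas on this slide concrete, here is a minimal sketch of how a dialogue step with typed actions and time-based overlays might be modeled. It is an illustration only and does not reflect the actual Agent Builder: the `Overlay` and `Action` classes, the `ALLOWED_KINDS` set, and the `validate` and `active_overlays` helpers are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical action kinds, loosely based on the types named on the slide.
ALLOWED_KINDS = {"say", "pause", "transfer", "api_call", "send_sms", "dtmf"}


@dataclass
class Overlay:
    """An image or video shown over the avatar for a time window."""
    media_url: str
    start_s: float      # seconds after the action starts
    duration_s: float   # how long the overlay stays visible

    def visible_at(self, t: float) -> bool:
        return self.start_s <= t < self.start_s + self.duration_s


@dataclass
class Action:
    """One step in an agent flow, with optional overlays."""
    kind: str
    payload: str = ""
    overlays: List[Overlay] = field(default_factory=list)


def validate(flow: List[Action]) -> None:
    """Reject flows that use an action kind the sketch does not know."""
    for index, action in enumerate(flow):
        if action.kind not in ALLOWED_KINDS:
            raise ValueError(f"step {index}: unknown action kind {action.kind!r}")


def active_overlays(action: Action, t: float) -> List[str]:
    """Return the media that should be on screen t seconds into an action."""
    return [o.media_url for o in action.overlays if o.visible_at(t)]


if __name__ == "__main__":
    greeting = Action(
        kind="say",
        payload="Welcome!",
        overlays=[Overlay("logo.png", 0, 5), Overlay("promo.mp4", 3, 10)],
    )
    validate([greeting, Action(kind="pause", payload="2")])
    print(active_overlays(greeting, 4))
```

Keeping overlays attached to the action they accompany means the timing is expressed relative to that step of the dialogue rather than to the whole session, which matches the slide's "based on time or dialogue duration" wording.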
13. OMNICHANNEL DEPLOYMENTS
Websites
Kiosks & Digital Signage
Mobile Applications
QR Codes
15. INTERACTIVE RICH MEDIA
EXPERIENCES
On-demand videos or text messages
Escalation, in context, to live rep (if needed)
Share links (SMS/Text)
Voice over images for additional context
17. WHY A VIDEO ENGAGEMENT
PLATFORM?
25-200% reduction in cost (versus employee or contractor)
Always available, always ready. Never gets sick. (24x7x365 Coverage)
Escalation, in context, to Live Agent (when needed)
Diminished COVID liability
Digital twin of your best and brightest, and it gets smarter over time.
19. TECHNOLOGY APPROACH
Artificial intelligence technology maintains a natural conversation
Cloud technology lowers customer acquisition and operations costs
Scalable and secure offering across any digital medium
Machine learning algorithms support narrow-scope conversations
BENEFITS
Faster than typing and texting
Hands-free, eyes-free
Better, more natural experience
Near-human recognition levels
V-BOT™