Presentations from the AWS Media & Entertainment Seminar on Artificial Intelligence in NYC on August 15, 2017. Attendees spent the afternoon with AWS and a few of our Media and Entertainment customers exploring how M&E organizations can derive higher productivity and new business insight using AI services, platforms, tools and infrastructure on the AWS Cloud.
2. Agenda - AWS AI M&E event – NYC
12:00 - 1:00 PM Check-in, Lunch & Introduction
Ben Masek – AWS BD Lead Media & Entertainment
1:00 - 1:30 PM Overview of AI (AWS perspective)
Introduction of AWS AI services/all up, fits/features and value proposition
David Pearson - AWS BD Lead AI Services
1:30 - 2:15 PM Session 1: Media Use case 1
Best practices for integrating Amazon Rekognition into your own media applications.
Demonstrating practical approaches to enhance your media workflows with Amazon Rekognition, through common use cases
Scott Malkie – AWS Media & Entertainment Specialist Solutions Architect
2:15 - 2:30 PM Break
2:30 - 3:15 PM Session 2: Media Use case 2
Customer (character) Interaction with Lex and Polly. i.e. How to build a bot
Keith Steward – AWS AI Solutions Architect
3:15 - 4:00 PM Session 3: Media Use case 3
An overview of deep learning and how to build a recommendation engine using Apache MXNet
Dan Mbanga – AWS BD Lead for Deep Learning Engines
4:00 - 5:00 PM Cocktail reception & mixer with local AWS team
3. AI across the Media Value Chain
Media Supply Chain
• Pre-processing & Optimization
• Live & VOD Feature Extraction
• Closed Captioning
Broadcast Playout & Distribution
• Ad personalization & content recommendation
• Audience engagement & show selection
OTT
• Ad personalization & content recommendation
• Content routing optimization
• Audience engagement & show selection
DAM & Archive
• Tag on Ingest
• Metadata Augmentation
• Celebrity Detection
Post-Production
• Dailies / Editorial Review
• Application & Filesystem
• Texture & Asset Search
Publishing
• Translation Services
• Audience Engagement
4. A flywheel for media content & consumers
Better Decisions
• Machine Learning
• Deep Learning / AI
• Recommendations
• Greenlight ROI
Better Content
• Better storyline
• Targeted content
More Viewers
• Engaged customers
• Reduced churn
More Data
• Click stream
• User activity
• Purchase history
• Profiles
7. Images – Universal, Ubiquitous, & Essential
There are 3,700,000,000 internet users in 2017
1,200,000,000,000 photos will be taken in 2017 (9% YoY Growth)
Source: InfoTrends Worldwide
8. Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organize millions of images
Object and Scene Detection
Facial Analysis
Face Comparison
Facial Recognition
Celebrity Recognition
Image Moderation
9. Why use Rekognition?
• Object & Scene Detection
Photo-sharing apps can power smart searches
and quickly find cherished memories, such as
weddings, hiking, or sunsets.
• Facial Analysis
Content creators can understand the
demographics and sentiment of audience members.
• Face Comparison
Securely grant access to content or identify
users making important operational decisions.
• Facial Recognition
Archival Footage can leverage face collections
to automate locating persons of interest.
10. Object & Scene Detection
Maple
Villa
Plant
Garden
Water
Swimming Pool
Tree
Potted Plant
Backyard
Flower
Chair
Coffee Table
Living Room
Indoors
Object and scene detection makes it easy for you to add features that search,
filter, and curate large image libraries.
Identify objects and scenes and provide confidence scores
DetectLabels
11. Facial Analysis
Analyze facial characteristics in multiple dimensions
• Demographic Data
• Facial Landmarks
• Sentiment Expressed
• Image Quality (Brightness: 25.84, Sharpness: 160)
• General Attributes
DetectFaces
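To illustrate how the "Sentiment Expressed" attribute might be read from a DetectFaces response, here is a minimal Python sketch. It assumes the boto3 SDK and a response shaped like the documented `FaceDetails` list; the function names `dominant_emotions` and `analyze_faces` are illustrative and do not come from the slides.

```python
def dominant_emotions(response):
    """Return the highest-confidence emotion type for each detected face."""
    results = []
    for face in response.get("FaceDetails", []):
        emotions = face.get("Emotions", [])
        if emotions:
            top = max(emotions, key=lambda e: e["Confidence"])
            results.append(top["Type"])
    return results


def analyze_faces(client, image_bytes):
    """Call DetectFaces with all attributes and summarize the emotions.

    `client` is assumed to be boto3.client("rekognition").
    """
    response = client.detect_faces(
        Image={"Bytes": image_bytes},
        Attributes=["ALL"],  # the default attribute set omits emotions
    )
    return dominant_emotions(response)
```

Keeping the parsing separate from the API call makes the summary logic easy to exercise without AWS credentials.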
12. Face Comparison
Measure the likelihood that faces are of the same person
Similarity 93% Similarity 0%
CompareFaces
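A similarity score like the ones above could be turned into a yes/no verification roughly as follows. This is a sketch under assumptions: it uses the boto3 `compare_faces` call, and the 90% threshold and the helper names are hypothetical choices, not values from the talk.

```python
def is_same_person(response, threshold=90.0):
    """True if any face match meets the similarity threshold (illustrative value)."""
    return any(
        match["Similarity"] >= threshold
        for match in response.get("FaceMatches", [])
    )


def verify_user(client, reference_bytes, candidate_bytes, threshold=90.0):
    """Compare a reference photo with a candidate photo.

    `client` is assumed to be boto3.client("rekognition").
    """
    response = client.compare_faces(
        SourceImage={"Bytes": reference_bytes},
        TargetImage={"Bytes": candidate_bytes},
        SimilarityThreshold=threshold,
    )
    return is_same_person(response, threshold)
```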
14. Celebrity Recognition & Image Moderation
Newly released Rekognition features
Recognize thousands of famous individuals: RecognizeCelebrities
Detect explicit and suggestive content: DetectModerationLabels
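The two operations might be combined into a single screening step along these lines. This is a minimal sketch assuming the boto3 SDK; the confidence cut-offs and the names `celebrity_names`, `is_flagged` and `screen_image` are assumptions for illustration.

```python
def celebrity_names(response, min_confidence=90.0):
    """Names of recognized celebrities at or above a match confidence (assumed cut-off)."""
    return [
        celeb["Name"]
        for celeb in response.get("CelebrityFaces", [])
        if celeb.get("MatchConfidence", 0.0) >= min_confidence
    ]


def is_flagged(response):
    """True if Rekognition returned any moderation label."""
    return bool(response.get("ModerationLabels"))


def screen_image(client, bucket, key):
    """Run both checks on one S3 image; `client` is assumed to be a boto3 Rekognition client."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    celebs = celebrity_names(client.recognize_celebrities(Image=image))
    flagged = is_flagged(
        client.detect_moderation_labels(Image=image, MinConfidence=60.0)
    )
    return {"celebrities": celebs, "flagged": flagged}
```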
15. Interfacing with Rekognition
Build, test and deploy for Rekognition using SDKs & API calls
SDKs: Java, JavaScript, Node.js, Python, Ruby, PHP, .NET, iOS, Android, Xamarin
Or use the AWS CLI
aws rekognition detect-labels --image "S3Object={Bucket=mybucket,Name=image.jpg}" |
grep -E '(Vehicle|Automobile|Car)' | mail -s "Alert! Car on Property!" me@site.com
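The same check as the CLI pipeline can be sketched with the Python SDK. This assumes boto3 with configured credentials; the bucket and object names come from the slide, while `vehicle_labels` and `check_image` are hypothetical helper names.

```python
VEHICLE_LABELS = {"Vehicle", "Automobile", "Car"}


def vehicle_labels(response):
    """Pick out the vehicle-related label names from a DetectLabels response."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Name"] in VEHICLE_LABELS
    ]


def check_image(client, bucket, key):
    """Detect labels for an S3 image and return any vehicle labels found.

    `client` is assumed to be boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return vehicle_labels(response)


# Usage (not run here):
#   hits = check_image(boto3.client("rekognition"), "mybucket", "image.jpg")
#   if hits: send the "Alert! Car on Property!" notification
```

Unlike the `grep` version, this matches whole label names, so a label such as "Carpet" would not trigger the alert.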
17. Rekognition APIs – Overview
Rekognition’s computer vision API operations can be grouped into non-storage API operations and storage-based API operations
Non-storage API Operations
• CompareFaces
• DetectFaces
• DetectLabels
• DetectModerationLabels
• GetCelebrityInfo
• RecognizeCelebrities
Storage-based API Operations
• CreateCollection
• DeleteCollection
• DeleteFaces
• IndexFaces
• ListCollections
• ListFaces
• SearchFaces
• SearchFacesByImage
Sample response fragment (truncated on the slide):
{
"FaceMatches": [
{"Face": {"BoundingB
"Height": 0.2683333456516266,
"Left": 0.5099999904632568,
"Top": 0.1783333271741867,
"Width": 0.17888888716697693},
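A storage-based workflow typically indexes faces into a collection and then searches it. The sketch below assumes the boto3 SDK and an existing collection (created once with `create_collection`); the helper names, the 80% match threshold and the use of `ExternalImageId` as a person identifier are illustrative assumptions.

```python
def best_match(response):
    """Return (ExternalImageId, Similarity) of the closest match, or None."""
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    top = max(matches, key=lambda m: m["Similarity"])
    return top["Face"].get("ExternalImageId"), top["Similarity"]


def enroll_face(client, collection_id, bucket, key, person_id):
    """Index the face in an S3 image under a caller-chosen identifier."""
    return client.index_faces(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        ExternalImageId=person_id,
    )


def find_person(client, collection_id, image_bytes, threshold=80.0):
    """Search the collection for the largest face in the supplied image."""
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        FaceMatchThreshold=threshold,
        MaxFaces=5,
    )
    return best_match(response)
```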
18. Collections and Access Patterns
Logging - public events; visitor logs; digital libraries
• One large collection per event/time period
• Enables wide searches
Social Tagging - photo storage and sharing
• One collection per application user
• Enables automated friend tagging
Person Verification - employee gate check
• One collection for each person to be verified
• Enables detection of stolen/shared IDs
20. Encryption & Security
Best Practices
• APIs - Non-storage vs. storage-based API operations
• Encryption - S3 in-transit and at-rest w/ HTTPS, KMS
• Tampering - Lock down IAM roles and policies
• Content - purge or lifecycle to Glacier w/ vault lock
• Least Privilege - Lambda, EC2 and other infrastructure
• Hydration - EBS encryption for boot, data and snapshotted volumes
21. Interfacing with Rekognition
Optimizing your input & requests for best performance
• S3 input for API calls - max image size of 15MB
• 5MB limit for non-S3 (Base64 encoded) API calls
• Minimum image resolution (x or y) of 80 pixels
• Image data must be in PNG or JPG format
• Max number of faces in a single face collection is 1 million
• The max matching faces the search API returns is 4096
• Use at least 1024 (x or y) px as input – or extract regions
• Size of face should occupy ~5%+ of image for detection
• Collections are for faces!
• Use Amazon CloudWatch to observe & alert on Rekognition Metrics
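The input limits listed above can be checked on the client before a request is sent. This is a minimal sketch, assuming the caller already knows the image dimensions and that "MB" means 1024 × 1024 bytes (the slide does not say); `image_problems` is a hypothetical helper.

```python
MAX_S3_BYTES = 15 * 1024 * 1024      # limit for images referenced in S3
MAX_INLINE_BYTES = 5 * 1024 * 1024   # limit for images sent as bytes
MIN_DIMENSION_PX = 80
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def image_problems(header, size_bytes, width, height, from_s3):
    """Return a list of reasons the image would be rejected (empty if none).

    `header` is the first few bytes of the file, used to sniff the format.
    """
    problems = []
    if not (header.startswith(PNG_MAGIC) or header.startswith(JPEG_MAGIC)):
        problems.append("format must be PNG or JPEG")
    limit = MAX_S3_BYTES if from_s3 else MAX_INLINE_BYTES
    if size_bytes > limit:
        problems.append("image exceeds %d bytes" % limit)
    if min(width, height) < MIN_DIMENSION_PX:
        problems.append("each dimension must be at least 80 px")
    return problems
```

Rejecting bad input locally avoids spending API calls on requests that would fail anyway.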
22. Optimizing your Input Source
Images
• Image enhancement, extraction & stabilization
• Unsharp mask, deconvolution – CPU impact
• e.g. ImageMagick, OpenCV, scikit-image
Video
• Video stabilization - motion / optical flow analysis
• Scene change detection vs. frame extraction
• Offset - PTS vs. seconds and why it matters
• e.g. FFmpeg w/ deshake, vidstab, OpenCV
Optimizing your input & requests for best performance
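On the "PTS vs. seconds" point: a presentation timestamp is counted in stream time-base units and often does not start at zero, so an extracted frame's offset has to be converted before it is stored next to Rekognition results. A minimal sketch, assuming a 90 kHz time base (common for MPEG transport streams) in the usage comment; `pts_to_seconds` is a hypothetical helper.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base, start_pts=0):
    """Convert a presentation timestamp to seconds from the start of the stream.

    pts and start_pts are integers in time-base units; time_base is the
    duration of one unit in seconds, for example Fraction(1, 90000).
    """
    return float((pts - start_pts) * Fraction(time_base))


# Example: a frame at PTS 126000 in a stream that starts at PTS 36000
# with a 1/90000 time base sits 1.0 second into the video.
offset = pts_to_seconds(126000, Fraction(1, 90000), start_pts=36000)
```

Using exact fractions until the final conversion avoids the drift that accumulates when frame rates such as 30000/1001 are rounded early.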
23. Use Cases & Customers
• Digital Asset Management
• Archive Monetization
• Travel and Hospitality
• Influencer Marketing
• Systems Integration
• Digital Advertising
• Consumer Storage
• Law Enforcement
• Public Safety
• eCommerce
• Education
24. Searchable Image Library
Photo Upload Amazon S3 AWS Lambda
Property Search Amazon Elasticsearch
Detect Objects & Scenes
User captures an image
for their property listing
Mobile app uploads
the image to S3
A Lambda function is triggered and
calls Rekognition
Rekognition retrieves the image from S3
and returns labels for the property and
amenities
Lambda pushes the labels and
confidence scores to Elasticsearch
Other users can search properties by
landmarks, category, etc.
Real Estate Property Search
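The Lambda step in this flow might look roughly like the following. It is a sketch under assumptions: the Rekognition client comes from boto3, `index_document` stands in for whatever Elasticsearch client call the application uses, and the document shape is invented for illustration.

```python
from urllib.parse import unquote_plus


def labels_document(bucket, key, response):
    """Build a search document from a DetectLabels response (illustrative shape)."""
    return {
        "bucket": bucket,
        "key": key,
        "labels": [
            {"name": label["Name"], "confidence": label["Confidence"]}
            for label in response.get("Labels", [])
        ],
    }


def handle_s3_event(event, rekognition, index_document):
    """Process an S3 upload event: label each image and push the result to search.

    rekognition: assumed boto3 Rekognition client.
    index_document: hypothetical callable that writes one document to Elasticsearch.
    """
    documents = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=20,
            MinConfidence=70.0,
        )
        document = labels_document(bucket, key, response)
        index_document(document)
        documents.append(document)
    return documents
```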
25. Searchable Image Library
• Optimize the client
• Event based, decoupled infra
• Buffering - SQS, SNS, Kinesis
• Rate Control - high volume S3
image ingest
• Dynamo – scale label storage
• Elasticsearch - operational &
performance statistics
• CloudFront - search cache
Real Estate Property Search
Services: AWS Lambda, Amazon S3, Amazon SQS, AWS CloudFormation, Amazon CloudWatch, Amazon Kinesis, Amazon CloudFront, Amazon DynamoDB, Amazon Elasticsearch
26. Sentiment Analysis
Amazon Redshift, Amazon QuickSight
Live Subject Focus Group Camera Application
Amazon S3
Analyze Faces
Viewers watching
pre-release content
Audience cameras
capture live images of
viewers
A Lambda function is
triggered and calls
Rekognition
Rekognition analyzes the image and
returns facial attributes detected, which
include emotion and demographic detail
Return data is normalized
and staged in S3 en route
to Redshift
Marketing Reports
Periodic ingest of data
into Redshift
Regular analysis to identify
trends in demographic activity
and in-store sentiment over time
Trend reporting for audience measurement
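The "normalized and staged in S3" step could flatten each DetectFaces response into one row per face and emotion, written as newline-delimited JSON for a later load into Redshift. This is a sketch; the column names, the `camera_id` field and both helper names are assumptions.

```python
import json


def emotion_rows(response, captured_at, camera_id):
    """Flatten a DetectFaces response into one row per face per emotion."""
    rows = []
    for face_index, face in enumerate(response.get("FaceDetails", [])):
        age = face.get("AgeRange", {})
        for emotion in face.get("Emotions", []):
            rows.append({
                "captured_at": captured_at,
                "camera_id": camera_id,
                "face_index": face_index,
                "age_low": age.get("Low"),
                "age_high": age.get("High"),
                "emotion": emotion["Type"],
                "confidence": emotion["Confidence"],
            })
    return rows


def to_json_lines(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)
```

The resulting text can be uploaded with an S3 `put_object` call; no image data is included, only the derived attributes.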
27. Sentiment Analysis
• Reduce API volume - perform motion detection and frame pre-processing on the capture device
• Customer photos not stored on
AWS
• S3 - A file content lake to feed
other services – EMR, Lambda
• Resell as a service – API
gateway + Lambda + S3
Trend reporting for audience reactions
Services: AWS Lambda, Amazon S3, Amazon SQS, AWS CloudFormation, Amazon CloudWatch, Amazon Elasticsearch, Amazon Redshift, Amazon QuickSight, Amazon API Gateway*
28. Automating Footage Tagging with Amazon Rekognition
• Built in 3 weeks
• Indexed against 99,000 people
• Index created in one day
• Saved ~9,000 hours a year in manual curation costs
• Live video with frame sampling
Previously, only about half of all footage was indexed because of the immense time required by manual processes
30. Rekognition - Summary
• Leverage Amazon internal experience with
AI, ML, and Computer Vision
• Managed API services with embedded AI for
maximum accessibility and simplicity
• Full stack of deep learning image processing
algorithms for specialized applications
• Integrates natively with other AWS Services
• Extensible by design
35. Gartner quote:
“By 2020, the average person will have more
conversations with bots than with their spouse.”
36. Business Value of Bots -- Fundamentals
• Help Answer Questions
• Perform tasks on behalf of users
• Increase efficiency and/or user experience
• Enable easier access to relevant data (Enterprise
systems are highly complex to use!)
• More informed decision making
• Streamline processes
37. Business Value of Bots -- Specifics
• Help customers use your growing number and complexity of
offerings
• Customer Service Improvements
• Fact: Customers hate dealing with today’s Customer Service
• Fact: Highest cost in the Contact Center is labor
• Aim: Fulfill majority of common customer questions
• Aim: Provide instant recommendations in the context of current conversation
• Aim: Hand off requests requiring human interaction to customer support staff
• Aim: Reduce operational costs
• Aim: Strike balance between automation & human assistance
• Help staff leverage internal data assets, functions, and business
processes in the most efficient manner
38. Amazon Polly: Life-like Speech Service
Converts text to life-like speech
48 voices, 24 languages
Low latency, real time
Fully managed
Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
2. Intelligible and Easy to Understand
3. Add Semantic Meaning to Text
4. Customized Pronunciation
Articles and Blogs
Training Material
Chatbots (Lex)
Public Announcements
39. Amazon Polly: Common Use Cases
• Internet of Things (smart home, connected devices)
• Education (language learning, training videos)
• Voiced Media (news, blogs, email)
• Voiced Chat Bots (Amazon Lex, Alexa skills)
• Gaming (avatars, Amazon Lumberyard)
#VoiceFirst Movement
42. The Advent of Conversational Interactions
1st gen: Machine-oriented interactions
2nd gen: Control-oriented & translated
3rd gen: Intent-oriented
43. Amazon Lex ... for Conversational Interactions
Powered by the same deep learning technology as Alexa
Enterprise SaaS Connectors
Deployment to chat platforms, like Slack, Facebook
Messenger, Twilio SMS
Build Voice and Text Chatbots
Interactions on mobile, web, and devices
44. A Sample Amazon Lex Conversation: Booking a Hotel
Intents: BookHotel
Utterances: Spoken or typed phrases that invoke your intent
Slots: input
Fulfillment
45. Amazon Lex Use Cases
Informational Bots
Chatbots for everyday consumer requests
• News updates
• Weather information
• Game scores ….
Application Bots
Build powerful interfaces to mobile applications
• Book tickets
• Order food
• Manage bank accounts ….
Enterprise Productivity Bots
Streamline enterprise work activities and improve efficiencies
• Check sales numbers
• Marketing performance
• Inventory status ….
Internet of Things (IoT) Bots
Enable conversational interfaces for device interactions
• Wearables
• Appliances
• Auto ….
46. “Coffee Bot” – what we will build today
User: “I’d like to order a small Mocha”
Automatic Speech Recognition: Order Small Mocha
Natural Language Understanding (Intent/Slot Model, Utterances): CafeOrderBeverage, Small Mocha
Coffee Bot slots: Type Mocha, Size Small, Temperature Hot
Confirmation prompt: “You’d like a small mocha, is that right?”
Confirmation (spoken by Polly): “Your mocha will be available soon!”
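A Lambda fulfillment function for an intent like CafeOrderBeverage could be sketched as below. It assumes the Amazon Lex (V1) Lambda event and response format; the slot name `BeverageType` is a hypothetical stand-in for the "Type" slot on the slide, and this is not the lab's actual code.

```python
def close(message):
    """Build a Lex response that ends the conversation as fulfilled."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }


def lambda_handler(event, context):
    """Fulfill the order once Lex has collected and confirmed the slots."""
    intent = event["currentIntent"]
    if intent["name"] != "CafeOrderBeverage":
        raise ValueError("Unsupported intent: " + intent["name"])
    slots = intent.get("slots") or {}
    # "BeverageType" is an assumed slot name, not taken from the slides.
    beverage = (slots.get("BeverageType") or "drink").lower()
    return close("Your {} will be available soon!".format(beverage))
```

When the bot is used by voice, Lex can return the same message as synthesized speech, which is where Polly appears in the diagram.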
47. AWS Mobile Hub Integration
Authenticate users
Analyze user behavior
Store and share media
Synchronize data
More ….
Track retention
Conversational Bots
AWS Mobile SDKs
AWS Mobile Hub
48. Time to code: Build CoffeeBot
https://s3.us-east-2.amazonaws.com/mast-mast/public/labs/lex-pressobot/readme.html
http://bit.ly/2nG8oIG
• Intents, Slot Types, Utterances
• Lambda
• Mobile Hub integration
• CloudWatch metrics
77. Agenda - AWS AI M&E event – NYC
12:00 - 1:00 PM Check-in & Lunch
1:00 - 1:30 PM AI and Deep Learning at Amazon
Introduction to AWS AI services for media and entertainment
1:30 - 2:15 PM Best Practices for integrating Amazon Rekognition into media applications
2:15 - 2:30 PM Break
2:30 - 3:15 PM Interaction with Lex and Polly, i.e. how to build a bot
3:15 - 4:00 PM An overview of deep learning and how to build a recommendation engine using Apache MXNet
4:00 - 5:00 PM Cocktail reception & mixer with local AWS team
87. Amazon AI
Intelligent Services Powered By Deep Learning
Automate manual, effort-intensive processes
Engage audiences, customers, and employees
Optimize product quality and customer experiences
88. Amazon Lex is a service for building
conversational interfaces into any application
• Embedded AI service
• Enables intent-driven interactions
• Supports both voice and text conversations
• Deployable across mobile and messaging platforms
• Integrated speech recognition and natural language
understanding
89. Developer challenges
Conversational interfaces need to combine a large number of sophisticated algorithms and technologies
• Speech Recognition
• Language Understanding
• Business Logic
• Disparate Systems
• Authentication
• Messaging platforms
• Scale
• Testing
• Security
• Availability
• Mobile
90. Conversational chatbots
User: I’d like to book a flight to London
Bot: Sure! Do you want to fly to Heathrow or Gatwick?
User: Heathrow, please (Destination: LHR)
Bot: When would you like to fly?
User: Next Sunday (Departure: 6/11/2017)
91. Amazon Lex and active audience engagement
• Fans interact with character bots via social media,
mobile and web apps
• Employees use help desk assistants to simplify access
to company information and services
• Authenticated customers speak to customer service
bots to retrieve their current account status
• Executives use voice requests to gain fast insights into
their business data from within enterprise applications
94. Amazon Polly: Life-like Speech Service
Converts text to life-like speech
48 voices, 24 languages
Low latency, real time
Fully managed
Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
2. Intelligible and Easy to Understand
3. Add Semantic Meaning to Text
4. Customized Pronunciation
95. Amazon Polly media use cases
• Voice-enabled fan interaction. Add voice to text
chatbots for a hands-free option
• Listen to the news. Multi-channel news and blog
article experience for users
• Synthetic character voices. Synchronize speech
with on-screen content and customize pronunciation
• Executive assistant. Spoken results of information
requests from internal users
96. Whisper Voice and Speech Marks
Whisper SSML effect: <amazon:effect name="whispered">
Synchronize Speech for an Enhanced Visual Experience
• Request an additional stream of metadata about sentence and word timings
• Use the metadata stream alongside the synthesized
speech audio stream to sync audio and visual
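Both features can be sketched with the Python SDK: wrap the text in the whisper effect, then request speech marks instead of audio to get the timing metadata. This assumes boto3's `synthesize_speech` call; the voice "Joanna" and the helper names are illustrative choices.

```python
import json
from xml.sax.saxutils import escape


def whisper_ssml(text):
    """Wrap plain text in the SSML whispered effect, escaping XML characters."""
    return (
        '<speak><amazon:effect name="whispered">'
        + escape(text)
        + "</amazon:effect></speak>"
    )


def parse_speech_marks(payload):
    """Parse the newline-delimited JSON that Polly returns for speech marks."""
    return [
        json.loads(line)
        for line in payload.decode("utf-8").splitlines()
        if line.strip()
    ]


def word_timings(client, text, voice_id="Joanna"):
    """Request sentence and word timings; `client` is assumed to be boto3.client("polly")."""
    response = client.synthesize_speech(
        Text=whisper_ssml(text),
        TextType="ssml",
        OutputFormat="json",  # speech marks are returned as JSON, not audio
        SpeechMarkTypes=["sentence", "word"],
        VoiceId=voice_id,
    )
    return parse_speech_marks(response["AudioStream"].read())
```

A second request with an audio `OutputFormat` and the same SSML yields the matching audio; each mark's `time` field (in milliseconds) can then drive captions or character animation.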
97. Amazon Rekognition
Extract rich metadata from visual content
Object and Scene Detection
Facial Analysis
Face Comparison
Facial Recognition
Celebrity Recognition
Image Moderation
98. Celebrity recognition and image moderation
Newly released Rekognition features
Recognize thousands of famous individuals: RecognizeCelebrities
Detect explicit and suggestive content: DetectModerationLabels
99. Object and scene detection
Object and scene detection makes it easy for you to add features that search,
filter, and curate large image libraries.
Identify objects and scenes and provide confidence scores
DetectLabels
Flower
Arrangement
Chair
Coffee Table
Living Room Indoors
Furniture
Cushion
Vase
Maple
Villa
Plant
Garden
Water
Swimming Pool
Tree
Potted Plant
Backyard
Patio
104. Automating footage tagging with Amazon Rekognition
• Built in 3 weeks
• Indexed against 99,000 people
• Index created in one day
• Saved ~9,000 hours a year in manual curation costs
• Live video with frame sampling
Previously, only about half of all footage was indexed because of the immense time required by manual processes
105. Amazon Rekognition media use cases
• Automated metadata. Classification of video and still
images – objects, talent, sentiment, demographics
• Audience sentiment tracking. Touchless, time-aligned
capture of audience reactions during a video screening
• Dynamic second screen. OTT users seamlessly
access details of actors appearing on screen in real-time
• Content rating. Identify suggestive and explicit content
to improve alignment with rating standards
106. Use Cases Across Media Segments
Acquisition
• Pre-processing & optimization
Digital Supply Chain
• Tag on Ingest
• Live and VOD Feature Extraction
• Celebrity Detection
DAM & Archive
• Auto-categorization
• Metadata Augmentation
Visual Effects & Editing
• Application & Filesystem
• Texture & Asset Search
Analytics
• Sentiment Analysis
• Other Amazon AI Services (Lex, Polly ++)
Publishing
• Value Add
• API-based services
OTT
• Filtering & Quality Control
Playout & Distribution
• Filtering & Quality Control
107. Amazon AI
Intelligent Services Powered By Deep Learning
https://aws.amazon.com/blogs/ai/
https://aws.amazon.com/amazon-ai/
The very first AWS AI event focused entirely on M&E customers and media workflows.
AI has been used in M&E for many years, especially in areas such as content recommendation (by companies such as Netflix) and metadata augmentation with image recognition and audio content recognition (for stock footage and content indexing). Over the past year, we have seen increased interest among our customers and many new use cases, most of which are designed to 1) enhance the viewer’s media consumption experience, 2) automate production processes, or 3) optimize delivery of media content.
You will never take fewer photos than the year before
This presents a massive challenge for businesses and consumers who want to extract contextual information from their photos
This is also a challenge for customers moving digital archives to the cloud
- outcome, api, architecture, why
use the similarity score to verify a user against a reference photo in near real time
The API operations in this group do not persist any information on the server. You provide input images, the API performs the analysis, and returns results, but nothing is saved on the server.
- Logging – one collection per event. A corporate summer party hires photographers, who take many photos and upload them into a single collection, making it easy for attendees to find images of themselves by selecting a reference image
- Social Tagging – one collection is created for each application user, containing each user’s social media friends
- https://www.flickr.com/photos/31349545@N00/4898256954/
Cognito authentication, API Gateway, reducing size of image before shipping over the wire
SNS, SQS, Kinesis streams to decouple and rate control the Rekognition API
3 tiers that we can focus on for this solution to design for scale, capacity and reliability
remove store locations => screening rooms or audience analysis
Built in 3 weeks; started at re:Invent after seeing the announcement
Previously, only about half of all footage was indexed because of the immense time required by manual processes. Now 100%.
C-SPAN, an acronym for Cable Satellite Public Affairs Network, is an American public TV network. C-SPAN uses Amazon Rekognition to identify who is on camera at what time for each of C-SPAN’s eight networks, so that recorded video streams can be indexed and searched. Using AWS, C-SPAN indexed 97,000 people from the C-SPAN database in one day, and it can sample a frame every six seconds for recognition against indexed faces.
Integration with Mobile Hub
Deployment to Chat services
Versioning