AWS AI Media & Entertainment Seminar - NYC, August 15, 2017

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AI for Media & Entertainment
AI Media Customer Event NY, Aug 15 2017

Agenda - AWS AI M&E event – NYC
12:00 - 1:00 PM Check-in, Lunch & Introduction
Ben Masek – AWS BD Lead Media & Entertainment
1:00 - 1:30 PM Overview of AI (AWS perspective)
Introduction of AWS AI services/all up, fits/features and value proposition
David Pearson - AWS BD Lead AI Services
1:30 - 2:15PM Session 1: Media Use case 1
Best Practices for integrating Amazon Rekognition into Your own Media Applications.
Demonstrating practical approaches to enhance your media workflows with Amazon
Rekognition, through common use cases
Scott Malkie – AWS Media & Entertainment Specialist Solutions Architect
2:15 - 2:30 PM Break
2:30 - 3:15 PM Session 2: Media Use case 2
Customer (character) Interaction with Lex and Polly. i.e. How to build a bot
Keith Steward – AWS AI Solutions Architect
3:15 - 4:00 PM An overview of deep learning and how to build a recommendation engine using
Apache MXNet
Dan Mbanga – AWS BD Lead for Deep Learning Engines
4:00 - 5:00 PM Cocktail reception & mixer with local AWS team

AI across the Media Value Chain
Pre-processing &
Optimization
Live & VOD Feature Extraction
Close Caption
Media Supply Chain
Broadcast Playout &
Distribution
Ad personalization &
content recommendation
Audience engagement & show
selection
OTT
Ad personalization &
content recommendation
Content routing optimization
Audience engagement & show selection
DAM & Archive
Tag on Ingest
Metadata Augmentation
Celebrity Detection
Post-Production
Dailies / Editorial Review
Application & Filesystem
Texture & Asset Search
Publishing
Translation Services
Audience Engagement

A flywheel for media content & consumers
Machine Learning, Deep Learning / AI
Better Decisions: Recommendations, Greenlight ROI
Better Content: Better storyline, Targeted content
More Viewers: Engaged customers, Reduced churn
More Data: Click stream, User activity, Purchase history, Profiles

Scott Malkie || malkie@amazon.com
15 AUG 2017
Amazon Rekognition Best Practices
Integrating Deep Learning-based Image Analysis into your own
Applications

Images – Universal, Ubiquitous, & Essential
There are 3,700,000,000 internet users in 2017
1,200,000,000,000 photos will be taken in 2017 (9% YoY Growth)
Source: InfoTrends Worldwide

Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organize millions of images
Object and Scene
Detection
Facial
Analysis
Face
Comparison
Facial
Recognition
Celebrity
Recognition
Image
Moderation

Why use Rekognition?
• Object & Scene Detection
Photo-sharing apps can power smart searches
and quickly find cherished memories, such as
weddings, hiking, or sunsets.
• Facial Analysis
Content creators can understand the
demographics and sentiment of audience members.
• Face Comparison
Securely grant access to content or identify
users making important operational decisions.
• Facial Recognition
Archival Footage can leverage face collections
to automate locating persons of interest.

Object & Scene Detection
Maple
Villa
Plant
Garden
Water
Swimming Pool
Tree
Potted Plant
Backyard
Flower
Chair
Coffee Table
Living Room
Indoors
Object and scene detection makes it easy for you to add features that search,
filter, and curate large image libraries.
Identify objects and scenes and provide confidence scores
DetectLabels
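
A minimal sketch of how DetectLabels output might feed a search or filter feature, assuming Python with boto3. The helper names (`fetch_labels`, `confident_labels`), the 80% cut-off, and the sample values are illustrative assumptions, not part of the slides.

```python
def fetch_labels(bucket, key, max_labels=20):
    """Call DetectLabels on an image stored in S3 (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helper below works without it
    client = boto3.client("rekognition")
    return client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
    )


def confident_labels(response, min_confidence=80.0):
    """Return label names at or above min_confidence, highest confidence first."""
    kept = [label for label in response.get("Labels", [])
            if label["Confidence"] >= min_confidence]
    kept.sort(key=lambda label: label["Confidence"], reverse=True)
    return [label["Name"] for label in kept]


# Hand-written response in the DetectLabels shape, used instead of a live call.
sample = {"Labels": [
    {"Name": "Backyard", "Confidence": 91.0},
    {"Name": "Swimming Pool", "Confidence": 98.5},
    {"Name": "Maple", "Confidence": 62.3},
]}
print(confident_labels(sample))
```

Filtering on the returned confidence score keeps low-certainty labels out of a search index; the right threshold depends on the library being curated.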

Facial Analysis
Analyze facial characteristics in multiple dimensions
Demographic Data
Facial Landmarks
Sentiment Expressed
General Attributes
Image Quality: Brightness 25.84, Sharpness 160
DetectFaces

Face Comparison
Measure the likelihood that faces are of the same person
Similarity 93% Similarity 0%
CompareFaces
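
A small sketch of turning a CompareFaces result into a yes/no decision. It assumes the response shape returned by boto3's `compare_faces(SourceImage=..., TargetImage=..., SimilarityThreshold=...)`; the helper name `same_person` and the 90% threshold are hypothetical choices.

```python
def same_person(response, threshold=90.0):
    """True if any face in the target image matches the source face
    with a similarity at or above the threshold."""
    return any(match["Similarity"] >= threshold
               for match in response.get("FaceMatches", []))


# Hand-written responses mirroring the two cases on the slide (93% and 0%).
match = {"FaceMatches": [{"Similarity": 93.0}], "UnmatchedFaces": []}
no_match = {"FaceMatches": [], "UnmatchedFaces": [{}]}
print(same_person(match), same_person(no_match))
```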

Facial Recognition
Find similar faces in a large collection of images
Search
Index
Collection
SearchFacesByImage
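
The Collection, Index, Search flow can be sketched roughly as below, assuming Python with boto3. The collection, bucket, key, and person identifiers are placeholders, and `best_match` is a hypothetical helper that picks the closest face from a SearchFacesByImage response.

```python
def index_and_search(collection_id, bucket, known_key, person_id, query_key):
    """Index one known face, then search the collection with a query image.
    Needs AWS credentials; create_collection fails if the collection exists."""
    import boto3  # imported lazily so best_match() works without it
    client = boto3.client("rekognition")
    client.create_collection(CollectionId=collection_id)
    client.index_faces(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": known_key}},
        ExternalImageId=person_id,  # your own identifier for this face
    )
    return client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": query_key}},
        FaceMatchThreshold=80.0,
        MaxFaces=5,
    )


def best_match(response):
    """Return (ExternalImageId, Similarity) of the closest face, or None."""
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    top = max(matches, key=lambda m: m["Similarity"])
    return top["Face"].get("ExternalImageId"), top["Similarity"]


# Hand-written response in the SearchFacesByImage shape.
sample = {"FaceMatches": [
    {"Similarity": 88.0, "Face": {"FaceId": "f-2", "ExternalImageId": "guest-17"}},
    {"Similarity": 97.1, "Face": {"FaceId": "f-1", "ExternalImageId": "anchor-jane-doe"}},
]}
print(best_match(sample))
```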

Celebrity Recognition & Image Moderation
Newly released Rekognition features
Recognize thousands of famous individuals: RecognizeCelebrities
Detect explicit and suggestive content: DetectModerationLabels

Interfacing with Rekognition
Build, test and deploy for Rekognition using SDKs & API calls
Ruby, iOS, Python, Android, Node.js, .NET, PHP, AWS CLI, JavaScript, Java, Xamarin
Or use the AWS CLI:
aws rekognition detect-labels --image "S3Object={Bucket=mybucket,Name=image.jpg}" |
grep -E '(Vehicle|Automobile|Car)' | mail -s "Alert! Car on Property!" me@site.com

Integrating & Extending Rekognition
3rd Party Software
• AWS AI AMI
• OpenCV
• ImageMagick
• FFMPEG
• YOLO, CCV, …
AWS Services
• Amazon S3
• AWS Lambda
• AWS SQS, SNS
• AWS Batch
• Amazon EC2
AWS Partners
• Media Intelligence
• Asset Management
• Image Hosting
• Image Processing
• Brand Management

Sample response (truncated on the slide):
{
"FaceMatches": [
{"Face": {"BoundingB…
"Height": 0.2683333456516266,
"Left": 0.5099999904632568,
"Top": 0.1783333271741867,
"Width": 0.17888888716697693},
…
Rekognition APIs – Overview
Rekognition’s computer vision API operations can be grouped into
non-storage API operations and storage-based API operations
Non-storage API Operations
• CompareFaces
• DetectFaces
• DetectLabels
• DetectModerationLabels
• GetCelebrityInfo
• RecognizeCelebrities
Storage-based API Operations
• CreateCollection
• DeleteCollection
• DeleteFaces
• IndexFaces
• ListCollections
• ListFaces
• SearchFaces
• SearchFacesByImage

Collections and Access Patterns
Logging - public events; visitor logs; digital libraries
• One large collection per event/time period
• Enables wide searches
Social Tagging - photo storage and sharing
• One collection per application user
• Enables automated friend tagging
Person Verification - employee gate check
• One collection for each person to be verified
• Enables detection of stolen/shared IDs
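
One way to make the three access patterns concrete is a naming helper that maps each pattern to its collection granularity. This is only a sketch: the prefixes and the function name are invented for illustration, and the allowed character set reflects the documented collection ID constraint as I understand it.

```python
import re

# Collection IDs are limited to letters, digits, underscore, dot and hyphen
# (assumption based on the service documentation; verify before relying on it).
VALID_ID = re.compile(r"^[a-zA-Z0-9_.\-]+$")

# Hypothetical prefixes, one per access pattern from the slide.
PREFIXES = {
    "logging": "event",        # one large collection per event/time period
    "social": "user",          # one collection per application user
    "verification": "person",  # one collection per person to be verified
}


def collection_id(pattern, key):
    """Build a collection ID for the given access pattern and key."""
    cid = f"{PREFIXES[pattern]}-{key}"
    if not VALID_ID.match(cid) or len(cid) > 255:
        raise ValueError(f"invalid collection id: {cid!r}")
    return cid


print(collection_id("logging", "2017-08-15"))
print(collection_id("social", "u123"))
```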

Rekognition APIs – Advanced Usage
Decision trees and processing pipelines
Why?
• Many use cases require more than a single
operation to arrive at actionable data
How?
• S3 event notifications, Lambda, Step Functions
• Amazon DynamoDB for persistent pipeline storage
• Augmenting results with 3rd Party AI/ML
• OpenCV, MXNet, etc. on EC2 Spot, ECS, AI/ML AMI
Sample Use Cases
• Person of interest near a celebrity
• Multi-pass motion detection enhancement
• Subjects leaving a location without possessions
https://aws.amazon.com/amazon-ai/amis
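
As an illustration of combining more than one operation, the "person of interest near a celebrity" case could be decided by comparing face bounding boxes from two separate results. The sketch below assumes each input has already been reduced to a name plus a Rekognition-style relative `BoundingBox`; the function names and the 0.35 distance cut-off are hypothetical.

```python
import math


def center(box):
    """Center point of a relative bounding box (values between 0 and 1)."""
    return (box["Left"] + box["Width"] / 2, box["Top"] + box["Height"] / 2)


def distance(box_a, box_b):
    """Euclidean distance between two box centers, in image-relative units."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    return math.hypot(ax - bx, ay - by)


def persons_near_celebrities(celebrities, persons, max_distance=0.35):
    """Pair each person of interest with every celebrity whose face is close."""
    pairs = []
    for person in persons:
        for celeb in celebrities:
            if distance(person["BoundingBox"], celeb["BoundingBox"]) <= max_distance:
                pairs.append((person["Name"], celeb["Name"]))
    return pairs


# Simplified inputs: in practice these would come from RecognizeCelebrities
# and SearchFacesByImage results for the same frame.
celebs = [{"Name": "Celebrity A",
           "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.2, "Height": 0.2}}]
people = [
    {"Name": "person-42",
     "BoundingBox": {"Left": 0.4, "Top": 0.2, "Width": 0.2, "Height": 0.2}},
    {"Name": "person-77",
     "BoundingBox": {"Left": 0.8, "Top": 0.7, "Width": 0.1, "Height": 0.1}},
]
print(persons_near_celebrities(celebs, people))
```

In a pipeline built on S3 events, Lambda, or Step Functions, a step like this would run after the two detection calls have stored their results.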

Encryption & Security
• APIs - Non-storage vs. storage-based
API operations
• Encryption - S3 in-transit and at-rest
w/ HTTPS, KMS
• Tampering - Lock down IAM roles and
policies
• Content - purge or lifecycle to Glacier
w/vault lock
• Least Privilege - Lambda,
EC2 and other infrastructure
• Hydration - EBS encryption for boot,
data and snapshotted volumes
Best Practices

Interfacing with Rekognition
• S3 input for API calls - max image size of 15MB
• 5MB limit for non-s3 (Base64 encoded) API calls
• Minimum image resolution (x or y) of 80 pixels
• Image data must be in PNG or JPG format
• Max number of faces in a single face collection is 1 million
• The max matching faces the search API returns is 4096
• Use at least 1024 (x or y) px as input – or extract regions
Size of face should occupy ~5%+ of image for detection
• Collections are for faces!
Optimizing your input & requests for best performance
…
Use Amazon CloudWatch to observe & alert off Rekognition Metrics
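
The limits above lend themselves to a client-side pre-flight check before any API call is made. The following is a rough sketch using only the standard library: it treats 1 MB as 1024 × 1024 bytes (an assumption), reads dimensions only for PNG data, and the function names are made up for this example.

```python
import struct

MAX_S3_BYTES = 15 * 1024 * 1024     # limit for images passed by S3 reference
MAX_INLINE_BYTES = 5 * 1024 * 1024  # limit for image bytes sent in the request
MIN_SIDE_PX = 80
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def image_format(data):
    """Identify PNG or JPEG by magic bytes; return None for anything else."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def check_image(data, from_s3=True):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    fmt = image_format(data)
    if fmt is None:
        problems.append("not PNG or JPEG")
    limit = MAX_S3_BYTES if from_s3 else MAX_INLINE_BYTES
    if len(data) > limit:
        problems.append("file too large")
    if fmt == "png" and len(data) >= 24:
        # The PNG IHDR chunk stores width and height as big-endian 32-bit ints.
        width, height = struct.unpack(">II", data[16:24])
        if min(width, height) < MIN_SIDE_PX:
            problems.append("shorter side below 80 px")
    return problems


# A fake 64x200 PNG header is enough to exercise the dimension check.
tiny_png = PNG_MAGIC + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 64, 200)
print(check_image(tiny_png))
```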

Optimizing your Input Source
Images
• Image enhancement, extraction & stabilization
• Unsharp mask, deconvolution – CPU impact
• e.g. ImageMagick, OpenCV, scikit-image
Video
• Video stabilization - motion / optical flow analysis
• Scene change detection vs. frame extraction
• Offset - PTS vs. seconds and why it matters
• e.g. FFMPEG w/deshake, vidstab, OpenCV
Optimizing your input & requests for best performance
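
To illustrate why PTS and seconds are not interchangeable, the sketch below converts a presentation timestamp to seconds using the stream time base, and shows the drift that appears when frames are counted at 29.97 fps. The function names are hypothetical; the 1/90000 time base is the usual MPEG-TS value.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base):
    """Convert a presentation timestamp to seconds: seconds = pts * time_base."""
    return pts * time_base


def frame_to_seconds(frame_index, fps):
    """Time offset of a frame when only its index and the frame rate are known."""
    return frame_index / fps


mpegts_time_base = Fraction(1, 90000)
ntsc_fps = Fraction(30000, 1001)  # "29.97" fps expressed exactly

print(pts_to_seconds(450000, mpegts_time_base))   # exact offset from the PTS
print(frame_to_seconds(30, ntsc_fps))             # frame 30 is slightly past 1 s
```

Using exact fractions avoids accumulating rounding error when extracted stills need to be mapped back to a position in the source video.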

Use Cases & Customers
• Digital Asset Management
• Archive Monetization
• Travel and Hospitality
• Influencer Marketing
• Systems Integration
• Digital Advertising
• Consumer Storage
• Law Enforcement
• Public Safety
• eCommerce
• Education

Searchable Image Library
Components: Photo Upload, Amazon S3, AWS Lambda, Detect Objects & Scenes, Amazon Elasticsearch, Property Search
• User captures an image for their property listing
• Mobile app uploads the image to S3
• A Lambda function is triggered and calls Rekognition
• Rekognition retrieves the image from S3 and returns labels for the property and amenities
• Lambda pushes the labels and confidence scores to Elasticsearch
• Other users can search properties by landmarks, category, etc.
Real Estate Property Search
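
The Lambda step in this flow might look roughly like the handler below. It is a sketch under several assumptions: the S3 event layout is the standard notification format, `detect_labels` wraps the boto3 call, and `index` stands in for whatever client would push documents to Elasticsearch (here it defaults to `print`). All names are illustrative.

```python
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 events.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def detect_labels(bucket, key):
    """Ask Rekognition to read the image straight from S3 (needs credentials)."""
    import boto3  # imported lazily so the rest can be exercised offline
    client = boto3.client("rekognition")
    return client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=70.0,
    )


def to_document(bucket, key, response):
    """Shape labels and confidence scores into a document for the search index."""
    return {
        "image": f"s3://{bucket}/{key}",
        "labels": [{"name": l["Name"], "confidence": l["Confidence"]}
                   for l in response.get("Labels", [])],
    }


def handler(event, context, detect=detect_labels, index=print):
    """Lambda entry point: label each uploaded image and hand it to the indexer."""
    docs = []
    for bucket, key in s3_objects(event):
        doc = to_document(bucket, key, detect(bucket, key))
        index(doc)
        docs.append(doc)
    return docs
```

Passing `detect` and `index` as parameters keeps the handler testable without AWS access; a deployed function would simply rely on the defaults or real clients.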

Searchable Image Library
• Optimize the client
• Event based, decoupled infra
• Buffering - SQS, SNS, Kinesis
• Rate Control - high volume S3
image ingest
• Amazon DynamoDB – scale label storage
• Elasticsearch - operational &
performance statistics
• CloudFront - search cache
Real Estate Property Search
AWS services shown: AWS Lambda, Amazon S3, Amazon SQS, AWS CloudFormation, Amazon CloudWatch, Amazon Kinesis, Amazon CloudFront, Amazon DynamoDB, Amazon Elasticsearch

Sentiment Analysis
Components: Live Subject Focus Group Camera Application, Amazon S3, Analyze Faces, Amazon Redshift, Amazon QuickSight, Marketing Reports
• Viewers watching pre-release content
• Audience cameras capture live images of viewers
• A Lambda function is triggered and calls Rekognition
• Rekognition analyzes the image and returns the facial attributes detected, which include emotion and demographic detail
• Return data is normalized and staged in S3 en route to Redshift
• Periodic ingest of data into Redshift
• Regular analysis to identify trends in demographic activity and in-store sentiment over time
Trend reporting for audience measurement
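
The "normalized and staged in S3" step could be as simple as flattening each DetectFaces result into CSV rows that a warehouse load can pick up. The sketch below assumes a response from `detect_faces(..., Attributes=["ALL"])`; the column names, `frame_id`, and helper names are invented for illustration.

```python
import csv
import io

COLUMNS = ["frame_id", "age_low", "age_high", "gender", "emotion", "emotion_confidence"]


def face_rows(response, frame_id):
    """Flatten each detected face into one row, keeping only the top emotion."""
    rows = []
    for face in response.get("FaceDetails", []):
        emotions = face.get("Emotions", [])
        top = max(emotions, key=lambda e: e["Confidence"]) if emotions else None
        rows.append({
            "frame_id": frame_id,
            "age_low": face["AgeRange"]["Low"],
            "age_high": face["AgeRange"]["High"],
            "gender": face["Gender"]["Value"],
            "emotion": top["Type"] if top else "",
            "emotion_confidence": top["Confidence"] if top else "",
        })
    return rows


def to_csv(rows):
    """Serialize rows as CSV text, ready to be written to a staging object."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hand-written response in the DetectFaces shape.
sample = {"FaceDetails": [{
    "AgeRange": {"Low": 29, "High": 45},
    "Gender": {"Value": "Male", "Confidence": 96.5},
    "Emotions": [{"Type": "HAPPY", "Confidence": 83.8},
                 {"Type": "SURPRISED", "Confidence": 0.65}],
}]}
print(to_csv(face_rows(sample, "frame-0001")))
```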

Sentiment Analysis
• Reduce API volume - perform motion detection and frame
pre-processing on the capture device
• Customer photos not stored on
AWS
• S3 - A file content lake to feed
other services – EMR, Lambda
• Resell as a service – API
Gateway + Lambda + S3
Trend reporting for audience reactions
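
Motion detection on the capture device can be approximated with simple frame differencing, so that only frames that changed noticeably are sent for analysis. This is a deliberately naive sketch in plain Python over grayscale pixel lists; the threshold and function names are assumptions, and a real device would more likely use OpenCV or similar.

```python
def mean_abs_diff(prev, curr):
    """Average absolute per-pixel difference between two grayscale frames."""
    if not prev or len(prev) != len(curr):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def frames_to_send(frames, threshold=10.0):
    """Return indexes of frames that differ enough from the last frame sent."""
    selected = []
    last_sent = None
    for i, frame in enumerate(frames):
        if last_sent is None or mean_abs_diff(last_sent, frame) > threshold:
            selected.append(i)
            last_sent = frame
    return selected


# Four tiny 2x2 "frames": only the first and the third show a real change.
frames = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [50, 50, 50, 50],
    [52, 50, 49, 50],
]
print(frames_to_send(frames))
```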
AWS services shown: AWS Lambda, Amazon S3, Amazon SQS, AWS CloudFormation, Amazon CloudWatch, Amazon Elasticsearch, Amazon Redshift, Amazon QuickSight, Amazon API Gateway*
• Built in 3 weeks
• Indexed against 99,000 people
• Index created in one day
• Saved ~9,000 hours a year in
manual curation costs
• Live video with frame sampling
Automating Footage Tagging with
Amazon Rekognition
Previously, only about half of all footage was indexed because of the immense
time required by manual processes

Automating Footage Tagging with
Amazon Rekognition
Solution Architecture
[Diagram: Encoders & Feeds, Stills Extraction, Stills / Frames Bucket, SQS Trigger, R3, Amazon Rekognition, Results Cache, Users]

Rekognition - Summary
• Leverage Amazon internal experience with
AI, ML, and Computer Vision
• Managed API services with embedded AI for
maximum accessibility and simplicity
• Full stack of deep learning image processing
algorithms for specialized applications
• Integrates natively with other AWS Services
• Extensible by design

Keith Steward, Ph.D.,
Specialist Solutions Architect, AWS
Customer (Character) Interaction
with Amazon Lex & Amazon Polly
(how to build a bot)

Gartner quote:
“By 2020, the average person will have more
conversations with bots than with their spouse.”

Business Value of Bots -- Fundamentals
• Help Answer Questions
• Perform tasks on behalf of users
• Increase efficiency and/or user experience
• Enable easier access to relevant data (Enterprise
systems are highly complex to use!)
• More informed decision making
• Streamline processes

Business Value of Bots -- Specifics
• Help customers use your growing number and complexity of
offerings
• Customer Service Improvements
• Fact: Customers hate dealing with today’s Customer Service
• Fact: Highest cost in the Contact Center is labor
• Aim: Fulfill majority of common customer questions
• Aim: Provide instant recommendations in the context of current conversation
• Aim: Hand off requests requiring human interaction to customer support staff
• Aim: Reduce operational costs
• Aim: Strike balance between automation & human assistance
• Help staff leverage internal data assets, functions, and business
processes in the most efficient manner

Amazon Polly: Life-like Speech Service
• Converts text to life-like speech
• 48 voices, 24 languages
• Low latency, real time
• Fully managed
Voice Quality & Pronunciation
1. Automatic, Accurate Text Processing
2. Intelligible and Easy to Understand
3. Add Semantic Meaning to Text
4. Customized Pronunciation
Articles and Blogs, Training Material, Chatbots (Lex), Public Announcements

Amazon Polly: Common Use Cases
• Internet of Things (smart home, connected devices)
• Education (language learning, training videos)
• Voiced Media (news, blogs, email)
• Voiced Chat Bots (Amazon Lex, Alexa skills)
• Gaming (avatars, Amazon Lumberyard)
#VoiceFirst Movement

The Advent of Conversational Interactions
1st gen: Machine-oriented interactions
2nd gen: Control-oriented & translated interactions
3rd gen: Intent-oriented

Amazon Lex ... for Conversational Interactions
Powered by the same deep learning technology as Alexa
Enterprise SaaS Connectors
Deployment to chat platforms, like Slack, Facebook
Messenger, Twilio SMS
Build Voice and Text Chatbots
Interactions on mobile, web, and devices

A Sample Amazon Lex Conversation: Booking a Hotel
[Diagram labels: Intents (BookHotel), Utterances (spoken or typed phrases that invoke your intent), Slots (input), Fulfillment]

Amazon Lex Use Cases
Informational Bots
Chatbots for everyday consumer requests
• News updates
• Weather information
• Game scores ….
Application Bots
Build powerful interfaces to mobile applications
• Book tickets
• Order food
• Manage bank accounts ….
Enterprise Productivity Bots
Streamline enterprise work activities and improve efficiencies
• Check sales numbers
• Marketing performance
• Inventory status ….
Internet of Things (IoT) Bots
Enable conversational interfaces for device interactions
• Wearables
• Appliances
• Auto ….

“Coffee Bot” – what we will build today
User: “I’d like to order a small Mocha”
Automatic Speech Recognition: Order Small Mocha
Natural Language Understanding (Intent/Slot Model, Utterances): CafeOrderBeverage, Small Mocha
Coffee Bot slots: Type Mocha, Size Small, Temperature Hot
Confirmation prompt: “You’d like a small mocha, is that right?”
Polly: “Your mocha will be available soon!”
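
A fulfillment Lambda for a bot like this might be sketched as below. It assumes the original (V1) Amazon Lex event and response format, and the slot keys `Type` and `Size` are guesses based on the labels on the slide; the actual lab may name them differently.

```python
def plain_text(content):
    """Message object in the Lex (V1) response format."""
    return {"contentType": "PlainText", "content": content}


def close(message):
    """Tell Lex the intent is fulfilled and end the conversation."""
    return {"dialogAction": {
        "type": "Close",
        "fulfillmentState": "Fulfilled",
        "message": plain_text(message),
    }}


def elicit_slot(intent_name, slots, slot_to_elicit, message):
    """Ask the user for a slot value that is still missing."""
    return {"dialogAction": {
        "type": "ElicitSlot",
        "intentName": intent_name,
        "slots": slots,
        "slotToElicit": slot_to_elicit,
        "message": plain_text(message),
    }}


def lambda_handler(event, context):
    """Handle a CafeOrderBeverage request; slot names are hypothetical."""
    intent = event["currentIntent"]
    slots = intent["slots"]
    for name in ("Type", "Size"):
        if not slots.get(name):
            return elicit_slot(intent["name"], slots, name,
                               f"What {name.lower()} would you like?")
    return close(f"Your {slots['Type'].lower()} will be available soon!")
```

Returning `ElicitSlot` when a value is missing lets the bot keep the conversation going instead of failing the order outright.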

AWS Mobile Hub Integration
Authenticate users
Analyze user behavior
Store and share media
Synchronize data
More ….
Track retention
Conversational Bots
AWS Mobile SDKs
AWS Mobile Hub

Time to code: Build CoffeeBot
https://s3.us-east-2.amazonaws.com/mast-mast/public/labs/lex-pressobot/readme.html
http://bit.ly/2nG8oIG
• Intents, Slot Types, Utterances
• Lambda
• Mobile Hub integration
• CloudWatch metrics

David Pearson, AWS AI Services
Aug 2017
AI and Deep Learning at Amazon
Automate | Engage | Optimize

12:00 - 1:00 PM Check-in & Lunch
1:00 - 1:30 PM
AI and Deep Learning at Amazon
Introduction to AWS AI services for media and entertainment
1:30 - 2:15PM Best Practices for integrating Amazon Rekognition into media applications
2:15 - 2:30 PM Break
2:30 - 3:15 PM Interaction with Lex and Polly. i.e. How to build a bot
3:15 - 4:00 PM
An overview of deep learning and how to build a recommendation
engine using Apache MXNet

Artificial Intelligence at Amazon

Can We Help Customers
Put Intelligence At The Heart Of
Every Application & Business?

Amazon AI
Intelligent Services Powered By Deep Learning

AI customer production use cases
Zillow
Zestimate (using Apache Spark)
Howard Hughes Corp
Lead scoring for luxury real estate
purchase predictions
FINRA
Anomaly detection, sequence matching,
regression analysis, network/tribe analysis
Netflix
Recommendation engine
Pinterest
Image recognition search
Fraud.net
Detect online payment fraud
DataXu
Leverage automated & unattended ML at
large scale (Amazon EMR + Spark)
Mapillary
Computer vision for crowd sourced maps
Hudl
Predictive analytics on sports plays
Upserve
Restaurant table management & POS for
forecasting customer traffic
TuSimple
Computer Vision for Autonomous Driving
Clarifai
Computer Vision APIs

Amazon AI
Automate manual, effort-intensive processes
Engage audiences, customers, and employees
Optimize product quality and customer experiences

Amazon Lex is a service for building
conversational interfaces into any application
• Embedded AI service
• Enables intent-driven interactions
• Supports both voice and text conversations
• Deployable across mobile and messaging platforms
• Integrated speech recognition and natural language
understanding

Developer challenges
Conversational interfaces need to combine a large number of
sophisticated algorithms and technologies
• Speech Recognition
• Language Understanding
• Business Logic
• Disparate Systems
• Authentication
• Messaging platforms
• Scale
• Testing
• Security
• Availability
• Mobile

Conversational chatbots
User: I’d like to book a flight to London
Bot: Sure! Do you want to fly to Heathrow or Gatwick?
User: Heathrow, please
Destination: LHR
Bot: When would you like to fly?
User: Next Sunday
Departure: 6/11/2017

Amazon Lex and active audience engagement
• Fans interact with character bots via social media,
mobile and web apps
• Employees use help desk assistants to simplify access
to company information and services
• Authenticated customers speak to customer service
bots to retrieve their current account status
• Executives use voice requests to gain fast insights into
their business data from within enterprise applications

Amazon Polly media use cases
• Voice-enabled fan interaction. Add voice to text
chatbots for a hands-free option
• Listen to the news. Multi-channel news and blog
article experience for users
• Synthetic character voices. Synchronize speech
with on-screen content and customize pronunciation
• Executive assistant. Spoken results of information
requests from internal users

Whisper Voice and Speech Marks
Whisper SSML effect: <amazon:effect name="whispered">
Synchronize Speech for an Enhanced Visual Experience
• Request an additional stream of metadata about
sentence and word timings
• Use the metadata stream alongside the synthesized
speech audio stream to sync audio and visual
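
A rough sketch of both features, assuming Python with boto3: it wraps text in the whisper effect and parses the line-delimited JSON that Polly returns when speech marks are requested. The voice name and helper names are illustrative choices, and the sample marks are hand-written.

```python
import json
from xml.sax.saxutils import escape


def whisper_ssml(text):
    """Wrap text in the whispered SSML effect, escaping XML special characters."""
    return ('<speak><amazon:effect name="whispered">'
            + escape(text) + '</amazon:effect></speak>')


def request_speech_marks(ssml, voice_id="Joanna"):
    """Ask Polly for word and sentence timings instead of audio (needs credentials)."""
    import boto3  # imported lazily so the parsing helpers work without it
    response = boto3.client("polly").synthesize_speech(
        Text=ssml, TextType="ssml", VoiceId=voice_id,
        OutputFormat="json", SpeechMarkTypes=["word", "sentence"],
    )
    return response["AudioStream"].read()


def parse_speech_marks(raw):
    """Speech marks arrive as one JSON object per line."""
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]


def word_timings(marks):
    """Return (milliseconds from start, word) pairs for syncing visuals."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]


sample = (b'{"time":0,"type":"sentence","start":0,"end":11,"value":"Hello there"}\n'
          b'{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}\n'
          b'{"time":310,"type":"word","start":6,"end":11,"value":"there"}\n')
print(whisper_ssml("Tom & Jerry"))
print(word_timings(parse_speech_marks(sample)))
```

The audio itself comes from a second request with an audio output format; the timings from the metadata stream can then drive captions or on-screen animation.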

Amazon Rekognition
Extract rich metadata from visual content
Object and Scene
Detection
Facial
Analysis
Face
Comparison
Facial
Recognition
Celebrity
Recognition
Image
Moderation

Celebrity recognition and image moderation
Newly released Rekognition features
Recognize thousands of famous individuals: RecognizeCelebrities
Detect explicit and suggestive content: DetectModerationLabels

Object and scene detection
Object and scene detection makes it easy for you to add features that search,
filter, and curate large image libraries.
Identify objects and scenes and provide confidence scores
DetectLabels
Flower
Arrangement
Chair
Coffee Table
Living Room Indoors
Furniture
Cushion
Vase
Maple
Villa
Plant
Garden
Water
Swimming Pool
Tree
Potted Plant
Backyard
Patio

Computer Vision Challenge: Dog or Muffin?
Confidence and Rekognition Labels
99.2%: Animal, Dog, Chihuahua
98.6%: Food, Dessert, Muffin
97.9%: Collage

Facial analysis
Analyze facial characteristics in multiple dimensions
DetectFaces
Image Quality: Brightness 23.6%, Sharpness 99.9%
Facial Landmarks: EyeLeft, EyeRight, Nose, RightPupil, LeftPupil, MouthRight, LeftEyeBrowUp, Bounding Box...
Demographic Data: Age Range 29-45, Gender: Male 96.5%
Emotion Expressed: Happy 83.8%, Surprised 0.65%
General Attributes: Smile: True 23.6%, EyesOpen: True 99.8%, Beard: True 99.5%, Mustache: True 99.9%...
Facial Pose: Pitch 1.446, Roll 5.725, Yaw 4.383

Amazon Rekognition: Audience Analysis
• Touchless data gathering via cameras facing the audience
• Anonymous, high volume demographic and sentiment capture
• Analysis produces usable feedback trends and patterns
[Diagram: CAMERA, AUDIENCE]

Facial recognition
Find similar faces in a large collection of images
SearchFacesByImage
Search
Index
Collection

Amazon Rekognition media use cases
• Automated metadata. Classification of video and still
images – objects, talent, sentiment, demographics
• Audience sentiment tracking. Touchless, time-aligned
capture of audience reactions during a video screening
• Dynamic second screen. OTT users seamlessly
access details of actors appearing on screen in real-time
• Content rating. Identify suggestive and explicit content
to improve alignment with rating standards

Playout &
Distribution
Filtering & Quality Control
Visual Effects & Editing
Application & Filesystem
Texture & Asset Search
Analytics
Sentiment Analysis
Other Amazon AI Services
(Lex, Polly ++)
DAM & Archive
Auto-categorization
Metadata Augmentation
Digital Supply Chain
Tag on Ingest
Live and VOD Feature Extraction
Celebrity Detection
Publishing
Value Add
API-based services
OTT
Filtering &
Quality Control
Acquisition
Pre-processing &
optimization
Use Cases Across Media Segments

Amazon AI
https://aws.amazon.com/blogs/ai/
https://aws.amazon.com/amazon-ai/
