© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bryan Samis
Solutions Architect
SRV322
Increase the Value of Video Using Machine Learning and AWS Media Services
Agenda
• Brief introduction to services
• Using ML in video workflows
• Content indexing / metadata generation
• Add searchable metadata to a video archive
• Log when celebrities appear in new episodes
• Generate captions for a collection of video assets
• Content retrieval
• Transcode just the video clip that contains a specific person
• Putting it all together
Services Used
• AWS Elemental MediaConvert
• AWS Elemental MediaLive
• Amazon Rekognition
• Amazon Transcribe
• AWS Lambda
• AWS Step Functions
AWS Elemental MediaConvert
AWS Elemental MediaConvert is a file-based video processing service that enables anyone, with
any size content library, to easily and reliably transcode on-demand content for broadcast and
multiscreen delivery.
• Access to professional grade video features and quality
• No software or hardware infrastructure to manage
• Automatically scales in response to variations in incoming video volume
• Ability to manage capacity and control order in which jobs are processed
• Pay for what you use, billed by the second of content produced
Amazon Rekognition
• Object & scene detection
• Facial analysis
• Face comparison
• Face search
• Celebrity detection
• Image moderation
• Text detection
Amazon Rekognition Video
• Object, scene & activity detection
• Face search
• Facial analysis
• Activity pathing
• Unsafe content detection
• Celebrity detection
• Text in images
Amazon Rekognition
File requirements
• Image recognition
• JPEG / PNG image
• Up to 15 MB
• Video recognition
• MOV / MP4 file with H.264 video
• Up to 8 GB
Amazon Transcribe
A fully managed and continuously trained automatic speech recognition (ASR)
service that takes in audio and automatically generates accurate transcripts
• Support for audio in many formats and low fidelity
• Amazon S3 integration
• Time stamps and confidence scores
• English and Spanish
• Punctuation
Amazon Transcribe
• Input file types accepted are:
• FLAC
• MP3
• WAV
• MP4
• Up to 2 hours in duration
• Up to 1 GB in size
• Produces JSON output with full transcription and word timing
Amazon Transcribe – Use Cases
• Call centers
• Subtitles for VOD
• Transcribe meetings
• Broadcast closed captions
• Content indexing
• Compliance
Using Amazon ML Services for Media
• Use services such as Amazon Rekognition & Amazon Transcribe to generate metadata about your content
• Store that metadata and make it searchable
• Retrieve only the portion of the content you want
• Prepare it for timely use
[Diagram: Content indexing / metadata generation: live and file sources, AWS Elemental Media Services (media processing), Amazon ML services, Amazon DynamoDB (metadata database). Content retrieval / action: Amazon DynamoDB metadata, AWS Elemental Media Services (media processing), live and file content.]
Content Indexing / Metadata Generation
• Sources: file-based content, live content
• Ingest and processing: MediaLive, Kinesis Video Streams, MediaConvert
• Amazon Rekognition (Image): JPEG/PNG, up to 15 MB
• Amazon Rekognition (Video): H.264 video in an MP4/MOV file, up to 8 GB
• Transcribe: FLAC/MP3/WAV/MP4, up to 2 hours, up to 1 GB
Content Indexing / Metadata Generation –
AWS Elemental MediaConvert and Amazon Rekognition
The Challenge
• A broadcaster wants to add metadata to an existing archive of video content
• Index metadata and video to make it searchable
• Keep costs low
The Solution
• Use AWS Elemental MediaConvert to extract frames from video content
• Use Amazon Rekognition to analyze and create metadata for video content
The Benefit
• Video tagged with detected objects, scenes, and celebrities
• Extracting one frame every five seconds keeps cost low while still providing a searchable index
Content Indexing / Metadata Generation –
AWS Elemental MediaConvert and Amazon Rekognition
• Use AWS Elemental MediaConvert to extract still frames from a video
1. An AWS Elemental MediaConvert job transcodes the file and extracts JPEG frames to an S3 bucket.
2. An AWS Lambda function, triggered by the Amazon S3 object-created event, tells Amazon Rekognition to analyze the JPEG file.
3. Amazon Rekognition performs the requested operation on the image (e.g., object detection, celebrity recognition).
4. Amazon Rekognition returns the result to AWS Lambda, which stores tags and confidence scores in Amazon DynamoDB, Amazon Redshift, Amazon Elasticsearch Service, Amazon RDS, or whichever service best suits the use case.
Pipeline: File source → AWS Elemental MediaConvert (file-based processing) → Amazon S3 (storage) → AWS Lambda (serverless) → Amazon Rekognition (ML / AI) → Amazon DynamoDB (database)
Content Indexing / Metadata Generation –
AWS Elemental MediaConvert and Amazon Rekognition
• Add new file output group to an AWS Elemental MediaConvert job
Content Indexing / Metadata Generation –
AWS Elemental MediaConvert and Amazon Rekognition
• Add a frame capture (JPEG) output to the job
• Framerate determines the number of images extracted from the video per second. A value of 1/5 creates one JPEG every 5 seconds.
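The frame capture output described above can also be written as part of a MediaConvert job definition. The following is a minimal sketch in Python, assuming the MediaConvert job settings structure; the function name, the destination path, and the `MaxCaptures` and `Quality` values are illustrative and do not come from the slides.

```python
def frame_capture_output_group(destination, seconds_per_frame=5):
    """Build a MediaConvert file output group that writes one JPEG every N seconds."""
    return {
        "Name": "Frame Capture",
        "OutputGroupSettings": {
            "Type": "FILE_GROUP_SETTINGS",
            "FileGroupSettings": {"Destination": destination},
        },
        "Outputs": [
            {
                "NameModifier": "_frame",  # hypothetical suffix for output file names
                "ContainerSettings": {"Container": "RAW"},
                "VideoDescription": {
                    "CodecSettings": {
                        "Codec": "FRAME_CAPTURE",
                        "FrameCaptureSettings": {
                            # Numerator / denominator = frames per second, so 1/5
                            # means one captured frame every five seconds.
                            "FramerateNumerator": 1,
                            "FramerateDenominator": seconds_per_frame,
                            "MaxCaptures": 10000000,  # illustrative upper bound
                            "Quality": 80,            # illustrative JPEG quality
                        },
                    }
                },
            }
        ],
    }
```

The returned dictionary would be appended to the `OutputGroups` list of the job settings alongside the regular transcode outputs.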
Amazon Rekognition
AWS Lambda function to invoke Amazon Rekognition on our extracted JPEG to detect
celebrities
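The code shown on the original slide is not preserved in this transcript. As a stand-in, here is a minimal sketch of such a Lambda function in Python with boto3. The helper names and the 80% confidence threshold are assumptions, and the storage step is omitted.

```python
import urllib.parse


def s3_object_from_event(event):
    """Return (bucket, key) for the first record of an S3 object-created event."""
    s3 = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return s3["bucket"]["name"], urllib.parse.unquote_plus(s3["object"]["key"])


def detect_celebrities(rekognition, bucket, key, min_confidence=80.0):
    """Call RecognizeCelebrities on a JPEG in S3 and keep the confident matches."""
    response = rekognition.recognize_celebrities(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [
        {"Name": face["Name"], "Confidence": face["MatchConfidence"]}
        for face in response.get("CelebrityFaces", [])
        if face["MatchConfidence"] >= min_confidence
    ]


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be exercised without AWS

    bucket, key = s3_object_from_event(event)
    celebrities = detect_celebrities(boto3.client("rekognition"), bucket, key)
    # Writing the result to a data store (for example Amazon DynamoDB) is left out.
    return {"Bucket": bucket, "Key": key, "Celebrities": celebrities}
```

Passing the client in as a parameter keeps the Rekognition call easy to stub in tests.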
Amazon Rekognition
Result from our image
Demo
Content Indexing / Metadata Generation –
AWS Elemental MediaConvert and Amazon Rekognition Video
The Challenge
• A content producer wants to log who is in each scene of a new episode of a show
• Raw video files are ~200 GB for 60 min
The Solution
• Use AWS Elemental MediaConvert to compress video content (but retain quality)
• Use Amazon Rekognition Video to analyze and create metadata for video content
The Benefit
• Video tagged with detected celebrities, including the timing and position of each appearance
• Video files reduced to <8 GB per 60 minutes to reduce costs
Content Indexing / Metadata Generation –
AWS Elemental MediaConvert and Amazon Rekognition Video
• Use AWS Elemental MediaConvert to compress files >8 GB and feed them to Amazon Rekognition
1. An AWS Elemental MediaConvert job transcodes the source file to H.264/MP4 at a bit rate such that the file size is <8 GB.
2. An AWS Lambda function, triggered by the Amazon S3 object-created event, tells Amazon Rekognition to analyze the video file.
3. Amazon Rekognition Video performs the requested operation on the video (e.g., person tracking, celebrity recognition).
4. Amazon Rekognition returns the result to AWS Lambda, which stores tags and confidence scores in Amazon DynamoDB, Amazon Redshift, Amazon Elasticsearch Service, Amazon RDS, or whichever service best suits the use case.
Pipeline: File source → AWS Elemental MediaConvert (file-based processing) → Amazon S3 (storage) → AWS Lambda (serverless) → Amazon Rekognition (ML / AI) → Amazon DynamoDB (database)
AWS Elemental MediaConvert
• Add H.264/MP4 output to MediaConvert job
AWS Elemental MediaConvert
• Add H.264/MP4 output to AWS Elemental MediaConvert job
• Set Container to MPEG-4 Container (MP4) and use a file extension of mp4.
• Set Video Codec to MPEG-4 AVC (H.264).
• Select the bit rate so that the output file is smaller than 8 GB. For example, a 60-minute movie at 7 Mbps will be approximately 3.2 GB.
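The bit rate arithmetic can be checked with a short calculation. This sketch assumes a constant total bit rate, treats 1 GB as 10^9 bytes, and ignores container overhead; the function names are illustrative.

```python
def output_size_gb(duration_minutes, bitrate_mbps):
    """Approximate file size in GB (10^9 bytes) for a constant bit rate."""
    megabits = bitrate_mbps * duration_minutes * 60
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


def max_bitrate_mbps(duration_minutes, size_limit_gb=8.0):
    """Highest total bit rate (video plus audio) that stays within the size limit."""
    return size_limit_gb * 1000 * 8 / (duration_minutes * 60)
```

For a 60-minute file, `output_size_gb(60, 7)` gives 3.15 GB, which matches the slide's figure of roughly 3.2 GB, and `max_bitrate_mbps(60)` gives a ceiling of about 17.8 Mbps for the 8 GB limit.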
Amazon Rekognition Video
AWS Lambda function to invoke Amazon Rekognition on our transcoded video
to detect labels:
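The slide's code is not preserved in this transcript. A minimal sketch of such a function in Python with boto3 might look like the following; the environment variable names, the `JobTag` value, and the 70% minimum confidence are assumptions.

```python
import os
import urllib.parse


def start_label_detection(rekognition, bucket, key, sns_topic_arn, role_arn,
                          min_confidence=70.0):
    """Start an asynchronous Rekognition Video label detection job; return its JobId."""
    response = rekognition.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
        # Rekognition Video publishes a completion message to this SNS topic.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
        JobTag="label-detection",
    )
    return response["JobId"]


def lambda_handler(event, context):
    import boto3  # imported here so the helper above can be exercised without AWS

    record = event["Records"][0]["s3"]
    job_id = start_label_detection(
        boto3.client("rekognition"),
        record["bucket"]["name"],
        urllib.parse.unquote_plus(record["object"]["key"]),
        os.environ["SNS_TOPIC_ARN"],         # hypothetical environment variable
        os.environ["REKOGNITION_ROLE_ARN"],  # hypothetical environment variable
    )
    return {"JobId": job_id}
```

Because video analysis is asynchronous, the function returns only a job ID; the results are fetched later, once the job completes.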
Amazon Rekognition Video
Example code to fetch our Amazon Rekognition Video results when Amazon
SNS notification is published:
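The original example is not preserved here. The sketch below shows one way to do this in Python with boto3, assuming the Lambda function is subscribed to the SNS topic and that the completion message carries `JobId` and `Status` fields; the helper names are illustrative.

```python
import json


def job_from_sns_event(event):
    """Extract the Rekognition Video job ID and status from an SNS-triggered event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["JobId"], message["Status"]


def fetch_labels(rekognition, job_id):
    """Page through GetLabelDetection; return (timestamp_ms, name, confidence) tuples."""
    labels, token = [], None
    while True:
        kwargs = {"JobId": job_id, "SortBy": "TIMESTAMP"}
        if token:
            kwargs["NextToken"] = token
        page = rekognition.get_label_detection(**kwargs)
        for item in page.get("Labels", []):
            label = item["Label"]
            labels.append((item["Timestamp"], label["Name"], label["Confidence"]))
        token = page.get("NextToken")
        if not token:
            return labels


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be exercised without AWS

    job_id, status = job_from_sns_event(event)
    if status != "SUCCEEDED":
        raise RuntimeError(f"Rekognition Video job {job_id} ended with status {status}")
    return fetch_labels(boto3.client("rekognition"), job_id)
```

Results for long videos span several pages, so the loop follows `NextToken` until the service stops returning one.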
Amazon Rekognition Video
Result from our video
Demo
Content Indexing / Metadata Generation –
AWS Elemental MediaConvert and Amazon Transcribe
The Challenge
• An online training provider has 1,000s of hours of video that need captions
• Video is in a variety of formats
The Solution
• Use AWS Elemental MediaConvert to create an audio-only version of the content
• Use Amazon Transcribe to generate a timestamped transcription
• Convert Amazon Transcribe output to a captions file
The Benefit
• All formats of video content get captions added to make them more accessible
• Option to run Amazon Transcribe output through Amazon Translate to get multi-language captions
Amazon Transcribe
1. An AWS Elemental MediaConvert job transcodes the source file, creating an audio-only rendition for Amazon Transcribe. AWS Elemental MediaConvert also creates the normal audio/video output.
2. An AWS Lambda function, triggered by the Amazon S3 object-created event, creates a new Transcribe job.
3. Amazon Transcribe outputs a JSON file of detected words and timing.
4. A Lambda function converts the Amazon Transcribe JSON into a subtitle format (such as WebVTT, SRT, or TTML) and delivers it to the Amazon S3 bucket with the content.
Pipeline: File source → AWS Elemental MediaConvert (file-based processing) → Amazon S3 (storage) → AWS Lambda (serverless) → Amazon Transcribe (ML / AI) → Amazon S3 (storage)
AWS Elemental MediaConvert
Add audio-only WAV output to the job. Start by adding an additional file output group.
AWS Elemental MediaConvert
Configure audio-only uncompressed WAV or MP4 output.
Amazon Transcribe
AWS Lambda function to create an Amazon Transcribe job from the audio file
created by AWS Elemental MediaConvert.
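The slide's code is not preserved in this transcript. A minimal sketch in Python with boto3 could look like this; deriving the job name from the object key and the `OUTPUT_BUCKET` environment variable are assumptions made for illustration.

```python
import os
import re
import urllib.parse


def transcription_job_name(key):
    """Derive a job name from an S3 key, keeping only letters, digits, '.', '_' and '-'."""
    return re.sub(r"[^0-9a-zA-Z._-]", "_", key)[:200]


def start_transcription(transcribe, bucket, key, output_bucket, language_code="en-US"):
    """Start an Amazon Transcribe job for an audio file written by MediaConvert."""
    job_name = transcription_job_name(key)
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode=language_code,
        MediaFormat=key.rsplit(".", 1)[-1].lower(),  # expects wav, mp3, mp4, or flac
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        OutputBucketName=output_bucket,
    )
    return job_name


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be exercised without AWS

    record = event["Records"][0]["s3"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    job_name = start_transcription(
        boto3.client("transcribe"),
        record["bucket"]["name"],
        key,
        os.environ["OUTPUT_BUCKET"],  # hypothetical environment variable
    )
    return {"JobName": job_name}
```

Job names must be unique within an account, so a real deployment would likely add a timestamp or asset ID to the name.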
Amazon Transcribe
Use AWS Step Functions to monitor status of Transcribe job.
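One common pattern is a state machine that waits, invokes a status-check Lambda function, and then branches on the returned status. The sketch below shows only the status-check function, in Python with boto3; the `JobName` input field and the shape of the returned dictionary are assumptions.

```python
def check_transcription_status(transcribe, job_name):
    """Return the job status plus the transcript URI or failure reason when available."""
    job = transcribe.get_transcription_job(
        TranscriptionJobName=job_name
    )["TranscriptionJob"]
    result = {"JobName": job_name, "Status": job["TranscriptionJobStatus"]}
    if result["Status"] == "COMPLETED":
        result["TranscriptFileUri"] = job["Transcript"]["TranscriptFileUri"]
    elif result["Status"] == "FAILED":
        result["FailureReason"] = job.get("FailureReason", "unknown")
    return result


def lambda_handler(event, context):
    import boto3  # imported here so the helper above can be exercised without AWS

    # The state machine is assumed to pass the job name in its input.
    return check_transcription_status(boto3.client("transcribe"), event["JobName"])
```

A Choice state can then compare `$.Status` against `COMPLETED` and `FAILED`, and loop back to a Wait state for any other value.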
Amazon Transcribe
Transcribe creates a JSON file with the complete transcription and word-by-word timing.
Amazon Transcribe
Must convert Amazon Transcribe JSON into usable closed caption / subtitle format, such
as SRT.
• Not a trivial problem. We need to determine sentence boundaries and which words to
combine into the same captions.
Example:
Amazon Transcribe
• Some ideas for tackling this problem:
• Calculate the cadence of the wording, and look for larger than average gaps
between words. Use these points as our breaks.
• Use a fixed caption duration of 1–2 seconds and “aggregate” all words that fall
within that duration.
• None of these methods are perfect. Analyzing audio alone won’t necessarily account for
scene changes, gaps in dialog, non-dialog sound elements, etc.
But they can get us close…
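The two ideas above can be combined in a short sketch. This version, in Python, starts a new caption when the pause before a word exceeds a fixed threshold or when the caption would run longer than a fixed duration. The thresholds are illustrative, and the fixed gap is a simplification of the "larger than average gap" idea. It assumes the `results.items` list from the Transcribe JSON output, where punctuation items carry no timing.

```python
def to_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def group_words(items, max_gap=0.5, max_duration=2.0):
    """Group Transcribe items into captions, breaking on long pauses or long captions."""
    captions, current = [], None
    for item in items:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if current:
                current["text"] += text  # attach punctuation to the previous word
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        if current and (start - current["end"] > max_gap
                        or end - current["start"] > max_duration):
            captions.append(current)
            current = None
        if current is None:
            current = {"start": start, "end": end, "text": text}
        else:
            current["end"] = end
            current["text"] += " " + text
    if current:
        captions.append(current)
    return captions


def to_srt(captions):
    """Render captions as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{to_srt_time(c['start'])} --> {to_srt_time(c['end'])}\n{c['text']}\n"
        for i, c in enumerate(captions, start=1)
    ]
    return "\n".join(blocks)
```

Given the parsed transcript in `transcript`, `to_srt(group_words(transcript["results"]["items"]))` would produce the caption file contents. As the slide notes, this only gets close; it knows nothing about scene changes or non-dialog sound.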
Content Retrieval
AWS Elemental MediaConvert
The Challenge
• The content producer would like to create a promo clip of all of the scenes from their episode that contain a particular actor.
• Remember, the source file is 60 minutes long and 200 GB.
The Solution
• Amazon Rekognition Video facial recognition identifies when the star appears in the source video.
• AWS Elemental MediaConvert uses time references to selectively transcode the source video.
The Benefit
• Faster and more cost-effective clip generation, as only the video content identified as featuring the celebrity is transcoded.
Content Retrieval
1. A Lambda function queries the database for the metadata being searched.
2. The Lambda function creates a MediaConvert transcode job specifying the time(s) from the source to clip.
3. AWS Elemental MediaConvert transcodes clips from the source file, using only the time range(s) specified.
Components: Amazon DynamoDB (database), AWS Lambda (serverless), AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), clipped file output
Content Retrieval
Use AWS Elemental MediaConvert “Input Clipping” feature to clip a file to specific times
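As a sketch of how detection timestamps could become clipping ranges, the Python below merges nearby timestamps into ranges and formats them as `HH:MM:SS:FF` timecodes for the `InputClippings` list of a MediaConvert input. The 5-second merge window, the 30 fps frame rate, and the function names are assumptions. A real job would also pad single-timestamp ranges, which come out with zero length here.

```python
def to_timecode(seconds, fps=30):
    """Convert seconds to an HH:MM:SS:FF timecode at an integer frame rate."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps
    total_seconds = total_frames // fps
    return "{:02d}:{:02d}:{:02d}:{:02d}".format(
        total_seconds // 3600, (total_seconds // 60) % 60, total_seconds % 60, frames
    )


def merge_ranges(timestamps_ms, max_gap_ms=5000):
    """Merge detection timestamps (milliseconds) into (start_s, end_s) ranges."""
    ranges = []
    for ts in sorted(timestamps_ms):
        if ranges and ts - ranges[-1][1] <= max_gap_ms:
            ranges[-1][1] = ts  # close enough to extend the current range
        else:
            ranges.append([ts, ts])
    return [(start / 1000, end / 1000) for start, end in ranges]


def clipped_input(file_input, ranges, fps=30):
    """Build a MediaConvert input that transcodes only the given time ranges."""
    return {
        "FileInput": file_input,
        # Zero-based timecodes count from the first frame of the source file.
        "TimecodeSource": "ZEROBASED",
        "InputClippings": [
            {"StartTimecode": to_timecode(s, fps), "EndTimecode": to_timecode(e, fps)}
            for s, e in ranges
        ],
    }
```

The returned dictionary would go in the `Inputs` list of the job settings, with the timestamps coming from the metadata stored during indexing.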
Demo
AWS Media Analysis Solution
https://aws.amazon.com/answers/media-entertainment/media-analysis-solution/
• Generate searchable metadata from your media assets using Amazon Rekognition, Amazon Transcribe, Amazon Comprehend, and Amazon Elasticsearch Service
• Deploy in minutes with a single click using AWS CloudFormation
• Interact via API or demo web UI
• Orchestrated with Step Functions, extensible and easily customizable
Bringing it all together
Adding video transcoding to the Media Analysis Solution
AWS Elemental
MediaConvert
Amazon Machine Learning Stack
• Application services: Amazon Rekognition, Amazon Rekognition Video, Polly, Transcribe, Translate, Comprehend, Lex
• Platforms: Amazon SageMaker, Amazon Mechanical Turk
• Frameworks & Infrastructure: Keras, Machine Learning AMIs, P3 instances
• P3: NVIDIA Tesla V100 GPUs (14x faster than P2), 5,120 Tensor cores, 128 GB of memory, 1 petaflop of compute, NVLink 2.0
Submit Session Feedback
1. Tap the Schedule icon.
2. Select the session you attended.
3. Tap Session Evaluation to submit your feedback.
Thank You

Increase the Value of Video with ML & Media Services - SRV322 - New York AWS Summit

  • 1. Increase the Value of Video Using Machine Learning and AWS Media Services (SRV322). Bryan Samis, Solutions Architect. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2. Agenda
      • Brief introduction to services
      • Using ML in video workflows
      • Content indexing / metadata generation
          • Add searchable metadata to a video archive
          • Log when celebrities appear in new episodes
          • Generate captions for a collection of video assets
      • Content retrieval
          • Transcode just the video clip that contains a specific person
      • Putting it all together
  • 3. Services Used
      • AWS Elemental MediaConvert
      • AWS Elemental MediaLive
      • Amazon Rekognition
      • Amazon Transcribe
      • AWS Lambda
      • AWS Step Functions
  • 4. AWS Elemental MediaConvert. AWS Elemental MediaConvert is a file-based video processing service that enables anyone, with any size content library, to easily and reliably transcode on-demand content for broadcast and multiscreen delivery.
      • Access to professional-grade video features and quality
      • No software or hardware infrastructure to manage
      • Automatically scales in response to variations in incoming video volume
      • Ability to manage capacity and control the order in which jobs are processed
      • Pay for what you use, billed by the second of content produced
  • 5. Amazon Rekognition: object & scene detection, facial analysis, face comparison, face search, celebrity detection, image moderation, text detection
  • 6. Amazon Rekognition Video: object, scene & activity detection, face search, facial analysis, activity pathing, unsafe content detection, celebrity detection, text in images
  • 7. Amazon Rekognition file requirements
      • Image recognition: JPEG / PNG image, up to 15 MB
      • Video recognition: MOV / MP4 file with H.264 video, up to 8 GB
  • 8. Amazon Transcribe. A fully managed and continuously trained automatic speech recognition (ASR) service that takes in audio and automatically generates accurate transcripts.
      • Support for audio in many formats and at low fidelity
      • Amazon S3 integration
      • Time stamps and confidence scores
      • English and Spanish (Hello / Hola)
      • Punctuation
  • 9. Amazon Transcribe
      • Input file types accepted: FLAC, MP3, WAV, MP4
      • Up to 2 hours in duration
      • Up to 1 GB in size
      • Produces JSON output with full transcription and word timing
  • 10. Amazon Transcribe – Use Cases: call centers, subtitles for VOD, meeting transcription, broadcast closed captions, content indexing, compliance
  • 11. Using Amazon ML Services for Media
  • 12. Using Amazon ML Services for Media
      • Use services such as Amazon Rekognition & Amazon Transcribe to generate metadata about your content
      • Store that metadata and make it searchable
      • Retrieve only the portion of the content you want
      • Prepare it for timely use
      • Diagram, Content Indexing / Metadata Generation: live and file sources, AWS Elemental Media Services (media processing), Amazon ML Services (ML), metadata, Amazon DynamoDB (database)
      • Diagram, Content Retrieval / Action: Amazon DynamoDB (database), AWS Elemental Media Services (media processing), live and file content
  • 13. Content Indexing / Metadata Generation
  • 14. Content Indexing / Metadata Generation
      • File-based content: MediaConvert
      • Live content: MediaLive, Kinesis Video Streams
      • Amazon Rekognition (Image): JPEG/PNG, up to 15 MB
      • Amazon Rekognition (Video): H.264 video, MP4/MOV file, up to 8 GB
      • Transcribe: FLAC/MP3/WAV/MP4, up to 2 hours, up to 1 GB
  • 15. Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Rekognition
      • The Challenge
          • A broadcaster wants to add metadata to an existing archive of video content
          • Index metadata and video to make it searchable
          • Keep costs low
      • The Solution
          • Use AWS Elemental MediaConvert to extract frames from video content
          • Use Amazon Rekognition to analyze and create metadata for video content
      • The Benefit
          • Video tagged with detected objects, scenes, and celebrities
          • Five-second frame extraction keeps cost low while providing a searchable index
  • 16. Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Rekognition. Use AWS Elemental MediaConvert to extract still frames from a video:
      • An AWS Elemental MediaConvert job transcodes the file and extracts JPEG frames to an S3 bucket.
      • An AWS Lambda function triggered by the Amazon S3 object-created event tells Amazon Rekognition to analyze the JPEG file.
      • Amazon Rekognition performs the requested operation on the image (e.g., object detection, celebrity recognition).
      • Amazon Rekognition returns the result to AWS Lambda, which stores tags and confidence scores in Amazon DynamoDB, Amazon Redshift, Amazon Elasticsearch Service, Amazon RDS, or whichever service best suits the use case.
      • Diagram components: file source, AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), AWS Lambda (serverless), Amazon Rekognition (ML / AI), Amazon DynamoDB (database)
  • 17. Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Rekognition
      • Add a new file output group to an AWS Elemental MediaConvert job
  • 18. Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Rekognition. Add a frame capture (JPEG) output to the job. Framerate determines the number of images that will be extracted from the video per second; a value of 1/5 creates one JPEG every 5 seconds.
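The frame capture output described on slide 18 can be sketched as a fragment of a MediaConvert job's JSON settings, written here as a Python dictionary. This is a minimal sketch, not the configuration from the talk: the helper name `frame_capture_output`, the `_frame` name modifier, and the default quality and capture limit are assumptions chosen for illustration. A real job also needs inputs, an output group destination, and a role.

```python
def frame_capture_output(seconds_between_frames=5, quality=80, max_captures=10000):
    """Build one MediaConvert output that writes JPEG stills.

    Numerator / denominator is the capture rate in frames per second,
    so 1 / 5 means one JPEG every five seconds.
    """
    return {
        "NameModifier": "_frame",                   # hypothetical suffix for the JPEG file names
        "ContainerSettings": {"Container": "RAW"},  # stills are written without a media container
        "VideoDescription": {
            "CodecSettings": {
                "Codec": "FRAME_CAPTURE",
                "FrameCaptureSettings": {
                    "FramerateNumerator": 1,
                    "FramerateDenominator": seconds_between_frames,
                    "MaxCaptures": max_captures,    # assumed cap on the number of stills
                    "Quality": quality,             # assumed JPEG quality
                },
            }
        },
    }


output = frame_capture_output()
settings = output["VideoDescription"]["CodecSettings"]["FrameCaptureSettings"]
print(settings["FramerateNumerator"], "/", settings["FramerateDenominator"])
```

Raising `seconds_between_frames` reduces the number of images sent to Amazon Rekognition, which is the cost lever the slide points to.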
  • 19. Amazon Rekognition. AWS Lambda function to invoke Amazon Rekognition on our extracted JPEG to detect celebrities.
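The function shown on slide 19 is not captured in this transcript. As a stand-in, here is a minimal sketch of such a Lambda handler, assuming it is triggered by an S3 object-created event for each extracted JPEG. The handler signature with an injectable `rekognition` client and the shape of the returned records are assumptions made so the sketch can be exercised without AWS credentials; `recognize_celebrities` is the boto3 Rekognition call for still images.

```python
import urllib.parse


def handler(event, context=None, rekognition=None):
    """Detect celebrities in the JPEG named by an S3 object-created event."""
    if rekognition is None:
        import boto3  # imported lazily so the handler can be tested with a stub client
        rekognition = boto3.client("rekognition")

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])

    response = rekognition.recognize_celebrities(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # One record per recognized face; a real function would write these
    # to DynamoDB or another store instead of returning them.
    return [
        {"frame": key, "name": face["Name"], "confidence": face["MatchConfidence"]}
        for face in response.get("CelebrityFaces", [])
    ]
```

Because each frame is captured at a known interval, the frame's sequence number in the file name can be mapped back to a time offset in the source video when the records are stored.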
  • 20. Amazon Rekognition. Result from our image.
  • 21. Demo
  • 22. Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Rekognition Video
      • The Challenge
          • A content producer wants to log who is in each scene of a new episode of a show
          • Raw video files are ~200 GB for 60 min
      • The Solution
          • Use AWS Elemental MediaConvert to compress video content (but retain quality)
          • Use Amazon Rekognition Video to analyze and create metadata for video content
      • The Benefit
          • Video tagged with detected celebrities, including the timing and position of each celebrity
          • Video files reduced to <8 GB for 60 min to reduce costs
  • 23. Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Rekognition Video. Use AWS Elemental MediaConvert to compress files >8 GB and feed them to Amazon Rekognition:
      • An AWS Elemental MediaConvert job transcodes the source file to H.264/MP4 at a bit rate such that the file size is <8 GB.
      • An AWS Lambda function triggered by the Amazon S3 object-created event tells Amazon Rekognition to analyze the video file.
      • Amazon Rekognition Video performs the requested operation on the video (e.g., person tracking, celebrity recognition).
      • Amazon Rekognition returns the result to AWS Lambda, which stores tags and confidence scores in Amazon DynamoDB, Amazon Redshift, Amazon Elasticsearch Service, Amazon RDS, or whichever service best suits the use case.
      • Diagram components: file source, AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), AWS Lambda (serverless), Amazon Rekognition (ML / AI), Amazon DynamoDB (database)
  • 24. AWS Elemental MediaConvert
      • Add H.264/MP4 output to the MediaConvert job
  • 25. AWS Elemental MediaConvert
      • Add H.264/MP4 output to the AWS Elemental MediaConvert job
      • Use Container MPEG-4 Container (MP4) and a file extension of mp4.
      • Set Video Codec to MPEG-4 AVC (H.264).
      • Select the bit rate so that the output file is smaller than 8 GB. For example, a 60-minute movie at 7 Mbps will be approximately 3.2 GB.
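The bit rate arithmetic on slide 25 can be checked with a short calculation. This sketch counts only the video bit rate and uses decimal gigabytes (1 GB = 10^9 bytes); audio and container overhead are ignored, and the 10% headroom in `max_bitrate_mbps` is an assumption, not a figure from the talk.

```python
def output_size_gb(bitrate_mbps, duration_minutes):
    """Approximate file size in decimal GB for a constant video bit rate."""
    bits = bitrate_mbps * 1_000_000 * duration_minutes * 60
    return bits / 8 / 1_000_000_000


def max_bitrate_mbps(duration_minutes, limit_gb=8.0, headroom=0.9):
    """Highest bit rate that keeps the file under limit_gb, leaving some headroom."""
    budget_bits = limit_gb * headroom * 1_000_000_000 * 8
    return budget_bits / (duration_minutes * 60) / 1_000_000


print(round(output_size_gb(7, 60), 2))   # 60 minutes at 7 Mbps
print(round(max_bitrate_mbps(60), 1))    # ceiling for a 60-minute file under 8 GB
```

For the slide's example, 7 Mbps over 3,600 seconds is 25.2 gigabits, or 3.15 GB, which matches the "approximately 3.2 GB" figure.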
  • 26. Amazon Rekognition Video. AWS Lambda function to invoke Amazon Rekognition on our transcoded video to detect labels.
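The code from slide 26 is not in this transcript. A minimal sketch of a Lambda handler that starts an asynchronous label detection job follows. The SNS topic ARN, the IAM role ARN, and the `MinConfidence` value are placeholders and assumptions; `start_label_detection` is the boto3 Rekognition Video call, and it returns a job ID rather than the results themselves.

```python
import urllib.parse

# Hypothetical ARNs: replace with a topic Rekognition can publish to and a role that allows it.
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:rekognition-video-done"
ROLE_ARN = "arn:aws:iam::123456789012:role/rekognition-sns-publish"


def handler(event, context=None, rekognition=None):
    """Start label detection on the MP4 named by an S3 object-created event."""
    if rekognition is None:
        import boto3  # imported lazily so the handler can be tested with a stub client
        rekognition = boto3.client("rekognition")

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    response = rekognition.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=70,  # assumed threshold
        NotificationChannel={"SNSTopicArn": SNS_TOPIC_ARN, "RoleArn": ROLE_ARN},
    )
    # The job runs asynchronously; completion is announced on the SNS topic.
    return response["JobId"]
```

The same pattern applies to the other Rekognition Video operations mentioned in the deck, such as celebrity recognition, by swapping in the corresponding start call.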
  • 27. Amazon Rekognition Video. Example code to fetch our Amazon Rekognition Video results when the Amazon SNS notification is published.
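The example code from slide 27 is likewise missing from the transcript. The sketch below assumes a Lambda function subscribed to the SNS topic that Rekognition Video publishes to; it reads the job ID from the notification and pages through `get_label_detection`. The flattened record shape it returns is an assumption for illustration.

```python
import json


def handler(event, context=None, rekognition=None):
    """Collect label detection results once Rekognition Video reports completion."""
    if rekognition is None:
        import boto3  # imported lazily so the handler can be tested with a stub client
        rekognition = boto3.client("rekognition")

    # The SNS message body is a JSON string that includes the job ID and status.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message["Status"] != "SUCCEEDED":
        return []

    labels, token = [], None
    while True:
        kwargs = {"JobId": message["JobId"], "SortBy": "TIMESTAMP"}
        if token:
            kwargs["NextToken"] = token
        page = rekognition.get_label_detection(**kwargs)
        for item in page["Labels"]:
            labels.append({
                "timestamp_ms": item["Timestamp"],  # offset from the start of the video
                "label": item["Label"]["Name"],
                "confidence": item["Label"]["Confidence"],
            })
        token = page.get("NextToken")
        if not token:
            return labels
```

Results for long videos span several pages, so the loop keeps requesting until no `NextToken` is returned.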
  • 28. Amazon Rekognition Video. Result from our video.
  • 29. Demo
  • 30. Content Indexing / Metadata Generation – AWS Elemental MediaConvert and Amazon Transcribe
      • The Challenge
          • An online training provider has thousands of hours of video that need captions
          • Video is in a variety of formats
      • The Solution
          • Use AWS Elemental MediaConvert to create an audio-only version of the content
          • Use Amazon Transcribe to generate a timestamped transcription
          • Convert the Amazon Transcribe output to a captions file
      • The Benefit
          • All formats of video content get captions added to make them more accessible
          • Option to run Amazon Transcribe output through Amazon Translate to get multi-language captions
  • 31. Amazon Transcribe
      • An AWS Elemental MediaConvert job transcodes the source file, creating an audio-only rendition for Amazon Transcribe.
      • AWS Elemental MediaConvert also creates the normal audio/video output.
      • An AWS Lambda function triggered by the Amazon S3 object-created event creates a new Transcribe job.
      • Amazon Transcribe outputs a JSON file of detected words and timing.
      • A Lambda function converts the Amazon Transcribe JSON into a subtitle format (such as WebVTT, SRT, or TTML) and delivers it to the Amazon S3 bucket with the content.
      • Diagram components: file source, AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), AWS Lambda (serverless), Amazon Transcribe (ML / AI), Amazon S3 (storage)
  • 32. AWS Elemental MediaConvert. Add an audio-only WAV output to the job. Start by adding an additional file output group.
  • 33. AWS Elemental MediaConvert. Configure audio-only uncompressed WAV or MP4 output.
  • 34. Amazon Transcribe. AWS Lambda function to create an Amazon Transcribe job from the audio file created by AWS Elemental MediaConvert.
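The function from slide 34 is not included in this transcript. A minimal sketch follows, assuming the Lambda function is triggered when MediaConvert writes the audio-only file to S3. The job naming scheme and the fixed `en-US` language code are assumptions; `start_transcription_job` is the boto3 Transcribe call, and the accepted extensions mirror the input types listed on slide 9.

```python
import re
import urllib.parse

SUPPORTED_FORMATS = {"flac", "mp3", "wav", "mp4"}


def handler(event, context=None, transcribe=None):
    """Create a Transcribe job for the audio file named by an S3 object-created event."""
    if transcribe is None:
        import boto3  # imported lazily so the handler can be tested with a stub client
        transcribe = boto3.client("transcribe")

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    extension = key.rsplit(".", 1)[-1].lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported media format: {extension}")

    # Job names allow only letters, digits, '.', '_' and '-'; derive one from the key.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "_", key)[:200]

    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode="en-US",  # assumed language
        MediaFormat=extension,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
    )
    return job_name
```

Job names must be unique within an account, so a production version would likely append a timestamp or asset ID to the derived name.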
  • 35. Amazon Transcribe. Use AWS Step Functions to monitor the status of the Transcribe job.
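One way to monitor a Transcribe job with Step Functions is a wait, check, and branch loop. The sketch below builds such a state machine definition in Amazon States Language as a Python dictionary, together with the Lambda function the Task state would call. The state names, the 30-second polling interval, the Lambda ARN, and the `job_name` / `status` fields passed between states are all assumptions; the deck does not show its own state machine in this transcript.

```python
import json

# Hypothetical ARN of the Lambda function defined below as check_status.
CHECK_STATUS_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-transcribe-job"


def build_state_machine(poll_seconds=30):
    """Return a polling loop: wait, check the job, then finish or wait again."""
    return {
        "Comment": "Poll an Amazon Transcribe job until it finishes",
        "StartAt": "Wait",
        "States": {
            "Wait": {"Type": "Wait", "Seconds": poll_seconds, "Next": "CheckStatus"},
            "CheckStatus": {"Type": "Task", "Resource": CHECK_STATUS_LAMBDA_ARN, "Next": "IsDone"},
            "IsDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "COMPLETED", "Next": "Done"},
                    {"Variable": "$.status", "StringEquals": "FAILED", "Next": "JobFailed"},
                ],
                "Default": "Wait",  # QUEUED or IN_PROGRESS: keep polling
            },
            "Done": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "TranscribeJobFailed"},
        },
    }


def check_status(event, context=None, transcribe=None):
    """Lambda for the CheckStatus state: look up the job and report its status."""
    if transcribe is None:
        import boto3  # imported lazily so the function can be tested with a stub client
        transcribe = boto3.client("transcribe")
    job = transcribe.get_transcription_job(TranscriptionJobName=event["job_name"])
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    return {"job_name": event["job_name"], "status": status}


print(json.dumps(build_state_machine(), indent=2))
```

The printed JSON is what would be supplied as the state machine definition; the execution input would carry the `job_name` returned when the Transcribe job was created.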
  • 36. Amazon Transcribe. Transcribe creates a JSON file with the complete transcription and word-by-word timing.
  • 37. Amazon Transcribe. The Amazon Transcribe JSON must be converted into a usable closed caption / subtitle format, such as SRT.
      • This is not a trivial problem. We need to determine sentence boundaries and which words to combine into the same caption.
  • 38. Amazon Transcribe
      • Some ideas for tackling this problem:
          • Calculate the cadence of the wording, and look for larger-than-average gaps between words. Use these points as our breaks.
          • Use a fixed caption duration of 1–2 seconds and "aggregate" all words that fall within that duration.
      • Neither of these methods is perfect. Analyzing audio alone won't necessarily account for scene changes, gaps in dialog, non-dialog sound elements, etc. But they can get us close…
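The two ideas on slide 38 can be combined into a simple heuristic: start a new caption when the pause before a word exceeds a threshold, or when the current caption would run longer than a fixed duration. The sketch below applies that heuristic to the word items in Amazon Transcribe's JSON output and emits SRT text. It uses a fixed pause threshold rather than one computed from the average cadence, and both thresholds (`max_gap`, `max_duration`) are assumed values, not figures from the talk.

```python
def srt_time(seconds):
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcribe_to_srt(transcript, max_gap=0.7, max_duration=2.0):
    """Group Transcribe word items into captions and render them as SRT."""
    captions = []  # each entry is [start_seconds, end_seconds, text]
    for item in transcript["results"]["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation items carry no timing; attach them to the previous word.
            if captions:
                captions[-1][2] += word
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        same_caption = (
            captions
            and start - captions[-1][1] <= max_gap        # short pause since the last word
            and end - captions[-1][0] <= max_duration     # caption stays within the duration cap
        )
        if same_caption:
            captions[-1][1] = end
            captions[-1][2] += " " + word
        else:
            captions.append([start, end, word])
    blocks = [
        f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
        for index, (start, end, text) in enumerate(captions, 1)
    ]
    return "\n".join(blocks)
```

As the slide notes, audio timing alone cannot see scene changes or non-dialog sound, so output from a heuristic like this is a starting point for review rather than finished captions.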
  • 39. Content Retrieval
  • 40. Content Retrieval – AWS Elemental MediaConvert
      • The Challenge
          • The content producer would like to create a promo clip of all of the scenes from their episode that contain a particular actor.
          • Remember, the source file is 60 minutes long and 200 GB.
      • The Solution
          • Amazon Rekognition Video facial recognition identifies when the star appears in the source video.
          • AWS Elemental MediaConvert uses time references to selectively transcode the source video.
      • The Benefit
          • Faster and more cost-effective clip generation, as only the video content that has been identified as featuring the celebrity is transcoded.
  • 41. Content Retrieval
      • A Lambda function queries the database for the metadata being searched.
      • The Lambda function creates a MediaConvert transcode job specifying the time(s) from the source to clip.
      • AWS Elemental MediaConvert transcodes clips from the source file, using only the time range(s) specified.
      • Diagram components: Amazon DynamoDB (database), AWS Lambda (serverless), AWS Elemental MediaConvert (file-based processing), Amazon S3 (storage), clipped file (output)
  • 42. Content Retrieval. Use the AWS Elemental MediaConvert "Input Clipping" feature to clip a file to specific times.
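To connect stored detection timestamps to input clipping, the timestamps first need to be merged into time ranges and converted to timecodes. The sketch below does that and builds the `Inputs` entry of a MediaConvert job as a Python dictionary. The merge gap, the padding, and the 30 fps non-drop-frame timecode are assumptions; a real job needs further settings (audio selectors, output groups, a role), and sources at 29.97 fps would need drop-frame handling.

```python
def to_timecode(ms, fps=30):
    """Convert a millisecond offset to HH:MM:SS:FF, assuming non-drop-frame timecode."""
    total_seconds, remainder_ms = divmod(int(ms), 1000)
    frames = remainder_ms * fps // 1000
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def merge_into_ranges(timestamps_ms, max_gap_ms=2000, pad_ms=500):
    """Merge detections closer than max_gap_ms into ranges, then pad both ends."""
    ranges = []
    for ts in sorted(timestamps_ms):
        if ranges and ts - ranges[-1][1] <= max_gap_ms:
            ranges[-1][1] = ts          # extend the current range
        else:
            ranges.append([ts, ts])     # start a new range
    return [(max(0, start - pad_ms), end + pad_ms) for start, end in ranges]


def clipped_input(file_uri, timestamps_ms, fps=30):
    """Build a MediaConvert input that keeps only the ranges around the detections."""
    return {
        "FileInput": file_uri,
        "TimecodeSource": "ZEROBASED",  # count timecodes from 00:00:00:00 at the first frame
        "InputClippings": [
            {"StartTimecode": to_timecode(start, fps), "EndTimecode": to_timecode(end, fps)}
            for start, end in merge_into_ranges(timestamps_ms)
        ],
    }


# Hypothetical source URI and detection timestamps (milliseconds).
job_input = clipped_input("s3://example-bucket/episode.mov", [1000, 1500, 2000, 60000, 61000])
print(job_input["InputClippings"])
```

Because MediaConvert reads only the clipped ranges, the 200 GB source does not have to be transcoded in full to produce the promo clip.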
  • 43. Demo
  • 44. AWS Media Analysis Solution. https://aws.amazon.com/answers/media-entertainment/media-analysis-solution/
      • Generate searchable metadata from your media assets using Amazon Rekognition, Amazon Transcribe, Amazon Comprehend, and Amazon Elasticsearch Service
      • Deploy in minutes with a single click using AWS CloudFormation
      • Interact via API or demo web UI
      • Orchestrated with Step Functions, extensible and easily customizable
  • 45. Bringing it all together. Adding video transcoding (AWS Elemental MediaConvert) to the Media Analysis Solution.
  • 46. Amazon Machine Learning Stack
      • Application services: Amazon Rekognition, Amazon Rekognition Video, Polly, Transcribe, Translate, Comprehend, Lex
      • Platforms: Amazon SageMaker, Amazon Mechanical Turk
      • Frameworks: Keras
      • Infrastructure: Machine Learning AMIs; P3 instances with NVIDIA Tesla V100 GPUs (14x faster than P2), 5,120 Tensor cores, 128 GB of memory, 1 petaflop of compute, NVLink 2.0
  • 47. Submit Session Feedback
      1. Tap the Schedule icon.
      2. Select the session you attended.
      3. Tap Session Evaluation to submit your feedback.
  • 48. Thank You