With the breadth of AWS services relevant to digital media, organizations can readily build complete content and asset management (DAM/MAM/CMS) solutions in the cloud. This session provides a detailed walkthrough for implementing a scalable, rich-media asset management platform capable of supporting a variety of industry use cases. The session includes a code-level walkthrough, AWS architecture strategies, and integration best practices for content storage, metadata processing, discovery, and overall library management functionality, with particular focus on the use of Amazon S3, Amazon Elastic Transcoder, Amazon DynamoDB, and Amazon CloudSearch. A customer case study highlights PBS's successful use of Amazon CloudSearch to enable rich discovery of programming content across the breadth of its network catalog.
3. Big Picture: Enterprise Media Architecture
[Diagram: an integrated workflow in which camera feeds (HD-SDI), live streams (RTMP, MPEG-TS), media files, and physical media pass through transcoders into content management, discovery, and delivery; each output profile and file is stored.]
4. Big Picture: Digital Asset Management (DAM)
[Diagram: the integrated workflow with a DAM added: camera feeds (HD-SDI), live streams (RTMP, MPEG-TS), media files, and physical media pass through transcoders and the DAM into content management, discovery, and delivery; each output profile and file is stored.]
15. [Architecture diagram: delivery cache; DAM storage and archive (S3 buckets for renditions and metadata sidecar files); event handler; mailboxes feeding rendition processing and metadata processing (each an Auto Scaling group of EC2 workers); DAM catalog (DynamoDB); Amazon CloudSearch; DAM web service and DAM web interface (AWS Elastic Beanstalk).]
22. Tools Available to Us
Need           | Description                                  | AWS Service
Ingest         | Integrate with existing file-based workflows | Amazon S3
Metadata       | Process inline and sidecar files             | EC2 / Elastic Beanstalk
Renditions     | Autogenerate thumbnails and proxies          | Amazon Elastic Transcoder
Catalog part 1 | Administrative entities, simple retrieval    | Amazon DynamoDB
Catalog part 2 | Field and free-form search                   | Amazon CloudSearch
Storage        | Nearline, online, offline infinite storage   | Amazon S3, Amazon Glacier
Delivery       | Global caching and streaming footprint       | Amazon CloudFront
23. Catalog: A word on why DynamoDB
[Diagram: NoSQL data model. Containers A, B, and C each carry a header and nested layers (Layer-1, Layer-2) holding core elements plus elements specific to a container; records carry differing fields, such as Name_A with Size and Some_Field, Name_B with Size, and Name_C.]
24. Catalog: A Word on Why CloudSearch
• Video and text
– Header fields with textual descriptions, synopsis, comments
– Tracks with speech to text, closed caption data
– Links to scripts
• Video and structured elements
– XMP dynamic media
– Sidecar files
• A managed search engine dedicated to these kinds of problems
– Case folding, stemming, stopword removal, synonyms
– Also accent normalization, UTF-8 normalization, etc.
27. [Architecture diagram: Amazon S3 storage for source media, renditions, and metadata sidecar files; an EC2 crawler; an Amazon SNS topic; an Amazon SQS queue of rendition jobs consumed by rendition workers (EC2 Auto Scaling group) that use Elastic Transcoder for proxy and thumbnail generation; an Amazon SQS queue of metadata processing jobs consumed by metadata workers (EC2 Auto Scaling group); the DAM catalog in Amazon DynamoDB with Amazon CloudSearch; the DAM web service on AWS Elastic Beanstalk; and a CloudFront download distribution delivering media content.]
29. Setup
• Amazon Simple Storage Service (S3) buckets ready to go
– External staging locations
– Internal working locations
• Amazon Simple Notification Service (SNS) + Amazon Simple Queue Service (SQS) wired up
• Catalog data models established
– Amazon DynamoDB table “catalog” created
– Amazon CloudSearch search domain “catalog” created
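The setup above can be sketched as plain request objects. This is a minimal illustration, not the presenters' configuration: the key attribute `asset_id`, the throughput values, and the ARNs are assumptions made for the example. The object shapes follow the DynamoDB CreateTable and SNS Subscribe request parameters, and no AWS call is made here.

```javascript
'use strict';

// Hypothetical names: 'asset_id', the throughput numbers, and the ARNs
// passed in below are illustrative only and do not come from the slides.
function buildSetupPlan(config) {
  // Shape of a DynamoDB CreateTable request for the "catalog" table.
  const catalogTable = {
    TableName: 'catalog',
    AttributeDefinitions: [{ AttributeName: 'asset_id', AttributeType: 'S' }],
    KeySchema: [{ AttributeName: 'asset_id', KeyType: 'HASH' }],
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
  };

  // One SNS Subscribe request per queue, so every event published to the
  // topic is delivered to each queue.
  const subscriptions = config.queueArns.map(function (queueArn) {
    return { TopicArn: config.topicArn, Protocol: 'sqs', Endpoint: queueArn };
  });

  return { catalogTable: catalogTable, subscriptions: subscriptions };
}

const plan = buildSetupPlan({
  topicArn: 'arn:aws:sns:us-east-1:123456789012:dam-new-data',
  queueArns: [
    'arn:aws:sqs:us-east-1:123456789012:rendition-jobs',
    'arn:aws:sqs:us-east-1:123456789012:metadata-jobs'
  ]
});
console.log(JSON.stringify(plan, null, 2));
```

Subscribing both queues to one topic is one way to let rendition and metadata workers react to the same event independently.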
30. 1. Ingest, Crawl, Notify
a. End user initiates data copy
b. EC2 worker scans Amazon S3 staging bucket
c. EC2 worker copies or moves content
d. EC2 worker broadcasts “NEW DATA” event (SNS)
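A rough sketch of the crawl step follows. It assumes the worker has already listed the staging bucket, and for each key it only plans the copy and the notification. The bucket names, topic ARN, and JSON message layout are hypothetical; the object shapes follow the S3 CopyObject and SNS Publish request parameters.

```javascript
'use strict';

// Hypothetical bucket names, topic ARN, and message layout.
function planIngest(stagingKeys, config) {
  return stagingKeys.map(function (key) {
    return {
      // Shape of an S3 CopyObject request: staging bucket -> working bucket.
      // Keys containing special characters would need URL-encoding here.
      copy: {
        Bucket: config.workingBucket,
        Key: key,
        CopySource: config.stagingBucket + '/' + key
      },
      // Shape of an SNS Publish request announcing the new object.
      notify: {
        TopicArn: config.topicArn,
        Subject: 'NEW DATA',
        Message: JSON.stringify({ bucket: config.workingBucket, key: key })
      }
    };
  });
}

const operations = planIngest(['clips/strawberry.mov', 'clips/strawberry.xmp'], {
  stagingBucket: 'dam-staging',
  workingBucket: 'dam-working',
  topicArn: 'arn:aws:sns:us-east-1:123456789012:dam-new-data'
});
console.log(JSON.stringify(operations, null, 2));
```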
37. 2. Metadata Extraction
a. EC2 worker polls inbox (SQS)
b. EC2 worker pulls down media asset from Amazon S3
c. EC2 worker parses media files
d. EC2 worker pumps metadata through ETL flow to prepare for catalog insertion
e. EC2 worker inserts into catalog (Amazon DynamoDB)
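The message handling and ETL steps might look like the sketch below. It assumes the SQS message body carries the standard SNS JSON envelope (raw message delivery not enabled), and that the published payload is a JSON object with `bucket` and `key`, which is a hypothetical layout. The metadata values are made up, and media parsing itself is out of scope here.

```javascript
'use strict';

// Unwrap an SQS message body that carries an SNS notification envelope.
function parseNewDataEvent(sqsBody) {
  const envelope = JSON.parse(sqsBody);   // SNS envelope (Type, Subject, Message, ...)
  return JSON.parse(envelope.Message);    // payload published with the event
}

// Convert flat metadata into DynamoDB attribute-value form.
// 'asset_id' is a hypothetical key attribute name.
function toCatalogItem(assetId, metadata) {
  const item = { asset_id: { S: assetId } };
  Object.keys(metadata).forEach(function (name) {
    const value = metadata[name];
    // Skip missing and empty values instead of storing them.
    if (value === null || value === undefined || value === '') return;
    item[name] = typeof value === 'number'
      ? { N: String(value) }
      : { S: String(value) };
  });
  return item;
}

const body = JSON.stringify({
  Type: 'Notification',
  Subject: 'NEW DATA',
  Message: JSON.stringify({ bucket: 'dam-working', key: 'clips/strawberry.mov' })
});
const event = parseNewDataEvent(body);
const item = toCatalogItem(event.key, {
  complete_name: 'strawberry.mov',
  codec_id_info: 'QuickTime',
  duration: 12.5,
  file_size: 1048576,
  encoded_date: ''
});
console.log(JSON.stringify(item, null, 2));
```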
43. 3. Catalog Processing
a. Store metadata record in Amazon DynamoDB
b. Reflect searchable subset to Amazon CloudSearch
c. Go crazy (HTTP GET)
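Step b can be sketched as a projection from a catalog item to a search document. The list of searchable fields and the `asset_id` attribute are assumptions for this example. The document shape (type, id, version, lang, fields) follows the batch format of the 2011-02-01 CloudSearch API as I understand it, and the ID is lowercased and restricted to letters, digits, and underscores on the same basis.

```javascript
'use strict';

// Hypothetical choice of which catalog attributes are searchable.
const SEARCHABLE = ['complete_name', 'codec_id_info', 'duration', 'file_size'];

// Project a DynamoDB-style item (attribute-value form) onto an "add"
// operation for a CloudSearch document batch.
function toSearchDocument(item, version) {
  const fields = {};
  SEARCHABLE.forEach(function (name) {
    const attr = item[name];
    if (!attr) return;                       // attribute absent: leave it out
    fields[name] = attr.N !== undefined ? Number(attr.N) : attr.S;
  });
  return {
    type: 'add',
    // Assumes the asset ID starts with a letter or digit.
    id: item.asset_id.S.toLowerCase().replace(/[^a-z0-9_]/g, '_'),
    version: version,
    lang: 'en',
    fields: fields
  };
}

const doc = toSearchDocument({
  asset_id: { S: 'clips/strawberry.mov' },
  complete_name: { S: 'strawberry.mov' },
  file_size: { N: '1048576' },
  encoded_date: { S: '2013-11-12' }
}, 1);
console.log(JSON.stringify([doc], null, 2));
```

Only the fields listed in `SEARCHABLE` reach the search domain; the full record stays in the catalog table.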
46. Querying the Catalog (Amazon CloudSearch)
• http://cloudsearch.demo.aws.com/2011-02-01/search?bq=complete_name : …<field=value>
• In Node.js
var optionsget = {
  host: 'cloudsearch.demo.aws.com', // here only the domain name
  port: 80,
  // bq matches on complete_name; return-fields lists the fields to return
  path: "/2011-02-01/search?bq=complete_name:'-STRAWBERRY'" +
        '&return-fields=complete_name,text_relevance,codec_id_info,' +
        'duration,file_size,encoded_date',
  method: 'GET'
};
// optionsget can then be passed to http.request(optionsget, callback)
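When the search term comes from user input, the path is safer to build with URL encoding than by hand. The helper below is a small sketch, not part of the original demo: `buildSearchPath` is a hypothetical name, and it assumes the term contains no single quotes.

```javascript
'use strict';

// Build the request path for a single-field query against the
// 2011-02-01 search endpoint used in the slide.
function buildSearchPath(field, term, returnFields) {
  // Single quotes delimit the string value in the bq expression.
  const bq = field + ":'" + term + "'";
  return '/2011-02-01/search?bq=' + encodeURIComponent(bq) +
    '&return-fields=' + returnFields.map(function (name) {
      return encodeURIComponent(name);
    }).join(',');
}

const searchPath = buildSearchPath('complete_name', '-STRAWBERRY',
  ['complete_name', 'duration']);
console.log(searchPath);
```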
48. Merlin: PBS CMS/DAM
• Code name Merlin
• Structured metadata
• 200+ web object records daily
– 29,046 web objects
• 150+ video objects daily
– 91,436 videos
• Users from over 150 stations and 30 national producers
– Frontline
– Downton Abbey
– PBS NewsHour
49. What’s It Do?
• Large multitenant system
– 1200 registered users
• 250 million streams per month
• 20 million unique viewers
• 8 PB of video delivered monthly
50. Getting Data In
• 33 ingestible web feeds
– Content editors
– Web page listings
• Batch video ingest API
– Video content editors
– External workflow integration
• Manually entered videos
– Video content editors from all 50 states
– Number of user accounts
52. Basic Workflow
• Object registered with Merlin
• Images registered and processed with ITS
– Stored in CDN-fronted Amazon S3 bucket
• Videos registered with VTS
– Jobs sent to Zencoder for processing
– Video stored in CDN-fronted Amazon S3 bucket
• Objects ready for clients
– Objects rendered for consumption in Amazon S3
– Objects registered with APIs
– Objects discoverable
53. Making It Discoverable
• Search util service
• Runs every hour
– Re-indexes last several hours each time
• Polls APIs
– Content API
– Modified time
• Updates Amazon CloudSearch index
– 2 primary indexes
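The hourly job with an overlapping window can be sketched as follows. The three-hour lookback and the `modified` field name are assumptions; the slide says only that the last several hours are re-indexed on each run.

```javascript
'use strict';

const HOUR_MS = 60 * 60 * 1000;

// Time range covered by one run: from `lookbackHours` ago until now.
function reindexRange(now, lookbackHours) {
  return { from: new Date(now.getTime() - lookbackHours * HOUR_MS), to: now };
}

// Keep only the objects whose modified time falls inside the range.
function selectForReindex(objects, range) {
  return objects.filter(function (obj) {
    const modified = new Date(obj.modified).getTime();
    return modified >= range.from.getTime() && modified <= range.to.getTime();
  });
}

const range = reindexRange(new Date('2013-11-14T12:00:00Z'), 3);
const selected = selectForReindex([
  { id: 'a', modified: '2013-11-14T11:30:00Z' },
  { id: 'b', modified: '2013-11-14T09:30:00Z' },
  { id: 'c', modified: '2013-11-14T08:00:00Z' }
], range);
console.log(selected.map(function (obj) { return obj.id; }));
```

Because each run overlaps the previous ones, an object missed by one run is picked up again by the next, at the cost of re-sending unchanged documents.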
54. Search Considerations
• Hidden objects
• Rights management
• Partitioned search
– Local station search
– Results by geo
– Restrict results for international customers
• Unify and normalize existing APIs
– Flatten data model
• Users looking for programs
– Specific searches
– Suitable for structured data
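One way to apply these constraints is to append them to every user query as extra clauses. The sketch below uses the `(and ...)` Boolean query form of the 2011-02-01 API as I understand it; the field names `title`, `hidden`, `station`, and `intl_rights` are hypothetical and not taken from the PBS schema, and the user terms are assumed to contain no single quotes.

```javascript
'use strict';

// Combine the user's search terms with visibility and partition filters.
function buildPartitionedQuery(userTerms, context) {
  const clauses = ["title:'" + userTerms + "'", 'hidden:0'];
  if (context.station) {
    clauses.push("station:'" + context.station + "'");   // local station search
  }
  if (context.international) {
    clauses.push('intl_rights:1');                       // restrict by rights
  }
  return '(and ' + clauses.join(' ') + ')';
}

const localQuery = buildPartitionedQuery('frontline', { station: 'wxyz' });
const intlQuery = buildPartitionedQuery('frontline', { international: true });
console.log(localQuery);
console.log(intlQuery);
```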
55. Challenges
• No native time field
– Convert dates to integers
– Epoch time
• Versioning of documents
– Epoch for versioning
• Exposing two versions of most fields
– Text searchable
– Facets (copy of text version)
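The three workarounds can be sketched together. The record layout and the `_facet` suffix are assumptions for this example; the slide states only that dates become epoch integers, that the epoch serves as the document version, and that facet fields are copies of the text fields.

```javascript
'use strict';

// Convert an ISO 8601 date string to whole seconds since the Unix epoch.
function toEpochSeconds(isoDate) {
  return Math.floor(Date.parse(isoDate) / 1000);
}

// Build search fields and a version number from a content record.
function toIndexEntry(record, facetFields) {
  const fields = {
    title: record.title,
    program: record.program,
    air_date: toEpochSeconds(record.airDate)   // date stored as an integer
  };
  // Expose a second, facet-only copy of selected text fields.
  facetFields.forEach(function (name) {
    fields[name + '_facet'] = fields[name];
  });
  // A later modification time yields a higher version.
  return { version: toEpochSeconds(record.modified), fields: fields };
}

const entry = toIndexEntry({
  title: 'Episode 1',
  program: 'Frontline',
  airDate: '2013-11-12T00:00:00Z',
  modified: '2013-11-13T15:30:00Z'
}, ['program']);
console.log(JSON.stringify(entry, null, 2));
```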
60. Summary
• Build an enterprise-scale DAM platform now
– Managed storage and archive (Amazon S3, Amazon Glacier)
– Managed database for catalog processing (Amazon DynamoDB, Amazon Relational Database Service [RDS])
– Managed search (Amazon CloudSearch)
• Application development accelerators
– Elastic Beanstalk harness (web, API, and worker roles)
– Reduced effort with the AWS CLI
• (Almost) fire and forget
61. AWS Marketplace Can Help
• AWS online software store
– Customer can find, research, buy software
– Simple pricing, aligns with EC2 usage model
– 1-click launch in minutes
– Marketplace billing integrated into your AWS account
– 1,000+ products across 24 categories
• Digital asset management related options include:
– WebDAM: centralize, store, manage and distribute collateral
– Digital asset management cloud: web-based open source DAM
– Widen: manage and distribute digital media and brand assets with user roles and permissions
– Adobe Experience Manager: unified asset management including mobile
Learn more at: http://aws.amazon.com/marketplace
63. Please give us your feedback on this presentation
MED-402 Building a Scalable Video / DAM Solution in the Cloud