Building a Scalable Digital Asset Management Platform in the Cloud (MED402) | AWS re:Invent 2013

1. MED402: Building a Scalable Video / Digital Asset Management (DAM) Platform in the Cloud Michael Limcaco – Enterprise Solutions Architect (AWS) Jonathan Rivers – Director, Technical Operations (PBS) November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

2. Agenda
• The big picture
• Architecture
• Build-out exercise
• Customer case study (PBS)
• Observations and summary

3. Big Picture: Enterprise Media Architecture [Diagram: integrated workflow linking live streams (RTMP, MPEG-TS), media files, physical media, and cameras (HD-SDI) through transcoders ("store output profile and file") to content management, discovery & delivery]

4. Big Picture: Digital Asset Management (DAM) [Diagram: the same integrated workflow, with a DAM added alongside the transcoders and content management, discovery & delivery]

5. Workflow Management: Ingest, Processing, Storage, Discovery & Delivery

7. Key DAM Requirements
• Ingest
• Metadata extraction
• Create renditions
• Build the catalog
• Enable rich search
• Manage storage lifecycle
• Provide efficient delivery of media assets

10. Why Scalable?
• Increasing volume, variety, velocity
  – Collectors, cameras, sensors, and sources
    • Ex: UGC, raw source, mezzanine, B-roll, creative collateral
    • Final content
  – Formats and standards
    • Transport, containers, codecs, metadata
    • SD, HD, 4K … 8K
  – Devices and user expectations
• Opportunities through cloud enablement
  – Media platform as a service
  – Multitenancy

11. What about Search? Ugh …
• Core elements
  – Project, keyword, asset name, tags, date/time of capture, timecode range, subject, format, size
• Extended structured search
  – Dublin Core, XMP, MPEG-7, IPTC, EXIF, FCXML, SMPTE, MISB
• Unstructured search
  – Comments, notes, transcripts, closed captioning

12. Enough Theory … Let’s Build a DAM in the Cloud!

13. (Demo) The User Experience (Notional Reference Client)

14. Architecture

15. [Architecture diagram] Components: DAM storage and archive (Amazon S3 buckets for renditions, mailbox, and metadata sidecar files); delivery cache; event handler; DAM web service and DAM web interface on AWS Elastic Beanstalk; rendition processing on EC2 workers in an Auto Scaling group; metadata processing on EC2 workers in an Auto Scaling group; catalog in Amazon DynamoDB; Amazon CloudSearch

22. Tools Available to Us
Need | Description | AWS Service
Ingest | Integrate with existing file-based workflows | Amazon S3
Metadata | Process inline and sidecar files | Amazon EC2 / AWS Elastic Beanstalk
Renditions | Autogenerate thumbnails and proxies | Amazon Elastic Transcoder
Catalog, part 1 | Administrative entities, simple retrieval | Amazon DynamoDB
Catalog, part 2 | Field and free-form search | Amazon CloudSearch
Storage | Nearline, online, and offline storage with virtually unlimited capacity | Amazon S3, Amazon Glacier
Delivery | Global caching and streaming footprint | Amazon CloudFront

23. Catalog: A Word on Why DynamoDB [Diagram: NoSQL data model. Containers A, B, and C each have a header and nested layers; all assets share core elements (name, size), while each container type adds its own fields (e.g., Some_Field). Because items need not share a fixed schema, each asset's varying set of fields maps directly to one DynamoDB item.]

24. Catalog: A Word on Why CloudSearch
• Video and text
  – Header fields with textual descriptions, synopses, comments
  – Tracks with speech-to-text and closed-caption data
  – Links to scripts
• Video and structured elements
  – XMP dynamic media
  – Sidecar files
• A managed search engine dedicated to these kinds of problems
  – Case folding, stemming, stopword removal, synonyms
  – Also accent normalization, UTF-8 normalization, etc.
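CloudSearch performs this text analysis for you. To make the terms concrete, here is a toy JavaScript sketch of what case folding, accent normalization, stopword removal, and stemming do to a title before it is indexed. The stopword list and the suffix-stripping rules are simplified assumptions for illustration; they are not CloudSearch's actual analyzers.

```javascript
// Toy analysis chain for illustration only; CloudSearch's real analyzers are richer.
const STOPWORDS = new Set(["a", "an", "and", "in", "of", "on", "the", "to"]);

// Naive suffix stripping, standing in for a real stemmer.
function stem(token) {
  if (token.length > 5 && token.endsWith("ing")) return token.slice(0, -3);
  if (token.length > 3 && token.endsWith("s") && !token.endsWith("ss")) return token.slice(0, -1);
  return token;
}

function analyze(text) {
  return text
    .normalize("NFD")                // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // accent normalization: drop the combining marks
    .toLowerCase()                   // case folding
    .split(/[^a-z0-9]+/)             // tokenize on anything that is not an ASCII letter or digit
    .filter((token) => token !== "" && !STOPWORDS.has(token)) // stopword removal
    .map(stem);                      // stemming
}

console.log(analyze("The Café Rendering Tracks")); // [ 'cafe', 'render', 'track' ]
```

A real stemmer handles cases this one gets wrong (it turns "news" into "new"), which is part of the argument for leaning on a managed search engine.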

25. Other Goodies
• Back-end services
  – AWS CLI
  – Open source decode utilities
    • ExifTool
    • MediaInfo
  – ETL support
    • Talend (representative example)
• Front-end services
  – Node.js + AWS SDK for Node.js

27. [Architecture diagram] Amazon S3 stores source media, renditions, and metadata sidecar files, and an Amazon CloudFront download distribution delivers media content. An EC2 crawler (EC2 Auto Scaling group) publishes to an Amazon SNS topic, which feeds two Amazon SQS queues: rendition jobs, consumed by rendition workers that use Amazon Elastic Transcoder for proxy and thumbnail generation, and metadata processing jobs, consumed by metadata workers (EC2 Auto Scaling group). The DAM catalog lives in Amazon DynamoDB and Amazon CloudSearch, fronted by the DAM web service on AWS Elastic Beanstalk.

28. (Dual Screen) Walkthrough

29. Setup
• Amazon Simple Storage Service (S3) buckets ready to go
  – External staging locations
  – Internal working locations
• Amazon Simple Notification Service (SNS) + Amazon Simple Queue Service (SQS) wired up
• Catalog data models established
  – Amazon DynamoDB table “catalog” created
  – Amazon CloudSearch search domain “catalog” created
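Wiring SNS to SQS means subscribing each work queue to the topic and allowing the topic to write into the queue. Below is a minimal sketch of that queue access policy, assuming one topic and two queues (rendition jobs and metadata jobs, as in the architecture diagram); all ARNs and names are hypothetical. The resulting JSON string is what the SQS SetQueueAttributes call accepts as its Policy attribute.

```javascript
// Hypothetical ARNs for illustration only.
const topicArn = "arn:aws:sns:us-east-1:123456789012:dam-new-data";
const queues = {
  renditionJobs: "arn:aws:sqs:us-east-1:123456789012:dam-rendition-jobs",
  metadataJobs: "arn:aws:sqs:us-east-1:123456789012:dam-metadata-jobs",
};

// Queue access policy letting one SNS topic deliver messages into one SQS queue.
function snsToSqsPolicy(queueArn, sourceTopicArn) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowTopicToSend",
        Effect: "Allow",
        Principal: { Service: "sns.amazonaws.com" },
        Action: "sqs:SendMessage",
        Resource: queueArn,
        Condition: { ArnEquals: { "aws:SourceArn": sourceTopicArn } }, // only this topic
      },
    ],
  };
}

// SetQueueAttributes expects the policy as a JSON string under the "Policy" attribute.
const policies = Object.fromEntries(
  Object.entries(queues).map(([name, arn]) => [name, JSON.stringify(snsToSqsPolicy(arn, topicArn))])
);
console.log(policies.metadataJobs);
```

The ArnEquals condition keeps other topics from writing into the job queues. The subscription itself (protocol sqs, endpoint set to the queue ARN) is created separately on the SNS side.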

36. 1. Ingest, Crawl, Notify
a. End user initiates data copy
b. EC2 worker scans Amazon S3 staging bucket
c. EC2 worker copies or moves content
d. EC2 worker broadcasts “NEW DATA” event (SNS)
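The crawler's core decision is which staged objects are new and where each one should go. The sketch below keeps that logic pure so it can be tested offline. It assumes hypothetical bucket names (dam-staging, dam-working), a hypothetical source/ prefix in the working bucket, and a simple JSON shape for the “NEW DATA” event. In a real worker, each copy step would become an S3 CopyObject call and each event the Message of an SNS Publish call.

```javascript
// Hypothetical bucket names and key layout; not taken from the talk.
const STAGING_BUCKET = "dam-staging";
const WORKING_BUCKET = "dam-working";

// Decide which staged objects are new and where each should be copied.
function planIngest(stagedKeys, alreadyIngested) {
  const seen = new Set(alreadyIngested);
  return stagedKeys
    .filter((key) => !key.endsWith("/") && !seen.has(key)) // skip "folder" markers and repeats
    .map((key) => ({
      copy: { from: `${STAGING_BUCKET}/${key}`, to: `${WORKING_BUCKET}/source/${key}` },
      // Body of the "NEW DATA" notification to publish on the SNS topic.
      event: { type: "NEW_DATA", bucket: WORKING_BUCKET, key: `source/${key}` },
    }));
}

const plan = planIngest(
  ["promo/", "promo/01_01_SoccerF_05_A.mp4", "promo/01_01_SoccerF_05_A.xml"],
  ["promo/01_01_SoccerF_05_A.xml"]
);
for (const step of plan) {
  console.log(step.copy.from, "->", step.copy.to, JSON.stringify(step.event));
}
```

Copy versus move (step c) is a policy choice: moving keeps the staging bucket small, while copying leaves the user's upload untouched.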

37. 2. Metadata Extraction
a. EC2 worker polls inbox (SQS)
b. EC2 worker pulls down media asset from Amazon S3
c. EC2 worker parses media files
d. EC2 worker pumps metadata through ETL flow to prepare for catalog insertion
e. EC2 worker inserts into catalog (Amazon DynamoDB)
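Step c is where a decode utility such as MediaInfo does the work. Here is a minimal sketch of turning its plain-text report into the flat, upper-case field names used for the catalog (COMPLETE_NAME, FORMAT, CODEC_ID). It assumes MediaInfo's default “Label : value” text output and keeps only the first (General) section; the sample report is illustrative, not captured from a real file.

```javascript
// Parse a MediaInfo-style text report into { COMPLETE_NAME: ..., FORMAT: ..., ... }.
// Assumption: lines look like "Label     : value" and sections start with a bare header line.
function parseMediaInfo(report) {
  const fields = {};
  let sections = 0;
  for (const line of report.split(/\r?\n/)) {
    const sep = line.indexOf(" : ");
    if (sep === -1) {
      if (line.trim() !== "") sections += 1; // header line such as "General" or "Video"
      continue;
    }
    if (sections > 1) break; // only the General section is kept in this sketch
    const key = line
      .slice(0, sep)
      .trim()
      .toUpperCase()
      .replace(/[^A-Z0-9]+/g, "_") // "Codec ID/Info" -> "CODEC_ID_INFO"
      .replace(/^_+|_+$/g, "");
    fields[key] = line.slice(sep + 3).trim();
  }
  return fields;
}

const sample = [
  "General",
  "Complete name                            : 01_01_SoccerF_05_A.mp4",
  "Format                                   : MPEG-4",
  "Codec ID                                 : mp42",
  "",
  "Video",
  "Format                                   : AVC",
].join("\n");
console.log(parseMediaInfo(sample));
```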

41. Preparing for Amazon DynamoDB Insert
{
  "COMPLETE_NAME" : { "S" : "01_01_SoccerF_05_A.mp4" },
  "FORMAT" : { "S" : "MPEG-4" },
  "CODEC_ID" : { "S" : "mp42" }
}
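This record uses DynamoDB's low-level attribute-value format, where each value is wrapped in a type descriptor. A minimal sketch of producing that shape from a flat metadata object follows. It handles only strings (S), numbers (N, which DynamoDB transmits as strings), and string sets (SS), and skips empty values; the FILE_SIZE, TAGS, and NOTES fields are hypothetical additions. The result is what a PutItem request against the “catalog” table would carry as its Item.

```javascript
// Map a flat metadata record to DynamoDB's low-level attribute-value JSON.
function toDynamoItem(record) {
  const item = {};
  for (const [name, value] of Object.entries(record)) {
    if (value === null || value === undefined || value === "") continue; // skip empty values
    if (typeof value === "string") {
      item[name] = { S: value };
    } else if (typeof value === "number" && Number.isFinite(value)) {
      item[name] = { N: String(value) }; // numbers travel as strings
    } else if (Array.isArray(value) && value.length > 0 && value.every((v) => typeof v === "string")) {
      item[name] = { SS: [...new Set(value)] }; // sets may not contain duplicates
    } else {
      throw new TypeError(`Unsupported value for ${name}`);
    }
  }
  return item;
}

const item = toDynamoItem({
  COMPLETE_NAME: "01_01_SoccerF_05_A.mp4",
  FORMAT: "MPEG-4",
  CODEC_ID: "mp42",
  FILE_SIZE: 10485760,
  TAGS: ["soccer", "b-roll", "soccer"],
  NOTES: "",
});
console.log(JSON.stringify(item));
```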

42. Model It and Deploy to EC2! (Talend)

43. 3. Catalog Processing
a. Store metadata record in Amazon DynamoDB
b. Reflect searchable subset to Amazon CloudSearch
c. Go crazy (HTTP GET)
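Step b copies only the searchable subset of each DynamoDB record into CloudSearch. Below is a sketch of building one document batch in the 2011-02-01 Search Data Format (SDF), the API version used in the query example on slide 46. The field list is hypothetical and must match the index fields defined on the “catalog” search domain. It also assumes, as we understand that API version, that document IDs may contain only lowercase letters, digits, and underscores, and that each update to a document must carry a higher version number. The batch would be POSTed as JSON to the domain's document service endpoint (/2011-02-01/documents/batch).

```javascript
// Hypothetical subset of catalog fields to index; must match the domain's index fields.
const SEARCHABLE = new Set(["complete_name", "format", "codec_id", "duration", "file_size"]);

// Build one SDF "add" operation. Assumptions (2011-02-01 API): IDs use only [a-z0-9_],
// index field names are lowercase, and version must increase with every update of an ID.
function toSdfAdd(assetKey, record, version) {
  const fields = {};
  for (const [name, value] of Object.entries(record)) {
    const field = name.toLowerCase();
    if (SEARCHABLE.has(field)) fields[field] = value;
  }
  const id = assetKey
    .toLowerCase()
    .replace(/[^a-z0-9_]/g, "_")
    .replace(/^_+/, ""); // do not start an ID with an underscore
  return { type: "add", id, version, lang: "en", fields };
}

const batch = [
  toSdfAdd(
    "01_01_SoccerF_05_A.mp4",
    { COMPLETE_NAME: "01_01_SoccerF_05_A.mp4", FORMAT: "MPEG-4", CODEC_ID: "mp42", NOTES: "internal only" },
    1
  ),
];
console.log(JSON.stringify(batch, null, 2));
```

Normalizing keys this way can map two different file names to the same ID, so a production catalog would more likely derive the ID from its own primary key.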

45. [Architecture diagram repeated from slide 27, with marker 1 on the Amazon DynamoDB catalog and marker 2 on Amazon CloudSearch]

46. Querying the Catalog (Amazon CloudSearch)
• http://cloudsearch.demo.aws.com/2011-02-01/search?bq=complete_name:…<field=value>
• In Node.js:
var optionsget = {
  host: 'cloudsearch.demo.aws.com', // here only the domain name
  port: 80,
  path: "/2011-02-01/search?bq=complete_name:'-STRAWBERRY'" +
        "&return-fields=complete_name,text_relevance,codec_id_info,duration,file_size,encoded_date",
  method: 'GET'
};
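Building the query path by hand makes quoting and URL encoding easy to get wrong. A small helper that assembles the same http.request options with URLSearchParams is sketched below. It assumes the 2011-02-01 bq syntax field:'term' shown on the slide, uses the slide's demo host as a placeholder, and simply rejects terms containing a single quote rather than guessing at the escaping rules.

```javascript
// Build Node http.request options for a CloudSearch (2011-02-01) boolean query.
function buildSearchOptions(host, field, term, returnFields) {
  if (term.includes("'")) {
    throw new Error("terms containing single quotes are not handled in this sketch");
  }
  const params = new URLSearchParams({
    bq: `${field}:'${term}'`, // e.g. complete_name:'STRAWBERRY'
    "return-fields": returnFields.join(","),
  });
  return { host, port: 80, path: `/2011-02-01/search?${params}`, method: "GET" };
}

const options = buildSearchOptions("cloudsearch.demo.aws.com", "complete_name", "STRAWBERRY", [
  "complete_name",
  "text_relevance",
  "duration",
]);
console.log(options.path);
// With a live search domain: require("http").request(options, (res) => { /* read JSON body */ }).end();
```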

47. Customer Case Study (PBS)

48. Merlin: PBS CMS/DAM
• Code name Merlin
• Structured metadata
• 200+ web object records daily
  – 29,046 web objects
• 150+ video objects daily
  – 91,436 videos
• Users from over 150 stations and 30 national producers
  – Frontline
  – Downton Abbey
  – PBS NewsHour

49. What’s It Do?
• Large multitenant system
  – 1,200 registered users
• 250 million streams per month
• 20 million unique viewers
• 8 PB of video delivered monthly

50. Getting Data In
• 33 ingestible web feeds
  – Content editors
  – Web page listings
• Batch video ingest API
  – Video content editors
  – External workflow integration
• Manually entered videos
  – Video content editors from all 50 states
  – Number of user accounts

51. System Overview [Diagram: components include user input, an ingest API, RSS feeds, the DAM (Merlin), a workflow service, Amazon SWF, a Content API, a search util backed by Amazon CloudSearch, Amazon RDS, and Amazon S3 behind a CDN]

52. Basic Workflow
• Object registered with Merlin
• Images registered and processed with ITS
  – Stored in CDN-fronted Amazon S3 bucket
• Videos registered with VTS
  – Jobs sent to Zencoder for processing
  – Video stored in CDN-fronted Amazon S3 bucket
• Objects ready for clients
  – Objects rendered for consumption in Amazon S3
  – Objects registered with APIs
  – Objects discoverable

53. Making It Discoverable
• Search util service
• Runs every hour
  – Re-indexes the last several hours each time
• Polls APIs
  – Content API
  – Modified time
• Updates Amazon CloudSearch index
  – 2 primary indexes
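The hourly job with a multi-hour lookback is a sliding, overlapping window over modified times. A minimal sketch follows; "several hours" is taken as 4 purely as an assumption, and the object shape (id, modifiedMs) is hypothetical rather than PBS's Content API format.

```javascript
// Assumption: "several hours" taken as 4; with hourly runs, each object is seen by up to 4 runs.
const LOOKBACK_HOURS = 4;
const HOUR_MS = 3600 * 1000;

function reindexWindow(nowMs, lookbackHours = LOOKBACK_HOURS) {
  return { from: nowMs - lookbackHours * HOUR_MS, to: nowMs };
}

// Keep objects (e.g. from a content API) whose modified time falls inside the window.
function changedSince(objects, range) {
  return objects.filter((o) => o.modifiedMs > range.from && o.modifiedMs <= range.to);
}

const now = Date.UTC(2013, 10, 15, 12, 0, 0);
const win = reindexWindow(now);
const objects = [
  { id: "video_1", modifiedMs: now - 30 * 60 * 1000 }, // 30 minutes ago
  { id: "video_2", modifiedMs: now - 3 * HOUR_MS },    // 3 hours ago
  { id: "video_3", modifiedMs: now - 26 * HOUR_MS },   // yesterday
];
console.log(changedSince(objects, win).map((o) => o.id)); // [ 'video_1', 'video_2' ]
```

The overlap means a failed or late run is caught up by the next one, at the cost of indexing each object several times, so re-indexing has to be idempotent.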

54. Search Considerations
• Hidden objects
• Rights management
• Partitioned search
  – Local station search
  – Results by geo
  – Restrict results for international customers
• Unify and normalize existing APIs
  – Flatten data model
• Users looking for programs
  – Specific searches
  – Suitable for structured data
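Several of these considerations turn into filters applied to every query. A sketch of composing one restricted 2011-02-01 boolean query (bq) follows. The index fields title, hidden, station, and territories are hypothetical, and the (and …) / (not …) syntax reflects our understanding of that API version rather than PBS's actual queries.

```javascript
// Hypothetical index fields: title (text), hidden (uint 0/1), station and territories (literal).
function quote(value) {
  if (value.includes("'")) throw new Error("quotes not handled in this sketch");
  return `'${value}'`;
}

function buildRestrictedQuery({ text, station, territory }) {
  const clauses = [`title:${quote(text)}`, "(not hidden:1)"]; // never return hidden objects
  if (station) clauses.push(`station:${quote(station)}`); // local station search
  if (territory) clauses.push(`territories:${quote(territory)}`); // geo / rights restriction
  return `(and ${clauses.join(" ")})`;
}

console.log(buildRestrictedQuery({ text: "downton abbey", station: "local_station", territory: "us" }));
// (and title:'downton abbey' (not hidden:1) station:'local_station' territories:'us')
```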

55. Challenges
• No native time field
  – Convert dates to integers (epoch time)
• Versioning of documents
  – Epoch time for versioning
• Exposing two versions of most fields
  – Text searchable
  – Facets (copy of text version)
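The three workarounds on this slide can be sketched together as one transformation from a content object into a search document. Field names, the choice of facet fields, and the SDF-style add shape are assumptions for illustration, not PBS's actual schema.

```javascript
// Sketch of the three workarounds; field names are hypothetical.
const toEpochSeconds = (iso) => Math.floor(Date.parse(iso) / 1000);

function toSearchDocument(obj, facetFields = ["program", "genre"]) {
  const fields = {
    title: obj.title,
    air_date: toEpochSeconds(obj.airDate), // no native time field: store epoch seconds as an integer
  };
  for (const name of facetFields) {
    if (obj[name] !== undefined) {
      fields[name] = obj[name];            // text-searchable version
      fields[`${name}_facet`] = obj[name]; // facet version: a copy of the same value
    }
  }
  // Versioning: the last-modified time in epoch seconds grows with every edit.
  return { type: "add", id: obj.id, version: toEpochSeconds(obj.modified), lang: "en", fields };
}

const doc = toSearchDocument({
  id: "video_1",
  title: "Example Episode",
  program: "Example Program",
  genre: "Drama",
  airDate: "2013-11-15T00:00:00Z",
  modified: "2013-11-15T18:30:00Z",
});
console.log(JSON.stringify(doc, null, 2));
```

Using the modification time as the version makes repeated re-indexing of an unchanged object harmless, assuming the search service ignores updates whose version is not higher than the stored one.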

56. Search Consumers (PBS.org) Site Search

57. Search Consumers (Video Portal) Site Search Programs A-Z

58. Xbox / OTT

59. Summary

60. Summary
• Build an enterprise-scale DAM platform now
  – Managed storage and archive (Amazon S3, Amazon Glacier)
  – Managed database for catalog processing (Amazon DynamoDB, Amazon Relational Database Service [RDS])
  – Managed search (Amazon CloudSearch)
• Application development accelerators
  – Elastic Beanstalk harness (web, API, and worker roles)
  – Reduced effort with the AWS CLI
• (Almost) fire and forget

61. AWS Marketplace Can Help
• AWS online software store
  – Customers can find, research, and buy software
  – Simple pricing, aligned with the EC2 usage model
  – 1-click launch in minutes
  – Marketplace billing integrated into your AWS account
  – 1,000+ products across 24 categories
• Digital asset management related options include:
  – WebDAM: centralize, store, manage, and distribute collateral
  – Digital asset management cloud: web-based open source DAM
  – Widen: manage and distribute digital media and brand assets with user roles and permissions
  – Adobe Experience Manager: unified asset management, including mobile
• Learn more at: http://aws.amazon.com/marketplace

62. “DAM!”

63. Please give us your feedback on this presentation MED-402 Building a Scalable Video / DAM Solution in the Cloud As a thank you, we will select prize winners daily for completed surveys!