Vevo has undergone a complete strategic and technical reboot, driven not only by product, but also by engineering. Since November 2015, Vevo has been replacing monolithic, legacy content services with a modern, modular, microservices architecture, all while developing new features and functionality. In parallel, Vevo has built its data platform from scratch to power internal analytics as well as a unique music video consumption experience through a new personalized feed of recommendations — all in less than one year.
This has been a monumental effort, made possible in this short time span largely by AWS technologies. The content team has made heavy use of serverless architectures and AWS Lambda in the form of microservices, taking an approach similar to functional programming, which has helped us speed up development and time to market. The data team has built the data platform by heavily leveraging Amazon Kinesis for data exchange across services, Amazon Aurora for consumer-facing services, Apache Spark on Amazon EMR for ETL and machine learning, and Amazon Redshift as the core analytics data store.
In this session, Miguel and Alan walk you through Vevo's journey, describing best practices and lessons that the Vevo team has picked up along the way.
2. About Us
• Miguel Alvarado, VP of Data Analytics
@djmalvarado
miguelalvarado
miguel.alvarado@vevo.com
• Alan Zawari, Senior Engineer, Content Services
@alanzawari
alanzawari
skilledDeveloper
alan.zawari@vevo.com
3. What to Expect from the Session
• Learn About Vevo and Engineering @ Vevo
• Content Services
• What is content services?
• Rearchitecting content services from the ground up
• How AWS Lambda functions fit into the picture
• Data Services
• What are data services?
• Building a data platform from scratch
• Using Amazon Kinesis as the central data nervous system
4. What Is Vevo?
Vevo is the world’s leading all-premium music video and entertainment platform, with over 19 billion monthly views globally. Vevo delivers a personalized and expertly curated experience for audiences to explore and discover music videos, exclusive original programming, and live performances from the artists they love on mobile, web, and connected TV.
11. What Is Content Services?
• Infrastructure that allows music artists to deliver video to their audience:
• Artist and video metadata
• Video ingestion
• Video encoding
• Publishing to our own platforms and partners
• Providing APIs for our client apps
13. What Are Data Services?
• Data services are the collection of services and infrastructure that make up the Vevo Data Platform
• The Vevo Data Platform powers two main things:
• Vevo’s “Smart Consumer Experiences” in the form of personalization and recommendations
• Analytics for all Vevo product and business groups
• The Data Services team comprises platform engineers and data scientists
15. Old Architecture
• A giant monolithic service responsible for everything:
• Authentication
• Search
• Playlists
• Artists
• Videos and streams
• Recommendation
• More
• .NET/SQL Server stack
17. New Content Architecture
• Microservice/stream architecture
• Small independent services + Amazon Kinesis
• Different technology stacks: Node.js, Java, Go
• Services don’t talk to each other directly
• Every service has its own event stream (bus)
• Services can consume each other’s stream events
• Each stream has its own JSON schema
• The whole thing cannot go down
• Continuous delivery
• Cost effective
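As a sketch of the publish side of this pattern: each service wraps a change event in a small envelope and puts it onto its own Kinesis stream, partitioned by entity ID so events for one entity stay ordered. The envelope fields and stream name below are illustrative assumptions, not Vevo's actual schema.

```javascript
// Illustrative sketch: build the putRecord parameters a service might use to
// publish a change event onto its own Kinesis stream. The envelope fields
// (id, emittedAt, payload) are assumptions; the real call would be
// kinesis.putRecord(params) via the AWS SDK.
function toKinesisRecord(streamName, entityId, payload) {
  return {
    StreamName: streamName,
    PartitionKey: entityId, // events for one entity land on the same shard, in order
    Data: JSON.stringify({
      id: entityId,
      emittedAt: new Date().toISOString(),
      payload: payload
    })
  };
}
```

Keying the partition on the entity ID is what lets consumers replay a single entity's history in order without coordinating across shards.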
19. Developing New Architecture
• Old architecture is in production and running
• New architecture is in progress
• We wanted to feed new architecture with live data
• And get both running simultaneously
• Slowly switch traffic over to new architecture
• Connect both worlds without changing the old code?
20. Bridging New Architecture to the Old
• Project Mexit
• Runs on a recurring basis
• Queries old API for recent metadata changes (like a client)
• Emits the changes to the new architecture’s Amazon Kinesis streams
• Fault tolerant: Stores last successful timestamp
• AWS Lambda + Amazon DynamoDB
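The checkpointing at the heart of Mexit can be sketched like this; the function and field names are illustrative, not Vevo's actual code. The real job would read the last successful timestamp from DynamoDB, query the old API for changes since then, put the changes onto Kinesis, and write the new checkpoint back.

```javascript
// Illustrative core of an incremental sync: given the last successful
// checkpoint and a page of metadata records from the old API, pick the ones
// modified since then and compute the next checkpoint. ISO-8601 timestamps
// compare correctly as strings.
function selectChanges(lastCheckpoint, records) {
  var changed = records.filter(function (r) {
    return r.modifiedAt > lastCheckpoint;
  });
  var newCheckpoint = changed.reduce(function (max, r) {
    return r.modifiedAt > max ? r.modifiedAt : max;
  }, lastCheckpoint);
  return { changed: changed, checkpoint: newCheckpoint };
}
```

Persisting the checkpoint only after a successful emit is what makes the job fault tolerant: a failed run simply re-reads the same window on the next invocation.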
23. How We Used Lambdas
• Scheduled tasks (Cron job)
• Database triggers
• User-facing services
• Other use cases
24. How We Used Lambdas
• Scheduled tasks (Cron job)
• Read artist/video metadata changes and:
• Update Amazon Elasticsearch Service index
• Stream changes to Amazon Kinesis (Project Mexit)
• Cache warming: Keep top artist images in the cache
• Release new videos based on startDate (Project Releasr)
• Polling every 5 sec
• Lambda schedule event: rate(1 minute)
• Long-running Lambda (5-min timeout)
• Work runs at 5-sec intervals inside each invocation
25. Releasr, 5-Sec Recurring Task
// LAMBDA_TIMEOUT must match the function's configured timeout, in ms (5 min here)
var LAMBDA_TIMEOUT = 5 * 60 * 1000;
var processing = false; // set by processItems() while a batch is in flight

// processItems() is defined elsewhere in the service; it releases videos
// whose startDate has passed.
module.exports.handler = function (event, context, callback) {
  // MAIN TIMER - runs every 5 seconds
  var timer = setInterval(function () {
    if (!processing) {
      processItems();
    }
    // else: still processing the previous batch, come back later
  }, 5000);
  // finish the Lambda cleanly before it is timed out
  setTimeout(function () {
    clearInterval(timer);
    console.log("Finished processing at", new Date().toISOString());
    callback();
  }, LAMBDA_TIMEOUT - 1000);
};
26. How We Used Lambdas (cont.)
• Database triggers
• Send user likes to Amazon Kinesis (Project Dartmouth)
• Export user likes to Amazon S3/Amazon Redshift
• Cross-account Amazon DynamoDB replication (Project Fargo)
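A minimal sketch of the trigger's transform step, assuming records arrive in DynamoDB Streams' typed attribute format; the field names below are illustrative, not the actual likes schema.

```javascript
// Flatten a DynamoDB Streams NewImage (typed attribute format, e.g.
// { userId: { S: "u1" } }) into a plain event object before forwarding it
// to Kinesis. Only the common scalar types are handled in this sketch.
function imageToEvent(image) {
  var event = {};
  Object.keys(image).forEach(function (key) {
    var attr = image[key];
    if (attr.S !== undefined) event[key] = attr.S;               // string
    else if (attr.N !== undefined) event[key] = Number(attr.N);  // number (wire format is a string)
    else if (attr.BOOL !== undefined) event[key] = attr.BOOL;    // boolean
  });
  return event;
}

// The real handler would loop over event.Records where eventName is INSERT
// or MODIFY and put each flattened image onto the target Kinesis stream.
```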
27. How We Used Lambdas (cont.)
• User-facing services
• Project Susa, Vevo link shortening and social interaction tracking
• Pure serverless!
• Consists of nanoservices:
• Shorten
• Expand
• Event-Publisher: DB trigger to capture/publish events
• AWS Lambda, Amazon API Gateway, and Amazon DynamoDB
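One way the Shorten nanoservice could mint codes is base-62 encoding of a sequence number; this is an illustrative sketch, not necessarily the algorithm Susa uses.

```javascript
// Encode a non-negative integer (e.g. an atomic counter value from DynamoDB)
// as a compact base-62 short code. 62^6 is ~56 billion, so six characters
// cover a very large link space.
var ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function toShortCode(n) {
  if (n === 0) return ALPHABET[0];
  var code = "";
  while (n > 0) {
    code = ALPHABET[n % 62] + code;
    n = Math.floor(n / 62);
  }
  return code;
}
```

The Expand nanoservice would do the reverse lookup: read the stored full URL for the code from DynamoDB and return a redirect.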
28. How We Used Lambdas (cont.)
• Project Susa: Creating a short link
[Architecture diagram: the client calls the REST API (Amazon API Gateway backed by Lambda functions) to create a new short link; the service authenticates by decoding the token via API v2, gets video metadata and the YouTube URL, stores the link and its parameters, records a ‘Share’ event onto the Amazon Kinesis bus for core and data consumers, and returns the response to the client.]
29. How We Used Lambdas (cont.)
• Project Susa: Clicking on a short link
[Architecture diagram: the client clicks a short link; the REST API (Amazon API Gateway backed by a Lambda function) retrieves the full URL, records a ‘Click’ event onto the Amazon Kinesis bus for core and data consumers, and responds with a redirect.]
30. How We Used Lambdas (cont.)
• Project Susa: Lambda/API Gateway Scalability
[Chart: a spike of 80x normal traffic]
31. How We Used Lambdas (cont.)
• Other use cases:
• Sending data to third-party ML providers (Project Dartmouth)
• Slack integration (on-demand cache buster)
33. Old World
• No Data team
• No Vevo Data Platform
• No first-party data
• No data science
• No personalization or recommendations
• Used third-party comScore DAX for analytics
• No continuous delivery
36. Project Dartmouth
• Five ML companies: A, B, C, D, E
• Power the Feed on iOS
• Real-time event collection
• Real-time recommendations
• Goal: improve the swipe/click ratio
41. Service-to-Service Contracts
• Based on JSON schemas; we also considered Avro and Protocol Buffers
• All entities shared via Amazon Kinesis need a schema
• Either the full payload or just a notification with the ID of the new or modified entity can be passed
• Messages should be dropped onto Amazon Kinesis at the time of creation or modification
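In spirit, the contract check that runs before every put onto a stream looks like this; a hand-rolled required-field check stands in here for a full JSON-schema validator, and all names are illustrative.

```javascript
// Validate a stream message against a minimal contract before it is put
// onto Kinesis. A real implementation would use a proper JSON-schema
// validator; this sketch only checks required fields.
function validateMessage(schema, message) {
  var missing = schema.required.filter(function (field) {
    return message[field] === undefined;
  });
  return { valid: missing.length === 0, missing: missing };
}

// Example contract for a "video updated" notification: at minimum the ID of
// the modified entity plus the modification timestamp.
var videoUpdatedSchema = { required: ["videoId", "modifiedAt"] };
```

Rejecting a message at the producer keeps a malformed payload from fanning out to every consumer of the stream.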
47. Lessons Learned
• Deploying Lambda functions (and their resources) can be challenging
• Used the Serverless Framework
• Integrated it with our CI/CD pipeline
• Lambda throttled invocations
• Monitored for throttling and raised the per-account concurrent invocation limit
• Lambda cold start issue
• Kept it warm by frequent invocations (every 5 min.)
• Standardized what goes on stream
• A central place for schemas
• A central place for error messages
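The keep-warm trick from the cold-start bullet can be sketched like this; the marker payload shape is an assumption, and a real scheduled-rule event carries different fields.

```javascript
// A scheduled rule invokes the function every few minutes with a marker
// payload; the handler returns immediately on those pings so the container
// stays warm while only real requests do work.
function handler(event, context, callback) {
  if (event && event.source === "keep-warm") {
    return callback(null, "warmed"); // scheduled ping: exit fast
  }
  // ... real request handling would go here ...
  return callback(null, "handled");
}
```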
48. Lessons Learned (cont.)
• Don’t try to boil the ocean all at once
• Use real user data to make decisions
• Reuse existing technology wherever possible rather than building your own
• If you’re serious about analytics, build your own platform