Vevo has undergone a complete strategic and technical reboot, driven not only by product, but also by engineering. Since November 2015, Vevo has been replacing monolithic, legacy content services with a modern, modular, microservices architecture, all while developing new features and functionality. In parallel, Vevo has built its data platform from scratch to power internal analytics as well as a unique music video consumption experience through a new personalized feed of recommendations — all in less than one year.
This has been a monumental effort, made possible in this short time span largely by AWS technologies. The content team has made heavy use of serverless architectures and AWS Lambda in the form of microservices, taking an approach similar to functional programming, which has helped us speed up development and time to market. The data team has built the data platform by heavily leveraging Amazon Kinesis for data exchange across services, Amazon Aurora for consumer-facing services, Apache Spark on Amazon EMR for ETL and machine learning, and Amazon Redshift as the core analytics data store.
In this session, Miguel and Alan walk you through Vevo's journey, describing best practices and lessons that the Vevo team has picked up along the way.
2. About Us
• Miguel Alvarado, VP of Data Analytics
@djmalvarado
miguelalvarado
miguel.alvarado@vevo.com
• Alan Zawari, Senior Engineer, Content Services
@alanzawari
alanzawari
skilledDeveloper
alan.zawari@vevo.com
3. What to Expect from the Session
• Learn About Vevo and Engineering @ Vevo
• Content Services
• What is content services?
• Rearchitecting content services from the ground up
• How AWS Lambda functions fit into the picture
• Data Services
• What are data services?
• Building a data platform from scratch
• Using Amazon Kinesis as the central data nervous system
4. What Is Vevo?
Vevo is the world’s leading all-premium music video and entertainment platform, with over 19 billion monthly views globally. Vevo delivers a personalized and expertly curated experience for audiences to explore and discover music videos, exclusive original programming, and live performances from the artists they love on mobile, web, and connected TV.
11. What Is Content Services?
• Infrastructure that allows music artists to deliver video to their audience:
• Artist and video metadata
• Video ingestion
• Video encoding
• Publishing to our own platforms and partners
• Providing APIs for our client apps
13. What Are Data Services?
• Data services are the collection of services and infrastructure that make up the Vevo Data Platform
• The Vevo Data Platform powers two main things:
• Vevo’s “Smart Consumer Experiences” in the form of personalization and recommendations
• Analytics for all Vevo product and business groups
• The Data Services team comprises platform engineers and data scientists
15. Old Architecture
• A giant monolithic service responsible for everything:
• Authentication
• Search
• Playlists
• Artists
• Videos and streams
• Recommendation
• More
• .NET/SQL Server stack
17. New Content Architecture
• Microservice/stream architecture
• Small independent services + Amazon Kinesis
• Different technology stacks: Node.js, Java, Go
• Services don’t talk to each other directly
• Every service has its own event stream (bus)
• Services can consume each other’s stream events
• Each stream has its own JSON schema
• The whole thing cannot go down
• Continuous delivery
• Cost effective
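As a sketch of the publish side of this pattern: each service wraps a change event in a small envelope and puts it onto its own Kinesis stream, partitioned by entity ID so events for one entity stay ordered. The envelope fields and stream name below are illustrative assumptions, not Vevo's actual schema.

```javascript
// Illustrative sketch: build the putRecord parameters a service might use to
// publish a change event onto its own Kinesis stream. The envelope fields
// (id, emittedAt, payload) are assumptions; the real call would be
// kinesis.putRecord(params) via the AWS SDK.
function toKinesisRecord(streamName, entityId, payload) {
  return {
    StreamName: streamName,
    PartitionKey: entityId, // events for one entity land on the same shard, in order
    Data: JSON.stringify({
      id: entityId,
      emittedAt: new Date().toISOString(),
      payload: payload
    })
  };
}
```

Keying the partition on the entity ID is what lets consumers replay a single entity's history in order without coordinating across shards.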
19. Developing New Architecture
• Old architecture is in production and running
• New architecture is in progress
• We wanted to feed new architecture with live data
• And get both running simultaneously
• Slowly switch traffic over to new architecture
• Connect both worlds without changing the old code?
20. Bridging New Architecture to the Old
• Project Mexit
• Runs on a recurring basis
• Queries old API for recent metadata changes (like a client)
• Emits the changes to the new architecture’s Amazon Kinesis streams
• Fault tolerant: Stores last successful timestamp
• AWS Lambda + Amazon DynamoDB
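The checkpointing at the heart of Mexit can be sketched like this; the function and field names are illustrative, not Vevo's actual code. The real job would read the last successful timestamp from DynamoDB, query the old API for changes since then, put the changes onto Kinesis, and write the new checkpoint back.

```javascript
// Illustrative core of an incremental sync: given the last successful
// checkpoint and a page of metadata records from the old API, pick the ones
// modified since then and compute the next checkpoint. ISO-8601 timestamps
// compare correctly as strings.
function selectChanges(lastCheckpoint, records) {
  var changed = records.filter(function (r) {
    return r.modifiedAt > lastCheckpoint;
  });
  var newCheckpoint = changed.reduce(function (max, r) {
    return r.modifiedAt > max ? r.modifiedAt : max;
  }, lastCheckpoint);
  return { changed: changed, checkpoint: newCheckpoint };
}
```

Persisting the checkpoint only after a successful emit is what makes the job fault tolerant: a failed run simply re-reads the same window on the next invocation.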
23. How We Used Lambdas
• Scheduled tasks (Cron job)
• Database triggers
• User-facing services
• Other use cases
24. How We Used Lambdas
• Scheduled tasks (Cron job)
• Read artist/video metadata changes and:
• Update Amazon Elasticsearch Service index
• Stream changes to Amazon Kinesis (Project Mexit)
• Cache warming: Keep top artist images in the cache
• Release new videos based on startDate (Project Releasr)
• Polling every 5 sec
• Lambda schedule event: rate(1 minute)
• Long-running Lambda (5-min timeout)
• Work runs at 5-sec intervals inside each invocation
25. Releasr, 5-Sec Recurring Task
// LAMBDA_TIMEOUT must match the function's configured timeout, in ms (5 min here)
var LAMBDA_TIMEOUT = 5 * 60 * 1000;
var processing = false; // set by processItems() while a batch is in flight

// processItems() is defined elsewhere in the service; it releases videos
// whose startDate has passed.
module.exports.handler = function (event, context, callback) {
  // MAIN TIMER - runs every 5 seconds
  var timer = setInterval(function () {
    if (!processing) {
      processItems();
    }
    // else: still processing the previous batch, come back later
  }, 5000);
  // finish the Lambda cleanly before it is timed out
  setTimeout(function () {
    clearInterval(timer);
    console.log("Finished processing at", new Date().toISOString());
    callback();
  }, LAMBDA_TIMEOUT - 1000);
};
26. How We Used Lambdas (cont.)
• Database triggers
• Send user likes to Amazon Kinesis (Project Dartmouth)
• Export user likes to Amazon S3/Amazon Redshift
• Cross-account Amazon DynamoDB replication (Project Fargo)
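A minimal sketch of the trigger's transform step, assuming records arrive in DynamoDB Streams' typed attribute format; the field names below are illustrative, not the actual likes schema.

```javascript
// Flatten a DynamoDB Streams NewImage (typed attribute format, e.g.
// { userId: { S: "u1" } }) into a plain event object before forwarding it
// to Kinesis. Only the common scalar types are handled in this sketch.
function imageToEvent(image) {
  var event = {};
  Object.keys(image).forEach(function (key) {
    var attr = image[key];
    if (attr.S !== undefined) event[key] = attr.S;               // string
    else if (attr.N !== undefined) event[key] = Number(attr.N);  // number (wire format is a string)
    else if (attr.BOOL !== undefined) event[key] = attr.BOOL;    // boolean
  });
  return event;
}

// The real handler would loop over event.Records where eventName is INSERT
// or MODIFY and put each flattened image onto the target Kinesis stream.
```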
27. How We Used Lambdas (cont.)
• User-facing services
• Project Susa, Vevo link shortening and social interaction tracking
• Pure serverless!
• Consists of nanoservices:
• Shorten
• Expand
• Event-Publisher: DB trigger to capture/publish events
• AWS Lambda, Amazon API Gateway, and Amazon DynamoDB
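One way the Shorten nanoservice could mint codes is base-62 encoding of a sequence number; this is an illustrative sketch, not necessarily the algorithm Susa uses.

```javascript
// Encode a non-negative integer (e.g. an atomic counter value from DynamoDB)
// as a compact base-62 short code. 62^6 is ~56 billion, so six characters
// cover a very large link space.
var ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function toShortCode(n) {
  if (n === 0) return ALPHABET[0];
  var code = "";
  while (n > 0) {
    code = ALPHABET[n % 62] + code;
    n = Math.floor(n / 62);
  }
  return code;
}
```

The Expand nanoservice would do the reverse lookup: read the stored full URL for the code from DynamoDB and return a redirect.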
28. How We Used Lambdas (cont.)
• Project Susa: Creating a short link
[Architecture diagram: the client calls the REST API (Amazon API Gateway backed by Lambda functions) to create a new short link; the service authenticates by decoding the token via API v2, gets video metadata and the YouTube URL, stores the link and its parameters, records a ‘Share’ event onto the Amazon Kinesis bus for core and data consumers, and returns the response to the client.]
29. How We Used Lambdas (cont.)
• Project Susa: Clicking on a short link
[Architecture diagram: the client clicks a short link; the REST API (Amazon API Gateway backed by a Lambda function) retrieves the full URL, records a ‘Click’ event onto the Amazon Kinesis bus for core and data consumers, and responds with a redirect.]
30. How We Used Lambdas (cont.)
• Project Susa: Lambda/API Gateway Scalability
[Chart: a spike of 80x normal traffic]
31. How We Used Lambdas (cont.)
• Other use cases:
• Sending data to third-party ML providers (Project Dartmouth)
• Slack integration (on-demand cache buster)
33. Old World
• No Data team
• No Vevo Data Platform
• No first-party data
• No data science
• No personalization or recommendations
• Used third-party comScore DAX for analytics
• No continuous delivery
36. Project Dartmouth
• Five ML companies: A, B, C, D, E
• Power the Feed on iOS
• Real-time event collection
• Real-time recommendations
• Goal: improve the swipe/click ratio
41. Service-to-Service Contracts
• Based on JSON schemas; we also considered Avro and Protocol Buffers
• All entities shared via Amazon Kinesis need a schema
• Either the full payload or just a notification with the ID of the new or modified entity can be passed
• Messages should be dropped onto Amazon Kinesis at the time of creation or modification
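In spirit, the contract check that runs before every put onto a stream looks like this; a hand-rolled required-field check stands in here for a full JSON-schema validator, and all names are illustrative.

```javascript
// Validate a stream message against a minimal contract before it is put
// onto Kinesis. A real implementation would use a proper JSON-schema
// validator; this sketch only checks required fields.
function validateMessage(schema, message) {
  var missing = schema.required.filter(function (field) {
    return message[field] === undefined;
  });
  return { valid: missing.length === 0, missing: missing };
}

// Example contract for a "video updated" notification: at minimum the ID of
// the modified entity plus the modification timestamp.
var videoUpdatedSchema = { required: ["videoId", "modifiedAt"] };
```

Rejecting a message at the producer keeps a malformed payload from fanning out to every consumer of the stream.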
47. Lessons Learned
• Deploying Lambda functions (and their resources) can be challenging
• Used the Serverless Framework
• Integrated it with our CI/CD pipeline
• Lambda throttled invocations
• Monitored for throttling and raised the per-account concurrent invocation limit
• Lambda cold start issue
• Kept it warm by frequent invocations (every 5 min.)
• Standardized what goes on stream
• A central place for schemas
• A central place for error messages
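The keep-warm trick from the cold-start bullet can be sketched like this; the marker payload shape is an assumption, and a real scheduled-rule event carries different fields.

```javascript
// A scheduled rule invokes the function every few minutes with a marker
// payload; the handler returns immediately on those pings so the container
// stays warm while only real requests do work.
function handler(event, context, callback) {
  if (event && event.source === "keep-warm") {
    return callback(null, "warmed"); // scheduled ping: exit fast
  }
  // ... real request handling would go here ...
  return callback(null, "handled");
}
```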
48. Lessons Learned (cont.)
• Don’t try to boil the ocean all at once
• Use real user data to make decisions
• Reuse existing technology wherever possible rather than building your own
• If you’re serious about analytics, build your own platform