In a prior tech blog post (http://nflx.it/XoySYR), we discussed the architecture of our petabyte-scale data warehouse in the cloud. Salient features of our architecture include the use of Amazon's Simple Storage Service (S3) as our "source of truth", leveraging the elasticity of the cloud to run multiple dynamically resizable Hadoop clusters to support various workloads, and our horizontally scalable Hadoop Platform as a Service called Genie.
We are pleased to announce that Genie is now open source (http://nflx.it/15rd6pJ), and available to the public from the Netflix OSS GitHub site (https://github.com/Netflix/genie).
8. Data Platform as a Service
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
9. Large Ecosystem of Clients & Tools
10. Why Genie?
Simple API for job submission and management
Accessible from the data center and the cloud
Abstraction of physical details of back-end Hadoop clusters
11. What Genie is Not
A workflow scheduler, such as Oozie
A task scheduler, such as the fair share or capacity schedulers
An end-to-end resource management tool
12. Genie: Job Execution
API to run Hadoop, Hive and Pig jobs
Auto-magic submission of jobs to the right Hadoop cluster
Abstracting away cluster details from clients
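A minimal sketch of what a submission to this job-execution API might look like. The field names and endpoint semantics below are illustrative assumptions for this sketch, not Genie's actual request format; consult the Genie repository for the real API.

```python
# Hypothetical sketch of a Genie-style job submission payload.
# Field names ("jobType", "schedule", "configuration") are assumptions.

def build_job_request(job_type, script, schedule, configuration):
    """Build a submission payload for a Hadoop, Hive, or Pig job."""
    assert job_type in ("hadoop", "hive", "pig")
    return {
        "jobType": job_type,             # which CLI Genie should launch
        "cmdArgs": ["-f", script],       # script to execute
        "schedule": schedule,            # e.g. "sla" or "adhoc"
        "configuration": configuration,  # e.g. "prod" or "test"
    }

request = build_job_request("pig", "daily_etl.pig", "sla", "prod")
# A client would POST this as JSON to the Genie execution service, then
# poll the returned job ID for status and logs -- the cluster that
# actually runs the job is chosen by Genie, not by the client.
```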
13. Genie: Resource Configuration
API for management of cluster metadata
Status: up, out of service, or terminated
Site-specific Hadoop, Hive and Pig configurations
Cluster naming/tagging for job submissions
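The cluster metadata described above can be modeled roughly as follows; the class and field names are illustrative assumptions, not Genie's actual data model.

```python
# Hypothetical in-memory model of Genie's cluster-metadata registry.
from dataclasses import dataclass, field

@dataclass
class ClusterConfig:
    name: str
    status: str  # "UP", "OUT_OF_SERVICE", or "TERMINATED"
    schedules: list = field(default_factory=list)       # e.g. ["sla"]
    configurations: list = field(default_factory=list)  # e.g. ["prod"]
    site_xmls: list = field(default_factory=list)       # *-site.xml locations

registry = {}

def register_cluster(cfg):
    """Admins register a cluster so Genie can route jobs to it."""
    registry[cfg.name] = cfg

register_cluster(ClusterConfig(
    name="prod-sla",
    status="UP",
    schedules=["sla"],
    configurations=["prod"],
))
```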
14. Genie Architecture
[Diagram: the Genie service registers itself with the Eureka service. Clients – a Ribbon-based Java client and a Python API – discover the service via their Eureka clients. End-users invoke Genie to submit jobs; admins launch clusters and register them with Genie, which then launches jobs on the registered clusters.]
Netflix OSS (http://netflix.github.com)
[Diagram: the Genie services are built on Netflix OSS components – Karyon, Archaius, Eureka Client, Ribbon and Servo – alongside the Hadoop, Hive and Pig clients.]
17. Genie Job Details
Job ID
Script to execute
Standard output and error
Pig logs
Job conf directory
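Because a submitted job may run for hours, a client typically polls the job ID until it reaches a terminal state. A minimal sketch, assuming a hypothetical status call and status names (the real Genie status values may differ):

```python
import time

def wait_for_job(get_status, job_id, poll_secs=30):
    """Poll a job until it reaches a terminal state.

    `get_status` stands in for a Genie status call (hypothetical);
    the terminal status names here are illustrative assumptions.
    """
    while True:
        status = get_status(job_id)
        if status in ("SUCCEEDED", "FAILED", "KILLED"):
            return status
        time.sleep(poll_secs)

# Usage: wait_for_job(genie_client.get_job_status, "genie-job-12345"),
# then fetch stdout/stderr, Pig logs, and the job conf directory.
```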
18. Genie – Use Cases Enabled at Netflix
Running nightly short-lived “bonus” clusters to augment ETL processing
Re-routing traffic between clusters
“Red/black” pushes for clusters
Attaching stand-alone gateways to clusters
Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
19. Nightly Short-lived Bonus Clusters
[Diagram: Execution Service and Configuration Service]
Prod SLA Cluster: Schedule=sla; Configurations=prod
23. Rerouting Traffic Between Clusters
Ad-hoc Cluster: Schedule=adhoc; Configurations=prod, test
Prod SLA Cluster: Schedule=sla; Configurations=prod
[Diagram: the Execution Service and Configuration Service route a job requesting {Schedule=sla, Configuration=prod}]
24. Rerouting Traffic Between Clusters
Ad-hoc Cluster: Schedule=adhoc, sla; Configurations=prod, test
Prod SLA Cluster: Schedule=sla; Configurations=prod; Status=OUT_OF_SERVICE
[Diagram: the Execution Service and Configuration Service route a job requesting {Schedule=sla, Configuration=prod}]
25. Rerouting Traffic Between Clusters
Ad-hoc Cluster: Schedule=adhoc; Configurations=prod, test
Prod SLA Cluster: Schedule=sla; Configurations=prod; Status=UP
[Diagram: the Execution Service and Configuration Service route a job requesting {Schedule=sla, Configuration=prod}]
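The rerouting sequence above comes down to Genie's "match-making": pick a cluster whose metadata matches the job's requested schedule and configuration, considering only clusters that are UP. A sketch under an assumed data model (field names are illustrative):

```python
# Sketch of Genie-style resource matching for the rerouting scenario:
# the prod SLA cluster is marked OUT_OF_SERVICE, and the ad-hoc cluster
# has been tagged with the "sla" schedule to absorb its traffic.
clusters = [
    {"name": "prod-sla", "status": "OUT_OF_SERVICE",
     "schedules": ["sla"], "configurations": ["prod"]},
    {"name": "adhoc", "status": "UP",
     "schedules": ["adhoc", "sla"], "configurations": ["prod", "test"]},
]

def match_cluster(schedule, configuration):
    """Return the first UP cluster matching the job's request, if any."""
    for c in clusters:
        if (c["status"] == "UP"
                and schedule in c["schedules"]
                and configuration in c["configurations"]):
            return c["name"]
    return None
```

With this state, a job requesting {Schedule=sla, Configuration=prod} lands on the ad-hoc cluster without any client-side change; once the prod cluster is marked UP again, new jobs flow back to it.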
26. “Red/Black” Pushes for Clusters
Prod SLA Cluster: Schedule=sla; Configurations=prod; Status=UP
[Diagram: the Execution Service and Configuration Service route a job requesting {Schedule=sla, Configuration=prod}]
27. “Red/Black” Pushes for Clusters
Prod SLA Cluster (old): Schedule=sla; Configurations=prod; Status=OUT_OF_SERVICE
Prod SLA Cluster (new): Schedule=sla; Configurations=prod; Status=UP
[Diagram: the Execution Service and Configuration Service route a job requesting {Schedule=sla, Configuration=prod}]
28. “Red/Black” Pushes for Clusters
Prod SLA Cluster (old): Schedule=sla; Configurations=prod; Status=TERMINATED
Prod SLA Cluster (new): Schedule=sla; Configurations=prod; Status=UP
[Diagram: the Execution Service and Configuration Service route a job requesting {Schedule=sla, Configuration=prod}]
29. Genie Usage at Netflix
Usage statistics brought to you by “Sherlock”
Pig job to gather Hadoop job statistics
Tableau-based visualization
32. Genie is now part of Netflix OSS!
http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html
Clone it on GitHub at:
https://github.com/Netflix/genie
Still “version 0” – work in progress!
All contributions and feedback welcome!
Come talk to us and check out live demos at the Netflix booth
Reference: http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html. Use cases: reporting, analytics, insights, algorithms (e.g. recommendations). But big deal – so does everyone in the room.
What is scale? It means different things to different people
A few petabytes of data – billions of log events captured each day, with retention of a few months. Many clusters – 1000s of nodes. Again, big deal – there are many others in the room who do Hadoop at this scale (petabyte is the new terabyte).
Our Hadoop processing is 100% in the (public) cloud. In our case, the public cloud is AWS. This is what differentiates our infrastructure from the rest. Hadoop in the cloud is different from Hadoop in the datacenter – in this talk, we will discuss our cloud-based Hadoop platform.
S3 is the source of truth: decoupling of storage from the computational infrastructure. S3 benefits: highly durable and available (11 9's), bucket versioning, and highly elastic – we grew our data warehouse organically from a few hundred terabytes to petabytes without having to provision any storage resources in advance. HDFS? Only for transient data and intermediate results of multi-stage jobs. S3 cons: performance and eventual consistency.
Another benefit of S3: multiple clusters can read/process the same data. (Semi-)persistent SLA and ad-hoc clusters of ~800-1300 nodes; multiple ad-hoc clusters to A/B test new releases/features; nightly "bonus" clusters to supplement the SLA cluster. Operating assumption: clusters may go down at any time.
Traditional gateways/CLIs for ad-hoc querying. Genie: REST API for job execution/monitoring, and a repository/abstraction for clusters and metastores. Franklin (MDS): uses HCatalog/HiveServer to talk to the Hive metastore.
Next, we will focus on Genie for the rest of the talk. The other tools will be covered in the other Netflix talk.
EMR: Hadoop IaaS, and an API to run jobs on transient clusters – our clusters are semi-persistent, and job submissions don't result in new clusters. Oozie: a workflow tool, which only supports the Hadoop ecosystem – we have hybrid jobs (Teradata+Hadoop) orchestrated by UC4, so we just needed a job submission API; also, no support for Hive when we started. Templeton: no multi-cluster or multi-user support; not quite ready for prime time.
* Genie is a resource “match-maker”
The unit of execution is a Hadoop/Hive/Pig job. Users provide scripts, dependencies and other metadata. Genie does no scheduling per se – only "meta-scheduling", or resource matching.
Status defines whether a cluster is accepting jobs. Configurations are *-site.xml files and properties. Cluster name, schedule, etc.
Two classes of users: admins and end-users. Admins spin up clusters and set cluster metadata; users use the clusters once they have been registered. Genie is built on top of Netflix OSS.
Genie figures out the resources to run jobs on – back-end resources are abstracted away. Execution is asynchronous, since jobs may be long-running.
Every job runs as a separate process using the Hadoop/Hive/Pig CLI, which avoids "jar hell" since it needs the Hadoop jars. Jobs run in their own sandbox (working directory), providing isolation between jobs, and between Genie and the jobs. Standard output/error of jobs is easily available. Able to support multiple versions of Hadoop/Hive/Pig, and to connect to multiple clusters.
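The per-job process isolation described in this note can be sketched as follows; the directory layout and launch helper are illustrative assumptions, not Genie's actual implementation.

```python
# Sketch: run each job as a separate CLI process in its own working
# directory (sandbox), capturing stdout/stderr to files there.
import os
import subprocess

def launch_job(job_id, cmd, base_dir="/tmp/genie-jobs"):
    """Launch `cmd` (e.g. ["pig", "-f", "script.pig"]) in a per-job sandbox."""
    sandbox = os.path.join(base_dir, job_id)  # isolated working directory
    os.makedirs(sandbox, exist_ok=True)
    out = open(os.path.join(sandbox, "stdout.log"), "wb")
    err = open(os.path.join(sandbox, "stderr.log"), "wb")
    # Running in the sandbox keeps jobs from interfering with each other
    # or with the Genie service process itself.
    return subprocess.Popen(cmd, cwd=sandbox, stdout=out, stderr=err)

proc = launch_job("job-123", ["echo", "hello"])
proc.wait()
```

Because each job is a child process with its own CLI, different Hadoop/Hive/Pig versions can coexist simply by pointing different sandboxes at different installations.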
The configuration service helps us do crazy (cool) things. We will describe each of these in greater detail.
New bonus clusters are launched each night – but clients are oblivious to the actual host names/IPs. One way to do this: higher-SLA jobs first ask for a cluster by name.
If it doesn't exist, revert back to the existing cluster. Why not just expand the prod cluster? Better isolation; mixing and matching instance types is not ideal for Hadoop (the prod cluster uses m1.xlarges for slave nodes); and shrinking has proven to be a problem. We want to do a hard shutdown when those instances are needed on awsprod.
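The name-first fallback in these notes can be sketched as a small lookup; the cluster names and registry shape are illustrative assumptions.

```python
# Sketch: SLA jobs first ask for the nightly "bonus" cluster by name;
# if it is not registered (or not UP), fall back to the standing
# prod SLA cluster. Names are hypothetical.
def pick_cluster(registry, preferred="bonus-etl", fallback="prod-sla"):
    cluster = registry.get(preferred)
    if cluster and cluster["status"] == "UP":
        return preferred
    return fallback

# During the night the bonus cluster exists and absorbs the extra ETL
# load; after it is terminated, jobs transparently revert to prod.
```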
We had to bounce the prod job tracker to enable priorities for "long-pole" jobs. We wanted to do it with minimal impact to SLA jobs.
Must wait for all existing jobs to finish for minimal impact. Hadoop jobs are long-running – we don't want to kill a 5-hour job nearing its finish.
The prod cluster is back up after maintenance. Jobs that were scheduled on the query cluster will continue to run there until they finish. This is done from time to time – although not too often, we do red-black pushes…
This is the initial state – we need to spin up a new cluster, e.g. to push a new feature.
* Spin up new cluster, mark it as UP, mark old cluster as OOS
OUT_OF_SERVICE to TERMINATED
Mention that we will be writing a techblog post about this soon, with more details. Two query clusters – A/B testing the new fair share scheduler.
Set up desired instance counts across multiple AZs. Do "red-black" pushes using "sequential ASGs". Loss of individual nodes will cause jobs running on those nodes to be lost.
An auto-scaling policy is set up to expand if the number of running jobs exceeds ~80% of capacity.
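The scaling trigger in this note amounts to a simple threshold check; the function and the 80% default are illustrative, standing in for whatever metric the actual auto-scaling policy evaluates.

```python
# Sketch of the auto-scaling trigger: expand when running jobs exceed
# ~80% of current capacity. Threshold and inputs are illustrative.
def should_expand(running_jobs, max_running_jobs, threshold=0.8):
    """Return True when utilization crosses the expansion threshold."""
    return running_jobs / max_running_jobs > threshold
```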
Still biased towards running in the cloud and at Netflix, but will generalize/improve it based on community feedback
* Come listen to how we enable “Data Platform as a Service” – it is truly Lipstick on a Pig.