Genie – Hadoop Platform as a Service at Netflix
Sriram Krishnan
Hadoop Summit, June 26, 2013
Netflix does Hadoop
Netflix does Hadoop at scale
Netflix does Hadoop at scale*
Netflix does Hadoop at scale in the cloud
S3 as the Cloud Data Warehouse
Cloud Data Warehouse
Multiple Hadoop Clusters
Cloud Data Warehouse
Hadoop (EMR) Clusters
Data Platform as a Service
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Large Ecosystem of Clients & Tools
Cloud Data Warehouse
Hadoop (EMR) Clusters
Hadoop Platform as a Service
Job Execution
Resource Configuration & Management
Metadata Service (Franklin)
Why Genie?
• Simple API for job submission and management
• Accessible from the data center and the cloud
• Abstraction of physical details of back-end Hadoop clusters
What Genie is Not
• A workflow scheduler, such as Oozie
• A task scheduler, such as fair share or capacity schedulers
• An end-to-end resource management tool
Genie: Job Execution
• API to run Hadoop, Hive and Pig jobs
• Auto-magic submission of jobs to the right Hadoop cluster
• Abstracting away cluster details from clients
Genie: Resource Configuration
• API for management of cluster metadata
• Status: up, out of service, or terminated
• Site-specific Hadoop, Hive and Pig configurations
• Cluster naming/tagging for job submissions
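The resource matching described above can be sketched in a few lines of Python. This is a minimal illustration, not Genie's actual code: the `Cluster` class, the `select_cluster` function and the "first match wins" rule are assumptions made for the example; only the status values and the schedule/configuration tags come from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Status values as listed on the slide.
    UP = "UP"
    OUT_OF_SERVICE = "OUT_OF_SERVICE"
    TERMINATED = "TERMINATED"


@dataclass
class Cluster:
    # Hypothetical metadata record: a name plus the tags used for matching.
    name: str
    schedules: set
    configurations: set
    status: Status = Status.UP


def select_cluster(clusters, schedule, configuration) -> Optional[Cluster]:
    """Return the first UP cluster tagged with both the schedule and the configuration."""
    for cluster in clusters:
        if (cluster.status is Status.UP
                and schedule in cluster.schedules
                and configuration in cluster.configurations):
            return cluster
    return None  # no cluster is currently accepting this kind of job


clusters = [
    Cluster("prod-sla", {"sla"}, {"prod"}, Status.OUT_OF_SERVICE),
    Cluster("adhoc", {"adhoc", "sla"}, {"prod", "test"}),
]
match = select_cluster(clusters, "sla", "prod")
print(match.name if match else "no cluster available")
```

With the prod SLA cluster out of service and the ad-hoc cluster additionally tagged `sla`, the request is routed to the ad-hoc cluster without the client knowing any host names.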
[Architecture diagram: Genie built on Netflix OSS (http://netflix.github.com)]
Eureka Service: Genie registers its service; clients discover the service
Clients (Ribbon Client, Eureka Client, Python API): invoke Genie (submit job)
Genie nodes: Karyon, Archaius, Eureka Client, Ribbon, Servo, with Hadoop, Hive and Pig
End-users: submit jobs; Genie launches the job
Admins: launch cluster(s) and register each cluster
Genie: Job Execution
• Job Type: {hadoop, hive, pig}
• File dependencies (script, udfs, etc)
• Command-line arguments
• Schedule: {adhoc, sla}
• Configuration: {prod, test, unittest}
REST call
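As a rough sketch of what such a request body might look like, the snippet below assembles the fields listed above into JSON. The field names (`jobType`, `fileDependencies`, `cmdArgs`, `schedule`, `configuration`), the helper name and the example S3 path are all hypothetical; the slides do not show the real wire format.

```python
import json

# Job types taken from the slide; everything else about the payload is assumed.
JOB_TYPES = {"hadoop", "hive", "pig"}


def build_job_request(job_type, file_dependencies, cmd_args, schedule, configuration):
    """Return a JSON string describing one job submission (hypothetical field names)."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    body = {
        "jobType": job_type,
        "fileDependencies": list(file_dependencies),  # scripts, UDFs, etc.
        "cmdArgs": cmd_args,
        "schedule": schedule,            # e.g. adhoc or sla
        "configuration": configuration,  # e.g. prod, test or unittest
    }
    return json.dumps(body, sort_keys=True)


print(build_job_request("hive", ["s3://example-bucket/query.q"], "-f query.q", "sla", "prod"))
```

Note that the request names no cluster: it carries only the schedule and configuration tags used to pick one.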
Genie: Job Execution
Response: job ID*
* Used to query status, get outputs, kill job
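Because submission returns only a job ID, a client typically polls for completion. The loop below is a generic sketch of that pattern: `get_status` stands in for whatever call fetches the status of a job ID, and the status names in `TERMINAL` are hypothetical, since the slides do not list them.

```python
import time

# Hypothetical terminal status names; the real values are not given in the slides.
TERMINAL = {"SUCCEEDED", "FAILED", "KILLED"}


def wait_for_job(job_id, get_status, poll_seconds=5.0, max_polls=100, sleep=time.sleep):
    """Poll get_status(job_id) until a terminal status is seen, then return it."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        sleep(poll_seconds)  # job still running; wait before asking again
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Usage with a stand-in status source instead of a real service call.
responses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_job("job-1", lambda job_id: next(responses), sleep=lambda s: None))
```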
Genie Job Details
Job ID
Script to execute
Standard output and error
Pig logs
Job conf directory
Genie – Use Cases Enabled at Netflix
• Running nightly short-lived “bonus” clusters to augment ETL processing
• Re-routing traffic between clusters
• “Red/black” pushes for clusters
• Attaching stand-alone gateways to clusters
• Running 100% of all SLA jobs, and a high percentage of ad-hoc jobs
Nightly Short-lived Bonus Clusters
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Execution Service Configuration Service
{Schedule=bonus,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
Nightly Short-lived Bonus Clusters
Bonus Cluster:
Schedule: bonus
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
Prod SLA Cluster:
Schedule: sla
Configurations: prod
{Schedule=sla,
Configuration=prod}
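The bonus-cluster sequence above amounts to "ask for the bonus cluster first, fall back to the prod SLA cluster". A minimal sketch of that fallback follows. It assumes the retry happens on the client side and represents clusters as plain dictionaries; both choices, and the function name, are illustrative rather than taken from Genie.

```python
def first_available(clusters, requests):
    """Try each (schedule, configuration) request in order; return the first matching UP cluster name."""
    for schedule, configuration in requests:
        for cluster in clusters:
            if (cluster["status"] == "UP"
                    and schedule in cluster["schedules"]
                    and configuration in cluster["configurations"]):
                return cluster["name"]
    return None  # nothing matched any of the requests


clusters = [
    {"name": "bonus", "schedules": {"bonus"}, "configurations": {"prod"}, "status": "UP"},
    {"name": "prod-sla", "schedules": {"sla"}, "configurations": {"prod"}, "status": "UP"},
]
preferences = [("bonus", "prod"), ("sla", "prod")]

print(first_available(clusters, preferences))   # bonus cluster is up, so it is chosen
clusters[0]["status"] = "OUT_OF_SERVICE"        # bonus cluster stops accepting jobs
print(first_available(clusters, preferences))   # request falls back to the prod SLA cluster
```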
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc, sla
Configurations: prod, test
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Rerouting Traffic Between Clusters
Ad-hoc Cluster:
Schedule: adhoc
Configurations: prod, test
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: OUT_OF_SERVICE
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
“Red/Black” Pushes for Clusters
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: TERMINATED
Execution Service Configuration Service
{Schedule=sla,
Configuration=prod}
Prod SLA Cluster:
Schedule: sla
Configurations: prod
Status: UP
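The red/black sequence above can be read as a small state machine over cluster status. The sketch below encodes the transitions that appear in the slides (UP to OUT_OF_SERVICE, OUT_OF_SERVICE back to UP, OUT_OF_SERVICE to TERMINATED). The function names, and the rule that a cluster is terminated only once it has no running jobs, are assumptions made for illustration.

```python
# Transitions seen in the slides; TERMINATED is treated as final (an assumption).
ALLOWED = {
    "UP": {"OUT_OF_SERVICE"},
    "OUT_OF_SERVICE": {"UP", "TERMINATED"},
    "TERMINATED": set(),
}


def transition(statuses, name, new_status):
    """Move one cluster to new_status, rejecting transitions not listed in ALLOWED."""
    current = statuses[name]
    if new_status not in ALLOWED[current]:
        raise ValueError(f"{name}: cannot go from {current} to {new_status}")
    statuses[name] = new_status


def red_black_push(statuses, old, new):
    """Register the new cluster as UP, then stop routing new jobs to the old one."""
    statuses[new] = "UP"
    transition(statuses, old, "OUT_OF_SERVICE")


def retire(statuses, old, running_jobs):
    """Terminate the old cluster once it has drained (assumed precondition)."""
    if running_jobs > 0:
        raise RuntimeError(f"{old} still has {running_jobs} running job(s)")
    transition(statuses, old, "TERMINATED")


statuses = {"prod-sla-v1": "UP"}
red_black_push(statuses, "prod-sla-v1", "prod-sla-v2")
retire(statuses, "prod-sla-v1", running_jobs=0)
print(statuses)
```

Both clusters carry the same schedule and configuration tags throughout, so clients keep sending the same request while the status change alone moves traffic.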
Genie Usage at Netflix
• Usage statistics brought to you by “Sherlock”
• Pig job to gather Hadoop job statistics
• Tableau-based visualization
Cloud Deployment
• Asgard is also part of Netflix OSS
• https://github.com/Netflix/asgard
Auto Scaling in the Cloud
Genie is now part of Netflix OSS!
• http://techblog.netflix.com/2013/06/genie-is-out-of-bottle.html
• Clone it on GitHub at:
• https://github.com/Netflix/genie
• Still “version 0” – work in progress!
• All contributions and feedback welcome!
• Come talk to us and check out live demos at the Netflix Booth
Watching Pigs Fly with the Netflix Hadoop Toolkit
Sriram Krishnan
We’re hiring!
Thank you!
Home: http://www.netflix.com
Jobs: http://jobs.netflix.com
Tech Blog: http://techblog.netflix.com/


Editor's Notes

  1. Reference: http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html. Use cases: reporting, analytics, insights, algorithms (e.g. recommendations). But big deal: so does everyone in the room.
  2. What is scale? It means different things to different people
  3. A few petabytes of data: billions of log events captured each day, with retention of a few months. Many clusters, 1000s of nodes. Again, big deal: there are many others in the room who do Hadoop at this scale (petabyte is the new terabyte).
  4. Our Hadoop processing is 100% in the (public) cloud. In our case, the public cloud is AWS. This is what differentiates our infrastructure from the rest. Hadoop in the cloud is different from Hadoop in the datacenter; in this talk, we will discuss our cloud-based Hadoop platform.
  5. S3 is the source of truth. Decoupling of storage from the computational infrastructure. S3 benefits: highly durable and available (11 9’s); bucket versioning; highly elastic (we grew our data warehouse organically from a few hundred terabytes to petabytes without having to provision any storage resources in advance). HDFS? Only for transient data and intermediate results for multi-stage jobs. S3 cons: performance, eventual consistency.
  6. Another benefit of S3: multiple clusters can read/process the same data. (Semi-)persistent SLA and ad-hoc clusters, ~800-1300 nodes. Multiple ad-hoc clusters to A/B test new releases/features. Nightly "bonus" clusters to supplement the SLA cluster. Operational assumption: clusters may go down at any time.
  7. Traditional gateways/CLIs: ad-hoc querying. Genie: REST API for job execution/monitoring; repository/abstraction for clusters and metastores. Franklin (MDS): uses HCAT/HiveServer to talk to the Hive metastore.
  8. Next, we will focus on Genie for the rest of the talk. Other tools will be covered in the other Netflix talk.
  9. EMR: Hadoop IaaS, and an API to run jobs on transient clusters; our clusters are semi-persistent, and job submissions don’t result in new clusters. Oozie: workflow tool, which only supports the Hadoop ecosystem; we have hybrid jobs (Teradata+Hadoop) being orchestrated by UC4, so we just needed a job submission API. Also, there was no support for Hive when we started. Templeton: no multi-cluster, multi-user support; not quite ready for prime time.
  10. * Genie is a resource “match-maker”
  11. Unit of execution is a Hadoop/Hive/Pig job. Users provide scripts, dependencies and other metadata. Does no scheduling per se; only does “meta-scheduling” or resource matching.
  12. Status defines whether it is accepting jobs. Configurations are *-site.xmls and properties. Cluster name, schedule, etc.
  13. Two classes of users: admins and end-users. Admins spin up clusters and set cluster metadata. Users use the clusters once they have been registered. Genie is built on top of Netflix OSS.
  14. Genie figures out the resources to run jobs on; back-end resources are abstracted out. Asynchronous execution, since jobs may be long-running.
  15. Every job runs as a separate process using the Hadoop/Hive/Pig CLI. Avoids “jar hell” since it needs Hadoop jars. Jobs run in their own sandbox (working directory). Provides isolation between jobs, and between Genie and the jobs. Standard output/error of jobs easily available. Able to support multiple versions of Hadoop/Hive/Pig, and connect to multiple clusters.
  16. Configuration service helps us do crazy (cool) things. Will describe each of these in greater detail.
  17. New bonus clusters launched each night, but clients are oblivious of actual host names/IPs. One way to do this: higher SLA jobs first ask for the cluster by name.
  18. If it doesn’t exist, revert back to the existing cluster. Why not just expand? Better isolation. Mixing and matching instance types is not ideal for Hadoop; the prod cluster uses m1.xlarges for slave nodes. Shrink has proven to be a problem. We want to do a hard shutdown when those instances are needed on awsprod.
  19. We had to bounce the prod job tracker to enable priorities for “long-pole” jobs. Wanted to do it with minimal impact to SLA jobs.
  20. Must wait for all existing jobs to finish for minimal impact. Hadoop jobs are long running; we don’t want to kill a 5 hour job nearing its finish.
  21. Prod cluster is back up after maintenance. Jobs that were scheduled on the query cluster will continue to run there until they finish. This is done from time to time; although not too often, we do red-black pushes…
  22. This is the initial state: we need to spin up a new cluster, e.g. to push a new feature.
  23. * Spin up new cluster, mark it as UP, mark old cluster as OOS
  24. OUT_OF_SERVICE to TERMINATED
  25. Mention that we will be writing a techblog about this soon, with more details. Two query clusters: A/B testing new fair share scheduler.
  26. Set up desired instance counts across multiple AZs. Do “red-black” pushes using “sequential ASGs”. Loss of individual nodes will cause jobs running on those nodes to be lost.
  27. Auto-scaling policy set up to expand if number of running jobs > ~80%
  28. Still biased towards running in the cloud and at Netflix, but will generalize/improve it based on community feedback
  29. * Come listen to how we enable “Data Platform as a Service” – it is truly Lipstick on a Pig.