Managing the Apache Hadoop lifecycle
Charles Zedlewski, Vice President, Product
August 4, 2011

The good and bad news – Hadoop means business
Use cases by industry (advanced analytics / data processing):
  • Web: Social Network Analysis / Clickstream Sessionization
  • Media: Content Optimization / Clickstream Sessionization
  • Telco: Network Analytics / Mediation
  • Retail: Loyalty & Promotions Analysis / Data Factory
  • Financial: Fraud Analysis / Trade Reconciliation
  • Federal: Entity Analysis / SIGINT
  • Bioinformatics: Sequencing Analysis / Genome Mapping
©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.

You have reasonable asks for Hadoop
  • Quickly diagnose the root cause of issues so you know what to improve
  • Quickly take action and solve issues at their root cause
  • Continuously optimize policies to improve system availability and QoS in the long term
[Diagram labels: Hadoop Operations; Activity Monitor; Patch / Hot Fix; Restore / Recover]

But Hadoop is special…
  • Fault tolerant
  • Scalable
  • Widespread
Good / Bad:
  • Verbose
  • Multi-layered
  • Hot market for skills

First principles – set business goals
What are the business outcomes Hadoop is supposed to deliver?
  • New insights
  • Lower business costs
  • Lower IT costs
  • More data under management
  • More revenue through better targeting, conversion
  • ?

First principles – set operations goals
  • Performance
  • Utilization
  • Cost of operations
  • Availability
  • Quality of service
  • Flexibility / elasticity
  • Security
  • Transparency
  • ?

System design – stick to the basics
  • Hadoop needs to know where its hard drives are
      • Running on a virtualized layer is a bad idea
      • RAID is a bad idea
      • Running on remote storage is the worst idea
  • Servers: prioritize flexibility over bells and whistles
      • How easily will you be able to expand your cluster?
      • How easily can you evolve your core / spindle ratio?
      • How many companies support that exotic chip, card, drive, power supply, etc.?
  • Network: prioritize quality over bells and whistles
      • 10G on the backplane is usually unnecessary
      • Plan how to adapt your topology as your cluster grows

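One place where topology planning becomes concrete is Hadoop's rack awareness. Hadoop can call an external executable with one or more node addresses as arguments and read one rack path per address from standard output. In the Hadoop releases of this era the executable is named by the `topology.script.file.name` property; later releases use `net.topology.script.file.name`. The sketch below is only an illustration of that contract: the subnets, rack names and the `rack_for` helper are assumptions, not part of the original slides.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script for Hadoop rack awareness."""
import ipaddress
import sys

# Assumed subnet-to-rack layout; replace with your own data-center mapping.
RACKS = {
    "10.1.1.0/24": "/dc1/rack1",
    "10.1.2.0/24": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host):
    """Return the rack path for an IP address, or the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames (rather than IPs) fall back to the default in this sketch.
        return DEFAULT_RACK
    for subnet, rack in RACKS.items():
        if addr in ipaddress.ip_network(subnet):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Hadoop passes the addresses as arguments and expects one rack path
    # per argument, separated by whitespace, on standard output.
    print(" ".join(rack_for(arg) for arg in sys.argv[1:]))
```

Keeping the mapping in one table means that adding a rack as the cluster grows is a one-line change rather than a new branch of logic.
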
Hadoop in production – the tribe
  • We have a chief that looks out for the tribe
  • Make sure there’s enough fire for everyone
  • Survival of the tribe is still the main concern
  • Job code distinct from the rest of Hadoop

Train your chief!
  • Unix & DBA backgrounds are both valid starting points

Then empower your chief!
  • Managing Hadoop requires
      • Sensible selection of hardware
      • Visibility into users, jobs, activities, hardware, operating system, services, logs and more
      • Ability to make changes to configurations, services, patch levels and more
  • In many organizations the chief is precluded from some of these decisions / actions by preexisting policy
  • Take an “appliance mentality” to Hadoop decision making

Discovery – monitoring & alerting
You want to anticipate & alert on:
  • Health checks & status of key nodes (NameNode, Master, etc.)
  • Completion & performance of jobs & pipelines (for SLA measurement)
  • System performance & availability
  • Log events (only specific ones)

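As a rough illustration of alerting on job completion and SLA performance, the sketch below checks a list of finished job runs against per-job SLA targets. The `JobRun` record, the job names and all the numbers are hypothetical; a real deployment would pull this data from the cluster's own job history rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    name: str
    succeeded: bool
    runtime_minutes: float
    sla_minutes: float  # assumed per-job target agreed with the business


def sla_alerts(runs):
    """Return one alert line per run that failed or exceeded its SLA."""
    alerts = []
    for run in runs:
        if not run.succeeded:
            alerts.append(f"{run.name}: FAILED")
        elif run.runtime_minutes > run.sla_minutes:
            over = run.runtime_minutes - run.sla_minutes
            alerts.append(f"{run.name}: SLA missed by {over:.0f} min")
    return alerts


# Hypothetical sample data for illustration only.
runs = [
    JobRun("sessionize-clicks", True, 42.0, 60.0),
    JobRun("nightly-etl", True, 95.0, 90.0),
    JobRun("fraud-scoring", False, 10.0, 30.0),
]

for line in sla_alerts(runs):
    print(line)
```

Runs that finish within their target produce no output, in the same spirit as alerting only on specific log events.
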
Diagnosis
7 lenses into Hadoop, used in combination:
  • Service metrics
  • System metrics
  • Configurations
  • Change history
  • Log history
  • Activities, jobs & tasks
  • Stack trace / profiling
One lens rarely tells the whole story

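One simple way to use several lenses in combination is to merge their events into a single timeline around an incident. The sketch below assumes each lens already yields time-sorted `(timestamp, lens, message)` tuples; the events, host name and job name are invented for illustration.

```python
import heapq
from datetime import datetime


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# Hypothetical events from four lenses, each list sorted by time.
change_history = [(ts("2011-08-04 09:00"), "change", "heap setting changed on worker12")]
system_metrics = [
    (ts("2011-08-04 09:05"), "system", "worker12 swap usage rising"),
    (ts("2011-08-04 09:12"), "system", "worker12 load average above threshold"),
]
log_history = [(ts("2011-08-04 09:07"), "log", "task attempt failed on worker12")]
job_activity = [(ts("2011-08-04 09:10"), "job", "nightly-etl running slower than usual")]


def timeline(*lenses):
    """Merge time-sorted event lists into one list ordered by timestamp."""
    return list(heapq.merge(*lenses, key=lambda event: event[0]))


for when, lens, message in timeline(change_history, system_metrics, log_history, job_activity):
    print(when.strftime("%H:%M"), f"[{lens}]", message)
```

Read in order, the merged view suggests a chain (a change, then resource pressure, then failures, then a slow job) that no single list shows on its own.
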
Avoid the scripts
  • Script to run a check
  • Script to import a file
  • Script to preempt a job
  • Script to instrument a daemon
  • Script to…

The web of scripts – where it ends
  • Nothing ever changes or improves
  • Garish, jerry-rigged
  • Time goes into maintaining scripts, not achieving the objectives
  • One and only one person loves it

Hadoop as a standard platform
  • Fire is not a big deal any more. Pollution, congestion, etc. are a concern
  • More specialized roles
  • Patching, updating, upgrading, configuring and tuning are all distinct

Harnessing the Power of Apache Hadoop Series

Optimize – plan for multi-tenancy
  • Definition: the ability of disparate groups, users, data and workloads to operate concurrently on one logical Hadoop system
  • Multi-tenancy helps you get more of what you really want
      • Better performance
      • Better cost of operations
      • New insights
      • Greater availability
  • Multi-tenancy has some additional considerations

Optimize – policies for permissions
  • Authentication
      • Don’t talk to strangers
      • Should integrate with existing IT infrastructure
      • Authentication (Kerberos) patches now part of CDH3
  • Authorization
      • Not everyone can access everything
      • Ex. Production data sets are read-only to quants / analysts. Analysts have home or group directories for derived data sets.
      • Mostly enforced via HDFS permissions; directory structure and organization is critical
      • Not as fine grained as column level access in EDW, RDBMS (but this is coming)

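HDFS permissions follow a POSIX-style owner / group / other model, which is why directory layout matters so much for authorization. The sketch below models that check in plain Python to show how the "production is read-only to analysts" example can fall out of ownership and mode bits. It is a simplified illustration, not HDFS code: the `can_access` helper, the user and group names and the modes are assumptions, and the superuser bypass is not modeled.

```python
def can_access(user, user_groups, owner, group, mode, want):
    """POSIX-style check. mode is an octal int such as 0o750; want is 'r', 'w' or 'x'."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        return bool((mode >> 6) & bit)   # owner bits apply
    if group in user_groups:
        return bool((mode >> 3) & bit)   # group bits apply
    return bool(mode & bit)              # everyone else


# Hypothetical layout: production data is owned by an ETL account and is
# readable but not writable by the analysts group (mode 750).
prod = {"owner": "etl", "group": "analysts", "mode": 0o750}
# A group directory for derived data sets, writable by the group (mode 770).
derived = {"owner": "alice", "group": "analysts", "mode": 0o770}

print(can_access("bob", {"analysts"}, prod["owner"], prod["group"], prod["mode"], "r"))
print(can_access("bob", {"analysts"}, prod["owner"], prod["group"], prod["mode"], "w"))
print(can_access("bob", {"analysts"}, derived["owner"], derived["group"], derived["mode"], "w"))
```

Because the decision depends only on owner, group and mode, putting production and derived data in separate directory trees keeps the policy easy to audit.
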
Optimize – plan for resources
  • Tracking & establishing policies for usage of cluster resources
      • Files, bytes and quotas thereof
      • Tasks, memory, IO, CPU, network and scheduling thereof
  • By now you’ve almost certainly graduated to a sophisticated scheduler
  • Policies to prevent bad behavior (e.g. auto-kill)
  • Monitor and track resource utilization across all groups
  • Periodically review queue / pool decisions to improve QoS

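To make the auto-kill and utilization-tracking ideas concrete, the sketch below flags tasks that exceed per-pool limits and totals memory use per pool. Everything here is hypothetical: the `Task` record, the pool names and the limits are assumptions chosen for illustration, and a real cluster would enforce such policies through its scheduler rather than a standalone function.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    pool: str
    memory_mb: int
    runtime_minutes: int


# Assumed per-pool limits; tune these when reviewing queue / pool decisions.
LIMITS = {
    "production": {"memory_mb": 4096, "runtime_minutes": 240},
    "adhoc": {"memory_mb": 2048, "runtime_minutes": 60},
}


def tasks_to_kill(tasks, limits):
    """Return IDs of tasks that exceed their pool's memory or runtime limit."""
    flagged = []
    for task in tasks:
        limit = limits[task.pool]
        too_big = task.memory_mb > limit["memory_mb"]
        too_long = task.runtime_minutes > limit["runtime_minutes"]
        if too_big or too_long:
            flagged.append(task.task_id)
    return flagged


def memory_by_pool(tasks):
    """Total memory in MB currently used by each pool."""
    usage = {}
    for task in tasks:
        usage[task.pool] = usage.get(task.pool, 0) + task.memory_mb
    return usage


# Hypothetical snapshot of running tasks.
tasks = [
    Task("t1", "production", 3000, 100),
    Task("t2", "adhoc", 1024, 90),
    Task("t3", "adhoc", 3072, 10),
]

print(tasks_to_kill(tasks, LIMITS))
print(memory_by_pool(tasks))
```

Tracking usage per pool alongside the kill list gives the periodic review something to work from when limits need adjusting.
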
Wrapping it up
  • The operational lifecycle for Hadoop is similar to other systems but Hadoop itself is not
  • The basics are not a good place to get creative
  • Think command center, not man cave
  • Multi-tenancy is an attractive opportunity with some additional operational burdens
  • There’s lots more work to do

We appreciate your time and interest.
For additional information:
  • www.cloudera.com
  • twitter.com/cloudera
  • facebook.com/cloudera
  • sales@cloudera.com
  • +1 (888) 789-1488