HADOOP, FROM LAB TO 24/7 PRODUCTION
http://criteolabs.com/jobs
Jean-Baptiste NOTE
jb.note@criteo.com
Ana DIN
a.din@criteo.com
From the Criteo HPC Team
(+ Loïc / Serge / Maxime / Samuel / Yann / Stuart)
ABOUT US
CRITEO ?
6 DATA CENTERS, 4 CONTINENTS.
120 BILLION REQUESTS/DAY*.
* EVERY DAY CRITEO IS CALLED MORE THAN 100 BILLION TIMES BY ADVERTISERS AND PUBLISHERS
54 OPEN POSITIONS IN PARIS R&D
TALES OF A TECHNOLOGY ADOPTION
« Anything that can go wrong, will go wrong »
-- Murphy’s Law
USAGE GROWTH
Usage of Hadoop is growing exponentially
• Learning curve is real
• Analysts discover interesting things with raw data
– Which causes them to ask more questions
• Increased insight leads to a better product
– Which leads to more data
• Data gains in value and more is kept (and studied!)
• YOU (the admin) are the bottleneck!
TOPICS
• Administration automation
• Hadoop configuration tuning
• Network
• Multitenancy
ADMINISTRATION AUTOMATION
AUTOMATING DEPLOYMENTS
Rack and load!
• Machine is racked, cabled and provisioned for a role
• Chef is our one-stop shop for automation
• Diskless system install
INSTA-CLUSTER!
OS DISKLESS: WHY
• Learn from the past
• Previous cluster: 1.5 years of operation
• 78% failure rate on /dev/sda at restart
• Disk usage symmetry
• Guaranteed statelessness
OS DISKLESS: HOW
• PXE boot on a custom CentOS image
• Automated Chef bootstrap
• Everything done by Chef
– Inventory
– Firmware updates
– OS / service deployment
MAINTENANCE
• Evolutive maintenance (version bumps)
• Not much to do during normal operations
• Most frequent issue: flaky / slow-performing hosts
• Use preprod / prod environments for infra changes
• Progressive rollout vs. blackout
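A progressive rollout can be planned one rack at a time. This assumes two things: rack awareness is configured correctly, and HDFS keeps its default placement policy, which places the replicas of any block with replication 2 or more on at least two racks. Under those assumptions, restarting hosts from a single rack at a time should never take every replica of a block offline. The sketch below only computes the batches. The host-to-rack inventory and the batch size are hypothetical inputs (they could, for example, be exported from Chef).

```python
from collections import defaultdict
from typing import Dict, Iterator, List


def rolling_batches(host_racks: Dict[str, str], max_batch: int) -> Iterator[List[str]]:
    """Yield restart batches that never mix hosts from two racks.

    host_racks maps hostname -> rack path (hypothetical inventory data).
    Each batch holds at most max_batch hosts from a single rack, so when
    batches are processed sequentially only part of one rack is down.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for host, rack in host_racks.items():
        by_rack[rack].append(host)
    # Sort racks and hosts so the plan is deterministic and easy to review.
    for rack in sorted(by_rack):
        hosts = sorted(by_rack[rack])
        for i in range(0, len(hosts), max_batch):
            yield hosts[i:i + max_batch]


if __name__ == "__main__":
    inventory = {  # hypothetical hosts and racks
        "dn01": "/dc1/rack1", "dn02": "/dc1/rack1", "dn03": "/dc1/rack1",
        "dn04": "/dc1/rack2", "dn05": "/dc1/rack2",
    }
    for batch in rolling_batches(inventory, max_batch=2):
        print(batch)
```

Keeping each batch inside a single rack trades speed for safety. A larger `max_batch` shortens the rollout, but it also widens the window during which long-running jobs lose workers.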
MONITORING
• User-facing interfaces
• JobTracker
• fsimage checkpointing
• HDFS usage and local disk usage
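Checkpoint lag can be watched through the NameNode's `/jmx` servlet. The sketch below makes two assumptions about that output: that the `Hadoop:service=NameNode,name=FSNamesystem` bean exists, and that it exposes `LastCheckpointTime` (epoch milliseconds) and `TransactionsSinceLastCheckpoint`, as Hadoop 2.x versions do. Check these names against your own NameNode's output. The alert thresholds are hypothetical and should be derived from your `dfs.namenode.checkpoint.*` settings.

```python
import json
import time
import urllib.request
from typing import List, Optional

# Assumed bean name; check the /jmx output of your NameNode version.
FSNS_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"


def fetch_fsnamesystem(nn_http: str, timeout: float = 10.0) -> dict:
    """Fetch the FSNamesystem bean, e.g. nn_http='http://namenode.example:50070'."""
    url = f"{nn_http}/jmx?qry={FSNS_BEAN}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        beans = json.load(resp)["beans"]
    return beans[0] if beans else {}


def checkpoint_alerts(bean: dict, now_ms: Optional[int] = None,
                      max_age_s: int = 3 * 3600,
                      max_txns: int = 3_000_000) -> List[str]:
    """Return alert messages; an empty list means checkpointing looks healthy.

    max_age_s and max_txns are hypothetical thresholds.
    """
    alerts: List[str] = []
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    last = bean.get("LastCheckpointTime")
    if last is None:
        alerts.append("LastCheckpointTime missing from JMX output")
    else:
        age_s = (now_ms - int(last)) / 1000
        if age_s > max_age_s:
            alerts.append(f"last checkpoint {age_s / 3600:.1f} h ago "
                          f"(limit {max_age_s / 3600:.1f} h)")
    txns = bean.get("TransactionsSinceLastCheckpoint")
    if txns is not None and int(txns) > max_txns:
        alerts.append(f"{int(txns)} uncheckpointed transactions (limit {max_txns})")
    return alerts
```

A cron or Nagios-style check could call `checkpoint_alerts(fetch_fsnamesystem("http://namenode.example:50070"))`; the hostname is hypothetical. It could then page whenever the list is non-empty.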
HADOOP CONFIG TUNING
SYSTEM CONFIGS
• Hadoop is a DDoS on your infrastructure
– Increase ARP cache retention (L2-specific)
– Use NSCD
• Increase read-ahead
• Disable THP compaction
• Use jumbo frames (MTU)
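Two of these settings are easy to audit from a script. The sketch below does two things. First, it reads the transparent hugepage `defrag` file and extracts the active value, which is the bracketed entry (e.g. `always madvise [never]`). Second, it converts a desired read-ahead size into the 512-byte sectors that `blockdev --setra` expects. The sysfs path differs between kernels: RHEL/CentOS 6 uses a `redhat_transparent_hugepage` directory. The 4 MiB read-ahead target and the `/dev/sdX` device are hypothetical values, not figures from the talk.

```python
import re
from pathlib import Path
from typing import Optional

# Candidate locations of the THP defrag knob; RHEL/CentOS 6 kernels use the
# "redhat_" prefix, mainline kernels do not.
THP_DEFRAG_PATHS = [
    Path("/sys/kernel/mm/transparent_hugepage/defrag"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/defrag"),
]


def active_choice(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (active) option, e.g. 'never' from 'always madvise [never]'."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def readahead_sectors(kib: int) -> int:
    """Convert a read-ahead size in KiB to 512-byte sectors for `blockdev --setra`."""
    return kib * 1024 // 512


def audit() -> None:
    for path in THP_DEFRAG_PATHS:
        if not path.exists():
            continue
        try:
            value = active_choice(path.read_text())
        except OSError as exc:
            print(f"{path}: unreadable ({exc})")
            continue
        status = "OK" if value == "never" else "compaction still enabled"
        print(f"{path}: {value} ({status})")
    # Hypothetical target: 4 MiB of read-ahead per data disk.
    print(f"blockdev --setra {readahead_sectors(4096)} /dev/sdX")


if __name__ == "__main__":
    audit()
```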
CLUSTER CONFIGS
• Adjust log settings (default is INFO,console)
• Increase handler counts (JT, NN, DN)
• Use dfs.namenode.service.handler.count (separate pool for internal clients)
• Watch out for checkpointing loops
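Handler counts are usually sized from the number of nodes. A commonly cited rule of thumb for the NameNode is 20 × ln(cluster size). The sketch below applies that formula and renders the result as `hdfs-site.xml` properties. Three things here are assumptions, not Criteo's settings: the formula itself, the floor of 10, and giving the service pool the same size as the client pool. One real caveat applies: `dfs.namenode.service.handler.count` only takes effect once `dfs.namenode.servicerpc-address` is set. That setting is what makes internal traffic use its own RPC port.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict


def namenode_handler_count(num_nodes: int, floor: int = 10) -> int:
    """Rule-of-thumb handler count: 20 * ln(cluster size), never below `floor`."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    return max(floor, round(20 * math.log(num_nodes)))


def handler_properties(num_nodes: int) -> Dict[str, str]:
    count = str(namenode_handler_count(num_nodes))
    return {
        # Pool serving external clients (jobs, users).
        "dfs.namenode.handler.count": count,
        # Separate pool for internal traffic; only used when
        # dfs.namenode.servicerpc-address is configured.
        "dfs.namenode.service.handler.count": count,
    }


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(handler_properties(600)))
```

For the 600-node cluster mentioned in the speaker notes, the formula gives 128 handlers.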
NETWORK
NETWORK TOPOLOGY
• One datacenter topology will not fit all
• Web traffic vs. Hadoop traffic
• Historically a fat-tree hierarchy with layer 2 routing
• Switched to a meshed design (soon layer 3)
HADOOP TOPOLOGY
• Rack awareness (of course!)
– Performance
– Reliability
– Maintenance (e.g. relocation)
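Rack awareness is configured by pointing `net.topology.script.file.name` (`topology.script.file.name` on Hadoop 1) at an executable. Hadoop calls it with one or more IP addresses or hostnames as arguments and expects one rack path per argument, in the same order. The Python version below is a minimal sketch. It assumes racks map to /24 subnets through a hypothetical hard-coded table; in practice that table could be generated by Chef from the inventory. An explicit table also avoids relying on LLDP data, which the speaker notes describe as flaky with bonded interfaces.

```python
#!/usr/bin/env python3
import ipaddress
import sys
from typing import List

# Hypothetical subnet -> rack table; generate it from your inventory.
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/dc1/rack01",
    ipaddress.ip_network("10.1.2.0/24"): "/dc1/rack02",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Map an IP address to its rack; unknown or non-IP input gets the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_RACK  # hostnames are not resolved in this sketch
    for net, rack in RACKS.items():
        if addr in net:
            return rack
    return DEFAULT_RACK


def main(args: List[str]) -> None:
    # Hadoop expects one rack path per argument, in the same order.
    print(" ".join(rack_for(a) for a in args))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Returning `/default-rack` for unknown hosts keeps them usable while making misregistered nodes easy to spot in the NameNode's topology report.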
MULTITENANCY
• HDFS quotas
• Scheduling (user-facing)
• Map / reduce ratio
• Use YARN!
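HDFS space quotas are charged after replication: each replica counts, so 1 TiB of data at replication 3 consumes 3 TiB of quota. The helper below does that arithmetic and prints the matching `hdfs dfsadmin` commands. The directory, the sizes, the name limit and the 20% headroom factor are all hypothetical.

```python
from typing import List

TIB = 1024 ** 4


def space_quota_bytes(logical_tib: float, replication: int = 3,
                      headroom: float = 1.2) -> int:
    """Raw-byte space quota: logical size x replication x headroom (headroom is an assumption)."""
    if logical_tib <= 0 or replication < 1 or headroom < 1:
        raise ValueError("invalid sizing parameters")
    return int(logical_tib * TIB * replication * headroom)


def quota_commands(path: str, logical_tib: float, max_names: int,
                   replication: int = 3) -> List[str]:
    return [
        # Namespace quota: maximum number of files + directories under path.
        f"hdfs dfsadmin -setQuota {max_names} {path}",
        # Space quota: counted in raw bytes, i.e. after replication.
        f"hdfs dfsadmin -setSpaceQuota {space_quota_bytes(logical_tib, replication)} {path}",
    ]


if __name__ == "__main__":
    for cmd in quota_commands("/user/analytics", 50, 5_000_000):
        print(cmd)
```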
SECURITY
KERBEROS SETUP
• Dedicated KDC / realm
• Dedicated service principals
• Cross-realm trusts
• Delegate user management to your IT
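With a dedicated realm, each daemon gets a service principal of the form `service/fully.qualified.host@REALM`. Users of a corporate (IT) realm can reach Hadoop services through a one-way cross-realm trust. That trust requires a `krbtgt/HADOOP.REALM@CORP.REALM` principal created with the same key in both KDCs. The sketch below only generates these names. The realms, hostnames and the role-to-service mapping are hypothetical.

```python
from typing import Dict, List

# Hypothetical role -> service principal primaries (MRv1-era daemons plus SPNEGO).
ROLE_SERVICES: Dict[str, List[str]] = {
    "namenode": ["hdfs", "HTTP"],
    "datanode": ["hdfs", "HTTP"],
    "jobtracker": ["mapred", "HTTP"],
    "tasktracker": ["mapred", "HTTP"],
    "httpfs": ["httpfs", "HTTP"],
}


def service_principals(hosts: Dict[str, str], realm: str) -> List[str]:
    """Return sorted, de-duplicated principals for host -> role assignments."""
    principals = set()
    for fqdn, role in hosts.items():
        for service in ROLE_SERVICES[role]:
            principals.add(f"{service}/{fqdn.lower()}@{realm}")
    return sorted(principals)


def cross_realm_principal(user_realm: str, service_realm: str) -> str:
    """Principal letting users of user_realm obtain tickets for service_realm."""
    return f"krbtgt/{service_realm}@{user_realm}"


if __name__ == "__main__":
    hosts = {"nn01.hadoop.example.com": "namenode", "dn01.hadoop.example.com": "datanode"}
    for p in service_principals(hosts, "HADOOP.EXAMPLE.COM"):
        print(p)
    print(cross_realm_principal("CORP.EXAMPLE.COM", "HADOOP.EXAMPLE.COM"))
```

Each generated service principal would then be created and exported with MIT `kadmin` (`addprinc -randkey`, then `ktadd -k <keytab>`), for instance from a Chef recipe.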
HTTPFS PROXIES
• Use multiple proxies
• Easy way to interconnect with the outside world
• Data injection / reads with a simple curl
• High-bandwidth transfers
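HttpFS speaks the WebHDFS REST API, on port 14000 by default, so reads and listings are plain HTTP GETs that curl can issue. The sketch below builds those URLs. It spreads requests over several proxies by hashing the path, which is one possible way to obtain the many-to-many pattern described in the speaker notes. The proxy hostnames are hypothetical. Note that `user.name` only applies to pseudo-authentication; a Kerberized HttpFS expects SPNEGO instead (e.g. `curl --negotiate -u :`).

```python
import hashlib
from typing import List
from urllib.parse import quote, urlencode

# Hypothetical HttpFS proxy pool.
PROXIES = [
    "http://httpfs01.example.com:14000",
    "http://httpfs02.example.com:14000",
    "http://httpfs03.example.com:14000",
]


def pick_proxy(path: str, proxies: List[str] = PROXIES) -> str:
    """Spread paths deterministically over proxies (stable hash, unlike Python's salted hash())."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return proxies[int.from_bytes(digest[:4], "big") % len(proxies)]


def webhdfs_url(path: str, op: str, user: str = "",
                proxies: List[str] = PROXIES) -> str:
    """Build a WebHDFS-style URL, e.g. op='OPEN' or op='LISTSTATUS'."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    params = {"op": op}
    if user:  # pseudo-authentication only; omit on Kerberized clusters
        params["user.name"] = user
    return f"{pick_proxy(path, proxies)}/webhdfs/v1{quote(path)}?{urlencode(params)}"


if __name__ == "__main__":
    print(webhdfs_url("/logs/2014/04/03/part-0000.json.gz", "OPEN", user="etl"))
```

The printed URL can be passed straight to `curl "<url>" > part-0000.json.gz`. Hashing by path keeps retries of the same file on the same proxy; round-robin would balance load more evenly at the cost of that stickiness.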
FILE FORMATS
• Multiple use cases (ML, BI analytics)
• Baseline JSON (+ gzip) is OK
• Don’t optimize too early
• We still use it(*) at petabyte scale
(*) some teams also use Parquet and contributed to its Hive integration
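The JSON-plus-gzip baseline is simple to produce and consume with the Python standard library. One caveat applies: gzip files are not splittable, so each file is read by a single mapper, and file sizes should stay in line with the target mapper duration. The sketch writes and reads newline-delimited JSON records; the record fields in the usage example are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator


def write_json_gz(path: str, records: Iterable[dict]) -> int:
    """Write one JSON object per line into a gzip file; return the record count."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record, separators=(",", ":")) + "\n")
            count += 1
    return count


def read_json_gz(path: str) -> Iterator[dict]:
    """Stream records back one at a time, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            if line.strip():
                yield json.loads(line)
```

For example, `write_json_gz("events.json.gz", [{"id": 1, "event": "display"}])` produces a file that `zcat` and Hadoop's gzip codec both read without extra tooling.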
QUESTIONS?
Did I say we’re hiring?
We’re hiring lots of engineers in 2014. Come join us!
http://criteolabs.com/jobs
MY FELLOW CRITEOS WOULD KILL ME…

Hadoop summit-ams-2014-04-03

Speaker notes

  1. http://www.shutterstock.com/pic.mhtml?id=95662684 Who are we? * Serving the right ad… * Slide was imposed by Marketing. You will probably encounter the cloud versus in-house dilemma. The key factor is elasticity: we use our cluster 100% of the time, we already have DCs, and in-house was less expensive.
  2. This is the story of a growing and successful startup using Hadoop. Growing means increased volume. Successful means bucketloads of cash to grow the infrastructure. Startup means very small teams to manage the whole thing. A PoC is easy; when you gain traction, everything will go fast. We went from 12 nodes to 150 two years ago, to 600 today, and will be above 1000 by the end of the year. Why is it growing that fast? A virtuous circle: various teams are gathering skills; BI analysts: the more they get, the more they want; Hadoop shows the benefits of pooling resources, as a platform to consolidate ad-hoc data processing tools. Your business will boom thanks to Hadoop adoption.
  3. Because you need to scale the infrastructure: automate operations (prod vs. devops); tune the Hadoop system (hardware, Linux, Hadoop itself), specifically the network. This is about scaling the infrastructure. With hundreds of clients using Hadoop as a service, you also need to scale infrastructure usage, for instance through multi-tenancy: managing resource contention (MapReduce, storage), maintaining security (user sandboxing through authorization and authentication), and allowing Hadoop to be used as a service.
  4. Don’t do anything by hand; you will hurt yourself managing thousands of servers. Build things once, run forever. The choice of freedom: don’t be bound to a specific vendor; e.g. we use CDH 4.5.0 right now, but could, and probably will, switch to HDP. Full-stack automation: from bare metal to live service.
  5. Our clusters are turnkey-deployed once the hosting and network teams have finished their work. We assign every node a role, and the hosts boot and set themselves up accordingly.
  6. Why a diskless system? You want maximum storage density, therefore you want to fill those 14 slots per server with 3 TB drives, therefore you will break a nice symmetry if you run the OS from disk. Not theoretical: hands-on experience on a 150-node cluster operated for 1.5 years. A hard constraint, but very worthwhile: remote logging is compulsory, nothing is hidden from the automation system, and it costs 2 GB of RAM per node (2% of RAM).
  7. How do we achieve this? Minimize the size of the diskless image. Boot Chef as soon as you can, and let it flow from there. The initial Chef role is an inventory role. Chef is used to manage updates, the OS, and service deployment.
  8. Maintenance: * Evolutive: upgrade your distribution regularly (you don’t want to lag behind). * Corrective: Hadoop works better when you just don’t touch it. * How: everything is tested on a PREPROD environment. Progressive deployment (rolling out node by node) may be disruptive for long-running jobs.
  9. http://blog.cloudera.com/blog/2014/03/a-guide-to-checkpointing-in-hadoop/ Monitor user-facing interfaces: users frequently equate the cluster’s condition with the JT’s or NN’s GUI. Monitor your JobTracker (MRv1 will eventually get stuck). MOST IMPORTANT OF ALL: CHECKPOINTING. Monitor the checkpoints of your fsimage, or you will end up with a NameNode in really bad shape. At one point we had nearly 6 months of edits ;) and it took 12 hours to start a NN, hence the urban legend of the NN being unsafe to restart. Monitor HDFS disk usage and local disk usage.
  10. 10:00 http://www.flickr.com/photos/76588645
  11. In a real-world system most of your tasks will be I/O bound. Read-ahead! Very important. When you hit a performance bottleneck, the first thing to look at is *outside* Hadoop, because Hadoop is a DoS on your whole infrastructure. Use infrastructure-local caches as much as you can.
  12. Default parameters are usable for small clusters / small nodes. These are examples; we had to tune a significant part of them. Detailed list of the significant ones, with explanations.
  13. Default parameters are usable for small clusters / small nodes. These are examples; we had to tune a significant part of them. Log settings: they will kill your JT / NN. Handler counts: separate the thread pool for internal / external clients, which is also easier for firewalling. HA has some downsides (checkpointing).
  14. One of the first things you will get when you grow your Hadoop cluster past one rack is your network engineers yelling at you. Plan your network topology ahead!
  15. A fat tree is suited to north/south traffic. Hadoop uses the network as a bus, which means east/west traffic. Layer 2: FabricPath/TRILL. Layer 3: BGP.
  16. Sounds obvious, but implementing a correct definition of the rack topology with respect to the network is very important. LLDP information was flaky depending on which interface we asked (4-interface bonding).
  17. 20:00 Hadoop is a shared resource. When your usage grows you will face resource starvation and contention. This leads to two problems: 1) Accountability: you will need to report resource accounting numbers to plan for growth and optimize. 2) Maintaining a good user experience. HDFS quotas help, but they have bugs in fsimage checkpointing. Scheduling is a user-facing problem; it requires education to understand the time/space folding and to achieve well-designed jobs (mapper ~3 to 15 minutes). YARN solves the map/reduce ratio (+20% computing power).
  18. Once you realize your company’s most critical data has landed on Hadoop, and your only security model is obscurity, you will want to switch to a built-in, robust security model.
  19. Very good documentation from Cloudera and Hortonworks. Not sufficient, though: ironing out the problems (SPNEGO) needs close integration with IT.
  20. Hadoop limitation: POSIX-level access to HDFS. HttpFS works around the absence of a scalable (and works with Kerberos, too!). In our systems, HDFS completely replaced Isilon, streaming a sustained 200-400 MB/s of logs into the cluster. Don’t create bottlenecks; address connectivity with a many-to-many pattern.
  21. JSON + GZIP is good enough for most uses.
  22. http://www.flickr.com/photos/jarbo/9379813470