Hadoop for System Administrators – Ohio Linux Fest 2014
Justin Miller 
Senior Systems Engineer/DevOps at iHealth Technologies 
Weston Bassler 
Systems Engineer at Verizon Wireless
Hadoop for System Administrators – Ohio Linux Fest 2014 
What we will be covering: 
• Intro
Why Hadoop?
How Hadoop Works
• Architecture
Planning Hardware/Storage/Network
Processing and Storage
HDFS Components
YARN Components
• Operations
Job scheduling
Job alerts
• Monitoring
Core Services
Job scheduler and SLA
Hardware
• High Availability
YARN
HDFS
Oozie
• Security
Security Issues
Authentication
Authorization
Encryption
• Backup and Recovery
What to plan for?
How to combat
• Hadoop Vendors/Distros
Cloudera
HortonWorks
MapR
Why Hadoop?
Why Hadoop? Cont... 
Sort through TB, even PB worth of data in a matter of minutes 
Easily sift through LOGS (patterns, data mining) → switch logs, application logs
Batch Processing 
History → Inspired by two Google papers, on MapReduce and the Google File System (GFS)
Grew out of the Nutch project; developed and run at scale by Yahoo!
Who's using it?
How Hadoop? 
Processing 
• MapReduce (MRv1) 
What is MapReduce? 
Nobody likes it 
• YARN (MRv2) 
Yet Another Resource Negotiator 
Newer, better, more versatile
2 new roles → ResourceManager and ApplicationMaster
Spark → New Hotness 
• Bringing Processing and Storage together 
Data locality → avoid network! 
“MO NODES MO BETTA”
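For readers asking "What is MapReduce?", here is a minimal single-process sketch of the model, assuming the classic word-count example. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative and are not Hadoop APIs; on a real cluster the framework runs the map and reduce steps on many nodes and handles the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical log lines, just to exercise the sketch.
    logs = ["ERROR disk full", "INFO job done", "ERROR disk failed"]
    print(word_count(logs))
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across nodes, which is where the "more nodes" scaling comes from.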
YARN in Action
Storage 
• HDFS 
What is HDFS? 
Why HDFS? 
• Components of HDFS 
NameNode 
Metadata → fsimage + edits (edit log)
ZooKeeper → HA management 
Quorum based journaling 
3 JournalNodes 
Active/Passive NameNode 
DataNodes – what do they do? 
Blocks in relation to NameNode Metadata 
Block storage
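The relationship between blocks and NameNode metadata can be sketched as a toy model. This assumes a 128 MiB block size (the usual Hadoop 2 default) and a replication factor of 3. The round-robin placement and the `path#blk_N` identifiers are made up for illustration; real HDFS placement is rack-aware and block IDs look different.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MiB)
REPLICATION = 3                 # assumed replication factor


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file of file_size bytes is split into."""
    return (file_size + block_size - 1) // block_size


def place_blocks(path, file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return toy NameNode metadata: block id -> DataNodes holding a replica."""
    if len(datanodes) < replication:
        raise ValueError("need at least as many DataNodes as replicas")
    metadata = {}
    for i in range(block_count(file_size, block_size)):
        # Round-robin placement, for illustration only.
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        metadata[f"{path}#blk_{i}"] = replicas
    return metadata


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for block, replicas in place_blocks("/logs/app.log", 300 * 1024 * 1024, nodes).items():
        print(block, replicas)
```

The point of the sketch: the NameNode only keeps the mapping, while the DataNodes hold the actual bytes.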
HDFS Write Path
Benefits and Limitations of HDFS 
Benefits 
Low cost per byte → commodity storage 
High Bandwidth/Scales effectively → “Mo nodes Mo speed” 
Rock solid data reliability 
Supports distributed computing I/O patterns 
OPEN SOURCE!!!!!
Benefits and Limitations of HDFS (Continued...) 
Limitations 
Updates → data is immutable (can't be updated, only appended)
Write Once 
Optimized for sequential reads → not for real-time data processing 
Challenging import/export → requires additional tooling
Architecture
• Planning your Hardware/Storage 
Cheap disks 
Distributed disk approach → replication factor of 3 for HA 
NO LVM, NO RAID, NO swap
Mount data disks with noatime, nodiratime
• Network considerations 
Rack awareness affects data distribution 
Prefer a faster network when available → 10 GbE if possible
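When planning disks, it helps to see what a replication factor of 3 does to capacity. The sketch below is simple arithmetic under stated assumptions: the `reserved_fraction` parameter (space held back for temporary and intermediate data) and the example cluster size are hypothetical, not figures from the talk.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3, reserved_fraction=0.25):
    """Estimate usable HDFS capacity in TB for a cluster of identical nodes.

    raw capacity      = nodes * disks_per_node * disk_tb
    after reservation = raw * (1 - reserved_fraction)
    usable            = after reservation / replication
    """
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb * (1 - reserved_fraction) / replication


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 12 disks each, 4 TB per disk.
    print(usable_capacity_tb(nodes=10, disks_per_node=12, disk_tb=4))
```

Under these assumptions, 480 TB of raw disk yields roughly 120 TB of usable space, which is why cheap disks matter more than RAID.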
Hadoop Operations 
• Jobs 
What is a job? 
Scheduling jobs with Oozie 
Alerts on Jobs 
Oozie SLAs → Start time, end time & duration 
File driven Job Configuration
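The three SLA dimensions named above (start time, end time, duration) can be sketched as a small check. This is not Oozie's API or configuration format: the function name, parameters, and the returned labels are hypothetical and only illustrate the comparison an SLA monitor has to make.

```python
from datetime import datetime, timedelta


def sla_misses(nominal, actual_start, actual_end, should_start, should_end, max_duration):
    """Return which SLA dimensions a job run missed.

    nominal      -- scheduled time of the run (datetime)
    should_start -- allowed delay before the job starts (timedelta)
    should_end   -- allowed delay before the job ends (timedelta)
    max_duration -- longest acceptable run time (timedelta)
    """
    misses = []
    if actual_start > nominal + should_start:
        misses.append("start")
    if actual_end > nominal + should_end:
        misses.append("end")
    if actual_end - actual_start > max_duration:
        misses.append("duration")
    return misses


if __name__ == "__main__":
    nominal = datetime(2014, 10, 25, 2, 0)  # hypothetical nightly run at 02:00
    print(sla_misses(
        nominal,
        actual_start=nominal + timedelta(minutes=20),
        actual_end=nominal + timedelta(minutes=50),
        should_start=timedelta(minutes=10),
        should_end=timedelta(minutes=60),
        max_duration=timedelta(minutes=45),
    ))
```

A run can miss its start window and still finish on time, so the three checks are kept independent.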
Example of a Job: 
Example of a coordinator:
Troubleshooting 
• Application → Debug Code
• Job → Debug Execution
• Service → Debug Linux Process (/var/log/hadoop-*) 
Services won't start → port conflicts (nmap, netstat, lsof)
# Neither the application nor the job is at fault? Search the service logs:
grep -r "ERROR" /var/log/hadoop-*
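A port conflict can also be checked from a script before starting a service. The sketch below assumes a plain TCP connect test against localhost; `port_in_use` is a hypothetical helper, and it only tells you that something is listening, not which process (that is what netstat or lsof are for).

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

A typical use would be to loop over the ports your services are configured to use, for example 8020 (NameNode RPC), 50070 (NameNode web UI) and 8088 (ResourceManager web UI), which are common Hadoop 2 defaults. Check your own configuration rather than trusting that list.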
Monitoring 
• Core Services 
HDFS 
YARN 
JMX → JVM Monitoring 
Cloudera Manager 
• Performance 
Ganglia (HortonWorks) 
Cloudera Manager 
• Hardware → to each his own (traditional monitoring) 
SNMP 
Nagios 
Zenoss 
Cloudera Manager
High Availability 
• HDFS 
ZooKeeper → quorum based journaling 
• YARN 
ZooKeeper
• Oozie HA
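Quorum-based components want an odd number of nodes (such as 3 JournalNodes) because a write only counts once a majority acknowledges it. The helper names in this sketch are illustrative only.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many of n nodes can be down while a majority is still reachable."""
    return n - majority(n)


def can_commit(acks, n):
    """True if acks acknowledgements out of n nodes are enough to commit a write."""
    return acks >= majority(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes -> majority", majority(n), "-> tolerates", tolerated_failures(n))
```

Note that 4 nodes tolerate no more failures than 3, which is why the usual advice is to grow from 3 to 5 rather than to 4.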
Security (Because people are evil)
Security Continued.... 
• Known issues – Stupid/Lazy People 
Hadoop can be very secure 
• Authentication - Kerberos 
Principal (user) 
Realm (group of principals) 
Keytab file 
• Authorization 
LDAP 
Active Directory 
Role based 
• Encryption – For your eyes Only! 
Kerberos 1st 
SSL Certificates 
**** SSL must be enabled for all core Hadoop services
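To make the principal and realm terms concrete: a Kerberos principal is usually written `primary/instance@REALM`, with the instance part optional. The sketch below splits such a string. The `Principal` type, the `parse_principal` helper, and the host and realm names are hypothetical examples, not part of any Kerberos library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hdfs"
    instance: Optional[str]  # often the host for service principals
    realm: str               # Kerberos realm, conventionally upper case


def parse_principal(text):
    """Split 'primary/instance@REALM' (instance optional) into its parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a principal with a realm: {text!r}")
    primary, _, instance = name.partition("/")
    if not primary:
        raise ValueError(f"missing primary in principal: {text!r}")
    return Principal(primary, instance or None, realm)


if __name__ == "__main__":
    print(parse_principal("hdfs/nn01.example.com@EXAMPLE.COM"))
    print(parse_principal("alice@EXAMPLE.COM"))
```

Service principals for Hadoop daemons typically carry the host as the instance, which is why each node ends up with its own keytab entries.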
Backup and Recovery – When things go wrong (And they will) 
What can go wrong? What to plan for? 
Data Corruption 
Node crashes 
Disk crashes 
Ways to combat when things do go wrong 
• Data Corruption 
Checksum verification fails → NameNode replaces the corrupt replica with a fresh copy from a healthy one
HDFS → hdfs fsck tool 
• Node crashes/Disk crashes 
HDFS saves the day! 
NameNode HA 
Replicas live on different hosts (with rack awareness, on at least two racks)
Heartbeat detection
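Heartbeat detection boils down to "who has not reported in for too long". The sketch below assumes the commonly cited HDFS defaults: a 3 second heartbeat and a 5 minute recheck interval, which give a dead-node timeout of 2 * 300 + 10 * 3 = 630 seconds. The function and the node names are illustrative; verify the intervals against your own hdfs-site.xml.

```python
HEARTBEAT_INTERVAL_S = 3   # assumed DataNode heartbeat interval, in seconds
RECHECK_INTERVAL_S = 300   # assumed NameNode recheck interval, in seconds
DEAD_AFTER_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S  # 630 seconds


def dead_nodes(last_heartbeat, now, dead_after=DEAD_AFTER_S):
    """Return the nodes whose last heartbeat is older than dead_after seconds.

    last_heartbeat maps node name -> timestamp (seconds) of its last heartbeat.
    """
    return sorted(node for node, seen in last_heartbeat.items() if now - seen > dead_after)


if __name__ == "__main__":
    # Hypothetical timestamps: dn2 last reported 700 seconds ago.
    print(dead_nodes({"dn1": 1000, "dn2": 300}, now=1000))
```

Once a node is declared dead, its blocks are under-replicated and get copied elsewhere from the surviving replicas.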
Hadoop Wars - Vendors and Distributions 
• Cloudera 
Specializes in Enterprise tools 
Auditing 
Access Control 
Cluster Management (Cloudera Manager) 
• HortonWorks 
Specializes in Engineering 
Also Open Source 
Top new cool things 
• MapR 
Lead developers behind Mahout
Hopefully you enjoyed! 
If interested: 
Quick Ways to get started Learning Hadoop 
• Free Stuff – Who doesn't like free? 
Big Data University – Hadoop fundamentals, Pig, Oozie, lots more 
Udacity – Intro to Hadoop and MapReduce
MapR, Cloudera, HortonWorks – Training Videos

Hadoop for sys_admin

  • 1. for System Administrators – Hadoop for System Administrators O –h iOo hLiion uLxi nFuexs tF 2e0s1t 42014 Justin Miller Senior Systems Engineer/DevOps at iHealth Technologies Weston Bassler Systems Engineer at Verizon Wireless
  • 2. Hadoop for System Administrators – Ohio Linux Fest 2014 What we will be covering: Intro Why Hadoop? How Hadoop Works Architecture Planning Hardware/Storage/Network Processing and Storage HDFS Components YARN Components Operations Job scheduling Jobs alerts Monitoring Core Services Job scheduler and SLA Hardware High Availability YARN HDFS Oozie Security Security Issues Authentication Authorization Encrption Backup and Recovery What to plan for? How to combat Hadoop Vendors/Distros Cloudera HortonWorks MapR
  • 3. Hadoop for System Administrators – Ohio Linux Fest 2014 Why Hadoop?
  • 4. Hadoop for System Administrators – Ohio Linux Fest 2014 Why Hadoop? Cont... Sort through TB, even PB worth of data in a matter of minutes Easily sift through LOGS (patterns, data mining) → switch logs, application logs Batch Processing History → Inspired by 2 Google Papers on MapReduce and GoogleFS Implemented By Yahoo!
  • 5. Who's using it?
  • 6. How Hadoop? Processing • MapReduce (MRv1): What is MapReduce? Nobody likes it. • YARN (MRv2): Yet Another Resource Negotiator; newer, better, more versatile; 2 new roles → ResourceManager and ApplicationMaster; Spark → new hotness. • Bringing processing and storage together: data locality → avoid the network! “MO NODES MO BETTA”
  • 7. YARN in Action
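
The slides ask "What is MapReduce?" without answering it. As a rough illustration only, the sketch below runs the three phases (map, shuffle, reduce) in a single Python process on a few made-up log lines. The function names are hypothetical and nothing here uses the Hadoop API; on a real cluster the map and reduce steps would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["ERROR disk full", "ERROR disk failed"])` would count `error` and `disk` twice each, which is the same shape of job as sifting switch or application logs for patterns.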
  • 8. Storage • HDFS: What is HDFS? Why HDFS? • Components of HDFS: NameNode (metadata → fsimage + edit log); ZooKeeper → HA management; quorum-based journaling with 3 JournalNodes; Active/Passive NameNode; DataNodes: what do they do? Blocks in relation to NameNode metadata; block storage
  • 9. HDFS Write Path
  • 10. Benefits and Limitations of HDFS. Benefits: low cost per byte → commodity storage; high bandwidth, scales effectively → “Mo nodes Mo speed”; rock-solid data reliability; supports distributed computing I/O patterns; OPEN SOURCE!!!!!
  • 11. Benefits and Limitations of HDFS (continued). Limitations: updates → data is immutable (files can't be updated in place, only appended to); write once; optimized for sequential reads → not for real-time data processing; challenging import/export → requires additional tooling
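
To make "blocks in relation to NameNode metadata" more concrete, here is a toy model of how a file might be cut into blocks and how a block-to-DataNode map could look. It assumes a 128 MB block size and a replication factor of 3, and it places replicas round-robin, which is a deliberate simplification: real HDFS placement is rack-aware. All names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
REPLICATION = 3                 # replication factor quoted in the slides


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Toy placement: map each block index to `replication` distinct DataNodes.

    Round-robin only; this is NOT the rack-aware policy HDFS really uses.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }
```

The dictionary returned by `place_blocks` stands in for the kind of mapping the NameNode keeps in memory, while the DataNodes hold the actual block contents.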
  • 12. Architecture • Planning your hardware/storage: cheap disks; distributed disk approach → replication factor of 3 for HA; NO LVM, NO RAID and NO swap; mount with noatime, nodiratime. • Network considerations: rack awareness affects data distribution; prefer a faster network when available → 10 Gb Ethernet if possible
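
A replication factor of 3 has a direct effect on hardware planning: every byte is stored three times. The back-of-the-envelope helper below estimates usable capacity from raw disk. The `reserved_fraction` parameter (headroom for temporary and non-HDFS data) and its 25% default are assumptions made for this sketch, not figures from the talk.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb,
                       replication=3, reserved_fraction=0.25):
    """Estimate usable HDFS capacity in TB.

    raw capacity = nodes * disks_per_node * disk_tb
    usable       = raw * (1 - reserved_fraction) / replication

    reserved_fraction is an assumed planning margin, not an HDFS setting.
    """
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb * (1 - reserved_fraction) / replication
```

For example, 10 nodes with 12 disks of 4 TB each give 480 TB raw, but only about 120 TB of usable space under these assumptions.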
  • 13. Hadoop Operations • Jobs: What is a job? Scheduling jobs with Oozie; alerts on jobs; Oozie SLAs → start time, end time & duration; file-driven job configuration
  • 14. Example of a Job: Example of a coordinator:
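
The Oozie SLA idea from slide 13 (start time, end time and duration) can be illustrated without Oozie itself. The sketch below is a plain Python check and does not call any Oozie API; the function name is hypothetical, and the returned labels are only meant to mirror the three SLA dimensions named on the slide.

```python
from datetime import datetime, timedelta


def sla_misses(expected_start, expected_end, max_duration,
               actual_start, actual_end):
    """Return which of the three SLA dimensions a finished job missed."""
    misses = []
    if actual_start > expected_start:
        misses.append("START_MISS")      # job started later than allowed
    if actual_end > expected_end:
        misses.append("END_MISS")        # job finished later than allowed
    if actual_end - actual_start > max_duration:
        misses.append("DURATION_MISS")   # job ran longer than allowed
    return misses
```

A job that starts and ends inside its window can still miss the duration SLA, which is why the three checks are kept independent.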
  • 15. Troubleshooting • Application → Debug Code
  • 16. • Job → Debug Execution
  • 17. • Service → Debug Linux process (/var/log/hadoop-*). Services won't start → port conflicts (nmap, netstat, lsof). If the problem is neither the application nor the job: grep -r ERROR /var/log/hadoop-*
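
The service-level advice above (grep the logs for ERROR, then look for port conflicts) could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the log location and port are whatever you pass in, since both vary by distribution and configuration.

```python
import glob
import socket


def find_errors(pattern, needle="ERROR"):
    """Return (path, line_number, line) for every line containing needle
    in the files matching the glob pattern."""
    hits = []
    for path in sorted(glob.glob(pattern, recursive=True)):
        try:
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number, line.rstrip("\n")))
        except OSError:
            # Skip directories and files we are not allowed to read.
            continue
    return hits


def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0
```

A typical call would be `find_errors("/var/log/hadoop-*/**/*.log")`, followed by `port_in_use(...)` for the port the failing service is configured to bind.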
  • 18. Monitoring • Core services: HDFS; YARN; JMX → JVM monitoring; Cloudera Manager. • Performance: Ganglia (HortonWorks); Cloudera Manager. • Hardware → to each his own (traditional monitoring): SNMP; Nagios; Zenoss; Cloudera Manager
  • 19. High Availability • HDFS: ZooKeeper → quorum-based journaling. • YARN: ZooKeeper
  • 20. • Oozie HA
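
Quorum-based journaling and ZooKeeper both rely on a majority of members agreeing, which is why the deck mentions 3 JournalNodes. The arithmetic is small enough to sketch; the function names below are hypothetical.

```python
def quorum_size(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)
```

With 3 members the quorum is 2, so one failure is tolerated. A fourth member does not help (still one tolerated failure), which is why odd-sized ensembles are the usual choice.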
  • 21. Security (Because people are evil)
  • 22. Security (continued) • Known issues: stupid/lazy people; Hadoop can be very secure. • Authentication with Kerberos: principal (user); realm (group of principals); keytab file. • Authorization: LDAP; Active Directory; role based. • Encryption (for your eyes only!): Kerberos first; SSL certificates. **** SSL must be enabled for all core Hadoop services
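
The principal and realm vocabulary is easier to remember with an example. Service principals commonly take the form `primary/instance@REALM`, and user principals the form `primary@REALM`. The parser below is a simplified sketch: it ignores escaped characters and default realms, and the class and function names are hypothetical rather than part of any Kerberos library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hdfs"
    instance: Optional[str]  # usually the host for service principals
    realm: str               # the Kerberos realm, e.g. "EXAMPLE.COM"


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its parts (simplified)."""
    name, separator, realm = text.rpartition("@")
    if not separator or not name or not realm:
        raise ValueError(f"not a full principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)
```

For instance, `parse_principal("hdfs/nn1.example.com@EXAMPLE.COM")` yields the primary `hdfs`, the instance `nn1.example.com` and the realm `EXAMPLE.COM`; the host and realm here are made-up placeholders.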
  • 23. Backup and Recovery – when things go wrong (and they will). What can go wrong? What to plan for? Data corruption; node crashes; disk crashes. Ways to combat when things do go wrong: • Data corruption: checksums of metadata fail → NameNode replaces with a fresh copy; HDFS → hdfs fsck tool. • Node crashes/disk crashes: HDFS saves the day! NameNode HA; first 2 replicas of data on different hosts; heartbeat detection
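
The checksum-based recovery idea can be sketched in a few lines: compare each replica against the expected checksum, flag the ones that differ, and read from a healthy copy instead. This toy version uses `zlib.crc32` over a whole block as a stand-in, which is an assumption for illustration and not the exact checksum scheme HDFS uses; the function names and the replica dictionary are hypothetical.

```python
import zlib


def checksum(data):
    """Stand-in checksum for a block's contents (CRC-32 via zlib)."""
    return zlib.crc32(data)


def corrupt_replicas(replicas, expected):
    """Return the DataNodes whose copy does not match the expected checksum.

    replicas maps a DataNode name to the bytes it holds for one block.
    """
    return [node for node, data in replicas.items() if checksum(data) != expected]


def first_healthy_replica(replicas, expected):
    """Return the first DataNode holding an intact copy, or None."""
    for node, data in replicas.items():
        if checksum(data) == expected:
            return node
    return None
```

With three replicas on different hosts, one corrupted copy still leaves two intact ones to re-replicate from, which is the sense in which "HDFS saves the day".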
  • 24. Hadoop Wars – Vendors and Distributions • Cloudera: specializes in enterprise tools; auditing; access control; cluster management (Cloudera Manager). • HortonWorks: specializes in engineering; also open source; top new cool things. • MapR: lead developers behind Mahout
  • 25. Hopefully you enjoyed! If interested, quick ways to get started learning Hadoop: • Free stuff – who doesn't like free? Big Data University – Hadoop fundamentals, Pig, Oozie, lots more; Udacity – Intro to Hadoop and MapReduce; MapR, Cloudera, HortonWorks – training videos