SlideShare a Scribd company logo
1 of 28
www.edureka.in/hadoop-admin
www.edureka.in/hadoop-admin
How It Works…
 LIVE classes
 Class recordings
 Module wise Quizzes and Practical Assignments
 24x7 on-demand technical support
 Deployment of different clusters
 Online certification exam
 Lifetime access to the Learning Management System
www.edureka.in/hadoop-admin
Course Topics
 Week 1
– Understanding Big Data
– Hadoop Components
– Introduction to Hadoop 2.0
 Week 2
– Hadoop 2.0
– Hadoop Configuration
– Hadoop Cluster Architecture
 Week 3
– Different Hadoop Server Roles
– Data processing flow
– Cluster Network Configuration
 Week 4
– Job Scheduling
– Fair Scheduler
– Monitoring a Hadoop Cluster
 Week 5
– Securing your Hadoop Cluster
– Kerberos and HDFS Federation
– Backup and Recovery
 Week 6
– Oozie and Hive Administration
– HBase Architecture
– HBase Administration
www.edureka.in/hadoop-admin
Topics for Today
 What is Big Data?
 Limitations of the existing solutions
 Solving the problem with Hadoop
 Introduction to Hadoop
 Hadoop Eco-System
 Hadoop Core Components
 MapReduce software framework
 Hadoop Cluster Administrator: Roles and Responsibilities
 Introduction to Hadoop 2.0
www.edureka.in/hadoop-admin
What Is Big Data?
 Lots of Data (Terabytes or Petabytes).
 Systems / Enterprises generate huge amount of data from Terabytes to and even Petabytes of
information.
A airline jet collects 10 terabytes of sensor data
for every 30 minutes of flying time.
NYSE generates about one terabyte of new trade data
per day to Perform stock trading analytics to determine
trends for optimal trades.
www.edureka.in/hadoop-admin
IBM’s Definition
 IBM’s definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
Volume Velocity Variety
Characteristics Of Big Data
12 Terabytes of
tweets created
each day
Scrutinizes 5 million
trade events created
each day to identify
potential fraud
Sensor data, audio,
video, click streams,
log files and more
www.edureka.in/hadoop-admin
 Estimated Global Data Volume:
 2011: 1.8 ZB
 2015: 7.9 ZB
 The world's information doubles every two years
 Over the next 10 years:
 The number of servers worldwide will grow by 10x
 Amount of information managed by enterprise data
centers will grow by 50x
 Number of “files” enterprise data center handle will
grow by 75x
Source: http://www.emc.com/leadership/programs/digital-universe.htm,
which was based on the 2011 IDC Digital Universe Study
Data Volume Is Growing Exponentially
www.edureka.in/hadoop-admin
What Big Companies Have To Say…
“Analyzing Big Data sets will become a key basis for competition.”
“Leaders in every sector will have to grapple the implications of Big Data.”
McKinsey
Gartner
Forrester
Research
“Big Data analytics are rapidly emerging as the preferred solution to business and
technology trends that are disrupting.”
“Enterprises should not delay implementation of Big Data Analytics.”
“Use Hadoop to gain a competitive advantage over more risk-averse enterprises.”
“Prioritize Big Data projects that might benefit from Hadoop.”
www.edureka.in/hadoop-admin
Some Of the Hadoop Users
www.edureka.in/hadoop-admin
Hadoop Users – In Detail
http://wiki.apache.org/hadoop/PoweredBy
www.edureka.in/hadoop-admin
 Apache Hadoop is a framework that allows for the distributed processing of large data sets
across clusters of commodity computers using a simple programming model.
 It is an Open-source Data Management with scale-out storage & distributed processing.
What Is Hadoop?
www.edureka.in/hadoop-admin
Hadoop Key Characteristics
Reliable
Economical
Scalable
Flexible
Hadoop
Features
www.edureka.in/hadoop-admin
Hadoop History
Doug Cutting & Mike Cafarella
started working on Nutch
NY Times converts 4TB of
Image archives over 100 EC2s
Fastest sort of a TB,
62secs over 1,460 nodes
Sorted a PB in 16.25hours
Over 3.658 nodes
Fastest sort of a TB, 3.5mins
over 910 nodes
Doug Cutting adds DFS &
MapReduce support to Nutch
Google publishes GFS &
MapReduce papers Yahoo! hires Cutting,
Hadoop spins out of Nutch
Facebook launches Hive:
SQL Support for Hadoop
Doug Cutting
Joins Cloudera
Hadoop Summit 2009,
750 attendees
Founded
2002 2003 2004 2005 2006 2007 2008 2009
www.edureka.in/hadoop-admin
Hadoop 1.x Eco-System
Apache Oozie (Workflow)
HDFS (Hadoop Distributed File System)
Pig Latin
Data Analysis
Mahout
Machine Learning
Hive
DW System
MapReduce Framework
HBase
Flume Sqoop
Import Or Export
Unstructured or
Semi-Structured data Structured Data
www.edureka.in/hadoop-admin
Hadoop is a system for large scale data processing.
It has two main components:
 HDFS – Hadoop Distributed File System (Storage)
 Distributed across “nodes”
 Natively redundant
 NameNode tracks locations.
 MapReduce (Processing)
 Splits a task across processors
 “near” the data & assembles results
 Self-Healing, High Bandwidth
 Clustered storage
 JobTracker manages the TaskTrackers
Hadoop 1.x Core Components
 Additional Administration
Tools:
 Filesystem utilities
 Job scheduling and monitoring
 Web UI
www.edureka.in/hadoop-admin
Data Node
Task
Tracker
Data Node
Task
Tracker
Data Node
Task
Tracker
Data Node
Task
Tracker
Hadoop 1.x Core Components (Contd.)
MapReduce
Engine
HDFS
Cluster
Job Tracker
Admin Node
Name node
www.edureka.in/hadoop-admin
 NameNode:
 master of the system
 maintains and manages the blocks which are present on the
DataNodes
 DataNodes:
 slaves which are deployed on each machine and provide the
actual storage
 responsible for serving read and write requests for the clients
Name Node and Data Nodes
www.edureka.in/hadoop-admin
 Secondary NameNode:
 Not a hot standby for the NameNode
 Connects to NameNode every hour*
 Housekeeping, backup of NemeNode metadata
 Saved metadata can build a failed NameNode
You give me
metadata every
hour, I will make
it secure
Single Point
Failure
Secondary
NameNode
NameNode
Secondary Name Node
metadata
metadata
www.edureka.in/hadoop-admin
What Is MapReduce?
 MapReduce is a programming model
 It is neither platform- nor language-specific
 Record-oriented data processing (key and value)
 Task distributed across multiple nodes
 Where possible, each node processes data
stored on that node
 Consists of two phases
 Map
 Reduce
ValueKey
MapReduce
www.edureka.in/hadoop-admin
What Is MapReduce? (Contd.)
Process can be considered as being similar to a Unix pipeline
cat /my/log | grep '.html' | sort | uniq –c > /my/outfile
MAP SORT REDUCE
www.edureka.in/hadoop-admin
Client
HDFS Map Reduce
Hadoop 1.x – In Summary
Secondary
Name Node
Data
Blocks
Data Node
Name Node Job Tracker
Task Tracker
Map Reduce
Data Node Task Tracker
Map Reduce
….
www.edureka.in/hadoop-admin
Poll Questions
www.edureka.in/hadoop-admin
Hadoop Cluster Administrator
 Deploying the cluster
 Performance and availability of the cluster
 Job scheduling and Management
 Upgrades
 Backup and Recovery
 Monitoring the cluster
 Troubleshooting
Roles and Responsibilities
www.edureka.in/hadoop-admin
Hadoop 1.0 Vs. Hadoop 2.0
Property Hadoop 1.x Hadoop 2.x
NameNodes 1 Many
High Availability Not present Highly Available
Processing Control JobTracker, Task Tracker Resource Manager, Node
Manager, App Master
www.edureka.in/hadoop-admin
MRv1 Vs. MRv2
Data Node
HDFS
(Data Storage)
MapReduce
(data processing)
MapReduce
(Data Processing)
Others
(data Processing)
Hadoop 1.0 Hadoop 2.0
Scheduler
Applications
Manager (AsM)
Job Tracker
YARN
(Cluster Resource Management)
HDFS
(Data Storage)
 Provides a Cluster Level Resource Manager
 Application Level Resource Management (Node
Manager??)
 Provides slots for Jobs other than Map and Reduce
 Problems with Resource utilization
 Slots only for Map and Reduce
www.edureka.in/hadoop-admin
Client
HDFS
YARN
Resource Manager
Hadoop 2.0 - Architecture
Shared
edit logs
All name space edits
logged to shared NFS
storage; single writer
(fencing)
Read edit logs and applies
to its own namespace
Secondary
Name Node
Data Node Data Node
Data Node Data Node
Node Manager
Container
App
Master
Node Manager
Container
App
Master
Standby
NameNode
Node Manager
Container
App
Master
Node Manager
Container
App
Master
Active
NameNode
www.edureka.in/hadoop-admin
 Attempt the following Assignments using the documents present in the LMS:
 Single Node Apache Hadoop 1.0 Installation on Ubuntu
 Execute Linux Basic Commands
 Execute HDFS Hands On
 Cloudera CDH3 and CDH4 Quick VM installation on your local machine
Assignments
Thank You
See You in Class Next Week

More Related Content

What's hot

SAP HANA Cloud Data Lake.pdf
SAP HANA Cloud Data Lake.pdfSAP HANA Cloud Data Lake.pdf
SAP HANA Cloud Data Lake.pdfAnkitSoni884458
 
Adopting a Process-Driven Approach to Master Data Management
Adopting a Process-Driven Approach to Master Data ManagementAdopting a Process-Driven Approach to Master Data Management
Adopting a Process-Driven Approach to Master Data ManagementSoftware AG
 
Forget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart DataForget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart DataAlan McSweeney
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothAdaryl "Bob" Wakefield, MBA
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance StrategyAnalytics8
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
DELL Technologies - The Complete Portfolio in 25 Minutes
DELL Technologies - The Complete Portfolio in 25 MinutesDELL Technologies - The Complete Portfolio in 25 Minutes
DELL Technologies - The Complete Portfolio in 25 MinutesDell Technologies
 
Data center cooling infrastructure slide
Data center cooling infrastructure slideData center cooling infrastructure slide
Data center cooling infrastructure slideLivin Jose
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceDATAVERSITY
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Master Data Management's Place in the Data Governance Landscape
Master Data Management's Place in the Data Governance Landscape Master Data Management's Place in the Data Governance Landscape
Master Data Management's Place in the Data Governance Landscape CCG
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionDataWorks Summit
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...
Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...
Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...DATAVERSITY
 
Reference master data management
Reference master data managementReference master data management
Reference master data managementDr. Hamdan Al-Sabri
 
Enterprise Data Management Framework Overview
Enterprise Data Management Framework OverviewEnterprise Data Management Framework Overview
Enterprise Data Management Framework OverviewJohn Bao Vuu
 
Gartner: Seven Building Blocks of Master Data Management
Gartner: Seven Building Blocks of Master Data ManagementGartner: Seven Building Blocks of Master Data Management
Gartner: Seven Building Blocks of Master Data ManagementGartner
 

What's hot (20)

SAP HANA Cloud Data Lake.pdf
SAP HANA Cloud Data Lake.pdfSAP HANA Cloud Data Lake.pdf
SAP HANA Cloud Data Lake.pdf
 
Adopting a Process-Driven Approach to Master Data Management
Adopting a Process-Driven Approach to Master Data ManagementAdopting a Process-Driven Approach to Master Data Management
Adopting a Process-Driven Approach to Master Data Management
 
Forget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart DataForget Big Data. It's All About Smart Data
Forget Big Data. It's All About Smart Data
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
DELL Technologies - The Complete Portfolio in 25 Minutes
DELL Technologies - The Complete Portfolio in 25 MinutesDELL Technologies - The Complete Portfolio in 25 Minutes
DELL Technologies - The Complete Portfolio in 25 Minutes
 
Data center cooling infrastructure slide
Data center cooling infrastructure slideData center cooling infrastructure slide
Data center cooling infrastructure slide
 
Master Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and GovernanceMaster Data Management – Aligning Data, Process, and Governance
Master Data Management – Aligning Data, Process, and Governance
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Master Data Management's Place in the Data Governance Landscape
Master Data Management's Place in the Data Governance Landscape Master Data Management's Place in the Data Governance Landscape
Master Data Management's Place in the Data Governance Landscape
 
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 MillionHow One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...
Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...
Enterprise Data World Webinars: Master Data Management: Ensuring Value is Del...
 
Reference master data management
Reference master data managementReference master data management
Reference master data management
 
Enterprise Data Management Framework Overview
Enterprise Data Management Framework OverviewEnterprise Data Management Framework Overview
Enterprise Data Management Framework Overview
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Gartner: Seven Building Blocks of Master Data Management
Gartner: Seven Building Blocks of Master Data ManagementGartner: Seven Building Blocks of Master Data Management
Gartner: Seven Building Blocks of Master Data Management
 

Viewers also liked

Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop AdministrationEdureka!
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopCloudera, Inc.
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop AdministrationEdureka!
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!Edureka!
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksEdureka!
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop WorldCloudera, Inc.
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop ClusterEdureka!
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersAmal G Jose
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High AvailabilityDataWorks Summit
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodesEvans Ye
 
Apache kafka configuration-guide
Apache kafka configuration-guideApache kafka configuration-guide
Apache kafka configuration-guideChetan Khatri
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14jijukjoseph
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 

Viewers also liked (20)

Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop Administration
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin Tasks
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop World
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
HDFS NameNode High Availability
HDFS NameNode High AvailabilityHDFS NameNode High Availability
HDFS NameNode High Availability
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hdfs ha using journal nodes
Hdfs ha using journal nodesHdfs ha using journal nodes
Hdfs ha using journal nodes
 
Apache kafka configuration-guide
Apache kafka configuration-guideApache kafka configuration-guide
Apache kafka configuration-guide
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Hadoop administration
Hadoop administrationHadoop administration
Hadoop administration
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 

Similar to Hadoop Administration pdf

Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Edureka!
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoopAditi Yadav
 
Learn Hadoop
Learn HadoopLearn Hadoop
Learn HadoopEdureka!
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop Edureka!
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 
Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRABhadra Gowdra
 

Similar to Hadoop Administration pdf (20)

Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoop
 
Learn Hadoop
Learn HadoopLearn Hadoop
Learn Hadoop
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Learn Big Data & Hadoop
Learn Big Data & Hadoop Learn Big Data & Hadoop
Learn Big Data & Hadoop
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 
Hadoop content
Hadoop contentHadoop content
Hadoop content
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRA
 

More from Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 

Recently uploaded (20)

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 

Hadoop Administration pdf

  • 2. www.edureka.in/hadoop-admin How It Works…  LIVE classes  Class recordings  Module wise Quizzes and Practical Assignments  24x7 on-demand technical support  Deployment of different clusters  Online certification exam  Lifetime access to the Learning Management System
  • 3. www.edureka.in/hadoop-admin Course Topics  Week 1 – Understanding Big Data – Hadoop Components – Introduction to Hadoop 2.0  Week 2 – Hadoop 2.0 – Hadoop Configuration – Hadoop Cluster Architecture  Week 3 – Different Hadoop Server Roles – Data processing flow – Cluster Network Configuration  Week 4 – Job Scheduling – Fair Scheduler – Monitoring a Hadoop Cluster  Week 5 – Securing your Hadoop Cluster – Kerberos and HDFS Federation – Backup and Recovery  Week 6 – Oozie and Hive Administration – HBase Architecture – HBase Administration
  • 4. www.edureka.in/hadoop-admin Topics for Today  What is Big Data?  Limitations of the existing solutions  Solving the problem with Hadoop  Introduction to Hadoop  Hadoop Eco-System  Hadoop Core Components  MapReduce software framework  Hadoop Cluster Administrator: Roles and Responsibilities  Introduction to Hadoop 2.0
  • 5. www.edureka.in/hadoop-admin What Is Big Data?  Lots of Data (Terabytes or Petabytes).  Systems / Enterprises generate huge amount of data from Terabytes to and even Petabytes of information. A airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time. NYSE generates about one terabyte of new trade data per day to Perform stock trading analytics to determine trends for optimal trades.
  • 6. www.edureka.in/hadoop-admin IBM’s Definition  IBM’s definition – Big Data Characteristics http://www-01.ibm.com/software/data/bigdata/ Volume Velocity Variety Characteristics Of Big Data 12 Terabytes of tweets created each day Scrutinizes 5 million trade events created each day to identify potential fraud Sensor data, audio, video, click streams, log files and more
  • 7. www.edureka.in/hadoop-admin  Estimated Global Data Volume:  2011: 1.8 ZB  2015: 7.9 ZB  The world's information doubles every two years  Over the next 10 years:  The number of servers worldwide will grow by 10x  Amount of information managed by enterprise data centers will grow by 50x  Number of “files” enterprise data center handle will grow by 75x Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study Data Volume Is Growing Exponentially
  • 8. www.edureka.in/hadoop-admin What Big Companies Have To Say… “Analyzing Big Data sets will become a key basis for competition.” “Leaders in every sector will have to grapple the implications of Big Data.” McKinsey Gartner Forrester Research “Big Data analytics are rapidly emerging as the preferred solution to business and technology trends that are disrupting.” “Enterprises should not delay implementation of Big Data Analytics.” “Use Hadoop to gain a competitive advantage over more risk-averse enterprises.” “Prioritize Big Data projects that might benefit from Hadoop.”
  • 10. www.edureka.in/hadoop-admin Hadoop Users – In Detail http://wiki.apache.org/hadoop/PoweredBy
  • 11. www.edureka.in/hadoop-admin  Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.  It is an Open-source Data Management with scale-out storage & distributed processing. What Is Hadoop?
  • 13. www.edureka.in/hadoop-admin Hadoop History Doug Cutting & Mike Cafarella started working on Nutch NY Times converts 4TB of Image archives over 100 EC2s Fastest sort of a TB, 62secs over 1,460 nodes Sorted a PB in 16.25hours Over 3.658 nodes Fastest sort of a TB, 3.5mins over 910 nodes Doug Cutting adds DFS & MapReduce support to Nutch Google publishes GFS & MapReduce papers Yahoo! hires Cutting, Hadoop spins out of Nutch Facebook launches Hive: SQL Support for Hadoop Doug Cutting Joins Cloudera Hadoop Summit 2009, 750 attendees Founded 2002 2003 2004 2005 2006 2007 2008 2009
  • 14. www.edureka.in/hadoop-admin Hadoop 1.x Eco-System Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis Mahout Machine Learning Hive DW System MapReduce Framework HBase Flume Sqoop Import Or Export Unstructured or Semi-Structured data Structured Data
  • 15. www.edureka.in/hadoop-admin Hadoop is a system for large scale data processing. It has two main components:  HDFS – Hadoop Distributed File System (Storage)  Distributed across “nodes”  Natively redundant  NameNode tracks locations.  MapReduce (Processing)  Splits a task across processors  “near” the data & assembles results  Self-Healing, High Bandwidth  Clustered storage  JobTracker manages the TaskTrackers Hadoop 1.x Core Components  Additional Administration Tools:  Filesystem utilities  Job scheduling and monitoring  Web UI
  • 16. www.edureka.in/hadoop-admin Data Node Task Tracker Data Node Task Tracker Data Node Task Tracker Data Node Task Tracker Hadoop 1.x Core Components (Contd.) MapReduce Engine HDFS Cluster Job Tracker Admin Node Name node
  • 17. www.edureka.in/hadoop-admin  NameNode:  master of the system  maintains and manages the blocks which are present on the DataNodes  DataNodes:  slaves which are deployed on each machine and provide the actual storage  responsible for serving read and write requests for the clients Name Node and Data Nodes
  • 18. www.edureka.in/hadoop-admin  Secondary NameNode:  Not a hot standby for the NameNode  Connects to NameNode every hour*  Housekeeping, backup of NemeNode metadata  Saved metadata can build a failed NameNode You give me metadata every hour, I will make it secure Single Point Failure Secondary NameNode NameNode Secondary Name Node metadata metadata
  • 19. www.edureka.in/hadoop-admin What Is MapReduce?  MapReduce is a programming model  It is neither platform- nor language-specific  Record-oriented data processing (key and value)  Task distributed across multiple nodes  Where possible, each node processes data stored on that node  Consists of two phases  Map  Reduce ValueKey MapReduce
  • 20. www.edureka.in/hadoop-admin What Is MapReduce? (Contd.) Process can be considered as being similar to a Unix pipeline cat /my/log | grep '.html' | sort | uniq –c > /my/outfile MAP SORT REDUCE
  • 21. www.edureka.in/hadoop-admin Client HDFS Map Reduce Hadoop 1.x – In Summary Secondary Name Node Data Blocks Data Node Name Node Job Tracker Task Tracker Map Reduce Data Node Task Tracker Map Reduce ….
  • 23. www.edureka.in/hadoop-admin Hadoop Cluster Administrator  Deploying the cluster  Performance and availability of the cluster  Job scheduling and Management  Upgrades  Backup and Recovery  Monitoring the cluster  Troubleshooting Roles and Responsibilities
  • 24. www.edureka.in/hadoop-admin Hadoop 1.0 Vs. Hadoop 2.0 Property Hadoop 1.x Hadoop 2.x NameNodes 1 Many High Availability Not present Highly Available Processing Control JobTracker, Task Tracker Resource Manager, Node Manager, App Master
  • 25. www.edureka.in/hadoop-admin MRv1 Vs. MRv2 Data Node HDFS (Data Storage) MapReduce (data processing) MapReduce (Data Processing) Others (data Processing) Hadoop 1.0 Hadoop 2.0 Scheduler Applications Manager (AsM) Job Tracker YARN (Cluster Resource Management) HDFS (Data Storage)  Provides a Cluster Level Resource Manager  Application Level Resource Management (Node Manager??)  Provides slots for Jobs other than Map and Reduce  Problems with Resource utilization  Slots only for Map and Reduce
  • 26. www.edureka.in/hadoop-admin Client HDFS YARN Resource Manager Hadoop 2.0 - Architecture Shared edit logs All name space edits logged to shared NFS storage; single writer (fencing) Read edit logs and applies to its own namespace Secondary Name Node Data Node Data Node Data Node Data Node Node Manager Container App Master Node Manager Container App Master Standby NameNode Node Manager Container App Master Node Manager Container App Master Active NameNode
  • 27. www.edureka.in/hadoop-admin  Attempt the following Assignments using the documents present in the LMS:  Single Node Apache Hadoop 1.0 Installation on Ubuntu  Execute Linux Basic Commands  Execute HDFS Hands On  Cloudera CDH3 and CDH4 Quick VM installation on your local machine Assignments
  • 28. Thank You See You in Class Next Week