SlideShare ist ein Scribd-Unternehmen logo
1 von 46
Slide 1 www.edureka.in/hadoop
Course Topics
Slide 2 www.edureka.in/hadoop
 Module 1
 Understanding Big Data
 Hadoop Architecture
 Module 2
 Hadoop Cluster Configuration
 Data loading Techniques
 Hadoop Project Environment
 Module 3
 Hadoop MapReduce framework
 Programming in Map Reduce
 Module 4
 Advance MapReduce
 MRUnit testing framework
 Module 5
 Analytics using Pig
 Understanding Pig Latin
 Module 6
 Analytics using Hive
 Understanding HIVE QL
 Module 7
 Advance Hive
 NoSQL Databases and HBASE
 Module 8
 Advance HBASE
 Zookeeper Service
 Module 9
 Hadoop 2.0 – New Features
 Programming in MRv2
 Module 10
 Apache Oozie
 Real world Datasets and Analysis
 Project Discussion
Topics for Today
Slide 3 www.edureka.in/hadoop
 Revision
 Hadoop 2.0 New Features
 HDFS High Availability
 HDFS Federation
 YARN or MRv2
 YARN and Hadoop ecosystem
 YARN Components
 YARN Application Execution
 Running an Application with YARN
 Writing a YARN Application
Lets’s Revise – HBase and ZooKeeper
Master
HFile
RegionServers
ReRgeigoinoSneSrevrevresrs
memstore
WAL
/hbase/region
1
/hbase/region
2
…..
…..
/hbase/region
Zookeeper
HDFS
Slide 4 www.edureka.in/hadoop
Client
HDFS Map Reduce
Secondary
Name Node
Data
Blocks
….
Data Node
Name Node Job Tracker
Task Tracker
Map Reduce
Data Node Task Tracker
Map Reduce
Hadoop 1.0 – In Summary
Data Node Data NodeTask Tracker
Map Reduce
Task Tracker
Map Reduce
Slide 5 www.edureka.in/hadoop
Hadoop 1.0 - Challenges
Slide 6 www.edureka.in/hadoop
Problem Description
NameNode – No Horizontal
Scalability
Single NameNode and Single Namespaces, limited by
NameNode RAM
NameNode – No High Availability
(HA)
NameNode is Single Point of Failure, Need manual recovery
using Secondary NameNode in case of failure
Job Tracker – Overburdened Spends significant portion of time and effort managing the life
cycle of applications
MRv1 – Only Map and Reduce tasks Humongous Data stored in HDFS remains unutilized and
cannot be used for other workloads such as Graph processing
etc.
NameNode – Scale and HA
NameNode - No High Availability
NameNode - No Horizontal Scale
Data
Node
Data
Node
Data
Node
….
Client get Block Locations
Block Management
Read Data
NameNode
NS
Slide 7 www.edureka.in/hadoop
 Secondary NameNode:
 “Not a hot standby” for the NameNode
 Connects to NameNode every hour*
 Housekeeping, backup of NemeNode metadata
 Saved metadata can build a failed NameNode
You give me
metadata every
hour, I will make
it secure
Secondary
NameNode
Single Point
of FailureNameNode
metadata
Slide 8 www.edureka.in/hadoop
metadata
Name Node –Single Point of Failure
Job Tracker – Overburdened
CPU
 Spends a very significant portion of time and
effort managing the life cycle of applications
Network
 Single Listener Thread to communicate
thousands of Map and Reduce Jobs
with
Task Tracker Task Tracker Task Tracker….
Job
Tracker
Slide 9 www.edureka.in/hadoop
MRv1 – Unpredictability in Large Clusters
As the cluster size grow and reaches to 4000 Nodes
 Cascading Failures
 The DataNode failures results in a serious
deterioration of the overall cluster
performance because of attempts to
replicate data and overload live nodes,
through network flooding.
 Multi-tenancy
 As clusters increase in size, you may want
to employ these clusters for a variety of
models. MRv1 dedicates its nodes to
Hadoop and cannot be re-purposed for
other applications and workloads in an
Organization. With the growing popularity
and adoption of cloud computing among
enterprises, this becomes more important.
Slide 10 www.edureka.in/hadoop
 Terabytes and Petabytes of data in HDFS can be used only for MapReduce processing
Unutilized Data in HDFS
www.edureka.in/hadoop
Hadoop 2.0 New Features
Slide 12 www.edureka.in/hadoop
Other important Hadoop 2.0 features
 HDFS Snapshots
 NFSv3 access to data in HDFS
 Support for running Hadoop on MS Windows
 Binary Compatibility for MapReduce applications built on Hadoop 1.0
 Substantial amount of Integration testing with rest of the projects (such as
Hadoop ecosystem
PIG, HIVE) in
Property Hadoop 1.0 Hadoop 2.0
Federation One Namenode and
Namespaces
Multiple Namenode and
Namespaces
High Availability Not present Highly Available
YARN - Processing
Control and Multi-
tenancy
JobTracker, Task Tracker Resource Manager, Node
Manager, App Master,
Capacity Scheduler
Hadoop 2.0 Cluster Architecture - Federation
Namenode
Block Management
NS
Storage
Datanode Datanode…
NamespaceBlockStorage
Namespace
NS1 NSk NSn
NN-1 NN-k NN-n
Common Storage
Datanode 1
…
Datanode 2
…
Datanode m
…
BlockStorage
Pool 1 Pool k Pool n
Block Pools
… …
Hadoop 1.0 Hadoop 2.0
Slide 13 www.edureka.in/hadoop
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
- provides cross-data center (non-
Hlo
eca
lll
o) s
Tup
hp
eo
rr
et
!fo
!r HDFS, allowing
Slide 14 www.edureka.in/hadoop
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
How does HDFS Federation help HDFS Scale horizontally?
- reduces the load on any single NameNode by using the multiple,
independent NameNode to manage individual parts of the
filesystem namespace
a cluster administrator to split the Block Storage outside the local
cluster.
Annie’s Question
A. In order to scale the name service horizontally, HDFS federation
uses multiple independent Namenodes. The Namenodes are
federated, that is, the Namenodes are independent and don’t require
coordination with each other.
Slide 15 www.edureka.in/hadoop
Annie’s Answer
/finance respectively. What will happ
Hen
elif
loyo
Tu
htr
ey
reto
!!put a file in
Slide 16 www.edureka.in/hadoop
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
You have configured two name nodes to manage /marketing and
/accounting directory?
Annie’s Question
The put will fail. None of the namespace will manage the file and you
will get an IOException with a No such file or directory error.
Slide 17 www.edureka.in/hadoop
Annie’s Answer
Node Manager
HDFS
YARN
Resource
Manager
Shared
edit logs
All name space edits
logged to shared NFS
storage; single writer
(fencing)
Read edit logs and applies
to its own namespace
Secondary
Name Node
Data Node
Standby
NameNode
Active
NameNode
Container
App
Master
Node Manager
Data Node
Container
App
Master
Data Node
Container
Data Node
NDoadtae NMoadneager
Client
App
Master
Node Manager
Container
App
Master
Hadoop 2.0 Cluster Architecture - HA
NameNode High
Availability
Slide 18 www.edureka.in/hadoop
Next Generation
MapReduce
HDFS HIGH AVAILABILITY
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
*Not necessary to
configure Secondary
NameNode
NameNode Recovery Vs. Failover
High Availability in
Hadoop 2.0
NameNode recovery in
Hadoop 1.0
Secondary
Name Node
Standby
NameNode
Active
NameNode
Secondary
Name Node
NameNode
Edit logs
Meta-Data
Automatic failover
to Standby
NameNode
Manually Recover
using Secondary
NameNode
FSImage
www.edureka.in/hadoop
HDFS HA was developed to overcome the following disadvantage in
Hadoop 1.0?
- Single Point Of Failure Of Name•Node
- Only one version can be run in cHlaseslilcoMTaph•eRreed!u!ce
- Too much burden on Job TracMkeyr name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
Slide 20 www.edureka.in/hadoop
Annie’s Question
Single Point of Failure of NameNode.
Slide 21 www.edureka.in/hadoop
Annie’s Answer
Hadoop 2.0 – In Summary
Client
HDFS
Distributed Data Storage
Active
NameNode
YARN
Resource Manager
Applications
Secondary
Name Node
Standby
NameNode
Distributed Data Processing
Data Node
Node Manager
Container
App
Master …….
Masters
Slaves
Node Manager
Data Node
Container
App
Master
Data Node
Node Manager
Container
App
Master
Shared
edit logs
Scheduler Manager
(AsM)
NameNode High
Availability
www.edureka.in/hadoop
Next Generation
MapReduce
YARN and Hadoop Ecosystem
Apache Oozie (Workflow)
HDFS
(Hadoop Distributed File System)
Pig Latin
Data Analysis
Hive
DW System
MapReduce Framework
HBase
Apache Oozie (Workflow)
HDFS
(Hadoop Distributed File System)
Pig Latin
Data Analysis
Hive
DW System
MapReduce Framework HBase
Other
YARN
Frameworks
(MPI, GIRAPH)
framework
Slide 23 www.edureka.in/hadoop
YARN
Cluster Resource Management
YARN adds a more general interface to run non-MapReduce
jobs (such as Graph Processing) within the Hadoop
BATCH
(MapReduce)
INTERACTIVE
(Text)
ONLINE
(HBase)
STREAMING
(Storm, S4, …)
GRAPH
(Giraph)
IN-MEMORY
(Spark)
HPC MPI
(OpenMPI)
OTHER
(Search)
(Weave..)
YARN – Moving beyond MapReduce
Slide 24 www.edureka.in/hadoop
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
Multi-tenancy - Capacity Scheduler
Slide 25 www.edureka.in/hadoop
 Organizes jobs into queues
 Queue shares as %’s of cluster
 FIFO scheduling within each
queue
 Data locality-aware Scheduling
 Hierarchical Queues
To manage the resource within an organization.
 Capacity Guarantees
A fraction to the total available capacity allocated to each Queue.
 Security
To safeguard applications from other users.
 Elasticity
Resources are available in a predictable and elastic manner to queues.
 Multi-tenancy
Set of limit to prevent over-utilization of resources by a single application.
 Operability
Runtime configuration of Queues.
 Resource-based scheduling
If needed, Applications can request more resources than the default.
Executing MapReduce Application on YARN
www.edureka.in/hadoop
MapReduce Application Execution
YARN MR Application Execution Flow
Slide 27 www.edureka.in/hadoop
 MapReduce Job Execution
 Job Submission
 Job Initialization
 Tasks Assignment
 Tasks’ Memory
 Status Updates
 Failure Recovery
 Client
 Submit a MapReduce Job
 Resource Manager
 Manage the resource utilization across
Hadoop Cluster
 Node Manager
 Runs on each Data Node
 Creates execution container
 Monitors Container’s usage
 MapReduce Application Master
 Coordinates and Manages MapReduce Jobs
 Negotiates with Resource Manager to
schedule asks
 The tasks are started by NodeManager(s)
 HDFS
 shares resources and Job’s artefacts
among YARN components
YARN = Yet Another Resource Negotiator
Node Manager
Container Container
Node Manager
App
Master
Container
Node Manager
Container
App
Master
Resource
Manager
Client
Client
MapReduce Status
Job Submission
Node Status
Resource Request
 Resource Manager
 Cluster Level resource
manager
 Long Life, High Quality
Hardware
 Node Manager
 One per Data Node
 Monitors resources on Data
Node
 Application Master
 One per application
 Short life
 Manages task /scheduling
Hadoop 2.0 - YARN
JobHistory
Server
Slide 28 www.edureka.in/hadoop
YARN MR Application Execution Flow
Client
Resource
ManagerApplication
Node Manager
YARNChild
Job Object
1. Run Job 2. Get New Application
Client JVM
3. Copy Job Resources
4. Submit Application
Map or
Reduce
Task
HDFS
MR AppMaster
DataNode
Slide 29 www.edureka.in/hadoop
Management Node
Task JVM
YARN MR Application Execution Flow
Client
Resource
ManagerApplication
Node Manager
YarnChild
Job Object
1. Run Job 2. Get New Application
Client JVM
3. Copy Job Resources
4. Submit Application
5. Start MR AppMaster container
Map or
Reduce
Task
HDFS
MR AppMaster
DataNode
Slide 30 www.edureka.in/hadoop
7. Get Input Splits
8. Request Resources
9. Start container
6. Create container
Management Node
Task JVM
YARN MR Application Execution Flow
Client
Resource
Manager
HDFS
Application
Node Manager
MR AppMaster
YarnChild
Map or
Reduce
Task
Job Object
1. Run Job 2. Get New Application
Client JVM
3. Copy Job Resources
4. Submit Application
5. Start MR AppMaster container
7. Get Input Splits
8. Request Resources
9. Start container
6. Create container
DataNode
12. Execute
10. Create Container
11. Acquire Job Resources
Slide 31 www.edureka.in/hadoop
Management Node
Task JVM
YARN MR Application Execution Flow
Client
Resource
Manager
HDFS
Application
Node Manager
MR AppMaster
YarnChild
Map or
Reduce
Task
Job Object
Client JVM
DataNode
Management Node
Task JVM
Poll for Status
Slide 32 www.edureka.in/hadoop
Update Status
Hadoop 2.0 : YARN Workflow
Resource
Manager
Node Manager
Node Manager
App
Master 2
Node Manager
Node Manager
Node Manager
Container 2.2
Node Manager
Container 2.3
Node Manager
Node Manager
Container 1.1
Container 2.1
Node Manager
Container 1.2
Scheduler
Applications
Manager (AsM)
Node Manager
Node Manager
Node Manager
App
Master 1
www.edureka.in/hadoop
YARN was developed to overcome the following disadvantage in
Hadoop 1.0 MapReduce framework?
- Single Point Of Failure Of Name•Node
- Only one version can be run in cHlaseslilcoMTaph•eRreed!u!ce
- Too much burden on Job TracMkeyr name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
Slide 34 www.edureka.in/hadoop
Annie’s Question
Too much burden on Job Tracker.
Slide 35 www.edureka.in/hadoop
Annie’s Answer
Hello There!!
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
In YARN, the functionality of JobTracker has been replaced by which
of the following YARN features:
Slide 36 www.edureka.in/hadoop
- Job Scheduling
- Task Monitoring
- Resource Management
- Node management
Annie’s Question
Task Monitoring and Resource Management. The fundamental
idea of YARN is to split up the two major functionalities of the
JobTracker, i.e. resource management and job
scheduling/monitoring, into separate daemons. A global
ResourceManager (RM) for resources and per-application
ApplicationMaster (AM) for task monitoring.
Slide 37 www.edureka.in/hadoop
Annie’s Answer
Hello There!!
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
In YARN, which of the following daemons takes care of the container
and the resource utilization by the applications?:
Slide 38 www.edureka.in/hadoop
- Node Manager
- Job Tracker
- Task tracker
- Application Master
- Resource manager
Annie’s Question
ApplicationMaster.
Slide 39 www.edureka.in/hadoop
Annie’s Answer
Hello There!!
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
Can we run MRv1 Jobs in a YARN enabled Hadoop Cluster?
Slide 40 www.edureka.in/hadoop
- Yes
- No
Annie’s Question
Yes. You need to recompile the Jobs in MRv2 after enabling YARN to
run the Job successfully in a YARN enabled Hadoop Cluster.
Slide 41 www.edureka.in/hadoop
Annie’s Answer
Hello There!!
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
Which of the following YARN/MRv2 daemon is responsible for
launching the tasks?
Slide 42 www.edureka.in/hadoop
- Task Tracker
- Resource Manager
- ApplicationMaster
- ApplicationMasterServer
Annie’s Question
ApplicationMaster.
Slide 43 www.edureka.in/hadoop
Annie’s Answer
 Write the MapReduce programs using MRv2 for your Module-3 assignments.
 Assignment Programs – WordCount , Patents, Temperature, and Alphabets
 Create a Single Node Apache Hadoop Cluster using the documents present in the LMS.
 Execute the ‘teragen’ example to test the Cluster
Slide 44 www.edureka.in/hadoop
Module-10 Pre-work
Thank You
See You in Class Next Week
www.edureka.in/hadoop
NN High Availability – Quorum Journal Manager
Standby Name
Node
Active Name
Node
Data Node Data Node Data Node Data Node
….
Data Blocks
Journal
Node
Write namespace
modification to edit
logs; only Active Name
Node is the writer
Journal
Node
Journal
Node
Read edit logs and applies to its own
namespace
Data Nodes are configured with the location of both
Name Nodes, and send block location information and
heartbeats to both.

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopRan Ziv
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!Edureka!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopGiovanna Roda
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Edureka!
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksEdureka!
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
Presentation on Hadoop Technology
Presentation on Hadoop TechnologyPresentation on Hadoop Technology
Presentation on Hadoop TechnologyOpenDev
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 

Was ist angesagt? (20)

Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
A day in the life of hadoop administrator!
A day in the life of hadoop administrator!A day in the life of hadoop administrator!
A day in the life of hadoop administrator!
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin Tasks
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Presentation on Hadoop Technology
Presentation on Hadoop TechnologyPresentation on Hadoop Technology
Presentation on Hadoop Technology
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 

Andere mochten auch

Crandon Institute Certificate-Executive Assistant
Crandon Institute Certificate-Executive AssistantCrandon Institute Certificate-Executive Assistant
Crandon Institute Certificate-Executive AssistantRuth Malan
 
BIG DATA Nedir ve IBM Çözümleri.
BIG DATA Nedir ve IBM Çözümleri.BIG DATA Nedir ve IBM Çözümleri.
BIG DATA Nedir ve IBM Çözümleri.Cuneyt Goksu
 
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.Zekeriya Besiroglu
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High AvailabilityHortonworks
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopCloudera, Inc.
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Big Data / Büyük Veri Nedir?
Big Data / Büyük Veri Nedir?Big Data / Büyük Veri Nedir?
Big Data / Büyük Veri Nedir?Veli Bahçeci
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 

Andere mochten auch (10)

Crandon Institute Certificate-Executive Assistant
Crandon Institute Certificate-Executive AssistantCrandon Institute Certificate-Executive Assistant
Crandon Institute Certificate-Executive Assistant
 
BIG DATA Nedir ve IBM Çözümleri.
BIG DATA Nedir ve IBM Çözümleri.BIG DATA Nedir ve IBM Çözümleri.
BIG DATA Nedir ve IBM Çözümleri.
 
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
 
Big Data Sunum
Big Data SunumBig Data Sunum
Big Data Sunum
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Big Data / Büyük Veri Nedir?
Big Data / Büyük Veri Nedir?Big Data / Büyük Veri Nedir?
Big Data / Büyük Veri Nedir?
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 

Ähnlich wie Hadoop Developer

Best hadoop-online-training
Best hadoop-online-trainingBest hadoop-online-training
Best hadoop-online-trainingGeohedrick
 
Hadoop online training
Hadoop online trainingHadoop online training
Hadoop online trainingsrikanthhadoop
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Edureka!
 
Learn Hadoop
Learn HadoopLearn Hadoop
Learn HadoopEdureka!
 
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfSheetal Jain
 
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionHadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionEdureka!
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
field_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahofield_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahoMartin Ferguson
 
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overviewHdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overviewNitesh Ghosh
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryIJRESJOURNAL
 

Ähnlich wie Hadoop Developer (20)

Best hadoop-online-training
Best hadoop-online-trainingBest hadoop-online-training
Best hadoop-online-training
 
Hadoop online training
Hadoop online trainingHadoop online training
Hadoop online training
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
 
Learn Hadoop
Learn HadoopLearn Hadoop
Learn Hadoop
 
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdf
 
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionHadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
Hadoop Tutorial for Beginners
Hadoop Tutorial for BeginnersHadoop Tutorial for Beginners
Hadoop Tutorial for Beginners
 
Hadoop .pdf
Hadoop .pdfHadoop .pdf
Hadoop .pdf
 
Unit 1
Unit 1Unit 1
Unit 1
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
field_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahofield_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentaho
 
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overviewHdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on Raspberry
 

Mehr von Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

Mehr von Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Kürzlich hochgeladen

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 

Kürzlich hochgeladen (20)

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 

Hadoop Developer

  • 2. Course Topics Slide 2 www.edureka.in/hadoop  Module 1  Understanding Big Data  Hadoop Architecture  Module 2  Hadoop Cluster Configuration  Data loading Techniques  Hadoop Project Environment  Module 3  Hadoop MapReduce framework  Programming in Map Reduce  Module 4  Advance MapReduce  MRUnit testing framework  Module 5  Analytics using Pig  Understanding Pig Latin  Module 6  Analytics using Hive  Understanding HIVE QL  Module 7  Advance Hive  NoSQL Databases and HBASE  Module 8  Advance HBASE  Zookeeper Service  Module 9  Hadoop 2.0 – New Features  Programming in MRv2  Module 10  Apache Oozie  Real world Datasets and Analysis  Project Discussion
  • 3. Topics for Today Slide 3 www.edureka.in/hadoop  Revision  Hadoop 2.0 New Features  HDFS High Availability  HDFS Federation  YARN or MRv2  YARN and Hadoop ecosystem  YARN Components  YARN Application Execution  Running an Application with YARN  Writing a YARN Application
  • 4. Lets’s Revise – HBase and ZooKeeper Master HFile RegionServers ReRgeigoinoSneSrevrevresrs memstore WAL /hbase/region 1 /hbase/region 2 ….. ….. /hbase/region Zookeeper HDFS Slide 4 www.edureka.in/hadoop
  • 5. Client HDFS Map Reduce Secondary Name Node Data Blocks …. Data Node Name Node Job Tracker Task Tracker Map Reduce Data Node Task Tracker Map Reduce Hadoop 1.0 – In Summary Data Node Data NodeTask Tracker Map Reduce Task Tracker Map Reduce Slide 5 www.edureka.in/hadoop
  • 6. Hadoop 1.0 - Challenges Slide 6 www.edureka.in/hadoop Problem Description NameNode – No Horizontal Scalability Single NameNode and Single Namespaces, limited by NameNode RAM NameNode – No High Availability (HA) NameNode is Single Point of Failure, Need manual recovery using Secondary NameNode in case of failure Job Tracker – Overburdened Spends significant portion of time and effort managing the life cycle of applications MRv1 – Only Map and Reduce tasks Humongous Data stored in HDFS remains unutilized and cannot be used for other workloads such as Graph processing etc.
  • 7. NameNode – Scale and HA NameNode - No High Availability NameNode - No Horizontal Scale Data Node Data Node Data Node …. Client get Block Locations Block Management Read Data NameNode NS Slide 7 www.edureka.in/hadoop
  • 8.  Secondary NameNode:  “Not a hot standby” for the NameNode  Connects to NameNode every hour*  Housekeeping, backup of NemeNode metadata  Saved metadata can build a failed NameNode You give me metadata every hour, I will make it secure Secondary NameNode Single Point of FailureNameNode metadata Slide 8 www.edureka.in/hadoop metadata Name Node –Single Point of Failure
  • 9. Job Tracker – Overburdened CPU  Spends a very significant portion of time and effort managing the life cycle of applications Network  Single Listener Thread to communicate thousands of Map and Reduce Jobs with Task Tracker Task Tracker Task Tracker…. Job Tracker Slide 9 www.edureka.in/hadoop
  • 10. MRv1 – Unpredictability in Large Clusters As the cluster size grow and reaches to 4000 Nodes  Cascading Failures  The DataNode failures results in a serious deterioration of the overall cluster performance because of attempts to replicate data and overload live nodes, through network flooding.  Multi-tenancy  As clusters increase in size, you may want to employ these clusters for a variety of models. MRv1 dedicates its nodes to Hadoop and cannot be re-purposed for other applications and workloads in an Organization. With the growing popularity and adoption of cloud computing among enterprises, this becomes more important. Slide 10 www.edureka.in/hadoop
  • 11.  Terabytes and Petabytes of data in HDFS can be used only for MapReduce processing Unutilized Data in HDFS www.edureka.in/hadoop
  • 12. Hadoop 2.0 New Features Slide 12 www.edureka.in/hadoop Other important Hadoop 2.0 features  HDFS Snapshots  NFSv3 access to data in HDFS  Support for running Hadoop on MS Windows  Binary Compatibility for MapReduce applications built on Hadoop 1.0  Substantial amount of Integration testing with rest of the projects (such as Hadoop ecosystem PIG, HIVE) in Property Hadoop 1.0 Hadoop 2.0 Federation One Namenode and Namespaces Multiple Namenode and Namespaces High Availability Not present Highly Available YARN - Processing Control and Multi- tenancy JobTracker, Task Tracker Resource Manager, Node Manager, App Master, Capacity Scheduler
  • 13. Hadoop 2.0 Cluster Architecture - Federation Namenode Block Management NS Storage Datanode Datanode… NamespaceBlockStorage Namespace NS1 NSk NSn NN-1 NN-k NN-n Common Storage Datanode 1 … Datanode 2 … Datanode m … BlockStorage Pool 1 Pool k Pool n Block Pools … … Hadoop 1.0 Hadoop 2.0 Slide 13 www.edureka.in/hadoop http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
  • 14. - provides cross-data center (non- Hlo eca lll o) s Tup hp eo rr et !fo !r HDFS, allowing Slide 14 www.edureka.in/hadoop My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. How does HDFS Federation help HDFS Scale horizontally? - reduces the load on any single NameNode by using the multiple, independent NameNode to manage individual parts of the filesystem namespace a cluster administrator to split the Block Storage outside the local cluster. Annie’s Question
  • 15. A. In order to scale the name service horizontally, HDFS federation uses multiple independent Namenodes. The Namenodes are federated, that is, the Namenodes are independent and don’t require coordination with each other. Slide 15 www.edureka.in/hadoop Annie’s Answer
  • 16. /finance respectively. What will happ Hen elif loyo Tu htr ey reto !!put a file in Slide 16 www.edureka.in/hadoop My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. You have configured two name nodes to manage /marketing and /accounting directory? Annie’s Question
  • 17. The put will fail. None of the namespace will manage the file and you will get an IOException with a No such file or directory error. Slide 17 www.edureka.in/hadoop Annie’s Answer
  • 18. Node Manager HDFS YARN Resource Manager Shared edit logs All name space edits logged to shared NFS storage; single writer (fencing) Read edit logs and applies to its own namespace Secondary Name Node Data Node Standby NameNode Active NameNode Container App Master Node Manager Data Node Container App Master Data Node Container Data Node NDoadtae NMoadneager Client App Master Node Manager Container App Master Hadoop 2.0 Cluster Architecture - HA NameNode High Availability Slide 18 www.edureka.in/hadoop Next Generation MapReduce HDFS HIGH AVAILABILITY http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html *Not necessary to configure Secondary NameNode
  • 19. NameNode Recovery Vs. Failover High Availability in Hadoop 2.0 NameNode recovery in Hadoop 1.0 Secondary Name Node Standby NameNode Active NameNode Secondary Name Node NameNode Edit logs Meta-Data Automatic failover to Standby NameNode Manually Recover using Secondary NameNode FSImage www.edureka.in/hadoop
  • 20. HDFS HA was developed to overcome the following disadvantage in Hadoop 1.0? - Single Point Of Failure Of Name•Node - Only one version can be run in cHlaseslilcoMTaph•eRreed!u!ce - Too much burden on Job TracMkeyr name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Slide 20 www.edureka.in/hadoop Annie’s Question
  • 21. Single Point of Failure of NameNode. Slide 21 www.edureka.in/hadoop Annie’s Answer
  • 22. Hadoop 2.0 – In Summary Client HDFS Distributed Data Storage Active NameNode YARN Resource Manager Applications Secondary Name Node Standby NameNode Distributed Data Processing Data Node Node Manager Container App Master ……. Masters Slaves Node Manager Data Node Container App Master Data Node Node Manager Container App Master Shared edit logs Scheduler Manager (AsM) NameNode High Availability www.edureka.in/hadoop Next Generation MapReduce
  • 23. YARN and Hadoop Ecosystem Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis Hive DW System MapReduce Framework HBase Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis Hive DW System MapReduce Framework HBase Other YARN Frameworks (MPI, GIRAPH) framework Slide 23 www.edureka.in/hadoop YARN Cluster Resource Management YARN adds a more general interface to run non-MapReduce jobs (such as Graph Processing) within the Hadoop
  • 24. BATCH (MapReduce) INTERACTIVE (Text) ONLINE (HBase) STREAMING (Storm, S4, …) GRAPH (Giraph) IN-MEMORY (Spark) HPC MPI (OpenMPI) OTHER (Search) (Weave..) YARN – Moving beyond MapReduce Slide 24 www.edureka.in/hadoop http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
  • 25. Multi-tenancy - Capacity Scheduler Slide 25 www.edureka.in/hadoop  Organizes jobs into queues  Queue shares as %’s of cluster  FIFO scheduling within each queue  Data locality-aware Scheduling  Hierarchical Queues To manage the resource within an organization.  Capacity Guarantees A fraction to the total available capacity allocated to each Queue.  Security To safeguard applications from other users.  Elasticity Resources are available in a predictable and elastic manner to queues.  Multi-tenancy Set of limit to prevent over-utilization of resources by a single application.  Operability Runtime configuration of Queues.  Resource-based scheduling If needed, Applications can request more resources than the default.
  • 26. Executing MapReduce Application on YARN www.edureka.in/hadoop MapReduce Application Execution
  • 27. YARN MR Application Execution Flow Slide 27 www.edureka.in/hadoop  MapReduce Job Execution  Job Submission  Job Initialization  Tasks Assignment  Tasks’ Memory  Status Updates  Failure Recovery  Client  Submit a MapReduce Job  Resource Manager  Manage the resource utilization across Hadoop Cluster  Node Manager  Runs on each Data Node  Creates execution container  Monitors Container’s usage  MapReduce Application Master  Coordinates and Manages MapReduce Jobs  Negotiates with Resource Manager to schedule asks  The tasks are started by NodeManager(s)  HDFS  shares resources and Job’s artefacts among YARN components
  • 28. YARN = Yet Another Resource Negotiator Node Manager Container Container Node Manager App Master Container Node Manager Container App Master Resource Manager Client Client MapReduce Status Job Submission Node Status Resource Request  Resource Manager  Cluster Level resource manager  Long Life, High Quality Hardware  Node Manager  One per Data Node  Monitors resources on Data Node  Application Master  One per application  Short life  Manages task /scheduling Hadoop 2.0 - YARN JobHistory Server Slide 28 www.edureka.in/hadoop
  • 29. YARN MR Application Execution Flow Client Resource ManagerApplication Node Manager YARNChild Job Object 1. Run Job 2. Get New Application Client JVM 3. Copy Job Resources 4. Submit Application Map or Reduce Task HDFS MR AppMaster DataNode Slide 29 www.edureka.in/hadoop Management Node Task JVM
  • 30. YARN MR Application Execution Flow Client Resource ManagerApplication Node Manager YarnChild Job Object 1. Run Job 2. Get New Application Client JVM 3. Copy Job Resources 4. Submit Application 5. Start MR AppMaster container Map or Reduce Task HDFS MR AppMaster DataNode Slide 30 www.edureka.in/hadoop 7. Get Input Splits 8. Request Resources 9. Start container 6. Create container Management Node Task JVM
  • 31. YARN MR Application Execution Flow Client Resource Manager HDFS Application Node Manager MR AppMaster YarnChild Map or Reduce Task Job Object 1. Run Job 2. Get New Application Client JVM 3. Copy Job Resources 4. Submit Application 5. Start MR AppMaster container 7. Get Input Splits 8. Request Resources 9. Start container 6. Create container DataNode 12. Execute 10. Create Container 11. Acquire Job Resources Slide 31 www.edureka.in/hadoop Management Node Task JVM
  • 32. YARN MR Application Execution Flow Client Resource Manager HDFS Application Node Manager MR AppMaster YarnChild Map or Reduce Task Job Object Client JVM DataNode Management Node Task JVM Poll for Status Slide 32 www.edureka.in/hadoop Update Status
  • 33. Hadoop 2.0 : YARN Workflow Resource Manager Node Manager Node Manager App Master 2 Node Manager Node Manager Node Manager Container 2.2 Node Manager Container 2.3 Node Manager Node Manager Container 1.1 Container 2.1 Node Manager Container 1.2 Scheduler Applications Manager (AsM) Node Manager Node Manager Node Manager App Master 1 www.edureka.in/hadoop
  • 34. YARN was developed to overcome the following disadvantage in Hadoop 1.0 MapReduce framework? - Single Point Of Failure Of Name•Node - Only one version can be run in cHlaseslilcoMTaph•eRreed!u!ce - Too much burden on Job TracMkeyr name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Slide 34 www.edureka.in/hadoop Annie’s Question
  • 35. Too much burden on Job Tracker. Slide 35 www.edureka.in/hadoop Annie’s Answer
  • 36. Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. In YARN, the functionality of JobTracker has been replaced by which of the following YARN features: Slide 36 www.edureka.in/hadoop - Job Scheduling - Task Monitoring - Resource Management - Node management Annie’s Question
  • 37. Task Monitoring and Resource Management. The fundamental idea of YARN is to split up the two major functionalities of the JobTracker, i.e. resource management and job scheduling/monitoring, into separate daemons. A global ResourceManager (RM) for resources and per-application ApplicationMaster (AM) for task monitoring. Slide 37 www.edureka.in/hadoop Annie’s Answer
  • 38. Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. In YARN, which of the following daemons takes care of the container and the resource utilization by the applications?: Slide 38 www.edureka.in/hadoop - Node Manager - Job Tracker - Task tracker - Application Master - Resource manager Annie’s Question
  • 40. Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Can we run MRv1 Jobs in a YARN enabled Hadoop Cluster? Slide 40 www.edureka.in/hadoop - Yes - No Annie’s Question
  • 41. Yes. You need to recompile the Jobs in MRv2 after enabling YARN to run the Job successfully in a YARN enabled Hadoop Cluster. Slide 41 www.edureka.in/hadoop Annie’s Answer
  • 42. Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Which of the following YARN/MRv2 daemon is responsible for launching the tasks? Slide 42 www.edureka.in/hadoop - Task Tracker - Resource Manager - ApplicationMaster - ApplicationMasterServer Annie’s Question
  • 44.  Write the MapReduce programs using MRv2 for your Module-3 assignments.  Assignment Programs – WordCount , Patents, Temperature, and Alphabets  Create a Single Node Apache Hadoop Cluster using the documents present in the LMS.  Execute the ‘teragen’ example to test the Cluster Slide 44 www.edureka.in/hadoop Module-10 Pre-work
  • 45. Thank You See You in Class Next Week
  • 46. www.edureka.in/hadoop NN High Availability – Quorum Journal Manager Standby Name Node Active Name Node Data Node Data Node Data Node Data Node …. Data Blocks Journal Node Write namespace modification to edit logs; only Active Name Node is the writer Journal Node Journal Node Read edit logs and applies to its own namespace Data Nodes are configured with the location of both Name Nodes, and send block location information and heartbeats to both.