Hadoop Training
What’s in it for you?
Need for Hadoop
What is Hadoop?
Hadoop Ecosystem
Hadoop Features
What is HDFS?
What is MapReduce?
What is YARN?
Bank case study
Need for Hadoop
In today’s world, data is growing rapidly from heterogeneous sources such as social media, aviation, logistics, and e-commerce.
All this digital data is expected to reach 163 zettabytes by 2025 (1 ZB = 10^9 TB).
Companies face problems in storing and processing these vast volumes of data.
The solution is big data technologies such as Hadoop.
What is Hadoop?
Hadoop is an open-source framework to store and process huge volumes of data.
It stores large volumes of data across multiple data nodes (DN1, DN2, DN3, DN4).
It processes the data in parallel on multiple data nodes.
Components of Hadoop
1. HDFS: Distributed data storage
2. MapReduce: Parallel data processing
3. YARN: Cluster resource management
Hadoop Ecosystem
Data collection and ingestion
Workflow
Pig (scripting)
Hive (SQL query)
Interactive analysis
Machine learning
Streaming
Read/write access to data
Hadoop Distributed File System
Cluster resource management
Data processing
Management and monitoring
Hadoop Features
Flexible: Hadoop is flexible in storing any type of data, be it structured, semi-structured, or unstructured.
Scalable: As the volume of data grows, new node machines can easily be added to scale the Hadoop cluster.
Fault tolerant: Data stored on HDFS is replicated automatically onto different data nodes. This brings high fault tolerance when a data node crashes.
Distributed storage: Hadoop supports distributed data storage and hence allows faster processing of data.
Robust ecosystem: Hadoop has a robust ecosystem that suits the analytical needs of small and big organizations. It includes Spark, Pig, Hive, Mahout, etc.
Cost effective: Hadoop stores and processes data on a cluster of commodity hardware, resulting in a substantial reduction in the cost per terabyte of storage.
Hadoop use case
Before the 2008 economic recession, every bank maintained a legacy data warehouse.
Home mortgage details, credit card transactions, and other financial details of every customer were restricted to local database systems.
Banks could not store and process data efficiently.
They failed to build a comprehensive risk portfolio for their customers.
After the 2008 economic recession, most financial institutions and national monetary associations started maintaining a single Hadoop cluster containing petabytes of financial data.
Along with transaction data, the cluster could also store call records, email, chat, and web logs.
The data is analyzed to perform sentiment analysis, text processing, and pattern matching.
Hadoop use case
JP Morgan is a banking and financial giant with services in more than 100 nations.
With over 150 petabytes of data, 30,000 databases, and 3.5 billion log records, data is the oil for JP Morgan.
Storing vast volumes of unstructured data allows the company to collect web logs, transaction data, social media data, etc.
It uses the Hadoop framework for risk management and for detecting fraudulent transactions.
What is HDFS?
Hadoop Distributed File System (HDFS) is the storage layer of Hadoop that stores data in multiple data nodes.
In HDFS, data is divided into multiple blocks, and the blocks are stored on multiple nodes.
Each block holds 128 MB of data by default and is stored on multiple data nodes.
Example: 300 MB of data is divided into blocks of 128 MB, 128 MB, and 44 MB, which are stored on different datanodes.
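The 300 MB example above can be checked with a few lines of arithmetic. This is a minimal sketch and not HDFS code: `split_into_blocks` is a hypothetical helper, and the 128 MB default is simply the block size stated on the slide.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be divided into.

    Hypothetical helper for illustration only. Every block is full
    except possibly the last one, which holds the remainder.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


print(split_into_blocks(300))  # [128, 128, 44]
```

Note that the last block only occupies the space it needs (44 MB here), which is why the example shows three blocks of unequal size.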
HDFS Architecture
Master: Namenode and Secondary Namenode are the master daemons.
The Namenode is the master server. The Secondary Namenode keeps a checkpointed copy of the Namenode's metadata; it is a helper, not an automatic failover for the Namenode.
The Namenode holds metadata about the various Datanodes, their location, the size of each block, etc.
Metadata (name, replicas, ...): /home/foo/data, 3, ...
The Namenode keeps this metadata in RAM and on disk (edit log and Fsimage).
The Namenode executes file system namespace operations: opening, closing, and renaming files and directories (for example, File.txt).
The Secondary Namenode is responsible for maintaining a copy of the metadata on disk (edit log and Fsimage).
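One way to picture the relationship between the Fsimage and the edit log is as a snapshot plus a list of changes made since that snapshot. The sketch below is a toy model under that assumption: the operation names (`create`, `rename`, `delete`), the `checkpoint` function, and the `/home/foo/archive` path are illustrative and are not the HDFS on-disk format or API.

```python
def checkpoint(fsimage, edit_log):
    """Replay logged namespace operations onto a snapshot (toy model).

    fsimage: dict mapping file path -> replication factor.
    edit_log: list of tuples describing changes made since the snapshot.
    Returns a new, merged snapshot; the inputs are left unchanged.
    """
    merged = dict(fsimage)
    for op in edit_log:
        kind = op[0]
        if kind == "create":
            _, path, replicas = op
            merged[path] = replicas
        elif kind == "rename":
            _, old_path, new_path = op
            merged[new_path] = merged.pop(old_path)
        elif kind == "delete":
            _, path = op
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return merged


# Snapshot and log entries are made up for illustration.
fsimage = {"/home/foo/data": 3}
edit_log = [
    ("create", "/home/foo/File.txt", 3),
    ("rename", "/home/foo/data", "/home/foo/archive"),
]
print(checkpoint(fsimage, edit_log))
```

Merging the log into the snapshot from time to time keeps the log short, so that less has to be replayed when the metadata is loaded again.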
Slave: Datanodes are the slave daemons that store and maintain the data blocks.
Datanode 1: B1, B2
Datanode 2: B1, B3
Datanode 3: B2, B3
Datanode 4: B4, B2
Datanode 5: B4, B3
The client sends metadata operations to the Namenode. Metadata (name, replicas, ...): /home/foo/data, 3, ...
The Namenode sends block operations to the Datanodes.
Clients read data from and write data to the Datanodes, and blocks are replicated between Datanodes.
The Namenode responds to the client that the operation was successful.
Datanodes serve read and write requests from clients and perform block creation, deletion, and replication on instruction from the Namenode.
HDFS Write
Client to master: "Where can I write and store my data?"
The data (300 MB) is split into multiple blocks of up to 128 MB each: 128 MB, 128 MB, and 44 MB.
The master finds the datanodes available.
Rack 1: A1, A2, A3, A4
Rack 2: B1, B2, B3, B4
Rack 3: C1, C2, C3, C4
The 1st block of data is written to A3, B2, and B4.
Each data block is replicated thrice on different datanodes.
Similarly, the other 2 blocks are written to different datanodes.
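The placement in the write example (A3 on Rack 1, then B2 and B4 on Rack 2) follows a pattern: one replica on one rack and two replicas on a second rack. The sketch below mimics that pattern with a simplified, illustrative policy. `place_replicas` is a hypothetical function, the random choices stand in for whatever the master actually considers, and real rack-aware placement takes more factors into account.

```python
import random

# Rack layout copied from the slide.
RACKS = {
    "Rack 1": ["A1", "A2", "A3", "A4"],
    "Rack 2": ["B1", "B2", "B3", "B4"],
    "Rack 3": ["C1", "C2", "C3", "C4"],
}


def place_replicas(racks, rng):
    """Pick three datanodes for one block (simplified, illustrative policy).

    First replica: any node. Second replica: a node on a different rack.
    Third replica: another node on the same rack as the second.
    """
    first_rack = rng.choice(sorted(racks))
    first = rng.choice(racks[first_rack])
    other_racks = [rack for rack in sorted(racks) if rack != first_rack]
    second_rack = rng.choice(other_racks)
    second, third = rng.sample(racks[second_rack], 2)
    return [first, second, third]


rng = random.Random(0)  # fixed seed so repeated runs give the same layout
for block in ["block 1 (128 MB)", "block 2 (128 MB)", "block 3 (44 MB)"]:
    print(block, "->", place_replicas(RACKS, rng))
```

Spreading replicas over two racks means a block survives the loss of a whole rack, while keeping two replicas on the same rack limits traffic between racks.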
HDFS Read
Client to master: "I want to read my file."
The master finds the datanodes to read from.
The three blocks (128 MB, 128 MB, and 44 MB) are stored with three replicas each across Rack 1 (A1 to A4), Rack 2 (B1 to B4), and Rack 3 (C1 to C4).
The client reads the data from A2, A3, and B1.
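The read path can be sketched as a lookup followed by one read per block. In the toy model below, the block IDs and `plan_read` are hypothetical. Only the replicas of the first block (A3, B2, B4) and the three nodes read from (A2, A3, B1) come from the slides; the remaining replica locations are assumptions made for illustration.

```python
# Toy block map: block id -> datanodes holding a replica.
# Replica lists for blk_2 and blk_3 beyond their first entry are assumed.
BLOCK_LOCATIONS = {
    "blk_1": ["A3", "B2", "B4"],
    "blk_2": ["A2", "C1", "C3"],
    "blk_3": ["B1", "A1", "A4"],
}
FILE_BLOCKS = ["blk_1", "blk_2", "blk_3"]  # order of the blocks in the file


def plan_read(file_blocks, block_locations, dead_nodes=frozenset()):
    """Choose one live datanode per block, in file order."""
    plan = []
    for block in file_blocks:
        live = [node for node in block_locations[block] if node not in dead_nodes]
        if not live:
            raise RuntimeError(f"no live replica for {block}")
        plan.append((block, live[0]))
    return plan


print(plan_read(FILE_BLOCKS, BLOCK_LOCATIONS))
# If A3 is down, the first block is read from another replica instead.
print(plan_read(FILE_BLOCKS, BLOCK_LOCATIONS, dead_nodes={"A3"}))
```

Because every block has several replicas, the loss of a single datanode changes where a block is read from but not whether the file can be read.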
Importing data to HDFS
Relational databases (RDBMS, data warehouse): Sqoop is used to import data from relational databases into HDFS.
Streaming data (sensors, web servers): Flume is used to import streaming data from sensors and web servers into HDFS.
What is MapReduce?
MapReduce is a programming model to process large datasets in parallel on different nodes.
Data is processed simultaneously on different slave nodes (slave nodes 1 to 4), coordinated by a master node.
MapReduce Workflow
1. Input
2. Map tasks: Map()
3. Shuffle and sort
4. Reduce tasks: Reduce()
5. Output
MapReduce Example
Input:
Square Red Triangle Blue Circle Green
Square Green Triangle White Cube Blue
Cube Yellow Circle Red Cube Blue
Hexagon Green Square Blue Cube Yellow
Split step: each input line is passed to the map function as a separate split.
Map step:
Split 1: Square = 1, Red = 1, Triangle = 1, Blue = 1, Circle = 1, Green = 1
Split 2: Square = 1, Green = 1, Triangle = 1, White = 1, Cube = 1, Blue = 1
Split 3: Cube = 1, Yellow = 1, Circle = 1, Red = 1, Cube = 1, Blue = 1
Split 4: Hexagon = 1, Green = 1, Square = 1, Blue = 1, Cube = 1, Yellow = 1
Merge step (splits 1 and 2): Square = {1,1}, Red = {1}, Triangle = {1,1}, Blue = {1,1}, Circle = {1}, Green = {1,1}, White = {1}, Cube = {1}
Merge step (splits 3 and 4): Cube = {1,1,1}, Yellow = {1,1}, Circle = {1}, Red = {1}, Blue = {1,1}, Hexagon = {1}, Green = {1}, Square = {1}
Merge step (all splits): Square = {1,1,1}, Red = {1,1}, Triangle = {1,1}, Blue = {1,1,1,1}, Circle = {1,1}, Green = {1,1,1}, White = {1}, Cube = {1,1,1,1}, Yellow = {1,1}, Hexagon = {1}
Shuffle and sort step: Blue = {1,1,1,1}, Circle = {1,1}, Cube = {1,1,1,1}, Green = {1,1,1}, Hexagon = {1}, Red = {1,1}, Square = {1,1,1}, Triangle = {1,1}, White = {1}, Yellow = {1,1}
Reduce step:
Blue = 4
Circle = 2
Cube = 4
Green = 3
Hexagon = 1
Red = 2
Square = 3
Triangle = 2
White = 1
Yellow = 2
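The example above can be reproduced in plain Python. This is a single-process sketch of the map, shuffle-and-sort, and reduce steps, not Hadoop's MapReduce API; the function names are chosen here for illustration, and nothing in it runs in parallel.

```python
from collections import defaultdict

# The four input lines from the example, one per split.
LINES = [
    "Square Red Triangle Blue Circle Green",
    "Square Green Triangle White Cube Blue",
    "Cube Yellow Circle Red Cube Blue",
    "Hexagon Green Square Blue Cube Yellow",
]


def map_step(line):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Group the values by key and order the keys alphabetically."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_step(grouped):
    """Sum the list of ones collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for line in LINES for pair in map_step(line)]
counts = reduce_step(shuffle_and_sort(mapped))
print(counts)
```

In a real cluster each call to `map_step` could run on a different node, which is possible because no split depends on any other split.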
YARN – Yet Another Resource Negotiator
YARN was introduced in Hadoop 2.0 to solve issues in Hadoop 1.0 (MR 1) such as scalability, availability of nodes, and resource utilization.
YARN is the cluster resource management layer of Hadoop that schedules jobs and assigns resources, such as memory (RAM) and CPU, to running applications, for example a MapReduce application.
Diagram: a Client, the ResourceManager, and several NodeManagers hosting containers and App Masters, connected by arrows labelled Job Submission, Node Status, MapReduce Status, and Resource Request.
The client submits an application (a job request) to the ResourceManager.
Each NodeManager sends its status to the ResourceManager.
The ApplicationMaster contacts the related NodeManager.
A container executes the ApplicationMaster.
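To make the idea of assigning resources to running applications concrete, the sketch below models a resource manager that hands out containers based on the free memory each node reports. It is a toy greedy policy under stated assumptions: `allocate_containers`, the node names, and the memory figures are all hypothetical, and YARN's actual schedulers are considerably more involved.

```python
def allocate_containers(node_free_mb, requests_mb):
    """Greedy toy scheduler: try the node with the most free memory first.

    node_free_mb: dict of node name -> free memory in MB (from node status).
    requests_mb: list of container sizes requested by an application.
    Returns a list of (request_mb, node) pairs in request order; node is
    None when no node currently has enough free memory for the request.
    """
    free = dict(node_free_mb)  # work on a copy, leave the input untouched
    placements = []
    for size in requests_mb:
        node = max(free, key=free.get)
        if free[node] >= size:
            free[node] -= size
            placements.append((size, node))
        else:
            placements.append((size, None))  # the request has to wait
    return placements


# Made-up cluster state and requests for illustration.
status = {"nm1": 4096, "nm2": 2048}
print(allocate_containers(status, [2048, 2048, 2048, 1024]))
```

The point of the sketch is the division of labour: nodes report what they have, applications ask for what they need, and a central component matches the two.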
Bank case study
You own a Virtual Bank that generates a lot of customer transaction data and uses an RDBMS to store it.
But the bank’s data is increasing rapidly, and the RDBMS has become inefficient at handling such large volumes of data.
You need a solution to move your bank data from a traditional RDBMS to more flexible and scalable data storage.
What if I use the Hadoop Distributed File System (HDFS) to store the data?
HDFS can easily store large volumes of data. So, let me use Sqoop to move all the bank’s data from the RDBMS onto HDFS.
This will also allow us to analyze customer data using Sqoop commands. Now, let’s see how to do this.
Key Takeaways
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoop Tutorial |Simplilearn

 
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
How To Become an SEO Expert In 2023 | SEO Expert Tutorial | SEO For Beginners...
 
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
WordPress Tutorial for Beginners 2023 | What Is WordPress and How Does It Wor...
 
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
Blogging For Beginners 2023 | How To Create A Blog | Blogging Tutorial | Simp...
 
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
How To Start A Blog In 2023 | Pros And Cons Of Blogging | Blogging Tutorial |...
 
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
How to Increase Website Traffic ? | 10 Ways To Increase Website Traffic in 20...
 

Kürzlich hochgeladen

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 

Kürzlich hochgeladen (20)

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 

Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoop Tutorial | Simplilearn

  • 2-9. What’s in it for you? Need for Hadoop; What is Hadoop?; Hadoop Ecosystem; Hadoop Features; What is HDFS?; What is MapReduce?; What is YARN?; Bank case study
  • 10. Need for Hadoop: In today’s world, data is growing rapidly from heterogeneous sources such as social media, aviation, logistics, and e-commerce.
  • 11. Need for Hadoop: All this digital data is expected to reach 163 zettabytes by 2025 (1 ZB = 10^9 TB).
  • 12. Need for Hadoop: Companies face problems in storing and processing these vast volumes of data.
  • 13. Need for Hadoop: The solution is big data technologies such as
  • 14-16. What is Hadoop? An open-source framework to store and process huge volumes of data. It stores large volumes of data across multiple data nodes (DN1, DN2, DN3, DN4 in the diagram) and processes the data in parallel on those data nodes.
  • 17-19. Components of Hadoop: (1) HDFS – distributed data storage; (2) MapReduce – parallel data processing; (3) YARN – cluster resource management
  • 20. Hadoop Ecosystem (diagram labels): data collection and ingestion; workflow; Pig (scripting); Hive (SQL query); interactive analysis; machine learning; streaming; read/write access to data; Hadoop Distributed File System; cluster resource management; data processing; management and monitoring
  • 21. Hadoop Features: scalable, flexible, fault tolerant, distributed storage, cost effective, robust ecosystem
  • 22. Hadoop Features (flexible): Hadoop is flexible in storing any type of data, be it structured, semi-structured, or unstructured.
  • 23. Hadoop Features (scalable): As the volume of data grows, new node machines can easily be added to scale out the Hadoop cluster.
  • 24. Hadoop Features (fault tolerant): Data stored on HDFS is automatically replicated onto different data nodes, so the data stays available when a data node crashes.
  • 25. Hadoop Features (distributed storage): Hadoop supports distributed data storage, which allows the data to be processed faster.
  • 26. Hadoop Features (robust ecosystem): Hadoop has a robust ecosystem that suits the analytical needs of small and big organizations. It includes Spark, Pig, Hive, Mahout, etc.
  • 27. Hadoop Features (cost effective): Hadoop stores and processes data on a cluster of commodity hardware, resulting in a substantial reduction in the cost per terabyte of storage.
  • 28-29. Hadoop use case: Before the 2008 economic recession, every bank maintained a legacy data warehouse. Home mortgage details, credit card transactions, and other financial details of every customer were restricted to local database systems. Banks could not store and process data efficiently and failed to build a comprehensive risk portfolio for their customers.
  • 30-33. Hadoop use case: After the 2008 economic recession, most financial institutions and national monetary associations started maintaining a single Hadoop cluster containing petabytes of financial data. Along with transaction data, the cluster could also store call records, email, chat, and web logs. The data is analyzed to perform sentiment analysis, text processing, and pattern matching.
  • 34-36. Hadoop use case (JP Morgan): A banking and financial giant with services in more than 100 nations. With over 150 petabytes of data, 30,000 databases, and 3.5 billion log records, data is the oil for JP Morgan. Storing vast volumes of unstructured data allows the company to collect web logs, transaction data, social media data, etc. It uses the Hadoop framework for risk management and for detecting fraudulent transactions.
  • 37-38. What is HDFS? Hadoop Distributed File System (HDFS) is the storage layer of Hadoop. It stores big data across multiple data nodes (Datanode1 to Datanode4 in the diagram).
  • 39-40. What is HDFS? In HDFS, data is divided into multiple blocks, and the blocks are stored on multiple nodes. By default each block holds up to 128 MB, and each block is stored on multiple data nodes. For example, 300 MB of data is split into blocks of 128 MB, 128 MB, and 44 MB.
  • 41. HDFS Architecture: The Namenode is the master server. The Secondary Namenode is the checkpointing server: it keeps a backup copy of the metadata but does not take over automatically if the Namenode fails. Namenode and Secondary Namenode are the master daemons.
  • 42. HDFS Architecture: The Namenode holds metadata about the data stored on the various Datanodes: name, replicas, …, for example /home/foo/data, 3, …, as well as block locations, the size of each block, etc. The metadata is kept in RAM and on disk (edit log and fsimage).
  • 43. HDFS Architecture: The Namenode executes file system namespace operations such as opening, closing, and renaming files and directories (File.txt in the diagram).
  • 44. HDFS Architecture: The Secondary Namenode is responsible for maintaining a copy of the metadata on disk (edit log and fsimage).
  • 45. HDFS Architecture: Datanodes are the slave daemons that store and maintain the data blocks. In the diagram, Datanode 1 holds B1 and B2, Datanode 2 holds B1 and B3, and Datanode 3 holds B2 and B3, up to Datanode N.
  • 46-49. HDFS Architecture: The client sends metadata operations to the Namenode (Metadata (Name, replicas, …): /home/foo/data, 3, …) and block operations (read, write) to the Datanodes. Datanodes serve the client’s read and write requests and perform block creation, deletion, and replication on instruction from the Namenode. The client receives a response from the Namenode that the operation was successful. In the diagram, blocks B1 to B4 are replicated across Datanodes 1 to 5.
  • 50. HDFS Write: The client asks the master, "Where can I write and store my data?" The 300 MB of data is split into blocks of up to 128 MB each (128 MB, 128 MB, and 44 MB).
  • 51. HDFS Write: The master finds the datanodes that are available.
  • 52. HDFS Write: The first block of data is written to datanodes A3, B2, and B4 (the diagram shows Rack 1 to Rack 3 with datanodes A1-A4, B1-B4, and C1-C4). Each data block is replicated three times on different datanodes.
  • 53. HDFS Write: Similarly, the other two blocks (128 MB and 44 MB) are written to different datanodes, three copies each.
  • 54-56. HDFS Read: The client tells the master, "I want to read my file." The master finds the datanodes to read from, and the data is read from A2, A3, and B1.
  • 57. Importing data to HDFS: Sqoop is used to import data from relational databases (RDBMS, data warehouse) into HDFS.
  • 58. Importing data to HDFS: Flume is used to import streaming data from sensors and web servers into HDFS.
  • 59-60. What is MapReduce? MapReduce is a programming model for processing large datasets in parallel on different nodes. Data is processed simultaneously on different slave nodes (slave nodes 1 to 4 under a master node in the diagram).
  • 63-65. MapReduce Workflow: Input -> Map() (map tasks) -> shuffle and sort -> Reduce() (reduce tasks) -> Output
  • 66. MapReduce Example, input: Square Red Triangle Blue Circle Green Square Green Triangle White Cube Blue Cube Yellow Circle Red Cube Blue Hexagon Green Square Blue Cube Yellow
  • 67. MapReduce Example, split step: The input is split into parts, and each part is passed to the map function.
  • 68. MapReduce Example, map step: Each word is emitted with a count of 1: Square = 1, Red = 1, Triangle = 1, Blue = 1, Circle = 1, Green = 1, and so on for all 24 words.
  • 69. MapReduce Example, merge step: The pairs are first merged into two partial groups. First group: Square = {1,1}, Red = {1}, Triangle = {1,1}, Blue = {1,1}, Circle = {1}, Green = {1,1}, White = {1}, Cube = {1}. Second group: Cube = {1,1,1}, Yellow = {1,1}, Circle = {1}, Red = {1}, Blue = {1,1}, Hexagon = {1}, Green = {1}, Square = {1}. The two groups are then merged: Square = {1,1,1}, Red = {1,1}, Triangle = {1,1}, Blue = {1,1,1,1}, Circle = {1,1}, Green = {1,1,1}, White = {1}, Cube = {1,1,1,1}, Yellow = {1,1}, Hexagon = {1}.
  • 70. MapReduce Example, shuffle and sort step: The keys are sorted: Blue = {1,1,1,1}, Circle = {1,1}, Cube = {1,1,1,1}, Green = {1,1,1}, Hexagon = {1}, Red = {1,1}, Square = {1,1,1}, Triangle = {1,1}, White = {1}, Yellow = {1,1}. Reduce step: The values for each key are summed: Blue = 4, Circle = 2, Cube = 4, Green = 3, Hexagon = 1, Red = 2, Square = 3, Triangle = 2, White = 1, Yellow = 2.
  • 71-73. YARN – Yet Another Resource Negotiator: YARN was introduced in Hadoop 2.0 to solve issues in Hadoop 1.0 (MR 1) such as scalability, availability of nodes, and resource utilization. YARN is the cluster resource management layer of Hadoop. It schedules jobs and assigns resources such as memory (RAM) and CPU to running applications, for example a MapReduce application.
  • 74-75. YARN – Yet Another Resource Negotiator: The client submits an application (a job request) to the ResourceManager.
  • 76. YARN – Yet Another Resource Negotiator (diagram): the client, the ResourceManager, and several NodeManagers that host containers and App Masters.
  • 77. YARN – Yet Another Resource Negotiator: Each NodeManager sends its status to the ResourceManager (node status).
  • 78. YARN – Yet Another Resource Negotiator: The ApplicationMaster contacts the related NodeManager (MapReduce status).
  • 79. YARN – Yet Another Resource Negotiator: The container executes the ApplicationMaster (resource request).
  • 80. Bank case study: You own a virtual bank that generates a lot of customer transaction data and uses an RDBMS to store it.
  • 81. Bank case study: But the bank’s data is rapidly increasing, and the RDBMS has become inefficient at handling such large volumes of data.
  • 82. Bank case study: You need a solution to move your bank data from the traditional RDBMS to more flexible and scalable data storage.
  • 83. Bank case study: What if I use Hadoop Distributed File System (HDFS) to store the data?
  • 84. Bank case study: HDFS can easily store large volumes of data. So, let me use Sqoop to move all the bank’s data from the RDBMS onto HDFS.
  • 85. Bank case study: This will also allow us to analyze customer data using Sqoop commands. Now, let’s see how to do this.

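The MapReduce example on slides 66 to 70 can be reproduced with a small single-process simulation. This is a sketch of the programming model only, not the Hadoop API: the function names are invented for illustration, and dividing the input into four records of six words is an assumption, since the slides do not show where the splits fall.

```python
from collections import defaultdict

# The 24 input words from slide 66, divided into four records (assumed split).
RECORDS = [
    "Square Red Triangle Blue Circle Green",
    "Square Green Triangle White Cube Blue",
    "Cube Yellow Circle Red Cube Blue",
    "Hexagon Green Square Blue Cube Yellow",
]


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word, 1) for word in record.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort step: group the values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reduce step: sum the list of ones collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return reduce_phase(shuffle_and_sort(pairs))


counts = word_count(RECORDS)
print(counts)
# {'Blue': 4, 'Circle': 2, 'Cube': 4, 'Green': 3, 'Hexagon': 1,
#  'Red': 2, 'Square': 3, 'Triangle': 2, 'White': 1, 'Yellow': 2}
```

In a real cluster the map calls would run on different nodes and the grouped keys would be partitioned across several reducers; here everything runs in one process so the data flow stays easy to follow.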