www.edureka.co/big-data-and-hadoop
Hadoop: Bulk Loading with MapReduce
View Big Data and Hadoop Course at: http://www.edureka.co/big-data-and-hadoop
For more details please contact us:
US : 1800 275 9730 (toll free)
INDIA : +91 88808 62004
Email Us : sales@edureka.co
For Queries:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
Slide 2
Objectives
At the end of this module, you will be able to:
Analyze different use cases where MapReduce is used
Differentiate between the traditional way and the MapReduce way
Learn about the Hadoop 2.x MapReduce architecture and components
Understand the execution flow of a YARN MapReduce application
Implement basic MapReduce concepts
Run a MapReduce program
Slide 3
Where Is MapReduce Used?
Weather Forecasting
• Problem Statement:
» Finding the maximum temperature recorded in a year.
HealthCare
• Problem Statement:
» De-identifying personal health information.
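The weather problem statement above fits the MapReduce shape directly: a map step emits (year, temperature) pairs and a reduce step keeps the maximum per year. The following is a minimal in-memory sketch in plain Python, not Hadoop code. The `year,temperature` record format, the sample readings, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: one "year,temperature" reading per line (format assumed).
READINGS = [
    "1949,111",
    "1949,78",
    "1950,0",
    "1950,22",
    "1950,-11",
]


def map_reading(line):
    """Map step: parse one line and emit a (year, temperature) pair."""
    year, temp = line.split(",")
    return year, int(temp)


def reduce_max(year, temps):
    """Reduce step: keep the highest temperature seen for one year."""
    return year, max(temps)


def max_temperature_by_year(lines):
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for line in lines:
        year, temp = map_reading(line)
        grouped[year].append(temp)
    return dict(reduce_max(year, temps) for year, temps in grouped.items())


result = max_temperature_by_year(READINGS)
print(result)
```

In a real job the grouping would be done by the framework between the map and reduce tasks; here a dictionary plays that role.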
Slide 4
Where Is MapReduce Used?
(Diagram: mind map centered on MapReduce, with the following branches.)
Features: Large Scale Distributed Model, Parallel Programming, A Program Model
Used in: Classification, Analytics, Recommendation, Index and Search
Function: Map, Reduce
Design Pattern: Classification (Eg: Top N records), Analytics (Eg: Join, Selection), Recommendation (Eg: Sort), Summarization (Eg: Inverted Index)
Implemented: Google, Apache Hadoop
For: HDFS, Pig, Hive, HBase
Slide 5
MapReduce Paradigm
The Overall MapReduce Word Count Process
Input: Deer Bear River / Car Car River / Deer Car Bear
Splitting (K1,V1): "Deer Bear River" | "Car Car River" | "Deer Car Bear"
Mapping (List(K2,V2)): Deer, 1; Bear, 1; River, 1 | Car, 1; Car, 1; River, 1 | Deer, 1; Car, 1; Bear, 1
Shuffling (K2,List(V2)): Bear, (1,1) | Car, (1,1,1) | Deer, (1,1) | River, (1,1)
Reducing (List(K3,V3)): Bear, 2 | Car, 3 | Deer, 2 | River, 2
Final Result: Bear, 2; Car, 3; Deer, 2; River, 2
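The word count stages on this slide can be traced in a few lines of plain Python. This is a sketch of the data flow only, not Hadoop code: the three phases run in one process, each input line is treated as one split, and the function names are hypothetical.

```python
from collections import defaultdict

# One split per line, as in the slide's example input.
INPUT = ["Deer Bear River", "Car Car River", "Deer Car Bear"]


def map_phase(split):
    """K1,V1 -> List(K2,V2): emit (word, 1) for every word in the split."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(mapped):
    """List(K2,V2) -> K2,List(V2): group the values by key, keys sorted."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            grouped[word].append(one)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """K2,List(V2) -> List(K3,V3): sum the counts for each word."""
    return [(word, sum(ones)) for word, ones in grouped.items()]


mapped = [map_phase(split) for split in INPUT]
grouped = shuffle_phase(mapped)
counts = reduce_phase(grouped)
print(counts)  # [('Bear', 2), ('Car', 3), ('Deer', 2), ('River', 2)]
```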
Slide 6
MapReduce Application Execution
Executing MapReduce Application on YARN
Slide 7
YARN MR Application Execution Flow
MapReduce Job Execution
» Job Submission
» Job Initialization
» Task Assignment
» Memory Assignment
» Status Updates
» Failure Recovery
Slide 8
YARN MR Application Execution Flow
11. The tasks are executed.
12. If the job has any reducers, the Application Master again requests containers and asks the Node Manager to start them.
13. The output of all the map tasks is passed to the reducers, and the reducers are executed.
14. Once the job has finished, the Application Master notifies the Resource Manager and the client library.
15. The Application Master shuts down.
Slide 9
Hadoop 2.x: YARN Workflow
(Diagram: the Resource Manager, made up of the Scheduler and the Applications Manager (AsM), sits above twelve Node Managers. App Master 1 is shown with Containers 1.1 and 1.2; App Master 2 is shown with Containers 2.1, 2.2, and 2.3.)
Slide 17
Summary: Application Workflow
Execution Sequence (RM = Resource Manager, NM = Node Manager, AM = Application Master):
1. Client submits an application
2. RM allocates a container to start the AM
3. AM registers with RM
4. AM requests containers from RM
5. AM notifies NM to launch containers
6. Application code is executed in the container
7. Client contacts RM/AM to monitor the application’s status
8. AM unregisters from RM
(Diagram: steps 1 to 8 drawn as arrows among Client, RM, NM, and AM.)
Slide 18
Input Splits
(Diagram: INPUT DATA is divided in two ways. Physical division: HDFS blocks. Logical division: input splits.)
Slide 19
Relation Between Input Splits and HDFS Blocks
• Logical records do not fit neatly into HDFS blocks.
• Here the logical records are lines, and some lines cross block boundaries.
• The first split contains line 5 even though that line spans two blocks.
(Diagram: a file of lines 1 to 11 laid over HDFS blocks, with the block boundaries marked and the file divided into three splits.)
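One way to see how a line that crosses a block boundary ends up in exactly one split is to model the reading rule directly. The sketch below assumes newline-terminated records and splits that coincide with fixed-size blocks; it is a simplified model of the convention line-oriented record readers follow, not Hadoop's implementation, and `make_splits` and `read_split` are hypothetical names.

```python
def make_splits(size, block_size):
    """Return (start, end) byte ranges, one per block-sized split."""
    return [(s, min(s + block_size, size)) for s in range(0, size, block_size)]


def read_split(data, start, end):
    """Return the complete lines owned by the byte range [start, end)."""
    pos = start
    if start != 0:
        # Skip to the next line start: the line in progress at `start`
        # belongs to the previous split.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # A line that begins at or before `end` belongs to this split,
    # even if it runs past `end` into the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop].decode())
        pos = stop + 1
    return lines


data = b"aaaa\nbbbb\ncccc\ndddd\n"  # four 5-byte records, 20 bytes in total
splits = make_splits(len(data), block_size=8)
per_split = [read_split(data, start, end) for start, end in splits]
print(per_split)  # [['aaaa', 'bbbb'], ['cccc', 'dddd'], []]
```

The record `bbbb` straddles the first block boundary at byte 8, yet it is read whole by the first split and skipped by the second, so every record is processed once.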
Slide 25
MapReduce Job Submission Flow
Input data is distributed to nodes
Each map task works on a “split” of data
Mapper outputs intermediate data
Data is exchanged between nodes in a “shuffle” process
Intermediate data with the same key goes to the same reducer
Reducer output is stored
(Diagram: INPUT DATA feeds Map tasks on Node 1 and Node 2, whose output is shuffled to Reduce tasks on Node 1 and Node 2.)
Slide 26
Getting Data to the Mapper
(Diagram: two input files are divided into four input splits; each split is read by its own RecordReader, which feeds one Mapper, and each Mapper emits intermediates.)
Slide 27
Partition and Shuffle
(Diagram: four Mappers emit intermediates; a Partitioner on each Mapper's output routes them, and the regrouped intermediates go to three Reducers.)
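The partitioning step can be sketched as a hash of the key taken modulo the number of reducers, which is what guarantees that all values for one key reach the same reducer. In this plain-Python sketch `zlib.crc32` stands in for the key's hash code (an assumption chosen because it is deterministic across runs), and `partition` and `shuffle` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer index from the key's hash (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair to the reducer chosen by partition()."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            reducers[partition(key, num_reducers)][key].append(value)
    return reducers


# Intermediate output of three mappers, reusing the word count example.
mapper_outputs = [
    [("Deer", 1), ("Bear", 1), ("River", 1)],
    [("Car", 1), ("Car", 1), ("River", 1)],
    [("Deer", 1), ("Car", 1), ("Bear", 1)],
]
reducers = shuffle(mapper_outputs, num_reducers=3)
for index, received in enumerate(reducers):
    print(index, dict(received))
```

Which reducer a given key lands on depends on the hash, but each key appears in exactly one reducer's input.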
Slide 28
Input Format
(Diagram: an InputFormat divides an input file into input splits; each input split is read by a RecordReader that feeds a Mapper, which emits intermediates. The figure shows this once for a single split and once for three splits.)
Slide 29
Input Format – Class Hierarchy
org.apache.hadoop.mapreduce
InputFormat<K,V>
    FileInputFormat<K,V>
        CombineFileInputFormat<K,V>
        TextInputFormat
        KeyValueTextInputFormat
        NLineInputFormat
        SequenceFileInputFormat<K,V>
            SequenceFileAsBinaryInputFormat
            SequenceFileAsTextInputFormat
            SequenceFileInputFilter<K,V>
    ComposableInputFormat<K,V> <<interface>>
        CompositeInputFormat<K,V>
    DBInputFormat<T>
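To make the difference between two of these formats concrete, the sketch below mimics the record shapes they hand to a mapper: TextInputFormat presents each line keyed by its byte offset, while KeyValueTextInputFormat splits each line at the first separator (a tab by default). This is plain Python that imitates the record shapes only; the function names and sample data are hypothetical.

```python
def text_input_records(data):
    """Yield (byte offset, line), the record shape of TextInputFormat."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def key_value_text_records(data, sep="\t"):
    """Yield (key, value) split at the first separator in each line."""
    for line in data.splitlines():
        key, _, value = line.partition(sep)
        yield key, value


sample = "k1\tv1\nk2\tv2\n"
text_records = list(text_input_records(sample))
kv_records = list(key_value_text_records(sample))
print(text_records)  # [(0, 'k1\tv1'), (6, 'k2\tv2')]
print(kv_records)    # [('k1', 'v1'), ('k2', 'v2')]
```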
Slide 30
What is Bulk Load
• A process or method provided by DBMSs to load multiple rows of data into a database table.
• A way to load data (typically into a database) in "large chunks".
• Loads hundreds, thousands, or millions of records in a short period of time.
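The general idea of loading rows in large chunks can be illustrated with SQLite from the Python standard library. This is a sketch of the generic DBMS notion described above, not the HBase bulk-load path used in the demo; the `readings` table, its columns, the batch size, and the `bulk_load` helper are all assumptions made for illustration.

```python
import sqlite3


def bulk_load(conn, rows, batch_size=1000):
    """Insert rows in large batches, with one transaction per batch."""
    loaded = 0
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        with conn:  # commits once per batch instead of once per row
            conn.executemany(
                "INSERT INTO readings (id, value) VALUES (?, ?)", batch
            )
        loaded += len(batch)
    return loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value TEXT)")
rows = [(i, f"value-{i}") for i in range(10_000)]
loaded = bulk_load(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(loaded, count)  # 10000 10000
```

Grouping many rows into one statement batch and one commit is what separates a bulk load from row-by-row inserts; the right batch size depends on the database and the data.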
Slide 31
Demo
Demo: Bulk Load with MR
Bulk Loading Into HBase With MapReduce

Weitere ähnliche Inhalte

Was ist angesagt?

Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015Edureka!
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big DataEdureka!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!Edureka!
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopEdureka!
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce FrameworkEdureka!
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And HadoopEdureka!
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsSkillspeed
 
Why Talend for Big Data?
Why Talend for Big Data?Why Talend for Big Data?
Why Talend for Big Data?Edureka!
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersEdureka!
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Edureka!
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Edureka!
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend Edureka!
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopIOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopLeons Petražickis
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 

Was ist angesagt? (20)

Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015Webinar: Ways to Succeed with Hadoop in 2015
Webinar: Ways to Succeed with Hadoop in 2015
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use Hadoop
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And Hadoop
 
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
 
Why Talend for Big Data?
Why Talend for Big Data?Why Talend for Big Data?
Why Talend for Big Data?
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-Programmers
 
Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)Hadoop Adminstration with Latest Release (2.0)
Hadoop Adminstration with Latest Release (2.0)
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopIOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 

Andere mochten auch

Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityEdureka!
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop ClusterEdureka!
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to HadoopEdureka!
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksEdureka!
 
Cloud Computing with AWS
Cloud Computing with AWSCloud Computing with AWS
Cloud Computing with AWSEdureka!
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 

Andere mochten auch (9)

Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High Availability
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Top 5 Hadoop Admin Tasks
Top 5 Hadoop Admin TasksTop 5 Hadoop Admin Tasks
Top 5 Hadoop Admin Tasks
 
Cloud Computing with AWS
Cloud Computing with AWSCloud Computing with AWS
Cloud Computing with AWS
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 

Ähnlich wie Bulk Loading Into HBase With MapReduce

XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map ReduceEdureka!
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins Edureka!
 
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaMapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaEdureka!
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
Apache Storm
Apache StormApache Storm
Apache StormEdureka!
 
Hadoop and Big Data Overview
Hadoop and Big Data OverviewHadoop and Big Data Overview
Hadoop and Big Data OverviewPrabhu Thukkaram
 
Hadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingHadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingNandan Kumar
 
HARD COPY REPORT CDAC
HARD COPY REPORT CDACHARD COPY REPORT CDAC
HARD COPY REPORT CDACSarthak Dubey
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map ReduceUrvashi Kataria
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...Wolfgang Gentzsch
 
UberCloud - From Project to Product
UberCloud - From Project to ProductUberCloud - From Project to Product
UberCloud - From Project to ProductThe UberCloud
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Guidelhhghghine document final
Guidelhhghghine document finalGuidelhhghghine document final
Guidelhhghghine document finalnanirao686
 
Map Reduce along with Amazon EMR
Map Reduce along with Amazon EMRMap Reduce along with Amazon EMR
Map Reduce along with Amazon EMRABC Talks
 
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET - Weather Log Analysis based on Hadoop TechnologyIRJET - Weather Log Analysis based on Hadoop Technology
IRJET - Weather Log Analysis based on Hadoop TechnologyIRJET Journal
 

Ähnlich wie Bulk Loading Into HBase With MapReduce (20)

XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaMapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
 
UDP Report
UDP ReportUDP Report
UDP Report
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
BdT 3.1-3.6
BdT 3.1-3.6BdT 3.1-3.6
BdT 3.1-3.6
 
Hadoop and Big Data Overview
Hadoop and Big Data OverviewHadoop and Big Data Overview
Hadoop and Big Data Overview
 
E031201032036
E031201032036E031201032036
E031201032036
 
Hadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingHadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch training
 
HARD COPY REPORT CDAC
HARD COPY REPORT CDACHARD COPY REPORT CDAC
HARD COPY REPORT CDAC
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)Big Data & Hadoop. Simone Leo (CRS4)
Big Data & Hadoop. Simone Leo (CRS4)
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
The UberCloud - From Project to Product - From HPC Experiment to HPC Marketpl...
 
UberCloud - From Project to Product
UberCloud - From Project to ProductUberCloud - From Project to Product
UberCloud - From Project to Product
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Guidelhhghghine document final
Guidelhhghghine document finalGuidelhhghghine document final
Guidelhhghghine document final
 
Map Reduce along with Amazon EMR
Map Reduce along with Amazon EMRMap Reduce along with Amazon EMR
Map Reduce along with Amazon EMR
 
IRJET - Weather Log Analysis based on Hadoop Technology
IRJET - Weather Log Analysis based on Hadoop TechnologyIRJET - Weather Log Analysis based on Hadoop Technology
IRJET - Weather Log Analysis based on Hadoop Technology
 

Mehr von Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

Mehr von Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Bulk Loading Into HBase With MapReduce

  • 1. www.edureka.co/big-data-and-hadoop. Hadoop: Bulk Loading with MapReduce. View the Big Data and Hadoop course at: http://www.edureka.co/big-data-and-hadoop. For more details, please contact us: US: 1800 275 9730 (toll free); India: +91 88808 62004; email: sales@edureka.co. For queries: post on Twitter @edurekaIN with #askEdureka, or post on Facebook /edurekaIN.
  • 2. Objectives. At the end of this module, you will be able to: analyze different use cases where MapReduce is used; differentiate between the traditional way and the MapReduce way; learn about the Hadoop 2.x MapReduce architecture and components; understand the execution flow of a YARN MapReduce application; implement basic MapReduce concepts; run a MapReduce program.
  • 3. Where Is MapReduce Used? Weather forecasting. Problem statement: find the maximum temperature recorded in a year. Healthcare. Problem statement: de-identify personal health information.
  • 4. Where Is MapReduce Used? (mind map of MapReduce). Features: large-scale distributed model; parallel programming; a programming model. Used in: classification, analytics, recommendation, index and search. Functions: Map, Reduce. Design patterns: classification (e.g. top N records), analytics (e.g. join, selection), recommendation (e.g. sort), summarization (e.g. inverted index). Implemented by: Google, Apache Hadoop. For: HDFS, Pig, Hive, HBase.
  • 5. MapReduce Paradigm: The Overall MapReduce Word Count Process. Stages: Input, Splitting, Mapping, Shuffling, Reducing, Final Result. Input: "Deer Bear River", "Car Car River", "Deer Car Bear". Splitting (K1, V1): each line becomes one split. Mapping, List(K2, V2): (Deer, 1), (Bear, 1), (River, 1); (Car, 1), (Car, 1), (River, 1); (Deer, 1), (Car, 1), (Bear, 1). Shuffling, K2, List(V2): Bear, (1,1); Car, (1,1,1); Deer, (1,1); River, (1,1). Reducing, List(K3, V3): Bear, 2; Car, 3; Deer, 2; River, 2. Final result: Bear, 2; Car, 3; Deer, 2; River, 2.
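The word count flow on this slide can be traced in a few lines of plain Python. This is only an in-memory sketch of the map, shuffle, and reduce stages, not Hadoop code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapping: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffling: group all values that share a key, giving K2 -> List(V2)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key, values):
    """Reducing: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in order over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
print(word_count(lines))  # {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

In a real job each stage runs on different machines; here everything runs in one process so the data shapes at each stage are easy to inspect.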
  • 6. MapReduce Application Execution: executing a MapReduce application on YARN.
  • 7. YARN MR Application Execution Flow. MapReduce job execution: job submission; job initialization; task assignment; memory assignment; status updates; failure recovery.
  • 8. YARN MR Application Execution Flow. 11. The tasks are executed. 12. If the job has any reducers, the Application Master again asks the Node Manager to allocate and start a container. 13. The output of all the maps is given to the reducers, and the reducers are executed. 14. Once the job has finished, the Application Master notifies the Resource Manager and the client library. 15. The Application Master shuts down.
  • 9. Hadoop 2.x: YARN Workflow (diagram). Resource Manager: Scheduler and Applications Manager (AsM). Node Managers (twelve in the diagram) host App Master 1, App Master 2, and Containers 1.1, 1.2, 2.1, 2.2, and 2.3.
  • 17. Summary: Application Workflow (actors: Client, RM, NM, AM). Execution sequence: 1. Client submits an application. 2. RM allocates a container to start the AM. 3. AM registers with the RM. 4. AM requests containers from the RM. 5. AM notifies the NM to launch the containers. 6. Application code is executed in the containers. 7. Client contacts the RM/AM to monitor the application's status. 8. AM unregisters from the RM.
  • 18. Input Splits. Input data is divided in two ways: HDFS blocks are the physical division; input splits are the logical division.
  • 19. Relation Between Input Splits and HDFS Blocks (diagram: a file of lines 1 to 11, with block boundaries and split boundaries marked). Logical records do not fit neatly into HDFS blocks. Here the logical records are lines, and some lines cross block boundaries. The first split contains line 5 even though that line spans two blocks.
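One way to see how a line that crosses a boundary still ends up in exactly one split is the sketch below. It assumes newline-delimited records and a simple convention: a split that does not start at offset 0 skips its first (possibly partial) line, and every split keeps reading until it has finished the last line that starts at or before its end offset. This is similar in spirit to how line-oriented readers are usually described, but it is plain Python over a byte string, not Hadoop's implementation; `read_split` and `make_splits` are hypothetical names.

```python
def make_splits(total_size, split_size):
    """Cut [0, total_size) into (start, length) ranges of at most split_size bytes."""
    return [(start, min(split_size, total_size - start))
            for start in range(0, total_size, split_size)]


def read_split(data, start, length):
    """Return the whole lines that belong to the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # The line in progress at `start` belongs to the previous split, so skip it.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # Read every line that starts at or before `end`, even if it finishes after it.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        records.append(data[pos:line_end].decode("utf-8"))
        pos = line_end + 1
    return records


# Eleven 8-byte lines (88 bytes) cut into 30-byte splits, so lines straddle boundaries.
data = "".join(f"line-{i:02d}\n" for i in range(1, 12)).encode("utf-8")
for start, length in make_splits(len(data), 30):
    print(start, read_split(data, start, length))
```

With these sizes, `line-04` starts at byte 24 and ends past the first boundary at byte 30, yet it is returned by the first split only; the second split begins with `line-05`.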
  • 25. MapReduce Job Submission Flow (diagram: INPUT DATA; Map on Node 1 and Node 2; Reduce on Node 1 and Node 2). Input data is distributed to nodes. Each map task works on a "split" of data. Each mapper outputs intermediate data. Data is exchanged between nodes in a "shuffle" process. Intermediate data with the same key goes to the same reducer. Reducer output is stored.
  • 26. Getting Data to the Mapper (diagram). Input files are divided into input splits; each input split is read by a RecordReader, which feeds a Mapper; each Mapper emits intermediates.
  • 27. Partition and Shuffle (diagram: four Mappers, three Reducers). Each Mapper emits intermediates, which pass through a Partitioner; the partitioned intermediates are then shuffled to the Reducers.
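The partitioning step can be sketched as "hash the key, take it modulo the number of reducers", which is what guarantees that all intermediates for one key reach the same reducer. The sketch below uses CRC32 as a stand-in hash so the result is stable between runs; the `partition` and `shuffle` functions and the sample mapper outputs are assumptions for illustration, not Hadoop's partitioner.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer index for a key: stable hash modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair from every mapper to its reducer's input."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    return [dict(r) for r in reducer_inputs]


# Four mappers and three reducers, as in the diagram.
mapper_outputs = [
    [("Deer", 1), ("Bear", 1), ("River", 1)],
    [("Car", 1), ("Car", 1), ("River", 1)],
    [("Deer", 1), ("Car", 1), ("Bear", 1)],
    [("River", 1), ("Bear", 1)],
]
for index, grouped in enumerate(shuffle(mapper_outputs, 3)):
    print(index, grouped)
```

Which reducer a given word lands on depends on the hash, but each word always appears in exactly one reducer's input, with all of its values collected there.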
  • 28. Input Format (diagram). The InputFormat divides an input file into input splits; each input split has a Record Reader, which feeds a Mapper that emits intermediates.
  • 29. Input Format: Class Hierarchy (package org.apache.hadoop.mapreduce). InputFormat<K,V> (marked <<interface>> on the slide): FileInputFormat<K,V>, ComposableInputFormat<K,V>, CompositeInputFormat<K,V>, DBInputFormat<T>. FileInputFormat<K,V>: CombineFileInputFormat<K,V>, TextInputFormat, KeyValueTextInputFormat, NLineInputFormat, SequenceFileInputFormat<K,V>. SequenceFileInputFormat<K,V>: SequenceFileAsBinaryInputFormat, SequenceFileAsTextInputFormat, SequenceFileInputFilter<K,V>.
  • 30. What Is Bulk Load? A process or method provided by DBMSes to load multiple rows of data into a database table. A way to load data (typically into a database) in "large chunks". Loads hundreds, thousands, or millions of records in a short period of time.
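As a small, generic illustration of loading rows in "large chunks", the sketch below inserts 10,000 rows into an in-memory SQLite table in batches of 1,000, with one transaction per batch. SQLite stands in for "a database" here only because it ships with Python; this is not HBase bulk loading, and the `readings` table, its columns, and the `bulk_load` helper are hypothetical.

```python
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield lists of at most `size` rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_load(conn, rows, chunk_size=1000):
    """Insert rows chunk by chunk; each chunk is committed as one transaction."""
    loaded = 0
    for batch in chunks(rows, chunk_size):
        with conn:  # commits on success, rolls back if the batch fails
            conn.executemany(
                "INSERT INTO readings (station, temperature) VALUES (?, ?)",
                batch,
            )
        loaded += len(batch)
    return loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, temperature REAL)")

# Synthetic rows, generated lazily so the full data set is never held in memory.
rows = ((f"station-{i % 10}", 20.0 + i % 15) for i in range(10_000))
print(bulk_load(conn, rows))  # 10000
```

Batching many rows into one statement call and one commit avoids paying per-row round-trip and transaction overhead, which is the general idea behind bulk loading.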