SlideShare ist ein Scribd-Unternehmen logo
1 von 13
MapReduce
Introduction to MapReduce
• MapReduce is a programming model and an
associated implementation for processing
and generating large data sets with a
parallel, distributed algorithm on a cluster.
• Map() procedure that performs filtering and
sorting.
• Reduce() procedure that performs a
summary operation (such as statistical
operations)
What is MapReduce in Hadoop?
• Hadoop is a free, Java-based programming framework that supports
the processing of large data sets in a distributed computing
environment. It is part of the Apache project sponsored by the
Apache Software Foundation.
• MapReduce is defined as the framework of Hadoop, which is used
to process a huge amount of data parallelly on large clusters of
commodity hardware in a reliable manner. It allows the application
to store the data in the distributed form and process large dataset
across groups of computers using simple programming models
Why use MapReduce?
MapReduce Architecture
How MapReduce in Hadoop works?
Steps of MapReduce in Hadoop
• Map stage − The map or mapper’s job is to process the input
data. Generally, the input data is in the form of file or directory
and is stored in the Hadoop file system (HDFS). The input file is
passed to the mapper function line by line. The mapper
processes the data and creates several small chunks of data.
• Reduce stage − This stage is the combination of
the Shuffle stage and the Reduce stage. The Reducer’s job is to
process the data that comes from the mapper. After processing,
it produces a new set of output, which will be stored in the
HDFS.
Algorithm Behind MapReduce
MapReduce Jobs
Application Master
• Responsible for the
execution of a single
application or MapReduce
job
• Divides job requests into
tasks and assigns them to
Node-Managers running on
the slave node
Node-Manager
• Has many dynamic
resource containers
which executes each
active map or reduce
task
• Communicates regularly
with the Application
Master
MapReduce jobs workflow
Advantage of MapReduce
• Fault tolerance: It can handle failures without downtime.
Speed: It splits, shuffles, and reduces the unstructured data in a short
time.
• Cost-effective: Hadoop MapReduce has a scale-out feature that enables
users to process or store the data in a cost-effective manner.
• Scalability: It provides a highly scalable framework. MapReduce allows
users to run applications from many nodes.
• Parallel processing: As Hadoop storage data in the distributed file system
and the MapReduce program’s working, it divides tasks task map and
reduce and that could execute in parallel. And again, because of the
parallel execution, it reduces the entire run time.
Limitations Of MapReduce
• MapReduce cannot cache the intermediate data in memory
for a further requirement which diminishes the performance
of Hadoop.
• It is only suitable for Batch Processing of a Huge amounts
of Data, and it is not flexible, means the MapReduce framework
is rigid.
• A lot of manual coding is required, even for common operations
such as join, filter, projection, aggregates, sorting, distinct...
• Semantics are hidden inside the map and reduce functions, so it
is difficult to maintain, extend and optimize them.
Mapreduce Hadop.pptx

Weitere ähnliche Inhalte

Ähnlich wie Mapreduce Hadop.pptx

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
Nisanth Simon
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Christopher Pezza
 

Ähnlich wie Mapreduce Hadop.pptx (20)

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Anju
AnjuAnju
Anju
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
Hadoop
HadoopHadoop
Hadoop
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Analytics 3
Analytics 3Analytics 3
Analytics 3
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Hadoop 80hr v1.0
Hadoop 80hr v1.0Hadoop 80hr v1.0
Hadoop 80hr v1.0
 
hadoop.pptx
hadoop.pptxhadoop.pptx
hadoop.pptx
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering students
 

Kürzlich hochgeladen

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Kürzlich hochgeladen (20)

A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 

Mapreduce Hadop.pptx

  • 2. Introduction to MapReduce • MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. • Map() procedure that performs filtering and sorting. • Reduce() procedure that performs a summary operation (such as statistical operations)
  • 3. What is MapReduce in Hadoop? • Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. • MapReduce is defined as the framework of Hadoop, which is used to process a huge amount of data parallelly on large clusters of commodity hardware in a reliable manner. It allows the application to store the data in the distributed form and process large dataset across groups of computers using simple programming models
  • 6. How MapReduce in Hadoop works?
  • 7. Steps of MapReduce in Hadoop • Map stage − The map or mapper’s job is to process the input data. Generally, the input data is in the form of file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data. • Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
  • 9. MapReduce Jobs Application Master • Responsible for the execution of a single application or MapReduce job • Divides job requests into tasks and assigns them to Node-Managers running on the slave node Node-Manager • Has many dynamic resource containers which executes each active map or reduce task • Communicates regularly with the Application Master
  • 11. Advantage of MapReduce • Fault tolerance: It can handle failures without downtime. Speed: It splits, shuffles, and reduces the unstructured data in a short time. • Cost-effective: Hadoop MapReduce has a scale-out feature that enables users to process or store the data in a cost-effective manner. • Scalability: It provides a highly scalable framework. MapReduce allows users to run applications from many nodes. • Parallel processing: As Hadoop storage data in the distributed file system and the MapReduce program’s working, it divides tasks task map and reduce and that could execute in parallel. And again, because of the parallel execution, it reduces the entire run time.
  • 12. Limitations Of MapReduce • MapReduce cannot cache the intermediate data in memory for a further requirement which diminishes the performance of Hadoop. • It is only suitable for Batch Processing of a Huge amounts of Data, and it is not flexible, means the MapReduce framework is rigid. • A lot of manual coding is required, even for common operations such as join, filter, projection, aggregates, sorting, distinct... • Semantics are hidden inside the map and reduce functions, so it is difficult to maintain, extend and optimize them.