SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Confidential, Copyright © Quanticate
Introduction to Map - Reduce
Muralidharan Deenathayalan
Technical Lead
Muralidharan.deenathayalan@quanticate.com
Apache logo are trademarks of The Apache Software Foundation.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Confidential, Copyright © Quanticate
Agenda
What is Map-Reduce?
Map-Reduce architecture
Advantages of Map-Reduce
Frameworks available for writing Map-Reduce?
WordCount – Map-Reduce Program explained
How to compile Map-Reduce program using Eclipse?
How to deploy Map-Reduce program?
How to run Map-Reduce program?
Q & A
Confidential, Copyright © Quanticate
Who Am I ?
7+ years of experience in Microsoft technologies like Asp.net, C#,
SQL server and SharePoint
2+ years of experience in open source technologies like Java, Alfresco and Apache
Cassandra
Author of Apache Cassandra Cookbook (In writing )
Csharpcorner MVP
Frequent blogger
Confidential, Copyright © Quanticate
What is Map-Reduce?
 Generally called as Map-R program
 MapReduce Map() + Reduce()
 MapReduce is a programming approach to process large datasets in parallel, distributed on a
cluster ( Divide and conquer).
Map
Confidential, Copyright © Quanticate
What is Map-Reduce?
• Map:
– Receives input key/value pair
– Outputs intermediate key/value pair
• Reduce :
– Receives intermediate key/value pair
– Outputs key/value pair
Input Data
Map
Reduce
Reduce
Map
Map
Input Data
Confidential, Copyright © Quanticate
Map-Reduce Architecture overview
Job trackerJob tracker
Task tracker
Task tracker
Task tracker
Master node
Slave node 1 Slave node 2 Slave node N
Workers
user
Workers Workers
Confidential, Copyright © Quanticate
Advantages of Map-Reduce
 Distributed pattern-based searching
 Distributed sorting
 Web access logs
 Machine Learning
Confidential, Copyright © Quanticate
Framework available for writing
Map-Reduce
Courtesy & ©: http://blog.matthewrathbone.com/2013/01/05/a-quick-guide-to-hadoop-map-reduce-frameworks.html
JAVA
Cascading
Crunch
CLOJURE
Cascalog
SCALA
Scrunch
Scalding
Scoobi
R
Rhadoop
MICROSOFT
.Net (C# / VB.net)
SPECIAL (HIGH-LEVEL)
Apache Hive
Apache Pig
RUBY
Wukong
Cascading Jruby
PYTHON
MR Job
Dumbo
Hadooppy
Pydoop
Luigi
Confidential, Copyright © Quanticate
WordCount – Map-Reduce Program
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text,
Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
} } }
Confidential, Copyright © Quanticate
WordCount – Map-Reduce Program
public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable,
Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text,
IntWritable> output, Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
} }
Confidential, Copyright © Quanticate
WordCount – Map-Reduce Program
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf); }
Confidential, Copyright © Quanticate
How to compile Map-Reduce
program using Eclipse?
 Refer Hadoop jar file from your disk
 Maven is simple to use
 Eclipse  Project  Build Project
 No errors in the eclipse console 
Confidential, Copyright © Quanticate
How to deploy Map-Reduce program?
Confidential, Copyright © Quanticate
How to run Map-Reduce program?
Confidential, Copyright © Quanticate
Summary
 What is Map-Reduce?
 Architecture of Map-Reduce?
 Advantages of Map-Reduce
 Frameworks available for Map-Reduce?
 WordCount – Map-Reduce Program explained
 Compiling WordCount Map-Reduce program using Eclipse
 Deploying Map-Reduce program
 Executing a Map-Reduce program
Confidential, Copyright © Quanticate
Q & A
Confidential, Copyright © Quanticate
References
http://en.wikipedia.org/wiki/MapReduce
http://hortonworks.com
http://hadoop.apache.org
Confidential, Copyright © Quanticate
Coding-Freaks.Net
www.codingfreaks.net
Quanticate OPDev Twitter
https://twitter.com/quanticateopdev
Twitter
www.Twitter.com/muralidharand
Confidential, Copyright © Quanticate

Weitere ähnliche Inhalte

Was ist angesagt?

Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduceHassan A-j
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Map Reduce
Map ReduceMap Reduce
Map Reduceschapht
 
An Introduction to MapReduce
An Introduction to MapReduceAn Introduction to MapReduce
An Introduction to MapReduceFrane Bandov
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reducePaladion Networks
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part IMarin Dimitrov
 
Map reduce paradigm explained
Map reduce paradigm explainedMap reduce paradigm explained
Map reduce paradigm explainedDmytro Sandu
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014soujavajug
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSpraveen bhat
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsZubair Nabi
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 

Was ist angesagt? (20)

Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
An Introduction to MapReduce
An Introduction to MapReduceAn Introduction to MapReduce
An Introduction to MapReduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
 
Large Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part ILarge Scale Data Analysis with Map/Reduce, part I
Large Scale Data Analysis with Map/Reduce, part I
 
An Introduction To Map-Reduce
An Introduction To Map-ReduceAn Introduction To Map-Reduce
An Introduction To Map-Reduce
 
Map reduce paradigm explained
Map reduce paradigm explainedMap reduce paradigm explained
Map reduce paradigm explained
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFS
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 

Ähnlich wie Map Reduce introduction

Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxHARIKRISHNANU13
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopDilum Bandara
 
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspectiveBig Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspectiveEMC
 
MapReduce wordcount program
MapReduce wordcount program MapReduce wordcount program
MapReduce wordcount program Sarwan Singh
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's comingDatabricks
 
Big-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiBig-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiUnmesh Baile
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezMapR Technologies
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaDesing Pathshala
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersKumari Surabhi
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The CloudsJacky Chu
 
Mapredtutorial
MapredtutorialMapredtutorial
MapredtutorialAnup Mohta
 
Running Cognos on Hadoop
Running Cognos on HadoopRunning Cognos on Hadoop
Running Cognos on HadoopSenturus
 
Cs267 hadoop programming
Cs267 hadoop programmingCs267 hadoop programming
Cs267 hadoop programmingKuldeep Dhole
 
Dart and Flutter Basics.pptx
Dart and Flutter Basics.pptxDart and Flutter Basics.pptx
Dart and Flutter Basics.pptxDSCVSSUT
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 

Ähnlich wie Map Reduce introduction (20)

Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
 
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspectiveBig Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
MapReduce wordcount program
MapReduce wordcount program MapReduce wordcount program
MapReduce wordcount program
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
Big-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiBig-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbai
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco Vasquez
 
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design PathshalaAdvance Map reduce - Apache hadoop Bigdata training by Design Pathshala
Advance Map reduce - Apache hadoop Bigdata training by Design Pathshala
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
 
Taste Java In The Clouds
Taste Java In The CloudsTaste Java In The Clouds
Taste Java In The Clouds
 
Mapredtutorial
MapredtutorialMapredtutorial
Mapredtutorial
 
Running Cognos on Hadoop
Running Cognos on HadoopRunning Cognos on Hadoop
Running Cognos on Hadoop
 
Cs267 hadoop programming
Cs267 hadoop programmingCs267 hadoop programming
Cs267 hadoop programming
 
Dart and Flutter Basics.pptx
Dart and Flutter Basics.pptxDart and Flutter Basics.pptx
Dart and Flutter Basics.pptx
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 

Mehr von Muralidharan Deenathayalan (10)

What's new in C# 8.0 (beta)
What's new in C# 8.0 (beta)What's new in C# 8.0 (beta)
What's new in C# 8.0 (beta)
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
 
Alfresco 5.0 features
Alfresco 5.0 featuresAlfresco 5.0 features
Alfresco 5.0 features
 
Test drive on driven development process
Test drive on driven development processTest drive on driven development process
Test drive on driven development process
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 
Apache cassandra
Apache cassandraApache cassandra
Apache cassandra
 
Alfresco share 4.1 to 4.2 customisation
Alfresco share 4.1 to 4.2 customisationAlfresco share 4.1 to 4.2 customisation
Alfresco share 4.1 to 4.2 customisation
 
Introduction about Alfresco webscript
Introduction about Alfresco webscriptIntroduction about Alfresco webscript
Introduction about Alfresco webscript
 
Alfresco activiti workflows
Alfresco activiti workflowsAlfresco activiti workflows
Alfresco activiti workflows
 
Alfresco content model
Alfresco content modelAlfresco content model
Alfresco content model
 

Kürzlich hochgeladen

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Kürzlich hochgeladen (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Map Reduce introduction

  • 1. Confidential, Copyright © Quanticate Introduction to Map - Reduce Muralidharan Deenathayalan Technical Lead Muralidharan.deenathayalan@quanticate.com Apache logo are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
  • 2. Confidential, Copyright © Quanticate Agenda What is Map-Reduce? Map-Reduce architecture Advantages of Map-Reduce Frameworks available for writing Map-Reduce? WordCount – Map-Reduce Program explained How to compile Map-Reduce program using Eclipse? How to deploy Map-Reduce program? How to run Map-Reduce program? Q & A
  • 3. Confidential, Copyright © Quanticate Who Am I ? 7+ years of experience in Microsoft technologies like Asp.net, C#, SQL server and SharePoint 2+ years of experience in open source technologies like Java, Alfresco and Apache Cassandra Author of Apache Cassandra Cookbook (In writing ) Csharpcorner MVP Frequent blogger
  • 4. Confidential, Copyright © Quanticate What is Map-Reduce?  Generally called as Map-R program  MapReduce Map() + Reduce()  MapReduce is a programming approach to process large datasets in parallel, distributed on a cluster ( Divide and conquer). Map
  • 5. Confidential, Copyright © Quanticate What is Map-Reduce? • Map: – Receives input key/value pair – Outputs intermediate key/value pair • Reduce : – Receives intermediate key/value pair – Outputs key/value pair Input Data Map Reduce Reduce Map Map Input Data
  • 6. Confidential, Copyright © Quanticate Map-Reduce Architecture overview Job trackerJob tracker Task tracker Task tracker Task tracker Master node Slave node 1 Slave node 2 Slave node N Workers user Workers Workers
  • 7. Confidential, Copyright © Quanticate Advantages of Map-Reduce  Distributed pattern-based searching  Distributed sorting  Web access logs  Machine Learning
  • 8. Confidential, Copyright © Quanticate Framework available for writing Map-Reduce Courtesy & ©: http://blog.matthewrathbone.com/2013/01/05/a-quick-guide-to-hadoop-map-reduce-frameworks.html JAVA Cascading Crunch CLOJURE Cascalog SCALA Scrunch Scalding Scoobi R Rhadoop MICROSOFT .Net (C# / VB.net) SPECIAL (HIGH-LEVEL) Apache Hive Apache Pig RUBY Wukong Cascading Jruby PYTHON MR Job Dumbo Hadooppy Pydoop Luigi
  • 9. Confidential, Copyright © Quanticate WordCount – Map-Reduce Program public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); output.collect(word, one); } } }
  • 10. Confidential, Copyright © Quanticate WordCount – Map-Reduce Program public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { int sum = 0; while (values.hasNext()) { sum += values.next().get(); } output.collect(key, new IntWritable(sum)); } }
  • 11. Confidential, Copyright © Quanticate WordCount – Map-Reduce Program public static void main(String[] args) throws Exception { JobConf conf = new JobConf(WordCount.class); conf.setJobName("wordcount"); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(IntWritable.class); conf.setMapperClass(Map.class); conf.setCombinerClass(Reduce.class); conf.setReducerClass(Reduce.class); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); JobClient.runJob(conf); }
  • 12. Confidential, Copyright © Quanticate How to compile Map-Reduce program using Eclipse?  Refer Hadoop jar file from your disk  Maven is simple to use  Eclipse  Project  Build Project  No errors in the eclipse console 
  • 13. Confidential, Copyright © Quanticate How to deploy Map-Reduce program?
  • 14. Confidential, Copyright © Quanticate How to run Map-Reduce program?
  • 15. Confidential, Copyright © Quanticate Summary  What is Map-Reduce?  Architecture of Map-Reduce?  Advantages of Map-Reduce  Frameworks available for Map-Reduce?  WordCount – Map-Reduce Program explained  Compiling WordCount Map-Reduce program using Eclipse  Deploying Map-Reduce program  Executing a Map-Reduce program
  • 16. Confidential, Copyright © Quanticate Q & A
  • 17. Confidential, Copyright © Quanticate References http://en.wikipedia.org/wiki/MapReduce http://hortonworks.com http://hadoop.apache.org
  • 18. Confidential, Copyright © Quanticate Coding-Freaks.Net www.codingfreaks.net Quanticate OPDev Twitter https://twitter.com/quanticateopdev Twitter www.Twitter.com/muralidharand