Er. Jay Nagar (Technology Researcher)
Call: +91-960157620
Before MapReduce…
 Large-scale data processing was difficult!
 Managing hundreds or thousands of processors
 Managing parallelization and distribution
 I/O Scheduling
 Status and monitoring
 Fault/crash tolerance
 MapReduce provides all of these, easily!
MapReduce Overview
 MapReduce is a programming model for processing large data sets
with a parallel, distributed algorithm on a cluster
 How does it solve our previously mentioned problems?
 MapReduce is highly scalable and can be used across many computers.
 Many small machines can be used to process jobs that normally could not
be processed by a large machine.
How Does MapReduce Work?
 MapReduce is a method for distributing a task across multiple
nodes
 Each node processes data stored on that node
 Where possible
 Consists of two phases:
 Map
 Reduce
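The two phases can be sketched in a single process. The following is a minimal illustration in plain Java, not the Hadoop API: the class name MapReduceSketch and its methods are hypothetical, and a sorted in-memory map stands in for the grouping step that the framework performs between Map and Reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the Map and Reduce phases (not the Hadoop API).
final class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: collapse all values seen for one key into a single total.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, groups the values by key in sorted order
    // (standing in for the framework's grouping step), then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {cat=2, ran=1, sat=1, the=2}
        System.out.println(run(List.of("the cat sat", "the cat ran")));
    }
}
```

On a real cluster the map calls run on different nodes and the grouping happens over the network; only the two functions are written by the developer.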
Features of MapReduce
 Automatic parallelization and distribution
 Fault‐tolerance
 Status and monitoring tools
 A clean abstraction for programmers
 MapReduce programs are usually written in Java
 Can be written in any language using Hadoop Streaming (see later)
 Hadoop itself is written mostly in Java
 MapReduce abstracts all the ‘housekeeping’ away from the developer
 The developer can concentrate on writing the Map and Reduce functions
A Bigger Picture
MapReduce: The JobTracker
Basic Cluster Configuration
MapReduce: Terminology
MapReduce : The Mapper
MapReduce : The Mapper
MapReduce : The Reducer
Diagram
Creating and Running a MapReduce Job
The MapReduce Flow: The Mapper
The MapReduce Flow: Shuffle and Sort
The MapReduce Flow: The Reducer
Our MapReduce Program: WordCount
 This consists of three portions
 The driver Code – Code that runs on the client to configure and submit
the job
 The Mapper
 The Reducer
Some Standard Input Formats
Keys and Values
 Keys and Values Are Objects
 Values are objects that implement Writable
 Keys are objects that implement WritableComparable
 Hadoop defines its own ‘box classes’ for strings, integers etc
 IntWritable
 LongWritable
 FloatWritable
 Text
 …
Driver Code
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class WordCount {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.printf("Usage: WordCount <input dir> <output dir>\n");
            System.exit(-1);
        }
        // Job.getInstance() replaces the deprecated new Job() constructor.
        Job job = Job.getInstance();
        job.setJarByClass(WordCount.class);
        job.setJobName("Word Count");
        // Input and output locations come from the command line.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(SumReducer.class);
        // Types emitted by the Mapper.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Types emitted by the Reducer.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Submit the job and block until it finishes.
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
Mapper Code
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split on runs of non-word characters; the regex needs an escaped backslash.
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                // Emit (word, 1) for every word on the line.
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}
Reducer Code
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up the 1s emitted by the Mappers for this word.
        int wordCount = 0;
        for (IntWritable value : values) {
            wordCount += value.get();
        }
        context.write(key, new IntWritable(wordCount));
    }
}
Hands-on to execute a MapReduce Job
- WordCount
Mean
 We want to find the mean max temperature for every month
Input Data:
Temperature in Milan
(DDMMYYYY, MIN, MAX)
01012000, -4.0, 5.0
02012000, -5.0, 5.1
03012000, -5.0, 7.7
…
29122013, 3.0, 9.0
30122013, 0.0, 9.8
31122013, 0.0, 9.0
Mean
 Sample input data:
01012000, 0.0, 10.0
02012000, 0.0, 20.0
03012000, 0.0, 2.0
04012000, 0.0, 4.0
05012000, 0.0, 3.0
 Mapper #1: lines 1, 2
 Mapper #2: lines 3, 4, 5
 Mapper#1: mean = (10.0 + 20.0) / 2 = 15.0
 Mapper#2: mean = (2.0 + 4.0 + 3.0) / 3 = 3.0
 Reducer mean = (15.0 + 3.0) / 2 = 9.0
 But the correct mean is:
 (10.0 + 20.0 + 2.0 + 4.0 + 3.0) / 5 = 7.8
Hands-on to execute a MapReduce Job
- Mean
Sorting
 MapReduce is very well suited to sorting large data sets
 Recall: keys are passed to the Reducer in sorted order
 Assuming the file to be sorted contains lines with a single value:
 Mapper is merely the identity function for the value
(k, v) -> (v, _)
 Reducer is the identity function
(k, _) -> (k, '')
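As a rough illustration, the sort can be simulated in plain Java with a sorted map standing in for the ordered delivery of keys to a single Reducer. SortSketch is a hypothetical name and nothing here is Hadoop API. The sketch keeps a count per key so that duplicate lines survive; a reducer that emits each key only once, as written above, would collapse them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of sorting by letting the "shuffle" order the keys.
final class SortSketch {

    static List<String> sort(List<String> lines) {
        // Map: (offset, line) -> (line, _). The TreeMap keeps keys in sorted
        // order, as the framework does before calling the Reducer.
        Map<String, Integer> shuffled = new TreeMap<>();
        for (String line : lines) {
            shuffled.merge(line, 1, Integer::sum);
        }
        // Reduce: identity. Emit the key once per occurrence so duplicates are kept.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : shuffled.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [apple, apple, fig, pear]
        System.out.println(sort(List.of("pear", "apple", "fig", "apple")));
    }
}
```

The keys here are strings, so the order is lexicographic; numeric data would need a numeric key type to sort by value.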
Searching
 Assume the input is a set of files containing lines of text
 Assume the Mapper has been passed the pattern for which to search
as a special parameter
 We saw how to pass parameters to your Mapper
 Algorithm:
 Mapper compares the line against the pattern
 If the pattern matches, Mapper outputs (line, _)
 Or (filename+line, _), or …
 If the pattern does not match, Mapper outputs nothing
 Reducer is the Identity Reducer
 Just outputs each intermediate key
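The algorithm above can be sketched in plain Java as follows. This is an illustration only, not Hadoop code: SearchSketch, the file names and the sample lines are all made up, the pattern is passed as an ordinary method argument, and a sorted set stands in for the intermediate keys handed to the Identity Reducer.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Plain-Java sketch of a distributed search: filter in the Mapper, identity Reducer.
final class SearchSketch {

    // "Mapper": emit "filename: line" as the key when the pattern is found,
    // and emit nothing when it is not.
    static void map(Pattern pattern, String filename, String line, SortedSet<String> shuffle) {
        if (pattern.matcher(line).find()) {
            shuffle.add(filename + ": " + line);
        }
    }

    static SortedSet<String> search(Pattern pattern, Map<String, List<String>> files) {
        SortedSet<String> shuffle = new TreeSet<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String line : file.getValue()) {
                map(pattern, file.getKey(), line, shuffle);
            }
        }
        // Identity "Reducer": every intermediate key is output unchanged.
        return shuffle;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = Map.of(
                "a.txt", List.of("error: disk full", "all good"),
                "b.txt", List.of("warning", "error: timeout"));
        // prints [a.txt: error: disk full, b.txt: error: timeout]
        System.out.println(search(Pattern.compile("error"), files));
    }
}
```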
The Streaming API: Motivation
 The Streaming API allows developers to use any language they wish to
write Mappers and Reducers
 As long as the language can read from standard input and write to standard output
 Advantages of the Streaming API:
 No need for non‐Java coders to learn Java
 Fast development time
 Ability to use existing code libraries
 Disadvantages of the Streaming API:
 Performance (all data is piped through stdin/stdout to an external process)
 Primarily suited for handling data that can be represented as text
 Streaming jobs can use excessive amounts of RAM or fork excessive numbers of
processes
 Although Mappers and Reducers can be written using the Streaming API,
Partitioners, InputFormats etc. must still be written in Java
How Streaming Works
 To implement streaming, write separate Mapper and Reducer
programs in the language of your choice
 They will receive input via stdin
 They should write their output to stdout
 If TextInputFormat (the default) is used, the streaming Mapper
just receives each line from the file on stdin
 No key is passed
 The output of both the streaming Mapper and the streaming Reducer should be
written to stdout as key (tab) value (newline)
 Separators other than tab can be specified
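The stdin/stdout contract described above can be sketched as a small standalone program. Any executable would do; Java is used here only to stay consistent with the other code in this deck. StreamingMapperSketch is a hypothetical name, and the program assumes TextInputFormat, so each line it reads is a line of the input file with no key.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;

// Sketch of a streaming word-count Mapper: lines in on stdin,
// "key<TAB>value" lines out on stdout.
final class StreamingMapperSketch {

    static void run(Reader input, Writer output) {
        BufferedReader in = new BufferedReader(input);
        PrintWriter out = new PrintWriter(output);
        in.lines().forEach(line -> {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    // key (tab) value (newline)
                    out.print(word + "\t1\n");
                }
            }
        });
        out.flush();
    }

    public static void main(String[] args) {
        run(new InputStreamReader(System.in), new PrintWriter(System.out));
    }
}
```

A matching streaming Reducer would read the same tab-separated lines from stdin, already sorted by key, and sum the values for each run of identical keys.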
Joins When processing large data sets the need for joining data by a
common key can be very useful, if not essential.
 We will be covering 2 types of joins, Reduce-Side joins, Map-Side joins
SELECT Employees.Name, Employees.Age, Department.Name FROM Employees INNER JOIN Department ON
Employees.Dept_Id=Department.Dept_Id
Reduce-Side Join
Sample Code
map (K table, V rec) {
    dept_id = rec.Dept_Id
    tagged_rec.tag = table      // remember which table the record came from
    tagged_rec.rec = rec
    emit(dept_id, tagged_rec)
}
reduce (K dept_id, list<tagged_rec> tagged_recs) {
    for (tagged_rec : tagged_recs) {
        for (tagged_rec1 : tagged_recs) {
            // pair each Employees record with each Department record exactly once
            if (tagged_rec.tag == Employees && tagged_rec1.tag == Department) {
                joined_rec = join(tagged_rec, tagged_rec1)
                emit(tagged_rec.rec.Dept_Id, joined_rec)
            }
        }
    }
}
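The pseudocode above can be turned into a runnable single-process sketch. The version below is plain Java, not Hadoop code: ReduceSideJoinSketch, Tagged and the sample rows are hypothetical, and a sorted map stands in for the shuffle that groups tagged records by Dept_Id. It splits each group by tag first and then pairs employees with departments, which gives the same result as the nested loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of a reduce-side inner join on Dept_Id.
final class ReduceSideJoinSketch {

    // A record tagged with the table it came from.
    static final class Tagged {
        final String table;
        final String[] fields;

        Tagged(String table, String... fields) {
            this.table = table;
            this.fields = fields;
        }
    }

    // "Mapper": key every record by its Dept_Id, whichever table it is from.
    static void map(Map<Integer, List<Tagged>> shuffle, int deptId, Tagged rec) {
        shuffle.computeIfAbsent(deptId, k -> new ArrayList<>()).add(rec);
    }

    // "Reducer": all records for one Dept_Id arrive together; split them by
    // tag and pair every employee with every department row.
    static List<String> reduce(List<Tagged> recs) {
        List<Tagged> employees = new ArrayList<>();
        List<Tagged> departments = new ArrayList<>();
        for (Tagged r : recs) {
            if (r.table.equals("Employees")) {
                employees.add(r);
            } else {
                departments.add(r);
            }
        }
        List<String> joined = new ArrayList<>();
        for (Tagged e : employees) {
            for (Tagged d : departments) {
                // Employees.Name, Employees.Age, Department.Name
                joined.add(e.fields[0] + "," + e.fields[1] + "," + d.fields[0]);
            }
        }
        return joined;
    }

    static List<String> join(Map<Integer, List<Tagged>> shuffle) {
        List<String> out = new ArrayList<>();
        for (List<Tagged> group : shuffle.values()) {
            out.addAll(reduce(group));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, List<Tagged>> shuffle = new TreeMap<>();
        map(shuffle, 1, new Tagged("Employees", "Asha", "31"));
        map(shuffle, 2, new Tagged("Employees", "Ben", "45"));
        map(shuffle, 1, new Tagged("Department", "Sales"));
        map(shuffle, 3, new Tagged("Department", "Legal"));
        // Dept_Id 2 and 3 have no partner row, so the inner join drops them.
        System.out.println(join(shuffle)); // prints [Asha,31,Sales]
    }
}
```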
map reduce Technic in big data

map reduce Technic in big data

  • 1. Er. Jay Nagar(Technology Researcher ) Call:+91-960157620
  • 2. Before MapReduce…  Large scale data processing was difficult!  Managing hundreds or thousands of processors  Managing parallelization and distribution  I/O Scheduling  Status and monitoring  Fault/crash tolerance  MapReduce provides all of these, easily!
  • 3. MapReduce Overview  MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster  How does it solve our previously mentioned problems?  MapReduce is highly scalable and can be used across many computers.  Many small machines can be used to process jobs that normally could not be processed by a large machine.
  • 4. How MapReduce works?  MapReduce is a method for distributing a task across multiple nodes  Each node processes data stored on that node  Where possible  Consists of two phases:  Map  Reduce
  • 5. Features of MapReduce
      ◦ Automatic parallelization and distribution
      ◦ Fault tolerance
      ◦ Status and monitoring tools
      ◦ A clean abstraction for programmers
          ◦ MapReduce programs are usually written in Java
          ◦ They can be written in any language using Hadoop Streaming (see later)
          ◦ Hadoop itself is written mostly in Java
      ◦ MapReduce abstracts all the ‘housekeeping’ away from the developer
          ◦ The developer can concentrate simply on writing the Map and Reduce functions
  • 10. MapReduce : The Mapper
  • 11. MapReduce : The Mapper
  • 12. MapReduce : The Reducer
  • 14. Creating and Running a MapReduce Job
  • 18. Our MapReduce Program: WordCount
      ◦ This consists of three portions:
          ◦ The driver code: code that runs on the client to configure and submit the job
          ◦ The Mapper
          ◦ The Reducer
  • 20. Keys and Values
      ◦ Keys and values are objects
          ◦ Values are objects that implement Writable
          ◦ Keys are objects that implement WritableComparable
      ◦ Hadoop defines its own ‘box classes’ for strings, integers, etc.
          ◦ IntWritable
          ◦ LongWritable
          ◦ FloatWritable
          ◦ Text
          ◦ …
  • 21. Driver Code

      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.mapreduce.Job;

      public class WordCount {

        public static void main(String[] args) throws Exception {
          // Expect exactly two arguments: the input and output directories.
          if (args.length != 2) {
            System.out.printf("Usage: WordCount <input dir> <output dir>\n");
            System.exit(-1);
          }

          // Note: new Job() is deprecated in Hadoop 2 and later,
          // where Job.getInstance() is the replacement.
          Job job = new Job();
          job.setJarByClass(WordCount.class);
          job.setJobName("Word Count");

          // Where to read the input from and where to write the results.
          FileInputFormat.setInputPaths(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));

          job.setMapperClass(WordMapper.class);
          job.setReducerClass(SumReducer.class);

          // Types of the intermediate (Mapper output) keys and values.
          job.setMapOutputKeyClass(Text.class);
          job.setMapOutputValueClass(IntWritable.class);

          // Types of the final (Reducer output) keys and values.
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);

          // Submit the job and block until it finishes.
          boolean success = job.waitForCompletion(true);
          System.exit(success ? 0 : 1);
        }
      }
  • 22. Mapper Code

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          String line = value.toString();
          // Split on runs of non-word characters and emit (word, 1) for each word.
          for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
              context.write(new Text(word), new IntWritable(1));
            }
          }
        }
      }
  • 23. Reducer Code

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;

      public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          // Add up the 1s emitted by the Mappers for this word.
          int wordCount = 0;
          for (IntWritable value : values) {
            wordCount += value.get();
          }
          context.write(key, new IntWritable(wordCount));
        }
      }
  • 24. Hands-on to execute a MapReduce Job - WordCount
  • 25. Mean
      ◦ We want to find the mean of the maximum temperature for every month
      ◦ Input data: temperature in Milan (DDMMYYYY, MIN, MAX)

          01012000, -4.0, 5.0
          02012000, -5.0, 5.1
          03012000, -5.0, 7.7
          …
          29122013, 3.0, 9.0
          30122013, 0.0, 9.8
          31122013, 0.0, 9.0

  • 26. Mean
      ◦ Sample input data:

          01012000, 0.0, 10.0
          02012000, 0.0, 20.0
          03012000, 0.0, 2.0
          04012000, 0.0, 4.0
          05012000, 0.0, 3.0

      ◦ Mapper #1: lines 1, 2
      ◦ Mapper #2: lines 3, 4, 5
      ◦ Mapper #1: mean = (10.0 + 20.0) / 2 = 15.0
      ◦ Mapper #2: mean = (2.0 + 4.0 + 3.0) / 3 = 3.0
      ◦ Reducer: mean = (15.0 + 3.0) / 2 = 9.0
      ◦ But the correct mean is (10.0 + 20.0 + 2.0 + 4.0 + 3.0) / 5 = 7.8
      ◦ Averaging the per-Mapper means is wrong whenever the Mappers see different numbers of records
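One common way around the problem above is to have each Mapper pass on a (sum, count) pair instead of a mean, and divide only once at the end. The sketch below is plain Java, not Hadoop code, and the names `MeanSketch`, `Partial`, `summarize` and `mean` are made up for illustration. It computes one overall mean for the five sample values; a per-month job would additionally key the pairs by month.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical single-machine illustration; no Hadoop classes are used.
class MeanSketch {

    /** Partial aggregate for one input split: running sum and record count. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** "Map" side: summarize one split as (sum, count) rather than as a mean. */
    static Partial summarize(List<Double> maxTemps) {
        double sum = 0.0;
        for (double t : maxTemps) {
            sum += t;
        }
        return new Partial(sum, maxTemps.size());
    }

    /** "Reduce" side: add up the partial sums and counts, then divide once. */
    static double mean(List<Partial> partials) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        Partial m1 = summarize(Arrays.asList(10.0, 20.0));     // Mapper #1: lines 1, 2
        Partial m2 = summarize(Arrays.asList(2.0, 4.0, 3.0));  // Mapper #2: lines 3, 4, 5
        System.out.println(mean(Arrays.asList(m1, m2)));       // 39.0 / 5
    }
}
```

Because sums and counts can be added in any grouping, the result no longer depends on how the input lines were split across Mappers.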
  • 27. Hands-on to execute a MapReduce Job - Mean
  • 29. Sorting
      ◦ MapReduce is very well suited to sorting large data sets
      ◦ Recall: keys are passed to the Reducer in sorted order
      ◦ Assuming the file to be sorted contains lines with a single value:
          ◦ Mapper is merely the identity function for the value: (k, v) -> (v, _)
          ◦ Reducer is the identity function: (k, _) -> (k, '')
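The idea can be imitated on one machine with a sorted map standing in for the shuffle-and-sort step. This is a sketch under assumptions, not Hadoop code: `SortSketch` and `sortLines` are hypothetical names, values are compared as strings, and the reduce step writes a key once per received value so that duplicate lines are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine illustration; no Hadoop classes are used.
class SortSketch {

    static List<String> sortLines(List<String> lines) {
        // Map + shuffle: each line's value becomes the key, (k, v) -> (v, _).
        // The TreeMap keeps keys sorted; the Integer counts how often a key was emitted.
        Map<String, Integer> shuffled = new TreeMap<>();
        for (String line : lines) {
            shuffled.merge(line.trim(), 1, Integer::sum);
        }

        // Reduce: identity. Keys arrive in sorted order and are written out unchanged,
        // once per value received for that key.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : shuffled.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [apple, apple, fig, pear]
        System.out.println(sortLines(Arrays.asList("pear", "apple", "fig", "apple")));
    }
}
```

With more than one Reducer, each output file is sorted on its own; a single globally sorted result also needs the keys to be partitioned by range.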
  • 30. Searching
      ◦ Assume the input is a set of files containing lines of text
      ◦ Assume the Mapper has been passed the pattern for which to search as a special parameter
          ◦ We saw how to pass parameters to your Mapper
      ◦ Algorithm:
          ◦ Mapper compares the line against the pattern
          ◦ If the pattern matches, Mapper outputs (line, _)
              ◦ Or (filename+line, _), or …
          ◦ If the pattern does not match, Mapper outputs nothing
          ◦ Reducer is the Identity Reducer
              ◦ Just outputs each intermediate key
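A minimal plain-Java sketch of this algorithm follows. It is not Hadoop code: `GrepSketch`, `map` and `search` are hypothetical names, the pattern is treated as a Java regular expression (an assumption, since the slide does not say what kind of pattern is meant), and sorting the matches stands in for the sorted order in which an identity Reducer would receive its keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical single-machine illustration; no Hadoop classes are used.
class GrepSketch {

    /** Map step: emit the line as a key if it matches, otherwise emit nothing. */
    static List<String> map(String line, Pattern pattern) {
        List<String> out = new ArrayList<>();
        if (pattern.matcher(line).find()) {
            out.add(line);
        }
        return out;
    }

    /** Runs the map step over all lines, then passes the keys through unchanged. */
    static List<String> search(List<String> lines, String regex) {
        // Compiled once and shared, like a parameter handed to every Mapper.
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            matches.addAll(map(line, pattern));
        }
        // Identity reduce: keys are output as-is, in sorted order.
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("error: timeout", "ok", "error: disk full");
        // prints [error: disk full, error: timeout]
        System.out.println(search(lines, "^error"));
    }
}
```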
  • 32. The Streaming API: Motivation
      ◦ The Streaming API allows developers to use any language they wish to write Mappers and Reducers
          ◦ As long as the language can read from standard input and write to standard output
      ◦ Advantages of the Streaming API:
          ◦ No need for non-Java coders to learn Java
          ◦ Fast development time
          ◦ Ability to use existing code libraries
      ◦ Disadvantages of the Streaming API:
          ◦ Performance
          ◦ Primarily suited for handling data that can be represented as text
          ◦ Streaming jobs can use excessive amounts of RAM or fork excessive numbers of processes
          ◦ Although Mappers and Reducers can be written using the Streaming API, Partitioners, InputFormats etc. must still be written in Java
  • 33. How Streaming Works
      ◦ To implement streaming, write separate Mapper and Reducer programs in the language of your choice
          ◦ They will receive input via stdin
          ◦ They should write their output to stdout
      ◦ If TextInputFormat (the default) is used, the streaming Mapper just receives each line from the file on stdin
          ◦ No key is passed
      ◦ Streaming Mapper and streaming Reducer output should be sent to stdout as key (tab) value (newline)
          ◦ Separators other than tab can be specified
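The stdin/stdout contract described above can be shown with a small stand-alone program. Streaming mappers are normally written in some other language; Java is used here only to stay consistent with the rest of the deck. `StreamingWordMapper` is a hypothetical name, and the program uses nothing from Hadoop: it reads lines from standard input and writes `word<TAB>1` lines to standard output.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;

// Hypothetical word-count mapper following the streaming convention:
// one input line per record on stdin, "key<TAB>value<NEWLINE>" on stdout.
class StreamingWordMapper {

    /** Reads lines from 'in' and writes one "word\t1" record per word to 'out'. */
    static void run(BufferedReader in, PrintWriter out) {
        in.lines().forEach(line -> {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    // Tab separates key and value; newline ends the record.
                    out.print(word + "\t1\n");
                }
            }
        });
        out.flush();
    }

    public static void main(String[] args) {
        run(new BufferedReader(new InputStreamReader(System.in)),
            new PrintWriter(System.out));
    }
}
```

A matching streaming Reducer would read the same tab-separated lines from stdin, sorted by key, and add up the values for each run of identical keys.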
  • 35. Joins
      ◦ When processing large data sets, joining data by a common key is often very useful, if not essential.
      ◦ We will cover two types of joins: Reduce-Side joins and Map-Side joins

          SELECT Employees.Name, Employees.Age, Department.Name
          FROM Employees
          INNER JOIN Department
          ON Employees.Dept_Id = Department.Dept_Id
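  • 37. Sample Code

      map (K table, V rec) {
          // Tag each record with its source table and key it by Dept_Id.
          dept_id = rec.Dept_Id
          tagged_rec.tag = table
          tagged_rec.rec = rec
          emit(dept_id, tagged_rec)
      }

      reduce (K dept_id, list<tagged_rec> tagged_recs) {
          // All records sharing this Dept_Id arrive together.
          // Pair each Employees record with each Department record exactly once.
          for (tagged_rec : tagged_recs) {
              for (tagged_rec1 : tagged_recs) {
                  if (tagged_rec.tag == Employees && tagged_rec1.tag == Department) {
                      joined_rec = join(tagged_rec, tagged_rec1)
                      emit(dept_id, joined_rec)
                  }
              }
          }
      }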
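The reduce-side join can also be tried out as a small plain-Java program. This is a sketch under assumptions rather than Hadoop code: `ReduceSideJoinSketch` and `Tagged` are hypothetical names, the sample rows are made up, and a sorted map plays the role of the shuffle that brings all records with the same Dept_Id to one Reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine illustration; no Hadoop classes are used.
class ReduceSideJoinSketch {

    /** A record tagged with its source table, as emitted by the map step. */
    static final class Tagged {
        final String table;   // "Employees" or "Department"
        final String deptId;  // join key
        final String fields;  // remaining columns, comma-separated

        Tagged(String table, String deptId, String fields) {
            this.table = table;
            this.deptId = deptId;
            this.fields = fields;
        }
    }

    static List<String> join(List<Tagged> input) {
        // Map + shuffle: group the tagged records by Dept_Id.
        Map<String, List<Tagged>> byDept = new TreeMap<>();
        for (Tagged t : input) {
            byDept.computeIfAbsent(t.deptId, k -> new ArrayList<>()).add(t);
        }

        // Reduce: within one Dept_Id, pair every employee with every department row.
        List<String> out = new ArrayList<>();
        for (List<Tagged> group : byDept.values()) {
            List<Tagged> employees = new ArrayList<>();
            List<Tagged> departments = new ArrayList<>();
            for (Tagged t : group) {
                if (t.table.equals("Employees")) {
                    employees.add(t);
                } else {
                    departments.add(t);
                }
            }
            for (Tagged emp : employees) {
                for (Tagged dept : departments) {
                    out.add(emp.fields + "," + dept.fields);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows: (Name, Age) for Employees, (Name) for Department.
        List<Tagged> input = Arrays.asList(
            new Tagged("Employees", "1", "Alice,30"),
            new Tagged("Employees", "2", "Bob,25"),
            new Tagged("Employees", "3", "Carol,41"),  // no matching department
            new Tagged("Department", "1", "Engineering"),
            new Tagged("Department", "2", "Sales"));
        // prints [Alice,30,Engineering, Bob,25,Sales]
        System.out.println(join(input));
    }
}
```

As in the SQL inner join, the employee whose Dept_Id has no matching department row produces no output.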