SlideShare a Scribd company logo
1 of 80
Download to read offline
MapReduce
MapReduce
  reduced
PetaMengurangi
*
PetaMengurangi

          *google translate
the
problem
lots
of
data
e.g.

the
entire
interwebs
single
computer
not
going
to
work
lots
of
computers
we
have
that
cluster
programming
cluster
programming
        =
suck
MapReduce
makes
the
pain
go
away
2
main
stages
map
map
process
data
on
hosts
reduce
reduce
summarise
the
results
example
count
words
on
lines
>>>
reduce(operator.add,
map(countWords,
lines))
>>>
reduce(operator.add,
map(countWords,
lines))
>>>
reduce(operator.add,
map(countWords,
lines))
except
in
this
case
lots
of
machines
typical
cluster
O(103)
machines

each
2‐8Gb
RAM
 local
IDE
disks
GFS
distributes
the
data
process
data
on
hosts
  summarise
results
split
data
into
chunks
process
data
on
hosts
  summarise
results
split
data
into
chunks
  allocate
machines
process
data
on
hosts
  summarise
results
split
data
into
chunks
  allocate
machines
   start
processes
process
data
on
hosts
  summarise
results
split
data
into
chunks
  allocate
machines
   start
processes
send
data
to
mappers
process
data
on
hosts
  summarise
results
split
data
into
chunks
  allocate
machines
   start
processes
send
data
to
mappers
process
data
on
hosts
    monitor
hosts
  summarise
results
split
data
into
chunks
   allocate
machines
    start
processes
 send
data
to
mappers
 process
data
on
hosts
     monitor
hosts
send
results
to
reducers
   summarise
results
split
data
into
chunks
    allocate
machines
     start
processes
  send
data
to
mappers
  process
data
on
hosts
      monitor
hosts
redo
failed
and
stragglers
 send
results
to
reducers
    summarise
results
split
data
into
chunks
    allocate
machines
     start
processes
  send
data
to
mappers
  process
data
on
hosts
      monitor
hosts
redo
failed
and
stragglers
 send
results
to
reducers
    summarise
results
   output
final
results
MapReduce
does
the

   yukky
stuff
split
data
into
chunks
                allocate
machines
                 start
processes
              send
data
to
mappers
MapReduce
              process
data
on
hosts
                  monitor
hosts          programmer
            redo
failed
and
stragglers
             send
results
to
reducers
                summarise
results
               output
final
results
handles
failures
handles
stragglers
a
vanity
search
%
of
refs
to
Anthony
%
of
refs
to
Anthony
       Baxter
count(‘Anthony
Baxter’)
   count(‘Anthony’)
C++
library
...
with
Python
bindings,

           yay!
class
AnthonyMapper(mrpython.Mapper):




def
Map(self,
map_input):








meCount
=
otherCount
=
0








docId
=
map_input.key()
#
ignored
‐
doc
id








src
=
map_input.value()
#
document
source








text
=
ExtractText(src).split()








seenAnthony
=
False








for
word
in
text:












if
not
seenAnthony:
















if
word.lower()
==
'anthony':




















seenAnthony
=
True












else:
















if
word.lower()
==
'baxter':




















meCount
+=
1
















else:




















otherCount
+=
1

















seenAnthony
=
False








yield
'me',
meCount








yield
'other',
otherCount
class
AnthonyReducer(mrpython.Reducer):




def
Reducer(self,
reduce_input):








'''
Passed
a
key
(either
'me'
or
'other')
and
a
list












of
counts.
Adds
the
counts
and
returns
them.








'''








count
=
0








for
val
in
reduce_input.values():












sum
+=
int(val)








yield
count
the
result:
the
result:

about
1
in
4000
other
uses
for
MapReduce
web
link
graphs
access
logs
text
analysis
google
news
clustering
local
search
road
traffic
take
speed
samples
group
by
road
segment
take
the
average
once
per
minute
output
to
a
map
layer
limitation:
availability
of

          data
MapReduce
is
pretty
cool
for
more
information
“mapreduce
paper”
   “gfs
paper”
 “google
papers”
if
you’d
like
to
play
hadoop.apache.org
open
source
java
implementation
HDFS
Map Reduce In 5 Minutes

More Related Content

Similar to Map Reduce In 5 Minutes

How To Create Custom DSLs By PHP
How To Create Custom DSLs By PHPHow To Create Custom DSLs By PHP
How To Create Custom DSLs By PHP
Atsuhiro Kubo
 
JSplash - Adobe MAX 2009
JSplash - Adobe MAX 2009JSplash - Adobe MAX 2009
JSplash - Adobe MAX 2009
gyuque
 
yusukebe in Yokohama.pm 090909
yusukebe in Yokohama.pm 090909yusukebe in Yokohama.pm 090909
yusukebe in Yokohama.pm 090909
Yusuke Wada
 
Web 2.0 Performance and Reliability: How to Run Large Web Apps
Web 2.0 Performance and Reliability: How to Run Large Web AppsWeb 2.0 Performance and Reliability: How to Run Large Web Apps
Web 2.0 Performance and Reliability: How to Run Large Web Apps
adunne
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
Hadoop User Group
 
Wide Open Spaces Using My Sql As A Web Mapping Service Backend
Wide Open Spaces Using My Sql As A Web Mapping Service BackendWide Open Spaces Using My Sql As A Web Mapping Service Backend
Wide Open Spaces Using My Sql As A Web Mapping Service Backend
MySQLConference
 

Similar to Map Reduce In 5 Minutes (20)

Map reduce
Map reduceMap reduce
Map reduce
 
Ruby on Rails Tutorial Part I
Ruby on Rails Tutorial Part IRuby on Rails Tutorial Part I
Ruby on Rails Tutorial Part I
 
Gmaps Railscamp2008
Gmaps Railscamp2008Gmaps Railscamp2008
Gmaps Railscamp2008
 
HTML Parsing With Hpricot
HTML Parsing With HpricotHTML Parsing With Hpricot
HTML Parsing With Hpricot
 
XS Japan 2008 Xen Mgmt Japanese
XS Japan 2008 Xen Mgmt JapaneseXS Japan 2008 Xen Mgmt Japanese
XS Japan 2008 Xen Mgmt Japanese
 
How To Create Custom DSLs By PHP
How To Create Custom DSLs By PHPHow To Create Custom DSLs By PHP
How To Create Custom DSLs By PHP
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
JSplash - Adobe MAX 2009
JSplash - Adobe MAX 2009JSplash - Adobe MAX 2009
JSplash - Adobe MAX 2009
 
Ms Dm Online
Ms Dm OnlineMs Dm Online
Ms Dm Online
 
yusukebe in Yokohama.pm 090909
yusukebe in Yokohama.pm 090909yusukebe in Yokohama.pm 090909
yusukebe in Yokohama.pm 090909
 
Adding Statistical Functionality to the DATA Step with PROC FCMP
Adding Statistical Functionality to the DATA Step with PROC FCMPAdding Statistical Functionality to the DATA Step with PROC FCMP
Adding Statistical Functionality to the DATA Step with PROC FCMP
 
Map reduce
Map reduceMap reduce
Map reduce
 
Web 2.0 Performance and Reliability: How to Run Large Web Apps
Web 2.0 Performance and Reliability: How to Run Large Web AppsWeb 2.0 Performance and Reliability: How to Run Large Web Apps
Web 2.0 Performance and Reliability: How to Run Large Web Apps
 
Mapreduce Pact06 Keynote
Mapreduce Pact06 KeynoteMapreduce Pact06 Keynote
Mapreduce Pact06 Keynote
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Timm – Telecom Network Module Management
Timm – Telecom Network Module ManagementTimm – Telecom Network Module Management
Timm – Telecom Network Module Management
 
Wide Open Spaces Using My Sql As A Web Mapping Service Backend
Wide Open Spaces Using My Sql As A Web Mapping Service BackendWide Open Spaces Using My Sql As A Web Mapping Service Backend
Wide Open Spaces Using My Sql As A Web Mapping Service Backend
 
機械学習と自動微分
機械学習と自動微分機械学習と自動微分
機械学習と自動微分
 
Roll-out of the NYU HSL Website and Drupal CMS
Roll-out of the NYU HSL Website and Drupal CMSRoll-out of the NYU HSL Website and Drupal CMS
Roll-out of the NYU HSL Website and Drupal CMS
 
An Integrated Management Supervisor for End-to-End Management of Heterogeneou...
An Integrated Management Supervisor for End-to-End Management of Heterogeneou...An Integrated Management Supervisor for End-to-End Management of Heterogeneou...
An Integrated Management Supervisor for End-to-End Management of Heterogeneou...
 

More from Linuxmalaysia Malaysia

FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysiaFOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
Linuxmalaysia Malaysia
 
Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011
Linuxmalaysia Malaysia
 
33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-development33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-development
Linuxmalaysia Malaysia
 

More from Linuxmalaysia Malaysia (20)

Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
 
Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...
Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...
Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...
 
MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013
MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013
MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013
 
Brochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochure
Brochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochureBrochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochure
Brochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochure
 
Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)
Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)
Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)
 
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysiaFOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
 
Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...
Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...
Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...
 
Sponsorship Prospectus Malaysia Open Source Conference 2012 (MOSC2012)
Sponsorship Prospectus Malaysia Open Source Conference 2012  (MOSC2012)Sponsorship Prospectus Malaysia Open Source Conference 2012  (MOSC2012)
Sponsorship Prospectus Malaysia Open Source Conference 2012 (MOSC2012)
 
OSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
OSS Community Forum Regarding Proposed BCPM2011 SWOT SlideOSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
OSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
 
Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011
 
Building Smart Phone Web Apps MOSC2010 Bikesh iTrain
Building Smart Phone Web Apps MOSC2010 Bikesh iTrainBuilding Smart Phone Web Apps MOSC2010 Bikesh iTrain
Building Smart Phone Web Apps MOSC2010 Bikesh iTrain
 
OSDC.my Master Plan For Malaysia Open Source Community
OSDC.my Master Plan For Malaysia Open Source CommunityOSDC.my Master Plan For Malaysia Open Source Community
OSDC.my Master Plan For Malaysia Open Source Community
 
33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-development33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-development
 
Open Source Tools for Creating Mashups with Government Datasets MOSC2010
Open Source Tools for Creating Mashups with Government Datasets MOSC2010Open Source Tools for Creating Mashups with Government Datasets MOSC2010
Open Source Tools for Creating Mashups with Government Datasets MOSC2010
 
DNS solution trumps cloud computing competition
DNS solution trumps cloud computing competitionDNS solution trumps cloud computing competition
DNS solution trumps cloud computing competition
 
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
 
Benchmarking On Web Server For Budget 2008 Day
Benchmarking On  Web  Server For  Budget 2008  DayBenchmarking On  Web  Server For  Budget 2008  Day
Benchmarking On Web Server For Budget 2008 Day
 
Sesuaikan Masa Sempena 2010
Sesuaikan Masa Sempena 2010Sesuaikan Masa Sempena 2010
Sesuaikan Masa Sempena 2010
 
OSS Community In Malaysia 2009 List
OSS Community In Malaysia 2009 ListOSS Community In Malaysia 2009 List
OSS Community In Malaysia 2009 List
 
List Of OSS Communities Malaysia 2009
List Of OSS Communities Malaysia 2009List Of OSS Communities Malaysia 2009
List Of OSS Communities Malaysia 2009
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Map Reduce In 5 Minutes