MapReduce and Hadoop
Cadenelli Nicola
Datenbanken Implementierungstechniken

Outline
Introduction
● History
● Motivations
MapReduce
● What MapReduce is
● Why it is useful
● Execution Details
● Some Examples
● Conclusions
Hadoop
● Introduction
● Hadoop Architecture
● Hadoop Ecosystem
● In the real world
MapReduce & Databases
● SQL-MapReduce
● In-Database Map-Reduce
● Conclusions

[Diagram: Google's GFS, MapReduce and BigTable alongside Hadoop's HDFS and MapReduce]

A Brief History
● 2003 and 2004: Google publishes the GFS and MapReduce papers.
● 2006: Apache releases Hadoop, the first open-source implementation of GFS and MapReduce.
● Now: Amazon, AOL, eBay, Facebook, HP, IBM, Last.fm, LinkedIn, Microsoft, Spotify, Twitter and more are using Hadoop.

Motivations
● Data is becoming really big: more than 10 TB.
E.g. the Large Synoptic Survey Telescope (30 TB per night).
● The best idea is to scale out (not scale up) the system, but . . .
- How do we scale to more than 1000 machines?
- How do we handle machine failures?
- How can we facilitate communication between nodes?
- If we change systems, do we lose all our optimisation work?
● Google needed to recreate the index of the web.

What is MapReduce?
“MapReduce is a programming model and an associated implementation for processing and generating large data sets.” (Google, Inc., MapReduce paper, 2004)
It is a really simple API with just two serial functions, map() and reduce(), and it is language independent (Java, Python, Perl …).

Why is MapReduce useful?
MapReduce hides messy details in the runtime library:
● Parallelization and distribution
● Load balancing
● Network and disk transfer optimization
● Handling of machine failures
● Fault tolerance
● Monitoring & status updates
All users benefit from improvements to the core library.

Typical problem solved by MapReduce
1. Read a lot of data
2. Map: extract something we care about from each record
3. Shuffle and Sort
4. Reduce: aggregate, summarize, filter, or transform
5. Write the results
From the outside the structure is always the same (read, process, write); only map and reduce change to fit the problem.

Some Execution Details
● A single master controls job execution on multiple slaves.
● Mappers are preferentially placed on the same node or the same rack as their input block, which minimizes network usage.
● Mappers save their outputs to local disk before serving them to reducers.
● If a map or reduce task crashes: re-execute it!
● The framework allows having more mappers and reducers than nodes.
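The placement preference for mappers can be illustrated with a small Python sketch. It is not Hadoop's scheduler: the `Node` type, the `pick_node_for_mapper` function and the node and rack names are all hypothetical, and the only rule it encodes is the one stated on the slide (same node first, then same rack, then anywhere).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a worker: its name and the rack it sits in."""
    name: str
    rack: str


def pick_node_for_mapper(replica_nodes, free_nodes):
    """Pick a free node for a mapper, preferring data locality.

    replica_nodes: nodes that hold a copy of the mapper's input block.
    free_nodes: nodes that currently have a free map slot.
    """
    replica_names = {n.name for n in replica_nodes}
    replica_racks = {n.rack for n in replica_nodes}

    # 1. Best case: a free node that already stores the input block.
    for node in free_nodes:
        if node.name in replica_names:
            return node, "node-local"

    # 2. Next best: a free node in the same rack as one of the replicas.
    for node in free_nodes:
        if node.rack in replica_racks:
            return node, "rack-local"

    # 3. Otherwise any free node; the block must cross racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "no free node"


replicas = [Node("n1", "r1"), Node("n4", "r2"), Node("n5", "r2")]
free = [Node("n7", "r3"), Node("n6", "r2"), Node("n4", "r2")]
choice = pick_node_for_mapper(replicas, free)
print(choice)  # n4 is chosen because it holds a replica of the block
```

If `n4` were busy, the same call would fall back to `n6`, which shares rack `r2` with two of the replicas.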

Execution overview
[Figure: execution overview, from the Google, Inc. MapReduce paper, 2004]

MapReduce Programming Model
The programmer has to write two primary methods:
map (k1, v1) → list(k2, v2)
reduce (k2, list(v2)) → list(k2, v2)
● All v2 with the same k2 are reduced together, in order.
● The input keys and values are drawn from a different domain than the output keys and values.

Example: Words Frequency
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");
reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
Input: “documentx”, “To be or not to be”
Output:
“be”, 2
“not”, 1
“or”, 1
“to”, 2
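The pseudocode above can be tried on a single machine with a few lines of Python. This is only a sketch of the programming model, not of any real MapReduce runtime: `run_mapreduce`, `map_fn` and `reduce_fn` are names chosen for illustration, and lowercasing the words is an assumption made so that "To" and "to" are counted together, as in the slide's output.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Map: emit (word, 1) for every word in the document."""
    for word in contents.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the runtime: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)  # shuffle: collect values per intermediate key

    results = []
    for k2 in sorted(groups):  # sort: reducers see keys in order
        results.extend(reduce_fn(k2, groups[k2]))
    return results


word_counts = run_mapreduce(
    [("documentx", "To be or not to be")], map_fn, reduce_fn
)
print(word_counts)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Only `map_fn` and `reduce_fn` are specific to word counting; the driver stays the same for other problems.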

Input:
“document1”, “To be or not to be”
“document2”, “text”
...
Map:
“to”, 1
“be”, 1
“or”, 1
“not”, 1
“to”, 1
“be”, 1
...
Shuffle and Sort: aggregate values by key
“be”, 1
“be”, 1
...
“not”, 1
...
“or”, 1
...
“to”, 1
“to”, 1
...
Reduce:
key = “be”, values = “1”, “1”
key = “not”, values = “1”
key = “or”, values = “1”
key = “to”, values = “1”, “1”
...
Output:
“be”, 2
“not”, 1
“or”, 1
“to”, 2
...
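The shuffle-and-sort step in this data flow can be sketched in Python with `sorted` and `itertools.groupby`. The function name `shuffle_and_sort` is hypothetical, and the sketch runs in one process, whereas a real runtime moves the pairs between machines.

```python
from itertools import groupby
from operator import itemgetter

# Intermediate pairs emitted by the mappers for "To be or not to be".
map_output = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]


def shuffle_and_sort(pairs):
    """Sort pairs by key, then collect the values of each key into one list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


grouped = shuffle_and_sort(map_output)
print(grouped)  # [('be', [1, 1]), ('not', [1]), ('or', [1]), ('to', [1, 1])]

# Each (key, values) pair is what one reduce call receives.
reduced = [(key, sum(values)) for key, values in grouped]
print(reduced)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

`groupby` only merges adjacent equal keys, which is why the sort has to come first.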

Other examples
● Inverted index
- Find which documents contain a specific word.
- Map: parse each document, emit <word, document-ID> pairs.
- Reduce: for each word, sort the corresponding document IDs.
Emit <word, list(document-ID)>
● Reverse web-link graph
- Find where page links come from.
- Map: output <target, source> for each link to target found in a page source.
- Reduce: concatenate the list of all source URLs associated with a target.
Emit <target, list(source)>
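Both examples fit the same map, group and reduce shape, as the following Python sketch suggests. The helper `group_by_key` and the sample documents and pages are made up for illustration; removing duplicate document IDs before sorting is an added assumption, since the slide only asks for sorting.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Stand-in for shuffle: collect all values emitted for each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def inverted_index(documents):
    """documents: {document-ID: text} -> {word: sorted list of document-IDs}."""
    # Map: emit <word, document-ID> for every word of every document.
    pairs = [
        (word, doc_id)
        for doc_id, text in documents.items()
        for word in text.lower().split()
    ]
    # Reduce: sort (and, as an assumption, de-duplicate) the IDs of each word.
    return {word: sorted(set(ids)) for word, ids in group_by_key(pairs).items()}


def reverse_links(pages):
    """pages: {source: [targets]} -> {target: list of sources}."""
    # Map: emit <target, source> for every link found in a source page.
    pairs = [
        (target, source)
        for source, targets in pages.items()
        for target in targets
    ]
    # Reduce: concatenate all sources associated with a target.
    return dict(group_by_key(pairs))


index = inverted_index({"d1": "to be or not to be", "d2": "to do"})
print(index["to"])  # ['d1', 'd2']

links = reverse_links({"a.html": ["b.html", "c.html"], "b.html": ["c.html"]})
print(links["c.html"])  # ['a.html', 'b.html']
```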

Conclusions on MapReduce
● Proven to be a useful abstraction
● Really simplifies large-scale computations
● Fun to use:
- Focus on the problem
- Let the library deal with the messy details

[Diagram: Google's GFS and MapReduce alongside Hadoop's HDFS and MapReduce]

What is Hadoop?
● A framework for distributed processing
● Open source (Apache License 2.0)
● A top-level Apache project
● Written in Java
● Batch-processing centric
● Runs on commodity hardware

Hadoop Distributed File System
● For very large files: TBs, PBs.
● Each file is partitioned into chunks (blocks) of 64 MB.
● Each chunk is replicated several times (3 by default), on different racks, for fault tolerance.
● It is an abstraction layer over the local file system: disks are formatted with ext3, ext4 or XFS.
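The chunk size and replication figures above translate into simple arithmetic, sketched here in Python. The function `hdfs_footprint` is hypothetical, and the values of 64 MB and 3 replicas are taken from the slide rather than read from any cluster configuration.

```python
import math

BLOCK_SIZE_MB = 64   # chunk size stated on the slide
REPLICATION = 3      # replication factor stated on the slide


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Estimate how many blocks a file needs and how much raw disk it uses."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,                            # logical blocks of the file
        "block_replicas": blocks * replication,      # block copies stored in the cluster
        "raw_storage_mb": file_size_mb * replication,  # the last block is not padded
    }


one_tb_in_mb = 1024 * 1024
footprint = hdfs_footprint(one_tb_in_mb)
print(footprint)
# {'blocks': 16384, 'block_replicas': 49152, 'raw_storage_mb': 3145728}
```

A 1 TB file therefore becomes 16384 blocks and occupies about 3 TB of raw disk across the cluster.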

Hadoop Architecture
● TaskTracker is the MapReduce server (processing part)
● DataNode is the HDFS server (data part)
[Diagram: one machine running a TaskTracker and a DataNode]

Hadoop Architecture - Master/Slave
JobTracker:
● Accepts users' jobs
● Assigns tasks to workers
● Keeps track of job status
[Diagram: a JobTracker and several machines, each running a TaskTracker and a DataNode]

Hadoop Architecture - Master/Slave
NameNode:
● Keeps information on data location
● Decides where a file has to be written
[Diagram: a NameNode and several machines, each running a TaskTracker and a DataNode]
Data never flows through the NameNode!

Hadoop Architecture – Scalable
● Multiple machines running Hadoop form a cluster.
● What if we need more storage or compute power?
[Diagram: three machines, each running a TaskTracker and a DataNode]

Hadoop Architecture - Overview
[Diagram: Client, JobTracker, NameNode and Secondary NameNode, with a file and items labeled A, B and C]

Hadoop Ecosystem – Pig & Hive
[Diagram: stack of HDFS, MapReduce, Pig and Hive]

Hadoop Ecosystem – HBase
[Diagram: stack of HDFS, MapReduce, Pig and Hive, extended with HBase]

What is MapReduce/Hadoop used for?
@Google
● Index construction for Google Search
● Article clustering for Google News
● Statistical machine translation
@Yahoo! (4100 nodes)
● “Web map” powering Yahoo! Search
● Spam detection for Yahoo! Mail
@Facebook (>100 PB of storage)
● Data mining
● Ad optimization
● Spam detection

but . . .
MapReduce's use of input files and its lack of schema support prevent the performance improvements enabled by features like B-trees and hash partitioning . . .
. . . and most of the data in companies is stored in databases!

Solutions
● SQL-MapReduce by Teradata Aster
● In-Database Map-Reduce by Oracle
● Connectors that allow external Hadoop programs to access data from databases and to store Hadoop output in databases

SQL-MapReduce
A framework that allows developers to write SQL-MapReduce functions in languages such as Java, C#, Python and C++ and push them into the database for advanced in-database analytics.

SQL-MapReduce - Syntax
MR functions can be used like custom SQL operators and can implement any algorithm or transformation.
http://www.asterdata.com/resources/mapreduce.php

Example: Words Frequency
Demo #1: Map (Tokenization) and Reduce (WordCount) in SQL/MR
SELECT key AS word, value AS wordcount
FROM WordCountReduce (
    ON Tokenize ( ON blogs )
    PARTITION BY key
)
ORDER BY wordcount DESC
LIMIT 20;
Demo #2: Why do Reduce when we have SQL?
SELECT word, count(*) AS wordcount
FROM Tokenize ( ON blogs )
GROUP BY word
ORDER BY wordcount DESC
LIMIT 20;
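The point of Demo #2, that a plain GROUP BY can take over the reduce step once the map step has produced rows, can be tried without an Aster database. The sketch below uses Python's built-in `sqlite3` module instead of SQL-MapReduce: the `blogs` and `tokens` tables and their contents are invented for illustration, tokenization is done in Python as a stand-in for `Tokenize ( ON blogs )`, and the extra `word` sort key is added only to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blogs (body TEXT)")
conn.executemany(
    "INSERT INTO blogs VALUES (?)",
    [("to be or not to be",), ("to do",)],
)

# "Map" step: split every blog body into words and store one row per word.
conn.execute("CREATE TABLE tokens (word TEXT)")
bodies = conn.execute("SELECT body FROM blogs").fetchall()
conn.executemany(
    "INSERT INTO tokens VALUES (?)",
    [(word,) for (body,) in bodies for word in body.split()],
)

# "Reduce" step: ordinary SQL aggregation replaces the WordCountReduce function.
top_words = conn.execute(
    """
    SELECT word, count(*) AS wordcount
    FROM tokens
    GROUP BY word
    ORDER BY wordcount DESC, word
    LIMIT 20
    """
).fetchall()
print(top_words)  # [('to', 3), ('be', 2), ('do', 1), ('not', 1), ('or', 1)]
conn.close()
```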

In-Database Map-Reduce by Oracle
● Uses table functions to implement Map-Reduce within the database.
● Parallelization is provided by the Oracle Parallel Execution framework.
Using this in combination with SQL, Oracle provides a simple mechanism for database developers to develop Map-Reduce functionality using languages they know.

Example: Words Frequency
SELECT *
FROM table(oracle_map_reduce.reducer(
    cursor(
        SELECT value(map_result).word word
        FROM table(oracle_map_reduce.mapper(
            cursor(SELECT a FROM documents), ' '
        )) map_result
    )
));

Still not perfect!
However, these solutions are not source compatible with Hadoop.
Native Hadoop programs need to be rewritten before they can be used in databases.
Questions?
Introduction MapReduce Hadoop MR&Databases
β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ β—‹ ●

Weitere Γ€hnliche Inhalte

Was ist angesagt?

Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reducePaladion Networks
Β 
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReducefvanvollenhoven
Β 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintijccsa
Β 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
Β 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
Β 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsZubair Nabi
Β 
Map reduce and Hadoop on windows
Map reduce and Hadoop on windowsMap reduce and Hadoop on windows
Map reduce and Hadoop on windowsMuhammad Shahid
Β 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-ReduceBrendan Tierney
Β 
Introduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopIntroduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopMohamed Elsaka
Β 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterIRJET Journal
Β 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
Β 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-toleranceRavindra Bandara
Β 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewNisanth Simon
Β 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialFarzad Nozarian
Β 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceMahantesh Angadi
Β 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
Β 

Was ist angesagt? (20)

Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
Β 
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduce
Β 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Β 
Hadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraintHadoop scheduler with deadline constraint
Hadoop scheduler with deadline constraint
Β 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
Β 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
Β 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
Β 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
Β 
Map reduce and Hadoop on windows
Map reduce and Hadoop on windowsMap reduce and Hadoop on windows
Map reduce and Hadoop on windows
Β 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-Reduce
Β 
Introduction to MapReduce and Hadoop
Introduction to MapReduce and HadoopIntroduction to MapReduce and Hadoop
Introduction to MapReduce and Hadoop
Β 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Β 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
Β 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Β 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
Β 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
Β 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
Β 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce Tutorial
Β 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Β 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
Β 

Andere mochten auch

Smartphones' Security
Smartphones' SecuritySmartphones' Security
Smartphones' SecurityNicola Cadenelli
Β 
UTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SER
UTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SERUTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SER
UTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SERMartha Isabel LligΓΌi Pauta
Β 
Development
DevelopmentDevelopment
DevelopmentRobyn96
Β 
Go green curb global warming
Go green curb global warmingGo green curb global warming
Go green curb global warmingMohammed Suhail
Β 
Lα»‹ch học MαΊ§m non HΖ°Ζ‘ng Giang
Lα»‹ch học MαΊ§m non HΖ°Ζ‘ng GiangLα»‹ch học MαΊ§m non HΖ°Ζ‘ng Giang
Lα»‹ch học MαΊ§m non HΖ°Ζ‘ng GiangNon MαΊ§m
Β 
Affordable Day care program to empower indian mothers adb 3ie conference
Affordable Day care program to empower indian mothers adb 3ie conferenceAffordable Day care program to empower indian mothers adb 3ie conference
Affordable Day care program to empower indian mothers adb 3ie conferenceifmrcmf
Β 
Bang bao gia web ok!
Bang bao gia web ok!Bang bao gia web ok!
Bang bao gia web ok!Non MαΊ§m
Β 
TΓΌrkiye GEN Hareketi
TΓΌrkiye GEN HareketiTΓΌrkiye GEN Hareketi
TΓΌrkiye GEN HareketiGelecek Hane
Β 
Ekonomi 2.0 Raporu
Ekonomi 2.0 RaporuEkonomi 2.0 Raporu
Ekonomi 2.0 RaporuGelecek Hane
Β 
Measures on Design Drawings
Measures on Design DrawingsMeasures on Design Drawings
Measures on Design DrawingsGautam Shah
Β 
May 2016 Corporate Presentation
May 2016 Corporate PresentationMay 2016 Corporate Presentation
May 2016 Corporate Presentationoncolyticsinc
Β 
COIMOTION概忡介紹
COIMOTION概忡介紹COIMOTION概忡介紹
COIMOTION概忡介紹Ben Lue
Β 
Moby crm
Moby crmMoby crm
Moby crmmobilecrm
Β 
English presentation
English presentationEnglish presentation
English presentationRagadian S'
Β 
Maker Workshop 7 May 2014 - StudioX
Maker Workshop 7 May 2014 - StudioXMaker Workshop 7 May 2014 - StudioX
Maker Workshop 7 May 2014 - StudioXGelecek Hane
Β 
bang chu cai tieng nhat
bang chu cai tieng nhatbang chu cai tieng nhat
bang chu cai tieng nhatkhucxuanvuong-hut
Β 
Hack & Go! Redefining API @ MOPCON 2014
Hack & Go!  Redefining API @ MOPCON 2014Hack & Go!  Redefining API @ MOPCON 2014
Hack & Go! Redefining API @ MOPCON 2014Ben Lue
Β 

Andere mochten auch (19)

Smartphones' Security
Smartphones' SecuritySmartphones' Security
Smartphones' Security
Β 
UTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SER
UTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SERUTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SER
UTE-CONCEPCION DEL HOMBRE Y CUESTIONAMIENTO SOBRE EL SER
Β 
Development
DevelopmentDevelopment
Development
Β 
Go green curb global warming
Go green curb global warmingGo green curb global warming
Go green curb global warming
Β 
Lα»‹ch học MαΊ§m non HΖ°Ζ‘ng Giang
Lα»‹ch học MαΊ§m non HΖ°Ζ‘ng GiangLα»‹ch học MαΊ§m non HΖ°Ζ‘ng Giang
Lα»‹ch học MαΊ§m non HΖ°Ζ‘ng Giang
Β 
File_2013.12.
File_2013.12.File_2013.12.
File_2013.12.
Β 
Affordable Day care program to empower indian mothers adb 3ie conference
Affordable Day care program to empower indian mothers adb 3ie conferenceAffordable Day care program to empower indian mothers adb 3ie conference
Affordable Day care program to empower indian mothers adb 3ie conference
Β 
Bang bao gia web ok!
Bang bao gia web ok!Bang bao gia web ok!
Bang bao gia web ok!
Β 
TΓΌrkiye GEN Hareketi
TΓΌrkiye GEN HareketiTΓΌrkiye GEN Hareketi
TΓΌrkiye GEN Hareketi
Β 
Ekonomi 2.0 Raporu
Ekonomi 2.0 RaporuEkonomi 2.0 Raporu
Ekonomi 2.0 Raporu
Β 
Measures on Design Drawings
Measures on Design DrawingsMeasures on Design Drawings
Measures on Design Drawings
Β 
May 2016 Corporate Presentation
May 2016 Corporate PresentationMay 2016 Corporate Presentation
May 2016 Corporate Presentation
Β 
COIMOTION概忡介紹
COIMOTION概忡介紹COIMOTION概忡介紹
COIMOTION概忡介紹
Β 
Moby crm
Moby crmMoby crm
Moby crm
Β 
English presentation
English presentationEnglish presentation
English presentation
Β 
.
..
.
Β 
Maker Workshop 7 May 2014 - StudioX
Maker Workshop 7 May 2014 - StudioXMaker Workshop 7 May 2014 - StudioX
Maker Workshop 7 May 2014 - StudioX
Β 
bang chu cai tieng nhat
bang chu cai tieng nhatbang chu cai tieng nhat
bang chu cai tieng nhat
Β 
Hack & Go! Redefining API @ MOPCON 2014
Hack & Go!  Redefining API @ MOPCON 2014Hack & Go!  Redefining API @ MOPCON 2014
Hack & Go! Redefining API @ MOPCON 2014
Β 

Γ„hnlich wie MapReduce and Hadoop

Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
Β 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
Β 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchEdward Capriolo
Β 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
Β 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keownCisco Canada
Β 
Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014
Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014
Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014soujavajug
Β 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
Β 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersKumari Surabhi
Β 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
Β 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
Β 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
Β 
[@NaukriEngineering] Apache Spark
[@NaukriEngineering] Apache Spark[@NaukriEngineering] Apache Spark
[@NaukriEngineering] Apache SparkNaukri.com
Β 
Apache Hive for modern DBAs
Apache Hive for modern DBAsApache Hive for modern DBAs
Apache Hive for modern DBAsLuis Marques
Β 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
Β 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014cdmaxime
Β 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overviewMartin Zapletal
Β 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
Β 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014cdmaxime
Β 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
Β 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
Β 

Γ„hnlich wie MapReduce and Hadoop (20)

Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
Β 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Β 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batch
Β 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
Β 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
Β 
Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014
Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014
Hadoop - Introduction to map reduce programming - ReuniΓ£o 12/04/2014
Β 

MapReduce and Hadoop

  • 1. MapReduce and Hadoop. Cadenelli Nicola, Datenbanken Implementierungstechniken (Database Implementation Techniques).
  • 2. Outline
    Introduction
    ● History
    ● Motivations
    MapReduce
    ● What MapReduce is
    ● Why it is useful
    ● Execution Details
    ● Some Examples
    ● Conclusions
    Hadoop
    ● Introduction
    ● Hadoop Architecture
    ● Hadoop Ecosystem
    ● In the real world
    MapReduce & Databases
    ● SQL-MapReduce
    ● In-Database Map-Reduce
    ● Conclusions
  • 3. Diagram labels: GFS, MapReduce, BigTable; HDFS, MapReduce.
  • 4. A Brief History
    ● 2004: Google publishes the papers.
    ● 2006: Apache releases Hadoop, the first open-source implementation of GFS and MapReduce.
    ● Now: Amazon, AOL, eBay, Facebook, HP, IBM, Last.fm, LinkedIn, Microsoft, Spotify, Twitter and more are using Hadoop.
  • 5. Motivations
    ● Data sets are becoming really big: more than 10 TB. E.g. the Large Synoptic Survey Telescope (30 TB / night).
    ● The best idea is to scale out (not scale up) the system, but ...
      - How do we scale to more than 1000 machines?
      - How do we handle machine failures?
      - How can we facilitate communication between nodes?
      - If we change system, do we lose all our optimisation work?
    ● Google needed to recreate the index of the web.
  • 6. What is MapReduce?
    “MapReduce is a programming model and an associated implementation for processing and generating large data sets.” – Google, Inc., MapReduce paper, 2004.
    It is a really simple API with just two serial functions, map() and reduce(), and it is language independent (Java, Python, Perl, …).
  • 7. Why is MapReduce useful?
    MapReduce hides messy details in the runtime library:
    ● Parallelization and distribution
    ● Load balancing
    ● Network and disk transfer optimization
    ● Handling of machine failures
    ● Fault tolerance
    ● Monitoring & status updates
    All users benefit from improvements to the core library.
  • 8. Typical problem solved by MapReduce
    1. Read a lot of data
    2. Map: extract something we care about from each record
    3. Shuffle and Sort
    4. Reduce: aggregate, summarize, filter, or transform
    5. Write the results
    Seen from outside, the outline is always the same (read, process, write); only map and reduce change to fit the problem.
  • 9. Some Execution Details
    ● A single master controls job execution on multiple slaves.
    ● Mappers are preferentially placed on the same node or the same rack as their input block → minimizes network usage!
    ● Mappers save their outputs to local disk before serving them to reducers.
    ● If a map or reduce task crashes: re-execute!
    ● It is possible to have more mappers and reducers than nodes.
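
The placement preference for mappers described on this slide can be illustrated with a small sketch. This is not Hadoop's scheduler code: the function `pick_node`, the `rack_of` mapping and the node names are all hypothetical, and the sketch assumes at least one free node is available.

```python
def pick_node(block_replicas, free_nodes, rack_of):
    """Choose a node for a map task: same node, else same rack, else any.

    Hypothetical helper for illustration; assumes free_nodes is non-empty.
    """
    # 1. Prefer a free node that already stores a replica of the input block.
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"
    # 2. Otherwise prefer a free node on the same rack as some replica.
    replica_racks = {rack_of[n] for n in block_replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Fall back to any free node; the block then travels over the network.
    return free_nodes[0], "remote"


# Made-up cluster: two racks with two nodes each.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
print(pick_node({"n1", "n3"}, ["n2", "n3"], rack_of))  # ('n3', 'node-local')
print(pick_node({"n1"}, ["n2", "n4"], rack_of))        # ('n2', 'rack-local')
print(pick_node({"n1"}, ["n4"], rack_of))              # ('n4', 'remote')
```

The three tiers mirror the slide's wording: the closer the task runs to its input block, the less data crosses the network.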
  • 10. Execution overview. Source: Google, Inc., MapReduce paper, 2004.
  • 11. MapReduce Programming Model
    The programmer has to write two primary methods:
      map (k1, v1) → list(k2, v2)
      reduce (k2, list(v2)) → list(k2, v2)
    ● All v2 with the same k2 are reduced together, in order.
    ● The input keys and values are drawn from a different domain than the output keys and values.
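
To make the two signatures concrete, here is a minimal single-process sketch of the model, assuming everything fits in memory. The helper `run_mapreduce` and the temperature example are hypothetical illustrations, not part of Hadoop or of Google's library.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process illustration of map -> shuffle/sort -> reduce."""
    groups = defaultdict(list)
    # Map phase: each (k1, v1) record yields zero or more (k2, v2) pairs.
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            # Shuffle: collect every v2 under its k2.
            groups[k2].append(v2)
    output = []
    # Sort: keys are handed to the reducer in order.
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


# Hypothetical job: maximum temperature per city.
def map_city(_line_no, line):
    city, temp = line.split(",")
    yield city, int(temp)


def reduce_max(city, temps):
    yield city, max(temps)


readings = [(0, "Rome,31"), (1, "Oslo,18"), (2, "Rome,35"), (3, "Oslo,21")]
result = run_mapreduce(readings, map_city, reduce_max)
print(result)  # [('Oslo', 21), ('Rome', 35)]
```

Note how the input domain (line number, text line) differs from the output domain (city, temperature), as the second bullet states.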
  • 12. Example: Word Frequency
    map(String key, String value):
      // key: document name
      // value: document contents
      for each word w in value:
        EmitIntermediate(w, "1");
    reduce(String key, Iterator values):
      // key: a word
      // values: a list of counts
      int result = 0;
      for each v in values:
        result += ParseInt(v);
      Emit(AsString(result));
    Input: “documentx”, “To be or not to be”
    Output: “be”, 2; “not”, 1; “or”, 1; “to”, 2
  • 13. Input: “document1”, “To be or not to be”; “document2”, “text”; ...
    Map: “to”, 1; “be”, 1; “or”, 1; “not”, 1; “to”, 1; “be”, 1; ...
    Shuffle and Sort (aggregate values by key): ... “be”, 1; “be”, 1; ... “not”, 1; ... “or”, 1; ... “to”, 1; “to”, 1; ...
      key = “be”, values = “1”, “1”
      key = “not”, values = “1”
      key = “or”, values = “1”
      key = “to”, values = “1”, “1”
      ...
    Reduce: “be”, 2; “not”, 1; “or”, 1; “to”, 2; ...
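
The word-frequency flow can be reproduced in a few lines of Python. This is a sketch under stated assumptions: it runs in one process, the function names are hypothetical, and words are lower-cased so that “To” and “to” are counted together, which matches the output shown on the slides.

```python
from collections import defaultdict


def map_words(doc_name, contents):
    # Emit (word, 1) for every word; lower-casing is an assumption of this sketch.
    for word in contents.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    # Sum the partial counts collected for one word.
    return word, sum(counts)


def word_frequency(documents):
    grouped = defaultdict(list)
    for name, text in documents.items():
        for word, one in map_words(name, text):
            grouped[word].append(one)  # shuffle: group values by key
    # Sort the keys, then reduce each group.
    return [reduce_counts(w, grouped[w]) for w in sorted(grouped)]


print(word_frequency({"document1": "To be or not to be"}))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```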
  • 14. Other examples
    ● Inverted index
      - Find which documents contain a specific word.
      - Map: parse document, emit <word, document-ID> pairs.
      - Reduce: for each word, sort the corresponding document IDs. Emit <word, list(document-ID)>.
    ● Reverse web-link graph
      - Find where page links come from.
      - Map: output <target, source> for each link to target in a page source.
      - Reduce: concatenate the list of all source URLs associated with a target. Emit <target, list(source)>.
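
Both examples follow the same map, group, reduce pattern, which the sketch below makes explicit. It is a single-process illustration with hypothetical function names and made-up data; the source lists are sorted here only to make the output deterministic, whereas the slide merely asks for concatenation.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Shuffle step: collect all values emitted under the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def inverted_index(documents):
    # Map: emit (word, doc_id) once per distinct word in a document.
    pairs = ((word, doc_id)
             for doc_id, text in documents.items()
             for word in set(text.lower().split()))
    # Reduce: sort the document IDs for each word.
    return {word: sorted(ids) for word, ids in group_by_key(pairs).items()}


def reverse_link_graph(pages):
    # Map: emit (target, source) for each link found in a source page.
    pairs = ((target, source)
             for source, targets in pages.items()
             for target in targets)
    # Reduce: gather all sources that point at each target.
    return {target: sorted(srcs) for target, srcs in group_by_key(pairs).items()}


docs = {"d1": "to be or not to be", "d2": "to do"}
links = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}
index = inverted_index(docs)
backlinks = reverse_link_graph(links)
print(index["to"])          # ['d1', 'd2']
print(backlinks["c.html"])  # ['a.html', 'b.html']
```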
  • 15. Conclusions on MapReduce
    ● Proven to be a useful abstraction
    ● Really simplifies large-scale computations
    ● Fun to use:
      - Focus on the problem
      - Let the library deal with the messy details
  • 16. Diagram labels: GFS, MapReduce; HDFS, MapReduce.
  • 17. What is Hadoop?
    ● A framework for distributed processing
    ● Open source (Apache v2 licence)
    ● A top-level Apache project
    ● Written in Java
    ● Batch-processing centric
    ● Runs on commodity hardware
  • 18. Hadoop Distributed File System
    ● For very large files: TBs, PBs.
    ● Each file is partitioned into chunks of 64 MB.
    ● Each chunk is replicated several times (>= 3), on different racks, for fault tolerance.
    ● It is an abstract FS; the underlying disks are formatted with ext3, ext4 or XFS.
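
A quick calculation shows what these numbers mean for a single large file. The sketch below takes the 64 MB chunk size and the replication factor of 3 from the slide; the function name is hypothetical, and the raw-storage figure assumes that a partially filled last chunk only occupies the bytes it actually holds.

```python
import math

BLOCK_SIZE_MB = 64  # chunk size stated on the slide
REPLICATION = 3     # minimum replica count stated on the slide


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of chunks, stored replicas, raw MB consumed)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # Assumption: a short final chunk does not reserve a full 64 MB.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


# A 1 TB file, expressed as 1024 * 1024 MB:
print(hdfs_footprint(1024 * 1024))  # (16384, 49152, 3145728)
```

So a 1 TB file becomes 16,384 chunks, stored as 49,152 replicas that occupy 3 TB of raw disk across the cluster.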
  • 19. Hadoop Architecture
    ● TaskTracker is the MapReduce server (processing part)
    ● DataNode is the HDFS server (data part)
    Diagram labels: Machine, TaskTracker, DataNode.
  • 20. Hadoop Architecture – Master/Slave
    JobTracker:
    ● Accepts users' jobs
    ● Assigns tasks to workers
    ● Keeps track of job status
    Diagram labels: JobTracker; three TaskTracker/DataNode pairs.
  • 21. Hadoop Architecture – Master/Slave
    NameNode:
    ● Keeps information on data location
    ● Decides where a file has to be written
    Data never flows through the NameNode!
    Diagram labels: NameNode; three TaskTracker/DataNode pairs.
  • 22. Hadoop Architecture – Scalable
    ● Multiple machines running Hadoop form a cluster.
    ● What if we need more storage or compute power?
    Diagram labels: three Machines, each with a TaskTracker and a DataNode.
  • 23. Hadoop Architecture – Overview
    Diagram labels: Client, JobTracker, NameNode, Secondary NameNode, File (A, B, C).
  • 24. Hadoop Ecosystem – Pig & Hive
    Diagram labels: HDFS, MapReduce, Pig, Hive.
  • 25. Hadoop Ecosystem – HBase
    Diagram labels: HDFS, MapReduce, Pig, Hive, HBase.
  • 26. What is MapReduce/Hadoop used for?
    @Google
    ● Index construction for Google Search
    ● Article clustering for Google News
    ● Statistical machine translation
    @Yahoo! (4100 nodes)
    ● “Web map” powering Yahoo! Search
    ● Spam detection for Yahoo! Mail
    @Facebook (>100 PB of storage)
    ● Data mining
    ● Ad optimization
    ● Spam detection
  • 27. MapReduce's use of input files and lack of schema support prevent the performance improvements enabled by features like B-trees and hash partitioning ... but ... most of the data in companies is stored in databases!
  • 28. Solutions
    ● SQL-MapReduce by Teradata Aster
    ● In-Database Map-Reduce by Oracle
    ● Connectors that allow external Hadoop programs to access data from databases and to store Hadoop output in databases
  • 29. SQL-MapReduce
    A framework that allows developers to write SQL-MapReduce functions in languages such as Java, C#, Python and C++ and push them into the database for advanced in-database analytics.
  • 30. SQL-MapReduce – Syntax
    MR functions can be used like custom SQL operators and can implement any algorithm or transformation.
    http://www.asterdata.com/resources/mapreduce.php
  • 31. Example: Word Frequency
    Demo #1: Map (Tokenization) and Reduce (WordCount) in SQL/MR
      SELECT key AS word, value AS wordcount
      FROM WordCountReduce (
        ON Tokenize ( ON blogs )
        PARTITION BY key
      )
      ORDER BY wordcount DESC
      LIMIT 20;
  • 32. Example: Word Frequency
    Demo #1: Map (Tokenization) and Reduce (WordCount) in SQL/MR
      SELECT key AS word, value AS wordcount
      FROM WordCountReduce (
        ON Tokenize ( ON blogs )
        PARTITION BY key
      )
      ORDER BY wordcount DESC
      LIMIT 20;
    Demo #2: Why do Reduce when we have SQL?
      SELECT word, count(*) AS wordcount
      FROM Tokenize ( ON blogs )
      GROUP BY word
      ORDER BY wordcount DESC
      LIMIT 20;
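
The point of Demo #2, that the reduce step is just a GROUP BY once the tokens are in a table, can be tried without an Aster database. The sketch below is only an analogy: it uses Python's built-in `sqlite3` module, a plain Python loop stands in for the `Tokenize` map function, and the table name `tokens` and the sample `blogs` data are made up.

```python
import sqlite3

# Made-up sample data standing in for the blogs table.
blogs = ["to be or not to be", "to do is to be"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (word TEXT)")
# Stand-in for the Tokenize map function: one row per word.
conn.executemany(
    "INSERT INTO tokens (word) VALUES (?)",
    [(word,) for text in blogs for word in text.split()],
)
# The aggregation itself is ordinary SQL, as in Demo #2.
# The extra "word" sort key only makes ties deterministic.
top_words = conn.execute(
    "SELECT word, COUNT(*) AS wordcount FROM tokens "
    "GROUP BY word ORDER BY wordcount DESC, word LIMIT 20"
).fetchall()
print(top_words)
# [('to', 4), ('be', 3), ('do', 1), ('is', 1), ('not', 1), ('or', 1)]
```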
  • 33. In-Database Map-Reduce by Oracle
    ● Uses Table Functions to implement Map-Reduce within the database.
    ● Parallelization is provided by the Oracle Parallel Execution framework.
    Using this in combination with SQL, Oracle provides a simple mechanism for database developers to develop Map-Reduce functionality using languages they know.
  • 34. Example: Word Frequency
      SELECT *
      FROM table(oracle_map_reduce.reducer(
        cursor(
          SELECT value(map_result).word word
          FROM table(oracle_map_reduce.mapper(
            cursor(SELECT a FROM documents), ' '
          )) map_result
        )
      ));
  • 35. Still not perfect!
    However, these solutions are not source compatible with Hadoop. Native Hadoop programs need to be rewritten before they can be used in databases.
  • 36. Questions?