Cloud jpl

Cloud Computing and Hadoop
X JPL
Barcelona, 01/07/2011

Marc de Palol
@lant

Both are distributed systems

“A distributed system is one in which the failure
of a computer you didn't even know existed can
render your own computer unusable”
Leslie Lamport

“A distributed system consists of multiple
autonomous computers that communicate
through a computer network.”
Wikipedia

Hadoop

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat

OSDI'04: Sixth Symposium on Operating Systems Design and Implementation,
San Francisco, CA, December, 2004.

Hadoop

● Nutch
● Lucene
● Hadoop
● Avro

Hadoop

“Flexible infrastructure for large scale
computational and data processing on
a network of commodity hardware”

Parand Tony Darugar

Map & Reduce

Map :

V = [ 1, 2, 3, 4, 5 ]
def square( x ) = x * x;

Map ( V, square ) =
  for (var v : V) {
    output square(v);
  }

[1, 4, 9, 16, 25]

Map & Reduce

Reduce :

V = [ 1, 4, 9, 16, 25 ]

Reduce ( V ) =
  var acc = 0;
  for (var v : V) {
    acc = acc + v;
  }
  output acc;

55
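
The two steps on this slide can be tried out in plain Java, without Hadoop. The sketch below is only an illustration using the standard Stream API; the class and method names (MapReduceSquares, mapSquares, reduceSum) are hypothetical and do not come from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapReduceSquares {
    // Map step: apply the square function to every element independently.
    static List<Integer> mapSquares(List<Integer> values) {
        return values.stream().map(x -> x * x).collect(Collectors.toList());
    }

    // Reduce step: fold all mapped values into a single sum.
    static int reduceSum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> v = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> squared = mapSquares(v);
        System.out.println(squared);            // [1, 4, 9, 16, 25]
        System.out.println(reduceSum(squared)); // 55
    }
}
```

Because each call to the map function depends only on its own element, the map step can be split across machines; only the reduce step needs to see all the results.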

Hadoop DFS

The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

19th ACM Symposium on Operating Systems Principles,
Lake George, NY, October, 2003.

● Designed for Big Data
● Write once, read many
● One DataNode per machine
● One NameNode per cluster (single point of failure)
● Tolerant of hardware failures
● Rack-aware replication
● Recently added support for 'append'
● Cannot be mounted in the OS
● Sequential reads
● Stable and robust
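
To make the idea of rack-aware replication concrete, here is a toy sketch in plain Java. It is not HDFS's actual placement algorithm: the policy (first copy on the writer's node, second copy on another rack, the rest anywhere) is a simplifying assumption, and the names RackAwarePlacement, place and rackOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RackAwarePlacement {
    // rackOf maps a node name to its rack name. Assumes replicas >= 1.
    static List<String> place(Map<String, String> rackOf, String writer, int replicas) {
        List<String> chosen = new ArrayList<>();
        chosen.add(writer); // first copy stays on the writing node
        String localRack = rackOf.get(writer);
        // Second copy: first node found on a different rack, if any.
        for (String node : rackOf.keySet()) {
            if (chosen.size() >= replicas) break;
            if (!rackOf.get(node).equals(localRack)) {
                chosen.add(node);
                break;
            }
        }
        // Remaining copies: any node not used yet.
        for (String node : rackOf.keySet()) {
            if (chosen.size() >= replicas) break;
            if (!chosen.contains(node)) chosen.add(node);
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new LinkedHashMap<>();
        rackOf.put("n1", "rackA");
        rackOf.put("n2", "rackA");
        rackOf.put("n3", "rackB");
        rackOf.put("n4", "rackB");
        System.out.println(place(rackOf, "n1", 3)); // [n1, n3, n2]
    }
}
```

With copies on two racks, losing a whole rack (for example a switch failure) still leaves at least one replica readable.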

Example
DFS

Mapper
Input: [ “word1”, “word2”,
“word3”, “word1” ]

Output: [
“word1” : 2,
“word2” : 1,
“word3” : 1
]
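
The mapper on this slide outputs one count per distinct word in its own input. A minimal plain-Java sketch of that local counting could look as follows; the class and method names (LocalWordCount, countWords) and the sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LocalWordCount {
    // Counts how often each word appears in one mapper's input.
    // LinkedHashMap keeps the order in which words were first seen.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hadoop", "cloud", "jpl", "hadoop");
        System.out.println(countWords(input)); // {hadoop=2, cloud=1, jpl=1}
    }
}
```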

Example
DFS

“word1” : [ 2, x, y ]
2 from mapper 1
x from mapper 2
y from mapper 3

“word2” : [ x, z, w ]
x from mapper 1
z from mapper 2
w from mapper 3

“word3” : [ ... ]
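
The grouping shown on this slide, where the counts for the same word coming from different mappers end up in one list, can be sketched in plain Java as below. This only imitates what the framework does between map and reduce; ShuffleSketch, shuffle and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {
    // Collects, for every key, the values emitted by all mappers.
    // TreeMap keeps keys sorted; values stay in mapper order.
    static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> output : mapperOutputs) {
            for (Map.Entry<String, Integer> e : output.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, Integer> m1 = new HashMap<>();
        m1.put("cloud", 2);
        m1.put("hadoop", 1);
        Map<String, Integer> m2 = new HashMap<>();
        m2.put("cloud", 5);
        Map<String, Integer> m3 = new HashMap<>();
        m3.put("cloud", 1);
        m3.put("hadoop", 4);
        // prints {cloud=[2, 5, 1], hadoop=[1, 4]}
        System.out.println(shuffle(Arrays.asList(m1, m2, m3)));
    }
}
```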

Example
DFS

“word1” ∑
“word2” ∑
“word3” ∑

“word1” : x
“word2” : y
“word3” : z
...
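
The ∑ step on this slide adds up the list of counts for each word. A plain-Java sketch of that reduction, with hypothetical names (ReduceSketch, reduce) and made-up sample data, might be:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSketch {
    // Sums the grouped counts of every key into one total per key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("cloud", Arrays.asList(2, 5, 1));
        grouped.put("hadoop", Arrays.asList(1, 4));
        System.out.println(reduce(grouped)); // {cloud=8, hadoop=5}
    }
}
```

Each key is reduced independently of the others, which is why the work can be spread over several reducers.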

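Code example

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every whitespace-separated token of an input line.
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}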

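Code example

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums all the counts received for one word and emits (word, total).
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}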

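Code example

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();

    // Deprecated in later Hadoop releases in favor of Job.getInstance(conf, "wordcount").
    Job job = new Job(conf, "wordcount");

    // Types of the (key, value) pairs written by the job.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    // args[0]: input path, args[1]: output path (must not exist yet).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Block until the job finishes and report success through the exit code.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}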

Workflow

[Diagram: DB, LOGS, HDFS, DB, NoSQL]

Interested?

To try Hadoop:

http://www.cloudera.com ► Downloads
http://hadoop.apache.org

Nationwide Hadoop and scalability user group:

https://groups.google.com/group/spain-scalability-users

LinkedIn groups:

Hadoop España
Hive España

Questions?

Marc de Palol
marc.de.palol@gmail.com
@lant
