Cloud Computing and Hadoop
X JPL
Barcelona, 01/07/2011

Marc de Palol
@lant
Who am I?
Grid Computing vs Cloud
Both are distributed systems


   “A distributed system is one in which the failure
   of a computer you didn't even know existed can
   render your own computer unusable”
                                          Leslie Lamport



   “A distributed system consists of multiple
   autonomous computers that communicate
   through a computer network.”
                                              Wikipedia
Cloud
Hadoop




   MapReduce: Simplified Data Processing on Large Clusters
   Jeffrey Dean and Sanjay Ghemawat

   OSDI'04: Sixth Symposium on Operating System Design and Implementation,
   San Francisco, CA, December, 2004.
Hadoop

   ● Nutch
   ● Lucene
   ● Hadoop
   ● Avro
Hadoop


  “Flexible infrastructure for large scale
  computational and data processing on
  a network of commodity hardware”

                       Parand Tony Darugar
Map & Reduce

Map:

V = [ 1 , 2 , 3 , 4 , 5 ]
def quadrat( x ) = x * x

Map ( V, quadrat ) =
  for (var v : V) {
    output quadrat(v)
  }

[1, 4, 9, 16, 25]
Map & Reduce

Reduce:

V = [ 1 , 4 , 9 , 16 , 25 ]

Reduce ( V ) =
  var acum = 0
  for (var v : V) {
    acum = acum + v
  }
  output acum

55
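
The two pseudocode steps can be sketched in plain Java. This is only an illustration of the idea and uses no Hadoop code: the class name `MapReduceSketch` and the method signatures are assumptions made for this example, while `quadrat` and `acum` are the names used on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

final class MapReduceSketch {
    // The function applied to every element (quadrat = "square").
    static int quadrat(int x) {
        return x * x;
    }

    // Map: apply f to each element independently and collect the outputs.
    static List<Integer> map(List<Integer> v, IntUnaryOperator f) {
        List<Integer> out = new ArrayList<>();
        for (int x : v) {
            out.add(f.applyAsInt(x));
        }
        return out;
    }

    // Reduce: fold the whole list into a single accumulated value.
    static int reduce(List<Integer> v) {
        int acum = 0;
        for (int x : v) {
            acum = acum + x;
        }
        return acum;
    }

    public static void main(String[] args) {
        List<Integer> squares = map(List.of(1, 2, 3, 4, 5), MapReduceSketch::quadrat);
        System.out.println(squares);         // prints [1, 4, 9, 16, 25]
        System.out.println(reduce(squares)); // prints 55
    }
}
```

Each call to `quadrat` depends on a single element only, which is what lets the map step be spread over many machines; the reduce step is where the partial results meet.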
Hadoop DFS

         The Google File System
         Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

         19th ACM Symposium on Operating Systems Principles,
         Lake George, NY, October, 2003.

 ● Designed for Big Data
 ● Write once, read many
 ● One DataNode per machine
 ● One NameNode per cluster (SPOAD, i.e. a single point of failure)
 ● Tolerant of hardware failures
 ● Rack-aware replication
 ● 'append' has only recently been supported
 ● Cannot be mounted by the OS
 ● Sequential reads
 ● Stable and robust
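
To make rack-aware replication concrete, here is a minimal Java sketch of the commonly described HDFS default layout for three replicas: one on the writer's node, one on a node in a different rack, and one on another node of that same remote rack. It is a simplified model, not HDFS code. The `Node` type, the method name `chooseReplicas` and the node and rack names are all hypothetical, and the sketch takes the first matching node where the real system picks among candidates.

```java
import java.util.ArrayList;
import java.util.List;

final class RackAwarePlacementSketch {
    // Hypothetical model of a DataNode: its name and the rack it sits in.
    static final class Node {
        final String name;
        final String rack;

        Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }
    }

    // Returns the names of up to three nodes that would hold one block.
    static List<String> chooseReplicas(Node writer, List<Node> cluster) {
        List<String> chosen = new ArrayList<>();
        chosen.add(writer.name); // replica 1: the writer's own node

        // Replica 2: the first node found on a different rack.
        Node remote = null;
        for (Node n : cluster) {
            if (!n.rack.equals(writer.rack)) {
                remote = n;
                break;
            }
        }
        if (remote == null) {
            return chosen; // single-rack cluster: this sketch simply stops here
        }
        chosen.add(remote.name);

        // Replica 3: another node on the same rack as replica 2.
        for (Node n : cluster) {
            if (n.rack.equals(remote.rack) && n != remote) {
                chosen.add(n.name);
                break;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Node a1 = new Node("a1", "rackA");
        Node a2 = new Node("a2", "rackA");
        Node b1 = new Node("b1", "rackB");
        Node b2 = new Node("b2", "rackB");
        System.out.println(chooseReplicas(a1, List.of(a1, a2, b1, b2))); // prints [a1, b1, b2]
    }
}
```

With this layout a block survives the loss of a single machine and also the loss of a whole rack, which is how the hardware fault tolerance listed above is obtained.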
Example
 DFS


          Mapper
          Input:  [ “paraula1”, “paraula2”,
                    “paraula3”, “paraula1” ]

          Output: [
               “paraula1” : 2,
               “paraula2” : 1,
               “paraula3” : 1
          ]
Example
 DFS


                “paraula1” : [ 2, x, y]
                 2 from mapper 1
                 x from mapper 2
                 y from mapper 3

                “paraula2” : [ x, z, w]
                 x from mapper 1
                 z from mapper 2
                 w from mapper 3

                “paraula3” : [ ... ]
Example
 DFS


       “paraula1”   ∑   “paraula1”:x
       “paraula2”   ∑   “paraula2”:y
       “paraula3”   ∑   “paraula3”:z
                        ...
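
The three example slides (mapper output, grouping by word, per-word sum) can be simulated in memory with plain Java. This is a sketch under stated assumptions and not Hadoop code: the class `WordCountFlowSketch`, its method names and the second and third input splits are invented for illustration. Only the first split comes from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountFlowSketch {
    // Map phase: one mapper counts the words of its own input split.
    static Map<String, Integer> mapper(List<String> split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : split) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Shuffle: gather the partial counts of every mapper under their word.
    static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> output : mapperOutputs) {
            for (Map.Entry<String, Integer> e : output.entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: sum the partial counts collected for one word.
    static int reducer(List<Integer> partialCounts) {
        int sum = 0;
        for (int count : partialCounts) {
            sum += count;
        }
        return sum;
    }

    // Runs the three phases one after another over all splits.
    static Map<String, Integer> wordCount(List<List<String>> splits) {
        List<Map<String, Integer>> mapperOutputs = new ArrayList<>();
        for (List<String> split : splits) {
            mapperOutputs.add(mapper(split));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapperOutputs).entrySet()) {
            totals.put(e.getKey(), reducer(e.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("paraula1", "paraula2", "paraula3", "paraula1"), // split from the slide
                List.of("paraula1", "paraula2"),                         // hypothetical split
                List.of("paraula3"));                                    // hypothetical split
        System.out.println(wordCount(splits)); // prints {paraula1=3, paraula2=2, paraula3=2}
    }
}
```

In a real job each `mapper` call would run on a different machine next to its data, and the framework would perform the `shuffle` step over the network.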
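Code example

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the job's driver class (the imports go at the top of that file).
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for every whitespace-separated token of the input line.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}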
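Code example

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class (the imports go at the top of that file).
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Receives every count emitted for one word and writes their sum.
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}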
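Code example

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Driver: args[0] is the input path, args[1] the output path (must not exist yet).
public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();

    // new Job(conf, name) is deprecated in Hadoop 2 and later.
    Job job = Job.getInstance(conf, "wordcount");
    // Lets Hadoop locate the jar that contains the job classes.
    job.setJarByClass(Map.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Exit with a non-zero status if the job fails.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}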
Workflow

   DB, LOGS  →  HDFS  →  DB, NoSQL
Who uses it?
Hadoop ecosystem
Hadoop community

Support:
Interested?

 To try Hadoop:

    http://www.cloudera.com ► Downloads
    http://hadoop.apache.org

 Hadoop and scalability user group at the national level:

    https://groups.google.com/group/spain-scalability-users

 LinkedIn groups:

    Hadoop España
    Hive España
Questions?



     Marc de Palol
marc.de.palol@gmail.com
         @lant