MapReduce





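MapReduce
•   A programming model proposed by Google
    •   A computation is expressed with two functions: Map and Reduce
•   Google's design is inspired by the map and reduce primitives of functional languages

•   MapReduce concepts
    •   Map
        –     [1,2,3,4] –(*2)→ [2,4,6,8]
    •   Reduce
        –     [1,2,3,4] –(sum)→ 10

–   Divide and Conquer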
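The two examples above can be reproduced on a single machine with Java streams. This is a minimal sketch: the class name `MapReduceIdea` is hypothetical, and `List.of` needs Java 9 or later. `map` transforms each element independently, which is what allows the work to be spread across machines, and `reduce` folds all elements into one value.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceIdea {
    // map: apply (*2) to every element independently
    static List<Integer> doubleAll(List<Integer> xs) {
        return xs.stream().map(x -> x * 2).collect(Collectors.toList());
    }

    // reduce: fold all elements into a single value with (+), starting from 0
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4);
        System.out.println(doubleAll(input)); // [2, 4, 6, 8]
        System.out.println(sum(input));       // 10
    }
}
```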




                                                                  Copyright 2009 - Trend Micro Inc.
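MapReduce
•   MapReduce, proposed by Google, expresses a computation with two user-written
    functions, Map and Reduce
•   The two functions:
    – Map processes one input key/value pair and produces a set of
      intermediate key/value pairs
    – Reduce merges all intermediate values that share the same intermediate key
      and produces the output key/value pairs
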
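Between the two phases, the framework gathers every intermediate pair and groups the values by key. Each Reduce call therefore receives one key together with all of its values. Below is a minimal in-memory sketch of that grouping step. It assumes string keys and integer values and uses a Java 16 record. `Pair`, `groupByKey`, and `reduce` are hypothetical names, not Hadoop APIs, and this `reduce` simply adds the values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // A hypothetical intermediate key/value pair (not a Hadoop type)
    record Pair(String key, int value) {}

    // Shuffle: collect the values of every intermediate pair under its key.
    // TreeMap keeps the keys sorted, as Hadoop does before calling reduce.
    static Map<String, List<Integer>> groupByKey(List<Pair> intermediate) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Pair p : intermediate) {
            groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return groups;
    }

    // Reduce: merge all values of one intermediate key (here, by adding them)
    static int reduce(String key, List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    public static void main(String[] args) {
        List<Pair> intermediate = List.of(new Pair("b", 1), new Pair("a", 1), new Pair("b", 1));
        Map<String, List<Integer>> groups = groupByKey(intermediate);
        System.out.println(groups); // {a=[1], b=[1, 1]}
        groups.forEach((key, values) -> System.out.println(key + "\t" + reduce(key, values)));
    }
}
```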
•   Known applications of MapReduce:
    http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/

MapReduce
•   Type signatures
    – map         (K1, V1) → list(K2, V2)
    – reduce      (K2, list(V2)) → list(K3, V3)

• grep
    – Map: (offset, line) → [(match, 1)]
    – Reduce: (match, [1, 1, ...]) → [(match, n)]

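The grep signatures above can be traced in plain Java. This sketch assumes the input is a list of lines with one-byte characters, so the running offset plays the role of K1. `GrepSketch` and its methods are hypothetical helpers, and the grouping loop in `run` stands in for the framework's shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrepSketch {
    // Map: (offset, line) -> [(match, 1)], one pair per pattern match in the line
    static List<Map.Entry<String, Integer>> map(long offset, String line, Pattern pattern) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            out.add(Map.entry(m.group(), 1));
        }
        return out;
    }

    // Reduce: (match, [1, 1, ...]) -> (match, n)
    static int reduce(String match, List<Integer> ones) {
        int n = 0;
        for (int one : ones) n += one;
        return n;
    }

    // Drives map over every line, groups the pairs by match, then reduces each group
    static Map<String, Integer> run(List<String> lines, Pattern pattern) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line, pattern)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            offset += line.length() + 1; // next line's offset, assuming 1-byte chars and '\n' endings
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((match, ones) -> result.put(match, reduce(match, ones)));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("error: disk full", "ok", "error: timeout", "warn");
        System.out.println(run(lines, Pattern.compile("error|warn"))); // {error=2, warn=1}
    }
}
```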
Word Count
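Word count is the standard first MapReduce program. Map turns each word into a (word, 1) pair, and Reduce adds up the 1s for each word. Below is a compact single-machine sketch using Java streams. `flatMap` plays the Map role, and `groupingBy` with `counting` plays shuffle plus Reduce. The class name, lowercasing, and whitespace-based tokenizing are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Map phase: split each document into lowercase words (each word stands for a (word, 1) pair).
    // Shuffle + Reduce: group identical words and count them; TreeMap keeps the output sorted.
    static Map<String, Long> wordCount(String... documents) {
        return Arrays.stream(documents)
                .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount("Hello World", "Hello Hadoop")); // {hadoop=1, hello=2, world=1}
    }
}
```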




MapReduce
•   Distributed Grep
    –   Find the lines that match a given pattern
•   Distributed Sort
    –   Sort records by key
•   Count of URL Access Frequency
    –   Count how many times each URL appears in Web request logs

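Distributed sort relies on the framework rather than on user code. In the Google MapReduce paper, Map extracts a key from each record and emits (key, record), and Reduce emits the pairs unchanged. The ordering comes from the framework sorting keys before Reduce. Below is a single-reducer sketch that assumes the sort key is the first comma-separated field. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DistributedSortSketch {
    // Map: extract the sort key (assumed to be the first comma-separated field)
    // and emit (key, record)
    static Map.Entry<String, String> map(String record) {
        return Map.entry(record.split(",", 2)[0], record);
    }

    // The framework hands keys to a Reducer in sorted order; a TreeMap stands in for that.
    // Reduce is the identity: every record of a key is emitted unchanged.
    static List<String> sort(List<String> records) {
        TreeMap<String, List<String>> byKey = new TreeMap<>();
        for (String record : records) {
            Map.Entry<String, String> kv = map(record);
            byKey.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<String> sorted = new ArrayList<>();
        for (List<String> group : byKey.values()) {
            sorted.addAll(group); // identity reduce
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of("c,3", "a,1", "b,2"))); // [a,1, b,2, c,3]
    }
}
```

With one reducer, the output is fully sorted. With several reducers, each output file is sorted on its own. A globally ordered result therefore also needs a partitioner that assigns contiguous key ranges to reducers.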
MapReduce




Classification    Copyright 2007 - Trend Micro Inc.
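Hadoop and MapReduce

• Apache Hadoop is an open-source implementation of Google's MapReduce
   –   Provides a MapReduce framework
   –   Written in Java
   –   Includes the Hadoop Distributed File System (HDFS)
• Yahoo! is a major contributor
• Google, Yahoo!, IBM, and Amazon all support Hadoop
• Trend Micro also uses Hadoop MapReduce
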
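Hadoop MapReduce

 •     The Map/Reduce framework consists of
         – a single master JobTracker
         – one TaskTracker on each cluster node
 •     JobTracker
         – Accepts the Jobs that users submit
         – Schedules each Job's tasks on the TaskTrackers, monitors them, and re-executes failed tasks
 •     TaskTrackers
        •         Execute the tasks of a Job as directed by the JobTracker
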
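The JobTracker/TaskTracker split is a master/worker design. The toy sketch below is an analogy, not Hadoop code. A method named `runJob` plays the JobTracker: it turns a job into one task per input split and hands the tasks to a fixed thread pool. The pool's threads stand in for TaskTrackers. The workload (counting words per split) is a hypothetical choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JobTrackerSketch {
    // One "task": count the words in one input split (a hypothetical workload)
    static int runTask(String split) {
        String trimmed = split.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // The "JobTracker": one task per split, handed to worker threads that stand in
    // for TaskTrackers; the per-task results are then combined.
    static int runJob(List<String> splits, int taskTrackers) {
        ExecutorService workers = Executors.newFixedThreadPool(taskTrackers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String split : splits) {
                pending.add(workers.submit(() -> runTask(split)));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // waits for that task to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runJob(List.of("a b c", "d e", ""), 2)); // 5
    }
}
```

A real JobTracker also re-executes failed tasks and tries to schedule each task on a node that already stores its input split. The sketch does neither.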
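Hadoop MapReduce
  public class MyJob {

      static class Map {          // Map implementation (implements Mapper)
      }
      static class Reduce {       // Reduce implementation (implements Reducer)
      }

      public static void main(String[] args) throws Exception {
          // configure the job
          JobConf conf = new JobConf(MyJob.class);
          FileInputFormat.setInputPaths(conf, …);
          FileOutputFormat.setOutputPath(conf, …);
          conf.setMapperClass(Map.class);
          conf.setReducerClass(Reduce.class);
          // submit the Job and wait for it to finish
          JobClient.runJob(conf);
      }
  }
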
•   HDFS                                 MapReduce
•   Sample log records:
             , GUID,       ,         ,
         1, 123, 131231231, VSAPI, open file
         2, 456, 123123123, VSAPI, connect internet

Map
•   Implement the Mapper interface and its map() method
•   Map: (K1, V1) → list(K2, V2)

map(WritableComparable, Writable,
    OutputCollector, Reporter)

•   map() is called once for each input record
•   Output pairs are emitted with the OutputCollector's collect() method

OutputCollector.collect(WritableComparable, Writable)

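Map

import java.io.IOException;
import java.util.Calendar;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits (hour of day, 1) for every log record
class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text hour = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // key is the byte offset of the line; value is one CSV log record
        String[] token = value.toString().split(",");
        // Assumes column 2 holds the timestamp in milliseconds (column 1 is the GUID);
        // trim() removes the space that follows each comma in the log
        String timestamp = token[2].trim();
        Calendar c = Calendar.getInstance();
        c.setTimeInMillis(Long.parseLong(timestamp));
        // HOUR_OF_DAY is 0-23; Calendar.HOUR would put 12:xx and 00:xx in the same bucket
        hour.set(Integer.toString(c.get(Calendar.HOUR_OF_DAY)));
        output.collect(hour, one);
    }
}
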
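The timestamp-to-hour step can be checked without a cluster. This plain-Java sketch assumes, like the sample log records, that the third comma-separated column is an epoch timestamp in milliseconds. It pins the time zone to UTC so the result does not depend on the machine. It also shows why `Calendar.HOUR_OF_DAY` suits an hourly distribution better than `Calendar.HOUR`. `HOUR` is the 12-hour field, so noon and midnight both read 0. The class and method names are hypothetical.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class HourOfRecord {
    // Extracts the hour of day (0-23) from one CSV log record.
    // Assumes the third column is an epoch timestamp in milliseconds.
    static int hourOfDay(String record, TimeZone zone) {
        String[] token = record.split(",");
        long millis = Long.parseLong(token[2].trim()); // trim: fields are separated by ", "
        Calendar c = Calendar.getInstance(zone);
        c.setTimeInMillis(millis);
        return c.get(Calendar.HOUR_OF_DAY);
    }

    // Calendar.HOUR is the 12-hour clock field (0-11): noon and midnight both give 0
    static int twelveHourField(String record, TimeZone zone) {
        Calendar c = Calendar.getInstance(zone);
        c.setTimeInMillis(Long.parseLong(record.split(",")[2].trim()));
        return c.get(Calendar.HOUR);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        String record = "1, 123, 131231231, VSAPI, open file"; // 131231231 ms = 12:27:11 UTC
        System.out.println(hourOfDay(record, utc));       // 12
        System.out.println(twelveHourField(record, utc)); // 0
    }
}
```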
Reduce
•   Implement the Reducer interface and its reduce() method
•   Reduce: (K2, list(V2)) → list(K3, V3)

     reduce(WritableComparable, Iterator,
            OutputCollector, Reporter)

•   Output pairs are emitted with the OutputCollector's collect() method

   OutputCollector.collect(WritableComparable, Writable)

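Reduce

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the counts emitted for each key (here, each hour)
class ReduceClass extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable sumValue = new IntWritable();

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        sumValue.set(sum);
        output.collect(key, sumValue);
    }
}
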
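•   JobConf
    – Specifies the Mapper, Reducer, InputFormat, OutputFormat, Combiner, and Partitioner

•   JobClient submits the job described by a JobConf:

	   JobClient.runJob(conf);                  // submit and wait until the job completes
	   new JobClient(conf).submitJob(conf);     // submit and return a RunningJob right away
	   conf.setJobEndNotificationURI(uri);      // URI (a String) called when the job ends
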
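A Combiner runs reduce-style logic on each mapper's local output before that output is sent over the network. Hadoop may run a combiner zero or more times. Reusing the Reducer as the Combiner is therefore only safe when the reduce operation is associative and commutative, as summation is. The plain-Java sketch below uses hypothetical per-mapper value lists to compare the two paths. It also shows a counter-example: averaging per-mapper averages does not give the overall average.

```java
import java.util.ArrayList;
import java.util.List;

public class CombinerSketch {
    // The reduce logic of a counting job: add up the values
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    // No combiner: every value from every mapper reaches the reducer
    static int reduceOnly(List<List<Integer>> perMapper) {
        List<Integer> all = new ArrayList<>();
        perMapper.forEach(all::addAll);
        return sum(all);
    }

    // With a combiner: each mapper pre-sums its own output, the reducer sums the partial sums
    static int combineThenReduce(List<List<Integer>> perMapper) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> values : perMapper) partials.add(sum(values));
        return sum(partials);
    }

    // Counter-example: the overall mean ...
    static double overallMean(List<List<Integer>> perMapper) {
        List<Integer> all = new ArrayList<>();
        perMapper.forEach(all::addAll);
        return (double) sum(all) / all.size();
    }

    // ... differs from the mean of per-mapper means, so "average" is not combiner-safe
    static double meanOfMeans(List<List<Integer>> perMapper) {
        double total = 0;
        for (List<Integer> values : perMapper) total += (double) sum(values) / values.size();
        return total / perMapper.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> counts = List.of(List.of(1, 1, 1), List.of(1));
        System.out.println(reduceOnly(counts) + " " + combineThenReduce(counts)); // 4 4
        List<List<Integer>> values = List.of(List.of(1, 2, 3), List.of(10));
        System.out.println(overallMean(values) + " " + meanOfMeans(values));     // 4.0 6.0
    }
}
```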
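Main Function

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class MyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("Calculate feedback log time distribution");
        // input and output paths come from the command line
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // output types: key = hour of day, value = count
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(ReduceClass.class); // summing is associative, so the Reducer doubles as Combiner
        conf.setReducerClass(ReduceClass.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // submit the job and wait for it to finish
        JobClient.runJob(conf);
    }
}
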
1. Compile
     –   javac -classpath hadoop-*-core.jar -d MyJava MyJob.java
2. Package into a jar
     –   jar -cvf MyJob.jar -C MyJava .
3. Run (the output/ directory must not already exist)
     –   bin/hadoop jar MyJob.jar MyJob input/ output/

• bin/hadoop jar MyJob.jar MyJob input/ output/




Web Console (JobTracker web UI, default port 50030)
http://172.16.203.132:50030/

Hadoop MapReduce
 • How many Mappers?
         – Determined by the input: Hadoop splits the input and runs one Mapper
           per input split
         – JobConf.setNumMapTasks(int) is only a hint; Hadoop decides the actual
           number of Mappers
 • How many Reducers?
         – Set with JobConf.setNumReduceTasks(int)
         – With zero Reducers, the job runs only the Map phase and writes the
           Map output directly

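With more than one Reducer, each intermediate key must go to exactly one Reducer so that all of its values meet in a single reduce call. Hadoop's default HashPartitioner chooses the Reducer with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The plain-Java sketch below applies the same formula. It uses `String.hashCode` in place of the `Text` key's own hash, so the concrete assignments will not match a real job. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Picks a Reducer for a key with the default HashPartitioner formula.
    // Masking the sign bit keeps the result non-negative even for negative hash codes.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Shows which keys each Reducer would receive
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partition(key, numReduceTasks), r -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> hours = new ArrayList<>();
        for (int h = 0; h < 24; h++) hours.add(Integer.toString(h));
        System.out.println(assign(hours, 3)); // reducer index -> hours routed to it
    }
}
```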
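Non-Java Interface
• Hadoop Pipes
  –   A C++ API for writing MapReduce programs
  –   The C++ code works with the Java framework without using JNI
• Hadoop Streaming
  –   Any executable that reads stdin and writes stdout can serve as the Mapper or Reducer
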
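Hadoop Streaming feeds input lines to a program's stdin and reads `key<TAB>value` lines from its stdout. By default, the text before the first tab is the key. Any language works, including Java. Below is a sketch of a word-count mapper written for that protocol. The class name is hypothetical, and splitting on whitespace is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamingWordMapper {
    // Reads text lines and writes one "word<TAB>1" line per word
    static void map(Reader in, Writer out) {
        try {
            BufferedReader reader = new BufferedReader(in);
            PrintWriter writer = new PrintWriter(out);
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        writer.print(word + "\t1\n");
                    }
                }
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        map(new InputStreamReader(System.in, StandardCharsets.UTF_8),
            new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

You can try it locally with `cat input.txt | java StreamingWordMapper | sort`, which roughly imitates the map and sort steps that Streaming would run on the cluster.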
• Google MapReduce paper
  – http://labs.google.com/papers/mapreduce.html
• Google MapReduce course materials
  – http://code.google.com/edu/submissions/mapreduce/listing.html
• Google MapReduce mini-lecture series
  – http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
• Hadoop
  – http://hadoop.apache.org/core/

•   Eclipse plug-in for MapReduce development (IBM)
    –   Develop and run Hadoop programs from Eclipse
    –   Download:
            • http://code.google.com/edu/parallel/tools/hadoopvm/hadoop-eclipse-plugin.jar
•   Hadoop virtual machine image (Google)
    –   A VMware image with Hadoop preinstalled, provided by Google
    –   Download:
            • http://code.google.com/edu/parallel/tools/hadoopvm/index.html
