CHAPTER 2. MAPREDUCE BASICS

[Figure: complete view of MapReduce with mappers, combiners, partitioners and reducers. Mapper outputs: (a 1, b 2), (c 3, c 6), (a 5, c 2), (b 7, c 8). After the combiners: (a 1, b 2), (c 9), (a 5, c 2), (b 7, c 8). After "Shuffle and Sort: aggregate values by keys": a [1 5], b [2 7], c [2 9 8], each sent to one of three reducers.]
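The figure's data flow can be replayed in a few lines of Python. The mapper outputs below are the example pairs shown in the figure. Several choices are assumptions made for illustration and are not Hadoop's API: the combiner and reducer both sum values, the byte-sum `partition` function stands in for a hash partitioner, and the job uses three reducers.

```python
from collections import defaultdict

# Example mapper outputs from the figure: one list of (key, value) per mapper.
MAPPER_OUTPUTS = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("c", 6)],
    [("a", 5), ("c", 2)],
    [("b", 7), ("c", 8)],
]


def combine(pairs):
    """Local aggregation of one mapper's output (assumed: summation)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Deterministic stand-in for a hash partitioner (hypothetical)."""
    return sum(key.encode()) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Combine each mapper's output, then group values by key per reducer."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in combine(pairs):
            reducers[partition(key, num_reducers)][key].append(value)
    return reducers


def reduce_sum(key, values):
    """Reducer (assumed: summation of all values for a key)."""
    return key, sum(values)


if __name__ == "__main__":
    for i, groups in enumerate(shuffle(MAPPER_OUTPUTS, 3)):
        for key, values in sorted(groups.items()):
            print(f"reducer {i}: {key} -> {values} -> {reduce_sum(key, values)}")
```

Because the combiner runs before the shuffle, the second mapper ships one pair, `("c", 9)`, instead of two. This is only safe because addition is associative and commutative. A combiner must never change what the reducer would compute.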
Input Splits
  Map (...): <K, V>
  Shuffle: <K, list(V)>
  Reduce (...): <list(V)>
Output Files
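The type flow on this slide can be sketched as an in-memory word count. Map turns each input split into `<K, V>` pairs, the shuffle groups them into `<K, list(V)>`, and reduce returns a `list(V)` per key. The splits and function names are hypothetical. A real framework reads splits from a distributed file system and runs each phase on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(split: str) -> Iterator[Tuple[str, int]]:
    """Map: one input split -> stream of <K, V> pairs (word, 1)."""
    for word in split.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every value that shares a key -> <K, list(V)>."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(key: str, values: List[int]) -> List[int]:
    """Reduce: <K, list(V)> -> list(V); here a single summed count."""
    return [sum(values)]


def run_job(splits: List[str]) -> Dict[str, List[int]]:
    intermediate = (pair for split in splits for pair in map_fn(split))
    grouped = shuffle(intermediate)
    # Visiting keys in sorted order mimics the sort that accompanies the shuffle.
    return {key: reduce_fn(key, grouped[key]) for key in sorted(grouped)}


if __name__ == "__main__":
    for word, counts in run_job(["the quick brown fox", "the lazy dog", "The end"]).items():
        print(word, counts)
```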
Figure 1.1: Log processing with the store-first-query-later model. Apache Hadoop [3] is used as an example.
[Figure panels: Log Servers; Distributed File System (HDFS); Data Processing Framework (Hadoop MapReduce); Framework Users (Query, Result).]

[Figure panels: Data Processing Framework (Continuous MapReduce); Cloud Servers with Logs; Framework Users (Query, Results).]

… frameworks in a traditional store-first-query-later model [17]. Companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault tolerance. Once the data is placed in the file system, users can execute queries using bulk-processing frameworks and retrieve the results from the distributed file system. Figure 1.1 illustrates this model.
Figure 1: The in-situ MapReduce architecture avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources.

… speed of social network updates or accuracy of ad targeting. The in-situ MapReduce (iMR) architecture builds on previous work in stream processing [5, 7, 9] to sup…

… each input record, and the reduc… of values, v[], that share the same … for queries that are either highly … duce functions that are distribu… gates [14]. Thus we expect that u… MapReduce combiner, allowing … to merge values of a single key to … and distribute processing overhe… biner allows iMR to process win… further reduce data volumes throu… tion. The only non-standard (but … MapReduce jobs may impleme… we describe in Section 2.3.2. However, the primary way in w… that they emit a stream of results … uous input, e.g., server log files … cessors [7], iMR bounds comp… haps infinite) data streams by pr…
Map Reduce and Stream Processing

Figure 3: iMR nodes process local log files to produce sub-windows or panes. The system assumes log records have a logical timestamp and arrive in order.

Figure: iMR aggregates individual panes P_i in the network. To produce a result, the root may either combine …
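The `map` pseudocode that follows calls `lookupPane(timestamp)` without defining it. The sketch below shows one common approach. It is assumed here from the general pane technique for sliding windows rather than taken from iMR. Logical time is cut into panes whose length is the greatest common divisor of the window range and slide, so every window is an exact union of whole panes. All function names are hypothetical.

```python
from math import gcd
from typing import List


def pane_size(range_len: int, slide: int) -> int:
    """Pane length chosen so every window is a whole number of panes."""
    return gcd(range_len, slide)


def lookup_pane(timestamp: int, pane_len: int) -> int:
    """Hypothetical stand-in for lookupPane(): logical timestamp -> pane id."""
    return timestamp // pane_len


def panes_in_window(window_end: int, range_len: int, pane_len: int) -> List[int]:
    """Pane ids whose union is the window [window_end - range_len, window_end)."""
    first = (window_end - range_len) // pane_len
    return list(range(first, window_end // pane_len))


if __name__ == "__main__":
    p = pane_size(10, 4)               # range 10, slide 4 -> panes of length 2
    print(lookup_pane(7, p))           # timestamp 7 falls in pane 3
    print(panes_in_window(12, 10, p))  # window [2, 12)
    print(panes_in_window(16, 10, p))  # next window [6, 16)
```

Sliding from the window ending at 12 to the one ending at 16 drops panes 1 and 2 and adds panes 6 and 7. This is the add/expire pattern that `merge` and `unmerge` in the pseudocode handle incrementally.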
# Called for each hit record
map(k1, hitRecord) {
    timestamp = hitRecord.time
    # look up paneId from timestamp
    paneId = lookupPane(timestamp)
    if (paneId.endFlag == True) {
        # Notify that all data for this pane has been sent
        notify(paneId)
    }
    emitIntermediate(paneId, 1, timestamp)
}

combine(paneId, countList) {
    hitCount = 0
    for count in countList {
        hitCount += count
    }
    # Send the message to the downstream node
    emitIntermediate(paneId, hitCount)
}
# Runs only at the root node of the aggregation tree
reduce(paneId, countList) {
    hitCount = 0
    for count in countList {
        hitCount += count
    }
    sv = SlideValue.new(paneId)
    sv.hitCount = hitCount
    return sv
}
# Initialize the window (range) value
init(slide) {
    rangeValue = RangeValue.new
    rangeValue.hitCount = 0
    return rangeValue
}
# Add a reduced slide value to the window
merge(rangeValue, slideValue) {
    rangeValue.hitCount += slideValue.hitCount
}
# Remove an expired slide from the window
unmerge(rangeValue, slideValue) {
    rangeValue.hitCount -= slideValue.hitCount
}
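The pseudocode above can be exercised end to end with a single-process Python sketch. It is a stand-in, not iMR's API. The per-node `map`/`combine` and the root's `reduce` become plain functions, and `lookupPane` is assumed to be integer division by a fixed pane length. Complete logs are processed as a batch, so the `endFlag`/`notify` handshake is not modelled. `SlideValue` and `RangeValue` become small dataclasses.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, List, Tuple


@dataclass
class SlideValue:
    """Hit count of one pane (slide), as built by reduce() at the root."""
    pane_id: int
    hit_count: int = 0


@dataclass
class RangeValue:
    """Running hit count of the whole window (range); init() starts it at 0."""
    hit_count: int = 0


def map_hits(timestamps: Iterable[int], pane_len: int) -> Iterator[Tuple[int, int]]:
    """map(): emit (pane id, 1) per hit; the division plays lookupPane()."""
    for ts in timestamps:
        yield ts // pane_len, 1


def combine(pane_id: int, counts: List[int]) -> Tuple[int, int]:
    """combine(): one node's partial count for a pane, sent downstream."""
    return pane_id, sum(counts)


def reduce_pane(pane_id: int, counts: List[int]) -> SlideValue:
    """reduce(): the root adds the partial counts from all nodes."""
    return SlideValue(pane_id, sum(counts))


def merge(rv: RangeValue, sv: SlideValue) -> None:
    rv.hit_count += sv.hit_count


def unmerge(rv: RangeValue, sv: SlideValue) -> None:
    rv.hit_count -= sv.hit_count


def sliding_hit_counts(node_logs: List[List[int]], pane_len: int,
                       panes_per_window: int) -> List[Tuple[int, int]]:
    """Return (newest pane id, hits in the window ending at that pane)."""
    partials: Dict[int, List[int]] = {}
    for log in node_logs:                      # each log = one server node
        per_pane: Dict[int, List[int]] = {}
        for pane_id, one in map_hits(log, pane_len):
            per_pane.setdefault(pane_id, []).append(one)
        for pane_id, counts in per_pane.items():
            pid, total = combine(pane_id, counts)
            partials.setdefault(pid, []).append(total)
    if not partials:
        return []

    window = RangeValue()                      # init()
    live: Deque[SlideValue] = deque()
    results = []
    for pane_id in range(min(partials), max(partials) + 1):
        sv = reduce_pane(pane_id, partials.get(pane_id, []))
        merge(window, sv)                      # add the newest slide
        live.append(sv)
        if len(live) > panes_per_window:
            unmerge(window, live.popleft())    # expire the oldest slide
        results.append((pane_id, window.hit_count))
    return results


if __name__ == "__main__":
    logs = [[5, 30, 70, 130, 200], [10, 65, 190]]  # two nodes, seconds
    print(sliding_hit_counts(logs, pane_len=60, panes_per_window=3))
```

Unmerging the expired pane keeps the work per slide constant regardless of window length. The technique relies on the aggregate being invertible. It works for counts and sums, but a window maximum cannot be updated this way and would have to be recombined from its live panes.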
K-Means Clustering in Map Reduce
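The slide gives only its title, so the following is a generic sketch of k-means expressed as MapReduce, not the deck's own design. Map assigns each point to its nearest current centroid and emits a (partial sum, count) pair. Reduce adds those pairs and divides to get the new centroid. A driver reruns the job until the centroids stop moving. The points, initial centroids and function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the closest centroid (Euclidean distance, first wins on ties)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(point: Point, centroids: Sequence[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Map: emit (cluster id, (partial coordinate sum, count)) for one point."""
    return nearest(point, centroids), (point, 1)


def kmeans_reduce(partials: List[Tuple[Point, int]]) -> Point:
    """Reduce: add partial sums and counts, then divide to get the new centroid."""
    count = sum(c for _, c in partials)
    dims = len(partials[0][0])
    return tuple(sum(p[d] for p, _ in partials) / count for d in range(dims))


def kmeans_iteration(points: Sequence[Point], centroids: List[Point]) -> List[Point]:
    """One MapReduce job: map every point, group by cluster id, reduce each group."""
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for point in points:
        cid, partial = kmeans_map(point, centroids)
        groups[cid].append(partial)
    new = list(centroids)  # a centroid that attracts no points stays put
    for cid, partials in groups.items():
        new[cid] = kmeans_reduce(partials)
    return new


def kmeans(points: Sequence[Point], centroids: List[Point], max_iter: int = 20) -> List[Point]:
    """Driver: rerun the job until the centroids stop moving."""
    for _ in range(max_iter):
        new = kmeans_iteration(points, centroids)
        if new == centroids:
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Emitting (sum, count) pairs rather than raw averages means a combiner could pre-add the pairs on each mapper, because vector addition is associative. The division happens only once, in reduce.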
Figure 2: MapReduce Classifier Training and Evaluation Procedure




                                A Comparison of Approaches for Large-Scale Data Mining
Google Pregel Graph Processing
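Pregel's vertex-centric model can be illustrated with the maximum-value propagation example from the Pregel paper. In this model, computation runs in supersteps, messages sent in one superstep are delivered in the next, and vertices vote to halt until a message wakes them. The single-process loop below is a hypothetical sketch of that execution model, not Google's API. It assumes every edge endpoint appears in `values`.

```python
from typing import Dict, List, Tuple


def pregel_max(values: Dict[str, int],
               edges: Dict[str, List[str]]) -> Tuple[Dict[str, int], int]:
    """Propagate the maximum value; returns (final values, supersteps run)."""
    values = dict(values)

    # Superstep 0: every vertex is active and sends its value to its out-neighbours.
    inbox: Dict[str, List[int]] = {v: [] for v in values}
    for v, val in values.items():
        for dst in edges.get(v, []):
            inbox[dst].append(val)
    supersteps = 1

    # Later supersteps: only vertices that received messages wake up.
    while any(inbox.values()):
        outbox: Dict[str, List[int]] = {v: [] for v in values}
        for v, msgs in inbox.items():
            if not msgs:
                continue                     # halted with no messages: stays inactive
            best = max(msgs)
            if best > values[v]:
                values[v] = best             # learned a larger value: forward it
                for dst in edges.get(v, []):
                    outbox[dst].append(best)
            # otherwise the vertex votes to halt without sending anything
        inbox = outbox                       # barrier: messages arrive next superstep
        supersteps += 1
    return values, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(pregel_max({"a": 3, "b": 6, "c": 2, "d": 1}, graph))
```

The run ends when no vertex has any messages left, which corresponds to Pregel's termination condition of all vertices having voted to halt with no messages in transit.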
Map Reduce ~Continuous Map Reduce Design~

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 

Map Reduce ~Continuous Map Reduce Design~

  • 6. Figure (Chapter 2, "MapReduce Basics"): four mappers, each followed by a combiner and a partitioner, feed "Shuffle and Sort: aggregate values by keys", which feeds three reducers. Mapper outputs: (a 1, b 2), (c 3, c 6), (a 5, c 2), (b 7, c 8). The combiners merge c 3 and c 6 into c 9, and after shuffle and sort the reducers receive a [1 5], b [2 7] and c [2 9 8].
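The data flow in this figure can be replayed in a few lines of Python, starting from the mapper outputs it shows: (a 1, b 2), (c 3, c 6), (a 5, c 2), (b 7, c 8). This is a minimal sketch, not the book's code. It assumes the reducers sum their values (the figure does not say what they compute), and all function names are hypothetical, including a `partition` function built on `zlib.crc32` modulo the reducer count, so keys may not land on the same reducer as in the drawing.

```python
# Sketch only: combine, partition, shuffle and reduce_sum are hypothetical names.
import zlib
from collections import defaultdict

# Mapper outputs as drawn in the figure: one list of (key, value) pairs per mapper.
MAPPER_OUTPUTS = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("c", 6)],
    [("a", 5), ("c", 2)],
    [("b", 7), ("c", 8)],
]


def combine(pairs):
    """Mini-reduce over one mapper's output: sum the values of each key locally."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Hypothetical partitioner: a stable hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(combined_outputs, num_reducers):
    """Route each pair to its reducer and group the values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in combined_outputs:
        for key, value in pairs:
            reducers[partition(key, num_reducers)][key].append(value)
    return reducers


def reduce_sum(key, values):
    """Assumed reducer: total the values for one key."""
    return key, sum(values)


combined = [combine(pairs) for pairs in MAPPER_OUTPUTS]
grouped = shuffle(combined, num_reducers=3)
results = dict(
    reduce_sum(key, values)
    for reducer in grouped
    for key, values in sorted(reducer.items())  # each reducer visits keys in sorted order
)
print(combined[1])  # [('c', 9)]: the combiner merged c 3 and c 6
print(results)
```

Grouping gives `c` the values `[9, 2, 8]` in mapper order, whereas the figure draws `[2 9 8]`. MapReduce makes no promise about the order of values within a key, so a reducer should not depend on it.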
  • 8. Figure: Input Splits → Map → <K, V> → Shuffle → <K, list(V)> → Reduce → <list(V)> → Output Files
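A single-process sketch of the same pipeline, typed to follow the slide's labels: `map_fn` turns one input split into `<K, V>` pairs, the shuffle builds `<K, list(V)>` on each reducer, and each reducer writes its own output file. The function names, the `output-i` file names and the byte-sum partitioner are hypothetical choices for illustration; word count is used only as a familiar example job.

```python
# Sketch only: run_job, the word-count functions and "output-i" names are hypothetical.
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, List, Tuple

MapFn = Callable[[str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], Iterable[str]]


def run_job(splits: List[str], map_fn: MapFn, reduce_fn: ReduceFn,
            num_reducers: int = 2) -> Dict[str, List[str]]:
    # Map: each input split is processed independently and yields <K, V> pairs.
    intermediate = [pair for split in splits for pair in map_fn(split)]

    # Shuffle: send each key to one reducer and build <K, list(V)> there.
    # Summing the key's bytes is a deterministic stand-in for a hash partitioner
    # (Python's built-in str hash is randomized per process).
    partitions: List[DefaultDict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for key, value in intermediate:
        partitions[sum(key.encode()) % num_reducers][key].append(value)

    # Reduce: each reducer visits its keys in sorted order and writes one output file.
    outputs: Dict[str, List[str]] = {}
    for i, groups in enumerate(partitions):
        lines: List[str] = []
        for key in sorted(groups):
            lines.extend(reduce_fn(key, groups[key]))
        outputs[f"output-{i}"] = lines
    return outputs


def word_count_map(split: str) -> Iterable[Tuple[str, int]]:
    for word in split.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> Iterable[str]:
    yield f"{word}\t{sum(counts)}"


files = run_job(["the cat sat", "the dog sat", "a cat"], word_count_map, word_count_reduce)
for name, lines in files.items():
    print(name, lines)
```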
  • 14. Figure 1.1: Log processing with the store-first-query-later model. Apache Hadoop [3] is used as an example. Diagram: Log Servers → Distributed File System (HDFS) → Hadoop MapReduce Framework, queried by Users (Query, Result); alongside it, Cloud Servers with Logs → Data Processing Framework (Continuous MapReduce), queried by Users (Query, Results).
      … frameworks in a traditional store-first-query-later model [17]. Companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault tolerance. Once the data is placed in the file system, users can execute queries using bulk-processing frameworks and retrieve results from the distributed file system. Figure 1.1 illustrates this model.
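To make the cost of the store-first-query-later model concrete, here is a toy sketch: every raw record is moved off the log servers into a replicated, append-only store, and only afterwards does a batch query run. `AppendOnlyStore`, `migrate`, `batch_query`, the 3-way replication default and the `"GET /path"` log format are hypothetical illustrations, not GFS or HDFS APIs.

```python
# Sketch only: AppendOnlyStore, migrate and batch_query are hypothetical names.
from collections import Counter
from typing import Dict, List


class AppendOnlyStore:
    """Toy model of an append-only, replicated distributed file system.

    Assumption for illustration: each record is kept `replication` times,
    standing in for the replication GFS and HDFS perform for availability.
    """

    def __init__(self, replication: int = 3) -> None:
        self.replication = replication
        self.records: List[str] = []  # one logical copy, in append order
        self.stored_copies = 0        # physical copies, including replicas

    def append(self, record: str) -> None:
        self.records.append(record)
        self.stored_copies += self.replication


def migrate(log_servers: Dict[str, List[str]], store: AppendOnlyStore) -> int:
    """Store first: copy every raw log line off the source nodes."""
    shipped = 0
    for lines in log_servers.values():
        for line in lines:
            store.append(line)
            shipped += 1
    return shipped


def batch_query(store: AppendOnlyStore) -> Counter:
    """Query later: a bulk job over everything stored so far (hits per path)."""
    return Counter(line.split()[1] for line in store.records)


servers = {
    "web-1": ["GET /a", "GET /b", "GET /a"],
    "web-2": ["GET /a", "GET /c"],
}
dfs = AppendOnlyStore()
print(migrate(servers, dfs))   # 5: raw records moved before any query can run
print(dfs.stored_copies)       # 15: three copies of each record
print(batch_query(dfs)["/a"])  # 3
```

Nothing can be answered until `migrate` has finished, and every raw line crosses the network once and is stored `replication` times.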
  • 18. Figure 1: The in-situ MapReduce architecture.
      … avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources. … speed of social network updates or accuracy of ad targeting. The in-situ MapReduce (iMR) architecture builds on previous work in stream processing [5, 7, 9] to sup…
      … each input record, and the reduc… of values, v[], that share the same … for queries that are either highly … duce functions that are distribu… gates [14]. Thus we expect that u… MapReduce combiner, allowing … to merge values of a single key to … and distribute processing overhe… biner allows iMR to process win… further reduce data volumes throu… tion. The only non-standard (but … MapReduce jobs may impleme… describe in Section 2.3.2. … However, the primary way in w… that they emit a stream of results … uous input, e.g., server log files … cessors [7], iMR bounds comp… haps infinite) data streams by pr…
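The core in-situ idea (run map and combine where the logs live, then ship only partial aggregates) can be sketched as follows. This assumes a hypothetical `(timestamp, url)` record format, fixed-length panes keyed by `timestamp // pane_len`, and a single root that merges what each source sends. It illustrates in-situ aggregation in general and is not iMR's implementation; the function names are hypothetical.

```python
# Sketch only: local_map_combine, root_reduce and the record format are hypothetical.
from collections import Counter
from typing import Dict, List, Tuple

LogRecord = Tuple[int, str]  # assumed format: (timestamp, url)
PaneKey = Tuple[int, str]    # (pane id, url)


def local_map_combine(records: List[LogRecord], pane_len: int) -> Dict[PaneKey, int]:
    """Runs on the source node: map each hit to (pane, url) and combine counts locally."""
    partial: Counter = Counter()
    for timestamp, url in records:
        partial[(timestamp // pane_len, url)] += 1
    return dict(partial)


def root_reduce(partials: List[Dict[PaneKey, int]]) -> Dict[PaneKey, int]:
    """Runs at the root: add up the partial counts shipped by every source."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


sources = {
    "web-1": [(0, "/a"), (1, "/a"), (2, "/b"), (7, "/a")],
    "web-2": [(3, "/a"), (4, "/a"), (8, "/c")],
}
partials = [local_map_combine(recs, pane_len=5) for recs in sources.values()]
raw_records = sum(len(recs) for recs in sources.values())
shipped = sum(len(p) for p in partials)
result = root_reduce(partials)
print(raw_records, shipped)  # 7 5
print(result[(0, "/a")])     # 4
```

Here 7 raw records shrink to 5 shipped partials. The saving grows with the number of hits that share a `(pane, url)` key, because each source sends one count per key instead of one record per hit.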
  • 21. Map Reduce and Stream Processing
  • 25. Figure 3: iMR nodes process local log files to produce sub-windows or panes. The system assumes log records have a logical timestamp and arrive in order.
  • 26. Figure 4: iMR aggregates individual panes P_i in the network. To produce a result, the root may either combine …
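Panes let overlapping windows share work: each pane is aggregated once, and a window's result combines the panes it covers. The sketch below assumes the rule used in earlier stream-processing work on panes (pane length = gcd(range, slide)) and windows that start at multiples of the slide. Whether iMR sizes panes exactly this way is not stated on these slides, and the function names are hypothetical.

```python
# Sketch only: pane_length, pane_id and panes_for_window are hypothetical names.
from math import gcd
from typing import List


def pane_length(window_range: int, slide: int) -> int:
    """Assumed pane size: the largest length that evenly divides both range and slide."""
    return gcd(window_range, slide)


def pane_id(timestamp: int, window_range: int, slide: int) -> int:
    """Pane i covers timestamps [i * p, (i + 1) * p), where p is the pane length."""
    return timestamp // pane_length(window_range, slide)


def panes_for_window(k: int, window_range: int, slide: int) -> List[int]:
    """Window k covers [k * slide, k * slide + window_range); list the panes inside it."""
    p = pane_length(window_range, slide)
    first = (k * slide) // p
    return list(range(first, first + window_range // p))


# Example: windows 10 time units long that advance every 4 units.
print(pane_length(10, 4))          # 2
print(panes_for_window(0, 10, 4))  # [0, 1, 2, 3, 4]
print(panes_for_window(1, 10, 4))  # [2, 3, 4, 5, 6]
```

With these parameters, windows 0 and 1 share panes 2, 3 and 4, so those partial results are computed once and reused.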
  • 29.
      # Called for each hit record
      map(k1, hitRecord) {
          timestamp = hitRecord.time
          # Look up the pane that this timestamp falls into
          paneId = lookupPane(timestamp)
          emitIntermediate(paneId, 1, timestamp)
          if (paneId.endFlag == True) {
              # Notify downstream that all data for this pane has been sent
              notify(paneId)
          }
      }
  • 30.
      combine(paneId, countList) {
          hitCount = 0
          for count in countList {
              hitCount += count
          }
          # Send the message to the downstream node
          emitIntermediate(paneId, hitCount)
      }
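Because `combine` only sums counts, it can run at every level of an aggregation tree and the root still gets the same totals whatever the tree shape. The sketch below models that with a hypothetical `TreeNode` holding per-pane counts. Unlike the slide's `combine(paneId, countList)`, which handles one pane at a time, the hypothetical `combine_up` merges all panes of a subtree in one pass.

```python
# Sketch only: TreeNode and combine_up are hypothetical names.
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TreeNode:
    """Hypothetical aggregation-tree node: its own per-pane hit counts plus children."""
    name: str
    local_counts: Dict[int, int]
    children: List["TreeNode"] = field(default_factory=list)


def combine_up(node: TreeNode) -> Dict[int, int]:
    """Return what this node sends downstream: its counts plus its children's, per pane."""
    merged: Counter = Counter(node.local_counts)
    for child in node.children:
        merged.update(combine_up(child))
    return dict(merged)


leaf_a = TreeNode("a", {0: 3, 1: 1})
leaf_b = TreeNode("b", {0: 2})
mid = TreeNode("mid", {1: 4}, [leaf_a, leaf_b])
root = TreeNode("root", {0: 1}, [mid, TreeNode("c", {1: 2})])
print(combine_up(root))  # {0: 6, 1: 7}
```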
  • 31.
      # if node == root of aggregation tree
      reduce(paneId, countList) {
          hitCount = 0
          for count in countList {
              hitCount += count
          }
          sv = SlideValue.new(paneId)
          sv.hitCount = hitCount
          return sv
      }
  • 32.
      # Create an empty result for a new window range
      init(slide) {
          rangeValue = RangeValue.new
          rangeValue.hitCount = 0
          return rangeValue
      }
      # Add a slide's (pane's) result as it enters the window
      merge(rangeValue, slideValue) {
          rangeValue.hitCount += slideValue.hitCount
      }
      # Remove a slide's result as it leaves the window
      unmerge(rangeValue, slideValue) {
          rangeValue.hitCount -= slideValue.hitCount
      }
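The window functions above can be exercised end to end in Python. This sketch keeps the slide's names (`SlideValue`, `RangeValue`, `init`, `merge`, `unmerge`) but is an assumed translation: it drops `init`'s unused `slide` argument, and the hypothetical driver `sliding_hit_counts` assumes the window advances by exactly one pane per step.

```python
# Sketch only: sliding_hit_counts is a hypothetical driver; the other names follow the slide.
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class SlideValue:
    """Per-pane result, as built by reduce() at the root of the aggregation tree."""
    pane_id: int
    hit_count: int = 0


@dataclass
class RangeValue:
    """Running result for the window (range) currently being reported."""
    hit_count: int = 0


def init() -> RangeValue:
    return RangeValue(hit_count=0)


def merge(range_value: RangeValue, slide_value: SlideValue) -> None:
    range_value.hit_count += slide_value.hit_count


def unmerge(range_value: RangeValue, slide_value: SlideValue) -> None:
    range_value.hit_count -= slide_value.hit_count


def sliding_hit_counts(pane_results: List[SlideValue], panes_per_window: int) -> List[int]:
    """Report one window total per incoming pane once the window is full."""
    window: Deque[SlideValue] = deque()
    current = init()
    totals: List[int] = []
    for sv in pane_results:
        merge(current, sv)  # the newest pane enters the window
        window.append(sv)
        if len(window) > panes_per_window:
            unmerge(current, window.popleft())  # the oldest pane leaves it
        if len(window) == panes_per_window:
            totals.append(current.hit_count)
    return totals


panes = [SlideValue(i, c) for i, c in enumerate([3, 1, 4, 1, 5, 9])]
print(sliding_hit_counts(panes, panes_per_window=3))  # [8, 6, 10, 15]
```

`unmerge` is what makes each slide cheap: one pane is added and one subtracted instead of re-summing the whole window. That only works for invertible aggregates such as counts and sums; a maximum, for example, cannot be subtracted and needs a different strategy.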
  • 36. K-Means Clustering in Map Reduce
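One common way to express a k-means iteration in MapReduce: the map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each centroid; a driver repeats the job until the centroids stop moving. This is a single-process sketch with hypothetical function names and 2-D points, not the code behind this slide.

```python
# Sketch only: kmeans_map, kmeans_reduce and kmeans_iteration are hypothetical names.
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Point = Tuple[float, float]
Partial = Tuple[Point, int]  # (coordinate sum, number of points)


def kmeans_map(point: Point, centroids: List[Point]) -> Tuple[int, Partial]:
    """Emit (index of the nearest centroid, (point, 1))."""
    def sq_dist(c: Point) -> float:
        return (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2

    nearest = min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(partials: List[Partial]) -> Point:
    """New centroid = total of the coordinate sums divided by the total count."""
    n = sum(count for _, count in partials)
    sx = sum(s[0] for s, _ in partials)
    sy = sum(s[1] for s, _ in partials)
    return (sx / n, sy / n)


def kmeans_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """One MapReduce job: map every point, group by centroid, reduce each group."""
    groups: DefaultDict[int, List[Partial]] = defaultdict(list)
    for p in points:
        cid, value = kmeans_map(p, centroids)
        groups[cid].append(value)
    new_centroids = list(centroids)  # a centroid with no points keeps its position
    for cid, partials in groups.items():
        new_centroids[cid] = kmeans_reduce(partials)
    return new_centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids = [(0.0, 0.0), (1.0, 1.0)]
for _ in range(5):  # fixed iteration count; a real driver would test for convergence
    centroids = kmeans_iteration(points, centroids)
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

Emitting `(point, 1)` rather than the bare point keeps the reduce input as (sum, count) pairs, so a combiner could pre-sum points on each mapper without changing the result.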
  • 37. Figure 2: MapReduce Classifier Training and Evaluation Procedure (A Comparison of Approaches for Large-Scale Data Mining)
  • 38. Google Pregel Graph Processing
  • 39. Google Pregel Graph Processing
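Pregel computes in synchronous supersteps. In each one, a vertex reads the messages sent to it in the previous superstep, may update its value and send messages along its out-edges, and votes to halt; a halted vertex wakes up only when a message arrives, and the job ends when every vertex has halted and no messages are in flight. The sketch below simulates that loop in one process for the classic "propagate the maximum value" example. The graph and the function name `pregel_max` are hypothetical, and real Pregel partitions vertices across workers.

```python
# Sketch only: pregel_max and the example graph are hypothetical.
from typing import Dict, List


def pregel_max(values: Dict[str, int], edges: Dict[str, List[str]]) -> Dict[str, int]:
    """Propagate the largest vertex value through the graph in supersteps."""
    values = dict(values)

    # Superstep 0: every vertex sends its value along its out-edges, then halts.
    inbox: Dict[str, List[int]] = {v: [] for v in values}
    for v, val in values.items():
        for w in edges.get(v, []):
            inbox[w].append(val)

    # Later supersteps: only vertices with incoming messages are active.
    while any(inbox.values()):
        outbox: Dict[str, List[int]] = {v: [] for v in values}
        for v, messages in inbox.items():
            if not messages:
                continue  # still halted: no message woke this vertex up
            best = max(messages)
            if best > values[v]:
                values[v] = best
                for w in edges.get(v, []):
                    outbox[w].append(best)
            # Either way the vertex votes to halt at the end of the superstep.
        inbox = outbox  # messages are delivered at the next superstep barrier
    return values


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = pregel_max({"a": 3, "b": 6, "c": 2, "d": 1}, graph)
print(result)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```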