http://www.catehuston.com/blog/2009/11/02/touchgraph/
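Hadoop MapReduce Design Patterns: Large-Scale Text Data Processing with MapReduce

1. By Jimmy Lin and Chris Dyer; supervised by 神林 飛志 and 野村 直之; translated by 玉川 竜司
2. Scheduled for release on October 1, 2011
3. 210 pages
4. List price: 2,940 yen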
[Figure: iterations i and i+1; labels: "Shuffle & barrier", "job start/shutdown"]
[Figure: example weighted graph with nodes A to G (edge weights 1 to 5), shown at iteration i and at iteration i+1; annotations: "5 → 4", "min(6,4)"]
[Figure: a sequence of rounds, each labeled "a super step"]
http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel
[Figure sequence: example graph with nodes A to G, distance labels per frame]
initialize: A=0; B, C, D, E, F, G = +∞
after iteration 1: B=5, D=3; C, E, F, G = +∞
after iteration 2: B=4, C=6, D=3, E=6, F=5; G = +∞
after iteration 3: B=4, C=6, D=3, E=5, F=5, G=9
end
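The distance labels in the frames above can be reproduced with a small in-memory simulation of iterated map, shuffle, and reduce steps. This is only a sketch under stated assumptions: the edge list `EDGES` is hypothetical (chosen so that the computed distances match the labels on the slides, not read off the original figure), and `map_node`, `reduce_node`, `run_iteration`, and `shortest_paths` are illustrative names rather than part of any MapReduce framework.

```python
from collections import defaultdict

INF = float("inf")

# Hypothetical edge list (an assumption): consistent with the distance labels
# on the slides, but not taken from the original figure.
EDGES = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}


def map_node(node_id, node):
    """Emit the node record itself plus one candidate distance per out-edge."""
    yield node_id, node
    for nbr, weight in node["adj"].items():
        yield nbr, node["dist"] + weight


def reduce_node(node_id, values):
    """Recover the node record and keep the smallest distance seen."""
    node, best = None, INF
    for value in values:
        if isinstance(value, dict):
            node = value
        else:
            best = min(best, value)
    return node_id, {"adj": node["adj"], "dist": min(node["dist"], best)}


def run_iteration(graph):
    """One job: map every node, group values by key, reduce every key."""
    shuffled = defaultdict(list)
    for node_id, node in graph.items():
        for key, value in map_node(node_id, node):
            shuffled[key].append(value)
    return dict(reduce_node(k, v) for k, v in shuffled.items())


def shortest_paths(source="A"):
    """Repeat jobs until no distance changes; return distances and job count."""
    graph = {n: {"adj": adj, "dist": 0 if n == source else INF}
             for n, adj in EDGES.items()}
    iterations = 0
    while True:
        new_graph = run_iteration(graph)
        iterations += 1
        if all(new_graph[n]["dist"] == graph[n]["dist"] for n in graph):
            return {n: new_graph[n]["dist"] for n in graph}, iterations
        graph = new_graph
```

With this edge list, `shortest_paths("A")` settles on B=4, C=6, D=3, E=5, F=5, G=9 after three updating iterations, plus one extra iteration that only confirms nothing changed. Each iteration corresponds to a full job, which is why the per-job start/shutdown and shuffle costs matter for this style of graph algorithm.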
# Mapper, Reducer, self.emit() and is_node() are assumed to be provided by the framework.
INF = float("inf")

class ShortestPathMapper(Mapper):
  def map(self, node_id, node):
    # send graph structure
    self.emit(node_id, node)
    # get node value and add it to edge distance
    dist = node.get_value()
    for neighbour_node_id in node.get_adjacency_list():
      dist_to_nbr = node.get_distance(node_id, neighbour_node_id)
      self.emit(neighbour_node_id, dist + dist_to_nbr)

class ShortestPathReducer(Reducer):
  def reduce(self, node_id, dist_list):
    node = None
    min_dist = INF
    for dist in dist_list:
      # dist_list contains the node record as well as candidate distances
      if is_node(dist):
        node = dist
      elif dist < min_dist:
        min_dist = dist
    # keep the current value when no shorter candidate arrived (e.g. the source node)
    node.set_value(min(node.get_value(), min_dist))
    self.emit(node_id, node)
# In-Mapper Combiner
# Mapper, self.emit() and is_exceed_limit_buffer_size() are assumed to be provided elsewhere.
class ShortestPathMapper(Mapper):
  def __init__(self):
    self.buffer = {}

  def check_and_put(self, key, value):
    # keep only the smallest distance seen so far for each key
    if key not in self.buffer or value < self.buffer[key]:
      self.buffer[key] = value

  def check_and_emit(self):
    # flush the buffer once it grows past the size limit
    if is_exceed_limit_buffer_size(self.buffer):
      for key, value in self.buffer.items():
        self.emit(key, value)
      self.buffer = {}

  def close(self):
    # flush whatever is left when the map task ends
    for key, value in self.buffer.items():
      self.emit(key, value)

  def map(self, node_id, node):
    # send graph structure
    self.emit(node_id, node)
    # get node value and add it to edge distance
    dist = node.get_value()
    for nbr_node_id in node.get_adjacency_list():
      dist_to_nbr = node.get_distance(node_id, nbr_node_id)
      dist_nbr = dist + dist_to_nbr
      self.check_and_put(nbr_node_id, dist_nbr)
      self.check_and_emit()
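To see what the buffering above buys, the following sketch counts how many key/value pairs leave a map task with and without in-mapper combining. It is a minimal stand-alone illustration, not framework code: `InMapperCombiner`, `count_emits`, the `emit` callback, and the `max_keys` flush threshold are all hypothetical names introduced here.

```python
class InMapperCombiner:
    """Buffers the minimum candidate distance per key inside one map task."""

    def __init__(self, emit, max_keys=1000):
        self.emit = emit          # callback standing in for the framework's emit
        self.max_keys = max_keys  # hypothetical flush threshold
        self.buffer = {}

    def put(self, key, value):
        # Same rule as check_and_put: keep only the smallest value per key.
        if key not in self.buffer or value < self.buffer[key]:
            self.buffer[key] = value
        if len(self.buffer) > self.max_keys:
            self.flush()

    def flush(self):
        for key, value in self.buffer.items():
            self.emit(key, value)
        self.buffer = {}


def count_emits(candidates, combine):
    """Return the list of pairs actually emitted for the given candidates."""
    emitted = []

    def sink(key, value):
        emitted.append((key, value))

    if combine:
        combiner = InMapperCombiner(sink)
        for key, value in candidates:
            combiner.put(key, value)
        combiner.flush()  # plays the role of close()
    else:
        for key, value in candidates:
            sink(key, value)
    return emitted
```

For example, the five candidates `[("B", 5), ("B", 4), ("E", 6), ("B", 7), ("E", 5)]` shrink to `[("B", 4), ("E", 5)]` when combining is on. An early flush can still emit more than one value for the same key, which is harmless here because the reducer takes the minimum anyway.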
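# Schimmy trick
# Reducer, self.emit() and the graph partition reader P are assumed to be provided elsewhere.
INF = float("inf")

class ShortestPathReducer(Reducer):
  def __init__(self):
    P.open_graph_partition()

  def emit_precede_node(self, node_id):
    # pass through nodes that precede node_id in the partition; return the matching node
    for pre_node_id, node in P.read():
      if node_id == pre_node_id:
        return node
      else:
        self.emit(pre_node_id, node)

  def reduce(self, node_id, dist_list):
    node = self.emit_precede_node(node_id)
    min_dist = INF
    for dist in dist_list:
      if dist < min_dist:
        min_dist = dist
    # keep the current value when no shorter candidate arrived
    node.set_value(min(node.get_value(), min_dist))
    self.emit(node_id, node)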
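The reducer above reads node records from a local graph partition instead of receiving them through the shuffle, which amounts to a merge between two streams sorted by node id. The sketch below shows that merge in isolation. It assumes both inputs are sorted by node id and that every message key exists in the partition; `schimmy_reduce` and the `(dist, adjacency)` record layout are illustrative choices, not taken from the slides.

```python
INF = float("inf")


def schimmy_reduce(partition, grouped_messages):
    """Merge a sorted graph partition with sorted groups of candidate distances.

    partition        -- iterable of (node_id, (dist, adjacency)), sorted by node_id
    grouped_messages -- iterable of (node_id, [distances]), sorted by node_id
    Yields (node_id, (dist, adjacency)) for every node in the partition.
    """
    nodes = iter(partition)
    for msg_id, dists in grouped_messages:
        for node_id, (dist, adj) in nodes:
            if node_id == msg_id:
                # Matching node: apply the smallest candidate distance.
                yield node_id, (min([dist] + list(dists)), adj)
                break
            # No messages for this node: pass it through unchanged.
            yield node_id, (dist, adj)
    # Nodes after the last message key are passed through as well.
    for node_id, record in nodes:
        yield node_id, record
```

Because the partition iterator is shared between the outer and inner loops, each node record is read exactly once. The final pass-through loop covers nodes that sort after the last key with messages; the slide code would need an equivalent step when the reducer is closed.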
[Figure sequence: the same example graph, distance labels per frame]
frames labeled 1: A=0; B, C, D, E, F, G = +∞
frames labeled 2: B=5, D=3; C, E, F, G = +∞
frames labeled 3: B=4, C=6, D=3, E=6, F=5; G = +∞
frames labeled 4 and 5: B=4, C=6, D=3, E=5, F=5, G=9
end
# is_source(), get_value(), set_current_value(), send_message() etc. are assumed framework methods.
INF = float("inf")

class ShortestPathVertex:
  def compute(self, msgs):
    min_dist = 0 if self.is_source() else INF
    # get values from all incoming edges.
    for msg in msgs:
      min_dist = min(min_dist, msg.get_value())
    if min_dist < self.get_value():
      # update current value (state).
      self.set_current_value(min_dist)
      # send new value to outgoing edges.
      for out_edge in self.get_out_edge_iterator():
        recipient = out_edge.get_other_element(self.get_id())
        self.send_message(recipient.get_id(),
                          min_dist + out_edge.get_distance())
    self.vote_to_halt()
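The vertex program above can be exercised with a tiny single-process driver that mimics supersteps and vote-to-halt. The following is a sketch under assumptions, not a real Pregel or Hama API: the `Vertex` class, the `send` callback, `run_pregel`, and the edge list `EDGES` are hypothetical, and the edges are chosen only so that the results match the distance labels on the slides.

```python
INF = float("inf")

# Hypothetical edge list (an assumption), consistent with the slide labels.
EDGES = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}


class Vertex:
    def __init__(self, vertex_id, out_edges, is_source):
        self.id = vertex_id
        self.out_edges = out_edges
        self.is_source = is_source
        self.value = INF
        self.active = True

    def compute(self, msgs, send):
        min_dist = 0 if self.is_source else INF
        for msg in msgs:
            min_dist = min(min_dist, msg)
        if min_dist < self.value:
            # Shorter path found: record it and notify the neighbours.
            self.value = min_dist
            for target, weight in self.out_edges.items():
                send(target, min_dist + weight)
        self.active = False  # vote to halt; an incoming message reactivates


def run_pregel(source="A"):
    """Run supersteps until all vertices have halted and no messages remain."""
    vertices = {v: Vertex(v, adj, v == source) for v, adj in EDGES.items()}
    inbox = {v: [] for v in vertices}
    supersteps = 0
    while any(vx.active for vx in vertices.values()) or any(inbox.values()):
        outbox = {v: [] for v in vertices}

        def send(target, value):
            outbox[target].append(value)

        for v, vertex in vertices.items():
            if vertex.active or inbox[v]:
                vertex.compute(inbox[v], send)
        inbox = outbox  # barrier: messages are delivered in the next superstep
        supersteps += 1
    return {v: vx.value for v, vx in vertices.items()}, supersteps
```

With this edge list the driver finishes after five supersteps with B=4, C=6, D=3, E=5, F=5, G=9. Unlike the MapReduce version, only vertices that received a message do any work in a superstep, and the graph structure stays in place between supersteps.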
Pregel
[Paper excerpt shown on the slide; the first author row is cut off]

... Science and Technology), South Korea, swseo@calab.kaist.ac.kr
... edwardyoon@apache.org
... Science and Technology), South Korea, jaehong@calab.kaist.ac.kr
Seongwook Jin, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, swjin@calab.kaist.ac.kr
Jin-Soo Kim, School of Information and Communication, Sungkyunkwan University, South Korea, jinsookim@skku.edu
Seungryoul Maeng, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, maeng@calab.kaist.ac.kr

Abstract: APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

I. INTRODUCTION

As cloud computing environment emerges, Google has introduced the MapReduce framework to accelerate parallel and distributed computing on more than a thousand of inexpensive machines. Google has shown that the MapReduce framework is easy to use and provides massive scalability with extensive fault tolerance [2]. Especially, MapReduce fits well with complex data-intensive computations such as high-dimensional scientific simulation, machine learning, and data mining. Google and Yahoo! are known to operate dedicated clusters for MapReduce applications, each cluster consisting of several thousands of nodes. One of typical MapReduce

HAMA is a distributed framework on Hadoop for massive matrix and graph computations. HAMA aims at a powerful tool for various scientific applications, providing basic primitives for developers and researchers with simple APIs. HAMA is currently being incubated as one of the subprojects of Hadoop by the Apache Software Foundation [10].
Figure 1 illustrates the overall architecture of HAMA.

Fig. 1. The overall architecture of HAMA. [Layers: HAMA API; HAMA Core, HAMA Shell; Computation Engine (Plugged In/Out): MapReduce, BSP, Dryad; Distributed Locking: Zookeeper; Storage Systems: HBase, HDFS, RDBMS, File]
http://wiki.apache.org/hama/Articles
Large-Scale Graph Processing 〜Introduction〜 (Complete Edition)

Large-Scale Graph Processing: Introduction (Complete Version)

  • 1.
  • 2.
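  • 4. Hadoop MapReduce Design Patterns: Large-Scale Text Data Processing with MapReduce. (1) Written by Jimmy Lin and Chris Dyer; supervised by 神林 飛志 and 野村 直之; translated by 玉川 竜司. (2) Scheduled for release on October 1, 2011. (3) 210 pages. (4) List price: 2,940 yen.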
  • 5.
  • 6.
  • 7.
  • 8. [Figure: iterations i and i+1, labelled "Shuffle & barrier" and "job start/shutdown"]
  • 9.
  • 10. [Figure: example weighted graph with vertices A to G]
  • 11. [Figure: the example graph at iterations i and i+1; one tentative distance changes from 5 to 4, annotated min(6,4)]
  • 12. a super step http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel
  • 13. . . .
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 22.
  • 23.
  • 24. [Figure: the example weighted graph, vertices A to G; step "initialize"]
  • 25-27. [Figure: step 1. Distances: A = 0; B, C, D, E, F, G = +∞]
  • 28. [Figure: step 1. Distances: A = 0, B = 5, D = 3; C, E, F, G = +∞]
  • 29-30. [Figure: step 2. Distances: A = 0, B = 5, D = 3; C, E, F, G = +∞]
  • 31. [Figure: step 2. Distances: A = 0, B = 4, C = 6, D = 3, E = 6, F = 5, G = +∞]
  • 32-33. [Figure: step 3. Distances: A = 0, B = 4, C = 6, D = 3, E = 6, F = 5, G = +∞]
  • 34. [Figure: step 3. Distances: A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9]
  • 35. [Figure: end. Distances: A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9]
  • 36.
        # "emit" is pseudocode: it writes one (key, value) pair to the job output
        class ShortestPathMapper(Mapper):
            def map(self, node_id, Node):
                # send graph structure
                emit node_id, Node
                # get node value and add it to edge distance
                dist = Node.get_value()
                for neighbour_node_id in Node.get_adjacency_list():
                    dist_to_nbr = Node.get_distance(node_id, neighbour_node_id)
                    emit neighbour_node_id, dist + dist_to_nbr
  • 37.
        class ShortestPathReducer(Reducer):
            def reduce(self, node_id, dist_list):
                min_dist = float("inf")
                for dist in dist_list:
                    # dist_list contains a Node
                    if is_node(dist):
                        Node = dist
                    elif dist < min_dist:
                        min_dist = dist
                # keep the current value when no shorter distance arrived
                # (the source receives no messages and must stay at 0)
                Node.set_value(min(min_dist, Node.get_value()))
                emit node_id, Node
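The mapper and reducer on slides 36 and 37 can be tried without a cluster. The following is a minimal single-process sketch, not the deck's implementation: `GRAPH`, `map_node`, `reduce_node`, `run_iteration` and `shortest_paths` are hypothetical names, a node is modelled as a `(distance, adjacency)` tuple, and the edge directions are an assumption chosen so that the distances match the labels in the walk-through figures. The reducer here keeps a node's current distance when no shorter candidate arrives.

```python
import math
from collections import defaultdict

# Hypothetical example graph (assumed edge directions): node -> {neighbour: weight}.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}

def map_node(node_id, node):
    """Emit the node record plus one candidate distance per out-edge."""
    dist, adjacency = node
    yield node_id, node                  # pass the graph structure along
    for nbr, weight in adjacency.items():
        yield nbr, dist + weight         # inf + weight stays inf

def reduce_node(node_id, values):
    """Recover the node record and keep the smallest distance seen."""
    node = None
    min_dist = math.inf
    for value in values:
        if isinstance(value, tuple):     # the node record itself
            node = value
        else:                            # a candidate distance
            min_dist = min(min_dist, value)
    dist, adjacency = node
    return node_id, (min(dist, min_dist), adjacency)

def run_iteration(nodes):
    """One simulated MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for node_id, node in nodes.items():
        for key, value in map_node(node_id, node):
            groups[key].append(value)
    return dict(reduce_node(key, values) for key, values in groups.items())

def shortest_paths(graph, source):
    """Repeat jobs until an iteration changes nothing."""
    nodes = {n: (0 if n == source else math.inf, adj) for n, adj in graph.items()}
    iterations = 0
    while True:
        updated = run_iteration(nodes)
        iterations += 1
        if updated == nodes:
            return {n: dist for n, (dist, _) in nodes.items()}, iterations
        nodes = updated
```

Calling `shortest_paths(GRAPH, "A")` on this assumed graph settles on A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9. Note that every iteration reshuffles the whole graph structure, which is the cost the later slides try to reduce.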
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
        # In-Mapper Combiner
        class ShortestPathMapper(Mapper):
            def __init__(self):
                self.buffer = {}

            def check_and_put(self, key, value):
                # keep only the smallest distance per target node
                if key not in self.buffer or value < self.buffer[key]:
                    self.buffer[key] = value

            def check_and_emit(self):
                # is_exceed_limit_buffer_size is a helper not shown on the slides
                if is_exceed_limit_buffer_size(self.buffer):
                    for key, value in self.buffer.items():
                        emit key, value
                    self.buffer = {}

            def close(self):
                # flush whatever is still buffered when the mapper finishes
                for key, value in self.buffer.items():
                    emit key, value
  • 43.
            #...continue
            def map(self, node_id, Node):
                # send graph structure
                emit node_id, Node
                # get node value and add it to edge distance
                dist = Node.get_value()
                for nbr_node_id in Node.get_adjacency_list():
                    dist_to_nbr = Node.get_distance(node_id, nbr_node_id)
                    dist_nbr = dist + dist_to_nbr
                    self.check_and_put(nbr_node_id, dist_nbr)
                self.check_and_emit()
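To see what the in-mapper combiner on slides 42 and 43 saves, here is a small runnable sketch under stated assumptions: `InMapperCombiner`, `map_partition`, the `emit` callback and the `max_keys` threshold are hypothetical names, and nodes are again `(distance, adjacency)` tuples. It keeps one minimum per target node inside the mapper and emits only that.

```python
class InMapperCombiner:
    """Buffers the smallest candidate distance per target node."""

    def __init__(self, emit, max_keys=1000):
        self.emit = emit              # callback taking (key, value)
        self.max_keys = max_keys      # hypothetical flush threshold
        self.buffer = {}

    def put(self, key, value):
        # Same rule as check_and_put on the slide: keep the minimum per key.
        if key not in self.buffer or value < self.buffer[key]:
            self.buffer[key] = value
        if len(self.buffer) >= self.max_keys:
            self.flush()              # bound the memory used by the buffer

    def flush(self):
        for key, value in self.buffer.items():
            self.emit(key, value)
        self.buffer = {}

    def close(self):
        self.flush()                  # emit what is left at the end of the split


def map_partition(nodes, emit):
    """Map every node of one input split through a shared combiner."""
    combiner = InMapperCombiner(emit)
    for node_id, (dist, adjacency) in nodes.items():
        emit(node_id, (dist, adjacency))          # graph structure, unbuffered
        for nbr, weight in adjacency.items():
            combiner.put(nbr, dist + weight)
    combiner.close()
```

With a split holding A (distance 0, edges to B and D) and D (distance 3, edges to B and F), the two candidates for B collapse into one pair, so three distance pairs leave the mapper instead of four. Because `min` is associative and commutative, an early flush only costs extra pairs, never correctness.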
  • 44.
  • 45.
  • 46.
  • 47.
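  • 48.
        # Shimmy trick
        class ShortestPathReducer(Reducer):
            def __init__(self):
                # P is this reducer's graph partition, sorted by node id
                P.open_graph_partition()

            def emit_precede_node(self, node_id):
                # P.read() continues from where the previous call stopped
                for pre_node_id, Node in P.read():
                    if node_id == pre_node_id:
                        return Node
                    else:
                        # no message arrived for this node: pass it through
                        emit pre_node_id, Node
  • 49.
            #(...continue)
            def reduce(self, node_id, dist_list):
                Node = self.emit_precede_node(node_id)
                min_dist = float("inf")
                for dist in dist_list:
                    if dist < min_dist:
                        min_dist = dist
                # keep the current value when no shorter distance arrived
                Node.set_value(min(min_dist, Node.get_value()))
                emit node_id, Node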
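The reducer on slides 48 and 49 is in effect a merge join between two streams sorted by node id: the graph partition read from storage and the grouped messages from the shuffle. A minimal sketch of that join follows, with assumptions labelled: `shimmy_reduce` is a hypothetical name, both inputs are plain Python iterables already sorted by node id, every message key is assumed to exist in the partition, and each message list is non-empty.

```python
import math

def shimmy_reduce(partition, grouped_messages):
    """Merge-join a sorted graph partition with sorted, grouped messages.

    partition:        iterable of (node_id, (dist, adjacency)), sorted by node_id
    grouped_messages: iterable of (node_id, [candidate distances]), sorted by node_id
    Yields every node of the partition, with its distance lowered where a
    shorter candidate arrived.
    """
    messages = iter(grouped_messages)
    pending = next(messages, None)
    for node_id, (dist, adjacency) in partition:
        if pending is not None and pending[0] == node_id:
            dist = min(dist, min(pending[1]))   # relax with the best candidate
            pending = next(messages, None)
        # Nodes without messages are passed through unchanged.
        yield node_id, (dist, adjacency)
```

Since the structure comes from the partition, the mapper no longer has to emit each node record, so only distances cross the shuffle. The join is only valid if the partition files were produced with the same partitioner and sort order as the job, which this sketch takes for granted.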
  • 50.
  • 51-53. [Figure: step 1. Distances: A = 0; B, C, D, E, F, G = +∞]
  • 54-55. [Figure: step 2. Distances: A = 0, B = 5, D = 3; C, E, F, G = +∞]
  • 56-57. [Figure: step 3. Distances: A = 0, B = 4, C = 6, D = 3, E = 6, F = 5, G = +∞]
  • 58-59. [Figure: step 4. Distances: A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9]
  • 60-61. [Figure: step 5. Distances: A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9]
  • 62. [Figure: end. Distances: A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9]
  • 63.
        class ShortestPathVertex:
            def compute(self, msgs):
                min_dist = 0 if self.is_source() else float("inf")
                # get values from all incoming edges.
                for msg in msgs:
                    min_dist = min(min_dist, msg.get_value())
                if min_dist < self.get_value():
                    # update current value(state).
                    self.set_current_value(min_dist)
                    # send new value to outgoing edge.
                    out_edge_iterator = self.get_out_edge_iterator()
                    for out_edge in out_edge_iterator:
                        recipient = out_edge.get_other_element(self.get_id())
                        self.send_message(recipient.get_id(),
                                          min_dist + out_edge.get_distance())
                self.vote_to_halt()
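The vertex program on slide 63 relies on a runtime that delivers messages at superstep boundaries and wakes halted vertices when a message arrives. The following is a minimal single-process sketch of such a loop, not any real framework's API: `GRAPH` and `pregel_shortest_paths` are hypothetical names, and the edge directions are an assumption chosen to match the distance labels in the figures.

```python
import math

# Hypothetical example graph (assumed edge directions): vertex -> {neighbour: weight}.
GRAPH = {
    "A": {"B": 5, "D": 3},
    "B": {"E": 1},
    "C": {"F": 5},
    "D": {"B": 1, "C": 3, "F": 2},
    "E": {"G": 4},
    "F": {"G": 4},
    "G": {},
}

def pregel_shortest_paths(graph, source):
    """Run synchronous supersteps until no vertex is active."""
    value = {v: math.inf for v in graph}
    inbox = {v: [] for v in graph}
    active = set(graph)                  # every vertex runs in the first superstep
    supersteps = 0
    while active:
        outbox = {v: [] for v in graph}
        for v in active:
            # Same logic as compute() on the slide.
            min_dist = 0 if v == source else math.inf
            for msg in inbox[v]:
                min_dist = min(min_dist, msg)
            if min_dist < value[v]:
                value[v] = min_dist
                for nbr, weight in graph[v].items():
                    outbox[nbr].append(min_dist + weight)
            # The vertex now votes to halt; only a new message reactivates it.
        inbox = outbox                   # barrier: messages arrive next superstep
        active = {v for v, msgs in inbox.items() if msgs}
        supersteps += 1
    return value, supersteps
```

On this assumed graph the loop stops after five supersteps with A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9. Unlike the MapReduce version, the graph structure stays in place between supersteps and only the distance messages move.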
  • 64.
  • 65.
  • 66.
  • 67.
  • 69.
  • 70.
  • 71.
  • 72. [Screenshot: first page of a two-column paper on HAMA; source: http://wiki.apache.org/hama/Articles]
        Author block (top row cropped): edwardyoon@apache.org; swseo@calab.kaist.ac.kr; jaehong@calab.kaist.ac.kr. Seongwook Jin, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, swjin@calab.kaist.ac.kr. Jin-Soo Kim, School of Information and Communication, Sungkyunkwan University, South Korea, jinsookim@skku.edu. Seungryoul Maeng, Computer Science Division, KAIST (Korea Advanced Institute of Science and Technology), South Korea, maeng@calab.kaist.ac.kr.
        Abstract: APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.
        I. INTRODUCTION. As cloud computing environment emerges, Google has introduced the MapReduce framework to accelerate parallel and distributed computing on more than a thousand of inexpensive machines. Google has shown that the MapReduce framework is easy to use and provides massive scalability with extensive fault tolerance [2]. Especially, MapReduce fits well with complex data-intensive computations such as high-dimensional scientific simulation, machine learning, and data mining. Google and Yahoo! are known to operate dedicated clusters for MapReduce applications, each cluster consisting of several thousands of nodes. One of typical MapReduce
        (Second column.) HAMA is a distributed framework on Hadoop for massive matrix and graph computations. HAMA aims at a powerful tool for various scientific applications, providing basic primitives for developers and researchers with simple APIs. HAMA is currently being incubated as one of the subprojects of Hadoop by the Apache Software Foundation [10]. Figure 1 illustrates the overall architecture of HAMA.
        Fig. 1. The overall architecture of HAMA. Layers shown: HAMA API; HAMA Core and HAMA Shell; Computation Engine (Plugged In/Out): MapReduce, BSP, Dryad; Zookeeper (Distributed Locking); Storage Systems: HBase, HDFS, RDBMS, File.