PREGEL
A System for Large-Scale Graph Processing

            by Iuliia Proskurnia



6th of November 2012

GRAPHS ARE EVERYWHERE
Graph Examples

GRAPHS ARE EVERYWHERE
Algorithms

Outline

Motivation

Basic Concepts in Design

Implementation Details

Evaluation

Conclusions

MOTIVATION
Large Graph Processing

No Such System Exists

Single-Machine Algorithms

Parallel Solution

MAPREDUCE-LIKE SOLUTION
MapReduce is great :)

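BUT: an iterative graph algorithm written as a chain of MapReduce jobs must pass the entire graph state from one stage to the next

Pregel
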
Inspiration
Valiant’s Bulk Synchronous Parallel Model

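Valiant's BSP model organizes a computation into supersteps: each process computes on local data, sends messages, and then waits at a global barrier, so a message sent in one superstep is only visible in the next. A toy single-process sketch of that rule, assuming processes hold integer states; `bsp_run` and `ring_step` are hypothetical names, not part of Pregel's API:

```python
from typing import Callable, Dict, List, Tuple

# step(process_id, local_state, inbox) -> (new_state, [(destination, message), ...])
Step = Callable[[int, int, List[int]], Tuple[int, List[Tuple[int, int]]]]


def bsp_run(initial: List[int], step: Step, supersteps: int) -> List[int]:
    """Simulate `supersteps` BSP supersteps over len(initial) processes."""
    states = list(initial)
    inboxes: Dict[int, List[int]] = {pid: [] for pid in range(len(states))}
    for _ in range(supersteps):
        outboxes: Dict[int, List[int]] = {pid: [] for pid in range(len(states))}
        for pid in range(len(states)):
            # 1. Local computation on own state plus last superstep's messages.
            states[pid], sent = step(pid, states[pid], inboxes[pid])
            # 2. Communication: buffer outgoing messages; nobody sees them yet.
            for dest, msg in sent:
                outboxes[dest].append(msg)
        # 3. Barrier: buffered messages are delivered for the next superstep.
        inboxes = outboxes
    return states


def ring_step(pid: int, state: int, inbox: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Example step for 3 processes in a ring: add received values, forward own value."""
    return state + sum(inbox), [((pid + 1) % 3, state)]
```

Running `bsp_run([1, 2, 3], ring_step, 1)` leaves the states unchanged, because no process can read a message in the superstep in which it was sent; with two supersteps the result is `[4, 3, 5]`.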
BASIC DESIGN CONCEPTS

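VERTEX-CENTRIC APPROACH
Uses supersteps for computation

In each superstep, a VERTEX can:
• Send/receive messages
• Change its state
• Modify the topology

Termination? A vertex votes to halt when it has no further work; an incoming message reactivates it.
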
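A minimal Python sketch of a vertex with the operations listed above, assuming a hypothetical `Vertex` class (Pregel's own API is a C++ `Vertex` class whose `Compute()` method users override). The example `compute` records each vertex's in-degree (state change), adds reverse edges (topology change), and votes to halt; delivering messages between supersteps is left to the framework and is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Vertex:
    """Hypothetical vertex handle exposing the per-superstep operations."""
    id: str
    value: int = 0
    out_edges: List[str] = field(default_factory=list)
    active: bool = True
    outbox: List[Tuple[str, Any]] = field(default_factory=list)

    def send_message_to(self, dest: str, msg: Any) -> None:
        # Buffered; the framework would deliver it to `dest` in the next superstep.
        self.outbox.append((dest, msg))

    def add_edge(self, dest: str) -> None:
        # Topology mutation (applied immediately in this toy version).
        self.out_edges.append(dest)

    def vote_to_halt(self) -> None:
        # Inactive until a new message arrives for this vertex.
        self.active = False

    def compute(self, superstep: int, messages: List[str]) -> None:
        """Example user logic: count in-edges and make every edge bidirectional."""
        if superstep == 0:
            # Send messages: tell each out-neighbour who points at it.
            for dest in self.out_edges:
                self.send_message_to(dest, self.id)
        else:
            # Change the state: remember the in-degree.
            self.value = len(messages)
            # Modify topology: add a reverse edge for every incoming one.
            for src in messages:
                if src not in self.out_edges:
                    self.add_edge(src)
            # Termination: this vertex has nothing more to do.
            self.vote_to_halt()
```

Keeping all per-vertex logic in one `compute` method is what lets the framework run millions of vertices in parallel without the user writing any distribution code.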
MAXIMUM VALUE EXAMPLE
Chicken Chicken

• Dotted arrows - messages
• Grey nodes - inactive

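The maximum value example propagates the largest vertex value through the graph: in superstep 0 every vertex sends its value to its neighbours; afterwards a vertex that learns a larger value adopts and forwards it, and otherwise votes to halt, staying inactive until a new message reactivates it. A toy single-process sketch, assuming the graph is given as Python dictionaries (the function name `max_value` is hypothetical):

```python
from typing import Dict, List


def max_value(values: Dict[str, int], edges: Dict[str, List[str]]) -> Dict[str, int]:
    """Toy single-process run of the maximum value example."""
    value = dict(values)
    active = {v: True for v in value}
    inbox: Dict[str, List[int]] = {v: [] for v in value}
    superstep = 0
    while True:
        outbox: Dict[str, List[int]] = {v: [] for v in value}
        for v in value:
            msgs = inbox[v]
            if msgs:
                active[v] = True  # an incoming message reactivates a halted vertex
            if not active[v]:
                continue  # halted and no mail: skip this vertex
            if superstep == 0 or max(msgs, default=value[v]) > value[v]:
                value[v] = max(msgs, default=value[v])
                # Announce the (new) value to all out-neighbours.
                for dest in edges.get(v, []):
                    outbox[dest].append(value[v])
            else:
                active[v] = False  # nothing learned: vote to halt
        inbox = outbox
        superstep += 1
        # Terminate when every vertex has halted and no message is in transit.
        if not any(active.values()) and not any(inbox.values()):
            return value
```

With values 3, 6, 2, 1 on the bidirectional chain a-b-c-d, every vertex ends with 6.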
API DETAILS
Combiners

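A combiner lets the system merge several messages bound for the same vertex into one before they cross the network, for example keeping only the maximum in the maximum value example. Because no guarantee is made about which messages are combined or in what order, the combining operation should be commutative and associative. A minimal sender-side sketch, with `combine` as a hypothetical helper:

```python
from typing import Callable, Dict, List, Tuple


def combine(outgoing: List[Tuple[str, int]],
            op: Callable[[int, int], int]) -> Dict[str, int]:
    """Collapse all (destination, message) pairs to one message per destination."""
    combined: Dict[str, int] = {}
    for dest, msg in outgoing:
        # First message for `dest` is kept as-is; later ones are folded in with `op`.
        combined[dest] = op(combined[dest], msg) if dest in combined else msg
    return combined
```

For instance, `combine([("b", 3), ("b", 5), ("c", 1)], max)` turns three messages into two: `{"b": 5, "c": 1}`.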
API DETAILS
Aggregators

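Aggregators provide global communication: each vertex can contribute a value in superstep S, the system reduces the contributions with an operator (which, as with combiners, should be commutative and associative), and the result is visible to every vertex in superstep S+1. A sketch, assuming a hypothetical `Aggregator` class whose `end_superstep` the master calls at each barrier:

```python
from functools import reduce
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class Aggregator(Generic[T]):
    """Hypothetical aggregator: values given in superstep S are readable in S+1."""

    def __init__(self, op: Callable[[T, T], T], initial: T) -> None:
        self.op = op
        self.initial = initial
        self.value: T = initial      # what vertices read during the current superstep
        self._pending: List[T] = []  # contributions made during the current superstep

    def aggregate(self, x: T) -> None:
        # Called from a vertex's compute(); not visible until the barrier.
        self._pending.append(x)

    def end_superstep(self) -> None:
        # Called by the master at the barrier: reduce contributions and publish.
        self.value = reduce(self.op, self._pending, self.initial)
        self._pending = []
```

A sum aggregator such as `Aggregator(lambda a, b: a + b, 0)` could, for example, count how many vertices changed their value, and every vertex would see that count one superstep later.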
IMPLEMENTATION DETAILS

IMPLEMENTATION
Master is chosen

Workers locate the master via the cluster management system’s name service

IMPLEMENTATION
Partition

Each vertex is assigned to partition hash(VertexID) mod R, where R is the number of partitions; the master assigns partitions to workers.

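Pregel's default partitioning assigns a vertex to partition hash(VertexID) mod R, so any worker can compute a vertex's owner without a lookup table; users may replace this function (for example, to colocate vertices for pages of the same site). A sketch, assuming string vertex IDs and CRC-32 as the hash (the choice of hash is an assumption, not taken from the slides):

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_of(vertex_id: str, num_partitions: int) -> int:
    """Default-style partitioning: hash(VertexID) mod R."""
    # crc32 is stable across processes, unlike Python's salted str hash().
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


def assign(vertex_ids: Iterable[str], num_partitions: int) -> Dict[int, List[str]]:
    """Group vertex IDs by the partition they hash to."""
    parts: Dict[int, List[str]] = defaultdict(list)
    for vid in vertex_ids:
        parts[partition_of(vid, num_partitions)].append(vid)
    return dict(parts)
```

A deterministic hash matters here: every worker must compute the same owner for a given vertex ID, otherwise messages would be routed to the wrong machine.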
IMPLEMENTATION
Reading the input

Input sources: GFS, BigTable

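When loading the input, the master assigns a portion of it (for example GFS files or BigTable rows) to each worker. That division is independent of the graph partitioning, so a worker keeps the vertices it owns and forwards the rest to their owners. A sketch, assuming records of the form `(vertex_id, out_edges)`, a `partition_to_worker` map chosen by the master, and CRC-32 as the partitioning hash; all names are hypothetical:

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, List[str]]  # (vertex id, out-edge targets)


def owner(vertex_id: str, num_partitions: int, partition_to_worker: Dict[int, int]) -> int:
    """Worker that owns the partition hash(VertexID) mod R."""
    partition = zlib.crc32(vertex_id.encode("utf-8")) % num_partitions
    return partition_to_worker[partition]


def load_split(worker_id: int, split: List[Record], num_partitions: int,
               partition_to_worker: Dict[int, int]
               ) -> Tuple[Dict[str, List[str]], Dict[int, List[Record]]]:
    """Read one input split: keep local vertices, queue the others for their owners."""
    local: Dict[str, List[str]] = {}
    to_send: Dict[int, List[Record]] = defaultdict(list)
    for vertex_id, out_edges in split:
        target = owner(vertex_id, num_partitions, partition_to_worker)
        if target == worker_id:
            local[vertex_id] = out_edges  # goes straight into local storage
        else:
            to_send[target].append((vertex_id, out_edges))  # message to the remote owner
    return local, dict(to_send)
```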
IMPLEMENTATION
Superstep

Termination
if (allVerticesVotedToHalt && noMessagesInTransit) {
    terminate();
}

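The master drives the supersteps: after each barrier, workers report how many of their vertices are still active and how many messages they sent, and the master stops only once both are zero everywhere, since a pending message would reactivate a halted vertex. A sketch, assuming a hypothetical `WorkerReport` record and a callback that runs one superstep on all workers:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkerReport:
    """Hypothetical status a worker sends the master after finishing a superstep."""
    active_vertices: int  # vertices that did not vote to halt
    messages_sent: int    # messages to be delivered in the next superstep


def run_master(run_superstep: Callable[[int], List[WorkerReport]],
               max_supersteps: int = 1000) -> int:
    """Run supersteps until all vertices halted and no messages are in transit.

    Returns the number of supersteps executed.
    """
    superstep = 0
    while superstep < max_supersteps:
        reports = run_superstep(superstep)  # returns once every worker reached the barrier
        superstep += 1
        if all(r.active_vertices == 0 and r.messages_sent == 0 for r in reports):
            return superstep
    raise RuntimeError("no termination within max_supersteps")
```

Checking only the halt votes would stop too early: in the second test case below, every vertex has halted after superstep 0 but one message is still in flight, so a further superstep runs.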
IMPLEMENTATION
Saving the results

After the computation halts, each worker saves its portion of the graph state

FAULT-TOLERANCE
Checkpointing. Chicken Chicken.

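Pregel checkpoints at the start of a superstep: workers save their partitions' state (vertex values, edge values and incoming messages) to persistent storage, and when a worker fails, the workers reload the most recent checkpoint and repeat the supersteps since then. A toy sketch, assuming an in-memory copy stands in for persistent storage and a deterministic hypothetical `step` function; the failure is simulated once at superstep `fail_at`:

```python
import copy
from typing import Dict, List, Tuple

# (vertex values, incoming messages): the state a checkpoint must capture.
State = Tuple[Dict[str, int], Dict[str, List[int]]]


def step(state: State) -> State:
    """Hypothetical deterministic superstep: add incoming messages, then send 1 to self."""
    values, inbox = state
    new_values = {v: x + sum(inbox.get(v, [])) for v, x in values.items()}
    return new_values, {v: [1] for v in values}


def run_with_checkpoints(initial: State, supersteps: int, every: int,
                         fail_at: int = -1) -> State:
    """Run `supersteps` supersteps, checkpointing every `every`; fail once at `fail_at`."""
    s, state = 0, copy.deepcopy(initial)
    checkpoint = (s, copy.deepcopy(state))
    failed = False
    while s < supersteps:
        if s % every == 0:
            checkpoint = (s, copy.deepcopy(state))  # save at the start of superstep s
        if s == fail_at and not failed:
            failed = True
            # Worker lost: reload the latest checkpoint and repeat the missing supersteps.
            s, state = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue
        state = step(state)
        s += 1
    return state
```

Because incoming messages are part of the saved state, the run with a simulated failure reaches exactly the same result as the failure-free run. Choosing `every` trades checkpoint cost against the number of supersteps to redo after a failure.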
EVALUATION

EVALUATION
Number of Worker Tasks

300 multicore PCs
Binary tree with a billion vertices

EVALUATION
Number of Vertices

300 multicore PCs
Log-normal random graph with average out-degree 127

Conclusion

Vertex-centric approach

Computation over supersteps

Usability and scalability

Fault tolerance with checkpoints

Performance: running time grows almost linearly with the size of the graph

