Kineograph: Taking the Pulse of a
Fast-Changing and Connected World


Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
Information
  time-sensitive
rich connections
Challenges
1. Timeliness guarantees
2. Graph
3. Graph-mining
Kineograph
distributed in-memory graph storage
incremental graph mining
[Figure: system overview. Labeled components: continuous data feeds, ingest nodes, graph nodes (graph storage, computation), master with progress table, snapshooter. Labeled flows: graph updates, globally consistent snapshots, graph computation. Caption: incremental computation on a static graph snapshot.]
Graph nodes
  storage layer
computation layer
Storage layer
 key/value store
logical partitions
Graph partitioning
        edge-cut
no locality consideration
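
A minimal sketch of the storage-layer partitioning listed above, assuming vertices are assigned to logical partitions by hashing the vertex ID alone. The hash function, the partition count, and the names `partition_of` and `place_edge` are illustrative assumptions, not details from the talk. An edge whose endpoints hash to different partitions is cut, and nothing tries to keep neighbors together.

```python
import zlib
from typing import Tuple

NUM_PARTITIONS = 8  # assumed number of logical partitions


def partition_of(vertex_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a vertex to a logical partition using only a hash of its ID."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


def place_edge(src: str, dst: str) -> Tuple[int, int, bool]:
    """Return (partition of src, partition of dst, whether the edge is cut)."""
    p_src, p_dst = partition_of(src), partition_of(dst)
    return p_src, p_dst, p_src != p_dst
```

Because placement depends only on the ID, any node can locate a vertex without a lookup table; the cost is that many edges cross partitions.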
Snapshot
    ingest nodes
    graph nodes
global progress table
Ingest node
graph-update operations
   sequence number
Epoch commit protocol
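[Figure: epoch commit. Labels: ingest nodes s1 … sn, progress table with one entry per ingest node, global tx vector, snapshooter, graph nodes. Partition u holds updates 1, 2, 4 from s1 and 4, 6, 7 from sn; partition v holds updates 2, 3, 5 from s1 and 5, 6, 8 from sn. Caption: epoch specified by progress table and snapshooter.]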
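
One way to read the epoch commit figure, as a hedged sketch: ingest nodes report sequence numbers to a progress table, the snapshooter freezes the table into a vector, and each partition applies only the buffered updates that the vector covers. The class names, the method names, and the vector values (s1 at 3, sn at 7) are assumptions made for illustration; only the per-partition sequence numbers come from the figure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ProgressTable:
    """Latest sequence number reported by each ingest node."""

    def __init__(self) -> None:
        self.latest: Dict[str, int] = {}

    def report(self, ingest_id: str, seq: int) -> None:
        self.latest[ingest_id] = max(seq, self.latest.get(ingest_id, 0))

    def take_vector(self) -> Dict[str, int]:
        """Snapshooter step: freeze the current vector as an epoch boundary."""
        return dict(self.latest)


class Partition:
    """One logical partition; buffers updates until an epoch commits."""

    def __init__(self) -> None:
        self.pending: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
        self.applied: List[Tuple[str, int, str]] = []

    def receive(self, ingest_id: str, seq: int, op: str) -> None:
        self.pending[ingest_id].append((seq, op))

    def commit_epoch(self, vector: Dict[str, int]) -> None:
        """Apply buffered updates covered by the vector, in a fixed order."""
        for ingest_id in sorted(vector):
            bound = vector[ingest_id]
            buffered = self.pending[ingest_id]
            ready = sorted(u for u in buffered if u[0] <= bound)
            self.pending[ingest_id] = [u for u in buffered if u[0] > bound]
            self.applied.extend((ingest_id, seq, op) for seq, op in ready)


# Partition u from the figure, with an assumed epoch vector.
table = ProgressTable()
table.report("s1", 3)
table.report("sn", 7)
vector = table.take_vector()

u = Partition()
for seq in (1, 2, 4):
    u.receive("s1", seq, f"update-{seq}")
for seq in (4, 6, 7):
    u.receive("sn", seq, f"update-{seq}")
u.commit_epoch(vector)
```

With this vector, partition u applies updates 1 and 2 from s1 and 4, 6, 7 from sn, while update 4 from s1 stays buffered for a later epoch. No partition needs to agree with any other on a global order; the vector alone fixes what belongs to the snapshot.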
Graph update / compute pipeline
[Figure: timeline. Rows: incoming tweets, snapshot construction (S_{i-1}, S_i, S_{i+1}), graph computation (C_i). Time marks: t_{i-1}, t_i, t_i', t_i''. Labeled intervals: epoch, timeliness.]
Consistency
no global serialization
(different from two-phase locking or timestamp ordering)
Atomicity
[Figure: vertices v and u, drawn twice.]
Deterministic
vertex creation
Computation layer
incremental graph-mining
vertex-based
computation model
Incremental graph computation
[Figure: flowchart. Boxes: Init, Detect Vertex Status, Compute New Vertex Values, decision "Change Significantly?" with branches N and Y, Propagate Updates, Graph-Scale Aggregation. Input: updates from other vertices.]
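
The vertex-based loop in the flowchart can be sketched as follows, using hop-count distances from a fixed source as a stand-in computation. This is an illustrative assumption, not the talk's implementation: the function name, the `epsilon` threshold, and the restriction to edge insertions (distances only decrease) are all chosen for the example, and graph-scale aggregation is left out.

```python
import math
from collections import deque
from typing import Dict, List, Set


def incremental_distances(
    out_edges: Dict[str, List[str]],
    dist: Dict[str, float],
    changed: Set[str],
    epsilon: float = 0.0,
) -> Dict[str, float]:
    """Re-evaluate only vertices reachable from the changed ones.

    dist holds hop counts computed on the previous snapshot; changed holds
    vertices whose outgoing edges gained new entries in this snapshot.
    """
    queue = deque(changed)  # detect vertex status
    while queue:
        u = queue.popleft()
        for v in out_edges.get(u, []):
            candidate = dist.get(u, math.inf) + 1  # compute new vertex value
            old = dist.get(v, math.inf)
            if old - candidate > epsilon:  # change significantly?
                dist[v] = candidate
                queue.append(v)  # propagate updates
    return dist


# Previous snapshot: a -> b -> c -> d. The new snapshot adds edge a -> c.
out_edges = {"a": ["b", "c"], "b": ["c"], "c": ["d"]}
previous = {"a": 0, "b": 1, "c": 2, "d": 3}
updated = incremental_distances(out_edges, dict(previous), {"a"})
```

Here only c and d are recomputed; b is visited but its value does not change, so nothing is propagated from it.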
Push model
sender-side aggregation
Pull model
read a subset of neighbors
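
A hedged sketch contrasting the two models. In the push sketch the sender merges updates bound for the same target before anything is sent; in the pull sketch a vertex reads only the neighbors it selects. The function names, the default of summing as the combiner, and the `wanted` predicate are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def aggregate_at_sender(
    outgoing: Iterable[Tuple[str, float]],
    combine: Callable[[float, float], float] = lambda a, b: a + b,
) -> Dict[str, float]:
    """Push model: merge updates bound for the same target before sending."""
    merged: Dict[str, float] = {}
    for target, delta in outgoing:
        merged[target] = combine(merged[target], delta) if target in merged else delta
    return merged


def pull_value(
    vertex: str,
    in_neighbors: Dict[str, List[str]],
    values: Dict[str, float],
    wanted: Callable[[str], bool],
) -> float:
    """Pull model: read and sum only the neighbors selected by `wanted`."""
    return sum(values[n] for n in in_neighbors.get(vertex, []) if wanted(n))


messages = aggregate_at_sender([("v", 1.0), ("u", 2.0), ("v", 0.5)])
pulled = pull_value(
    "v",
    {"v": ["a", "b", "c"]},
    {"a": 1.0, "b": 2.0, "c": 4.0},
    wanted=lambda n: n != "b",
)
```

Three outgoing updates collapse into two messages on the push side, and the pull side touches two of three neighbors.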
Execution model
BSP + Dynamic scheduling
3 applications
TunkRank
SP
K-exposure
TunkRank
SP
K-exposure
Fault tolerance
among servers
Paxos-based solution
Ingest node failure
  incarnation number
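
The slide names only the mechanism. One plausible use of an incarnation number, sketched under stated assumptions: every update is tagged with an (incarnation, sequence number) pair, a restart seals the old incarnation at its last reported sequence number, and anything from the old incarnation beyond that point is rejected. The class `IngestProgress` and its methods are hypothetical.

```python
from typing import Dict


class IngestProgress:
    """Progress-table entry for one ingest node (hypothetical structure)."""

    def __init__(self) -> None:
        self.incarnation = 1
        self.last_seq = 0
        self.sealed: Dict[int, int] = {}  # old incarnation -> last reported seq

    def report(self, incarnation: int, seq: int) -> bool:
        """Record progress; ignore reports from any other incarnation."""
        if incarnation != self.incarnation:
            return False
        self.last_seq = max(self.last_seq, seq)
        return True

    def restart(self) -> int:
        """Seal the current incarnation and begin a new one from sequence 0."""
        self.sealed[self.incarnation] = self.last_seq
        self.incarnation += 1
        self.last_seq = 0
        return self.incarnation

    def is_valid(self, incarnation: int, seq: int) -> bool:
        """Should a graph node keep an update tagged (incarnation, seq)?"""
        if incarnation == self.incarnation:
            return True
        return seq <= self.sealed.get(incarnation, -1)


entry = IngestProgress()
entry.report(1, 5)
new_incarnation = entry.restart()
```

After the restart, an update tagged (1, 5) is still valid, while (1, 6) was never reported before the failure and is dropped.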
Fault tolerance
 @ storage layer
quorum-based replication
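
A minimal sketch of a quorum write, assuming a simple majority quorum; the slide does not state the quorum size. Replicas are modeled as plain callables, and `quorum_write`, `up`, and `down` are illustrative names.

```python
from typing import Callable, List


def quorum_write(replicas: List[Callable[[str], bool]], update: str) -> bool:
    """Send an update to every replica; succeed if a majority acknowledges."""
    needed = len(replicas) // 2 + 1
    acks = 0
    for send in replicas:
        try:
            if send(update):
                acks += 1
        except ConnectionError:
            pass  # an unreachable replica does not count toward the quorum
    return acks >= needed


def up(update: str) -> bool:
    """Stand-in for a healthy replica that stores the update."""
    return True


def down(update: str) -> bool:
    """Stand-in for a failed replica."""
    raise ConnectionError("replica unreachable")


one_failure = quorum_write([up, up, down], "add edge (v, u)")
two_failures = quorum_write([up, down, down], "add edge (v, u)")
```

With three replicas the write tolerates one failure but not two.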
Fault tolerance
@ computation layer
   roll back & re-execute
primary/backup replication
Incremental expansion
Decaying
C#
17,000 LOC
Twitter feeds
8M vertices, 29M edges
100M tweets at 100K/sec
power-law
Graph-update throughput
Incremental vs.
Non-incremental
Scalability
Incoming data rate
Failure recovery
