SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Kineograph: Taking the Pulse of a
Fast-Changing and Connected World

           EuroSys '12



          Zuhair Khayyat
What is Kineograph
●   A distributed system for processing a continuously
    changing graph.
●   Input: stream of incoming data
●   Main contribution: Distributed in-memory storage that
    supports:
        –   Incremental graph algorithms that extracts
              important information on the fast changing
              graph structure.
        –   Maintain consistency for static structures
             graph algorithms.
Applications on top of Kinograph
●   Time sensitive applications: Trends & breaking news
●   Small data rich connectivity
●   Input data: Twitter feed including user interactions
    and hashtags
●   Algorithms:
        –   User ranking
        –   Approximate shortest paths
        –   Controversial topic detection
●   Claim that Kinograph can process 100K tweets per
    second on 40 machine cluster
Application Challenges
●   Handling fast updates and produce timely
    results.
●   Distributed flexible graph structure to capture
    relations.
●   Handle updates and static algorithms
    consistency.
●   Appropriate fault tolerance.
Kineograph design
●   Separate graph processing and graph updates
●   Provide consistent periodic graph snapshots
●   Distributed updates: multiple workers (Ingest
    nodes) receives twitter feed.
●   Updates to the graph are transformed into
    sequenced transactions.
System Overview
Consistent Distributed snapshots
●   Key/Value based storage system
●   Graph partition based on hashing, multiple partitions
    on a physical machine
●   An incoming record is converted to an update
    transaction which can span to multiple partitions.
●   Each transaction has a unique sequence number,
    which are used for system synchronization (global
    logical timestamp)
●   The global progress table keep track of updates
●   Snapshot and applying update transactions are
    overlapped.
Example
Snapshot consistency
     ●   Ensure atomic transactions


 Incoming
  Tweets       …               …                            Time



 Snapshot             Si-1         Si         Si+1
Construction


  Graph               Epoch                     Ci
Computation    ti-1           ti        ti’          ti’’
                               Timeliness
Incremental computations
●   The user provides:
        –   Rule to check if vertex changed
        –   User function to compute new values
        –   Support aggregations, push and pull
●   No need for consistency model.
●   Vertex-centric computations.
●   Supports supersteps like GraphLab and Pregel
●   Functions:
        –   Initialize, updateFunction, trigger, accululator,
              vertex-request
Example
Fault Tolerance
●   Future: Chubby or ZooKeeper for global
    progress table and monitoring
●   Feeds are stored reliably.
●   Each graph partition has k replicas in different
    physical machines, graph updates are sent to
    all replicas.
●   The result of the computation is stored in a
    master/slave model.
Evaluation
●   Code: C# on windows server 2008
●   Cluster size: 51 machines
●   Combination of machines:
        –   Intel Xeon X3360 quad CPU, 8GB memory
        –   Intel Xeon X5550 quad CPU, 12 GB memory
●   Gigabit network
●   Graph Size: 8M nodes & 29M edges
●   Simulated a live Twitter stream of 100 million
    archived tweets.
Graph update throughput
●   The fastest peak amount of Twitter traffic was
    8900 tweets/s on October 2011.
●   Kineograph: Up to 180000 tweet/s
Throughput vs ingest nodes
Scalability
Kineograph vs Pregel
Questions

Weitere ähnliche Inhalte

Andere mochten auch

BigDansing presentation slides for KAUST
BigDansing presentation slides for KAUSTBigDansing presentation slides for KAUST
BigDansing presentation slides for KAUSTZuhair khayyat
 
BigDansing presentation slides for SIGMOD 2015
BigDansing presentation slides for SIGMOD 2015BigDansing presentation slides for SIGMOD 2015
BigDansing presentation slides for SIGMOD 2015Zuhair khayyat
 
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph ProcessingMizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph ProcessingZuhair khayyat
 
IEJoin and Big Data Cleansing
IEJoin and Big Data CleansingIEJoin and Big Data Cleansing
IEJoin and Big Data CleansingZuhair khayyat
 
Large Graph Processing
Large Graph ProcessingLarge Graph Processing
Large Graph ProcessingZuhair khayyat
 
Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...
Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...
Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...Zuhair khayyat
 

Andere mochten auch (7)

BigDansing presentation slides for KAUST
BigDansing presentation slides for KAUSTBigDansing presentation slides for KAUST
BigDansing presentation slides for KAUST
 
BigDansing presentation slides for SIGMOD 2015
BigDansing presentation slides for SIGMOD 2015BigDansing presentation slides for SIGMOD 2015
BigDansing presentation slides for SIGMOD 2015
 
Google appengine
Google appengineGoogle appengine
Google appengine
 
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph ProcessingMizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
 
IEJoin and Big Data Cleansing
IEJoin and Big Data CleansingIEJoin and Big Data Cleansing
IEJoin and Big Data Cleansing
 
Large Graph Processing
Large Graph ProcessingLarge Graph Processing
Large Graph Processing
 
Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...
Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...
Presentation on "Mizan: A System for Dynamic Load Balancing in Large-scale Gr...
 

Ähnlich wie Kineograph

SDN in the Management Plane: OpenConfig and Streaming Telemetry
SDN in the Management Plane: OpenConfig and Streaming TelemetrySDN in the Management Plane: OpenConfig and Streaming Telemetry
SDN in the Management Plane: OpenConfig and Streaming TelemetryAnees Shaikh
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Seattle Apache Flink Meetup
 
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...Bowen Li
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedVasia Kalavri
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsClaudiu Barbura
 
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkUber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkWenrui Meng
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
Client side machine learning
Client side machine learningClient side machine learning
Client side machine learningKumar Abhinav
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 

Ähnlich wie Kineograph (20)

SDN in the Management Plane: OpenConfig and Streaming Telemetry
SDN in the Management Plane: OpenConfig and Streaming TelemetrySDN in the Management Plane: OpenConfig and Streaming Telemetry
SDN in the Management Plane: OpenConfig and Streaming Telemetry
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
 
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
Asymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, ExplainedAsymmetry in Large-Scale Graph Analysis, Explained
Asymmetry in Large-Scale Graph Analysis, Explained
 
Lessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatternsLessons learned from embedding Cassandra in xPatterns
Lessons learned from embedding Cassandra in xPatterns
 
Apache flink
Apache flinkApache flink
Apache flink
 
Cassandra in xPatterns
Cassandra in xPatternsCassandra in xPatterns
Cassandra in xPatterns
 
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkUber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache Flink
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
Client side machine learning
Client side machine learningClient side machine learning
Client side machine learning
 
Resume2015
Resume2015Resume2015
Resume2015
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 

Kürzlich hochgeladen

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 

Kürzlich hochgeladen (20)

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

Kineograph

  • 1. Kineograph: Taking the Pulse of a Fast-Changing and Connected World EuroSys '12 Zuhair Khayyat
  • 2. What is Kineograph ● A distributed system for processing a continuously changing graph. ● Input: stream of incoming data ● Main contribution: Distributed in-memory storage that supports: – Incremental graph algorithms that extracts important information on the fast changing graph structure. – Maintain consistency for static structures graph algorithms.
  • 3. Applications on top of Kinograph ● Time sensitive applications: Trends & breaking news ● Small data rich connectivity ● Input data: Twitter feed including user interactions and hashtags ● Algorithms: – User ranking – Approximate shortest paths – Controversial topic detection ● Claim that Kinograph can process 100K tweets per second on 40 machine cluster
  • 4. Application Challenges ● Handling fast updates and produce timely results. ● Distributed flexible graph structure to capture relations. ● Handle updates and static algorithms consistency. ● Appropriate fault tolerance.
  • 5. Kineograph design ● Separate graph processing and graph updates ● Provide consistent periodic graph snapshots ● Distributed updates: multiple workers (Ingest nodes) receives twitter feed. ● Updates to the graph are transformed into sequenced transactions.
  • 7. Consistent Distributed snapshots ● Key/Value based storage system ● Graph partition based on hashing, multiple partitions on a physical machine ● An incoming record is converted to an update transaction which can span to multiple partitions. ● Each transaction has a unique sequence number, which are used for system synchronization (global logical timestamp) ● The global progress table keep track of updates ● Snapshot and applying update transactions are overlapped.
  • 9. Snapshot consistency ● Ensure atomic transactions Incoming Tweets … … Time Snapshot Si-1 Si Si+1 Construction Graph Epoch Ci Computation ti-1 ti ti’ ti’’ Timeliness
  • 10. Incremental computations ● The user provides: – Rule to check if vertex changed – User function to compute new values – Support aggregations, push and pull ● No need for consistency model. ● Vertex-centric computations. ● Supports supersteps like GraphLab and Pregel ● Functions: – Initialize, updateFunction, trigger, accululator, vertex-request
  • 12. Fault Tolerance ● Future: Chubby or ZooKeeper for global progress table and monitoring ● Feeds are stored reliably. ● Each graph partition has k replicas in different physical machines, graph updates are sent to all replicas. ● The result of the computation is stored in a master/slave model.
  • 13. Evaluation ● Code: C# on windows server 2008 ● Cluster size: 51 machines ● Combination of machines: – Intel Xeon X3360 quad CPU, 8GB memory – Intel Xeon X5550 quad CPU, 12 GB memory ● Gigabit network ● Graph Size: 8M nodes & 29M edges ● Simulated a live Twitter stream of 100 million archived tweets.
  • 14. Graph update throughput ● The fastest peak amount of Twitter traffic was 8900 tweets/s on October 2011. ● Kineograph: Up to 180000 tweet/s