SlideShare ist ein Scribd-Unternehmen logo
1 von 13
Downloaden Sie, um offline zu lesen
1
Twitter Storm

1st Mentioning of Big Data
Michael Cox and David Ellsworth publish “Application-controlled
demand paging for out-of-core visualization” in the Proceedings of the
IEEE 8th conference on Visualization, 1997

http://www.gartner.com/it-glossary/big-data/

Pregel
Map Reduce
2
• Scalable platform for large scale graph analytics
in commodity clusters and cloud.
• Simple yet scalable programming abstraction for
large scale graph processing
• More than 10x improvements in some graph
algorithms over traditional graph processing
systems
• Components
– GoFS : Sub-graph centric Distributed Graph Storage
– Gopher : Distributed BSP-based programming
abstraction for Large Scale graph analytics framework
3
• Iterative vertex centric programming model
based on Bulk synchronous parallel model.
• In each iteration vertex can
1. Receive messages sent to it in previous iteration
2. Send messages to other vertices
3. Modify its own state

• Vertex centric programming mode- Think like
a vertex
• Very simple,
• High communication overhead
• Pregel, http://giraph.apache.org/
4
• Partition graph in to set of sets of vertices and edges
• Sub-graphs – Connected components in a partition.
• User logic operates on a sub graph
– Independent unit of computation

• Resource allocation
– Single Partition → Single Machine | Single Sub-graph → Single CPU

• Data loading
– Entire partition is loaded into memory before computation
– Tasks retain sub-graphs in memory within the task scope
Sub-graph-Task 1

Sub-graph-Task 2

Sub-graph-Task 3

5
• Algorithm : Seqn of super-steps separated
by global barrier synchronization
• Super Step i:
1.
2.
3.
4.

Sub-graphs compute in parallel
Receive messages sent to it in super step i-1
Execute same user defined function
Send messages to other sub graphs (to be
received in super step i+1)
5. Can vote to halt : I’m done / de-activate

http://www.ibm.com/developerworks/opensource/library/os-giraph/

• Global Vote to Halt check

BSP Manager

Control Channels

– termination → all sub-graphs voted to halt

• Active sub graphs participate in every
computation
• De-activated sub-graphs will not get
executed/activated unless it get new
messages

P1

P1

Data Channnels

P1

6
• Algorithm – Max vertex value
• Input – Connected graph with different vertex values {n1,n2…. ni}
• Output – Each vertex with the Max ({n1,n2…. ni})

• Compared to Vertex centric
model
• Less communication
• Fast convergence
7
8
• Cluster
– 12 Nodes, 8-Core Intel Xeon CPU (each)
– 16GB RAM (each), 1TB HDD (each)

• Network
– Gigabit Ethernet

• Apache Giraph latest version from trunk
– Includes Performance improvements from Facebook !
Data Set

Vertices

Edges

Diameter

RN: California Road
Network

1,965,206

2,766,607

849

TR: Internet path graph 19,442,778
from Tracesroutes

22,782,842

25

LJ: LiveJournal
Social Network

68,475,391

10

4,847,571

9
• Connected components
– 81x improvement using California
Road Network (RN) dataset
– 21x improvement using a Trace
route path network of a CDN. (TR)
dataset

Page Rank
4x improvement using RN dataset
1.5x improvement using TR dataset
Not an ideal algorithm for sub-graph
centric programming model

• Single Source Shortest Path
– 32x improvement using RN dataset
– 10x improvement using TR dataset

10
• Connected components
– ~79x reduction using RN dataset.
– ~4x reduction using TR dataset
– ~2x reduction using Live Journal dataset (LJ)

• Single source shortest path
– ~38x reduction using RN data set
– 4x reduction using TR dataset
– ~1.6x reduction using Live journal dataset

11
• Introduced a sub-graph centric programming abstraction
for large scale graph analytics on distributed systems
– Simple
– Enable using shared memory algorithms at sub-graph level.

• Sub-graph centric algorithms and performance results
– Connected Components
– Single Source Shortest Path
– Page Rank

• Issues and Future work
– Sub-graph aware partitioning
– Sub-graph centric algorithms

• Try now
– https://github.com/usc-cloud/goffish
12
http://thesciencepresenter.wordpress.com/category/behaviour-management/

13

Weitere ähnliche Inhalte

Was ist angesagt?

Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsYash Khandelwal
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationAhmad El Tawil
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Humoyun Ahmedov
 
Scaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rushScaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rushDaniel Ben-Zvi
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationEUDAT
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopExtremeEarth
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution Chen Wu
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joinsShalish VJ
 
Dive in with Databases – FME Summer Camp 2018
Dive in with Databases  – FME Summer Camp 2018Dive in with Databases  – FME Summer Camp 2018
Dive in with Databases – FME Summer Camp 2018Safe Software
 
LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)Jonas Traub
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoopguest20d395b
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoopishan0019
 

Was ist angesagt? (20)

What's new in IP 4.4
What's new in IP 4.4What's new in IP 4.4
What's new in IP 4.4
 
Benchmarking tool for graph algorithms
Benchmarking tool for graph algorithmsBenchmarking tool for graph algorithms
Benchmarking tool for graph algorithms
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Whats new in IC 2016?
Whats new in IC 2016?Whats new in IC 2016?
Whats new in IC 2016?
 
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
 
Mapreduce
MapreduceMapreduce
Mapreduce
 
Scaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rushScaling graphite to handle a zerg rush
Scaling graphite to handle a zerg rush
 
Sky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor ComputationSky Arrays - ArrayDB in action for Sky View Factor Computation
Sky Arrays - ArrayDB in action for Sky View Factor Computation
 
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
 
MapReduce
MapReduceMapReduce
MapReduce
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
 
Hadoop MapReduce joins
Hadoop MapReduce joinsHadoop MapReduce joins
Hadoop MapReduce joins
 
Introduction to yarn
Introduction to yarnIntroduction to yarn
Introduction to yarn
 
Apache Giraph
Apache GiraphApache Giraph
Apache Giraph
 
Dive in with Databases – FME Summer Camp 2018
Dive in with Databases  – FME Summer Camp 2018Dive in with Databases  – FME Summer Camp 2018
Dive in with Databases – FME Summer Camp 2018
 
LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)LWA 2015: The Apache Flink Platform (Poster)
LWA 2015: The Apache Flink Platform (Poster)
 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm Design
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
 
Highly Available Graphite
Highly Available GraphiteHighly Available Graphite
Highly Available Graphite
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoop
 

Ähnlich wie GoFFish - A Sub-graph centric framework for large scale graph analytics

Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...balmanme
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processingjins0618
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...NETWAYS
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale NetworkThe Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale NetworkOpen Networking Summits
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...balmanme
 
Exascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate AnalyticsExascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate Analyticsinside-BigData.com
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plansinside-BigData.com
 
Ocpeu14
Ocpeu14Ocpeu14
Ocpeu14KALRAY
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontierinside-BigData.com
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchmestery
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...InfluxData
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 

Ähnlich wie GoFFish - A Sub-graph centric framework for large scale graph analytics (20)

Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale NetworkThe Challenges of SDN/OpenFlow in an Operational and Large-scale Network
The Challenges of SDN/OpenFlow in an Operational and Large-scale Network
 
try
trytry
try
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
 
Exascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate AnalyticsExascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate Analytics
 
Update on Trinity System Procurement and Plans
Update on Trinity System Procurement and PlansUpdate on Trinity System Procurement and Plans
Update on Trinity System Procurement and Plans
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Ocpeu14
Ocpeu14Ocpeu14
Ocpeu14
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
 
LISP and NSH in Open vSwitch
LISP and NSH in Open vSwitchLISP and NSH in Open vSwitch
LISP and NSH in Open vSwitch
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...Development and Applications of Distributed IoT Sensors for Intermittent Conn...
Development and Applications of Distributed IoT Sensors for Intermittent Conn...
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Link_NwkingforDevOps
Link_NwkingforDevOpsLink_NwkingforDevOps
Link_NwkingforDevOps
 

Kürzlich hochgeladen

Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 

Kürzlich hochgeladen (20)

Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

GoFFish - A Sub-graph centric framework for large scale graph analytics

  • 1. 1
  • 2. Twitter Storm 1st Mentioning of Big Data Michael Cox and David Ellsworth publish “Application-controlled demand paging for out-of-core visualization” in the Proceedings of the IEEE 8th conference on Visualization, 1997 http://www.gartner.com/it-glossary/big-data/ Pregel Map Reduce 2
  • 3. • Scalable platform for large scale graph analytics in commodity clusters and cloud. • Simple yet scalable programming abstraction for large scale graph processing • More than 10x improvements in some graph algorithms over traditional graph processing systems • Components – GoFS : Sub-graph centric Distributed Graph Storage – Gopher : Distributed BSP-based programming abstraction for Large Scale graph analytics framework 3
  • 4. • Iterative vertex centric programming model based on Bulk synchronous parallel model. • In each iteration vertex can 1. Receive messages sent to it in previous iteration 2. Send messages to other vertices 3. Modify its own state • Vertex centric programming mode- Think like a vertex • Very simple, • High communication overhead • Pregel, http://giraph.apache.org/ 4
  • 5. • Partition graph in to set of sets of vertices and edges • Sub-graphs – Connected components in a partition. • User logic operates on a sub graph – Independent unit of computation • Resource allocation – Single Partition → Single Machine | Single Sub-graph → Single CPU • Data loading – Entire partition is loaded into memory before computation – Tasks retain sub-graphs in memory within the task scope Sub-graph-Task 1 Sub-graph-Task 2 Sub-graph-Task 3 5
  • 6. • Algorithm : Seqn of super-steps separated by global barrier synchronization • Super Step i: 1. 2. 3. 4. Sub-graphs compute in parallel Receive messages sent to it in super step i-1 Execute same user defined function Send messages to other sub graphs (to be received in super step i+1) 5. Can vote to halt : I’m done / de-activate http://www.ibm.com/developerworks/opensource/library/os-giraph/ • Global Vote to Halt check BSP Manager Control Channels – termination → all sub-graphs voted to halt • Active sub graphs participate in every computation • De-activated sub-graphs will not get executed/activated unless it get new messages P1 P1 Data Channnels P1 6
  • 7. • Algorithm – Max vertex value • Input – Connected graph with different vertex values {n1,n2…. ni} • Output – Each vertex with the Max ({n1,n2…. ni}) • Compared to Vertex centric model • Less communication • Fast convergence 7
  • 8. 8
  • 9. • Cluster – 12 Nodes, 8-Core Intel Xeon CPU (each) – 16GB RAM (each), 1TB HDD (each) • Network – Gigabit Ethernet • Apache Giraph latest version from trunk – Includes Performance improvements from Facebook ! Data Set Vertices Edges Diameter RN: California Road Network 1,965,206 2,766,607 849 TR: Internet path graph 19,442,778 from Tracesroutes 22,782,842 25 LJ: LiveJournal Social Network 68,475,391 10 4,847,571 9
  • 10. • Connected components – 81x improvement using California Road Network (RN) dataset – 21x improvement using a Trace route path network of a CDN. (TR) dataset Page Rank 4x improvement using RN dataset 1.5x improvement using TR dataset Not an ideal algorithm for sub-graph centric programming model • Single Source Shortest Path – 32x improvement using RN dataset – 10x improvement using TR dataset 10
  • 11. • Connected components – ~79x reduction using RN dataset. – ~4x reduction using TR dataset – ~2x reduction using Live Journal dataset (LJ) • Single source shortest path – ~38x reduction using RN data set – 4x reduction using TR dataset – ~1.6x reduction using Live journal dataset 11
  • 12. • Introduced a sub-graph centric programming abstraction for large scale graph analytics on distributed systems – Simple – Enable using shared memory algorithms at sub-graph level. • Sub-graph centric algorithms and performance results – Connected Components – Single Source Shortest Path – Page Rank • Issues and Future work – Sub-graph aware partitioning – Sub-graph centric algorithms • Try now – https://github.com/usc-cloud/goffish 12