MILAN 20/21.11.2015
Graphs are everywhere!
Distributed graph computing with Spark GraphX
Andrea Iacono
MILAN 20/21.11.2015 - Andrea Iacono
Agenda:
● Graph definitions and usages
● GraphX introduction
● Pregel
● Code examples
The main focus will be the programming model.
The code is available at:
https://github.com/andreaiacono/TalkGraphX
A graph is a set of vertices and the edges that connect them.
(figure: a sample graph with the labels "Vertex" and "Edge")
Graphs are used for modeling very different domains.
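
As a minimal sketch of this definition (plain Python, standard library only; the vertex names and the `neighbours` helper are made up for illustration and do not come from the talk's repository), a graph can be stored as a set of vertices plus a list of edges, from which an adjacency map is derived:

```python
from collections import defaultdict

# Hypothetical sample data: four vertices and four undirected edges.
vertices = {"a", "b", "c", "d"}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def neighbours(vertices, edges):
    """Build an adjacency map {vertex: set of adjacent vertices} for an undirected graph."""
    adjacency = defaultdict(set)
    for v in vertices:
        adjacency[v]  # touching the key makes isolated vertices appear too
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return dict(adjacency)


adjacency = neighbours(vertices, edges)
# The degree of a vertex is the number of vertices adjacent to it.
degree = {v: len(ns) for v, ns in adjacency.items()}
```

For a directed graph, the same sketch would add only `dst` to the adjacency set of `src`.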
Networks
Routing
Page Rank
Definitions
Undirected / Directed
Definitions
Connected / Disconnected
Definitions
Complete (K5) / Bipartite and complete (K2,3)
Definitions
Cyclic / Acyclic
Definitions
Multigraph / Pseudograph
Definitions
An undirected acyclic connected graph is a tree!
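
The tree definition can be checked mechanically. The sketch below is an illustration in plain Python, not code from the talk; the function name `is_tree` and the choice to treat the empty graph as "not a tree" are assumptions. It relies on the fact that a connected undirected graph with n vertices is acyclic exactly when it has n - 1 edges:

```python
from collections import defaultdict, deque


def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected and acyclic."""
    vertices = set(vertices)
    if not vertices:
        return False  # convention chosen here: the empty graph is not a tree
    # A connected graph on n vertices is acyclic exactly when it has n - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    # Breadth-first search from an arbitrary vertex to check connectivity.
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in adjacency[current]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen == vertices
```

A path such as 1-2, 1-3, 3-4 passes the check, while a triangle fails the edge count and a triangle plus an isolated vertex fails the connectivity test.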
What's wrong with MapReduce?
Every MapReduce run reads its input from disk (e.g. HDFS), computes the
results, and writes them back to disk. Since most graph algorithms are
iterative, every iteration has to read the whole dataset from disk and
write it back again.
It's better to use a distributed dataflow framework that can keep data in memory.
GraphX is a graph processing system
built on top of Apache Spark
“Graph processing systems represent graph structured data as a property
graph, which associates user-defined properties with each vertex and edge.”
“The Spark storage abstraction called Resilient Distributed Datasets (RDDs)
enables applications to keep data in memory, which is essential for iterative
graph algorithms.”
“RDDs permit user-defined data partitioning, and the execution engine can
exploit this to co-partition RDDs and co-schedule tasks to avoid data
movement. This is essential for encoding partitioned graphs.”
Excerpt from GraphX: Graph Processing in a Distributed Dataflow Framework
https://amplab.cs.berkeley.edu/wp-content/uploads/2014/09/graphx.pdf
GraphX / Spark software stack
(image source: Spark site)
Graph Databases
● Storage
● Query Language
● Transactions
● Examples:
  ● Neo4j
  ● OrientDB
  ● Titan
Graph Processing Systems
● APIs for traversing and processing
● Better performance (in-memory data)
● Examples:
  ● GraphX
  ● Giraph
  ● GraphLab
Pregel
Pregel is a computational model designed by Google
(https://kowshik.github.io/JPregel/pregel_paper.pdf)
A computation consists of a sequence of supersteps that runs until termination.
In each superstep, every vertex can:
● modify its own state or that of its outgoing edges
● receive the messages sent to it during the previous superstep
● send messages to other vertices (they will be received in the next superstep)
● vote to halt
When a vertex votes to halt, it becomes inactive; if it receives a message in a
later superstep, the framework wakes it up and sets its state back to active.
The computation stops when all vertices have voted to halt and no messages are
in transit; alternatively, a maximum number of iterations can be set.
Edges have no associated computation.
When writing algorithms, you have to think like a vertex.
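
The superstep loop can be sketched on a single machine. The following is an illustrative simulation in plain Python, not Pregel or GraphX code; the function name `pregel_max`, its parameters, and the sample graph are assumptions. It propagates the largest vertex value through the graph: a vertex sends its value in the first superstep and afterwards only when its value has grown, and staying silent plays the role of the vote to halt.

```python
def pregel_max(values, neighbours, max_supersteps=50):
    """Propagate the maximum vertex value through the graph, Pregel style.

    values:     {vertex: initial value}
    neighbours: {vertex: list of vertices it can send messages to}
    """
    state = dict(values)
    active = set(state)  # every vertex starts active
    inbox = {v: [] for v in state}  # messages delivered in the current superstep

    for superstep in range(max_supersteps):
        if not active:
            break  # everyone has halted and no messages are in transit
        outbox = {v: [] for v in state}  # messages for the next superstep
        for v in active:
            best = max([state[v]] + inbox[v])
            changed = best > state[v]
            state[v] = best
            # Send in the first superstep, and later only when the value improved.
            if superstep == 0 or changed:
                for n in neighbours.get(v, []):
                    outbox[n].append(best)
            # Otherwise the vertex sends nothing: this is its vote to halt.
        inbox = outbox
        # A vertex is (re)activated only if a message is waiting for it.
        active = {v for v, msgs in inbox.items() if msgs}
    return state


# Hypothetical sample graph: every vertex can reach the others through its out-edges.
sample_values = {"a": 3, "b": 6, "c": 2, "d": 1}
sample_neighbours = {"a": ["b"], "b": ["a", "d"], "c": ["b", "d"], "d": ["c"]}
result = pregel_max(sample_values, sample_neighbours)
```

Note that the loop body only ever looks at one vertex, its inbox, and its outgoing neighbours, which is what "thinking like a vertex" means in practice.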
Pregel sample
Image source: Pregel paper
GraphX implementation of Pregel
GraphX uses three functions for implementing Pregel:
● vprog: the vertex program, run on each vertex that receives a message; it
takes the incoming message and computes a new vertex value
● sendMsg: the function used for sending messages to other vertices
● mergeMsg: a function that takes two incoming messages and merges
them into a single message
Unlike Google's Pregel, the GraphX implementation of Pregel:
● leaves message construction out of the vertex program, which allows
a more efficient distributed execution
● permits access to the attributes of both vertices of an edge while building
the messages
● constrains message sending to the graph structure (only to neighbours)
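
To make the division of labour between the three functions concrete, here is a toy, single-machine imitation in plain Python. It is not the GraphX API (which is Scala only): the `pregel` driver, its parameter order, the tuple-based `send_msg` signature, and the sample graph are all assumptions made for this sketch. It computes single-source shortest paths, with messages built per edge from both endpoint attributes and sent only along edges:

```python
import math


def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iterations=20):
    """Toy, single-machine imitation of the loop described above (not the GraphX API).

    vertices: {vertex_id: attribute}
    edges:    list of (src_id, dst_id, edge_attribute)
    """
    # Every vertex first processes the initial message.
    attrs = {vid: vprog(vid, attr, initial_msg) for vid, attr in vertices.items()}
    active = set(attrs)
    for _ in range(max_iterations):
        # Build messages edge by edge; both endpoint attributes are visible here.
        inbox = {}
        for src, dst, edge_attr in edges:
            if src not in active and dst not in active:
                continue
            for target, msg in send_msg(src, attrs[src], dst, attrs[dst], edge_attr):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages in flight: the computation has converged
        # Only vertices that received a message run the vertex program.
        for vid, msg in inbox.items():
            attrs[vid] = vprog(vid, attrs[vid], msg)
        active = set(inbox)
    return attrs


# Single-source shortest paths from vertex 1 (hypothetical sample graph).
source = 1
vertices = {vid: (0.0 if vid == source else math.inf) for vid in (1, 2, 3, 4)}
edges = [(1, 2, 7.0), (1, 3, 2.0), (3, 2, 3.0), (2, 4, 1.0)]


def vprog(vid, dist, msg):
    # Keep the shortest distance seen so far.
    return min(dist, msg)


def send_msg(src, src_dist, dst, dst_dist, weight):
    # Message only when the edge offers a shorter path to its destination.
    if src_dist + weight < dst_dist:
        return [(dst, src_dist + weight)]
    return []


def merge_msg(a, b):
    # Two candidate distances for the same vertex: the smaller one wins.
    return min(a, b)


distances = pregel(vertices, edges, math.inf, vprog, send_msg, merge_msg)
```

Because messages for a vertex are combined pairwise in no particular order, the merge function should be commutative and associative; `min` satisfies both.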
GraphX Pregel communication diagram
When to use GraphX
GraphX is well suited for algorithms that:
● respect the neighborhood structure
GraphX is NOT well suited for algorithms that:
● need interaction between distant vertices
● change the structure of the graph
Algorithms out of the box (as of Spark v1.5.1):
- Connected Components
- Label Propagation
- PageRank
- SVD++
- Shortest Paths
- Strongly Connected Components
- Triangle Count
Now some code!
Questions & Answers
Leave your feedback on Joind.in!
https://m.joind.in/event/codemotion-milan-2015


Editor's notes

  1. Questions for the audience: - Who knows what a graph is? - Who has ever used one? - Who knows the most common algorithms (BFS, DFS, Dijkstra)? - Who knows Scala?
  2. Vertices and edges
  3. Triangle counting for grouping. Commercial interest: targeted offers to groups with the same interests.
  4. Vertices = intersections. Edges = roads. Shortest-path algorithm (Dijkstra), where edges carry several weights: typically distance, traffic, toll payments, etc.
  5. Pages = vertices. Edges = incoming links. Every outgoing edge has a weight tied to that of its vertex; the larger the sum of the values of the incoming edges, the larger the weight of the vertex. Iterative algorithm.
  6. Directed / undirected
  7. Connected / disconnected
  8. K is the standard notation for this kind of graph. A bipartite graph is useful for e-commerce, where you have all the user nodes that can buy any of the product nodes.
  9. Cyclic / acyclic (i.e. without cycles)
  10. Multigraph: there can be several edges with the same source and the same destination. Pseudograph: an edge can have the same vertex as both source and destination.
  11. When I said that edges are everywhere, this is mostly why!
  12. Here we are talking about large graphs that do not fit in the RAM of a single machine.
  13. The graph shown is a multi-pseudograph. ????? internal representation?
  14. Unlike Spark, which offers APIs in Scala, Java and Python, GraphX offers them only in Scala; however, they should become available in the near future.
  15. Gremlin graph query language (TinkerPop): Gremlin is a DSL for traversing property graphs. Neo4j uses (proprietary) Cypher as its native query language. Titan is a graph database that supports the following storage backends: - Cassandra (column) - HBase (column) - BerkeleyDB (key-value)
  16. Suppose we have a value for each vertex and want to find the maximum value in the whole graph. With this computational model, the idea is that we have to propagate information between the nodes. In each superstep, every vertex that has received a value higher than its own sends it to all its neighbours. When no vertex changes any more, the algorithm has terminated.
  17. Commutative: 2 + 3 == 3 + 2. Associative: (2 + 3) + 4 == 2 + (3 + 4)
  18. JetBrains prize draw