GraphFrames Access Methods
Jim Hatcher
Solution Architect, DataStax
Twitter: @thejimhatcher
Graph Day - San Francisco
September 2018
© DataStax, All Rights Reserved.
Agenda
● Building Blocks
● OSS Spark GraphFrames
● DSEGraphFrames
● Demo
● Resources
Building Blocks
[Figure: DSE Graph Frames - Mental Model of Concepts. Headings: Concepts, Open Source, DataStax Enterprise (DSE). Concepts: Graph Theory, Database, Graph Database, Distributed Database, Execution Framework, Distributed Execution Framework. Components: Apache Spark, Apache Cassandra, Apache TinkerPop & Gremlin, Spark GraphX, Spark GraphFrames, Spark DataFrames, Resilient Distributed Dataset (RDD), Spark Query Plan & Memory Optimization, Machine Learning, Graph Algorithms, OLTP / Realtime Database, DSE Core, DSE Search, DSE Analytics, DSE Graph, DSE GraphFrames.]
[Figure: Typical Cluster Topology in DSE Graph. One cluster with Data Center 1 (OLTP / Realtime) serving real-time clients and Data Center 2 (OLAP / Batch) serving batch clients.]
OSS Spark GraphFrames
Capabilities
● Parallelized, resilient, distributed execution (from Spark)
● Query Plan Optimization (from Spark’s Catalyst engine)
● Memory Optimization (from Spark’s Tungsten engine)
● Spark SQL (from Spark DataFrames)
Motif Finding
● Motif Finding
○ g.find()
○ motif syntax (a subset of Cypher)
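Conceptually, a motif is resolved by joining copies of the edge table, and every vertex name shared between two terms becomes a join condition. The sketch below models that idea with plain Scala collections, not the GraphFrames API: `MotifSketch`, `Edge`, and `twoHopPaths` are hypothetical names, and it handles only the chain pattern `(a)-[e1]->(b); (b)-[e2]->(c)`, ignoring the vertex-table joins and filters a real `g.find()` also performs.

```scala
object MotifSketch {
  // Hypothetical edge row: a directed, labeled edge src -> dst.
  final case class Edge(src: String, dst: String, label: String)

  // Resolve the chain "(a)-[e1]->(b); (b)-[e2]->(c)".
  // The shared vertex name b becomes the join condition e1.dst == e2.src,
  // i.e. an inner join of two copies of the edge list.
  def twoHopPaths(edges: Seq[Edge]): Seq[(String, String, String)] =
    for {
      e1 <- edges
      e2 <- edges
      if e1.dst == e2.src
    } yield (e1.src, e1.dst, e2.dst)
}
```

In Spark the equivalent join runs distributed and the Catalyst optimizer is free to reorder it; the nested loop here only shows which rows match.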
Graph Algorithms
● Graph Algorithms (from GraphX)
○ Breadth-First Search (BFS)
○ Connected Components / Strongly Connected Components
○ Label Propagation Algorithm (LPA)
○ PageRank
○ Shortest Paths
○ SVD++
○ Triangle Count
● Building blocks to write your own algorithms
○ aggregateMessages()
○ pregel() (from GraphX)
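`aggregateMessages()` performs one round of message passing: each edge emits a message toward its source and/or destination vertex, and the messages arriving at a vertex are combined with an aggregate function. The plain-Scala model below is a hypothetical illustration, not the GraphFrames API; it sends messages to destinations only and uses the made-up names `aggregateToDst` and `inDegrees`. It computes in-degree by sending the constant 1 along every edge and summing.

```scala
object AggregateMessagesSketch {
  // Hypothetical edge row with numeric vertex ids.
  final case class Edge(src: Long, dst: Long)

  // One round of message passing: each edge sends msg(edge) to its
  // destination, and messages that reach the same vertex are folded
  // together with `combine`. Vertices that receive no message are absent.
  def aggregateToDst[M](edges: Seq[Edge], msg: Edge => M, combine: (M, M) => M): Map[Long, M] =
    edges
      .map(e => e.dst -> msg(e))
      .groupBy(_._1)
      .map { case (vertex, msgs) => vertex -> msgs.map(_._2).reduce(combine) }

  // In-degree: message = 1 on every edge, aggregate = sum.
  def inDegrees(edges: Seq[Edge]): Map[Long, Int] =
    aggregateToDst[Int](edges, _ => 1, _ + _)
}
```

Iterative algorithms repeat rounds like this, feeding each round's aggregates into the next round's messages; `pregel()` packages that loop.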
Data Source
● Load vertices and edges from any Spark data source
DSEGraphFrames
Data Source
● Point to your DSE Graph
val g = spark.dseGraph("my_graph_name")
● Or, point to any other data source
Apache TinkerPop support
● The same Gremlin you write for OLTP traversals can also be used for analytical workloads
● However, only a subset of Gremlin steps is currently implemented
○ Inclusions:
■ DSE 5.1: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
■ DSE 6.0: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
○ Notable Exclusions:
■ repeat()
■ union()
■ as() / select() (added in DSE 6.0)
Good for Scan Operations
● Very good for operations that require table scans
○ Examples:
■ g.V().count()
■ g.E().count()
■ g.V().groupCount().by(__.label())
■ g.E().groupCount().by(__.label())
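These traversals are scans because every vertex or edge must be visited once, whatever the graph's shape. A plain-Scala model of the two vertex queries over a hypothetical in-memory vertex list (not the DSE GraphFrames API; `ScanSketch` and its members are illustrative names):

```scala
object ScanSketch {
  // Hypothetical vertex row: an id plus its label (the `~label` column in DSE).
  final case class Vertex(id: String, label: String)

  // g.V().count(): touch every vertex once and count it.
  def count(vertices: Seq[Vertex]): Long = vertices.size.toLong

  // g.V().groupCount().by(__.label()): one pass, bucketing vertices by label.
  def groupCountByLabel(vertices: Seq[Vertex]): Map[String, Long] =
    vertices.groupBy(_.label).map { case (label, group) => label -> group.size.toLong }
}
```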
Mutations
● Effective way of mutating the graph (not available in OSS GraphFrames)
○ Mutations cannot be done using Gremlin OLAP
○ Takes advantage of Spark’s built-in parallelism
● Potential Use Cases
○ Migrating from the current graph schema to a new graph schema
○ Adding shortcut edges
○ Initial load of the graph
■ Requires a distributed file system such as DSEFS or HDFS
○ Dropping all vertices with label X
Demo
Dataset
KillrVideo - reference application
https://github.com/datastax/graph-examples/
Summary Traversals - TinkerPop/Gremlin
val g = spark.dseGraph("killrvideo")
g.V().count()
g.E().count()
g.V().groupCount().by(__.label())
g.E().groupCount().by(__.label())
//get count of actors by movie
//(local, values and decr are TinkerPop's Scope.local, Column.values and Order.decr)
g.V()
.hasLabel("movie")
//.has("title", "I Am Legend")
.as("m")
.out("actor")
.groupCount().by(__.select("m").values("title"))
.order(local).by(values, decr)
Summary Traversals - Spark SQL
//register our vertex and edge tables so we can reference them in Spark SQL
spark.read.format("com.datastax.bdp.graph.spark.sql.vertex").option("graph",
"killrvideo").load.createOrReplaceTempView("vertices")
spark.read.format("com.datastax.bdp.graph.spark.sql.edge").option("graph",
"killrvideo").load.createOrReplaceTempView("edges")
//get Count of Actors by movie
val moviesAndActorCounts = spark.sql("""
SELECT vMovie.title, COUNT(*) AS NumberOfActors
FROM vertices vMovie
INNER JOIN edges eActor ON vMovie.id = eActor.src AND eActor.`~label` = 'actor'
WHERE vMovie.`~label` = 'movie'
GROUP BY vMovie.id, vMovie.title
ORDER BY COUNT(*) DESC
""")
moviesAndActorCounts.show(false)
//moviesAndActorCounts.explain
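The Gremlin traversal and the SQL query describe the same computation: filter movies, join them to their outgoing `actor` edges, group by movie, count, and sort. The plain-Scala model below follows the SQL clause by clause over hypothetical in-memory rows; `V`, `E`, and `actorsPerMovie` are illustrative names, not part of DSE. Ties are broken by title, an assumption, since the original query leaves their order unspecified.

```scala
object ActorCountSketch {
  // Hypothetical rows mirroring the `vertices` / `edges` temp views.
  final case class V(id: String, label: String, title: String = "", name: String = "")
  final case class E(src: String, dst: String, label: String)

  def actorsPerMovie(vertices: Seq[V], edges: Seq[E]): Seq[(String, Int)] = {
    val movies = vertices.filter(_.label == "movie")   // WHERE vMovie.`~label` = 'movie'
    val actorEdges = edges.filter(_.label == "actor")  // AND eActor.`~label` = 'actor'
    val joined = for {
      m <- movies
      e <- actorEdges
      if m.id == e.src                                 // ON vMovie.id = eActor.src
    } yield (m.id, m.title)
    joined
      .groupBy(identity)                               // GROUP BY vMovie.id, vMovie.title
      .toSeq
      .map { case ((_, title), rows) => (title, rows.size) } // COUNT(*)
      .sortBy { case (title, n) => (-n, title) }       // ORDER BY COUNT(*) DESC, then title
  }
}
```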
Summary Traversals - Spark SQL (cont'd)
//actors whose movies span more than one distinct genre
val actorsInMultipleGenres = spark.sql("""
SELECT ActorGenreGrouping.ActorName, ActorGenreGrouping.NumberOfGenres
FROM
(
SELECT vPerson.name AS ActorName, COUNT(DISTINCT vGenre.id) AS NumberOfGenres
FROM vertices vPerson
INNER JOIN edges eActor ON vPerson.id = eActor.dst AND eActor.`~label` = 'actor'
INNER JOIN vertices vMovie ON vMovie.id = eActor.src AND vMovie.`~label` = 'movie'
INNER JOIN edges eGenre ON vMovie.id = eGenre.src AND eGenre.`~label` = 'belongsTo'
INNER JOIN vertices vGenre ON vGenre.id = eGenre.dst AND vGenre.`~label` = 'genre'
WHERE vPerson.`~label` = 'person'
AND vPerson.name <> 'Animation'
GROUP BY vPerson.id, vPerson.name
) AS ActorGenreGrouping
WHERE ActorGenreGrouping.NumberOfGenres > 1
ORDER BY ActorGenreGrouping.NumberOfGenres DESC
""")
actorsInMultipleGenres.show(false)
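The intent of this query, finding actors whose movies span more than one genre, amounts to a two-hop walk person ← movie → genre followed by a count of distinct genres per person. A plain-Scala sketch over a hypothetical in-memory edge list, using the edge directions from the query (`actor` goes movie → person, `belongsTo` goes movie → genre); the names are illustrative and this is not the DSE API.

```scala
object MultiGenreSketch {
  // Hypothetical edge row; labels follow the KillrVideo query.
  final case class E(src: String, dst: String, label: String)

  // Returns (personId, numberOfDistinctGenres) for persons with more than one genre.
  def actorsInMultipleGenres(edges: Seq[E]): Seq[(String, Int)] = {
    // movie -> set of genres it belongsTo
    val genresByMovie: Map[String, Set[String]] =
      edges
        .filter(_.label == "belongsTo")
        .groupBy(_.src)
        .map { case (movie, es) => movie -> es.map(_.dst).toSet }
    // (person, genre) pairs reached through a shared movie
    val personGenre = for {
      a <- edges
      if a.label == "actor"
      genre <- genresByMovie.getOrElse(a.src, Set.empty[String])
    } yield (a.dst, genre)
    personGenre.distinct                 // count each genre once per person
      .groupBy(_._1)
      .toSeq
      .map { case (person, pairs) => (person, pairs.size) }
      .filter { case (_, n) => n > 1 }
      .sortBy { case (person, n) => (-n, person) }
  }
}
```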
Motif finding
val g = spark.dseGraph("killrvideo")
//get a list of actors who have worked in comedy movies
val comedyActors = g.find("(movie)-[e1]->(person); (movie)-[e2]->(genre)")
.filter("""
person.`~label` = 'person'
and e1.`~label` = 'actor'
and movie.`~label` = 'movie'
and e2.`~label` = 'belongsTo'
and genre.`~label` = 'genre'
and genre.name = 'Comedy'
""")
.select("person.name", "movie.title", "genre.name")
comedyActors.show(false)
//comedyActors.explain
Adding Shortcut Edges - DataFrames
import org.apache.spark.sql.functions.lit
val g = spark.dseGraph("killrvideo")
//filtered copies of the vertex and edge tables for the path person <-actor- movie -actor-> person
val vPerson1 = g.vertices.filter($"~label" === "person")
val eActor1 = g.edges.filter($"~label" === "actor")
val vMovie1 = g.vertices.filter($"~label" === "movie")
val eActor2 = g.edges.filter($"~label" === "actor")
//each person and the movies they acted in (actor edges point movie -> person)
val tempResults1 = vPerson1
.join(eActor1, vPerson1.col("id") === eActor1.col("dst"))
.select(vPerson1.col("id").as("vPerson1_id"), vPerson1.col("name").as("vPerson1_name"), eActor1.col("src").as("eActor1_src"))
//attach the movie vertex
val tempResults2 = tempResults1
.join(vMovie1, tempResults1.col("eActor1_src") === vMovie1.col("id"))
.select(tempResults1.col("vPerson1_id"), tempResults1.col("vPerson1_name"), vMovie1.col("id").as("vMovie1_id"), vMovie1.col("title"))
//every actor in that same movie
val tempResults3 = tempResults2
.join(eActor2, tempResults2.col("vMovie1_id") === eActor2.col("src"))
.select(tempResults2.col("vPerson1_id"), tempResults2.col("vPerson1_name"), tempResults2.col("title"), eActor2.col("dst").as("eActor2_dst"))
//one workedTogether edge per co-star pair; skip self-pairs and drop repeats from shared movies
val shortcutEdges = tempResults3
.filter($"vPerson1_id" =!= $"eActor2_dst")
.select(tempResults3.col("vPerson1_id").as("src"), tempResults3.col("eActor2_dst").as("dst"), lit("workedTogether").as("~label"))
.distinct()
g.updateEdges(shortcutEdges)
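The chain of joins walks person ← movie → person and emits one `workedTogether` edge per pair of distinct co-stars. The same logic in plain Scala collections, as a hypothetical model rather than DSE code (`E`, `castByMovie`, and `workedTogether` are illustrative names). Every co-star pair yields an edge in both directions, and `distinct` collapses pairs who share several movies.

```scala
object ShortcutEdgeSketch {
  // Hypothetical edge row; 'actor' edges go movie -> person.
  final case class E(src: String, dst: String, label: String)

  def workedTogether(edges: Seq[E]): Seq[E] = {
    // movie -> every person who acted in it
    val castByMovie: Map[String, Seq[String]] =
      edges
        .filter(_.label == "actor")
        .groupBy(_.src)
        .map { case (movie, es) => movie -> es.map(_.dst) }
    val shortcuts = for {
      cast <- castByMovie.values.toSeq
      p1 <- cast
      p2 <- cast
      if p1 != p2                 // like filter($"vPerson1_id" =!= $"eActor2_dst")
    } yield E(p1, p2, "workedTogether")
    shortcuts.distinct            // one edge per (src, dst), however many shared movies
  }
}
```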
Shortest Path
import com.datastax.spark.connector._ //provides createCassandraTable on DataFrames
spark.sparkContext.setCheckpointDir("dsefs://127.0.0.1:5598/checkpoints")
val g = spark.dseGraph("killrvideo")
val johnWayneId = g.V.has("person", "name", "John Wayne").df.collect()(0)(0)
val jamesStewartId = g.V.has("person", "name", "James Stewart").df.collect()(0)(0)
val shortestPaths = g.shortestPaths.landmarks(Seq(johnWayneId, jamesStewartId)).run
//make a C* table that matches the schema of my dataframe
shortestPaths.createCassandraTable(
"test", //keyspace
"shortest_paths", //table_name
partitionKeyColumns = Some(Seq("id")),
clusteringKeyColumns = Some(Seq("~label")))
Shortest Path (cont'd)
//write to the table
shortestPaths.write.format("org.apache.spark.sql.cassandra")
.options(
Map(
"table" -> "shortest_paths",
"keyspace" -> "test",
"spark.cassandra.output.ignoreNulls" -> "true"
)
).save
//read it back in later
//val shortestPathsFromDb = spark.read.cassandraFormat("shortest_paths", "test").load //needs import org.apache.spark.sql.cassandra._
shortestPaths
.filter($"~label" === "person")
.select($"name", $"distances"(johnWayneId).as("hopsFromDuke"), $"distances"(jamesStewartId).as("hopsFromJimmy"))
.orderBy($"hopsFromDuke".desc)
.show(500, false)
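`shortestPaths` returns, for each vertex, a `distances` map from landmark id to hop count, which is what `distances(johnWayneId)` reads. The plain-Scala sketch below builds the same shape with one breadth-first search per landmark. It is a model, not the GraphFrames implementation: it treats edges as undirected (an assumption made so that person → movie → person hops are counted), and `LandmarkDistancesSketch`, `bfs`, and `distances` are hypothetical names.

```scala
import scala.collection.mutable

object LandmarkDistancesSketch {
  // Unweighted, undirected adjacency built from (src, dst) pairs.
  def adjacency(edges: Seq[(String, String)]): Map[String, Seq[String]] =
    (edges ++ edges.map(_.swap))
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2) }

  // Breadth-first search from one landmark: hop count to every reachable vertex.
  def bfs(adj: Map[String, Seq[String]], start: String): Map[String, Int] = {
    val dist = mutable.Map(start -> 0)
    val queue = mutable.Queue(start)
    while (queue.nonEmpty) {
      val v = queue.dequeue()
      for (n <- adj.getOrElse(v, Seq.empty) if !dist.contains(n)) {
        dist(n) = dist(v) + 1
        queue.enqueue(n)
      }
    }
    dist.toMap
  }

  // Per vertex, a map landmark -> hops (unreachable landmarks are omitted),
  // the same shape as the `distances` column.
  def distances(edges: Seq[(String, String)], landmarks: Seq[String]): Map[String, Map[String, Int]] = {
    val adj = adjacency(edges)
    val perLandmark = landmarks.map(l => l -> bfs(adj, l))
    val vertices = adj.keySet ++ landmarks
    vertices.map { v =>
      v -> perLandmark.collect { case (l, d) if d.contains(v) => l -> d(v) }.toMap
    }.toMap
  }
}
```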
Resources
https://graphframes.github.io/user-guide.html
https://github.com/apache/spark/tree/master/graphx/src/main/scala/org/apache/spark/graphx
https://github.com/graphframes/graphframes
https://www.youtube.com/watch?v=DW09q18OHfc - Russell Spitzer / Artem Aliev - Spark Summit talk
https://www.datastax.com/dev/blog/dse-graph-frame
https://github.com/datastax/graph-examples/blob/master/dse-graph-frame/Spark-shell-notes.scala
https://www.manning.com/books/spark-graphx-in-action
https://academy.datastax.com/resources/ds332


GraphFrames Access Methods in DSE Graph

  • 1. GraphFrames Access Methods. Jim Hatcher, Solution Architect, DataStax (Twitter: @thejimhatcher). Graph Day, San Francisco, September 2018. © DataStax, All Rights Reserved.
  • 2. Agenda
      ● Building Blocks
      ● OSS Spark GraphFrames
      ● DSEGraphFrames
      ● Demo
      ● Resources
  • 4. [Diagram: DSE Graph Frames - Mental Model of Concepts. Columns: Concepts, Open Source, DataStax Enterprise (DSE). Labels include graph theory, databases, graph databases, distributed databases, execution frameworks, distributed execution frameworks, OLTP / realtime databases, Resilient Distributed Datasets (RDDs), Spark query-plan and memory optimization, graph algorithms and machine learning; Apache Cassandra, Apache Spark, Spark DataFrames, Spark GraphX, Spark GraphFrames, Apache TinkerPop & Gremlin; DSE Core, DSE Search, DSE Analytics, DSE Graph, DSE Graph Frames.]
  • 5. [Diagram: Typical Cluster Topology in DSE Graph. One cluster with Data Center 1 (OLTP / Realtime) serving real-time clients and Data Center 2 (OLAP / Batch) serving batch clients.]
  • 7. Capabilities
      ● Parallelization, resilience, and distribution (from Spark)
      ● Query plan optimization (from Spark's Catalyst engine)
      ● Memory optimization (from Spark's Tungsten engine)
      ● Spark SQL (from Spark DataFrames)
  • 8. Motif Finding
      ● Motif finding
          ○ g.find()
          ○ Motif syntax (a subset of Cypher)
  • 9. Graph Algorithms
      ● Graph algorithms (from GraphX)
          ○ Breadth-First Search (BFS)
          ○ Connected Components / Strongly Connected Components
          ○ Label Propagation Algorithm (LPA)
          ○ PageRank
          ○ Shortest Paths
          ○ SVD++
          ○ Triangle Count
      ● Building blocks for writing your own algorithms
          ○ aggregateMessages()
          ○ pregel() (GraphX)
  • 10. Data Source
      ● Load your vertices / edges from any Spark source
  • 12. Data Source
      ● Point to your DSE Graph:
          val g = spark.dseGraph("my_graph_name")
      ● Or, point to any other data source
  • 13. Apache TinkerPop support
      ● The same Gremlin that you write for your OLTP-based traversals can be used for analytical requirements
      ● However, only a subset of the Gremlin steps is currently implemented
          ○ Inclusions:
              ■ DSE 5.1: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
              ■ DSE 6.0: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
          ○ Notable exclusions:
              ■ repeat()
              ■ union()
              ■ as() / select() (added in DSE 6.0)
  • 14. Good for Scan Operations
      ● Well suited to operations that require full table scans
          ○ Examples:
              ■ g.V().count()
              ■ g.E().count()
              ■ g.V().groupCount().by(__.label())
              ■ g.E().groupCount().by(__.label())
  • 15. Mutations
      ● An effective way to mutate the graph (not available in OSS GraphFrames)
          ○ Mutations cannot be done using Gremlin OLAP
          ○ Takes advantage of Spark's innate ability to parallelize work
      ● Potential use cases
          ○ Migrating from the current graph schema to a new graph schema
          ○ Adding shortcut edges
          ○ Initial load of the graph
              ■ Requires a distributed file system such as DSEFS or HDFS
          ○ Dropping all vertices with label X
  • 16. Demo
  • 17. Dataset
      KillrVideo reference application: https://github.com/datastax/graph-examples/
  • 18. Summary Traversals - TinkerPop/Gremlin
      val g = spark.dseGraph("killrvideo")

      g.V().count()
      g.E().count()
      g.V().groupCount().by(__.label())
      g.E().groupCount().by(__.label())

      //get count of actors by movie
      g.V()
        .hasLabel("movie")
        //.has("title", "I Am Legend")
        .as("m")
        .out("actor")
        .groupCount().by(__.select("m").values("title"))
        .order(local).by(values, decr)
  • 19. Summary Traversals - Spark SQL
      //register the vertex and edge tables so they can be referenced in Spark SQL
      spark.read.format("com.datastax.bdp.graph.spark.sql.vertex").option("graph", "killrvideo").load.createOrReplaceTempView("vertices")
      spark.read.format("com.datastax.bdp.graph.spark.sql.edge").option("graph", "killrvideo").load.createOrReplaceTempView("edges")

      //get count of actors by movie
      val moviesAndActorCounts = spark.sql("""
        SELECT vMovie.title, COUNT(*) AS NumberOfActors
        FROM vertices vMovie
        INNER JOIN edges eActor
          ON vMovie.id = eActor.src
          AND eActor.`~label` = 'actor'
        WHERE vMovie.`~label` = 'movie'
        GROUP BY vMovie.id, vMovie.title
        ORDER BY COUNT(*) DESC
      """)
      moviesAndActorCounts.show(false)
      //moviesAndActorCounts.explain
  • 20. Summary Traversals - Spark SQL (cont'd)
      //find actors whose movies span more than one genre
      val actorsInMultipleGenres = spark.sql("""
        SELECT ActorGenreGrouping.ActorName, ActorGenreGrouping.NumberOfGenres
        FROM (
          SELECT vPerson.name AS ActorName, COUNT(DISTINCT vGenre.name) AS NumberOfGenres
          FROM vertices vPerson
          INNER JOIN edges eActor
            ON vPerson.id = eActor.dst
            AND eActor.`~label` = 'actor'
          INNER JOIN vertices vMovie
            ON vMovie.id = eActor.src
            AND vMovie.`~label` = 'movie'
          INNER JOIN edges eGenre
            ON vMovie.id = eGenre.src
            AND eGenre.`~label` = 'belongsTo'
          INNER JOIN vertices vGenre
            ON vGenre.id = eGenre.dst
            AND vGenre.`~label` = 'genre'
          WHERE vPerson.`~label` = 'person'
            AND vPerson.name <> 'Animation'
          GROUP BY vPerson.name
        ) AS ActorGenreGrouping
        WHERE ActorGenreGrouping.NumberOfGenres > 1
        ORDER BY ActorGenreGrouping.NumberOfGenres DESC
      """)
      actorsInMultipleGenres.show(false)
  • 21. Motif finding
      val g = spark.dseGraph("killrvideo")

      //get a list of actors who have worked in comedy movies
      val comedyActors = g.find("(movie)-[e1]->(person); (movie)-[e2]->(genre)")
        .filter("""
          person.`~label` = 'person' and
          e1.`~label` = 'actor' and
          movie.`~label` = 'movie' and
          e2.`~label` = 'belongsTo' and
          genre.`~label` = 'genre' and
          genre.name = 'Comedy'
        """)
        .select("person.name", "movie.title", "genre.name")
      comedyActors.show(false)
      //comedyActors.explain
  • 22. Adding Shortcut Edges - DataFrames
      import org.apache.spark.sql.functions.lit

      val g = spark.dseGraph("killrvideo")

      val vPerson1 = g.vertices.filter($"~label" === "person")
      val eActor1 = g.edges.filter($"~label" === "actor")
      val vMovie1 = g.vertices.filter($"~label" === "movie")
      val eActor2 = g.edges.filter($"~label" === "actor")

      //person <- actor edge (src = movie)
      val tempResults1 = vPerson1
        .join(eActor1, vPerson1.col("id") === eActor1.col("dst"))
        .select(vPerson1.col("id").as("vPerson1_id"), vPerson1.col("name").as("vPerson1_name"), eActor1.col("src").as("eActor1_src"))

      //attach the movie vertex
      val tempResults2 = tempResults1
        .join(vMovie1, tempResults1.col("eActor1_src") === vMovie1.col("id"))
        .select(tempResults1.col("vPerson1_id"), tempResults1.col("vPerson1_name"), vMovie1.col("id").as("vMovie1_id"), vMovie1.col("title"))

      //movie -> actor edge -> co-star
      val tempResults3 = tempResults2
        .join(eActor2, tempResults2.col("vMovie1_id") === eActor2.col("src"))
        .select(tempResults2.col("vPerson1_id"), tempResults2.col("vPerson1_name"), tempResults2.col("title"), eActor2.col("dst").as("eActor2_dst"))

      //drop self-pairs and shape the result as edges (src, dst, ~label)
      val shortcutEdges = tempResults3
        .filter($"vPerson1_id" =!= $"eActor2_dst")
        .select(tempResults3.col("vPerson1_id").as("src"), tempResults3.col("eActor2_dst").as("dst"), lit("workedTogether").as("~label"))

      //write the new edges back to DSE Graph
      g.updateEdges(shortcutEdges)
  • 23. Shortest Path
      import com.datastax.spark.connector._

      //checkpoint directory on DSEFS
      spark.sparkContext.setCheckpointDir("dsefs://127.0.0.1:5598/checkpoints")

      val g = spark.dseGraph("killrvideo")

      //look up the vertex ids of the two landmark actors
      val johnWayneId = g.V.has("person", "name", "John Wayne").df.collect()(0)(0)
      val jamesStewartId = g.V.has("person", "name", "James Stewart").df.collect()(0)(0)

      val shortestPaths = g.shortestPaths.landmarks(Seq(johnWayneId, jamesStewartId)).run

      //make a C* table that matches the schema of the DataFrame
      shortestPaths.createCassandraTable(
        "test",           //keyspace
        "shortest_paths", //table name
        partitionKeyColumns = Some(Seq("id")),
        clusteringKeyColumns = Some(Seq("~label")))
  • 24. Shortest Path (cont'd)
      import org.apache.spark.sql.cassandra._

      //write to the table
      shortestPaths.write
        .format("org.apache.spark.sql.cassandra")
        .options(Map(
          "table" -> "shortest_paths",
          "keyspace" -> "test",
          "spark.cassandra.output.ignoreNulls" -> "true"))
        .save

      //read it back in later
      //val shortestPathsFromTable = spark.read.cassandraFormat("shortest_paths", "test").load

      shortestPaths
        .filter($"~label" === "person")
        .select('name, 'distances(johnWayneId).as("hopsFromDuke"), 'distances(jamesStewartId).as("hopsFromJimmy"))
        .orderBy('hopsFromDuke.desc)
        .show(500, false)
  • 25. Resources
      ● https://graphframes.github.io/user-guide.html
      ● https://github.com/apache/spark/tree/master/graphx/src/main/scala/org/apache/spark/graphx
      ● https://github.com/graphframes/graphframes
      ● https://www.youtube.com/watch?v=DW09q18OHfc (Russell Spitzer and Artem Aliev, Spark Summit talk)
      ● https://www.datastax.com/dev/blog/dse-graph-frame
      ● https://github.com/datastax/graph-examples/blob/master/dse-graph-frame/Spark-shell-notes.scala
      ● https://www.manning.com/books/spark-graphx-in-action
      ● https://academy.datastax.com/resources/ds332
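Slide 10 says an OSS GraphFrame can be loaded from any Spark source. A minimal sketch of what that means in practice, assuming the open-source graphframes package is on the classpath (for example via spark-shell --packages with a build matching your Spark and Scala versions). The small in-memory DataFrames are hypothetical stand-ins for a real source such as spark.read on CSV or Parquet; the only hard requirements are an id column on the vertices and src/dst columns on the edges, while label and name are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Local session for experimentation; inside spark-shell, getOrCreate() returns the existing `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("graphframes-load-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical KillrVideo-like vertices: GraphFrames requires an "id" column.
val vertices = Seq(
  ("m1", "movie", "Movie One"),
  ("p1", "person", "Actor A"),
  ("p2", "person", "Actor B"),
  ("g1", "genre", "Drama")
).toDF("id", "label", "name")

// Hypothetical edges: GraphFrames requires "src" and "dst" columns.
val edges = Seq(
  ("m1", "p1", "actor"),
  ("m1", "p2", "actor"),
  ("m1", "g1", "belongsTo")
).toDF("src", "dst", "label")

// A GraphFrame simply wraps the two DataFrames.
val g = GraphFrame(vertices, edges)

// Out-degrees are derived from the edge DataFrame (columns: id, outDegree).
val outDegrees = g.outDegrees
  .collect()
  .map(r => r.getString(0) -> r.getInt(1))
  .toMap
```

Swapping the Seq(...).toDF calls for any other DataFrame reader leaves the rest of the code unchanged, which is the point the slide makes.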
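Slide 14 lists scan-heavy traversals such as g.V().count() and g.V().groupCount().by(__.label()). In DataFrame terms each one is a single pass over the vertex or edge table plus an aggregation, which is why such operations parallelize well. The sketch below shows those DataFrame equivalents on an OSS GraphFrame built from hypothetical toy data; the plain label column stands in for DSE's ~label and is an assumption of the example, not how DSE names it.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("graphframes-scan-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy graph: two movies, three people, one genre.
val vertices = Seq(
  ("m1", "movie"), ("m2", "movie"),
  ("p1", "person"), ("p2", "person"), ("p3", "person"),
  ("g1", "genre")
).toDF("id", "label")
val edges = Seq(
  ("m1", "p1", "actor"), ("m1", "p2", "actor"), ("m2", "p3", "actor"),
  ("m1", "g1", "belongsTo"), ("m2", "g1", "belongsTo")
).toDF("src", "dst", "label")
val g = GraphFrame(vertices, edges)

// Analogue of g.V().count() and g.E().count(): full scans of each DataFrame.
val vertexCount = g.vertices.count()
val edgeCount = g.edges.count()

// Analogue of groupCount().by(__.label()): one groupBy over the whole table.
val vertexLabelCounts = g.vertices.groupBy("label").count()
  .collect()
  .map(r => r.getString(0) -> r.getLong(1))
  .toMap
val edgeLabelCounts = g.edges.groupBy("label").count()
  .collect()
  .map(r => r.getString(0) -> r.getLong(1))
  .toMap
```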
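Slides 8 and 21 use g.find() with the motif syntax. The same comedy-actors query can be tried against OSS GraphFrames without a DSE cluster. This is a sketch on hypothetical toy data, with a plain label column standing in for DSE's ~label and a name column on movies standing in for title; each named element of the motif becomes a struct column that the filter can reference.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("graphframes-motif-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: one comedy and one drama, each with one actor.
val vertices = Seq(
  ("m1", "movie", "Some Comedy"),
  ("m2", "movie", "Some Drama"),
  ("p1", "person", "Actor A"),
  ("p2", "person", "Actor B"),
  ("g1", "genre", "Comedy"),
  ("g2", "genre", "Drama")
).toDF("id", "label", "name")
val edges = Seq(
  ("m1", "p1", "actor"), ("m2", "p2", "actor"),
  ("m1", "g1", "belongsTo"), ("m2", "g2", "belongsTo")
).toDF("src", "dst", "label")
val g = GraphFrame(vertices, edges)

// A movie with one edge to a person and another edge to a genre.
val comedyActors = g.find("(movie)-[e1]->(person); (movie)-[e2]->(genre)")
  .filter("""
    person.label = 'person' AND e1.label = 'actor' AND
    movie.label = 'movie' AND e2.label = 'belongsTo' AND
    genre.label = 'genre' AND genre.name = 'Comedy'
  """)
  .select($"person.name".as("actor"), $"movie.name".as("title"))

val result = comedyActors.collect().map(r => (r.getString(0), r.getString(1))).toSet
```

The motif itself does not force e1 and e2 to be different edges; the label filters are what separate the actor edge from the genre edge.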
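Slide 9 lists the algorithms GraphFrames exposes from GraphX. A minimal sketch of three of them on a hypothetical five-vertex graph (a -> b -> c plus d -> e), assuming OSS GraphFrames. connectedComponents ignores edge direction and, in its default implementation, needs a checkpoint directory (a local temp directory here, where slide 23 uses DSEFS). shortestPaths returns a distances map of hop counts toward each landmark, following edge direction. pageRank returns a new GraphFrame whose vertices carry a pagerank column.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("graphframes-algorithms-sketch")
  .getOrCreate()
import spark.implicits._

// The default connectedComponents implementation checkpoints, so a checkpoint dir is required.
spark.sparkContext.setCheckpointDir(Files.createTempDirectory("gf-checkpoints").toString)

// Hypothetical graph with two pieces: a -> b -> c and d -> e
val vertices = Seq("a", "b", "c", "d", "e").toDF("id")
val edges = Seq(("a", "b"), ("b", "c"), ("d", "e")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

// Connected components (direction ignored): adds a Long "component" column.
val components = g.connectedComponents.run()
  .select("id", "component")
  .collect()
  .map(r => r.getString(0) -> r.getLong(1))
  .toMap

// Shortest paths: "distances" maps each reachable landmark to a hop count.
val toC = g.shortestPaths.landmarks(Seq("c")).run()
  .select("id", "distances")
  .collect()
  .map(r => r.getString(0) -> Option(r.getMap[String, Int](1)).flatMap(_.get("c")))
  .toMap

// PageRank with a fixed number of iterations; ranks land in a "pagerank" column.
val ranks = g.pageRank.resetProbability(0.15).maxIter(10).run()
  .vertices
  .select("id", "pagerank")
  .collect()
  .map(r => r.getString(0) -> r.getDouble(1))
  .toMap
```

Vertices that cannot reach a landmark simply have no entry for it in the distances map, which is why the lookup above yields an Option.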
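Slide 9 also mentions aggregateMessages() as a building block for custom algorithms. The sketch below, on hypothetical toy data with an assumed numeric age attribute, uses the OSS GraphFrames API: each edge sends its source's age to its destination, and the messages are summed per receiving vertex. Vertices that receive no message (here, a) do not appear in the result.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.graphframes.GraphFrame
import org.graphframes.lib.AggregateMessages

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("graphframes-aggregate-messages-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical graph where every vertex carries an integer "age" attribute.
val vertices = Seq(("a", 30), ("b", 40), ("c", 50)).toDF("id", "age")
val edges = Seq(("a", "b"), ("a", "c"), ("b", "c")).toDF("src", "dst")
val g = GraphFrame(vertices, edges)

val AM = AggregateMessages

// One round of message passing: src.age travels along each edge to dst,
// then messages are aggregated per destination vertex.
val neighborAgeSums = g.aggregateMessages
  .sendToDst(AM.src("age"))
  .agg(sum(AM.msg).as("summedAges"))
  .collect()
  .map(r => r.getString(0) -> r.getLong(1))
  .toMap
```

Iterating rounds like this one, feeding each result back in as a vertex column, is the general shape of a custom algorithm built on this primitive.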
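Slide 22 builds workedTogether shortcut edges with a chain of DataFrame joins and writes them back with DSE's updateEdges. OSS GraphFrames has no write-back (slide 15), so the closest equivalent is to derive the edges and build a new, in-memory GraphFrame. A sketch under those assumptions, using a motif in place of the explicit joins, hypothetical toy data, and a plain label column instead of ~label:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.graphframes.GraphFrame

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("graphframes-shortcut-edges-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: Actor B appears in both movies.
val vertices = Seq(
  ("m1", "movie", "Movie One"),
  ("m2", "movie", "Movie Two"),
  ("p1", "person", "Actor A"),
  ("p2", "person", "Actor B"),
  ("p3", "person", "Actor C")
).toDF("id", "label", "name")
val edges = Seq(
  ("m1", "p1", "actor"), ("m1", "p2", "actor"),
  ("m2", "p2", "actor"), ("m2", "p3", "actor")
).toDF("src", "dst", "label")
val g = GraphFrame(vertices, edges)

// Two actor edges leaving the same movie identify a pair of co-stars.
val shortcutEdges = g.find("(movie)-[e1]->(a1); (movie)-[e2]->(a2)")
  .filter("e1.label = 'actor' AND e2.label = 'actor' AND a1.id <> a2.id")
  .select($"a1.id".as("src"), $"a2.id".as("dst"))
  .distinct()
  .withColumn("label", lit("workedTogether"))

// OSS GraphFrames cannot mutate storage; "adding" edges means a new GraphFrame.
val g2 = GraphFrame(g.vertices, g.edges.union(shortcutEdges))

val pairs = shortcutEdges.select("src", "dst")
  .collect()
  .map(r => (r.getString(0), r.getString(1)))
  .toSet
```

The distinct() guards against duplicate pairs when two actors share more than one movie; union matches columns by position, so shortcutEdges is built with the same src, dst, label order as the original edges.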