SlideShare ist ein Scribd-Unternehmen logo
1 von 71
Optimizing Cypher
Queries in Neo4j
Wes Freeman (@wefreema)
Mark Needham (@markhneedham)
Today's schedule
• Brief overview of cypher syntax
• Graph global vs Graph local queries
• Labels and indexes
• Optimization patterns
• Profiling cypher queries
• Applying optimization patterns
Cypher Syntax
• Statement parts
o Optional: Querying part (MATCH|WHERE)
o Optional: Updating part (CREATE|MERGE)
o Optional: Returning part (WITH|RETURN)
• Parts can be chained together
Cypher Syntax - Refresher
MATCH (n:Label)-[r:LINKED]->(m)
WHERE n.prop = "..."
RETURN n, r, m
Starting points
• Graph scan (global; potentially slow)
• Label scan (usually reserved for aggregation
queries; not ideal)
• Label property index lookup (local; good!)
Introducing the football dataset
The 1.9 global scan
O(n)
n = # of nodes
START pl = node(*)
MATCH (pl)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
150ms w/ 30k nodes, 120k rels
The 2.0 global scan
MATCH (pl)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
130ms w/ 30k nodes, 120k rels
O(n)
n = # of nodes
Why is it a global scan?
• Cypher is a pattern matching language
• It doesn't discriminate unless you tell it to
o It must try to start at all nodes to find this pattern, as
specified
Introduce a label
Label your starting points
CREATE (player:Player
{name: "Wayne Rooney"} )
O(k)
k = # of nodes with that labelLabel scan
MATCH (pl:Player)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
80ms w/ 30k nodes, 120k rels (~900 :Player nodes)
Indexes don't come for free
CREATE INDEX ON :Player(name)
OR
CREATE CONSTRAINT ON pl:Player
ASSERT pl.name IS UNIQUE
O(log k)
k = # of nodes with that labelIndex lookup
MATCH (pl:Player)-[:played]->(stats)
WHERE pl.name = "Wayne Rooney"
RETURN stats
6ms w/ 30k nodes, 120k rels (~900 :Player nodes)
Optimization Patterns
• Avoid cartesian products
• Avoid patterns in the WHERE clause
• Start MATCH patterns at the lowest
cardinality and expand outward
• Separate MATCH patterns with minimal
expansion at each stage
Introducing the movie data set
Anti-pattern: Cartesian Products
MATCH (m:Movie), (p:Person)
Subtle Cartesian Products
MATCH (p:Person)-[:KNOWS]->(c)
WHERE p.name="Tom Hanks"
WITH c
MATCH (k:Keyword)
RETURN c, k
Counting Cartesian Products
MATCH (pl:Player),(t:Team),(g:Game)
RETURN COUNT(DISTINCT pl),
COUNT(DISTINCT t),
COUNT(DISTINCT g)
80000 ms w/ ~900 players, ~40 teams, ~1200 games
MATCH (pl:Player)
WITH COUNT(pl) as players
MATCH (t:Team)
WITH COUNT(t) as teams, players
MATCH (g:Game)
RETURN COUNT(g) as games, teams, players8ms w/
~900 players, ~40 teams, ~1200 games
Better Counting
Directions on patterns
MATCH (p:Person)-[:ACTED_IN]-(m)
WHERE p.name = "Tom Hanks"
RETURN m
Parameterize your queries
MATCH (p:Person)-[:ACTED_IN]-(m)
WHERE p.name = {name}
RETURN m
Fast predicates first
Bad:
MATCH (t:Team)-[:played_in]->(g)
WHERE NOT (t)-[:home_team]->(g)
AND g.away_goals > g.home_goals
RETURN t, COUNT(g)
Better:
MATCH (t:Team)-[:played_in]->(g)
WHERE g.away_goals > g.home_goals
AND NOT (t)-[:home_team]->()
RETURN t, COUNT(g)
Fast predicates first
Patterns in WHERE clauses
• Keep them in the MATCH
• The only pattern that needs to be in a
WHERE clause is a NOT
MERGE and CONSTRAINTs
• MERGE is MATCH or CREATE
• MERGE can take advantage of unique
constraints and indexes
MERGE (without index)
MERGE (g:Game
{date:1290257100,
time: 1245,
home_goals: 2,
away_goals: 3,
match_id: 292846,
attendance: 60102})
RETURN g
188 ms w/ ~400 games
Adding an index
CREATE INDEX ON :Game(match_id)
MERGE (with index)
MERGE (g:Game
{date:1290257100,
time: 1245,
home_goals: 2,
away_goals: 3,
match_id: 292846,
attendance: 60102})
RETURN g
6 ms w/ ~400 games
Alternative MERGE approach
MERGE (g:Game { match_id: 292846 })
ON CREATE
SET g.date = 1290257100
SET g.time = 1245
SET g.home_goals = 2
SET g.away_goals = 3
SET g.attendance = 60102
RETURN g
Profiling queries
• Use the PROFILE keyword in front of the
query
o from webadmin or shell - won't work in browser
• Look for db_hits and rows
• Ignore everything else (for now!)
Reviewing the football dataset
Football Optimization
MATCH (game)<-[:contains_match]-(season:Season),
(team)<-[:away_team]-(game),
(stats)-[:in]->(game),
(team)<-[:for]-(stats)<-[:played]-(player)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 103137 ms w/ ~900 players, ~20 teams, ~400 games
Football Optimization
==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-e746407c504a", "
INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323"], returnItemNames=["player.name", "COLLECT(DISTINCT
team.name)", "goals"], _rows=10, _db_hits=0)
==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323 of type Number),false)"],
limit="Literal(10)", _rows=10, _db_hits=0)
==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-
e746407c504a,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-
8ca9-fb0286f3d323,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899)
==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED108", "season", " UNNAMED55", "player", "team", " UNNAMED124", "
UNNAMED85", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192)
==> PatternMatch(g="(player)-[' UNNAMED124']-(stats)", _rows=5192, _db_hits=0)
==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=5192, _db_hits=15542)
==> TraversalMatcher(trail="(season)-[ UNNAMED12:contains_match WHERE true AND true]->(game)<-[ UNNAMED85:in WHERE
true AND true]-(stats)-[ UNNAMED108:for WHERE true AND true]->(team)<-[ UNNAMED55:away_team WHERE true AND true]-
(game)", _rows=15542, _db_hits=1620462)
Break out the match statements
MATCH (game)<-[:contains_match]-(season:Season)
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
WHERE season.name = "2012-2013"
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10200 ms w/ ~900 players, ~20 teams, ~400 games
Start small
• Smallest cardinality label first
• Smallest intermediate result set first
Exploring cardinalities
MATCH (game)<-[:contains_match]-(season:Season)
RETURN COUNT(DISTINCT game), COUNT(DISTINCT season)
1140 games, 3 seasons
MATCH (team)<-[:away_team]-(game:Game)
RETURN COUNT(DISTINCT team), COUNT(DISTINCT game)
25 teams, 1140 games
Exploring cardinalities
MATCH (stats)-[:in]->(game:Game)
RETURN COUNT(DISTINCT stats), COUNT(DISTINCT game)
31117 stats, 1140 games
MATCH (stats)<-[:played]-(player:Player)
RETURN COUNT(DISTINCT stats), COUNT(DISTINCT player)
31117 stats, 880 players
Look for teams first
MATCH (team)<-[:away_team]-(game:Game)
MATCH (game)<-[:contains_match]-(season)
WHERE season.name = "2012-2013"
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10162 ms w/ ~900 players, ~20 teams, ~400 games
==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-b0c7ec85c969", "
INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5"], returnItemNames=["player.name", "COLLECT(DISTINCT
team.name)", "goals"], _rows=10, _db_hits=0)
==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5 of type Number),false)"],
limit="Literal(10)", _rows=10, _db_hits=0)
==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-
b0c7ec85c969,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE199af213-e3bd-400f-
aba9-8ca2a9e153c5,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899)
==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED168", "season", " UNNAMED125", "player", "team", " UNNAMED152", "
UNNAMED51", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192)
==> PatternMatch(g="(stats)-[' UNNAMED152']-(team),(player)-[' UNNAMED168']-(stats)", _rows=5192, _db_hits=0)
==> PatternMatch(g="(stats)-[' UNNAMED125']-(game)", _rows=10394, _db_hits=0)
==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=380, _db_hits=380)
==> PatternMatch(g="(season)-[' UNNAMED51']-(game)", _rows=380, _db_hits=1140)
==> TraversalMatcher(trail="(game)-[ UNNAMED12:away_team WHERE true AND true]->(team)", _rows=1140,
_db_hits=1140)
Look for teams first
Filter games a bit earlier
MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10148 ms w/ ~900 players, ~20 teams, ~400 games
Filter out stats with no goals
MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)WHERE stats.goals > 0
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10
59 ms w/ ~900 players, ~20 teams, ~400 games
Movie query optimization
MATCH (movie:Movie {title: {title} })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })MATCH (genre)<-[:HAS_GENRE]-
(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)MATCH (director)-[:DIRECTED]-
>(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)MATCH (actor)-[:ACTED_IN]-
>(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)MATCH (writer)-[:WRITER_OF]-
>(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)MATCH (actor)-[:ACTED_IN]-
>(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)MATCH
(movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)WITH DISTINCT movies as related,
count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writersORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie,
collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres,
directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESCWITH collect(DISTINCT {name: actor.name, weight:
actormoviesweight}) as actors,
movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as
related, genres, directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors,
movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as
related, genres, directors, writersMATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-
[:HAS_KEYWORD]-(movies)
WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors,
genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })
MATCH (genre)<-[:HAS_GENRE]-(movie)
MATCH (director)-[:DIRECTED]->(movie)
MATCH (actor)-[:ACTED_IN]->(movie)
MATCH (writer)-[:WRITER_OF]->(movie)
MATCH (actor)-[:ACTED_IN]->(actormovies)
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword) as weight,
count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as
genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name)
as writers
ORDER BY weight DESC, actormoviesweight DESC
WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors,
movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as
related, genres, directors, writers
MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)WITH keyword.name as keyword,
count(movies) as keyword_weight, movie, related,
actors, genres, directors, writers
ORDER BY keyword_weight
RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor
ORDER BY actormoviesweight DESC
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor
ORDER BY actormoviesweight DESC
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)WITH movie, actor, length((actor)-[:ACTED_IN]-
>()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actorWITH movie, collect({name: actor.name, weight:
actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]-
>(genre)
WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres,
directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]-
>(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie,
genres, directors, actors, writers // 1 row per related movieORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related,
movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie,
genres, directors, actors, writers // 1 row per related movie
ORDER BY keywords DESCWITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as
related, movie, actors, genres, directors, writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers
10x faster
Movie query optimization
MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)
WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight
ORDER BY actormoviesweight DESC // 1 row per actor
WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row
MATCH (movie)-[:HAS_GENRE]->(genre)
WITH movie, actors, collect(genre) as genres // 1 row
MATCH (director)-[:DIRECTED]->(movie)
WITH movie, actors, genres, collect(director.name) as directors // 1 row
MATCH (writer)-[:WRITER_OF]->(movie)
WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row
MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)
WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie,
genres, directors, actors, writers // 1 row per related movie
ORDER BY keywords DESC
WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as
related, movie, actors, genres, directors, writers // 1 rowMATCH (movie)-[:HAS_KEYWORD]->(keyword)
RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors,
writers // 1 row
10x faster
Design for Queryability
Model
Design for Queryability
Query
Design for Queryability
Model
Making the implicit explicit
• When you have implicit relationships in the
graph you can sometimes get better query
performance by modeling the relationship
explicitly
Making the implicit explicit
Refactor property to node
Bad:
MATCH (g:Game)
WHERE
g.date > 1343779200
AND g.date < 1369094400
RETURN g
Good:
MATCH (s:Season)-[:contains]->(g)
WHERE season.name = "2012-2013"
RETURN g
Refactor property to node
Conclusion
• Avoid the global scan
• Add indexes / unique constraints
• Split up MATCH statements
• Measure, measure, measure, tweak, repeat
• Soon Cypher will do a lot of this for you!
Bonus tip
• Use transactions/transactional cypher
endpoint
Q & A
• If you have them send them in

Weitere ähnliche Inhalte

Was ist angesagt?

DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
aiuy
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdf
Neo4j
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
jexp
 

Was ist angesagt? (20)

Neo4j in Depth
Neo4j in DepthNeo4j in Depth
Neo4j in Depth
 
Cypher
CypherCypher
Cypher
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Neo4j: The path to success with Graph Database and Graph Data Science
Neo4j: The path to success with Graph Database and Graph Data ScienceNeo4j: The path to success with Graph Database and Graph Data Science
Neo4j: The path to success with Graph Database and Graph Data Science
 
Introduction to Cypher
Introduction to Cypher Introduction to Cypher
Introduction to Cypher
 
Neo4j graph database
Neo4j graph databaseNeo4j graph database
Neo4j graph database
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Neo4j 4.1 overview
Neo4j 4.1 overviewNeo4j 4.1 overview
Neo4j 4.1 overview
 
APOC Pearls - Whirlwind Tour Through the Neo4j APOC Procedures Library
APOC Pearls - Whirlwind Tour Through the Neo4j APOC Procedures LibraryAPOC Pearls - Whirlwind Tour Through the Neo4j APOC Procedures Library
APOC Pearls - Whirlwind Tour Through the Neo4j APOC Procedures Library
 
NOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4jNOSQLEU - Graph Databases and Neo4j
NOSQLEU - Graph Databases and Neo4j
 
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
 
The Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdfThe Neo4j Data Platform for Today & Tomorrow.pdf
The Neo4j Data Platform for Today & Tomorrow.pdf
 
Neo4j - graph database for recommendations
Neo4j - graph database for recommendationsNeo4j - graph database for recommendations
Neo4j - graph database for recommendations
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Neo4J : Introduction to Graph Database
Neo4J : Introduction to Graph DatabaseNeo4J : Introduction to Graph Database
Neo4J : Introduction to Graph Database
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
 

Andere mochten auch

Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application Archetypes
Cloudera, Inc.
 

Andere mochten auch (20)

Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
OrientDB
OrientDBOrientDB
OrientDB
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQL
 
Intro to Cypher
Intro to CypherIntro to Cypher
Intro to Cypher
 
Downloading the internet with Python + Scrapy
Downloading the internet with Python + ScrapyDownloading the internet with Python + Scrapy
Downloading the internet with Python + Scrapy
 
Apache HBase Application Archetypes
Apache HBase Application ArchetypesApache HBase Application Archetypes
Apache HBase Application Archetypes
 
Building a Graph-based Analytics Platform
Building a Graph-based Analytics PlatformBuilding a Graph-based Analytics Platform
Building a Graph-based Analytics Platform
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQL
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databases
 
Neo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> CypherNeo4j -[:LOVES]-> Cypher
Neo4j -[:LOVES]-> Cypher
 
Building Cloud Native Architectures with Spring
Building Cloud Native Architectures with SpringBuilding Cloud Native Architectures with Spring
Building Cloud Native Architectures with Spring
 
OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4j
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
 
GraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business GraphGraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business Graph
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Webinar: Stop Complex Fraud in its Tracks with Neo4j
Webinar: Stop Complex Fraud in its Tracks with Neo4jWebinar: Stop Complex Fraud in its Tracks with Neo4j
Webinar: Stop Complex Fraud in its Tracks with Neo4j
 

Ähnlich wie Optimizing Cypher Queries in Neo4j

Football graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueFootball graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier League
Mark Needham
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB
 

Ähnlich wie Optimizing Cypher Queries in Neo4j (20)

How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
How to win $10m - analysing DOTA2 data in R (Sheffield R Users Group - May)
 
Delivering Winning Results with Sports Analytics and HPCC Systems
Delivering Winning Results with Sports Analytics and HPCC SystemsDelivering Winning Results with Sports Analytics and HPCC Systems
Delivering Winning Results with Sports Analytics and HPCC Systems
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
 
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
 
GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2
GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2
GraphConnect 2014 SF: Betting the Company on a Graph Database - Part 2
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
 
Extracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and PysparkExtracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and Pyspark
 
Building Streaming Recommendation Engines on Apache Spark with Rui Vieira
Building Streaming Recommendation Engines on Apache Spark with Rui VieiraBuilding Streaming Recommendation Engines on Apache Spark with Rui Vieira
Building Streaming Recommendation Engines on Apache Spark with Rui Vieira
 
Football graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueFootball graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier League
 
Predictions European Championships 2020
Predictions European Championships 2020Predictions European Championships 2020
Predictions European Championships 2020
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Mignesh Birdi Assignment3.pdf
Mignesh Birdi Assignment3.pdfMignesh Birdi Assignment3.pdf
Mignesh Birdi Assignment3.pdf
 
MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...
MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...
MongoDB World 2019: How to Keep an Average API Response Time Less than 5ms wi...
 
Common Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsCommon Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo apps
 
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
MongoDB .local Toronto 2019: Tips and Tricks for Effective IndexingMongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing
 
R for you
R for youR for you
R for you
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
 
Simple Strategies for faster knowledge discovery in big data
Simple Strategies for faster knowledge discovery in big dataSimple Strategies for faster knowledge discovery in big data
Simple Strategies for faster knowledge discovery in big data
 
Row Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cRow Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12c
 
Microsoft NERD Talk - R and Tableau - 2-4-2013
Microsoft NERD Talk - R and Tableau - 2-4-2013Microsoft NERD Talk - R and Tableau - 2-4-2013
Microsoft NERD Talk - R and Tableau - 2-4-2013
 

Mehr von Neo4j

Mehr von Neo4j (20)

Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
Workshop -  Architecting Innovative Graph Applications- GraphSummit MilanWorkshop -  Architecting Innovative Graph Applications- GraphSummit Milan
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
 
LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...
LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...
LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...
 
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jGraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
 
GraphSummit Milan - Neo4j: The Art of the Possible with Graph
GraphSummit Milan - Neo4j: The Art of the Possible with GraphGraphSummit Milan - Neo4j: The Art of the Possible with Graph
GraphSummit Milan - Neo4j: The Art of the Possible with Graph
 
LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...
LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...
LARUS - Galileo.XAI e Gen-AI: la nuova prospettiva di LARUS per il futuro del...
 
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaUNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
 
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
CERVED e Neo4j su una nuvola, migrazione ed evoluzione di un grafo mission cr...
 
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMs
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Optimizing Cypher Queries in Neo4j

  • 1. Optimizing Cypher Queries in Neo4j Wes Freeman (@wefreema) Mark Needham (@markhneedham)
  • 2. Today's schedule • Brief overview of cypher syntax • Graph global vs Graph local queries • Labels and indexes • Optimization patterns • Profiling cypher queries • Applying optimization patterns
  • 3. Cypher Syntax • Statement parts o Optional: Querying part (MATCH|WHERE) o Optional: Updating part (CREATE|MERGE) o Optional: Returning part (WITH|RETURN) • Parts can be chained together
  • 4. Cypher Syntax - Refresher MATCH (n:Label)-[r:LINKED]->(m) WHERE n.prop = "..." RETURN n, r, m
  • 5. Starting points • Graph scan (global; potentially slow) • Label scan (usually reserved for aggregation queries; not ideal) • Label property index lookup (local; good!)
  • 7. The 1.9 global scan O(n) n = # of nodes START pl = node(*) MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 150ms w/ 30k nodes, 120k rels
  • 8. The 2.0 global scan MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 130ms w/ 30k nodes, 120k rels O(n) n = # of nodes
  • 9. Why is it a global scan? • Cypher is a pattern matching language • It doesn't discriminate unless you tell it to o It must try to start at all nodes to find this pattern, as specified
  • 10. Introduce a label Label your starting points CREATE (player:Player {name: "Wayne Rooney"} )
  • 11. O(k) k = # of nodes with that labelLabel scan MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 80ms w/ 30k nodes, 120k rels (~900 :Player nodes)
  • 12. Indexes don't come for free CREATE INDEX ON :Player(name) OR CREATE CONSTRAINT ON pl:Player ASSERT pl.name IS UNIQUE
  • 13. O(log k) k = # of nodes with that labelIndex lookup MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 6ms w/ 30k nodes, 120k rels (~900 :Player nodes)
  • 14. Optimization Patterns • Avoid cartesian products • Avoid patterns in the WHERE clause • Start MATCH patterns at the lowest cardinality and expand outward • Separate MATCH patterns with minimal expansion at each stage
  • 16. Anti-pattern: Cartesian Products MATCH (m:Movie), (p:Person)
  • 17. Subtle Cartesian Products MATCH (p:Person)-[:KNOWS]->(c) WHERE p.name="Tom Hanks" WITH c MATCH (k:Keyword) RETURN c, k
  • 18. Counting Cartesian Products MATCH (pl:Player),(t:Team),(g:Game) RETURN COUNT(DISTINCT pl), COUNT(DISTINCT t), COUNT(DISTINCT g) 80000 ms w/ ~900 players, ~40 teams, ~1200 games
  • 19. MATCH (pl:Player) WITH COUNT(pl) as players MATCH (t:Team) WITH COUNT(t) as teams, players MATCH (g:Game) RETURN COUNT(g) as games, teams, players8ms w/ ~900 players, ~40 teams, ~1200 games Better Counting
  • 20. Directions on patterns MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = "Tom Hanks" RETURN m
  • 21. Parameterize your queries MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = {name} RETURN m
  • 22. Fast predicates first Bad: MATCH (t:Team)-[:played_in]->(g) WHERE NOT (t)-[:home_team]->(g) AND g.away_goals > g.home_goals RETURN t, COUNT(g)
  • 23. Better: MATCH (t:Team)-[:played_in]->(g) WHERE g.away_goals > g.home_goals AND NOT (t)-[:home_team]->() RETURN t, COUNT(g) Fast predicates first
  • 24. Patterns in WHERE clauses • Keep them in the MATCH • The only pattern that needs to be in a WHERE clause is a NOT
  • 25. MERGE and CONSTRAINTs • MERGE is MATCH or CREATE • MERGE can take advantage of unique constraints and indexes
  • 26. MERGE (without index) MERGE (g:Game {date:1290257100, time: 1245, home_goals: 2, away_goals: 3, match_id: 292846, attendance: 60102}) RETURN g 188 ms w/ ~400 games
  • 27. Adding an index CREATE INDEX ON :Game(match_id)
  • 28. MERGE (with index) MERGE (g:Game {date:1290257100, time: 1245, home_goals: 2, away_goals: 3, match_id: 292846, attendance: 60102}) RETURN g 6 ms w/ ~400 games
  • 29. Alternative MERGE approach MERGE (g:Game { match_id: 292846 }) ON CREATE SET g.date = 1290257100 SET g.time = 1245 SET g.home_goals = 2 SET g.away_goals = 3 SET g.attendance = 60102 RETURN g
  • 30. Profiling queries • Use the PROFILE keyword in front of the query o from webadmin or shell - won't work in browser • Look for db_hits and rows • Ignore everything else (for now!)
  • 32. Football Optimization MATCH (game)<-[:contains_match]-(season:Season), (team)<-[:away_team]-(game), (stats)-[:in]->(game), (team)<-[:for]-(stats)<-[:played]-(player) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 103137 ms w/ ~900 players, ~20 teams, ~400 games
  • 33. Football Optimization ==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-e746407c504a", " INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0) ==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8- e746407c504a,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE240cfcd2-24d9-48a2- 8ca9-fb0286f3d323,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899) ==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED108", "season", " UNNAMED55", "player", "team", " UNNAMED124", " UNNAMED85", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192) ==> PatternMatch(g="(player)-[' UNNAMED124']-(stats)", _rows=5192, _db_hits=0) ==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=5192, _db_hits=15542) ==> TraversalMatcher(trail="(season)-[ UNNAMED12:contains_match WHERE true AND true]->(game)<-[ UNNAMED85:in WHERE true AND true]-(stats)-[ UNNAMED108:for WHERE true AND true]->(team)<-[ UNNAMED55:away_team WHERE true AND true]- (game)", _rows=15542, _db_hits=1620462)
  • 34. Break out the match statements MATCH (game)<-[:contains_match]-(season:Season) MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10200 ms w/ ~900 players, ~20 teams, ~400 games
  • 35. Start small • Smallest cardinality label first • Smallest intermediate result set first
  • 36. Exploring cardinalities MATCH (game)<-[:contains_match]-(season:Season) RETURN COUNT(DISTINCT game), COUNT(DISTINCT season) 1140 games, 3 seasons MATCH (team)<-[:away_team]-(game:Game) RETURN COUNT(DISTINCT team), COUNT(DISTINCT game) 25 teams, 1140 games
  • 37. Exploring cardinalities MATCH (stats)-[:in]->(game:Game) RETURN COUNT(DISTINCT stats), COUNT(DISTINCT game) 31117 stats, 1140 games MATCH (stats)<-[:played]-(player:Player) RETURN COUNT(DISTINCT stats), COUNT(DISTINCT player) 31117 stats, 880 players
  • 38. Look for teams first MATCH (team)<-[:away_team]-(game:Game) MATCH (game)<-[:contains_match]-(season) WHERE season.name = "2012-2013" MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10162 ms w/ ~900 players, ~20 teams, ~400 games
  • 39. ==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-b0c7ec85c969", " INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0) ==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297- b0c7ec85c969,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE199af213-e3bd-400f- aba9-8ca2a9e153c5,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899) ==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED168", "season", " UNNAMED125", "player", "team", " UNNAMED152", " UNNAMED51", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192) ==> PatternMatch(g="(stats)-[' UNNAMED152']-(team),(player)-[' UNNAMED168']-(stats)", _rows=5192, _db_hits=0) ==> PatternMatch(g="(stats)-[' UNNAMED125']-(game)", _rows=10394, _db_hits=0) ==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=380, _db_hits=380) ==> PatternMatch(g="(season)-[' UNNAMED51']-(game)", _rows=380, _db_hits=1140) ==> TraversalMatcher(trail="(game)-[ UNNAMED12:away_team WHERE true AND true]->(team)", _rows=1140, _db_hits=1140) Look for teams first
  • 40. Filter games a bit earlier MATCH (game)<-[:contains_match]-(season:Season) WHERE season.name = "2012-2013" MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10148 ms w/ ~900 players, ~20 teams, ~400 games
  • 41. Filter out stats with no goals MATCH (game)<-[:contains_match]-(season:Season) WHERE season.name = "2012-2013" MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game)WHERE stats.goals > 0 MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10 59 ms w/ ~900 players, ~20 teams, ~400 games
  • 42. Movie query optimization MATCH (movie:Movie {title: {title} }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 43. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 44. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })MATCH (genre)<-[:HAS_GENRE]- (movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 45. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie)MATCH (director)-[:DIRECTED]- >(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 46. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie)MATCH (actor)-[:ACTED_IN]- >(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 47. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie)MATCH (writer)-[:WRITER_OF]- >(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 48. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie)MATCH (actor)-[:ACTED_IN]- >(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 49. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies)MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 50. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writersORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 51. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESCWITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 52. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writersMATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<- [:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 53. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  • 54. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor ORDER BY actormoviesweight DESC WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 55. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor ORDER BY actormoviesweight DESC WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 56. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)WITH movie, actor, length((actor)-[:ACTED_IN]- >()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 57. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actorWITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 58. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]- >(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 59. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]- >(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movieORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 60. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESCWITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  • 61. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 rowMATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers // 1 row 10x faster
  • 65. Making the implicit explicit • When you have implicit relationships in the graph you can sometimes get better query performance by modeling the relationship explicitly
  • 67. Refactor property to node Bad: MATCH (g:Game) WHERE g.date > 1343779200 AND g.date < 1369094400 RETURN g
  • 68. Good: MATCH (s:Season)-[:contains]->(g) WHERE season.name = "2012-2013" RETURN g Refactor property to node
  • 69. Conclusion • Avoid the global scan • Add indexes / unique constraints • Split up MATCH statements • Measure, measure, measure, tweak, repeat • Soon Cypher will do a lot of this for you!
  • 70. Bonus tip • Use transactions/transactional cypher endpoint
  • 71. Q & A • If you have them send them in