An introduction to graph database concepts, explained by comparison with the widely used relational databases and the SQL query language. Neo4j and Cypher are used to show how graph databases work in practice.
Understanding Graph Databases with Neo4j and Cypher
1. Understanding Graph Databases with Neo4j and Cypher
Group Members
S.S. Niranga MS-14901836
Nipuna Pannala MS-14902208
Ruhaim Izmeth MS-14901218
2. Trends in Data
Data is getting bigger:
“Every 2 days we create as much information as we did up to 2003”
– Eric Schmidt, Google
3. The History of Graph Theory
● 1736: Leonhard Euler writes a paper on the “Seven Bridges of Königsberg”
● 1845: Gustav Kirchhoff publishes his electrical circuit laws
● 1852: Francis Guthrie poses the “Four Color Problem”
● 1878: Sylvester publishes an article in Nature magazine that describes graphs
● 1936: Dénes Kőnig publishes a textbook on Graph Theory
● 1941: Ramsey and Turán define Extremal Graph Theory
● 1959: De Bruijn publishes a paper summarizing Enumerative Graph Theory
● 1959: Erdős, Rényi and Gilbert define Random Graph Theory
● 1969: Heinrich Heesch publishes a method for attacking the “Four Color Problem” by computer (the proof itself followed in 1976, by Appel and Haken)
● 2003: Commercial Graph Database products start appearing on the market
4. What is a Graph Database?
“A traditional relational database may tell you the average age of everyone in this room...
...but a graph database will tell you who is most likely to buy you a beer!”
7. What is a Graph Database?
● A database with an explicit graph structure
● Each node knows its adjacent nodes
● As the number of nodes increases, the cost of a local step (or hop) remains the same
● Plus an Index for lookups
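The bullets above can be illustrated with a small in-memory sketch. It is only a toy model, not how Neo4j stores data: the `Node` class, the `index` dictionary and the names "Alice", "Bob" and "Carol" are all hypothetical. The point it shows is that the index is consulted once to find a starting node, and every hop after that follows direct references, so its cost does not depend on how many nodes exist in total.

```python
# Toy model of a graph with an explicit structure (hypothetical, not Neo4j internals).
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbours = []  # each node holds direct references to its adjacent nodes

    def connect(self, other):
        # Store the relationship on both ends so it can be walked either way.
        self.neighbours.append(other)
        other.neighbours.append(self)


index = {}  # name -> Node; used only to look up a starting point


def add_node(name):
    node = Node(name)
    index[name] = node
    return node


alice = add_node("Alice")
bob = add_node("Bob")
carol = add_node("Carol")
alice.connect(bob)
bob.connect(carol)

start = index["Alice"]  # one index lookup
friends = [n.name for n in start.neighbours]  # one hop
friends_of_friends = sorted(
    {m.name for n in start.neighbours for m in n.neighbours if m is not start}
)  # two hops, still only following references
print(friends, friends_of_friends)
```

Adding a million unrelated nodes to `index` would not change the work done by the two list comprehensions, which is the property the third bullet describes.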
11. What is Neo4j?
● Neo4j is an open-source graph database, implemented in Java.
● Neo4j version 1.0 was released in February, 2010.
● Neo4j version 2.0 was released in December, 2013.
● Neo4j was developed by Neo Technology, Inc.
● The Neo Technology board of directors consists of Rod Johnson (founder of the Spring Framework), Magnus Christerson (Vice President of Intentional Software Corp), Nikolaj Nyholm (CEO of Polar Rose), Sami Ahvenniemi (Partner at Conor Venture Partners) and Johan Svensson (CTO of Neo Technology).
21. Relational Schema
Person (p_id, p_name, p_type)
Book (b_id, b_title)
Wrote (p_id, b_id)
Purchased (p_id, b_id, pur_date)
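As a rough illustration, the four tables of this schema could be declared in an in-memory SQLite database as follows. The column types, primary keys and foreign-key references are assumptions made for the sketch; the slides only give table and column names.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Column types and key constraints are assumed; only the names come from the schema.
conn.executescript("""
CREATE TABLE Person (p_id INTEGER PRIMARY KEY, p_name TEXT, p_type TEXT);
CREATE TABLE Book (b_id INTEGER PRIMARY KEY, b_title TEXT);
CREATE TABLE Wrote (
    p_id INTEGER REFERENCES Person(p_id),
    b_id INTEGER REFERENCES Book(b_id)
);
CREATE TABLE Purchased (
    p_id INTEGER REFERENCES Person(p_id),
    b_id INTEGER REFERENCES Book(b_id),
    pur_date TEXT
);
""")

# List the tables that now exist.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Note that Wrote and Purchased exist only to connect Person and Book rows; in the graph model the same information is carried by relationships.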
22. Cypher - A Few Keywords
General Clauses
● Return
● Order by
● Limit
Writing Clauses
● Create
● Merge
● Set
● Delete
● Remove
Reading Clauses
● Match
● Optional Match
● Where
● Aggregation
Functions
● Predicates
● Scalar functions
● Collection functions
● Mathematical functions
● String functions
See the full list in the Cypher RefCard:
http://neo4j.com/docs/stable/cypher-refcard/
25. Cypher
Modifying nodes
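// Create a node; the property key "namme" is misspelled
CREATE (:Person { namme:"Alan" })
// Add or update properties, keeping the existing ones
MATCH (p:Person { namme:"Alan" })
SET p += { name2:"Alan2" }
// Set a single property
MATCH (p:Person { namme:"Alan" })
SET p.name = "Alan"
// Replace all properties with the given map
MATCH (p:Person { namme:"Alan" })
SET p = { name:"Alan" }
// Remove a single property
MATCH (p:Person { namme:"Alan" })
REMOVE p.namme
// Delete the node
MATCH (p:Person { name2:"Alan2" })
DELETE p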
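The difference between the three SET forms and REMOVE is easy to miss. The sketch below mimics them with a plain Python dictionary standing in for a node's properties. It is an analogy only; no Neo4j driver is involved and the variable names are made up for the illustration.

```python
# A node's properties modelled as a plain dict (an analogy, not a Neo4j API).
props = {"namme": "Alan"}

# SET p += {name2: "Alan2"}  -> adds or overwrites the listed keys, keeps the rest
merged = {**props, "name2": "Alan2"}

# SET p.name = "Alan"        -> sets exactly one property
single = {**props, "name": "Alan"}

# SET p = {name: "Alan"}     -> replaces every property with the given map
replaced = {"name": "Alan"}

# REMOVE p.namme             -> drops one property; the node itself remains
removed = {key: value for key, value in props.items() if key != "namme"}

print(merged, single, replaced, removed)
```

DELETE has no dictionary equivalent here: it removes the node itself rather than any of its properties.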
27. Cypher - Creating Relationships
CREATE
(john:Person:Author { name:"John Le Carre" }),
(b:Book { title:"Tinker, Tailor, Soldier, Spy" }),
(john)-[:WROTE]->(b)
MATCH
(p:Person { name:"Ian" }),
(b:Book { title:"Our Man in Havana" })
MERGE
(p)-[:PURCHASED { date:"09-09-2011" }]->(b)
MATCH
(graham:Person:Author { name:"Graham Greene" }),
(b:Book { title:"Our Man in Havana" })
MERGE (graham)-[:WROTE]->(b)
MATCH (t:Book { title:"Tinker, Tailor, Soldier, Spy" }),
(i:Person { name:"Ian" }),
(a:Person { name:"Alan" })
MERGE
(i)-[:PURCHASED { date:"03-02-2011" }]->(t)<-[:PURCHASED { date:"05-07-2011" }]-(a)
28. Cypher - Modifying Relationships
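// Create a relationship whose type is misspelled (WORTE)
MATCH
(graham:Person:Author { name:"Graham Greene" }),
(b:Book { title:"Our Man in Havana" })
MERGE (graham)-[:WORTE]->(b)
// Delete the misspelled relationship; [r:WORTE] restricts the match to that type
MATCH
(graham:Person { name:"Graham Greene" })-[r:WORTE]->(b:Book { title:"Our Man in Havana" })
DELETE r
// Update a property on the PURCHASED relationship
MATCH (p:Person { name:"Ian" })-[r:PURCHASED]->(b:Book { title:"Our Man in Havana" })
SET r.date = "09-09-2012"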
29. Cypher - Querying DBs
Find All Books
SQL
SELECT * FROM Book
Cypher Query
MATCH (b:Book)
RETURN b
Relational schema used by the SQL queries:
Person (p_id, p_name, p_type)
Wrote (p_id, b_id)
Book (b_id, b_title)
Purchased (p_id, b_id, pur_date)
Cypher Result
+-----------------------------------------------+
| b                                             |
+-----------------------------------------------+
| Node[2]{title:"Tinker, Tailor, Soldier, Spy"} |
| Node[3]{title:"Our Man in Havana"}            |
+-----------------------------------------------+
2 rows
2 ms
30. Cypher - Querying DBs
Find All Authors
SQL
SELECT * FROM Person WHERE p_type = 'Author'
Cypher Query
MATCH (a:Author)
RETURN a
Cypher Result
+-------------------------------+
| a                             |
+-------------------------------+
| Node[0]{name:"John Le Carre"} |
| Node[1]{name:"Graham Greene"} |
+-------------------------------+
2 rows
8 ms
31. Cypher - Querying DBs
Find All Authors and the Books written by them
SQL
SELECT p.p_name, b.b_title
FROM Person p, Wrote w, Book b
WHERE p.p_type = 'Author' AND
w.p_id = p.p_id AND
w.b_id = b.b_id
Cypher Query
MATCH (a:Author)-[:WROTE]->(b:Book)
RETURN a,b
Cypher Result
+-------------------------------------------------------------------------------+
| a                             | b                                             |
+-------------------------------------------------------------------------------+
| Node[0]{name:"John Le Carre"} | Node[2]{title:"Tinker, Tailor, Soldier, Spy"} |
| Node[1]{name:"Graham Greene"} | Node[3]{title:"Our Man in Havana"}            |
+-------------------------------------------------------------------------------+
2 rows
12 ms
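To see the relational side of this comparison run, the sketch below loads the sample authors and books into an in-memory SQLite database and executes an equivalent query. Several details are assumptions made for the illustration: the column types, the explicit JOIN syntax (instead of the comma-separated FROM list on the slide), the reuse of the node ids from the Cypher result as p_id and b_id values, and the p_type value 'Reader' given to Ian.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema and sample rows; ids mirror the Node[...] ids shown in the Cypher results.
conn.executescript("""
CREATE TABLE Person (p_id INTEGER PRIMARY KEY, p_name TEXT, p_type TEXT);
CREATE TABLE Book (b_id INTEGER PRIMARY KEY, b_title TEXT);
CREATE TABLE Wrote (p_id INTEGER, b_id INTEGER);
INSERT INTO Person VALUES
    (0, 'John Le Carre', 'Author'),
    (1, 'Graham Greene', 'Author'),
    (4, 'Ian', 'Reader');
INSERT INTO Book VALUES
    (2, 'Tinker, Tailor, Soldier, Spy'),
    (3, 'Our Man in Havana');
INSERT INTO Wrote VALUES (0, 2), (1, 3);
""")

# Two joins are needed to walk Person -> Wrote -> Book.
rows = conn.execute("""
SELECT p.p_name, b.b_title
FROM Person p
JOIN Wrote w ON w.p_id = p.p_id
JOIN Book b ON b.b_id = w.b_id
WHERE p.p_type = 'Author'
ORDER BY p.p_id
""").fetchall()

for name, title in rows:
    print(name, "|", title)
```

The Wrote table and both joins correspond to the single `-[:WROTE]->` step in the Cypher pattern.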
32. Cypher - Querying DBs
Find Books written by Graham Greene
SQL
SELECT b.b_title
FROM Person p, Wrote w, Book b
WHERE p.p_type = 'Author' AND
w.p_id = p.p_id AND
w.b_id = b.b_id AND
p.p_name = 'Graham Greene'
Cypher Query
MATCH (a:Author)-[:WROTE]->(b:Book)
WHERE a.name = 'Graham Greene'
RETURN b
Cypher Result
+------------------------------------+
| b                                  |
+------------------------------------+
| Node[3]{title:"Our Man in Havana"} |
+------------------------------------+
1 row
13 ms
33. Cypher - Querying DBs
Find names of all persons, the books they purchased and the date the purchase was made
SQL
SELECT p.p_name, pur.pur_date, b.b_title
FROM Person p, Book b, Purchased pur
WHERE pur.p_id = p.p_id AND
b.b_id = pur.b_id
Cypher Query
MATCH
(a)-[r:PURCHASED]->(b)
RETURN a,r.date,b
Cypher Result
+-------------------------------------------------------------------------------------+
| a                    | r.date       | b                                             |
+-------------------------------------------------------------------------------------+
| Node[4]{name:"Ian"}  | "09-09-2011" | Node[3]{title:"Our Man in Havana"}            |
| Node[4]{name:"Ian"}  | "03-02-2011" | Node[2]{title:"Tinker, Tailor, Soldier, Spy"} |
| Node[5]{name:"Alan"} | "05-07-2011" | Node[2]{title:"Tinker, Tailor, Soldier, Spy"} |
+-------------------------------------------------------------------------------------+
3 rows
34. Cypher - Querying DBs
Find how Graham Greene is related to Ian
SQL
I won’t attempt!!!
Cypher Query
MATCH
(a:Author)-[r*]-(p:Person { name:'Ian' })
WHERE a.name = 'Graham Greene'
RETURN a,r,p
Cypher Result
+--------------------------------------------------------------------------------------------------------+
| a                             | r                                                | p                   |
+--------------------------------------------------------------------------------------------------------+
| Node[1]{name:"Graham Greene"} | [:WROTE[1] {},:PURCHASED[0] {date:"09-09-2011"}] | Node[4]{name:"Ian"} |
+--------------------------------------------------------------------------------------------------------+
1 row
38 ms
35. Support for Graph Algorithms
● shortestPath
● allSimplePaths
● allPaths
● dijkstra (optionally with cost_property and default_cost parameters)
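To give a feel for what a shortest-path search does, here is a plain breadth-first search in Python applied to the earlier question of how Graham Greene is related to Ian. It is a sketch only: it is not Neo4j's implementation, the `relationships` list is re-typed from the sample data in the slides, and `find_path` is a made-up helper name. Like the `-[r*]-` pattern in that query, it ignores relationship direction.

```python
from collections import deque

# (start node, relationship type, end node), copied from the sample data.
relationships = [
    ("John Le Carre", "WROTE", "Tinker, Tailor, Soldier, Spy"),
    ("Graham Greene", "WROTE", "Our Man in Havana"),
    ("Ian", "PURCHASED", "Our Man in Havana"),
    ("Ian", "PURCHASED", "Tinker, Tailor, Soldier, Spy"),
    ("Alan", "PURCHASED", "Tinker, Tailor, Soldier, Spy"),
]


def find_path(source, target):
    """Return the relationship types along a shortest path, or None."""
    # Build an undirected adjacency list so relationships can be walked both ways.
    adjacency = {}
    for start, rel_type, end in relationships:
        adjacency.setdefault(start, []).append((rel_type, end))
        adjacency.setdefault(end, []).append((rel_type, start))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for rel_type, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [rel_type]))
    return None  # the two nodes are not connected


path = find_path("Graham Greene", "Ian")
print(path)
```

The path found has the same two steps as the Cypher result for that query: a WROTE relationship followed by a PURCHASED relationship.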
36. Neo4j - Default locking behavior for Concurrency
● When adding, changing or removing a property on a node or relationship, a write lock is taken on that node or relationship.
● When creating or deleting a node, a write lock is taken on that node.
● When creating or deleting a relationship, a write lock is taken on that relationship and on both of its nodes.
37. Neo4j - Performance
● The JVM runs in a shared environment, so how it is configured greatly affects performance.
● Optimized more for querying than for CRUD operations; batch updates are recommended.
● Indexes can be set on nodes, relationships and their properties, which can improve query response times.
● Reports on query times and performance are mixed; upcoming releases aim to improve this.
38. Neo4j Capacity - Data size
In Neo4j, data size is mainly limited by the address space of the primary keys for nodes, relationships, properties and relationship types. Currently, the address space is as follows:
nodes: 2^35 (∼ 34 billion)
relationships: 2^35 (∼ 34 billion)
properties: 2^36 to 2^38 depending on property types (maximum ∼ 274 billion, always at least ∼ 68 billion)
relationship types: 2^15 (∼ 32 000)
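The rounded figures above follow directly from the powers of two. A quick check in Python (the dictionary keys are just labels chosen for this sketch):

```python
# Address-space sizes quoted on the slide, computed exactly.
limits = {
    "nodes": 2 ** 35,
    "relationships": 2 ** 35,
    "properties (minimum)": 2 ** 36,
    "properties (maximum)": 2 ** 38,
    "relationship types": 2 ** 15,
}

for name, value in limits.items():
    # The "," format spec inserts thousands separators for readability.
    print(f"{name}: {value:,}")
```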
39. Calling Neo4j
[Diagram: the Neo4j DB runs inside a JVM server; a Java application reaches it through the Java API and a web application through the Web REST API]
Officially supported languages
● Java
● .NET
● JavaScript
● Python
● Ruby
● PHP
40. Neo4j Editions
Enterprise (not free)
● Enterprise Lock Manager
● High Performance Cache
● Clustering
● Hot Backups
● Advanced Monitoring
Community (free, open source)
41. If you’ve ever
● Joined more than 7 tables together
● Modeled a graph in a table
● Written a recursive CTE (Common Table Expression)
● Tried to write some crazy stored procedure with multiple recursive self and inner joins
You should use Neo4j
42. Disadvantages
● The JVM must be configured properly to get optimal performance.
● A Neo4j database cannot be distributed; it has to be replicated instead.
● Inappropriate for transactional information like accounting and banking.