This presentation is about the Neo4j database. Some slides are taken from other presentations, but it should give you a basic idea.
For the sample exercises at the end, you can use either of these schemas:
1.) http://www.neo4j.org/graphgist?7820655
2.) The sample Movie schema, which comes by default
12. Database full of linked nodes
Stores data as nodes and relationships
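To make "nodes and relationships" concrete, here is a minimal in-memory sketch of a property graph in Python. It is only an illustration of the data model, not how Neo4j stores data internally; the class names (Node, Relationship, Graph) and the ACTED_IN example are assumptions chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the start node
    end: int            # id of the end node
    type: str           # relationship type, e.g. "ACTED_IN"
    properties: dict = field(default_factory=dict)


class Graph:
    """Hypothetical toy graph: a dict of nodes plus a list of relationships."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, labels, **properties):
        node = Node(len(self.nodes), set(labels), properties)
        self.nodes[node.id] = node
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start.id, end.id, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbours(self, node, rel_type):
        """Nodes reached from `node` over outgoing relationships of one type."""
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node.id and r.type == rel_type]


graph = Graph()
keanu = graph.add_node(["Person"], name="Keanu Reeves")
matrix = graph.add_node(["Movie"], title="The Matrix")
graph.relate(keanu, "ACTED_IN", matrix, roles=["Neo"])

titles = [m.properties["title"] for m in graph.neighbours(keanu, "ACTED_IN")]
```

Both nodes and relationships carry properties here, which is the point of the model: the connection itself is stored as data, not recomputed through joins.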
13. Some More Info..
• Widely Used
• Full ACID transactions
• Schema free, bottom-up data model design
• Stable, and in active development since mid 2000
• High performance; can handle up to 2 billion nodes
• Lots of language bindings
14. How do you query and work with this graph database?
• Run it as
– Embedded
– Standalone
• Query it with
– Cypher from Neo4j Browser
– Cypher from Java
– REST API
– REST API using Cypher query
– Programmatically
– Gremlin
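As a rough sketch of the "REST API using Cypher query" option, the Python below builds (but does not send) an HTTP request carrying a Cypher statement. It assumes the legacy transactional endpoint /db/data/transaction/commit on a local server at port 7474; newer Neo4j versions use a different path and normally require authentication, so treat the URL and the helper name build_cypher_request as assumptions.

```python
import json
import urllib.request


def build_cypher_request(statement, parameters=None,
                         base_url="http://localhost:7474"):
    """Build a POST request that carries one Cypher statement as JSON.

    The endpoint path is an assumption (legacy transactional HTTP endpoint);
    check the documentation of the server version you actually run.
    """
    payload = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    return urllib.request.Request(
        url=base_url + "/db/data/transaction/commit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


request = build_cypher_request("MATCH (p:Person) RETURN p.name LIMIT 5")

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```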
16. Cypher Query
• MATCH
– Matches the graph pattern in the real graph.
• WHERE
– Filters using predicates or anchors pattern elements.
• RETURN
– Returns and projects result data, also handles aggregation.
• ORDER BY
– Sorts the query result.
• SKIP/LIMIT
– Paginates the query result.
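The order in which these clauses act can be imitated in plain Python over a list of rows. This is only an analogy under stated assumptions: the rows stand in for nodes already matched by a pattern such as (p:Person), and run_query is a hypothetical helper, not part of any Neo4j API.

```python
people = [
    {"name": "Keanu Reeves", "born": 1964},
    {"name": "Carrie-Anne Moss", "born": 1967},
    {"name": "Laurence Fishburne", "born": 1961},
    {"name": "Hugo Weaving", "born": 1960},
]


def run_query(rows, where, returns, order_by, skip, limit):
    """Apply the clauses in the order Cypher evaluates them."""
    matched = [row for row in rows if where(row)]      # MATCH + WHERE
    projected = [returns(row) for row in matched]      # RETURN
    ordered = sorted(projected, key=order_by)          # ORDER BY
    return ordered[skip:skip + limit]                  # SKIP / LIMIT


# Rough Cypher equivalent:
#   MATCH (p:Person) WHERE p.born > 1960
#   RETURN p.name AS name, p.born AS born
#   ORDER BY born SKIP 1 LIMIT 2
result = run_query(
    people,
    where=lambda p: p["born"] > 1960,
    returns=lambda p: {"name": p["name"], "born": p["born"]},
    order_by=lambda row: row["born"],
    skip=1,
    limit=2,
)
names = [row["name"] for row in result]
```

Filtering happens before projection, and pagination happens last, which is why SKIP/LIMIT without ORDER BY gives no stable page order.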
17. Go to Console…
• Start Neo4j (neo4j start)
• Go to URL:
– http://localhost:7474/browser/
23. Neo4j Logical Architecture
Layers, from top to bottom:
• REST API
• JVM Language Bindings (Java, Ruby, Clojure…)
• Traversal Framework, Core API, Graph Matching
• Caches
• Memory-Mapped (N)IO
• Filesystem
24. Graph Algorithms
• The algorithm parameter can have one of these values:
– shortestPath
– allSimplePaths
– allPaths
– dijkstra (optionally with cost_property and default_cost parameters)
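To show what cost_property and default_cost mean for dijkstra, here is a small self-contained Python sketch of the algorithm. It is not Neo4j's implementation: the edge format (source, target, properties) and the choice to treat relationships as undirected are assumptions made for this example.

```python
import heapq


def dijkstra(edges, start, goal, cost_property="cost", default_cost=1.0):
    """Return (total_cost, path) for the cheapest path, or None if unreachable.

    Each edge is (source, target, properties). The weight is read from
    properties[cost_property]; default_cost is used when it is missing.
    """
    adjacency = {}
    for source, target, properties in edges:
        weight = properties.get(cost_property, default_cost)
        adjacency.setdefault(source, []).append((target, weight))
        adjacency.setdefault(target, []).append((source, weight))

    queue = [(0.0, start, [start])]   # (cost so far, node, path so far)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, weight in adjacency.get(node, []):
            if neighbour not in settled:
                heapq.heappush(
                    queue, (cost + weight, neighbour, path + [neighbour]))
    return None


edges = [
    ("A", "B", {"cost": 4.0}),
    ("A", "C", {"cost": 1.0}),
    ("C", "B", {"cost": 2.0}),
    ("B", "D", {}),               # no cost property, so default_cost applies
]
total, path = dijkstra(edges, "A", "D")
```

The direct A to B edge is more expensive than going through C, so the cheapest route takes more hops; shortestPath would count hops only and ignore the costs.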
25. Graph Algorithms
• shortestPath()
MATCH p = shortestPath((keanu:Person)-[:KNOWS*]-(kevin:Person))
WHERE keanu.name = "Keanu Reeves" AND kevin.name = "Kevin Bacon"
RETURN length(p)
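What the query above asks for, the number of KNOWS hops on a shortest undirected path, can be sketched with a breadth-first search in Python. The intermediate people ("Person A" and so on) are placeholders invented for this example, and the sketch says nothing about how Neo4j computes the result internally.

```python
from collections import deque


def shortest_path_length(knows, start, goal):
    """Fewest KNOWS hops between two people, ignoring direction.

    Returns None when no path exists.
    """
    adjacency = {}
    for a, b in knows:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            return distances[person]
        for friend in adjacency.get(person, ()):
            if friend not in distances:       # first visit is the shortest
                distances[friend] = distances[person] + 1
                queue.append(friend)
    return None


# Hypothetical KNOWS relationships, for illustration only.
knows = [
    ("Keanu Reeves", "Person A"),
    ("Person A", "Person B"),
    ("Person B", "Kevin Bacon"),
    ("Keanu Reeves", "Person C"),
]
hops = shortest_path_length(knows, "Keanu Reeves", "Kevin Bacon")
```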
26. Indexing a Graph?
• Graphs are their own indexes!
• But sometimes we want shortcuts to well-known nodes
• Can do this in our own code
– Just keep a reference to any interesting nodes
• Or create an index, e.g.
CREATE INDEX ON :Movie(title);
• Don’t index every node!
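Conceptually, an index on a label and property is a lookup table from property value to nodes, used to find a starting point before traversing. The Python sketch below illustrates that idea only; PropertyIndex is a hypothetical class and the node dictionaries are an assumed format, not Neo4j's index implementation.

```python
from collections import defaultdict


class PropertyIndex:
    """Maps one property value of nodes with a given label to those nodes."""

    def __init__(self, label, key):
        self.label = label
        self.key = key
        self._entries = defaultdict(list)

    def add(self, node):
        # Only nodes that carry the label and the property are indexed.
        if self.label in node["labels"] and self.key in node["properties"]:
            self._entries[node["properties"][self.key]].append(node)

    def lookup(self, value):
        return list(self._entries.get(value, []))


nodes = [
    {"labels": {"Movie"}, "properties": {"title": "The Matrix", "released": 1999}},
    {"labels": {"Movie"}, "properties": {"title": "Cloud Atlas", "released": 2012}},
    {"labels": {"Person"}, "properties": {"name": "Keanu Reeves"}},
]

title_index = PropertyIndex("Movie", "title")
for node in nodes:
    title_index.add(node)

hits = title_index.lookup("The Matrix")
```

The Person node never enters the index, which mirrors the advice on the slide: index only the properties you use to find starting nodes.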
27. Built-in Lucene Support
• Default index implementation for Neo4j
• Each index supports nodes or relationships
• Supports exact and regex-based matching
– Query('index_name', 'pattern')
• Supports scoring
– Number of hits in the index for a given item
– Great for recommendations!
28. Scalability
• The HA component supports master-slave replication
• But scaling graphs is HARD
• According to Emil Eifrem: superb performance up to a few billion nodes
30. Pros and Cons
• Strengths
– Powerful data model
– Fast
• For connected data, can be many orders of magnitude faster than an RDBMS
• Weaknesses
– Sharding is hard
• Though graphs can scale reasonably well
• And for some domains you can shard too!
32. Sample Exercises
• Find all persons who killed others in the Mahabharat.
• Find movies in which an actor both acted and directed.
• Find the five actors who have acted in the most movies.
• Find the shortest path between Kunti and Sehdev.
• Find the persons who were killed by Kunti’s sons.
The volume of data generated by users, systems and sensors is growing exponentially, further accelerated by big distributed systems such as Amazon, Google and other cloud services.
Every minute, some systems persist gigabytes of data.
Examples: Quora, Facebook.
The interdependency and complexity of data is increasing, accelerated by the Internet, Web 2.0, social networks, and open, standardized access to data sources from a large number of different systems.
We need to store not only individual, isolated, and discrete data items, but also the connections between them.
While we could easily fit the discrete data in relational tables, the connected data was more challenging to store and tremendously slow to query. The property graph model is intuitive and easy to understand.