Complex hierarchical relationships between entities are difficult to map onto a relational database, and the queries they require are usually slow.
Graph databases are optimized for exactly these kinds of relationships and can deliver high performance even with huge amounts of data. Moreover, not only the entities stored in the database have attributes; their relationships do too, and queries can look at both.
Get to know the basics of graph databases, using Neo4j as an example, and see how it is used in C# projects.
4. What is it with Relationships?
• The world is full of connected people, events and things
• There is “Value in Relationships”!
• What about data relationships?
• How do you store your object model?
• How do you explain JOIN tables to your boss?
5. Neo4j – allows you to connect the dots
• Built to efficiently store, query and manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable on few machines
6. Value from Data Relationships
Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
11. The Whiteboard Model is the Physical Model
Eliminates Graph-to-Relational Mapping
• In your data model: bridge the gap between business and IT models
• In your application: greatly reduce the need for application code
12. Property Graph Model Components
[Diagram: two PERSON nodes, Dan (born: May 29, 1970, twitter: “@dan”) and Ann (born: Dec 5, 1975), connected by two LOVES relationships and a LIVES WITH relationship; a CAR node (brand: “Volvo”, model: “V70”); one relationship carries the property since: Jan 10, 2011]
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
13. Cypher: Powerful and Expressive Query Language
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Diagram: Dan and Ann are NODEs, each with the LABEL Person and a name PROPERTY, joined by a LOVES relationship]
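To make the property graph model and the Cypher pattern above concrete, here is a minimal in-memory sketch written as a C# top-level program. It is not how Neo4j stores data. The tuple shapes, the `IsPersonNamed` helper and the `LIVES_WITH` spelling are illustrative assumptions. The placement of `since` on LIVES_WITH is also assumed, because the slide text does not say which relationship carries it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Nodes: id, labels and name-value properties (an illustration, not Neo4j's storage format)
var nodes = new List<(int Id, string[] Labels, Dictionary<string, object> Props)>
{
    (1, new[] { "Person" }, new Dictionary<string, object> { ["name"] = "Dan", ["twitter"] = "@dan" }),
    (2, new[] { "Person" }, new Dictionary<string, object> { ["name"] = "Ann" }),
};

// Relationships: a type, a direction (From -> To) and their own properties.
// Putting `since` on LIVES_WITH is an assumption made for this example.
var rels = new List<(int From, string Type, int To, Dictionary<string, object> Props)>
{
    (1, "LOVES", 2, new Dictionary<string, object>()),
    (2, "LOVES", 1, new Dictionary<string, object>()),
    (1, "LIVES_WITH", 2, new Dictionary<string, object> { ["since"] = "Jan 10, 2011" }),
};

// True if node `id` carries the Person label and the given name property
bool IsPersonNamed(int id, string name) =>
    nodes.Any(n => n.Id == id
                && n.Labels.Contains("Person")
                && n.Props.TryGetValue("name", out var v) && v is string s && s == name);

// MATCH (:Person {name:"Dan"})-[:LOVES]->(:Person {name:"Ann"})
var matches = rels
    .Where(r => r.Type == "LOVES" && IsPersonNamed(r.From, "Dan") && IsPersonNamed(r.To, "Ann"))
    .ToList();

Console.WriteLine($"Matched {matches.Count} path(s)");   // Matched 1 path(s)
```

Because the relationship is directed, only the Dan -> Ann LOVES edge matches. The Ann -> Dan edge would need the arrow reversed in the pattern.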
14. Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
Example: 4.58 million things and their relationships load from CSV in 100 seconds!
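For incremental loads, LOAD CSV is typically paired with MERGE so that re-importing rows does not create duplicates. The following is only a rough in-memory analogy of that idea, not the LOAD CSV implementation. It reads hypothetical `actor,movie` rows and adds each node and relationship only if it is not already present. The CSV content and the `Load` helper are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical CSV input: one ACTED_IN relationship per row (header: actor,movie)
var csv = "actor,movie\nTom Hanks,Forrest Gump\nTom Hanks,Cast Away\nRobin Wright,Forrest Gump";

var nodes = new HashSet<(string Label, string Key)>();               // merged nodes
var rels  = new HashSet<(string Actor, string Type, string Movie)>(); // merged relationships

void Load(string text)
{
    var lines = text.Split('\n');
    for (var i = 1; i < lines.Length; i++)             // skip the header row
    {
        var cols = lines[i].Split(',');                // naive split: no quoting or escaping support
        var (actor, movie) = (cols[0].Trim(), cols[1].Trim());
        nodes.Add(("Person", actor));                  // like MERGE (:Person {name: actor})
        nodes.Add(("Movie", movie));                   // like MERGE (:Movie {title: movie})
        rels.Add((actor, "ACTED_IN", movie));          // like MERGE (p)-[:ACTED_IN]->(m)
    }
}

Load(csv);
Load(csv);   // re-loading the same rows: HashSet.Add ignores duplicates, as MERGE would
Console.WriteLine($"{nodes.Count} nodes, {rels.Count} relationships");   // 4 nodes, 3 relationships
```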
15. From RDBMS to Neo4j
Relational Pains = Graph Pleasure
16. Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with the number and levels of relationships, and with database size
• Query complexity grows with the need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real time
Slow development
Poor performance
Low scalability
Hard to maintain
17. Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Drive the graph model from your domain and use cases
• Use relationship information in real time to transform your business
• Add new relationships on the fly to adapt to your changing requirements
18. High Query Performance with a Native Graph DB
• Relationships are first-class citizens
• No need for joins: just follow the pre-materialized relationships of nodes
• Query and data locality: navigate out from your starting points
• Only load what’s needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs
19. Express Complex Queries Easily with Cypher
Find all reports and how many people they manage, each up to 3 levels down
Cypher Query
MATCH (boss)-[:MANAGES*0..3]->(mgr)
WHERE boss.name = "John Doe" AND (mgr)-[:MANAGES]->()
RETURN mgr.name AS Manager,
       size((mgr)-[:MANAGES*1..3]->()) AS Total
SQL Query
[Image: the equivalent SQL query]
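To show what the Cypher query computes, here is an in-memory equivalent over a hypothetical org chart, written as a C# top-level program. The names and the `Reports`/`Reachable` helpers are assumptions. `Reachable(start, min, max)` collects everyone between `min` and `max` MANAGES steps away, level by level. This mirrors `*0..3` for the candidate managers and `*1..3` for their totals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical org chart: manager -> direct reports (a tree, so each person is reached by one path)
var manages = new Dictionary<string, string[]>
{
    ["John Doe"] = new[] { "Alice", "Bob" },
    ["Alice"]    = new[] { "Carol", "Dave" },
    ["Carol"]    = new[] { "Erin" },
    ["Erin"]     = new[] { "Frank" },
};

IEnumerable<string> Reports(string p) =>
    manages.TryGetValue(p, out var r) ? r : Array.Empty<string>();

// Everyone reachable from `start` in minHops..maxHops MANAGES steps (breadth-first)
List<string> Reachable(string start, int minHops, int maxHops)
{
    var result = new List<string>();
    var level = new List<string> { start };
    for (var hops = 0; hops <= maxHops && level.Count > 0; hops++)
    {
        if (hops >= minHops) result.AddRange(level);
        level = level.SelectMany(p => Reports(p)).ToList();
    }
    return result;
}

// (boss)-[:MANAGES*0..3]->(mgr) where mgr manages someone; Total = size((mgr)-[:MANAGES*1..3]->())
var rows = Reachable("John Doe", 0, 3)
    .Where(mgr => Reports(mgr).Any())
    .Select(mgr => (Manager: mgr, Total: Reachable(mgr, 1, 3).Count))
    .ToList();

foreach (var (manager, total) in rows)
    Console.WriteLine($"{manager}: {total}");   // John Doe: 5, Alice: 4, Carol: 2, Erin: 1
```

`size()` over a pattern counts paths. That equals the number of reports here only because the chart is a tree. Frank is four levels below John Doe, so he is left out of John Doe's total but counted for Alice.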
20. High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second per core
• Cost-based query optimizer: complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
25. Query Comparison: Colleagues of Tom Hanks?
SELECT *
FROM Person AS actor
JOIN ActorMovie AS am1 ON (actor.id = am1.actor_id)
JOIN ActorMovie AS am2 ON (am1.movie_id = am2.movie_id)
JOIN Person AS coll ON (coll.id = am2.actor_id)
WHERE actor.name = 'Tom Hanks'
  AND coll.id <> actor.id   -- exclude Tom Hanks himself, as the Cypher pattern does
MATCH (actor:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coll:Person)
WHERE actor.name = "Tom Hanks"
RETURN *
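The Cypher version reads as a traversal. It goes out from Tom Hanks along ACTED_IN to each movie, then back in along ACTED_IN to the rest of that movie's cast. Below is a small in-memory sketch of that traversal as a C# top-level program. The movie data is a hypothetical sample, and adjacency lists here only loosely mimic how a native graph store follows stored relationships.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ACTED_IN data: actor -> movies
var actedIn = new Dictionary<string, string[]>
{
    ["Tom Hanks"]    = new[] { "Forrest Gump", "Cast Away" },
    ["Robin Wright"] = new[] { "Forrest Gump" },
    ["Gary Sinise"]  = new[] { "Forrest Gump", "Apollo 13" },
    ["Helen Hunt"]   = new[] { "Cast Away" },
};

// The reverse direction: movie -> cast, so relationships can be followed both ways
var castByMovie = actedIn
    .SelectMany(kv => kv.Value.Select(movie => (Movie: movie, Actor: kv.Key)))
    .GroupBy(x => x.Movie)
    .ToDictionary(g => g.Key, g => g.Select(x => x.Actor).ToArray());

// (actor)-[:ACTED_IN]->()<-[:ACTED_IN]-(coll): out to each movie, then back in to its cast.
// Cypher never reuses a relationship within one pattern, so Tom Hanks is not his own colleague.
// Distinct() collapses duplicates; the Cypher query as written returns one row per path.
var colleagues = actedIn["Tom Hanks"]
    .SelectMany(movie => castByMovie[movie])
    .Where(coll => coll != "Tom Hanks")
    .Distinct()
    .OrderBy(name => name)
    .ToList();

Console.WriteLine(string.Join(", ", colleagues));   // Gary Sinise, Helen Hunt, Robin Wright
```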
27. Most prolific actors and their filmography?
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, count(*), collect(m.title) as movies
ORDER BY count(*) desc, p.name asc
LIMIT 10;
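For C# readers, the same aggregation can be sketched with LINQ over an in-memory list of hypothetical (person, title) pairs. Cypher groups implicitly by the non-aggregated `p.name`, which corresponds to `GroupBy` here. `count(*)` and `collect(m.title)` become `Count()` and a list of titles. Ordinal name ordering is an assumption; Cypher's string ordering may differ for non-ASCII names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical (:Person)-[:ACTED_IN]->(:Movie) pairs
var actedIn = new List<(string Person, string Title)>
{
    ("Tom Hanks", "Forrest Gump"), ("Tom Hanks", "Cast Away"), ("Tom Hanks", "Apollo 13"),
    ("Gary Sinise", "Forrest Gump"), ("Gary Sinise", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"), ("Bill Paxton", "Apollo 13"),
};

// RETURN p.name, count(*), collect(m.title) ORDER BY count(*) DESC, p.name ASC LIMIT 10
var top = actedIn
    .GroupBy(r => r.Person)                                    // implicit grouping key in Cypher
    .Select(g => (Name: g.Key, Count: g.Count(), Movies: g.Select(r => r.Title).ToList()))
    .OrderByDescending(x => x.Count)
    .ThenBy(x => x.Name, StringComparer.Ordinal)
    .Take(10)
    .ToList();

foreach (var (name, count, movies) in top)
    Console.WriteLine($"{name} ({count}): {string.Join(", ", movies)}");
```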
29. Neo4j Query Planner
Cost-based query planner since Neo4j 2.2
• Uses database statistics to select the best plan
• Currently for read operations
• Query plan visualizer helps find non-optimal queries: Cartesian products, missing indexes and global scans, typos, massive fan-out
31. Neo4j Remoting Protocols
• The Cypher HTTP endpoint is fast, transactional (multi-request) and streaming, and supports batching, parameters, statistics, query plans and several result representations
:POST /db/data/transaction/commit
{"statements":[{"statement":
"MATCH (p:Person) WHERE p.name = {name} RETURN p",
"parameters":{"name":"Clint Eastwood"}}]}
• Up next: binary protocol
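From .NET, the request body shown above can be produced with System.Text.Json (.NET Core 3.0 or later) instead of string concatenation. This is a sketch only. The `{name}` placeholder is the parameter syntax of the Neo4j versions discussed here; later versions use `$name`. Sending the body is left as a comment, because the endpoint URL and host are deployment-specific.

```csharp
using System;
using System.Text.Json;

// Request body for the transactional Cypher endpoint. The name travels as a parameter,
// not concatenated into the query text, which avoids injection and lets the server
// reuse the cached query plan.
var body = new
{
    statements = new[]
    {
        new
        {
            statement = "MATCH (p:Person) WHERE p.name = {name} RETURN p",
            parameters = new { name = "Clint Eastwood" }
        }
    }
};

var json = JsonSerializer.Serialize(body);
Console.WriteLine(json);
// The string would then be POSTed with Content-Type: application/json
// to /db/data/transaction/commit, for example with HttpClient.
```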
32. Neo4j for .Net Developers
Install, Drivers, Deployment, Hosting
33. Neo4j for .Net Developers
Don’t be afraid or put off just because it’s “Java”.
It’s just a database, implemented in some language.
You’ll rarely see it.
34. Neo4j for .Net Developers - Installation
• The Neo4j Windows installer came first
• Chocolatey packages for Neo4j
• Upcoming in Neo4j 2.3: full PowerShell support
• Just install Neo4j as a service
• More to come
35. Neo4j for .Net Developers - Drivers
• Neo4jClient: one of the first Neo4j drivers
• by Readify, Australia
• Uses Neo4j’s HTTP APIs
• Opinionated
• Query DSL
• NetGain: a new, thin layer over the APIs
• New drivers for the binary protocol
36. Neo4j for .Net Developers – Development & Deployment
• Develop
• on Windows with Visual Studio
• everywhere with Mono / Xamarin
• Develop locally with local Neo4j instance
• Deploy to Azure, use provisioned instances
37. Neo4j on Azure – Hosting / Provisioning
• Hosted Neo4j Databases by GrapheneDB
• Just install on a Linux instance
• VMDepot Images
• Upcoming: Docker
40. Single Page WebApp on the Movie Dataset
• Bootstrap
• JavaScript (jQuery)
• 3 JSON HTTP endpoints
• Single: /movie/title/The%20Matrix
• Search: /search?query=Matrix
• Graph: /graph?limit=100
• Send XHR, Render results
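The endpoint URLs embed user input, so titles and search terms have to be percent-encoded. The slide's front end does this in JavaScript. The same idea in C#, for example in a .NET client or an integration test, might look like this. `MovieUrl`, `SearchUrl` and `GraphUrl` are hypothetical helper names; only `Uri.EscapeDataString` is a framework API.

```csharp
using System;

// Hypothetical helpers that build the three endpoint URLs listed above
string MovieUrl(string title) => "/movie/title/" + Uri.EscapeDataString(title);
string SearchUrl(string query) => "/search?query=" + Uri.EscapeDataString(query);
string GraphUrl(int limit) => $"/graph?limit={limit}";

Console.WriteLine(MovieUrl("The Matrix"));   // /movie/title/The%20Matrix
Console.WriteLine(SearchUrl("Matrix"));      // /search?query=Matrix
Console.WriteLine(GraphUrl(100));            // /graph?limit=100
```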
41. Data Model
public class Person
{
public string name { get; set; }
public int born { get; set; }
}
public class Movie
{
public string title { get; set; }
public int released { get; set; }
public string tagline { get; set; }
}
[Diagram: (:Person {name, born})-[:ACTED_IN|DIRECTED|…]->(:Movie {title, released, tagline}), e.g. “Forrest Gump”]
42. Setup
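• Add Neo4jClient as a dependency
• Store the graph database URL in Web.config
• Connect in WebApiConfig
// Requires: using System; using System.Configuration; using Neo4jClient;
var url = ConfigurationManager.AppSettings["GraphDBUrl"]; // from <appSettings> in Web.config
var client = new GraphClient(new Uri(url));
client.Connect();       // connect once at startup
GraphClient = client;   // static property on WebApiConfig, used by the controllers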
43. Routes & Controllers
• Provide Routes for
• index.html and
• 3 endpoints
• 4 Controllers:
• query with parameter,
• return results as JSON
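using System.Linq;
using System.Text.RegularExpressions;
using System.Web.Http;

[RoutePrefix("search")]
public class SearchController : ApiController {
    [HttpGet] [Route("")]
    public IHttpActionResult SearchMoviesByTitle(string query) {   // bound from /search?query=...
        // Case-insensitive "contains" match; escape the input so it is matched literally
        var data = WebApiConfig.GraphClient.Cypher
            .Match("(m:Movie)")
            .Where("m.title =~ {title}")
            .WithParam("title", "(?i).*" + Regex.Escape(query) + ".*")
            .Return<Movie>("m")
            .Results.ToList();
        return Ok(data.Select(c => new { movie = c }));
    }
}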
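The search term ends up inside a regular expression, so without escaping, characters such as `.` or `(` in user input would be read as regex syntax. A lone `(` can even make the query fail. The sketch below uses .NET's `Regex` as a stand-in for the server-side `=~` match. Neo4j evaluates `=~` with Java regular expressions against the whole string. The two syntaxes agree for these simple patterns but can differ in edge cases. `TitlePattern` and `TitleMatches` are hypothetical helper names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper building an escaped, case-insensitive "contains" pattern
string TitlePattern(string query) => "(?i).*" + Regex.Escape(query) + ".*";

// Stand-in for Cypher's =~, which must match the entire title (hence the ^...$ anchors)
bool TitleMatches(string title, string pattern) => Regex.IsMatch(title, "^(?:" + pattern + ")$");

// Escaped: "." is matched literally, so "S.W.A.T." matches but "SxWxAxTx" does not
Console.WriteLine(TitleMatches("S.W.A.T.", TitlePattern("s.w.a.t.")));   // True
Console.WriteLine(TitleMatches("SxWxAxTx", TitlePattern("s.w.a.t.")));   // False

// Unescaped, a lone "(" does not even form a valid pattern
bool unescapedValid = true;
try { _ = new Regex("(?i).*" + "(" + ".*"); }
catch (ArgumentException) { unescapedValid = false; }
Console.WriteLine(unescapedValid);   // False
```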
45. Neo4j Clustering
Architecture Optimized for Speed & Availability at Scale
Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads the cache across the cluster for very large graphs
Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
[Diagram: a load balancer in front of three Neo4j instances]
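A load balancer in front of such a master-slave cluster is commonly configured to separate writes from reads. The toy sketch below shows that idea. Writes go to the current master, and reads rotate round-robin across all instances. The instance names and the `Route` function are hypothetical. The sketch does not model how Neo4j itself handles writes that reach a slave, or how re-election updates `master`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical three-instance cluster; after failover, re-election would change `master`
var instances = new[] { "neo4j-1", "neo4j-2", "neo4j-3" };
var master = "neo4j-1";
var next = 0;

// In this sketch, writes go to the master and reads are spread round-robin
string Route(bool isWrite)
{
    if (isWrite) return master;
    var target = instances[next];
    next = (next + 1) % instances.Length;
    return target;
}

var routed = new List<string>
{
    Route(isWrite: false), Route(isWrite: false), Route(isWrite: true), Route(isWrite: false),
};
Console.WriteLine(string.Join(", ", routed));   // neo4j-1, neo4j-2, neo4j-1, neo4j-3
```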
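46. Three Ways to Migrate Data to Neo4j
• Migrate all data: the application keeps all data in the graph database
• Migrate graph data: graph data moves to the graph database, non-graph data stays in the relational database
• Duplicate graph data: all data stays in the relational database, and the graph data is duplicated into the graph database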
47. Neo4j Fits into Your Enterprise Environment
[Diagram: end users and applications work against a Neo4j graph database cluster (Neo4j, Neo4j, Neo4j) for data storage and business rules execution; data scientists run ad hoc analysis; a bulk analytic infrastructure (graph compute engine, EDW …) handles data mining and aggregation; relational, NoSQL and Hadoop databases sit alongside]
In the data model = logical-to-physical mismatch
In the application = graph-tabular impedance mismatch, which is greater than the object-relational mismatch, which famously takes up 40% of project code & cost
Presenter Notes - Challenges with current technologies?
Database options are not suited to model or store data as a network of relationships
Performance degrades with number and levels of relationships making it harder to use for real-time applications
Not flexible enough to add or change relationships in real time
Presenter Notes - How does one take advantage of data relationships for real-time applications?
To take advantage of relationships
Data needs to be available as a network of connections (or as a graph)
Real-time access to relationship information should be available regardless of the size of data set or number and complexity of relationships
The graph should be able to accommodate new relationships or modify existing ones
In the near future, many of your apps will be driven by data relationships and not transactions
You can unlock value from business relationships with Neo4j