This document provides an overview of graph databases and Neo4j. It discusses why graphs are useful for modeling connected data and describes common use cases. It also summarizes Neo4j's transactional graph database capabilities, performance characteristics, and deployment options. Key topics include causal clustering, query planning, and driver and tooling support for developers.
7. Value from Data Relationships
Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
8. The Rise of Connections in Data
• Networks of People: e.g., employees, customers, suppliers, partners, influencers
• Business Processes: e.g., risk management, supply chain, payments
• Knowledge Networks: e.g., enterprise content, domain-specific content, eCommerce content
Data connections are increasing as rapidly as data volumes.
9. Harnessing Connections Drives Business Value
Digital Transformation Megatrends
• Enhanced Decision Making
• Hyper Personalization
• Massive Data Integration
• Data Driven Discovery & Innovation
Connected Data at the Center, with AI & Machine Learning
Example applications: Product Recommendations, Personalized Health Care, Media and Advertising, Fraud Prevention, Network Analysis, Law Enforcement, Drug Discovery, Intelligence and Crime Detection, Product & Process Innovation, 360 view of customer, Compliance, Optimize Operations, Price optimization, Resource allocation
13. Newcomers in the last 3 years
• DSE Graph
• Agens Graph
• IBM Graph
• JanusGraph
• Tibco GraphDB
• Microsoft CosmosDB
• TigerGraph
• MemGraph
• AWS Neptune
• SAP HANA Graph
21. Cancer Research - Candiolo Cancer Institute
“Our application relies on complex
hierarchical data, which required a more
flexible model than the one provided by
the traditional relational database
model,” said Andrea Bertotti, MD
neo4j.com/case-studies/candiolo-cancer-institute-ircc/
22. Graph Databases in Healthcare and Life Sciences
14 presenters from across Europe on:
• Genome
• Proteome
• Human Pathway
• Reactome
• SNP
• Drug Discovery
• Metabolic Symbols
• ...
neo4j.com/blog/neo4j-life-sciences-healthcare-workshop-berlin/
27. Handling Large Graph Workloads for Enterprises
Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM WebSphere Commerce
Marriott’s Real-time Pricing Engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database
Handling Package Routing in Real-Time
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second.
35. The Whiteboard Model Is the Physical Model
Eliminates Graph-to-Relational Mapping
• In your data model: bridges the gap between business and IT models
• In your application: greatly reduces the need for application code
36. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Diagram: two PERSON nodes, Dan (born: May 29, 1970, twitter: "@dan") and Ann (born: Dec 5, 1975), connected by LOVES relationships in both directions and a LIVES WITH relationship; a relationship property since: Jan 10, 2011; and a CAR node (brand: "Volvo", model: "V70")]
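To make the node/relationship distinction concrete, here is a minimal in-memory sketch in Java (Java 17). The class names `PropertyGraphDemo`, `GNode` and `GRel` are hypothetical, and this is not Neo4j's internal representation. It rebuilds part of the Dan/Ann example; the Car node is left unconnected and the LIVES_WITH direction is chosen arbitrarily, because the slide residue does not show those details.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal in-memory property graph: labeled nodes and typed, directed
// relationships, both carrying name-value properties.
class PropertyGraphDemo {
    final List<GNode> nodes = new ArrayList<>();
    final List<GRel> rels = new ArrayList<>();

    GNode addNode(String... labels) {
        GNode n = new GNode(nodes.size(), labels);
        nodes.add(n);
        return n;
    }

    GRel relate(GNode from, String type, GNode to) {
        GRel r = new GRel(type, from, to);
        rels.add(r);
        return r;
    }

    // Rebuilds part of the slide's example (dates kept as plain strings).
    static PropertyGraphDemo example() {
        PropertyGraphDemo g = new PropertyGraphDemo();
        GNode dan = g.addNode("Person");
        dan.props.put("name", "Dan");
        dan.props.put("born", "May 29, 1970");
        dan.props.put("twitter", "@dan");
        GNode ann = g.addNode("Person");
        ann.props.put("name", "Ann");
        ann.props.put("born", "Dec 5, 1975");
        GNode car = g.addNode("Car"); // left unconnected: the link is not recoverable
        car.props.put("brand", "Volvo");
        car.props.put("model", "V70");
        g.relate(dan, "LOVES", ann);
        g.relate(ann, "LOVES", dan);
        g.relate(dan, "LIVES_WITH", ann); // direction chosen for illustration
        return g;
    }

    public static void main(String[] args) {
        for (GRel r : example().rels) {
            System.out.println(r.start.props.get("name") + " -[:" + r.type + "]-> "
                    + r.end.props.get("name"));
        }
    }
}

class GNode {
    final long id;
    final Set<String> labels = new LinkedHashSet<>();
    final Map<String, Object> props = new LinkedHashMap<>();

    GNode(long id, String... labels) {
        this.id = id;
        for (String label : labels) this.labels.add(label);
    }
}

class GRel {
    final String type;
    final GNode start, end; // direction: start -> end
    final Map<String, Object> props = new LinkedHashMap<>();

    GRel(String type, GNode start, GNode end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}
```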
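37. Cypher: Powerful and Expressive Query Language
MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )
[Diagram: the pattern annotated with its parts: two nodes, each with a label (Person) and a property (name), joined by the LOVES relationship from Dan to Ann]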
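As a rough illustration of what this single-hop pattern asks for, the following hand-written Java sketch (Java 17, hypothetical names `PatternMatchDemo`, `N`, `R`) filters a list of relationships by type, label and property. It is not how Neo4j's planner or runtime evaluate `MATCH`; it only shows the semantics of the pattern on a toy graph.

```java
import java.util.List;
import java.util.Map;

class PatternMatchDemo {
    // A node with one label and name-value properties (simplified).
    record N(String label, Map<String, Object> props) {}

    // A typed relationship directed from start to end.
    record R(N start, String type, N end) {}

    // True if the node has the label and contains every required property value.
    static boolean nodeMatches(N n, String label, Map<String, Object> required) {
        return n.label().equals(label)
                && n.props().entrySet().containsAll(required.entrySet());
    }

    // Returns the relationships matching (:l1 p1)-[:type]->(:l2 p2).
    static List<R> match(List<R> rels, String l1, Map<String, Object> p1,
                         String type, String l2, Map<String, Object> p2) {
        return rels.stream()
                .filter(r -> r.type().equals(type))
                .filter(r -> nodeMatches(r.start(), l1, p1))
                .filter(r -> nodeMatches(r.end(), l2, p2))
                .toList();
    }

    public static void main(String[] args) {
        N dan = new N("Person", Map.of("name", "Dan"));
        N ann = new N("Person", Map.of("name", "Ann"));
        List<R> rels = List.of(new R(dan, "LOVES", ann), new R(ann, "LOVES", dan));
        List<R> hits = match(rels, "Person", Map.of("name", "Dan"), "LOVES",
                "Person", Map.of("name", "Ann"));
        System.out.println(hits.size() + " match(es)");
    }
}
```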
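38. Relational Versus Graph Models
[Diagram: in the relational model, Andreas, Tobias, Mica and Delia are rows in Person and Friend tables linked through a Person-Friend join table; in the graph model, the same four people are nodes connected directly by KNOWS relationships]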
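The difference between the two models can be sketched in Java (Java 17, hypothetical class names). The relational version finds a person's friends through a join table plus a second lookup into the person table; the graph version follows references stored on the node itself. The friendships in `main` are made-up sample data, since the slide does not spell out the edges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RelationalVsGraph {
    // Relational shape: a Person table and a Person-Friend join table.
    record PersonRow(int id, String name) {}
    record FriendRow(int personId, int friendId) {}

    // Graph shape: each node keeps direct references to the nodes it KNOWS.
    static final class PersonNode {
        final String name;
        final List<PersonNode> knows = new ArrayList<>();
        PersonNode(String name) { this.name = name; }
    }

    // Relational lookup: find the person's id, scan the join table,
    // then map each friend id back to a name (two joins).
    static List<String> friendsRelational(List<PersonRow> person,
                                          List<FriendRow> personFriend, String name) {
        Map<Integer, String> nameById = new HashMap<>();
        int id = -1;
        for (PersonRow p : person) {
            nameById.put(p.id(), p.name());
            if (p.name().equals(name)) id = p.id();
        }
        List<String> result = new ArrayList<>();
        for (FriendRow f : personFriend) {
            if (f.personId() == id) result.add(nameById.get(f.friendId()));
        }
        return result;
    }

    // Graph lookup: follow the stored references; no join table involved.
    static List<String> friendsGraph(PersonNode start) {
        List<String> result = new ArrayList<>();
        for (PersonNode friend : start.knows) result.add(friend.name);
        return result;
    }

    public static void main(String[] args) {
        // Made-up friendships for illustration only.
        List<PersonRow> person = List.of(new PersonRow(1, "Andreas"), new PersonRow(2, "Tobias"),
                new PersonRow(3, "Mica"), new PersonRow(4, "Delia"));
        List<FriendRow> personFriend = List.of(new FriendRow(1, 2), new FriendRow(1, 4),
                new FriendRow(2, 3));
        System.out.println(friendsRelational(person, personFriend, "Andreas"));

        PersonNode andreas = new PersonNode("Andreas");
        PersonNode tobias = new PersonNode("Tobias");
        PersonNode mica = new PersonNode("Mica");
        PersonNode delia = new PersonNode("Delia");
        andreas.knows.add(tobias);
        andreas.knows.add(delia);
        tobias.knows.add(mica);
        System.out.println(friendsGraph(andreas));
    }
}
```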
51. You all know SQL
SELECT distinct c.CompanyName
FROM customers AS c
JOIN orders AS o
ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od
ON (o.OrderID = od.OrderID)
JOIN products AS p
ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Chocolat'
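55. Basic Pattern: Customer's Orders?
MATCH (:Customer {customerName:"Delicatessen"} ) -[:ORDERED]-> (order:Order) RETURN order
[Diagram: the pattern annotated with its parts: a Customer node with label and property, the ORDERED relationship, and an Order node bound to the variable order]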
56. Basic Query: Customer's Orders?
MATCH (c:Customer)-[:ORDERED]->(order)
WHERE c.customerName = 'Delicatessen'
RETURN *
57. Basic Query: Customer's Frequent Purchases?
MATCH (c:Customer)-[:ORDERED]->
()-[:INCLUDES]->(p:Product)
WHERE c.customerName = 'Delicatessen'
RETURN p.productName, count(*) AS freq
ORDER BY freq DESC LIMIT 10;
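For comparison with the Cypher above, here is a hypothetical plain-Java version of the same aggregation over an in-memory list of one customer's orders (class name `FrequentPurchases` is illustrative). Product names are made-up sample data, and ties are broken by name, an assumption the Cypher query does not make.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequentPurchases {
    // Each inner list stands for one Order the customer ORDERED; its entries
    // are the names of the Products that order INCLUDES.
    static List<Map.Entry<String, Integer>> topProducts(List<List<String>> orders, int limit) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> order : orders) {
            for (String product : order) {
                freq.merge(product, 1, Integer::sum); // count(*) per product
            }
        }
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(freq.entrySet());
        // ORDER BY freq DESC, then by name so ties come out in a stable order
        rows.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return rows.subList(0, Math.min(limit, rows.size())); // LIMIT
    }

    public static void main(String[] args) {
        List<List<String>> orders = List.of(
                List.of("Chai", "Tofu"),
                List.of("Chai"),
                List.of("Tofu", "Ikura", "Chai"));
        System.out.println(topProducts(orders, 10)); // [Chai=3, Tofu=2, Ikura=1]
    }
}
```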
61. openCypher...
...is a community effort to evolve Cypher and to make it the most useful language for querying property graphs
openCypher implementations: SAP HANA Graph, Redis, Agens Graph, Cypher.PL, Neo4j
62. github.com/opencypher Language Artifacts
● Cypher 9 specification
● ANTLR and EBNF Grammars
● Formal Semantics (SIGMOD)
● TCK (Cucumber test suite)
● Style Guide
Implementations & Code
● openCypher for Apache Spark
● openCypher for Gremlin
● open source frontend (parser)
● ...
63. Cypher 10
● Next version of Cypher
● Actively working on natural language specification
● New features
○ Subqueries
○ Multiple graphs
○ Path patterns
○ Configurable pattern matching semantics
65. Extending Neo4j: User Defined Procedures & Functions
[Diagram: applications connect over Bolt to the Neo4j execution engine, which runs user defined procedures and user defined functions]
User Defined Procedures & Functions let you write custom code that is:
• Written in any JVM language
• Deployed to the database
• Accessed by applications via Cypher
69. The Impact of Connected Data
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions”
70. Existing Options (so far)
• Data Processing: Spark with GraphX, Flink with Gelly, Gremlin Graph Computer
• Dedicated Graph Processing: Urika, GraphLab, Giraph, Mosaic, GPS, Signal-Collect, Gradoop
• Data Scientist Toolkit: igraph, NetworkX, Boost in Python, R, C
72. Goal: Iterate Quickly
• Combine data from sources into one graph
• Project to relevant subgraphs
• Enrich data with algorithms
• Traverse, collect, filter and aggregate with queries
• Visualize, explore, decide, export
• From all APIs and tools
74. Usage
1. Call as a Cypher procedure
2. Pass in the specification (label, property, query) and configuration
3. The ~.stream variant returns results (potentially a lot of them):
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
4. The non-stream variant writes results to the graph and returns statistics:
CALL algo.<name>('Label','TYPE',{conf})
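The difference between the two variants can be sketched in plain Java (Java 17). This is not the graph algorithms library's code: the class and method names are hypothetical, PageRank is used only as an example algorithm (20 iterations, damping 0.85), and rank held by nodes without outgoing relationships is simply not redistributed. `stream` hands every (nodeId, score) row back to the caller, while `write` stores each score as a node property and returns only summary statistics.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.IntStream;

class AlgoStreamVsWrite {
    record Row(int nodeId, double score) {}
    record Stats(int nodesWritten, double maxScore) {}

    // Simplified PageRank over edges {source, target} on node ids 0..n-1.
    static double[] pageRank(List<int[]> edges, int n, int iterations, double damping) {
        int[] outDegree = new int[n];
        for (int[] e : edges) outDegree[e[0]]++;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n);
            for (int[] e : edges) next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            rank = next;
        }
        return rank;
    }

    // "stream" style: return every (nodeId, score) row to the caller.
    static List<Row> stream(List<int[]> edges, int n) {
        double[] r = pageRank(edges, n, 20, 0.85);
        return IntStream.range(0, n).mapToObj(i -> new Row(i, r[i])).toList();
    }

    // "write" style: store each score as a node property, return only statistics.
    static Stats write(List<int[]> edges, int n,
                       Map<Integer, Map<String, Object>> nodeProps, String property) {
        double[] r = pageRank(edges, n, 20, 0.85);
        double max = 0;
        for (int i = 0; i < n; i++) {
            nodeProps.computeIfAbsent(i, k -> new HashMap<>()).put(property, r[i]);
            max = Math.max(max, r[i]);
        }
        return new Stats(n, max);
    }

    public static void main(String[] args) {
        List<int[]> edges = List.of(new int[]{0, 1}, new int[]{1, 2},
                new int[]{2, 0}, new int[]{0, 2});
        stream(edges, 3).forEach(System.out::println);
        Map<Integer, Map<String, Object>> props = new HashMap<>();
        System.out.println(write(edges, 3, props, "pagerank"));
    }
}
```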
75. Cypher Projection
Pass in Cypher statements for the node list and the relationship list:
CALL algo.<name>(
  'MATCH ... RETURN id(n)',
  'MATCH (n)-->(m)
   RETURN id(n) as source, id(m) as target', {graph:'cypher'})
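Conceptually, a Cypher projection turns two result sets, node ids and (source, target) id pairs, into an in-memory graph the algorithm can run on. A minimal Java sketch of that step, assuming database ids are remapped to dense indexes 0..n-1; the names are hypothetical, and dropping relationships whose endpoints are missing from the node list is this sketch's choice, not documented library behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CypherProjectionSketch {
    // Compact projected graph: database id -> dense index, plus adjacency lists.
    static final class Projected {
        final Map<Long, Integer> indexOf = new HashMap<>();
        final List<List<Integer>> out = new ArrayList<>();
    }

    // nodeIds plays the role of the first query's rows (RETURN id(n));
    // rels plays the role of the second query's (source, target) rows.
    static Projected project(List<Long> nodeIds, List<long[]> rels) {
        Projected g = new Projected();
        for (long id : nodeIds) {
            if (!g.indexOf.containsKey(id)) {
                g.indexOf.put(id, g.out.size());
                g.out.add(new ArrayList<>());
            }
        }
        for (long[] r : rels) {
            Integer s = g.indexOf.get(r[0]);
            Integer t = g.indexOf.get(r[1]);
            if (s != null && t != null) g.out.get(s).add(t); // skip dangling endpoints
        }
        return g;
    }

    public static void main(String[] args) {
        List<Long> nodeIds = List.of(10L, 42L, 7L);
        List<long[]> rels = List.of(new long[]{10, 42}, new long[]{42, 7}, new long[]{7, 99});
        Projected g = project(nodeIds, rels);
        System.out.println(g.out); // [[1], [2], []]
    }
}
```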
78. Neo4j Fits into Your Environment
[Diagram components: Application; Graph Database Cluster (Neo4j, Neo4j, Neo4j); Data Storage and Business Rules Execution; Data Mining and Aggregation; Ad Hoc Analysis; Bulk Analytic Infrastructure (Graph Compute Engine, EDW, …); Data Scientist; End User; Databases (Relational, NoSQL, Hadoop)]
79. Official Language Drivers
• Foundational drivers for popular programming languages
• Bolt: streaming binary wire protocol
• Authoritative mapping to native type system, uniform across drivers
• Pluggable into richer frameworks
Drivers: JavaScript, Java, .NET, Python, PHP, ... (all speaking Bolt)
80. Bolt + Official Language Drivers
http://neo4j.com/developer/ http://neo4j.com/developer/language-guides/
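81. Using Bolt: The Official Language Drivers All Look the Same
With JavaScript (neo4j-driver package):
const neo4j = require('neo4j-driver');
const driver = neo4j.driver('bolt://localhost', neo4j.auth.basic('neo4j', 'password')); // placeholder credentials
const session = driver.session();
session.run('MATCH (u:User) RETURN u.name AS name')
  .then(result => result.records.forEach(r => console.log(r.get('name'))))
  .finally(() => session.close().then(() => driver.close()));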
82. neo4j.com/developer/spring-data-neo4j
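Spring Data Neo4j / Neo4j OGM
import java.util.HashSet;
import java.util.Set;
import org.neo4j.ogm.annotation.GeneratedValue;
import org.neo4j.ogm.annotation.Id;
import org.neo4j.ogm.annotation.NodeEntity;
import org.neo4j.ogm.annotation.Relationship;

@NodeEntity
public class Talk {
    @Id @GeneratedValue
    Long id;
    String title;
    // Slot, Track and Person are entity classes defined elsewhere
    Slot slot;
    Track track;
    @Relationship(type = "PRESENTS", direction = Relationship.INCOMING)
    Set<Person> speaker = new HashSet<>();
}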
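83. Spring Data Neo4j / Neo4j OGM
import java.util.List;
import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.repository.query.Param;

interface TalkRepository extends Neo4jRepository<Talk, Long> {
    // id(t) matches the internal id that @Id @GeneratedValue maps to
    @Query("MATCH (t:Talk)<-[rating:RATED]-(user) " +
           "WHERE id(t) = $talkId RETURN rating")
    List<Rating> getRatings(@Param("talkId") Long talkId);
    List<Talk> findByTitleContaining(String title);
}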
88. Neo4j: Graph Platform
Real-time Transactional and Analytic Processing
• Operational workloads
• Analytics workloads
Discovery and Visualization
• Interactive graph exploration
• Graph representation of data
Agility
• Native property graph model
• Dynamic schema
Developer Productivity
• Cypher: declarative query language
• Procedural language extensions
• Worldwide developer community
Hardware Efficiency
• 10x less CPU with index-free adjacency
• 10x less hardware than other platforms
Performance
• Index-free adjacency
• Millions of hops per second
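90. Native Graph Architecture
Index-free adjacency ensures lightning-fast retrieval of data and relationships.
Index-free adjacency: unlike other database models, Neo4j connects data as it is stored, so a traversal follows stored references from node to node instead of looking up each hop in an index.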
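A simplified model of index-free adjacency in Java: each node record stores the id of its first relationship, and each relationship record stores the id of the next relationship in the same node's chain, so finding neighbors means chasing stored ids rather than consulting an index. This is an illustrative sketch with hypothetical names and fixed capacities; Neo4j's real store format is more involved (for example, its relationship chains cover both directions).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexFreeAdjacency {
    final int[] firstRel; // per node: id of its first relationship, -1 if none
    final int[] relEnd;   // per relationship: end node id
    final int[] relNext;  // per relationship: next relationship of the same start node
    int relCount = 0;

    // Fixed capacities keep the sketch short; no bounds handling.
    IndexFreeAdjacency(int nodeCapacity, int relCapacity) {
        firstRel = new int[nodeCapacity];
        Arrays.fill(firstRel, -1);
        relEnd = new int[relCapacity];
        relNext = new int[relCapacity];
    }

    // Prepends the relationship to the start node's chain: O(1), no index to update.
    int addRel(int start, int end) {
        int id = relCount++;
        relEnd[id] = end;
        relNext[id] = firstRel[start];
        firstRel[start] = id;
        return id;
    }

    // Walks the chain by following stored ids; the cost depends on this
    // node's degree, not on how many relationships the whole store holds.
    List<Integer> neighbours(int node) {
        List<Integer> result = new ArrayList<>();
        for (int r = firstRel[node]; r != -1; r = relNext[r]) result.add(relEnd[r]);
        return result;
    }

    public static void main(String[] args) {
        IndexFreeAdjacency store = new IndexFreeAdjacency(4, 8);
        store.addRel(0, 1);
        store.addRel(0, 2);
        store.addRel(1, 3);
        System.out.println(store.neighbours(0)); // [2, 1]: most recently added first
    }
}
```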
91. Neo4j Query Planner
Cost-based query planner since Neo4j
• Uses transactional database statistics
• High-performance query engine
• Bytecode-compiled queries
• Future: parallelism
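To illustrate what "cost based" means, here is a toy Java planner that picks between scanning all nodes with a label and seeking through an index, based on statistics. `NodeByLabelScan` and `NodeIndexSeek` are operator names that appear in Neo4j execution plans, but the statistics record, class names and cost formulas here are invented for illustration and do not reflect Neo4j's actual cost model.

```java
class CostBasedPlannerSketch {
    // Invented statistics: nodes carrying the label, fraction of them that
    // match the predicate, and whether an index exists on the property.
    record Stats(long labelCount, double selectivity, boolean hasIndex) {}

    record Plan(String name, double estimatedCost) {}

    // Picks the cheaper of two candidate plans. Cost units are arbitrary.
    static Plan choose(Stats s) {
        // Scan: touch every labeled node, then filter.
        Plan best = new Plan("NodeByLabelScan + Filter", s.labelCount());
        if (s.hasIndex()) {
            double expectedRows = s.labelCount() * s.selectivity();
            double seekCost = 10.0 + expectedRows; // fixed index overhead + matching rows
            if (seekCost < best.estimatedCost()) best = new Plan("NodeIndexSeek", seekCost);
        }
        return best;
    }

    public static void main(String[] args) {
        // One matching node among a million: the index seek wins.
        System.out.println(choose(new Stats(1_000_000, 1e-6, true)));
        // Only five labeled nodes: scanning them beats the index overhead.
        System.out.println(choose(new Stats(5, 0.5, true)));
    }
}
```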
92. Architecture Components
• Index-Free Adjacency: in memory and on flash/disk
• ACID Foundation: required for safe writes
• Full-Stack Clustering: causal consistency
• Security
• Language, Drivers, Tooling: developer experience, graph efficiency
• Graph Engine: cost-based optimizer, graph statistics, Cypher runtime
• Hardware Optimizations: for next-gen infrastructure
93. Neo4j allows you to connect the dots
• Built to efficiently store, query, and manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable with few machines
94. High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second per core
• Cost-based query optimizer: complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
97. How do I get it? Desktop – Container – Cloud
http://neo4j.com/download/
docker run -p 7474:7474 -p 7687:7687 neo4j
98. Neo4j Cluster Deployment Options
• Developer: Neo4j Desktop (free Enterprise license)
• On premise: standalone or via OS package
• Containerized with the official Docker image
• In the cloud: AWS, GCE, Azure
• Using resource managers:
  • DC/OS (Marathon)
  • Kubernetes
  • Docker Swarm
99. Active Community
• Downloads: 10M+ (3M+ from the Neo4j distribution, 7M+ from Docker)
• Events: approximately 400+ Neo4j events per year
• Meetups: 50k+ meetup members globally
• Trained Developers: 50k+ trained/certified Neo4j professionals
100. Summary: Graphs allow you to ...
• Keep your rich data model
• Handle relationships efficiently
• Write queries easily
• Develop applications quickly
• Have fun
104. Causal Clustering - Features
• Two zones: Core + Edge
• Group of Core servers: consistent and partition tolerant (CP)
• Transactional writes
• Quorum writes, cluster membership, and leader election via Raft consensus
• Scale out with read replicas
• Smart Bolt drivers with
  • Routing, read & write sessions
  • Causal consistency with bookmarks
105. Core Servers and Read Replicas
Core
• Small group of Neo4j databases
• Fault-tolerant consensus commit
• Responsible for data safety
Replica
• Read-only replicas
• For massive query throughput
• Not involved in consensus commit
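The routing behavior of the smart drivers can be sketched as follows: write sessions go to the Core leader, read sessions are spread over read-capable servers. This is a hypothetical Java sketch (Java 17) with a hard-coded routing table and simple round-robin; the real drivers obtain the routing table from the cluster, and their load-balancing strategy may differ. Server addresses are placeholders.

```java
import java.util.List;

class RoutingSketch {
    enum AccessMode { READ, WRITE }

    // Hypothetical routing table: the current Core leader plus the servers
    // that may serve reads (typically Read Replicas).
    record RoutingTable(String leader, List<String> readers) {}

    static final class Router {
        private final RoutingTable table;
        private int next = 0; // index of the next reader to use

        Router(RoutingTable table) { this.table = table; }

        // Writes must reach the leader; reads rotate round-robin over readers.
        String route(AccessMode mode) {
            if (mode == AccessMode.WRITE) return table.leader();
            String server = table.readers().get(next);
            next = (next + 1) % table.readers().size();
            return server;
        }
    }

    public static void main(String[] args) {
        Router router = new Router(new RoutingTable("core1:7687",
                List.of("replica1:7687", "replica2:7687")));
        System.out.println(router.route(AccessMode.WRITE)); // core1:7687
        System.out.println(router.route(AccessMode.READ));  // replica1:7687
        System.out.println(router.route(AccessMode.READ));  // replica2:7687
    }
}
```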
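106. Writing to the Core Cluster: Raft Consensus Commits
[Diagram: the Neo4j driver sends a write to the Neo4j cluster; after the Core members acknowledge it (✓ ✓ ✓), the driver receives Success]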
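The commit rule behind the acknowledgements is majority arithmetic, sketched below in Java. This models only the quorum condition, not Raft itself (no terms, elections or log matching), and the class and method names are hypothetical.

```java
class QuorumCommitSketch {
    // Smallest number of Core members that forms a majority.
    static int majority(int coreMembers) {
        return coreMembers / 2 + 1;
    }

    // The leader counts its own durable append; each follower that has
    // durably appended the entry adds one acknowledgement.
    static boolean committed(int coreMembers, int followerAcks) {
        return 1 + followerAcks >= majority(coreMembers);
    }

    // How many Core members can fail while a majority can still be formed.
    static int faultTolerance(int coreMembers) {
        return coreMembers - majority(coreMembers);
    }

    public static void main(String[] args) {
        for (int n : new int[]{3, 5, 7}) {
            System.out.printf("%d cores: majority %d, tolerates %d failure(s)%n",
                    n, majority(n), faultTolerance(n));
        }
        System.out.println(committed(3, 1)); // true: 2 of 3
        System.out.println(committed(5, 1)); // false: 2 of 5
    }
}
```

With three Core members a write survives the loss of one server; five members tolerate two failures, which is why Core groups are usually kept small and odd-sized.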
109. Bookmark
• Session token
• String (for portability)
• Opaque to the application
• Represents the end user’s most recent view of the graph
• More capabilities to come
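A rough Java sketch (Java 17) of how a bookmark gives causal consistency: the client carries a marker of its last write, and a server only answers once it has applied at least that much. Modeling the bookmark as a transaction id is an assumption for illustration (real bookmarks are opaque strings), the class names are hypothetical, and here the server just reports "not yet" where a real server might instead wait for replication to catch up.

```java
import java.util.Optional;

class BookmarkSketch {
    // Assumption for illustration only: the bookmark carries the id of the
    // last transaction the client committed.
    record Bookmark(long lastTxId) {}

    static final class Server {
        long appliedTxId = 0; // highest transaction this server has applied

        // Serve the read only once this server has caught up with the bookmark;
        // an empty result means "not yet": wait or try another server.
        Optional<String> read(Bookmark bookmark, String value) {
            if (appliedTxId < bookmark.lastTxId()) return Optional.empty();
            return Optional.of(value);
        }
    }

    public static void main(String[] args) {
        Server replica = new Server();
        Bookmark afterWrite = new Bookmark(42); // handed out after a write committed as tx 42
        System.out.println(replica.read(afterWrite, "profile").isPresent()); // false
        replica.appliedTxId = 42;               // replication catches up
        System.out.println(replica.read(afterWrite, "profile").isPresent()); // true
    }
}
```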
111. Neo4j 3.0 High Availability Cluster vs. Neo4j 3.1 Causal Cluster
• Architecture: master-slave (3.0) vs. a two-part cluster of a writeable Core and read-only Read Replicas (3.1)
• Consensus: Paxos used for master election (3.0) vs. Raft used for leader election, membership changes and commitment of all transactions (3.1)
• Commit: a transaction is committed once written durably on the master (3.0) vs. once written durably on a majority of the Core members (3.1)
• Practical deployments: tens of servers (3.0) vs. hundreds of servers (3.1)
127. Case study: Solving real-time recommendations for the world’s largest retailer
Challenge
• In its drive to provide the best web experience for its customers, Walmart wanted to optimize its online recommendations.
• Walmart recognized the challenge it faced in delivering recommendations with traditional relational database technology.
Use of Neo4j
• Walmart uses Neo4j to quickly query customers’ past purchases and to instantly capture any new interests shown during a customer’s current online visit, which is essential for making real-time recommendations.
“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”
- Marcos Vada, Walmart
Result/Outcome
• With Neo4j, Walmart could replace a heavy batch process with a simple, real-time graph database.
128. Case study: eBay Now Tackles eCommerce Delivery Service Routing with Neo4j
Challenge
• The queries used to select the best courier for eBay’s routing system were simply taking too long, and eBay needed a solution to maintain a competitive service.
• The MySQL joins being used made the code base too slow and too complex to maintain.
Use of Neo4j
• eBay now uses Neo4j’s graph database platform to redefine e-commerce by making delivery of online and mobile orders quick and convenient.
Result/Outcome
• With Neo4j, eBay removed the biggest roadblock between retailers and online shoppers by offering the option to have items delivered the same day.
• The schema-flexible nature of the database allowed easy extensibility, speeding up development.
• The Neo4j solution was more than 1000x faster than the prior MySQL solution.
“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code.”
– Volker Pacher, eBay
129. Case study: Solving real-time promotions for a top-tier US retailer
Challenge
• The retailer suffered significant revenue loss due to legacy infrastructure.
• This was particularly challenging when handling transaction volumes on peak shopping occasions such as Thanksgiving and Cyber Monday.
Use of Neo4j
• The retailer uses Neo4j to reinvent its real-time promotions engine.
• On average, Neo4j processes 90% of this retailer’s 35M+ daily transactions, each 3-22 hops, in 4ms or less.
Result/Outcome
• Online revenues reached an all-time high thanks to the friction-free, Neo4j-based solution.
• Neo4j also enabled the company to be one of the first retailers to provide the same promotions across both online and traditional retail channels.
130. Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships
without complexity
• Performance degrades with number and levels
of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships
requires schema redesign, increasing time to
market
… making traditional databases inappropriate when data relationships are valuable in real time
Result: slow development, poor performance, low scalability, hard to maintain
131. Unlocking Value from Your Data Relationships
• Model your data as a graph of data
and relationships
• Use relationship information in real-
time to transform your business
• Add new relationships on the fly to
adapt to your changing business