Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However, modeling data as graphs is quite different from modeling data in a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
Data Modeling with Neo4j
1. Data Modeling with Neo4j
Michael Hunger, Neo Technology
@neo4j | michael@neo4j.org
Thanks to: Ian Robinson, Mark Needham, Alistair Jones
Saturday, 31 August 2013
2. Please ask questions in the chat
I'll answer at the end.
A follow-up email will include missing answers, video and slides.
4. This Webinar
๏Graphs are everywhere
๏Graph Model Building Blocks
๏(NOSQL) Data Models
๏Designing a Data Model
๏Embrace the Paradigm
21. Nodes
๏ Used to represent entities in your domain
๏ Can contain properties
• Used to represent entity attributes and/or metadata
(e.g. timestamps, version)
• Key-value pairs
‣Java primitives
‣Arrays
‣null is not a valid value
• Every node can have different properties
23. Relationships
๏ Every relationship has a name and a direction
• Add structure to the graph
• Provide semantic context for nodes
๏ Can contain properties
• Used to represent quality or weight of relationship,
or metadata
๏ Every relationship must have a start node and end node
• No dangling relationships
24. Relationships (continued)
๏ Nodes can have more than one relationship
๏ Self relationships are allowed
๏ Nodes can be connected by more than one relationship
25. Variable Structure
๏ Relationships are defined with regard to node
instances, not classes of nodes
• Different nodes can be connected in different ways
• Allows for structural variation in the domain
• Contrast with relational schemas, where foreign key
relationships apply to all rows in a table
27. Labels
๏ Every node can have zero or more labels attached
๏ Used to represent roles (e.g. user, product, company)
• Group nodes
• Allow us to associate indexes and constraints with
groups of nodes
28. Four Building Blocks
๏ Nodes
• Entities
๏ Relationships
• Connect entities and structure domain
๏ Properties
• Attributes and metadata
๏ Labels
• Group nodes by role
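The four building blocks can be sketched as plain data structures. This is a minimal in-memory illustration, not how Neo4j stores data; the class names `Node` and `Relationship`, the `MENTORS` type and the sample values are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity: zero or more labels plus key-value properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed connection that may carry its own properties."""
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # No dangling relationships: both ends must be real nodes.
        if not isinstance(self.start, Node) or not isinstance(self.end, Node):
            raise ValueError("a relationship needs a start node and an end node")


# Hypothetical sample data; names and values are invented for illustration.
alice = Node({"Person"}, {"name": "Alice", "age": 34})
acme = Node({"Company"}, {"name": "Acme"})
works_for = Relationship("WORKS_FOR", alice, acme, {"since": 2010})
mentors_self = Relationship("MENTORS", alice, alice)  # self relationships are allowed
```

Every node carries its own property dictionary, so two nodes with the same label are free to have different properties.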
31. Aggregate Oriented Model
“There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. The advantage of not using an aggregate structure in the database is that it allows you to slice and dice your data different ways for different audiences.
This is why aggregate-oriented stores talk so much about map-reduce.”
Martin Fowler
32. Connected Data Model
The connected data model is based on fine-grained elements that are richly connected; the emphasis is on extracting many dimensions and attributes as elements.
Connections are cheap and can be used not only for the domain-level relationships but also for additional structures that allow efficient access for different use-cases. The fine-grained model requires an external scope for mutating operations that ensures Atomicity, Consistency, Isolation and Durability (ACID), also known as transactions.
Michael Hunger
54. Method
1. Identify application/end-user goals
2. Figure out what questions to ask of the domain
3. Identify entities in each question
4. Identify relationships between entities in each
question
5. Convert entities and relationships to paths
These become the basis of the data model
6. Express questions as graph patterns
These become the basis for queries
55. From User Story to Model and Query
1. User story
2. Questions we want to ask
3. Entities and relationships
4. Paths
5. Data model
6. Query
56. 1. Application/End-User Goals
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
57. 2. Questions To Ask of the Domain
Which people, who work for the same
company as me, have similar skills to me?
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
58. 3. Identify Entities
Which people, who work for the same
company as me, have similar skills to me?
Person
Company
Skill
59. 4. Identify Relationships Between Entities
Which people, who work for the same
company as me, have similar skills to me?
Person WORKS_FOR Company
Person HAS_SKILL Skill
62. 5. Convert to Cypher Paths
Person WORKS_FOR Company
Person HAS_SKILL Skill
(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)
Labels: Person, Company, Skill
Relationships: WORKS_FOR, HAS_SKILL
67. 6. Express Question as Graph Pattern
Which people, who work for the same
company as me, have similar skills to me?
68. Cypher Query
Which people, who work for the same
company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
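To make the logic of the query above concrete, here is a rough Python equivalent over in-memory dictionaries. It is a sketch of what the pattern computes, not of how Neo4j executes it; the people, companies and skills are hypothetical sample data, and `similar_colleagues` is a name invented for this example.

```python
# Hypothetical sample data standing in for the graph.
works_for = {"Ann": "Acme", "Bob": "Acme", "Cid": "Acme", "Dee": "Initech"}
has_skill = {
    "Ann": {"Java", "Neo4j", "Ruby"},
    "Bob": {"Java", "Neo4j"},
    "Cid": {"Ruby"},
    "Dee": {"Java", "Neo4j", "Ruby"},
}


def similar_colleagues(name):
    """Return (colleague, score, shared skills) rows, highest score first."""
    rows = []
    for colleague, company in works_for.items():
        if colleague == name or company != works_for[name]:
            continue  # must be someone else at the same company
        shared = sorted(has_skill[name] & has_skill[colleague])
        if shared:  # the pattern only matches when a skill is shared
            rows.append((colleague, len(shared), shared))
    # Mirrors ORDER BY score DESC
    return sorted(rows, key=lambda row: row[1], reverse=True)


result = similar_colleagues("Ann")
```

Dee shares all three skills with Ann but works for a different company, so she does not appear in the result.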
69. Which people, who work for the same
company as me, have similar skills to me?
Graph Pattern:
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
Anchor Pattern in Graph:
WHERE me.name = {name}
Create Projection of Results:
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
76. From User Story to Model and Query
User story:
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
Question:
Which people, who work for the same company as me, have similar skills to me?
Entities and relationships:
Person WORKS_FOR Company
Person HAS_SKILL Skill
Path:
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
Query:
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
81. Anti-Pattern: Node represents multiple concepts
Person: name, age, position, company, department, project, skills
82. Normalize into separate concepts
Person: name, age
Company: name, number_of_employees
Skill: name
(:Person)-[:WORKS_FOR]->(:Company)
(:Person)-[:HAS_SKILL]->(:Skill)
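A small sketch of this normalization step, assuming the "fat" Person records arrive as dictionaries. The helper `merge_node` and the sample records are hypothetical; the point is only that a company or skill shared by several people becomes one node referenced by several relationships.

```python
# Hypothetical "fat" person records, as in the anti-pattern.
people = [
    {"name": "Ann", "age": 34, "company": "Acme", "skills": ["Java", "Neo4j"]},
    {"name": "Bob", "age": 41, "company": "Acme", "skills": ["Neo4j"]},
]

nodes = {}          # (label, name) -> properties
relationships = []  # (start key, type, end key)


def merge_node(label, name, **props):
    """Create the node once and reuse it on later calls."""
    key = (label, name)
    nodes.setdefault(key, {"name": name}).update(props)
    return key


for record in people:
    person = merge_node("Person", record["name"], age=record["age"])
    company = merge_node("Company", record["company"])
    relationships.append((person, "WORKS_FOR", company))
    for skill_name in record["skills"]:
        skill = merge_node("Skill", skill_name)
        relationships.append((person, "HAS_SKILL", skill))
```

After the loop, "Acme" and "Neo4j" each exist once, and both people point at them.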
83. Challenge: Property or Relationship?
๏ Can every property be replaced by a relationship?
• Hint: triple stores. Are they easy to use?
๏ Should every entity with the same property values be
connected?
84. Object Mapping
๏ Similar to how you would map objects to a relational
database, using an ORM such as Hibernate
๏ Generally simpler and easier to reason about
๏ Examples
• Java: Spring Data Neo4j
• Ruby: Active Model
๏ Why Map?
• Do you use mapping because you are scared of SQL?
• Following DDD, could you write your repositories
directly against the graph API?
86. Relationships for querying
๏ As in other databases
• the same structure for different use-cases (OLTP and
OLAP) doesn't work
• a graph lets you add more structures
๏ Relationships should be the primary means to access
nodes in the database
๏ Traversing relationships is cheap – that’s the whole
design goal of a graph database
๏ Use lookups only to find starting nodes for a query
Data Modeling examples in Manual
95. Evolution: Relationship to Node
Before: Peter SENT_EMAIL Michael
After: an Email node linked to Peter (EMAIL_FROM), Michael (EMAIL_TO), Emil (EMAIL_CC), Community (TAGGED), ...
see Hyperedges
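The evolution from a relationship to a node might look like this in a toy edge-list representation. The function `email_as_node`, the id `email-1` and the edge directions are assumptions made for illustration; the slide itself only names the relationship types.

```python
# Before: one relationship per email, with no place for further participants.
before = [("Peter", "SENT_EMAIL", "Michael")]


def email_as_node(email_id, sender, to, cc=(), tags=()):
    """Return edges written as (email, type, other end); directions are assumed."""
    edges = [(email_id, "EMAIL_FROM", sender)]
    edges += [(email_id, "EMAIL_TO", person) for person in to]
    edges += [(email_id, "EMAIL_CC", person) for person in cc]
    edges += [(email_id, "TAGGED", tag) for tag in tags]
    return edges


# After: the email is a node of its own and can connect any number of things.
after = email_as_node("email-1", "Peter", ["Michael"], cc=["Emil"], tags=["Community"])
```

Once the email is a node, adding a further recipient or tag is one more edge rather than a change to the model.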
96. Combine multiple Domains in a Graph
๏ You start with a single domain
๏ Add more connected domains as your system evolves
๏ More domains let you ask different queries
๏ One domain "indexes" the other
๏ Example: Facebook Graph Search
• social graph
• location graph
• activity graph
• favorite graph
• ...
97. Notes on the Graph Data Model
๏ Schema free, but with constraints
๏ Model your graph with a whiteboard and a wise man
๏ Nodes are the main entities, but useless without connections
๏ Relationships are first-class citizens in the model and database
๏ Normalize more than in a relational database
๏ Use meaningful relationship types, not generic ones like IS_
๏ Use in-graph structures to allow different access paths
๏ Evolve your graph to your needs; grow it incrementally
106. Need to model the relationship
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri, language_code
107. What if the cardinality changes?
Language: language_code, language_name, word_count, country_code
Country: country_code, country_name, flag_uri
108. Or we go many-to-many?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code
109. Or we want to qualify the relationship?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code, primary
114. What’s different?
๏ Implementation of maintaining relationships is left up
to the database
๏ Artificial keys disappear or are unnecessary
๏ Relationships get an explicit name
• can be navigated in both directions
118. Keep on adding relationships
Language: name, word_count
Country: name, flag_uri
Relationships: POPULATION_SPEAKS (property: population_fraction), SIMILAR_TO, ADJACENT_TO
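As a sketch of what the graph version buys you, the relationships can be held as a plain list in which the qualifier lives on the relationship itself and each relationship can be followed from either end. The fractions below are made-up placeholder numbers, not real statistics, and the two helper functions are hypothetical.

```python
# Each relationship is (start, type, end, properties); all values are hypothetical.
relationships = [
    ("Switzerland", "POPULATION_SPEAKS", "German", {"population_fraction": 0.6}),
    ("Switzerland", "POPULATION_SPEAKS", "French", {"population_fraction": 0.2}),
    ("Germany", "POPULATION_SPEAKS", "German", {"population_fraction": 0.9}),
    ("Switzerland", "ADJACENT_TO", "Germany", {}),
]


def languages_of(country):
    """Follow POPULATION_SPEAKS from the country to its languages."""
    return [(end, props["population_fraction"])
            for start, rel_type, end, props in relationships
            if rel_type == "POPULATION_SPEAKS" and start == country]


def countries_speaking(language):
    """Follow the same relationships in the opposite direction."""
    return [start
            for start, rel_type, end, _ in relationships
            if rel_type == "POPULATION_SPEAKS" and end == language]
```

No join table or artificial key is needed: changing cardinality means adding relationships, and qualifying a relationship means adding a property to it.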
133. [A] ACL from Hell
๏ Customer:
• leading consumer utility company with tons and
tons of users
๏ Goal:
• comprehensive access control administration
for customers
๏ Benefits:
• Flexible and dynamic architecture
• Exceptional performance
• Extensible data model supports new
applications and features
• Low cost
• A Reliable access control administration system for
5 million customers, subscriptions and agreements
• Complex dependencies between groups, companies,
individuals, accounts, products, subscriptions, services and
agreements
• Broad and deep graphs (master customers with 1000s of
customers, subscriptions & agreements)
[Diagram: nodes name: Andreas, account: 9758352794, agreement: ultimate, subscription: sports, subscription: local, service: NFL, service: Ravens, group: graphistas, promotion: fall, company: Neo Technology; relationships: owns, subscribes to, has plan, includes, provides, member of, offered, discounts, works with, gets discount on]
137. [B] Timely Recommendations
๏ Customer:
• a professional social network
• 35 million users, adding 30,000+ each day
๏ Goal: up-to-date recommendations
• Scalable solution with real-time end-user
experience
• Low maintenance and reliable architecture
• 8-week implementation
๏ Problem:
• Real-time recommendation imperative to attract new
users and maintain positive user retention
• Clustered MySQL solution not scalable or fast enough
to support real-time requirements
๏ Upgrade from running a batch job
• initial hour-long batch job
• but then success happened, and it became a day
• then two days
๏ With Neo4j, real time recommendations
[Diagram: a social graph of people linked by knows relationships: Andreas (job: talking), Allison (job: plumber), Tobias (job: coding), Peter (job: building), Emil (job: plumber), Stephen (job: DJ), Delia (job: barking), Tiberius (job: dancer)]
142. [C] Collaboration on Global Scale
๏ Customer: a worldwide software leader
• highly collaborative end-users
๏ Goal: offer an online platform for global collaboration
• Highly flexible data analysis
• Sub-second results for large, densely-connected data
• User experience - competitive advantage
• Massive amounts of data tied to members, user
groups, member content, etc. all interconnected
• Infer collaborative relationships through user-
generated content
• Worldwide Availability
[Diagram: regions Asia, North America, Europe]
159. Really, once you start thinking in graphs it's hard to stop
Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, Making sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors
What will you build?
169. A graph database...
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
remember linked lists, trees?
graphs are the general-purpose data structure
“A relational database may tell you the average age of everyone
in this place,
but a graph database will tell you who is most likely to buy you a
beer.”
171. Why Data Modeling
๏ What is modeling?
๏ Aren't we schema free?
๏ How does it work in a graph?
๏ Where should modeling happen? DB or application?
185. You traverse the graph
// lookup starting point in an index
START n=node:People(name = 'Andreas')
// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2
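The two steps, look up a starting node and then traverse, can be mimicked with an adjacency dictionary. This is only a sketch of the idea; the friendships listed are hypothetical, and the dictionary lookup stands in for the index lookup.

```python
# Hypothetical undirected FRIEND relationships, stored in both directions.
friends = {
    "Andreas": {"Peter", "Emil"},
    "Peter": {"Andreas", "Emil", "Tobias"},
    "Emil": {"Andreas", "Peter"},
    "Tobias": {"Peter"},
}


def friends_of_friends(me):
    """Two hops along FRIEND, never walking back over the first hop."""
    result = set()
    for friend in friends[me]:           # first hop from the starting node
        for friend2 in friends[friend]:  # second hop
            if friend2 != me:
                result.add(friend2)
    return result


found = friends_of_friends("Andreas")
```

Each hop only touches the neighbours of the current node, so the cost depends on how many relationships are followed rather than on the total size of the data.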
187. SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
START user = node(1)
MATCH user -[user_skill]-> skill
RETURN skill, user_skill
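For comparison, here is a runnable sketch of the relational side using Python's built-in `sqlite3` module, next to a graph-style lookup that follows relationships stored with the node. The table contents and the `level` column are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_skill (user_id INTEGER, skill_id INTEGER, level INTEGER);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO skills VALUES (10, 'Java'), (11, 'Neo4j');
    INSERT INTO user_skill VALUES (1, 10, 3), (1, 11, 5), (2, 10, 4);
""")

# Relational: two joins resolve the foreign keys at query time.
sql_rows = conn.execute("""
    SELECT skills.name, user_skill.level
    FROM users
    JOIN user_skill ON users.id = user_skill.user_id
    JOIN skills ON user_skill.skill_id = skills.id
    WHERE users.id = 1
    ORDER BY skills.name
""").fetchall()

# Graph-style: the node already holds its outgoing relationships.
outgoing = {
    1: [("Java", {"level": 3}), ("Neo4j", {"level": 5})],
    2: [("Java", {"level": 4})],
}
graph_rows = sorted((skill, props["level"]) for skill, props in outgoing[1])
```

Both approaches return the same rows; the difference is that the join table and its keys are something you maintain yourself in the relational version.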