Do you suffer from "complex domains with many relationships" disorders? There's a new cure: Graph Database
1. Do you suffer from
"complex domains with many
relationships" disorders?
There's a new cure:
Graph Database
Luca Garulli –
Founder and CEO
@Orient Technologies Ltd
Author of OrientDB
(c) Luca Garulli
www.twitter.com/lgarulli
Licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License
Page 1
www.orientechnologies.com
2. 1979
First relational DBMS available as a product
2009
NoSQL movement
3. 1979
First relational DBMS available as a product
Hey, 30 years is a huge
span in the IT field!
2009
NoSQL movement
4. Before 2009, development teams
always fought over the choice of:
Operating System
Programming Language
Middleware (App Servers)
What about the Database?
5. One of the main reasons RDBMS users
resist moving to a NoSQL product
is the
complexity of the model:
OK, NoSQL products are great for
Big Data and big scale,
but...
6. ...what about the model?
7. What is the NoSQL answer
to managing complex domains?
Key-value stores?
Column-based?
Document database?
Graph database!
8. CAUTION!
This presentation will not use a
social-network domain with
the classic
friend-of-friend^N paradigm,
where graph databases
are already widely used...
9. ...but rather, we will explore how
to think «graphically» with one of the
most common domains in the
enterprise world:
the good old CRM* domain
* today an RDBMS is used in 99% of cases
10. Every developer knows
the Relational Model,
but who knows the
Graph one?
11. Back to school:
Graph Theory crash course
12. Basic Graph
[Figure: vertex "Luca" connected to vertex "NoSQL Day" by the edge "Likes"]
13. Property Graph Model*
[Figure: vertex "Luca" (name: Luca, surname: Garulli, company: Orient Tech) --Likes (since: 2013)--> vertex "NoSQL Day" (date: Nov 15, 2013)]
Edges are directed
Vertices and edges
can have properties
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
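The property graph model above can be sketched with two small Python dataclasses. This is an illustrative in-memory model, not the Blueprints API: the class names `Vertex` and `Edge` and the vertex labels "Person" and "Event" are assumptions, while the property values come from the slide.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Vertex:
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # the edge starts here (tail)
    in_vertex: Vertex   # the edge points here (head): edges are directed
    properties: dict[str, Any] = field(default_factory=dict)


# Vertex labels "Person" and "Event" are hypothetical; the properties come from the slide
luca = Vertex("Person", {"name": "Luca", "surname": "Garulli", "company": "Orient Tech"})
nosql_day = Vertex("Event", {"name": "NoSQL Day", "date": "2013-11-15"})
likes = Edge("Likes", luca, nosql_day, {"since": 2013})

print(f"{likes.out_vertex.properties['name']} -{likes.label}-> "
      f"{likes.in_vertex.properties['name']} (since {likes.properties['since']})")
```

Both vertices and the edge carry their own property maps, which is the key difference from a plain graph.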
16. Congratulations, this is your diploma in
«Graph Theory»
17. Now let's go back
to our domain:
the CRM
18. Domain: the super minimal CRM
[Figure: Registry system (Customer, Address) and Order system (Order, Stock)]
19. Domain: the super minimal CRM
[Figure: Registry system (Customer, Address) and Order system (Order, Stock)]
How does a
relational DBMS
manage relationships?
20. Relational World: 1-1 Relationships
Customer (primary key: Id; foreign key: Address)
Id | Name  | Address
10 | Luca  | 34
11 | Jill  | 44
34 | John  | 54
56 | Mark  | 66
88 | Steve | 68
Address (primary key: Id)
Id | Location
34 | Rome
44 | London
54 | Moscow
66 | New Mexico
68 | Palo Alto
JOIN Customer.Address -> Address.Id
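A minimal Python sketch of the 1-1 JOIN above, using the slide's rows. The dictionary `address_index` is an assumption standing in for the primary-key index; real RDBMSs usually implement it as a B-tree, but the point is the same: each customer row costs one index lookup.

```python
# Customer rows: (Id, Name, Address), where Address is a foreign key to Address.Id
customers = [(10, "Luca", 34), (11, "Jill", 44), (34, "John", 54),
             (56, "Mark", 66), (88, "Steve", 68)]
# Address rows: (Id, Location)
addresses = [(34, "Rome"), (44, "London"), (54, "Moscow"),
             (66, "New Mexico"), (68, "Palo Alto")]

# Index on the primary key Address.Id (a hash map here; RDBMSs typically use B-trees)
address_index = {addr_id: location for addr_id, location in addresses}


def join_customer_address(rows):
    """JOIN Customer.Address -> Address.Id: one index lookup per customer row."""
    return [(name, address_index[addr_fk]) for _cust_id, name, addr_fk in rows]


for name, location in join_customer_address(customers):
    print(name, "->", location)
```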
21. Relational World: 1-N Relationships
Customer (primary key: Id)
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve
Address (primary key: Id; foreign key: Customer)
Id | Customer | Location
24 | 10       | Rome
33 | 10       | London
44 | 34       | Moscow
66 | 56       | Cologne
68 | 88       | Palo Alto
Inverse JOIN Address.Customer -> Customer.Id
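Resolving the inverse JOIN efficiently requires an index on the foreign-key column `Address.Customer`. The sketch below is a simplified illustration: the hypothetical `by_customer` dictionary plays the role of that secondary index and groups the slide's address rows by customer.

```python
from collections import defaultdict

customers = {10: "Luca", 11: "Jill", 34: "John", 56: "Mark", 88: "Steve"}
# Address rows: (Id, Customer, Location); Address.Customer points back to Customer.Id
addresses = [(24, 10, "Rome"), (33, 10, "London"), (44, 34, "Moscow"),
             (66, 56, "Cologne"), (68, 88, "Palo Alto")]

# Secondary index on Address.Customer; without it, each inverse JOIN scans the table
by_customer = defaultdict(list)
for _addr_id, customer_id, location in addresses:
    by_customer[customer_id].append(location)


def locations_of(customer_id):
    """Inverse JOIN Address.Customer -> Customer.Id for a single customer."""
    return by_customer.get(customer_id, [])


for customer_id, name in customers.items():
    print(name, locations_of(customer_id))
```

Note that this index is extra data the database must keep up to date on every insert, update and delete.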
22. Relational World: N-M Relationships
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve
CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 44
Address
Id | Location
24 | Rome
33 | London
44 | Moscow
66 | Cologne
68 | Palo Alto
Additional table with 2 JOINs
(1) CustomerAddress.Id -> Customer.Id and
(2) CustomerAddress.Address -> Address.Id
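For the N-M case, fetching one customer's addresses takes one lookup in the join table plus one lookup in `Address` per matching row. The sketch counts those lookups using the slide's rows; the `join_index` dictionary is an assumed stand-in for an index on `CustomerAddress.Id`.

```python
from collections import defaultdict

customers = {10: "Luca", 11: "Jill", 34: "John", 56: "Mark", 88: "Steve"}
addresses = {24: "Rome", 33: "London", 44: "Moscow", 66: "Cologne", 68: "Palo Alto"}
# CustomerAddress rows: (Id, Address), i.e. (customer id, address id)
customer_address = [(10, 24), (10, 33), (34, 44)]

# Index on CustomerAddress.Id so that JOIN (1) does not scan the join table
join_index = defaultdict(list)
for cust_id, addr_id in customer_address:
    join_index[cust_id].append(addr_id)


def locations_of(customer_id):
    """Return (locations, number of index lookups) for one customer."""
    lookups = 1  # JOIN (1): CustomerAddress.Id -> Customer.Id
    locations = []
    for addr_id in join_index.get(customer_id, []):
        lookups += 1  # JOIN (2): CustomerAddress.Address -> Address.Id
        locations.append(addresses[addr_id])
    return locations, lookups


print(customers[10], locations_of(10))  # Luca (['Rome', 'London'], 3)
```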
23. What’s wrong with the
Relational Model?
24. The JOIN is the evil!
Customer
Id | Name
10 | Luca
11 | Jill
34 | John
56 | Mark
88 | Steve
CustomerAddress
Id | Address
10 | 24
10 | 33
34 | 24
Address
Id | Location
24 | Rome
33 | London
44 | Moscow
66 | Cologne
68 | Palo Alto
These are all JOINs executed
every time you traverse a
relationship!
25. A JOIN means searching for a key in
another table
The first rule to improve performance
is to index all the keys
An index speeds up searches, but slows down
inserts, updates and deletes
26. So, in the best case, a JOIN is a lookup
into an index
This is done for every single JOIN!
If you traverse hundreds of relationships,
you're executing hundreds of JOINs
27. Index Lookup
is it really that fast?
28. Index Lookup: how does it work?
[Figure: index tree whose root A-Z splits into A-L and M-Z]
Think of an
address book
where we have to find
Luca's phone
number
29. Index Lookup: how does it work?
[Figure: index tree
A-Z
  A-L
    A-D
    E-L
  M-Z
    M-R
    S-Z]
Most index algorithms are
similar and based on
balanced trees
30. Index Lookup: how does it work?
[Figure: the same index tree, one level deeper
A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
      H-L
  M-Z
    M-R
    S-Z]
31. Index Lookup: how does it work?
[Figure: the same index tree, one more level deeper on the E-L branch
A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L
  M-Z
    M-R
    S-Z]
32. Index Lookup: how does it work?
[Figure: the path through the index tree down to "Luca"
A-Z
  A-L
    A-D (A-B, C-D)
    E-L
      E-G (E-F, G)
      H-L
        H-J
        K-L
          Luca  <- Found!
  M-Z (M-R, S-Z)]
This lookup took 5
steps, and the number of steps grows
with the index
size!
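The walk down the tree can be approximated with a binary search over a sorted list, counting one step per node visited. This is a simplification: real B-tree nodes hold many keys, so real trees are flatter than this binary one, and every name in the address book apart from "Luca" is made up.

```python
def lookup(sorted_keys, target):
    """Binary search over sorted keys; returns (position or -1, nodes visited)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1  # continue in the right half (e.g. M-Z)
        else:
            hi = mid - 1  # continue in the left half (e.g. A-L)
    return -1, steps


# Hypothetical address book: every name except "Luca" is made up
address_book = sorted(["Anna", "Bruno", "Carla", "Dario", "Elena", "Fabio",
                       "Giulia", "Hugo", "Ivo", "Luca", "Marco", "Nadia",
                       "Paolo", "Rita", "Sara", "Zeno"])

print(lookup(address_book, "Luca"))  # (9, 3)
```

Even a miss costs a full descent: looking up a name that is not in this 16-entry book visits 5 nodes before giving up.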
33. An index lookup is executed
for each JOIN
Querying more tables can easily
produce millions of JOINs/lookups!
Here is the rule: more entries
= more lookup steps = slower JOIN
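A rough estimate of how lookup steps grow: a balanced tree with fan-out `b` needs about ⌈log_b N⌉ levels to reach one of N entries. The helper below is a hypothetical back-of-the-envelope calculator; fan-out 2 mirrors the binary tree in the slides, while production B-trees use much larger fan-outs (fewer levels, same logarithmic growth).

```python
def lookup_steps(n_entries, fanout=2):
    """Levels a balanced tree with this fan-out must descend to reach one of n entries."""
    levels, reachable = 0, 1
    while reachable < n_entries:
        reachable *= fanout
        levels += 1
    return levels


for n in (1_000, 1_000_000, 1_000_000_000):
    per_join = lookup_steps(n)
    print(f"{n:>13,} entries: {per_join} steps per JOIN, "
          f"{1_000_000 * per_join:,} steps for a million JOINs")
```

With fan-out 2 this gives 10, 20 and 30 steps per JOIN: each thousandfold increase in entries adds about 10 steps to every single JOIN.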
34. Oh! This is why
performance of my database
drops when
it becomes bigger,
and bigger,
and bigger!
35. Is there a better way to
manage relationships?
36. “A graph database is any
storage system
that provides
index-free adjacency”
- Marko Rodriguez
(author of TinkerPop Blueprints)
37. How does GraphDB manage
index-free relationships?
38. OrientDB: an open source (Apache-licensed)
document-graph NoSQL DBMS
39. Let's go back
to the graph stuff:
how does OrientDB
manage relationships?
40. OrientDB: traverse a relationship
The Record ID (RID)
is the physical position
[Figure: vertex "Luca" with RID = #13:35 (label: 'Customer', name: 'Luca') and vertex "Rome" with RID = #13:100 (label: 'Address', name: 'Rome')]
41. OrientDB: traverse a relationship
The edge's RID is saved
inside both vertices, as
«out» and «in»
[Figure:
vertex "Luca", RID = #13:35: out: [#14:54], label: 'Customer', name: 'Luca'
edge "Lives", RID = #14:54: out: [#13:35], in: [#13:100], label: 'Lives'
vertex "Rome", RID = #13:100: in: [#14:54], label: 'Address', name: 'Rome']
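The record layout above can be mimicked in Python to show why traversal needs no index: a RID is a (cluster, position) pair, and loading a record is a positional read. This is a toy model under stated assumptions (`Storage`, `Record` and `out_neighbours` are hypothetical names, and a Python list stands in for a cluster), not OrientDB's actual storage engine.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    fields: dict
    out: list = field(default_factory=list)  # RIDs on the "out" side
    in_: list = field(default_factory=list)  # RIDs on the "in" side ("in" is a Python keyword)


class Storage:
    """Records live in clusters; a RID (cluster, position) is their physical address."""

    def __init__(self):
        self.clusters = {}  # cluster id -> list of records, indexed by position

    def put(self, rid, record):
        cluster_id, position = rid
        cluster = self.clusters.setdefault(cluster_id, [])
        if len(cluster) <= position:
            cluster.extend([None] * (position + 1 - len(cluster)))
        cluster[position] = record

    def load(self, rid):
        cluster_id, position = rid
        return self.clusters[cluster_id][position]  # positional access, no index search


db = Storage()
db.put((13, 35), Record({"@class": "Customer", "name": "Luca"}, out=[(14, 54)]))
db.put((13, 100), Record({"@class": "Address", "name": "Rome"}, in_=[(14, 54)]))
db.put((14, 54), Record({"@class": "Lives"}, out=[(13, 35)], in_=[(13, 100)]))


def out_neighbours(storage, vertex_rid):
    """Vertex -> outgoing edge RIDs -> each edge's "in" RID: direct loads per hop."""
    vertex = storage.load(vertex_rid)
    return [storage.load(target)
            for edge_rid in vertex.out
            for target in storage.load(edge_rid).in_]


print([v.fields["name"] for v in out_neighbours(db, (13, 35))])  # ['Rome']
```

Hopping from Luca to Rome costs three positional loads (vertex, edge, vertex) whatever the number of records stored.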
42. OrientDB: traverse a relationship
[Figure: the same records as slide 41: vertex "Luca" (#13:35, out: [#14:54]), edge "Lives" (#14:54, out: [#13:35], in: [#13:100]), vertex "Rome" (#13:100, in: [#14:54])]
43. OrientDB: traverse a relationship
[Figure: the same records as slide 41: vertex "Luca" (#13:35, out: [#14:54]), edge "Lives" (#14:54, out: [#13:35], in: [#13:100]), vertex "Rome" (#13:100, in: [#14:54])]
44. A GraphDB handles relationships as a
physical LINK to the record on the other side,
assigned when the edge is created.
An RDBMS computes the
relationship every time you query the database.
Isn't that crazy?!
45. This means jumping from an
O(log N) algorithm to a near O(1) one:
the traversal cost is no longer affected
by the database size!
This is huge in the Big Data age
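A simplified cost model makes the difference explicit for a traversal of k relationships over N records. It assumes an index fan-out b and a constant cost c to load one record by RID, and it ignores caching and I/O, so treat it as an order-of-magnitude sketch.

```latex
% Assumptions: b = index fan-out, c = constant cost of loading one record by RID
T_{\mathrm{JOIN}}(k, N) \approx k \cdot \lceil \log_b N \rceil
\qquad \text{vs.} \qquad
T_{\mathrm{link}}(k) \approx k \cdot c

% Worked example with N = 10^9 records and b = 100:
\lceil \log_{100} 10^{9} \rceil = \lceil 4.5 \rceil = 5 \text{ index levels per JOIN}
```

Under these assumptions, 1,000 hops cost about 5,000 index-node visits through JOINs versus about 1,000 direct record loads, and only the first figure grows as N grows.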
46. In the Blueprints micro-benchmark,
on common hardware with a hot cache,
OrientDB traverses 29.6 million
records in less than 5 seconds:
about 6 million nodes traversed per second!
Do not try this at home
with an RDBMS*!
*unless you live in Google's server farm
47. Create the graph in SQL
$luca> cd bin
$luca> ./console.sh
OrientDB console v.1.3.0-SNAPSHOT (www.orientdb.org)
Type 'help' to display all the commands supported.
orientdb> create vertex Customer set name = 'Luca'
Created vertex #13:35 in 0.03 secs
orientdb> create vertex Address set name = 'Rome'
Created vertex #13:100 in 0.02 secs
orientdb> create edge Lives from #13:35 to #13:100
Created edge #14:54 in 0.02 secs
49. Query the graph in SQL
orientdb> select in('Lives') from Address where name = 'Rome'
---+------+---------+--------------------+--------------------+--------+
  #| RID  |@class   |label               |out_Lives           |in      |
---+------+---------+--------------------+--------------------+--------+
  0| 13:35|Customer |Luca                |[#14:54]            |        |
---+------+---------+--------------------+--------------------+--------+
1 item(s) found. Query executed in 0.007 sec(s).
Incoming vertices
50. More on query power
orientdb> select sum( out('Order').total ) from Customer
where name = 'Luca'
orientdb> traverse both('Friend')
from Customer while $depth <= 7
orientdb> select from (
traverse both('Friend')
from Customer while $depth <= 7
) where @class = 'Customer' and city.name = 'Udine'
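The `traverse both('Friend') ... while $depth <= 7` command can be read as a depth-limited walk that ignores edge direction. The Python sketch below illustrates that idea on hypothetical Friend edges (the names are made up); it uses breadth-first order for simplicity, which need not match the order in which OrientDB's TRAVERSE visits records.

```python
from collections import deque

# Hypothetical 'Friend' edges; both() ignores edge direction, so store each edge both ways
friend_edges = [("Luca", "Jill"), ("Jill", "John"), ("John", "Mark"), ("Mark", "Steve")]

adjacency = {}
for a, b in friend_edges:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)


def traverse_both(adj, root, max_depth):
    """Return {vertex: depth} for every vertex within max_depth hops of root (root = 0)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue  # like `while $depth <= max_depth`: stop expanding at the limit
        for w in adj.get(v, []):
            if w not in depth:  # visit each vertex once, even if the graph has cycles
                depth[w] = depth[v] + 1
                queue.append(w)
    return depth


print(traverse_both(adjacency, "Luca", 2))  # {'Luca': 0, 'Jill': 1, 'John': 2}
```

Filtering the visited vertices afterwards, as the third query does with `where @class = 'Customer' and ...`, is then a simple pass over the returned dictionary.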
51. Query vs traversal
Once you have a well-connected database
in the form of a Super Graph, you can
cross records instead of querying them!
All you need is some root vertices
from which to start traversing
53. Temporal based graph
[Figure: Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hour 9/4/2013 09:00 -> Order 2332; Day 9/4/2013 -> Hour 9/4/2013 10:00 -> Orders 2333 and 2334]
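The temporal graph can be sketched as a tree walked from the Calendar root: to find a day's orders you follow the Year, Month and Day links instead of querying an index on order dates. Nested Python dictionaries stand in for vertices and edges here (an assumption for brevity); the order numbers and the hour each one hangs off are as read from the figure.

```python
# Nested dicts stand in for the Calendar -> Year -> Month -> Day -> Hour vertices
calendar = {}


def add_order(cal, year, month, day, hour, order_id):
    """Create the missing time vertices on the path and attach the order to its Hour."""
    hours = cal.setdefault(year, {}).setdefault(month, {}).setdefault(day, {})
    hours.setdefault(hour, []).append(order_id)


add_order(calendar, 2013, 4, 9, 9, 2332)
add_order(calendar, 2013, 4, 9, 10, 2333)
add_order(calendar, 2013, 4, 9, 10, 2334)


def orders_on(cal, year, month, day):
    """Walk Year -> Month -> Day, then collect the orders hanging off every Hour."""
    hours = cal.get(year, {}).get(month, {}).get(day, {})
    return [order for hour in sorted(hours) for order in hours[hour]]


print(orders_on(calendar, 2013, 4, 9))  # [2332, 2333, 2334]
```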
55. Mix & Merge graphs
[Figure: a location graph (Country Italy -> Region Lazio -> State RM -> Cities Rome and Fiumicino) merged with the temporal graph (Calendar -> Year 2013 -> Month April 2013 -> Day 9/4/2013 -> Hours 09:00 and 10:00); Orders 2332, 2333 and 2334 are linked to both a Location and an Hour]
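Merging the two graphs means an order vertex is reachable from both a City and an Hour, so a question like "orders placed in Rome at 10:00" becomes the intersection of two one-hop traversals. The sketch below is illustrative: the vertex naming scheme is made up, and which city each order belongs to is a hypothetical assignment, since the figure does not make it explicit.

```python
from collections import defaultdict

graph = defaultdict(set)  # undirected adjacency sets; all vertex names are illustrative


def link(a, b):
    graph[a].add(b)
    graph[b].add(a)


# Location graph
link("Country:Italy", "Region:Lazio")
link("Region:Lazio", "State:RM")
link("State:RM", "City:Rome")
link("State:RM", "City:Fiumicino")
# Temporal graph (reduced to one day)
link("Day:2013-04-09", "Hour:09:00")
link("Day:2013-04-09", "Hour:10:00")
# Orders merge the two graphs; these city/hour assignments are hypothetical
for order, city, hour in [("Order:2332", "City:Rome", "Hour:09:00"),
                          ("Order:2333", "City:Rome", "Hour:10:00"),
                          ("Order:2334", "City:Fiumicino", "Hour:10:00")]:
    link(order, city)
    link(order, hour)


def orders_at(vertex):
    """Orders directly linked to a City or Hour vertex (one hop)."""
    return {v for v in graph[vertex] if v.startswith("Order:")}


# Orders placed in Rome during the 10:00 hour: intersect two one-hop traversals
print(sorted(orders_at("City:Rome") & orders_at("Hour:10:00")))  # ['Order:2333']
```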
56. This is your database
57. Get the last customer who bought 'Barolo'
select last(out('Order').in('Customer')) from Stock
where name = 'Barolo'
#34:22
58. Get his country
select out('City') from #34:22
Udine, Italy
#55:12
59. Get orders from that country
select in('Customer') from #55:12
60. Let’s move like a
Spider
on the web
61. Subscribe using the
code “nosqlday”
to get 20% off, for all
NoSQLDay attendees!