ORIENTDB
GO FAST IN A GRAPH WORLD
February 25th, 2016
Andrea Giuliano
@bit_shark
$whoami
ONCE UPON A TIME
ONCE UPON A TIME - 1979
• first commercially available RDBMS
• written in assembly
• runs in 128K of memory
• no support for transactions
• support for basic SQL queries and joins
RELATIONAL DATABASES
• data is presented to the user in the form of rows and columns (a relation)
• data can be manipulated through relational operators in a tabular form
OVER TIME
• data starts growing in size
• data becomes heterogeneous
  • structured, semi-structured, unstructured data
• the rate at which data is generated increases
BIG DATA
30 YEARS LATER (2009)
• NoSQL movement
• some goals of NoSQL databases:
  • being non-relational
  • simplicity of design
  • simpler horizontal scaling
  • speeding up some operations
  • being distributed
(SOME) TYPES OF NOSQL DATABASES
• document
• key-value
• object-oriented
• graph
• multi-model
DOCUMENT MODEL
• a document encapsulates data in a standard format: YAML, JSON, XML, BSON
{
  "id": 45,
  "name": "Andrea",
  "fav_colours": ["blue", "green"],
  "driver_license": {
    "number": "AA123"
  }
}
KEY-VALUE MODEL
• dictionary in which data is represented as a collection of key-value pairs
> SET akey "Andrea"
> GET akey
"Andrea"
[diagram: key akey mapped to value Andrea]
OBJECT-ORIENTED MODEL
• data is represented in the form of objects
[diagram: class Animal with subclasses Dog and Cat]
GRAPH MODEL
• data is represented in the form of a graph
MULTI-MODEL
• Key-Value
• Document
• Object-oriented
• Graph
RELATIONAL VS NOSQL
• how data is represented
• how data is related
  • relational databases have the concept of joins
  • NoSQL databases have multiple concepts
    • aggregation
    • relation (through edges)
ISSUES WITH JOIN
User
name    id
Andrea  45
John    48
Steven  53
Bill    70
Like
user_id  food_id
45       13
45       49
70       38
Food
id  name
13  Pasta
38  Sushi
49  Kebab
63  Meat
SELECT F.name FROM User U, Like L, Food F
WHERE U.name='Andrea' AND U.id=L.user_id AND L.food_id=F.id;
double JOIN per record at runtime
ISSUES WITH JOIN
• the relationships are computed every time a query is performed
• time complexity grows with data: O(log n) per index lookup
• heavy runtime cost with large datasets
• indexes do not solve the problem
  • they speed up searches but slow down inserts, updates, deletes
  • imagine this on billions of records
speakerdeck.com/agiuliano/index-management-in-depth
SUMMING UP JOIN
• a join operation involves
  • searching for a record in the starting table (User)
  • using the foreign key to look up the intermediate table (Like) through its index
  • traversing the intermediate table to look up the ids of the target table (Food)
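The three join steps can be sketched in a few lines of Python, using the User, Like and Food rows from the slides. This is only an illustration: the dictionaries stand in for database indexes (a real RDBMS would typically use B-trees, where each lookup costs O(log n)), and the names `user_id_by_name`, `food_ids_by_user`, `food_name_by_id` and `liked_foods` are made up for this sketch.

```python
# Rows copied from the slide's example tables.
users = [("Andrea", 45), ("John", 48), ("Steven", 53), ("Bill", 70)]
likes = [(45, 13), (45, 49), (70, 38)]
foods = [(13, "Pasta"), (38, "Sushi"), (49, "Kebab"), (63, "Meat")]

# Hypothetical in-memory stand-ins for the indexes an RDBMS would maintain.
user_id_by_name = {name: user_id for name, user_id in users}
food_ids_by_user = {}
for user_id, food_id in likes:
    food_ids_by_user.setdefault(user_id, []).append(food_id)
food_name_by_id = dict(foods)


def liked_foods(user_name):
    """Resolve the User -> Like -> Food relationship at query time."""
    user_id = user_id_by_name[user_name]            # step 1: find the User record
    food_ids = food_ids_by_user.get(user_id, [])    # step 2: look up Like by foreign key
    return [food_name_by_id[f] for f in food_ids]   # step 3: look up each Food id


andrea_likes = liked_foods("Andrea")
```

Every call repeats all three lookups, which is the point the slides make: the relationship is recomputed on each query instead of being stored.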
The more entries you have, the SLOWER your queries are
www.flickr.com/photos/blacktigersdream/8737830046
SAVING PROJECTIONS
advantages
• data is predetermined
disadvantages
• data synchronization
• solves only reads
UserLikesFood
User    user_id  Like   food_id
Andrea  45       Pasta  13
Andrea  45       Kebab  49
Bill    70       Sushi  38
RELATIONSHIPS IN THE NOSQL WORLD
RELATIONSHIPS IN DOCUMENTS
• embed information in documents where you need it
  • data duplication
  • faster access
{
  "id": 45,
  "name": "Andrea",
  "likes": ["Pasta", "Kebab"]
}
GRAPHS
GRAPH
G = (V, E)
G: graph, V: vertices, E: edges
[diagram: a graph with one vertex and one edge labeled]
GRAPH
Andrea (name: Andrea) --drives (license: A123)--> BMW (model: X5, doors: 5)
• edges are directed
• vertices can have properties
• edges can have properties
GRAPH
[diagram: vertices Andrea and BMW connected by two edges, drives and owns]
N-M relationships can be represented using multiple edges
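A minimal Python sketch of the property graph described in the two GRAPH slides. The `Vertex` and `Edge` classes are hypothetical, not an OrientDB API, and the direction of the `owns` edge is an assumption, since the slide does not state it.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    source: Vertex  # vertex the edge leaves
    target: Vertex  # vertex the edge points to
    properties: dict = field(default_factory=dict)


# Vertices and edges carry their own properties, as in the slide.
andrea = Vertex("Andrea", {"name": "Andrea"})
bmw = Vertex("BMW", {"model": "X5", "doors": 5})

# Two edges between the same pair of vertices (direction of "owns" is assumed).
edges = [
    Edge("drives", andrea, bmw, {"license": "A123"}),
    Edge("owns", andrea, bmw),
]

labels_between = [e.label for e in edges if e.source is andrea and e.target is bmw]
```

Each relationship is a separate edge, so the same pair of vertices can be linked several times without an intermediate table.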
BUILD SMART RELATIONSHIPS
[diagram: vertices Andrea, John, BMW and Ferrari, with the root vertices Customers, Cars and Luxury Cars]
BUILD SMART RELATIONSHIPS
• root vertices can be meta graphs
• meta graphs add information to make traversal easier and faster
EXAMPLE
a Car can be enriched with information regarding
• date of purchase
• country of manufacture
www.flickr.com/photos/aigle_dore/5952275132
BUILD SMART RELATIONSHIPS
[diagram: root vertex Purchase, then Year 2016, then Month (Jan 2016, Feb 2016), then Day (01/15/2016, 02/01/2016), with the cars BMW, Ferrari and Maserati attached to the Day vertices]
[diagram: root vertex Made, then Europe, then Italy and Germany, with the cars BMW, Ferrari and Maserati attached to the country vertices]
BUILD SMART RELATIONSHIPS
[diagram: the Purchase tree (Year 2016, Month Jan 2016 and Feb 2016, Day 01/15/2016 and 02/01/2016) and the Made tree (Europe, Italy, Germany) both pointing at the cars BMW, Ferrari and Maserati]
get all the Italian cars sold on 01/15/2016
• let’s start from Made
  • found the cars made in Italy
  • now filter by date using incoming edges
• let’s try from Purchase
  • found the cars purchased on 01/15/2016
  • now filter by country using incoming edges
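The walk-through above ("get all the Italian cars sold on 01/15/2016") can be sketched in Python. The edge list is hypothetical: the slides do not say which car hangs off which day or country vertex, so the assignment below is made up for illustration, as are the names `outgoing`, `incoming` and `cars_via`.

```python
from collections import defaultdict

# Hypothetical edges (meta vertex -> car); not taken from the slides.
EDGES = [
    ("Italy", "Ferrari"), ("Italy", "Maserati"), ("Germany", "BMW"),
    ("01/15/2016", "BMW"), ("01/15/2016", "Ferrari"), ("02/01/2016", "Maserati"),
]

outgoing = defaultdict(set)  # meta vertex -> cars it points to
incoming = defaultdict(set)  # car -> meta vertices pointing at it
for source, target in EDGES:
    outgoing[source].add(target)
    incoming[target].add(source)


def cars_via(start, other):
    """Follow the outgoing edges of `start`, then keep only the cars
    that also have an incoming edge from `other`."""
    return {car for car in outgoing[start] if other in incoming[car]}


from_made = cars_via("Italy", "01/15/2016")      # start from Made, filter by date
from_purchase = cars_via("01/15/2016", "Italy")  # start from Purchase, filter by country
```

Both starting points give the same answer, so a query can start from whichever meta vertex is expected to have fewer outgoing edges.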
ORIENTDB
• NoSQL database
• multi-model
• high performance (can write 400,000 records/sec*)
• HTTP REST and JSON API
• ACID
*On Intel i7 8 core CPU, 16 GB RAM, SSD RPM, Multi-threads, no indexes (orientdb.com)
15+ languages
30+ drivers
INSTALLATION
orientdb.com/docs/2.1/Tutorial-Installation.html
$ docker run -d -v … orientdb/orientdb
$ brew install orientdb
LOGICAL CONCEPTS
• class
  • type of data model
• cluster
  • stores groups of records within a class
[diagram: class Car with two clusters, USA_car and Italy_car]
VERTICES
• record identifier (RID)
  • each record has its own self-assigned unique ID
  • composed of 2 parts: #<cluster-id>:<cluster-position>
• list of properties
• edge’s RID
  • in
  • out
EDGES
• record identifier (RID)
  • each record has its own self-assigned unique ID
  • composed of 2 parts: #<cluster-id>:<cluster-position>
• in
  • RID of the vertex the edge points to
• out
  • RID of the vertex the edge starts from
RELATIONSHIPS
• unlike an RDBMS, OrientDB does not use JOINs
• physical links, followed in O(1)
• a relationship is managed by storing the edge’s RID in both vertices as “out” and “in”
• for 1-to-n relationships, collections of RIDs are used
Andrea (#13:35)  name: Andrea, out: [#14:54]
drives (#14:54)  out: [#13:35], in: [#15:100], license: A123
BMW (#15:100)    model: X5, in: [#14:54]
TRAVERSE A RELATIONSHIP
Andrea (#13:35)  out: [#14:54]
drives (#14:54)  out: [#13:35], in: [#15:100]
BMW (#15:100)    in: [#14:54]
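A small Python sketch of how following stored RIDs replaces a join, using the three records from the slides. It is an illustration only: a dictionary keyed by RID stands in for OrientDB's physical record positions, and the `label` field and the `out_vertices` helper are assumptions of this sketch.

```python
# Records keyed by RID, mirroring the Andrea --drives--> BMW example.
records = {
    "#13:35": {"name": "Andrea", "out": ["#14:54"]},
    "#14:54": {"label": "drives", "license": "A123",
               "out": ["#13:35"], "in": ["#15:100"]},
    "#15:100": {"model": "X5", "in": ["#14:54"]},
}


def out_vertices(rid, label):
    """Follow the outgoing edges of a vertex that carry the given label.
    Each hop is a direct lookup by RID, with no search over a table."""
    found = []
    for edge_rid in records[rid].get("out", []):
        edge = records[edge_rid]
        if edge["label"] == label:
            found.extend(records[vertex_rid] for vertex_rid in edge["in"])
    return found


cars = out_vertices("#13:35", "drives")
```

The cost of a hop depends on the number of edges attached to the vertex, not on the total number of records in the database.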
CREATE A CLASS
CREATE CLASS Car EXTENDS V
CREATE CLASS drives EXTENDS E
[diagram: Car extends V, drives extends E]
ADD PROPERTIES TO A CLASS
• creating a property involves defining its name and its type
• it is required in order to define indexes or constraints
CREATE PROPERTY Car.model String
[diagram: Car with property model: String]
ADD CONSTRAINTS TO A PROPERTY
• alter the defined property adding the constraint
ALTER PROPERTY Car.model MANDATORY TRUE
[diagram: Car with property model: String]
QUERYING
SELECT FROM Car WHERE model='X5'
SELECT FROM #15:6
Car  rid: #15:6  model: X5
QUERYING
SELECT FROM [#15:6, #15:7]
Car  rid: #15:6  model: X5
Car  rid: #15:7  model: Z4
QUERYING
SELECT name, OUT("drives").model AS DrivesCar
FROM #17:0
name    DrivesCar
Andrea  ["X5", "Z4"]
QUERYING
SELECT name, OUT("drives").model AS DrivesCar
FROM #17:0
UNWIND DrivesCar
name    DrivesCar
Andrea  X5
Andrea  Z4
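What UNWIND does to the result set in the example above can be sketched in Python. This only illustrates the flattening shown on the slide; `unwind` is a hypothetical helper, not part of any OrientDB driver.

```python
def unwind(rows, field):
    """Emit one row per element of the list stored under `field`,
    copying the other columns unchanged."""
    for row in rows:
        values = row[field]
        if not isinstance(values, list):
            values = [values]  # a scalar value yields a single row
        for value in values:
            yield {**row, field: value}


# Result of the query without UNWIND, as shown on the slide.
rows = [{"name": "Andrea", "DrivesCar": ["X5", "Z4"]}]
flat = list(unwind(rows, "DrivesCar"))
```

Here `flat` holds two rows, one per car, matching the two-row table on the slide.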
QUERYING
TRAVERSE * FROM #17:0 MAXDEPTH 4
[diagram: Andrea --drives--> BMW, Andrea --drives--> Maserati]
DEPTH FIRST SEARCH
TRAVERSE * FROM #17:0 STRATEGY DEPTH_FIRST
[diagram: a 12-node tree with nodes numbered 1 to 12 in depth-first visit order]
BREADTH FIRST SEARCH
TRAVERSE * FROM #17:0 STRATEGY BREADTH_FIRST
[diagram: the same 12-node tree with nodes numbered 1 to 12 in breadth-first visit order]
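The difference between the two strategies can be sketched in Python on a small tree. The tree and its node names are hypothetical, chosen only to make the two visit orders easy to compare; this is not how OrientDB implements TRAVERSE.

```python
from collections import deque

# Hypothetical 12-node tree: parent -> children, left to right.
TREE = {
    "a": ["b", "g", "h"],
    "b": ["c", "f"],
    "c": ["d", "e"],
    "h": ["i", "l"],
    "i": ["j", "k"],
}


def depth_first(root):
    """Visit a node, then go as deep as possible before backtracking."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(TREE.get(node, [])))
    return order


def breadth_first(root):
    """Visit all nodes at one depth before moving to the next depth."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(TREE.get(node, []))
    return order


dfs_order = depth_first("a")
bfs_order = breadth_first("a")
```

Because the input is a tree, no visited set is needed; on a general graph with cycles both functions would have to track visited nodes.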
WHEN
• you store interconnected data
• you query data through relationships of arbitrary length
• the data set evolves continuously
• you need the database to be easy to evolve