Joins and aggregations in a 
distributed NoSQL DB 
NoSQLmatters, Barcelona, 22 November 2014 
Max Neunhöffer 
www.arangodb...

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL matters Barcelona 2014


NoSQL databases are renowned for their good horizontal scalability, and these days sharding is an essential feature for every self-respecting DB. However, most systems choose to offer fewer features for joins and aggregations in queries than traditional relational DBMSs do. In this talk I report on the joys and pains of (re-)implementing the powerful query language AQL with joins and aggregations for ArangoDB. I will cover (distributed) execution plans, query optimisation and data locality issues.


  1. Joins and aggregations in a distributed NoSQL DB
     NoSQLmatters, Barcelona, 22 November 2014
     Max Neunhöffer
     www.arangodb.com
  7. Documents and collections

         {
           "_key": "123456",
           "_id": "chars/123456",
           "name": "Duck",
           "firstname": "Donald",
           "dob": "1934-11-13",
           "hobbies": ["Golf", "Singing", "Running"],
           "home": {"town": "Duck town", "street": "Lake Road", "number": 17},
           "species": "duck"
         }

     When I say “document”, I mean “JSON”.
     A “collection” is a set of documents in a DB.
     The DB can inspect the values, allowing for secondary indexes.
     Or one can just treat the DB as a key/value store.
     Sharding: the data of a collection is distributed between multiple servers.
  14. Graphs

      [Figure: a graph with vertices A to G; edges carry labels such as "likes" and "hates".]

      A graph consists of vertices and edges.
      Graphs model relations and can be directed or undirected.
      Vertices and edges are documents.
      Every edge has a _from and a _to attribute.
      The database offers queries and transactions dealing with graphs.
      For example, paths in the graph are interesting.
  17. Query 1: Fetch all documents in a collection

          FOR p IN people
            RETURN p

      Result:

          [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]},
            { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]},
            ... ]

      (Actually, a cursor is returned.)
  19. Query 2: Use filtering, sorting and limit

          FOR p IN people
            FILTER p.age >= @minage
            SORT p.name, p.firstname
            LIMIT @nrlimit
            RETURN { name: CONCAT(p.name, ", ", p.firstname),
                     age : p.age }

      Result:

          [ { "name": "Neunhöffer, Max", "age": 44 },
            { "name": "Schmidt, Helmut", "age": 95 },
            ... ]
  21. Query 3: Aggregation and functions

          FOR p IN people
            COLLECT a = p.age INTO L
            FILTER a >= @minage
            RETURN { "age": a, "number": LENGTH(L) }

      Result:

          [ { "age": 18, "number": 10 },
            { "age": 19, "number": 17 },
            { "age": 20, "number": 12 },
            ... ]
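To make the grouping in Query 3 concrete, here is a minimal Python sketch of what COLLECT ... INTO followed by FILTER and LENGTH computes. It is only an illustration of the semantics on an in-memory list, not how ArangoDB implements COLLECT; the function name `collect_by_age` and the sample data are hypothetical.

```python
from itertools import groupby


def collect_by_age(people, minage):
    """Group people by age, keep groups with age >= minage, count members."""
    # itertools.groupby only groups adjacent equal keys, so sort first.
    ordered = sorted(people, key=lambda p: p["age"])
    result = []
    for age, group in groupby(ordered, key=lambda p: p["age"]):
        members = list(group)              # plays the role of L
        if age >= minage:                  # FILTER a >= @minage
            result.append({"age": age, "number": len(members)})
    return result


# Hypothetical sample data, not taken from the talk.
people = [
    {"name": "A", "age": 18},
    {"name": "B", "age": 19},
    {"name": "C", "age": 18},
    {"name": "D", "age": 17},
]
print(collect_by_age(people, 18))
```

The 17-year-old is grouped like everyone else and only dropped afterwards, which mirrors the FILTER standing after the COLLECT in the query.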
  23. Query 4: Joins

          FOR p IN @@peoplecollection
            FOR h IN houses
              FILTER p._key == h.owner
              SORT h.streetname, h.housename
              RETURN { housename: h.housename,
                       streetname: h.streetname,
                       owner: p.name,
                       value: h.value }

      Result:

          [ { "housename": "Firlefanz",
              "streetname": "Meyer street",
              "owner": "Hans Schmidt",
              "value": 423000 },
            ... ]
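Read literally, the two nested FOR loops of Query 4 describe a nested-loop join. The following Python sketch spells that reading out on plain lists; the function name and the sample documents are made up for illustration, and a real optimiser may well choose a different strategy (for example an index lookup on `owner`).

```python
def join_people_houses(people, houses):
    """Nested-loop join of people and houses on p._key == h.owner."""
    rows = []
    for p in people:                          # FOR p IN @@peoplecollection
        for h in houses:                      # FOR h IN houses
            if p["_key"] == h["owner"]:       # FILTER p._key == h.owner
                rows.append({
                    "housename": h["housename"],
                    "streetname": h["streetname"],
                    "owner": p["name"],
                    "value": h["value"],
                })
    # SORT h.streetname, h.housename
    rows.sort(key=lambda r: (r["streetname"], r["housename"]))
    return rows


# Hypothetical sample data.
people = [
    {"_key": "1", "name": "Hans Schmidt"},
    {"_key": "2", "name": "Max Neunhöffer"},
]
houses = [
    {"owner": "1", "housename": "Firlefanz", "streetname": "Meyer street", "value": 423000},
    {"owner": "2", "housename": "Alpha", "streetname": "Abbey road", "value": 100000},
    {"owner": "9", "housename": "Orphan", "streetname": "Nowhere", "value": 1},
]
for row in join_people_houses(people, houses):
    print(row)
```

The house whose owner key matches no person simply produces no row, as in an inner join.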
  24. Query 5: Modifying data

          FOR e IN events
            FILTER e.timestamp < "2014-09-01T09:53+0200"
            INSERT e IN oldevents

          FOR e IN events
            FILTER e.timestamp < "2014-09-01T09:53+0200"
            REMOVE e._key IN events
  26. Query 6: Graph queries

          FOR x IN GRAPH_SHORTEST_PATH(
              "routeplanner", "germanCity/Cologne",
              "frenchCity/Paris", {weight: "distance"} )
            RETURN { begin : x.startVertex,
                     end : x.vertex,
                     distance : x.distance,
                     nrPaths : LENGTH(x.paths) }

      Result:

          [ { "begin": "germanCity/Cologne",
              "end" : {"_id": "frenchCity/Paris", ... },
              "distance": 550,
              "nrPaths": 10 },
            ... ]
  38. Life of a query

      Text and query parameters come from the user.
      Parse text, produce abstract syntax tree (AST).
      Substitute query parameters.
      First optimisation: constant expressions, etc.
      Translate AST into an execution plan (EXP).
      Optimise one EXP, produce many, potentially better EXPs.
      Reason about distribution in cluster.
      Optimise distributed EXPs.
      Estimate costs for all EXPs, and sort by ascending cost.
      Instantiate “cheapest” plan, i.e. set up execution engine.
      Distribute and link up engines on different servers.
      Execute plan, provide cursor API.
  44. Execution plans

      Query:

          FOR a IN collA
            LET xx = a.x
            FOR b IN collB
              FILTER xx == b.y
              RETURN {x: a.x, z: b.z}

      Query → EXP (one plan node per line, top to bottom):

          Singleton
          EnumerateCollection a
          Calculation xx
          EnumerateCollection b
          Calculation xx == b.y
          Filter xx == b.y
          Calc {x: a.x, z: b.z}
          Return {x: a.x, z: b.z}

      Black arrows are dependencies.
      Think of a pipeline.
      Each node provides a cursor API.
      Blocks of “Items” travel through the pipeline.
      What is an “item”???
  48. Pipeline and items

          Plan node                 Query text        Items leaving the node
          Singleton                                   have no vars
          EnumerateCollection a     FOR a IN collA
          Calculation xx            LET xx = a.x      have vars a, xx
          EnumerateCollection b     FOR b IN collB

      Items are the thingies traveling through the pipeline.
      An item holds values of those variables in the current frame.
      Thus: items look different in different parts of the plan.
      We always deal with blocks of items for performance reasons.
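The pipeline idea can be sketched with Python generators: each node pulls blocks of items from the node it depends on, and an item is just a mapping from the variables in scope to their values. This is a toy model under my own assumptions (node functions, dictionaries as items, one block per upstream block), not ArangoDB's execution engine; all names and data below are hypothetical.

```python
def singleton():
    # One block holding one item that has no variables yet.
    yield [{}]


def enumerate_collection(upstream, var, docs):
    # For every incoming item, emit one item per document with `var` bound.
    for block in upstream:
        yield [dict(item, **{var: doc}) for item in block for doc in docs]


def calculation(upstream, var, fn):
    # Add a computed variable to every item of every block.
    for block in upstream:
        yield [dict(item, **{var: fn(item)}) for item in block]


def filter_node(upstream, pred):
    # Keep only items satisfying the predicate; skip blocks that become empty.
    for block in upstream:
        kept = [item for item in block if pred(item)]
        if kept:
            yield kept


# Hypothetical sample collections.
coll_a = [{"x": 1}, {"x": 2}]
coll_b = [{"y": 1, "z": "p"}, {"y": 2, "z": "q"}, {"y": 2, "z": "r"}]

plan = singleton()
plan = enumerate_collection(plan, "a", coll_a)            # FOR a IN collA
plan = calculation(plan, "xx", lambda it: it["a"]["x"])   # LET xx = a.x
plan = enumerate_collection(plan, "b", coll_b)            # FOR b IN collB
plan = filter_node(plan, lambda it: it["xx"] == it["b"]["y"])
plan = calculation(plan, "out", lambda it: {"x": it["a"]["x"], "z": it["b"]["z"]})

result = [item["out"] for block in plan for item in block]
print(result)
```

Items gain variables as they move down: after `singleton` they are empty, after the first calculation they carry `a` and `xx`, and so on. Nothing is computed until the last line pulls blocks through the chain, which is the cursor-style behaviour the slides describe.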
  51. Move filters up

      Query:

          FOR a IN collA
            FOR b IN collB
              FILTER a.x == 10
              FILTER a.u == b.v
              RETURN {u:a.u,w:b.w}

      Plan:

          Singleton
          EnumColl a
          EnumColl b
          Calc a.x == 10
          Filter a.x == 10
          Calc a.u == b.v
          Filter a.u == b.v
          Return {u:a.u,w:b.w}

      The result and behaviour do not change if the first FILTER is pulled out of the inner FOR.
  53. Move filters up

      Query:

          FOR a IN collA
            FILTER a.x == 10
            FOR b IN collB
              FILTER a.u == b.v
              RETURN {u:a.u,w:b.w}

      Plan:

          Singleton
          EnumColl a
          Calc a.x == 10
          Filter a.x == 10
          EnumColl b
          Calc a.u == b.v
          Filter a.u == b.v
          Return {u:a.u,w:b.w}

      The result and behaviour do not change if the first FILTER is pulled out of the inner FOR.
      However, the number of items travelling in the pipeline is decreased.
      Note that the two FOR statements could be interchanged!
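The effect of moving the filter can be checked with a small Python experiment that counts how many (a, b) items reach the inner loop with the FILTER in its original and in its moved position. The function `run` and the sample data are hypothetical and only model the two query shapes; they say nothing about how the optimiser rule itself is implemented.

```python
def run(coll_a, coll_b, filter_early):
    """Return (result, number of (a, b) items produced by the inner FOR)."""
    result, items_in_inner_loop = [], 0
    for a in coll_a:
        if filter_early and a["x"] != 10:
            continue                       # FILTER a.x == 10, moved up
        for b in coll_b:
            items_in_inner_loop += 1       # one more item travels onwards
            if not filter_early and a["x"] != 10:
                continue                   # FILTER a.x == 10, original place
            if a["u"] == b["v"]:           # FILTER a.u == b.v
                result.append({"u": a["u"], "w": b["w"]})
    return result, items_in_inner_loop


# Hypothetical sample collections.
coll_a = [{"x": 10, "u": 1}, {"x": 3, "u": 2}, {"x": 10, "u": 2}]
coll_b = [{"v": 1, "w": "p"}, {"v": 2, "w": "q"}]

late_result, late_items = run(coll_a, coll_b, filter_early=False)
early_result, early_items = run(coll_a, coll_b, filter_early=True)
print(late_result == early_result, late_items, early_items)
```

With this data both variants return the same two rows, but the early filter lets 4 items into the inner loop instead of 6.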
  55. Remove unnecessary calculations

      Query:

          FOR a IN collA
            LET L = LENGTH(a.hobbies)
            FOR b IN collB
              FILTER a.u == b.v
              RETURN {h:a.hobbies,w:b.w}

      Plan:

          Singleton
          EnumColl a
          Calc L = ...
          EnumColl b
          Calc a.u == b.v
          Filter a.u == b.v
          Return {...}

      The Calculation of L is unnecessary!
  57. Remove unnecessary calculations

      Query:

          FOR a IN collA
            FOR b IN collB
              FILTER a.u == b.v
              RETURN {h:a.hobbies,w:b.w}

      Plan:

          Singleton
          EnumColl a
          EnumColl b
          Calc a.u == b.v
          Filter a.u == b.v
          Return {...}

      The Calculation of L is unnecessary! (since it cannot throw an exception).
      Therefore we can just leave it out.
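One way to picture this rule is as a backwards pass over the plan that tracks which variables are still needed and drops calculation nodes whose result is never read and which cannot throw. The plan representation below (dictionaries with `type`, `sets`, `uses`, `can_throw`) is purely an assumption for this sketch, as is the variable name `cond` for the filter condition.

```python
def remove_unused_calculations(plan):
    """Drop Calculation nodes whose output is unused and which cannot throw."""
    needed = set()
    kept = []
    for node in reversed(plan):            # walk from Return towards Singleton
        is_dead_calc = (
            node["type"] == "Calculation"
            and node["sets"] not in needed
            and not node["can_throw"]
        )
        if is_dead_calc:
            continue                       # nobody downstream reads its result
        needed.update(node["uses"])
        kept.append(node)
    kept.reverse()
    return kept


# Hypothetical encoding of the plan from the slide.
plan = [
    {"type": "Singleton", "sets": None, "uses": [], "can_throw": False},
    {"type": "EnumerateCollection", "sets": "a", "uses": [], "can_throw": False},
    {"type": "Calculation", "sets": "L", "uses": ["a"], "can_throw": False},
    {"type": "EnumerateCollection", "sets": "b", "uses": [], "can_throw": False},
    {"type": "Calculation", "sets": "cond", "uses": ["a", "b"], "can_throw": False},
    {"type": "Filter", "sets": None, "uses": ["cond"], "can_throw": False},
    {"type": "Return", "sets": None, "uses": ["a", "b"], "can_throw": False},
]

optimised = remove_unused_calculations(plan)
print([node["sets"] or node["type"] for node in optimised])
```

The `can_throw` check reflects the caveat on the slide: a calculation that might raise an error has an observable effect even if its value is unused, so it has to stay.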
  59. Use index for FILTER and SORT

      Query:

          FOR a IN collA
            FILTER a.x > 17 &&
                   a.x <= 23 &&
                   a.y == 10
            SORT a.y, a.x
            RETURN a

      Plan:

          Singleton
          EnumColl a
          Calc ...
          Filter ...
          Sort a.y, a.x
          Return a

      Assume collA has a skiplist index on “y” and “x” (in this order).
  60. Use index for FILTER and SORT

      Query:

          FOR a IN collA
            FILTER a.x > 17 &&
                   a.x <= 23 &&
                   a.y == 10
            SORT a.y, a.x
            RETURN a

      Plan:

          Singleton
          IndexRange a
          Sort a.y, a.x
          Return a

      Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index.
  61. Use index for FILTER and SORT

      Query:

          FOR a IN collA
            FILTER a.x > 17 &&
                   a.x <= 23 &&
                   a.y == 10
            SORT a.y, a.x
            RETURN a

      Plan:

          Singleton
          IndexRange a
          Return a

      Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index.
      The result will automatically be sorted by y and then by x.
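The range read can be imitated with a sorted Python list of (y, x) pairs standing in for the skiplist index; this substitution is my assumption for illustration and ignores everything else a real index stores. Two binary searches give the half-open interval, exclusive at x = 17 and inclusive at x = 23, and the slice comes back already ordered by y and then x.

```python
from bisect import bisect_right


def index_range(index, y, x_low_excl, x_high_incl):
    """Return entries with the given y and x_low_excl < x <= x_high_incl.

    `index` must be a list of (y, x) tuples sorted in ascending order.
    """
    lo = bisect_right(index, (y, x_low_excl))    # first entry after (y, 17)
    hi = bisect_right(index, (y, x_high_incl))   # first entry after (y, 23)
    return index[lo:hi]


# Hypothetical index content.
index = sorted([(10, 17), (10, 18), (9, 20), (10, 23), (10, 24), (11, 19), (10, 20)])
print(index_range(index, 10, 17, 23))
```

Because the slice is a contiguous run of a sorted structure, no separate sort step is needed afterwards, which is why the Sort node can disappear from the plan.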
  63. Data distribution in a cluster

      [Figure: requests arrive at two Coordinators, which sit in front of three DBservers holding the numbered shards.]

      The shards of a collection are distributed across the DB servers.
      The coordinators receive queries and organise their execution.
  64. Scatter/gather

      [Figure: a plan consisting of a single EnumerateCollection node.]

  65. Scatter/gather

      [Figure: the same plan distributed in the cluster. A Scatter node fans out through Remote nodes to three EnumShard nodes, whose results flow back through Remote nodes into a Concat/Merge node.]
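As a rough model of the scatter/gather shape, the sketch below enumerates each shard separately and merges the partial results into one sorted stream. It assumes that every shard delivers its documents already sorted, so that the gather step can be a merge rather than a plain concatenation; the function names and the data are hypothetical, and the network hops (the Remote nodes) are left out entirely.

```python
import heapq


def enum_shard(shard, sort_key):
    # Stand-in for an EnumShard node that returns its documents in sorted order.
    return sorted(shard, key=sort_key)


def scatter_gather(shards, sort_key):
    # Scatter: ask every shard for its part of the collection.
    partials = [enum_shard(shard, sort_key) for shard in shards]
    # Gather: merge the sorted partial results into one sorted result.
    return list(heapq.merge(*partials, key=sort_key))


# Hypothetical shards of a "people" collection.
shards = [
    [{"name": "Schmidt"}, {"name": "Duck"}],
    [{"name": "Neunhöffer"}],
    [],
]
merged = scatter_gather(shards, sort_key=lambda doc: doc["name"])
print([doc["name"] for doc in merged])
```

If no order is required, the merge could be replaced by simply chaining the partial results one after another.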
  70. Modifying queries

      Fortunately:
        There can be at most one modifying node in each query.
        There can be no modifying nodes in subqueries.

      Modifying nodes:
        The modifying node in a query is executed on the DBservers. To this end, we either scatter the items to all DBservers or, if possible, distribute each item to the shard that is responsible for the modification.
        Sometimes, we can even optimise away a gather/scatter combination and parallelise completely.
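The difference between scattering items to all DBservers and distributing each item to its responsible shard can be illustrated as follows. The rule used here to find the responsible shard (a CRC32 of `_key` modulo the number of shards) is an assumption made up for this sketch and is not claimed to be ArangoDB's sharding function; the function names are hypothetical as well.

```python
import zlib


def responsible_shard(key, number_of_shards):
    # Hypothetical sharding rule: stable hash of the key modulo shard count.
    return zlib.crc32(key.encode("utf-8")) % number_of_shards


def distribute(items, number_of_shards):
    """Send each item only to the shard responsible for its _key."""
    buckets = {shard: [] for shard in range(number_of_shards)}
    for item in items:
        buckets[responsible_shard(item["_key"], number_of_shards)].append(item)
    return buckets


def scatter(items, number_of_shards):
    """Send every item to every shard; each shard must pick out its own."""
    return {shard: list(items) for shard in range(number_of_shards)}


# Hypothetical items to be modified.
items = [{"_key": str(i)} for i in range(20)]
distributed = distribute(items, 3)
scattered = scatter(items, 3)
print(sum(len(b) for b in distributed.values()),
      sum(len(b) for b in scattered.values()))
```

Distributing moves each item once, while scattering moves it once per shard, which is why the slide prefers distribution whenever the responsible shard can be determined.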