Complex queries in a
distributed multi-model
database
Max Neunhöffer
Tech Talk Geekdom, 16 March 2015
www.arangodb.com
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-1...
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-1...
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-1...
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-1...
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-1...
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-1...
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or un...
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or un...
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or un...
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or un...
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or un...
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
3
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
[ { "name": "Schmidt", "firstname": "Helmut",
"hobbie...
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
[ { "name": "Schmidt", "firstname": "Helmut",
"hobbie...
Query 2
Use filtering, sorting and limit
FOR p IN people
FILTER p.age >= @minage
SORT p.name, p.firstname
LIMIT @nrlimit
RE...
Query 2
Use filtering, sorting and limit
FOR p IN people
FILTER p.age >= @minage
SORT p.name, p.firstname
LIMIT @nrlimit
RE...
Query 3
Aggregation and functions
FOR p IN people
COLLECT a = p.age INTO L
FILTER a >= @minage
RETURN { "age": a, "number"...
Query 3
Aggregation and functions
FOR p IN people
COLLECT a = p.age INTO L
FILTER a >= @minage
RETURN { "age": a, "number"...
Query 4
Joins
FOR p IN @@peoplecollection
FOR h IN houses
FILTER p._key == h.owner
SORT h.streetname, h.housename
RETURN {...
Query 4
Joins
FOR p IN @@peoplecollection
FOR h IN houses
FILTER p._key == h.owner
SORT h.streetname, h.housename
RETURN {...
Query 5
Modifying data
FOR e IN events
FILTER e.timestamp<"2014-09-01T09:53+0200"
INSERT e IN oldevents
FOR e IN events
FI...
Query 6
Graph queries
FOR x IN GRAPH_SHORTEST_PATH(
"routeplanner", "germanCity/Cologne",
"frenchCity/Paris", {weight: "di...
Query 6
Graph queries
FOR x IN GRAPH_SHORTEST_PATH(
"routeplanner", "germanCity/Cologne",
"frenchCity/Paris", {weight: "di...
Life of a query
1. Text and query parameters come from user
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitut...
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
F...
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
F...
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
F...
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
F...
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
F...
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
F...
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET ...
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET ...
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET ...
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET ...
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
F...
Move filters up
FOR a IN collA
FOR b IN collB
FILTER a.x == 10
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
Singleton
EnumColl a
...
Move filters up
FOR a IN collA
FOR b IN collB
FILTER a.x == 10
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behavi...
Move filters up
FOR a IN collA
FILTER a.x < 10
FOR b IN collB
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behavio...
Move filters up
FOR a IN collA
FILTER a.x < 10
FOR b IN collB
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behavio...
Remove unnecessary calculations
FOR a IN collA
LET L = LENGTH(a.hobbies)
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobb...
Remove unnecessary calculations
FOR a IN collA
LET L = LENGTH(a.hobbies)
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobb...
Remove unnecessary calculations
FOR a IN collA
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation...
Remove unnecessary calculations
FOR a IN collA
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation...
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Singleton
En...
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA...
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA...
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA...
Data distribution in a cluster
Requests
DBserver DBserver DBserver
CoordinatorCoordinator
4 2 5 3 11
The shards of a colle...
Data distribution in a cluster
Requests
DBserver DBserver DBserver
CoordinatorCoordinator
4 2 5 3 11
The shards of a colle...
Scatter/gather
EnumerateCollection
17
Scatter/gather
Remote
EnumShard
Remote Remote
EnumShard
Remote
Concat/Merge
Remote
EnumShard
Remote
Scatter
17
Scatter/gather
Remote
EnumShard
Remote Remote
EnumShard
Remote
Concat/Merge
Remote
EnumShard
Remote
Scatter
17
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in s...
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in s...
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in s...
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in s...
Nächste SlideShare
Wird geladen in …5
×

Complex queries in a distributed multi-model database

776 Aufrufe

Veröffentlicht am

A multi-model database is a document store, a graph database as well as a key/value store. To allow for convenient and powerful querying such a database needs a query language that understands all three data models and allows to mix these models in queries. For example, it should be possible to find some documents in a collection according to some criteria, then follow some edges in a graph in which the documents represent vertices, and finally join the results with documents from yet another collection.

In this talk I will explain how a query engine for such a language works, give an overview of the life of a query from parsing, over translation into an execution plan, the optimisation phase and finally the execution. I will show how distributed query execution plans look like, how the query optimiser reasons about them and how the distributed execution works.

Veröffentlicht in: Technologie
0 Kommentare
0 Gefällt mir
Statistik
Notizen
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Keine Downloads
Aufrufe
Aufrufe insgesamt
776
Auf SlideShare
0
Aus Einbettungen
0
Anzahl an Einbettungen
2
Aktionen
Geteilt
0
Downloads
6
Kommentare
0
Gefällt mir
0
Einbettungen 0
Keine Einbettungen

Keine Notizen für die Folie

Complex queries in a distributed multi-model database

  1. 1. Complex queries in a distributed multi-model database Max Neunhöffer Tech Talk Geekdom, 16 March 2015 www.arangodb.com
  2. 2. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } 1
  3. 3. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. 1
  4. 4. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. 1
  5. 5. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. 1
  6. 6. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. 1
  7. 7. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. Sharding: the data of a collection is distributed between multiple servers. 1
  8. 8. Graphs A B D E F G C "likes" "hates" 2
  9. 9. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. 2
  10. 10. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. 2
  11. 11. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. 2
  12. 12. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. 2
  13. 13. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. 2
  14. 14. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. For example, paths in the graph are interesting. 2
  15. 15. Query 1 Fetch all documents in a collection FOR p IN people RETURN p 3
  16. 16. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] 3
  17. 17. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] (Actually, a cursor is returned.) 3
  18. 18. Query 2 Use filtering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } 4
  19. 19. Query 2 Use filtering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } [ { "name": "Neunhöffer, Max", "age": 44 }, { "name": "Schmidt, Helmut", "age": 95 }, ... ] 4
  20. 20. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } 5
  21. 21. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } [ { "age": 18, "number": 10 }, { "age": 19, "number": 17 }, { "age": 20, "number": 12 }, ... ] 5
  22. 22. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } 6
  23. 23. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } [ { "housename": "Firlefanz", "streetname": "Meyer street", "owner": "Hans Schmidt", "value": 423000 }, ... ] 6
  24. 24. Query 5 Modifying data FOR e IN events FILTER e.timestamp<"2014-09-01T09:53+0200" INSERT e IN oldevents FOR e IN events FILTER e.timestamp<"2014-09-01T09:53+0200" REMOVE e._key IN events 7
  25. 25. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } 8
  26. 26. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } [ { "begin": "germanCity/Cologne", "end" : {"_id": "frenchCity/Paris", ... }, "distance": 550, "nrPaths": 10 }, ... ] 8
  27. 27. Life of a query 1. Text and query parameters come from user 9
  28. 28. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 9
  29. 29. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 9
  30. 30. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 9
  31. 31. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 9
  32. 32. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 9
  33. 33. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 9
  34. 34. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9
  35. 35. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 9
  36. 36. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 9
  37. 37. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 11. Distribute and link up engines on different servers 9
  38. 38. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 11. Distribute and link up engines on different servers 12. Execute plan, provide cursor API 9
  39. 39. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP 10
  40. 40. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies 10
  41. 41. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline 10
  42. 42. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API 10
  43. 43. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline 10
  44. 44. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline What is an “item”??? 10
  45. 45. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. 11
  46. 46. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame 11
  47. 47. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan 11
  48. 48. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan We always deal with blocks of items for performance reasons 11
  49. 49. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x 12
  50. 50. Move filters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} Singleton EnumColl a EnumColl b Calc a.x == 10 Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Filter a.x == 10 13
  51. 51. Move filters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. Singleton EnumColl a EnumColl b Calc a.x == 10 Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Filter a.x == 10 13
  52. 52. Move filters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. However, the number of items trave- ling in the pipeline is decreased. Singleton EnumColl a Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Calc a.x == 10 EnumColl b Filter a.x == 10 13
  53. 53. Move filters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. However, the number of items trave- ling in the pipeline is decreased. Note that the two FOR statements could be interchanged! Singleton EnumColl a Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Calc a.x == 10 EnumColl b Filter a.x == 10 13
  54. 54. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  55. 55. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  56. 56. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  57. 57. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Therefore we can just leave it out. Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  58. 58. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Singleton EnumColl a Filter ... Calc ... Sort a.y, a.x Return a 15
  59. 59. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), Singleton EnumColl a Filter ... Calc ... Sort a.y, a.x Return a 15
  60. 60. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. Singleton Sort a.y, a.x Return a IndexRange a 15
  61. 61. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. The result will automatically be sorted by y and then by x. Singleton Return a IndexRange a 15
  62. 62. Data distribution in a cluster Requests DBserver DBserver DBserver CoordinatorCoordinator 4 2 5 3 11 The shards of a collection are distributed across the DB servers. 16
  63. 63. Data distribution in a cluster Requests DBserver DBserver DBserver CoordinatorCoordinator 4 2 5 3 11 The shards of a collection are distributed across the DB servers. The coordinators receive queries and organise their execution 16
  64. 64. Scatter/gather EnumerateCollection 17
  65. 65. Scatter/gather Remote EnumShard Remote Remote EnumShard Remote Concat/Merge Remote EnumShard Remote Scatter 17
  66. 66. Scatter/gather Remote EnumShard Remote Remote EnumShard Remote Concat/Merge Remote EnumShard Remote Scatter 17
  67. 67. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. 18
  68. 68. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, 18
  69. 69. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modification. 18
  70. 70. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modification. Sometimes, we can even optimise away a gather/scatter combination and parallelise completely. 18

×