This is the presentation from the barcamp at Altoros, where I explained how various advanced non-relational schemas (or, simply, data structures) can be modelled on top of Key/Value storage. The schemas covered include Dynamic Vector, File System, Searchable Bitmap, LOUDS Tree, Wavelet Tree and Inverted Index.
See https://bitbucket.org/vsmirnov/memoria/wiki/MemoriaForBigData
for additional details.
2. Non-Relational Schema
● Is just a data structure
● That uses some Memory Model
● Typically, Key->Value mapping
● Where Key is an Integer ID
● And Value is an arbitrary array of a limited size or
memory block
● It's assumed that operations on memory blocks
are atomic.
4. Partial (Prefix) Sums Tree
● Given a sequence S[0, N) = s0...sn-1 of non-negative
integers
● Sum(i) returns X = s0+s1+...+si
● FindLT(X) returns the largest position i such that Sum(i) < X
● FindLE(X) is the same, but for Sum(i) <= X
● We can also define range versions: Sum(i, j) and
FindLT(j, X)
● All operations run in O(log N) time.
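
The operations above can be sketched with a Fenwick (binary indexed) tree. This is only an illustration under stated assumptions: the class name `PartialSums` and its method names are hypothetical, the tree lives in a plain Python list rather than in Key/Value blocks, and -1 is returned when no position qualifies (the slides do not define that case).

```python
class PartialSums:
    """Fenwick tree over non-negative integers (illustrative sketch)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)  # 1-based internal array
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, i, delta):
        """Add delta to element i in O(log N)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):
        """Sum(i) = s0 + s1 + ... + si (inclusive)."""
        total = 0
        i += 1
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def _descend(self, accept):
        # Find the longest prefix whose sum satisfies `accept`;
        # relies on prefix sums being non-decreasing.
        pos, acc = 0, 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and accept(acc + self.tree[nxt]):
                pos = nxt
                acc += self.tree[nxt]
            step >>= 1
        return pos - 1  # -1 means no position qualifies

    def find_lt(self, x):
        """Largest i such that Sum(i) < x."""
        return self._descend(lambda s: s < x)

    def find_le(self, x):
        """Largest i such that Sum(i) <= x."""
        return self._descend(lambda s: s <= x)


sums = PartialSums([3, 0, 2, 5, 1])   # prefix sums: 3, 3, 5, 10, 11
print(sums.sum(2), sums.find_lt(5), sums.find_le(5))
```

Both the update and the search walk at most one node per level, which is where the O(log N) bound comes from.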
7. Dynamic Vector
● An ordered sequence of elements (bytes, integers, strings)
of size N
● Access(i) is O(log N)
● Insert(i, value) is O(log N)
● Delete(i) is O(log N)
● We can also define batch operations:
● Insert(i, value[])
● Delete(i, j)
● Split(i); Merge(AnotherVector);...
9. Dynamic Vector Operations
● FindLT(i) returns the block B that contains position i
and the offset j of i within B
● Access(i) is O(log N)
● Insert(i, value) and Delete(i) are also O(log N)
because the tree is balanced.
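
A minimal sketch of the block-based layout follows. It is an assumption-laden illustration, not the Memoria implementation: `BlockedVector`, `max_block` and `_locate` are hypothetical names, and `_locate` scans block sizes linearly where the real structure would call FindLT on a balanced partial sums tree to get O(log N).

```python
class BlockedVector:
    """Sequence stored as a list of small blocks (illustrative sketch)."""

    def __init__(self, max_block=4):
        self.max_block = max_block   # assumed block capacity
        self.blocks = [[]]

    def __len__(self):
        return sum(len(b) for b in self.blocks)

    def _locate(self, i):
        # Stand-in for FindLT over block sizes: returns (block, offset).
        for b, block in enumerate(self.blocks):
            if i < len(block):
                return b, i
            i -= len(block)
        # i equals the total size: point just past the last element.
        return len(self.blocks) - 1, len(self.blocks[-1])

    def access(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        b, j = self._locate(i)
        return self.blocks[b][j]

    def insert(self, i, value):
        if not 0 <= i <= len(self):
            raise IndexError(i)
        b, j = self._locate(i)
        block = self.blocks[b]
        block.insert(j, value)
        if len(block) > self.max_block:
            # Split an overflowing block into two halves.
            mid = len(block) // 2
            self.blocks[b:b + 1] = [block[:mid], block[mid:]]

    def delete(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        b, j = self._locate(i)
        del self.blocks[b][j]
        if not self.blocks[b] and len(self.blocks) > 1:
            del self.blocks[b]       # drop a block that became empty


v = BlockedVector(max_block=4)
for n in range(10):
    v.insert(n, n)
v.insert(3, 99)
print(v.access(3), len(v.blocks))
```

Only the touched block is rewritten on insert or delete, which is why each block can be stored as one atomic value in the Key/Value store.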
10. File System: Map<ID, Vector<T>>
● Maps ID to Vector<T>
● Merge all values into one large Dynamic Vector, in ID
order
● Create separate “index” sequence from pairs <ID, Offset>
in ID order
● We can represent this “index” sequence as two partial
sums tree, for ID and for Offset
● We can merge both these trees to one because they have
exactly the same structure: multi-index balanced partial
sums tree.
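
The layout described above can be sketched as follows, assuming non-negative integer IDs. The index stores ID deltas and vector lengths, so the absolute ID and the data offset are both partial sums over the same entries. The function names `build` and `lookup` are hypothetical, and the linear loop stands in for a descent through a two-index partial sums tree.

```python
def build(mapping):
    """Flatten {ID: list} into (index, data), both in ID order."""
    index = []   # one (id_delta, length) pair per ID
    data = []    # all values merged into one sequence
    prev_id = 0
    for key in sorted(mapping):
        index.append((key - prev_id, len(mapping[key])))
        data.extend(mapping[key])
        prev_id = key
    return index, data


def lookup(index, data, key):
    """Return the vector stored under key, or None if absent."""
    current_id = 0   # running sum of ID deltas
    offset = 0       # running sum of lengths
    for id_delta, length in index:
        current_id += id_delta
        if current_id == key:
            return data[offset:offset + length]
        if current_id > key:
            break
        offset += length
    return None


index, data = build({2: [10, 11], 7: [20], 9: [30, 31, 32]})
print(index)
print(lookup(index, data, 9))
```

Because both running sums advance over the same entries, one tree with two sums per node can serve both, which is the merge the slide refers to.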
12. Sharing Tree Structures
● Tree structure sharing saves both space and time:
SPMD principle (single program, multiple data)
● We can align partial sum trees with different structures
using interpolation (padding with zeroes)
● We can merge index and data streams (index and
data) of Map<ID, Vector<T>> in one multi-stream tree.
● When merging the trees, we try to place index pairs and
their corresponding data in the same leaf node of the
multi-stream tree.
15. ACID
● Atomic block operations are not enough
● Even simple tree update affects several blocks
● So, ACID is mandatory for advanced non-
relational schemas
● We can get ACID for free with Multi-Version
Concurrency Control (MVCC)
● We need Version History over data blocks
● Where each transaction is a version.
17. Version History Implementation
● Version History maps pair <ID, Version> to an ID of real
data block for that version and given ID
● We have Map<ID, Vector<Version, ID>>
● We can turn it to Version History by sorting each
Vector<Version, ID> (less space, slower)
● Or by creating additional partial sums tree index on top of it
(more space, but much faster)
● We can do it in just one multi-stream balanced tree
● MVCC requires some other data structures but they can be
designed by analogy.
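
The sorted-vector variant can be sketched as below. The names `VersionHistory`, `record` and `resolve` are hypothetical, and one behaviour is an assumption not stated on the slide: a lookup for version V returns the block written by the latest version that is not newer than V (the usual snapshot-read rule), or None when no such version exists.

```python
from bisect import bisect_right


class VersionHistory:
    """Map<ID, sorted Vector<(Version, ID)>> (illustrative sketch)."""

    def __init__(self):
        self._versions = {}  # logical ID -> sorted version numbers
        self._blocks = {}    # logical ID -> physical IDs, same order

    def record(self, logical_id, version, physical_id):
        """Register the physical block written by `version`."""
        versions = self._versions.setdefault(logical_id, [])
        blocks = self._blocks.setdefault(logical_id, [])
        pos = bisect_right(versions, version)
        if pos > 0 and versions[pos - 1] == version:
            blocks[pos - 1] = physical_id   # same version, overwrite
        else:
            versions.insert(pos, version)
            blocks.insert(pos, physical_id)

    def resolve(self, logical_id, version):
        """Physical block visible to `version`, or None."""
        versions = self._versions.get(logical_id, [])
        pos = bisect_right(versions, version)
        if pos == 0:
            return None
        return self._blocks[logical_id][pos - 1]


history = VersionHistory()
history.record(1, 1, 100)
history.record(1, 3, 101)
print(history.resolve(1, 2), history.resolve(1, 3))
```

The binary search inside each per-ID vector is the "slower" path mentioned above; an extra partial sums index over the versions would trade space for a faster lookup.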
18. Concurrency Handling
● Version History is a complicated data structure
● Concurrent access to it must be restricted
● Split whole Version History to shards
● And shard blocks by ID to reduce lock contention on
Version History
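
One way to picture sharding by ID is a fixed number of shards, each guarded by its own lock. This is a sketch under assumptions: the slides do not specify the shard function or the locking scheme, so `ShardedVersionHistory`, the modulo rule and the per-shard `threading.Lock` are all hypothetical, and the per-shard table here only handles exact version matches.

```python
import threading


class ShardedVersionHistory:
    """Version History split into independently locked shards (sketch)."""

    def __init__(self, shard_count=4):
        # Each shard is a (table, lock) pair.
        self._shards = [({}, threading.Lock()) for _ in range(shard_count)]

    def _shard(self, logical_id):
        # Assumed shard function: ID modulo the number of shards.
        return self._shards[logical_id % len(self._shards)]

    def record(self, logical_id, version, physical_id):
        table, lock = self._shard(logical_id)
        with lock:  # only writers to the same shard contend
            table.setdefault(logical_id, {})[version] = physical_id

    def resolve(self, logical_id, version):
        table, lock = self._shard(logical_id)
        with lock:
            return table.get(logical_id, {}).get(version)


history = ShardedVersionHistory(shard_count=4)
history.record(5, 1, 50)
history.record(6, 1, 60)   # different shard, different lock
print(history.resolve(5, 1), history.resolve(6, 1))
```

Threads that touch IDs in different shards never wait on each other, which is the contention reduction the slide describes.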
19. Distributed Storage and Processing
● MVCC is very Raft/Paxos-friendly
● Because of Version History and MVCC
● So we can join storage nodes to Raft groups
● And join Raft groups to larger groups with 2PC
● Using split/merge model to map data to nodes.
21. Searchable Bitmaps
● rank1(n) = number of ones in [0, n)
● select1(i) = position of i-th 1 in the bitmap
● rank0(n) = number of zeroes in [0, n)
● select0(i) = position of i-th 0 in the bitmap
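
The four operations can be sketched over a static bitmap as shown below. Several things are assumptions: the class name `SearchableBitmap` and the block size are hypothetical, select counts from 1 (so `select1(1)` is the position of the first 1), and select is done by binary search over rank rather than by a dedicated index.

```python
class SearchableBitmap:
    """Static bitmap with rank/select (illustrative sketch)."""

    BLOCK = 8  # assumed block size for the rank directory

    def __init__(self, bits):
        self.bits = list(bits)
        # ones_before[b] = number of ones in blocks 0 .. b-1
        self.ones_before = [0]
        for start in range(0, len(self.bits), self.BLOCK):
            block = self.bits[start:start + self.BLOCK]
            self.ones_before.append(self.ones_before[-1] + sum(block))

    def rank1(self, n):
        """Number of ones in [0, n)."""
        if not 0 <= n <= len(self.bits):
            raise IndexError(n)
        block = n // self.BLOCK
        return self.ones_before[block] + sum(self.bits[block * self.BLOCK:n])

    def rank0(self, n):
        """Number of zeroes in [0, n)."""
        return n - self.rank1(n)

    def _select(self, i, rank):
        if i < 1 or i > rank(len(self.bits)):
            raise ValueError(i)
        lo, hi = 1, len(self.bits)
        while lo < hi:          # smallest n with rank(n) >= i
            mid = (lo + hi) // 2
            if rank(mid) >= i:
                hi = mid
            else:
                lo = mid + 1
        return lo - 1

    def select1(self, i):
        """Position of the i-th 1, counting from 1."""
        return self._select(i, self.rank1)

    def select0(self, i):
        """Position of the i-th 0, counting from 1."""
        return self._select(i, self.rank0)


bitmap = SearchableBitmap([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
print(bitmap.rank1(4), bitmap.select1(3), bitmap.select0(4))
```

With this convention, `rank1(select1(i) + 1) == i` holds for every valid i.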
26. Wavelet Tree
● Searchable sequence [0...N) for large alphabets
● Rank(i, s) returns number of symbols s in [0, i)
● Select(k, s) returns position i of k-th symbol s
● Insert(i, s), Delete(i), Access(i) – insert, remove and
access the symbol at position i respectively
● All these operations have O(log N) time complexity
● By mapping numbers to symbols we can perform the
following lookup operations: >, >=, <, <=, <> in O(log N)
time.
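
A static wavelet tree over small integers can be sketched as follows. It is a simplification under stated assumptions: the slides describe a dynamic structure with Insert and Delete, while this sketch is read-only; the names `WaveletTree` and `count_less` are hypothetical; each node keeps a plain prefix-count list where a real implementation would use a searchable bitmap; and query cost here depends on the alphabet size rather than on N.

```python
class WaveletTree:
    """Static wavelet tree over integers in [lo, hi) (sketch)."""

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        if lo is None:
            lo, hi = (min(seq), max(seq) + 1) if seq else (0, 1)
        self.lo, self.hi, self.size = lo, hi, len(seq)
        self.left = self.right = None
        self.prefix_ones = [0]      # prefix_ones[i] = ones in [0, i)
        if hi - lo <= 1 or not seq:
            return                  # leaf: a single symbol or empty
        mid = (lo + hi) // 2
        for s in seq:               # bit is 1 when the symbol goes right
            self.prefix_ones.append(self.prefix_ones[-1] + (s >= mid))
        self.left = WaveletTree([s for s in seq if s < mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s >= mid], mid, hi)

    def _step(self, i, go_right):
        # Map position i in this node to a position in the chosen child.
        ones = self.prefix_ones[i]
        return (self.right, ones) if go_right else (self.left, i - ones)

    def access(self, i):
        """Symbol at position i."""
        if not 0 <= i < self.size:
            raise IndexError(i)
        node = self
        while node.left is not None:
            bit = node.prefix_ones[i + 1] - node.prefix_ones[i]
            node, i = node._step(i, bit == 1)
        return node.lo

    def rank(self, i, s):
        """Number of occurrences of symbol s in [0, i)."""
        if not 0 <= i <= self.size:
            raise IndexError(i)
        if not self.lo <= s < self.hi:
            return 0
        node = self
        while node.left is not None:
            node, i = node._step(i, s >= (node.lo + node.hi) // 2)
        return i

    def count_less(self, i, x):
        """Number of symbols smaller than x in [0, i)."""
        if x <= self.lo:
            return 0
        if x >= self.hi:
            return i
        node, total = self, 0
        while node.left is not None:
            if x >= (node.lo + node.hi) // 2:
                total += i - node.prefix_ones[i]  # whole left part is < x
                node, i = node._step(i, True)
            else:
                node, i = node._step(i, False)
        return total


wt = WaveletTree([3, 1, 4, 1, 5, 2, 6, 5, 3, 5])
print(wt.access(2), wt.rank(10, 5), wt.count_less(10, 4))
```

`count_less` illustrates the "<" lookup from the last bullet; the other comparisons follow by combining it with `rank` and the prefix length i.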