The document discusses tree-based indexes for databases, including B-trees, Cache-Oblivious Lookahead Arrays (COLAs), LSM-trees, and fractal trees. It analyzes the performance tradeoffs among these structures for search, insertion, and space efficiency. A key point is that fractal trees can match B-tree search performance while inserting faster by exploiting the block size of storage. The document also proposes a block-size-dependent fanout for ε-B-trees to achieve optimal performance.
10. COLA
[Diagram: log2N levels of sorted arrays, each twice the size of the previous]
Binary search in one level: O(log2N); across all log2N levels, a search costs O((log2N)^2) block transfers.
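The insert path of a COLA can be sketched as follows. This is a toy illustration, not nessDB's or any real COLA implementation: level i holds one sorted array (the slide's doubling sizes), and an insert cascades upward by merging full levels.

```python
import bisect

class COLA:
    """Toy cache-oblivious lookahead array: each level holds one sorted
    array, double the size of the level below; inserts cascade by merging."""

    def __init__(self):
        self.levels = []  # levels[i] is a sorted list, or None if empty

    def insert(self, key):
        carry = [key]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(carry)   # grow a new, larger level
                return
            if self.levels[i] is None:
                self.levels[i] = carry      # empty slot: drop the run here
                return
            # Level is full: merge it into the carry and move up one level.
            # (A real COLA does a linear-time merge, giving the amortized
            # O((1/B)log2N) block transfers per insert.)
            carry = sorted(self.levels[i] + carry)
            self.levels[i] = None
            i += 1

    def search(self, key):
        # Without fractional cascading: binary search in each of the
        # log2N levels, so O((log2N)^2) comparisons in total.
        for arr in self.levels:
            if arr:
                j = bisect.bisect_left(arr, key)
                if j < len(arr) and arr[j] == key:
                    return True
        return False
```

Note how each key is re-merged every time its level cascades upward; that repeated copying is exactly the merge cost the later slides complain about.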
11. COLA (Using Fractional Cascading)
[Diagram: log2N levels of doubling arrays, with fractional-cascading pointers between levels]
● Search: O(log2N) block transfers.
● Insert: O((1/B)log2N) amortized block transfers.
● Data is stored in log2N arrays of sizes 2, 4, 8, 16,..
● Balanced Binary Search Tree
12. COLA Conclusions
● Search: O(log2N) block transfers (using fractional cascading).
● Insert: O((1/B)log2N) amortized block transfers.
● Data is stored in log2N arrays of sizes 2, 4, 8, 16,..
● Balanced Binary Search Tree
● Lookahead (prefetch) friendly; good for data-intensive workloads!
● BUT the bottom level keeps getting bigger and bigger, so merging into it becomes expensive.
13. COLA vs B-tree
● Search:
  (log2N) / (logBN) = log2B times slower than a B-tree (in theory)
● Insert:
  (logBN) / ((1/B)·log2N) = B/(log2B) times faster than a B-tree (in theory)
If B = 4096 (a 4KB block):
  COLA search is 12 times slower than a B-tree
  COLA insert is ~341 times faster than a B-tree
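The slide's numbers follow directly from the two ratios, assuming B = 4096 keys per block:

```python
import math

B = 4096  # keys per block (the slide's "4KB" read as 4096 entries)

# Search: COLA does log2(N) block transfers vs. the B-tree's logB(N),
# so the slowdown factor is log2(N) / logB(N) = log2(B).
search_slowdown = math.log2(B)

# Insert: B-tree pays logB(N) transfers vs. COLA's (1/B)*log2(N),
# so the speedup factor simplifies to B / log2(B).
insert_speedup = B / math.log2(B)

print(round(search_slowdown), round(insert_speedup))  # 12 341
```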
17. LSM-tree (Merging)
[Diagram: an in-memory buffer flushes to disk, and on-disk runs are repeatedly merged into larger runs, level by level]
A lot of I/O is wasted during merging!
Like a headless fly buzzing around aimlessly... Zzz...
25. ε-B-tree
[Chart: search speed vs. insert speed. A B-tree (ε=1) has fast search but slow inserts; an append-only file (AOF, ε=0) has fast inserts but slow search; ε=1/2 sits on the optimal curve, close to both.]
26. ε-B-tree
               insert           search
B-tree (ε=1)   O(logBN)         O(logBN)
ε=1/2          O((logBN)/√B)    O(logBN)
ε=0            O((logN)/B)      O(logN)
If we want optimal point queries plus very fast inserts, we should choose ε=1/2.
27. ε-B-tree
So, if the block size is B, the fanout should be √B (that is, B^ε with ε=1/2).
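A quick way to see why ε=1/2 keeps search asymptotically optimal: with fanout B^ε the tree height is logB(N)/ε, so halving ε only doubles the height (a constant factor). A small check, with B and N chosen just for illustration:

```python
import math

def height(N, B, eps):
    # An ε-B-tree has fanout B**eps, so its height is
    # log_{B^eps}(N) = logB(N) / eps.
    return math.log(N, B ** eps)

B, N = 4096, 2 ** 30          # illustrative values, not from the slides
h_btree = height(N, B, 1.0)   # ε=1: plain B-tree, height logB(N) = 2.5
h_half  = height(N, B, 0.5)   # ε=1/2: fanout √B = 64, height doubles to 5.0
print(h_btree, h_half)
```

Search stays O(logBN) for ε=1/2, while the per-node buffer space freed by the smaller fanout is what buys the √B-times-faster inserts in the table above.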
29. Cache Oblivious Data Structure
Question:
Reading a sequence of k consecutive blocks
at once is not much more expensive than
reading a single block. How to take advantage
of this feature?
30. Cache Oblivious Data Structure
My questions (translated from Chinese):
Q1:
With only 1MB of memory, how do you merge two 64MB sorted files into one sorted file?
Q2:
On most mechanical disks, reading several consecutive blocks costs about the same as reading a single block. How can we exploit this in Q1?
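One answer to Q1/Q2 can be sketched as a streaming two-way merge: keep only a bounded read buffer per input file, so memory stays at a few MB no matter how large the files are, and the big buffers turn disk traffic into long sequential reads (the Q2 advantage). The function name and buffer size here are illustrative, not from the slides:

```python
import heapq

def merge_sorted_files(path_a, path_b, out_path, buf_bytes=1 << 20):
    """Merge two sorted line-oriented files using ~1MB of buffer per
    stream. heapq.merge consumes both iterators lazily, so only a small
    window of each file is in memory at any time."""
    with open(path_a, buffering=buf_bytes) as fa, \
         open(path_b, buffering=buf_bytes) as fb, \
         open(out_path, "w", buffering=buf_bytes) as out:
        out.writelines(heapq.merge(fa, fb))
```

The same idea scales to the k-way merges an LSM-tree or COLA performs: one bounded buffer per input run, one sequential output stream.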
31. nessDB
You should agree that the VFS (the OS page cache) does better than a cache you build yourself!
https://github.com/shuttler/nessDB
32. nessDB
[Diagram: a single row of blocks]
Each block is a small, splittable tree.
33. nessDB, What's going on?
[Diagram: the single row of blocks grows into multiple levels of blocks]
From the line to the plane...
34. Thanks!
Most of the references are from:
Tokutek & MIT CSAIL & Stony Brook.
Drafted by BohuTANG using Google Drive, @2012/12/12