B+ TREE AND HEIGHT BALANCED TREES
Prepared by Ms. Jasleen Kaur
Assistant Professor
Chandigarh University
Contents Covered
• Introduction to B+ Trees
• Properties
• Representation
• Advantages
• B v/s B+ Trees
• Insertion: Algorithm, Pseudocode
• Deletion: Algorithm, Pseudocode
• Implementation of B+ Tree
• Key Points
• Height Balanced Trees
• Solved Problems
• Homework Problem
B+ TREE- INTRODUCTION
• A B+ tree ("bee plus tree") is a data structure used as an index to
facilitate fast access to the elements of a larger body of data, such as
  o the entries in a database, or
  o the blocks of memory storage ("pages") in an operating system.
• Each target object (entry, page) is associated with an index key.
• The B+ tree is laid out like a family tree, where each node has some
number of keys that is between some predetermined maximum limit
and half that limit (inclusive).
• Each node also has one more pointer than the number of its keys. (A
"pointer" is the address of a location in the computer's memory.)
• A B+ tree is an extension of the B tree that allows efficient insertion,
deletion and search operations.
• In a B tree, both keys and records can be stored in internal as well as
leaf nodes. In a B+ tree, records (data) can only be stored in the leaf
nodes, while internal nodes store only key values.
• The leaf nodes of a B+ tree are linked together in the form of a
singly linked list to make search queries, especially range queries,
more efficient.
• The B+-Tree consists of two types of nodes:
• internal nodes
• leaf nodes
B+ TREE- PROPERTIES
1. Internal nodes point to other nodes in the tree.
2. Leaf nodes point to data in the database using data pointers. Leaf
nodes also contain an additional pointer, called the sibling pointer,
which is used to improve the efficiency of certain types of search.
3. All the nodes in a B+-Tree must be at least half full, except the
root node, which may contain a minimum of two entries. The
algorithms that allow data to be inserted into and deleted from a
B+-Tree guarantee that each non-root node in the tree stays at least
half full.
4. Searching for a value in the B+-Tree always starts at the root node
and moves downwards until it reaches a leaf node.
5. Both internal and leaf nodes contain key values that are used to
guide the search for entries in the index.
6. The B+ Tree is called a balanced tree because every path from the
root node to a leaf node has the same length. A balanced tree means
that all searches for individual values require the same number of
nodes to be read from the disk.
B+ TREE- INTRODUCTION
• B+ trees are used to store large amounts of data that cannot be
held in main memory. Because the size of main memory is always
limited, the internal nodes (keys to access records) of the B+ tree
are kept in main memory, whereas the leaf nodes are stored in
secondary memory.
• The internal nodes of a B+ tree are often called index nodes. A B+ tree
of order 3 is shown in the following figure.
B+ TREE- ADVANTAGES
• Records can be fetched in an equal number of disk accesses.
• The tree remains balanced, and its height is smaller than that of a
comparable B tree.
• We can access the data stored in a B+ tree sequentially as well as
directly.
• Keys are used for indexing.
• Search queries are faster because the data is stored only in the
leaf nodes.
• A B+ tree with ‘l’ levels can store more entries in its internal nodes
compared to a B-tree having the same ‘l’ levels, because its internal
nodes hold only keys and pointers.
• This larger fan-out reduces the search time for any given key.
• Having fewer levels and Pnext pointers (the links between leaf
nodes) means that B+ trees are quick and efficient at accessing
records from disk.
B v/s B+ TREE
SN | B Tree | B+ Tree
1 | Search keys cannot be stored repeatedly. | Redundant search keys can be present.
2 | Data can be stored in leaf nodes as well as internal nodes. | Data can only be stored in the leaf nodes.
3 | Searching for some data is a slower process since data can be found in internal nodes as well as in the leaf nodes. | Searching is comparatively faster as data can only be found in the leaf nodes.
4 | Deletion from internal nodes is complicated and time consuming. | Deletion is simpler since elements are always deleted from the leaf nodes.
5 | Leaf nodes are not linked together. | Leaf nodes are linked together to make search operations more efficient.
B+ TREE – INSERTION
INSERTION ALGORITHM
When the target leaf (bucket) is full:
1. Allocate a new leaf and move half of the bucket's elements to the
new bucket.
2. Insert the new leaf's smallest key and address into the parent.
3. If the parent is full, split it too.
4. When an internal node splits, move its middle key up to its parent
node.
5. Repeat until a parent is found that need not split.
6. If the root splits, create a new root which has one key and two
pointers. (That is, the value that gets pushed up to the new root is
removed from the original node.)
B+ TREE – INSERTION
INSERTION PSEUDOCODE
1) If the bucket is not full (at most b – 1 entries after the insertion),
add the record.
2) Otherwise, split the bucket.
• Allocate a new leaf and move half of the bucket's elements to the
new bucket.
• Insert the new leaf's smallest key and address into the parent.
• If the parent is full, split it too.
• When an internal node splits, move its middle key up to its parent
node.
• Repeat until a parent is found that need not split.
3) If the root splits, create a new root which has one key and two
pointers. (That is, the value that gets pushed up to the new root is
removed from the original node.)
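The leaf split in step 2 can be sketched in C as follows. This is a minimal illustration, not a full B+ tree: the names Leaf, MAX_KEYS, leaf_insert_sorted and leaf_split are assumptions made for this example, records are omitted, and the leaf keeps one spare slot so the overflowing key can be placed before the split.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 4              /* assumed leaf capacity: b - 1 keys */

typedef struct Leaf {
    int n;                      /* number of keys currently stored */
    int keys[MAX_KEYS + 1];     /* one spare slot holds the overflowing key */
    struct Leaf *next;          /* sibling pointer to the next leaf */
} Leaf;

/* Insert key into the sorted key array; the spare slot absorbs overflow. */
void leaf_insert_sorted(Leaf *leaf, int key) {
    assert(leaf->n <= MAX_KEYS);
    int i = leaf->n;
    while (i > 0 && leaf->keys[i - 1] > key) {
        leaf->keys[i] = leaf->keys[i - 1];
        i--;
    }
    leaf->keys[i] = key;
    leaf->n++;
}

/* Split an overfull leaf: the upper half moves to the empty leaf `right`.
   Returns the smallest key of `right`, which the caller inserts into
   the parent together with the address of `right`. */
int leaf_split(Leaf *left, Leaf *right) {
    assert(left->n > MAX_KEYS && right->n == 0);
    int keep = (left->n + 1) / 2;           /* left keeps the larger half */
    right->n = left->n - keep;
    memcpy(right->keys, left->keys + keep, (size_t)right->n * sizeof(int));
    left->n = keep;
    right->next = left->next;               /* keep the leaf chain linked */
    left->next = right;
    return right->keys[0];
}
```

For a full leaf holding 5, 10, 15, 20, inserting 12 and then splitting leaves 5, 10, 12 in the old leaf and 15, 20 in the new one; the key 15 goes to the parent.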
B+ TREE – DELETION
DELETION ALGORITHM
1. Descend to the leaf where the key exists.
2. Remove the required key and associated reference from the node.
3. If the node still has enough keys and references to satisfy the
invariants, stop.
4. If the node has too few keys to satisfy the invariants, but its next
oldest or next youngest sibling at the same level has more than
necessary, distribute the keys between this node and the neighbor.
Repair the keys in the level above to represent that these nodes now
have a different “split point” between them; this involves simply
changing a key in the levels above, without deletion or insertion.
5. If the node has too few keys to satisfy the invariant, and the next
oldest or next youngest sibling is at the minimum for the invariant,
then merge the node with its sibling; if the node is a non-leaf, we
will need to incorporate the “split key” from the parent into our
merging.
6. In either case, we will need to repeat the removal algorithm on the
parent node to remove the “split key” that previously separated
these merged nodes — unless the parent is the root and we are
removing the final key from the root, in which case the merged
node becomes the new root (and the tree has become one level
shorter than before).
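The choice made in steps 3 to 5 can be sketched as a small C function. The names DelAction and after_removal are hypothetical, and the minimum key count is passed in as a parameter because it depends on the order of the tree.

```c
#include <assert.h>

typedef enum { DEL_DONE, DEL_REDISTRIBUTE, DEL_MERGE } DelAction;

/* Decide what to do with a non-root node after a key has been removed.
   node_keys    : keys left in the node
   sibling_keys : keys in the adjacent sibling that was chosen
   min_keys     : minimum number of keys the invariants allow */
DelAction after_removal(int node_keys, int sibling_keys, int min_keys) {
    assert(node_keys >= 0 && sibling_keys >= min_keys);
    if (node_keys >= min_keys)
        return DEL_DONE;            /* step 3: invariants still hold */
    if (sibling_keys > min_keys)
        return DEL_REDISTRIBUTE;    /* step 4: sibling has keys to spare */
    return DEL_MERGE;               /* step 5: sibling is at the minimum */
}
```

With min_keys = 2, a node left with 1 key next to a sibling with 3 keys redistributes, while the same node next to a sibling with 2 keys merges.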
B+ TREE – DELETION
DELETION PSEUDOCODE
1) Start at the root and go down to the leaf node containing the key K.
2) For every node n on the path from the root to that leaf in which K
appears:
A. If n is the root, remove K.
  a. If the root has more than one key, done.
  b. If the root has only K:
    i) If any of its child nodes can lend a key,
       borrow a key from the child and adjust the child links.
    ii) Otherwise, merge the child nodes; the merged node becomes the
        new root.
B. If n is an internal node, remove K.
  i) If n has at least ceil(m/2) keys, done!
  ii) If n has fewer than ceil(m/2) keys:
      If a sibling can lend a key,
        borrow a key from the sibling and adjust the keys in n and the
        parent node;
        adjust the child links.
      Else
        merge n with its sibling;
        adjust the child links.
C. If n is a leaf node, remove K.
  i) If n has at least ceil(m/2) elements, done!
     In case the smallest key is deleted, push up the next key.
  ii) If n has fewer than ceil(m/2) elements:
      If a sibling can lend a key,
        borrow a key from the sibling and adjust the keys in n and its
        parent node.
      Else
        merge n and its sibling;
        adjust the keys in the parent node.
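The two leaf-level repairs (borrow from a sibling, or merge with it) can be sketched in C as follows. The names Leaf, CAP, borrow_from_right and merge_into_left are assumptions for this example; only the right sibling is handled, records are omitted, and updating the parent is left to the caller.

```c
#include <assert.h>
#include <string.h>

#define CAP 4                   /* assumed leaf capacity */

typedef struct {
    int n;                      /* number of keys in use */
    int keys[CAP];              /* sorted keys */
} Leaf;

/* Borrow: move the smallest key of the right sibling into `node`.
   Returns the new smallest key of the sibling, which must replace
   the separating key in the parent. */
int borrow_from_right(Leaf *node, Leaf *right) {
    assert(node->n < CAP && right->n >= 2);
    node->keys[node->n++] = right->keys[0];
    memmove(right->keys, right->keys + 1,
            (size_t)(right->n - 1) * sizeof(int));
    right->n--;
    return right->keys[0];
}

/* Merge: append all keys of `right` to `left`. The caller must then
   remove the separating key (and the pointer to `right`) from the parent. */
void merge_into_left(Leaf *left, Leaf *right) {
    assert(left->n + right->n <= CAP);
    memcpy(left->keys + left->n, right->keys,
           (size_t)right->n * sizeof(int));
    left->n += right->n;
    right->n = 0;
}
```

Borrowing for a leaf holding 3 from a sibling holding 10, 12, 14 gives 3, 10 and 12, 14, and the parent key between them becomes 12.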
B+ TREE – SEARCHING
SEARCHING PSEUDOCODE
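1) Start at the root node.
2) Apply binary search on the keys of the current node.
3) If the current node is an internal node, follow the child pointer
selected by the search and repeat from step 2.
4) If the current node is a leaf node and a record with the search key
is found
return required record
Else if current node is leaf node and key not found
print Element not Found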
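A minimal C sketch of this search is given below. The node layout (BPNode, ORDER, the values array standing in for record pointers) and the function names are assumptions for illustration; it also assumes that each key in an internal node is the smallest key of the subtree to its right, as in the insertion steps.

```c
#include <assert.h>
#include <stddef.h>

#define ORDER 4                     /* assumed: at most ORDER children per node */

typedef struct BPNode {
    int is_leaf;
    int n;                          /* number of keys in use */
    int keys[ORDER - 1];
    struct BPNode *child[ORDER];    /* used by internal nodes only */
    int values[ORDER - 1];          /* leaves only; stands in for record pointers */
} BPNode;

/* Binary search: returns the index of the first key greater than k,
   which equals the number of keys <= k. */
int upper_bound(const int *keys, int n, int k) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] <= k) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Returns a pointer to the value stored under k, or NULL if k is absent. */
const int *bp_search(const BPNode *node, int k) {
    assert(node != NULL);
    while (!node->is_leaf)          /* descend from the root to a leaf */
        node = node->child[upper_bound(node->keys, node->n, k)];
    int i = upper_bound(node->keys, node->n, k) - 1;
    return (i >= 0 && node->keys[i] == k) ? &node->values[i] : NULL;
}
```

Every lookup, successful or not, visits exactly one node per level, which is why all searches cost the same number of node reads.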
B+ TREE –KEY POINTS
1. A node of a B/B+ tree of order p has at most p pointers and
hence at most p children.
2. Every internal node other than the root has at least ceil(p/2)
pointers and hence at least ceil(p/2) children.
3. A node has at most (p – 1) keys, and every node other than the
root has at least ceil(p/2) – 1 keys.
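These limits can be written as small C helpers. The function names are illustrative only; the minimums apply to nodes other than the root, and ceil(p/2) is computed with integer arithmetic as (p + 1) / 2.

```c
#include <assert.h>

/* Limits for a B/B+ tree of order p (p = maximum number of children). */
int max_children(int p) { assert(p >= 3); return p; }
int min_children(int p) { assert(p >= 3); return (p + 1) / 2; }      /* ceil(p/2) */
int max_keys(int p)     { assert(p >= 3); return p - 1; }
int min_keys(int p)     { assert(p >= 3); return (p + 1) / 2 - 1; }  /* ceil(p/2) - 1 */
```

For p = 6 this gives at most 5 keys and at least 2 keys per non-root node; for p = 4 it gives 2 to 4 children and 1 to 3 keys.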
These are the key points related to searching in B/B+ trees:
1. For searching a key in B tree, we start from root node
and traverse until the key is found or leaf node is
reached.
2. For searching a key in a B+ tree, we start from the root node
and traverse until a leaf node is reached, as every key is
present in the leaf nodes. Also, leaf nodes are connected to
each other, which helps in faster access of data for range
queries.
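A range query over the linked leaves can be sketched in C as below. The names Leaf and range_scan are assumptions for this example; in a real tree the starting leaf would be found by an ordinary search for the lower bound, and records rather than bare keys would be returned.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Leaf {
    int n;                  /* number of keys in use */
    int keys[4];            /* sorted keys (capacity assumed to be 4) */
    struct Leaf *next;      /* link to the next leaf in key order */
} Leaf;

/* Copy every key in [lo, hi] into out, walking the leaf chain from
   `start`. Stops at the first key above hi or when out is full.
   Returns the number of keys copied. */
int range_scan(const Leaf *start, int lo, int hi, int *out, int cap) {
    assert(lo <= hi && cap >= 0);
    int count = 0;
    for (const Leaf *leaf = start; leaf != NULL; leaf = leaf->next) {
        for (int i = 0; i < leaf->n; i++) {
            if (leaf->keys[i] > hi || count == cap)
                return count;
            if (leaf->keys[i] >= lo)
                out[count++] = leaf->keys[i];
        }
    }
    return count;
}
```

Once the first leaf is located, the scan never goes back to the internal nodes; it only follows the next pointers.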
HEIGHT BALANCED TREES- HISTORY
The AVL tree, a height balanced tree, is named after its two
inventors, G.M. Adelson-Velskii and E.M. Landis, who
published it in their 1962 paper "An algorithm for the
organization of information."
As the title of the paper suggests, AVL trees are used for
organizing information.
HEIGHT BALANCED TREES- INTRODUCTION
A height balanced tree is one where there is a bound on
the difference between the heights of the subtrees of every node.
One of the classic examples of a height balanced tree is the AVL
tree.
In AVL trees each node has an attribute associated with it
called the balance factor. The balance factor of a node is
the difference between the heights of the two subtrees
hanging from that node. In an AVL tree the
constraint is that these heights may differ by at most 1. In
other words, the balance factor of any node may take one of
3 values, namely -1, 0 or 1.
HEIGHT BALANCED TREES- KEY POINTS
Here are some important notions:
[1] The length of the longest path from the root node to one of
the terminal nodes is what we call the height of a tree.
[2] The difference between the height of the right subtree and
the height of the left subtree is what we call the balancing
factor.
[3] The binary tree is balanced when the balancing factor
of every node is -1, 0 or +1.
Formally: |hd – hs| ≤ 1 for every node X in the tree, where hs
and hd represent the heights of the left and the right subtrees
of X.
The following figure illustrates the balancing factor, which equals hd – hs.
HEIGHT BALANCED TREES- EXAMPLE
The following figure shows a binary search tree with computed
balancing factors.
Here are some examples of calculating the height of a subtree
and the balancing factor for the above binary search tree.
- the height of the tree is 4, that is, the length of the longest
path from the root to a leaf node.
- the height of the left subtree of the root is 3, that is,
the length of the longest path from the node 13 to one of the
leaf nodes (2, 7 or 12).
- to find the balancing factor of the root we subtract the
height of the left subtree from the height of the right
subtree: 1 - 3 = -2.
- the balancing factor of the node with the key 12 is very easy
to determine. We notice that the node has no children, so the
balancing factor is 0.
- to find the balancing factor of the node with key 5 we
subtract the height of the left subtree from the height of the
right subtree: 1 - 0 = +1.
HEIGHT BALANCED TREES
In computer science, a self-balancing (or height-balanced)
binary search tree is any node-based binary search tree
that automatically keeps its height (maximal number of
levels below the root) small in the face of arbitrary item
insertions and deletions.
AVL trees are used for performing search operations on
large volumes of data held in external storage.
For example, a phone call list may generate a huge
database which can be recorded only on external hard
drives, hard disks or other storage devices.
The structure of the nodes of a balanced tree can be
represented like:
struct NodeAVL {
    int key;                        /* tag of the node (integer number) */
    int ech;                        /* balancing factor */
    struct NodeAVL *left, *right;   /* left and right children */
};
Where:
- key represents the tag of the node (integer number),
- ech represents the balancing factor,
- left and right represent pointers to the left and right
children.
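Using this node structure, heights and balancing factors can be computed as in the C sketch below. The function names are assumptions for this example. The height of an empty subtree is taken as 0 and that of a single node as 1, and the balancing factor is hd – hs (right minus left), matching the definitions above. Calling height at every node is simple but not efficient; a real AVL implementation usually maintains the factor during insertion and deletion.

```c
#include <assert.h>
#include <stddef.h>

struct NodeAVL {
    int key;
    int ech;                        /* balancing factor */
    struct NodeAVL *left, *right;
};

/* Height of a subtree: 0 for an empty subtree, 1 for a single node. */
int height(const struct NodeAVL *n) {
    if (n == NULL) return 0;
    int hs = height(n->left);
    int hd = height(n->right);
    return 1 + (hs > hd ? hs : hd);
}

/* Balancing factor hd - hs of one node. */
int balance_factor(const struct NodeAVL *n) {
    assert(n != NULL);
    return height(n->right) - height(n->left);
}

/* Store the balancing factor of every node in its ech field. */
void store_balance(struct NodeAVL *n) {
    if (n == NULL) return;
    n->ech = balance_factor(n);
    store_balance(n->left);
    store_balance(n->right);
}

/* Returns 1 when |hd - hs| <= 1 holds at every node, otherwise 0. */
int is_balanced(const struct NodeAVL *n) {
    if (n == NULL) return 1;
    int bf = balance_factor(n);
    return bf >= -1 && bf <= 1
        && is_balanced(n->left) && is_balanced(n->right);
}
```

For a hypothetical chain 13 -> 5 -> 7 (5 is the left child of 13, 7 is the right child of 5), the factors are -2, +1 and 0, so the tree rooted at 13 is not height balanced while the subtree rooted at 5 is.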
HEIGHT BALANCED TREES- 2-3-4 TREE
• A B-tree of order 4 is known as a 2-3-4 tree.
• A 2–3–4 tree (also called a 2–4 tree) is a self-balancing data
structure that is commonly used to implement dictionaries.
The numbers mean a tree where every node with children
(internal node) has either two, three, or four child nodes:
  o a 2-node has one data element, and if internal has two
  child nodes;
  o a 3-node has two data elements, and if internal has three
  child nodes;
  o a 4-node has three data elements, and if internal has four
  child nodes.
Properties:
1. Every node (leaf or internal) is a 2-node, 3-node or a 4-node,
and holds one, two, or three data elements, respectively.
2. All leaves are at the same depth (the bottom level).
3. All data is kept in sorted order.
B+ TREE –PROBLEM 1
1. Consider a B+-tree in which the maximum number of
keys in a node is 5. What is the minimum number of
keys in any non-root node? (GATE CS 2010)
(A) 1
(B) 2
(C) 3
(D) 4
B+ TREE –SOLUTION
Assuming the order of the B+ tree is p, the maximum number of
keys in a node will be (p – 1). It is given that:
• p – 1 = 5 => p = 6
• Therefore, the minimum number of keys is:
• ceil(p/2) – 1 = ceil(6/2) – 1 = 3 – 1 = 2, which is option (B).
B+ TREE –PROBLEM 2
2. Consider the following 2-3-4 tree (i.e., B-tree with a
minimum degree of two) in which each data item is a
letter. The usual alphabetical ordering of letters is used
in constructing the tree.
What is the result of inserting G in the above tree?
B+ TREE –SOLUTION
Since the given B tree has a minimum degree of 2, the
maximum degree or order will be 2*2 = 4. Therefore, each node
will have at most 4 pointers or 3 keys.
We traverse from the root to the leaf node where G is to be
inserted.
As G is less than L, it will be inserted in the leaf node with
elements BHI. After insertion of G, the leaf node in sorted
order will be BGHI, which leads to overflow.
The node is split into two parts, BG and I, and the middle
element H is sent to its parent node as:
Now the root node with keys H, L, P, U overflows, which
leads to splitting the root node into two parts, HL and U, and
the middle element P becomes the new root node, which matches
option B.
Note:
Two splits occur for the insertion of G.
The height of the B tree is 1 (path from root node to leaf node)
before the insertion of G. After the insertion of G, the height
of the B tree becomes 2.
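The two splits in this solution follow the same pattern, which can be sketched in C as below. The function name split_overfull is hypothetical; it covers only the key handling for a node that temporarily holds four sorted keys, using the split point from this solution (the third key moves up). Child pointers are not modelled.

```c
#include <assert.h>
#include <string.h>

/* Split a node that holds four sorted keys: the first two keys stay
   in the left part, the third key moves up to the parent, and the
   fourth key forms the right part. */
void split_overfull(const char *keys, char left[3], char *middle, char right[2]) {
    assert(keys[0] <= keys[1] && keys[1] <= keys[2] && keys[2] <= keys[3]);
    memcpy(left, keys, 2);
    left[2] = '\0';
    *middle = keys[2];
    right[0] = keys[3];
    right[1] = '\0';
}
```

Applied to BGHI this gives BG, H and I; applied to the overflowing root HLPU it gives HL, P and U, with P as the new root.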
B+ TREE –HOMEWORK PROBLEM 1
A B-tree of order 4 is built from scratch by 10 successive
insertions. What is the maximum number of node splitting
operations that may take place? (GATE CS 2008)
(A) 3
(B) 4
(C) 5
(D) 6