SlideShare ist ein Scribd-Unternehmen logo
1 von 40
File Structures SNU-OOPSLA Lab. 1
Chap12. Extendible Hashing
서욞대학ꔐ ì»Ží“ší„°êł”í•™ë¶€
객ìČŽì§€í–„ì‹œìŠ€í…œì—°ê”Źì‹€
SNU-OOPSLA-LAB
ꔐ수 êč€ í˜• ìŁŒ
File Structures by Folk, Zoellick and Riccardi
File Structures SNU-OOPSLA Lab. 2
Chapter ObjectivesChapter Objectives
Describe the problem solved by extendible hashing and related
approaches
Explain how extendible hashing works; show how it combines
tries with conventional, static hashing
Use the buffer, file, and index classes of previous chapters to
implement extendible hashing, including deletion
Review studies of extendible hashing performance
Examine alternative approaches to the same problem, including
dynamic hashing, linear hashing, and hashing schemes that
control splitting by allowing for overflow buckets
File Structures SNU-OOPSLA Lab. 3
ContentsContents
12.1 Introduction
12.2 How extendible hashing works
12.3 Implementation
12.4 Deletion
12.5 Extendible hashing performance
12.6 Alternative approaches
File Structures SNU-OOPSLA Lab. 4
12.1 Introduction
Dynamic files
undergo a lot of growths
Static hashing
described in chapter 11 (direct hashing)
typically worse than B-Tree for dynamic files
eventually requires file reorganization
Extendible hashing
hashing for dynamic file
Fagin, Nievergelt, Pippenger, and Strong (ACM TODS 1979)
File Structures SNU-OOPSLA Lab. 5
Overview(1)Overview(1)
Direct access (hashing) files have static size, so not
suitable for files whose size is unknown in advance
Dynamic file structure is desired which retains the feature
of fast retrieval by primary key, and which also expands
and contracts as the number of records in the file
fluctuates (without reorganizing the whole file)
Similar motivation!
Indexed-sequential File ==> B tree
Hashing ==> Extendible Hashing
File Structures SNU-OOPSLA Lab. 6
Overview(2)Overview(2)
Extendible Hashing
Primary key H(key)
Hashing function
Directory
Index
Extract first d digit
File pointerTable look-up
File Structures SNU-OOPSLA Lab. 7
12.2 How Extendible Hashing works12.2 How Extendible Hashing works
Idea from Tries file (radix searching)
The branching factor of the tree is equal to the # of
alternative symbols in each position of the key
e.g.) Radix 26 trie - able, abrahms, adams, anderson,
adnrews, baird
Use the first n characters for branching
a
b
b
d
n
l
r
d
e
r
able
abrahms
adams
anderson
andrews
baird
File Structures SNU-OOPSLA Lab. 8
Extendible HashingExtendible Hashing
H maps keys to a fixed address space, with size the largest
prime less than a power of 2 (65531 < 216
)
File pointers point to blocks of records known as buckets,
where an entire bucket is read by one physical data transfer,
buckets may be added to or removed from the file dynamically
The d bits are used as an index in a directory array containing
2d
entries, which usually resides in primary memory
The value d, the directory size(2d
), and the number of buckets
change automatically as the file expands and contracts
File Structures SNU-OOPSLA Lab. 9
Extendible Hashing ExampleExtendible Hashing Example
000
001
010
011
100
101
110
111
d’=1
d’=3
d’=3
d’=2
Directory with d=3 and 4 buckets
B0
B100
B101
B11
H(key)=0
H(key)=100
H(key)=101
H(key)=11
d=3
File Structures SNU-OOPSLA Lab. 10
Turning the trie into a directoryTurning the trie into a directory
Using Trie for extendible hashing
(1) Use Radix 2 Trie :
Keys in A : beginning with 0
Keys in B : beginning with 10
Keys in C : beginning with 11
(2) Retrieving from secondary storage the buckets containing
keys, instead of individual keys
A
B
C
0
1 0
1
File Structures SNU-OOPSLA Lab. 11
Representation of Trie (1)Representation of Trie (1)
Tree is not preferable (directory is not big)
A flattened array
1. Make a complete full binary tree
2. Collapse it into the directory structure
0
1
0
1
0
1
C
A
B
00
01
10
11
A
B
C
File Structures SNU-OOPSLA Lab. 12
Representation of Trie(2)Representation of Trie(2)
Directory is a complete binary tree
Directory entry : a pointer to the associated bucket
Given an address beginning with the bits 10, the 210
directory
entries
Introduced for uniform distribution
File Structures SNU-OOPSLA Lab. 13
Retrieve a recordRetrieve a record
Steps in retrieving a record with a given key
find H(given key)
extract first d bits of H(given key)
use this value as an index into the directory to find a pointer
use this pointer to read a bucket into primary memory
locate the desired record within the bucket (scan)
File Structures SNU-OOPSLA Lab. 14
Expansion & Contraction(1)Expansion & Contraction(1)
A pair of adjunct buckets with the same value of d’ which
share a common value of the first d’-1 bits of H(key) can
be combined if the average load < 50%, so all records
would be able to fit into one bucket
File contraction is the reverse of expansion; the directory
can be compacted and d decremented whenever all pairs
of pointers have the same values
File Structures SNU-OOPSLA Lab. 15
Expansion & Contraction(2)Expansion & Contraction(2)
000
001
010
011
100
101
110
111
d’=2
Bucket B0 overflows, then splits into B0 and B1
B00 H(key)=00..
d’=2
B01 H(key)=01..
d’=3
B100 H(key)=100..
d’=3
B00 H(key)=101..
d’=2
B00 H(key)=11..
d=3
File Structures SNU-OOPSLA Lab. 16
Expansion & Contraction(3)Expansion & Contraction(3)
0000
d’=2
Bucket B100 overflows, d increase to 4
B00 H(key)=00..
d’=2
B01 H(key)=01..
d’=4
B1000H(key)=1000..
d’=4
B1001H(key)=1001..
d’=3
B101 H(key)=101..
d=4
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
d’=2
B11 H(key)=11..
File Structures SNU-OOPSLA Lab. 17
Splitting to Handle Overflow (1)Splitting to Handle Overflow (1)
When overflow occurs
e.g.1) Overflowing of bucket A
Split A into A and D
Come to use additional unused bits
No need to expand the directory
00
01
10
11
B
C
A
D
00
01
10
11
A
B
C
File Structures SNU-OOPSLA Lab. 18
Splitting to Handle Overflow(2)Splitting to Handle Overflow(2)
e.g. Overflowing of bucket B
Do not have additional unused bits
(need to expand the directory)
1. Divide B using 3 bits of hash address
2. Make a complete full binary tree
3. Collapse it into the directory structure
00
01
10
11
A
B
C
File Structures SNU-OOPSLA Lab. 19
A
B
C
D
0
1 0
1
0
1
0
1
0
10
1 0
1
0
1 0
1
0
1
A
B
D
C
000
001
010
011
A
100
101
110
111
C
B
D
1. Result of overflow of bucket B
3. Directory
2. Complete Binary Tree
File Structures SNU-OOPSLA Lab. 20
Creating Address
Function hash(KEY)
Fold/Add hashing algorithm
Do not MOD hashing value by address space since no fixed
address space exists
Output from the hash function for a number of keys
bill 0000 0011 0110 1100
lee 0000 0100 0010 1000
pauline 0000 1111 0110 0101
alan 0100 1100 1010 0010
julie 0010 1110 0000 1001
mike 0000 0111 0100 1101
elizabeth 0010 1100 0110 1010
mark 0000 1010 0000 0111
File Structures SNU-OOPSLA Lab. 21
Int Hash (char * key)
{
int sum = 0;
int len = strlen(key);
if (len % 2 == 1) len ++; // make len even
for (int j = 0; j < len; j+2)
sum = (sum + 100 * key[j] + key[j+1]) % 19937;
return sum;
}
Figure 12.7 Function Hash (key) returns an integer hash value for key
for a 15 bit
File Structures SNU-OOPSLA Lab. 22
Int MakeAddress (char * key, int depth)
{
int retval = 0;
int hashVal = Hash(key);
// reverse the bits
for (int j = 0; j < depth; j++)
{
retval = retval << 1;
int lowbit = hashVal & 1;
retval = retval | lowbit;
hashVal = hashVal >> 1;
}
return retval;
}
Figure 12.9 Function MakeAddress(key,depth)
File Structures SNU-OOPSLA Lab. 23
Class Bucket: protected TextIndex
{protected:
Bucket (Directory & dir, int maxKeys = defaultMaxKeys);
int Insert (char * key, int recAddr);
int Remove(char * key);
Bucket * Split ();
int NewRange (int & newStart, int & newEnd);
int Redistribute (Bucket & newBucket);
int FindBuddy ();
int TryCombine ();
int Combine (Bucket * buddy, int buddyIndex);
int Depth;
Directory & Dir;
int BucketAddr;
friend class Directory;
friend class BucketBuffer;
}; Figure 12.10 Main members of class Bucket
File Structures SNU-OOPSLA Lab. 24
class Directory
{public:
Directory (
..); ~Directory();
int Open (..); int Create(
); int Close();
int Insert(
); int Delete(
); int Search(
);
protected
int DoubleSize();
int Collape();
int InsertBucket (
.);
int Find (
);
int StoreBucket(
);
int LoadBucket(
)

..
}
Figure 12.11 Definition of class Directory
File Structures SNU-OOPSLA Lab. 25
12.4 Deletion12.4 Deletion
When to combine buckets
Buddy buckets: the buckets are siblings and at the leaf level
of the tree (Buddy means something like friend)
e.g., B and D in page 19 are buddy buckets
Examine the directory to see if we can make changes
there
Shrink the directory if none of the buckets requires the depth
of address information that is currently available in the
directory
File Structures SNU-OOPSLA Lab. 26
Buddy BucketBuddy Bucket
Given a bucket with an address uvwxy, where u,
v, w, x, and y have values of either 0 or 1, the
buddy bucket, if it exists, has the value uvwxz,
such that
z = y XOR 1
If enough keys are deleted, the contents of buddy
buckets can be combined into a single bucket
File Structures SNU-OOPSLA Lab. 27
Collapsing the DirectoryCollapsing the Directory
Collapse condition
If a single cell, downsizing is impossible
If there is a pair of directory cells that do not both point to the
same bucket, collapsing is impossible
Allocating space
Allocate half the size of the original
Copy the bucket references shared by each cell pair to a single
cell in the new directory
File Structures SNU-OOPSLA Lab. 28
12.5 Extendible Hashing Performance12.5 Extendible Hashing Performance
Time : O(1)
If the directory can kept in RAM: a single access
Otherwise: two accesses are necessary
Space utilization of the bucket
r (# of records), b (block size), N (# of Blocks)
Utilization = r / bN
Average utilization ==> 0.69
Space utilization for the directory
How large a directory should we expect to have,
given an expected number of keys?
Expected value for the directory size by Flajolet(1983)
Estimated directory size =3.92 / b X r(1+1/b)
File Structures SNU-OOPSLA Lab. 29
Periodic and fluctuating
With uniform distributed addresses, all the buckets tend to fill up at the
same time -> split at the same time
As buffer fills up : 90%
After a concentrated series of splits : 50%
r : # of records , b : block size
N ~= 4/(b ln 2)
Utilization = r / bN ~= ln 2 = 0.69
Average utilization of 69%
B tree space utilization
Normal B-tree : 67%, B-tree with redistribution in insertion : 85 %
Space utilization for buckets
File Structures SNU-OOPSLA Lab. 30
12.6 Alternative Approaches(1):12.6 Alternative Approaches(1): Dynamic Hashing
Similar to dynamic extendible hashing
Use a directory to track bucket addresses
Extend the directory through the use of tries
Start with a hash function that covers an address space of
a fixed size
When overflow occurs
splits forming the leaves of a trie that grows down from the
original address node makes a trie
File Structures SNU-OOPSLA Lab. 31
Two kinds of nodes
External node: reference a data bucket
Internal node: point to two children index nodes
When a node has split children, it changed from an external
node to an internal node
Two hash functions
Apply the first hash function original address space
if external node is found : search is completed
if internal node is found : apply second hash function
Alternative Approaches(2):Alternative Approaches(2): Dynamic Hashing
File Structures SNU-OOPSLA Lab. 32
1 2 3 4
41 2 3
40 41
41 3
1
410
20 21 41
411
2
Original
address
space
Original
address
space
Original
address
space
(a)
(b)
(c)
File Structures SNU-OOPSLA Lab. 33
Dynamic Hashing vs. Extendible Hashing(1)Dynamic Hashing vs. Extendible Hashing(1)
Overflow handling
Both schemes extend the hash function locally, as a binary search
trie
Both schemes use directory structure
Dynamic hashing: a linked structure
Extendible hashing: perfect tree expressible as an array
Space Utilization
both schemes is the same (space utilization : 69%)
File Structures SNU-OOPSLA Lab. 34
Dynamic Hashing and Extendible Hashing(2)Dynamic Hashing and Extendible Hashing(2)
Growth of directory
Dynamic hashing: slower, more gradual growth
Extendible hashing: extend directory by doubling it
Actual size of an index node
Dynamic hashing is lager than a directory cell in extendible
hashing (because of pointers)
Page fault
Dynamic hashing: more than one page fault (with linked structure
for the directory)
Extendible hashing: single page fault
File Structures SNU-OOPSLA Lab. 35
Alternative Approaches(3):Alternative Approaches(3): Linear Hashing
Unlike extendible hashing and dynamic hashing, linear hashing does
not use a directory.
The actual address space is extended one bucket at a time as
buckets overflow
Because the extension of the address space does not necessarily
correspond to the bucket that is overflowing,
linear hashing necessarily involves the use of overflow buckets, even
as the address space expands
No directories: Avoid additional seek resulting from additional layer
Use more bits of hashed value
hd(k) : depth d hashing function (using function make_address)
File Structures SNU-OOPSLA Lab. 36
a b c d
00 01 10 11
a b c d A
w
00 01 10 11 100 101
a b c d A B
x
a b c d A B C
00 01 10 11 100 101 110
x
y
(a) (b)
(c) (d)
(continued...)
The growth of address space in linear hashing(1)
000 01 10 11 100
File Structures SNU-OOPSLA Lab. 37
a b c d A B C D
00 01 10 11 100 101 110 111
x
(e)
The growth of address space in linear hashing(2)
File Structures SNU-OOPSLA Lab. 38
Alternative Approaches(5)Alternative Approaches(5)
::Approaches to Controlling SplittingApproaches to Controlling Splitting
Postpone splitting: increase space utilization
B-Tree: redistribution rather than splitting
Hashing: placing records in chains of overflow buckets to
postpone splitting
Triggering event for splitting
Linear hashing
Every time any bucket overflows
Not split overflowing bucket
Litwin(1980): overall load factor of the file
Below 2 seeks, 75% ~ 80% storage utilization
File Structures SNU-OOPSLA Lab. 39
Alternative Approaches(5)Alternative Approaches(5)
::Approaches to Controlling SplittingApproaches to Controlling Splitting
Postpone splitting for extensible hashing
Use chaining overflow bucket
Avoid doubling directory space
1.1 seek, 76% ~ 81% storage utilization
File Structures SNU-OOPSLA Lab. 40
Let’s Review !!!Let’s Review !!!
12.1 Introduction
12.2 How extendible hashing works
12.3 Implementation
12.4 Deletion
12.5 Extendible hashing performance
12.6 Alternative approaches

Weitere Àhnliche Inhalte

Was ist angesagt?

Queue in Data Structure
Queue in Data Structure Queue in Data Structure
Queue in Data Structure Janki Shah
 
1.1 binary tree
1.1 binary tree1.1 binary tree
1.1 binary treeKrish_ver2
 
Big o notation
Big o notationBig o notation
Big o notationhamza mushtaq
 
Basic SQL and History
 Basic SQL and History Basic SQL and History
Basic SQL and HistorySomeshwarMoholkar
 
Array operations
Array operationsArray operations
Array operationsZAFAR444
 
Transaction management in DBMS
Transaction management in DBMSTransaction management in DBMS
Transaction management in DBMSMegha Sharma
 
Deque and its applications
Deque and its applicationsDeque and its applications
Deque and its applicationsJsaddam Hussain
 
Leftist heap
Leftist heapLeftist heap
Leftist heapShuvro Roy
 
Transaction states and properties
Transaction states and propertiesTransaction states and properties
Transaction states and propertiesChetan Mahawar
 
Binary Search Tree in Data Structure
Binary Search Tree in Data StructureBinary Search Tree in Data Structure
Binary Search Tree in Data StructureDharita Chokshi
 
Functions in python
Functions in pythonFunctions in python
Functions in pythoncolorsof
 
Stack - Data Structure - Notes
Stack - Data Structure - NotesStack - Data Structure - Notes
Stack - Data Structure - NotesOmprakash Chauhan
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMSkoolkampus
 
Databases: Normalisation
Databases: NormalisationDatabases: Normalisation
Databases: NormalisationDamian T. Gordon
 
Queue data structure
Queue data structureQueue data structure
Queue data structureanooppjoseph
 

Was ist angesagt? (20)

Queue in Data Structure
Queue in Data Structure Queue in Data Structure
Queue in Data Structure
 
1.1 binary tree
1.1 binary tree1.1 binary tree
1.1 binary tree
 
Big o notation
Big o notationBig o notation
Big o notation
 
standard template library(STL) in C++
standard template library(STL) in C++standard template library(STL) in C++
standard template library(STL) in C++
 
Basic SQL and History
 Basic SQL and History Basic SQL and History
Basic SQL and History
 
Tree Traversal
Tree TraversalTree Traversal
Tree Traversal
 
Array operations
Array operationsArray operations
Array operations
 
Queue
QueueQueue
Queue
 
Arrays
ArraysArrays
Arrays
 
Transaction management in DBMS
Transaction management in DBMSTransaction management in DBMS
Transaction management in DBMS
 
Deque and its applications
Deque and its applicationsDeque and its applications
Deque and its applications
 
Leftist heap
Leftist heapLeftist heap
Leftist heap
 
Transaction states and properties
Transaction states and propertiesTransaction states and properties
Transaction states and properties
 
STL in C++
STL in C++STL in C++
STL in C++
 
Binary Search Tree in Data Structure
Binary Search Tree in Data StructureBinary Search Tree in Data Structure
Binary Search Tree in Data Structure
 
Functions in python
Functions in pythonFunctions in python
Functions in python
 
Stack - Data Structure - Notes
Stack - Data Structure - NotesStack - Data Structure - Notes
Stack - Data Structure - Notes
 
11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS11. Storage and File Structure in DBMS
11. Storage and File Structure in DBMS
 
Databases: Normalisation
Databases: NormalisationDatabases: Normalisation
Databases: Normalisation
 
Queue data structure
Queue data structureQueue data structure
Queue data structure
 

Andere mochten auch

Chapter13
Chapter13Chapter13
Chapter13gourab87
 
Hashing Technique In Data Structures
Hashing Technique In Data StructuresHashing Technique In Data Structures
Hashing Technique In Data StructuresSHAKOOR AB
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMSkoolkampus
 
1.5 weka an intoduction
1.5 weka an intoduction1.5 weka an intoduction
1.5 weka an intoductionKrish_ver2
 
2.5 graph dfs
2.5 graph dfs2.5 graph dfs
2.5 graph dfsKrish_ver2
 
2.3 shortest path dijkstra’s
2.3 shortest path dijkstra’s 2.3 shortest path dijkstra’s
2.3 shortest path dijkstra’s Krish_ver2
 
2.5 dfs & bfs
2.5 dfs & bfs2.5 dfs & bfs
2.5 dfs & bfsKrish_ver2
 
Hashing Algorithm
Hashing AlgorithmHashing Algorithm
Hashing AlgorithmHayi Nukman
 
5.5 back track
5.5 back track5.5 back track
5.5 back trackKrish_ver2
 
On Ahimsa : Reply To Lala Lajpat Rai A literature presentation
On Ahimsa : Reply To Lala Lajpat Rai A literature presentationOn Ahimsa : Reply To Lala Lajpat Rai A literature presentation
On Ahimsa : Reply To Lala Lajpat Rai A literature presentationMohammad Imad Shahid Khan
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classificationKrish_ver2
 
DBMS topics for BCA
DBMS topics for BCADBMS topics for BCA
DBMS topics for BCAAdbay
 
4 the relational data model and relational database constraints
4 the relational data model and relational database constraints4 the relational data model and relational database constraints
4 the relational data model and relational database constraintsKumar
 
Disk scheduling algorithms
Disk scheduling algorithms Disk scheduling algorithms
Disk scheduling algorithms Paresh Parmar
 
Disk scheduling
Disk schedulingDisk scheduling
Disk schedulingSuraj Shukla
 
9 cm402.18
9 cm402.189 cm402.18
9 cm402.18myrajendra
 
Arrays and addressing modes
Arrays and addressing modesArrays and addressing modes
Arrays and addressing modesBilal Amjad
 

Andere mochten auch (20)

Chapter13
Chapter13Chapter13
Chapter13
 
Hashing Technique In Data Structures
Hashing Technique In Data StructuresHashing Technique In Data Structures
Hashing Technique In Data Structures
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 
1.5 weka an intoduction
1.5 weka an intoduction1.5 weka an intoduction
1.5 weka an intoduction
 
Extendible hashing
Extendible hashingExtendible hashing
Extendible hashing
 
2.5 graph dfs
2.5 graph dfs2.5 graph dfs
2.5 graph dfs
 
2.3 shortest path dijkstra’s
2.3 shortest path dijkstra’s 2.3 shortest path dijkstra’s
2.3 shortest path dijkstra’s
 
2.5 dfs & bfs
2.5 dfs & bfs2.5 dfs & bfs
2.5 dfs & bfs
 
Hashing Algorithm
Hashing AlgorithmHashing Algorithm
Hashing Algorithm
 
5.5 back track
5.5 back track5.5 back track
5.5 back track
 
On Ahimsa : Reply To Lala Lajpat Rai A literature presentation
On Ahimsa : Reply To Lala Lajpat Rai A literature presentationOn Ahimsa : Reply To Lala Lajpat Rai A literature presentation
On Ahimsa : Reply To Lala Lajpat Rai A literature presentation
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
DBMS topics for BCA
DBMS topics for BCADBMS topics for BCA
DBMS topics for BCA
 
4 the relational data model and relational database constraints
4 the relational data model and relational database constraints4 the relational data model and relational database constraints
4 the relational data model and relational database constraints
 
Huffman
HuffmanHuffman
Huffman
 
Dijksatra
DijksatraDijksatra
Dijksatra
 
Disk scheduling algorithms
Disk scheduling algorithms Disk scheduling algorithms
Disk scheduling algorithms
 
Disk scheduling
Disk schedulingDisk scheduling
Disk scheduling
 
9 cm402.18
9 cm402.189 cm402.18
9 cm402.18
 
Arrays and addressing modes
Arrays and addressing modesArrays and addressing modes
Arrays and addressing modes
 

Ähnlich wie 4.4 external hashing

extensiblehashing-191010111114.pdf
extensiblehashing-191010111114.pdfextensiblehashing-191010111114.pdf
extensiblehashing-191010111114.pdfaukesify
 
Extensible hashing
Extensible hashingExtensible hashing
Extensible hashingrajshreemuthiah
 
File System Implementation.pptx
File System Implementation.pptxFile System Implementation.pptx
File System Implementation.pptxRajapriya82
 
Lecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptx
Lecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptxLecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptx
Lecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptxROHIT738213
 
Hashing
HashingHashing
Hashingamoldkul
 
Hash Table.pptx
Hash Table.pptxHash Table.pptx
Hash Table.pptxMBablu1
 
Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009Namit Jain
 
1.R_For_Libraries_Session_2_-_Data_Exploration.pptx
1.R_For_Libraries_Session_2_-_Data_Exploration.pptx1.R_For_Libraries_Session_2_-_Data_Exploration.pptx
1.R_For_Libraries_Session_2_-_Data_Exploration.pptxpathanthecreator1
 
VTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERS
VTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERSVTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERS
VTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERSvtunotesbysree
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Barry DeCicco
 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to PythonUC San Diego
 
Final exam in advance dbms
Final exam in advance dbmsFinal exam in advance dbms
Final exam in advance dbmsMd. Mashiur Rahman
 
Unlock user behavior with 87 Million events using Hudi, StarRocks & MinIO
Unlock user behavior with 87 Million events using Hudi, StarRocks & MinIOUnlock user behavior with 87 Million events using Hudi, StarRocks & MinIO
Unlock user behavior with 87 Million events using Hudi, StarRocks & MinIOnadine39280
 
Algorithms notes tutorials duniya
Algorithms notes   tutorials duniyaAlgorithms notes   tutorials duniya
Algorithms notes tutorials duniyaTutorialsDuniya.com
 
Chapter 2 Linux File System and net.pptx
Chapter 2 Linux File System and net.pptxChapter 2 Linux File System and net.pptx
Chapter 2 Linux File System and net.pptxalehegn9
 
Bt0066 dbms
Bt0066 dbmsBt0066 dbms
Bt0066 dbmssmumbahelp
 

Ähnlich wie 4.4 external hashing (20)

extensiblehashing-191010111114.pdf
extensiblehashing-191010111114.pdfextensiblehashing-191010111114.pdf
extensiblehashing-191010111114.pdf
 
Extensible hashing
Extensible hashingExtensible hashing
Extensible hashing
 
File System Implementation.pptx
File System Implementation.pptxFile System Implementation.pptx
File System Implementation.pptx
 
Lecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptx
Lecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptxLecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptx
Lecture14-Hash-Based-Indexing-and-Sorting-MHH-18Oct-2016.pptx
 
PAM.ppt
PAM.pptPAM.ppt
PAM.ppt
 
Hashing
HashingHashing
Hashing
 
Hash Table.pptx
Hash Table.pptxHash Table.pptx
Hash Table.pptx
 
Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009Hive Demo Paper at VLDB 2009
Hive Demo Paper at VLDB 2009
 
SKILLWISE-DB2 DBA
SKILLWISE-DB2 DBASKILLWISE-DB2 DBA
SKILLWISE-DB2 DBA
 
1.R_For_Libraries_Session_2_-_Data_Exploration.pptx
1.R_For_Libraries_Session_2_-_Data_Exploration.pptx1.R_For_Libraries_Session_2_-_Data_Exploration.pptx
1.R_For_Libraries_Session_2_-_Data_Exploration.pptx
 
VTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERS
VTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERSVTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERS
VTU 3RD SEM UNIX AND SHELL PROGRAMMING SOLVED PAPERS
 
Hashing PPT
Hashing PPTHashing PPT
Hashing PPT
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06
 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to Python
 
Final exam in advance dbms
Final exam in advance dbmsFinal exam in advance dbms
Final exam in advance dbms
 
Unlock user behavior with 87 Million events using Hudi, StarRocks & MinIO
Unlock user behavior with 87 Million events using Hudi, StarRocks & MinIOUnlock user behavior with 87 Million events using Hudi, StarRocks & MinIO
Unlock user behavior with 87 Million events using Hudi, StarRocks & MinIO
 
Algorithms notes tutorials duniya
Algorithms notes   tutorials duniyaAlgorithms notes   tutorials duniya
Algorithms notes tutorials duniya
 
Ch12
Ch12Ch12
Ch12
 
Chapter 2 Linux File System and net.pptx
Chapter 2 Linux File System and net.pptxChapter 2 Linux File System and net.pptx
Chapter 2 Linux File System and net.pptx
 
Bt0066 dbms
Bt0066 dbmsBt0066 dbms
Bt0066 dbms
 

Mehr von Krish_ver2

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back trackingKrish_ver2
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02Krish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithmKrish_ver2
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03Krish_ver2
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programmingKrish_ver2
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquerKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02Krish_ver2
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedyKrish_ver2
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03Krish_ver2
 
4.4 hashing02
4.4 hashing024.4 hashing02
4.4 hashing02Krish_ver2
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashingKrish_ver2
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing extKrish_ver2
 
4.2 bst 03
4.2 bst 034.2 bst 03
4.2 bst 03Krish_ver2
 
4.2 bst 02
4.2 bst 024.2 bst 02
4.2 bst 02Krish_ver2
 

Mehr von Krish_ver2 (20)

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back tracking
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithm
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programming
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquer
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedy
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03
 
4.4 hashing02
4.4 hashing024.4 hashing02
4.4 hashing02
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing ext
 
4.2 bst
4.2 bst4.2 bst
4.2 bst
 
4.2 bst 03
4.2 bst 034.2 bst 03
4.2 bst 03
 
4.2 bst 02
4.2 bst 024.2 bst 02
4.2 bst 02
 

KĂŒrzlich hochgeladen

AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
call girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 

KĂŒrzlich hochgeladen (20)

AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
call girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Kamla Market (DELHI) 🔝 >àŒ’9953330565🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 

4.4 external hashing

  • 1. File Structures SNU-OOPSLA Lab. 1 Chap12. Extendible Hashing 서욞대학ꔐ ì»Ží“ší„°êł”í•™ë¶€ 객ìČŽì§€í–„ì‹œìŠ€í…œì—°ê”Źì‹€ SNU-OOPSLA-LAB ꔐ수 êč€ í˜• ìŁŒ File Structures by Folk, Zoellick and Riccardi
  • 2. File Structures SNU-OOPSLA Lab. 2 Chapter ObjectivesChapter Objectives Describe the problem solved by extendible hashing and related approaches Explain how extendible hashing works; show how it combines tries with conventional, static hashing Use the buffer, file, and index classes of previous chapters to implement extendible hashing, including deletion Review studies of extendible hashing performance Examine alternative approaches to the same problem, including dynamic hashing, linear hashing, and hashing schemes that control splitting by allowing for overflow buckets
  • 3. File Structures SNU-OOPSLA Lab. 3 ContentsContents 12.1 Introduction 12.2 How extendible hashing works 12.3 Implementation 12.4 Deletion 12.5 Extendible hashing performance 12.6 Alternative approaches
  • 4. File Structures SNU-OOPSLA Lab. 4 12.1 Introduction Dynamic files undergo a lot of growths Static hashing described in chapter 11 (direct hashing) typically worse than B-Tree for dynamic files eventually requires file reorganization Extendible hashing hashing for dynamic file Fagin, Nievergelt, Pippenger, and Strong (ACM TODS 1979)
  • 5. File Structures SNU-OOPSLA Lab. 5 Overview(1)Overview(1) Direct access (hashing) files have static size, so not suitable for files whose size is unknown in advance Dynamic file structure is desired which retains the feature of fast retrieval by primary key, and which also expands and contracts as the number of records in the file fluctuates (without reorganizing the whole file) Similar motivation! Indexed-sequential File ==> B tree Hashing ==> Extendible Hashing
  • 6. File Structures SNU-OOPSLA Lab. 6 Overview(2)Overview(2) Extendible Hashing Primary key H(key) Hashing function Directory Index Extract first d digit File pointerTable look-up
  • 7. File Structures SNU-OOPSLA Lab. 7 12.2 How Extendible Hashing works12.2 How Extendible Hashing works Idea from Tries file (radix searching) The branching factor of the tree is equal to the # of alternative symbols in each position of the key e.g.) Radix 26 trie - able, abrahms, adams, anderson, adnrews, baird Use the first n characters for branching a b b d n l r d e r able abrahms adams anderson andrews baird
  • 8. File Structures SNU-OOPSLA Lab. 8 Extendible HashingExtendible Hashing H maps keys to a fixed address space, with size the largest prime less than a power of 2 (65531 < 216 ) File pointers point to blocks of records known as buckets, where an entire bucket is read by one physical data transfer, buckets may be added to or removed from the file dynamically The d bits are used as an index in a directory array containing 2d entries, which usually resides in primary memory The value d, the directory size(2d ), and the number of buckets change automatically as the file expands and contracts
  • 9. File Structures SNU-OOPSLA Lab. 9 Extendible Hashing ExampleExtendible Hashing Example 000 001 010 011 100 101 110 111 d’=1 d’=3 d’=3 d’=2 Directory with d=3 and 4 buckets B0 B100 B101 B11 H(key)=0 H(key)=100 H(key)=101 H(key)=11 d=3
  • 10. File Structures SNU-OOPSLA Lab. 10 Turning the trie into a directoryTurning the trie into a directory Using Trie for extendible hashing (1) Use Radix 2 Trie : Keys in A : beginning with 0 Keys in B : beginning with 10 Keys in C : beginning with 11 (2) Retrieving from secondary storage the buckets containing keys, instead of individual keys A B C 0 1 0 1
  • 11. File Structures SNU-OOPSLA Lab. 11 Representation of Trie (1)Representation of Trie (1) Tree is not preferable (directory is not big) A flattened array 1. Make a complete full binary tree 2. Collapse it into the directory structure 0 1 0 1 0 1 C A B 00 01 10 11 A B C
  • 12. File Structures SNU-OOPSLA Lab. 12 Representation of Trie(2)Representation of Trie(2) Directory is a complete binary tree Directory entry : a pointer to the associated bucket Given an address beginning with the bits 10, the 210 directory entries Introduced for uniform distribution
  • 13. File Structures SNU-OOPSLA Lab. 13 Retrieve a recordRetrieve a record Steps in retrieving a record with a given key find H(given key) extract first d bits of H(given key) use this value as an index into the directory to find a pointer use this pointer to read a bucket into primary memory locate the desired record within the bucket (scan)
  • 14. File Structures SNU-OOPSLA Lab. 14 Expansion & Contraction(1)Expansion & Contraction(1) A pair of adjunct buckets with the same value of d’ which share a common value of the first d’-1 bits of H(key) can be combined if the average load < 50%, so all records would be able to fit into one bucket File contraction is the reverse of expansion; the directory can be compacted and d decremented whenever all pairs of pointers have the same values
  • 15. File Structures SNU-OOPSLA Lab. 15 Expansion & Contraction(2)Expansion & Contraction(2) 000 001 010 011 100 101 110 111 d’=2 Bucket B0 overflows, then splits into B0 and B1 B00 H(key)=00.. d’=2 B01 H(key)=01.. d’=3 B100 H(key)=100.. d’=3 B00 H(key)=101.. d’=2 B00 H(key)=11.. d=3
  • 16. File Structures SNU-OOPSLA Lab. 16 Expansion & Contraction(3)Expansion & Contraction(3) 0000 d’=2 Bucket B100 overflows, d increase to 4 B00 H(key)=00.. d’=2 B01 H(key)=01.. d’=4 B1000H(key)=1000.. d’=4 B1001H(key)=1001.. d’=3 B101 H(key)=101.. d=4 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 d’=2 B11 H(key)=11..
  • 17. File Structures SNU-OOPSLA Lab. 17 Splitting to Handle Overflow (1)Splitting to Handle Overflow (1) When overflow occurs e.g.1) Overflowing of bucket A Split A into A and D Come to use additional unused bits No need to expand the directory 00 01 10 11 B C A D 00 01 10 11 A B C
  • 18. File Structures SNU-OOPSLA Lab. 18 Splitting to Handle Overflow(2)Splitting to Handle Overflow(2) e.g. Overflowing of bucket B Do not have additional unused bits (need to expand the directory) 1. Divide B using 3 bits of hash address 2. Make a complete full binary tree 3. Collapse it into the directory structure 00 01 10 11 A B C
  • 19. File Structures SNU-OOPSLA Lab. 19 A B C D 0 1 0 1 0 1 0 1 0 10 1 0 1 0 1 0 1 0 1 A B D C 000 001 010 011 A 100 101 110 111 C B D 1. Result of overflow of bucket B 3. Directory 2. Complete Binary Tree
  • 20. File Structures SNU-OOPSLA Lab. 20 Creating Address Function hash(KEY) Fold/Add hashing algorithm Do not MOD hashing value by address space since no fixed address space exists Output from the hash function for a number of keys bill 0000 0011 0110 1100 lee 0000 0100 0010 1000 pauline 0000 1111 0110 0101 alan 0100 1100 1010 0010 julie 0010 1110 0000 1001 mike 0000 0111 0100 1101 elizabeth 0010 1100 0110 1010 mark 0000 1010 0000 0111
  • 21. File Structures SNU-OOPSLA Lab. 21 Int Hash (char * key) { int sum = 0; int len = strlen(key); if (len % 2 == 1) len ++; // make len even for (int j = 0; j < len; j+2) sum = (sum + 100 * key[j] + key[j+1]) % 19937; return sum; } Figure 12.7 Function Hash (key) returns an integer hash value for key for a 15 bit
  • 22. File Structures SNU-OOPSLA Lab. 22 Int MakeAddress (char * key, int depth) { int retval = 0; int hashVal = Hash(key); // reverse the bits for (int j = 0; j < depth; j++) { retval = retval << 1; int lowbit = hashVal & 1; retval = retval | lowbit; hashVal = hashVal >> 1; } return retval; } Figure 12.9 Function MakeAddress(key,depth)
  • 23. File Structures SNU-OOPSLA Lab. 23 Class Bucket: protected TextIndex {protected: Bucket (Directory & dir, int maxKeys = defaultMaxKeys); int Insert (char * key, int recAddr); int Remove(char * key); Bucket * Split (); int NewRange (int & newStart, int & newEnd); int Redistribute (Bucket & newBucket); int FindBuddy (); int TryCombine (); int Combine (Bucket * buddy, int buddyIndex); int Depth; Directory & Dir; int BucketAddr; friend class Directory; friend class BucketBuffer; }; Figure 12.10 Main members of class Bucket
  • 24. File Structures SNU-OOPSLA Lab. 24 class Directory {public: Directory (
..); ~Directory(); int Open (..); int Create(
); int Close(); int Insert(
); int Delete(
); int Search(
); protected int DoubleSize(); int Collape(); int InsertBucket (
.); int Find (
); int StoreBucket(
); int LoadBucket(
) 
.. } Figure 12.11 Definition of class Directory
  • 25. File Structures SNU-OOPSLA Lab. 25 12.4 Deletion12.4 Deletion When to combine buckets Buddy buckets: the buckets are siblings and at the leaf level of the tree (Buddy means something like friend) e.g., B and D in page 19 are buddy buckets Examine the directory to see if we can make changes there Shrink the directory if none of the buckets requires the depth of address information that is currently available in the directory
  • 26. File Structures SNU-OOPSLA Lab. 26 Buddy BucketBuddy Bucket Given a bucket with an address uvwxy, where u, v, w, x, and y have values of either 0 or 1, the buddy bucket, if it exists, has the value uvwxz, such that z = y XOR 1 If enough keys are deleted, the contents of buddy buckets can be combined into a single bucket
  • 27. File Structures SNU-OOPSLA Lab. 27 Collapsing the DirectoryCollapsing the Directory Collapse condition If a single cell, downsizing is impossible If there is a pair of directory cells that do not both point to the same bucket, collapsing is impossible Allocating space Allocate half the size of the original Copy the bucket references shared by each cell pair to a single cell in the new directory
  • 28. File Structures SNU-OOPSLA Lab. 28 12.5 Extendible Hashing Performance12.5 Extendible Hashing Performance Time : O(1) If the directory can kept in RAM: a single access Otherwise: two accesses are necessary Space utilization of the bucket r (# of records), b (block size), N (# of Blocks) Utilization = r / bN Average utilization ==> 0.69 Space utilization for the directory How large a directory should we expect to have, given an expected number of keys? Expected value for the directory size by Flajolet(1983) Estimated directory size =3.92 / b X r(1+1/b)
  • 29. File Structures SNU-OOPSLA Lab. 29 Periodic and fluctuating With uniform distributed addresses, all the buckets tend to fill up at the same time -> split at the same time As buffer fills up : 90% After a concentrated series of splits : 50% r : # of records , b : block size N ~= 4/(b ln 2) Utilization = r / bN ~= ln 2 = 0.69 Average utilization of 69% B tree space utilization Normal B-tree : 67%, B-tree with redistribution in insertion : 85 % Space utilization for buckets
  • 30. File Structures SNU-OOPSLA Lab. 30 12.6 Alternative Approaches(1):12.6 Alternative Approaches(1): Dynamic Hashing Similar to dynamic extendible hashing Use a directory to track bucket addresses Extend the directory through the use of tries Start with a hash function that covers an address space of a fixed size When overflow occurs splits forming the leaves of a trie that grows down from the original address node makes a trie
  • 31. File Structures SNU-OOPSLA Lab. 31 Two kinds of nodes External node: reference a data bucket Internal node: point to two children index nodes When a node has split children, it changed from an external node to an internal node Two hash functions Apply the first hash function original address space if external node is found : search is completed if internal node is found : apply second hash function Alternative Approaches(2):Alternative Approaches(2): Dynamic Hashing
  • 32. File Structures SNU-OOPSLA Lab. 32 1 2 3 4 41 2 3 40 41 41 3 1 410 20 21 41 411 2 Original address space Original address space Original address space (a) (b) (c)
  • 33. File Structures SNU-OOPSLA Lab. 33 Dynamic Hashing vs. Extendible Hashing(1)Dynamic Hashing vs. Extendible Hashing(1) Overflow handling Both schemes extend the hash function locally, as a binary search trie Both schemes use directory structure Dynamic hashing: a linked structure Extendible hashing: perfect tree expressible as an array Space Utilization both schemes is the same (space utilization : 69%)
  • 34. File Structures SNU-OOPSLA Lab. 34 Dynamic Hashing and Extendible Hashing(2)Dynamic Hashing and Extendible Hashing(2) Growth of directory Dynamic hashing: slower, more gradual growth Extendible hashing: extend directory by doubling it Actual size of an index node Dynamic hashing is lager than a directory cell in extendible hashing (because of pointers) Page fault Dynamic hashing: more than one page fault (with linked structure for the directory) Extendible hashing: single page fault
  • 35. File Structures SNU-OOPSLA Lab. 35 Alternative Approaches(3):Alternative Approaches(3): Linear Hashing Unlike extendible hashing and dynamic hashing, linear hashing does not use a directory. The actual address space is extended one bucket at a time as buckets overflow Because the extension of the address space does not necessarily correspond to the bucket that is overflowing, linear hashing necessarily involves the use of overflow buckets, even as the address space expands No directories: Avoid additional seek resulting from additional layer Use more bits of hashed value hd(k) : depth d hashing function (using function make_address)
  • 36. File Structures SNU-OOPSLA Lab. 36 a b c d 00 01 10 11 a b c d A w 00 01 10 11 100 101 a b c d A B x a b c d A B C 00 01 10 11 100 101 110 x y (a) (b) (c) (d) (continued...) The growth of address space in linear hashing(1) 000 01 10 11 100
  • 37. File Structures SNU-OOPSLA Lab. 37 a b c d A B C D 00 01 10 11 100 101 110 111 x (e) The growth of address space in linear hashing(2)
  • 38. File Structures SNU-OOPSLA Lab. 38 Alternative Approaches(5)Alternative Approaches(5) ::Approaches to Controlling SplittingApproaches to Controlling Splitting Postpone splitting: increase space utilization B-Tree: redistribution rather than splitting Hashing: placing records in chains of overflow buckets to postpone splitting Triggering event for splitting Linear hashing Every time any bucket overflows Not split overflowing bucket Litwin(1980): overall load factor of the file Below 2 seeks, 75% ~ 80% storage utilization
  • 39. File Structures SNU-OOPSLA Lab. 39 Alternative Approaches(5)Alternative Approaches(5) ::Approaches to Controlling SplittingApproaches to Controlling Splitting Postpone splitting for extensible hashing Use chaining overflow bucket Avoid doubling directory space 1.1 seek, 76% ~ 81% storage utilization
  • 40. File Structures SNU-OOPSLA Lab. 40 Let’s Review !!!Let’s Review !!! 12.1 Introduction 12.2 How extendible hashing works 12.3 Implementation 12.4 Deletion 12.5 Extendible hashing performance 12.6 Alternative approaches