Hashing

Department of Computer Science
Islamia College University Peshawar

Fall 2012 Semester
BCS course: CS 00 Analysis of Algorithms
Course Instructor: Mr. Zahid

12/30/13

Lecture #9. Adapted from slides by Dr Onaiza Maqbol

Dictionary
 A dictionary T holds n records, each identified by a key
 What data structure should be used to implement T?

Wednesday, March 18, 2009



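Direct Addressing
 Assumptions
   The set of keys is drawn from the universe U = {0, 1, …, u-1}
   Keys are distinct
 Create a table T[0..u-1]; the record with key k is stored in slot T[k]
 Benefit
   Each operation takes constant time
 Drawbacks
   The range of keys can be large, so T may need far more than n slots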
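
A minimal sketch of a direct-address table, assuming integer keys in 0..u-1 and using None to mark an empty slot. The class and method names are illustrative and do not come from the slides.

```python
class DirectAddressTable:
    """Direct-address table over the key universe {0, 1, ..., u-1}."""

    def __init__(self, u):
        # One slot per possible key, whether or not the key is ever used.
        self.slots = [None] * u

    def insert(self, key, record):
        self.slots[key] = record   # constant time

    def search(self, key):
        return self.slots[key]     # constant time; None if absent

    def delete(self, key):
        self.slots[key] = None     # constant time


table = DirectAddressTable(u=10)
table.insert(3, "record for key 3")
print(table.search(3))   # record for key 3
print(table.search(4))   # None
```

The drawback is visible in the constructor: the table allocates u slots even if only a handful of records are stored.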



Hashing
 Solution
   Use a hash function h to map the universe U of all keys into {0, 1, …, m-1}


Hash Table
 The mapped keys are stored in a table called a hash table
 The table consists of m cells
 A hash table requires much less storage than a direct-address
table when the number of keys stored is much smaller than |U|
 With direct addressing, an element with key k is stored in slot k;
with hashing, this element is stored in slot h(k)
 So the hash function is h : U → {0, 1, …, m-1}
 h(k) is also called the hash value of key k



Hashing Functions - Modulo Function
 Several functions can be used to map keys into a set of integers. The
choice is based on the computation time required and the simplicity of
the computational steps. A common choice is the modulo function h(k),
defined as:
h(k) = k mod m
where k is the key, m is some positive integer, and mod denotes the
modulus operator, which computes the remainder of key k divided by m.
 It follows that the hash function h(k) maps the set of keys {k1, k2, k3,
…, kn} into the set of integers {0, 1, 2, …, m-1}
 In essence, the modulo function is used to create a hash table of size m
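
The modulo function can be sketched in a few lines. The key set and the table size m = 9 below are illustrative values chosen for this example.

```python
def h(k, m):
    """Modulo hash: the remainder of key k divided by table size m."""
    return k % m


m = 9
keys = [10, 22, 37, 40, 60, 70, 75]
slots = {k: h(k, m) for k in keys}
print(slots)
# {10: 1, 22: 4, 37: 1, 40: 4, 60: 6, 70: 7, 75: 3}
```

Keys 10 and 37 both land in slot 1, and 22 and 40 both land in slot 4, so even this small example needs some way to handle keys that share a slot.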




Hashing Functions - Multiplication Method


Hashing of Strings



ASCII Sum Method



Radix Method



Universal Hashing
 The function h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime large
enough that every possible key k is in the range 0 to p-1, inclusive, and
0 < a < p and 0 <= b < p, belongs to the family of universal hash functions
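
A minimal sketch of drawing one function from this family at random. The helper names and the values p = 17, m = 6 are assumptions made for illustration, not taken from the slides.

```python
import random


def h_ab(k, a, b, p, m):
    """Evaluate h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def random_hash(p, m, rng=random):
    """Pick a and b at random, with 0 < a < p and 0 <= b < p."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: h_ab(k, a, b, p, m)


print(h_ab(8, a=3, b=4, p=17, m=6))   # 5, since (3*8 + 4) mod 17 = 11 and 11 mod 6 = 5

h = random_hash(p=17, m=6)
print([h(k) for k in range(17)])      # values in 0..5; they depend on the random a and b
```

Choosing a and b at run time means no fixed set of keys is bad for every function in the family.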



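Perfect Hashing
[Figure: two-level perfect hash table. The primary table has slots 0 to 8. Slot 2 points to the secondary table S2, with parameters m2 = 4, a2 = 10, b2 = 18, which holds the keys 60 and 75.]
 Using perfect hashing to store {10, 22, 37, 40, 60, 70, 75}. The outer hash
function is h_{a,b}(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and
m = 9. For example, h(75) = 2, so 75 hashes to slot 2 of the primary table.
Since h2(75) = 1, 75 is stored in slot 1 of the secondary hash table S2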
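
The numbers in this example can be checked with a short script. It assumes the secondary function for slot 2 has the same form as the outer one, using the parameters m2 = 4, a2 = 10, b2 = 18 listed in the figure and the same p = 101. The helper name h_ab is illustrative.

```python
def h_ab(k, a, b, p, m):
    """Evaluate ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


P = 101
keys = [10, 22, 37, 40, 60, 70, 75]

# Primary level: a = 3, b = 42, m = 9.
primary = {}
for k in keys:
    primary.setdefault(h_ab(k, 3, 42, P, 9), []).append(k)
print(primary)
# {0: [10], 7: [22, 37, 40], 2: [60, 75], 5: [70]}

# Secondary level for primary slot 2: a2 = 10, b2 = 18, m2 = 4.
secondary = {k: h_ab(k, 10, 18, P, 4) for k in primary[2]}
print(secondary)
# {60: 0, 75: 1}
```

The two keys that collide in primary slot 2 land in different secondary slots, which is the property perfect hashing relies on.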



Collisions
 Two or more keys may hash to the same slot
 When a record to be inserted maps to an already occupied slot in
T, a collision occurs
 Can we avoid collisions altogether?
 Not if |U| > m: by the pigeonhole principle, at least two keys in U must
then hash to the same slot
 We need a method to resolve the collisions that occur





Collision Resolution
 Two basic approaches to collision resolution are called chained
hashing and open address hashing
 Chained Hashing: In chained hashing the elements of a hash
table are stored in a set of linked lists.
 All colliding elements are kept in one linked list.
 The list head pointers are usually stored in an array.
 Chained hashing is also known as open hashing

 Open Address Hashing: In open address hashing, the hashed
keys are stored in the hash table itself.
 The colliding keys are allocated distinct cells in the table.
 Open address hashing is also referred to as closed hashing


Collision Resolution by Chaining
 Records in the same slot are linked into a list
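
A minimal sketch of a chained hash table. It assumes integer keys and the hash function h(k) = k mod m, uses Python lists in place of linked lists, and inserts new records at the front of the chain. The class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return key % self.m                   # assumed hash function

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))         # new record goes at the front

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


t = ChainedHashTable(m=9)
t.insert(10, "a")
t.insert(37, "b")        # 10 and 37 both hash to slot 1
print(t.slots[1])        # [(37, 'b'), (10, 'a')]
print(t.search(10))      # a
```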





Analysis of Hashing with Chaining
 How long does it take to search for an element with a given key?
 Let n be the number of keys in the table, and let m be the number
of slots
 Define the load factor of T to be α = n/m = average number of
keys per slot
 Analysis is in terms of α, which can be less than, equal to, or
greater than 1



Worst Hashing - Searching



 All hash keys are mapped to a single list.
 This situation may be referred to as the worst distribution of hash keys
 In practice, this extreme situation may not arise, but the possibility
does exist
 The worst-case time for searching is thus Θ(n), plus the time to compute the
hash function
 The best search time is Θ(1), when the key is found in the front node
 On average, half the list will be examined. Thus the average search time is
also Θ(n)



Worst Hashing - Insertion
 If the key is assumed not to be already present in the table, insertion at
the head of a list takes Θ(1) time, even in the worst case
 To check for presence, a search for the key is required. As just
mentioned, the worst-case time of searching is Θ(n)
 Thus the worst-case running time of insertion with a presence check is Θ(n)
 The average running time of such an insertion is also Θ(n)


Simple Uniform Hashing - Searching
 The keys are uniformly distributed among all the linked lists, i.e. it is
assumed that any given element is equally likely to hash into any of the
m slots
 Let us denote the length of the list T[j], for j = 0, 1, …, m-1, by n_j, so that
n = n_0 + n_1 + … + n_{m-1} and the expected value of n_j is E[n_j] = α = n/m
 We assume that the hash value h(k) can be computed in O(1) time
 So the time required to search for an element with key k depends linearly on
the length n_{h(k)} of the list T[h(k)]






 Two cases
   Unsuccessful search
   Successful search
 Unsuccessful search
   The expected time to search unsuccessfully for a key k is the expected time
    to search to the end of list T[h(k)], which has expected length E[n_{h(k)}] = α
   Thus the total time required, including the time to compute h(k), is Θ(1 + α)



Simple Uniform Hashing - Insertion
 To find the average time for inserting a key, consider the case when the
kth key is inserted. At that stage, the table already holds k-1 keys
distributed uniformly over m linked lists. Thus, prior to insertion of the kth
key, the average length of each list is (k-1)/m, as shown in the diagram



 Inserting the new key requires probing (k-1)/m keys on average, plus the cost
of adding the new key.
 Thus the overall cost of inserting the kth key is 1 + (k-1)/m, assuming that
each operation takes unit time.
 The expected cost of inserting a key is obtained by averaging over all
possible values of k, from 1 to n. Thus the expected cost I is given by
I = (1/n) Σ_{k=1..n} (1 + (k-1)/m) = 1 + (n-1)/(2m)
 The average cost of inserting a key is 1 + α/2 - 1/(2m) = Θ(1 + α)
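
As a quick sanity check, the average of 1 + (k-1)/m over k = 1, …, n can be computed exactly and compared with the closed form 1 + α/2 - 1/(2m). The function names and the values n = 10, m = 5 are illustrative.

```python
from fractions import Fraction


def average_insert_cost(n, m):
    """Exact average of 1 + (k-1)/m over k = 1..n."""
    total = sum(1 + Fraction(k - 1, m) for k in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


n, m = 10, 5
print(average_insert_cost(n, m))   # 19/10
print(closed_form(n, m))           # 19/10
```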


 Successful search
   We assume that the element x being searched for is equally likely to be any
    of the n elements stored in the table
   The number of elements examined is one more than the number of
    elements that appear before x in x's list
   Elements before x in the list were all inserted after x was inserted,
    because new elements are placed at the front of the list
   Total time required for a successful search is 1 + α/2 - α/(2n) = Θ(1 + α)
   If n = O(m), then α = n/m = O(m)/m = O(1)
   Thus searching takes constant time on average
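
One way to arrive at the successful-search cost is sketched below. It assumes the keys were inserted in the order k_1, …, k_n, and it introduces an indicator variable X_{ij} that is 1 when h(k_i) = h(k_j), so that E[X_{ij}] = 1/m under simple uniform hashing. The symbols k_i and X_{ij} are notation added for this sketch.

```latex
\begin{align*}
\mathrm{E}\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
  &= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
  &= 1+\frac{1}{nm}\cdot\frac{n(n-1)}{2} \\
  &= 1+\frac{n-1}{2m} \\
  &= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}
\end{align*}
```

The last step uses α = n/m, so that n/(2m) = α/2 and 1/(2m) = α/(2n).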


Open Addressing
 All elements are stored in the hash table itself
 In open addressing, the hash table can fill up, so that no further
insertions can be made
 The load factor α can never exceed 1
 An advantage is that open addressing avoids pointers altogether
 The memory freed by not storing pointers provides the hash table with a
larger number of slots for the same amount of memory


Insertion
 We successively examine or probe the hash table until we find an
empty slot in which to put the key
 The sequence of positions probed depends upon the key being
inserted
 To determine which slots to probe, we extend the hash function to
include the probe number as a second input. Thus the hash function
becomes:
h : U × {0, 1, …, m-1} → {0, 1, …, m-1}



Pseudo code
HASH-INSERT(T, k)
i ← 0
repeat
    j ← h(k, i)
    if T[j] = NIL
        then T[j] ← k
             return j
        else i ← i + 1
until i = m
error "Table full"



Linear Probing
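 In linear probing, the slots are examined in sequence: probing starts at the
slot given by an auxiliary hash function, and the position is incremented by 1
on each further probe, wrapping around at the end of the table. In general the
hash function is defined as
h(k, i) = (h'(k) + i) mod m, for i = 0, 1, …, m-1,
where h'(k) is an auxiliary hash function and m is the table size.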
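
A short sketch of the probe sequence this definition produces. It assumes the auxiliary hash function h'(k) = k mod m, and the names h_aux and h are illustrative.

```python
def h_aux(k, m):
    """Assumed auxiliary hash function h'(k) = k mod m."""
    return k % m


def h(k, i, m):
    """Linear probing: slot examined on probe number i."""
    return (h_aux(k, m) + i) % m


m = 7
print([h(19, i, m) for i in range(m)])
# [5, 6, 0, 1, 2, 3, 4]
```

The sequence starts at h'(19) = 5 and visits every slot exactly once, wrapping from slot 6 back to slot 0.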





Searching
HASH-SEARCH(T, k)
i ← 0
repeat
    j ← h(k, i)
    if T[j] = k
        then return j
    i ← i + 1
until T[j] = NIL or i = m
return NIL
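
The HASH-INSERT and HASH-SEARCH procedures can be sketched in Python as follows. The sketch assumes linear probing with the auxiliary hash function h'(k) = k mod m, uses None in place of NIL, and raises an exception where the pseudocode reports an error. Deletion is not handled.

```python
def h(k, i, m):
    """Linear probing with the assumed auxiliary hash h'(k) = k mod m."""
    return (k % m + i) % m


def hash_insert(T, k):
    """Probe until an empty slot is found; return the slot index."""
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] is None:
            T[j] = k
            return j
    raise OverflowError("Table full")


def hash_search(T, k):
    """Follow the same probe sequence; stop at an empty slot."""
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] == k:
            return j
        if T[j] is None:
            break
    return None


T = [None] * 7
for key in (19, 26, 5):
    hash_insert(T, key)
print(T)                    # [5, None, None, None, None, 19, 26]
print(hash_search(T, 5))    # 0
print(hash_search(T, 12))   # None
```

All three keys have h'(k) = 5, so 26 is pushed to slot 6 and 5 wraps around to slot 0. The search for 5 retraces the same three probes.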



Quadratic Probing


