2. Objectives
What is the substring search problem
Definition of the Rabin-Karp algorithm
How Rabin-Karp works
An example to illustrate Rabin-Karp
Complexity Analysis
Real-life applications
3. What Is the Substring Search Problem
We assume that the text is an array T[1..n] of length n and that the pattern is an array P[1..m] of length m, where m << n.
We also assume that the elements of P and T are characters drawn from a finite alphabet S.
(e.g., S = {a, b}; we want to find P = 'aab' in T = 'abbaabaaaab'.)
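To make the problem statement concrete, here is a minimal brute-force sketch in Python that solves the example above. The function name `naive_search` is an assumption made for illustration, and shifts are reported 0-based, so a shift s corresponds to a match at T[s+1..s+m] in the 1-based notation of the slide.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every shift s at which pattern occurs in text.

    Brute force: compare the pattern against each window of the text.
    """
    n, m = len(text), len(pattern)
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


print(naive_search("abbaabaaaab", "aab"))  # [3, 8]
```

This baseline does up to m character comparisons at every shift, which is the cost Rabin-Karp tries to avoid by comparing hash values first.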
4. Definition of the Rabin-Karp Algorithm
A string search algorithm which compares a string's hash values, rather than the strings themselves.
For efficiency, the hash value of the next position in the text is easily computed from the hash value of the current position.
5. How Rabin-Karp Works
Let the characters in both arrays T and P be digits in radix-d notation, where d = |S|. For illustration, S = {0, 1, ..., 9}, so d = 10.
Let p be the value of the characters in P, read as an m-digit number.
Choose a prime number q such that d·q fits within a computer word, to speed computations.
Compute (p mod q).
The value of p mod q is what we will use to find all matches of the pattern P in T.
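As a rough illustration of how p mod q can be computed without ever building the full number p, the sketch below applies Horner's rule and reduces modulo q after every digit. The name `horner_mod` is hypothetical, and the code assumes the characters are decimal digits, as on this slide.

```python
def horner_mod(digits: str, d: int, q: int) -> int:
    """Value of a digit string in radix d, reduced modulo q at each step."""
    value = 0
    for ch in digits:
        # Shift the running value one digit left, add the next digit, reduce.
        value = (d * value + int(ch)) % q
    return value


print(horner_mod("26", 10, 11))  # 4
```

Because every intermediate value stays below d·q, the arithmetic fits in one machine word whenever d·q does.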
6. How Rabin-Karp Works (Contd.)
Compute (T[s+1..s+m] mod q) for s = 0..n-m.
Test against P only those sequences in T that have the same (mod q) value.
(T[s+1..s+m] mod q) can be computed incrementally from the previous value by subtracting the high-order digit, shifting, and adding the low-order digit, all in modulo q arithmetic.
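The incremental update can be sketched as a single function. The name `roll` and its parameter names are assumptions made for illustration; h stands for d^(m-1) mod q, the weight of the high-order digit.

```python
def roll(t_s: int, old_digit: int, new_digit: int, d: int, q: int, h: int) -> int:
    """Hash of the next window, given the hash t_s of the current one.

    Removes the high-order digit, shifts by one radix position, and
    appends the new low-order digit, all modulo q.
    """
    # Python's % returns a non-negative result for a positive modulus,
    # so the subtraction cannot leave a negative hash.
    return (d * (t_s - old_digit * h) + new_digit) % q


d, q, m = 10, 11, 2
h = pow(d, m - 1, q)           # weight of the leading digit
t0 = 31 % q                    # hash of the window "31"
t1 = roll(t0, 3, 4, d, q, h)   # slide to the window "14"
print(t1)  # 3
```

Each update takes constant time, regardless of the pattern length m. In languages where the remainder of a negative number can be negative, q would have to be added back before the result is used.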
7. Algorithm
RABIN-KARP-MATCHER(T, P, d, q)
n = T.length
m = P.length
h = d^(m-1) mod q
p = 0
t_0 = 0
for i = 1 to m    // preprocessing
    p = (d·p + P[i]) mod q
    t_0 = (d·t_0 + T[i]) mod q
for s = 0 to n-m    // matching
    if p == t_s
        if P[1..m] == T[s+1..s+m]
            print "Pattern occurs with shift" s
    if s < n-m
        t_{s+1} = (d(t_s - T[s+1]·h) + T[s+m+1]) mod q
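The pseudocode above can be sketched in Python roughly as follows. This is one possible transcription, not a reference implementation: the function name `rabin_karp`, the default values of d and q, and the `value` parameter (which maps a character to its digit) are all assumptions made for illustration. Indices are 0-based, so T[s+1] in the pseudocode becomes `text[s]`.

```python
from typing import Callable


def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 101,
               value: Callable[[str], int] = ord) -> list[int]:
    """Return all shifts s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)          # d^(m-1) mod q
    p = t = 0
    for i in range(m):            # preprocessing
        p = (d * p + value(pattern[i])) % q
        t = (d * t + value(text[i])) % q
    shifts = []
    for s in range(n - m + 1):    # matching
        # Verify character by character only when the hashes agree.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:             # roll the hash to the next window
            t = (d * (t - value(text[s]) * h) + value(text[s + m])) % q
    return shifts


print(rabin_karp("31415926535", "26", d=10, q=11, value=int))  # [6]
```

With `value=int`, d = 10 and q = 11 the hashes are the ones used in the worked example; the defaults treat characters as byte-like codes via `ord`. The small default prime 101 is only illustrative, and a larger prime would produce fewer spurious hits.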
8. An Example to illustrate Rabin-Karp
• Given T = 31415926535 and P = 26
• We choose q = 11
• P mod q = 26 mod 11 = 4
s = 0: window 31, 31 mod 11 = 9 not equal to 4
s = 1: window 14, 14 mod 11 = 3 not equal to 4
s = 2: window 41, 41 mod 11 = 8 not equal to 4
9. An Example to Illustrate Rabin-Karp (Contd.)
s = 3: window 15, 15 mod 11 = 4 equal to 4 -> spurious hit
s = 4: window 59, 59 mod 11 = 4 equal to 4 -> spurious hit
s = 5: window 92, 92 mod 11 = 4 equal to 4 -> spurious hit
s = 6: window 26, 26 mod 11 = 4 equal to 4 -> an exact match
s = 7: window 65, 65 mod 11 = 10 not equal to 4
10. An Example to Illustrate Rabin-Karp (Contd.)
s = 8: window 53, 53 mod 11 = 9 not equal to 4
s = 9: window 35, 35 mod 11 = 2 not equal to 4
As we can see, whenever the hash values agree, the window is compared with P character by character to ensure that a match has indeed been found.
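The whole trace can be reproduced with a few lines of Python. This sketch, with the hypothetical name `trace`, recomputes every window hash directly instead of rolling it, which is acceptable for a demonstration but is not how the algorithm earns its speed.

```python
def trace(text: str, pattern: str, q: int) -> list[tuple[int, str, int, str]]:
    """Classify every window of a digit string as no hit, spurious hit, or match."""
    m = len(pattern)
    p = int(pattern) % q
    rows = []
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        t = int(window) % q
        if t != p:
            verdict = "no hit"
        elif window == pattern:
            verdict = "match"
        else:
            verdict = "spurious hit"
        rows.append((s, window, t, verdict))
    return rows


for s, window, t, verdict in trace("31415926535", "26", 11):
    print(f"s={s}  {window} mod 11 = {t}  {verdict}")
```

For this input the trace reports three spurious hits (15, 59 and 92) and one match at shift 6, which shows why a small q such as 11 leads to many wasted verifications.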
11. Complexity Analysis
RABIN-KARP-MATCHER(T, P, d, q)
n = T.length
m = P.length
h = d^(m-1) mod q    // O(log m) multiplications with repeated squaring
p = 0
t_0 = 0
for i = 1 to m    // O(m)
    p = (d·p + P[i]) mod q
    t_0 = (d·t_0 + T[i]) mod q
for s = 0 to n-m    // O((n-m+1)m) in the worst case
    if p == t_s
        if P[1..m] == T[s+1..s+m]
            print "Pattern occurs with shift" s
    if s < n-m
        t_{s+1} = (d(t_s - T[s+1]·h) + T[s+m+1]) mod q
12. Complexity Analysis Result
The worst-case running time of the Rabin-Karp algorithm is O((n-m+1)m), but it has a good average-case running time.
If the expected number of valid shifts is small, O(1), and the prime q is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O(n+m) plus the time required to process spurious hits.
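The worst case is easy to provoke: when every shift is valid, every window must be verified in full. The sketch below counts the character comparisons spent on verification. The function name and the defaults d = 256 and q = 101 are assumptions made for illustration.

```python
def count_verification_comparisons(text: str, pattern: str,
                                   d: int = 256, q: int = 101) -> int:
    """Count character comparisons made while verifying hash hits."""
    n, m = len(text), len(pattern)
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    comparisons = 0
    for s in range(n - m + 1):
        if p == t:
            # Explicit character-by-character check, stopping at a mismatch.
            for j in range(m):
                comparisons += 1
                if text[s + j] != pattern[j]:
                    break
        if s < n - m:
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return comparisons


print(count_verification_comparisons("a" * 20, "a" * 5))  # 80
```

With n = 20 and m = 5 all 16 shifts are valid, so the count is (n-m+1)·m = 16·5 = 80. On texts with few valid shifts and a large q, most windows are rejected by the hash comparison alone.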
13. Real-Life Applications
Bioinformatics
• Used to look for similarities between two or more proteins; high sequence similarity usually implies significant structural or functional similarity.
Example:
Hb A_human
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
G+ +VK+HGKKV A++++++AH+ D++ ++ +++LS+LH KL
Hb B_human
GNPKVKAHGKKVLGAFSDGLAH LDNLKGTF ATLSELH CDKL
+ similar amino acids
14. Real-Life Applications (Contd.)
Good for plagiarism detection, because it can handle multiple-pattern matching.
With a good hash function it can be quite effective, and it is easy to implement.
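One way multiple-pattern matching might look is sketched below. It assumes that all patterns have the same length, a simplification chosen here so that a single rolling hash can be checked against a table of pattern hashes. The function name `multi_pattern_search` and the default prime 2^31 - 1 are assumptions made for illustration.

```python
def multi_pattern_search(text: str, patterns: list[str],
                         d: int = 256, q: int = 2_147_483_647) -> dict[str, list[int]]:
    """Find all shifts of several equal-length patterns in one pass over text."""
    lengths = {len(p) for p in patterns}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all patterns share one length")
    m, n = lengths.pop(), len(text)
    if m == 0 or m > n:
        return {}

    def full_hash(s: str) -> int:
        v = 0
        for ch in s:
            v = (d * v + ord(ch)) % q
        return v

    # Map each pattern hash to the patterns that produce it.
    by_hash: dict[int, set[str]] = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), set()).add(p)

    h = pow(d, m - 1, q)
    t = full_hash(text[:m])
    hits: dict[str, list[int]] = {}
    for s in range(n - m + 1):
        if t in by_hash:
            window = text[s:s + m]      # verify only on a hash hit
            if window in by_hash[t]:
                hits.setdefault(window, []).append(s)
        if s < n - m:
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return hits


print(multi_pattern_search("abbaabaaaab", ["aab", "abb", "bbb"]))
```

The text is scanned once no matter how many patterns are in the table, which is what makes the approach attractive for comparing a document against many known passages.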
15. References
Cormen, Thomas H., et al. Introduction to Algorithms. 3rd ed. Cambridge, MA: MIT Press, 2
Go2Net Website for String Matching Algorithms
[www.go2net.com/internet/deep/1997/05/14/body.html]
Yummy Yummy Animations Site for an animation of the Rabin-Karp algorithm at work
[www.mills.edu/ACAD_INFO/MCS/CS/S00MCS125/String.Matching.Algorithms/animations.html]
National Institute of Standards and Technology Dictionary of Algorithms, Data Structures, and Problems
[hissa.nist.gov/dads/HTML/rabinKarpAlgo.html]
Multi-Pattern String Matching with Very Large Pattern Sets
[https://www.dcc.uchile.cl/~gnavarro/workshop07/lsalmela.pdf]