SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Lecture 6:
           Hashing
         Steven Skiena

Department of Computer Science
 State University of New York
 Stony Brook, NY 11794–4400

http://www.cs.sunysb.edu/∼skiena
Dictionary / Dynamic Set Operations
Perhaps the most important class of data structures maintain
a set of items, indexed by keys.
 • Search(S,k) – A query that, given a set S and a key value
   k, returns a pointer x to an element in S such that key[x]
   = k, or nil if no such element belongs to S.
 • Insert(S,x) – A modifying operation that augments the set
   S with the element x.
 • Delete(S,x) – Given a pointer x to an element in the set S,
   remove x from S. Observe we are given a pointer to an
   element x, not a key value.
• Min(S), Max(S) – Returns the element of the totally
   ordered set S which has the smallest (largest) key.
 • Next(S,x), Previous(S,x) – Given an element x whose key
   is from a totally ordered set S, returns the next largest
   (smallest) element in S, or NIL if x is the maximum
   (minimum) element.
There are a variety of implementations of these dictionary
operations, each of which yield different time bounds for
various operations.
Problem of the Day
You are given the task of reading in n numbers and then
printing them out in sorted order. Suppose you have access
to a balanced dictionary data structure, which supports each
of the operations search, insert, delete, minimum, maximum,
successor, and predecessor in O(log n) time.
 • Explain how you can use this dictionary to sort in
   O(n log n) time using only the following abstract opera-
   tions: minimum, successor, insert, search.
• Explain how you can use this dictionary to sort in
  O(n log n) time using only the following abstract opera-
  tions: minimum, insert, delete, search.




• Explain how you can use this dictionary to sort in
  O(n log n) time using only the following abstract opera-
  tions: insert and in-order traversal.
Hash Tables
Hash tables are a very practical way to maintain a dictionary.
The idea is simply that looking an item up in an array is Θ(1)
once you have its index.
A hash function is a mathematical function which maps keys
to integers.
Collisions
Collisions are the set of keys mapped to the same bucket.
If the keys are uniformly distributed, then each bucket should
contain very few keys!
The resulting short lists are easily searched!
              0   1   2   3   4   5   6   7   8   9   10   11
Hash Functions
It is the job of the hash function to map keys to integers. A
good hash function:
1. Is cheap to evaluate
2. Tends to use all positions from 0 . . . M with uniform
   frequency.
The first step is usually to map the key to a big integer, for
example
                   keylength
              h=               128i × char(key[i])
                     i=0
Modular Arithmetic
This large number must be reduced to an integer whose size
is between 1 and the size of our hash table.
One way is by h(k) = k mod M , where M is best a large
prime not too close to 2i − 1, which would just mask off the
high bits.
This works on the same principle as a roulette wheel!
Bad Hash Functions
The first three digits of the Social Security Number
      0     1    2    3    4    5     6   7     8     9
Good Hash Functions
The last three digits of the Social Security Number
      0     1    2    3    4     5    6    7    8     9
Performance on Set Operations
With either chaining or open addressing:
 • Search - O(1) expected, O(n) worst case
 • Insert - O(1) expected, O(n) worst case
 • Delete - O(1) expected, O(n) worst case
 • Min, Max and Predecessor, Successor Θ(n + m) expected
   and worst case
Pragmatically, a hash table is often the best data structure
to maintain a dictionary. However, the worst-case time is
unpredictable.
The best worst-case bounds come from balanced binary
trees.
Substring Pattern Matching
Input: A text string t and a pattern string p.
Problem: Does t contain the pattern p as a substring, and if
so where?
E.g: Is Skiena in the Bible?
Brute Force Search
The simplest algorithm to search for the presence of pattern
string p in text t overlays the pattern string at every position in
the text, and checks whether every pattern character matches
the corresponding text character.
This runs in O(nm) time, where n = |t| and m = |p|.
String Matching via Hashing
Suppose we compute a given hash function on both the
pattern string p and the m-character substring starting from
the ith position of t.
If these two strings are identical, clearly the resulting hash
values will be the same.
If the two strings are different, the hash values will almost
certainly be different.
These false positives should be so rare that we can easily
spend the O(m) time it take to explicitly check the identity
of two strings whenever the hash values agree.
The Catch
This reduces string matching to n − m + 2 hash value
computations (the n − m + 1 windows of t, plus one hash
of p), plus what should be a very small number of O(m) time
verification steps.
The catch is that it takes O(m) time to compute a hash func-
tion on an m-character string, and O(n) such computations
seems to leave us with an O(mn) algorithm again.
The Trick
Look closely at our string hash function, applied to the m
characters starting from the jth position of string S:
                        m−1
            H(S, j) =          αm−(i+1) × char(si+j )
                         i=0
A little algebra reveals that
     H(S, j + 1) = (H(S, j) − αm−1char(sj ))α + sj+m
Thus once we know the hash value from the j position, we
can find the hash value from the (j + 1)st position for the
cost of two multiplications, one addition, and one subtraction.
This can be done in constant time.
Hashing, Hashing, and Hashing
Udi Manber says that the three most important algorithms at
Yahoo are hashing, hashing, and hashing.
Hashing has a variety of clever applications beyond just
speeding up search, by giving you a short but distinctive
representation of a larger document.
 • Is this new document different from the rest in a large
   corpus? – Hash the new document, and compare it to
   the hash codes of corpus.
 • How can I convince you that a file isn’t changed? – Check
   if the cryptographic hash code of the file you give me
   today is the same as that of the original. Any changes
   to the file will change the hash code.

Weitere ähnliche Inhalte

Was ist angesagt? (20)

Hashing in datastructure
Hashing in datastructureHashing in datastructure
Hashing in datastructure
 
Algorithm chapter 7
Algorithm chapter 7Algorithm chapter 7
Algorithm chapter 7
 
Hashing Algorithm
Hashing AlgorithmHashing Algorithm
Hashing Algorithm
 
Data Structure and Algorithms Hashing
Data Structure and Algorithms HashingData Structure and Algorithms Hashing
Data Structure and Algorithms Hashing
 
Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2
 
Hash tables
Hash tablesHash tables
Hash tables
 
18 hashing
18 hashing18 hashing
18 hashing
 
Rehashing
RehashingRehashing
Rehashing
 
Hashing
HashingHashing
Hashing
 
Hashing Technique In Data Structures
Hashing Technique In Data StructuresHashing Technique In Data Structures
Hashing Technique In Data Structures
 
Hash tables
Hash tablesHash tables
Hash tables
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
Hashing
HashingHashing
Hashing
 
08 Hash Tables
08 Hash Tables08 Hash Tables
08 Hash Tables
 
Hash table
Hash tableHash table
Hash table
 
Hashing
HashingHashing
Hashing
 
Hashing
HashingHashing
Hashing
 
Hash presentation
Hash presentationHash presentation
Hash presentation
 
Hashing
HashingHashing
Hashing
 
Sienna 9 hashing
Sienna 9 hashingSienna 9 hashing
Sienna 9 hashing
 

Andere mochten auch

Fcv rep a_berg
Fcv rep a_bergFcv rep a_berg
Fcv rep a_bergzukun
 
Fcv rep tenenbaum
Fcv rep tenenbaumFcv rep tenenbaum
Fcv rep tenenbaumzukun
 
Fcv appli science_golland
Fcv appli science_gollandFcv appli science_golland
Fcv appli science_gollandzukun
 
Fcv scene efros
Fcv scene efrosFcv scene efros
Fcv scene efroszukun
 
Fcv scene lazebnik
Fcv scene lazebnikFcv scene lazebnik
Fcv scene lazebnikzukun
 
Fcv taxo chellappa
Fcv taxo chellappaFcv taxo chellappa
Fcv taxo chellappazukun
 
Fcv acad ind_martin
Fcv acad ind_martinFcv acad ind_martin
Fcv acad ind_martinzukun
 
Fcv hum mach_belongie
Fcv hum mach_belongieFcv hum mach_belongie
Fcv hum mach_belongiezukun
 

Andere mochten auch (8)

Fcv rep a_berg
Fcv rep a_bergFcv rep a_berg
Fcv rep a_berg
 
Fcv rep tenenbaum
Fcv rep tenenbaumFcv rep tenenbaum
Fcv rep tenenbaum
 
Fcv appli science_golland
Fcv appli science_gollandFcv appli science_golland
Fcv appli science_golland
 
Fcv scene efros
Fcv scene efrosFcv scene efros
Fcv scene efros
 
Fcv scene lazebnik
Fcv scene lazebnikFcv scene lazebnik
Fcv scene lazebnik
 
Fcv taxo chellappa
Fcv taxo chellappaFcv taxo chellappa
Fcv taxo chellappa
 
Fcv acad ind_martin
Fcv acad ind_martinFcv acad ind_martin
Fcv acad ind_martin
 
Fcv hum mach_belongie
Fcv hum mach_belongieFcv hum mach_belongie
Fcv hum mach_belongie
 

Ähnlich wie Skiena algorithm 2007 lecture06 sorting

Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptxAgonySingh
 
Hashing.pptx
Hashing.pptxHashing.pptx
Hashing.pptxkratika64
 
Algorithms notes tutorials duniya
Algorithms notes   tutorials duniyaAlgorithms notes   tutorials duniya
Algorithms notes tutorials duniyaTutorialsDuniya.com
 
presentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxpresentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxjainaaru59
 
Analysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingAnalysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingSam Light
 
Advance algorithm hashing lec II
Advance algorithm hashing lec IIAdvance algorithm hashing lec II
Advance algorithm hashing lec IISajid Marwat
 
358 33 powerpoint-slides_15-hashing-collision_chapter-15
358 33 powerpoint-slides_15-hashing-collision_chapter-15358 33 powerpoint-slides_15-hashing-collision_chapter-15
358 33 powerpoint-slides_15-hashing-collision_chapter-15sumitbardhan
 
Resources
ResourcesResources
Resourcesbutest
 
Radix Sorting With No Extra Space
Radix Sorting With No Extra SpaceRadix Sorting With No Extra Space
Radix Sorting With No Extra Spacegueste5dc45
 

Ähnlich wie Skiena algorithm 2007 lecture06 sorting (20)

Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Randamization.pdf
Randamization.pdfRandamization.pdf
Randamization.pdf
 
Hashing.pptx
Hashing.pptxHashing.pptx
Hashing.pptx
 
Algorithms notes tutorials duniya
Algorithms notes   tutorials duniyaAlgorithms notes   tutorials duniya
Algorithms notes tutorials duniya
 
presentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxpresentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptx
 
Analysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingAnalysis Of Algorithms - Hashing
Analysis Of Algorithms - Hashing
 
13-hashing.ppt
13-hashing.ppt13-hashing.ppt
13-hashing.ppt
 
Advance algorithm hashing lec II
Advance algorithm hashing lec IIAdvance algorithm hashing lec II
Advance algorithm hashing lec II
 
Lec5
Lec5Lec5
Lec5
 
session 15 hashing.pptx
session 15   hashing.pptxsession 15   hashing.pptx
session 15 hashing.pptx
 
Unit viii searching and hashing
Unit   viii searching and hashing Unit   viii searching and hashing
Unit viii searching and hashing
 
Lec4
Lec4Lec4
Lec4
 
Cwkaa 2010
Cwkaa 2010Cwkaa 2010
Cwkaa 2010
 
Data Mining Lecture_6.pptx
Data Mining Lecture_6.pptxData Mining Lecture_6.pptx
Data Mining Lecture_6.pptx
 
Lec5
Lec5Lec5
Lec5
 
Hashing 1
Hashing 1Hashing 1
Hashing 1
 
358 33 powerpoint-slides_15-hashing-collision_chapter-15
358 33 powerpoint-slides_15-hashing-collision_chapter-15358 33 powerpoint-slides_15-hashing-collision_chapter-15
358 33 powerpoint-slides_15-hashing-collision_chapter-15
 
Resources
ResourcesResources
Resources
 
Hashing .pptx
Hashing .pptxHashing .pptx
Hashing .pptx
 
Radix Sorting With No Extra Space
Radix Sorting With No Extra SpaceRadix Sorting With No Extra Space
Radix Sorting With No Extra Space
 

Mehr von zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

Mehr von zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Kürzlich hochgeladen

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Kürzlich hochgeladen (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Skiena algorithm 2007 lecture06 sorting

  • 1. Lecture 6: Hashing Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794–4400 http://www.cs.sunysb.edu/∼skiena
  • 2. Dictionary / Dynamic Set Operations Perhaps the most important class of data structures maintain a set of items, indexed by keys. • Search(S,k) – A query that, given a set S and a key value k, returns a pointer x to an element in S such that key[x] = k, or nil if no such element belongs to S. • Insert(S,x) – A modifying operation that augments the set S with the element x. • Delete(S,x) – Given a pointer x to an element in the set S, remove x from S. Observe we are given a pointer to an element x, not a key value.
  • 3. • Min(S), Max(S) – Returns the element of the totally ordered set S which has the smallest (largest) key. • Next(S,x), Previous(S,x) – Given an element x whose key is from a totally ordered set S, returns the next largest (smallest) element in S, or NIL if x is the maximum (minimum) element. There are a variety of implementations of these dictionary operations, each of which yield different time bounds for various operations.
  • 4. Problem of the Day You are given the task of reading in n numbers and then printing them out in sorted order. Suppose you have access to a balanced dictionary data structure, which supports each of the operations search, insert, delete, minimum, maximum, successor, and predecessor in O(log n) time. • Explain how you can use this dictionary to sort in O(n log n) time using only the following abstract opera- tions: minimum, successor, insert, search.
  • 5. • Explain how you can use this dictionary to sort in O(n log n) time using only the following abstract opera- tions: minimum, insert, delete, search. • Explain how you can use this dictionary to sort in O(n log n) time using only the following abstract opera- tions: insert and in-order traversal.
  • 6. Hash Tables Hash tables are a very practical way to maintain a dictionary. The idea is simply that looking an item up in an array is Θ(1) once you have its index. A hash function is a mathematical function which maps keys to integers.
  • 7. Collisions Collisions are the set of keys mapped to the same bucket. If the keys are uniformly distributed, then each bucket should contain very few keys! The resulting short lists are easily searched! 0 1 2 3 4 5 6 7 8 9 10 11
  • 8. Hash Functions It is the job of the hash function to map keys to integers. A good hash function: 1. Is cheap to evaluate 2. Tends to use all positions from 0 . . . M with uniform frequency. The first step is usually to map the key to a big integer, for example keylength h= 128i × char(key[i]) i=0
  • 9. Modular Arithmetic This large number must be reduced to an integer whose size is between 1 and the size of our hash table. One way is by h(k) = k mod M , where M is best a large prime not too close to 2i − 1, which would just mask off the high bits. This works on the same principle as a roulette wheel!
  • 10. Bad Hash Functions The first three digits of the Social Security Number 0 1 2 3 4 5 6 7 8 9
  • 11. Good Hash Functions The last three digits of the Social Security Number 0 1 2 3 4 5 6 7 8 9
  • 12. Performance on Set Operations With either chaining or open addressing: • Search - O(1) expected, O(n) worst case • Insert - O(1) expected, O(n) worst case • Delete - O(1) expected, O(n) worst case • Min, Max and Predecessor, Successor Θ(n + m) expected and worst case Pragmatically, a hash table is often the best data structure to maintain a dictionary. However, the worst-case time is unpredictable. The best worst-case bounds come from balanced binary trees.
  • 13. Substring Pattern Matching Input: A text string t and a pattern string p. Problem: Does t contain the pattern p as a substring, and if so where? E.g: Is Skiena in the Bible?
  • 14. Brute Force Search The simplest algorithm to search for the presence of pattern string p in text t overlays the pattern string at every position in the text, and checks whether every pattern character matches the corresponding text character. This runs in O(nm) time, where n = |t| and m = |p|.
  • 15. String Matching via Hashing Suppose we compute a given hash function on both the pattern string p and the m-character substring starting from the ith position of t. If these two strings are identical, clearly the resulting hash values will be the same. If the two strings are different, the hash values will almost certainly be different. These false positives should be so rare that we can easily spend the O(m) time it take to explicitly check the identity of two strings whenever the hash values agree.
  • 16. The Catch This reduces string matching to n − m + 2 hash value computations (the n − m + 1 windows of t, plus one hash of p), plus what should be a very small number of O(m) time verification steps. The catch is that it takes O(m) time to compute a hash func- tion on an m-character string, and O(n) such computations seems to leave us with an O(mn) algorithm again.
  • 17. The Trick Look closely at our string hash function, applied to the m characters starting from the jth position of string S: m−1 H(S, j) = αm−(i+1) × char(si+j ) i=0 A little algebra reveals that H(S, j + 1) = (H(S, j) − αm−1char(sj ))α + sj+m Thus once we know the hash value from the j position, we can find the hash value from the (j + 1)st position for the cost of two multiplications, one addition, and one subtraction. This can be done in constant time.
  • 18. Hashing, Hashing, and Hashing Udi Manber says that the three most important algorithms at Yahoo are hashing, hashing, and hashing. Hashing has a variety of clever applications beyond just speeding up search, by giving you a short but distinctive representation of a larger document. • Is this new document different from the rest in a large corpus? – Hash the new document, and compare it to the hash codes of corpus. • How can I convince you that a file isn’t changed? – Check if the cryptographic hash code of the file you give me today is the same as that of the original. Any changes to the file will change the hash code.