SlideShare a Scribd company logo
1 of 21
Minimum Edit Distance
K.A.S.H. Kulathilake
B.Sc.(Sp.Hons.)IT, MCS, Mphil, SEDA(UK)
Introduction
• Much of natural language processing is concerned with
measuring how similar two strings are.
• For example in spelling correction, the user typed
some erroneous string—let’s say graffe–and we want
to know what the user meant.
• The user probably intended a word that is similar to
graffe.
• Among candidate similar words, the word giraffe,
which differs by only one letter from graffe, seems
intuitively to be more similar than, say grail or graf,
which differ in more letters.
Introduction (Cont…)
• Edit distance gives us a way to quantify both
of these intuitions about string similarity.
• More formally, the minimum edit distance
between two strings is defined as the
minimum number of editing operations
(operations like insertion, deletion,
substitution) needed to transform one string
into another.
Introduction (Cont…)
• The gap between intention and execution, for
example, is 5 (delete an i, substitute e for n,
substitute x for t, insert c, substitute u for n).
• It’s much easier to see this by looking at the most
important visualization for string distances, an
alignment between the two strings.
Levenshtein Distance
• We can also assign a particular cost or weight to each of these
operations.
• The Levenshtein distance between two sequences is the simplest
weighting factor in which each of the three operations has a cost of
1 (Levenshtein, 1966)—we assume that the substitution of a letter
for itself, for example, t for t, has zero cost.
• The Levenshtein distance between intention and execution is 5.
• Levenshtein also proposed an alternative version of his metric in
which each insertion or deletion has a cost of 1 and substitutions
are not allowed. (This is equivalent to allowing substitution, but
giving each substitution a cost of 2 since any substitution can be
represented by one insertion and one deletion).
• Using this version, the Levenshtein distance between intention and
execution is 8.
Levenshtein distance - example
• distance(“William Cohen”, “Willliam Cohon”)
W I L L I A M _ C O H E N
W I L L L I A M _ C O H O N
C C C C I C C C C C C C S C
0 0 0 0 1 1 1 1 1 1 1 1 2 2
s
t
op
cost
Levenshtein distance - example
• distance(“William Cohen”, “Willliam Cohon”)
W I L L I A M _ C O H E N
W I L L L I A M _ C O H O N
C C C C I C C C C C C C S C
0 0 0 0 1 1 1 1 1 1 1 1 2 2
s
t
op
cost
gap
Dynamic Programming
• In order to find the edit distance we can use
dynamic programming.
• Dynamic programming is the name for a class
of algorithms, first introduced by Bellman
(1957), that apply a table-driven method to
solve problems by combining solutions to sub-
problems.
Edit Distance of Two Strings
• Let’s first define the minimum edit distance
between two strings.
• Given two strings, the source string X of length
n, and target string Y of length m, we’ll define
D(i, j) as the edit distance between X[1::i] and
Y[1:: j], i.e., the first i characters of X and the
first j characters of Y.
• The edit distance between X and Y is thus
D(n,m).
Edit Distance of Two Strings (Cont…)
• We’ll use dynamic programming to compute D(n,m)
bottom up, combining solutions to sub-problems.
• In the base case, with a source substring of length i but
an empty target string, going from i characters to 0
requires i deletes.
• With a target substring of length j but an empty source
going from 0 characters to j characters requires j
inserts.
• Having computed D(i, j) for small i, j we then compute
larger D(i, j) based on previously computed smaller
values.
Edit Distance of Two Strings (Cont…)
• The value of D(i, j) is computed by taking the
minimum of the three possible paths through
the matrix which arrive there:
Edit Distance of Two Strings (Cont…)
• If we assume the version of Levenshtein
distance in which the insertions and deletions
each have a cost of 1 (ins-cost() = del-cost() =
1), and substitutions have a cost of 2 (except
substitution of identical letters have zero
cost), the computation for D(i, j) becomes:
Edit Distance of Two Strings (Cont…)
Edit Distance of Two Strings (Cont…)
• Knowing the minimum edit distance is useful for algorithms
like finding potential spelling error corrections.
• But the edit distance algorithm is important in another
way; with a small change, it can also provide the minimum
cost alignment between two strings.
• Aligning two strings is useful throughout speech and
language processing.
• In speech recognition, minimum edit distance alignment is
used to compute the word error rate.
• Alignment plays a role in machine translation, in which
sentences in a parallel corpus (a corpus with a text in two
languages) need to be matched to each other.
Edit Distance of Two Strings (Cont…)
• To extend the edit distance algorithm to produce an
alignment, we can start by visualizing an alignment as a
path through the edit distance matrix.
• Following matrix shows this path with the boldfaced
cell.
• Each boldfaced cell represents an alignment of a pair of
letters in the two strings.
• If two boldfaced cells occur in the same row, there will
be an insertion in going from the source to the target;
two boldfaced cells in the same column indicate a
deletion.
Edit Distance of Two Strings (Cont…)
Edit Distance of Two Strings (Cont…)
• How to compute alignment path?
– The computation proceeds in two steps.
– In the first step, we augment the minimum edit
distance algorithm to store back pointers in each cell.
– The back pointer from a cell points to the previous cell
(or cells) that we came from in entering the current
cell.
– Some cells have multiple back pointers because the
minimum extension could have come from multiple
previous cells.
Edit Distance of Two Strings (Cont…)
– In the second step, we perform a back trace.
– In a back trace, we start from the last cell (at the
final row and column), and follow the pointers
back through the dynamic programming matrix.
– Each complete path between the final cell and the
initial cell is a minimum distance alignment.
Edit Distance of Two Strings (Cont…)
Later What We Do?
• Instead of computing the “minimum edit
distance” between two strings, Viterbi
computes the “maximum probability
alignment” of one string with another.
Question?
• How to handle transposition?
• E.g the < -- > hte

More Related Content

What's hot

Theory of automata and formal language
Theory of automata and formal languageTheory of automata and formal language
Theory of automata and formal languageRabia Khalid
 
Semantic net in AI
Semantic net in AISemantic net in AI
Semantic net in AIShahDhruv21
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfDeptii Chaudhari
 
Artificial Intelligence Searching Techniques
Artificial Intelligence Searching TechniquesArtificial Intelligence Searching Techniques
Artificial Intelligence Searching TechniquesDr. C.V. Suresh Babu
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfDeptii Chaudhari
 
Syntax directed translation
Syntax directed translationSyntax directed translation
Syntax directed translationAkshaya Arunan
 
ProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) IntroductionProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) Introductionwahab khan
 
Knowledge representation in AI
Knowledge representation in AIKnowledge representation in AI
Knowledge representation in AIVishal Singh
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)swapnac12
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLPkartikaVashisht
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesankit_ppt
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelHemantha Kulathilake
 
Semantic nets in artificial intelligence
Semantic nets in artificial intelligenceSemantic nets in artificial intelligence
Semantic nets in artificial intelligenceharshita virwani
 

What's hot (20)

Theory of automata and formal language
Theory of automata and formal languageTheory of automata and formal language
Theory of automata and formal language
 
Semantic net in AI
Semantic net in AISemantic net in AI
Semantic net in AI
 
Specification-of-tokens
Specification-of-tokensSpecification-of-tokens
Specification-of-tokens
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdf
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
Artificial Intelligence Searching Techniques
Artificial Intelligence Searching TechniquesArtificial Intelligence Searching Techniques
Artificial Intelligence Searching Techniques
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdf
 
Syntax directed translation
Syntax directed translationSyntax directed translation
Syntax directed translation
 
ProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) IntroductionProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) Introduction
 
Knowledge representation in AI
Knowledge representation in AIKnowledge representation in AI
Knowledge representation in AI
 
Perceptron
PerceptronPerceptron
Perceptron
 
Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)Performance analysis(Time & Space Complexity)
Performance analysis(Time & Space Complexity)
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLP
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniques
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
 
Semantic nets in artificial intelligence
Semantic nets in artificial intelligenceSemantic nets in artificial intelligence
Semantic nets in artificial intelligence
 
Knowledge representation
Knowledge representationKnowledge representation
Knowledge representation
 
Phases of Compiler
Phases of CompilerPhases of Compiler
Phases of Compiler
 

Similar to NLP_KASHK:Minimum Edit Distance

Design and analysis of Algorithms - Lecture 08 (1).ppt
Design and analysis of Algorithms - Lecture 08 (1).pptDesign and analysis of Algorithms - Lecture 08 (1).ppt
Design and analysis of Algorithms - Lecture 08 (1).pptZeenaJaba
 
Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...
Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...
Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...Tanjarul Islam Mishu
 
Dd 160506122947-160630175555-160701121726
Dd 160506122947-160630175555-160701121726Dd 160506122947-160630175555-160701121726
Dd 160506122947-160630175555-160701121726marangburu42
 
Multimedia lossy compression algorithms
Multimedia lossy compression algorithmsMultimedia lossy compression algorithms
Multimedia lossy compression algorithmsMazin Alwaaly
 
2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptxAnum Zehra
 
7 convolutional codes
7 convolutional codes7 convolutional codes
7 convolutional codesVarun Raj
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithmiosrjce
 
Mathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final DraftMathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final DraftAlexanderCominsky
 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distanceananth
 
Non standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded waveletsNon standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded waveletseSAT Publishing House
 
Non standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded waveletsNon standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded waveletseSAT Journals
 
Wavelet Based Image Compression Using FPGA
Wavelet Based Image Compression Using FPGAWavelet Based Image Compression Using FPGA
Wavelet Based Image Compression Using FPGADr. Mohieddin Moradi
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningKnoldus Inc.
 

Similar to NLP_KASHK:Minimum Edit Distance (20)

Design and analysis of Algorithms - Lecture 08 (1).ppt
Design and analysis of Algorithms - Lecture 08 (1).pptDesign and analysis of Algorithms - Lecture 08 (1).ppt
Design and analysis of Algorithms - Lecture 08 (1).ppt
 
Quantum Noise and Error Correction
Quantum Noise and Error CorrectionQuantum Noise and Error Correction
Quantum Noise and Error Correction
 
Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...
Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...
Dynamic time wrapping (dtw), vector quantization(vq), linear predictive codin...
 
Floyd aaaaaa
Floyd aaaaaaFloyd aaaaaa
Floyd aaaaaa
 
Dd 160506122947-160630175555-160701121726
Dd 160506122947-160630175555-160701121726Dd 160506122947-160630175555-160701121726
Dd 160506122947-160630175555-160701121726
 
Multimedia lossy compression algorithms
Multimedia lossy compression algorithmsMultimedia lossy compression algorithms
Multimedia lossy compression algorithms
 
2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx2_EditDistance_Jan_08_2020.pptx
2_EditDistance_Jan_08_2020.pptx
 
7 convolutional codes
7 convolutional codes7 convolutional codes
7 convolutional codes
 
M017327378
M017327378M017327378
M017327378
 
Scheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic AlgorithmScheduling Using Multi Objective Genetic Algorithm
Scheduling Using Multi Objective Genetic Algorithm
 
Lossy
LossyLossy
Lossy
 
Mathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final DraftMathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final Draft
 
L06 stemmer and edit distance
L06 stemmer and edit distanceL06 stemmer and edit distance
L06 stemmer and edit distance
 
MS Project
MS ProjectMS Project
MS Project
 
poster
posterposter
poster
 
Non standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded waveletsNon standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded wavelets
 
Non standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded waveletsNon standard size image compression with reversible embedded wavelets
Non standard size image compression with reversible embedded wavelets
 
Wavelet Based Image Compression Using FPGA
Wavelet Based Image Compression Using FPGAWavelet Based Image Compression Using FPGA
Wavelet Based Image Compression Using FPGA
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 

More from Hemantha Kulathilake

NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar Hemantha Kulathilake
 
NLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for EnglishNLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for EnglishHemantha Kulathilake
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingHemantha Kulathilake
 
COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation Hemantha Kulathilake
 
COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops Hemantha Kulathilake
 
COM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & BranchingCOM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & BranchingHemantha Kulathilake
 

More from Hemantha Kulathilake (20)

NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar
 
NLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for EnglishNLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for English
 
NLP_KASHK:POS Tagging
NLP_KASHK:POS TaggingNLP_KASHK:POS Tagging
NLP_KASHK:POS Tagging
 
NLP_KASHK:Markov Models
NLP_KASHK:Markov ModelsNLP_KASHK:Markov Models
NLP_KASHK:Markov Models
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
 
NLP_KASHK:N-Grams
NLP_KASHK:N-GramsNLP_KASHK:N-Grams
NLP_KASHK:N-Grams
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
 
NLP_KASHK:Morphology
NLP_KASHK:MorphologyNLP_KASHK:Morphology
NLP_KASHK:Morphology
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
NLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State AutomataNLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State Automata
 
NLP_KASHK:Regular Expressions
NLP_KASHK:Regular Expressions NLP_KASHK:Regular Expressions
NLP_KASHK:Regular Expressions
 
NLP_KASHK: Introduction
NLP_KASHK: Introduction NLP_KASHK: Introduction
NLP_KASHK: Introduction
 
COM1407: File Processing
COM1407: File Processing COM1407: File Processing
COM1407: File Processing
 
COm1407: Character & Strings
COm1407: Character & StringsCOm1407: Character & Strings
COm1407: Character & Strings
 
COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation
 
COM1407: Input/ Output Functions
COM1407: Input/ Output FunctionsCOM1407: Input/ Output Functions
COM1407: Input/ Output Functions
 
COM1407: Working with Pointers
COM1407: Working with PointersCOM1407: Working with Pointers
COM1407: Working with Pointers
 
COM1407: Arrays
COM1407: ArraysCOM1407: Arrays
COM1407: Arrays
 
COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops
 
COM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & BranchingCOM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & Branching
 

Recently uploaded

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 

NLP_KASHK:Minimum Edit Distance

  • 1. Minimum Edit Distance K.A.S.H. Kulathilake B.Sc.(Sp.Hons.)IT, MCS, Mphil, SEDA(UK)
  • 2. Introduction • Much of natural language processing is concerned with measuring how similar two strings are. • For example in spelling correction, the user typed some erroneous string—let’s say graffe–and we want to know what the user meant. • The user probably intended a word that is similar to graffe. • Among candidate similar words, the word giraffe, which differs by only one letter from graffe, seems intuitively to be more similar than, say grail or graf, which differ in more letters.
  • 3. Introduction (Cont…) • Edit distance gives us a way to quantify both of these intuitions about string similarity. • More formally, the minimum edit distance between two strings is defined as the minimum number of editing operations (operations like insertion, deletion, substitution) needed to transform one string into another.
  • 4. Introduction (Cont…) • The gap between intention and execution, for example, is 5 (delete an i, substitute e for n, substitute x for t, insert c, substitute u for n). • It’s much easier to see this by looking at the most important visualization for string distances, an alignment between the two strings.
  • 5. Levenshtein Distance • We can also assign a particular cost or weight to each of these operations. • The Levenshtein distance between two sequences is the simplest weighting factor in which each of the three operations has a cost of 1 (Levenshtein, 1966)—we assume that the substitution of a letter for itself, for example, t for t, has zero cost. • The Levenshtein distance between intention and execution is 5. • Levenshtein also proposed an alternative version of his metric in which each insertion or deletion has a cost of 1 and substitutions are not allowed. (This is equivalent to allowing substitution, but giving each substitution a cost of 2 since any substitution can be represented by one insertion and one deletion). • Using this version, the Levenshtein distance between intention and execution is 8.
  • 6. Levenshtein distance - example • distance(“William Cohen”, “Willliam Cohon”) W I L L I A M _ C O H E N W I L L L I A M _ C O H O N C C C C I C C C C C C C S C 0 0 0 0 1 1 1 1 1 1 1 1 2 2 s t op cost
  • 7. Levenshtein distance - example • distance(“William Cohen”, “Willliam Cohon”) W I L L I A M _ C O H E N W I L L L I A M _ C O H O N C C C C I C C C C C C C S C 0 0 0 0 1 1 1 1 1 1 1 1 2 2 s t op cost gap
  • 8. Dynamic Programming • In order to find the edit distance we can use dynamic programming. • Dynamic programming is the name for a class of algorithms, first introduced by Bellman (1957), that apply a table-driven method to solve problems by combining solutions to sub- problems.
  • 9. Edit Distance of Two Strings • Let’s first define the minimum edit distance between two strings. • Given two strings, the source string X of length n, and target string Y of length m, we’ll define D(i, j) as the edit distance between X[1::i] and Y[1:: j], i.e., the first i characters of X and the first j characters of Y. • The edit distance between X and Y is thus D(n,m).
  • 10. Edit Distance of Two Strings (Cont…) • We’ll use dynamic programming to compute D(n,m) bottom up, combining solutions to sub-problems. • In the base case, with a source substring of length i but an empty target string, going from i characters to 0 requires i deletes. • With a target substring of length j but an empty source going from 0 characters to j characters requires j inserts. • Having computed D(i, j) for small i, j we then compute larger D(i, j) based on previously computed smaller values.
  • 11. Edit Distance of Two Strings (Cont…) • The value of D(i, j) is computed by taking the minimum of the three possible paths through the matrix which arrive there:
  • 12. Edit Distance of Two Strings (Cont…) • If we assume the version of Levenshtein distance in which the insertions and deletions each have a cost of 1 (ins-cost() = del-cost() = 1), and substitutions have a cost of 2 (except substitution of identical letters have zero cost), the computation for D(i, j) becomes:
  • 13. Edit Distance of Two Strings (Cont…)
  • 14. Edit Distance of Two Strings (Cont…) • Knowing the minimum edit distance is useful for algorithms like finding potential spelling error corrections. • But the edit distance algorithm is important in another way; with a small change, it can also provide the minimum cost alignment between two strings. • Aligning two strings is useful throughout speech and language processing. • In speech recognition, minimum edit distance alignment is used to compute the word error rate. • Alignment plays a role in machine translation, in which sentences in a parallel corpus (a corpus with a text in two languages) need to be matched to each other.
  • 15. Edit Distance of Two Strings (Cont…) • To extend the edit distance algorithm to produce an alignment, we can start by visualizing an alignment as a path through the edit distance matrix. • Following matrix shows this path with the boldfaced cell. • Each boldfaced cell represents an alignment of a pair of letters in the two strings. • If two boldfaced cells occur in the same row, there will be an insertion in going from the source to the target; two boldfaced cells in the same column indicate a deletion.
  • 16. Edit Distance of Two Strings (Cont…)
  • 17. Edit Distance of Two Strings (Cont…) • How to compute alignment path? – The computation proceeds in two steps. – In the first step, we augment the minimum edit distance algorithm to store back pointers in each cell. – The back pointer from a cell points to the previous cell (or cells) that we came from in entering the current cell. – Some cells have multiple back pointers because the minimum extension could have come from multiple previous cells.
  • 18. Edit Distance of Two Strings (Cont…) – In the second step, we perform a back trace. – In a back trace, we start from the last cell (at the final row and column), and follow the pointers back through the dynamic programming matrix. – Each complete path between the final cell and the initial cell is a minimum distance alignment.
  • 19. Edit Distance of Two Strings (Cont…)
  • 20. Later What We Do? • Instead of computing the “minimum edit distance” between two strings, Viterbi computes the “maximum probability alignment” of one string with another.
  • 21. Question? • How to handle transposition? • E.g the < -- > hte