Spell Correction Systems for E-commerce engines
Anjan Goswami, HuiZhong Duan (Search Science, WalmartLabs)
The Spell correction problem
Rich literature [KCG90, Pet80].
Active research area [CB04].
Combination of NLP, Machine Learning [DH11, BB01, LDZ12] and
Systems problems [Kuk92].
Spell correction for e-commerce
Critical site feature for e-commerce.
Impact of ML-based spell correction:
Adds revenue.
Reduces bounce rate.
Reduces null results.
Departments such as pharmacy can see large revenue gains from spell correction.
Spell correction for e-commerce
The science is the same as in any other large-scale spell correction system.
Demand and supply side corpus.
Conversion focus.
User Interfaces.
Spell correction Evaluation
Accuracy for misspelled queries.
Accuracy for correctly spelled queries.
Business metrics.
Coverage.
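The offline metrics listed above can be computed from a labeled query set. The following is a minimal sketch, not the authors' evaluation code: the function name `evaluate`, the `(query, intended_query)` data layout, and the definition of coverage as "share of misspelled queries for which the system proposes any change" are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

def _ratio(hits: int, total: int) -> float:
    # Avoid division by zero when a bucket is empty.
    return hits / total if total else 0.0

def evaluate(labeled: List[Tuple[str, str]],
             corrector: Callable[[str], str]) -> Dict[str, float]:
    """labeled holds (query, intended_query) pairs (hypothetical layout)."""
    misspelled = [(q, gold) for q, gold in labeled if q != gold]
    correct = [(q, gold) for q, gold in labeled if q == gold]
    return {
        # Misspelled queries that were mapped to the intended query.
        "accuracy_misspelled": _ratio(
            sum(corrector(q) == gold for q, gold in misspelled), len(misspelled)),
        # Correctly spelled queries that were left untouched (no over-correction).
        "accuracy_correct": _ratio(
            sum(corrector(q) == q for q, _ in correct), len(correct)),
        # Assumed definition: misspelled queries for which any change is proposed.
        "coverage": _ratio(
            sum(corrector(q) != q for q, _ in misspelled), len(misspelled)),
    }

# Toy corrector backed by a lookup table (illustrative data only).
TABLE = {"covr": "cover", "visio tv": "video tv"}
LABELED = [("covr", "cover"), ("visio tv", "vizio tv"),
           ("life of pie", "life of pi"), ("cover", "cover"), ("pi", "pi")]
METRICS = evaluate(LABELED, lambda q: TABLE.get(q, q))
```

Business metrics such as revenue or bounce rate need online experiments and are not covered by this sketch.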
The problem
Error statistics
Approximately 26% of web queries contain spelling errors [JM].
E-commerce data can be expected to be similar.
Error Types
Typographic errors: Covr ← Cover
Cognitive errors: Visio Tv ← Vizio Tv
Non-English word errors: X345678 ← X345677
Contextual errors: life of Pie ← Life of Pi
Challenges
General Challenges
Large candidate pool: queries
Open dictionary: all terms are feasible
Efficiency: happens before search is executed
User behavior: query formulation is different from typical writing
Devices: different devices may cause different types of typos
Under-correction: even if a term is in correct form, it may still need correction
Over-correction: a term that doesn't appear correct could still be a good search term
Languages: Different languages have different challenges.
Query Spelling Challenges
Special Challenges (and Opportunities) in e-Commerce
optimization target: linguistic correctness or conversion?
unique dictionary: model numbers, etc.
high cost for over-correction
availability of inventory data
availability of conversion data
General problems
Error modeling
Candidate generation
Ranking and selection of the best candidate.
Modeling
A Noisy Channel Framework
Given user input query q, for every candidate correction c, compute the conditional probability p(c|q):
$$p(c \mid q) = \frac{p(q \mid c) \cdot p(c)}{p(q)} \propto p(q \mid c) \cdot p(c) \tag{1}$$
Modeling
A Noisy Channel Framework (cont.)
Source model p(c)
Captures: how likely the user is to issue query c in the first place
Typically: language model
Rationale: common phrases have high probabilities
Error model p(q|c)
Captures: how likely c is misspelled as q
Straightforward model: edit distance
Rationale: a misspelled query should not be too different from the intended query
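The two components above can be combined as in equation (1) by adding log probabilities. The following is a minimal sketch under stated assumptions, not the system described in the talk: the source model is an add-one smoothed frequency over whole queries, the error model is a toy `exp(-penalty * edit_distance)`, the `penalty` value and the query counts are made up, and candidates are simply all known queries plus the input itself.

```python
import math
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def log_source(c: str, counts: Counter, total: int) -> float:
    # log p(c): add-one smoothed query frequency, with one extra slot
    # reserved for an unseen query.
    return math.log((counts[c] + 1) / (total + len(counts) + 1))

def log_error(q: str, c: str, penalty: float = 2.0) -> float:
    # log p(q|c) up to a constant: each edit costs `penalty` (assumed value).
    return -penalty * edit_distance(q, c)

def correct(q: str, counts: Counter) -> str:
    """Return the candidate c maximizing log p(q|c) + log p(c)."""
    total = sum(counts.values())
    candidates = set(counts) | {q}  # keeping q allows "no correction"
    return max(candidates,
               key=lambda c: log_source(c, counts, total) + log_error(q, c))

# Hypothetical query log counts, for illustration only.
QUERY_COUNTS = Counter({"vizio tv": 50, "visio software": 5, "cover": 30})
```

With these toy counts, `correct("visio tv", QUERY_COUNTS)` prefers the popular query one edit away, while an already frequent query such as `"cover"` is left unchanged.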
Modeling
A Noisy Channel Framework (cont.)
More on Source model p(c)
Linguistic correctness is important
Should also reflect query popularity
In e-commerce, we also need to consider query conversion and query revenue
Modeling
A Noisy Channel Framework (cont.)
Language Model
n-gram language model: data sparsity as n goes up
backoff to/interpolation with lower-order n-grams is necessary
smoothing is important
Good-Turing smoothing: use the counts of items seen once to estimate the probability of unseen items
Additive smoothing: add a pseudo-count to terms/phrases
Kneser-Ney smoothing: a smart way of combining backoff and interpolation
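Of the smoothing methods listed, additive smoothing is the simplest to write down. The following is a minimal sketch of an additively smoothed bigram model over queries; the function names, the `<s>`/`</s>` boundary tokens, and the toy query list are assumptions for illustration, and a production system would more likely use Kneser-Ney or another stronger method.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

def train_bigram(queries: Iterable[str]) -> Tuple[Counter, Counter, Set[str]]:
    """Count bigrams, their left contexts, and the predictable vocabulary."""
    context, bigrams, vocab = Counter(), Counter(), set()
    for q in queries:
        tokens = ["<s>"] + q.split() + ["</s>"]
        vocab.update(tokens[1:])  # "<s>" is never predicted
        for prev, word in zip(tokens, tokens[1:]):
            context[prev] += 1
            bigrams[(prev, word)] += 1
    return context, bigrams, vocab

def p_additive(word: str, prev: str, context: Counter, bigrams: Counter,
               vocab: Set[str], alpha: float = 1.0) -> float:
    """p(word | prev) with pseudo-count alpha added to every bigram."""
    return (bigrams[(prev, word)] + alpha) / (context[prev] + alpha * len(vocab))

# Toy training queries, for illustration only.
CONTEXT, BIGRAMS, VOCAB = train_bigram(["life of pi", "life of pi", "apple pie"])
P_PI = p_additive("pi", "of", CONTEXT, BIGRAMS, VOCAB)
P_PIE = p_additive("pie", "of", CONTEXT, BIGRAMS, VOCAB)
```

The unseen bigram "of pie" still gets non-zero probability, but less than the observed "of pi", which is the behavior the contextual error example (life of Pie vs. Life of Pi) relies on.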
Modeling
A Noisy Channel Framework (cont.)
More on Error model p(q|c)
Weighted edit model is better: p(a → e) > p(a → n)
Context matters: p(a → e | context = "be...")
Multi-word errors need to be considered: p("gopro" | "go pro"), can be modeled by HMM, joint sequence model, etc.
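A weighted edit model can be sketched by giving each substitution its own cost, where a lower cost stands for a more probable confusion (for instance a cost of -log p). The example below is illustrative only: the cost table `SUB_COST`, its values, and the uniform insert/delete costs are assumptions, and context-dependent or multi-word errors are not modeled.

```python
from typing import Dict, Tuple

def weighted_edit_distance(source: str, target: str,
                           sub_cost: Dict[Tuple[str, str], float],
                           ins_cost: float = 1.0,
                           del_cost: float = 1.0) -> float:
    """Edit distance where substitution cost depends on the character pair."""
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * del_cost]
        for j, t in enumerate(target, start=1):
            # Matching characters are free; unlisted pairs cost 1.0.
            sub = 0.0 if s == t else sub_cost.get((s, t), 1.0)
            curr.append(min(prev[j] + del_cost,
                            curr[j - 1] + ins_cost,
                            prev[j - 1] + sub))
        prev = curr
    return prev[-1]

# Hypothetical costs: confusing the vowels a/e is cheaper than other swaps.
SUB_COST = {("a", "e"): 0.3, ("e", "a"): 0.3}
COST_VOWEL = weighted_edit_distance("bat", "bet", SUB_COST)
COST_OTHER = weighted_edit_distance("bat", "bnt", SUB_COST)
```

In practice the costs would be estimated from observed (misspelling, correction) pairs rather than set by hand.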
Modeling
A Noisy Channel Framework (cont.)
Hierarchical Error models
Character level error model
p(a → e | context = "be...")
generalizes well
less accurate
Syllable level error model
Word level error model
p(pi → pie | context = "life of ...")
sparse data
more accurate
Phrase level error model
...
Modeling
Discriminative Models
Why?
The noisy channel model is a generative framework
Multiplying probabilities is difficult when they are estimated in different ways
It is unclear how to merge several signals into one probability estimate (e.g. linguistic correctness vs. popularity vs. revenue)
There are other heuristic features and domain-specific features that cannot be subsumed in the noisy channel model
Modeling
Discriminative Models (cont.)
How?
Learn to score each (q, c) pair so that the best correction has the highest score
Challenges
Obtaining large scale training data: text parsing, human annotation
Learning methods
Classification
Learning to Rank
Structural learning
Efficiency: use the noisy channel model to retrieve a handful of candidates
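One way to learn such a scoring function is a pairwise ranking perceptron over hand-built features of each (q, c) pair. This is a sketch of one learning-to-rank option, not the method used by the authors: the three features (negative edit distance, a log-popularity value, a conversion rate) and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]

def score(w: Sequence[float], f: Features) -> float:
    """Linear score of one (q, c) feature vector."""
    return sum(wi * fi for wi, fi in zip(w, f))

def train_pairwise_perceptron(examples: List[Tuple[Features, List[Features]]],
                              n_features: int, epochs: int = 10) -> List[float]:
    """Each example is (features of the best correction, features of the others)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for good, bads in examples:
            for bad in bads:
                # Update only when the best correction fails to outrank a rival.
                if score(w, good) <= score(w, bad):
                    w = [wi + g - b for wi, g, b in zip(w, good, bad)]
    return w

# Hypothetical features: (-edit distance, log popularity, conversion rate).
EXAMPLES = [
    # q = "visio tv": "vizio tv" vs. "visio tv" (unchanged) and "video tv"
    ((-1.0, 5.0, 0.10), [(0.0, 1.0, 0.01), (-1.0, 2.0, 0.02)]),
    # q = "covr": "cover" vs. "covr" (unchanged)
    ((-1.0, 4.0, 0.08), [(0.0, 0.5, 0.0)]),
]
WEIGHTS = train_pairwise_perceptron(EXAMPLES, n_features=3)
```

Features such as popularity and conversion enter as plain inputs here, which sidesteps the question of how to fold them into a single generative probability.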
Modeling
Discriminative Models (cont.)
Discriminative models such as SVMs can also be used to rerank the spelling candidates.
Recent successes with deep neural networks.
Modeling
Systems for Spelling Correction
Modeling
Candidate generation for Spelling Correction
Given a word, find all neighboring words within edit distance k.
Given a word, find potential close matches using a hashing trick.
Generate candidates using heuristic rules for common errors.
N-gram based techniques.
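The slides do not say which hashing trick is meant. One common option, assumed here purely for illustration, is a deletion-based index: every dictionary term is stored under all strings obtained by deleting up to k of its characters, and a query word is looked up under its own deletion variants. The function names and the toy dictionary are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set

def deletes(word: str, k: int) -> Set[str]:
    """All strings obtained by deleting up to k characters from word."""
    out = {word}
    for n in range(1, min(k, len(word)) + 1):
        for idx in combinations(range(len(word)), n):
            drop = set(idx)
            out.add("".join(ch for i, ch in enumerate(word) if i not in drop))
    return out

def build_index(dictionary: Iterable[str], k: int = 1) -> Dict[str, Set[str]]:
    """Map each deletion variant to the dictionary terms that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for term in dictionary:
        for key in deletes(term, k):
            index[key].add(term)
    return index

def candidates(word: str, index: Dict[str, Set[str]], k: int = 1) -> Set[str]:
    """Dictionary terms sharing at least one deletion variant with word."""
    found: Set[str] = set()
    for key in deletes(word, k):
        found |= index.get(key, set())
    return found

# Toy dictionary, for illustration only.
INDEX = build_index({"cover", "vizio", "pie", "pi"}, k=1)
```

Every term within edit distance k of the input is returned, but some returned terms can be farther away (up to 2k), so the candidates should still be checked with a real edit distance before ranking.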
Modeling
Candidate generation scaling up
Distributed implementation.
Hashing tricks.
Modeling
Spell correction for E-commerce
UI for the spell correction.
Input data: Whether to include item titles or not?
Impact of autocorrection on conversion.
Modeling
References I
[BB01] Michele Banko and Eric Brill, Scaling to very very large corpora for natural language disambiguation, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2001, pp. 26–33.
[CB04] Silviu Cucerzan and Eric Brill, Spelling correction as an iterative process that exploits the collective knowledge of web users, EMNLP, vol. 4, 2004, pp. 293–300.
[DH11] Huizhong Duan and Bo-June Paul Hsu, Online spelling correction for query completion, Proceedings of the 20th international conference on World wide web, ACM, 2011, pp. 117–126.
[JM] Daniel Jurafsky and James H Martin, Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition.
Modeling
References II
[KCG90] Mark D Kernighan, Kenneth W Church, and William A Gale, A spelling correction program based on a noisy channel model, Proceedings of the 13th conference on Computational linguistics-Volume 2, Association for Computational Linguistics, 1990, pp. 205–210.
[Kuk92] Karen Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys (CSUR) 24 (1992), no. 4, 377–439.
[LDZ12] Yanen Li, Huizhong Duan, and ChengXiang Zhai, A generalized hidden Markov model with discriminative training for query spelling correction, Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, ACM, 2012, pp. 611–620.
Modeling
References III
[Pet80] James L Peterson, Computer programs for detecting and correcting spelling errors, Communications of the ACM 23 (1980), no. 12, 676–687.
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Kürzlich hochgeladen (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Spelling correction systems for e-commerce platforms

  • 1. Spell Correction Systems for E-commerce Engines. Anjan Goswami, HuiZhong Duan (Search Science, WalmartLabs)
  • 2. The spell correction problem
      • Rich literature [KCG90, Pet80].
      • Active research area [CB04].
      • Combines NLP, machine learning [DH11, BB01, LDZ12], and systems problems [Kuk92].
  • 3. Spell correction for e-commerce
      • Critical site feature for e-commerce.
      • Impact of ML-based spell correction: adds revenue, reduces bounce rate, reduces null results.
      • Departments such as pharmacy can see a large revenue gain from spell correction.
  • 4. Spell correction for e-commerce
      • The science is the same as in any other large-scale spell correction system.
      • Demand-side and supply-side corpora.
      • Conversion focus.
      • User interfaces.
  • 5. Spell correction evaluation
      • Accuracy on misspelled queries.
      • Accuracy on correctly spelled queries.
      • Business metrics.
      • Coverage.
  • 6. The problem
  • 7. The problem
  • 8. The problem
  • 9. The problem
  • 10. The problem
  • 11. Error statistics
      • Approximately 26% of web queries contain a spelling error [JM].
      • E-commerce query data can be expected to be similar.
  • 12. Error types (misspelled query ← intended query)
      • Typographic errors: Covr ← Cover
      • Cognitive errors: Visio Tv ← Vizio Tv
      • Non-English word errors: X345678 ← X345677
      • Contextual errors: life of Pie ← Life of Pi
  • 13. General challenges
      • Large candidate pool: queries.
      • Open dictionary: all terms are feasible.
      • Efficiency: correction happens before the search is executed.
      • User behavior: query formulation differs from typical writing.
      • Devices: different devices may cause different types of typos.
      • Under-correction: even a term that is a correctly spelled word may still need correction.
      • Over-correction: a term that does not look correct could still be a good search term.
      • Languages: different languages pose different challenges.
  • 14. Query spelling challenges: special challenges (and opportunities) in e-commerce
      • Optimization target: linguistic correctness or conversion?
      • Unique dictionary: model numbers, etc.
      • High cost of over-correction.
      • Availability of inventory data.
      • Availability of conversion data.
  • 15. General problems
      • Error modeling.
      • Candidate generation.
      • Ranking and selection of the best candidate.
  • 16. Modeling: A noisy channel framework
      • Given a user input query q, compute the conditional probability p(c|q) for every candidate correction c:
        p(c|q) = p(q|c) · p(c) / p(q) ∝ p(q|c) · p(c)    (1)
  • 17. Modeling: A noisy channel framework (cont.)
      • Source model p(c)
          • Captures: how likely the user is to pick query c in the first place.
          • Typically: a language model.
          • Rationale: common phrases have high probabilities.
      • Error model p(q|c)
          • Captures: how likely c is to be misspelled as q.
          • Straightforward model: edit distance.
          • Rationale: a misspelled query should not be too different from the original query.
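The scoring rule in equation (1) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' system: the tiny `QUERY_COUNTS` table stands in for the source model p(c), and the error model p(q|c) is approximated as decaying exponentially with plain Levenshtein distance, with a hypothetical `penalty` weight.

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical query-log counts used as the source model p(c).
QUERY_COUNTS = {"cover": 900, "over": 80, "cove": 20}

def log_source(c: str) -> float:
    """log p(c), estimated from relative query frequency."""
    return math.log(QUERY_COUNTS[c] / sum(QUERY_COUNTS.values()))

def log_error(q: str, c: str, penalty: float = 3.0) -> float:
    """Assumed error model: log p(q|c) = -penalty * edit_distance(q, c), up to a constant."""
    return -penalty * edit_distance(q, c)

def correct(q: str) -> str:
    """Return the candidate c maximizing log p(q|c) + log p(c)."""
    return max(QUERY_COUNTS, key=lambda c: log_error(q, c) + log_source(c))

print(correct("covr"))
```

With these toy numbers the rare but valid query "cove" would also be rewritten to "cover", which is a small instance of the over-correction risk listed among the challenges.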
  • 18. Modeling: A noisy channel framework (cont.)
      • More on the source model p(c)
          • Linguistic correctness is important.
          • It should also reflect query popularity.
          • In e-commerce, query conversion and query revenue also need to be considered.
  • 19. Modeling: A noisy channel framework (cont.)
      • Language model
          • n-gram language model: data sparsity grows as n goes up.
          • Backoff to, or interpolation with, lower-order n-grams is necessary.
          • Smoothing is important.
              • Good-Turing smoothing: use the counts of items seen once to estimate the probability of unseen items.
              • Additive smoothing: add a pseudo-count to terms/phrases.
              • Kneser-Ney smoothing: a smart way of combining backoff and interpolation.
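As a rough illustration of the language-model points above, the sketch below combines additive smoothing with linear interpolation between bigram and unigram estimates. It is not Kneser-Ney or Good-Turing, and the class name `BigramLM`, the pseudo-count `alpha`, and the interpolation weight `lam` are assumptions chosen for the example.

```python
from collections import Counter

class BigramLM:
    """Bigram model with additive smoothing, interpolated with a unigram model."""

    def __init__(self, queries, alpha=0.1, lam=0.7):
        self.alpha, self.lam = alpha, lam
        self.unigrams = Counter()   # counts of each word
        self.histories = Counter()  # counts of each word used as a bigram history
        self.bigrams = Counter()    # counts of (previous word, word) pairs
        for q in queries:
            tokens = ["<s>"] + q.split()
            self.unigrams.update(tokens[1:])
            self.histories.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams) + 1  # one extra slot for unseen words
        self.total = sum(self.unigrams.values())

    def p_unigram(self, w):
        return (self.unigrams[w] + self.alpha) / (self.total + self.alpha * self.vocab_size)

    def p_bigram(self, prev, w):
        num = self.bigrams[(prev, w)] + self.alpha
        den = self.histories[prev] + self.alpha * self.vocab_size
        return num / den

    def prob(self, prev, w):
        """Interpolated estimate of p(w | prev); never zero thanks to smoothing."""
        return self.lam * self.p_bigram(prev, w) + (1 - self.lam) * self.p_unigram(w)

lm = BigramLM(["life of pi", "life of pi", "pie crust"])
print(lm.prob("of", "pi") > lm.prob("of", "pie"))
```

Because every count receives a pseudo-count, an unseen continuation still gets a small non-zero probability, which is what lets the source model score novel candidate queries.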
  • 20. Modeling: A noisy channel framework (cont.)
      • More on the error model p(q|c)
          • A weighted edit model is better: p(a → e) > p(a → n).
          • Context matters: p(a → e | context = "be...").
          • Multi-word errors need to be considered: p("gopro" | "go pro"); these can be modeled with an HMM, a joint sequence model, etc.
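A weighted edit model can be sketched by letting each substitution carry its own cost, so that a likely confusion such as a → e is cheaper than an unlikely one such as a → n. The cost table below is invented for illustration; in practice such weights would be estimated from observed misspelling pairs.

```python
def weighted_edit_distance(source, target, sub_cost, indel_cost=1.0):
    """Edit distance where substituting x by y costs sub_cost.get((x, y), 1.0)."""
    prev = [j * indel_cost for j in range(len(target) + 1)]
    for i, cs in enumerate(source, start=1):
        curr = [i * indel_cost]
        for j, ct in enumerate(target, start=1):
            sub = 0.0 if cs == ct else sub_cost.get((cs, ct), 1.0)
            curr.append(min(prev[j] + indel_cost,      # delete cs
                            curr[j - 1] + indel_cost,  # insert ct
                            prev[j - 1] + sub))        # substitute cs -> ct
        prev = curr
    return prev[-1]

# Hypothetical confusion costs: vowel-for-vowel swaps are cheaper.
SUB_COST = {("a", "e"): 0.3, ("e", "a"): 0.3}

print(weighted_edit_distance("gray", "grey", SUB_COST))
print(weighted_edit_distance("gray", "grny", SUB_COST))
```

A cost can be read as a negative log probability, so a lower cost for a → e corresponds to p(a → e) > p(a → n).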
  • 21. Modeling: A noisy channel framework (cont.)
      • Hierarchical error models
          • Character-level error model, e.g. p(a → e | context = "be..."): generalizes well, less accurate.
          • Syllable-level error model.
          • Word-level error model, e.g. p(pi → pie | context = "life of ..."): sparse data, more accurate.
          • Phrase-level error model ...
  • 22. Modeling: Discriminative models
      • Why?
          • The noisy channel model is a generative framework.
          • Multiplying the component probabilities is problematic because they are estimated in different ways.
          • It is unclear how to merge different signals into a single probability estimate (e.g. linguistic correctness vs. popularity vs. revenue).
          • Other heuristic and domain-specific features cannot be subsumed in the noisy channel model.
  • 23. Modeling: Discriminative models (cont.)
      • How?
          • Learn to score each <q, c> pair so that the best correction has the highest score.
      • Challenges
          • Obtaining large-scale training data: text parsing, human annotation.
          • Learning methods: classification, learning to rank, structural learning.
          • Efficiency: use the noisy channel model to retrieve a handful of candidates.
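One simple way to learn a score for <q, c> pairs is a pairwise perceptron, sketched below as a stand-in for the learning-to-rank methods mentioned above. The two features (negative edit distance and a log-popularity value) and the training examples are hypothetical; a real system would use many more signals, such as conversion and revenue.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_perceptron(examples, n_features, epochs=10):
    """examples: list of (features_of_correct_candidate, [features_of_other_candidates])."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for good, others in examples:
            for bad in others:
                # Update whenever the correct candidate does not strictly outscore a wrong one.
                if dot(w, good) <= dot(w, bad):
                    w = [wi + g - b for wi, g, b in zip(w, good, bad)]
    return w

def best_candidate(w, candidates):
    """candidates: dict mapping a candidate string to its feature vector."""
    return max(candidates, key=lambda c: dot(w, candidates[c]))

# Hypothetical features: [-edit distance to the query, log popularity of the candidate].
VISIO = {"vizio tv": [-1.0, 5.0], "visio tv": [0.0, 1.0]}  # query "visio tv"
GOPRO = {"gopro": [0.0, 6.0], "go pro": [-1.0, 6.5]}       # query "gopro"

TRAIN = [
    (VISIO["vizio tv"], [VISIO["visio tv"]]),
    (GOPRO["gopro"], [GOPRO["go pro"]]),
]

weights = train_pairwise_perceptron(TRAIN, n_features=2)
print(best_candidate(weights, VISIO), best_candidate(weights, GOPRO))
```

The learned weights trade off closeness to the typed query against popularity, which is the kind of signal merging that is awkward to express as a product of probabilities.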
  • 24. Modeling: Discriminative models (cont.)
      • Discriminative models such as SVMs can also be used to rerank the spelling candidates.
      • There have been recent successes with deep neural networks.
  • 25. Modeling: Systems for spelling correction
  • 26. Modeling: Candidate generation for spelling correction
      • Given a word, find all neighboring words within edit distance k.
      • Given a word, find potential close matches using a hashing trick.
      • Generate candidates using heuristic rules for common errors.
      • N-gram based techniques.
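The slides do not say which hashing trick is meant. One common reading, assumed here purely for illustration, is a deletion-neighborhood index: every vocabulary term is stored under itself and under each string obtained by deleting one character, and a query word is looked up under the same set of keys.

```python
from collections import defaultdict

def deletes(word):
    """The word itself plus every string obtained by deleting exactly one character."""
    return {word} | {word[:i] + word[i + 1:] for i in range(len(word))}

def build_index(vocabulary):
    """Map each deletion key to the vocabulary terms that produce it."""
    index = defaultdict(set)
    for term in vocabulary:
        for key in deletes(term):
            index[key].add(term)
    return index

def candidates(query_word, index):
    """Terms sharing at least one deletion key with the query word."""
    found = set()
    for key in deletes(query_word):
        found |= index.get(key, set())
    return found

INDEX = build_index(["vizio", "visio", "video", "cover"])
print(candidates("vizo", INDEX))
print(sorted(candidates("visio", INDEX)))
```

This lookup returns every vocabulary term within edit distance 1 of the query word and can also return some terms at distance 2, so the candidates still need to be verified and ranked by the error and source models.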
  • 27. Modeling: Scaling up candidate generation
      • Distributed implementation.
      • Hashing tricks.
  • 28. Modeling: Spell correction for e-commerce
      • UI for spell correction.
      • Input data: whether or not to include item titles.
      • Impact of autocorrection on conversion.
  • 29. References I
      • [BB01] Michele Banko and Eric Brill, Scaling to very very large corpora for natural language disambiguation, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2001, pp. 26–33.
      • [CB04] Silviu Cucerzan and Eric Brill, Spelling correction as an iterative process that exploits the collective knowledge of web users, EMNLP, vol. 4, 2004, pp. 293–300.
      • [DH11] Huizhong Duan and Bo-June Paul Hsu, Online spelling correction for query completion, Proceedings of the 20th International Conference on World Wide Web, ACM, 2011, pp. 117–126.
      • [JM] Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
  • 30. References II
      • [KCG90] Mark D. Kernighan, Kenneth W. Church, and William A. Gale, A spelling correction program based on a noisy channel model, Proceedings of the 13th Conference on Computational Linguistics, Volume 2, Association for Computational Linguistics, 1990, pp. 205–210.
      • [Kuk92] Karen Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys (CSUR) 24 (1992), no. 4, 377–439.
      • [LDZ12] Yanen Li, Huizhong Duan, and ChengXiang Zhai, A generalized hidden Markov model with discriminative training for query spelling correction, Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2012, pp. 611–620.
  • 31. References III
      • [Pet80] James L. Peterson, Computer programs for detecting and correcting spelling errors, Communications of the ACM 23 (1980), no. 12, 676–687.