Information Retrieval and Extraction
Project On
Mining Product Synonyms
Presented By :
Vrishank Shete(201305642)
Mohd. Salman Khan(201305513)
Ankush Jain(201101010)
Suprabh Shukla(201001082)
Guided By :
Priya Radhakrishnan
Computer Science and Engineering
International Institute of Information Technology
Introduction
Problem Statement: Given an entity query, find the canonical
terms by which the entity can be distinguished.
Forms of web queries on structured data.
Gap between how users phrase queries and how creators describe entities.
E.g., a user may query "Harry Potter 6" when searching for
"Harry Potter and the Half-Blood Prince".
Related Works
• String similarity measures:
◦ Levenshtein string similarity function.
◦ Dice coefficient.
◦ Jaccard string similarity function.
• "Exploiting Web Search to Generate Synonyms for Entities" by Surajit Chaudhuri, Venkatesh Ganti, and Dong Xin.
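As a rough illustration of the three measures listed above, the following Python sketch computes a normalized Levenshtein similarity, a Dice coefficient, and a Jaccard similarity. Two choices here are assumptions, since the slides do not specify them: Dice and Jaccard are computed over lowercase whitespace-separated tokens, and the Levenshtein distance is normalized by the length of the longer string.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def tokens(text: str) -> set:
    """Assumed tokenization: lowercase, split on whitespace."""
    return set(text.lower().split())


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token intersection divided by size of the token union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def dice_coefficient(a: str, b: str) -> float:
    """Twice the token intersection divided by the sum of both set sizes."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))
```

Character n-grams would be an equally common basis for Dice and Jaccard; token sets are used here only because product names are naturally compared word by word.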
System Components
• Extracting IDTokenSets using documents from web search.
• Expanding IDTokenSets using the p-window context.
• Searching for possible canonical names in a pre-crawled list.
• Validating canonical names against web documents.
Algorithm
1: Let Le = Pe; // all subsets of e
2: while (Le is not empty)
3:   Te = getnext(Le);
4:   Submit Te to W, and retrieve W(Te);
5:   if (corr(Te, e, W(Te)) ≥ θ)   // Te is an IDTokenSet
6:     Report Te and all its supersets as IDTokenSets;
7:     Remove Te and all its supersets from Le;
8:   else   // Te is not an IDTokenSet
9:     Remove Te and its subsets from Le;
10: return.
Here the correlation function (corr) estimates how important
Te is to the current document.
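The pruning loop in steps 1-10 can be sketched in Python as follows. This is only an illustration under stated assumptions: `search` is a hypothetical callable standing in for the web search W, `toy_corr` is a placeholder correlation (the fraction of retrieved documents that contain every token of the entity) rather than the measure actually used, and `getnext` is assumed to pick the smallest remaining token set first.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set


def all_subsets(tokens: List[str]) -> Set[FrozenSet[str]]:
    """All non-empty subsets of the entity's tokens (Le in the pseudocode)."""
    return {
        frozenset(combo)
        for size in range(1, len(tokens) + 1)
        for combo in combinations(tokens, size)
    }


def toy_corr(entity_tokens: FrozenSet[str], documents: List[str]) -> float:
    """Placeholder correlation (an assumption, not the slides' definition):
    fraction of retrieved documents that mention every entity token."""
    if not documents:
        return 0.0
    hits = sum(1 for doc in documents
               if entity_tokens <= set(doc.lower().split()))
    return hits / len(documents)


def find_id_token_sets(entity: str,
                       search: Callable[[str], List[str]],
                       threshold: float) -> Set[FrozenSet[str]]:
    """Return the token subsets of `entity` accepted as IDTokenSets."""
    entity_tokens = frozenset(entity.lower().split())
    candidates = all_subsets(sorted(entity_tokens))
    id_token_sets: Set[FrozenSet[str]] = set()
    while candidates:
        # Assumed getnext: smallest set first, ties broken alphabetically.
        te = min(candidates, key=lambda s: (len(s), sorted(s)))
        documents = search(" ".join(sorted(te)))
        if toy_corr(entity_tokens, documents) >= threshold:
            # Te qualifies, so every remaining superset qualifies too.
            supersets = {s for s in candidates if te <= s}
            id_token_sets |= supersets
            candidates -= supersets
        else:
            # Te fails, so none of its subsets can qualify either.
            candidates -= {s for s in candidates if s <= te}
    return id_token_sets
```

Because a token set is both a superset and a subset of itself, every iteration removes at least one candidate, so the loop always terminates. The pruning saves queries: each accepted set removes all of its supersets without submitting them to the search engine.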
Algorithm
11. After getting the substrings, we check each one for evidence using
levenValue (<= 0.95), jaccard (> 0.10) && dice (> 0.20); these threshold
values were taken from our data set.
12. After the filtering above, we filter again using the correlation method
mentioned earlier. (In step 12 we get all mentions and all strings that
match the mentions. These strings may or may not be canonical names.)
13. For every mention obtained in step 12, we store all strings that appear
in a p-window context of that mention in the search engine results (which
we already stored in steps 1-10).
14. We count the number of times each word occurs across all strings from
step 13.
15. We take the top K words from the count hash and search for them in all
the strings from step 12 (which may or may not be part of canonical names).
16. We match the words from step 15 against the strings from step 12. The
best-matched string is our canonical string and our synonym (the desired
result).
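Steps 13-16 can be pictured with the short Python sketch below. It is a simplified illustration, not the project's code: the function names are hypothetical, the p-window is assumed to mean up to p whitespace-separated tokens on each side of a mention, and "best matched" is assumed to mean the candidate sharing the most words with the top-K list.

```python
from collections import Counter
from typing import List


def p_window_contexts(document: str, mention: str, p: int) -> List[List[str]]:
    """Collect up to p tokens on each side of every occurrence of `mention`."""
    doc_tokens = document.lower().split()
    mention_tokens = mention.lower().split()
    n = len(mention_tokens)
    contexts: List[List[str]] = []
    if n == 0:
        return contexts
    for i in range(len(doc_tokens) - n + 1):
        if doc_tokens[i:i + n] == mention_tokens:
            left = doc_tokens[max(0, i - p):i]
            right = doc_tokens[i + n:i + n + p]
            contexts.append(left + right)
    return contexts


def top_k_words(contexts: List[List[str]], k: int) -> List[str]:
    """Count every word across all contexts and keep the k most frequent."""
    counts = Counter(word for context in contexts for word in context)
    return [word for word, _ in counts.most_common(k)]


def best_canonical(candidates: List[str], words: List[str]) -> str:
    """Pick the candidate string sharing the most words with the top-K list."""
    word_set = set(words)
    return max(candidates,
               key=lambda c: len(word_set & set(c.lower().split())))
```

In use, the contexts from all stored search results would be concatenated before calling `top_k_words`, and `best_canonical` would be run over the strings that survived step 12. A real implementation would probably also drop stop words before counting, which this sketch does not do.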
Block Diagram

An Example
Challenges
• Web documents are highly unstructured. The query string can appear
anywhere and in any form in a document. This case is handled using the
p-window context in which the string is expected to appear.
• Web search engines do not allow frequent automated queries at short
intervals. A delay of 2 seconds is introduced between consecutive
queries, which makes searching somewhat slower but serves our purpose.
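The 2-second delay between queries could be implemented along the lines of the sketch below. The `search` callable is a hypothetical stand-in for whatever search client is used, and the `sleep` parameter exists only so the delay can be replaced in tests; neither is taken from the project.

```python
import time
from typing import Callable, Iterable, List


def throttled_search(queries: Iterable[str],
                     search: Callable[[str], List[str]],
                     delay_seconds: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> List[List[str]]:
    """Run each query in turn, pausing between consecutive queries."""
    results: List[List[str]] = []
    for index, query in enumerate(queries):
        if index > 0:
            # Wait before every query except the first one.
            sleep(delay_seconds)
        results.append(search(query))
    return results
```

With n queries this adds (n - 1) × 2 seconds of waiting, which is why the time for web search appears among the drawbacks.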
Cons
Time for web search.
Less usable data from web search.
References
• "Exploiting Web Search to Generate Synonyms for Entities" by Surajit
Chaudhuri, Venkatesh Ganti, and Dong Xin.
• "Entity Synonyms for Structured Web Search" by Tao Cheng, Hady W. Lauw,
and Stelios Paparizos.
Thank you
Q & A

 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 

Mining Product Synonyms - Slides

  • 1. Mining Product Synonym. Information Retrieval and Extraction project. Presented by: Vrishank Shete (201305642), Mohd. Salman Khan (201305513), Ankush Jain (201101010), Suprabh Shukla (201001082). Guided by: Priya Radhakrishnan. Computer Science and Engineering, International Institute of Information Technology.
  • 2. Introduction Problem Statement : Given an entity query, find canonical terms by which the entity can be distinguished. Forms of web queries on structured data. Gap between user queries and creators describing entities. E.g. User may query Harry Potter 6 if he wish to search for Harry Potter and The Half Blood Prince
  • 3. Related Works. String similarity measures: ◦ Levenshtein string similarity function ◦ Dice coefficient ◦ Jaccard string similarity function. "Exploiting Web Search to Generate Synonyms for Entities" by Surajit Chaudhuri, Venkatesh Ganti, Dong Xin.
  • 4. System Components. ◦ Extracting IDTokenSets using documents from web search. ◦ Expanding IDTokenSets using the p-Window context. ◦ Searching for possible canonical names in a pre-crawled list. ◦ Validating canonical names against web documents.
  • 5. Algorithm
      1: Let Le = Pe; // all subsets of e
      2: while (Le is not empty)
      3:   Te = getnext(Le);
      4:   Submit Te to W, and retrieve W(Te);
      5:   if (corr(Te, e, W(Te)) ≥ θ)   // Te is an IDTokenSet
      6:     Report Te and all its supersets as IDTokenSets;
      7:     Remove Te and all its supersets from Le;
      8:   else   // Te is not an IDTokenSet
      9:     Remove Te and its subsets from Le;
      10: return.
      Here the correlation function (corr) estimates how important Te is to the current document.
  • 6. Algorithm
      11. After getting the substrings, we filter them using levenValue (<= 0.95), jaccard (> 0.10) && dice (> 0.20), with these threshold values taken from our data set.
      12. After filtering in step 3, we filter again with the correlation method mentioned above. (In step 12 we get all mentions and all strings that match the mentions. These strings may or may not be canonical names.)
      13. For every mention obtained in step 12, we store all strings within its p-window context in the search engine results (already stored in steps 1-10).
      14. We count the number of times each word occurs across all strings from step 13.
      15. We take the top K words from the count hash and search for them in all the strings from step 12 (which may or may not be part of canonical names).
      16. We match the words from step 15 against the strings from step 12. The best-matching string is our canonical string and our synonym (the desired result).
  • 9. Challenges. ◦ Web documents are highly unstructured. The query string can appear anywhere and in any form in a document. This case is handled using the p-Window context in which the string is expected to be present. ◦ Web search engines do not allow frequent automated queries at short intervals from a program. A delay of 2 seconds is introduced between two queries, which makes searching somewhat slower but serves our purpose.
  • 10. Cons. ◦ Web search takes time. ◦ Web search yields little usable data.
  • 11. References. ◦ "Exploiting Web Search to Generate Synonyms for Entities" by Surajit Chaudhuri, Venkatesh Ganti, Dong Xin. ◦ "Entity Synonyms for Structured Web Search" by Tao Cheng, Hady W. Lauw, and Stelios Paparizos.
  • 13. Q & A
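
The three string similarity measures from slide 3, and the threshold filter of step 11, might look roughly like the sketch below. The slides do not say how the measures were computed, so several things here are assumptions: `levenValue` is taken to be the edit distance normalized by the longer string's length, Jaccard and Dice are computed over lowercase whitespace tokens, and the three conditions are combined with a logical AND. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed meaning of levenValue)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def tokens(s: str) -> set:
    """Lowercase whitespace tokens (an assumption; the slides do not specify)."""
    return set(s.lower().split())


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def dice(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    total = len(ta) + len(tb)
    return 2 * len(ta & tb) / total if total else 0.0


def passes_filter(mention: str, candidate: str,
                  max_leven: float = 0.95,
                  min_jaccard: float = 0.10,
                  min_dice: float = 0.20) -> bool:
    """Step 11 thresholds from the slides, combined with AND (assumption)."""
    return (normalized_levenshtein(mention.lower(), candidate.lower()) <= max_leven
            and jaccard(mention, candidate) > min_jaccard
            and dice(mention, candidate) > min_dice)
```

With these definitions, `passes_filter("Harry Potter 6", "Harry Potter and the Half Blood Prince")` keeps the pair, because the two strings share the tokens "harry" and "potter".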
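
Steps 1-10 on slide 5 can be sketched as a search over token subsets with pruning in both directions. This is only an illustration under stated assumptions: `search` stands in for the web search W and `corr` for the correlation function, both supplied by the caller and both hypothetical here; the empty subset is skipped since there is nothing to query; and `getnext` is taken to be "smallest remaining subset first", which the slides do not specify.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set

TokenSet = FrozenSet[str]


def all_subsets(tokens: Iterable[str]) -> Set[TokenSet]:
    """All non-empty subsets of the entity's tokens (Pe in the slides)."""
    toks = sorted(set(tokens))
    return {frozenset(c)
            for r in range(1, len(toks) + 1)
            for c in combinations(toks, r)}


def find_id_token_sets(entity: str,
                       search: Callable[[str], List[str]],
                       corr: Callable[[TokenSet, str, List[str]], float],
                       theta: float) -> Set[TokenSet]:
    """Return every subset Te reported as an IDTokenSet."""
    remaining = all_subsets(entity.lower().split())   # Le
    found: Set[TokenSet] = set()
    while remaining:
        # getnext: smallest subset first, ties broken alphabetically (assumption)
        te = min(remaining, key=lambda s: (len(s), sorted(s)))
        docs = search(" ".join(sorted(te)))           # W(Te)
        if corr(te, entity, docs) >= theta:
            # Te is an IDTokenSet: report it and all its supersets, then drop them
            supersets = {s for s in remaining if te <= s}
            found |= supersets
            remaining -= supersets
        else:
            # Te is not an IDTokenSet: drop it and all its subsets
            remaining -= {s for s in remaining if s <= te}
    return found
```

Each iteration removes at least `te` from `remaining`, so the loop terminates after at most one query per subset. The pruning is what saves queries: once a subset passes, none of its supersets is submitted to the search engine.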
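
Steps 13-16 on slide 6 (p-window contexts, word counts, top K words, best match) can be illustrated with the following sketch. The details are assumptions rather than the authors' implementation: the window is p words on each side of a mention and excludes the mention itself, words are lowercase alphanumeric runs, no stop words are removed, and a candidate's score is the number of top K words it contains. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def words_of(text: str) -> List[str]:
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())


def p_window_contexts(document: str, mention: str, p: int) -> List[List[str]]:
    """For each occurrence of the mention, the p words before and after it."""
    words = words_of(document)
    m = words_of(mention)
    if not m:
        return []
    contexts = []
    for i in range(len(words) - len(m) + 1):
        if words[i:i + len(m)] == m:
            before = words[max(0, i - p):i]
            after = words[i + len(m):i + len(m) + p]
            contexts.append(before + after)
    return contexts


def top_k_words(contexts: List[List[str]], k: int) -> List[str]:
    """Step 14-15: count every context word and keep the K most frequent."""
    counts = Counter(w for ctx in contexts for w in ctx)
    return [w for w, _ in counts.most_common(k)]


def best_canonical(candidates: List[str], top_words: List[str]) -> str:
    """Step 16: the candidate sharing the most words with the top K list."""
    top = set(top_words)
    return max(candidates, key=lambda c: len(top & set(words_of(c))))
```

In practice the contexts from all retrieved documents would be pooled before counting, and frequent function words such as "the" would likely need filtering, which this sketch leaves out.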
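
The 2-second delay between queries mentioned on slide 9 could be enforced with a small wrapper around whatever search function is used. This is a minimal sketch: `throttled` and its parameters are hypothetical names, and the `sleep` and `clock` arguments exist only so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, List, Optional


def throttled(search: Callable[[str], List[str]],
              delay_seconds: float = 2.0,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.monotonic
              ) -> Callable[[str], List[str]]:
    """Wrap `search` so consecutive calls start at least delay_seconds apart."""
    last_call: List[Optional[float]] = [None]

    def wrapper(query: str) -> List[str]:
        if last_call[0] is not None:
            wait = delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)   # pause only for the time still owed
        last_call[0] = clock()
        return search(query)

    return wrapper
```

The first query goes out immediately; later ones wait only for the part of the delay that has not already elapsed, so time spent processing results counts toward the interval.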