SlideShare ist ein Scribd-Unternehmen logo
1 von 18
CS-690 Fall-2009 Querying Capability Modelling and Construction of Deep Web Sources Paper by: Liangcai Shu, Weiyi Meng, Hai Fe, and Clement Yu Presentation by: Fabian Alenius
Problem definition ,[object Object]
Information in Deep Web is 400-550 times larger than the surface web
Crawlers need to extract
information
Problem definition ,[object Object]
A form submission consists of a set of values, one for every field of the form
Querying Capability ,[object Object]
An attribute is a field in a form
A query is a set of attributes
A set query instances are a set of attribute and value pairs, e.g. {<Author, Tolkien>, <Title, Bilbo>}
Four types of attributes Functional Attribute Range Attribute Categorical Attribute Value-Infinite Attribute
Conditional Dependency ,[object Object]
Conditional dependency – value of  C  insignificant if  A  is null
A -> C
Attribute Unions ,[object Object]
T is not attribute type, but value type (e.g. phone, zip code, etc.)
Valid queries  – accepted;  Invalid queries  – rejected

Weitere ähnliche Inhalte

Ähnlich wie Querying Capability Modeling

Exploiting web search engines to search structured
Exploiting web search engines to search structuredExploiting web search engines to search structured
Exploiting web search engines to search structuredNita Pawar
 
Building Semantic Web Portals with WebML
Building Semantic Web Portals with WebMLBuilding Semantic Web Portals with WebML
Building Semantic Web Portals with WebMLMarco Brambilla
 
Ado.Net Data Services (Astoria)
Ado.Net Data Services (Astoria)Ado.Net Data Services (Astoria)
Ado.Net Data Services (Astoria)Igor Moochnick
 
Topic Maps: Theory & Practice
Topic Maps: Theory & PracticeTopic Maps: Theory & Practice
Topic Maps: Theory & Practicebbater
 
Breaking down data silos with the open data protocol
Breaking down data silos with the open data protocolBreaking down data silos with the open data protocol
Breaking down data silos with the open data protocolWoodruff Solutions LLC
 
Letting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentLetting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentJay Luker
 
Ellerslie User Group - ReST Presentation
Ellerslie User Group - ReST PresentationEllerslie User Group - ReST Presentation
Ellerslie User Group - ReST PresentationAlex Henderson
 
Introducing AWS AppSync: serverless data driven apps with real-time and offli...
Introducing AWS AppSync: serverless data driven apps with real-time and offli...Introducing AWS AppSync: serverless data driven apps with real-time and offli...
Introducing AWS AppSync: serverless data driven apps with real-time and offli...Amazon Web Services
 
Lerman Vvs14 Ef Tips And Tricks
Lerman Vvs14  Ef Tips And TricksLerman Vvs14  Ef Tips And Tricks
Lerman Vvs14 Ef Tips And TricksJulie Lerman
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and RetrievalOptum
 
Understanding the Entity API Module
Understanding the Entity API ModuleUnderstanding the Entity API Module
Understanding the Entity API ModuleSergiu Savva
 
devLink - What's New in C# 4?
devLink - What's New in C# 4?devLink - What's New in C# 4?
devLink - What's New in C# 4?Kevin Pilch
 
Linq Sanjay Vyas
Linq   Sanjay VyasLinq   Sanjay Vyas
Linq Sanjay Vyasrsnarayanan
 
Hibernate Session 4
Hibernate Session 4Hibernate Session 4
Hibernate Session 4b_kathir
 

Ähnlich wie Querying Capability Modeling (20)

Table Recognition
Table RecognitionTable Recognition
Table Recognition
 
Web clustring engine
Web clustring engineWeb clustring engine
Web clustring engine
 
Fasterflect
FasterflectFasterflect
Fasterflect
 
Exploiting web search engines to search structured
Exploiting web search engines to search structuredExploiting web search engines to search structured
Exploiting web search engines to search structured
 
Linq
LinqLinq
Linq
 
Linq
LinqLinq
Linq
 
Building Semantic Web Portals with WebML
Building Semantic Web Portals with WebMLBuilding Semantic Web Portals with WebML
Building Semantic Web Portals with WebML
 
Ado.Net Data Services (Astoria)
Ado.Net Data Services (Astoria)Ado.Net Data Services (Astoria)
Ado.Net Data Services (Astoria)
 
Topic Maps: Theory & Practice
Topic Maps: Theory & PracticeTopic Maps: Theory & Practice
Topic Maps: Theory & Practice
 
Breaking down data silos with the open data protocol
Breaking down data silos with the open data protocolBreaking down data silos with the open data protocol
Breaking down data silos with the open data protocol
 
Letting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentLetting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search Component
 
Ellerslie User Group - ReST Presentation
Ellerslie User Group - ReST PresentationEllerslie User Group - ReST Presentation
Ellerslie User Group - ReST Presentation
 
Introducing AWS AppSync: serverless data driven apps with real-time and offli...
Introducing AWS AppSync: serverless data driven apps with real-time and offli...Introducing AWS AppSync: serverless data driven apps with real-time and offli...
Introducing AWS AppSync: serverless data driven apps with real-time and offli...
 
Lerman Vvs14 Ef Tips And Tricks
Lerman Vvs14  Ef Tips And TricksLerman Vvs14  Ef Tips And Tricks
Lerman Vvs14 Ef Tips And Tricks
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
 
Understanding the Entity API Module
Understanding the Entity API ModuleUnderstanding the Entity API Module
Understanding the Entity API Module
 
Simple Data Binding
Simple Data BindingSimple Data Binding
Simple Data Binding
 
devLink - What's New in C# 4?
devLink - What's New in C# 4?devLink - What's New in C# 4?
devLink - What's New in C# 4?
 
Linq Sanjay Vyas
Linq   Sanjay VyasLinq   Sanjay Vyas
Linq Sanjay Vyas
 
Hibernate Session 4
Hibernate Session 4Hibernate Session 4
Hibernate Session 4
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Querying Capability Modeling

Hinweis der Redaktion

  1. blabla
  2. Functional Attribute – only affects the display of results Range Attribute – specifies a range, for example price or year Categorical Attribute – fixed set of distinct value, usually selections lists Value-Infinite Attribute – Infinite number of possible values, typically textboxes Value-Infinite can be categorized into more types, like phone-number, zip code, etc. Not addressed in this paper
  3. Enough to fill in one field
  4. A query is deemed accepted even if it returns no results Goal is find the types of queries that are accepted
  5. First extract HTML form and output XML Feed into Query Capability Analyzer Submit Queries and get result Output is a set of atomic queries
  6. Recall is lower than precision Method is conservative, if no strong evidence the system treats as rejected