Structured Search Dan McCreary President Dan McCreary & Associates [email_address] (952) 931-9198 Version 4
Presentation Description
After This Presentation Users Will Be Able To:
Background for Dan McCreary
How Many People…
Structured Search But first a story…
Information Retrieval Textbook http://nlp.stanford.edu/IR-book/information-retrieval-book.html
117 Citations in Computer Science With 117 citations, the "Intro to IR" book is the second most cited Computer Science reference published in 2008.
Table 10.1 RDB (relational database) search, unstructured information retrieval, and structured information retrieval.

                    | RDB search       | unstructured retrieval | structured retrieval
objects             | records          | unstructured documents | trees with text at leaves
model               | relational model | vector space & others  | ?
main data structure | table            | inverted index         | ?
queries             | SQL              | free text queries      | ?
Excerpt from IR Book…
eXist Native XML Developers eXist Meeting, Prague, March 12th, 2010
Presentation Outline
Table 10.1 - Revised RDB (relational database) search, unstructured information retrieval, and structured information retrieval.

                    | RDB search       | unstructured retrieval | structured retrieval
objects             | records          | unstructured documents | trees with text at leaves
model               | relational model | vector space & others  | XML hierarchy
main data structure | table            | inverted index         | trees with node-ids for document ids
queries             | SQL              | free text queries      | XQuery fulltext
Relational DB Boolean Search SELECT * FROM PERSON WHERE TITLE = 'manager' ORDER BY SALARY Note that the "order" is not the quality of a match but another column in the table.
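The query above can be tried against an in-memory database. The following is a minimal Python sketch using the standard `sqlite3` module; the NAME column and the sample rows are assumptions made for illustration and are not part of the slide. It shows that every row either matches the WHERE clause or does not, and that the result order comes from the SALARY column rather than from any measure of match quality.

```python
import sqlite3


def managers_by_salary(rows):
    """Return (name, salary) for every manager, ordered by the SALARY column."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout: only TITLE and SALARY appear on the slide.
    conn.execute("CREATE TABLE PERSON (NAME TEXT, TITLE TEXT, SALARY INTEGER)")
    conn.executemany("INSERT INTO PERSON VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT NAME, SALARY FROM PERSON WHERE TITLE = 'manager' ORDER BY SALARY"
    )
    result = cursor.fetchall()
    conn.close()
    return result


# Illustrative data only.
people = [("Ann", "manager", 90000), ("Bob", "clerk", 40000), ("Cy", "manager", 70000)]
print(managers_by_salary(people))
```

A row is either in the result or not; there is no score that says one manager is a "better" match than another.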
Vector Model [Figure: axes Keyword 1 and Keyword 2; your search keyword (green), other documents (blue); the search score is a distance measurement (red)]
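One way to picture the vector model is to turn the query and each document into a vector of word counts and score each document by how close its vector is to the query vector. The sketch below is a minimal Python illustration that uses cosine similarity as the closeness measure. That choice is an assumption, since the slide only says the score is a distance measurement, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return the ids of documents with a non-zero score, best match first."""
    query_vector = term_vector(query)
    scored = [
        (cosine_similarity(query_vector, term_vector(text)), doc_id)
        for doc_id, text in docs.items()
    ]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


# Illustrative documents only.
docs = {"d1": "love love hate", "d2": "love fear", "d3": "new fear"}
print(rank("love", docs))
```

Unlike the Boolean SQL query, the result order here reflects how well each document matches the query.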
Reverse Index For each word, a reverse index tells you which documents contain that word.

Word | Document IDs
hate | 12344, 34235, 43513
love | 12344, 34235, 43513, 22345, 12313, 42345, 12313, 13124
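A reverse index of this kind can be sketched in a few lines of Python. This is an illustration only, not how eXist or Lucene build their indexes; the document texts are made up, and only the document-id style follows the slide.

```python
from collections import defaultdict


def build_reverse_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical document texts keyed by document id.
docs = {12344: "love and hate", 22345: "love", 13124: "new love"}
index = build_reverse_index(docs)
print(sorted(index["love"]))
print(sorted(index["hate"]))
```

Answering a keyword query then becomes a dictionary lookup instead of a scan over every document.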
Reverse Index in eXist 1.5 Terms that start with "love"
Sample Keyword Search [Figure panels: Keyword Search, Resulting Hits, Code (XQuery)]
Calculating Score
How is "Structured Search" Different?
Two Models [Figure: keywords such as 'love', 'hate', 'new', and 'fear' point to a doc-id]
Keywords and Node IDs [Figure: keywords point to node-ids, which belong to a document-id]
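The difference between the two models can be sketched by indexing each keyword against a (document-id, node-id) pair instead of a document-id alone. The Python sketch below is an assumption-laden illustration: the dotted position strings such as "1.2" are a simplified stand-in for node-ids, not the identifiers eXist actually assigns, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_nodes(doc_id, xml_text, index):
    """Add every word of a document to the index under (doc_id, node_id)."""
    root = ET.fromstring(xml_text)

    def visit(element, node_id):
        # Only text directly inside the element is indexed here;
        # text that follows a child element (tail text) is ignored in this sketch.
        for word in (element.text or "").lower().split():
            index[word].add((doc_id, node_id))
        for position, child in enumerate(element, start=1):
            visit(child, f"{node_id}.{position}")

    visit(root, "1")


index = defaultdict(set)
book = "<book><title>love story</title><chapter>hate and love</chapter></book>"
index_nodes("b1", book, index)
print(sorted(index["love"]))
```

With node-ids in the index, a hit can point at the title or the chapter that contains the word, not just at the whole book.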
Subdocuments
Books Have Structure [Figure labels: Book Title, Book Metadata]
Presentations Have Structure Find all slides with the word "XML" in their title
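A query such as "find all slides with the word XML in their title" only works if the search can tell a title apart from the rest of a slide. The Python sketch below illustrates this with the standard `xml.etree.ElementTree` module; the `presentation`, `slide`, `title`, and `body` element names are hypothetical and do not come from any real presentation format.

```python
import xml.etree.ElementTree as ET


def slides_with_title_word(xml_text, word):
    """Return the titles of slides whose title contains the given word."""
    root = ET.fromstring(xml_text)
    matches = []
    for slide in root.iter("slide"):
        title = slide.findtext("title", default="")
        if word.lower() in title.lower().split():
            matches.append(title)
    return matches


# Hypothetical markup for a three-slide presentation.
deck = """
<presentation>
  <slide><title>Sample of XML</title><body>An example document</body></slide>
  <slide><title>Benefits</title><body>XML search finds more</body></slide>
  <slide><title>Open Document XML Formats</title><body>ODF</body></slide>
</presentation>
"""
print(slides_with_title_word(deck, "XML"))
```

The "Benefits" slide mentions XML in its body but is not returned, which is the distinction a plain keyword search cannot make.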
E-mail Has Structure
Sample of XML
Many Objects Have Structure Find all forms with a label of "Zipcode". Spreadsheets: find all spreadsheets with a first-row cell that contains the word "SSN".
But What About Microsoft Office? Office Open XML (also informally known as OOXML or OpenXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations, and word processing documents. Files with the extensions .docx, .xlsx, and .pptx are zipped folders that contain XML files. The format is standardized as ECMA-376.
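Because these files are zip archives of XML, ordinary zip and XML libraries can open them. The Python sketch below reads the text runs from the main part of a .docx file, `word/document.xml`. The helper `make_stand_in_docx` is hypothetical: it builds a tiny stand-in archive for the example and does not produce a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:t (a run of text).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text_runs(docx_file):
    """Return the text of every w:t element in word/document.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    return [run.text or "" for run in root.iter(f"{{{W_NS}}}t")]


def make_stand_in_docx(paragraphs):
    """Hypothetical helper: build a minimal zip that mimics a .docx layout."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    document = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
    buffer.seek(0)
    return buffer


print(docx_text_runs(make_stand_in_docx(["Structured search", "SSN"])))
```

The same function accepts a path to a real .docx file, since `zipfile.ZipFile` takes either a file name or a file-like object.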
Open Document XML Formats
Benefits [Figure labels: Target documents, Other documents, Actual Search Results, Missed Document]
Results from Studies Source: INEX 2003/2004, "bag-of-words" vs. "full structure"
Tibetan Buddhist Resource Center (TBRC) [Rating scale: 1 (low) to 5 (high)]
Woodruff Library, Emory University [Rating scale: 1 (low) to 5 (high)]
Challenges
Getting Data into XML
What to Return in a Hit Allow searchers to specify level with search options
Steps in Testing Structured Search
Steps in Structured Search Project
Sample Queries Find all "SPEECH" elements that contain the keyword 'love' (predicate or "WHERE" clause)
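The slide's own XQuery was lost in extraction, so the following is a Python stand-in for the same idea rather than the original code. It selects SPEECH elements and keeps only those whose text contains the keyword as a whole word, which plays the role of the predicate. The SPEAKER and LINE child elements and the sample markup are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def speeches_containing(xml_text, keyword):
    """Return the SPEECH elements whose text contains the keyword as a word."""
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    hits = []
    for speech in root.iter("SPEECH"):
        # Join with spaces so words in adjacent child elements do not fuse.
        text = " ".join(speech.itertext()).lower()
        if keyword in re.findall(r"[a-z']+", text):
            hits.append(speech)
    return hits


# Hypothetical sample markup.
play = """
<PLAY>
  <SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>Is love a tender thing?</LINE></SPEECH>
  <SPEECH><SPEAKER>BENVOLIO</SPEAKER><LINE>Let us retire</LINE></SPEECH>
  <SPEECH><SPEAKER>JULIET</SPEAKER><LINE>A glove upon that hand</LINE></SPEECH>
</PLAY>
"""
for speech in speeches_containing(play, "love"):
    print(speech.findtext("SPEAKER"), "-", speech.findtext("LINE"))
```

The hit is the SPEECH element itself, so the result can show the speaker and the matching lines instead of the whole play.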
Near Operator
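The slide's explanation of the near operator did not survive extraction. As a general illustration, a near operator usually matches when two terms occur within a given number of words of each other. The Python sketch below assumes that meaning; the function name and its `max_distance` parameter are hypothetical and do not reflect any particular engine's syntax.

```python
import re


def near(text, first, second, max_distance):
    """Return True if the two words occur within max_distance words of each other."""
    words = re.findall(r"[a-z']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


sentence = "I love you more than I hate him"
print(near(sentence, "love", "hate", 5))
print(near(sentence, "love", "hate", 4))
```

In the sample sentence "love" and "hate" are five word positions apart, so the first call matches and the second does not.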
Skillsets Needed for Pilot Project
Predictions
Steps to Run Examples
Sample Configuration File Example of boost on title
XQuery Fulltext
XQuery/Lucene Search Wikibook
Acknowledgements
References Send e-mail to dan@danmccreary.com for an extended list of "getting started" resources.
Questions? Dan McCreary President Dan McCreary & Associates [email_address] (952) 931-9198

Structured Document Search and Retrieval
