Building Search Systems for the Enterprise
Yunyao Li (on behalf of Shivakumar Vaithyanathan)
IBM Research – Almaden
ACM SIGIR 2011
Beijing, China
Outline
• Search for the Enterprise
• Programmable Search (overview)
• Backend Analytics
• Search Runtime
• Foundations and Principles
• Concluding Remarks
Experience at IBM Internal Search
• IBM deployed a commercially available search engine
– Implementing standard IR techniques
• Search quality went down over time, to the point that search results were unacceptable!
– Success (≥ 1 relevant result): 14% at top-1, 23% at top-5, 34% at top-50! [Zhu et al., WWW’07]
• So, they implemented various solutions…
Attempts to Improve Search
To the administrators managing the engine, the exposed knobs were insufficient
• Enhanced link analysis by incorporating links to/from the external WWW
– Didn’t help… Quality went down!
• Creative hacks: added fake terms to documents & queries
– Number of terms per document determined by “popularity”: how much TF increase is required for the needed rank boost?
– Maintenance nightmare: the heuristic needs to be updated upon each nontrivial change in term statistics or ranking parameters
• Hard-coded custom results for the top 1200+ queries
– Even bigger nightmare! How to deal with continuously changing terminology?
What Are the Problems?
• Continually changing terminology!
– Query “Network Station Manager”: product names change (Thin Client Manager)
• Domain-specific meaning!
– Query “Paula Summa”: bring Paula Summa from employee directories
– Query “popcorn”: conference call!
• Domain-specific repetitions!
– Query “per diem”:
• Result 1: IBM Travel: Per Diem
• Result 2: IBM Travel: Per Diem Rates
• Result 3: IBM Travel: National perdiems
• …
• Result 25: IBM Travel: Per Diem Policy
These problems are not specific to enterprise search… but:
The Enterprise Challenge!
• Continually changing terminology!
• Domain-specific meaning!
• Domain-specific repetitions!
A generic search solution that is customizable and maintainable in every domain
• Simple customization with reasonable effort!
• Ongoing search-quality management
Programmable Search
Programmable Search: Main Idea
• Goals:
– Transparency
• Know “precisely” why every result item is being brought back
• Understand how changes in content/intents affect search
– Maintainability and debuggability
• Ranking logic is guided by explicit rules
• Properly react to changes in content/intents
• Building blocks:
– Deep analytics on documents (backend analytics)
– Domain-specific analysis of queries (interpretations)
– Transparent, customizable, rule-driven ranking (runtime rules)
Implementation Architecture
[Figure: architecture diagram with a backend and a frontend. Components shown: Distributed Analytics Platform (crawling, information extraction, token generation (TG), indexing); index; index and rule update services; search runtime. Labels: backend analytics, interpretations, runtime rules.]
Backend Analytics: 3 Parts
• Local Analysis (per-page analysis)
• Global Analysis (cross-page analysis)
• Token Generation (TG)
→ index
Local Analysis
• Categorizing pages
– Label pages by custom categories
• IBM examples: HR, person, IT help, ISSI, sales information, marketing, corporate standards, legal & IP-law, …
– Geo classification
• Associate documents with the relevant countries & regions
• Simply knowing where a page is physically hosted is not enough (example: Czech Republic hosts all pages for IBM in Europe)
• Annotating pages
– Identify HomePage annotation for people, projects, communities, …
Homepage Identification
• Title Extraction: intranet page → titles → matching title patterns → dictionary match (employee directory)
– Example: “G J Chaitin Home Page” → Home Page for G J Chaitin
• URL Extraction: intranet page → URLs → matching URL patterns
– http://w3.ibm.com/hr/idp/ → Homepage for: idp
– http://w3-03.ibm.com/isc/index.html → Homepage for: isc
– http://chis.at.ibm.com/ → Homepage for: chis
• … many more …
More details in [Zhu et al., WWW’07]
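The title and URL matching above can be sketched roughly as follows. This is a minimal illustration, not the extractor described in [Zhu et al., WWW’07]: the two title patterns, the URL heuristic (last path segment, ignoring an index file, else the first host label), and the function names are all assumptions chosen to reproduce the three URL examples on the slide.

```python
import re
from urllib.parse import urlparse

# Hypothetical title patterns; a real deployment would maintain many more.
TITLE_PATTERNS = [
    re.compile(r"^(?P<name>.+?)\s+home\s*page$", re.IGNORECASE),
    re.compile(r"^home\s*page\s+(?:for|of)\s+(?P<name>.+)$", re.IGNORECASE),
]


def homepage_from_title(title):
    """Return the entity a title claims to be the home page of, or None."""
    for pattern in TITLE_PATTERNS:
        match = pattern.match(title.strip())
        if match:
            return match.group("name")
    return None


def homepage_from_url(url):
    """Guess the entity a URL is the home page of (assumed heuristic)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # Ignore a trailing index file such as index.html.
    if segments and re.fullmatch(r"index\.\w+", segments[-1]):
        segments.pop()
    if segments:
        return segments[-1]
    # No path left: fall back to the first label of the host name.
    return parsed.hostname.split(".")[0] if parsed.hostname else None


print(homepage_from_title("G J Chaitin Home Page"))
for url in ("http://w3.ibm.com/hr/idp/",
            "http://w3-03.ibm.com/isc/index.html",
            "http://chis.at.ibm.com/"):
    print(homepage_from_url(url))
```

The dictionary-match step (checking an extracted name against the employee directory) is left out here; it would be a lookup of the extracted name in whatever directory is available.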
Role of Global Analysis
Among the 38 pages with the exact same title, which is the best for “Paula Summa”?
Token Generation (TG)
Annotated values → TG → Index content
• Person: Ching-Tien T. (Howard) Ho
– personNameTG: Ho Ching-Tien; Tien Ho; Ho, Tien; Howard Ho; Ching-Tien H.; ...
– nGramTG: Howard; Ho Ching; Tien; ...
• Title: Global Technology Services
– acronymTG: gts
– nGramTG: Global Technology Services; Global Technology; Technology Services; Global; Technology; ...
– spaceTG: GlobalTechnologyServices
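To make the token generators concrete, here is a small sketch of three of them applied to the Title example. The function names and exact behavior (lower-cased acronym, word-level n-grams listed longest first) are assumptions inferred from the example values on the slide, not the deployed implementation.

```python
def acronym_tg(value):
    """Acronym of a multi-word value, e.g. 'gts' (assumed: lower-cased)."""
    words = value.split()
    return ["".join(w[0] for w in words).lower()] if len(words) > 1 else []


def space_tg(value):
    """The value with its spaces removed, for queries typed as one word."""
    words = value.split()
    return ["".join(words)] if len(words) > 1 else []


def ngram_tg(value):
    """All contiguous word n-grams of the value, longest first."""
    words = value.split()
    n = len(words)
    return [" ".join(words[i:i + k])
            for k in range(n, 0, -1)
            for i in range(n - k + 1)]


title = "Global Technology Services"
print(acronym_tg(title))
print(space_tg(title))
print(ngram_tg(title))
```

A personNameTG would follow the same shape but needs name-specific knowledge (nicknames in parentheses, surname-first order, initials), so it is not sketched here.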
3 Phases of Runtime Flow
Search query →
• Phase 1: Query Semantics
– Rewrite rules
– Query interpretation
• Phase 2: Relevance Ranking
– By relevance buckets + conventional IR
• Phase 3: Result Construction
– Grouping rules
– Re-ranking rules
Runtime Flow in More Detail
• Phase 1: Query Semantics
– query → rewrite rules → queries
– queries → interpretations → partially ordered interpretations
• Phase 2: Relevance Ranking
– partially ordered interpretations → interpretations execution → partially ordered results
– partially ordered results → result aggregation → ordered results
• Phase 3: Result Construction
– ordered results → grouping rules → ordered & grouped results
– ordered & grouped results → re-ranking rules → final results
Runtime Rules: Pattern-Action Language
Query pattern → Action
– Query pattern: pattern expression, matched against the keyword query
– Action: performed when the pattern matches
• Query pattern: EQUALS [r=ibm|information|info] [d=COUNTRY]
– Matching queries: ibm germany; info india
– Possible action: rewrite into “[country] hr” (e.g., germany hr)
• Query pattern: ENDS_WITH installation
– Matching queries: acrobat installation; db2 on aix installation
– Possible action: replace installation with ISSI (e.g., acrobat ISSI)
• Query pattern: CONTAINS directions to [d=SITE]
– Matching queries: driving directions to almaden; directions to watson from jfk
– Possible action: pages of “siteserv” category should be ranked higher
• Query pattern: STARTS_WITH [d=PERSON]
– Matching queries: john kelly biography; steve mills announcement
– Possible action: group together pages that represent blog entries
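One way to read the pattern language above is as a thin layer over regular expressions. The sketch below compiles the four example patterns into Python regexes. It assumes that `[r=...]` is a regular-expression token and `[d=NAME]` a dictionary lookup, as the examples suggest. The dictionary contents and the function name `compile_pattern` are hypothetical, and this is not the rule engine from the talk.

```python
import re

# Hypothetical dictionaries; the real ones would come from backend analytics.
DICTIONARIES = {
    "COUNTRY": ["germany", "india"],
    "SITE": ["almaden", "watson"],
    "PERSON": ["john kelly", "steve mills"],
}

# A pattern token is [r=regex], [d=DICTIONARY], or a literal keyword.
TOKEN = re.compile(r"\[(r|d)=([^\]]+)\]|(\S+)")


def compile_pattern(mode, expr):
    """Compile a query pattern such as 'directions to [d=SITE]' to a regex."""
    parts = []
    for kind, arg, literal in TOKEN.findall(expr):
        if kind == "r":
            parts.append(f"(?:{arg})")
        elif kind == "d":
            entries = "|".join(re.escape(e) for e in DICTIONARIES[arg])
            parts.append(f"(?P<{arg}>{entries})")  # capture the matched entry
        else:
            parts.append(re.escape(literal))
    body = r"\s+".join(parts)
    wrapped = {
        "EQUALS": rf"^{body}$",
        "STARTS_WITH": rf"^{body}(?:\s|$)",
        "ENDS_WITH": rf"(?:^|\s){body}$",
        "CONTAINS": rf"(?:^|\s){body}(?:\s|$)",
    }[mode]
    return re.compile(wrapped)


# Action for the first rule on the slide: rewrite into "[country] hr".
rule = compile_pattern("EQUALS", "[r=ibm|information|info] [d=COUNTRY]")
match = rule.search("info india")
if match:
    print(f"{match.group('COUNTRY')} hr")  # prints "india hr"
```

Capturing the dictionary entry in a named group keeps the action side simple: the rewrite only needs the matched country, not the whole query.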
What’s Best for Benefits?
The most important IBM page for benefits changes over time: currently it is netbenefits
Rewrite Rules
• Rule: benefits → netbenefits
• Query “benefits”: after the rewrite rules step, the queries are benefits, netbenefits
Interpretations
Scenario: An IBM employee wants to download Lotus Symphony 1.3
Runtime interpretation of the query “download symphony 1.3”:
download symphony 1.3 → category=issi software=symphony 1.3
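As a rough illustration of the interpretation step, the sketch below maps a keyword query to structured interpretations. The software dictionary, the single "download <software>" rule, and the plain-keyword fallback are assumptions made for this example. The talk describes partially ordered interpretations, which this simple list (most specific first) only approximates.

```python
import re

# Hypothetical software dictionary; entries are illustrative only.
SOFTWARE = ["symphony 1.3", "acrobat", "db2"]


def interpret(query):
    """Return structured interpretations of a query, most specific first."""
    interpretations = []
    for name in SOFTWARE:
        if re.fullmatch(rf"download\s+{re.escape(name)}", query):
            interpretations.append({"category": "issi", "software": name})
    # Assumed fallback: the query as plain keywords.
    interpretations.append({"keywords": query})
    return interpretations


print(interpret("download symphony 1.3"))
```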
Complex Rules
• Rule: java → jim
– Problem: the results include people with first name Jim. How can we avoid pages from the people category?
• Refined rule: java → jim and not in person category
Recall: Token Generation (TG)
Annotated values → TG → Index content, labeled by annotation + TG:
• Person + personNameTG: Ho Ching-Tien; Tien Ho; Ho, Tien; Howard Ho; Ching-Tien H.; ...
• Person + nGramTG: Howard; Ho Ching; Tien; ...
• Title + acronymTG: gts
• Title + spaceTG: GlobalTechnologyServices
• Title + nGramTG: Global Technology Services; Global Technology; Technology Services; Global; Technology; ...
Annotation + TG → Relevance Bucket
• Example buckets: Person + personNameTG, Person + nGramTG, Title + acronymTG, Title + spaceTG, Title + nGramTG, …
Ranking by Relevance Buckets
• Buckets are ranked
– Based on annotation type
– Based on TG quality
• A page can belong to multiple buckets
• Within each bucket, ranking is by conventional IR
• Example query: employment verification
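The bucket-based ranking described above can be sketched as a two-level sort: first by bucket, then by a conventional IR score inside the bucket. The bucket order below, the `Hit` class, and the choice to place a page that belongs to several buckets into its best one are all assumptions for illustration. The slides do not specify them.

```python
from dataclasses import dataclass

# Assumed bucket order, best first.
BUCKET_ORDER = [
    "Person+personNameTG",
    "Person+nGramTG",
    "Title+acronymTG",
    "Title+spaceTG",
    "Title+nGramTG",
]
BUCKET_RANK = {bucket: i for i, bucket in enumerate(BUCKET_ORDER)}


@dataclass
class Hit:
    url: str
    buckets: set      # buckets in which the page matched the query
    ir_score: float   # conventional IR score (higher is better)


def rank_hits(hits):
    """Order hits by best matching bucket, then by IR score."""
    def key(hit):
        best_bucket = min(BUCKET_RANK[b] for b in hit.buckets)
        return (best_bucket, -hit.ir_score)
    return sorted(hits, key=key)


hits = [
    Hit("a", {"Title+nGramTG"}, 0.9),
    Hit("b", {"Person+personNameTG", "Title+nGramTG"}, 0.2),
    Hit("c", {"Person+personNameTG"}, 0.5),
]
print([h.url for h in rank_hits(hits)])  # ['c', 'b', 'a']
```

Note that page "a" has the highest IR score but still ranks last, because the bucket decides first. This is what makes the ranking explainable to an administrator.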
Grouping Rules
• Grouping rules define how search results should be grouped together
• Search administrators can improve the diversity of search results (on the 1st page)
– Based on their familiarity with the data sources
• Rule form: query pattern → group pages of the same category
– per diem → travel, you-and-ibm
– ANY → ISSI, IT Help Central, Forum, Bluepedia, Media Library, …
Flooding with Similar Pages
Need first-page diversity
Grouping Rule to the Rescue
• Rule: per diem → travel, you-and-ibm
• Example query: per diem
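A grouping rule of this kind can be sketched as collapsing all results of a listed category into one group that sits where the first such result was ranked. The result representation as (url, category) pairs, the sample URLs, and the function name are hypothetical. The slides do not say where a group is positioned, so that placement is an assumption.

```python
def group_results(results, grouped_categories):
    """Collapse results of the listed categories into one group each.

    results: ordered list of (url, category) pairs.
    Returns an ordered list of (category_or_None, [urls]) entries.
    """
    output = []
    groups = {}
    for url, category in results:
        if category in grouped_categories:
            if category not in groups:
                # The group takes the position of its first member.
                groups[category] = []
                output.append((category, groups[category]))
            groups[category].append(url)
        else:
            output.append((None, [url]))
    return output


results = [
    ("travel/per-diem", "travel"),
    ("travel/rates", "travel"),
    ("hr/expenses", "hr"),
    ("travel/policy", "travel"),
]
for entry in group_results(results, {"travel", "you-and-ibm"}):
    print(entry)
```

With the three travel pages folded into a single entry, the first page has room for results from other sources.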
Re-ranking Rules
• Re-ranking rules adjust the ranking of search results based on categories
• Example: the search administrator specifies the important sources of “hot/current topics”
– Hot topics: smarter planet, cloud computing, centennial, …
– Rank these categories higher: Bluepedia, News, About-IBM
Re-ranking Rule for Hot Topics
• Hot topics (smarter planet, cloud computing, centennial, …) → rank these categories higher: Bluepedia, News, About-IBM
• Highlighted in the example results: Bluepedia, Technical News, homepages of “About IBM”
Re-ranking Rules for Person Queries
• Rule: [d=PERSON] → executive_corner, media_library, organization_chart, files
• Example query: Paula Summa
– Highlighted in the example results: media_library, executive_corner
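A re-ranking rule like the one above can be sketched as a stable sort that moves results from the preferred categories ahead of the rest while keeping the existing order inside each part. Treating all preferred categories as equally important is an assumption. The (url, category) representation and the sample URLs are hypothetical.

```python
def rerank(results, preferred_categories):
    """Move results of preferred categories to the front (stable)."""
    # sorted() is stable, so the prior relevance order is preserved
    # among preferred results and among the remaining results.
    return sorted(
        results,
        key=lambda r: 0 if r[1] in preferred_categories else 1,
    )


preferred = {"executive_corner", "media_library", "organization_chart", "files"}
results = [
    ("blog/1", "blog"),
    ("exec/summa", "executive_corner"),
    ("wiki/2", "wiki"),
    ("media/3", "media_library"),
]
print(rerank(results, preferred))
```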
Recap: What Administrators Need…
• Search administrators have major problems with an opaque search engine
• Programmable search provides
– Customization to the specific domain
– Ongoing search-quality management
Okay… but: the proof of the pudding is in the eating!
Pudding is Being Served!
Foundations of Programmable Search
• Developed a framework laying the foundations and principles of programmable search
• Formal search model and rule language
– Formalize “rules,” “interpretations,” “relevance buckets,” and so on
Fagin, Kimelfeld, Li, Raghavan, Vaithyanathan: Understanding queries in a search database system. PODS 2010.
Example of a Principle: Rule Semantics
• How to apply rewrite rules to the search query?
• Simple way: each rule applied once, in a predefined order
• “Thorough” way: least fixpoint (apply repeatedly)
– Problem: “bad” (combinations of) rules lead to non-termination
• Real problem: detecting non-termination is undecidable
– Good news: a robust & tractable “safety” condition guarantees termination
Fagin, Kimelfeld, Li, Raghavan, Vaithyanathan: Rewrite rules for search database systems. PODS 2011.
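The difference between the two semantics above can be seen in a small sketch. Rules here are plain (left, right) keyword replacements, and the second rule (netbenefits → 401k) is invented purely to show the effect of rule order. The fixpoint version uses a size cap to stop runaway rule sets. That cap is a stand-in for this example only, not the “safety” condition from the PODS 2011 paper.

```python
def apply_rule(query, lhs, rhs):
    """All queries obtained by replacing one whole-word occurrence of lhs."""
    words, left = query.split(), lhs.split()
    results = set()
    for i in range(len(words) - len(left) + 1):
        if words[i:i + len(left)] == left:
            results.add(" ".join(words[:i] + rhs.split() + words[i + len(left):]))
    return results


def rewrite_once(query, rules):
    """Simple way: each rule applied once, in the given order."""
    queries = {query}
    for lhs, rhs in rules:
        queries |= {r for q in list(queries) for r in apply_rule(q, lhs, rhs)}
    return queries


def rewrite_fixpoint(query, rules, max_queries=1000):
    """Thorough way: apply rules until no new query appears."""
    queries, frontier = {query}, {query}
    while frontier:
        new = {r for q in frontier for lhs, rhs in rules
               for r in apply_rule(q, lhs, rhs)} - queries
        queries |= new
        if len(queries) > max_queries:
            # Crude guard against non-terminating rule sets.
            raise RuntimeError("rewriting did not converge within the cap")
        frontier = new
    return queries


# Hypothetical rule set; only "benefits -> netbenefits" is from the slides.
rules = [("netbenefits", "401k"), ("benefits", "netbenefits")]
print(sorted(rewrite_once("benefits", rules)))      # ['benefits', 'netbenefits']
print(sorted(rewrite_fixpoint("benefits", rules)))  # ['401k', 'benefits', 'netbenefits']
```

A rule such as ("a", "a a") never reaches a fixpoint, since every application creates a longer query. With the cap, `rewrite_fixpoint("a", [("a", "a a")], max_queries=50)` raises instead of looping forever.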
Summary & Future Work
Programmable search:
• Simple & flexible customization
• Search quality management
Backend Analytics
• Local analysis (per-page analysis)
• Global analysis (cross-page analysis)
• Token Generation (TG)
Runtime flow
• Phase 1: Query Semantics (rewrite rules, query interpretation)
• Phase 2: Relevance Ranking (by relevance buckets + conventional IR)
• Phase 3: Result Construction (grouping rules, re-ranking rules)
Future Research: Tooling
• Search provenance
• Rule suggestion
• Utilization of relevance buckets
References: [Li et al., SIGIR’06], [Zhu et al., WWW’07], [Fagin et al., PODS’10, PODS’11]

 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
 
Global Search Strategy
Global Search StrategyGlobal Search Strategy
Global Search Strategy
 
Microsoft Search Strategy Today - Exploring Office 365 Search in Real Life
Microsoft Search Strategy Today - Exploring Office 365 Search in Real LifeMicrosoft Search Strategy Today - Exploring Office 365 Search in Real Life
Microsoft Search Strategy Today - Exploring Office 365 Search in Real Life
 
Seo and analytics wk 2
Seo and analytics wk 2Seo and analytics wk 2
Seo and analytics wk 2
 
MongoDB World 2018: How an Idea Becomes a MongoDB Feature
MongoDB World 2018: How an Idea Becomes a MongoDB FeatureMongoDB World 2018: How an Idea Becomes a MongoDB Feature
MongoDB World 2018: How an Idea Becomes a MongoDB Feature
 
TLC2018 Thomas Haver: Transform with Enterprise Automation
TLC2018 Thomas Haver: Transform with Enterprise AutomationTLC2018 Thomas Haver: Transform with Enterprise Automation
TLC2018 Thomas Haver: Transform with Enterprise Automation
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
Oracle apps crm online training , oracle crm certification courses
Oracle apps crm online training , oracle crm certification coursesOracle apps crm online training , oracle crm certification courses
Oracle apps crm online training , oracle crm certification courses
 
awari-ds-aula1.pdf
awari-ds-aula1.pdfawari-ds-aula1.pdf
awari-ds-aula1.pdf
 
The Business Case for Speed
The Business Case for SpeedThe Business Case for Speed
The Business Case for Speed
 
Search Engine Optimization (Seo) for Developers
Search Engine Optimization (Seo) for DevelopersSearch Engine Optimization (Seo) for Developers
Search Engine Optimization (Seo) for Developers
 

Mehr von Yunyao Li

The Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language ModelsYunyao Li
 
Building, Growing and Serving Large Knowledge Graphs with Human-in-the-Loop
Building, Growing and Serving Large Knowledge Graphs with Human-in-the-LoopBuilding, Growing and Serving Large Knowledge Graphs with Human-in-the-Loop
Building, Growing and Serving Large Knowledge Graphs with Human-in-the-LoopYunyao Li
 
Meaning Representations for Natural Languages: Design, Models and Applications
Meaning Representations for Natural Languages:  Design, Models and ApplicationsMeaning Representations for Natural Languages:  Design, Models and Applications
Meaning Representations for Natural Languages: Design, Models and ApplicationsYunyao Li
 
Taming the Wild West of NLP
Taming the Wild West of NLPTaming the Wild West of NLP
Taming the Wild West of NLPYunyao Li
 
Towards Deep Table Understanding
Towards Deep Table UnderstandingTowards Deep Table Understanding
Towards Deep Table UnderstandingYunyao Li
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language ProcessingYunyao Li
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language ProcessingYunyao Li
 
Human in the Loop AI for Building Knowledge Bases
Human in the Loop AI for Building Knowledge Bases Human in the Loop AI for Building Knowledge Bases
Human in the Loop AI for Building Knowledge Bases Yunyao Li
 
Towards Universal Language Understanding
Towards Universal Language UnderstandingTowards Universal Language Understanding
Towards Universal Language UnderstandingYunyao Li
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language ProcessingYunyao Li
 
Towards Universal Language Understanding (2020 version)
Towards Universal Language Understanding (2020 version)Towards Universal Language Understanding (2020 version)
Towards Universal Language Understanding (2020 version)Yunyao Li
 
Towards Universal Semantic Understanding of Natural Languages
Towards Universal Semantic Understanding of Natural LanguagesTowards Universal Semantic Understanding of Natural Languages
Towards Universal Semantic Understanding of Natural LanguagesYunyao Li
 
An In-depth Analysis of the Effect of Text Normalization in Social Media
An In-depth Analysis of the Effect of Text Normalization in Social MediaAn In-depth Analysis of the Effect of Text Normalization in Social Media
An In-depth Analysis of the Effect of Text Normalization in Social MediaYunyao Li
 
Exploiting Structure in Representation of Named Entities using Active Learning
Exploiting Structure in Representation of Named Entities using Active LearningExploiting Structure in Representation of Named Entities using Active Learning
Exploiting Structure in Representation of Named Entities using Active LearningYunyao Li
 
K-SRL: Instance-based Learning for Semantic Role Labeling
K-SRL: Instance-based Learning for Semantic Role LabelingK-SRL: Instance-based Learning for Semantic Role Labeling
K-SRL: Instance-based Learning for Semantic Role LabelingYunyao Li
 
Coling poster
Coling posterColing poster
Coling posterYunyao Li
 
Natural Language Data Management and Interfaces: Recent Development and Open ...
Natural Language Data Management and Interfaces: Recent Development and Open ...Natural Language Data Management and Interfaces: Recent Development and Open ...
Natural Language Data Management and Interfaces: Recent Development and Open ...Yunyao Li
 
Polyglot: Multilingual Semantic Role Labeling with Unified Labels
Polyglot: Multilingual Semantic Role Labeling with Unified LabelsPolyglot: Multilingual Semantic Role Labeling with Unified Labels
Polyglot: Multilingual Semantic Role Labeling with Unified LabelsYunyao Li
 
Transparent Machine Learning for Information Extraction: State-of-the-Art and...
Transparent Machine Learning for Information Extraction: State-of-the-Art and...Transparent Machine Learning for Information Extraction: State-of-the-Art and...
Transparent Machine Learning for Information Extraction: State-of-the-Art and...Yunyao Li
 

Mehr von Yunyao Li (20)

The Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
 
Building, Growing and Serving Large Knowledge Graphs with Human-in-the-Loop
Building, Growing and Serving Large Knowledge Graphs with Human-in-the-LoopBuilding, Growing and Serving Large Knowledge Graphs with Human-in-the-Loop
Building, Growing and Serving Large Knowledge Graphs with Human-in-the-Loop
 
Meaning Representations for Natural Languages: Design, Models and Applications
Meaning Representations for Natural Languages:  Design, Models and ApplicationsMeaning Representations for Natural Languages:  Design, Models and Applications
Meaning Representations for Natural Languages: Design, Models and Applications
 
Taming the Wild West of NLP
Taming the Wild West of NLPTaming the Wild West of NLP
Taming the Wild West of NLP
 
Towards Deep Table Understanding
Towards Deep Table UnderstandingTowards Deep Table Understanding
Towards Deep Table Understanding
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 
Human in the Loop AI for Building Knowledge Bases
Human in the Loop AI for Building Knowledge Bases Human in the Loop AI for Building Knowledge Bases
Human in the Loop AI for Building Knowledge Bases
 
Towards Universal Language Understanding
Towards Universal Language UnderstandingTowards Universal Language Understanding
Towards Universal Language Understanding
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 
Towards Universal Language Understanding (2020 version)
Towards Universal Language Understanding (2020 version)Towards Universal Language Understanding (2020 version)
Towards Universal Language Understanding (2020 version)
 
Towards Universal Semantic Understanding of Natural Languages
Towards Universal Semantic Understanding of Natural LanguagesTowards Universal Semantic Understanding of Natural Languages
Towards Universal Semantic Understanding of Natural Languages
 
An In-depth Analysis of the Effect of Text Normalization in Social Media
An In-depth Analysis of the Effect of Text Normalization in Social MediaAn In-depth Analysis of the Effect of Text Normalization in Social Media
An In-depth Analysis of the Effect of Text Normalization in Social Media
 
Exploiting Structure in Representation of Named Entities using Active Learning
Exploiting Structure in Representation of Named Entities using Active LearningExploiting Structure in Representation of Named Entities using Active Learning
Exploiting Structure in Representation of Named Entities using Active Learning
 
K-SRL: Instance-based Learning for Semantic Role Labeling
K-SRL: Instance-based Learning for Semantic Role LabelingK-SRL: Instance-based Learning for Semantic Role Labeling
K-SRL: Instance-based Learning for Semantic Role Labeling
 
Coling poster
Coling posterColing poster
Coling poster
 
Coling demo
Coling demoColing demo
Coling demo
 
Natural Language Data Management and Interfaces: Recent Development and Open ...
Natural Language Data Management and Interfaces: Recent Development and Open ...Natural Language Data Management and Interfaces: Recent Development and Open ...
Natural Language Data Management and Interfaces: Recent Development and Open ...
 
Polyglot: Multilingual Semantic Role Labeling with Unified Labels
Polyglot: Multilingual Semantic Role Labeling with Unified LabelsPolyglot: Multilingual Semantic Role Labeling with Unified Labels
Polyglot: Multilingual Semantic Role Labeling with Unified Labels
 
Transparent Machine Learning for Information Extraction: State-of-the-Art and...
Transparent Machine Learning for Information Extraction: State-of-the-Art and...Transparent Machine Learning for Information Extraction: State-of-the-Art and...
Transparent Machine Learning for Information Extraction: State-of-the-Art and...
 

Kürzlich hochgeladen

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

Building Search Systems for the Enterprise

  • 1. Building Search Systems for the Enterprise. IBM Research – Almaden. ACM SIGIR 2011, Beijing, China. Yunyao Li (on behalf of Shivakumar Vaithyanathan)
  • 2. Outline • Search for the Enterprise • Programmable Search (overview) • Backend Analytics • Search Runtime • Foundations and Principles • Concluding Remarks
  • 3. Experience at IBM Internal Search • IBM deployed a commercially available search engine – Implementing standard IR techniques • Search quality went down over time to the point that search results were unacceptable! Success (≥ 1 relevant result): 14% on top-1, 23% on top-5, 34% on top-50! [Zhu et al., WWW’07] • To the administrators managing the engine, exposed knobs were insufficient. So, they implemented various solutions…
  • 4. Attempts to Improve Search • Enhanced link analysis by incorporating external links to/from external WWW • Creative hacks: added fake terms to documents & queries – # terms per document determined by “popularity”: how much TF increase is required for the needed rank boost? • Hard-coded custom results for the top 1200+ queries • Didn’t help… Quality went down! • Maintenance nightmare: heuristic needs to be updated upon each nontrivial change in term stats/ranking parameters • Even bigger nightmare! How to deal with continuously changing terminology?
  • 5. What are the Problems? • Continually changing terminology! Product names change: search “Network Station Manager” → Thin Client Manager • Domain-specific meaning! Search “Paula Summa” → bring Paula Summa from employee directories; search “popcorn” → conference call! • Domain-specific repetitions! Search “per diem” → Result 1: IBM Travel: Per Diem; Result 2: IBM Travel: Per Diem Rates; Result 3: IBM Travel: National perdiems; …; Result 25: IBM Travel: Per Diem Policy • These problems are not specific to enterprise search… but:
  • 6. The Enterprise Challenge! • Continually changing terminology! • Domain-specific meaning! • Domain-specific repetitions! • Generic search solution that is customizable and maintainable in every domain • Simple customization with reasonable effort! • Ongoing search-quality management • Programmable Search
  • 7. Outline • Search for the Enterprise • Programmable Search (overview) • Backend Analytics • Search Runtime • Foundations and Principles • Concluding Remarks
  • 8. Programmable Search: Main Idea • Goals: – Transparency • Know “precisely” why every result item is being brought back • Understand how changes in content/intents affect search – Maintainability and “Debugability” • Ranking logic is guided by explicit rules • Properly react to changes in content/intents • Building blocks: – Deep analytics on documents – Domain-specific analysis of queries – Transparent customizable rule-driven ranking • Diagram labels: backend analytics, interpretations, runtime rules
  • 9. Implementation Architecture • Backend: Distributed Analytics Platform: crawling, information extraction, token generation (TG), indexing • Index; index and rule update services • Frontend: search runtime • Diagram labels: backend analytics, interpretations, runtime rules
  • 10. Outline • Search for the Enterprise • Programmable Search (overview) • Backend Analytics • Search Runtime • Foundations and Principles • Concluding Remarks
  • 11. Backend Analytics: 3 Parts • Local Analysis (per-page analysis) • Global Analysis (cross-page analysis) • Token Generation (TG) → index
  • 12. Local Analysis • Categorizing pages – Label pages by custom categories • IBM examples: HR, person, IT help, ISSI, sales information, marketing, corporate standards, legal & IP-law, … – Geo classification • Associate documents with the relevant countries & regions • Annotating pages – Identify HomePage annotation for people, projects, communities, … • Note: simply knowing where a page is physically hosted is not enough (example: Czech Republic hosts all pages for IBM in Europe)
  • 13. Homepage Identification • Intranet page → Title Extraction → titles (e.g., “G J Chaitin Home Page”) → matching title patterns → dictionary match (employee directory, … many more …) → Home Page for G J Chaitin • Intranet page → URL Extraction → URLs (http://w3.ibm.com/hr/idp/, http://w3-03.ibm.com/isc/index.html, http://chis.at.ibm.com/) → matching URL patterns → Homepage for: idp, isc, chis • More details in [Zhu et al., WWW’07]
  • 14. Role of Global Analysis • Among the 38 pages with the exact same title, which is the best for “Paula Summa”?
  • 15. Token Generation (TG): annotated values → index content • Person: “Ching-Tien T. (Howard) Ho” → personNameTG → Ho Ching-Tien; Tien Ho; Ho, Tien; Howard Ho; Ching-Tien H.; ... → nGramTG → Howard; Ho Ching; Tien; ... • Title: “Global Technology Services” → acronymTG → gts; nGramTG → Global Technology Services; Global Technology; Technology Services; Global; Technology; ...; spaceTG → GlobalTechnologyServices
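The token generators named on slide 15 can be illustrated with a minimal sketch. The function names below mirror the slide's labels (`nGramTG`, `spaceTG`, `acronymTG`), but their bodies are assumptions inferred from the example outputs on the slide, not the authors' implementation; `personNameTG` is omitted because the slide does not show enough of its logic to reconstruct it.

```python
from typing import List


def ngram_tg(value: str) -> List[str]:
    """Assumed nGramTG: every contiguous word n-gram, longest first."""
    words = value.split()
    n = len(words)
    return [
        " ".join(words[start:start + size])
        for size in range(n, 0, -1)
        for start in range(n - size + 1)
    ]


def space_tg(value: str) -> str:
    """Assumed spaceTG: the value with all whitespace removed."""
    return "".join(value.split())


def acronym_tg(value: str) -> str:
    """Assumed acronymTG: lowercase initials of each word."""
    return "".join(word[0] for word in value.split()).lower()


def generate_tokens(value: str) -> List[str]:
    """Combine the generators, dropping duplicates but keeping order."""
    candidates = [acronym_tg(value)] + ngram_tg(value) + [space_tg(value)]
    seen = set()
    tokens = []
    for token in candidates:
        if token not in seen:
            seen.add(token)
            tokens.append(token)
    return tokens


tokens = generate_tokens("Global Technology Services")
```

For the slide's title example, `tokens` contains `gts`, the word n-grams from "Global Technology Services" down to the single words, and `GlobalTechnologyServices`, so a query typed in any of those forms can hit the same annotated value in the index.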
  • 16. Outline • Search for the Enterprise • Programmable Search (overview) • Backend Analytics • Search Runtime • Foundations and Principles • Concluding Remarks
  • 17. 3 Phases of Runtime Flow • Search Query → Phase 1: Query Semantics (rewrite rules, query interpretation) → Phase 2: Relevance Ranking (by relevance buckets + conventional IR) → Phase 3: Result Construction (grouping rules, re-ranking rules)
  • 18. Runtime Flow in More Detail • Phase 1: Query Semantics: query → rewrite rules → queries → interpretations → partially ordered interpretations • Phase 2: Relevance Ranking: interpretations execution → partially ordered results → result aggregation → ordered results • Phase 3: Result Construction: grouping rules → ordered & grouped results → re-ranking rules → final results
  • 19. Runtime Rules: Pattern-Action Language • Query pattern → Action: the query pattern is a pattern expression, matched against the keyword query; the action is performed when it matches • EQUALS [r=ibm|information|info] [d=COUNTRY] (matches: ibm germany; info india) → Rewrite into “[country] hr” (e.g., germany hr) • ENDS_WITH installation (matches: acrobat installation; db2 on aix installation) → Replace installation with ISSI (e.g., acrobat ISSI) • CONTAINS directions to [d=SITE] (matches: driving directions to almaden; directions to watson from jfk) → Pages of “siteserv” category should be ranked higher • STARTS_WITH [d=PERSON] (matches: john kelly biography; steve mills announcement) → Group together pages that represent blog entries
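The pattern-action rules on slide 19 can be approximated with a short sketch that compiles each query pattern to a regular expression. This is only one possible reading of the notation: the slide does not define the language formally, so treating `[r=...]` as a regex alternation and `[d=NAME]` as a dictionary lookup is an assumption, and the dictionary contents, `compile_rule`, and `match_rules` are hypothetical names introduced for illustration.

```python
import re
from typing import List

# Hypothetical dictionaries; real ones would come from backend analytics.
DICTIONARIES = {
    "COUNTRY": ["germany", "india"],
    "SITE": ["almaden", "watson"],
    "PERSON": ["john kelly", "steve mills"],
}

# How each match mode anchors the compiled pattern inside the query.
WRAPPERS = {
    "EQUALS": r"^{}$",
    "STARTS_WITH": r"^{}(?:\s|$)",
    "ENDS_WITH": r"(?:^|\s){}$",
    "CONTAINS": r"(?:^|\s){}(?:\s|$)",
}


def compile_rule(mode: str, pattern: str) -> "re.Pattern[str]":
    """Translate a slide-style pattern into a compiled regex."""
    parts = []
    for elem in re.findall(r"\[[rd]=[^\]]+\]|\S+", pattern):
        if elem.startswith("[r="):      # assumed: inline regex alternation
            parts.append("(?:" + elem[3:-1] + ")")
        elif elem.startswith("[d="):    # assumed: dictionary membership
            entries = DICTIONARIES[elem[3:-1]]
            parts.append("(?:" + "|".join(map(re.escape, entries)) + ")")
        else:                           # literal keyword
            parts.append(re.escape(elem))
    return re.compile(WRAPPERS[mode].format(r"\s+".join(parts)))


# (mode, pattern, action description) taken from the slide's table.
RULES = [
    ("EQUALS", "[r=ibm|information|info] [d=COUNTRY]", "rewrite into '[country] hr'"),
    ("ENDS_WITH", "installation", "replace 'installation' with 'ISSI'"),
    ("CONTAINS", "directions to [d=SITE]", "rank 'siteserv' pages higher"),
    ("STARTS_WITH", "[d=PERSON]", "group blog-entry pages"),
]
COMPILED = [(compile_rule(mode, pat), action) for mode, pat, action in RULES]


def match_rules(query: str) -> List[str]:
    """Return the actions of every rule whose pattern matches the query."""
    normalized = " ".join(query.lower().split())
    return [action for regex, action in COMPILED if regex.search(normalized)]
```

Calling `match_rules("driving directions to almaden")` returns the ranking action, while a query such as `"ibm germany office"` matches nothing because `EQUALS` requires the whole query to fit the pattern.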
  • 20. 3 Phases of Runtime Flow • Search Query → Phase 1: Query Semantics (rewrite rules, query interpretation) → Phase 2: Relevance Ranking (by relevance buckets + conventional IR) → Phase 3: Result Construction (grouping rules, re-ranking rules)
  • 21. What's Best for Benefits? The most important IBM page for benefits changes over time: currently it is netbenefits.
  • 22. Rewrite Rules. Rule: benefits → netbenefits. The search query "benefits" passes through the rewrite rules and yields the queries: benefits, netbenefits.
  • 23. Interpretations. Scenario: an IBM employee wants to download Lotus Symphony 1.3. Runtime interpretation: download symphony 1.3 → category=issi software=symphony 1.3
  • 24. Complex Rules. Rule: java → jim. Problem: people with first name Jim. How can we avoid pages from the people category?
  • 25. Complex Rules. Rule: java → jim and not in person category. Search query: java.
  • 26. 3 Phases of Runtime Flow. Search query → Phase 1: Query Semantics (rewrite rules, query interpretation) → Phase 2: Relevance Ranking (by relevance buckets + conventional IR) → Phase 3: Result Construction (grouping rules, re-ranking rules)
  • 27. Recall: Token Generation (TG). Token generators turn annotated values into index content.
      – Person "Ching-Tien T. (Howard) Ho": personNameTG → Ho Ching-Tien; Tien Ho; Ho, Tien; Howard Ho; Ching-Tien H.; ... nGramTG → Howard; Ho Ching; Tien; ...
      – Title "Global Technology Services": acronymTG → gts. nGramTG → Global Technology Services; Global Technology; Technology Services; Global; Technology; ... spaceTG → GlobalTechnologyServices
      – Resulting index parts: Person + personNameTG, Person + nGramTG, Title + acronymTG, Title + spaceTG, Title + nGramTG, …
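A rough Python sketch of three of the token generators named on this slide, plus an index that keeps one part per (annotation type, TG) pair as the editor's notes describe. The function names, the lowercasing of index keys, and the index layout are assumptions made for illustration. personNameTG is left out because the slide shows only its output, not its rules.

```python
from collections import defaultdict


def ngram_tg(value):
    """All contiguous word n-grams of the annotated value."""
    words = value.split()
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]


def acronym_tg(value):
    """First letters of each word, lowercased (e.g. 'gts')."""
    return ["".join(word[0] for word in value.split()).lower()]


def space_tg(value):
    """The value with all whitespace removed."""
    return ["".join(value.split())]


TOKEN_GENERATORS = {
    "nGramTG": ngram_tg,
    "acronymTG": acronym_tg,
    "spaceTG": space_tg,
}


def index_annotation(index, page_id, annotation_type, value, tg_names):
    """Store the tokens of each TG in a separate part of the index."""
    for tg_name in tg_names:
        for token in TOKEN_GENERATORS[tg_name](value):
            # Assumption: index keys are lowercased for case-insensitive lookup.
            index[(annotation_type, tg_name)][token.lower()].add(page_id)


# index[(annotation type, TG name)][token] -> set of page ids
index = defaultdict(lambda: defaultdict(set))
index_annotation(index, "page-1", "Title", "Global Technology Services",
                 ["nGramTG", "acronymTG", "spaceTG"])
```

Keeping the parts separate is what later allows a hit from `("Title", "acronymTG")` to be treated differently from a hit from `("Title", "nGramTG")`.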
  • 28. Annotation + TG → Relevance Bucket. Each combination of annotation type and token generator forms a relevance bucket: Person + personNameTG, Person + nGramTG, Title + acronymTG, Title + spaceTG, Title + nGramTG, … Relevance buckets: • Buckets are ranked – Based on annotation type – Based on TG quality • A page can belong to multiple buckets • Within each bucket, ranking is by conventional IR
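The ranking scheme on this slide can be approximated as a two-level sort: bucket rank first, conventional IR score second. The sketch below is a guess at the mechanics, not the deployed ranker. The bucket order is invented, and placing a page that hits several buckets at its best bucket is an assumption the slides do not confirm.

```python
# Hypothetical bucket order, best first. The slides say buckets are ranked by
# annotation type and TG quality but do not give the actual order.
BUCKET_ORDER = [
    ("Person", "personNameTG"),
    ("Title", "acronymTG"),
    ("Title", "spaceTG"),
    ("Person", "nGramTG"),
    ("Title", "nGramTG"),
]


def rank_by_buckets(hits):
    """Order pages by bucket rank, then by IR score within a bucket.

    `hits` is a list of (page_id, bucket, ir_score) tuples. A page may appear
    in several buckets; here it is placed at its best bucket (an assumption).
    """
    bucket_rank = {bucket: i for i, bucket in enumerate(BUCKET_ORDER)}
    best = {}
    for page_id, bucket, ir_score in hits:
        key = (bucket_rank[bucket], -ir_score)  # lower key sorts first
        if page_id not in best or key < best[page_id]:
            best[page_id] = key
    return sorted(best, key=lambda page_id: best[page_id])


hits = [
    ("a", ("Title", "nGramTG"), 9.0),
    ("b", ("Title", "acronymTG"), 1.0),
    ("c", ("Title", "acronymTG"), 2.0),
    ("a", ("Title", "spaceTG"), 0.5),
]
ranking = rank_by_buckets(hits)
```

Page "a" has the highest IR score overall, yet it ranks below "b" and "c" because its best bucket is ranked lower than theirs. The IR score only breaks ties inside a bucket.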
  • 29. Ranking by Relevance Buckets. Example search query: employment verification.
  • 30. 3 Phases of Runtime Flow. Search query → Phase 1: Query Semantics (rewrite rules, query interpretation) → Phase 2: Relevance Ranking (by relevance buckets + conventional IR) → Phase 3: Result Construction (grouping rules, re-ranking rules)
  • 31. Grouping Rules • Grouping rules define how search results should be grouped together • Search administrators can improve the diversity of search results (on the first page) – Based on their familiarity with the data sources. Example rules (query pattern → group pages of the same category): per diem → travel, you-and-ibm; ANY → ISSI, IT Help Central, Forum, Bluepedia, Media Library, …
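One plausible reading of "group pages of the same category" is that every page from a listed category is folded into a single entry at the position of that category's highest-ranked page. The slides do not spell out the exact behavior, so the sketch below (with hypothetical names and data) is only one way it could work.

```python
def apply_grouping(results, grouped_categories):
    """Collapse pages of each listed category into one group.

    `results` is an ordered list of (page_id, category). The return value is
    an ordered list of (category, [page_ids]). Pages from categories that are
    not listed keep their own single-page entries.
    """
    output = []
    group_position = {}  # category -> index of its group in `output`
    for page_id, category in results:
        if category in grouped_categories and category in group_position:
            # Fold this page into the group opened by a higher-ranked page.
            output[group_position[category]][1].append(page_id)
        else:
            if category in grouped_categories:
                group_position[category] = len(output)
            output.append((category, [page_id]))
    return output


# Hypothetical result list for the query "per diem".
results = [
    ("t1", "travel"), ("t2", "travel"), ("h1", "hr"),
    ("t3", "travel"), ("y1", "you-and-ibm"), ("h2", "hr"),
]
grouped = apply_grouping(results, {"travel", "you-and-ibm"})
```

Three travel pages now occupy one slot instead of three, which frees first-page positions for other categories.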
  • 32. Flooding with Similar Pages. Need first-page diversity.
  • 33. Grouping Rule to the Rescue. Rule: per diem → travel, you-and-ibm. Search query: per diem.
  • 34. Re-ranking Rules • Re-ranking rules adjust the ranking of search results based on categories • Example: a search administrator specifies the important sources of "hot/current topics". Hot topics (smarter planet, cloud computing, centennial, …) → rank these categories higher: Bluepedia, News, About-IBM
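A re-ranking rule of this kind can be sketched as a stable partition: when the query is a hot topic, pages from the boosted categories move ahead of the rest while relative order is preserved on both sides. The slides say only "rank these categories higher", so the exact boost mechanics here are an assumption, as are the names and the sample data.

```python
# Taken from the slide's example; the real lists are longer.
HOT_TOPICS = {"smarter planet", "cloud computing", "centennial"}
BOOSTED_CATEGORIES = {"Bluepedia", "News", "About-IBM"}


def rerank_hot_topics(query, results):
    """Move pages of boosted categories to the front for hot-topic queries.

    `results` is an ordered list of (page_id, category). The original order is
    preserved inside both the boosted part and the remaining part.
    """
    if query.lower() not in HOT_TOPICS:
        return list(results)
    boosted = [r for r in results if r[1] in BOOSTED_CATEGORIES]
    rest = [r for r in results if r[1] not in BOOSTED_CATEGORIES]
    return boosted + rest


# Hypothetical result list.
results = [
    ("p1", "Forum"), ("p2", "News"), ("p3", "ISSI"), ("p4", "Bluepedia"),
]
reranked = rerank_hot_topics("Cloud Computing", results)
```

A gentler variant could add a score bonus instead of moving boosted pages strictly ahead. The slides do not say which approach the system takes.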
  • 35. Re-ranking Rule for Hot Topics. Hot topics (smarter planet, cloud computing, centennial, …) → rank these categories higher: Bluepedia, News, About-IBM. Highlighted results: Bluepedia; Technical News; homepages of "About IBM".
  • 36. Re-ranking Rules for Person Queries. Rule: [d=PERSON] → executive_corner, media_library, organization_chart, files. Search query: Paula Summa. Highlighted results: media_library, executive_corner.
  • 37. 3 Phases of Runtime Flow. Search query → Phase 1: Query Semantics (rewrite rules, query interpretation) → Phase 2: Relevance Ranking (by relevance buckets + conventional IR) → Phase 3: Result Construction (grouping rules, re-ranking rules)
  • 38. What Administrators Need… Recap: • Search administrators have major problems with an opaque search engine • Programmable search provides – Customization to the specific domain – Ongoing search-quality management. Okay… but: the proof of the pudding is in the eating!
  • 39. Pudding is Being Served!
  • 40. Outline • Search for the Enterprise • Programmable Search (overview) • Backend Analytics • Search Runtime • Foundations and Principles • Concluding Remarks
  • 41. Foundations of Programmable Search • Developed a framework laying the foundations and principles of programmable search • Formal search model and rule language – Formalizes "rules," "interpretations," "relevance buckets," and so on. Fagin, Kimelfeld, Li, Raghavan, Vaithyanathan: Understanding queries in a search database system. PODS 2010.
  • 42. Example of a Principle: Rule Semantics • How to apply rewrite rules to the search query? • Simple way: each rule applied once, in a predefined order • "Thorough" way: least fixpoint (apply repeatedly) – Problem: "bad" (combinations of) rules lead to non-termination • Real problem: detecting non-termination is undecidable – Good news: a robust & tractable "safety" condition guarantees termination. Fagin, Kimelfeld, Li, Raghavan, Vaithyanathan: Rewrite rules for search database systems. PODS 2011.
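The two application strategies can be contrasted in a small Python sketch. Everything here is illustrative: rules are modelled as token substitutions that add query variants, the token "hr-portal" is made up, and the `max_rounds` cap is a crude stand-in for a termination guard. It is not the "safety" condition of the PODS 2011 paper.

```python
def substitute(old, new):
    """Build a rule that adds a variant of the query with `old` replaced by `new`."""
    def rule(query):
        tokens = query.split()
        if old in tokens:
            return {" ".join(new if t == old else t for t in tokens)}
        return set()
    return rule


def apply_once(query, rules):
    """Simple way: apply each rule exactly once, in the given order."""
    queries = {query}
    for rule in rules:
        queries |= {variant for q in queries for variant in rule(q)}
    return queries


def apply_fixpoint(query, rules, max_rounds=100):
    """Thorough way: apply all rules repeatedly until nothing new appears.

    The round cap is an assumption added so that the sketch always returns or
    fails loudly; bad rule sets would otherwise loop forever.
    """
    queries = {query}
    for _ in range(max_rounds):
        new = {variant for q in queries for rule in rules for variant in rule(q)}
        new -= queries
        if not new:
            return queries
        queries |= new
    raise RuntimeError("no fixpoint reached within max_rounds")


# "benefits -> netbenefits" is from the slides; "hr-portal" is hypothetical.
rules = [substitute("netbenefits", "hr-portal"),
         substitute("benefits", "netbenefits")]

once = apply_once("benefits", rules)
thorough = apply_fixpoint("benefits", rules)

# A "bad" rule whose output always matches its own pattern again.
growing = [substitute("benefits", "benefits benefits")]
```

With the rules in this order, the single pass misses "hr-portal" because the rule producing "netbenefits" runs after the rule that consumes it; the fixpoint finds all three queries. Calling `apply_fixpoint("benefits", growing, max_rounds=10)` raises `RuntimeError`, since each round yields a longer query and no fixpoint exists.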
  • 43. Outline • Search for the Enterprise • Programmable Search (overview) • Backend Analytics • Search Runtime • Foundations and Principles • Concluding Remarks
  • 44. Summary & Future Work. Programmable search: • Simple & flexible customization • Search quality management. Backend Analytics: local analysis (per-page analysis), global analysis (cross-page analysis), token generation (TG). Runtime: Phase 1: Query Semantics (rewrite rules, query interpretation); Phase 2: Relevance Ranking (by relevance buckets + conventional IR); Phase 3: Result Construction (grouping rules, re-ranking rules). Future Research: Tooling • Search provenance • Rule suggestion • Utilization of relevance buckets. References: [Li et al., SIGIR'06; Zhu et al., WWW'07; Fagin et al., PODS'10, PODS'11]

Editor's notes

  1. The people in charge of search are not the SIGIR audience; they are IT admins. Hence, all they can do is resort to these hacks and hardcoding.
  2. "It may be the case that a day before, Thin Client Manager meant something else; so intents change overnight as well."
  3. So we have different types of tokenization applied to the different types of annotated items; for each annotation type and TG type, the result is stored in a separate part of the index. In a few slides, I will explain how we use that during runtime.
  4. In phase 1, we manipulate the search query, add variants and so on, without touching the index. The result is a set of queries. Next, in phase 2, we run the queries against the index and apply ranking, by a combination of conventional IR and relevance buckets that I will describe shortly. In phase 3, we build the final result by invoking the grouping and re-ranking rules supplied by the admins.
  5. This slide gives a more detailed view of the runtime flow. Mouse click. And these are where the three phases are. Next, I will discuss the different actions in the boxes here.