Google Is Just a Two Page Site
Relevant Results with Sitecore.ContentSearch
Martina Helene Welander
Technical Consulting Engineer, Sitecore
Speaker
Martina Helene Welander
• Technical Consulting Engineer at Sitecore
• Community and Information Enthusiast
• Ecosystem Sites with Dnepropetrovsk Team
Hi!
• Martina Welander
• Technical Consulting Engineer
• Ecosystem sites
• mhwelander.net / @mhwelander
In the direction of awesome, that’s where
…let’s do search!
Can haz
knowledge?
“Google is simply a search box with
a second page of results. And those
results are from other sites!”
Lalala hello
world examples
lalala ten items
in my tree!
Sitecore.ContentSearch 101
Sitecore 7
Search and index ALL the items
Search API (LINQ-based)
Search Technology Provider (DLLs and Configuration)
Search Technology API and Indexes
IEnumerable<DocSearchResult>
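// Requires: using System.Linq; using Sitecore.ContentSearch.Linq;
// Get the index by name.
var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index");
// The search context is disposable, so wrap it in a using block.
using (var context = index.CreateSearchContext())
{
    // Build the LINQ query; nothing is executed yet.
    var query = context.GetQueryable<ResultItem>().Where(x => x.Title == "Hej");
    // GetResults() executes the query against the index.
    var executedResults = query.GetResults();
    // Unwrap each hit into its strongly typed document.
    myModel.myList = executedResults.Hits.Select(x => x.Document).ToList();
}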
Where Sitecore adds value
• Source content to index to strongly typed object – and back again!
• You can actually index anything
• Provider model – Solr, Lucene, Elasticsearch, Azure Search
• Provider-agnostic LINQ-based search API
• Highly configurable
Sitecore.ContentSearch is an API
Where should I focus my efforts?
CONFIIIIIG!
Crawlers
Mappers
Converters
Sitecore Field -> Index Field -> Object Property
Analyzers
Sitecore Field -> Searchable Data
Analyzer Wrappers
Back to Plain Ol’ Search
Actually kind of difficult
It’s all about the Pentiums analyzers
(Tokenizers and Filters)
Tokenizers
“Hello my name is Martina” -> “Hello”, “my”, “name”, “is”, “Martina”
Types of Tokenizer
StandardTokenizer
“My name is Martina” -> “My”, “name”, “is”, “Martina”
KeywordTokenizer
“My name is Martina” -> “My name is Martina”
N-Gram Tokenizer (Min 4, Max 5)
“sitecore” -> “site”, “itec”, “teco”, “ecor”, “core”, “sitec”, “iteco” … etc
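The three tokenizer types above can be sketched in a few lines of plain C#. This is an illustration only, not the Lucene or Solr implementation: `TokenizerSketch` and its method names are hypothetical, and the standard-like tokenizer simply splits on anything that is not a letter or digit.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TokenizerSketch
{
    // Rough stand-in for a standard tokenizer: split on any character that
    // is not a letter or digit. Real tokenizers follow richer word-break rules.
    public static List<string> StandardLike(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    // Keyword tokenizer: the whole input becomes a single token.
    public static List<string> Keyword(string text)
    {
        return new List<string> { text };
    }

    // N-gram tokenizer: every substring whose length is between min and max,
    // shortest grams first.
    public static List<string> NGrams(string text, int min, int max)
    {
        var grams = new List<string>();
        for (int n = min; n <= max; n++)
            for (int i = 0; i + n <= text.Length; i++)
                grams.Add(text.Substring(i, n));
        return grams;
    }
}
```

Under these assumptions, `TokenizerSketch.NGrams("sitecore", 4, 5)` yields nine tokens: the five 4-grams followed by the four 5-grams. That growth in token count is why n-gram fields make an index noticeably larger.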
Filters
Examples of Filters
• Standard Filter
• (Snowball) Porter Stem Filter
• Stop Filter
• Synonym Filter
• Keep Words Filter
• Pattern Replace Filter
ORDER MATTERS!
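A small sketch of why filter order matters, using two toy filters in plain C#. Everything here is hypothetical (the class name, the tiny stop list, the case-sensitive comparison); real stop filters often have an ignore-case option, so treat this as an illustration of the principle rather than provider behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterOrderSketch
{
    // Hypothetical stop list, lowercase only and compared case-sensitively.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "the", "is", "a" };

    public static IEnumerable<string> LowerCase(IEnumerable<string> tokens)
    {
        return tokens.Select(t => t.ToLowerInvariant());
    }

    public static IEnumerable<string> RemoveStopWords(IEnumerable<string> tokens)
    {
        return tokens.Where(t => !StopWords.Contains(t));
    }

    // Lowercase first, then drop stop words: "The" becomes "the" and is removed.
    public static List<string> LowerThenStop(IEnumerable<string> tokens)
    {
        return RemoveStopWords(LowerCase(tokens)).ToList();
    }

    // Drop stop words first, then lowercase: "The" slips past the
    // case-sensitive stop list and survives as "the".
    public static List<string> StopThenLower(IEnumerable<string> tokens)
    {
        return LowerCase(RemoveStopWords(tokens)).ToList();
    }
}
```

With the input tokens "The", "Search", "is", "Hard", the first chain leaves "search" and "hard", while the second also keeps "the". The same two filters produce different index content depending only on their order.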
Indexing Process
[Diagram: Index, Query, Results]
“Hello, my name is Martina” -> “Hello”, “my”, “name”, “Martina”
Contains(“Hello, my name is Martina”)
Rebuild when analyser changes!
Configuring a custom analyzer
Lucene – What does it look like?
Solr – What does it look like?
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>
Previewing and help
6492 12:54:21 INFO ExecuteQueryAgainstLucene (sitecore_master_index): content:make~0.7 title:make~0.7 content:new~0.7 title:new~0.7 content:item~0.7 title:item~0.7 - Filter :
Debugging A Lucene-Based ContentSearch In Sitecore - Dan Cruickshank
My Super-Duper Analyzer
…which isn’t very special at all
• Standard analyser
• Standard filter
• Porter Stem Filter
• StopWords Filter
• Synonym Filter (EXM / ECM, PXM / APS)*
• Lowercase filter
The Query
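What makes something relevant? (tf.idf)
• tf – term frequency: how often the term occurs in the document
• idf – inverse document frequency: rarer terms count for more
• coord – number of query terms found in the document
• fieldNorm – field length normalization: a match in a shorter field counts for more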
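For reference, the factors above fit together roughly as follows in Lucene's classic TF-IDF similarity (the practical scoring function described in the Lucene `TFIDFSimilarity` documentation). Other providers, and newer similarity implementations such as BM25, may score differently, so read this as a guide to the moving parts rather than an exact description of every setup.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)

% Defaults in the classic similarity:
\mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\!\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}, \qquad
\mathrm{coord}(q,d) = \frac{\text{query terms matched in } d}{\text{query terms in } q}
```

Here norm(t,d) carries the field length normalization (by default proportional to one over the square root of the number of terms in the field) together with any index-time boost, which is why a match in a short title tends to outscore the same match in a long body field.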
My fields
• Title
• Text
• Byline
• Keywords
• Product
context.GetQueryable<ResultItem>()
.Where(…)
.Filter() vs .Where() – .Filter() narrows the results without affecting scoring
#1 – Find me a match
• Equals()
• Contains()
• StartsWith()
.Where(x => x.ResultsTitle.Contains("scaling"))
.Where(x => x["scaling"].Contains("scaling"))
.Match()
.EndsWith()
#2 – Slop and fuzziness!
• Like()
• Fuzzy search – fuzziness factor (float)
• Phrase search – slop (int)
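Fuzzy matching in Lucene-family providers is based on edit distance between the search term and the indexed term. The sketch below is a plain Levenshtein implementation for intuition only; `FuzzySketch` is a hypothetical name, and how a provider translates the fuzziness factor into an allowed distance is provider-specific and not shown here.

```csharp
using System;

public static class FuzzySketch
{
    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }

            // The row just computed becomes the previous row.
            var swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }
}
```

For example, "scaling" and "scalling" are one edit apart, so a fuzzy search that tolerates a single edit would treat the misspelling as a match.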
#3 – I love you, PredicateBuilder
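// Seed with False when OR-ing: a True seed would make the predicate match everything.
Expression<Func<ResultItem, bool>> predicate =
    PredicateBuilder.False<ResultItem>();
foreach (var word in list)
{
    // Match documents whose title contains any of the words.
    predicate = predicate.Or(x => x.Title.Contains(word));
}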
False for ‘OR’,
True for ‘AND’
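Why the seed matters can be shown without Sitecore at all. The sketch below uses a minimal, hypothetical stand-in for the `Or` combinator (the usual expression-tree pattern, not Sitecore's own `PredicateBuilder` code) and plain strings instead of index documents.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateSeedSketch
{
    // Minimal stand-in for PredicateBuilder.Or, for illustration only:
    // combines two predicates over the same parameter with a logical OR.
    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, invoked), left.Parameters);
    }

    // Builds "title contains any of the words", starting from the given seed.
    public static Func<string, bool> AnyWord(bool seed, params string[] words)
    {
        Expression<Func<string, bool>> predicate = title => seed;
        foreach (var word in words)
        {
            var w = word; // local copy captured by the lambda
            predicate = predicate.Or(title => title.Contains(w));
        }
        return predicate.Compile();
    }
}
```

`AnyWord(true, "scaling")` matches every title, because `true OR anything` is always true. `AnyWord(false, "scaling")` matches only titles that contain "scaling". The mirror image holds for AND chains: a False seed there would match nothing, so those start from True.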
#4 – Boost
• At query time
• At index time (type or field)
• Rules-based
BOOST
BOOST
~1000 real items
storageType=“true”
Attempt #1: EVERYTHING
If the title…
• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase
If the content…
• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase
Search: xDB Scaling
Search: Managing engagement plans
Search: Create engagement plans
A couple of important lessons
• Whole Phrases vs Individual Terms
• Boost()
• Contains() / Equals()
Attempt #2: Phrase and terms
“engagement plan setup”
OR
“engagement” OR “plan” OR “setup”
“engagement” AND “plan”
OR
“engagement” AND “setup”
OR
“plan” AND “setup”
OR
“engagement” AND “plan” AND “setup”
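The AND groups listed above can be generated rather than written by hand. This is a minimal sketch in plain C#, assuming the search phrase has already been split into terms; `TermCombinationSketch` and `AndGroups` are hypothetical names, and turning each group into an actual query predicate is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermCombinationSketch
{
    // Returns every combination of two or more terms, smallest groups first.
    // Each inner list is meant to be AND-ed; the outer list is meant to be OR-ed.
    public static List<List<string>> AndGroups(IList<string> terms)
    {
        var groups = new List<List<string>>();
        int subsetCount = 1 << terms.Count;

        // Each bit of the mask says whether the term at that position is included.
        for (int mask = 1; mask < subsetCount; mask++)
        {
            var group = new List<string>();
            for (int i = 0; i < terms.Count; i++)
            {
                if ((mask & (1 << i)) != 0) group.Add(terms[i]);
            }
            if (group.Count >= 2) groups.Add(group);
        }

        // OrderBy is a stable sort, so groups of equal size keep their order.
        return groups.OrderBy(g => g.Count).ToList();
    }
}
```

For the terms "engagement", "plan", "setup" this produces the three pairs followed by the full triple, in the same order as the list above. The number of groups grows exponentially with the number of terms, so an approach like this only suits short search phrases.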
Needs more boost
Attempt #3: Favouring titles
Sitecore 7 ContentSearch Tips
- Matt Burke
“Finding a user’s search term in the title or
keywords of a document is probably more
relevant than one where the term is only in
the body”
My work in progress
If nothing is working, you probably didn’t
rebuild your index
Search: xDB Scaling
Search: Manage engagement plans
Search: Create engagement plans
// TODO: On the plane home
• Keywords
• Location
• Pinning exact title matches – “scaling”
• Expected search phrases with boost – e.g. “scaling xDB”, “xDB scaling”, “xDB scaling options”
xDB
• Key Behaviour Cache – developer or editor?
• Common searches
It’s not all queries and indexes
• Vague titles are a bit of a nightmare
• Review use of keywords in content
• “I would never search for that!”
• Continuous user testing and tuning
What I learned
• It isn’t magic
• Get to know the provider
• Content and content structure matter
• Search is actually quite hard
Thanks to our organizers & sponsor

Editor's notes

  1. Dan Cruickshank, Debugging A Lucene-Based ContentSearch In Sitecore: http://getfishtank.ca/blog/debugging-lucene-contentsearch-errors-in-sitecore