SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Downloaden Sie, um offline zu lesen
Relevancy Hacks for eCommerce
RELEVANCY HACKS FOR ECOMMERCE
VARUN THACKER

!

@VARUNTHACKER
AGENDA
•
•
•
•
•

How to solve multiple eCommerce use cases by using the features present in Solr
Query Parsing
Building on the TF-IDF scoring model and improving it for your data set
Adding relevancy signals to your score to rank documents better
Customising search results on a per query basis
HOW DO QUERIES SCORE DOCUMENTS?
•

Example document:

{
“title” : ”LG Nexus 5”,
“brand” : ”LG”,
“category” : “Smartphones”
“tags” : “phones, android, touch”
}
•

query = LG Nexus
HOW DO QUERIES SCORE DOCUMENTS?
•
•
•

•

Scores are field relative.
I want a Query which will match against all the fields for each token.
Approach 1: Use a BooleanQuery
• Query query1 = new TermQuery(new Term("title", “lg"));
• Query query2 = new TermQuery(new Term("title", "nexus"));
• Query query3 = new TermQuery(new Term("brand", "lg"));
• Query query4 = new TermQuery(new Term("brand", "nexus"));
• Add all the queries into a BooleanQuery
• Score = query1 + query2 + query3 + query4
• This would add the match for "lg" twice.
Approach 2: Use DisjunctionMaxQuery - It automatically scores each document
with the maximum score for that document as produced by any subquery
DEFAULT SIMILARITY FACTORS
•
•
•
•
•

TF - number of occurrences of the term in the document.
IDF - Is a measure of how unique or rare the term is.
Normalisation's - Both at index time and at query time
Coordination factor - number of matches of the query term in each document
These statistics are per field
WHY THE DEFAULT SCORING MAY NOT WORK?
•
•

TF-IDF is calculated per field.
Lets take term frequency first:
• Product 1: iPad Air
• Product 2: iPad Air case. Works well with iPad 3 and iPad 2
• query = iPad
• Product 2 would rank before Product 1
• But obviously this is not what the user would be looking for
• Does iPad occurring multiple times make it more important?
• Idea - Let’s make TF = 1 for a token match
WHY THE DEFAULT SCORING MAY NOT WORK?
•

Tackling Inverse Document Frequency
• Product 1 - brown jacket
• Product 2 - leather jacket
• q = brown leather jacket
• IDF is Not a measure of usefulness but a measure of rarity.
• Should IDF from your corpus be the true judge on whether “leather” is more
important than “brown”
• Maybe you stock less brown jackets but it doesn’t mean that it is more
important than a leather jacket.
• Combine data of many stores in your vertical and compute the IDF score
offline
• Feed it back into your Custom Similarity implementation
WHY THE DEFAULT SCORING MAY NOT WORK?
•

The "tie" factor between two documents with the same number of term matches is
"fieldNorm". This means the document which contains lesser number of tokens.
FUNCTION QUERIES
•
•
•

FunctionQuery allows one to use the actual value of a field and functions of those
fields in a relevancy score.
It iterates over all documents serially applying the function
Can be multiplied into the score by using the boost param in the eDismax request
handler
INCLUDE POPULARITY DATA
•
•
•
•
•

Popularity could be anything - Maximum selling items, Highest viewed products,
trending etc.
Compute the "popularity" score offline for each document in the index.
Stick them into the document if your data set is small else you could use a
ExternalFileField
Use a function query:
•
&boost= multiple popularity score value * score
With the new expressions module coming in Lucene 4.6 it’s fairly simple to add
multiple signals into your ranking formula
• Expression expr = JavascriptCompiler.compile("_score + ln(popularity) +
ln(margin)");
ADDING CLICK THROUGH DATA
•
•

Use this on a per query basis or a set of similar queries.
We used function queries which take
• id’s and their associated boost

!

•

An external application would enable the function query depending on the search
query
BOOSTING NEWER PRODUCTS
•
•

Blindly sort the result
• &sort = release_date desc
Give preference to Newer Products
• recip(ms(NOW/DAY,pub_date),3.16e-11,1,1)
• Where recip(m, x, a, b) = a / (mx + b)
• Picking a=2, b =1, m = 3.16e-11
• Gives a boost =2 for todays product
• Gives a boost =1.3 for 1/2 year old product
• Gives a boost =1 for 1 year old product and so on
I'M STILL NOT SATISFIED!
•
•
•

Take your top N queries and use the QueryElevationComponent :)
Fix particular documents for certain queries
No scoring is taken into consideration for these queries

!

<elevate>
<query text="android phones">
<doc id="nexus 4" />
<doc id="iPhone" exclude="true"/>
</query>
</elevate>
THANK YOU
•

Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...Lucidworks
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solrguest432cd6
 
Measuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and PythonMeasuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and PythonSujit Pal
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsOpenSource Connections
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...Lucidworks
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionLucidworks
 
Slash n near real time indexing
Slash n   near real time indexingSlash n   near real time indexing
Slash n near real time indexingUmesh Prasad
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and SparkLucidworks
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Lucidworks
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataAmazon Web Services
 
AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...
AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...
AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...Amazon Web Services
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartLucidworks
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solrsagar chaturvedi
 

Was ist angesagt? (20)

Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry H...
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
 
Measuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and PythonMeasuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and Python
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks Fusion
 
Slash n near real time indexing
Slash n   near real time indexingSlash n   near real time indexing
Slash n near real time indexing
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big Data
 
AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...
AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...
AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Div...
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
 

Ähnlich wie Relevancy Hacks for eCommerce

Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...Lucidworks
 
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Lucidworks
 
GetX Universal Search Tool for Sage 100
GetX Universal Search Tool for Sage 100GetX Universal Search Tool for Sage 100
GetX Universal Search Tool for Sage 100Net at Work
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 CareerBuilder.com
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBMongoDB
 
Software Project Cost Estimation
Software Project Cost EstimationSoftware Project Cost Estimation
Software Project Cost EstimationDrew Tkac
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchTrey Grainger
 
Metamorphic Testing for Machine Learning Models with Search Relevancy Example
Metamorphic Testing for Machine Learning Models with Search Relevancy ExampleMetamorphic Testing for Machine Learning Models with Search Relevancy Example
Metamorphic Testing for Machine Learning Models with Search Relevancy ExampleVinayaka Mayura G G
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkSimon Hughes
 
Relational data modeling trends for transactional applications
Relational data modeling trends for transactional applicationsRelational data modeling trends for transactional applications
Relational data modeling trends for transactional applicationsIke Ellis
 
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackBig Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackPrecisely
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorizationAndreas Loupasakis
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization Warply
 
Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#J On The Beach
 
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Thanawalla
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 

Ähnlich wie Relevancy Hacks for eCommerce (20)

Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
Empowering Customers to Self Solve - A Findability Journey - Manikandan Sivan...
 
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
Practical End-to-End Learning to Rank Using Fusion - Andy Liu, Lucidworks
 
GetX Universal Search Tool for Sage 100
GetX Universal Search Tool for Sage 100GetX Universal Search Tool for Sage 100
GetX Universal Search Tool for Sage 100
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
How to Achieve Scale with MongoDB
How to Achieve Scale with MongoDBHow to Achieve Scale with MongoDB
How to Achieve Scale with MongoDB
 
Software Project Cost Estimation
Software Project Cost EstimationSoftware Project Cost Estimation
Software Project Cost Estimation
 
Test data generation
Test data generationTest data generation
Test data generation
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Metamorphic Testing for Machine Learning Models with Search Relevancy Example
Metamorphic Testing for Machine Learning Models with Search Relevancy ExampleMetamorphic Testing for Machine Learning Models with Search Relevancy Example
Metamorphic Testing for Machine Learning Models with Search Relevancy Example
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Splunk bsides
Splunk bsidesSplunk bsides
Splunk bsides
 
Relational data modeling trends for transactional applications
Relational data modeling trends for transactional applicationsRelational data modeling trends for transactional applications
Relational data modeling trends for transactional applications
 
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackBig Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorization
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization
 
Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#
 
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in SalesforceMoyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
Moyez Dreamforce 2017 presentation on Large Data Volumes in Salesforce
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
QA and scrum
QA and scrumQA and scrum
QA and scrum
 

Mehr von lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Kürzlich hochgeladen

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 

Kürzlich hochgeladen (20)

IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 

Relevancy Hacks for eCommerce

  • 2. RELEVANCY HACKS FOR ECOMMERCE VARUN THACKER ! @VARUNTHACKER
  • 3. AGENDA • • • • • How to solve multiple eCommerce use cases by using the features present in Solr Query Parsing Building on the TF-IDF scoring model and improving it for your data set Adding relevancy signals to your score to rank documents better Customising search results on a per query basis
  • 4. HOW DO QUERIES SCORE DOCUMENTS? • Example document: { “title” : ”LG Nexus 5”, “brand” : ”LG”, “category” : “Smartphones” “tags” : “phones, android, touch” } • query = LG Nexus
  • 5. HOW DO QUERIES SCORE DOCUMENTS? • • • • Scores are field relative. I want a Query which will match against all the fields for each token. Approach 1: Use a BooleanQuery • Query query1 = new TermQuery(new Term("title", “lg")); • Query query2 = new TermQuery(new Term("title", "nexus")); • Query query3 = new TermQuery(new Term("brand", "lg")); • Query query4 = new TermQuery(new Term("brand", "nexus")); • Add all the queries into a BooleanQuery • Score = query1 + query2 + query3 + query4 • This would add the match for "lg" twice. Approach 2: Use DisjunctionMaxQuery - It automatically scores each document with the maximum score for that document as produced by any subquery
  • 6. DEFAULT SIMILARITY FACTORS • • • • • TF - number of occurrences of the term in the document. IDF - Is a measure of how unique or rare the term is. Normalisation's - Both at index time and at query time Coordination factor - number of matches of the query term in each document These statistics are per field
  • 7. WHY THE DEFAULT SCORING MAY NOT WORK? • • TF-IDF is calculated per field. Lets take term frequency first: • Product 1: iPad Air • Product 2: iPad Air case. Works well with iPad 3 and iPad 2 • query = iPad • Product 2 would rank before Product 1 • But obviously this is not what the user would be looking for • Does iPad occurring multiple times make it more important? • Idea - Let’s make TF = 1 for a token match
  • 8. WHY THE DEFAULT SCORING MAY NOT WORK? • Tackling Inverse Document Frequency • Product 1 - brown jacket • Product 2 - leather jacket • q = brown leather jacket • IDF is Not a measure of usefulness but a measure of rarity. • Should IDF from your corpus be the true judge on whether “leather” is more important than “brown” • Maybe you stock less brown jackets but it doesn’t mean that it is more important than a leather jacket. • Combine data of many stores in your vertical and compute the IDF score offline • Feed it back into your Custom Similarity implementation
  • 9. WHY THE DEFAULT SCORING MAY NOT WORK? • The "tie" factor between two documents with the same number of term matches is "fieldNorm". This means the document which contains lesser number of tokens.
  • 10. FUNCTION QUERIES • • • FunctionQuery allows one to use the actual value of a field and functions of those fields in a relevancy score. It iterates over all documents serially applying the function Can be multiplied into the score by using the boost param in the eDismax request handler
  • 11. INCLUDE POPULARITY DATA • • • • • Popularity could be anything - Maximum selling items, Highest viewed products, trending etc. Compute the "popularity" score offline for each document in the index. Stick them into the document if your data set is small else you could use a ExternalFileField Use a function query: • &boost= multiple popularity score value * score With the new expressions module coming in Lucene 4.6 it’s fairly simple to add multiple signals into your ranking formula • Expression expr = JavascriptCompiler.compile("_score + ln(popularity) + ln(margin)");
  • 12. ADDING CLICK THROUGH DATA • • Use this on a per query basis or a set of similar queries. We used function queries which take • id’s and their associated boost ! • An external application would enable the function query depending on the search query
  • 13. BOOSTING NEWER PRODUCTS • • Blindly sort the result • &sort = release_date desc Give preference to Newer Products • recip(ms(NOW/DAY,pub_date),3.16e-11,1,1) • Where recip(m, x, a, b) = a / (mx + b) • Picking a=2, b =1, m = 3.16e-11 • Gives a boost =2 for todays product • Gives a boost =1.3 for 1/2 year old product • Gives a boost =1 for 1 year old product and so on
  • 14. I'M STILL NOT SATISFIED! • • • Take your top N queries and use the QueryElevationComponent :) Fix particular documents for certain queries No scoring is taken into consideration for these queries ! <elevate> <query text="android phones"> <doc id="nexus 4" /> <doc id="iPhone" exclude="true"/> </query> </elevate>