SYSTEM TEARDOWN: SOLR AS A
PRACTICAL RECOMMENDATION ENGINE

Michael Hausenblas
Twitter: @mhausenblas

Chief Data Engineer EMEA, MapR Technologies
What does Machine Learning look like?
$$A^{T}A = \begin{pmatrix} A_1^{T} \\ A_2^{T} \end{pmatrix} \begin{pmatrix} A_1 & A_2 \end{pmatrix} = \begin{pmatrix} A_1^{T}A_1 & A_1^{T}A_2 \\ A_2^{T}A_1 & A_2^{T}A_2 \end{pmatrix}$$

$$\begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} A_1^{T}A_1 & A_1^{T}A_2 \\ A_2^{T}A_1 & A_2^{T}A_2 \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}$$

$O(\kappa k d + k^{3} d) = O(k^{2} d \log n + k^{3} d)$ for small $k$, high quality

$O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger $k$, looser quality
Recommendations as Machine Learning
• Input data for the recommender model: observed interactions between users and the items they act on
• Goal: suggest additional appropriate or desirable interactions
• Example applications:
–  similar movies, music, books (topic, style, etc.)
–  map-based restaurant choices
–  suggesting sale items for e-stores or on cash-register receipts
Recommendations

Recap: the behavior of a crowd helps us understand what individuals will do
Recommendations

Alice got an apple and a puppy.
Charles got a bicycle.
Recommendations

Alice got an apple and a puppy.
Bob got an apple.
Charles got a bicycle.
Recommendations

What else would Bob like?
Recommendations

A puppy, of course!

You get the idea of how recommenders work …
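The intuition can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the engine described in this talk: it scores every item a user does not have yet by summing its raw co-occurrence counts with the items the user already has. The `history` dict and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Who got what, as in the slides
history = {
    "Alice": {"apple", "puppy"},
    "Bob": {"apple"},
    "Charles": {"bicycle"},
}


def cooccurrence_counts(history):
    """For each ordered pair of distinct items, count the users who have both."""
    counts = Counter()
    for items in history.values():
        for a, b in permutations(items, 2):
            counts[(a, b)] += 1
    return counts


def recommend(user, history):
    """Rank items the user lacks by raw co-occurrence with the items the user has."""
    owned = history[user]
    scores = Counter()
    for (a, b), n in cooccurrence_counts(history).items():
        if a in owned and b not in owned:
            scores[b] += n
    return [item for item, _ in scores.most_common()]


print(recommend("Bob", history))  # ['puppy']
```

Bob shares the apple with Alice, so Alice's puppy is the only candidate. Charles gets nothing, because his bicycle never co-occurs with anything.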
  
	
  	
  
Recommendations

What if everybody gets a pony?
What else would you recommend for Amelia?
Recommendations

If everybody gets a pony, it's not a very good indicator of what else to predict ...
Problems with Raw Co-occurrence
• Very popular items co-occur with everything
–  Examples: welcome document; elevator music
• Very widespread occurrence is not interesting as a way to generate indicators
–  Unless you want to offer an item that is constantly desired, such as razor blades
• What we want is anomalous co-occurrence
–  This is the source of interesting indicators of preference on which to base recommendations
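A hypothetical variant of the pony example makes the problem concrete. In this sketch (toy data and function names are our own assumptions), the pony co-occurs with every other item, and the raw co-occurrence scores for someone who only owns a pony, like Amelia, turn out to be exactly the global popularity ranking. They say nothing about Amelia as an individual.

```python
from collections import Counter

# Hypothetical variant of the slides' example: everybody also gets a pony
history = {
    "Alice": {"apple", "puppy", "pony"},
    "Bob": {"apple", "pony"},
    "Amelia": {"pony"},
    "Charles": {"bicycle", "pony"},
}


def partners(item, history):
    """Every other item that shares at least one user with `item`."""
    return {other for items in history.values() if item in items
            for other in items} - {item}


def raw_scores(seed, history):
    """Raw co-occurrence count of every other item with `seed`."""
    return Counter(other for items in history.values() if seed in items
                   for other in items if other != seed)


def popularity(history):
    """How many users have each item."""
    return Counter(item for items in history.values() for item in items)


print(sorted(partners("pony", history)))           # ['apple', 'bicycle', 'puppy']
print(raw_scores("pony", history).most_common(1))  # [('apple', 2)]
```

Because every user has a pony, `raw_scores("pony", ...)` equals `popularity(...)` with the pony itself removed. Telling "co-occurs more than popularity predicts" apart from "is simply popular" is the job of the LLR test discussed below.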
Get Useful Indicators from Behaviors
1. Use log files to build a history matrix of users × items
–  Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform it into a co-occurrence matrix of items × items
3. Look for useful co-occurrence by finding anomalous co-occurrences, and use them to make an indicator matrix
–  The Log Likelihood Ratio (LLR) can help judge which co-occurrences can be used with confidence as indicators of preference
–  RowSimilarityJob in Apache Mahout uses LLR
Log Files

(Figure: log entries in order from Alice, Charles, Charles, Alice, Alice, Bob, Bob.)
Log Files

u1 t1
u2 t4
u2 t3
u1 t2
u1 t3
u3 t3
u3 t1
Log Files and Dimensions

u1 t1
u2 t4
u2 t3
u1 t2
u1 t3
u3 t3
u3 t1

Users: u1 = Alice, u2 = Charles, u3 = Bob
Things: t1, t2, t3, t4
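The log-to-matrix step can be sketched as follows, using the log and user mapping above. The function name `history_matrix` is hypothetical, and plain nested lists stand in for what would typically be a sparse matrix built in a batch job at production scale.

```python
# Log entries as on the slide: one "user thing" interaction per line
LOG = """\
u1 t1
u2 t4
u2 t3
u1 t2
u1 t3
u3 t3
u3 t1
"""

# User ids mapped to names, as on the "Log Files and Dimensions" slide
USERS = {"u1": "Alice", "u2": "Charles", "u3": "Bob"}


def history_matrix(log_text):
    """Return (users, items, matrix); matrix[i][j] is 1 if user i interacted with item j."""
    pairs = [line.split() for line in log_text.splitlines() if line.strip()]
    users = sorted({user for user, _ in pairs})
    items = sorted({item for _, item in pairs})
    row_of = {user: i for i, user in enumerate(users)}
    col_of = {item: j for j, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for user, item in pairs:
        # Binary history: repeated interactions with the same item still count once
        matrix[row_of[user]][col_of[item]] = 1
    return users, items, matrix


users, items, A = history_matrix(LOG)
for user, row in zip(users, A):
    print(USERS[user], row)
```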
  
History Matrix: Users by Items

           t1   t2   t3   t4
Alice      ✔    ✔    ✔
Bob        ✔         ✔
Charles              ✔    ✔
Co-occurrence Matrix: Items by Items

      t1   t2   t3   t4
t1    -    1    2    0
t2    1    -    1    0
t3    2    1    -    1
t4    0    0    1    -

How do you tell which co-occurrences are useful?
Use the LLR test to turn co-occurrences into indicators…
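A minimal sketch of the items × items step, assuming the binary history matrix above (rows Alice, Bob, Charles; columns t1 to t4). Each off-diagonal cell of AᵀA counts the users who interacted with both items; the diagonal is set to zero because an item is not an indicator for itself. Names are hypothetical.

```python
# Binary history matrix (users x items) from the example log.
# Rows: Alice, Bob, Charles. Columns: t1, t2, t3, t4.
ITEMS = ["t1", "t2", "t3", "t4"]
A = [
    [1, 1, 1, 0],  # Alice
    [1, 0, 1, 0],  # Bob
    [0, 0, 1, 1],  # Charles
]


def cooccurrence(history):
    """Return A^T A (items x items) with the diagonal set to zero."""
    n = len(history[0])
    C = [[0] * n for _ in range(n)]
    for row in history:  # one user at a time
        for i in range(n):
            for j in range(n):
                if i != j:
                    # Adds 1 only if this user has both item i and item j
                    C[i][j] += row[i] * row[j]
    return C


C = cooccurrence(A)
for name, row in zip(ITEMS, C):
    print(name, row)
```

The printed rows match the co-occurrence table on the slide, with 0 in place of "-" on the diagonal.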
  
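Co-occurrence Binary Matrix

(Figure: binary co-occurrence matrix with rows and columns split into an item and "not" that item; occupied cells marked 1.)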
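To run the LLR test, each item pair is first reduced to a 2×2 table of user counts. The sketch below uses a hypothetical helper name and the same toy history matrix. It follows the k11/k12/k21/k22 convention common in LLR implementations (including, to our knowledge, Mahout's): k11 counts users with both items, k12 users with B but not A, k21 users with A but not B, and k22 users with neither.

```python
# Same toy history matrix: rows Alice, Bob, Charles; columns t1, t2, t3, t4
A = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]


def contingency(history, a, b):
    """Reduce item columns a and b to the 2x2 counts (k11, k12, k21, k22).

    k11: users with both A and B      k12: users with B but not A
    k21: users with A but not B       k22: users with neither
    """
    k11 = k12 = k21 = k22 = 0
    for row in history:
        has_a, has_b = bool(row[a]), bool(row[b])
        if has_a and has_b:
            k11 += 1
        elif has_b:
            k12 += 1
        elif has_a:
            k21 += 1
        else:
            k22 += 1
    return k11, k12, k21, k22


print(contingency(A, 0, 2))  # t1 vs t3 -> (2, 1, 0, 0)
```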
  
Spot the Anomaly

Case 1        A        not A
B             13       1,000
not B         1,000    100,000

Case 2        A        not A
B             1        0
not B         0        10,000

Case 3        A        not A
B             1        0
not B         0        2

Case 4        A        not A
B             10       0
not B         0        100,000

What conclusion do you draw from each situation?
Spot the Anomaly

Root LLR for each table, with counts given as A and B / B without A / A without B / neither:
• 13 / 1,000 / 1,000 / 100,000: 0.90
• 1 / 0 / 0 / 10,000: 4.52
• 1 / 0 / 0 / 2: 1.95
• 10 / 0 / 0 / 100,000: 14.3

Root LLR can be read roughly like a number of standard deviations.
In Apache Mahout, RowSimilarityJob uses LLR
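A minimal sketch of the root LLR score, written to mirror the entropy-based formulation in Mahout's `LogLikelihood` class as we understand it. The function names are our own. Run on the four tables above, it reproduces the scores shown: roughly 0.90, 4.52, 1.95 and 14.3.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalized entropy of a list of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G-squared) of a 2x2 contingency table."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


def root_llr(k11, k12, k21, k22):
    """Signed square root of the LLR: negative when A is rarer among B users than elsewhere."""
    score = math.sqrt(llr(k11, k12, k21, k22))
    if k11 + k12 == 0 or k21 + k22 == 0:
        return score
    if k11 / (k11 + k12) < k21 / (k21 + k22):
        return -score
    return score


for counts in [(13, 1000, 1000, 100000), (1, 0, 0, 10000),
               (1, 0, 0, 2), (10, 0, 0, 100000)]:
    print(counts, round(root_llr(*counts), 2))
```

The first table has the most joint occurrences in absolute terms, yet the lowest score. Under independence, about 10 joint occurrences would be expected (1,013 × 1,013 / 102,013), so 13 is barely anomalous. The third table is perfectly aligned, but with only three observations its score stays modest.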
Indicator Matrix: Anomalous Co-occurrence

Result: the marked row will be added to the indicator field in the item document …

(Figure: indicator matrix with two cells marked ✔.)
Significant co-occurrences → indicators
Indicator Matrix

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet

indicators:
(t1)

That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

Note: data for the indicator field is added directly to the metadata of a document in the Solr index. You don't need to create a separate index for the indicators.
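One way this could look on the wire is sketched below. The code builds the slide's t4 document as a JSON array for Solr's update handler, plus a select query that matches a user's recent item IDs against the `indicators` field. The collection name `items`, the `fl` and `rows` values and the helper name are assumptions; check field names against your own schema.

```python
import json
from urllib.parse import urlencode

# The item document from the slide; "indicators" holds that item's row of the indicator matrix
doc = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],
    "indicators": ["t1"],
}

# A JSON array of documents, the shape Solr's /update handler accepts for JSON input
update_body = json.dumps([doc])


def recommendation_query(recent_items, rows=10):
    """Encode a select query that matches recent item ids against the indicators field."""
    params = {
        "q": "indicators:(" + " ".join(recent_items) + ")",
        "fl": "id,title,score",
        "rows": rows,
    }
    return urlencode(params)


print(update_body)
# "items" is a hypothetical collection name
print("/solr/items/select?" + recommendation_query(["t1", "t2"]))
```

Because the indicators are just another field, the recommendation request is an ordinary search. Ranking comes from Solr's normal relevance scoring, and you would typically also filter out items the user already has.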
  
Demo time!
Internals of the Recommender Engine

Looking Inside LucidWorks

What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is “1710 : Chuck Berry”
(Figure: recommender architecture; components as numbered in the diagram:
(1) User behavior generator
(2) Presentation tier
(3) Session collector
(4) Search engine
(5) Metrics and logs
(6) History collector
(7) Co-occurrence analysis
(8) Post to search engine
(9) Diagnostic browsing)

http://bita.ly/18vbbaT
Example: search-based recommendation
Search-based recommendation
• Sample Document
–  Merchant ID
–  Field for text description
–  Phone
–  Address
–  Location
–  Indicator merchant IDs
–  Indicator industry (SIC) IDs
–  Indicator offers
–  Indicator text
–  Local Top40
(Merchant ID through Location: original data and metadata. Indicator fields: derived from co-occurrence analysis.)

• Sample Query (the recommendation query)
–  Current location
–  Recent merchant descriptions
–  Recent merchant IDs
–  Recent SIC codes
–  Recent accepted offers
–  Local Top40
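A sketch of how the sample query might be assembled. Each kind of recent activity searches the matching indicator field, and the current location becomes a spatial filter. All field names (`indicator_merchant_ids`, `indicator_sic_ids`, `indicator_offers`, `indicator_text`, `location`) and the helper name are hypothetical. The filter assumes `location` is a Solr spatial field usable with `geofilt`, and the sketch leaves out the Local Top40 component.

```python
from urllib.parse import urlencode

# Hypothetical mapping from kinds of recent user activity to indicator fields
# in the merchant documents; adapt the field names to your own schema.
FIELD_FOR = {
    "merchant_ids": "indicator_merchant_ids",
    "sic_codes": "indicator_sic_ids",
    "offers": "indicator_offers",
    "description_terms": "indicator_text",
}


def recommendation_params(recent, lat, lon, radius_km=5, rows=10):
    """Turn a user's recent activity into Solr select parameters."""
    clauses = [
        f"{field}:({' '.join(recent[key])})"
        for key, field in FIELD_FOR.items()
        if recent.get(key)
    ]
    return {
        # Clauses are OR-ed under Solr's default operator, so merchants matching
        # more of the user's recent behavior tend to score higher
        "q": " ".join(clauses) if clauses else "*:*",
        # Keep only merchants near the current location (geofilt; d in km)
        "fq": f"{{!geofilt sfield=location pt={lat},{lon} d={radius_km}}}",
        "rows": rows,
    }


recent = {"merchant_ids": ["m17", "m42"], "sic_codes": ["5812"]}
params = recommendation_params(recent, lat=53.34, lon=-6.26)
print("/select?" + urlencode(params))
```

Keeping location as a filter rather than a query clause means it restricts candidates without affecting how the indicator matches are scored.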
Analyze with MapReduce

(Figure: complete history → co-occurrence analysis (Mahout); co-occurrence output and item metadata → Solr indexers (indexing) → index shards.)
Deploy with Conventional Search System

(Figure: user history → web tier → search against the index shards; item metadata → Solr indexers → index shards.)
Outro
• Kudos to Ted Dunning, Grant Ingersoll and LucidWorks for the idea & the demo!
• Get in touch: Twitter: @mhausenblas, @MapR
• Ah, and, btw: we're hiring ;)

2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine

  • 1.
  • 2. SYSTEM TEARDOWN: SOLR AS A PRACTICAL RECOMMENDATION ENGINE Michael Hausenblas Twitter: @mhausenblas Chief Data Engineer EMEA, MapR Technologies
  • 3. What does Machine Learning look like?
  • 4. What does Machine Learning look like? ! T # ! A A # ! A A # = % A1 &! A A # 2 $ " 1 2 $ 2 $ " 1 % AT &" 1 " 2 $ ! T # T A1 A1 A1 A 2 & =% T % A 2 A1 AT A 2 & 2 " $ O(κ  k  d  +  k3  d)  =  O(k2  d  log  !  +  k3  d)  for  T A k,  T A #! n r # ! A small  A 1 2 & % 1 &=% 1 1 % high  quality   T T &" % r2 & % A 2 k 1 " $ " O(κ  d  log  k)  or  O(d  log  κ  log  k)  for  larger  A,   A 2 A 2 $% looser  quality   ! ! T #% T r1 = % A1 A1 A1 A 2 & " $% " T h1 # & h2 & $ h1 # & h2 & $
  • 5. Recommendations as Machine Learning
    • Input data for the recommender model: observed interactions between users (who take actions) and items
    • Goal: suggest additional appropriate or desirable interactions
    • Example applications:
      – similar movies, music, books (topic, style, etc.)
      – map-based restaurant choices
      – suggesting sale items for e-stores or on cash-register receipts
  • 6. Recommendations. Recap: the behavior of a crowd helps us understand what individuals will do.
  • 7. Recommendations (Alice, Charles): Alice got an apple and a puppy. Charles got a bicycle.
  • 8. Recommendations (Alice, Bob, Charles): Alice got an apple and a puppy. Bob got an apple. Charles got a bicycle.
  • 9. Recommendations (Alice, Bob, Charles): What else would Bob like?
  • 10. Recommendations (Alice, Bob, Charles): A puppy, of course!
  • 11. You get the idea of how recommenders work …
  • 12. Recommendations (Alice, Bob, Amelia, Charles): What if everybody gets a pony? What else would you recommend for Amelia?
  • 13. Recommendations (Alice, Bob, Amelia, Charles): If everybody gets a pony, it's not a very good indicator of what else to predict ...
  • 14. Problems with Raw Co-occurrence
    • Very popular items co-occur with everything
      – Examples: welcome document; elevator music
    • Very widespread occurrence is not interesting as a way to generate indicators
      – Unless you want to offer an item that is constantly desired, such as razor blades
    • What we want is anomalous co-occurrence
      – This is the source of interesting indicators of preference on which to base recommendation
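A minimal Python sketch of the pony problem from slides 12 to 14. The histories below are hypothetical (only the "everybody gets a pony" idea comes from the slides): counting raw pairs makes the pony co-occur with every other item, and it even becomes the apple's most frequent partner, although it says nothing about individual taste.

```python
from collections import Counter
from itertools import combinations

# Hypothetical histories in the spirit of slides 12-13: everybody has a pony.
histories = {
    "Alice": {"apple", "puppy", "pony"},
    "Bob": {"apple", "pony"},
    "Amelia": {"pony"},
    "Charles": {"bicycle", "pony"},
}


def raw_cooccurrence(histories):
    """Count, for each unordered item pair, how many users have both items."""
    counts = Counter()
    for items in histories.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


counts = raw_cooccurrence(histories)
for pair, n in counts.most_common():
    print(pair, n)
# ('apple', 'pony') tops the list; the pony is paired with every other item.
```

Raw counts cannot separate "popular" from "related", which is why the deck moves on to an anomaly test.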
  • 15. Get Useful Indicators from Behaviors
    1. Use log files to build a history matrix of users × items
       – Remember: this history of interactions will be sparse compared to all potential combinations
    2. Transform it into a co-occurrence matrix of items × items
    3. Look for useful co-occurrences by finding anomalous ones, and turn those into an indicator matrix
       – The log-likelihood ratio (LLR) test helps judge which co-occurrences can confidently be used as indicators of preference
       – RowSimilarityJob in Apache Mahout uses LLR
  • 16. Log Files: Alice, Charles, Charles, Alice, Alice, Bob, Bob
  • 17. Log Files (user, thing): (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
  • 18. Log Files and Dimensions
    Log: (u1, t1), (u2, t4), (u2, t3), (u1, t2), (u1, t3), (u3, t3), (u3, t1)
    Things: t1, t2, t3, t4
    Users: u1 = Alice, u2 = Charles, u3 = Bob
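Step 1 of slide 15 applied to the log on slides 17 and 18 can be sketched as follows. This is an illustration only: it uses dense Python lists for readability, whereas a real system would use a sparse representation, since (as slide 15 notes) the history is sparse.

```python
# The (user, thing) log from slides 17 and 18.
LOG = [("u1", "t1"), ("u2", "t4"), ("u2", "t3"), ("u1", "t2"),
       ("u1", "t3"), ("u3", "t3"), ("u3", "t1")]
USERS = {"u1": "Alice", "u2": "Charles", "u3": "Bob"}
THINGS = ["t1", "t2", "t3", "t4"]


def history_matrix(log, users, things):
    """Build a 0/1 users x things matrix, rows in sorted user-id order."""
    row = {u: i for i, u in enumerate(sorted(users))}
    col = {t: j for j, t in enumerate(things)}
    matrix = [[0] * len(things) for _ in users]
    for user, thing in log:
        matrix[row[user]][col[thing]] = 1  # repeated interactions stay 1
    return matrix


H = history_matrix(LOG, USERS, THINGS)
for user_id, cells in zip(sorted(USERS), H):
    print(f"{USERS[user_id]:8}", cells)
```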
  • 19. History Matrix: Users by Items
              t1   t2   t3   t4
    Alice     ✔    ✔    ✔
    Bob       ✔         ✔
    Charles             ✔    ✔
  • 20. Co-occurrence Matrix: Items by Items
          t1   t2   t3   t4
    t1    -    1    2    0
    t2    1    -    1    0
    t3    2    1    -    1
    t4    0    0    1    -
    How do you tell which co-occurrences are useful? Use the LLR test to turn co-occurrences into indicators…
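Step 2 of slide 15 is the product C = HᵀH of the history matrix with itself. The sketch below is a plain-Python illustration over the toy history matrix from slide 19. Unlike the slide, it keeps the diagonal, which holds each item's total count; those totals are needed later to fill in the LLR contingency tables.

```python
# History matrix from slide 19 (rows: users, columns: t1..t4).
H = [
    [1, 1, 1, 0],  # u1 Alice
    [0, 0, 1, 1],  # u2 Charles
    [1, 0, 1, 0],  # u3 Bob
]


def cooccurrence(h):
    """Return C = H^T H: C[i][j] = number of users with both item i and item j."""
    n_items = len(h[0])
    c = [[0] * n_items for _ in range(n_items)]
    for row in h:
        for i in range(n_items):
            for j in range(n_items):
                c[i][j] += row[i] * row[j]
    return c


C = cooccurrence(H)
for row in C:
    print(row)  # off-diagonal entries match slide 20; diagonal = item counts
```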
  • 21. Co-occurrence Binary Matrix (figure)
  • 22. Spot the Anomaly
    Each 2×2 table gives the counts for: A and B; B without A; A without B; neither
    – Table 1: 13; 1,000; 1,000; 100,000
    – Table 2: 1; 0; 0; 10,000
    – Table 3: 1; 0; 0; 2
    – Table 4: 10; 0; 0; 100,000
    What conclusion do you draw from each situation?
  • 23. Spot the Anomaly
    – Table 1: 13; 1,000; 1,000; 100,000 → root LLR 0.90
    – Table 2: 1; 0; 0; 10,000 → root LLR 4.52
    – Table 3: 1; 0; 0; 2 → root LLR 1.95
    – Table 4: 10; 0; 0; 100,000 → root LLR 14.3
    • Root LLR is roughly like standard deviations
    • In Apache Mahout, RowSimilarityJob uses LLR
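The root LLR values on slide 23 can be reproduced with the standard G² log-likelihood ratio for a 2×2 table. The function below is our own sketch of that statistic, not Mahout's code (Mahout's `LogLikelihood` computes it via entropies, which should give the same value). It prints about 0.90, 4.52, 1.95 and 14.29, matching the slide's 0.90, 4.52, 1.95 and 14.3.

```python
import math


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: A and B together, k12: B without A, k21: A without B, k22: neither.
    """
    n = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)  # B, not B
    col_sums = (k11 + k21, k12 + k22)  # A, not A
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g = 0.0
    for k, r, c in cells:
        if k > 0:  # a zero cell contributes nothing (0 * log 0 is taken as 0)
            g += k * math.log(k * n / (row_sums[r] * col_sums[c]))
    return max(0.0, 2.0 * g)  # clamp tiny negative rounding errors


def root_llr(k11, k12, k21, k22):
    return math.sqrt(llr(k11, k12, k21, k22))


# The four tables from slides 22 and 23, as (k11, k12, k21, k22).
tables = [
    (13, 1_000, 1_000, 100_000),
    (1, 0, 0, 10_000),
    (1, 0, 0, 2),
    (10, 0, 0, 100_000),
]
for t in tables:
    print(t, f"{root_llr(*t):.2f}")
```

Note how table 1, despite its large counts, scores lowest: 13 joint occurrences are roughly what chance predicts given how common A and B are, whereas a single joint occurrence against 10,000 non-events (table 2) is already surprising.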
  • 24. Indicator Matrix: Anomalous Co-occurrence
    Result: the marked row will be added to the indicator field in the item document …
    ✔ ✔ Significant co-occurrences → indicators
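Step 3 of slide 15, from co-occurrence counts to an indicator matrix, might look like the sketch below. This is our illustration, not Mahout's RowSimilarityJob. For each pair it fills the 2×2 table from the co-occurrence count, the two item totals and the number of users. It takes the square root of the LLR and negates it when the items co-occur less often than chance, in the spirit of Mahout's signed root LLR. It then keeps pairs above a hypothetical threshold of 0.5; a production job would more likely cap the number of indicators per item. On the toy counts from slide 20, only t1 and t2 become indicators of each other. The item t3, which every user has, behaves like the pony and is filtered out.

```python
import math


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio (k11: both, k12: B only, k21: A only, k22: neither)."""
    n = k11 + k12 + k21 + k22
    rows, cols = (k11 + k12, k21 + k22), (k11 + k21, k12 + k22)
    g = sum(k * math.log(k * n / (rows[r] * cols[c]))
            for k, r, c in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
            if k > 0)
    return max(0.0, 2.0 * g)


def signed_root_llr(k11, k12, k21, k22):
    """sqrt(LLR), negated when A is less likely given B than given not-B."""
    if k11 + k12 == 0 or k21 + k22 == 0:
        return 0.0  # B is in no history or in every history: no evidence
    root = math.sqrt(llr(k11, k12, k21, k22))
    if k11 / (k11 + k12) < k21 / (k21 + k22):
        root = -root
    return root


def indicators(cooc, item_counts, n_users, threshold=0.5):  # threshold is hypothetical
    result = {}
    for a, row in cooc.items():
        keep = []
        for b, k11 in row.items():
            k12 = item_counts[b] - k11           # B without A
            k21 = item_counts[a] - k11           # A without B
            k22 = n_users - k11 - k12 - k21      # neither
            if signed_root_llr(k11, k12, k21, k22) > threshold:
                keep.append(b)
        result[a] = keep
    return result


# Off-diagonal counts from slide 20; item totals and user count from slides 17-19.
COOC = {
    "t1": {"t2": 1, "t3": 2, "t4": 0},
    "t2": {"t1": 1, "t3": 1, "t4": 0},
    "t3": {"t1": 2, "t2": 1, "t4": 1},
    "t4": {"t1": 0, "t2": 0, "t3": 1},
}
ITEM_COUNTS = {"t1": 2, "t2": 1, "t3": 3, "t4": 1}
print(indicators(COOC, ITEM_COUNTS, n_users=3))
```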
  • 25. Indicator Matrix
    ✔
    id: t4
    title: puppy
    desc: The sweetest little puppy ever.
    keywords: puppy, dog, pet
    indicators: (t1)
    That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
    Note: data for the indicator field is added directly to the metadata of a document in the Solr index. You don't need to create a separate index for the indicators.
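To make slide 25 concrete, here is a sketch of how the puppy document, including its indicator field, could be sent to Solr as a JSON update. Solr accepts a JSON array of documents POSTed to a collection's `/update` handler. The host, the collection name `items`, and a schema with multi-valued `keywords` and `indicators` fields are assumptions. The request is built but not sent, so the sketch runs without a Solr server.

```python
import json
from urllib.request import Request

# Hypothetical Solr host and collection name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/items/update?commit=true"


def item_document(item_id, title, desc, keywords, indicators):
    """Item metadata plus the indicator row, all in one Solr document."""
    return {
        "id": item_id,
        "title": title,
        "desc": desc,
        "keywords": keywords,      # assumed multi-valued field
        "indicators": indicators,  # assumed multi-valued field
    }


doc = item_document("t4", "puppy", "The sweetest little puppy ever.",
                    ["puppy", "dog", "pet"], ["t1"])
payload = json.dumps([doc]).encode("utf-8")
req = Request(SOLR_UPDATE_URL, data=payload,
              headers={"Content-Type": "application/json"}, method="POST")
# urllib.request.urlopen(req) would send it to a running Solr instance.
print(payload.decode("utf-8"))
```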
  • 27. Internals of the Recommender Engine
  • 28. Looking Inside LucidWorks: What should we recommend if a new user listened to 2122: Fats Domino and 303: Beatles? The recommendation is "1710: Chuck Berry".
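The core trick of slides 25 to 28 is that the user's recent history becomes the query, and the indicator field is what gets searched. It can be illustrated without Solr in a few lines. Only t4's indicator (t1) comes from slide 25; the other documents are hypothetical and have empty indicators. The overlap count used here is a crude stand-in for Solr's relevance scoring (TF-IDF or BM25, depending on version and configuration), which also weights rare terms more heavily.

```python
# Hypothetical item documents: item id -> indicator field values.
# Only t4 -> ["t1"] is taken from slide 25.
DOCS = {
    "t1": [],
    "t2": [],
    "t3": [],
    "t4": ["t1"],
}


def recommend(history, docs, top_n=3):
    """Rank items by how many history items appear in their indicator field."""
    seen = set(history)
    scored = []
    for item_id, item_indicators in docs.items():
        if item_id in seen:
            continue  # don't recommend what the user already has
        score = len(seen.intersection(item_indicators))
        if score > 0:
            scored.append((score, item_id))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then by id
    return [item_id for _, item_id in scored[:top_n]]


print(recommend(["t1"], DOCS))
```

A user whose history contains t1 is recommended t4, the puppy. With Solr, the same lookup would be a query such as `indicators:(2122 OR 303)` against the indicator field.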
  • 29.
  • 30.
    (1) User behavior generator
    (2) Presentation tier
    (3) Session collector
    (4) Search engine
    (5) Metrics and logs
    (6) History collector
    (7) Co-occurrence analysis
    (8) Post to search engine
    (9) Diagnostic browsing
    http://bita.ly/18vbbaT
  • 32. Search-based recommendation
    • Sample Document
      Original data and metadata:
      – Merchant ID
      – Field for text description
      – Phone
      – Address
      – Location
      Derived from co-occurrence analysis:
      – Indicator merchant IDs
      – Indicator industry (SIC) IDs
      – Indicator offers
      – Indicator text
      – Local Top40
    • Sample Query (the recommendation query)
      – Current location
      – Recent merchant descriptions
      – Recent merchant IDs
      – Recent SIC codes
      – Recent accepted offers
      – Local Top40
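The sample query on slide 32 matches several recent-history signals against several indicator fields at once. A minimal sketch of building such a query as a Solr `/select` URL follows, using the standard `q`, `fl` and `rows` parameters. The host, the collection `merchants`, the field names (`indicator_merchant_ids` and so on) and all values are hypothetical. The slide names the fields only informally, and location filtering is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Solr host and collection.
SOLR_SELECT_URL = "http://localhost:8983/solr/merchants/select"


def or_clause(field, values):
    """Build one clause such as indicator_sic:("5812" OR "5814")."""
    quoted = " OR ".join('"' + v.replace('"', '\\"') + '"' for v in values)
    return f"{field}:({quoted})"


def recommendation_url(recent, rows=10):
    """recent maps each (hypothetical) indicator field to the user's recent values."""
    clauses = [or_clause(field, values) for field, values in recent.items() if values]
    params = {"q": " OR ".join(clauses), "fl": "id,score", "rows": rows}
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = recommendation_url({
    "indicator_merchant_ids": ["m17", "m42"],  # recent merchant IDs (made up)
    "indicator_sic": ["5812"],                  # recent SIC codes (made up)
    "indicator_offers": [],                     # no recent accepted offers
})
print(parse_qs(urlsplit(url).query)["q"][0])
```

Because the clauses are ORed, a merchant matching more of the user's recent signals tends to score higher, which is what turns an ordinary search into a ranked recommendation.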
  • 33. Analyze with MapReduce
    Complete history → co-occurrence analysis (Mahout) → Solr indexers (together with item metadata) → indexing → index shards
  • 34. Deploy with Conventional Search System
    User history → web tier → search → index shards; item metadata → Solr indexers → index shards
  • 35. Outro
    • Kudos to Ted Dunning, Grant Ingersoll and LucidWorks for the idea & the demo!
    • Get in touch: Twitter @mhausenblas, @MapR
    • Ah, and, btw: we're hiring ;)