SlideShare ist ein Scribd-Unternehmen logo
The Big Idea
Universal Recommender
RECOMMENDATIONS
REQUIRED
A LITTLE HISTORY:
MOTIVATION
• Coocurrence: Mahout 2012
• Factorized ALS: Mahout then Spark’s MLlib
• Experience with then current Recommender Tech
• Evaluation and Experiments
• Could only use “purchase” data threw out 100x view data
• No “realtime”
• too many edge cases, users that had no recommendations
• didn’t adapt to metadata/content of items
• Lots of discussions with Ted Dunning, Sean Owen, Sebastian
Schelter, Pat Ferrel (me)
• Cooccurrence and cross-cooccurrence led to many innovations
ANATOMY OF A RECOMMENDATION
PERSONALIZED
r = recommendations
hp = a user’s history of some action
(purchase for instance)
P = the history of all users’ primary action
rows are users, columns are items
(PtP) = compares column to column using
log-likelihood based correlation test
r = (PtP)hp
COOCCURRENCE WITH LLR
• Let’s call (PtP) an indicator matrix for some primary action like
purchase
• Rows = items, columns = items, element =
similarity/correlation score
• The score is row compared to column using a “similarity” or
“correlation” metric
• Log-Likelihood Ratio (LLR) finds important/correlating
cooccurrences and filters out the rest—a major improvement
in quality over simple cooccurrence or other similarity metrics.
• Experiments on real-world data show LLR is significantly
better than other similarity metrics
* http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
LLR AND SIMILARITY METRICS
PRECISION (MAP@K)
Higher is better
MAP@1 MAP@2 MAP@3 MAP@4 MAP@5 MAP@6 MAP@7 MAP@8 MAP@9 MAP@10
Similarity Metrics
Mean Average Precision
Mahout Cooccurrence Recommender with E-Commerce Data
Cosine Tanimoto Log-likelihood
FROM COOCCURRENCE TO
RECOMMENDATION
• This actually means to take the user’s
history hp and compare it to rows of the
cooccurrence matrix (PtP)
• TF-IDF weighting of cooccurrence would
be nice to mitigate the undue influence
of popular items
• Find items nearest to the user’s history
• Sort these by similarity strength and
keep only the highest
—you have recommendations
• Sound familiar? Find the k-nearest
neighbors using cosine and TF-IDF?
r = (PtP)hp
hp
user1: [item2, item3]
(PtP)
item1: [item2, item3]
item2: [item1, item3, item95]
item3: […]
find item that most closely
matches the user’s history
item1 !
FROM COOCCURRENCE TO
RECOMMENDATION
• This actually means to take the user’s
history hp and compare it to rows of the
cooccurrence matrix (PtP)
• TF-IDF weighting of cooccurrence would
be nice to mitigate the undue influence
of popular items
• Find items nearest to the user’s history
• Sort these by similarity strength and
keep only the highest
—you have recommendations
• Sound familiar? Find the k-nearest
neighbors using cosine and TF-IDF?
• That’s exactly what a search engine
does!
r = (PtP)hp
hp
user1: [item2, item3]
(PtP)
item1: [item2, item3]
item2: [item1, item3, item95]
item3: […]
find item that most closely
matches the user’s history
item1 !
USER HISTORY + COOCCURRENCES
+ SEARCH = RECOMMENDATIONS
• The final calculation uses hp as the query on the Cooccurrence
Matrix (PtP), returns a ranked set of items
• Query is a “similarity” query, not relational or key based fetch
• Uses Search Engine as Cosine-based K-Nearest Neighbor
(KNN) Engine with norms and TF-IDF weighting
• Highly optimized for serving these queries in realtime
• Several (Solr, Elasticsearch) have High Availability, massively
scalable clustered auto-sharding features like the best of
NoSQL DBs.
r = (PtP)hp
THE UNIVERSAL RECOMMENDER:
THE BREAKTHROUGH IDEA
• Virtually all existing collaborative filtering type recommenders
use only one indicator of preference
• The theory doesn’t stop there!
• Virtually anything we know about the user can be used to
improve recommendations—purchase, view, category-
preference, location-preference, device-preference, gender…
r = (PtP)hp
r = (PtP)hp + (PtV)hv + (PtC)hc + …
THE UNIVERSAL RECOMMENDER:
CORRELATED CROSS-OCCURRENCE
• Virtually all existing collaborative filtering type recommenders
use only one indicator of preference
• The theory doesn’t stop there!
• Virtually anything we know about the user can be used to
improve recommendations—purchase, view, category-
preference, location-preference, device-preference, gender…
CROSS-OCCURRENCE
r = (PtP)hp
r = (PtP)hp + (PtV)hv + (PtC)hc + …
• Comparing the history of the primary action to other actions finds
actions that lead to the one you want to recommend
• Given strong data about user preferences on a general population
we can also use
• items clicked
• terms searched
• categories viewed
• items shared
• people followed
• items disliked (yes dislikes may predict likes)
• location
• device preference
• gender
• age bracket
• Virtually any anything we know about the population can be
tested for correlation and used to predict a particular users
preferences
CORRELATED CROSS-OCCURRENCE:
SO WHAT?
CORRELATED CROSS-OCCURRENCE;
ADDING CONTENT MODELS
• Collaborative Topic Filtering
• Use Latent Dirichlet Allocation (LDA) to model topics directly from the
textual content
• Calculate based on Word2Vec type word vectors instead of bag-of-
words analysis to boost quality
• Create cross-occurrence indicators from topics the user has preferred
• Repeat periodically
• Entity Preferences:
• Use a Named Entity Recognition (NER) system to find entities in
textual content
• Create cross-occurrence indicators for these entities
• Entities and Topics are long lived and richly describe user
interests, these are very good for use in the Universal
Recommender.
THE UNIVERSAL RECOMMENDER
ADDING CONTENT-BASED RECS
Indicators can also be based on content
similarity
(TTt) is a calculation that compares every 2
documents to each other and finds the most
similar—based upon content alone
r = (TTt)ht + l*L …
INDICATOR TYPES
• Cooccurrence
• Find the best indicator of a user preference for the item type to be recommended: examples are “buy”,
“read”, “video_watch”, “share”, “follow”, “like”.
• Cross-occurrence
• Item metadata as “user” preference, for example: treat item category as a user category-preferences
• Calculated from user actions on any data that may give information about user— category-preferences,
search terms, gender, location
• Create with Mahout-Samsara SimilarityAnalysis.cooccurrence
• Content or metadata
• Content text, tags, categories, description text, anything describing an item
• Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
• Intrinsic
• Popularity rank, geo-location, anything describing an item
• Some may be derived from usage data like popularity rank, or hotness
• Is a known or specially calculated property of the item
THE UNIVERSAL RECOMMENDER
AKA THE WHOLE ENCHILADA
“Universal” means one query on all indicators at once
Unified query:
purchase-correlator: users-history-of-purchases
view-correlator: users-history-of-views
category-correlator: users-history-of-categories-viewed
tags-correlator: users-history-of-purchases
geo-location-correlator: users-location
…
r = (PtP)hp + (PtV)hv + (PtC)hc + …
(TTt)ht + l*L …
THE UNIVERSAL RECOMMENDER
AKA THE WHOLE ENCHILADA
“Universal” means one query on all correlators at once
Once indicators are indexed as search fields this entire
equation is a single query
Fast!
r = (PtP)hp + (PtV)hv + (PtC)hc + …
(TTt)ht + l*L …
THE UNIVERSAL RECOMMENDER:
BETTER USER COVERAGE
• Any number of user actions—entire user clickstream
• Metadata—from user profile or items
• Context—on-site, time, location
• Content—unstructured text or semi-structured
categorical
• Mixes any number of “indicators” to increase quality
or tune to specific context
• Solution to the “cold-start” problem—items with too
short a lifespan or new users with no history
• Can recommend to new users using
realtime history
• Can use new interaction data from
any user in realtime
• 95% implemented in Universal Recommender
v0.3.0—most current release
All Users
Universal Recommender
ALS or 1-action
Recommenders
POLISH THE APPLE
• Dithering for auto-optimize via explore-exploit:
Randomize some returned recs, if they are acted upon they become
part of the new training data and are more likely to be recommended
in the future
• Visibility control:
• Don’t show dups, blacklist items already shown
• Filter items the user has already seen
• Zero-downtime Deployment: deploy prediction server
once then hot-swap new index when ready.
• Generate some intrinsic indicators like hot, popular—
helps solve the “cold-start” problem
• Asymmetric train vs query—query with most recent user
data, train on all historical data
Architecture Based on
PredictionIO
Universal Recommender
UNIVERSAL RECOMMENDER
LAMBDA ARCHITECTURE
Application
query and
recommendations
MODEL CREATION
background
events
&
item
metadata
PredictionIO
SDK or REST
PredictionIO
EventServer
DATA IN
Universal Recommender Engine
PredictionIO REST
Serving Component
Elasticsearch Spark
MODEL UPDATE
HBase
user history
itemProperties
realtime
RECOMMENDATION SERVING
Spark-Mahout’s
Correlation Engine
UNIVERSAL RECOMMENDER
LAMBDA ARCHITECTURE
Application
query and
recommendations
MODEL CREATION
events
&
item
metadata
PredictionIO
SDK or REST
PredictionIO
EventServer
DATA IN
Universal Recommender Engine
PredictionIO REST
Serving Component
Elasticsearch Spark
MODEL UPDATE
HBase
user history
itemProperties
backgroundREALTIME
RECOMMENDATION SERVING
Spark-Mahout’s
Correlation Engine
UNIVERSAL RECOMMENDER
LAMBDA ARCHITECTURE
Application
query and
recommendations
events
&
item
metadata
RECOMMENDATION SERVING
PredictionIO
SDK or REST
PredictionIO
EventServer
DATA IN
Universal Recommender Engine
PredictionIO REST
Serving Component
Spark-Mahout’s
Correlation Engine
Elasticsearch Spark
MODEL UPDATE
HBase
user history
itemProperties
BACKGROUNDREALTIME
Appendix
TECH STACK
• Hbase 1.X
• Postgres, MySQL, or other JDBC possible
• Spark 1.6.X
• Fast, massively scalable, seems like the “winner”
• HDFS 2.6—Hadoop Distributed File System
• Reiable, massively scalable, the defacto standard
• Spray
• Supplies REST endpoints, muti-threaded via Akka actors
• Elasticsearch 1.7.X or 2.X
• Reliable, massively scalable, fast
• Scala & Java 8
• Fits functional and oop programming style for productivity
• Stable, Scalable, High Availability, Well Supported
* The ES json query looks like this:
* {
* "size": 20
* "query": {
* "bool": {
* "should": [
* {
* "terms": {
* "rate": ["0", "67", "4"]
* }
* },
* {
* "terms": {
* "buy": ["0", "32"],
* "boost": 2
* }
* },
* { // categorical boosts
* "terms": {
* "category": ["cat1"],
* "boost": 1.05
* }
* }
* ],
* "must": [ // categorical filters
* {
* "terms": {
* "category": ["cat1"],
* "boost": 0
* }
* },
* {
* "must_not": [//blacklisted items
* {
* "ids": {
* "values": ["items-id1", "item-id2", ...]
* }
* },
* {
* "constant_score": {// date in query must fall between the expire and avqilable dates of an item
* "filter": {
* "range": {
* "availabledate": {
* "lte": "2015-08-30T12:24:41-07:00"
* }
* }
* },
* "boost": 0
* }
* },
* {
* "constant_score": {// date range filter in query must be between these item property values
* "filter": {
* "range" : {
* "expiredate" : {
* "gte": "2015-08-15T11:28:45.114-07:00"
* "lt": "2015-08-20T11:28:45.114-07:00"
* }
* }
* }, "boost": 0
* }
* },
* {
* "constant_score": { // this orders popular items for backfill
* "filter": {
* "match_all": {}
* },
* "boost": 0.000001 // must have as least a small number to be boostable
* }
* }
* }
* }
* }
*
An example Elasticsearch query on a multi-
field index created from the output of the CCO
engine. The index includes about 90% of the
data in the “whole enchilada” equation.
This executes in 50ms on a non-cached
cluster and ~26ms on an unoptimized cluster.

Weitere ähnliche Inhalte

Was ist angesagt?

Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender Systems
James Kirk
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Carlos Castillo (ChaTo)
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Xavier Amatriain
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
Justin Basilico
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
Amit Sharma
 
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
Zachary Schendel
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
Justin Basilico
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
ScyllaDB
 
Data science Framework
Data science FrameworkData science Framework
Data science Framework
Renan Moreira de Oliveira
 
A Hybrid Recommendation system
A Hybrid Recommendation systemA Hybrid Recommendation system
A Hybrid Recommendation system
Pranav Prakash
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
Justin Basilico
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
SAIFUR RAHMAN
 
Measuring Data Quality with DataOps
Measuring Data Quality with DataOpsMeasuring Data Quality with DataOps
Measuring Data Quality with DataOps
Steven Ensslen
 
Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018
Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018
Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018
Massimo Quadrana
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerce
Alexander Konduforov
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
Liang Xiang
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
C. Scyphers
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
Altinity Ltd
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Xavier Amatriain
 
Active Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a SurveyActive Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a Survey
University of Bergen
 

Was ist angesagt? (20)

Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Data science Framework
Data science FrameworkData science Framework
Data science Framework
 
A Hybrid Recommendation system
A Hybrid Recommendation systemA Hybrid Recommendation system
A Hybrid Recommendation system
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Measuring Data Quality with DataOps
Measuring Data Quality with DataOpsMeasuring Data Quality with DataOps
Measuring Data Quality with DataOps
 
Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018
Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018
Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018
 
Recommender systems for E-commerce
Recommender systems for E-commerceRecommender systems for E-commerce
Recommender systems for E-commerce
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Active Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a SurveyActive Learning in Collaborative Filtering Recommender Systems : a Survey
Active Learning in Collaborative Filtering Recommender Systems : a Survey
 

Ähnlich wie The Universal Recommender

Discovery
DiscoveryDiscovery
Discovery
Pat Ferrel
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
Stanley Wang
 
Big data certification training mumbai
Big data certification training mumbaiBig data certification training mumbai
Big data certification training mumbai
TejaspathiLV
 
Best data science courses in pune
Best data science courses in puneBest data science courses in pune
Best data science courses in pune
prathyusha1234
 
best online data science courses
best online data science coursesbest online data science courses
best online data science courses
prathyusha1234
 
Top data science institutes in hyderabad
Top data science institutes in hyderabadTop data science institutes in hyderabad
Top data science institutes in hyderabad
prathyusha1234
 
Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...
Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...
Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...
Dataconomy Media
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems -
Yousef Fadila
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Kris Jack
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systems
Aravindharamanan S
 
recommendation system techunique and issue
recommendation system techunique and issuerecommendation system techunique and issue
recommendation system techunique and issue
NutanBhor
 
case based recommendation approach for market basket data
case based recommendation approach for market basket datacase based recommendation approach for market basket data
case based recommendation approach for market basket data
mniranjanmurthy
 
Lec7 collaborative filtering
Lec7 collaborative filteringLec7 collaborative filtering
Lec7 collaborative filtering
Aravindharamanan S
 
Quick introduction to the click-through filter
Quick introduction to the click-through filterQuick introduction to the click-through filter
Quick introduction to the click-through filter
pontneo
 
Recommender Systems in a nutshell
Recommender Systems in a nutshellRecommender Systems in a nutshell
Recommender Systems in a nutshell
Konstantin Savenkov
 
Use of data science in recommendation system
Use of data science in  recommendation systemUse of data science in  recommendation system
Use of data science in recommendation system
AkashPatil334
 
Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation Demystified
Betclic Everest Group Tech Team
 
Recommender lecture
Recommender lectureRecommender lecture
Recommender lecture
Aravindharamanan S
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
Georgian Micsa
 
Demystifying Recommendation Systems
Demystifying Recommendation SystemsDemystifying Recommendation Systems
Demystifying Recommendation Systems
Rumman Chowdhury
 

Ähnlich wie The Universal Recommender (20)

Discovery
DiscoveryDiscovery
Discovery
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Big data certification training mumbai
Big data certification training mumbaiBig data certification training mumbai
Big data certification training mumbai
 
Best data science courses in pune
Best data science courses in puneBest data science courses in pune
Best data science courses in pune
 
best online data science courses
best online data science coursesbest online data science courses
best online data science courses
 
Top data science institutes in hyderabad
Top data science institutes in hyderabadTop data science institutes in hyderabad
Top data science institutes in hyderabad
 
Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...
Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...
Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep produc...
 
Recommandation systems -
Recommandation systems - Recommandation systems -
Recommandation systems -
 
Modern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in MendeleyModern Perspectives on Recommender Systems and their Applications in Mendeley
Modern Perspectives on Recommender Systems and their Applications in Mendeley
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systems
 
recommendation system techunique and issue
recommendation system techunique and issuerecommendation system techunique and issue
recommendation system techunique and issue
 
case based recommendation approach for market basket data
case based recommendation approach for market basket datacase based recommendation approach for market basket data
case based recommendation approach for market basket data
 
Lec7 collaborative filtering
Lec7 collaborative filteringLec7 collaborative filtering
Lec7 collaborative filtering
 
Quick introduction to the click-through filter
Quick introduction to the click-through filterQuick introduction to the click-through filter
Quick introduction to the click-through filter
 
Recommender Systems in a nutshell
Recommender Systems in a nutshellRecommender Systems in a nutshell
Recommender Systems in a nutshell
 
Use of data science in recommendation system
Use of data science in  recommendation systemUse of data science in  recommendation system
Use of data science in recommendation system
 
Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation Demystified
 
Recommender lecture
Recommender lectureRecommender lecture
Recommender lecture
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Demystifying Recommendation Systems
Demystifying Recommendation SystemsDemystifying Recommendation Systems
Demystifying Recommendation Systems
 

Kürzlich hochgeladen

introduction of Ansys software and basic and advance knowledge of modelling s...
introduction of Ansys software and basic and advance knowledge of modelling s...introduction of Ansys software and basic and advance knowledge of modelling s...
introduction of Ansys software and basic and advance knowledge of modelling s...
sachin chaurasia
 
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
jealousviolet
 
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
Srinivas Dukka
 
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTIONBITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
ssuser2b426d1
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
aslasdfmkhan4750
 
11 Top Cross Browser Testing Tools to Know About.pdf
11 Top Cross Browser Testing Tools to Know About.pdf11 Top Cross Browser Testing Tools to Know About.pdf
11 Top Cross Browser Testing Tools to Know About.pdf
kalichargn70th171
 
NBFC Software: Optimize Your Non-Banking Financial Company
NBFC Software: Optimize Your Non-Banking Financial CompanyNBFC Software: Optimize Your Non-Banking Financial Company
NBFC Software: Optimize Your Non-Banking Financial Company
NBFC Softwares
 
ThaiPy meetup - Indexes and Django
ThaiPy meetup - Indexes and DjangoThaiPy meetup - Indexes and Django
ThaiPy meetup - Indexes and Django
akshesh doshi
 
dachnug51 - All you ever wanted to know about domino licensing.pdf
dachnug51 - All you ever wanted to know about domino licensing.pdfdachnug51 - All you ever wanted to know about domino licensing.pdf
dachnug51 - All you ever wanted to know about domino licensing.pdf
DNUG e.V.
 
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
ThousandEyes
 
The Ultimate Guide to Phone Spy Apps: Everything You Need to Know
The Ultimate Guide to Phone Spy Apps: Everything You Need to KnowThe Ultimate Guide to Phone Spy Apps: Everything You Need to Know
The Ultimate Guide to Phone Spy Apps: Everything You Need to Know
onemonitarsoftware
 
Building infrastructure with code_ A deep dive into CDK for IaC in Java.pdf
Building infrastructure with code_ A deep dive into CDK for IaC in Java.pdfBuilding infrastructure with code_ A deep dive into CDK for IaC in Java.pdf
Building infrastructure with code_ A deep dive into CDK for IaC in Java.pdf
mohitd6
 
NYGGS 360: A Complete ERP for Construction Innovation
NYGGS 360: A Complete ERP for Construction InnovationNYGGS 360: A Complete ERP for Construction Innovation
NYGGS 360: A Complete ERP for Construction Innovation
NYGGS Construction ERP Software
 
WEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service ProvidersWEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service Providers
Severalnines
 
Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...
Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...
Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...
bhumivarma35300
 
當測試開始左移
當測試開始左移當測試開始左移
當測試開始左移
Jersey (CHE-PING) Su
 
ENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentationENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentation
sofiafernandezon
 
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
ashiklo9823
 
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
rachitkumar09887
 

Kürzlich hochgeladen (20)

introduction of Ansys software and basic and advance knowledge of modelling s...
introduction of Ansys software and basic and advance knowledge of modelling s...introduction of Ansys software and basic and advance knowledge of modelling s...
introduction of Ansys software and basic and advance knowledge of modelling s...
 
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
VVIP Girls Call Mumbai 9910780858 Provide Best And Top Girl Service And No1 i...
 
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
AWS DevOps-Tutorial CHANAKYA SRIYAN DUKKA.
 
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTIONBITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
BITCOIN HEIST RANSOMEWARE ATTACK PREDICTION
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
 
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
Independent Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class H...
 
11 Top Cross Browser Testing Tools to Know About.pdf
11 Top Cross Browser Testing Tools to Know About.pdf11 Top Cross Browser Testing Tools to Know About.pdf
11 Top Cross Browser Testing Tools to Know About.pdf
 
NBFC Software: Optimize Your Non-Banking Financial Company
NBFC Software: Optimize Your Non-Banking Financial CompanyNBFC Software: Optimize Your Non-Banking Financial Company
NBFC Software: Optimize Your Non-Banking Financial Company
 
ThaiPy meetup - Indexes and Django
ThaiPy meetup - Indexes and DjangoThaiPy meetup - Indexes and Django
ThaiPy meetup - Indexes and Django
 
dachnug51 - All you ever wanted to know about domino licensing.pdf
dachnug51 - All you ever wanted to know about domino licensing.pdfdachnug51 - All you ever wanted to know about domino licensing.pdf
dachnug51 - All you ever wanted to know about domino licensing.pdf
 
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
Cisco Live Announcements: New ThousandEyes Release Highlights - July 2024
 
The Ultimate Guide to Phone Spy Apps: Everything You Need to Know
The Ultimate Guide to Phone Spy Apps: Everything You Need to KnowThe Ultimate Guide to Phone Spy Apps: Everything You Need to Know
The Ultimate Guide to Phone Spy Apps: Everything You Need to Know
 
Building infrastructure with code_ A deep dive into CDK for IaC in Java.pdf
Building infrastructure with code_ A deep dive into CDK for IaC in Java.pdfBuilding infrastructure with code_ A deep dive into CDK for IaC in Java.pdf
Building infrastructure with code_ A deep dive into CDK for IaC in Java.pdf
 
NYGGS 360: A Complete ERP for Construction Innovation
NYGGS 360: A Complete ERP for Construction InnovationNYGGS 360: A Complete ERP for Construction Innovation
NYGGS 360: A Complete ERP for Construction Innovation
 
WEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service ProvidersWEBINAR SLIDES: CCX for Cloud Service Providers
WEBINAR SLIDES: CCX for Cloud Service Providers
 
Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...
Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...
Independent Girls call Service Pune 000XX00000 Provide Best And Top Girl Serv...
 
當測試開始左移
當測試開始左移當測試開始左移
當測試開始左移
 
ENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentationENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentation
 
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
Vip Girls Call ServiCe Hyderabad 0000000000 Pooja Best High Class Hyderabad A...
 
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
Agra Girls Call Agra 0X0000000X Unlimited Short Providing Girls Service Avail...
 

The Universal Recommender

  • 3. A LITTLE HISTORY: MOTIVATION • Coocurrence: Mahout 2012 • Factorized ALS: Mahout then Spark’s MLlib • Experience with then current Recommender Tech • Evaluation and Experiments • Could only use “purchase” data threw out 100x view data • No “realtime” • too many edge cases, users that had no recommendations • didn’t adapt to metadata/content of items • Lots of discussions with Ted Dunning, Sean Owen, Sebastian Schelter, Pat Ferrel (me) • Cooccurrence and cross-cooccurrence led to many innovations
  • 4. ANATOMY OF A RECOMMENDATION PERSONALIZED r = recommendations hp = a user’s history of some action (purchase for instance) P = the history of all users’ primary action rows are users, columns are items (PtP) = compares column to column using log-likelihood based correlation test r = (PtP)hp
  • 5. COOCCURRENCE WITH LLR • Let’s call (PtP) an indicator matrix for some primary action like purchase • Rows = items, columns = items, element = similarity/correlation score • The score is row compared to column using a “similarity” or “correlation” metric • Log-Likelihood Ratio (LLR) finds important/correlating cooccurrences and filters out the rest—a major improvement in quality over simple cooccurrence or other similarity metrics. • Experiments on real-world data show LLR is significantly better than other similarity metrics * http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
  • 6. LLR AND SIMILARITY METRICS PRECISION (MAP@K) Higher is better MAP@1 MAP@2 MAP@3 MAP@4 MAP@5 MAP@6 MAP@7 MAP@8 MAP@9 MAP@10 Similarity Metrics Mean Average Precision Mahout Cooccurrence Recommender with E-Commerce Data Cosine Tanimoto Log-likelihood
  • 7. FROM COOCCURRENCE TO RECOMMENDATION • This actually means to take the user’s history hp and compare it to rows of the cooccurrence matrix (PtP) • TF-IDF weighting of cooccurrence would be nice to mitigate the undue influence of popular items • Find items nearest to the user’s history • Sort these by similarity strength and keep only the highest —you have recommendations • Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF? r = (PtP)hp hp user1: [item2, item3] (PtP) item1: [item2, item3] item2: [item1, item3, item95] item3: […] find item that most closely matches the user’s history item1 !
  • 8. FROM COOCCURRENCE TO RECOMMENDATION • This actually means to take the user’s history hp and compare it to rows of the cooccurrence matrix (PtP) • TF-IDF weighting of cooccurrence would be nice to mitigate the undue influence of popular items • Find items nearest to the user’s history • Sort these by similarity strength and keep only the highest —you have recommendations • Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF? • That’s exactly what a search engine does! r = (PtP)hp hp user1: [item2, item3] (PtP) item1: [item2, item3] item2: [item1, item3, item95] item3: […] find item that most closely matches the user’s history item1 !
  • 9. USER HISTORY + COOCCURRENCES + SEARCH = RECOMMENDATIONS • The final calculation uses hp as the query on the Cooccurrence Matrix (PtP), returns a ranked set of items • Query is a “similarity” query, not relational or key based fetch • Uses Search Engine as Cosine-based K-Nearest Neighbor (KNN) Engine with norms and TF-IDF weighting • Highly optimized for serving these queries in realtime • Several (Solr, Elasticsearch) have High Availability, massively scalable clustered auto-sharding features like the best of NoSQL DBs. r = (PtP)hp
  • 10. THE UNIVERSAL RECOMMENDER: THE BREAKTHROUGH IDEA • Virtually all existing collaborative filtering type recommenders use only one indicator of preference • The theory doesn’t stop there! • Virtually anything we know about the user can be used to improve recommendations—purchase, view, category- preference, location-preference, device-preference, gender… r = (PtP)hp r = (PtP)hp + (PtV)hv + (PtC)hc + …
  • 11. THE UNIVERSAL RECOMMENDER: CORRELATED CROSS-OCCURRENCE • Virtually all existing collaborative filtering type recommenders use only one indicator of preference • The theory doesn’t stop there! • Virtually anything we know about the user can be used to improve recommendations—purchase, view, category- preference, location-preference, device-preference, gender… CROSS-OCCURRENCE r = (PtP)hp r = (PtP)hp + (PtV)hv + (PtC)hc + …
  • 12. • Comparing the history of the primary action to other actions finds actions that lead to the one you want to recommend • Given strong data about user preferences on a general population we can also use • items clicked • terms searched • categories viewed • items shared • people followed • items disliked (yes dislikes may predict likes) • location • device preference • gender • age bracket • Virtually any anything we know about the population can be tested for correlation and used to predict a particular users preferences CORRELATED CROSS-OCCURRENCE: SO WHAT?
  • 13. CORRELATED CROSS-OCCURRENCE; ADDING CONTENT MODELS • Collaborative Topic Filtering • Use Latent Dirichlet Allocation (LDA) to model topics directly from the textual content • Calculate based on Word2Vec type word vectors instead of bag-of- words analysis to boost quality • Create cross-occurrence indicators from topics the user has preferred • Repeat periodically • Entity Preferences: • Use a Named Entity Recognition (NER) system to find entities in textual content • Create cross-occurrence indicators for these entities • Entities and Topics are long lived and richly describe user interests, these are very good for use in the Universal Recommender.
  • 14. THE UNIVERSAL RECOMMENDER ADDING CONTENT-BASED RECS Indicators can also be based on content similarity (TTt) is a calculation that compares every 2 documents to each other and finds the most similar—based upon content alone r = (TTt)ht + l*L …
  • 15. INDICATOR TYPES • Cooccurrence • Find the best indicator of a user preference for the item type to be recommended: examples are “buy”, “read”, “video_watch”, “share”, “follow”, “like”. • Cross-occurrence • Item metadata as “user” preference, for example: treat item category as a user category-preferences • Calculated from user actions on any data that may give information about user— category-preferences, search terms, gender, location • Create with Mahout-Samsara SimilarityAnalysis.cooccurrence • Content or metadata • Content text, tags, categories, description text, anything describing an item • Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity • Intrinsic • Popularity rank, geo-location, anything describing an item • Some may be derived from usage data like popularity rank, or hotness • Is a known or specially calculated property of the item
  • 16. THE UNIVERSAL RECOMMENDER AKA THE WHOLE ENCHILADA “Universal” means one query on all indicators at once Unified query: purchase-correlator: users-history-of-purchases view-correlator: users-history-of-views category-correlator: users-history-of-categories-viewed tags-correlator: users-history-of-purchases geo-location-correlator: users-location … r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
  • 17. THE UNIVERSAL RECOMMENDER AKA THE WHOLE ENCHILADA “Universal” means one query on all correlators at once Once indicators are indexed as search fields this entire equation is a single query Fast! r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
  • 18. THE UNIVERSAL RECOMMENDER: BETTER USER COVERAGE • Any number of user actions—entire user clickstream • Metadata—from user profile or items • Context—on-site, time, location • Content—unstructured text or semi-structured categorical • Mixes any number of “indicators” to increase quality or tune to specific context • Solution to the “cold-start” problem—items with too short a lifespan or new users with no history • Can recommend to new users using realtime history • Can use new interaction data from any user in realtime • 95% implemented in Universal Recommender v0.3.0—most current release All Users Universal Recommender ALS or 1-action Recommenders
  • 19. POLISH THE APPLE • Dithering for auto-optimize via explore-exploit: Randomize some returned recs, if they are acted upon they become part of the new training data and are more likely to be recommended in the future • Visibility control: • Don’t show dups, blacklist items already shown • Filter items the user has already seen • Zero-downtime Deployment: deploy prediction server once then hot-swap new index when ready. • Generate some intrinsic indicators like hot, popular— helps solve the “cold-start” problem • Asymmetric train vs query—query with most recent user data, train on all historical data
  • 21. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE Application query and recommendations MODEL CREATION background events & item metadata PredictionIO SDK or REST PredictionIO EventServer DATA IN Universal Recommender Engine PredictionIO REST Serving Component Elasticsearch Spark MODEL UPDATE HBase user history itemProperties realtime RECOMMENDATION SERVING Spark-Mahout’s Correlation Engine
  • 22. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE Application query and recommendations MODEL CREATION events & item metadata PredictionIO SDK or REST PredictionIO EventServer DATA IN Universal Recommender Engine PredictionIO REST Serving Component Elasticsearch Spark MODEL UPDATE HBase user history itemProperties backgroundREALTIME RECOMMENDATION SERVING Spark-Mahout’s Correlation Engine
  • 23. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE Application query and recommendations events & item metadata RECOMMENDATION SERVING PredictionIO SDK or REST PredictionIO EventServer DATA IN Universal Recommender Engine PredictionIO REST Serving Component Spark-Mahout’s Correlation Engine Elasticsearch Spark MODEL UPDATE HBase user history itemProperties BACKGROUNDREALTIME
  • 25. TECH STACK • Hbase 1.X • Postgres, MySQL, or other JDBC possible • Spark 1.6.X • Fast, massively scalable, seems like the “winner” • HDFS 2.6—Hadoop Distributed File System • Reiable, massively scalable, the defacto standard • Spray • Supplies REST endpoints, muti-threaded via Akka actors • Elasticsearch 1.7.X or 2.X • Reliable, massively scalable, fast • Scala & Java 8 • Fits functional and oop programming style for productivity • Stable, Scalable, High Availability, Well Supported
  • 26. * The ES json query looks like this: * { * "size": 20 * "query": { * "bool": { * "should": [ * { * "terms": { * "rate": ["0", "67", "4"] * } * }, * { * "terms": { * "buy": ["0", "32"], * "boost": 2 * } * }, * { // categorical boosts * "terms": { * "category": ["cat1"], * "boost": 1.05 * } * } * ], * "must": [ // categorical filters * { * "terms": { * "category": ["cat1"], * "boost": 0 * } * }, * { * "must_not": [//blacklisted items * { * "ids": { * "values": ["items-id1", "item-id2", ...] * } * }, * { * "constant_score": {// date in query must fall between the expire and avqilable dates of an item * "filter": { * "range": { * "availabledate": { * "lte": "2015-08-30T12:24:41-07:00" * } * } * }, * "boost": 0 * } * }, * { * "constant_score": {// date range filter in query must be between these item property values * "filter": { * "range" : { * "expiredate" : { * "gte": "2015-08-15T11:28:45.114-07:00" * "lt": "2015-08-20T11:28:45.114-07:00" * } * } * }, "boost": 0 * } * }, * { * "constant_score": { // this orders popular items for backfill * "filter": { * "match_all": {} * }, * "boost": 0.000001 // must have as least a small number to be boostable * } * } * } * } * } * An example Elasticsearch query on a multi- field index created from the output of the CCO engine. The index includes about 90% of the data in the “whole enchilada” equation. This executes in 50ms on a non-cached cluster and ~26ms on an unoptimized cluster.