The Guide to Predictive Analytics 
A FINDERBOTS.COM 
PRODUCTION 
DISCOVERY
FINDERBOTS.COM 
• Independent Consulting Service 
• Specialize in Big-data Predictive Analytics 
• Recommenders 
• Personalized discovery 
• Search optimization and personalization 
• Committer to open source machine learning projects 
(Apache Mahout, Finderbots Solr-recommender) 
Pat Ferrel 
pat@finderbots.com 
DISCOVERY: 
• Browse 
• editorial categories 
• user generated content—tags, hashtags, comments, likes, shares 
• realtime predictive analytics driven “concepts” 
• Search 
• keywords are not enough 
• inferred keywords (from usage data) 
• personalized search (from collaborative filtering data, just like Google) 
• Recommendations 
• profile based, content based, usage based 
• entire catalog can be skewed by predictive analytics 
• required 
• why? 
DISCOVERY: 
• why? 
• Netflix: 80% of views 
• Amazon: 60% of sales 
• Yahoo News: 40% increase in TOS (time on site) 
Better Discovery = Better Engagement 
NOT JUST 
RECOMMENDATIONS 
Pervasive Content 
Personalization 
RECOMMENDATIONS CAN DO 
WHAT SEARCH CANNOT 
• Search for “leather laptop bag” 
• Hmm, some are ok but not quite right 
• Put some in “wishlist” 
• Look at recommendations 
• Add and remove as you like… 
…things improve! 
• Never knew I wanted a 
“Messenger bag with a leather strap” 
• Didn’t know what one was 
so would never have searched for it
SEARCH THAT KNOWS WHAT 
THE USER MEANS 
• Search for “leather laptop bag” 
• Buy “leather messenger bag with leather strap” 
• With the right usage data we can infer “messenger bag” = 
“laptop bag” 
• Now the words I know will get me the object I want, even though I didn’t know how to ask for it 
THE CUTTING EDGE IN 
PREDICTIVE ANALYTICS 
• Uses any number of user actions—entire user clickstream 
• Uses metadata—from user profile or item 
• Uses context—on-site, time, location 
• Uses content—unstructured text or semi-structured 
• Personalizes recommendations even when content-based 
• Mixes any number of “indicators” to increase quality or tune to 
specific context 
• Solves the “cold-start” problem—items with too short a lifespan 
• Can recommend to new users in realtime 
• Improves Search 
• Personalizes Search 
THE GOOD NEWS 
• 90% of these features come from 3 
technologies 
• Search engine (Solr, Elasticsearch) 
• Mahout 
• Spark 
• 90% of the flexibility comes at runtime 
via query—not from new analytical 
models. 
THE UNIVERSAL 
RECOMMENDER 
Technical Overview
ARCHITECTURE 
[Diagram: realtime and background data flow. Labeled components:] 
• Application: action logging to HDFS (action logs); query to the Search Engine 
• Catalog creation and editing: content or metadata = intrinsic indicators 
• Spark with Mahout 1.0 spark-itemsimilarity: cooccurrence indicators 
• Spark with Mahout 1.0 spark-rowsimilarity 
• Scalable Store (HDFS or DB): indicators 
• Search Engine: index 
ANATOMY OF A 
RECOMMENDATION 
r = hp[PtP] 
r = recommendations 
hp = a user’s history of some primary action (purchase for instance) 
P = the history of all users’ primary action; rows are users, columns are items 
[PtP] = compares column to column using log-likelihood based cooccurrence 
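
The formula can be tried on toy data. The following is a minimal sketch, not the Mahout implementation: it uses raw cooccurrence counts in place of the log-likelihood weighting, and the item names, the matrix values, and the choice to zero the diagonal are all assumptions made for illustration.

```python
# Toy data: rows are users, columns are items (boolean purchase history).
# All item names and values are made up for illustration.
items = ["laptop bag", "messenger bag", "leather strap", "phone case"]
P = [
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 1, 1, 0],  # user 2
    [0, 0, 0, 1],  # user 3
]


def cooccurrence(A, B):
    """Return A^T B: entry [i][j] counts users who did A-item i and B-item j."""
    n_a, n_b = len(A[0]), len(B[0])
    return [
        [sum(ra[i] * rb[j] for ra, rb in zip(A, B)) for j in range(n_b)]
        for i in range(n_a)
    ]


def recommend(h, indicator):
    """Return h * indicator: one score per column (item) of the indicator."""
    return [
        sum(h[i] * indicator[i][j] for i in range(len(h)))
        for j in range(len(indicator[0]))
    ]


PtP = cooccurrence(P, P)
for i in range(len(PtP)):
    PtP[i][i] = 0  # assumption: an item should not recommend itself

hp = [1, 0, 0, 0]  # this user has only bought the laptop bag
r = recommend(hp, PtP)
ranked = sorted(zip(items, r), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

With this data the messenger bag ranks first, because two of the users who bought the laptop bag also bought it.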
THE UNIVERSAL 
RECOMMENDER 
• Virtually all collaborative filtering type 
recommenders can use only one indicator of 
preference—one action 
r = hp[PtP] 
• But the theory doesn’t stop there 
r = hp[PtP] + hv[VtP] + hc[CtP] + … 
• Virtually all user actions can be used to improve 
recommendations—purchase, view, category 
view… 
A COOCCURRENCE 
INDICATOR 
• [PtP] is an indicator matrix for some primary action 
like purchase 
• In P, rows = users, columns = items, boolean data 
• Compares cooccurring interactions using the log-likelihood ratio (LLR), giving column-wise similarity 
• LLR finds important cooccurrences and filters out 
the rest 
• Comparing the history of the primary action to 
other actions finds the secondary actions that lead 
to the primary—the effect is to scrub secondary 
actions of non-meaningful ones
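
As a rough illustration of the LLR test, the sketch below scores one pair of items from a 2x2 table of user counts. It is the standard G-test form of the log-likelihood ratio, written from scratch; the function name and the count layout (k11, k12, k21, k22) are assumptions, and it is not taken from Mahout's code.

```python
from math import log


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G-test) for a 2x2 contingency table of counts.

    k11: users who acted on both items
    k12: users who acted on item A but not B
    k21: users who acted on item B but not A
    k22: users who acted on neither
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    score = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing to the sum
            score += k * log(k * total / (rows[i] * cols[j]))
    return 2.0 * score


# Independent items: cooccurrence is exactly what chance predicts.
print(llr(10, 10, 10, 10))
# Strongly associated items: always together or both absent.
print(llr(10, 0, 0, 10))
```

A score near zero means the cooccurrence is no more than chance would produce, so the pair can be dropped; a large score marks a pair worth keeping as an indicator. Where to put the cutoff is a tuning choice.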
CROSS-COOCCURRENCE 
INDICATORS 
hi = a user’s history of an action 
P, V, C = all users’ history of some action (purchase, view, category view) 
[XtP] = the pairwise comparison of column to column; the comparison may be across two actions (X = V, C, …) but is always anchored by the primary action P 
r = hp[PtP] + hv[VtP] + hc[CtP] + … 
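
A small sketch of the summed formula, assuming toy boolean matrices and raw counts with no LLR filtering. The matrices, the three-item catalog, and the helper names are made up for illustration. It shows that a user with no purchases still gets scores from views alone.

```python
def cross(X, P):
    """Return X^T P: counts of users who did X-item i and bought P-item j."""
    return [
        [sum(rx[i] * rp[j] for rx, rp in zip(X, P)) for j in range(len(P[0]))]
        for i in range(len(X[0]))
    ]


def score(h, indicator):
    """Return h * indicator: one score per purchasable item."""
    return [
        sum(h[i] * indicator[i][j] for i in range(len(h)))
        for j in range(len(indicator[0]))
    ]


# Rows are the same four users in both matrices; columns are items a, b, c.
P = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]  # purchases
V = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 1]]  # views

PtP = cross(P, P)
VtP = cross(V, P)

hp = [0, 0, 0]  # a user with no purchases yet
hv = [0, 0, 1]  # ...who has viewed item c

# r = hp[PtP] + hv[VtP]
r = [a + b for a, b in zip(score(hp, PtP), score(hv, VtP))]
print(r)
```

The purchase term contributes nothing for this user, yet the view term still produces scores over purchasable items, because [VtP] is anchored by the primary action.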
CROSS-COOCCURRENCE 
SO WHAT? 
• The entire user’s clickstream can be used 
• Items clicked 
• Terms searched 
• Categories viewed 
• Items shared 
• People followed 
• Items liked or disliked 
• Video watched 
• Virtually any action the user can take makes it 
easier to predict what they will like in the future. 
FROM INDICATOR TO 
RECOMMENDATION 
r = hp[PtP] 
• This actually means to take the user’s history hp and 
compare it to rows of the indicator matrix [PtP] 
• TF-IDF weighting of indicators would be nice to mitigate 
popular items 
• Query the indicator with user history 
• Sort these by similarity strength and keep only the highest 
—you have recommendations 
• Sound familiar? 
• That is exactly what a search engine does 
—except for calculating indicators 
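
The search-engine analogy can be sketched in a few lines. This is a toy in-memory stand-in, not Solr or Elasticsearch: each item is a "document" whose indicator field lists the items it significantly cooccurs with, and the user's history is the query. The index contents and the simple IDF formula are assumptions; real engines use their own TF-IDF or BM25 variants.

```python
from math import log

# Hypothetical indicator "documents": for each item, the items whose purchase
# is a significant cooccurrence (as if LLR filtering had already been done).
index = {
    "messenger bag": ["laptop bag", "leather strap"],
    "laptop sleeve": ["laptop bag"],
    "phone case": ["laptop bag", "phone charger"],
    "leather strap": ["messenger bag"],
}


def idf(term, index):
    """Inverse document frequency: rarer indicator terms weigh more."""
    df = sum(1 for terms in index.values() if term in terms)
    return log(1 + len(index) / df) if df else 0.0


def query(history, index, top_n=3):
    """Score each item by the IDF-weighted overlap with the user's history."""
    scores = {}
    for item, terms in index.items():
        if item in history:
            continue  # skip items the user already has
        s = sum(idf(t, index) for t in history if t in terms)
        if s > 0:
            scores[item] = s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recs = query(["laptop bag", "leather strap"], index)
print(recs)
```

Here "laptop bag" appears in three indicator lists, so it counts for less than "leather strap", which appears in one. That is the mitigation of popular items that the TF-IDF bullet refers to.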
INDICATOR TYPES 
• Cooccurrence and cross-cooccurrence 
• Calculated from user actions as discussed 
• Create with Mahout 1.0 spark-itemsimilarity 
• Content or metadata 
• Tags, categories, description text, anything describing an item 
• Create with Mahout 1.0 spark-rowsimilarity 
• Intrinsic 
• Tags, genres, categories, popularity rank, geo-location, anything 
describing an item 
• Some may be derived from usage data like popularity rank, or hotness 
• Is a known or specially calculated property of the item 
CONTENT INDICATORS 
• Finds similar items based on their content—not which users preferred them 
• Examples: text descriptions, tags, categories, genres 
r = ht[TTt] 
r = recommended items, based on tags 
ht = a user’s history of an action on items with 
tags 
[TTt] = item similarity based on similar tags—a content indicator 
• This personalizes even content based recommendations 
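
A minimal sketch of r = ht[TTt], assuming boolean tags so that each entry of [TTt] is simply the number of tags two items share. The catalog, the tags, and the decision to zero self-similarity are made up for illustration; a production system would weight or filter the similarities.

```python
# Hypothetical catalog with tags per item.
items = ["messenger bag", "laptop bag", "phone case", "wallet"]
tags = {
    "messenger bag": {"leather", "bag", "strap"},
    "laptop bag": {"leather", "bag"},
    "phone case": {"plastic"},
    "wallet": {"leather"},
}


def tag_similarity(tags, items):
    """Return T T^t for boolean tags: shared-tag counts between item pairs."""
    return [
        [0 if a == b else len(tags[a] & tags[b]) for b in items]
        for a in items
    ]


TTt = tag_similarity(tags, items)

ht = [0, 1, 0, 0]  # the user acted on "laptop bag" only
r = [
    sum(ht[i] * TTt[i][j] for i in range(len(items)))
    for j in range(len(items))
]
print(dict(zip(items, r)))
```

The similarity matrix is the same for everyone, but the query vector ht comes from one user's history, which is what makes the content-based result personal.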
INTRINSIC INDICATORS 
• Attributes of items 
• Genre, subject, category, tags 
• Specially calculated based on business rules 
• Popularity, hotness 
• Based on demographics 
• Preferred by people using mobile access 
• Preferred by city dwellers 
• Preferred by people in warmer climes 
• Query by value—not user history 
r = v*I 
THE UNIVERSAL 
RECOMMENDER 
“Unified” means one query on all indicators at once 
r = hp[PtP] + hv[VtP] + hc[CtP] + ht[TTt] + l*L … 
Unified query: 
query: users-history-of-purchases; field: purchase 
query: users-history-of-views; field: view 
query: users-history-of-categories-viewed; field: category 
query: users-history-of-purchases; field: tags 
query: users-location; field: geo-location-preferred 
… 
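
One way the unified query might look against a search engine is sketched below as an Elasticsearch-style bool/should request body, built as a plain Python dict. The field names follow the slide; the item IDs, the boost values, the helper name, and the assumption that each field is indexed so that whitespace-separated IDs become separate tokens are all hypothetical.

```python
import json


def unified_query(user, boosts=None, size=10):
    """Build one bool/should query from several per-field histories.

    `user` maps an indexed field name to the list of IDs to match in it.
    `boosts` optionally maps a field name to a weight for that clause.
    """
    boosts = boosts or {}
    should = []
    for field, values in user.items():
        if not values:
            continue  # nothing known for this field, so no clause
        should.append({
            "match": {
                field: {
                    "query": " ".join(values),
                    "boost": boosts.get(field, 1.0),
                }
            }
        })
    return {"size": size, "query": {"bool": {"should": should}}}


body = unified_query(
    {
        "purchase": ["item12", "item7"],   # user's history of purchases
        "view": ["item3", "item12"],       # user's history of views
        "category": ["bags"],              # categories viewed
        "tags": ["item12", "item7"],       # purchases again, against tags
        "geo-location-preferred": [],      # unknown location: clause skipped
    },
    boosts={"purchase": 2.0},
)
print(json.dumps(body, indent=2))
```

Because every indicator is just another field, changing the mix (dropping a clause, raising a boost) is a query-time decision and needs no retraining.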
ONE OR MANY 
• One query—one trip to one scalable search 
engine 
• Many flavors—customize in the query 
• Customize for content context 
• Customize for user context 
• Profile, location, time, … 
• Customize for special indicators 
• Trending, hot, new, popular 
• All personalized 
POLISH THE APPLE 
• Auto-optimize via explore-exploit (important): 
Randomize some returned recs; if they are acted upon, they become part of the 
new training data and are more likely to be recommended in the future 
• Visibility control: 
• Don’t show dups, or show dups at some rate 
• Filter items the user has already seen 
• Generate some intrinsic indicators like hotness, popularity—helps 
solve the “cold-start” problem 
• Asymmetric train vs query management—for instance query with 
most recent actions, train on all ingested 
• On-demand cross-validation scoring for tuning purposes 
• A/B testing integration with explore-exploit 
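
The first two bullets can be sketched together: filter what the user has already seen, then hand a few slots to random catalog items. This is a simple epsilon-style randomization under assumed names and rates (`polish`, `explore_rate`), not a full bandit algorithm and not the author's implementation.

```python
import random


def polish(ranked, seen, catalog, n=5, explore_rate=0.2, rng=None):
    """Filter already-seen items, then swap a few slots for random ones."""
    rng = rng or random.Random()
    recs = [item for item in ranked if item not in seen][:n]
    # Candidates for exploration: unseen items that are not already shown.
    pool = [item for item in catalog if item not in seen and item not in recs]
    for slot in range(len(recs)):
        if pool and rng.random() < explore_rate:
            # Exploration: show something the model did not rank highly.
            recs[slot] = pool.pop(rng.randrange(len(pool)))
    return recs


catalog = [f"item{i}" for i in range(20)]
ranked = catalog[:10]            # pretend these came back from the query
seen = {"item0", "item3"}        # the user has already viewed these

print(polish(ranked, seen, catalog, explore_rate=0.0))
print(polish(ranked, seen, catalog, explore_rate=0.5, rng=random.Random(42)))
```

Any explored item the user acts on lands in the action log, so the next training run can pick it up as a regular indicator.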

Weitere ähnliche Inhalte

Andere mochten auch

Collaborative Filtering Recommender Based on Co-occurrence Matrix
Collaborative Filtering Recommender Based on Co-occurrence MatrixCollaborative Filtering Recommender Based on Co-occurrence Matrix
Collaborative Filtering Recommender Based on Co-occurrence MatrixMarjan Sterjev
 
PredictionIO - The 1st International Conference on Predictive APIs and Apps
PredictionIO - The 1st International Conference on Predictive APIs and AppsPredictionIO - The 1st International Conference on Predictive APIs and Apps
PredictionIO - The 1st International Conference on Predictive APIs and Appspredictionio
 
[2C2]PredictionIO
[2C2]PredictionIO[2C2]PredictionIO
[2C2]PredictionIONAVER D2
 
Machine Learning Software Design Pattern with PredictionIO
Machine Learning Software Design Pattern with PredictionIOMachine Learning Software Design Pattern with PredictionIO
Machine Learning Software Design Pattern with PredictionIOTuri, Inc.
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...predictionio
 
PredictionIO - Scalable Machine Learning Architecture
PredictionIO - Scalable Machine Learning ArchitecturePredictionIO - Scalable Machine Learning Architecture
PredictionIO - Scalable Machine Learning Architecturepredictionio
 
The Science and the Magic of User Feedback for Recommender Systems
The Science and the Magic of User Feedback for Recommender SystemsThe Science and the Magic of User Feedback for Recommender Systems
The Science and the Magic of User Feedback for Recommender SystemsXavier Amatriain
 
The Cloud-natives are RESTless @ JavaOne
The Cloud-natives are RESTless @ JavaOneThe Cloud-natives are RESTless @ JavaOne
The Cloud-natives are RESTless @ JavaOneKonrad Malawski
 
Practical Akka HTTP - introduction
Practical Akka HTTP - introductionPractical Akka HTTP - introduction
Practical Akka HTTP - introductionŁukasz Sowa
 
An Introduction to Akka http
An Introduction to Akka httpAn Introduction to Akka http
An Introduction to Akka httpKnoldus Inc.
 
Securing Microservices using Play and Akka HTTP
Securing Microservices using Play and Akka HTTPSecuring Microservices using Play and Akka HTTP
Securing Microservices using Play and Akka HTTPRafal Gancarz
 
PredictionIO – A Machine Learning Server in Scala – SF Scala
PredictionIO – A Machine Learning Server in Scala – SF ScalaPredictionIO – A Machine Learning Server in Scala – SF Scala
PredictionIO – A Machine Learning Server in Scala – SF Scalapredictionio
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPdatamantra
 

Andere mochten auch (13)

Collaborative Filtering Recommender Based on Co-occurrence Matrix
Collaborative Filtering Recommender Based on Co-occurrence MatrixCollaborative Filtering Recommender Based on Co-occurrence Matrix
Collaborative Filtering Recommender Based on Co-occurrence Matrix
 
PredictionIO - The 1st International Conference on Predictive APIs and Apps
PredictionIO - The 1st International Conference on Predictive APIs and AppsPredictionIO - The 1st International Conference on Predictive APIs and Apps
PredictionIO - The 1st International Conference on Predictive APIs and Apps
 
[2C2]PredictionIO
[2C2]PredictionIO[2C2]PredictionIO
[2C2]PredictionIO
 
Machine Learning Software Design Pattern with PredictionIO
Machine Learning Software Design Pattern with PredictionIOMachine Learning Software Design Pattern with PredictionIO
Machine Learning Software Design Pattern with PredictionIO
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
 
PredictionIO - Scalable Machine Learning Architecture
PredictionIO - Scalable Machine Learning ArchitecturePredictionIO - Scalable Machine Learning Architecture
PredictionIO - Scalable Machine Learning Architecture
 
The Science and the Magic of User Feedback for Recommender Systems
The Science and the Magic of User Feedback for Recommender SystemsThe Science and the Magic of User Feedback for Recommender Systems
The Science and the Magic of User Feedback for Recommender Systems
 
The Cloud-natives are RESTless @ JavaOne
The Cloud-natives are RESTless @ JavaOneThe Cloud-natives are RESTless @ JavaOne
The Cloud-natives are RESTless @ JavaOne
 
Practical Akka HTTP - introduction
Practical Akka HTTP - introductionPractical Akka HTTP - introduction
Practical Akka HTTP - introduction
 
An Introduction to Akka http
An Introduction to Akka httpAn Introduction to Akka http
An Introduction to Akka http
 
Securing Microservices using Play and Akka HTTP
Securing Microservices using Play and Akka HTTPSecuring Microservices using Play and Akka HTTP
Securing Microservices using Play and Akka HTTP
 
PredictionIO – A Machine Learning Server in Scala – SF Scala
PredictionIO – A Machine Learning Server in Scala – SF ScalaPredictionIO – A Machine Learning Server in Scala – SF Scala
PredictionIO – A Machine Learning Server in Scala – SF Scala
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
 

Ähnlich wie Discovery

Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedBetclic Everest Group Tech Team
 
Webinar: Increase Conversion With Better Search
Webinar: Increase Conversion With Better SearchWebinar: Increase Conversion With Better Search
Webinar: Increase Conversion With Better SearchLucidworks
 
Use of data science in recommendation system
Use of data science in  recommendation systemUse of data science in  recommendation system
Use of data science in recommendation systemAkashPatil334
 
Betabrand presentation
Betabrand presentationBetabrand presentation
Betabrand presentationKaren Song
 
Data Science for Betabrand
Data Science for BetabrandData Science for Betabrand
Data Science for BetabrandKaren Song
 
Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologiesenterprisesearchmeetup
 
Introduction to Enterprise Search
Introduction to Enterprise SearchIntroduction to Enterprise Search
Introduction to Enterprise SearchFindwise
 
recommendation system techunique and issue
recommendation system techunique and issuerecommendation system techunique and issue
recommendation system techunique and issueNutanBhor
 
Big data certification training mumbai
Big data certification training mumbaiBig data certification training mumbai
Big data certification training mumbaiTejaspathiLV
 
Best data science courses in pune
Best data science courses in puneBest data science courses in pune
Best data science courses in puneprathyusha1234
 
Top data science institutes in hyderabad
Top data science institutes in hyderabadTop data science institutes in hyderabad
Top data science institutes in hyderabadprathyusha1234
 
best online data science courses
best online data science coursesbest online data science courses
best online data science coursesprathyusha1234
 
Data Science for Betabrand
Data Science for BetabrandData Science for Betabrand
Data Science for BetabrandKaren Song
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systemsAravindharamanan S
 
Using Google Analytics To Market Your Software Idea
Using Google Analytics To Market Your Software IdeaUsing Google Analytics To Market Your Software Idea
Using Google Analytics To Market Your Software IdeaPierre DeBois
 
Using Analytics for User Research
Using Analytics for User ResearchUsing Analytics for User Research
Using Analytics for User ResearchLuke Hay
 
Recommender Systems in a nutshell
Recommender Systems in a nutshellRecommender Systems in a nutshell
Recommender Systems in a nutshellKonstantin Savenkov
 
Lab EPiServer Find - Advanced developer scenarios
Lab EPiServer Find - Advanced developer scenariosLab EPiServer Find - Advanced developer scenarios
Lab EPiServer Find - Advanced developer scenariosPatrick van Kleef
 
Getting Started with Product Analytics - A 101 Implementation Guide for Begin...
Getting Started with Product Analytics - A 101 Implementation Guide for Begin...Getting Started with Product Analytics - A 101 Implementation Guide for Begin...
Getting Started with Product Analytics - A 101 Implementation Guide for Begin...Vishrut Shukla
 
Revenue Growth through Machine Learning
Revenue Growth through Machine LearningRevenue Growth through Machine Learning
Revenue Growth through Machine LearningDataWorks Summit
 

Ähnlich wie Discovery (20)

Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation Demystified
 
Webinar: Increase Conversion With Better Search
Webinar: Increase Conversion With Better SearchWebinar: Increase Conversion With Better Search
Webinar: Increase Conversion With Better Search
 
Use of data science in recommendation system
Use of data science in  recommendation systemUse of data science in  recommendation system
Use of data science in recommendation system
 
Betabrand presentation
Betabrand presentationBetabrand presentation
Betabrand presentation
 
Data Science for Betabrand
Data Science for BetabrandData Science for Betabrand
Data Science for Betabrand
 
Relevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search TechnologiesRelevancy and Search Quality Analysis - Search Technologies
Relevancy and Search Quality Analysis - Search Technologies
 
Introduction to Enterprise Search
Introduction to Enterprise SearchIntroduction to Enterprise Search
Introduction to Enterprise Search
 
recommendation system techunique and issue
recommendation system techunique and issuerecommendation system techunique and issue
recommendation system techunique and issue
 
Big data certification training mumbai
Big data certification training mumbaiBig data certification training mumbai
Big data certification training mumbai
 
Best data science courses in pune
Best data science courses in puneBest data science courses in pune
Best data science courses in pune
 
Top data science institutes in hyderabad
Top data science institutes in hyderabadTop data science institutes in hyderabad
Top data science institutes in hyderabad
 
best online data science courses
best online data science coursesbest online data science courses
best online data science courses
 
Data Science for Betabrand
Data Science for BetabrandData Science for Betabrand
Data Science for Betabrand
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systems
 
Using Google Analytics To Market Your Software Idea
Using Google Analytics To Market Your Software IdeaUsing Google Analytics To Market Your Software Idea
Using Google Analytics To Market Your Software Idea
 
Using Analytics for User Research
Using Analytics for User ResearchUsing Analytics for User Research
Using Analytics for User Research
 
Recommender Systems in a nutshell
Recommender Systems in a nutshellRecommender Systems in a nutshell
Recommender Systems in a nutshell
 
Lab EPiServer Find - Advanced developer scenarios
Lab EPiServer Find - Advanced developer scenariosLab EPiServer Find - Advanced developer scenarios
Lab EPiServer Find - Advanced developer scenarios
 
Getting Started with Product Analytics - A 101 Implementation Guide for Begin...
Getting Started with Product Analytics - A 101 Implementation Guide for Begin...Getting Started with Product Analytics - A 101 Implementation Guide for Begin...
Getting Started with Product Analytics - A 101 Implementation Guide for Begin...
 
Revenue Growth through Machine Learning
Revenue Growth through Machine LearningRevenue Growth through Machine Learning
Revenue Growth through Machine Learning
 

Kürzlich hochgeladen

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Kürzlich hochgeladen (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Discovery

  • 1. The Guide to Predictive Analytics A FINDERBOTS.COM PRODUCTION DISCOVERY
  • 2. FINDERBOTS.COM • Independent Consulting Service • Specialize in Big-data Predictive Analytics • Recommenders • Personalized discovery • Search optimization and personalization • Committer to open source machine learning projects (Apache Mahout, Finderbots Solr-recommender) Pat Ferrel pat@finderbots.com A FINDERBOTS.COM PRODUCTION
  • 3. DISCOVERY: • Browse • editorial categories • user generated content (tags, hashtags, comments, likes, shares) • realtime predictive analytics driven “concepts” • Search • keywords are not enough • inferred keywords (from usage data) • personalized search (from collaborative filtering data, just like Google) • Recommendations • profile based, content based, usage based • entire catalog can be skewed by predictive analytics • required • why?
  • 4. DISCOVERY: • Browse • editorial categories • user generated content (tags, hashtags, comments, likes, shares) • realtime predictive analytics driven “concepts” • Search • keywords are not enough • inferred keywords (from usage data) • personalized search (from collaborative filtering data, just like Google) • Recommendations • profile based, content based, usage based • entire catalog can be skewed by predictive analytics • required • why? • Netflix: 80% of views • Amazon: 60% of sales • Yahoo News: 40% increase in TOS • Better Discovery = Better Engagement
  • 5. NOT JUST RECOMMENDATIONS: Pervasive Content Personalization
  • 6. RECOMMENDATIONS CAN DO WHAT SEARCH CANNOT • Search for “leather laptop bag” • Hmm, some are ok but not quite right • Put some in “wishlist” • Look at recommendations • Add and remove as you like… things improve! • Never knew I wanted a “Messenger bag with a leather strap” • Didn’t know what one was, so would never have searched for it
  • 7. SEARCH THAT KNOWS WHAT THE USER MEANS • Search for “leather laptop bag” • Buy “leather messenger bag with leather strap” • With the right usage data we can infer “messenger bag” = “laptop bag” • Now the words I know will get me the object I want, even though I didn’t know how to ask for it
  • 8. THE CUTTING EDGE IN PREDICTIVE ANALYTICS • Uses any number of user actions: the entire user clickstream • Uses metadata from the user profile or the item • Uses context: on-site, time, location • Uses content: unstructured or semi-structured text • Personalizes recommendations even when content-based • Mixes any number of “indicators” to increase quality or tune to a specific context • Solves the “cold-start” problem (items with too short a lifespan) • Can recommend to new users in realtime • Improves Search • Personalizes Search
  • 9. THE GOOD NEWS • 90% of these features come from 3 technologies • Search engine (Solr, Elasticsearch) • Mahout • Spark • 90% of the flexibility comes at runtime via the query, not from new analytical models.
  • 10. THE UNIVERSAL RECOMMENDER: Technical Overview
  • 11. ARCHITECTURE [diagram; labels only] • Application: action logging; catalog creation and editing; query • HDFS: action logs • Mahout 1.0 spark-itemsimilarity (Spark): cooccurrence indicators • Scalable Store (HDFS or DB): content or metadata = intrinsic indicators • Mahout 1.0 spark-rowsimilarity (Spark) • Search Engine: indicators index • Paths marked realtime or background
  • 12. ANATOMY OF A RECOMMENDATION • r = hp[PtP] • r = recommendations • hp = a user’s history of some primary action (purchase, for instance) • P = the history of all users’ primary action; rows are users, columns are items • [PtP] = compares column to column using log-likelihood based cooccurrence
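
The formula on slide 12 can be sketched in a few lines. This is a toy illustration only: it reads hp as a row vector over items and PtP as P-transpose times P, uses raw cooccurrence counts in place of the log-likelihood weighting the slide mentions, and every item name and matrix value is invented for the example.

```python
# Toy sketch of r = hp[PtP] with raw cooccurrence counts (no LLR filtering).
# All data and names below are made up for illustration.

items = ["laptop bag", "messenger bag", "leather strap", "phone case"]

# P: rows are users, columns are items; 1 means the user purchased the item.
P = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

def cooccurrence(P):
    """Return PtP: entry [i][j] counts users who purchased both item i and item j."""
    n = len(P[0])
    return [[sum(row[i] * row[j] for row in P) for j in range(n)] for i in range(n)]

def recommend(hp, PtP):
    """Score every item as hp[PtP], then drop items already in the user's history."""
    n = len(hp)
    scores = [sum(hp[i] * PtP[i][j] for i in range(n)) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return [(j, scores[j]) for j in ranked if hp[j] == 0 and scores[j] > 0]

PtP = cooccurrence(P)
hp = [1, 0, 0, 0]  # this user has purchased only "laptop bag"
recs = recommend(hp, PtP)
print([(items[j], score) for j, score in recs])
# prints [('messenger bag', 2), ('leather strap', 1)]
```

The user never purchased a messenger bag, but two other laptop-bag buyers did, so it ranks first.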
  • 13. THE UNIVERSAL RECOMMENDER • Virtually all collaborative filtering type recommenders can use only one indicator of preference (one action): r = hp[PtP] • But the theory doesn’t stop there: r = hp[PtP] + hv[VtP] + hc[CtP] + … • Virtually all user actions can be used to improve recommendations: purchase, view, category view…
  • 14. A COOCCURRENCE INDICATOR • [PtP] is an indicator matrix for some primary action like purchase • In P, rows = users, columns = items, boolean data • Compares cooccurring interactions using the log-likelihood ratio (LLR): column-wise similarity • LLR finds important cooccurrences and filters out the rest • Comparing the history of the primary action to other actions finds the secondary actions that lead to the primary; the effect is to scrub secondary actions of non-meaningful ones
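
To make the LLR filtering on slide 14 concrete, here is a minimal sketch of the log-likelihood ratio (the G² statistic) on a 2x2 table of counts for one item pair. It is not Mahout's implementation: spark-itemsimilarity adds downsampling and thresholds that are not reproduced here, and the purchase matrix is invented.

```python
from math import log

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts.

    k11: users with both items, k12: first item only,
    k21: second item only,      k22: neither item.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, r, c in cells:
        if k > 0:  # an empty cell contributes nothing
            total += k * log(k * n / (rows[r] * cols[c]))
    return 2.0 * total

def pair_llr(P, i, j):
    """LLR for items i and j, counted from a boolean user-by-item matrix P."""
    k11 = sum(1 for row in P if row[i] and row[j])
    k12 = sum(1 for row in P if row[i] and not row[j])
    k21 = sum(1 for row in P if not row[i] and row[j])
    k22 = len(P) - k11 - k12 - k21
    return llr(k11, k12, k21, k22)

# Hypothetical data: item 0 is bought by everyone; items 1 and 2 are bought together.
P = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]

popular_pair = pair_llr(P, 0, 1)  # cooccurs often, but only because item 0 is everywhere
related_pair = pair_llr(P, 1, 2)  # cooccurs far more than chance would predict
print(popular_pair, related_pair)
```

The universally popular item cooccurs with item 1 twice yet scores zero, while the genuinely related pair scores well above it. Because LLR is high for any departure from independence, a real pipeline would score only pairs that actually cooccur.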
  • 15. CROSS-COOCCURRENCE INDICATORS • r = hp[PtP] + hv[VtP] + hc[CtP] + … • hi = a user’s history of an action • P, V, C = all users’ history of some action (purchase, view, category view) • [XtP] = the pairwise comparison of column to column; the comparison may be across two actions but is always anchored by the primary
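
A small sketch of the two-term sum r = hp[PtP] + hv[VtP] follows. It assumes views and purchases are over the same three items, uses raw counts rather than LLR-weighted indicators, and the matrices are invented.

```python
def cross_cooccurrence(A, B):
    """Return AtB: entry [i][j] counts users with action A on item i and action B on item j."""
    return [
        [sum(ra[i] * rb[j] for ra, rb in zip(A, B)) for j in range(len(B[0]))]
        for i in range(len(A[0]))
    ]

def vec_mat(h, M):
    """Row vector h times matrix M."""
    return [sum(h[i] * M[i][j] for i in range(len(h))) for j in range(len(M[0]))]

# Hypothetical histories: rows are the same three users in both matrices.
P = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]  # purchases (primary action)
V = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # views (secondary action)

PtP = cross_cooccurrence(P, P)
VtP = cross_cooccurrence(V, P)  # anchored by the primary action, purchase

hp = [0, 0, 0]  # a user with no purchases yet
hv = [1, 0, 0]  # but who has viewed item 0
r = [a + b for a, b in zip(vec_mat(hp, PtP), vec_mat(hv, VtP))]
print(r)  # prints [2, 1, 0]
```

With an empty purchase history the first term is all zeros, so the recommendation scores come entirely from what viewers of item 0 went on to purchase.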
  • 16. CROSS-COOCCURRENCE SO WHAT? • The user’s entire clickstream can be used • Items clicked • Terms searched • Categories viewed • Items shared • People followed • Items liked or disliked • Video watched • Virtually any action the user can take makes it easier to predict what they will like in the future.
  • 17. FROM INDICATOR TO RECOMMENDATION: r = hp[PtP] • This actually means to take the user’s history hp and compare it to rows of the indicator matrix [PtP] • TF-IDF weighting of indicators would be nice to mitigate popular items • Query the indicator with user history • Sort these by similarity strength and keep only the highest: you have recommendations • Sound familiar? • That is exactly what a search engine does, except for calculating indicators
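
The search-engine analogy on slide 17 can be sketched without a search engine. In this toy version each item is a "document" whose indicator field lists the items found similar to it, the user's history is the query, and matches are weighted by a simple IDF. The IDF formula is one common variant chosen for illustration, not the scoring used by Solr or Elasticsearch, and the index contents are invented.

```python
from math import log

# Hypothetical index: item -> indicator field (items that cooccur with it).
indicators = {
    "messenger bag": ["laptop bag", "leather strap", "phone case"],
    "leather strap": ["messenger bag", "laptop bag"],
    "screen protector": ["phone case"],
    "laptop bag": ["messenger bag", "phone case"],
}

def idf(term, docs):
    """Down-weight terms that appear in many indicator fields (popular items)."""
    df = sum(1 for terms in docs.values() if term in terms)
    return log(1 + len(docs) / df) if df else 0.0

def query(history, docs, k=2):
    """Score each item by the IDF-weighted overlap of its indicators with the history."""
    scores = {}
    for item, terms in docs.items():
        if item in history:
            continue  # do not recommend what the user already has
        score = sum(idf(t, docs) for t in history if t in terms)
        if score > 0:
            scores[item] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]

top = query(["laptop bag", "phone case"], indicators)
print(top)  # prints ['messenger bag', 'leather strap']
```

"phone case" appears in three indicator fields and "laptop bag" in only two, so a match on "laptop bag" counts for more, which is the popularity mitigation the slide asks of TF-IDF.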
  • 18. INDICATOR TYPES • Cooccurrence and cross-cooccurrence • Calculated from user actions as discussed • Create with Mahout 1.0 spark-itemsimilarity • Content or metadata • Tags, categories, description text, anything describing an item • Create with Mahout 1.0 spark-rowsimilarity • Intrinsic • Tags, genres, categories, popularity rank, geo-location, anything describing an item • Some may be derived from usage data, like popularity rank or hotness • A known or specially calculated property of the item
  • 19. CONTENT INDICATORS • Finds similar items based on their content, not on which users preferred them • Examples: text descriptions, tags, categories, genres • r = ht[TTt] • r = recommended items, based on tags • ht = a user’s history of an action on items with tags • [TTt] = item similarity based on similar tags: a content indicator • This personalizes even content based recommendations
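
As a rough sketch of r = ht[TTt]: here T is assumed to have items as rows and tags as columns, so TTt compares items row to row, unlike the user-by-item matrix P, which is compared column to column. Shared-tag counts stand in for whatever similarity spark-rowsimilarity would compute, and the catalog is invented.

```python
# Hypothetical catalog: rows of T are items, columns are tags.
tags = ["leather", "bag", "strap"]
T = [
    [1, 1, 0],  # item 0: leather, bag
    [1, 1, 1],  # item 1: leather, bag, strap
    [0, 0, 1],  # item 2: strap
]

def tag_similarity(T):
    """Return TTt: entry [i][j] counts the tags shared by item i and item j."""
    n = len(T)
    return [[sum(a * b for a, b in zip(T[i], T[j])) for j in range(n)] for i in range(n)]

TTt = tag_similarity(T)
ht = [1, 0, 0]  # the user has acted on item 0 only
r = [sum(ht[i] * TTt[i][j] for i in range(len(ht))) for j in range(len(T))]
print(r)  # prints [2, 2, 0]
```

Item 1 shares two tags with the item in the user's history and scores accordingly. No other user's behavior is involved, yet the result still depends on this user's history.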
  • 20. INTRINSIC INDICATORS • Attributes of items • Genre, subject, category, tags • Specially calculated based on business rules • Popularity, hotness • Based on demographics • Preferred by people using mobile access • Preferred by city dwellers • Preferred by people in warmer climes • Query by value, not user history: r = v*I
  • 21. THE UNIVERSAL RECOMMENDER • “Unified” means one query on all indicators at once • r = hp[PtP] + hv[VtP] + hc[CtP] + ht[TTt] + l*L … • Unified query: • query: users-history-of-purchases; field: purchase • query: users-history-of-views; field: view • query: users-history-of-categories-viewed; field: category • query: users-history-of-purchases; field: tags • query: users-location; field: geo-location-preferred • …
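
One way the unified query could look is a single boolean "should" query in the style of the Elasticsearch Query DSL, with one clause per indicator field. The field names come from the slide; the function name, the index layout, and the choice of Elasticsearch over Solr are assumptions for illustration, and nothing here is sent to a server.

```python
import json

def unified_query(purchases, views, categories, location, size=10):
    """Build one bool/should query matching user context against indicator fields."""
    clauses = [
        ("purchase", purchases),   # hp[PtP]
        ("view", views),           # hv[VtP]
        ("category", categories),  # hc[CtP]
        ("tags", purchases),       # ht[TTt]: purchase history against the tags field
        ("geo-location-preferred", [location] if location else []),  # l*L
    ]
    # Skip empty clauses so a new user with little history still gets a valid query.
    should = [{"terms": {field: values}} for field, values in clauses if values]
    return {"size": size, "query": {"bool": {"should": should}}}

q = unified_query(
    purchases=["item-1"],
    views=["item-2", "item-3"],
    categories=[],
    location="seattle",
)
print(json.dumps(q, indent=2))
```

Because every clause is optional, the same query shape serves a user with a long purchase history and a new visitor for whom only a location is known.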
  • 22. ONE OR MANY • One query: one trip to one scalable search engine • Many flavors: customize in the query • Customize for content context • Customize for user context • Profile, location, time, … • Customize for special indicators • Trending, hot, new, popular • All personalized
  • 23. POLISH THE APPLE • Auto-optimize via explore-exploit (important): randomize some returned recs; if they are acted upon, they become part of the new training data and are more likely to be recommended in the future • Visibility control: • Don’t show dups, or show dups at some rate • Filter items the user has already seen • Generate some intrinsic indicators like hotness and popularity, which helps solve the “cold-start” problem • Asymmetric train vs query management: for instance, query with most recent actions, train on all ingested • On-demand cross-validation scoring for tuning purposes • A/B testing integration with explore-exploit
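
The explore-exploit bullet on slide 23 could be approximated with a simple epsilon-style mix, sketched below. The function name, the explore_rate parameter, and the slot-by-slot replacement policy are all assumptions; the slide does not say how the randomization is done.

```python
import random

def mix_in_exploration(ranked, candidates, explore_rate=0.1, rng=random):
    """Swap each ranked slot for a random unranked candidate with probability explore_rate."""
    pool = [c for c in candidates if c not in ranked]
    rng.shuffle(pool)
    out = []
    for item in ranked:
        if pool and rng.random() < explore_rate:
            out.append(pool.pop())  # explore: show something the model did not pick
        else:
            out.append(item)        # exploit: keep the model's recommendation
    return out

ranked = ["a", "b", "c"]
catalog = ["a", "b", "c", "d", "e", "f", "g"]
rng = random.Random(0)  # seeded only to make the example repeatable
print(mix_in_exploration(ranked, catalog, explore_rate=0.3, rng=rng))
```

Any explored item the user acts on enters the action logs like any other interaction, which is how it can surface in later indicator calculations.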