


Optimizing discovery and engagement

Published in: Technology


  1. DISCOVERY: The Guide to Predictive Analytics. A FINDERBOTS.COM PRODUCTION
  2. FINDERBOTS.COM
     • Independent consulting service
     • Specializing in big-data predictive analytics
     • Recommenders
     • Personalized discovery
     • Search optimization and personalization
     • Committer to open source machine learning projects (Apache Mahout, Finderbots Solr-recommender)
     Pat Ferrel, pat@finderbots.com
  3. DISCOVERY:
     • Browse
       • editorial categories
       • user-generated content: tags, hashtags, comments, likes, shares
       • realtime “concepts” driven by predictive analytics
     • Search
       • keywords are not enough
       • inferred keywords (from usage data)
       • personalized search (from collaborative filtering data, just like Google)
     • Recommendations
       • profile based, content based, usage based
       • entire catalog can be skewed by predictive analytics
       • required
       • why?
  4. DISCOVERY: (same Browse / Search / Recommendations outline as slide 3, with callouts)
     • Netflix: 80% of views
     • Amazon: 60% of sales
     • Yahoo News: 40% increase in TOS (time on site)
     • Better Discovery = Better Engagement
  6. RECOMMENDATIONS CAN DO WHAT SEARCH CANNOT
     • Search for “leather laptop bag”
     • Hmm, some are OK but not quite right
     • Put some in the “wishlist”
     • Look at recommendations
     • Add and remove as you like…
     …things improve!
     • Never knew I wanted a “messenger bag with a leather strap”
     • Didn’t know what one was, so I would never have searched for it
  7. SEARCH THAT KNOWS WHAT THE USER MEANS
     • Search for “leather laptop bag”
     • Buy “leather messenger bag with leather strap”
     • With the right usage data we can infer “messenger bag” = “laptop bag”
     • Now the words I know will get me the object I want, even though I didn’t know how to ask for it
  8. THE CUTTING EDGE IN PREDICTIVE ANALYTICS
     • Uses any number of user actions: the entire user clickstream
     • Uses metadata: from the user profile or the item
     • Uses context: on-site, time, location
     • Uses content: unstructured or semi-structured text
     • Personalizes recommendations even when content-based
     • Mixes any number of “indicators” to increase quality or tune to a specific context
     • Solves the “cold-start” problem (items with too short a lifespan)
     • Can recommend to new users in realtime
     • Improves search
     • Personalizes search
  9. THE GOOD NEWS
     • 90% of these features come from 3 technologies
       • Search engine (Solr, Elasticsearch)
       • Mahout
       • Spark
     • 90% of the flexibility comes at runtime via the query, not from new analytical models
  11. ARCHITECTURE (diagram; only the labels are recoverable)
      • Application: action logging, catalog creation and editing, query
      • Scalable Store (HDFS or DB): action logs, content or metadata
      • Spark (background): Mahout 1.0 spark-itemsimilarity produces cooccurrence indicators; Mahout 1.0 spark-rowsimilarity
      • content or metadata = intrinsic indicators
      • Search Engine (realtime): index, query, indicators
  12. ANATOMY OF A RECOMMENDATION
      r = hp[PtP]
      • r = recommendations
      • hp = a user’s history of some primary action (a purchase, for instance)
      • P = the history of all users’ primary action; rows are users, columns are items
      • [PtP] = compares column to column using log-likelihood based cooccurrence
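The formula r = hp[PtP] can be traced by hand on a tiny dataset. The sketch below is illustrative only: the four users and four items are made up, the helper names `cooccurrence` and `recommend` are hypothetical, and raw cooccurrence counts stand in for the log-likelihood weighting named on the slide.

```python
# Toy illustration of r = hp[PtP]. Data and helper names are hypothetical.
# Raw cooccurrence counts are used here instead of LLR weighting.

def cooccurrence(P):
    """Item-by-item counts of users who took the action on both items.

    The diagonal is zeroed so an item does not recommend itself.
    """
    n_items = len(P[0])
    return [
        [0 if i == j else sum(row[i] * row[j] for row in P) for j in range(n_items)]
        for i in range(n_items)
    ]

def recommend(h, C):
    """Multiply the user's history row vector h by the indicator matrix C."""
    return [sum(h[i] * C[i][j] for i in range(len(h))) for j in range(len(C[0]))]

# Rows are users, columns are items a, b, c, d (1 = purchased).
P = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
PtP = cooccurrence(P)
hp = [1, 0, 0, 0]            # this user has purchased only item a
scores = recommend(hp, PtP)  # one score per item; higher ranks first
print(scores)
```

Sorting the items by score and dropping those already in the user's history gives the recommendation list.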
  13. THE UNIVERSAL RECOMMENDER
      • Virtually all collaborative-filtering recommenders can use only one indicator of preference, that is, one action
        r = hp[PtP]
      • But the theory doesn’t stop there
        r = hp[PtP] + hv[VtP] + hc[CtP] + …
      • Virtually all user actions can be used to improve recommendations: purchase, view, category view…
  14. A COOCCURRENCE INDICATOR
      • [PtP] is an indicator matrix for some primary action like purchase
      • In P, rows = users, columns = items, boolean data
      • Compares cooccurring interactions using the log-likelihood ratio (LLR), a column-wise similarity
      • LLR finds important cooccurrences and filters out the rest
      • Comparing the history of the primary action to other actions finds the secondary actions that lead to the primary; the effect is to scrub secondary actions of non-meaningful ones
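As a rough illustration of the LLR test mentioned above, the following sketch computes the log-likelihood ratio from a 2×2 contingency table using the usual entropy formulation. The function names and the example counts are assumptions for illustration; this is not taken from the slides or presented as Mahout's own code.

```python
import math

def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a group of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of user counts.

    k11: users who interacted with both item A and item B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

strong = llr(1, 0, 0, 1)          # A and B always occur together
independent = llr(10, 10, 10, 10) # A tells us nothing about B
print(strong, independent)
```

A pair of items is kept as an indicator only when its score is high; independent pairs score near zero and are filtered out.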
  15. CROSS-COOCCURRENCE INDICATORS
      r = hp[PtP] + hv[VtP] + hc[CtP] + …
      • hi = a user’s history of an action
      • P, V, C = the history of all users for some action (purchase, view, category view)
      • [XtP] = the pairwise comparison of the columns of X to the columns of P; the comparison may be across two actions but is always anchored by the primary action
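To make the cross-cooccurrence sum concrete, here is a minimal sketch with two actions, purchase (P) and view (V), over the same three users. All data and helper names are hypothetical, and plain matrix products are used in place of LLR-filtered indicators.

```python
# Toy illustration of r = hp[PtP] + hv[VtP]. Data and helpers are hypothetical.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def vecmat(h, M):
    """Row vector times matrix."""
    return matmul([h], M)[0]

# Rows are users, columns are items (same three users in both matrices).
P = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]  # purchases, the primary action
V = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # views, a secondary action

PtP = matmul(transpose(P), P)  # purchase-to-purchase cooccurrence
VtP = matmul(transpose(V), P)  # views anchored by purchases

hp = [0, 0, 0]  # a new user with no purchases yet
hv = [1, 0, 0]  # but one viewed item

r = [a + b for a, b in zip(vecmat(hp, PtP), vecmat(hv, VtP))]
print(r)
```

The user has no purchase history, so hp[PtP] contributes nothing, yet the view history alone still produces purchase recommendations through [VtP].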
  16. CROSS-COOCCURRENCE: SO WHAT?
      • The user’s entire clickstream can be used
        • Items clicked
        • Terms searched
        • Categories viewed
        • Items shared
        • People followed
        • Items liked or disliked
        • Video watched
      • Virtually any action the user can take makes it easier to predict what they will like in the future.
  17. FROM INDICATOR TO RECOMMENDATION
      r = hp[PtP]
      • This actually means to take the user’s history hp and compare it to the rows of the indicator matrix [PtP]
      • TF-IDF weighting of indicators would be nice, to mitigate popular items
      • Query the indicator with the user’s history
      • Sort the results by similarity strength and keep only the highest: you have recommendations
      • Sound familiar?
      • That is exactly what a search engine does, except for calculating the indicators
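The search-engine analogy can be sketched with a tiny in-memory stand-in. Each item is treated as a document whose indicator field lists the items that cooccur with it, and the user's history is the query. The item names, the `rank` helper, and the simple IDF formula are all assumptions for illustration; a real engine such as Solr or Elasticsearch applies its own scoring.

```python
import math

def rank(history, indicator_docs):
    """Rank items whose indicator field overlaps the user's history.

    Each matching term contributes an IDF-style weight, so terms that
    appear in many documents (popular items) count for less.
    """
    n_docs = len(indicator_docs)
    doc_freq = {}
    for terms in indicator_docs.values():
        for term in set(terms):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    scores = {}
    for item, terms in indicator_docs.items():
        matched = set(history) & set(terms)
        if matched:
            scores[item] = sum(math.log(n_docs / doc_freq[t]) + 1.0 for t in matched)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical indicator "documents": item -> items that cooccur with it.
indicator_docs = {
    "messenger-bag": ["laptop-bag", "leather-strap"],
    "laptop-sleeve": ["laptop-bag"],
    "phone-case": ["charger"],
}
history = ["laptop-bag", "leather-strap"]
recommendations = rank(history, indicator_docs)
print(recommendations)
```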
  18. INDICATOR TYPES
      • Cooccurrence and cross-cooccurrence
        • Calculated from user actions as discussed
        • Create with Mahout 1.0 spark-itemsimilarity
      • Content or metadata
        • Tags, categories, description text, anything describing an item
        • Create with Mahout 1.0 spark-rowsimilarity
      • Intrinsic
        • Tags, genres, categories, popularity rank, geo-location, anything describing an item
        • Some may be derived from usage data, like popularity rank or hotness
        • A known or specially calculated property of the item
  19. CONTENT INDICATORS
      • Finds similar items based on their content, not on which users preferred them
      • Examples: text descriptions, tags, categories, genres
      r = ht[TTt]
      • r = recommended items, based on tags
      • ht = a user’s history of an action on items with tags
      • [TTt] = item similarity based on similar tags, a content indicator
      • This personalizes even content-based recommendations
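A content indicator can be approximated in a few lines by counting shared tags. In this sketch the items, tags, and the `content_recommend` helper are hypothetical, and a plain shared-tag count stands in for whatever similarity measure spark-rowsimilarity would compute.

```python
def content_recommend(history, item_tags):
    """Score unseen items by the number of tags shared with history items."""
    scores = {}
    for seen in history:
        for item, tags in item_tags.items():
            if item in history:
                continue  # do not recommend what the user already acted on
            shared = len(tags & item_tags[seen])
            if shared:
                scores[item] = scores.get(item, 0) + shared
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical catalog: item -> set of tags.
item_tags = {
    "messenger-bag": {"leather", "bag", "strap"},
    "laptop-bag": {"leather", "bag"},
    "belt": {"leather"},
    "umbrella": {"rain"},
}
ranked = content_recommend(["laptop-bag"], item_tags)
print(ranked)
```

Because the query is the user's own history, the result is personalized even though no other user's behavior is involved.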
  20. INTRINSIC INDICATORS
      • Attributes of items
        • Genre, subject, category, tags
      • Specially calculated based on business rules
        • Popularity, hotness
      • Based on demographics
        • Preferred by people using mobile access
        • Preferred by city dwellers
        • Preferred by people in warmer climes
      • Query by value, not by user history
        r = v*I
  21. THE UNIVERSAL RECOMMENDER
      “Unified” means one query on all indicators at once
      r = hp[PtP] + hv[VtP] + hc[CtP] + ht[TTt] + l*L …
      Unified query:
        query: users-history-of-purchases; field: purchase
        query: users-history-of-views; field: view
        query: users-history-of-categories-viewed; field: category
        query: users-history-of-purchases; field: tags
        query: users-location; field: geo-location-preferred
        …
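One way to picture the unified query is as a single boolean query with one clause per indicator field. The sketch below builds such a request body as a plain dictionary in the style of Elasticsearch's `bool`/`should`/`match` query DSL. The field names come from the slide; the item IDs, the location value, and the `build_unified_query` helper are hypothetical, and the slides do not say which engine syntax was actually used.

```python
import json

def build_unified_query(clauses):
    """Build one search request with a 'should' clause per indicator field.

    clauses: list of (field, terms) pairs; fields with no terms are skipped.
    """
    should = [
        {"match": {field: " ".join(terms)}}
        for field, terms in clauses
        if terms
    ]
    return {"query": {"bool": {"should": should}}}

# Hypothetical user data mapped onto the fields named on the slide.
purchases = ["item-12", "item-40"]
clauses = [
    ("purchase", purchases),           # hp against [PtP]
    ("view", ["item-7"]),              # hv against [VtP]
    ("category", ["bags"]),            # hc against [CtP]
    ("tags", purchases),               # ht against [TTt]
    ("geo-location-preferred", ["us-west"]),
]
query = build_unified_query(clauses)
print(json.dumps(query, indent=2))
```

Every clause contributes to a single relevance score, which is how one trip to the search engine can stand in for the whole sum of indicator terms.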
  22. ONE OR MANY
      • One query: one trip to one scalable search engine
      • Many flavors: customize in the query
        • Customize for content context
        • Customize for user context
          • Profile, location, time, …
        • Customize for special indicators
          • Trending, hot, new, popular
      • All personalized
  23. POLISH THE APPLE
      • Auto-optimize via explore-exploit (important): randomize some of the returned recommendations; if they are acted upon, they become part of the new training data and are more likely to be recommended in the future
      • Visibility control:
        • Don’t show duplicates, or show them at some rate
        • Filter items the user has already seen
      • Generate some intrinsic indicators like hotness and popularity, which helps solve the “cold-start” problem
      • Asymmetric train vs. query management: for instance, query with the most recent actions, train on everything ingested
      • On-demand cross-validation scoring for tuning purposes
      • A/B testing integration with explore-exploit
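The explore-exploit idea can be sketched as a simple slot-swapping step applied to a ranked list. This is one possible reading of "randomize some returned recs", not the author's implementation: the `mix_in_exploration` helper, the swap rule, and the 20% rate are all assumptions.

```python
import random

def mix_in_exploration(ranked, k, explore_rate, rng):
    """Return k items, swapping some top slots for lower-ranked candidates.

    Each of the top-k slots is replaced, with probability explore_rate,
    by a randomly chosen item from the rest of the ranked list.
    """
    top, tail = list(ranked[:k]), list(ranked[k:])
    shown = []
    for item in top:
        if tail and rng.random() < explore_rate:
            # Explore: show a lower-ranked item in this slot instead.
            shown.append(tail.pop(rng.randrange(len(tail))))
        else:
            # Exploit: keep the item the recommender ranked here.
            shown.append(item)
    return shown

ranked = ["a", "b", "c", "d", "e", "f"]  # hypothetical ranked recommendations
rng = random.Random(42)                  # seeded for repeatable runs
shown = mix_in_exploration(ranked, k=3, explore_rate=0.2, rng=rng)
print(shown)
```

Actions on the explored items are logged like any other action, so they feed back into the next round of indicator training.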