This is my Strata NY talk about how to build recommendation engines using co-occurrence of common items. In particular, I show how multi-modal recommendations can be built using the same framework.
Note to speaker: Move quickly through the first two slides just to set the tone of familiar use cases with somewhat complicated under-the-covers math and algorithms. You don’t need to explain or discuss these examples at this point; just mention one or two.
Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering that nasty spam out of email…
Talk track: Under the covers, machine learning looks very complicated. So how do you get from here to the familiar examples? Tonight’s presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.
Note to trainers: the next series of slides starts with a cartoon example just to set the pattern of how to find co-occurrence and use it to find indicators of what to recommend. Of course, real examples require a LOT of user-item interaction history data to actually work, so this is just an analogy to get the idea across…
*A history of what everybody has done. Obviously this is just a cartoon, because large numbers of users and interactions with items would be required to build a recommender.
*Next step will be to predict what a new user might like…
*Bob is the “new user” and getting apple is his history
*Here is where the recommendation engine needs to go to work…
Note to trainer: you might see if the audience calls out the answer before revealing the next slide…
Now you see the idea of co-occurrence as a basis for recommendation…
*Now we have a new user, Amelia. Like everybody else, she gets a pony… what should the recommender offer her based on her history?
* Pony not interesting because it is so widespread that it does not differentiate a pattern
Note to trainer: This is the situation similar to that in which we started, with three users in our history. The difference is that now everybody got a pony. Bob has apple and pony but not a puppy…yet
*Binary matrix is stored sparsely
*Convert by MapReduce into a binary matrix.
Note to trainer: whether to consider apple to have co-occurred with itself is an open question.
*Convert by MapReduce into a binary matrix.
Note to trainer: the diagonal gives the total occurrence count for each item (self with self) and is a distraction, not helpful, so the diagonal here is left blank.
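The conversion described above can be sketched in miniature. This is a toy, in-memory stand-in for the MapReduce step, using the cartoon users as hypothetical data; the real job would run over millions of interactions. As the trainer note says, the diagonal (an item with itself) is skipped.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy histories standing in for the cartoon example;
# a real system would use a huge history of user-item interactions.
histories = {
    "Alice": ["apple", "puppy"],
    "Charles": ["apple", "puppy", "pony"],
    "Bob": ["apple", "pony"],
}

def cooccurrence(histories):
    """Count how often each pair of DISTINCT items appears in the same
    user's history (the off-diagonal entries of the co-occurrence
    matrix; item-with-itself, the diagonal, is deliberately skipped)."""
    counts = defaultdict(int)
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)

cooc = cooccurrence(histories)
print(cooc[("apple", "puppy")])  # apple and puppy co-occur for Alice and Charles
```

Storing only the nonzero pairs in a dictionary mirrors the sparse storage mentioned two slides back.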
Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence
Note to trainer: Give students time to offer comments. There’s a lot to discuss here.
*Upper left: In the context of A, B occurs the largest number of times, 13 times out of 1013 appearances, with over 100,000 samples. But that’s only ~1.3% co-occurrence with A out of all the times B appears.
*Upper right: B occurs in the context of A 33% of the time, but the counts are so small as to be of concern.
*Lower right: the most significant anomaly, in that B still occurs only a small number of times out of over 100,000 samples, but it ALWAYS co-occurs with A when it does appear.
*The test Mahout uses for this is the Log Likelihood Ratio (LLR).
*Red circle marks the choice that displays the highest confidence.
Note to trainer: Slide animates with a click to show LLR results. A SECOND click animates the choice that has the highest confidence.
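For trainers who want to show the math behind the LLR score, here is a small sketch of the G² statistic that the log-likelihood ratio test computes from a 2×2 contingency table of counts, in the formulation commonly used for this test (independent pairs score near 0; strongly anomalous pairs score high). Treat it as an illustration, not Mahout's exact source code.

```python
import math

def xlogx(x):
    """x * log(x), with the conventional 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    total = sum(counts)
    return xlogx(total) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table:
    k11 = times A and B occur together
    k12 = times A occurs without B
    k21 = times B occurs without A
    k22 = times neither occurs."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

print(llr(5, 5, 5, 5))    # independent: score is ~0
print(llr(10, 0, 0, 10))  # perfectly correlated: large score
```

High LLR flags exactly the "anomalous" cells discussed on the previous slide, even when raw counts are small.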
Note to trainer: we go back to the earlier matrix as a reminder…
The only important co-occurrence is that puppy follows apple.
*Take that row of the matrix and combine it with all the metadata we might have…
*The important thing to get from the co-occurrence matrix is this indicator. Cool thing: this is analogous to what a lot of recommendation engines do.
*This row forms the indicator field in a Solr document containing metadata (you do NOT have to build a separate index for the indicators).
Find the useful co-occurrence and get rid of the rest: sparsify, keeping only the anomalous co-occurrence.
Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
*This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrence).
*Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains the metadata for the item in question.
This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It’s a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine. In other words, do these indicator artists, represented by their indicator IDs, make reasonable recommendations?
Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?
Here we recap what we have in the different components of the recommender. We start with the metadata for an item stored in the Solr index.
*Here we’ve added examples of indicator data for the indicator field(s) of the document
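To make the "one document, metadata plus indicators" point concrete, here is a hedged sketch of what such a Solr document could look like. The field names are illustrative assumptions, not the actual schema from the slides; only artist ID 303 (The Beatles, per the earlier diagnostics slide) is taken from the talk.

```python
# Illustrative document: field names ("id", "name", "genre",
# "indicator_artist_ids") are assumptions, not the talk's real schema.
doc = {
    "id": "artist-1234",
    "name": "Chuck Berry",
    "genre": "rock and roll",
    # Output of the Mahout step: IDs of anomalously co-occurring
    # artists. 303 is The Beatles; the other IDs are made up.
    "indicator_artist_ids": ["303", "871", "1022"],
}

# The indicators live in the SAME document as the item metadata;
# no separate index is built for them.
print(sorted(doc.keys()))
```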
*Here we show you what information might be in the sample query
Note to trainer: you could ask the class to consider which data is related… for example, the first 3 bullets of the query relate to meta data for the item, not to data produced by the recommendation algorithm. The last 3 bullets refer to data in the sample query related to data in the indicator field(s) that were produced by the Mahout recommendation engine.
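As a final illustration of how query data meets indicator data, here is a sketch of building a recommendation query: the user's recent history items are OR-ed together and matched against the indicator field, so Solr's relevance scoring ranks the recommendations. The field name and endpoint path are assumptions for illustration.

```python
from urllib.parse import urlencode

def recommendation_query(recent_item_ids, field="indicator_artist_ids"):
    """Build Solr query parameters that match documents whose indicator
    field contains ANY of the user's recent items. The field name is a
    hypothetical choice, not the talk's actual schema."""
    clause = " OR ".join(recent_item_ids)
    return {"q": f"{field}:({clause})", "rows": "10"}

params = recommendation_query(["303", "1022"])
# Hypothetical request URL for a Solr core named "items":
print("/solr/items/select?" + urlencode(params))
```

Documents sharing more indicator IDs with the user's history score higher, which is what turns the stored indicators into ranked recommendations.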