Introduction to Mahout

And How To Build a Recommender

©MapR Technologies 2013- Confidential

Topic For Today


What is recommendation?



What makes it different?



What is multi-model recommendation?



How can I build it using common household items?

Oh … Also This


Detailed break-down of a live machine learning system running
with Mahout on MapR



With code examples

I may have to
summarize

I may have to
summarize
just a bit

Part 1:
5 minutes of background

Part 2:
5 minutes: I want a pony

Part 1:
5 minutes of background

What Does Machine Learning Look Like?
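\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]
\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d) \quad \text{for small } k \text{, high quality}
\]
\[
O(\kappa d \log k) \text{ or } O(d \log \kappa \log k) \quad \text{for larger } k \text{, looser quality}
\]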

But tonight we’re going to show you how to keep it simple yet powerful…

Recommendations as Machine Learning


Recommendation:
– Involves observation of interactions between people taking action (users) and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts

Part 2:
How recommenders work
(I still want a pony)

Recommendations

Recap:
Behavior of a crowd helps us
understand what individuals will do

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Recommendations

Alice got an apple and a puppy

Bob got an apple

Charles got a bicycle

Recommendations

What else would Bob like?

Recommendations

A puppy, of course!

You get the idea of how
recommenders work…
(By the way, like me, Bob
also wants a pony)

Recommendations

What if everybody gets a pony?

What else would you recommend for Amelia?

Recommendations

If everybody gets a pony, it’s not a very good indicator of what else to predict...

Problems with Raw Co-occurrence

Very popular items co-occur with everything (or why it’s not very helpful to know that everybody wants a pony…)
– Examples: Welcome document; Elevator music

Very widespread occurrence is not interesting as a way to generate indicators
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)

What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation

Get Useful Indicators from Behaviors

1. Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all potential combinations

2. Transform to a co-occurrence matrix of items x items

3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
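The first two steps can be sketched in a few lines of plain Python. This is only an illustration, not how Mahout does it at scale: the log entries are made up from the Alice/Bob/Charles story (with everybody also getting a pony), and the names `LOG`, `build_history` and `cooccurrence` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative (user, item) log based on the Alice/Bob/Charles story;
# the exact events are an assumption, not data from the talk.
LOG = [
    ("Alice", "apple"), ("Alice", "puppy"), ("Alice", "pony"),
    ("Bob", "apple"), ("Bob", "pony"),
    ("Charles", "bicycle"), ("Charles", "pony"),
]

def build_history(log):
    """Step 1: sparse users x items history, stored as user -> set of items."""
    history = defaultdict(set)
    for user, item in log:
        history[user].add(item)
    return dict(history)

def cooccurrence(history):
    """Step 2: items x items counts of users who interacted with both items."""
    counts = defaultdict(int)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)

history = build_history(LOG)
cooc = cooccurrence(history)
for pair in sorted(cooc):
    print(pair, cooc[pair])
```

In this toy data the pony co-occurs with every other item, which is exactly why raw counts are not enough and step 3 needs a test such as LLR.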

Log Files
[Figure: seven log entries, one per line, each pairing a user with a thing; the users in order are Alice, Charles, Charles, Alice, Alice, Bob, Bob]

Log Files and Dimensions

User  Thing
u1    t1
u2    t4
u2    t3
u1    t2
u1    t3
u3    t3
u3    t1

Users: u1 Alice, u2 Charles, u3 Bob
Things: t1, t2, t3, t4

History Matrix: Users by Items
[Figure: matrix with one row per user (Alice, Bob, Charles) and one column per item; a ✔ marks each observed interaction]

Co-occurrence Matrix: Items by Items
How do you tell which co-occurrences are useful?
[Figure: items-by-items matrix of co-occurrence counts]

Co-occurrence Matrix: Items by Items
Use LLR test to turn co-occurrence into indicators…
[Figure: the same items-by-items matrix of co-occurrence counts]

Co-occurrence Binary Matrix
[Figure: binary co-occurrence matrix; cell layout not recoverable]

Spot the Anomaly
What conclusion do you draw from each situation?

         A       not A
B        13      1000
not B    1000    100,000

         A       not A
B        1       0
not B    0       10,000

         A       not A
B        1       0
not B    0       2

         A       not A
B        10      0
not B    0       100,000

Spot the Anomaly
What conclusion do you draw from each situation?

         A       not A
B        13      1000
not B    1000    100,000
Root LLR: 0.90

         A       not A
B        1       0
not B    0       10,000
Root LLR: 4.52

         A       not A
B        1       0
not B    0       2
Root LLR: 1.95

         A       not A
B        10      0
not B    0       100,000
Root LLR: 14.3

Root LLR is roughly like standard deviations
In Apache Mahout, RowSimilarityJob uses LLR
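The four scores can be reproduced with a small sketch of the log-likelihood ratio for a 2x2 table. It uses the standard G² form, 2 · Σ k_ij · ln(k_ij · N / (row_i · col_j)); the function names `llr` and `root_llr` are hypothetical, and the slides do not say whether RowSimilarityJob computes the value in exactly this way.

```python
import math

def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table of counts."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # empty cells contribute nothing (0 * log 0 is taken as 0)
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return max(0.0, 2.0 * total)  # guard against tiny negative rounding error

def root_llr(k11, k12, k21, k22):
    """Square root of LLR, which reads roughly like standard deviations."""
    return math.sqrt(llr(k11, k12, k21, k22))

tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 10000),
    (1, 0, 0, 2),
    (10, 0, 0, 100000),
]
for table in tables:
    print(table, root_llr(*table))
```

Rounded, the four values agree with the 0.90, 4.52, 1.95 and 14.3 on the slide: thirteen co-occurrences between two very common items score lower than a single co-occurrence between two rare ones.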

Co-occurrence Matrix
Recap: Use LLR test to turn co-occurrence into indicators
[Figure: items-by-items matrix of co-occurrence counts]

Indicator Matrix: Anomalous Co-Occurrence
Result: The marked row will be added to the indicator field in the item document…
[Figure: indicator matrix with ✔ marking the anomalous co-occurrences]

Indicator Matrix
That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

Note: data for the indicator field is added directly to meta-data for a document in
Solr index. You don’t need to create a separate index for the indicators.
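To see why an indicator field is enough to produce recommendations, here is a minimal in-memory stand-in for the search engine. It is not Solr: a real engine applies its own relevance scoring, while this sketch just counts how many of a user's recent items appear in each document's indicators. Only the t4 document comes from the slide; the other documents and the `recommend` function are hypothetical.

```python
# Item documents with an "indicators" field. Only t4 is taken from the
# slide; t1 and t3 are made-up neighbours for the sake of the example.
DOCS = [
    {"id": "t4", "title": "puppy", "indicators": ["t1"]},
    {"id": "t1", "title": "apple", "indicators": ["t4"]},
    {"id": "t3", "title": "bicycle", "indicators": []},
]

def recommend(history, docs, limit=10):
    """Return ids of unseen items whose indicators overlap the user's history."""
    seen = set(history)
    scored = []
    for doc in docs:
        if doc["id"] in seen:
            continue  # do not recommend what the user already has
        score = len(seen & set(doc["indicators"]))
        if score > 0:
            scored.append((score, doc["id"]))
    # Highest overlap first; ties broken by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

print(recommend(["t1"], DOCS))
```

A user whose history contains t1 matches the indicator field of t4, so the puppy document comes back as the recommendation.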

Internals of the Recommender Engine

Looking Inside LucidWorks
Real-time recommendation query and results: Evaluation

What to recommend if new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is “1710 : Chuck Berry”

Search-based Recommendations


Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location

Search-based Recommendations


Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Search-based Recommendations


Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40

Search-based Recommendations

Sample document
Original data and meta-data:
– Merchant Id
– Field for text description
– Phone
– Address
– Location
Derived from co-occurrence and cross-occurrence analysis:
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Sample query
Recommendation query:
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
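One way to picture the recommendation query is as a string that matches a user's recent behavior against the indicator fields. The sketch below assembles such a string in the `field:(term term)` style of Lucene-like query syntax. The field names (`indicator_merchant_ids` and so on), the sample ids and the `build_query` helper are all assumptions for illustration; the slides do not give the actual schema, and no escaping of special characters is attempted.

```python
def build_query(recent):
    """Build an OR query from {field name: list of recent terms}.

    Fields with no recent terms are skipped, so a brand-new user simply
    produces a shorter (possibly empty) query.
    """
    clauses = []
    for field, terms in recent.items():
        if terms:
            clauses.append(f"{field}:({' '.join(terms)})")
    return " OR ".join(clauses)

# Hypothetical recent behavior for one user.
recent_behavior = {
    "indicator_merchant_ids": ["m17", "m42"],
    "indicator_sic_ids": [],
    "indicator_offers": ["o9"],
}
print(build_query(recent_behavior))
```

Each kind of behavior (merchants visited, SIC codes, accepted offers) queries its own indicator field, which is what makes the recommender multi-modal.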

For example

Users enter queries (A)
– (actor = user, item=query)

Users view videos (B)
– (actor = user, item=video)

$A^T A$ gives query recommendation
– “did you mean to ask for”

$B^T B$ gives video recommendation
– “you might like these videos”

The punch-line


$B^T A$ recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
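A tiny numeric sketch of the cross-occurrence idea, using plain lists instead of a matrix library. The matrices are made up: three users, two queries and three videos. It uses raw counts only; in practice the LLR test would be applied to $B^T A$ before using it. The helper names are hypothetical.

```python
def transpose_times(B, A):
    """Return B^T A for two matrices whose rows are the same users."""
    out = [[0] * len(A[0]) for _ in range(len(B[0]))]
    for b_row, a_row in zip(B, A):
        for i, b in enumerate(b_row):
            if b:
                for j, a in enumerate(a_row):
                    out[i][j] += b * a
    return out

def times_vector(M, h):
    """Return the matrix-vector product M h."""
    return [sum(m * x for m, x in zip(row, h)) for row in M]

# Made-up data: rows are users u0..u2.
A = [[1, 0], [1, 0], [0, 1]]           # users x queries
B = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]  # users x videos

BtA = transpose_times(B, A)            # videos x queries
h = [1, 0]                             # the user just entered query 0
scores = times_vector(BtA, h)          # one score per video
print(BtA, scores)
```

Video 0 scores highest for query 0 because two users who entered that query also watched it; nothing about the video's content or meta-data was consulted.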

Real-life example


Query: “Paco de Lucia”

Conventional meta-data search results:
– “hombres del paco” times 400
– not much else

Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Hypothetical Example


Want a navigational ontology?

Just put labels on a web page with traffic
– This gives A = users x label clicks

Remember viewing history
– This gives B = users x items

Cross recommend
– $B^T A$ = label to item mapping

After several users click, results are whatever users think they should be

Nice. But we
can do better?

A Quick Simplification


Users who do $h$ (a vector of things a user has done)
$A h$: $A$ translates things into users

Also do $r$
$A^T (A h)$: User-centric recommendations (transpose translates back to things)
$(A^T A) h$: Item-centric recommendations (change the order of operations)
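The two orders of operation give the same answer, which a small sketch can confirm. The history matrix here is made up (three users, four things) and the helper names are hypothetical; the point is only that $A^T (A h)$ and $(A^T A) h$ agree, so the item-item matrix $A^T A$ can be computed off-line.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def mat_mul(X, Y):
    """Return the matrix product X Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mat_vec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Made-up history: rows are users, columns are things.
A = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
h = [1, 0, 0, 0]  # the current user has done thing 0 only

user_centric = mat_vec(transpose(A), mat_vec(A, h))   # A^T (A h)
item_centric = mat_vec(mat_mul(transpose(A), A), h)   # (A^T A) h
print(user_centric, item_centric)
```

Both routes yield the same score vector; the user-centric form needs the full history at query time, while the item-centric form only needs the precomputed item-by-item matrix.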

Symmetry Gives Cross Recommendations

$(A^T A) h$: Conventional recommendations with off-line learning

$(B^T A) h$: Cross recommendations

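[Figure: matrix $A$ with one row per user and one column per thing]

[Figure: block matrix $\begin{bmatrix} A_1 & A_2 \end{bmatrix}$ with one row per user and column blocks for thing type 1 and thing type 2]

\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]
\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]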
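Multiplying out the block product makes the two ingredients visible. This assumes $h_1$ and $h_2$ are the user's history vectors over thing type 1 and thing type 2, which the slide implies but does not state:

\[
r_1 = A_1^T A_1 h_1 + A_1^T A_2 h_2, \qquad
r_2 = A_2^T A_1 h_1 + A_2^T A_2 h_2
\]

In each line, the term that pairs a matrix with itself is the conventional co-occurrence recommendation, and the mixed term is the cross recommendation: history of one type of thing is used to recommend things of the other type.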
Part 3:
What about that worked
example?

[Figure: architecture of the worked example, with numbered components]
(1) User behavior generator
(2) Presentation tier
(3) Session collector
(4) Search engine
(5) Metrics and logs
(6) History collector
(7) Cooccurrence analysis
(8) Post to search engine
(9) Diagnostic browsing
http://bit.ly/18vbbaT

Analyze with Map-Reduce
[Figure: diagram with components: Complete history, Cooccurrence (Mahout), Item metadata, Solr indexing (several Solr indexers), Index shards]

Deploy with Conventional Search System
[Figure: diagram with components: User history, Web tier, Item metadata, Solr search (several Solr instances), Index shards]

Me, Us

Ted Dunning, Chief Application Architect, MapR
– Committer, PMC member: Mahout, Zookeeper, Drill
– Bought the beer at the first HUG

MapR
– Distributes more open source components for Hadoop
– Adds major technology for performance, HA, industry standard API’s

Info
– Hash tag - #mapr
– See also - @ApacheMahout @ApacheDrill @ted_dunning and @mapR

Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Kürzlich hochgeladen (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Building multi-modal recommendation engines using search engines

  • 1. Introduction to Mahout And How To Build a Recommender ©MapR Technologies 2013- Confidential 1
  • 2. Topic For Today: What is recommendation? What makes it different? What is multi-modal recommendation? How can I build it using common household items?
  • 3. Oh … Also This: Detailed break-down of a live machine learning system running with Mahout on MapR. With code examples.
  • 4. I may have to summarize
  • 5. I may have to summarize just a bit
  • 6. Part 1: 5 minutes of background
  • 7. Part 2: 5 minutes: I want a pony
  • 9. Part 1: 5 minutes of background
  • 10. What Does Machine Learning Look Like?
  • 11. What Does Machine Learning Look Like?
      $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
      $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
      $$r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
      $O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small $k$, high quality; $O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger $k$, looser quality.
      But tonight we’re going to show you how to keep it simple yet powerful…
  • 12. Recommendations as Machine Learning. Recommendation:
      – Involves observation of interactions between people taking action (users) and items for input data to the recommender model
      – Goal is to suggest additional appropriate or desirable interactions
      – Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
  • 15. Part 2: How recommenders work (I still want a pony)
  • 16. Recommendations Recap: Behavior of a crowd helps us understand what individuals will do
  • 17. Recommendations: Alice got an apple and a puppy. Charles got a bicycle.
  • 18. Recommendations: Alice got an apple and a puppy. Bob got an apple. Charles got a bicycle.
  • 19. Recommendations (Alice, Bob, Charles): What else would Bob like?
  • 20. Recommendations (Alice, Bob, Charles): A puppy, of course!
  • 21. You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)
  • 22. Recommendations (Alice, Bob, Amelia, Charles): What if everybody gets a pony? What else would you recommend for Amelia?
  • 23. Recommendations (Alice, Bob, Amelia, Charles): If everybody gets a pony, it’s not a very good indicator of what else to predict...
  • 24. Problems with Raw Co-occurrence
      – Very popular items co-occur with everything (or why it’s not very helpful to know that everybody wants a pony…). Examples: Welcome document; Elevator music
      – Very widespread occurrence is not interesting as a way to generate indicators, unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
      – What we want is anomalous co-occurrence. This is the source of interesting indicators of preference on which to base recommendation
  • 25. Get Useful Indicators from Behaviors
      1. Use log files to build history matrix of users x items. Remember: this history of interactions will be sparse compared to all potential combinations
      2. Transform to a co-occurrence matrix of items x items
      3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix. Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference. RowSimilarityJob in Apache Mahout uses LLR
  • 28. Log Files and Dimensions: each log entry pairs a user with a thing (for example u1 t1, u2 t4, u2 t3). Users: u1 Alice, u2 Charles, u3 Bob. Things: t1, t2, t3, t4.
  • 29. History Matrix: Users by Items (figure: rows Alice, Bob, Charles; a ✔ marks each item a user has interacted with)
  • 30. Co-occurrence Matrix: Items by Items. How do you tell which co-occurrences are useful? (figure: items-by-items matrix of co-occurrence counts)
  • 31. Co-occurrence Matrix: Items by Items. Use LLR test to turn co-occurrence into indicators… (figure: items-by-items matrix of co-occurrence counts)
  • 32. Co-occurrence Binary Matrix (figure: binary co-occurrence table with "not" rows and columns)
  • 33. Spot the Anomaly: What conclusion do you draw from each situation? (Four 2×2 tables of counts; columns are A, not A and rows are B, not B.)
      – Upper left: B: 13, 1000; not B: 1000, 100,000
      – Upper right: B: 1, 0; not B: 0, 2
      – Lower left: B: 1, 0; not B: 0, 10,000
      – Lower right: B: 10, 0; not B: 0, 100,000
  • 34. Spot the Anomaly: What conclusion do you draw from each situation? (Same four tables, now with root LLR scores.)
      – Upper left: B: 13, 1000; not B: 1000, 100,000; root LLR 0.90
      – Upper right: B: 1, 0; not B: 0, 2; root LLR 1.95
      – Lower left: B: 1, 0; not B: 0, 10,000; root LLR 4.52
      – Lower right: B: 10, 0; not B: 0, 100,000; root LLR 14.3
      Root LLR is roughly like standard deviations. In Apache Mahout, RowSimilarityJob uses LLR.
  • 35. Co-occurrence Matrix Recap: Use LLR test to turn co-occurrence into indicators (figure: items-by-items matrix of co-occurrence counts)
  • 36. Indicator Matrix: Anomalous Co-Occurrence. Result: The marked row will be added to the indicator field in the item document…
  • 37. Indicator Matrix: That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
      – id: t4
      – title: puppy
      – desc: The sweetest little puppy ever.
      – keywords: puppy, dog, pet
      – indicators: (t1)
      Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don’t need to create a separate index for the indicators.
  • 38. Internals of the Recommender Engine
  • 39. Internals of the Recommender Engine
  • 40. Looking Inside LucidWorks. Real-time recommendation query and results: Evaluation. What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles? Recommendation is “1710 : Chuck Berry”.
  • 41. Search-based Recommendations
      – Sample document: Merchant Id; Field for text description; Phone; Address; Location
  • 42. Search-based Recommendations
      – Sample document: Merchant Id; Field for text description; Phone; Address; Location; Indicator merchant id’s; Indicator industry (SIC) id’s; Indicator offers; Indicator text; Local top40
  • 43. Search-based Recommendations
      – Sample document: Merchant Id; Field for text description; Phone; Address; Location; Indicator merchant id’s; Indicator industry (SIC) id’s; Indicator offers; Indicator text; Local top40
      – Sample query: Current location; Recent merchant descriptions; Recent merchant id’s; Recent SIC codes; Recent accepted offers; Local top40
  • 44. Search-based Recommendations
      – Sample document, original data and meta-data: Merchant Id; Field for text description; Phone; Address; Location
      – Sample document, derived from co-occurrence and cross-occurrence analysis: Indicator merchant id’s; Indicator industry (SIC) id’s; Indicator offers; Indicator text; Local top40
      – Sample query (the recommendation query): Current location; Recent merchant descriptions; Recent merchant id’s; Recent SIC codes; Recent accepted offers; Local top40
  • 45. For example
      – Users enter queries ($A$): actor = user, item = query
      – Users view videos ($B$): actor = user, item = video
      – $A^T A$ gives query recommendation: “did you mean to ask for”
      – $B^T B$ gives video recommendation: “you might like these videos”
  • 46. The punch-line: $B^T A$ recommends videos in response to a query
      – (isn’t that a search engine?)
      – (not quite, it doesn’t look at content or meta-data)
  • 47. Real-life example
      – Query: “Paco de Lucia”
      – Conventional meta-data search results: “hombres del paco” times 400; not much else
      – Recommendation based search: Flamenco guitar and dancers; Spanish and classical guitar; Van Halen doing a classical/flamenco riff
  • 48. Real-life example
  • 49. Hypothetical Example
      – Want a navigational ontology?
      – Just put labels on a web page with traffic. This gives $A$ = users x label clicks
      – Remember viewing history. This gives $B$ = users x items
      – Cross recommend. $B'A$ = label to item mapping
      – After several users click, results are whatever users think they should be
  • 50. Nice. But we can do better?
  • 51. A Quick Simplification
      – Users who do $h$ (a vector of things a user has done): $Ah$ ($A$ translates things into users)
      – Also do $r$: $A^T (A h)$, user-centric recommendations (transpose translates back to things)
      – $(A^T A) h$, item-centric recommendations (change the order of operations)
  • 52. Symmetry Gives Cross Recommendations
      – $(A^T A) h$: conventional recommendations with off-line learning
      – $(B^T A) h$: cross recommendations
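  • 54. (figure: a users-by-things matrix whose columns are split into thing type 1 and thing type 2, written $\begin{bmatrix} A_1 & A_2 \end{bmatrix}$)
  • 55. $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
      $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
      $$r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$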
  • 56. Part 3: What about that worked example?
  • 57. (architecture diagram, http://bit.ly/18vbbaT) User behavior generator (1), Presentation tier (2), Session collector (3), Search engine (4), Metrics and logs (5), History collector (6), Cooccurrence analysis (7), Post to search engine (8), Diagnostic browsing (9)
  • 59. Deploy with Conventional Search System (diagram components: User history, Item metadata, Solr Indexer, Index shards, Solr search, Web tier)
  • 60. Me, Us
      – Ted Dunning, Chief Application Architect, MapR. Committer, PMC member: Mahout, Zookeeper, Drill. Bought the beer at the first HUG
      – MapR: Distributes more open source components for Hadoop. Adds major technology for performance, HA, industry standard API’s
      – Info: Hash tag - #mapr. See also - @ApacheMahout @ApacheDrill @ted_dunning and @mapR
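
The indicator pipeline described in slides 25 to 37 (history matrix, co-occurrence counts, LLR test, indicator field) can be sketched in a few lines of plain Python. This is a minimal illustration and not Mahout's RowSimilarityJob: the function names, the toy `history` data and the threshold of 1.0 are all assumptions made for the example, and the LLR formula is the standard entropy form of the G² statistic for a 2×2 table. It reproduces the root-LLR scores quoted on slide 34.

```python
from collections import defaultdict
from itertools import combinations
from math import log, sqrt


def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of user counts.

    k11: users with both A and B, k12: B without A,
    k21: A without B, k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative results caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    return sqrt(llr(k11, k12, k21, k22))


def indicators(history, threshold):
    """Map each item to the items whose co-occurrence with it is anomalous.

    history: dict of user -> set of items (the sparse history matrix).
    threshold: minimum root LLR to keep a pair (an illustrative choice).
    """
    n_users = len(history)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in history.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1  # off-diagonal co-occurrence counts

    result = defaultdict(set)
    for (a, b), both in pair_count.items():
        k12 = item_count[b] - both
        k21 = item_count[a] - both
        k22 = n_users - both - k12 - k21
        if root_llr(both, k12, k21, k22) >= threshold:
            result[a].add(b)
            result[b].add(a)
    return dict(result)


# Hypothetical cartoon data in the spirit of the slides: everybody has a pony.
history = {
    "Alice": {"apple", "puppy", "pony"},
    "Bob": {"apple", "pony"},
    "Charles": {"bicycle", "pony"},
}

# The four tables from slide 34.
for table in [(13, 1000, 1000, 100000), (1, 0, 0, 2),
              (1, 0, 0, 10000), (10, 0, 0, 100000)]:
    print(table, round(root_llr(*table), 2))

print(indicators(history, threshold=1.0))
```

With three users the scores are far too small to mean anything statistically, which is the point of the slides' "cartoon" caveat; what the sketch does show is that the pony, which co-occurs with everything, scores zero against every other item, while apple and puppy survive as each other's indicators. In a deployment along the lines of slide 37, each item's surviving set would be written into that item's indicator field in the search index.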

Editor's notes

  1. Note to speaker: Move quickly through the first two slides just to set the tone of familiar use cases but somewhat complicated under-the-covers math and algorithms… You don’t need to explain or discuss these examples at this point… just mention one or two. Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email…
  2. Talk track: Under the covers, machine learning looks very complicated. So how do you get from here to the familiar examples? Tonight’s presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.
  3. Note to trainers: the next series of slides starts with a cartoon example just to set the pattern of how to find co-occurrence and use it to find indicators of what to recommend. Of course, real examples require a LOT of data of user-item interaction history to actually work, so this is just an analogy to get the idea across…
  4. * A history of what everybody has done. Obviously this is just a cartoon because large numbers of users and interactions with items would be required to build a recommender. * Next step will be to predict what a new user might like…
  5. * Bob is the “new user” and getting an apple is his history
  6. * Here is where the recommendation engine needs to go to work… Note to trainer: you might see if the audience calls out the answer before revealing the next slide…
  7. Now you see the idea of co-occurrence as a basis for recommendation…
  8. *Now we have a new user, Amelia. Like everybody else, she gets a pony… what should the recommender offer her based on her history?
  9. * Pony not interesting because it is so widespread that it does not differentiate a pattern
  10. Note to trainer: This is the situation similar to that in which we started, with three users in our history. The difference is that now everybody got a pony. Bob has apple and pony but not a puppy…yet
  11. *Binary matrix is stored sparsely
  12. * Convert by MapReduce into a binary matrix. Note to trainer: Whether to consider apple to have occurred with itself is an open question
  13. * Convert by MapReduce into a binary matrix. Note to trainer: the diagonal gives total occurrence for each item (self to self) and is a distraction, not helpful, so the diagonal here is left blank
  14. Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence
  15. Note to trainer: Give students time to offer comments. There’s a lot to discuss here. * Upper left: In context of A, B occurs the largest number of times, 13 times out of 1013 appearances with over 100,000 samples. But that’s only ~1.3% as co-occurrence with A out of all times B appears. * Upper right: B occurs in context of A 33% of time, but counts so small as to be of concern. * Lower right: most significant anomaly in that B still occurs a small number of times out of over 100,000 samples, but it ALWAYS co-occurs with A when it does appear.
  16. * The test Mahout uses for this is Log Likelihood Ratio (LLR). * Red circle marks the choice that displays highest confidence. Note to trainer: Slide animates with click to show LLR results. SECOND click animates the choice that has highest confidence.
  17. Note to trainer: we go back to the earlier matrix as a reminder…
  18. Only important co-occurrence is puppy follows apple
  19. * Take that row of the matrix and combine it with all the meta-data we might have… * The important thing to get from the co-occurrence matrix is this indicator. Cool thing: analogous to what a lot of recommendation engines do. * This row forms the indicator field in a Solr document containing meta-data (you do NOT have to build a separate index for the indicators). Find the useful co-occurrence and get rid of the rest. Sparsify and get the anomalous co-occurrence.
  20. Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
  21. * This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrence). * Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains meta-data for the item in question
  22. This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It’s a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine. In other words, do these indicator artists, represented by their indicator Id, make reasonable recommendations? Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?
  23. Here we recap what we have in the different components of the recommender. We start with the meta-data for an item stored in the Solr index
  24. *Here we’ve added examples of indicator data for the indicator field(s) of the document
  25. *Here we show you what information might be in the sample query
  26. Note to trainer: you could ask the class to consider which data is related… for example, the first 3 bullets of the query relate to meta data for the item, not to data produced by the recommendation algorithm. The last 3 bullets refer to data in the sample query related to data in the indicator field(s) that were produced by the Mahout recommendation engine.