Weitere Ă€hnliche Inhalte Ăhnlich wie Cognitive computing with big data, high tech and low tech approaches (20) Mehr von Ted Dunning (10) KĂŒrzlich hochgeladen (20) Cognitive computing with big data, high tech and low tech approaches3. © 2014 MapR Technologies 3
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
4. © 2014 MapR Technologies 4
The outline
âą The first open source, big data project
âą Another big data project
âą Conclusions
5. © 2014 MapR Technologies 5
First:
An apology for going
off-script
8. In 1866, the top finishers
in the tea race reached
London in 99 days, within
2 hours of each other
© 2014 MapR Technologies 8
10. But in 1851, the record
had been set at 89 days
by the Flying Cloud
© 2014 MapR Technologies 10
11. © 2014 MapR Technologies 11
The difference was due
(in part)
to big data
16. But how does this apply
today?
© 2014 MapR Technologies 16
17. © 2014 MapR Technologies 17
Key Points of Mauryâs Work
âą Give to get
â Give the Abstract Log to captains, get data
âą Data consortium wins
â Merging data gives pictures nobody else can see
âą Give back
â Them that gives, also gets
âą But this is just what every data driven web site does!
â Just 150 years before everybody else
18. © 2014 MapR Technologies 18
The Real News in Behavioral Analysis
âą Everybody knows that:
âą You need ensembles of many models to do recommendations
âą You need to use factorization models
âą You predict what you observe
âą (You should predict ratings)
19. © 2014 MapR Technologies 19
But âŠ
none of this is really true
20. © 2014 MapR Technologies 20
In fact,
âą Fancy models are rarely useful expenditures of time
âą Factorization can be good, but not much better (if at all)
âą Ratings are disastrously bad data
âą Cross-recommendation and multi-modal recommendations are
much more interesting
â Multiple kinds of input are far better than multiple models
âą The UI has a far larger impact than the models
âą The best algorithms combine simplicity with accuracy
â So simple you can embed them in a search engine
22. © 2014 MapR Technologies 22
CooccurrenCcoeoAcncaulryrseisn ce Analysis
23. © 2014 MapR Technologies 23
How Often Do Items Co-occur
How often do items co-occur?
24. © 2014 MapR Technologies 24
Which Co-occurrences are Interesting?
Which cooccurences are interesting?
Each row of indicators becomes a field in a
search engine document
25. © 2014 MapR Technologies 25
Recommendations
Alice got an apple and
Alice a puppy
26. © 2014 MapR Technologies 26
Recommendations
Alice got an apple and
Alice a puppy
Charles Charles got a bicycle
27. © 2014 MapR Technologies 27
Recommendations
Alice got an apple and
Alice a puppy
Bob Bob got an apple
Charles Charles got a bicycle
28. © 2014 MapR Technologies 28
Recommendations
Alice got an apple and
Alice a puppy
Bob What else would Bob like?
Charles Charles got a bicycle
29. © 2014 MapR Technologies 29
Recommendations
Alice got an apple and
Alice a puppy
Bob A puppy!
Charles Charles got a bicycle
30. © 2014 MapR Technologies 30
You get the idea of how
recommenders can workâŠ
31. By the way, like me, Bob also
wants a ponyâŠ
© 2014 MapR Technologies 31
32. © 2014 MapR Technologies 32
Recommendations
?
Alice
Bob
Amelia
Charles
What if everybody gets a
pony?
What else would you recommend for
new user Amelia?
33. © 2014 MapR Technologies 33
Recommendations
?
Alice
Bob
Amelia
Charles
If everybody gets a pony, itâs
not a very good indicator of
what to else predict...
34. © 2014 MapR Technologies 34
Problems with Raw Co-occurrence
âą Very popular items co-occur with everything or why itâs not very
helpful to know that everybody wants a ponyâŠ
â Examples: Welcome document; Elevator music
âą Very widespread occurrence is not interesting to generate indicators
for recommendation
â Unless you want to offer an item that is constantly desired, such as
razor blades (or ponies)
âą What we want is anomalous co-occurrence
â This is the source of interesting indicators of preference on which to
base recommendation
35. Overview: Get Useful Indicators from Behaviors
© 2014 MapR Technologies 35
1. Use log files to build history matrix of users x items
â Remember: this history of interactions will be sparse compared to all
potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful indicators by identifying anomalous co-occurrences to
make an indicator matrix
â Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences
can with confidence be used as indicators of preference
â ItemSimilarityJob in Apache Mahout uses LLR
36. Which one is the anomalous co-occurrence?
A not A
B 1 0
not B 0 2
© 2014 MapR Technologies 36
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
37. Which one is the anomalous co-occurrence?
A not A
0.90 1.95
B 1 0
not B 0 2
© 2014 MapR Technologies 37
A not A
B 13 1000
not B 1000 100,000
A not A
4.52 14.3
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
38. Collection of Documents: Insert Meta-Data
© 2014 MapR Technologies 38
Search
Technology
Item
meta-data
Ingest easily via NFS
Document for
âpuppyâ id: t4
title: puppy
desc: The sweetest little puppy
ever.
keywords: puppy, dog, pet
39. From Indicator Matrix to New Indicator Field
© 2014 MapR Technologies 39
â
id: t4
title: puppy
desc: The sweetest little puppy
ever.
keywords: puppy, dog, pet
indicators: (t1)
Solr document
for âpuppyâ
Note: data for the indicator field is added directly to meta-data for a document in Apache
Solr or Elastic Search index. You donât need to create a separate index for the indicators.
42. © 2014 MapR Technologies 42
For example
âą Users enter queries (A)
â (actor = user, item=query)
âą Users view videos (B)
â (actor = user, item=video)
âą ATA gives query recommendation
â âdid you mean to ask forâ
âą BTB gives video recommendation
â âyou might like these videosâ
43. © 2014 MapR Technologies 43
The punch-line
âą BTA recommends videos in response to a query
â (isnât that a search engine?)
â (not quite, it doesnât look at content or meta-data)
44. © 2014 MapR Technologies 44
Real-life example
âą Query: âPaco de Luciaâ
âą Conventional meta-data search results:
â âhombres de pacoâ times 400
â not much else
âą Recommendation based search:
â Flamenco guitar and dancers
â Spanish and classical guitar
â Van Halen doing a classical/flamenco riff
46. © 2014 MapR Technologies 46
Hypothetical Example
âą Want a navigational ontology?
âą Just put labels on a web page with traffic
â This gives A = users x label clicks
âą Remember viewing history
â This gives B = users x items
âą Cross recommend
â BâA = label to item mapping
âą After several users click, results are whatever users think they
should be
47. available for free at
http://www.mapr.com/practical-machine-learning
© 2014 MapR Technologies 47
More Details Available
available for free at
http://www.mapr.com/practical-machine-learning
48. © 2014 MapR Technologies 48
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
49. © 2014 MapR Technologies 49
Q & A
Engage with us!
@mapr maprtech
jbates@mapr.com
MapR
maprtech
mapr-technologies
Hinweis der Redaktion Mention that the Pony book said âRowSimilarityJobâ⊠Problem starts hereâŠ