Mahout and Recommendations
2. 2©MapR Technologies 2013- Confidential
Me, Us
Ted Dunning, Chief Application Architect, MapR
Committer and PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG
MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard APIs
Tonight
Hashtags - #dfwbd #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
3. 3©MapR Technologies 2013- Confidential
Requested Topic For Tonight
What is Mahout?
What makes it different?
How can big data technology solve impossible problems?
How is big data affecting the world?
4. 4©MapR Technologies 2013- Confidential
Also
What is MapR?
What is MapR doing?
How does MapR’s technology work?
How are customers making use of MapR?
How can anyone make use of MapR to solve problems?
5. 5©MapR Technologies 2013- Confidential
Oh … Also This
Detailed break-down of a live machine learning system running
with Mahout on MapR
With code examples
12. 12©MapR Technologies 2013- Confidential
What Does Machine Learning Look Like?
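$$
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
$$
$$
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
$$
$$
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
$$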
$O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small k, high quality
$O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger k, looser quality
But tonight we’re going to show you how to keep it simple yet powerful…
13. 13©MapR Technologies 2013- Confidential
Comparison of Three Main ML Topics
Recommendation:
– Input to the recommender model is observed interactions between people
taking action (users) and items
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices;
suggesting sale items for e-stores or via cash-register receipts
17. 17©MapR Technologies 2013- Confidential
Mahout Math
Goals are
– basic linear algebra,
– and statistical sampling,
– and good clustering,
– decent speed,
– extensibility,
– especially for sparse data
But not
– totally badass speed
– comprehensive set of algorithms
– optimization, root finders, quadrature
18. 18©MapR Technologies 2013- Confidential
Matrices and Vectors
At the core:
– DenseVector, RandomAccessSparseVector
– DenseMatrix, SparseRowMatrix
Highly composable API
Important ideas:
– view*, assign and aggregate
– iteration
m.viewDiagonal().assign(v)
19. 19©MapR Technologies 2013- Confidential
Assign? View?
Why assign?
– Copying is the major cost for naïve matrix packages
– In-place operations critical to reasonable performance
– Many kinds of updates required, so functional style very helpful
Why view?
– In-place operations often required for blocks, rows, columns or diagonals
– With views, we need #assign + #views methods
– Without views, we need #assign x #views methods
Synergies
– With both views and assign, many loops become single line
20. 24©MapR Technologies 2013- Confidential
Examples
A = α
A = αB + β
double alpha; a.assign(alpha);
a.assign(b, Functions.chain(
Functions.plus(beta),
Functions.times(alpha)));
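The second example composes two small functions and applies the result element by element, in place. A minimal plain-Java sketch of that idea follows. It is not Mahout code: `AffineAssign`, `assign` and `affine` are hypothetical names, and `DoubleUnaryOperator.andThen` stands in for the slide's `Functions.chain`.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for an in-place vector assign; not part of Mahout.
final class AffineAssign {
    // Overwrites a[i] with f(b[i]) without allocating a new array.
    static void assign(double[] a, double[] b, DoubleUnaryOperator f) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = f.applyAsDouble(b[i]);
        }
    }

    // Builds x -> alpha * x + beta by composing "times alpha" with "plus beta".
    static DoubleUnaryOperator affine(double alpha, double beta) {
        DoubleUnaryOperator times = x -> alpha * x;
        DoubleUnaryOperator plus = x -> x + beta;
        return times.andThen(plus);
    }

    public static void main(String[] args) {
        double[] a = new double[3];
        double[] b = {1.0, 2.0, 3.0};
        assign(a, b, affine(2.0, 0.5));
        System.out.println(Arrays.toString(a)); // prints [2.5, 4.5, 6.5]
    }
}
```

Because the composed function is an ordinary value, one `assign` method covers every element-wise update; no dedicated method per formula is needed.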
24. 29©MapR Technologies 2013- Confidential
Examples
The trace of a matrix
m.viewDiagonal().zSum()
Set diagonal to zero
m.viewDiagonal().assign(0)
Set diagonal to negative of row sums excluding the diagonal
Vector diag = m.viewDiagonal().assign(0);
diag.assign(m.rowSums().assign(Functions.MINUS));
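To show why a view plus in-place assignment turns these loops into single lines, here is a small plain-Java sketch of a diagonal view over a `double[][]`. It illustrates the pattern only and assumes nothing about Mahout's internals: `DiagonalView` and its methods are hypothetical, named after the calls on the slide.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical diagonal view over a square double[][]; not Mahout's classes.
final class DiagonalView {
    private final double[][] m; // backing matrix, shared rather than copied

    DiagonalView(double[][] m) {
        this.m = m;
    }

    // Sum of the viewed elements; for the diagonal this is the trace.
    double zSum() {
        double sum = 0.0;
        for (int i = 0; i < m.length; i++) {
            sum += m[i][i];
        }
        return sum;
    }

    // Writes one constant into every viewed element, in place.
    DiagonalView assign(double value) {
        for (int i = 0; i < m.length; i++) {
            m[i][i] = value;
        }
        return this;
    }

    // Writes f(values[i]) into element (i, i), in place.
    DiagonalView assign(double[] values, DoubleUnaryOperator f) {
        for (int i = 0; i < m.length; i++) {
            m[i][i] = f.applyAsDouble(values[i]);
        }
        return this;
    }

    // Plain helper: the sum of each row of the matrix.
    static double[] rowSums(double[][] m) {
        double[] sums = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (double x : m[i]) {
                sums[i] += x;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        DiagonalView diag = new DiagonalView(m);
        System.out.println(diag.zSum());          // trace: prints 5.0
        diag.assign(0);                           // zero the diagonal first
        diag.assign(rowSums(m), x -> -x);         // then store negated row sums
        System.out.println(Arrays.deepToString(m)); // prints [[-2.0, 2.0], [3.0, -3.0]]
    }
}
```

Zeroing the diagonal before taking row sums is what makes the result "excluding the diagonal": afterwards every row of `m` sums to zero.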
25. 32©MapR Technologies 2013- Confidential
Clustering and Such
Streaming k-means and ball k-means
– streaming reduces very large data to a cluster sketch
– ball k-means is a high quality k-means implementation
– the cluster sketch is also usable for other applications
– single machine threaded and map-reduce versions available
SVD and friends
– stochastic SVD has in-memory, single machine out-of-core and map-reduce
versions
– good for reducing very large sparse matrices to tall skinny dense ones
Spectral clustering
– based on SVD, allows massive dimensional clustering
26. 33©MapR Technologies 2013- Confidential
Mahout Math Summary
Matrices, Vectors
– views
– in-place assignment
– aggregations
– iterations
Functions
– lots built-in
– cooperate with sparse vector optimizations
Sampling
– abstract samplers
– samplers as functions
Other stuff … clustering, SVD
30. 37©MapR Technologies 2013- Confidential
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
32. 39©MapR Technologies 2013- Confidential
Recommendations
What if everybody gets a pony?
Now what does Bob want?
35. 42©MapR Technologies 2013- Confidential
Log Files and Dimensions
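[Figure: a log of events, one user and one thing per row]
User column: u1, u3, u2, u1, u3, u2, u1
Thing column: t1, t2, t3, t4, t3, t3, t1
Things dimension: t1, t2, t3, t4
Users dimension: u1, u2, u3 (figure labels: Alice, Bob, Charles)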
39. 46©MapR Technologies 2013- Confidential
Indicator Matrix
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
40. 47©MapR Technologies 2013- Confidential
Problems with Raw Cooccurrence
Very popular items co-occur with everything
– Welcome document
– Elevator music
That isn’t interesting
– We want anomalous cooccurrence
42. 49©MapR Technologies 2013- Confidential
Spot the Anomaly
Root LLR is roughly like standard deviations

          A       not A
B         13      1000
not B     1000    100,000
Root LLR: 0.44

          A       not A
B         1       0
not B     0       2
Root LLR: 0.98

          A       not A
B         1       0
not B     0       10,000
Root LLR: 2.26

          A       not A
B         10      0
not B     0       100,000
Root LLR: 7.15
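For readers who want to score such 2x2 tables themselves, here is a minimal Java sketch of the log-likelihood ratio (G²) test in its usual entropy form, with the square root taken at the end. The class and method names are assumptions made for illustration. Scores from this formula need not match the slide's figures digit for digit, since the scaling convention may differ; the point is the ranking of the four tables.

```java
// Hypothetical helper for scoring 2x2 cooccurrence tables; names are illustrative.
final class RootLlr {
    // x * ln(x), with the usual convention that 0 * ln(0) = 0.
    static double xLogX(double x) {
        return x == 0.0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a set of counts: N ln N - sum(k ln k).
    static double entropy(double... counts) {
        double total = 0.0;
        double parts = 0.0;
        for (double c : counts) {
            total += c;
            parts += xLogX(c);
        }
        return xLogX(total) - parts;
    }

    // G^2 statistic for the table [[k11, k12], [k21, k22]].
    static double llr(long k11, long k12, long k21, long k22) {
        double rows = entropy(k11 + k12, k21 + k22);
        double cols = entropy(k11 + k21, k12 + k22);
        double cells = entropy(k11, k12, k21, k22);
        // Round-off can push a true zero slightly negative, so clamp it.
        return Math.max(0.0, 2.0 * (rows + cols - cells));
    }

    static double rootLlr(long k11, long k12, long k21, long k22) {
        return Math.sqrt(llr(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // The four tables from the slide, in the same order.
        System.out.println(rootLlr(13, 1000, 1000, 100000));
        System.out.println(rootLlr(1, 0, 0, 2));
        System.out.println(rootLlr(1, 0, 0, 10000));
        System.out.println(rootLlr(10, 0, 0, 100000));
    }
}
```

The first table has the largest cooccurrence count (13) yet the lowest score, because A and B are both common there; that is the "anomalous cooccurrence" distinction the previous slide asks for.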
43. 50©MapR Technologies 2013- Confidential
A Quick Simplification
Users who do h (a vector of things a user has done): $A h$
– A translates things into users
Also do r:
– User-centric recommendations: $r = A^T (A h)$ (transpose translates back to things)
– Item-centric recommendations: $r = (A^T A) h$ (change the order of operations)
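The two orderings give the same scores because matrix multiplication is associative. A small Java sketch on the Alice, Bob and Charles example makes this concrete. The matrix layout (rows are users, columns are apple, puppy, bicycle) and all names here are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical dense-array sketch; a real system would use sparse matrices.
final class Cooccurrence {
    // Returns M x for a row-major matrix M and a vector x.
    static double[] times(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += m[i][j] * x[j];
            }
        }
        return y;
    }

    // Returns M^T y without materializing the transpose.
    static double[] transposeTimes(double[][] m, double[] y) {
        double[] x = new double[m[0].length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                x[j] += m[i][j] * y[i];
            }
        }
        return x;
    }

    // Returns the item-by-item cooccurrence matrix A^T A.
    static double[][] gram(double[][] a) {
        int n = a[0].length;
        double[][] c = new double[n][n];
        for (double[] row : a) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    c[j][k] += row[j] * row[k];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Rows: Alice, Bob, Charles. Columns: apple, puppy, bicycle.
        double[][] a = {{1, 1, 0}, {1, 0, 0}, {0, 0, 1}};
        double[] h = {1, 0, 0}; // Bob's history: apple only

        double[] userCentric = transposeTimes(a, times(a, h)); // A^T (A h)
        double[] itemCentric = times(gram(a), h);              // (A^T A) h

        System.out.println(Arrays.toString(userCentric)); // prints [2.0, 1.0, 0.0]
        System.out.println(Arrays.toString(itemCentric)); // prints [2.0, 1.0, 0.0]
    }
}
```

The item-centric form matters in practice because `gram(a)` can be computed off-line once, leaving only a small multiply by `h` at request time. In the output, puppy is the only item with a nonzero score that Bob does not already have.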
44. 51©MapR Technologies 2013- Confidential
Symmetry Gives Cross Recommendations
$(A^T A) h$
Conventional recommendations with off-line learning
$(B^T A) h$
Cross recommendations
45. 52©MapR Technologies 2013- Confidential
For example
Users enter queries (A)
– (actor = user, item=query)
Users view videos (B)
– (actor = user, item=video)
$A^T A$ gives query recommendation
– “did you mean to ask for”
$B^T B$ gives video recommendation
– “you might like these videos”
46. 53©MapR Technologies 2013- Confidential
The punch-line
$B^T A$ recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
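A toy version of the cross recommendation can be sketched in a few lines of Java. Everything concrete here is made up for illustration: three users, two queries, three videos, and the class and method names. The only idea taken from the slides is that A holds user-by-query behavior, B holds user-by-video behavior, and videos are scored as $B^T (A h)$.

```java
import java.util.Arrays;

// Hypothetical dense-array sketch of cross recommendation; data is invented.
final class CrossRecommender {
    // Returns M x for a row-major matrix M and a vector x.
    static double[] times(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += m[i][j] * x[j];
            }
        }
        return y;
    }

    // Returns M^T y without materializing the transpose.
    static double[] transposeTimes(double[][] m, double[] y) {
        double[] x = new double[m[0].length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                x[j] += m[i][j] * y[i];
            }
        }
        return x;
    }

    // Scores every B-item for a history h expressed in A-items: B^T (A h).
    static double[] crossRecommend(double[][] a, double[][] b, double[] h) {
        double[] users = times(a, h);      // which users share this behavior
        return transposeTimes(b, users);   // what those users did in the other space
    }

    public static void main(String[] args) {
        double[][] a = {{1, 0}, {1, 0}, {0, 1}};          // users x queries
        double[][] b = {{1, 1, 0}, {1, 0, 0}, {0, 0, 1}}; // users x videos
        double[] h = {1, 0};                              // the user typed query 0

        System.out.println(Arrays.toString(crossRecommend(a, b, h))); // prints [2.0, 1.0, 0.0]
    }
}
```

No video title or description is consulted anywhere, which is the sense in which this behaves like a search engine without looking at content or meta-data.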
47. 54©MapR Technologies 2013- Confidential
Real-life example
Query: “Paco de Lucia”
Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
49. 56©MapR Technologies 2013- Confidential
Hypothetical Example
Want a navigational ontology?
Just put labels on a web page with traffic
– This gives A = users x label clicks
Remember viewing history
– This gives B = users x items
Cross recommend
– $B^T A$ = label-to-item mapping
After several users click, results are whatever users think they should be
53. 60©MapR Technologies 2013- Confidential
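$$
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
$$
$$
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
$$
$$
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
$$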
55. 62©MapR Technologies 2013- Confidential
[Figure: system architecture; components numbered as in the diagram]
(1) User behavior generator
(2) Presentation tier
(3) Session collector
(4) Search engine
(5) Metrics and logs
(6) History collector
(7) Cooccurrence analysis
(8) Post to search engine
(9) Diagnostic browsing
http://bit.ly/18vbbaT
56. 63©MapR Technologies 2013- Confidential
Analyze with Map-Reduce
[Figure labels: Complete history; Cooccurrence (Mahout); Item meta-data; Solr indexing (two Solr indexers); Index shards]
57. 64©MapR Technologies 2013- Confidential
Deploy with Conventional Search System
[Figure labels: User history; Web tier; Solr search (two Solr indexers); Item meta-data; Index shards]
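The deployment idea is that each item document carries an `indicators` field and the user's history is sent as the query. The sketch below is an in-memory stand-in for that search step, written under loud assumptions: it is not Solr, the scoring is a bare overlap count rather than a real relevance function, and every name in it is hypothetical. Only the `t4` document with indicator `t1` comes from the earlier slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the search step; a real deployment would query Solr.
final class IndicatorSearch {
    // Ranks unseen items by how many distinct history entries appear in their indicators.
    static List<String> recommend(Map<String, Set<String>> indicators, List<String> history) {
        Set<String> seen = new HashSet<>(history);
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> doc : indicators.entrySet()) {
            if (seen.contains(doc.getKey())) {
                continue; // do not recommend what the user already has
            }
            int hits = 0;
            for (String thing : seen) {
                if (doc.getValue().contains(thing)) {
                    hits++;
                }
            }
            if (hits > 0) {
                scores.put(doc.getKey(), hits);
            }
        }
        // Highest score first; ties broken by item id for a stable order.
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingInt((String id) -> -scores.get(id))
                .thenComparing(Comparator.naturalOrder()));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> indicators = new HashMap<>();
        indicators.put("t4", new HashSet<>(Arrays.asList("t1"))); // the puppy document
        indicators.put("t3", new HashSet<>(Arrays.asList("t2"))); // invented for illustration
        System.out.println(recommend(indicators, Arrays.asList("t1"))); // prints [t4]
    }
}
```

Keeping the indicators as an ordinary indexed field is what lets the recommender ride on an existing search deployment, with its sharding and meta-data filtering, instead of needing a separate serving system.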
58. 65©MapR Technologies 2013- Confidential
Objective Results
At a very large credit card company
History is all transactions
Development time to minimal viable product about 4 months
General release 2-3 months later
Search-based recs at or equal in quality to other techniques
59. 66©MapR Technologies 2013- Confidential
Summary
Input: Multiple kinds of behavior on one set of things
Output: Recommendations for one kind of behavior with a
different set of things
Cross recommendation is a special case
61. 68©MapR Technologies 2013- Confidential
Me, Us
Ted Dunning, Chief Application Architect, MapR
Committer and PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG
tdunning@{apache.org,maprtech.com} ted.dunning@gmail.com
MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard APIs
Tonight
Hashtags - #dfwbd #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
Editor's notes
Note to speaker: Move quickly through the first two slides just to set the tone: familiar use cases, but somewhat complicated under-the-covers math and algorithms. You don't need to explain or discuss these examples at this point; just mention one or two.
Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email.
Talk track: Under the covers, machine learning looks very complicated. So how do you get from here to the familiar examples? Tonight's presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.