In this tutorial I presented, using basic Python, the concepts behind building a collaborative filtering recommender system.
Mutirão PyCursos:
Video at: https://plus.google.com/u/0/events/c3hqbk20omt3r5uoq13gpk82i9g
2. Who is Marcel?
Marcel Pinheiro Caraciolo - @marcelcaraciolo
Born in Sergipe, living in Recife.
M.Sc. in Computer Science at CIn/UFPE, in the data mining area
Director of Research and Development at Atépassar
CEO and co-founder of PyCursos/Pingmind
Member and moderator of the Python User Group of Pernambuco (PUG-PE)
My areas of interest: mobile computing and intelligent computing
My blogs: http://www.mobideia.com (on mobility, since 2006)
http://aimotion.blogspot.com (on A.I., since 2009)
8. Intelligence from Mining Data
A user influences other users
through reviews, ratings, recommendations and blogs
A user is influenced by other users
through reviews, ratings, recommendations and blogs
9. [Tag-cloud figure: your application harnesses Collective Intelligence through aggregated information: lists, ratings, user-generated content, reviews, blogs, wikis, recommendations, voting, bookmarking, search, tagging, tag clouds, saving; combined with natural language processing, clustering, predictive models, and harnessing external content.]
10. [Diagram: USERS interacting with WEB SITES, WEB APPLICATIONS, WEB SERVICES and the 3.0 SEMANTIC WEB]
before...
VI Encontro do PUG-PE
20. “A lot of times, people don’t know what
they want until you show it to them.”
Steve Jobs
“We are leaving the Information age, and
entering into the Recommendation age.”
Chris Anderson, author of The Long Tail
21. Social Recommendations
Family/Friends
"What should I read?"
"I think you should read these books."
Ref: Flickr - BlueAlgae; Flickr photostream: jefield
22. Recommendations by Interaction
Input: rate a few books
"What should I read?"
Output:
"Books you may like
are ..."
25. Netflix
- 2/3 of rented movies come from recommendations
Google News
- 38% of the most-clicked news stories come from recommendations
Amazon
- 38% of sales come from recommendations
Source: Celma & Lamere, ISMIR 2007
26. !"#$%"#&'"%(&$)")
Nós+,&-.$/).#&0#/"1.#$%234(".#
* estamos sobrecarregados de
informação
$/)#5(&6 7&.2.#"$4,#)$8
* 93((3&/.#&0#:&'3".;#5&&<.#
$/)#:-.34#2%$4<.#&/(3/"
Milhares de artigos e posts
* =/#>$/&3;#?#@A#+B#4,$//"(.;#
novos todos os dias
2,&-.$/).#&0#7%&6%$:.#
"$4,#)$8
* =/#C"1#D&%<;#."'"%$(#
Milhões de Músicas, Filmes e
2,&-.$/).#&0#$)#:"..$6".#
Livros
."/2#2&#-.#7"%#)$8
Milhares de Ofertas e
Promoções
27. What can be recommended?
Contacts in social networks, articles
Products, advertising messages
E-learning courses, books
Tags, music
Future girlfriends
Clothes, movies
Restaurants
TV shows
Videos, papers
Investment options, professionals
Code modules
29. What do recommender systems actually do?
1. Predict how much you may like a certain product or service
2. Suggest a list of N items ranked according to your interest
3. Suggest a ranked list of N users for a product/service
4. Explain to you why those items were recommended
5. Adjust the prediction and the recommendation based on your feedback and other people's
30. Content-Based Filtering
[Diagram: Marcel likes items such as Die Hard and Armageddon; similar items such as Gone with the Wind and Toy Story are recommended to him]
31. Problems with content-based filtering
1. Restricted data analysis
- Items and users are poorly detailed. Worse for audio or images
2. Specialized data
- A person with no sushi experience never gets the best sushi restaurant in town
3. Portfolio effect
- Just because I watched one Xuxa movie as a child, must it recommend all of them?
32. Collaborative Filtering
[Diagram: similar users Marcel, Rafael and Amanda; items such as Thor, Armageddon, Gone with the Wind and Toy Story; what one user likes is recommended to the similar others]
33. Problems with collaborative filtering
1. Scalability
- Amazon: 5M users, 50K items, 1.4B ratings
2. Sparse data
- New users and items have no history
3. Cold start
- I have rated only a single book on Amazon!
4. Popularity
- Everybody reads 'Harry Potter'
5. Hacking
- Whoever reads 'Harry Potter' also reads the Kama Sutra
34. Hybrid Filtering
A combination of multiple methods
[Diagram: users Marcel, Rafael and Luciana; items such as Die Hard, Armageddon, Gone with the Wind and Toy Story; linked through ontologies and symbolic data]
35. How are they presented?
Highlights / More from this artist...
Someone similar to you also liked this
The most popular in your group...
Since you listened to this one, you may want this one...
New releases / Listen to songs from similar artists
These two items go together...
36. How are they evaluated?
How do we know whether a recommendation is good?
Usually the data is split into training/test sets (80/20)
Criteria used:
- Prediction error: RMSE
- ROC curve*, rank-utility, F-measure
*http://code.google.com/p/pyplotmining/
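The slide names RMSE without defining it; a minimal sketch of computing it over held-out test ratings (the rating values here are made up for illustration):

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(errors) / len(errors))

# predictions for three held-out test ratings vs. the true ratings
print(rmse([3.5, 2.0, 4.0], [4.0, 2.0, 3.0]))  # about 0.645
```

Lower is better; a perfect predictor scores 0.0 on the test split.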
37. How to build a recommender system with Python?
There is one option...
Crab
A Python Framework for Building
Recommendation Engines
https://github.com/python-recsys/crab
38. How to build a recommender system with Python?
There is one option... But it’s still in development!
Crab
A Python Framework for Building
Recommendation Engines
https://github.com/python-recsys/crab
39. But here we will create one from scratch with Python!
Find someone similar to you
[Diagram: similar users Marcel, Rafael and Amanda; items such as Thor, Armageddon, Gone with the Wind and Toy Story; what one user likes is recommended to the similar others]
40. But here we will create one from scratch with Python!
Step: find someone similar to you
Movies Ratings Dataset
41. But here we will create one from scratch with Python!
Step: find someone similar to you
Movies Ratings Dataset
Mr. X rated Snow Crash 4 and
Girl with the Dragon Tattoo 2.
What should we recommend to him?
42. But here we will create one from scratch with Python!
Step: find someone similar to you
43. But here we will create one from scratch with Python!
Step: find someone similar to you
We found that Amy is the most similar among the candidates,
so we can recommend a movie she rated 5 stars :)
44. But here we will create one from scratch with Python!
One more similarity metric: the Euclidean distance
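Written out, with i ranging over the n items both users rated:

```latex
d_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad
d_{\mathrm{Manhattan}}(x, y) = \sum_{i=1}^{n} |x_i - y_i|
```

Both are instances of the Minkowski distance $\left(\sum_i |x_i - y_i|^r\right)^{1/r}$, with r = 2 and r = 1 respectively.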
54. Coding the Manhattan distance
def manhattan(rating1, rating2):
    """Computes the Manhattan distance. Both rating1 and rating2 are
    dictionaries of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    if commonRatings:
        return distance
    else:
        return -1  # indicates no ratings in common
57. Coding the Manhattan distance
>>> manhattan(users['Hailey'], users['Veronica'])
2.0
>>> manhattan(users['Hailey'], users['Jordyn'])
7.5
>>>
59. Coding the Euclidean distance
def euclidean(rating1, rating2):
    """Computes the Euclidean distance.
    Both rating1 and rating2 are dictionaries of the form
    {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0.0
    commonRatings = False
    for key in rating1:
        if key in rating2:
            distance += pow(abs(rating1[key] - rating2[key]), 2.0)
            commonRatings = True
    if commonRatings:
        return pow(distance, 1 / 2.0)
    else:
        return -1  # indicates no ratings in common
62. Coding the Euclidean distance
>>> euclidean(users['Hailey'], users['Veronica'])
1.4142135623730951
64. Find the closest users
def computeNearestNeighbor(username, users, distance=manhattan):
    """Creates a sorted list of users based on their distance to
    username"""
    distances = []
    for user in users:
        if user != username:
            d = distance(users[user], users[username])
            distances.append((d, user))
    # sort based on distance -- closest first
    distances.sort()
    return distances
67. Find the closest users
>>> computeNearestNeighbor('Hailey', users)
[(2.0, 'Veronica'), (4.0, 'Chan'),(4.0, 'Sam'), (4.5, 'Dan'), (5.0,
'Angelica'), (5.5, 'Bill'), (7.5, 'Jordyn')]
>>>
69. The recommender
def recommend(username, users):
    """Give a list of recommendations"""
    # first find the nearest neighbor
    nearest = computeNearestNeighbor(username, users)[0][1]
    recommendations = []
    # now find bands the neighbor rated that the user didn't
    neighborRatings = users[nearest]
    userRatings = users[username]
    for artist in neighborRatings:
        if artist not in userRatings:
            recommendations.append((artist, neighborRatings[artist]))
    # sort by the neighbor's rating, highest first
    recommendations.sort(key=lambda artistTuple: artistTuple[1], reverse=True)
    return recommendations
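Putting the pipeline so far together, a compact self-contained version (the user names and ratings below are made up for illustration):

```python
def manhattan(rating1, rating2):
    """Manhattan distance over the items both users rated."""
    common = [k for k in rating1 if k in rating2]
    if not common:
        return -1  # no ratings in common
    return sum(abs(rating1[k] - rating2[k]) for k in common)

def computeNearestNeighbor(username, users):
    """Sorted (distance, user) list, closest first."""
    distances = [(manhattan(users[u], users[username]), u)
                 for u in users if u != username]
    distances.sort()
    return distances

def recommend(username, users):
    """Items the nearest neighbor rated that `username` has not."""
    nearest = computeNearestNeighbor(username, users)[0][1]
    recs = [(item, r) for item, r in users[nearest].items()
            if item not in users[username]]
    recs.sort(key=lambda t: t[1], reverse=True)
    return recs

# hypothetical ratings, just for illustration
users = {
    'Ana':   {'Thor': 4.0, 'Toy Story': 5.0, 'Armageddon': 1.0},
    'Bruno': {'Thor': 4.5, 'Toy Story': 4.5},
    'Carla': {'Armageddon': 5.0, 'Toy Story': 1.0},
}
print(recommend('Bruno', users))  # Ana is closest; suggests 'Armageddon'
```

Bruno's nearest neighbor is Ana (distance 1.0 vs. Carla's 3.5), so he gets the one movie Ana rated that he has not.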
94. Change people to items
{'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5}}
to
{'Lady in the Water':{'Lisa Rose':2.5,'Gene Seymour':3.0},
'Snakes on a Plane':{'Lisa Rose':3.5,'Gene Seymour':3.5}} etc.
97. Change people to items
def transformPrefs(prefs):
    result = {}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item, {})
            # flip item and person
            result[item][person] = prefs[person][item]
    return result
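The flip shown above can be checked directly, using the same two-critic example:

```python
def transformPrefs(prefs):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    result = {}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item, {})
            # flip item and person
            result[item][person] = prefs[person][item]
    return result

prefs = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
         'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5}}
movies = transformPrefs(prefs)
print(movies['Lady in the Water'])
```

Note the transform is its own inverse: applying it twice returns the original user-centric dictionary.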
98. Change people to items
>>> movies = recommendations.transformPrefs(recommendations.users)
>>> recommendations.computeNearestNeighbors('Blues Traveler', movies)
[(0.657, 'You, Me and Dupree'), (0.487, 'Lady in the Water'), (0.111, 'Snakes on a
Plane'), (-0.179, 'The Night Listener'), (-0.422, 'Just My Luck')]
102. Change people to items
>>> movies = recommendations.transformPrefs(recommendations.critics)
>>> recommendations.computeNearestNeighbors('Superman Returns', movies)
[(0.657, 'You, Me and Dupree'), (0.487, 'Lady in the Water'), (0.111, 'Snakes on a
Plane'), (-0.179, 'The Night Listener'), (-0.422, 'Just My Luck')]
106. Find the closest items
def calculateSimilarItems(prefs, sim_distance=manhattan):
    # Create a dictionary of items showing which other items
    # they are most similar to.
    result = {}
    # Invert the preference matrix to be item-centric
    itemPrefs = transformPrefs(prefs)
    c = 0
    for item in itemPrefs:
        # Status updates for large datasets
        c += 1
        if c % 100 == 0:
            print "%d / %d" % (c, len(itemPrefs))
        # Find the most similar items to this one
        scores = computeNearestNeighbor(item, itemPrefs, distance=sim_distance)
        result[item] = scores
    return result
107. Find the closest items
>>> itemsim=recommendations.calculateSimilarItems(users)
>>> itemsim
{'Lady in the Water': [(0.40000000000000002, 'You, Me and Dupree'), (0.2857142857142857, 'The
Night Listener'),... 'Snakes on a Plane': [(0.22222222222222221, 'Lady in the Water'),
(0.18181818181818182, 'The Night Listener'),... etc.
109. The recommender
def recommend(username, users, similarities, n=3):
    scores = {}
    totalSim = {}
    # get the ratings for this user
    userRatings = users[username]
    # loop over items rated by this user
    for item, rating in userRatings.items():
        # loop over items similar to this one
        for sim, other_item in similarities[item]:
            # ignore if this user has already rated this item
            if other_item in userRatings:
                continue
            # weighted sum of rating times similarity
            scores.setdefault(other_item, 0.0)
            scores[other_item] += sim * rating
            # sum of all the similarities
            totalSim.setdefault(other_item, 0.0)
            totalSim[other_item] += sim
    # divide each total score by the total weighting to get an average
    recommendations = [(score / totalSim[item], item)
                       for item, score in scores.items()]
    # sort by predicted score, highest first
    recommendations.sort(key=lambda scoreTuple: scoreTuple[0], reverse=True)
    # return the first n items
    return recommendations[:n]
111. The recommender
>>> recommend('Hailey', users, similarities, 3)
[(3.1176470588235294, 'Slightly Stoopid'),
(2.64476386036961, 'Blues Traveler'), (2.639207507820647, 'Phoenix')]
112. Content-Based Filtering
[Diagram: Marcel likes items such as Die Hard and Armageddon; similar items such as Gone with the Wind and Toy Story are recommended to him]
113. [Excerpt from our paper, summarized: we propose a hybrid meta-recommender for mobile product/service recommendation. A content-based filter works over the product/service repository, while a collaborative filter derives recommendations from the reviews of similar users; the architecture aggregates the results of both filtering techniques. Text mining distinguishes each review's polarity between positive and negative, and that summary contributes to the product score; the final recommendation score integrates the results of both recommenders. For the integration we are considering the symbolic data analysis (SDA) approach [19], which models the user ratings/reviews of each data source as a set of modal symbolic descriptions; the aggregation approach proposed by Bezerra and Carvalho has shown very promising results [19]. Review and product data (location, description, attributes, ratings, comments, tags) are extracted from web data sources such as the location-based social network Foursquare [17] and Google HotPot [18]. Each recommendation also ships with an explanation built from relevant reviews of similar users, since current approaches deliver only an overall score; we believe this increases confidence in the buying decision and the product acceptance rate. The user can filter products, enter preferences, and give feedback on the offered recommendations. Fig. 1: Meta Recommender Architecture. Fig. 2: User Reviews from the Foursquare Social Network. Fig. 3: Mobile Recommender System Architecture.]
114. Crab is already in production
Brazilian social network called Atepassar.com
Educational network with more than 60,000 students and 120 video classes
Running on Python + NumPy + SciPy and Django
Backend for recommendations: MongoDB - mongoengine
Daily recommendations with explanations
115. Distributing the recommendation computations
Use Hadoop and MapReduce intensively
Investigating Yelp's mrjob framework: https://github.com/Yelp/mrjob
Implementing the Netflix-winning and novel state-of-the-art techniques:
matrix factorization, Singular Value Decomposition (SVD), Boltzmann machines
The most commonly used is the Slope One technique:
simple algebra, y = a*x + b
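Slope One is named above without code; a minimal weighted Slope One sketch, reusing the dict-of-dicts rating format from earlier slides (the data is a made-up toy example):

```python
def slope_one_deviations(data):
    """Average rating deviation dev[i][j] between item pairs, with counts."""
    dev, freq = {}, {}
    for ratings in data.values():
        for i, ri in ratings.items():
            for j, rj in ratings.items():
                if i == j:
                    continue
                dev.setdefault(i, {}).setdefault(j, 0.0)
                freq.setdefault(i, {}).setdefault(j, 0)
                dev[i][j] += ri - rj
                freq[i][j] += 1
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= freq[i][j]  # sum of deviations -> average deviation
    return dev, freq

def slope_one_predict(user_ratings, item, dev, freq):
    """Predict a rating as the frequency-weighted mean of r_j + dev(item, j)."""
    num = den = 0.0
    for j, rj in user_ratings.items():
        if j in dev.get(item, {}):
            num += (rj + dev[item][j]) * freq[item][j]
            den += freq[item][j]
    return num / den if den else None

data = {'u1': {'A': 5.0, 'B': 3.0, 'C': 2.0},
        'u2': {'A': 3.0, 'B': 4.0},
        'u3': {'B': 2.0, 'C': 5.0}}
dev, freq = slope_one_deviations(data)
print(slope_one_predict({'B': 2.0, 'C': 5.0}, 'A', dev, freq))  # about 4.33
```

The per-pair deviation table is what makes Slope One easy to distribute: each MapReduce pass just sums deviations and counts per item pair.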
116. Distributed Computing with mrJob
https://github.com/Yelp/mrjob
http://aimotion.blogspot.com/2012/08/introduction-to-recommendations-with.html
117. Distributed Computing with mrJob
https://github.com/Yelp/mrjob
It supports Amazon’s Elastic MapReduce(EMR) service, your own Hadoop cluster or
local (for testing)
http://aimotion.blogspot.com/2012/08/introduction-to-recommendations-with.html
119. Distributed Computing with mrJob
https://github.com/Yelp/mrjob
"""The classic MapReduce job: count the frequency of words."""
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        yield (word, sum(counts))

if __name__ == '__main__':
    MRWordFreqCount.run()

It supports Amazon's Elastic MapReduce (EMR) service, your own Hadoop cluster, or running locally (for testing)
http://aimotion.blogspot.com/2012/08/introduction-to-recommendations-with.html
120. Future studies with Sparse Matrices
Real datasets come with lots of empty values
http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
Solutions:
scipy.sparse package
Sharding operations
Matrix Factorization
techniques (SVD)
Apontador Reviews Dataset
122. Future studies with Sparse Matrices
Real datasets come with lots of empty values
http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
Solutions:
scipy.sparse package
Sharding operations
Matrix Factorization
techniques (SVD)
Crab implements a Matrix
Factorization with Expectation
Maximization algorithm
scikits.crab.svd package
Apontador Reviews Dataset
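The slides name SVD as one remedy for sparse rating matrices; a minimal NumPy sketch of a low-rank approximation (the rating matrix below is a made-up toy, and real systems such as Crab's factorizer fit only the observed entries rather than treating zeros as ratings):

```python
import numpy as np

# toy user x item rating matrix (0.0 = unrated), values made up
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

# full SVD, then keep only the k strongest singular values
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# R_hat now holds a score for every cell, including the unrated zeros;
# the largest scores among a user's unrated items become candidates
print(np.round(R_hat, 2))
```

Truncating to k factors compresses the matrix into user and item latent-factor vectors, which is what makes prediction cheap even when most entries are empty.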
123. How are we working?
Our Project’s Home Page
http://github.com/python-recsys/crab
124. Future Releases
Planned release 0.1
Collaborative filtering algorithms working, sample datasets to load and test
Planned release 0.11
Sparse matrices and database model support
Planned release 0.12
Slope One algorithm, new factorization techniques implemented
....
125. Join us!
1. Read our wiki page
https://github.com/python-recsys/crab/wiki/Developer-Resources
2. Check out our current sprints and open issues
https://github.com/python-recsys/crab/issues
3. Fork us and send pull requests
4. Join us at irc.freenode.net #muricoca or at our discussion list
http://groups.google.com/group/scikit-crab
130. Recommended Conferences
- ACM RecSys
- ICWSM: Weblogs and Social Media
- WebKDD: Web Knowledge Discovery and Data Mining
- WWW: the original WWW conference
- SIGIR: Information Retrieval
- ACM KDD: Knowledge Discovery and Data Mining
- ICML: Machine Learning