Description
As Germany’s largest online vehicle marketplace, mobile.de uses recommendations at scale to help users find the perfect car. We elaborate on collaborative and content-based filtering as well as a hybrid approach that addresses the problem of a fast-changing inventory. We then dive into the technical implementation of the recommendation engine, outlining the challenges we faced and the lessons we learned.
Abstract
At mobile.de, Germany’s biggest car marketplace, a dedicated team of data engineers and scientists, supported by the IT project house inovex, is responsible for creating intelligent data products. Driven by our company slogan “Find the car that fits your life”, we focus on personalised recommendations to address several user needs. In doing so, we improve the customer experience both while browsing and when finding the perfect offer. In an introduction to recommendation systems, we briefly cover the traditional approaches to recommendation engines and motivate the need for more sophisticated ones. In particular, we explain the different concepts, including collaborative and content-based filtering, hybrid approaches and general matrix factorisation methods. This is followed by a deep dive into the implementation and architecture at mobile.de, which comprises Elasticsearch, Cassandra and Mahout. We explain how Python and Java are used side by side to create and serve recommendations.
By presenting our car-model recommender, which suggests similar car models of different brands, as a concrete use case, we revisit key aspects of modelling and implementation. In particular, we present a matrix factorisation library that we used and share our experiences with it. We conclude with a brief demonstration of our results and discuss the improvements we achieved in terms of key performance indicators. Furthermore, we use this use case to illustrate deep learning for recommendations, comparing it with traditional approaches and thus giving a brief outlook on the future of recommendation engines.
Event: PyData Berlin 2017
Speakers: Dr. Florian Wilhelm (inovex), Dr. Arnab Dutta (mobile.de)
More tech talks: https://www.inovex.de/de/content-pool/vortraege/
Tech-Blog: https://www.inovex.de/blog/
Which car fits my life? Mobile.de’s approach to recommendations
1. Which car fits my life?
Mobile.de’s approach to recommendations
PyData Berlin, July 1st, 2017
Florian Wilhelm, Arnab Dutta
2.
Introduction
Dr. Florian Wilhelm
Data Scientist
inovex GmbH
@FlorianWilhelm
FlorianWilhelm
florianwilhelm.info
Dr. Arnab Dutta
Data Scientist
mobile.de GmbH
@kopfhohen
kraktos
5.
IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
Using technology to inspire our
clients. And ourselves.
inovex offices in
Karlsruhe · Pforzheim ·
München · Köln · Hamburg.
www.inovex.de
11.
Recommendations on Search Results Page
Recommendations based on similar vehicle make and model id to present alternatives
Navigation: Wishlist · Home · SRP · View · Contact · Buy
12.
Recommendations on View Item Page (VIP)
Recommendations based on the specific make and model a user is viewing to present alternatives
13.
Recommendations on your Wishlist
Recommendations based on the specific make and model of a deleted ad to provide almost identical recommendations
Recommendations based on the user's car preferences and the parking lot items
18.
Summary of Collaborative Filtering
✓ Collective behaviour of users
✓ Standard method (it works, it’s reliable, etc.)
✗ Cold-start problem: new listings need a certain number of clicks before they can be recommended.
✗ Sparsity problem: far fewer interaction data points than possible user-item pairs.
✗ Content-agnostic
✗ Only “batch-based” learning
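The points above can be illustrated with a minimal sketch of item-based collaborative filtering. It is not the implementation used at mobile.de; the interaction log, the listing ids and the choice of cosine similarity over user sets are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical interaction log: (user id, listing id) pairs, e.g. from views.
interactions = [
    ("u1", "audi_a4"), ("u1", "bmw_318"),
    ("u2", "audi_a4"), ("u2", "bmw_318"), ("u2", "vw_golf"),
    ("u3", "vw_golf"), ("u3", "audi_a4"),
]


def item_similarities(interactions):
    """Cosine similarity between items, based on the sets of users who touched them."""
    users_by_item = defaultdict(set)
    for user, item in interactions:
        users_by_item[item].add(user)
    sims = defaultdict(dict)
    for a, users_a in users_by_item.items():
        for b, users_b in users_by_item.items():
            if a == b:
                continue
            overlap = len(users_a & users_b)
            if overlap:
                sims[a][b] = overlap / sqrt(len(users_a) * len(users_b))
    return sims


def recommend(item, sims, k=3):
    """Return up to k items most similar to `item`; empty for items without interactions."""
    neighbours = sims.get(item, {})
    return sorted(neighbours, key=neighbours.get, reverse=True)[:k]


sims = item_similarities(interactions)
print(recommend("bmw_318", sims))      # neighbours learned from shared users
print(recommend("new_listing", sims))  # cold start: no clicks, no neighbours
```

The similarities depend only on who interacted with what (content-agnostic), must be recomputed from the full log (batch-based), and yield nothing for a listing without clicks (cold start).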
19.
Content-based Filtering: User Preferences
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Preferences
Marketing
Anonymous
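As a rough sketch of how such a preference profile could be matched against listings, the snippet below scores each listing by summing the categorical preference weights and a Gaussian-shaped closeness for price and mileage. The scoring formula, the field names and the sample listings are assumptions for illustration; the slides do not specify how the profile is turned into a score.

```python
from math import exp

# Profile values taken from the slide; the dict layout is an assumption.
profile = {
    "make": {"BMW": 0.5, "Audi": 0.5},
    "model": {"Audi A3": 0.25, "Audi A4": 0.25, "BMW 318": 0.5},
    "price": (20000, 1500),    # preferred value, tolerance in EUR
    "mileage": (10000, 5000),  # preferred value, tolerance in km
}


def closeness(value, preferred, tolerance):
    """Gaussian-shaped closeness in (0, 1]; 1.0 means an exact match."""
    return exp(-0.5 * ((value - preferred) / tolerance) ** 2)


def score(listing, profile):
    """Sum of categorical preference weights and numeric closeness terms."""
    return (
        profile["make"].get(listing["make"], 0.0)
        + profile["model"].get(listing["model"], 0.0)
        + closeness(listing["price"], *profile["price"])
        + closeness(listing["mileage"], *profile["mileage"])
    )


# Made-up listings, not real inventory.
listings = [
    {"id": 1, "make": "BMW", "model": "BMW 318", "price": 20500, "mileage": 12000},
    {"id": 2, "make": "Audi", "model": "Audi A4", "price": 26000, "mileage": 9000},
    {"id": 3, "make": "Opel", "model": "Opel Astra", "price": 20000, "mileage": 10000},
]
ranked = sorted(listings, key=lambda l: score(l, profile), reverse=True)
print([l["id"] for l in ranked])
```

Because the score uses only the user's own profile and the listing attributes, it works without any other users, which is the main strength of the content-based approach.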
21.
Summary of Content-based Filtering
✓ Works even if there are no other users
✓ Content-based preferences of users based on a weighted vector of item features
✗ Hard to make recommendations for new users (cold-start problem)
✗ Not applicable to heterogeneous content types
✗ Low diversity, i.e. more of the same
23.
Find the car that perfectly fits your life
User’s Car Preferences Car Pool + Attributes
(make, model, color, price, …)
Flexible
(cold-start, uncertainty, real-time, ...)
Interactions of other users
(views, parkings, contacts)
26.
Hybrid Recommender Concept
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Profile
Buyer
Last Action: Yesterday
Frequent User
User 12345
Likelihood to buy: 88 %
Elasticsearch Query
Score: 0.8, 0.3, 1.7, 3.2, 1.1, 0.9, …
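To make the step from user profile to Elasticsearch query more concrete, here is a hedged sketch that builds a query body as a plain Python dict. The actual query used at mobile.de is not shown in the slides; the index field names (`make`, `model`, `price`, `mileage`, `location`), the use of `function_score` with Gaussian decay functions, and the helper `build_query` are assumptions for illustration.

```python
import json


def build_query(profile, size=10):
    """Translate a preference profile into an Elasticsearch query body (a dict)."""
    lat, lon = profile["location"]
    price, price_tol = profile["price"]
    mileage, mileage_tol = profile["mileage"]
    # Preferred makes and models become boosted optional clauses.
    should = [
        {"term": {"make": {"value": make, "boost": weight}}}
        for make, weight in profile["make"].items()
    ] + [
        {"term": {"model": {"value": model, "boost": weight}}}
        for model, weight in profile["model"].items()
    ]
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        # Hard constraint: only listings within the search radius.
                        "filter": [
                            {"geo_distance": {
                                "distance": profile["radius"],
                                "location": {"lat": lat, "lon": lon},
                            }}
                        ],
                        "should": should,
                    }
                },
                # Soft constraints: score decays with distance from the preferred value.
                "functions": [
                    {"gauss": {"price": {"origin": price, "scale": price_tol}}},
                    {"gauss": {"mileage": {"origin": mileage, "scale": mileage_tol}}},
                ],
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        },
    }


# Profile values taken from the slide; the dict layout is an assumption.
profile = {
    "make": {"BMW": 0.5, "Audi": 0.5},
    "model": {"Audi A3": 0.25, "Audi A4": 0.25, "BMW 318": 0.5},
    "location": (52.5206, 13.409),
    "radius": "300km",
    "price": (20000, 1500),
    "mileage": (10000, 5000),
}
query = build_query(profile)
print(json.dumps(query, indent=2))
```

The resulting dict could be sent as the request body of a search call. Elasticsearch then returns listings ordered by score, so new inventory becomes recommendable as soon as it is indexed.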
28.
Finding similar make/models
• Users are often uncertain about their choices during the orientation phase
• Help users explore similar models to make informed decisions
• Exploit the collaborative aspect in defining the concept of similarity
29.
Make/Model Recommender with LightFM
LightFM:
• Mature and well-documented Python package
• Optimized and parallelized with Cython
• Hybrid recommender based on matrix factorisation
• Supports learning-to-rank objectives (BPR, WARP)
Figure: similar vehicles (Audi A4, BMW 645)
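To give an idea of what a learning-to-rank objective such as BPR does, the following is a minimal pure-Python sketch of matrix factorisation trained with the BPR update rule. It is not LightFM's code and does not use its API; the function `train_bpr`, the hyperparameters and the toy interaction data are assumptions for illustration.

```python
import random
from math import exp


def train_bpr(positives, n_users, n_items, dim=4, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Learn user factors P and item factors Q so that positives outrank negatives."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    pairs = [(u, i) for u, items in positives.items() for i in sorted(items)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for u, i in pairs:
            # Sample a negative item j the user has not interacted with.
            j = rng.randrange(n_items)
            while j in positives[u]:
                j = rng.randrange(n_items)
            x = sum(pu * (qi - qj) for pu, qi, qj in zip(P[u], Q[i], Q[j]))
            g = 1.0 / (1.0 + exp(x))  # derivative of ln(sigmoid(x)) with respect to x
            for f in range(dim):
                pu, qi, qj = P[u][f], Q[i][f], Q[j][f]
                P[u][f] += lr * (g * (qi - qj) - reg * pu)
                Q[i][f] += lr * (g * pu - reg * qi)
                Q[j][f] += lr * (-g * pu - reg * qj)
    return P, Q


def score(p, q):
    """Predicted preference: dot product of user and item factors."""
    return sum(a * b for a, b in zip(p, q))


# Toy data: users 0 and 1 interacted with items 0 and 1, user 2 with items 2 and 3.
positives = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}}
P, Q = train_bpr(positives, n_users=3, n_items=4)
print(score(P[0], Q[0]) > score(P[0], Q[2]))
```

The item factors `Q` learned this way play the role of the item embeddings from which similar make/models can be derived.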
31.
NMF: Embeddings to Similarities
• Each car is represented as an item embedding
• Given two embeddings, LF_item_i and LF_item_j
• Compute sim(LF_item_i, LF_item_j)
• Find pairwise values for all item pairs (|I| × |I|)
Figure: item embeddings plotted over the latent factors LF1 and LF2
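The steps above can be sketched as follows. The slide does not say which similarity function is used, so cosine similarity is an assumption here, and the two-dimensional embedding values are made up for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two embedding vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(embeddings):
    """Pairwise similarities for all item pairs (|I| x |I|), as nested dicts."""
    names = list(embeddings)
    return {a: {b: cosine(embeddings[a], embeddings[b]) for b in names} for a in names}


def most_similar(name, sims, k=2):
    """The k items most similar to `name`, excluding the item itself."""
    others = {b: s for b, s in sims[name].items() if b != name}
    return sorted(others, key=others.get, reverse=True)[:k]


# Hypothetical item embeddings over two latent factors (LF1, LF2).
embeddings = {
    "Audi A4": [0.9, 0.2],
    "BMW 318": [0.8, 0.3],
    "VW Golf": [0.1, 0.9],
}
sims = similarity_matrix(embeddings)
print(most_similar("Audi A4", sims, k=1))
```

Since the number of make/model combinations is small compared to the number of listings, the full |I| × |I| matrix stays manageable and can be precomputed.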
35.
Deep Learning
Recent Breakthroughs in Deep Learning
• Dermatologist-level classification of skin cancer with deep neural networks (Nature 542, 115-118, February 2017)
• DeepStack: Expert-level artificial intelligence in heads-up no-limit poker (Science, March 2017)
Reasons for Deep Learning
• captures nonlinear relations
• holistic approach, i.e. possibly reduces the number of components
• less feature engineering
• possibly improved quality
36.
Approach: Wide and Deep Model
Figure: wide and deep model. Inputs: user features (mileage, price, color, history, ...) and item features (mileage, price, color, views, ...), each split into continuous (cont.) and categorical (cat., one-hot) values and encoded into user embeddings and item embeddings. Components: Deep Component, Wide Component (cross-item). Output: probability that user X likes vehicle Y.
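A minimal forward pass through a wide and deep model might look like the sketch below. It uses randomly initialised, untrained weights, so the resulting probability is meaningless; the layer sizes, the feature choice and the interpretation of the wide component as a cross-product of one-hot features are assumptions, not details taken from the talk.

```python
import random
from math import exp

rng = random.Random(0)
COLORS = ["black", "red", "blue"]
EMB_DIM, HIDDEN, N_CONT = 4, 8, 2


def rand_vec(n):
    """Random weights standing in for trained parameters."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]


def dense_relu(x, weights, bias):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


# Hypothetical parameters: embeddings, one hidden layer, output weights.
user_emb = {"user_x": rand_vec(EMB_DIM)}
item_emb = {"vehicle_y": rand_vec(EMB_DIM)}
W1 = [rand_vec(2 * EMB_DIM + N_CONT) for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w_deep = rand_vec(HIDDEN)
w_wide = rand_vec(len(COLORS) ** 2)


def predict(user, item, item_cont, user_color, item_color):
    """Probability that `user` likes `item` (untrained, for illustration only)."""
    # Deep component: embeddings plus scaled continuous features through a hidden layer.
    deep_in = user_emb[user] + item_emb[item] + item_cont
    deep_logit = sum(w * h for w, h in zip(w_deep, dense_relu(deep_in, W1, b1)))
    # Wide component: linear model on the cross-product of two one-hot features.
    cross = [u * i for u in one_hot(user_color, COLORS) for i in one_hot(item_color, COLORS)]
    wide_logit = sum(w * c for w, c in zip(w_wide, cross))
    return 1.0 / (1.0 + exp(-(deep_logit + wide_logit)))


# item_cont: mileage and price, assumed to be scaled to [0, 1] beforehand.
p = predict("user_x", "vehicle_y", [0.38, 0.25], user_color="red", item_color="red")
print(p)
```

In a trained model, both components would be optimised jointly against observed interactions, so the wide part can memorise specific feature combinations while the deep part generalises through the embeddings.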