The ACM RecSys Challenge 2016 focused on the problem of job recommendation: given a user, return a ranked list of jobs that the user is likely to be interested in. More than 100 teams actively participated and submitted solutions. All the winning teams used an ensemble of recommendation strategies (e.g., learning-to-rank approaches and matrix factorization techniques). More details: http://2016.recsyschallenge.com/
6. RecSys Challenge
Given a user, the goal is to predict the job postings that the user will interact with.
[Figure: a user with 2 months of impressions & interactions (clicks, bookmarks) on job postings such as "Scala Dev, Hamburg" and "Scala Engineer"; which job postings will the user interact with?]
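To make the input and output of the task concrete, here is a minimal Python sketch of a non-personalized baseline. It is a hypothetical illustration, not one of the participants' methods: it assumes the training interactions are available as `(user_id, item_id)` pairs, ranks the allowed candidate items by how often they were interacted with, and returns the same top-k list (k = 30, the deepest cut-off used in the evaluation) for every target user. The function name `popularity_baseline` is made up for this example.

```python
from collections import Counter
from typing import Iterable


def popularity_baseline(
    interactions: Iterable[tuple[str, str]],
    target_users: list[str],
    candidate_items: set[str],
    k: int = 30,
) -> dict[str, list[str]]:
    """Hypothetical non-personalized baseline: most-interacted candidate items first."""
    # Count training interactions per item; (user_id, item_id) pairs are assumed.
    counts = Counter(item for _user, item in interactions)
    # Only candidate items may be recommended; ties are broken by item ID
    # so that the output is deterministic.
    ranked = sorted(candidate_items, key=lambda item: (-counts[item], item))
    # Every target user receives the same top-k list.
    return {user: ranked[:k] for user in target_users}
```

For example, with interactions `[("u1", "j1"), ("u2", "j1"), ("u2", "j2")]`, target user `"u3"`, candidates `{"j1", "j2", "j3"}` and `k=2`, the result is `{"u3": ["j1", "j2"]}`. A competitive entry would replace the global count with a per-user score, but the shape of the output (one ranked list of candidate items per target user) stays the same.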
7. Datasets
1. Training data:
• User demographics (job title, discipline, industry, career level, # CV entries, country, region) [1M]
• Job postings (title, discipline, industry, career level, country, region) [1M]
• Interactions (user_id, item_id, interaction_type, timestamp) [10M, 2 months]
• Impressions (user_id, item_id, week) [30M, 2 months]
2. Task files:
• Users (= User IDs for whom recommendations should be computed) [150k]
• Candidate items (= item IDs that are allowed to be recommended) [300k]
3. Solution (secret)
• Interactions (user_id, item_id) [1M, 1 week]
Anonymization (strings → IDs; users and interactions are enriched with artificial noise)
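The record types listed on this slide can be written down as plain data classes. In this minimal Python sketch the field names mirror the slide, while the Python types (string IDs, an integer timestamp, string labels for the interaction types) and the `history_by_user` helper are assumptions for illustration, not the challenge's actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Literal

# Field names follow the slide; the Python types are assumptions
# (after anonymization, most attributes are opaque string IDs).
InteractionType = Literal["click", "bookmark", "reply", "delete"]


@dataclass(frozen=True)
class User:
    user_id: str
    jobtitle: str
    discipline: str
    industry: str
    career_level: str
    n_cv_entries: int
    country: str
    region: str


@dataclass(frozen=True)
class JobPosting:
    item_id: str
    title: str
    discipline: str
    industry: str
    career_level: str
    country: str
    region: str


@dataclass(frozen=True)
class Interaction:
    user_id: str
    item_id: str
    interaction_type: InteractionType
    timestamp: int  # assumed: seconds since the epoch


@dataclass(frozen=True)
class Impression:
    user_id: str
    item_id: str
    week: int


def history_by_user(interactions: list[Interaction]) -> dict[str, list[str]]:
    """Hypothetical helper: interacted item IDs per user, oldest first."""
    history: dict[str, list[str]] = defaultdict(list)
    for event in sorted(interactions, key=lambda e: e.timestamp):
        history[event.user_id].append(event.item_id)
    return dict(history)
```

Freezing the classes keeps records hashable, so they can be deduplicated with a `set`, which may help given the artificial noise added to the interactions.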
8. Interaction Data
Includes interactions that did not originate from recommendations
[Figure: number of users/items that performed/received X interactions vs. number of interactions (log-log scale), for items (train), users (train), items (test) and users (test)]
[Figure: interaction types: clicks (81%), replies, bookmarks and deletes; the remaining slices are 5%, 2% and 12%]
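Both charts on this slide summarize simple counts over the interaction log. A minimal Python sketch of how such statistics could be computed, assuming hypothetical `(user_id, item_id, interaction_type)` tuples with the type labels `"click"`, `"reply"`, `"bookmark"` and `"delete"`:

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical record layout: (user_id, item_id, interaction_type).
Event = tuple[str, str, str]


def count_distribution(events: Iterable[Event], key: Callable[[Event], str]) -> Counter:
    """Map X -> number of users (or items) with exactly X interactions."""
    per_entity = Counter(key(event) for event in events)
    return Counter(per_entity.values())


def type_shares(events: Iterable[Event]) -> dict[str, float]:
    """Share of each interaction type among all interactions."""
    counts = Counter(event[2] for event in events)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}
```

Passing `key=lambda e: e[0]` gives the per-user curve and `key=lambda e: e[1]` the per-item curve; plotting the resulting counts on log-log axes yields a chart like the one above.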
9. Evaluation Measure
Mixture of…
- Precision@k (k = 2, 4, 6, 20) = fraction of the top k recommended items that are relevant
- Recall@30 = fraction of the user's relevant items that appear in the top 30
- Success@30 = probability that at least one relevant item was recommended in the top 30
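The slide names the components of the measure but not how they are mixed. Below is a minimal Python sketch of the per-user metrics. The mixture weights (20 for P@2, P@4, Recall@30 and Success@30; 10 for P@6 and P@20) are an assumption meant to mirror the challenge's published evaluation script; they are not stated on this slide, so they are exposed as parameters. The sketch also assumes each recommendation list contains no duplicate items and divides precision by k even when fewer than k items are recommended.

```python
def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant (denominator is k)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: list[str], relevant: set[str], k: int = 30) -> float:
    """Fraction of the user's relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def success_at_k(recommended: list[str], relevant: set[str], k: int = 30) -> float:
    """1.0 if at least one relevant item is in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0


def user_score(
    recommended: list[str],
    relevant: set[str],
    w_main: float = 20.0,  # assumed weight
    w_aux: float = 10.0,  # assumed weight
) -> float:
    """Weighted mixture of the metrics for a single user."""
    main = (
        precision_at_k(recommended, relevant, 2)
        + precision_at_k(recommended, relevant, 4)
        + recall_at_k(recommended, relevant, 30)
        + success_at_k(recommended, relevant, 30)
    )
    aux = precision_at_k(recommended, relevant, 6) + precision_at_k(recommended, relevant, 20)
    return w_main * main + w_aux * aux
```

Worked example: for `recommended = ["a", "b", "c", "d"]` and `relevant = {"b", "x"}`, P@2 = 0.5, P@4 = 0.25, Recall@30 = 0.5 and Success@30 = 1, so the score is 20 · 2.25 + 10 · (1/6 + 1/20) ≈ 47.17. Averaging Success@30 over users gives the "probability" on the slide; a leaderboard score would presumably aggregate `user_score` over all target users.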
10. Who participated?
• 119 teams participated (366 teams registered)
• Countries:
USA (25%)
Germany (11%)
China (9%)
France (7%)
Hungary (4%)
• Type of organization:
academia (∼25%)
industry (∼75%)
most common industry: Internet & IT
larger companies such as Yandex, Alibaba, Microsoft and Amazon, as well as start-ups
11. Top score over time
[Figure: number of submissions during week X and top score (full) at the end of week X, over weeks 0 to 20]
12. Number of submissions per team
[Figure: number of submissions per team, with teams ordered by rank]
14. Outlook for 2017
• Current plan:
Domain: again job recommendations
Additional perspectives:
Is the user a good candidate for the job?
Novelty (recommending new jobs)
New users (recommending jobs to new users)
Additional features (e.g. clicks from recruiters on profiles)
Additional tooling:
Proper API for submitting solutions
Advanced baseline implementations (building on this year’s solutions)
• Goal: offline + online (!!) evaluation
• More details: panel discussion in the afternoon
15. Thank you to the PC!
• Alejandro Bellogín, Universidad Autónoma de Madrid, Spain
• Paolo Cremonesi, Politecnico di Milano, Italy
• Simon Dooms, Trackuity, Belgium
• Balázs Hidasi, Gravity R&D, Hungary
• Levente Kocsis, Hungarian Academy of Sciences, Hungary
• Andreas Lommatzsch, TU Berlin, Germany
• Katja Niemann, XING AG, Germany
• Alan Said, University of Skövde, Sweden
• Yue Shi, Yahoo Labs, USA
• Marko Tkalčič, Free University of Bozen-Bolzano, Italy