2. Problem
● Million Song Dataset Challenge (Kaggle)
○ 110k Users, 1m+ unique songs
● Music Recommendation
○ Recommend songs for each user based on a larger training set of user listening histories
● Winner - 0.17910 (17.9%)
● Benchmark - 0.02079 (2.1%)
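The slide does not name the metric behind these scores. The Kaggle challenge ranked submissions by mean average precision truncated at 500 recommendations per user, so the sketch below assumes that is what the numbers represent. The function names and the toy data are hypothetical and only illustrate how such a score could be computed.

```python
from typing import Dict, List, Set


def average_precision_at_k(recommended: List[str], relevant: Set[str], k: int = 500) -> float:
    """Truncated average precision for one user (assumed metric, k=500 by default)."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, song in enumerate(recommended[:k], start=1):
        # Count each relevant song once, even if it is recommended twice.
        if song in relevant and song not in seen:
            seen.add(song)
            hits += 1
            score += hits / rank  # precision at this rank
    # Normalise by the best achievable number of hits within the top k.
    return score / min(len(relevant), k)


def mean_average_precision(
    recommendations: Dict[str, List[str]],
    held_out: Dict[str, Set[str]],
    k: int = 500,
) -> float:
    """Average the per-user scores over every user in the held-out set."""
    if not held_out:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(user, []), songs, k)
        for user, songs in held_out.items()
    )
    return total / len(held_out)


# Hypothetical toy data: two users, one with two held-out songs, one with a miss.
toy_recommendations = {"u1": ["a", "b", "c"], "u2": ["x"]}
toy_held_out = {"u1": {"a", "c"}, "u2": {"y"}}
toy_score = mean_average_precision(toy_recommendations, toy_held_out)
```

Under this definition a score of 0.17910 means that, averaged over users, relevant songs appear fairly early in the ranked list, while 0.02079 means they appear rarely or far down it.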
3. Data
● Million Song Dataset
● Two subsets of 1000 users (random and most active)
● Echo Nest API to get metadata
5. Previous Approaches
Dynamic K-Means:
● Kim et al. (6th Int’l Conference on ML)
● Li et al. (University of Michigan)
Item and user-based collaborative-filtering:
● Niu et al. (Stanford)
● Lu et al. (Stanford)
12. What are the Results?
● All Metadata: 0.00200326282427
● Weighted Centroids: 0.00375567272976
● Multiple Centroids (2): 0.00364834470835
● Modified Metadata: 0.00994279218087
● All Improvements: 0.01008282844
● More Data: 0.00266295400221
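The slides list the centroid variants without describing them. As a rough illustration of what a "weighted centroid" recommender could look like, the sketch below represents a user as the play-count-weighted mean of their songs' metadata vectors and recommends the nearest unheard songs. Everything here is an assumption: the function names, the two-dimensional feature vectors, the Euclidean distance, and the use of play counts as weights are not taken from the slides.

```python
import math
from typing import Dict, List


def weighted_centroid(play_counts: Dict[str, int], features: Dict[str, List[float]]) -> List[float]:
    """Play-count-weighted mean of the feature vectors of a user's songs (assumed weighting)."""
    total = sum(play_counts.values())
    dim = len(next(iter(features.values())))
    centroid = [0.0] * dim
    for song, count in play_counts.items():
        for i, value in enumerate(features[song]):
            centroid[i] += count * value / total
    return centroid


def recommend_nearest(play_counts: Dict[str, int], features: Dict[str, List[float]], n: int = 10) -> List[str]:
    """Rank songs the user has not played by distance to the user's centroid."""
    centroid = weighted_centroid(play_counts, features)
    candidates = [song for song in features if song not in play_counts]
    # Ties are broken by song id so the ordering is deterministic.
    candidates.sort(key=lambda song: (math.dist(centroid, features[song]), song))
    return candidates[:n]


# Hypothetical metadata vectors (for example, scaled tempo and loudness).
toy_features = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.8, 0.0], "d": [0.0, 5.0]}
toy_plays = {"a": 1, "b": 3}
toy_picks = recommend_nearest(toy_plays, toy_features, n=2)
```

A "multiple centroids" variant could, under the same assumptions, cluster a user's songs into two groups and rank candidates by distance to the nearer of the two centroids.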
20. What are the Results?
● User Collaborative Filtering (1k Users): 0.008223545412
● User Collaborative Filtering (10k Users): 0.012654713312
● User Collaborative Filtering (110k Users): 0.112794360446
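For readers unfamiliar with the technique, user-based collaborative filtering can be sketched as follows: find users whose listening histories overlap with the target user's, then score unheard songs by the summed similarity of the neighbours who played them. This is a minimal sketch, not the implementation behind the numbers above; the cosine similarity on binary listening sets, the function names, and the toy data are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def cosine_similarity(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two users treated as binary song vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(target: str, histories: Dict[str, Set[str]], n: int = 10) -> List[str]:
    """Rank songs the target user has not heard by similarity-weighted votes."""
    listened = histories[target]
    scores: Dict[str, float] = defaultdict(float)
    for other, songs in histories.items():
        if other == target:
            continue
        sim = cosine_similarity(listened, songs)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for song in songs - listened:
            scores[song] += sim
    # Highest score first; ties broken by song id for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, _ in ranked[:n]]


# Hypothetical listening histories.
toy_histories = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
    "u4": {"e"},
}
toy_recommendations = recommend("u1", toy_histories)
```

In this toy example `u2` shares both of `u1`'s songs and so outweighs `u3`, which is why `c` ranks ahead of `d`, and `u4` has no overlap and contributes nothing.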