WSDM Challenge
Recommender Systems
Kenneth Emeka Odoh
25 Jan, 2018 ( Kaggle Meetup | SFU Ventures Labs )
Vancouver, BC
Talk Plan
● Bio
● Competition
● Winning Solutions
● My solution
● Conclusions
● References
Bio
Education
● Master's in Computer Science (University of Regina), 2013 - 2016
Work
● Research assistant in the field of Visual Analytics, 2013 - 2016
○ ( https://scholar.google.com/citations?user=3G2ncgwAAAAJ&hl=en )
● Software engineer at an IoT startup (current position), 2017 -
Competition
● Competition rule
○ Only 5 submissions per day
● Data Exploration
○ Data visualization
● Evaluation
● Primer on recommender systems
Nature of Data
● Data on users' listening habits.
● Data are chronologically ordered.
○ Implicit time-series data without timestamps.
○ Timestamps are excluded to avoid data leakage.
● Balanced data set.
What to predict
● Given a dataset of users' listening habits.
● If a song has been listened to in the recent past, predict whether it will be listened to again within the same month.
● Output 1 if the song is listened to again by the same user within the same month, 0 otherwise.
Competition: Data Exploration
Competition: Background
● URL ( https://www.kaggle.com/c/kkbox-music-recommendation-challenge )
● Sister competition on churn prediction by the same company.
● Total prize: $5,000
Class of Problem
● Classification
● Sequence prediction?
Competition: Data files
● train.csv
○ msno, song_id, source_system_tab, source_screen_name, source_type,
target
● songs.csv
○ song_id, song_length, genre_ids, artist_name, composer, lyricist,
language
● members.csv
○ msno, city, bd, gender, registered_via, registration_init_time,
expiration_date
● song_extra_info.csv
○ song_id, song name, isrc
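As a quick illustration (not part of the original slides), the files above can be joined into a single modelling table with pandas; the join keys msno and song_id come from the column lists above.

import pandas as pd

train = pd.read_csv("train.csv")
songs = pd.read_csv("songs.csv")
members = pd.read_csv("members.csv")
song_extra = pd.read_csv("song_extra_info.csv")

# left-join song, member, and extra song metadata onto each (msno, song_id) row
df = (train
      .merge(songs, on="song_id", how="left")
      .merge(members, on="msno", how="left")
      .merge(song_extra, on="song_id", how="left"))
print(df.shape)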
Competition: Visualization of Data
Figure 1: Count vs Source type [jeru666]
Figure 2: Count vs Source type grouped by labels [jeru666]
Figure 3: Source screen names vs Count [jeru666]
Figure 4: Count vs Source screen names grouped by label [jeru666]
Figure 5: Count vs Source system tab [jeru666]
Figure 6: Proportion by Sex [jeru666]
Figure 7: Proportion by Sex grouped by label [jeru666]
Figure 8: Proportion by Language [jeru666]
Figure 9: City vs Count [jeru666]
Competition: Evaluation (AUC)
Image credit: Walber & http://gim.unmc.edu/dxtests/roc2.htm
Optimizing in PR space also optimizes in ROC space [Davis et al., 2006]
Evaluation Metric
● AUC per user vs AUC per group?
● What about GAUC?
○ Calculate the average of the AUC scores of each user (a minimal sketch follows below).
■ Cold-start problem
● AUC vs F-score vs RMSE?
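A minimal sketch of the per-user averaged AUC (GAUC) idea, assuming a dataframe with msno, target, and a model score column; users with only one class have an undefined AUC and are skipped, which is one face of the cold-start problem noted above.

import numpy as np
from sklearn.metrics import roc_auc_score

def gauc(df, user_col="msno", label_col="target", score_col="pred"):
    per_user = []
    for _, grp in df.groupby(user_col):
        if grp[label_col].nunique() < 2:
            continue  # AUC is undefined when a user has only one class
        per_user.append(roc_auc_score(grp[label_col], grp[score_col]))
    return float(np.mean(per_user)) if per_user else float("nan")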
Competition: Recommender Systems
U: set of users
I: set of items
R: set of ratings of users on items
I_u: set of items rated by a user u
f: U × I → R
The rating matrix is usually sparse, so some form of data imputation is needed.
The most common forms of recommender systems:
● Best-item recommendation
● Top-N recommendation
For a user u_a, recommend the item i* with the highest estimated value:
i* = argmax f(u_a, j), over j ∈ I \ I_{u_a}
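A minimal sketch of best-item and top-N recommendation from an estimated scoring function f(u, i), here a dense user-by-item score matrix; the candidate set excludes items the user has already rated (I \ I_u). This only illustrates the definition above, not a specific library.

import numpy as np

def top_n(scores, rated_by_user, user, n=1):
    # scores: (num_users, num_items) array of estimated f(u, i)
    # rated_by_user: dict mapping user index -> set of already-rated item indices
    rated = rated_by_user.get(user, set())
    candidates = [j for j in range(scores.shape[1]) if j not in rated]
    ranked = sorted(candidates, key=lambda j: scores[user, j], reverse=True)
    return ranked[:n]  # n=1 returns the best item i*, larger n gives top-N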
WINNING SOLUTIONS
1st Place Solution
● Similar to conversion rate (CVR) prediction.
○ Click-through rate (CTR) is the ratio of users who click on a specific link to the number of total users who accessed the resource.
● Relistening is like purchasing, and first listening is like clicking.
● Latent-factor or neighborhood-based collaborative filtering is not the best fit for this problem.
● Missing not at random.
● Data is sparse, so each user-song pair is very informative.
● Data is time sensitive.
Ensembling
0.6 * LightGBM + 0.4 * NN
● A single LightGBM scores 0.74440 on the public LB with a learning rate of 0.1.
● A single LightGBM scores 0.74460 on the public LB with a learning rate of 0.05.
● A single LightGBM with bagging scores 0.74584 on the public LB with a learning rate of 0.05.
For NNs, my best 30-ensemble on the same feature set can get 0.74215, and a bagging ensemble of NNs on a different feature set can get 0.74333.
The predictions of LightGBMs and NNs are quite uncorrelated (correlation coefficient is about 0.92). The ensemble of a single LightGBM and a single 30-ensemble of NNs can get 0.7489+. The ensemble of ensembles can reach 0.7498+.
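A minimal sketch of the 0.6 / 0.4 blend described above, assuming aligned arrays of test-set probabilities from the LightGBM ensemble and the NN ensemble (the variable and file names are placeholders, not from the winning code).

import numpy as np

lgb_pred = np.load("lightgbm_test_pred.npy")  # hypothetical saved predictions
nn_pred = np.load("nn_test_pred.npy")
blended = 0.6 * lgb_pred + 0.4 * nn_pred      # weighted average of the two model families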
Data preprocessing
https://kaggle2.blob.core.windows.net/forum-message-attachments/259372/8131/model.png
Analyzing Missing Data
Missing-data mechanisms
● Missing completely at random: the probability of missingness is the same for all cases.
● Missing at random: the probability of missingness depends only on available information.
● Missingness that depends on unobserved predictors.
● Missingness that depends on the missing value itself.
Handling missing data (a minimal sketch follows the reference below)
● Exclude missing data
● Mean / median imputation
● Last value carried forward
● Using information from related observations
[http://www.stat.columbia.edu/~gelman/arm/missing.pdf]
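Minimal pandas sketches of the handling strategies listed above; the column choices are illustrative only, not taken from the competition pipeline.

import pandas as pd

members = pd.read_csv("members.csv")

dropped = members.dropna(subset=["gender"])                    # exclude missing data
members["bd"] = members["bd"].fillna(members["bd"].median())   # mean / median imputation
members["gender"] = members["gender"].ffill()                  # last value carried forward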
3rd Place Solution
The key idea was to generate features from the future: whether the same user listens (or does not listen) to the same artist later on.
Matrix factorization outputs were added as features for xgboost and catboost.
Last 35% of the data for future features; earlier 65% for history.
The size of X_train, X_val was about 20% of the data.
The history features are generated using the target; the other features use all of the data. The generated features are built in the context of categorical features (a sketch of feature (1) follows after this section):
1) mean of target from history by categorical features (pairs and triples)
2) count – the number of observations in all data by categorical features (pairs and triples)
3) regression – linear regression of the target by categorical features (msno, msno + genre_ids, msno + composer, and so on)
4) time_from_prev_heard – time since last heard by categorical features (msno, msno + genre_ids, msno + composer, and so on)
5) time_to_next_heard – time to next heard by categorical features (pairs and triples which include msno)
6) last_time_diff – time to the last heard by categorical features
7) part_of_unique_song – the share of the artist's unique songs heard by the user, out of all unique songs of that artist
8) matrix_factorization – LightFM. I use msno in different contexts (source_type) as the user, and msno itself was a feature for the user.
I use the last 35% for fitting. I also drop the last 5% and again use 35%. Such a procedure could be repeated many times and greatly improved the score (I did it 6 times on validation and this gave an increase of 0.005, but in the end I did not have time to do everything and did it only 2 times on test).
As a result, I fit xgboost and catboost on two parts of the data, with and without the matrix_factorization feature, and finally blend all 8 predictions.
The code is available here:
https://github.com/VasiliyRubtsov/wsdm_music_recommendations/blob/master/pipeline.ipynb
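A minimal sketch of feature (1), the mean of the target from history by categorical features, assuming the chronologically earlier 65% of train is held as history and the remainder as future (the function and variable names are mine, not the author's).

import pandas as pd

# history, future: chronological split of the training dataframe (earlier 65% / later 35%)
def add_history_target_mean(history, future, cols, target="target"):
    name = "_".join(cols) + "_target_mean"
    stats = history.groupby(cols, as_index=False)[target].mean().rename(columns={target: name})
    return future.merge(stats, on=cols, how="left")

# e.g. pairs and triples of categorical columns
future = add_history_target_mean(history, future, ["msno", "genre_ids"])
future = add_history_target_mean(history, future, ["msno", "artist_name", "source_type"])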
Factorization Machine
● Models feature interactions by factoring them into a number of latent vectors.
● Lends itself to kernels.
A deeper dive into factorization is left for another thread.
https://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf
Observed characteristics of factorization machines
● Ideal for handling missing data
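A minimal numpy sketch (mine, not from the linked slides) of the factorization machine score: a linear term plus pairwise interactions factored through per-feature latent vectors, computed with the usual O(kn) identity.

import numpy as np

def fm_score(x, w0, w, V):
    # x: (n,) feature vector; w0: bias; w: (n,) linear weights; V: (n, k) latent vectors
    linear = w0 + w @ x
    s = V.T @ x                    # per-factor sum of v_i * x_i
    s_sq = (V ** 2).T @ (x ** 2)   # per-factor sum of v_i^2 * x_i^2
    pairwise = 0.5 * np.sum(s ** 2 - s_sq)
    return linear + pairwise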
6th Place Solution
I use an ensemble (average) of LightGBM models. The best single model scores 0.726 on the public LB.
Feature Engineering
1) raw features for song, user, and context
2) SVD matrix-factorization latent space of the user-song interaction matrix (sketched after this slide)
3) SVD matrix-factorization score of the user-song interaction matrix
4) collaborative filtering (CF) score (top 100 neighborhoods) of user-song interactions
5) similarity of artist, lyricist, composer, genre, language, source_system_tab, source_type, source_screen_name within each user
6) source_system_tab, source_type, source_screen_name categorical variable rate for each user
7) word embedding of artist, lyricist, composer, genre
I use 10-fold cross-validation, where the train / validation sets are split randomly.
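A minimal sketch of features (2)-(3): SVD latent factors and scores of the user-song interaction matrix, built with scipy/sklearn; the toy index arrays stand in for integer-encoded msno and song_id (the actual 6th-place code is not shown here).

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

user_idx = np.array([0, 0, 1, 2, 2])   # toy integer-encoded msno
song_idx = np.array([0, 1, 1, 2, 0])   # toy integer-encoded song_id
mat = csr_matrix((np.ones(len(user_idx)), (user_idx, song_idx)))  # users x songs

svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(mat)   # latent space per user
song_factors = svd.components_.T        # latent space per song
svd_score = (user_factors[user_idx] * song_factors[song_idx]).sum(axis=1)  # per-pair score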
SVD
[Figure: PCA and SVD illustration, credit https://intoli.com/blog/pca-and-svd/]
Word Embeddings
[Figure credit: https://deeplearning4j.org/word2vec.html]
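A minimal sketch of feature (7), word embeddings of artist names learned from per-user listening sequences with gensim's word2vec (assuming gensim 4.x; column names follow train.csv and songs.csv above, and this is illustrative rather than the 6th-place code).

import pandas as pd
from gensim.models import Word2Vec

train = pd.read_csv("train.csv").merge(pd.read_csv("songs.csv"), on="song_id", how="left")
sentences = train.groupby("msno")["artist_name"].apply(lambda s: s.dropna().tolist()).tolist()

model = Word2Vec(sentences=sentences, vector_size=32, window=5, min_count=1, workers=4)
# model.wv["<artist name>"] gives a 32-dimensional embedding usable as a feature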
My Solution
● Primer on LSTM architectures.
○ Bidirectional LSTM.
○ Batch normalization.
● Intuition / guide for building a reasonable neural architecture.
● Tricks to avoid overfitting.
● Optimizing the loss function using AUC.
● Cross-validation of time-series data.
● Discuss failures in my traditional ML models.
○ Try the same model with different custom loss functions.
LSTM
● An LSTM is an RNN trained using BPTT; its gating mitigates the vanishing gradient problem.
● Captures long-term temporal dependencies.
● A memory block is used in place of a neuron. There are three types of gates in a memory block:
○ Forget gate: decides what information to remove from the unit.
○ Input gate: decides what information to update in the memory state.
○ Output gate: decides what information to output based on the input and the memory state.
Goodfellow, et al. Deep Learning, 2016
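For reference, the standard memory-block updates behind the gates listed above (textbook formulation, not taken from the slide):

\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}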
[Figures: Fei-Fei et al., 2017]
LSTM Architecture
[Figure: LSTM architecture diagram, Google images]
See the tutorial on the subject [https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/]
Many-to-many model (boundary detection problem)
[Fei-Fei et al., 2017]
Bidirectional LSTM
● Bidirectional LSTMs can be trained using past and future data of a time frame.
● Bidirectional LSTMs can improve performance on a number of sequence classification problems.
● A bidirectional LSTM is two LSTMs, where the first LSTM accepts the input as-is and the second accepts the input in reverse order.
[Yu, Zhou et al.]
Batch Normalization
● Normalization inside the neural architecture.
● Reduces internal covariate shift, thereby improving training speed.
● Controls gradient flow in the network, thereby minimizing saturation in non-ReLU activations.
● Aligns the distributions between the training and testing sets, thereby enhancing generalization.
● Implicit regularization.
[Ioffe et al., 2015]
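For reference, the batch-normalization transform from Ioffe & Szegedy (2015): each activation is standardized over the mini-batch and then rescaled by learned parameters:

\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \quad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad
y_i = \gamma \hat{x}_i + \beta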
Overfitting
● Implicit Regularization
○ Batch normalization
○ Early stopping [https://en.wikipedia.org/wiki/Early_stopping]
● Dropout
[Srivastava et. al., 2014]
Cross-Validation of Time-Series
● Don't trust the leaderboard.
● Trust local CV for model comparison; a minimal chronological-split sketch follows below.
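A minimal sketch of chronological validation consistent with the advice above: split by position in the time-ordered training data instead of randomly; sklearn's TimeSeriesSplit gives expanding-window folds (the data below is a placeholder).

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(1000).reshape(-1, 1)        # placeholder features, already in time order
y = np.random.randint(0, 2, size=1000)    # placeholder binary target

for fold, (train_idx, val_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    # fit on the past (train_idx), validate on the future (val_idx);
    # compare models on these local scores rather than on the public leaderboard
    print(fold, train_idx[-1], val_idx[0], val_idx[-1])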
My Solution: Code
# Sketch of my bidirectional-LSTM model. Hyperparameters such as inputDim,
# nb_filter, filter_length, and pool_length are defined elsewhere in the full code
# (see the repository link below); the layer arguments follow the Keras 1.x API.
# Assumed imports (not shown on the original slide):
# from keras.models import Sequential
# from keras.layers import (Embedding, Convolution1D, MaxPooling1D, LSTM, Bidirectional,
#                           Dense, Activation, Dropout, BatchNormalization, LeakyReLU)
def model_relu6():
    model = Sequential()
    # embedding layer; input vector dimension is inputDim
    model.add(Embedding(inputDim*inputDim*inputDim, inputDim, dropout=0.2))
    # 1D convolution + pooling to shorten the sequence before the LSTM stack
    model.add(Convolution1D(nb_filter=nb_filter, filter_length=filter_length,
                            border_mode='valid', activation='relu', subsample_length=1))
    model.add(MaxPooling1D(pool_length=pool_length))
    # stacked bidirectional LSTMs, each followed by activation, batch norm, and dropout
    model.add(Bidirectional(LSTM(1024, return_sequences=True)))
    model.add(LeakyReLU())
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(2048, return_sequences=True)))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(512, return_sequences=True)))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(256)))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    # single sigmoid output for the binary target, monitored with custom AUC metrics
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer='adam', metrics=[jacek_auc, discussion41015_auc])
    return model
Code: https://github.com/kenluck2001/KaggleKenneth/tree/master/wsdm_Kaggle
#full code here ( https://www.kaggle.com/rspadim/gini-keras-callback-earlystopping-validation )
# FROM https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/41108
# Assumed imports (not shown on the slide): numpy, TensorFlow 1.x, and the Keras backend.
# import numpy as np
# import tensorflow as tf
# from keras import backend as K
def jacek_auc(y_true, y_pred):
    # streaming AUC metric from TensorFlow 1.x
    score, up_opt = tf.metrics.auc(y_true, y_pred)
    #score, up_opt = tf.contrib.metrics.streaming_auc(y_pred, y_true)
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([up_opt]):
        score = tf.identity(score)
    return score

# FROM https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/41015
# AUC for a binary classifier, approximated by integrating TPR over FPR bins
def discussion41015_auc(y_true, y_pred):
    ptas = tf.stack([binary_PTA(y_true, y_pred, k) for k in np.linspace(0, 1, 1000)], axis=0)
    pfas = tf.stack([binary_PFA(y_true, y_pred, k) for k in np.linspace(0, 1, 1000)], axis=0)
    pfas = tf.concat([tf.ones((1,)), pfas], axis=0)
    binSizes = -(pfas[1:] - pfas[:-1])
    s = ptas * binSizes
    return K.sum(s, axis=0)

# PFA, probability of false alert for a binary classifier (false positive rate)
def binary_PFA(y_true, y_pred, threshold=K.variable(value=0.5)):
    y_pred = K.cast(y_pred >= threshold, 'float32')
    # N = total number of negative labels
    N = K.sum(1 - y_true)
    # FP = total number of false alerts, alerts from the negative class labels
    FP = K.sum(y_pred - y_pred * y_true)
    return FP / N

# P_TA, probability of true alerts for a binary classifier (true positive rate)
def binary_PTA(y_true, y_pred, threshold=K.variable(value=0.5)):
    y_pred = K.cast(y_pred >= threshold, 'float32')
    # P = total number of positive labels
    P = K.sum(y_true)
    # TP = total number of correct alerts, alerts from the positive class labels
    TP = K.sum(y_pred * y_true)
    return TP / P
Private leaderboard score: 0.59
Alternatives
● Trees (CatBoost, LightGBM)
Others that could work with careful effort:
● HMM
● MARS
● ANFIS
● Logistic regression
● Naive Bayes
Conclusions
● Building models versus building infrastructure: a trade-off.
● Debugging a failing model versus trying out a new model.
● Feature engineering versus a better algorithm.
● Neural depth versus training size.
● "Ensembling works when the conditions are right."
○ Provide empirical justification.
● The place of magic numbers.
● "Leakages may not actually be a bad thing."
Thanks for listening
@kenluck2001
https://www.linkedin.com/in/kenluck2001/
kenneth.odoh@gmail.com
References
Yu, Zhou et al. “Using bidirectional lstm recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of
non-native spontaneous speech.” ASRU (2015).
Ioffe, S. and Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd
International Conference on Machine Learning, in PMLR 37:448-456 (2015).
Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor. 2010. Recommender Systems Handbook (1st ed.). Springer-Verlag New York, Inc.,
New York, NY, USA.
Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning. MIT Press.
Jeru666, Visualization script, https://www.kaggle.com/jeru666/data-exploration-and-analysis .
Davis, J. & Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. ICML '06: Proceedings of the 23rd international
conference on Machine learning 233--240, New York, NY, USA
Fei-Fei Li & Justin Johnson & Serena Yeung, (2017), http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929-1958.