For data-savvy users (analysts, scientists, ops, engineers) who want to discover some nonparametric machine learning algorithms that might help while competing on Kaggle or, more down to earth, when there is not much time to spend on a predictive analytics project. Talk given at the Paris Kaggle meetup.
Surface features with nonparametric machine learning
1. Surface features with nonparametric ML
How such algos might help (or not)
Alexis Bondu, Sylvain Ferrandiz
2. Capturing patterns via model selection
and hyperparameter tuning is … autoML
(Figure: decision boundaries of a logistic regression, an MLP with 4 neurons, and a decision tree on the XOR problem)
3. XOR
(Figure: X–Y scatter plot of the XOR pattern, Z = (XY > 0))
Or one can capture patterns via data
prep’ and feature engineering …
With MODL nonparametric ML
(why not?)
4. How? MODL* in a nutshell
*M. Boullé. Data grid models for preparation and modeling in supervised learning.
In Hands-On Pattern Recognition: Challenges in Machine Learning, volume 1,
I. Guyon, G. Cawley, G. Dror, A. Saffari (eds.), pp. 99-130, Microtome Publishing, 2011.
5. 1. The MODL
framework
For users to explore a new world!
1. Nonparametric set of ‘models’
2. Nonparametric regularized criterion
3. Nonparametric algorithm to find the best model
(with respect to the criterion)
6. Let’s see how
1. MODL: the discretization algo
2. Non informative features detection
3. Drift detection
4. Model calibration
5. Data recoding
6. Supervised bivariate analysis
7. Co-clustering
8. Multi-table
9. Sequential rules extraction
You cannot win a Kaggle competition
by pushing a button, can you?
Either way, nonparametric ML can help
(Kaggle competition or business project)
7. 1. The MODL
framework
Example: the discretization criterion
Choice of the number of intervals
Choice of the bound positions
Description of the class value distribution in each interval
Likelihood of the data given the model
https://lnkd.in/dteteWA
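The four ingredients listed above map onto the terms of the MODL discretization criterion. As a sketch following Boullé (2011), up to notation: for N instances, J classes and I intervals, with n_i instances in interval i of which n_ij belong to class j, the coding length to minimize is

```latex
\underbrace{\log N}_{\text{number of intervals}}
+ \underbrace{\log \binom{N + I - 1}{I - 1}}_{\text{bound positions}}
+ \underbrace{\sum_{i=1}^{I} \log \binom{n_i + J - 1}{J - 1}}_{\text{class distributions}}
+ \underbrace{\sum_{i=1}^{I} \log \frac{n_i!}{n_{i1}!\,\cdots\,n_{iJ}!}}_{\text{likelihood}}
```

The first three terms are the prior (the « Prior » of the next slide) and the last one is the likelihood of the data given the model.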
8. 1. The MODL
framework
Example: the discretization criterion
Optimizing only the « Prior » → a single interval
Optimizing only the « Likelihood » → one interval for each « pure » zone
https://lnkd.in/dteteWA
9. 1. The MODL
approach
The example of supervised discretization:
the variable « Age » of the dataset « Adult »
MODL avoids over-fitting, with no user parameters to adjust!
10. 2. Non informative
features detection
Why do we want to select variables?

Var 1 | Var 2 | … | Class
O     | 12    | … | A
Y     | 98    | … | B
Y     | 4     | … | A

1 – Scaling of the learning algorithms (N rows, K variables)
2 – Accuracy of the learned models (curse of dimensionality)
How to filter uninformative variables before training a model?
• in a robust way (depending on N)
• without assumptions
(Figure: x–y scatter plot — independence?)
11. 2. Non informative
features detection
Supervised discretization can be used as a non-parametric test
• Method: if the most probable model contains only a single interval / group,
the variable can be eliminated!
• Advantage: MODL is a universal approximator of P(y|x), thus it is able to
detect any kind of correlation.
Dataset: 30 000 rows,
969 numerical variables + 2141 categorical variables
After filtering:
39 numerical variables + 49 categorical variables
Logistic Regression
- default parameters
- AUC: +0.06
- Computing time: ×1500 faster
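A hedged sketch of this filter, simplified to compare only the null (one-interval) model against the best single-cut model, whereas real MODL searches over all interval counts; the criterion follows the MODL coding length up to notation:

```python
import math
from collections import Counter

def log_binom(n, k):
    """log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def coding_length(intervals, J, N):
    """MODL-style coding length of a discretization.

    intervals: list of per-interval class-count mappings;
    J: number of classes; N: total number of instances.
    """
    I = len(intervals)
    length = math.log(N) + log_binom(N + I - 1, I - 1)  # prior: I and bounds
    for counts in intervals:
        n_i = sum(counts.values())
        length += log_binom(n_i + J - 1, J - 1)  # class-distribution prior
        length += math.lgamma(n_i + 1) - sum(
            math.lgamma(c + 1) for c in counts.values())  # likelihood
    return length

def is_informative(x, y):
    """Keep the variable only if some 2-interval model beats the null model."""
    N = len(x)
    J = len(set(y))
    pairs = sorted(zip(x, y))
    null = coding_length([Counter(y)], J, N)
    best = null
    left, right = Counter(), Counter(y)
    for i in range(N - 1):
        left[pairs[i][1]] += 1
        right[pairs[i][1]] -= 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # only cut between distinct values
        best = min(best, coding_length([left, +right], J, N))
    return best < null
```

A perfectly separating variable is kept, while a variable whose class alternates independently of its value is rejected, with no user-tuned threshold.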
12. 3. Drift detection
What is drift? Train a model, deploy it … and the scoring data no longer follows the training distribution.
How to detect it?
Method: detection of univariate « drift » — discretize each variable against the class values « Train » (0…0) and « Deploy » (1…1)
• The MODL discretization is a universal approximator
=> No hypothesis on the shape of the « drift »
13. 3. Drift detection
Definition: comparison between the coding length of the current model and that of the simplest model, which includes a single interval:
GC = 1 − (−log P(M | D)) / (−log P(M0 | D))
Output: variables sorted by drift level
Reliable measurement (compression gain)
A real-life example …
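An illustrative sketch of this compression gain, with the same simplification as before (only single-cut models are searched, and the coding length follows the MODL-style criterion up to notation): train rows are labeled 0, deploy rows 1, and a gain above 0 flags drift on that variable.

```python
import math
from collections import Counter

def log_binom(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def coding_length(intervals, N, J=2):
    """MODL-style coding length for J classes (here Train vs Deploy)."""
    I = len(intervals)
    length = math.log(N) + log_binom(N + I - 1, I - 1)
    for counts in intervals:
        n = sum(counts.values())
        length += log_binom(n + J - 1, J - 1)
        length += math.lgamma(n + 1) - sum(
            math.lgamma(c + 1) for c in counts.values())
    return length

def drift_gain(train_values, deploy_values):
    """GC = 1 - L(best model) / L(null model); > 0 means drift detected."""
    data = sorted([(v, 0) for v in train_values] +
                  [(v, 1) for v in deploy_values])
    N = len(data)
    labels = [lbl for _, lbl in data]
    null = coding_length([Counter(labels)], N)
    best = null
    left, right = Counter(), Counter(labels)
    for i in range(N - 1):
        left[data[i][1]] += 1
        right[data[i][1]] -= 1
        if data[i][0] == data[i + 1][0]:
            continue  # only cut between distinct values
        best = min(best, coding_length([left, +right], N))
    return 1.0 - best / null
```

A shifted deployment distribution yields a strictly positive gain, while identical distributions yield a gain of exactly zero (no split compresses better than the null model).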
14. 4. Model
calibration
Some classifiers distort the estimated output probabilities …
(Figure: shape of the output P(y=1|var1, var2) and P(y=1|X) for a logistic regression on the Adult dataset)
How to solve this problem in a robust way, without assumptions?
15. 4. Model
calibration
Supervised discretization is suitable for this problem: the estimated P(y=1|X) (the output of the model) becomes the single input of a new training set

P(y=1|X) | y
0.967    | 1
0.865    | 1
0.765    | 0
0.75     | 1

Robustness: the number and the size of the intervals depend on N
Accuracy: the calibrated distribution is not necessarily monotonous
(universal approximator)
Logistic regression on the « Adult » dataset: in this case, the AUC improves by +0.09
16. 5. Data recoding
Most ML algorithms process only numerical variables …
fit: estimate P(danger | color) on the training set (100% / 30% / 0% depending on the color)
transform: replace each color value by its estimated probability (1.0 / 0.3 / 0.0 / …)
Advantages
• Encodes categorical variables into numerical ones, regardless of the number of levels
• Limited number of recoded variables (number of classes − 1)
• Gain of robustness
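A sketch of this fit/transform recoding (the color names below are illustrative placeholders, not from the deck): each category is replaced by the empirical P(class | category) estimated on the training set.

```python
from collections import defaultdict

def fit_encoder(categories, labels):
    """Estimate P(class=1 | category) from the training set."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cat, y in zip(categories, labels):
        totals[cat] += 1
        positives[cat] += y
    return {cat: positives[cat] / totals[cat] for cat in totals}

def transform(categories, encoder, default=0.0):
    # Unseen categories fall back to a default value
    # (in practice, a global positive rate).
    return [encoder.get(cat, default) for cat in categories]
```

For example, fitting on four observations and transforming a mix of seen and unseen colors:

```python
enc = fit_encoder(["red", "red", "orange", "green"], [1, 1, 0, 0])
transform(["red", "green", "blue"], enc, default=0.5)  # → [1.0, 0.0, 0.5]
```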
27. 9. Sequential rule
extraction
Abstract
Whole genome RNA expression studies permit systematic approaches to understanding
the correlation between gene_expression profiles to disease states or different
developmental stages of a cell. Microarray analysis provides quantitative_information
about the complete transcription profile of cells that facilitate drug and therapeutics
development, disease_diagnosis, and understanding in the basic cell biology. One of the
challenges in microarray analysis, especially in cancerous gene_expression profiles, is to
identify genes or groups of genes that are highly expressed in tumour_cells but not in
normal cells and vice versa. Previously, we have shown that ensemble machine_learning
consistently performs well in classifying biological data. In this paper, we focus on three
different supervised machine_learning techniques in cancer classification, namely C4.5
decision_tree, and bagged and boosted decision_trees.
Example: categorization of texts
Two classes of scientific articles: medicine, machine learning
< classifying, data > → P(ML) = 95%, P(medicine) = 5%
28. 9. Sequential rule
extraction
Robustness of the compression gain, illustrated on the dataset « skater »
(compared against confidence and growth rate)
MODL compression gain:
GC = 1 − (−log P(M | D)) / (−log P(M0 | D))
Recall: the compression gain compares the coding length of the current model with that of the null model M0, which does not include any element in the rule.
29. 9. Sequential rule
extraction
Recoding the rules & training of a classifier
The ensemble of informative rules (compression gain > 0) is kept; each observation is then recoded as a binary vector, one column per rule:

Rules:        A B C D E F G
Observations: 0 0 1 0 1 0 0
              1 1 0 0 0 1 0
              0 1 0 0 0 0 1
              1 1 0 0 0 0 0
              0 1 0 1 0 0 0

A classifier is then trained on this binary recoding.
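A minimal sketch of the binary recoding step, in which word presence stands in for true sequential matching (each rule is modeled as a set of words that must all appear, in the spirit of the examples on the next slide):

```python
def recode(documents, rules):
    """One 0/1 column per rule: does the document match the rule?"""
    matrix = []
    for doc in documents:
        words = set(doc.lower().split())
        matrix.append([1 if rule <= words else 0 for rule in rules])
    return matrix
```

For instance, with two rules and two short documents:

```python
rules = [{"free"}, {"waste", "money"}]
recode(["FREE entry now", "dont waste your money"], rules)  # → [[1, 0], [0, 1]]
```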
30. 9. Sequential rule
extraction
Examples of extracted rules
Amazon reviews : sentiment analysis
- No preprocessing
- 2 classes
- AUC = 0.911 with 500 rules
• « I + highly + recommend »
• « dont + waste + your + money »
• « This + is + a + great »
SMS :
- No preprocessing
- 2 classes (spam / non spam)
- AUC = 0.96 with 50 rules
• « FREE »
• « URGENT!»
• « $1000 »
Reuters e-mails:
- No preprocessing
- 10 classes
- 4 sequential variables (organization / place / object / body)
- AUC = 0.975 with 1000 rules
• « the + acquisition + of »
• « crude + oil »
• « trade + surplus »
31. Takeaways
Non informative features detection
Drift detection
Model calibration
Data recoding
Supervised bivariate analysis
Co-clustering
Multi-table
Sequential rules extraction
… Yours soon?
Nonparametric ML can help
(Kaggle competition or business project)
Complementary to autoML
32. Want to use it?
What is Edge ML?
- A new kind of AutoML library
- Optimized using C++ and OpenMP
- Easy to use (simple command lines)
- Integrated with Python
Who can use Edge ML?
- Data scientists, to secure and accelerate their projects!
- Everyone, by using the automatic mode
How to get more information?
- www.edge-ml.fr
- www.marc-boulle.com (MODL approach)
Edge ML is free for competitors, students and professors. Enjoy!
33. Want to use it?
Happy users: clickers and coders — analysts, experts, scientists, devops
Infrastructure: on-premise or as-a-service
14-day free trial!
https://predicsis.ai/free-trial/
Speaker notes:
- Try Criteo
- Multiple uses: recommendation and sparse matrices (EGC tutorial)
- Variable construction: an opening toward the generation of aggregates