Let us talk about output features!
Florence d’Alché-Buc
Joint work with Céline Brouard (Aalto U.), Juho Rousu (Aalto U.), Alexandre
Garcia, Slim Essid, Chloé Clavel, Moussab Djerrab
LTCI, Télécom ParisTech
This work has been partially funded by the Télécom ParisTech Chair Machine
Learning for Big Data
Introduction
Supervised learning in a nutshell
Supervised Learning helps us to answer questions of the following form:
• Given some input object x ∈ X, provide a prediction $\hat{y}$ of some
output object y ∈ Y associated with x
using
• a training set $S_n = \{(x_i, y_i),\ i = 1, \dots, n\}$
• a class of functions H
• a loss $\ell(y, y')$ that tells how much $y$ and $y'$ differ
• a complexity measure Ω of a function h ∈ H
and a learning algorithm A able to build a predictive model $h_n$ from $S_n$
and $\ell$.
Assumption: the new data point x is assumed to be drawn from the same
distribution as the sample $S_n$.
Input Features, Input representation
Once you are in the right space, it is easier to decide. Basically, choosing
the features strongly influences the choice of models.
Input Features, Input representation
The choice of input representation to describe input objects is key to the
success of Machine Learning Algorithms:
• Feature space in tree-based methods
• Choice of kernels in kernel methods: say how you want to compare
two objects with $k(x, x')$
• Representation learning in deep learning: provide the raw data to
the deep neural networks, the first layers learn appropriate
representations to be handled by the next layers
Question: what about output features?
• Can the choice of appropriate output features help with the prediction
task at hand?
• In other words: is it worthwhile to modify the output space to get an
easier problem?
• What is the price of coming back to the original problem?
We will give a short overview of the use of output features in machine
learning problems.
Output features for structured
output prediction
First Example: input output kernel regression for Metabolite
identification
• x is a mass spectrum
• y is a metabolite signature (a binary vector with presence and
absence of substructure)
Difficult problem to solve in chemoinformatics. Requires in silico
approaches.
Output features for structured output prediction
Choose φ : Y → F and solve the following simpler problem:
1. Learning problem: $h_n = \arg\min_h \sum_{i=1}^{n} L(\phi(y_i), h(x_i)) + \Omega(h)$
2. Predictive model: $f_n(x) = \arg\min_{y \in Y} L(\phi(y), h_n(x))$
Question: how do we choose L, and how do we choose the output feature map φ?
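The two steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: L is taken to be the squared Euclidean distance, h is a linear model fitted by ridge regression, Y is a small finite candidate set, and the explicit feature map `phi`, the generating matrix `A` and the toy data are all hypothetical.

```python
import itertools
import numpy as np

def phi(y):
    """Explicit output feature map (assumed): a binary tuple becomes a float vector."""
    return np.asarray(y, dtype=float)

def fit_ridge(X, Phi, lam=1e-2):
    """Step 1: minimise sum_i ||phi(y_i) - x_i W||^2 + lam * ||W||_F^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Phi)

def decode(x, W, candidates):
    """Step 2: return the candidate y whose phi(y) is closest to h_n(x) = x W."""
    h = x @ W
    dists = [np.sum((phi(y) - h) ** 2) for y in candidates]
    return candidates[int(np.argmin(dists))]

# Toy data: outputs are 3-bit tuples, inputs are noisy linear images of phi(y).
rng = np.random.default_rng(0)
candidates = list(itertools.product([0, 1], repeat=3))
A = np.hstack([np.eye(3), 0.5 * np.ones((3, 2))])  # hypothetical generating map
train_y = [candidates[i] for i in rng.integers(0, 8, size=200)]
Phi = np.array([phi(y) for y in train_y])
X = Phi @ A + 0.01 * rng.standard_normal((200, 5))

W = fit_ridge(X, Phi)
# Decode the noise-free input of every candidate structure.
predictions = [decode(phi(y) @ A, W, candidates) for y in candidates]
```

The price of coming back to the original problem shows up in `decode`: it enumerates the candidate set, which is only feasible when Y is small or when the arg min has exploitable structure.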
Implicit Output features for structured output prediction
Use the kernel trick in the output space
Output Kernel Regression (Geurts et al. 2006, 2007; Brouard et al. 2011,
2016)
Implicit Output feature for structured output prediction
Use the kernel trick in the output space: example, metabolite
identification
• $L(\phi(y), h(x)) = \|\phi(y) - h(x)\|^2$
• φ is not explicitly defined, but a kernel k on Y is defined (Gaussian
kernel on finite-dimensional fingerprints of the molecules)
• g is the decoding function: if k is normalized, we have:
$f_n(x) = g(h_n(x)) = \arg\min_{y \in Y} \|\phi(y) - h_n(x)\|^2$
• When h is chosen as a kernel-based model with an input kernel and
an output kernel, everything goes nicely: closed-form solution!
Brouard et al. JMLR, 2016.
Brouard et al. Bioinformatics, 2016.
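To make the closed form concrete, here is a minimal sketch under simplifying assumptions that are not spelled out on the slide: a scalar Gaussian input kernel (an identity-decomposable operator-valued kernel), a ridge penalty `lam`, and a finite candidate set. In that setting $h_n(x) = \sum_i \alpha_i(x)\,\phi(y_i)$ with $\alpha(x) = (K_X + \lambda I)^{-1} k_X(x)$, and for a normalized output kernel the arg min of $\|\phi(y) - h_n(x)\|^2$ reduces to the arg max of $\sum_i \alpha_i(x)\, k(y, y_i)$, so φ is never computed. The function names and the toy data are hypothetical.

```python
import itertools
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Gram matrix k(a, b) = exp(-gamma * ||a - b||^2); normalized since k(a, a) = 1."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def iokr_fit(X_train, lam=1e-3, gamma_x=1.0):
    """Closed-form training: precompute M = (K_X + lam * I)^{-1}."""
    K_x = gaussian_kernel(X_train, X_train, gamma_x)
    return np.linalg.inv(K_x + lam * np.eye(len(X_train)))

def iokr_predict(X_test, X_train, Y_train, Y_cand, M, gamma_x=1.0, gamma_y=1.0):
    """Decode with output-kernel values only: argmax_y sum_i alpha_i(x) k(y, y_i)."""
    alpha = M @ gaussian_kernel(X_train, X_test, gamma_x)   # shape (n_train, n_test)
    scores = gaussian_kernel(Y_cand, Y_train, gamma_y) @ alpha  # (n_cand, n_test)
    return Y_cand[np.argmax(scores, axis=0)]

# Toy stand-in for spectra (X) and binary fingerprints (Y); purely synthetic.
rng = np.random.default_rng(0)
Y_cand = np.array(list(itertools.product([0.0, 1.0], repeat=3)))
mix = np.hstack([np.eye(3), 0.5 * np.ones((3, 2))])  # hypothetical generating map
Y_train = Y_cand[rng.integers(0, 8, size=100)]
X_train = Y_train @ mix + 0.01 * rng.standard_normal((100, 5))

M = iokr_fit(X_train)
pred = iokr_predict(Y_cand @ mix, X_train, Y_train, Y_cand, M)
```

Training costs one n × n linear solve, and prediction needs only kernel evaluations between the candidates and the training outputs.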
Overview of the problem
• Setup: we want to predict the labels of a known target graph
structure (encoded by a directed graph).
"x = TripAdvisor review" ⇒ "y = sentence-level opinion annotations"
"The room was ok, nothing special, still a perfect choice to
quickly join the main places."
Overview of the problem
• Additional difficulty: we want to be able to abstain on some nodes
of the graph while continuing to make predictions.
"x = TripAdvisor review" ⇒ "y = sentence-level opinion annotations"
"The room was ok, nothing special, still a perfect choice to
quickly join the main places."
Output Features (for hierarchical) structure labeling with abstention
We seek a prediction function h and a reject function r.
• Learning deals with: $h_n = \arg\min_h \sum_i \|\psi_{wa}(y_i) - h(x_i)\|^2 + \Omega(h)$
• Abstention is handled at the very last moment:
$(f_n(x), r_n(x)) = \arg\min_{(y_f, y_r) \in Y^{F,R}} \langle h_n(x), C\psi_a(y_f, y_r) \rangle$
$\psi_{wa}$ and $\psi_a$, with the help of C, take into account the tree structure
Garcia et al., ICML 2018.
Output features for few and
zero-shot learning
Third example: zero-shot learning
Multiclass classification? A simple question, really?
A human being is able to recognise an object (an animal) in an image
even though he/she has never seen an instance of it before.
The classic setting of supervised learning does not address this issue: the
relevant task is not just about recognising a class index, it is about
recognising the concept underlying a class.
Realistic Scenario for Multiclass classification
• You know the set of possible classes Y
• Your training dataset
• contains a handful of instances for each class: few-shot
learning, the so-called small-data regime
• contains at least one instance per class: one-shot learning
• does not contain instances of some classes, $Y = Y_{seen} \cup Y_{unseen}$:
zero-shot learning
See Xian et al. 2018 (a review in IEEE Trans. PAMI)
Few-, one- and zero-shot learning in image recognition
Use a semantic encoding $z = \phi(y) \in \mathbb{R}^d$ of class $y \in Y$ such that two
close classes have close representations.
Major tool: take the name of an object class and encode it as a
semantic vector in a finite-dimensional space with word2vec or GloVe
1. Predictive model: $f_w(x) = \arg\max_{y \in Y} S(x, \phi(y), w)$
2. Learning problem: $\min_w \sum_{i=1}^{n} \ell(\phi(y_i), f_w(x_i)) + \Omega(w)$
Question: how to improve φ? Learn a good encoding of the output
data.
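As an illustration of the predictive model, the sketch below assumes a bilinear compatibility $S(x, \phi(y), W) = x^\top W \phi(y)$ and fits W by plain ridge regression from inputs to class embeddings. This is a simple stand-in chosen for brevity, not one of the published zero-shot methods; the hand-made unit vectors replace word2vec or GloVe embeddings, and the matrix `A` and all data are synthetic.

```python
import numpy as np

def fit_compatibility(X, Z, lam=1e-3):
    """Ridge fit of W so that x W approximates the embedding phi(y) of x's class."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

def predict(X, W, class_emb):
    """f_W(x) = argmax_y S(x, phi(y), W), with the assumed S = x^T W phi(y)."""
    scores = X @ W @ class_emb.T
    return np.argmax(scores, axis=1)

# Hypothetical unit-norm class embeddings: 6 seen classes, 2 unseen classes.
s = 1 / np.sqrt(2)
seen_emb = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                     [s, s, 0], [0, s, s], [s, 0, s]], dtype=float)
unseen_emb = np.array([[1, 1, 1], [1, -1, 0]], dtype=float)
unseen_emb /= np.linalg.norm(unseen_emb, axis=1, keepdims=True)

# Synthetic inputs: a noisy linear image of the class embedding.
rng = np.random.default_rng(0)
A = np.hstack([np.eye(3), 0.5 * np.ones((3, 5))])
y_train = np.repeat(np.arange(6), 20)
X_train = seen_emb[y_train] @ A + 0.01 * rng.standard_normal((120, 8))
W = fit_compatibility(X_train, seen_emb[y_train])

# Zero-shot test: no training example of these two classes was ever seen.
y_test = np.repeat(np.arange(2), 20)
X_test = unseen_emb[y_test] @ A + 0.01 * rng.standard_normal((40, 8))
accuracy = np.mean(predict(X_test, W, unseen_emb) == y_test)
```

Unseen classes can be predicted only because they share the embedding space with the seen ones, which is why the quality of φ matters so much.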
A relevant output embedding: the Fisher score!
A plug-in for any method that uses a semantic encoding z = φ(y):
$\psi(z) = \nabla_\theta \log p_\theta(z) \in \mathbb{R}^{|\theta|}$
Example: take $p_\theta$ as a Gaussian mixture model
• Use ψ(z) as the new output feature vector that encodes a class y
• The new code ψ ∘ φ(y) highlights the proximity of some classes:
those that belong to the same cluster but are not seen in the
training phase still benefit from the learning.
Output Fisher Embedding
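For the Gaussian mixture model:
$$\forall z \in Z, \quad p_\theta(z) = \sum_{i=1}^{C} \pi_i \, p_{\theta_i}(z) = \sum_{i=1}^{C} \pi_i \, \mathcal{N}(z; \mu_i, \Sigma_i)$$
Corresponding Fisher score:
$$\frac{\partial \log p_\theta(z)}{\partial \pi_j} = \frac{p_{\theta_j}(z)}{p_\theta(z)} = \alpha_j(z)$$
$$\frac{\partial \log p_\theta(z)}{\partial \mu_j} = \pi_j \, \alpha_j(z) \, \Sigma_j^{-1} (z - \mu_j) = \beta_{j,1}(z)$$
$$\frac{\partial \log p_\theta(z)}{\partial \Sigma_j} = \frac{1}{2} \, \pi_j \, \alpha_j(z) \left( -\Sigma_j^{-1} + \Sigma_j^{-1} (z - \mu_j)(z - \mu_j)^{\top} \Sigma_j^{-1} \right) = \beta_{j,2}(z)$$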
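The Fisher score of a Gaussian mixture can be computed directly from these expressions. The sketch below is an illustration only: the mixture parameters and the point `z` are made up, the weights are treated as unconstrained (the simplex constraint is ignored), and the covariance block carries the factor 1/2 that arises from differentiating the Gaussian log-density with respect to $\Sigma_j$.

```python
import numpy as np

def gaussian_pdf(z, mu, cov):
    """Density of N(mu, cov) evaluated at z."""
    d = z.shape[0]
    diff = z - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def gmm_pdf(z, pis, mus, covs):
    """Mixture density p_theta(z) = sum_i pi_i N(z; mu_i, Sigma_i)."""
    return sum(p * gaussian_pdf(z, m, c) for p, m, c in zip(pis, mus, covs))

def fisher_score(z, pis, mus, covs):
    """Gradient of log p_theta(z), stacked per component as [alpha_j, beta_j1, beta_j2]."""
    p = gmm_pdf(z, pis, mus, covs)
    parts = []
    for pi_j, mu_j, cov_j in zip(pis, mus, covs):
        alpha = gaussian_pdf(z, mu_j, cov_j) / p          # d log p / d pi_j
        inv = np.linalg.inv(cov_j)
        diff = z - mu_j
        beta1 = pi_j * alpha * (inv @ diff)               # d log p / d mu_j
        beta2 = 0.5 * pi_j * alpha * (-inv + inv @ np.outer(diff, diff) @ inv)
        parts += [np.atleast_1d(alpha), beta1, beta2.ravel()]
    return np.concatenate(parts)

# Hypothetical two-component mixture in dimension 2.
pis = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0], [1.0, 1.0]])
covs = np.array([[[1.0, 0.2], [0.2, 1.0]], [[0.5, 0.0], [0.0, 0.8]]])
z = np.array([0.3, 0.7])
score = fisher_score(z, pis, mus, covs)   # length C * (1 + d + d*d) = 14
```

With C components in dimension d, the embedding has C(1 + d + d²) coordinates in this layout, so two classes falling in the same mixture component receive similar codes.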
Output Fisher Embedding Regression
To wrap up:
1. First estimate θ from $\{z_1, \dots, z_C\}$, the set of semantic vectors
encoding the classes
2. Encode each $y_i$ from the training dataset as $\psi_{\hat{\theta}}(z_i)$
3. Solve the regression problem with your preferred multiple-output
regression tool: $\min_h \sum_i \|\psi_{\hat{\theta}}(z_i) - h(x_i)\|^2 + \Omega(h)$
4. Prediction phase: for each x, compute $\arg\min_{y \in Y} \|\psi \circ \phi(y) - h(x)\|^2$
The approach is also valid for any structured output learning problem
(results for text-to-time-series, for instance)
Experimental results for OFER: multiclass prediction task, Caltech101
Number of modes for the GMM: C = 2
Classification accuracy on the test set: mean ± std (%)
# ex/class   m-SVM          Sem-IOKR       Sem-KRR        OFER-GMM
1            9.61 ± 3.98    13.40 ± 2.22   14.83 ± 4.02   38.22 ± 2.87
3            33.89 ± 1.79   22.51 ± 1.81   22.71 ± 2.33   46.33 ± 2.44
5            47.63 ± 2.87   24.90 ± 1.27   25.91 ± 1.28   49.40 ± 2.09
7            55.19 ± 2.43   26.84 ± 0.92   27.42 ± 1.59   50.39 ± 2.04
10           58.55 ± 1.84   31.27 ± 1.84   29.49 ± 1.39   50.49 ± 1.07
Table 1: Results on Caltech101 with a growing number of labeled examples
per class.
Results on zero-shot learning
Top-1 accuracy in %
Method     SUN    CUB    AWA1   AWA2   aPY
IAP        19.4   24.0   35.9   35.9   36.6
CONSE      38.8   34.3   45.6   44.5   26.9
LATEM      55.3   49.3   55.1   55.8   35.2
ALE        58.1   54.9   59.9   62.5   39.7
DEVISE     56.5   52.0   54.2   59.7   39.8
SJE        53.7   53.9   65.6   61.9   32.9
ESZSL      54.5   53.9   58.2   58.6   38.3
SYNC       56.3   55.6   54.0   46.6   23.9
GFZSL      60.6   49.3   68.3   63.8   38.4
KRR-ZSL    0.27   7.24   2.77   1.95   0.3
OFER-ZSL   42.7   38.7   46.6   45.7   28.5
SJE-OFE    55.6   57.1   69.3   64.2   33.4
Table 2: Comparison of OFER-ZSL against state-of-the-art methods with
(att) attributes.
Conclusion
• Encoding the outputs appropriately helps in structured output
prediction
• The problem can be seen as defining new families of surrogate losses
that involve solving an easy (fast-to-compute) subsidiary regression
problem
• Output Fisher Embedding is just one example of a learned
representation
• Current work: learning both the output feature map and the
surrogate regressor, extension to deep learning
References
• Céline Brouard, Huibin Shen, Kai Dührkop, Florence d’Alché-Buc, Sebastian Böcker, Juho
Rousu: Fast metabolite identification with Input Output Kernel Regression. Bioinformatics
32(12): 28-36 (2016)
• Céline Brouard, Marie Szafranski, Florence d’Alché-Buc: Input Output Kernel Regression: Supervised and
Semi-Supervised Structured Output Prediction with Operator-Valued Kernels. Journal of
Machine Learning Research 17: 176:1-176:48 (2016)
• Moussab Djerrab, Alexandre Garcia, Maxime Sangnier, Florence d’Alché-Buc: Output Fisher
embedding regression. Machine Learning 107(8-10): 1229-1256 (2018)
• Alexandre Garcia, Chloé Clavel, Slim Essid, Florence d’Alché-Buc: Structured Output
Learning with Abstention: Application to Accurate Opinion Prediction. ICML 2018:
1681-1689
• Anna Korba, Alexandre Garcia, Florence d’Alché-Buc: A Structured Prediction Approach for
Label Ranking. To appear, NIPS (2018)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 

"Let us talk about output features! by Florence d’Alché-Buc, LTCI & Full Professor @Télécom ParisTech

  • 1. Let us talk about output features! Florence d’Alché-Buc. Joint work with Céline Brouard (Aalto U.), Juho Rousu (Aalto U.), Alexandre Garcia, Slim Essid, Chloé Clavel, Moussab Djerrab. LTCI, Télécom ParisTech. This work has been partially funded by the Télécom ParisTech Chair Machine Learning for Big Data.
  • 3. Supervised learning in a nutshell. Supervised learning helps us answer questions of the following form: given some input object $x \in \mathcal{X}$, provide a prediction $\hat{y}$ of some output object $y \in \mathcal{Y}$ associated with $x$, using
      • a training set $S_n = \{(x_i, y_i),\ i = 1, \ldots, n\}$
      • a class of functions $\mathcal{H}$
      • a loss $\ell(y, y')$ that tells how much $y$ and $y'$ differ
      • a complexity measure $\Omega$ of a function $h \in \mathcal{H}$
    and a learning algorithm $\mathcal{A}$ able to build a predictive model $h_n$ from $S_n$ and $\ell$. Assumption: the new datapoint $x$ is assumed to be drawn from the same distribution as the sample $S_n$.
  • 4. Input features, input representation. Once you are in the right space, it is easier to decide: choosing the features strongly influences the choice of models.
  • 5. Input features, input representation. The choice of input representation to describe input objects is key to the success of machine learning algorithms:
      • Feature space in tree-based methods
      • Choice of kernels in kernel methods: say how you want to compare two objects with $k(x, x')$
      • Representation learning in deep learning: provide the raw data to the deep neural network; the first layers learn appropriate representations to be handled by the next layers
  • 6. Question: what about output features?
      • Can the choice of appropriate output features help with the prediction task at hand?
      • In other words: is it worthwhile to modify the output space to get an easier problem?
      • What is the price of coming back to the original problem?
    We will give a short overview of the use of output features in machine learning problems.
  • 7. Output features for structured output prediction
  • 8. First example: Input Output Kernel Regression for metabolite identification
      • $x$ is a mass spectrum
      • $y$ is a metabolite signature (a binary vector encoding the presence or absence of substructures)
    A difficult problem in chemoinformatics that requires in silico approaches.
  • 9. Output features for structured output prediction. Choose $\phi : \mathcal{Y} \to \mathcal{F}$ and solve the following simpler problem:
      1. Learning problem: $h_n = \arg\min_{h} \sum_{i=1}^{n} L(\phi(y_i), h(x_i)) + \Omega(h)$
      2. Predictive model: $f_n(x) = \arg\min_{y \in \mathcal{Y}} L(\phi(y), h_n(x))$
    Question: how do we choose $L$, and how do we choose the output feature map $\phi$?
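The two steps of slide 9 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: a squared loss for $L$, a ridge penalty for $\Omega$, a linear model $h$, and a small finite candidate set. The helper names `fit_ridge` and `decode` and the toy data are hypothetical and do not come from the talk.

```python
import numpy as np

def fit_ridge(X, Z, lam=1e-2):
    """Learning step: W = argmin_W sum_i ||z_i - W^T x_i||^2 + lam * ||W||_F^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

def decode(h_x, candidate_features):
    """Prediction step: index of the candidate y minimising ||phi(y) - h(x)||^2."""
    dists = np.sum((candidate_features - h_x) ** 2, axis=1)
    return int(np.argmin(dists))

# Toy data (assumption): 3 candidate outputs with hand-picked features phi(y).
rng = np.random.default_rng(0)
phi = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
A = rng.normal(size=(2, 5))                      # hidden linear link to the inputs
y_train = rng.integers(0, 3, size=60)
X_train = phi[y_train] @ A + 0.05 * rng.normal(size=(60, 5))

W = fit_ridge(X_train, phi[y_train])             # regress x -> phi(y)
y_pred = np.array([decode(x @ W, phi) for x in X_train])
train_accuracy = float(np.mean(y_pred == y_train))
```

The regression step never looks at the structure of $\mathcal{Y}$; only `decode` does, which is where the cost of returning to the original problem shows up.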
  • 10. Implicit output features for structured output prediction. Use the kernel trick in the output space: Output Kernel Regression (Geurts et al. 2006, 2007; Brouard et al. 2011, 2016).
  • 11. Implicit output features for structured output prediction. Use the kernel trick in the output space; example: metabolite identification.
      • $L(\phi(y), h(x)) = \|\phi(y) - h(x)\|^2$
      • $\phi$ is not explicitly defined, but a kernel $k$ on $\mathcal{Y}$ is defined (a Gaussian kernel on finite-dimensional fingerprints of the molecules)
      • $g$ is the decoding function: if $k$ is normalized, we have $f_n(x) = g(h_n(x)) = \arg\min_{y \in \mathcal{Y}} \|\phi(y) - h_n(x)\|^2$
      • When $h$ is chosen as a kernel-based model with an input kernel and an output kernel, everything goes nicely: closed-form solution!
    Brouard et al., JMLR, 2016. Brouard et al., Bioinformatics, 2016.
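A minimal sketch of how the closed form can be used without ever computing $\phi$. It assumes the simplest variant, with scalar Gaussian kernels on both sides (an identity operator-valued kernel), so that $\langle \phi(y), h_n(x) \rangle = k_y(y, Y)\,(K_x + \lambda I)^{-1} k_x(X, x)$; since the output kernel is normalized, minimising the distance is the same as maximising this score. The function names, the hyperparameters and the toy "spectra" (noisy copies of the fingerprints) are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Gaussian kernel matrix; k(a, a) = 1, so the kernel is normalized."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def iokr_scores(X_train, Y_train, X_test, candidates,
                lam=0.1, gamma_x=1.0, gamma_y=1.0):
    """Scores <phi(y), h(x)> for every (test point, candidate) pair."""
    K_x = gaussian_kernel(X_train, X_train, gamma_x)
    k_x = gaussian_kernel(X_train, X_test, gamma_x)                  # (n, m)
    coef = np.linalg.solve(K_x + lam * np.eye(len(X_train)), k_x)   # (n, m)
    K_y = gaussian_kernel(candidates, Y_train, gamma_y)             # (c, n)
    return (K_y @ coef).T                                           # (m, c)

# Toy data (assumption): 4 candidate binary fingerprints, inputs are noisy copies.
rng = np.random.default_rng(0)
candidates = np.array([[0, 0, 1, 1], [1, 0, 0, 1],
                       [1, 1, 0, 0], [0, 1, 1, 0]], dtype=float)
labels_train = np.repeat(np.arange(4), 10)
labels_test = np.repeat(np.arange(4), 5)
Y_train = candidates[labels_train]
X_train = Y_train + 0.1 * rng.normal(size=Y_train.shape)
X_test = candidates[labels_test] + 0.1 * rng.normal(size=(20, 4))

scores = iokr_scores(X_train, Y_train, X_test, candidates)
pred = np.argmax(scores, axis=1)     # argmax score == argmin distance here
accuracy = float(np.mean(pred == labels_test))
```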
  • 12. Overview of the problem
      • Setup: we want to predict the labels of a known target graph structure (encoded by a directed graph). "x = TripAdvisor review" ⇒ "y = sentence-level opinion annotations". Example review: "The room was OK, nothing special, still a perfect choice to quickly join the main places."
  • 14. Overview of the problem
      • Additional difficulty: we want to be able to abstain on some nodes of the graph while continuing to make predictions on the others. "x = TripAdvisor review" ⇒ "y = sentence-level opinion annotations". Example review: "The room was OK, nothing special, still a perfect choice to quickly join the main places."
  • 16. Output features for (hierarchical) structure labeling with abstention. We seek a prediction function $h$ and a reject function $r$.
      • Learning deals with: $h_n = \arg\min_{h} \sum_{i} \|\psi_{wa}(y_i) - h(x_i)\|^2 + \Omega(h)$
      • Abstention is handled at the very last moment: $(f_n(x), r_n(x)) = \arg\min_{(y_f, y_r) \in \mathcal{Y}^{F,R}} \langle h_n(x), C\,\psi_a(y_f, y_r) \rangle$
    $\psi_{wa}$ and $\psi_a$, with the help of $C$, take into account the tree structure. Garcia et al., ICML 2018.
  • 17. Output features for few and zero-shot learning
  • 18. Third example: zero-shot learning. Multiclass classification: a simple question, really? A human being is able to recognise an object (an animal) in an image even without having seen an instance of it before. The classic setting of supervised learning does not address this issue: the relevant task is not just about recognising a class index, it is about recognising the concept underlying a class.
  • 19. Realistic scenario for multiclass classification
      • You know the set of possible classes $\mathcal{Y}$
      • Your training dataset
          • contains a handful of instances for each class: few-shot learning, the so-called small-data regime
          • contains at least one instance per class: one-shot learning
          • does not contain instances of some classes, $\mathcal{Y} = \mathcal{Y}_{seen} \cup \mathcal{Y}_{unseen}$: zero-shot learning
    See Xian et al. 2018 (a review in IEEE Trans. PAMI).
  • 20. Few-, one- and zero-shot learning in image recognition. Use a semantic encoding $z = \phi(y) \in \mathbb{R}^d$ of class $y \in \mathcal{Y}$ such that two close classes have close representations. Major tool: take the name of an object class and encode it as a semantic vector in a finite-dimensional space with word2vec or GloVe.
      1. Predictive model: $f_w(x) = \arg\max_{y \in \mathcal{Y}} S(x, \phi(y), w)$
      2. Learning problem: $\min_{w} \sum_{i=1}^{n} \ell(\phi(y_i), f_w(x_i)) + \Omega(w)$
    Question: how to improve $\phi$? Learn a good encoding of the output data.
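To make the zero-shot setting concrete, here is a hedged sketch with one simple choice of compatibility function, a bilinear score $S(x, \phi(y), W) = x^\top W \phi(y) / \|\phi(y)\|$, where $W$ is fitted by ridge regression on seen classes only and prediction is restricted to unseen classes. This is not one of the methods compared in the talk; the class embeddings, the linear data model and the function names are assumptions chosen so that the example runs on its own.

```python
import numpy as np

def fit_compatibility(X, Z, lam=1e-2):
    """Ridge fit of W so that x^T W approximates the class embedding phi(y)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

def predict_zero_shot(X, W, class_embeddings):
    """argmax over the given classes of x^T W phi(y) / ||phi(y)||."""
    E = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    return np.argmax((X @ W) @ E.T, axis=1)

# Toy semantic vectors (assumption): 4 seen classes, 2 unseen classes.
seen = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
unseen = np.array([[0, 1, 1], [1, 0, 1]], dtype=float)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 6))                       # hidden link: embedding -> input
y_seen = np.repeat(np.arange(4), 20)
X_seen = seen[y_seen] @ A + 0.02 * rng.normal(size=(80, 6))
y_unseen = np.repeat(np.arange(2), 10)
X_unseen = unseen[y_unseen] @ A + 0.02 * rng.normal(size=(20, 6))

W = fit_compatibility(X_seen, seen[y_seen])       # trained on seen classes only
pred = predict_zero_shot(X_unseen, W, unseen)     # evaluated on unseen classes
zsl_accuracy = float(np.mean(pred == y_unseen))
```

The unseen classes are recognisable here only because their embeddings lie in the span of the seen ones, which is exactly why the quality of $\phi$ matters.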
  • 21. A relevant output embedding: the Fisher score! A plug-in for any method that uses a semantic encoding $z = \phi(y)$: $\psi(z) = \nabla_\theta \log p_\theta(z) \in \mathbb{R}^{|\theta|}$. Example: take $p_\theta$ as a Gaussian mixture model.
      • Use $\psi(z)$ as the new output feature vector that encodes a class $y$
      • The new code $\psi \circ \phi(y)$ highlights the proximity of some classes: classes that belong to the same cluster but are not seen in the training phase still benefit from the learning.
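  • 22. Output Fisher Embedding. For the Gaussian mixture model: $\forall z \in \mathcal{Z},\ p_\theta(z) = \sum_{i=1}^{C} \pi_i\, p_{\theta_i}(z) = \sum_{i=1}^{C} \pi_i\, \mathcal{N}(z; \mu_i, \Sigma_i)$. Corresponding Fisher score:
      $\frac{\partial \log p_\theta(z)}{\partial \pi_j} = \frac{p_{\theta_j}(z)}{p_\theta(z)} = \alpha_j(z)$
      $\frac{\partial \log p_\theta(z)}{\partial \mu_j} = \pi_j\, \alpha_j(z)\, \Sigma_j^{-1} (z - \mu_j) = \beta_{j,1}(z)$
      $\frac{\partial \log p_\theta(z)}{\partial \Sigma_j} = \tfrac{1}{2}\, \pi_j\, \alpha_j(z) \left( -\Sigma_j^{-1} + \Sigma_j^{-1} (z - \mu_j)(z - \mu_j)^\top \Sigma_j^{-1} \right) = \beta_{j,2}(z)$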
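As a sanity check on the Fisher score of a Gaussian mixture, the sketch below computes the components with respect to the mixing weights, $\alpha_j(z) = p_{\theta_j}(z) / p_\theta(z)$, and with respect to the component means, $\pi_j \alpha_j(z) \Sigma_j^{-1} (z - \text{mean}_j)$. The mixture parameters and the point `z` are made-up values, and the function names are hypothetical; the covariance part is left out to keep the example short.

```python
import numpy as np

def gaussian_pdf(z, mean, cov):
    """Density of N(mean, cov) at z."""
    diff = z - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

def gmm_log_density(z, weights, means, covs):
    """log p_theta(z) for a Gaussian mixture."""
    return float(np.log(sum(w * gaussian_pdf(z, m, S)
                            for w, m, S in zip(weights, means, covs))))

def fisher_score(z, weights, means, covs):
    """Return (alpha, beta1): scores w.r.t. mixing weights and component means."""
    comp = np.array([gaussian_pdf(z, m, S) for m, S in zip(means, covs)])
    alpha = comp / np.dot(weights, comp)                 # p_j(z) / p(z)
    beta1 = np.array([w * a * np.linalg.solve(S, z - m)
                      for w, a, m, S in zip(weights, alpha, means, covs)])
    return alpha, beta1

# Made-up two-component mixture in 2 dimensions (assumption).
weights = np.array([0.4, 0.6])
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), np.array([[1.5, 0.3], [0.3, 0.8]])]
z = np.array([1.0, 0.5])

alpha, beta1 = fisher_score(z, weights, means, covs)
psi = np.concatenate([alpha, beta1.ravel()])             # output feature vector
```

Concatenating the blocks gives the vector $\psi(z)$ used as the new output code; a finite-difference check on `gmm_log_density` is an easy way to validate each block.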
  • 23. Output Fisher Embedding Regression. To wrap up:
      1. First estimate $\theta$ from $\{z_1, \ldots, z_C\}$, the set of semantic vectors encoding the classes
      2. Encode each $y_i$ from the training dataset as $\psi_{\hat\theta}(z_i)$
      3. Solve the regression problem with your preferred multiple-output regression tool: $\min_{h} \sum_{i} \|\psi_{\hat\theta}(z_i) - h(x_i)\|^2 + \Omega(h)$
      4. Prediction phase: for each $x$, compute $\arg\min_{y \in \mathcal{Y}} \|\psi \circ \phi(y) - h(x)\|^2$
    The approach is also valid for any structured output learning problem (results for text-to-time-series, for instance).
  • 24. Experimental results for OFER: multiclass prediction task on Caltech101. Number of modes for the GMM: C = 2. Classification accuracy on the test set: mean ± std (%).

      # ex/class   m-SVM          Sem-IOKR       Sem-KRR        OFER-GMM
      1            9.61 ± 3.98    13.40 ± 2.22   14.83 ± 4.02   38.22 ± 2.87
      3            33.89 ± 1.79   22.51 ± 1.81   22.71 ± 2.33   46.33 ± 2.44
      5            47.63 ± 2.87   24.90 ± 1.27   25.91 ± 1.28   49.40 ± 2.09
      7            55.19 ± 2.43   26.84 ± 0.92   27.42 ± 1.59   50.39 ± 2.04
      10           58.55 ± 1.84   31.27 ± 1.84   29.49 ± 1.39   50.49 ± 1.07

    Table 1: Results on Caltech101 with a growing number of labeled examples per class.
  • 25. Results on zero-shot learning: top-1 accuracy in %.

      Method     SUN    CUB    AWA1   AWA2   aPY
      IAP        19.4   24.0   35.9   35.9   36.6
      CONSE      38.8   34.3   45.6   44.5   26.9
      LATEM      55.3   49.3   55.1   55.8   35.2
      ALE        58.1   54.9   59.9   62.5   39.7
      DEVISE     56.5   52.0   54.2   59.7   39.8
      SJE        53.7   53.9   65.6   61.9   32.9
      ESZSL      54.5   53.9   58.2   58.6   38.3
      SYNC       56.3   55.6   54.0   46.6   23.9
      GFZSL      60.6   49.3   68.3   63.8   38.4
      KRR-ZSL    0.27   7.24   2.77   1.95   0.3
      OFER-ZSL   42.7   38.7   46.6   45.7   28.5
      SJE-OFE    55.6   57.1   69.3   64.2   33.4

    Table 2: Comparison of OFER-ZSL against state-of-the-art methods with attributes (att).
  • 27. Conclusion
      • Appropriately encoding the outputs helps in structured output prediction
      • The problem can be seen as defining new families of surrogate losses that involve solving an easy (fast-to-compute) subsidiary regression problem
      • Output Fisher Embedding is just one example of a learned representation
      • Current work: learning both the output feature map and the surrogate regressor, extension to deep learning
  • 28. References
      • Céline Brouard, Huibin Shen, Kai Dührkop, Florence d’Alché-Buc, Sebastian Böcker, Juho Rousu: Fast metabolite identification with Input Output Kernel Regression. Bioinformatics 32(12): 28-36 (2016)
      • C. Brouard, M. Szafranski, F. d’Alché-Buc: Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels. Journal of Machine Learning Research 17: 176:1-176:48 (2016)
      • Moussab Djerrab, Alexandre Garcia, Maxime Sangnier, Florence d’Alché-Buc: Output Fisher embedding regression. Machine Learning 107(8-10): 1229-1256 (2018)
      • Alexandre Garcia, Chloé Clavel, Slim Essid, Florence d’Alché-Buc: Structured Output Learning with Abstention: Application to Accurate Opinion Prediction. ICML 2018: 1681-1689
      • Anna Korba, Alexandre Garcia, Florence d’Alché-Buc: A Structured Prediction Approach for Label Ranking. To appear, NIPS (2018)