Learning to Learn
Joaquin Vanschoren
Eindhoven University of Technology & JADS
j.vanschoren@tue.nl
@joavanschoren
Berlin Machine Learning Meetup, Jan 2019
Slides: www.automl.org/events - Video: bit.ly/2QsnsUT
Book (open access): www.automl.org/book (or www.amazon.de)
Learning is a never-ending process
Humans don’t learn from scratch
(Diagram: each task is a separate learning episode producing models and performance; meta-learning happens across learning episodes)
Humans learn across tasks
Why? Less trial-and-error, less data
(Diagram: knowledge from learning on Tasks 1..3 provides an inductive bias for learning new tasks)
Inductive bias: the assumptions added to the training data that make effective learning possible
If prior tasks are similar, we can transfer prior knowledge to new tasks
Learning to learn
(Diagram: meta-data from prior tasks, such as prior beliefs, constraints, model parameters, representations, and training data, is transferred to learning on the new task)
Meta-learning
The meta-learner learns a (base-)learning algorithm based on meta-data
(Diagram: the meta-learner uses meta-data from prior tasks, plus additional experiments, to configure a base-learner for the new task)
How?
1. Start with what generally works
2. Start from what likely works (based on partially similar tasks)
3. Start from previously trained models (for very similar tasks)
1. Start with what generally works
Store and use meta-data:
- configurations λi: settings that uniquely define the model (hyperparameters, pipeline, neural architecture,…)
- performances Pi,j (e.g. accuracy) of those configurations on specific tasks
Rankings
• Build a global (multi-objective) ranking, recommend the top-K
• Can be used as a warm start for optimization techniques
• E.g. Bayesian optimization, evolutionary techniques,…
(Diagram: evaluations Pi,j of configurations λi on prior tasks yield a global, task-independent ranking λa..k, used to warm-start the search on the new task)
Leite et al. 2012; Abdulrahman et al. 2018
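A minimal sketch (not from the slides) of how such a ranking and top-K warm start could be computed, assuming a matrix P of prior performances Pi,j (rows: configurations, columns: tasks) and a single objective for simplicity:

import numpy as np

def global_ranking(P):
    # Rank configurations per task (0 = best), then average ranks across tasks.
    per_task_ranks = (-P).argsort(axis=0).argsort(axis=0)
    mean_rank = per_task_ranks.mean(axis=1)
    return np.argsort(mean_rank)                     # configuration indices, best first

def warm_start(P, configs, k=5):
    # Recommend the top-k configurations as initial candidates,
    # e.g. to seed Bayesian optimization or evolution on the new task.
    return [configs[i] for i in global_ranking(P)[:k]]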
To tune or not to tune?
• Functional ANOVA 1: select the hyperparameters that cause variance in the evaluations (hyperparameter importance)
• Tunability 2: learn good defaults, measure the improvement gained by tuning over those defaults
(Diagram: the meta-learner turns prior evaluations Pi,j into hyperparameter importances, priors, and constraints for the new task)
1 van Rijn & Hutter 2018
2 Probst et al. 2018
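A minimal sketch of measuring tunability on the same assumed performance matrix P; the default here is simply the configuration that is best on average, a simplification of the procedure in Probst et al.:

import numpy as np

def tunability(P):
    # Learned default: the single configuration with the best average performance.
    default = int(P.mean(axis=1).argmax())
    # Per-task improvement of the best configuration over that shared default.
    return P.max(axis=0) - P[default]

# Small values suggest the default is good enough; large values suggest
# that tuning this algorithm's hyperparameters is worth the effort.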
To tune or not to tune?
• Search space pruning: exclude regions of the configuration space that yielded bad performance on (similar) prior tasks
(Diagram: prior evaluations Pi,j translate into constraints on the search space for the new task)
Wistuba et al. 2015
Learning task similarity
• Experiments on the new task can tell us how similar it is to previous tasks
• Tasks are similar if the observed performance of configurations is similar: Pc,new ≅ Pc,j
• Use this to recommend new experiments
Active testing
• Tournament-style selection, warm-started with the overall best configuration λbest
• Next candidate λc: the one that beats the current λbest on similar tasks (select λc > λbest on similar tasks)
Leite et al. 2012
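A minimal sketch of active testing; task similarity is approximated here by correlating the performances of configurations already tried on the new task with their performances on each prior task, a simplification of the relative landmarks used by Leite et al.:

import numpy as np

def task_similarity(observed, P):
    # observed: dict {config index: performance on the new task}
    # P: matrix (n_configs, n_tasks) of prior performances Pi,j
    idx = list(observed)
    new = np.array([observed[i] for i in idx])
    return np.array([np.corrcoef(new, P[idx, j])[0, 1] for j in range(P.shape[1])])

def next_candidate(observed, P):
    best = max(observed, key=observed.get)            # current λbest
    weights = np.clip(np.nan_to_num(task_similarity(observed, P)), 0, None)
    # Expected win margin of each configuration over λbest, weighted by task similarity.
    gains = (P - P[best]) @ weights
    gains[list(observed)] = -np.inf                   # do not repeat experiments
    return int(np.argmax(gains))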
Bayesian optimization
• Consider the space of all configuration options λ ∈ Λ (e.g. all possible neural nets or pipelines)
• Surrogate model: probabilistic regression model of configuration performance
• Acquisition function: selects the next configuration to try (exploration vs. exploitation)
Rasmussen 2014
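A minimal Bayesian-optimization sketch over a one-dimensional configuration space, with a Gaussian-process surrogate and expected improvement as the acquisition function; evaluate() stands in for training and scoring a configuration:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, candidates, best):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(evaluate, bounds=(0.0, 1.0), n_init=3, n_iter=20):
    X = np.random.uniform(*bounds, size=(n_init, 1))                 # initial design
    y = np.array([evaluate(x[0]) for x in X])
    candidates = np.linspace(*bounds, 500).reshape(-1, 1)
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)    # surrogate model
        ei = expected_improvement(gp, candidates, y.max())           # acquisition function
        x_next = candidates[np.argmax(ei)]                           # explore/exploit trade-off
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate(x_next[0]))
    return X[np.argmax(y)], y.max()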
• The surrogate model acts like a short-term memory for solving the new task
• Can we store it in long-term memory for solving future tasks?
Surrogate model transfer
• If task j is similar to the new task, its surrogate model Sj will likely transfer well
• Sum up all Sj predictions, weighted by task similarity (as in active testing): S = ∑ wj Sj 1
• Or build a combined Gaussian process, weighted by current performance on the new task 2
1 Wistuba et al. 2018
2 Feurer et al. 2018
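A minimal sketch of the weighted-ensemble form S = ∑ wj Sj, assuming per-task surrogate models with a scikit-learn-style predict() and task-similarity weights wj (e.g. obtained as in active testing):

import numpy as np

class TransferSurrogate:
    def __init__(self, surrogates, weights):
        self.surrogates = surrogates                   # one surrogate Sj per prior task
        self.weights = np.asarray(weights, dtype=float)
        self.weights /= self.weights.sum()             # normalize the wj

    def predict(self, candidates):
        preds = np.stack([s.predict(candidates) for s in self.surrogates])
        return self.weights @ preds                    # similarity-weighted prediction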
Multi-task Bayesian optimization
• Multi-task Gaussian processes: train the surrogate model on t tasks simultaneously 1
  • If tasks are similar: transfers useful info
  • Not very scalable
• Bayesian neural networks as surrogate model 2
  • Multi-task, more scalable
• Stacking Gaussian process regressors (Google Vizier) 3
  • Sequential tasks, each similar to the previous one
  • Transfers a prior based on the residuals of the previous GP
(Figure: independent GP predictions vs. multi-task GP predictions)
1 Swersky et al. 2013
2 Springenberg et al. 2016
3 Golovin et al. 2017
More scalable variants
• Bayesian linear regression (BLR) surrogate model on every task
• Use a neural net to learn a suitable basis expansion φz(λ) shared across all tasks
• Scales linearly in the number of observations, transfers information about the configuration space
(Diagram: the shared basis φz(λ) is pre-trained on prior evaluations (λi, Pi,j) and warm-starts Bayesian optimization with BLR surrogates on the new task)
Perrone et al. 2018
2. Start from what likely works (based on similar tasks)
Meta-features: measurable properties of the tasks
(number of instances and features, class imbalance, feature skewness,…)
(Diagram: meta-features mj characterize each prior task; the meta-learner finds prior tasks with similar mj and reuses their configurations λi and performances Pi,j)
Meta-features
• Hand-crafted (interpretable) meta-features 1
• Number of instances, features, classes, missing values, outliers,…
• Statistical: skewness, kurtosis, correlation, covariance, sparsity, variance,…
• Information-theoretic: class entropy, mutual information, noise-signal ratio,…
• Model-based: properties of simple models trained on the task
• Landmarkers: performance of fast algorithms trained on the task
• Domain-specific task properties
1 Vanschoren 2018
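A minimal sketch of a few such hand-crafted meta-features for a classification dataset (illustrative, not an official meta-feature set):

import numpy as np
from scipy.stats import skew, kurtosis

def meta_features(X, y):
    # X: numeric feature matrix (n_instances, n_features), y: class labels
    n, p = X.shape
    classes, counts = np.unique(y, return_counts=True)
    probs = counts / n
    return {
        "n_instances": n,
        "n_features": p,
        "n_classes": len(classes),
        "class_imbalance": counts.max() / counts.min(),
        "class_entropy": float(-(probs * np.log2(probs)).sum()),
        "mean_skewness": float(np.mean(skew(X, axis=0))),
        "mean_kurtosis": float(np.mean(kurtosis(X, axis=0))),
    }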
Meta-features
• Learning a joint task representation
• Deep metric learning: learn a representation hmf using a ground-truth distance
• With a Siamese network: similar tasks get similar representations
• Can be used to recommend neural architectures given task similarity
Kim et al. 2017
Warm-starting from similar tasks
• Find the k most similar tasks (via meta-features mj), warm-start the search with their best configurations λ1..k
• Auto-sklearn: Bayesian optimization (SMAC) warm-started this way
• Meta-learning yields better models, faster
• Winner of the AutoML Challenges
Feurer et al. 2015
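A minimal sketch of this warm-start strategy, assuming meta-feature vectors for the prior tasks and the best known configuration per task:

import numpy as np

def warm_start_configs(new_mf, task_mfs, best_config_per_task, k=3):
    # task_mfs: array (n_tasks, n_meta_features); best_config_per_task: list of configs
    mfs = np.asarray(task_mfs, dtype=float)
    # Standardize meta-features so no single one dominates the distance.
    mu, sd = mfs.mean(axis=0), mfs.std(axis=0) + 1e-12
    dists = np.linalg.norm((mfs - mu) / sd - (np.asarray(new_mf) - mu) / sd, axis=1)
    nearest = np.argsort(dists)[:k]                    # k most similar prior tasks
    return [best_config_per_task[j] for j in nearest]  # their best λi as initial candidates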
Meta-models
• Learn a direct mapping between meta-features and Pi,j
• Zero-shot meta-models: predict the best λi given meta-features 1
• Ranking models: return a ranking λ1..k 2
• Predict which algorithms / configurations to consider / tune 3
• Predict performance / runtime for a given 𝛳i and task 4
• Can be integrated into larger AutoML systems: warm-starting, guiding the search,…
1 Brazdil et al. 2009; Lemke et al. 2015
2 Sun and Pfahringer 2013; Pinto et al. 2017
3 Sanders and Giraud-Carrier 2017
4 Yang et al. 2018
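A minimal sketch of a zero-shot meta-model: a regressor trained on prior (meta-features, configuration, performance) records, used to rank candidate configurations for a new task without running any of them; encode() is an assumed helper that turns meta-features and a configuration into one numeric vector:

from sklearn.ensemble import RandomForestRegressor

def train_meta_model(records, encode):
    # records: list of (meta_features, configuration, performance) from prior tasks
    X = [encode(mf, config) for mf, config, perf in records]
    y = [perf for _, _, perf in records]
    return RandomForestRegressor(n_estimators=200).fit(X, y)

def predict_best(meta_model, new_mf, candidates, encode):
    scores = meta_model.predict([encode(new_mf, c) for c in candidates])
    return candidates[int(scores.argmax())]            # predicted best λ for the new task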
Learning Pipelines / Architectures
• Compositionality: the learning process can be broken down into smaller tasks
• Easier to learn, more transferable, more robust
• Pipelines are one way of doing this, but how to control the search space?
• Planning techniques (e.g. Hierarchical Task Planning) 2
• Impose a fixed structure or grammar on the pipeline 1
• Neural architecture search
• Usually defines fixed search space 3
• Very little meta-learning (yet)
• RL controller transfer 4
1 Feurer et al. 2015
2 Mohr et al. 2018
3 Zoph et al. 2018
4 Wong et al. 2018
Evolving pipelines
• Start from simple pipelines
• Evolve more complex ones if needed
• Reuse pipelines that do specific things
• Mechanisms:
• Cross-over: reuse partial pipelines
• Mutation: change structure, tuning
• Approaches:
• TPOT: tree-based pipelines 1
• GAMA: asynchronous evolution 2
• RECIPE: grammar-based 3
• Meta-learning: warm-starting, meta-models 2
1 Olson et al. 2017
2 Gijsbers et al. 2018
3 De Sa et al. 2017
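A small usage sketch of pipeline evolution with TPOT; the dataset and settings are illustrative, not those from the slides:

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tpot = TPOTClassifier(generations=5, population_size=20, random_state=0)
tpot.fit(X_tr, y_tr)                       # evolves and tunes tree-based pipelines
print(tpot.score(X_te, y_te))              # accuracy of the best evolved pipeline
tpot.export("best_pipeline.py")            # export it as scikit-learn code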
Learning to learn through self-play
• Build pipelines by inserting, deleting, replacing components (actions)
• Neural network (LSTM) receives task meta-features, pipelines and evaluations
• Predicts pipeline performance and action probabilities
• Monte Carlo Tree Search builds pipelines, runs simulations against LSTM
Drori et al. 2017
3. Start with previously trained models
• Transfer models trained on similar prior tasks (model parameters 𝛳k, features,…)
• Requires tasks that are intrinsically (very) similar (e.g. sharing a representation)
Transfer learning
• For neural networks, both structure and weights can be transferred
• Features and initializations learned from:
• Large image datasets (e.g. ImageNet) 1
• Large text corpora (e.g. Wikipedia) 2
• Fails if tasks are not similar enough 3
(Diagram: a pre-trained convnet with frozen and new layers, adapted to the target task)
• Feature extraction: remove the last layers and use the output as features; if the task is quite different, remove more layers
• Fine-tuning: unfreeze the last layers, tune them on the new task
• End-to-end tuning: train from the initialized weights
• Which option works best depends on the target task: small, large and similar, or large and different
1 Razavian et al. 2014
2 Mikolov et al. 2013
3 Yosinski et al. 2014
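A minimal fine-tuning sketch with a pre-trained convnet (assuming PyTorch and torchvision); the 10-class target task is hypothetical:

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # weights learned on ImageNet

# Feature extraction: freeze all pre-trained layers...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the last layer with a new head for the target task.
model.fc = nn.Linear(model.fc.in_features, 10)

# Fine-tuning: only the new head (and any layers you unfreeze) gets updated.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
# The usual training loop over the new task's data would follow.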
Learning to learn by gradient descent
• Our brains probably don’t do backprop, replace it with:
• Simple parametric (bio-inspired) rule to update weights 1
• Single-layer neural network to learn weight updates 2
• Learn parameters across tasks, by gradient descent (meta-gradient)
1 Bengio et al. 1995
2 Runarsson and Jonsson 2000
(Diagram: the weight update Δ𝛳i = 𝜂(·) is a parametric function of e.g. the learning rate, presynaptic activity, and a reinforcing signal; its parameters λi, including the initialization λinit, are learned across tasks by gradient descent on a meta-gradient)
Learning to learn gradient descent by gradient descent
• Replace backprop with a recurrent neural net (LSTM) 1, not so scalable
• Use a coordinatewise LSTM [m] for scalability/flexibility (cf. ADAM, RMSprop) 2
• Optimizee: receives weight updates gt from the optimizer
• Optimizer: receives gradient estimates ∇t from the optimizee
• Learns how to do gradient descent across tasks
• A single network: the LSTM parameters are shared for all weights 𝛳
1 Hochreiter 2001
2 Andrychowicz et al. 2016
Learning to learn gradient descent by gradient descent
• Left: optimizer and optimizee trained to do style transfer
• Right: the optimizer solves similar tasks (different style, content and resolution) without any additional training data
1 Hochreiter 2001
2 Andrychowicz et al. 2016
Few-shot learning
• Learn how to learn from few examples (given similar tasks)
• Meta-learner must learn how to train a base-learner based on prior experience
• Parameterize base-learner model and learn the parameters 𝛳i
(Diagram, image by Hugo Larochelle: episodes Tj, each split into Ttrain (X_train, y_train) and Ttest (X_test, y_test); the meta-model updates 𝛳i to 𝛳i+1 and is scored by Pi+1,test)
Cost(θi) = 1/|Ttest| ∑t∈Ttest loss(θi, t)
Example: 1-shot, 5-class classification, with new classes at test time!
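A minimal sketch of the episodic setup: sample n-way, k-shot episodes with new classes, adapt a base-learner on the support set, and update the meta-learner from the query-set loss; dataset, adapt, evaluate and meta_learner.update are assumed placeholders:

import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    # dataset: dict mapping class -> list of examples
    classes = random.sample(list(dataset), n_way)        # new classes each episode
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

def meta_train(dataset, meta_learner, adapt, evaluate, n_episodes=1000):
    for _ in range(n_episodes):
        support, query = sample_episode(dataset)
        base_learner = adapt(meta_learner, support)      # learn from Ttrain
        loss = evaluate(base_learner, query)             # Cost(θ) on Ttest
        meta_learner.update(loss)                        # meta-level update (assumed interface)
    return meta_learner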
Few-shot learning: approaches
• Existing algorithm as meta-learner:
• LSTM + gradient descent (Ravi and Larochelle 2017)
• Learn 𝛳init + gradient descent (Finn et al. 2017)
• kNN-like: memory + similarity (Vinyals et al. 2016)
• Learn embedding + classifier (Snell et al. 2017)
• …
• Black-box meta-learner:
• Neural Turing machine (with memory) (Santoro et al. 2016)
• Neural attentive learner (Mishra et al. 2018)
• …
Cost(θi) = 1/|Ttest| ∑t∈Ttest loss(θi, t)
LSTM meta-learner + gradient descent
Ravi and Larochelle 2017
(Diagram: model M is trained batch by batch, with an LSTM meta-learner producing each parameter update; the final 𝛳T is evaluated on the test set)
Cost(θT) = 1/|Ttest| ∑t∈Ttest loss(θT, t)
• Gradient descent update 𝛳t is similar to LSTM cell state update ct
• Hence, training a meta-learner LSTM yields an update rule for training M
• Start from initial 𝛳0, train model on first batch, get gradient and loss update
• Predict 𝛳t+1 , continue to t=T, get cost, backpropagate to learn LSTM weights, optimal 𝛳0
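Written out, the analogy between the two updates (standard notation, not shown explicitly on the slide):

ct = ft ⊙ ct−1 + it ⊙ c̃t        LSTM cell-state update, with forget gate ft and input gate it
𝛳t = 𝛳t−1 − αt ∇𝛳t−1 Lt         gradient descent, i.e. ct = 𝛳t, ft = 1, it = αt, c̃t = −∇𝛳t−1 Lt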
Model-agnostic meta-learning
1 Finn et al. 2017
2 Finn et al. 2018
3 Nichol et al. 2018
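A minimal first-order sketch in the spirit of Reptile (Nichol et al. 2018), which simplifies MAML: adapt a copy of the model on each sampled task, then move the shared initialization toward the adapted weights; model is any torch nn.Module, and sample_task and task_loss are assumed helpers:

import copy
import torch

def reptile(model, sample_task, task_loss, meta_steps=1000,
            inner_steps=5, inner_lr=0.01, meta_lr=0.1):
    for _ in range(meta_steps):
        task = sample_task()
        learner = copy.deepcopy(model)                        # start from the shared 𝛳init
        opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                          # inner loop: adapt to the task
            opt.zero_grad()
            task_loss(learner, task).backward()
            opt.step()
        with torch.no_grad():                                 # outer loop: update 𝛳init
            for p, q in zip(model.parameters(), learner.parameters()):
                p += meta_lr * (q - p)                        # move toward the adapted weights
    return model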
• For reinforcement learning:
Learning to reinforcement learn
• Humans often learn to play new games much faster than RL techniques do
• Reinforcement learning is very well suited for learning-to-learn:
• Build a learner, then use the performance of that learner as a reward
• Learning to reinforcement learn 1,2
• Use RNN-based deep RL to train a recurrent network on many tasks
• Learns to implement a 'fast' RL agent, encoded in its weights
(Diagram: a meta-RL algorithm trains a policy 𝝅θ across environments; the resulting fast RL agent reuses 𝝅θ in similar environments)
1 Duan et al. 2017
2 Wang et al. 2017
Learning to reinforcement learn
• Also works for few-shot learning 3
• Condition on the observation + an upcoming demonstration
• You don't know what someone is trying to teach you, but you prepare for the lesson
3 Duan et al. 2017
Learning to learn more tasks
• Active learning (Pang et al. 2018)
• Deep network (learns a representation) + policy network
• Receives state and reward, decides which points to query next
• Density estimation (Reed et al. 2017)
• Learns a distribution over a small set of images, can generate new ones
• Uses a MAML-based few-shot learner
• Matrix factorization (Vartak et al. 2017)
• Deep learning architecture that makes recommendations
• Meta-learner learns how to adjust the biases for each user (task)
• Replace hand-crafted algorithms by learned ones
• Look at problems through a meta-learning lens!
Meta-data sharing
import openml as oml
from sklearn import tree

task = oml.tasks.get_task(14951)             # download a task (dataset + splits)
clf = tree.ExtraTreeClassifier()
flow = oml.flows.sklearn_to_flow(clf)        # describe the classifier as an OpenML flow
run = oml.runs.run_flow_on_task(task, flow)  # run it locally on the task
myrun = run.publish()                        # share the results on OpenML
Run locally, share globally: building a shared memory (Vanschoren et al. 2014)
• OK, but how do I get large amounts of meta-data for meta-learning?
• OpenML.org
• Thousands of uniform datasets
• 100+ meta-features
• Millions of evaluated runs
• Same splits, 30+ metrics
• Traces, models (optional)
• APIs in Python, R, Java,…
• Publish your own runs
• Never-ending learning
• Benchmarks
(Plot: models built by humans and models built by AutoML bots)
Open positions! Scientific programmer, Teaching PhD
Towards human-like learning to learn
• Learning-to-learn gives humans a significant advantage
• Learning how to learn any task empowers us far beyond knowing how to learn specific tasks
• It is a universal aspect of life, and of how life evolves
• Very exciting field with many unexplored possibilities
• Many aspects are not yet understood (e.g. task similarity); we need more experiments
• Challenge:
• Build learners that never stop learning, that learn from each other
• Build a global memory for learning systems to learn from
• Let them explore / experiment by themselves
Thank you!
special thanks to
Pavel Brazdil, Matthias Feurer, Frank Hutter, Erin Grant,
Hugo Larochelle, Raghu Rajan, Jan van Rijn, Jane Wang
more to learn
http://www.automl.org/book/
Chapter 2: Meta-Learning
Never stop learning
