Pintxos / voileipäpöytä / закуски (appetizers)
of Machine Learning
BigSkyEarth 2018
Kaarina
Menu
• What is Machine Learning?
• Where does it come from?
• What now?
• Why now?
• Machine Learning in the Sky
• The Machine Learning Landscape
• Machine Learning Pipeline
• Neural Networks
• Some useful concepts
• Machine Learning Tools
• Zoom on selected libraries
• Zoom on a few algorithms
• Random Forest
• Gradient Boosting
• Kohonen’s map
• Autoencoder
• Convolutional Neural Network
• Generative Adversarial Network
What is Machine Learning?
Machine…
… Learning?!?
What is Machine Learning?
Not quite so exciting… Learning? NO! Nor thinking…
More like algorithms that fit complex data relationships
More like advanced statistical inference
More like implicit programming
More like extracting information dynamics from data for
generalization…
Sobering thought: linear regression belongs to Machine Learning!
That said, some mimicking taking place:
➢ Trying to improve a system’s response to novel inputs through
experience
➢ Artificial neural networks inspired by human biology
What is Machine Learning?
Artificial Intelligence
Machine Learning
Neural Networks
Deep Learning
What is Machine Learning?
• “Machine Learning at its most basic is the practice of using algorithms
to parse data, learn from it, and then make a determination or
prediction about something in the world.” – Nvidia
• “Machine learning is the science of getting computers to act without
being explicitly programmed.” – Stanford
• “Machine learning is based on algorithms that can learn from data
without relying on rules-based programming.” – McKinsey & Co.
• “Machine learning algorithms can figure out how to perform
important tasks by generalizing from examples.” – University of
Washington
• “The field of Machine Learning seeks to answer the question “How can
we build computer systems that automatically improve with
experience, and what are the fundamental laws that govern all
learning processes?” – Carnegie Mellon University
Source: https://www.techemergence.com/what-is-machine-learning/
What is Machine Learning?
To summarize:
A set of computing and mathematical techniques
whose aim is to achieve human-level or better-than-human
performance at cognitive tasks such as:
• Predicting
• Classifying
• Generating signals / interacting
• Etc.
Source: https://www.techemergence.com/what-is-machine-learning/
Complementary fields
(Diagram: DATA SCIENCE at the intersection of Machine Learning, Data Visualization, Cloud Computing, and “Business” Knowledge)
Differences between ML and Statistical
Modeling
Statistical Modeling:
• Parametric models that try to “explain” the world; the focus is on modeling causality
• Deduce relations for observed quantities by parameter estimation for a pre-specified model of the world
• Small data (1-100 attributes, 100-1,000 examples)
• Scalability is typically not the major concern
• Based on a probabilistic approach
Machine Learning:
• Non-parametric models that try to “mimic” the world rather than “explain” it; often uses correlations as proxies for causality
• Induce relations between observable quantities; the main goal is predictive power
• Large data (10-100K attributes, 1K-100M examples)
• Scalability is often critical in applications
• Some ML methods are not probabilistic (SVM, neural networks, clustering, etc.)
Where does it come from?
Where does it come from? Pioneer age
1943 – McCulloch-Pitts neurons (a neuroscientist and a logician)
1950 – Alan Turing envisioned ML
1952 – Arthur Samuel, self-improving checkers program
1957 – Frank Rosenblatt, perceptron
1959 – David H. Hubel and Torsten Wiesel, simple vs complex cells
1960 – Henry J. Kelley, control theory → backpropagation
1965 – Alexey Ivakhnenko and V.G. Lapa, Group Method of Data
Handling, 8-layer DNN
1980 – Kunihiko Fukushima, Neocognitron (pattern recognition), led to CNN
1982 – John Hopfield, Hopfield Network, RNN
1985 – Terry Sejnowski, NETtalk, English pronunciation
1986 – Rumelhart, Geoffrey Hinton and Ronald J. Williams,
backpropagation
1989 – Yann LeCun, handwritten digits with CNN
1989 – Christopher Watkins, Q-learning for Reinforcement Learning
Source: https://www.import.io/post/history-of-deep-learning/
What now? Modern days
1993 – Jürgen Schmidhuber, 1,000-layer RNN
1995 – Corinna Cortes and Vladimir Vapnik, SVM
1997 – Jürgen Schmidhuber and Sepp Hochreiter, LSTM
1997 – IBM’s Deep Blue beats Garry Kasparov
1998 – Yann LeCun, stochastic gradient descent
2009 – Fei-Fei Li, ImageNet
2011 – IBM’s Watson wins Jeopardy!
2012 – Alex Krizhevsky’s AlexNet CNN wins ImageNet by a large margin
2014 – Facebook’s DeepFace
2014 – Ian Goodfellow, Generative Adversarial Network
2016-2017 – Google open-sources TensorFlow v1.0
Source: https://www.import.io/post/history-of-deep-learning/
What now? Modern days
Source: https://www.import.io/post/history-of-deep-learning/
Why now?
(Diagram: MACHINE LEARNING at the intersection of ALGORITHMS, COMPUTING RESOURCES, and DATA)
At the end of the ’90s we had the algorithms but…
… we were lacking in the other departments…
Why now?
Trillion-fold increase of computing power and storage
Source: http://www.visualcapitalist.com/visualizing-trillion-fold-increase-computing-power/
Why now?
Data, always more data
Why now?
Data, always more data
Source: Computerworld, 2011
Source: Forbes, 2017
TB = 10¹² bytes
PB = 10¹⁵ bytes
EB = 10¹⁸ bytes
ZB = 10²¹ bytes
Pause for thought: Artificial vs Natural Intelligence
Name — # of neurons / # of synapses
Caenorhabditis elegans — 302
Hydra vulgaris — 5,600
Homarus americanus — 100,000
Blatta orientalis — 1,000,000
Nile crocodile — 80,500,000
Digital Reasoning NN (2015) — ~86,000,000 (est.) / 1.6×10¹¹
Rattus Rattatouillensis — 200,000,000
Blue-and-yellow macaw — 1,900,000,000
Chimpanzee — 28,000,000,000
Homo sapiens sapiens — 86,000,000,000 / 1.5×10¹⁴
African elephant — 257,000,000,000
Source: https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons
Machine Learning in the Sky
• Machine Learning owes a lot to astronomy: least-square regression
for orbital parameter estimation (Legendre-Laplace-Gauss)
Machine Learning in the Sky
Data Big Bang in Astronomy too:
10⁹-object photometric catalogs from USNO, 2MASS, SDSS…
10⁶⁻⁸-object spectroscopic catalogs from SDSS, LAMOST…
10⁶⁻⁷-source multi-wavelength catalogs from WISE, eROSITA…
10⁹ objects × 10² epochs in surveys like LSST, DES, PTF, CRTS, SNF, VVV, Pan-STARRS, Stripe 82
Spectral-image datacubes from VLA, ALMA, IFUs…
Machine Learning in the sky
Supernovae of data
Sources: Computer World, https://www.lsst.org/scientists/keynumbers
LSST
DR11: 37×10⁹ objects, 7×10¹² sources, 5.5 million 3.2-gigapixel images
30 terabytes of data nightly
Final volume of raw image data = 60 PB
Final image collection (DR11) = 0.5 EB
Final catalog size (DR11) = 15 PB
Final disk storage = 0.4 EB
Peak number of nodes = 1,750
Peak compute power in LSST data centers ≈ 2 PFLOPS
Machine Learning in the Sky
Explosion in number of papers too:
• From 2010 till 2018, 446 astro-ph papers on arXiv with “Machine
Learning” in the abstracts
• Only 5 papers in 2010
• 80% of the total were published after September 2014
• In all fields of astrophysics
The Machine Learning landscape
Supervised Learning
• Regression: learn a real-valued function given (Xi, Yi) — Rⁿ → R
• Classification: learn a discrete class function given (Xi, Ci) — Rⁿ → [1,k]
Unsupervised Learning
• Clustering: learn a discrete class function given (Xi) only — Rⁿ → [1,k]
• Representation Learning: learn a representing function given (Xi) only — Rⁿ → Rᵏ
The Machine Learning landscape
Reinforcement Learning
• Policy Optimization: learn a policy function given (si, si+1, ai, ri) — Rⁿ → Rᵏ
• Inverse RL: learn a reward function given (si, si+1, ai) — Rⁿ → R
Additional categories
Transfer learning
Semi-supervised learning
Active learning
Sequence modeling
RL methods for Supervised and
Unsupervised Learning
The Machine Learning landscape
Supervised Learning
• Regression: Linear Regression, Trees / CART, SVM/SVR, Ensemble methods, Neural Networks
• Classification: Logistic Regression, Naive Bayes, Nearest neighbors, SVM, Decision trees, Ensemble methods
Unsupervised Learning
• Clustering: K-means, Hierarchical clustering, Gaussian mixtures, Hidden Markov models, NN (SOM/ART)
• Representation Learning: PCA/ICA, Factor models, Dimensionality reduction, Manifold learning, NN (GAN/VAE/AR)
Overview of the Machine Learning landscape
Reinforcement Learning
• Policy Optimization: model-based RL, model-free RL, batch/online RL, linear models, neural networks
• Inverse RL: model-based IRL, model-free IRL, batch/online IRL, MaxEnt IRL, neural networks
• Neural networks are the most universal (and scalable) approach
• Two types of methods tend to dominate Kaggle competitions:
• Ensemble methods (Random Forests and Gradient Boosting)
• Deep Learning
Overview of the Machine Learning landscape
Overview of the Machine Learning landscape
Top of the class
Machine Learning Pipeline
Data Preparation: Raw Dataset → Load Data → Explore Data → Clean Data → Normalize → Select Features → Prepared Data
Training: Apply Algorithm (ML Algorithms) → Evaluate & Tune → Deploy Model → Publish!
Machine Learning: neural networks
Single neuron: computation structure inspired by nature
a = Σᵢ wᵢ xᵢ ,   z = g(a)
Inputs x₁ … xₙ are weighted by w₁ … wₙ and summed into the activation a, which is passed through the activation function g to give the output z.
If g = identity or sigmoid → linear or logistic regression
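A minimal NumPy sketch of this single neuron (a sigmoid g and a bias b are assumed for illustration):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b=0.0, g=sigmoid):
    a = np.dot(w, x) + b   # activation a = sum_i w_i x_i (+ bias)
    return g(a)            # output z = g(a)

x = np.array([0.5, -1.2, 3.0])   # inputs x_1 .. x_n
w = np.array([0.1, 0.4, -0.2])   # weights w_1 .. w_n
print(neuron(x, w))              # with g = sigmoid, this is logistic regression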
Machine Learning: neural networks
Neural networks are connected layers of artificial neurons
Machine Learning: neural networks
All sorts of
architectures!
Machine Learning: neural networks
Pick activation functions adapted to desired output
For multi-class output, choose Softmax function:
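The softmax turns a vector of scores z into K class probabilities:

$\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \dots, K$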
Machine Learning: Deep Learning
POWERFUL CPU/GPU x BIG DATA => LEVERAGE ALGORITHMS
Size matters!
(Diagram: a shallow network vs a deep network)
Try playground.tensorflow.org
Some Useful Concepts
• Parameters and Hyperparameters
• Underfitting / Overfitting / Bias-variance trade-off
• Training/Dev/Test sets
• Loss or cost function
• Forward propagation / Back-propagation
• Batch vs mini-batch vs stochastic descent
• Dimensionality reduction
• Data augmentation
• Performance Metrics
Some Useful Concepts
Parameters and Hyperparameters
• Parameters are learned from the data
• Hyperparameters are set a priori then tuned
Examples :
Model — Parameters — Hyperparameters
Linear regression — coefficients, intercept — number of features
k-means — indexing of clusters — number of clusters k
Neural network — weights, biases — number of layers, number of neurons per layer, activation functions, learning rate, epochs / batch size, etc.
Some Useful Concepts
Underfitting and overfitting
Mismatch between the number of model parameters and the data
Some Useful Concepts
Bias-variance trade-off
• Related to underfitting and overfitting
• The model must know the data well, but not too well, in order to generalize
Sweet spot
Some Useful Concepts
Bias-variance trade-off
Low bias: model learned data well
Low variance: model can generalize well
Remedies
High Bias
• Train longer
• Increase model complexity
• more features
• more parameters
• richer architecture
High Variance
• Get more data
• Decrease model complexity
• fewer features
• fewer parameters
• simpler architecture
• Regularization
• Early stopping
• Drop-out
Some Useful Concepts
Training/dev/test sets
• Training set to fit model with a priori hyper-parameters
• Dev or (cross-)validation set to tune hyper-parameters
• Test set to assess final performance of model on unseen data
• Typical splits: 60/20/20 or 80/10/10, or even 98/1/1 with very large datasets in deep learning
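A minimal scikit-learn sketch of a 60/20/20 split (toy data; the second split takes 25% of the remaining 80%, i.e. 20% of the total):

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(100, 5), np.random.randint(0, 2, 100)   # toy data
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# 0.25 x 0.8 = 0.2 -> 60/20/20 overall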
Some Useful Concepts
Loss function
• Depends on problem tackled
• Measures the fit between current
output and target output
• Must decrease as training goes on… on average!
Source: https://heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0
Some Useful Concepts
Forward propagation and backpropagation
Forward propagation: get estimates during training and predictions afterwards
Backpropagation: apply the chain rule to the gradient of the loss function to adjust the weights/biases
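A minimal NumPy sketch of both passes for a single linear neuron with an MSE loss (toy data; the gradient comes from the chain rule):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # toy inputs
y = X @ np.array([1.0, -2.0, 0.5])         # toy targets
w = np.zeros(3)                            # weights to learn
for epoch in range(100):
    y_hat = X @ w                          # forward propagation
    grad = 2 * X.T @ (y_hat - y) / len(y)  # backpropagation (chain rule on MSE)
    w -= 0.1 * grad                        # adjust weights
print(w)                                   # approaches [1, -2, 0.5]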
Some Useful Concepts
Batch vs mini-batch vs stochastic descent
Batch: feed the whole training set at each training epoch
Mini-batch: feed subsets (random or not) at each training epoch
Stochastic descent: mini-batch of size 1
It’s a tradeoff!
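A minimal sketch of a mini-batch loop (minibatches is an illustrative helper, not a library function; batch_size = len(X) gives batch descent, batch_size = 1 stochastic descent):

import numpy as np

def minibatches(X, y, batch_size, rng):
    idx = rng.permutation(len(X))            # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

rng = np.random.default_rng(0)
X, y = rng.normal(size=(10, 2)), rng.integers(0, 2, 10)   # toy data
for xb, yb in minibatches(X, y, batch_size=4, rng=rng):
    pass   # one gradient update per mini-batch goes here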
Dimensionality reduction
• Too many features:
• Expensive to store
• Slowing down computation
• Subject to the curse of dimensionality
• Sample space gets harder and harder to fill as dimensions grow
• A reason why too many features lead to overfitting, as data become sparse
• More and more data needed to fill the same % of space
Select the features! And use PCA/ICA/SVD/LDA/QDA/Autoencoders…
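A minimal scikit-learn sketch of one such remedy, PCA (toy data; n_components is illustrative):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 50)                   # 200 examples, 50 features
pca = PCA(n_components=10)                    # keep the 10 strongest components
X_reduced = pca.fit_transform(X)              # shape (200, 10)
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained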
Some Useful Concepts
Data Augmentation
• When more data are needed, make up new ones!
• Translate, rotate, flip, crop, lighten/darken, add noise, dephase, etc.
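A minimal NumPy sketch of a few such augmentations (a small 2-D array stands in for an image):

import numpy as np

img = np.arange(16.0).reshape(4, 4)     # stand-in for an image
flipped = np.fliplr(img)                # horizontal flip
rotated = np.rot90(img)                 # 90-degree rotation
noisy = img + np.random.normal(scale=0.1, size=img.shape)   # add noise
darkened = img * 0.8                    # darken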
Some Useful Concepts
Performance metrics
• Compare error to simplest method as a benchmark, e.g. linear
regression or logistic regression
Classification problems
• Accuracy
• Precision-recall / F1-score
• ROC-AUC
• Confusion matrix
• Log-Loss
Regression problems
• MSE / RMSE / MSPE / MAE
• R² / adjusted R²
Source: https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234
(Regression metrics are not discussed in this document)
Some Useful Concepts
Classifier performance metric
Accuracy
(Diagram: a classified population split into true positives, true negatives, false positives and false negatives)
Some Useful Concepts
Classifier performance metric
Accuracy = (TP + TN) / All cases
Some Useful Concepts
Classifier performance metric
Accuracy = (TP + TN) / All cases
• Counts whenever the classifier is right
• Simple and intuitive metric
BUT
• Assigns same cost to false positives
and false negatives
• Use with caution because of the accuracy paradox: a dumb classifier that always predicts the majority class can score a high accuracy!
• Absolutely avoid with highly
imbalanced classes
Some Useful Concepts
Classifier performance metric
Precision vs Recall
Some Useful Concepts
Classifier performance metric
Precision vs Recall
Precision = TP / (TP + FP)
• High precision means high selectivity
• A selected sample has a high probability of belonging to the correct class
• But some actual positives may have been brushed off
• Low precision means lots of false positives
Some Useful Concepts
Classifier performance metric
Precision vs Recall
Recall = TP / (TP + FN)
• High recall means most positives
have been identified as such, at the
cost of (some) false positives
• Low recall means lots of false
negatives
Some Useful Concepts
Classifier performance metric
F1-score
• F1-score synthesizes both precision and recall
F1 = 2 × Precision × Recall / (Precision + Recall)
• Need to take the desirable trade-off into account:
• E.g. cancer diagnostics: better to have a higher recall, to minimize false negatives
• E.g. spam detection: better to let some spam through (false negatives) than to filter out legitimate emails (false positives)
• E.g. zombie apocalypse scenario: better to have high precision, to avoid letting infected people into the safe zone…
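These metrics are available directly in scikit-learn; a minimal sketch on toy labels:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4
print(f1_score(y_true, y_pred))          # harmonic mean = 0.75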
Some Useful Concepts
Classifier performance metric
AUC-ROC
Area Under the Curve – Receiver Operating Characteristic
TPR: True Positive Rate (sensitivity); FPR: False Positive Rate (1 − specificity)
A good classifier has high sensitivity and high specificity
Source: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
Some Useful Concepts
Classifier performance metric
AUC-ROC
How good is the model at distinguishing between classes at different
thresholds?
How many false positives do you pay for your true positives?
Source: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
(ROC curve: lowering the threshold moves from few true and false positives to lots of both)
Ideal case: AUC = 1
Some Useful Concepts
Classifier performance metric
AUC-ROC
Source: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
https://towardsdatascience.com/choosing-the-right-metric-is-a-huge-issue-99ccbe73de61
Some Useful Concepts
Classifier performance metric
Confusion matrix
• Interesting for analysis of classifier performance on multiclass set
(Matrix axes: true class vs predicted class)
Some Useful Concepts
Classifier performance metric
Log-Loss
• Adapted to binary outputs and multi-class data sets (if not too imbalanced)
• Punishes extreme probability values when these are wrong
Binary: $-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$
If more than 2 classes: $-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{i,c}\log(p_{i,c})$
Source: http://wiki.fast.ai/index.php/Log_Loss
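A minimal scikit-learn sketch, showing how confident wrong predictions are punished (toy labels and probabilities):

from sklearn.metrics import log_loss

y_true = [1, 0, 1]
print(log_loss(y_true, [0.9, 0.1, 0.8]))   # confident and right: small loss
print(log_loss(y_true, [0.1, 0.9, 0.2]))   # confident and wrong: large loss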
Some Useful Concepts
• So, which metric to choose??? Well, it depends…
Source: https://medium.com/usf-msds/choosing-the-right-metric-for-evaluating-machine-learning-models-part-2-86d5649a5428
Binary classes, balanced:
• If probability differences are critical: Log-Loss
• If only the class prediction matters and the threshold can be tuned: ROC-AUC score
• The F1 score is sensitive to the threshold; tune it before comparing
Binary classes, imbalanced:
• Small class, >0 or <0: ROC-AUC score
• Small positive class: F1
Multi-class, balanced: confusion matrix, Log-Loss
Multi-class, imbalanced: averaging of precision/recall over classes (macro-averaging)
• There are other metrics: Cohen’s kappa, Jaccard index, G-score…
Machine Learning Tools
Main Python libraries
Name — Use
Pandas — data analysis
Spark — distributed computing
Scikit-learn — machine learning toolbox
Keras — deep learning
TensorFlow — deep learning
OpenCV — computer vision
Machine Learning Tools
Artificial Intelligence
Machine Learning
Neural Networks
Deep Learning
Machine Learning Tools
TensorFlow – Keras Domination
Machine Learning Tools
TensorFlow – Keras Domination
Zoom on Scikit-Learn logic
1- Import model
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier
2 - Instantiate model class
clf = svm.SVC(gamma=0.001, C=100.)
knn = KNeighborsClassifier()
3 - Train with the fit() method
knn.fit(iris_X_train, iris_y_train)
4 - Make predictions with predict()
clf.predict(digits.data[-1:])
knn.predict(iris_X_test)
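Putting the four steps together, a minimal runnable sketch on the iris dataset (the train/test split is added for illustration):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
knn = KNeighborsClassifier()       # 2 - instantiate
knn.fit(X_train, y_train)          # 3 - train
print(knn.predict(X_test[:5]))     # 4 - predict
print(knn.score(X_test, y_test))   # accuracy on held-out data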
Zoom on Keras logic
1 – Import model and layer classes
from keras.models import Sequential
from keras.layers import Dense, Activation
2 - Instantiate model class
model = Sequential()
3 - Add layers with the add() method specifying input_dim or input_shape
model.add(Dense(32, input_dim=784))
4 - Add activation functions
model.add(Activation('relu'))
5 - Configure training with compile(loss=,optimizer=, metrics[])
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
metrics=['accuracy'])
6 - Train with the fit() method
model.fit(data, labels, epochs=10, batch_size=32)
7- Evaluate the model performance with the evaluate() method:
score = model.evaluate(x_test, y_test, verbose=0)
8 – Make predictions with predict():
predictions = model.predict(x_test)
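The same steps assembled into one minimal runnable sketch (random toy data and a binary target are assumed):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

data = np.random.random((1000, 784))            # toy inputs
labels = np.random.randint(2, size=(1000, 1))   # toy binary targets

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))       # binary output layer
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(data, labels, epochs=10, batch_size=32)
score = model.evaluate(data, labels, verbose=0)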
Zoom on TensorFlow logic
1 – Define a computation graph:
2 – Start a TensorFlow session
3 – Actually execute the graph implementing nested loops on epochs and batches
Source: https://www.datacamp.com/community/tutorials/cnn-tensorflow-python
tf.Variable for values to be optimized (weights and biases)
tf.constant as needed
tf.placeholder for inputs
All operations have a tf counterpart
Zoom on TensorFlow: Logic
Basic example:
import tensorflow as tf

# tf Graph input
a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)
# Define some operations
add = tf.add(a, b)
mul = tf.multiply(a, b)
# Launch the default graph.
with tf.Session() as sess:
    # Run every operation with variable input
    print("Addition with variables: %i" % sess.run(add, feed_dict={a: 2, b: 3}))
    print("Multiplication with variables: %i" % sess.run(mul, feed_dict={a: 2, b: 3}))
Source: https://www.datacamp.com/community/tutorials/cnn-tensorflow-python
Random Forest concept
Random forest uses decision trees as base learners
Regression
Classification
Decision trees are built so that the
splits are prioritised by the
amount of information provided
Random Forest concept
Random forests are built by applying many decision trees to random
subsets and random feature subsets: ensemble learning (here bagging)
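A minimal scikit-learn sketch (iris data; n_estimators sets the number of trees in the ensemble):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0)   # 100 bagged trees
rf.fit(X, y)
print(rf.feature_importances_)   # contribution of each feature to the splits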
Gradient Boosting concept
• Boosting: creates a series of weak learners where new ones focus on
data hard to classify. At the end of the process all learners are
weighted and combined
• Boosting can lead to
overfitting, stop
early enough!
• Many variants:
Gradient Boosting,
XGBoost, AdaBoost,
Gentle Boost
• XGBoost is state-of-
the-art
Gradient Boosting concept
Gradient Boosting vs Random Forest
Gradient Boosting vs Random Forest:
• Base learners — GB: trees, linear regression; RF: trees
• Bias-variance of learners — GB: stumps (high bias, low variance); RF: full trees (low bias, high variance)
• Hyperparameter tuning — GB: lots! (see next page); RF: number of trees!
• Performance — GB: #1; RF: close 2nd
Gradient Boosting concept
Some important hyperparameters for gradient boosting (scikit-learn / XGBoost)
that limit tree growth:
• max_features
• min_samples_split
• min_samples_leaf
• max_depth
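A minimal scikit-learn sketch using these hyperparameters (toy regression data; the values shown are illustrative, not tuned):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, min_samples_split=5,
                                min_samples_leaf=3, max_features='sqrt')
gbr.fit(X, y)
print(gbr.score(X, y))   # R^2 on the training data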
Self-organizing map concept
• Inspired by the specialization of neural areas in natural brains
• Each neuron on the grid starts with a random vector of the same dimension as the input
• The neuron closest to a given input vector, together with its neighbours, is nudged toward that input
• Used for clustering, classification and visualization
• Kohonen 1984
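A minimal NumPy sketch of one SOM update step, per the rule above (grid size, learning rate and neighbourhood radius are illustrative):

import numpy as np

rng = np.random.default_rng(0)
grid = rng.random((10, 10, 3))                   # 10x10 map of 3-D weight vectors

def som_step(grid, x, lr=0.5, radius=1.5):
    d = np.linalg.norm(grid - x, axis=2)         # distance of every node to input
    bmu = np.unravel_index(d.argmin(), d.shape)  # best-matching unit
    ii, jj = np.indices(d.shape)
    dist2 = (ii - bmu[0])**2 + (jj - bmu[1])**2  # grid distance to the BMU
    h = np.exp(-dist2 / (2 * radius**2))         # neighbourhood function
    grid += lr * h[..., None] * (x - grid)       # nudge BMU and neighbours

for x in rng.random((100, 3)):                   # one training pass
    som_step(grid, x)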
Autoencoder concept
• A neural network whose output equals the input
• Hour-glass shape as data is encoded then decoded
• A way to extract meaningful features
Compressed signal
with reduced dimensions
Autoencoder: MNIST example
Encoder: 32×32 → 16×16 → 8×8 | Decoder: 8×8 → 16×16 → 32×32
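A minimal dense Keras sketch with those layer sizes, on stand-in data (the exercise notebook's architecture may differ):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

autoencoder = Sequential()
autoencoder.add(Dense(256, activation='relu', input_dim=1024))   # 32x32 -> 16x16
autoencoder.add(Dense(64, activation='relu'))                    # 16x16 -> 8x8 (code)
autoencoder.add(Dense(256, activation='relu'))                   # 8x8 -> 16x16
autoencoder.add(Dense(1024, activation='sigmoid'))               # 16x16 -> 32x32
autoencoder.compile(optimizer='adam', loss='mse')

X = np.random.random((100, 1024))    # stand-in for flattened 32x32 images
autoencoder.fit(X, X, epochs=5)      # the output is trained to equal the input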
CNN concept
• Convolutional Neural Networks are a category of neural networks that have proven very effective in areas such as image recognition and classification. CNNs have been successful at identifying faces, objects and traffic signs, as well as powering vision in robots and self-driving cars.
Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
CNN concept
Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
2D Convolution:
• Apply a filter to the image, moving it at a certain stride, to build a feature map
• Use several filters (depth)
(Diagram: a small filter sliding across the image to produce a feature map)
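A minimal NumPy sketch of a single-filter, stride-1, no-padding 2D convolution:

import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))                 # the feature map
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(5, 5)
edge_filter = np.array([[1., 0., -1.]] * 3)  # simple vertical-edge filter
print(conv2d(image, edge_filter))            # 3x3 feature map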
CNN concept
Sources: https://www.learnopencv.com/image-classification-using-convolutional-neural-networks-in-keras/
What are “features”?
CNN concept
Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
2D Convolution:
• Apply a Rectified Linear Unit (ReLU): f(x) = max(0, x)
CNN concept
Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
2D Convolution:
• Apply pooling to the rectified feature maps
CNN concept
Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
2D Convolution:
• Apply convolution + ReLU + pooling several times
• Pass the output to a traditional Multi-Layer Perceptron
• A softmax output layer provides probabilities per class
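The whole pipeline as a minimal Keras sketch (the layer sizes and 28×28 input shape are illustrative):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # conv + ReLU
model.add(MaxPooling2D((2, 2)))                                            # pooling
model.add(Conv2D(64, (3, 3), activation='relu'))                           # repeat
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))           # traditional MLP
model.add(Dense(10, activation='softmax'))         # probabilities per class
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])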
GAN concept
Generator: trying to forge the data distribution
Discriminator: trying to sort out real from fake
They share a common loss function with opposite goals (minimax)
• A generative adversarial network learns a distribution, not a relationship
• An alternative is the variational autoencoder
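The shared objective is the minimax game (Goodfellow et al. 2014):

$\min_G \max_D \; \mathbb{E}_{x \sim p_\mathrm{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$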
GAN concept
Source: https://research.nvidia.com/sites/default/files/pubs/2017-10_Progressive-Growing-of/karras2018iclr-paper.pdf
Who are these people???
Machine Learning: references
Papers
• Check out arXiv for machine learning in astro-ph…
MOOCs
• All ML courses on Coursera by Andrew Ng
• Deep Learning A-Z™: Hands-On Artificial Neural Networks on Udemy
• Fast.ai courses
Books
“Statistics, Data Mining and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data” by Željko Ivezić et al.
“Data Science from Scratch with Python: Step-by-Step Beginner Guide for Statistics, Machine Learning, Deep Learning and NLP using Python, Numpy, Pandas, Scipy, Matplotlib, Scikit-Learn, TensorFlow” by Peter Morgan
“Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems” by Aurélien Géron
Hands-on exercises roadmap
1) keras_log_reg_EX (data = MNIST)
   a) Complete the missing lines
2) tf_log_reg_EX (data = iris.csv)
   a) Complete the missing lines
   b) Play with the learning rate
3) sk_xgboost_regression_EX (data = Boston)
   a) Complete the missing lines
   b) Play with the learning rate
   c) Find a good value for n_estimators
   d) Have a look at the feature importances and a sample tree
4) sk_sdss_EX (data = sdss_data.csv)
   a) Reply to the questions in the notebook as you execute cell after cell
5) tf_AutoEncoder_Fashion_EX (data = fashion-mnist_train.csv and fashion-mnist_test.csv)
   a) Reply to the questions
   b) Make the suggested trials
6) keras_gan_bimodal2_EX (data generated in the notebook)
7) Check out the others
KIITOS !!! (Thank you!)

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Deep Learning for Non-Programmers
Introduction to Deep Learning for Non-ProgrammersIntroduction to Deep Learning for Non-Programmers
Introduction to Deep Learning for Non-ProgrammersOswald Campesato
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature surveyAkshay Hegde
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual IntroductionLukas Masuch
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural NetworksAniket Maurya
 
Yann le cun
Yann le cunYann le cun
Yann le cunYandex
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep LearningDavid Khosid
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning TutorialAmr Rashed
 
Deep Learning - Overview of my work II
Deep Learning - Overview of my work IIDeep Learning - Overview of my work II
Deep Learning - Overview of my work IIMohamed Loey
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningMustafa Aldemir
 
Deep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningDeep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningRenārs Liepiņš
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxChun-Hao Chang
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningdoppenhe
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksMark Scully
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleRoelof Pieters
 
Sparse Distributed Representations: Our Brain's Data Structure
Sparse Distributed Representations: Our Brain's Data Structure Sparse Distributed Representations: Our Brain's Data Structure
Sparse Distributed Representations: Our Brain's Data Structure Numenta
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItHolberton School
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOleg Mygryn
 

Was ist angesagt? (20)

HTM Theory
HTM TheoryHTM Theory
HTM Theory
 
Introduction to Deep Learning for Non-Programmers
Introduction to Deep Learning for Non-ProgrammersIntroduction to Deep Learning for Non-Programmers
Introduction to Deep Learning for Non-Programmers
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
 
Yann le cun
Yann le cunYann le cun
Yann le cun
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
 
Deep Learning Tutorial
Deep Learning TutorialDeep Learning Tutorial
Deep Learning Tutorial
 
Deep learning
Deep learningDeep learning
Deep learning
 
Deep Learning - Overview of my work II
Deep Learning - Overview of my work IIDeep Learning - Overview of my work II
Deep Learning - Overview of my work II
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Deep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningDeep Learning and Reinforcement Learning
Deep Learning and Reinforcement Learning
 
From Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptxFrom Conventional Machine Learning to Deep Learning and Beyond.pptx
From Conventional Machine Learning to Deep Learning and Beyond.pptx
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Intro To Convolutional Neural Networks
Intro To Convolutional Neural NetworksIntro To Convolutional Neural Networks
Intro To Convolutional Neural Networks
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with style
 
Sparse Distributed Representations: Our Brain's Data Structure
Sparse Distributed Representations: Our Brain's Data Structure Sparse Distributed Representations: Our Brain's Data Structure
Sparse Distributed Representations: Our Brain's Data Structure
 
Deep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do ItDeep Learning Class #0 - You Can Do It
Deep Learning Class #0 - You Can Do It
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 

Ähnlich wie Big Sky Earth 2018 Introduction to machine learning

Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningAmr Rashed
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningXavier Llorà
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeSiby Jose Plathottam
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaSpark Summit
 
Real-Time Streaming Data Analysis with HTM
Real-Time Streaming Data Analysis with HTMReal-Time Streaming Data Analysis with HTM
Real-Time Streaming Data Analysis with HTMNumenta
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceLukas Masuch
 
How Can Machine Learning Help Your Research Forward?
How Can Machine Learning Help Your Research Forward?How Can Machine Learning Help Your Research Forward?
How Can Machine Learning Help Your Research Forward?Wouter Deconinck
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson StudioSasha Lazarevic
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamWithTheBest
 
Machine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdfMachine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdfCarlos Paredes
 
AI & ML in Defence Systems - Sunil Chomal
AI & ML in Defence Systems   - Sunil ChomalAI & ML in Defence Systems   - Sunil Chomal
AI & ML in Defence Systems - Sunil ChomalSunil Chomal
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentationAras Masood
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Julien SIMON
 

Ähnlich wie Big Sky Earth 2018 Introduction to machine learning (20)

AI Presentation 1
AI Presentation 1AI Presentation 1
AI Presentation 1
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
ML.pdf
ML.pdfML.pdf
ML.pdf
 
Deep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and HypeDeep learning: Cutting through the Myths and Hype
Deep learning: Cutting through the Myths and Hype
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
 
Real-Time Streaming Data Analysis with HTM
Real-Time Streaming Data Analysis with HTMReal-Time Streaming Data Analysis with HTM
Real-Time Streaming Data Analysis with HTM
 
Deep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial IntelligenceDeep Learning - The Past, Present and Future of Artificial Intelligence
Deep Learning - The Past, Present and Future of Artificial Intelligence
 
How Can Machine Learning Help Your Research Forward?
How Can Machine Learning Help Your Research Forward?How Can Machine Learning Help Your Research Forward?
How Can Machine Learning Help Your Research Forward?
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Ml ppt at
Ml ppt atMl ppt at
Ml ppt at
 
Machine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdfMachine_Learning_with_MATLAB_Seminar_Latest.pdf
Machine_Learning_with_MATLAB_Seminar_Latest.pdf
 
AI & ML in Defence Systems - Sunil Chomal
AI & ML in Defence Systems   - Sunil ChomalAI & ML in Defence Systems   - Sunil Chomal
AI & ML in Defence Systems - Sunil Chomal
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 

Kürzlich hochgeladen

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 

Kürzlich hochgeladen (20)

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 

Big Sky Earth 2018 Introduction to machine learning

  • 1. Pintxos/voileipäpöytä /закуски of Machine Learning BigSkyEarth 2018 Kaarina
  • 2. Menu • What is Machine Learning? • Where does it come from? • What now? • Why now? • Machine Learning in the Sky • The Machine Learning Landscape • Machine Learning Pipeline • Neural Networks • Some useful concepts • Machine Learning Tools • Zoom on selected libraries • Zoom on a few algorithms • Random Forest • Gradient Boosting • Kohonen’s map • Autoencoder • Convolutional Neural Network • Generative Adversarial Network
  • 3. What is Machine Learning? Machine… … Learning?!?
  • 4. What is Machine Learning? Not quite so exciting… Learning? NO! Nor thinking… More algorithms enabling to fit complex data relationships More like advanced statistical inference More like implicit programming More like extracting information dynamic from data for generalization… Sobering thought: linear regression belongs to Machine Learning! That said, some mimicking taking place: ➢ Trying to improve a system’s response to novel perception thanks to experience ➢ Human biology inspired artificial neural networks
  • 5. What is Machine Learning? Artificial Intelligence Machine Learning Neural Networks Deep Learning
  • 6. What is Machine Learning? • “Machine Learning at its most basic is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.” – Nvidia • “Machine learning is the science of getting computers to act without being explicitly programmed.” – Stanford • “Machine learning is based on algorithms that can learn from data without relying on rules-based programming.”- McKinsey & Co. • “Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.” – University of Washington • “The field of Machine Learning seeks to answer the question “How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?” – Carnegie Mellon University Source: https://www.techemergence.com/what-is-machine-learning/
  • 7. What is Machine Learning? To summarize: A set of computing and mathematical techniques whose aim is to achieve human-level or better-than- human performance at cognitive tasks such as: •Predicting •Classifying •Generating signals / interacting •Etc. Source: https://www.techemergence.com/what-is-machine-learning/
  • 9. Differences between ML and Statistical Modeling Statistical Modeling Machine Learning Parametric models that try to “explain” the world. The focus is on modeling causality Non-parametric models that try to “mimic” the world rather than “explain” it. Often uses correlations as proxies to causality Deduce relations for observed quantities by parameter estimation for a pre-specified model of the world Induce relations between observable quantities, main goal is predictive power Small data (1-100 attributes,100- 1000 examples) Large data (10-100K attributes, 1K- 100M examples) Scalability is typically not the major concern Scalability is often critical in applications Based on a probabilistic approach Some ML methods are not probabilistic (SVM, neural networks, clustering, etc.)
  • 10. Where does it come from?
  • 11. Where does it come from? Pioneer age 1943 – McCulloch-Pitts neurons (neuro-scientist and logician) 1950 - Alan Turing envisioned ML 1952 – Arthur Samuel self-improving chess program 1957 – Frank Rosenblatt, perceptron 1959 – David H. Hubel and Torsten Wiesel simple vs complex cells 1960 – Heny J. Kelley Control Theory  Backpropagation 1965 – Alexey Ivakhnenko and V.G. Lapa Group Method of Data Handling, 8-layer DNN 1980 – Kunihiko Fukushima Neocognitron (pattern recog’), led to CNN 1982 – John Hopfield, Hopfield Network, RNN 1985 – Terry Sejnowski NETtalk, English pronounciation 1986 – Rumelhart, Geoffrey Hinton and Romuald J. Williams, backpropagation 1989 – Yann LeCun, handwritten digits with CNN 1989 – Christopher Watkins, Q-learning for Reinforcement Learning Source: https://www.import.io/post/history-of-deep-learning/
  • 12. What now? Modern days 1993 – Jürgen Schmidhuber, 1000-layers RNN 1995 – Corinna Cortes and Vladimir Vapnik, SVM 1997 - Jürgen Schmidhuber and Sepp Hochreiter, LSTM 1997 – IBM’s Deep Blue beat Garry Kasparov 1998 – Yann Lecun, stochastic gradient descent 2009 – Fei-Fei Li, ImageNet 2011 – Alex Krizhevsky, AlexNet CNN 2011 – IBM’s Watson wins Jeopardy 2012 – ImageNet won by AlexNet, better than humans 2014 – Facebook’s DeepFace 2014 – Ian Goodfellow, Generative Adversarial Network 2016-2017 - Google TensorFlow v1.0 in open source Source: https://www.import.io/post/history-of-deep-learning/
  • 13. What now? Modern days Source: https://www.import.io/post/history-of-deep-learning/
  • 14. Why now? ALGORITHMS COMPUTING RESSOURCES DATA MACHINE LEARNING End of the 90’s we had the algorithms but… … we were lacking in other departments…
  • 15. Why now? Trillion-fold increase of computing power and storage Source: http://www.visualcapitalist.com/visualizing-trillion-fold-increase-computing-power/
  • 17. Why now? Data, always more data Source: Computerworld, 2011 Source: Forbes, 2017 TB = 1012 Bytes PB = 1015 Bytes EB = 1018 Bytes ZB = 1021 Bytes
  • 18. Pause for thought: Artificial vs Natural Intelligence Name # of neurons / # of synapses Visuals Caenorhabditis elegans 302 Hydra vulgaris 5,600 Homarus americanus 100,000 Blatta Orientalis 1,000,000 Nile Crocodile 80,500,000 Digital Reasoning NN (2015) ~86,000,000 (est.) / 1.6E11 Rattus Rattatouillensis 200,000,000 Blue and yellow macaw 1,900,000,000 Chimpanzee 28,000,000,000 Homo Sapiens Sapiens 86,000,000,000 / 1.5E14 African Elephant 257,000,000,000 Source: https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons
  • 19. Machine Learning in the Sky • Machine Learning owes a lot to astronomy: least-square regression for orbital parameter estimation (Legendre-Laplace-Gauss)
  • 20. Machine Learning in the Sky Data Big Bang in Astronomy too: 109 object photometric catalogs from USNO, 2MASS, SDSS… 106-8 spectroscopic catalogs from SDSS, LAMOST… 106-7 multi-wavelength source catalogs from WISE, eROSITA… 109 object x 102 epochs surveys like LSST, DES, PTF, CRTS, SNF, VVV, Pan- STARRS, Stripe 82 Spectral-image datacubes from VLA, ALMA, IUFs…
  • 21. Machine Learning in the sky Supernovae of data Sources: Computer World, https://www.lsst.org/scientists/keynumbers LSST DR11 37 109 objects, 7 1012 sources, 5.5 million 3.2 Gigapixel images 30 terabytes of data nightly Final volume of raw image data = 60 PB Final image collection (DR11) = 0.5 EB Final catalog size (DR11) = 15 PB Final disk storage = 0.4 Exabytes Peak number of nodes = 1750 nodes Peak compute power in LSST data centers = about 2 PFLOPS
  • 22. Machine Learning in the Sky Explosion in number of papers too: • From 2010 till 2018, 446 astro-ph papers on arXiv with “Machine Learning” in the abstracts • Only 5 papers in 2010 • 80% of the total were published after September 2014 • In all fields of astrophysics
  • 23. The Machine Learning landscape Supervised Learning Unsupervised Learning Regression Classification Learn real- valued function given (Xi , Yi) Learn discrete class function given (Xi , Ci) Clustering Representation Learning Learn discrete class function given (Xi ) only Learn representing function given (Xi ) only Rn →[1,k] Rn →[1,k] Rn → RkRn → R
  • 24. The Machine Learning landscape Reinforcement Learning Policy Optimization Inverse RL Learn policy function given (si , si+1, ai , ri ) Learn reward function given (si , si+1, ai ) Rn → Rk Rn → R Additional categories Transfer learning Semi-supervised learning Active learning Sequence modeling RL methods for Supervised and Unsupervised Learning
  • 25. The Machine Learning landscape Supervised Learning Unsupervised Learning Regression Classification Linear Regression Trees / CART SVM/SVR Ensemble methods Neural Networks Logistic Regression Naive Bayes Nearest neighbors SVM Decision trees Ensemble methods Clustering Representation Learning K-means Hierarchical clustering Gaussian mixtures Hidden Markov NN (SOM/ART) PCA/ICA Factor models Dim. reduction Manifold learning NN (GAN/VAE/AR)
  • 26. Overview of the Machine Learning landscape Reinforcement Learning Policy Optimization Inverse RL Model-based RL Model-free RL Batch/online RL Linear models RL Neural Networks Model-based IRL Model-free IRL Batch/online IRL MaxEnt IRL Neural networks • Neural networks is the most universal (and scalable) approach • Two types of methods tend to dominate Kaggle competitions: • Ensemble methods (Random Forests and Gradient Boosting) • Deep Learning
  • 27. Overview of the Machine Learning landscape
  • 28. Overview of the Machine Learning landscape Top of the class Top of the class
  • 30. Training Data Preparation Machine Learning Pipeline Raw Dataset Load Data Prepared Data Apply Algorithm Select Features Explore Data Clean Data Normalize ML Algorithms Evaluate & Tune Deploy Model Publish!
  • 31. Machine Learning: neural networks Single neuron: computation structure inspired by nature |g(a)𝑎 = ∑𝑤𝑖 𝑥𝑖 x1 x2 … xi … xn w2 w1 wi wn Activation Activation function z If g = identity or sigmoid Linear/logistic regression
  • 32. Machine Learning: neural networks Neural networks are connected layers of artificial neurons
  • 33. Machine Learning: neural networks All sorts of architectures!
  • 34. Machine Learning: neural networks Pick activation functions adapted to desired output For multi-class output, choose Softmax function:
  • 35. Machine Learning: Deep Learning POWERFUL CPU/GPU x BIG DATA => LEVERAGE ALGORITHMS Size matters! Not deep Deep Try playground.tensorflow.org
  • 36. Some Useful Concepts • Parameters and Hyperparameters • Underfitting / Overfitting / Bias-variance trade-off • Training/Dev/Test sets • Loss or cost function • Forward propagation / Back-propagation • Batch vs mini-batch vs stochastic descent • Dimensionality reduction • Data augmentation • Performance Metrics
  • 37. Some Useful Concepts Parameters and Hyperparameters • Parameters are learned from the data • Hyperparameters are set a priori then tuned Examples : Model Parameters Hyperparameters Linear regression Coefficients Intercept Number of features k-means Indexing of clusters Number of clusters k Neural Network Weights Biases Number of layers Number of neurons per layers Activation functions Learning rate Epochs / batch size Etc.
  • 38. Some Useful Concepts Underfitting and overfitting Mismatch between number of parameters and data
  • 39. Some Useful Concepts Bias-variance trade-off • Related to underfitting and overfitting • Know data well but not too well for generalization Sweet spot
  • 40. Some Useful Concepts Bias-variance trade-off Low bias: model learned the data well Low variance: model can generalize well Remedies High Bias: • Train longer • Increase model complexity: more features, more parameters, richer architecture High Variance: • Get more data • Decrease model complexity: fewer features, fewer parameters, simpler architecture • Regularization • Early stopping • Drop-out
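As an illustration, a minimal Keras sketch of two of the variance remedies above, drop-out and early stopping (the layer sizes and patience are illustrative choices, not recommendations):
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))                       # randomly silence half the units during training
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

early = EarlyStopping(monitor='val_loss', patience=3)   # stop when validation loss stalls
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early])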
  • 41. Some Useful Concepts Training/dev/test sets • Training set to fit model with a priori hyper-parameters • Dev or (cross-)validation set to tune hyper-parameters • Test set to assess final performance of model on unseen data • Typical splits 60/20/20 or 80/10/10 or 98/1/1 in deep learning
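A possible scikit-learn sketch of a 60/20/20 split (the toy X and y are made up for the example):
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 5), np.random.randint(0, 2, 1000)   # toy data
# carve out the test set first, then split the rest into train/dev
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)
# 0.25 of the remaining 80% = 20% dev, leaving 60% for training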
  • 42. Some Useful Concepts Loss function • Depends on the problem tackled • Measures the fit between current output and target output • Must decrease (on average!) as training goes on Source: https://heartbeat.fritz.ai/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0
  • 43. Some Useful Concepts Forward propagation and backpropagation • Forward propagation: get estimates during training, and predictions afterwards • Backpropagation: apply the chain rule to the gradient of the loss function to adjust weights and biases
  • 44. Some Useful Concepts Batch vs mini-batch vs stochastic descent Batch: feed the whole training set at each training epoch Mini-batch: feed subsets (random or not) at each training epoch Stochastic descent: mini-batch of size 1 It’s a tradeoff!
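In Keras this choice boils down to the batch_size argument of fit(); a hedged sketch, assuming a compiled model and arrays x, y as in the library examples later on:
model.fit(x, y, epochs=10, batch_size=len(x))   # batch: one update per epoch
model.fit(x, y, epochs=10, batch_size=32)       # mini-batch: the common compromise
model.fit(x, y, epochs=10, batch_size=1)        # stochastic descent: one update per sample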
  • 45. Some Useful Concepts Dimensionality reduction Too many features are: • Expensive to store • Slowing down computation • Subject to the curse of dimensionality: • Sample space gets harder and harder to fill as dimensions grow • A reason why too many features lead to overfitting, as data become sparse • More and more data needed to fill the same % of space Select the features! And use PCA/ICA/SVD/LDA/QDA/Autoencoders… (see the PCA sketch below)
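A minimal scikit-learn PCA sketch (toy data; the number of components kept is arbitrary):
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 50)         # 200 samples, 50 features
pca = PCA(n_components=10)          # keep the 10 directions of largest variance
X_reduced = pca.fit_transform(X)    # shape (200, 10)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained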
  • 46. Data Augmentation • When more data are needed, make up new ones! • Translate, rotate, flip, crop, lighten/darken, add noise, dephase, etc. Some Useful Concepts
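A hedged Keras sketch of image augmentation (the transformation ranges are illustrative; x_train, y_train and model are assumed to exist):
from keras.preprocessing.image import ImageDataGenerator

# each epoch then sees randomly rotated/shifted/flipped variants of the images
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
#                     steps_per_epoch=len(x_train) // 32, epochs=10)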
  • 47. Some Useful Concepts Performance metrics • Compare error to the simplest method as a benchmark, e.g. linear regression or logistic regression Classification problems: • Accuracy • Precision-recall / F1-score • ROC-AUC • Confusion matrix • Log-Loss Regression problems (not discussed in this presentation): • MSE / RMSE / MSPE / MAE • R2 / Adjusted R2 Source: https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234
  • 48. Some Useful Concepts Classifier performance metric: Accuracy Built from the four outcomes of a binary classifier: true positives, true negatives, false positives, false negatives (actual class vs classified-as)
  • 49. Some Useful Concepts Classifier performance metric Accuracy = (TP + TN) / all cases
  • 50. Some Useful Concepts Classifier performance metric Accuracy = (TP + TN) / all cases • Counts whenever the classifier is right • Simple and intuitive metric BUT • Assigns the same cost to false positives and false negatives • Use with caution because of the accuracy paradox: a dumb classifier that always predicts the majority class can score better! • Absolutely avoid with highly imbalanced classes
  • 51. Some Useful Concepts Classifier performance metric Precision vs Recall (both built from the TP, FP and FN counts)
  • 52. Some Useful Concepts Classifier performance metric Precision = TP / (TP + FP) • High precision means high selectivity: a selected sample has a high probability of belonging to the correct class, though some actual positives have been brushed off • Low precision means lots of false positives
  • 53. Some Useful Concepts Classifier performance metric Recall = TP / (TP + FN) • High recall means most positives have been identified as such, at the cost of (some) false positives • Low recall means lots of false negatives
  • 54. Some Useful Concepts Classifier performance metric F1-score • F1-score synthesizes both precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall) • Take the desirable trade-off into account: • E.g. cancer diagnostics: better to have a higher recall to minimize false negatives • E.g. spam detection: better to let some spam through (false negatives) than to eliminate legitimate emails (false positives) • E.g. zombie apocalypse scenario: better to have high precision to avoid letting infected people into the safe zone…
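The corresponding scikit-learn calls, on a made-up toy prediction:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy_score(y_true, y_pred))    # (TP + TN) / all cases
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall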
  • 55. Some Useful Concepts Classifier performance metric AUC-ROC Area Under Curve – Receiver Operating Characteristic TPR: True Positive Rate (sensitivity) FPR: False Positive Rate (1 − specificity) A good classifier has high sensitivity and high specificity Source: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
  • 56. Some Useful Concepts Classifier performance metric AUC-ROC How good is the model at distinguishing between classes at different thresholds? In other words, how many false positives do you pay for your true positives? Moving along the curve trades few true positives and few false positives against lots of both; the ideal case hugs the corner, with AUC = 1 Source: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
  • 57. Some Useful Concepts Classifier performance metric AUC-ROC Source: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5 https://towardsdatascience.com/choosing-the-right-metric-is-a-huge-issue-99ccbe73de61
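Note that AUC-ROC is computed from scores or probabilities, not hard class labels; a toy scikit-learn sketch:
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]          # e.g. predicted probabilities of the positive class
print(roc_auc_score(y_true, y_score))    # 0.75 here: one positive is out-scored by a negative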
  • 58. Some Useful Concepts Classifier performance metric Confusion matrix • Interesting for analysing classifier performance on multiclass sets (one axis: true class, the other: predicted class)
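A toy scikit-learn sketch (in scikit-learn's convention, rows are the true class and columns the predicted class):
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
print(confusion_matrix(y_true, y_pred))   # a perfect classifier would be purely diagonal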
  • 59. Some Useful Concepts Classifier performance metric Log-Loss • Adapted to binary outputs and multi-class data sets (if not too imbalanced) • Punishes extreme probability values when these are wrong Binary: LogLoss = −(1/N) Σi [ yi log(pi) + (1 − yi) log(1 − pi) ] More than 2 classes (M): LogLoss = −(1/N) Σi Σc yi,c log(pi,c) Source: http://wiki.fast.ai/index.php/Log_Loss
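The binary formula in NumPy (the clipping is a standard guard against log(0), precisely because extreme probabilities are so heavily punished):
import numpy as np

def log_loss(y_true, p):
    p = np.clip(p, 1e-15, 1 - 1e-15)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1])
print(log_loss(y, np.array([0.9, 0.1, 0.8])))   # confident and right: small loss
print(log_loss(y, np.array([0.1, 0.9, 0.2])))   # confident and wrong: heavily punished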
  • 60. Some Useful Concepts • So, which metric to choose??? Well, it depends… Binary, balanced classes: • If probability differences are critical: Log-Loss • If only the class prediction matters and thresholds can be tuned: AUC-ROC score • F1 score is sensitive to the threshold: tune it before comparing Binary, imbalanced classes: • Small class, positive or negative: ROC-AUC score • Small positive class: F1 Multi-class, balanced: Confusion matrix, Log-Loss Multi-class, imbalanced: averaging of Precision/Recall over classes (macro-averaging) • There are other metrics: Cohen's kappa, Jaccard index, G-score… Source: https://medium.com/usf-msds/choosing-the-right-metric-for-evaluating-machine-learning-models-part-2-86d5649a5428
  • 61. Machine Learning Tools Main Python libraries: • Pandas: Data Analysis • Spark: Distributed Computing • Scikit-learn: Machine Learning Toolbox • Keras: Deep Learning • TensorFlow: Deep Learning • OpenCV: Computer Vision
  • 62. Machine Learning Tools Artificial Intelligence Machine Learning Neural Networks Deep Learning
  • 63. Machine Learning Tools TensorFlow – Keras Domination
  • 64. Machine Learning Tools TensorFlow – Keras Domination
  • 65. Zoom on Scikit-Learn logic
1 - Import the model:
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier
2 - Instantiate the model class:
clf = svm.SVC(gamma=0.001, C=100.)
knn = KNeighborsClassifier()
3 - Train with the fit() method:
knn.fit(iris_X_train, iris_y_train)
4 - Make predictions with predict():
clf.predict(digits.data[-1:])
knn.predict(iris_X_test)
  • 66. Zoom on Keras logic
1 - Import the model class (plus the layers used below):
from keras.models import Sequential
from keras.layers import Dense, Activation
2 - Instantiate the model class:
model = Sequential()
3 - Add layers with the add() method, specifying input_dim or input_shape on the first layer:
model.add(Dense(32, input_dim=784))
4 - Add activation functions:
model.add(Activation('relu'))
5 - Configure training with compile(loss=, optimizer=, metrics=[]):
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
6 - Train with the fit() method:
model.fit(data, labels, epochs=10, batch_size=32)
7 - Evaluate the model performance with the evaluate() method:
score = model.evaluate(x_test, y_test, verbose=0)
8 - Make predictions with predict():
predictions = model.predict(x_test)
  • 67. Zoom on TensorFlow logic 1 – Define a computation graph: • tf.Variable for values to be optimized (weights and biases) • tf.constant as needed • tf.placeholder for inputs (linked to the data arrays via feed_dict) • All operations have a tf counterpart 2 – Start a TensorFlow session 3 – Actually execute the graph, implementing nested loops on epochs and batches Source: https://www.datacamp.com/community/tutorials/cnn-tensorflow-python
  • 68. Zoom on TensorFlow: Logic
Basic example:
import tensorflow as tf

# tf Graph input
a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)
# Define some operations
add = tf.add(a, b)
mul = tf.multiply(a, b)
# Launch the default graph
with tf.Session() as sess:
    # Run every operation with variable input
    print("Addition with variables: %i" % sess.run(add, feed_dict={a: 2, b: 3}))
    print("Multiplication with variables: %i" % sess.run(mul, feed_dict={a: 2, b: 3}))
Source: https://www.datacamp.com/community/tutorials/cnn-tensorflow-python
  • 69. Random Forest concept Random forest uses decision trees as base learners (for regression or classification) Decision trees are built so that the splits are prioritised by the amount of information provided
  • 70. Random Forest concept Random forests are built by applying many decision trees to random data subsets and random feature subsets: ensemble learning (here, bagging)
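A minimal scikit-learn random forest sketch on the iris data (the hyperparameter values are illustrative):
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
# each of the 100 trees sees a bootstrap sample; max_features randomizes the features per split
rf = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=0)
rf.fit(X, y)
print(rf.feature_importances_)   # which features drive the splits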
  • 71. Gradient Boosting concept • Boosting creates a series of weak learners where new ones focus on the data that are hard to classify. At the end of the process all learners are weighted and combined • Boosting can lead to overfitting: stop early enough! • Many variants: Gradient Boosting, XGBoost, AdaBoost, Gentle Boost • XGBoost is state-of-the-art
  • 72. Gradient Boosting concept Gradient Boosting vs Random Forest: | Gradient Boosting | Random Forest Base learners | Trees or linear regression | Trees Bias-variance of learners | Stumps: high bias and low variance | Full trees: low bias and high variance Hyperparameter tuning | Lots! (see next slide) | Number of trees! Performance | #1 | Close 2nd
  • 73. Gradient Boosting concept Some important hyperparameters for gradient boosting (XGBoost) to limit tree growth: • max_features • min_samples_split • min_samples_leaf • max_depth
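These names match scikit-learn's GradientBoostingRegressor (XGBoost's own API spells some of them differently); a hedged sketch on toy data, with illustrative values:
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(500, 10)
y = 3 * X[:, 0] + 0.1 * np.random.randn(500)     # toy regression target
gbr = GradientBoostingRegressor(n_estimators=200,
                                learning_rate=0.1,
                                max_depth=3,           # keep the trees shallow
                                min_samples_split=10,
                                min_samples_leaf=5,
                                max_features='sqrt')
gbr.fit(X, y)
print(gbr.score(X, y))   # R2 on the training data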
  • 74. Self-organizing map concept • Inspired by the specialization of neural areas in natural brains • Initially, a random vector with the same dimension as the input sits at each neuron of the grid • The neuron closest to a given input vector, together with its neighbours, is nudged toward the current input • Used for clustering, classification and visualization • Kohonen 1984
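A minimal NumPy sketch of one Kohonen update step (grid size, learning rate and neighbourhood radius are made up; a real SOM decays both over time):
import numpy as np

grid = np.random.rand(10, 10, 3)   # 10x10 neurons, weight vectors of input dimension 3

def som_step(grid, x, lr=0.5, radius=2.0):
    d = np.linalg.norm(grid - x, axis=2)              # distance of every neuron to the input
    bi, bj = np.unravel_index(np.argmin(d), d.shape)  # best-matching unit
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            dist2 = (i - bi) ** 2 + (j - bj) ** 2
            h = np.exp(-dist2 / (2 * radius ** 2))    # neighbourhood: strongest at the BMU
            grid[i, j] += lr * h * (x - grid[i, j])   # nudge toward the input
    return grid

grid = som_step(grid, np.array([0.2, 0.7, 0.1]))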
  • 75. Autoencoder concept • A neural network whose output equals the input • Hour-glass shape as data is encoded then decoded • A way to extract meaningful features Compressed signal with reduced dimensions
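A minimal Keras hour-glass sketch for flattened 28x28 images (the bottleneck size is illustrative; x_train is assumed to exist):
from keras.models import Sequential
from keras.layers import Dense

autoencoder = Sequential()
autoencoder.add(Dense(32, activation='relu', input_dim=784))   # encoder: 784 -> 32
autoencoder.add(Dense(784, activation='sigmoid'))              # decoder: 32 -> 784
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# the target IS the input:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)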
  • 76. Autoencoder: MNIST example Encoder: 32×32 → 16×16 → 8×8; Decoder: 8×8 → 16×16 → 32×32
  • 77. CNN concept • Convolutional Neural Networks are a category of Neural Networks that have proven very effective in areas such as image recognition and classification. CNNs have been successful in identifying faces, objects and traffic signs, apart from powering vision in robots and self-driving cars. Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
  • 78. CNN concept 2D Convolution: • Apply a filter to the image, moving it at a certain stride, to build a feature map • Use several filters (depth) Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
  • 82. CNN concept 2D Convolution: • Apply convolution + ReLU + pooling several times • Pass the output to a traditional Multi-Layer Perceptron • A SoftMax output layer provides probabilities per class Source: https://www.apsl.net/blog/2017/11/20/use-convolutional-neural-network-image-classification/
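A minimal Keras sketch of that conv/ReLU/pooling stack followed by an MLP (filter counts and layer sizes are illustrative, for 28x28 grayscale images):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # convolution + ReLU
model.add(MaxPooling2D((2, 2)))                                            # pooling
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())                        # hand over to a traditional MLP
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))  # probabilities per class
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])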
  • 83. GAN concept • Generator: tries to forge the data distribution • Discriminator: tries to sort out real from fake • Both share a common loss function with opposite goals (min-max game) • A generative adversarial network learns a distribution, not a relationship • An alternative is the variational autoencoder
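A deliberately tiny Keras sketch of the adversarial game on a 1-D toy distribution (all sizes and rates are made up; real GANs are considerably more delicate to train):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# discriminator: real (1) vs fake (0)
D = Sequential([Dense(32, activation='relu', input_dim=1),
                Dense(1, activation='sigmoid')])
D.compile(optimizer=Adam(0.001), loss='binary_crossentropy')

# generator: 8-D noise -> one fake sample
G = Sequential([Dense(32, activation='relu', input_dim=8),
                Dense(1)])

# stacked model trains G through a frozen D
D.trainable = False
gan = Sequential([G, D])
gan.compile(optimizer=Adam(0.001), loss='binary_crossentropy')

for step in range(1000):
    real = np.random.normal(4.0, 0.5, (32, 1))           # samples from the target distribution
    fake = G.predict(np.random.normal(0, 1, (32, 8)))
    D.train_on_batch(real, np.ones((32, 1)))             # D learns to accept real...
    D.train_on_batch(fake, np.zeros((32, 1)))            # ...and to reject fake
    gan.train_on_batch(np.random.normal(0, 1, (32, 8)),  # G learns to get its fakes
                       np.ones((32, 1)))                 # labelled "real" by D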
  • 85. Machine Learning: references Papers • Check out arXiv for machine learning in astro-ph… MOOCs • All ML courses on Coursera by Andrew Ng • Deep Learning A-Z™: Hands-On Artificial Neural Networks on Udemy • Fast.ai courses Books • "Statistics, Data Mining and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data" by Željko Ivezić et al. • "Data Science from Scratch with Python: Step-by-Step Beginner Guide for Statistics, Machine Learning, Deep Learning and NLP using Python, Numpy, Pandas, Scipy, Matplotlib, Scikit-Learn, TensorFlow" by Peter Morgan • "Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" by Aurélien Géron
  • 86. Hands-on exercises roadmap 1) keras_log_reg_EX (data = MNIST): complete the missing lines 2) tf_log_reg_EX (data = iris.csv): complete the missing lines; play with the learning rate 3) sk_xgboost_regression_EX (data = Boston): complete the missing lines; play with the learning rate; find a good value for n_estimators; have a look at feature importance and a sample tree 4) sk_sdss_EX (data = sdss_data.csv): reply to the questions in the notebook as you execute cell after cell 5) tf_AutoEncoder_Fashion_EX (data = fashion-mnist_train.csv and fashion-mnist_test.csv): reply to the questions; make the suggested trials 6) keras_gan_bimodal2_EX (data generated in the notebook) 7) Check out the others

Editor's notes

  1. ML comes from optimization problems, statistics and computer science
  2. Specialized/applied vs. generalized AI. Artificial intelligence is wider and deals with different approaches, like the symbolic approach. AI is also about agents perceiving an environment and trying to act and perform at their best
  3. Cloud computing because you need speed and memory; data visualization because the data sets and results are complex. "Business" here is your branch of science, i.e. astrophysics, because it brings a lot in terms of insights (ML skills won't replace your astrophysics instinct)
  4. DeepFace is a deep learning facial recognition system created by a research group at Facebook. It identifies human faces in digital images. It employs a nine-layer neural net with over 120 million connection weights, and was trained on four million images uploaded by Facebook users. 97% accuracy, 9 layers, 120 million weights
  5. Side note: It takes roughly 3 chimpanzees to run the US
  6. TB = 10^12 bytes, PB = 10^15 bytes, EB = 10^18 bytes, ZB = 10^21 bytes
  7. Feature extraction, dimension reduction, semi-supervised learning, active learning, sequence modeling, RL methods for Supervised and Unsupervised Learning
  8. Independent Component Analysis and clustering as simple-to-understand algorithms. THEY ARE IMPORTANT
  9. Skip the RL part
  10. Coming from a pro KAGGLE competitor
  11. Matrix multiplications + element-wise operations
  12. Sigmoid for classification; hyperbolic tangent for classification or regression; ReLU for regression. Softmax provides probabilities: for example, in word prediction on a smartphone, the 3 highest probabilities are shown. There are others
  13. Overfitting is to be avoided in ML. If you have as many parameters as you have examples, you can learn the data perfectly!
  14. You need to split the data to get an unbiased evaluation
  15. LogLoss or Softmax
  16. From one epoch to another the weights are updated. Trade-off between time per update and optimality of the direction towards the minimum
  17. Translate, rotate, crop, lighten/darken, add noise, dephase, etc.
  18. Mean Squared Prediction Error, Mean Absolute Error: based on summing or averaging the difference between the true value and the estimate
  19. F1 is the harmonic mean. Spam filtering needs high precision and will have low recall
  20. The AUC is independent of the threshold (global metric)
  21. An ideal one is purely diagonal
  22. y_i,c = 1 if sample i belongs to class c. Symmetric function! Needs probabilities
  23. Mean Squared Prediction Error, Mean Absolute Error: based on summing or averaging the difference between the true value and the estimate
  24. The link between data array and input to the graph is done via feed_dict
  25. A decision tree can completely overfit, as it can split the feature space arbitrarily finely
  26. 2D convolutions. Blurred reconstruction
  27. Pooling makes the input representations (feature dimension) smaller and more manageable; it reduces the number of parameters and computations in the network, therefore controlling overfitting; it makes the network invariant to small transformations, distortions and translations in the input image (a small distortion in the input will not change the output of pooling, since we take the maximum/average value in a local neighbourhood); and it helps us arrive at an almost scale-invariant representation of our image (the exact term is "equivariant"). This is very powerful since we can detect objects in an image no matter where they are located