Introduction to
Machine/Statistical Learning
Pei-shen (James) Wu, peishen.wu@gmail.com
Taipei Hackerspace, 2014.9.20
The purpose of this talk
• Not to develop an in-depth understanding of ML algorithms, nor to derive them
• But to provide a sufficient basis for applied predictive modeling
• Our goal is predictive modeling: building accurate models by applying statistical principles, feature engineering, model tuning, appropriate ML algorithms, and error analysis
Preliminary outline
• Model purpose – for prediction, for explanation
• The basic study design of Machine learning
– Model Representation
– Classification vs. Regression Problems
– Supervised vs. Unsupervised Learning
• Model Assessment & Selection
– Interplay between Bias, Variance & Complexity
– Cross Validation: The wrong/correct way of doing it
• The Single Algorithm Hypothesis & Deep Learning
Ex. Models for Explanation
Wong, P. T. P. (2014). Viktor Frankl’s meaning seeking model and positive psychology.
Coursera Course, Machine learning by Andrew Ng
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Receptive field in humans
[Figure: Training Set → Learning Algorithm → h; size of house (x) → h → estimated price (y)]
How do we represent h?
$h_\theta(x) = \theta_0 + \theta_1 x$
Linear regression with one variable (univariate linear regression).
Coursera Course, Machine learning by Andrew Ng
How to choose the $\theta_i$'s?
Training set:
Size in feet² (x) | Price ($) in 1000's (y)
2104 | 460
1416 | 232
1534 | 315
852 | 178
… | …
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_i$'s: parameters
Coursera Course, Machine learning by Andrew Ng
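Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where $m$ is the number of training examples
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$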
Coursera Course, Machine learning by Andrew Ng
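To make the cost function concrete, here is a minimal Python sketch that evaluates $J(\theta_0, \theta_1)$ on the four training examples visible on the slide (the elided "…" rows are not included). It reaches the goal with the closed-form least-squares solution for one feature; that is one way to minimize J and not necessarily the method shown in the course (gradient descent is the usual alternative). The names `h`, `cost` and `fit_least_squares` are illustrative.

```python
# Four training examples from the slide: size in feet^2 (x), price in $1000s (y).
XS = [2104.0, 1416.0, 1534.0, 852.0]
YS = [460.0, 232.0, 315.0, 178.0]


def h(theta0, theta1, x):
    """Hypothesis of univariate linear regression: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(xs)
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def fit_least_squares(xs, ys):
    """Closed-form (theta0, theta1) that minimizes J for a single feature."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


t0, t1 = fit_least_squares(XS, YS)
print(f"theta0 = {t0:.2f}, theta1 = {t1:.4f}, J = {cost(t0, t1, XS, YS):.2f}")
```

Because J is a convex bowl in $(\theta_0, \theta_1)$, any move away from the fitted parameters increases the cost.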
[Figure: left, $h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x), plotted as price ($) in 1000's vs. size in feet² (x); right, $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$)]
Coursera Course, Machine learning by Andrew Ng
Abstract supervised setup
• Training data: labeled pairs $(x_i, y_i)$
• $x_i$: input vector, $x_i = (x_{i,1}, x_{i,2}, \dots, x_{i,n})^\top$, with $x_{i,j} \in \mathbb{R}$
• $y$: response variable
– $y$ takes one of two values: binary classification
– $y \in \mathbb{R}$: regression
– what we want to be able to predict, having observed some new $x$
• Other names for $x$: independent variables, predictors, features
• Other names for $y$: dependent variables, responses
Coursera Course, Machine learning by Andrew Ng
Bias/variance
[Figure: three fits of price vs. size: high bias (underfit), "just right", and high variance (overfit)]
Coursera Course, Machine learning by Andrew Ng
To recap: some definitions
• Variance
– the amount by which the prediction would change if we estimated it using a different training data set
• Bias
– the error introduced by approximating a real-life problem, which may be very complicated, with a much simpler model
– more flexible methods result in less bias, but more variance
• Flexibility = degrees of freedom ~ complexity
– can be adjusted through the regularization parameter
– or by increasing/reducing the number of features
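These definitions can be checked with a small simulation. This is a sketch under stated assumptions: the "true" function is taken to be sin(x) with Gaussian noise (sd 0.3), and flexibility is controlled by k in k-nearest-neighbour regression (small k means a more flexible model) rather than by polynomial degree or a regularization parameter. Refitting on many independent training sets and measuring the spread and the average offset of the prediction at one point estimates variance and squared bias directly.

```python
import math
import random
import statistics


def knn_predict(train, x0, k):
    """k-nearest-neighbour regression: average y of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def simulate(k, n_train=50, n_sets=200, x0=math.pi / 2, noise=0.3, seed=0):
    """Refit on many independent training sets; return (bias^2, variance) of the prediction at x0."""
    rng = random.Random(seed)
    f = math.sin  # assumed "true" function for this toy example
    preds = []
    for _ in range(n_sets):
        train = []
        for _ in range(n_train):
            x = rng.uniform(0.0, 2 * math.pi)
            train.append((x, f(x) + rng.gauss(0.0, noise)))
        preds.append(knn_predict(train, x0, k))
    bias_sq = (statistics.fmean(preds) - f(x0)) ** 2  # squared offset of the average prediction
    variance = statistics.pvariance(preds)  # spread of predictions across training sets
    return bias_sq, variance


for k in (1, 15):
    b2, var = simulate(k)
    print(f"k={k:2d}  bias^2={b2:.4f}  variance={var:.4f}")
```

The flexible model (k = 1) should show the larger variance and the rigid one (k = 15) the larger squared bias, matching the trade-off stated above.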
Study design – training/test sets
Training and Quiz/Test sets come from different distributions. Since submissions to the competition can only be done once per day, this Probe set allows for a tighter feedback loop for evaluation of promising models.
An Introduction to Statistical Learning, Ch 5 Resampling Methods
In practice – training/CV/test set
• Training set
– used to fit the models
• Validation set
– used to estimate prediction error for model selection
• Test set
– used for assessment of the generalization error of the final chosen
model.
The Elements of Statistical Learning ch7. Model Assessment and Selection
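A minimal sketch of the three-way split, assuming a 60/20/20 proportion (a common choice, not one prescribed here) and a fixed random seed so the split is reproducible. Only indices are returned, so the same split can be applied to any parallel arrays of features and responses; `train_val_test_split` is a hypothetical helper name.

```python
import random


def train_val_test_split(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Randomly partition indices 0..n-1 into disjoint training, validation and test lists."""
    if not (frac_train > 0 and frac_val >= 0 and frac_train + frac_val < 1):
        raise ValueError("need frac_train > 0, frac_val >= 0 and frac_train + frac_val < 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # random assignment avoids ordering effects in the data
    n_train = round(frac_train * n)
    n_val = round(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = train_val_test_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fit on `train_idx`, compare candidate models on `val_idx`, and use `test_idx` only once, for the final chosen model.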
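Bias/variance
[Figure: error vs. degree of polynomial d, with curves for training error and cross-validation error]
Training error: $J_{\text{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Cross-validation error: $J_{\text{cv}}(\theta) = \frac{1}{2m_{\text{cv}}} \sum_{i=1}^{m_{\text{cv}}} \left( h_\theta(x_{\text{cv}}^{(i)}) - y_{\text{cv}}^{(i)} \right)^2$
Coursera Course, Machine learning by Andrew Ng
Error | θ fitted on | $(x^{(i)}, y^{(i)})$ evaluated from
Training error | Training set | Training set
CV error | Training set | CV set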
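The table can be turned into a small experiment. The sketch below assumes synthetic data, y = sin(2πx) plus Gaussian noise (sd 0.2) with x uniform on [0, 1], 20 training and 50 cross-validation points; these numbers are illustrative, not from the slides. Each polynomial is fitted on the training set only, through the normal equations solved by a hand-written Gaussian elimination to stay dependency-free (adequate for the low degrees used here, numerically fragile for high ones), and J is then evaluated on both sets.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poly(xs, ys, d):
    """Least-squares polynomial of degree d from the normal equations (X^T X) theta = X^T y."""
    rows = [[x ** j for j in range(d + 1)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d + 1)] for i in range(d + 1)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d + 1)]
    return solve(xtx, xty)


def j_error(theta, xs, ys):
    """J(theta) = 1/(2m) * sum of squared errors of the polynomial theta on (xs, ys)."""
    preds = [sum(t * x ** j for j, t in enumerate(theta)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / (2 * len(xs))


rng = random.Random(1)


def sample(n):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return xs, [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]


x_train, y_train = sample(20)
x_cv, y_cv = sample(50)

train_err, cv_err = [], []
for d in range(1, 7):
    theta = fit_poly(x_train, y_train, d)  # parameters come from the training set only
    train_err.append(j_error(theta, x_train, y_train))
    cv_err.append(j_error(theta, x_cv, y_cv))
    print(f"d={d}  J_train={train_err[-1]:.4f}  J_cv={cv_err[-1]:.4f}")
```

Training error can only decrease as d grows, because each model contains the previous one; the cross-validation error is what exposes underfitting at low d.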
Bias/variance as a function of the regularization parameter
Coursera Course, Machine learning by Andrew Ng
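A sketch of the regularization knob, assuming synthetic sin(2πx) data, a degree-5 polynomial and illustrative λ values. It assumes the regularized cost used in the Coursera course, $\frac{1}{2m}\left[\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j \ge 1} \theta_j^2\right]$, which leaves $\theta_0$ unpenalized; setting its gradient to zero gives $(X^\top X + \lambda L)\theta = X^\top y$ with $L = \mathrm{diag}(0, 1, \dots, 1)$. Large λ pushes toward high bias, λ near 0 toward high variance.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge_poly(xs, ys, d, lam):
    """Solve (X^T X + lam * L) theta = X^T y, with L = diag(0, 1, ..., 1) so theta_0 is not penalized."""
    rows = [[x ** j for j in range(d + 1)] for x in xs]
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
          for j in range(d + 1)] for i in range(d + 1)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d + 1)]
    return solve(a, b)


def j_error(theta, xs, ys):
    """Unregularized J(theta) = 1/(2m) * sum of squared errors, used to report train/CV error."""
    preds = [sum(t * x ** j for j, t in enumerate(theta)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / (2 * len(xs))


rng = random.Random(2)


def sample(n):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return xs, [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]


x_train, y_train = sample(15)
x_cv, y_cv = sample(50)

lambdas = [0.0, 1e-4, 1e-2, 1.0, 100.0]
train_err, cv_err, penalty = [], [], []
for lam in lambdas:
    theta = fit_ridge_poly(x_train, y_train, 5, lam)
    train_err.append(j_error(theta, x_train, y_train))
    cv_err.append(j_error(theta, x_cv, y_cv))
    penalty.append(sum(t * t for t in theta[1:]))  # sum of theta_j^2 for j >= 1
    print(f"lambda={lam:g}  J_train={train_err[-1]:.4f}  J_cv={cv_err[-1]:.4f}")
```

As λ grows, the training error can only rise and the penalized coefficients can only shrink; the cross-validation column is where a good intermediate λ shows up.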
High bias
[Figure: learning curves, error vs. training set size m, with price vs. size fits]
If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
High variance (and small λ)
[Figure: learning curves, error vs. training set size m, with price vs. size fits]
If a learning algorithm is suffering from high variance, getting more training data is likely to help.
Coursera Course, Machine learning by Andrew Ng
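Learning curves like these can be produced by refitting on growing subsets of the training data. The sketch below assumes a deliberately high-bias setup: a straight line (univariate linear regression, closed form) fitted to data from y = sin(2πx) plus Gaussian noise, with illustrative sample sizes. With two points the line fits exactly, so the training error starts at zero and rises, while the cross-validation error falls and then levels off at a high value.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form least-squares (theta0, theta1) for one feature."""
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return y_bar - theta1 * x_bar, theta1


def j_error(theta, xs, ys):
    """J = 1/(2m) * sum of squared errors of the line theta on (xs, ys)."""
    t0, t1 = theta
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


rng = random.Random(3)


def sample(n):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return xs, [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]


x_train, y_train = sample(50)
x_cv, y_cv = sample(200)

curve = []
for m in [2, 5, 10, 20, 50]:
    theta = fit_line(x_train[:m], y_train[:m])  # fit on the first m training examples only
    curve.append((m, j_error(theta, x_train[:m], y_train[:m]), j_error(theta, x_cv, y_cv)))
    print(f"m={m:2d}  J_train={curve[-1][1]:.4f}  J_cv={curve[-1][2]:.4f}")
```

The plateau in the cross-validation error is set by the model's bias, which is why adding more rows does not bring it down.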
The Bias-Variance Trade-Off
An Introduction to Statistical Learning, Ch 5 Resampling Methods
Cross validation – single split
An Introduction to Statistical Learning, Ch 5 Resampling Methods
Cross validation – n = 10 folds
An Introduction to Statistical Learning, Ch 5 Resampling Methods
K-fold cross-validation typically gives a more stable estimate of the test error than a single train/validation split
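A minimal K-fold sketch: shuffle the indices once, deal them into K folds, fit on K-1 folds, compute the MSE on the held-out fold, and average the K fold MSEs. The `fit`/`predict` callables are a hypothetical interface so that any model can be plugged in; the usage example uses the simplest possible model, which predicts the mean of its training responses.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once and deal them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(xs, ys, k, fit, predict, seed=0):
    """K-fold estimate of test MSE: the average of the k held-out fold MSEs."""
    fold_mses = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_mses.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_mses) / k


# Usage: a model that always predicts the mean of its training responses.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
print(cross_val_mse(xs, ys, k=5, fit=fit_mean, predict=predict_mean))
```

Every sample is held out exactly once, so each observation contributes to the error estimate, and the random shuffle is what makes the folds a random sample of the data.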
Compare these two CV methods: what's different, and what's wrong?
Left method:
1. Screen the predictors
– find a subset of “good” predictors that show fairly strong (univariate) correlation with the class labels
2. Build a multivariate classifier
– using just this subset of predictors
3. Apply cross-validation
– to estimate the unknown tuning parameters and to estimate the prediction error of the final model.
Right method:
1. Divide the samples into K cross-validation folds (groups) at random.
2. For each fold k = 1, 2, ..., K:
a. Find a subset of “good” predictors that show fairly strong (univariate) correlation with the class labels, using all of the samples except those in fold k.
b. Using just this subset of predictors, build a multivariate classifier, using all of the samples except those in fold k.
c. Use the classifier to predict the class labels for the samples in fold k.
The predictors chosen by the left
method have an unfair advantage
• they were chosen in step
(1) on the basis of all of
the samples.
• Leaving samples out after
the variables have been
selected does not
correctly mimic the
application of the
classifier to a completely
independent test set
• these predictors “have already seen” the left-out samples.
The Elements of Statistical Learning ch7. Model Assessment and Selection
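The point can be reproduced with a small simulation, similar in spirit to the example in The Elements of Statistical Learning but with smaller, assumed sizes: 50 samples, 1,000 predictors of pure Gaussian noise, balanced class labels drawn independently of the predictors (so every classifier's true error rate is 50%), screening that keeps the 25 predictors with the largest absolute correlation, and a nearest-centroid classifier standing in for the multivariate classifier. Only the position of the screening step differs between the two runs.

```python
import random

rng = random.Random(0)
N, P, Q, K = 50, 1000, 25, 5  # samples, noise predictors, predictors kept by screening, folds

X = [[rng.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N)]
y = [0] * (N // 2) + [1] * (N // 2)
rng.shuffle(y)  # labels independent of X: the true error rate of any classifier is 50%


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def screen(rows, q):
    """Indices of the q predictors most correlated (in absolute value) with y, using `rows` only."""
    labels = [y[i] for i in rows]
    scores = [(abs(corr([X[i][j] for i in rows], labels)), j) for j in range(P)]
    return [j for _, j in sorted(scores, reverse=True)[:q]]


def centroid_errors(train_rows, test_rows, feats):
    """Fit a nearest-centroid classifier on train_rows and count mistakes on test_rows."""
    cents = {}
    for c in (0, 1):
        members = [i for i in train_rows if y[i] == c]
        cents[c] = [sum(X[i][j] for i in members) / len(members) for j in feats]
    mistakes = 0
    for i in test_rows:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in (0, 1)}
        mistakes += min(dist, key=dist.get) != y[i]
    return mistakes


idx = list(range(N))
rng.shuffle(idx)
folds = [idx[k::K] for k in range(K)]


def cv_error(screen_inside_folds):
    all_rows = list(range(N))
    # Left method: screen once on ALL samples, so held-out samples influence which predictors are kept.
    global_feats = None if screen_inside_folds else screen(all_rows, Q)
    mistakes = 0
    for fold in folds:
        held = set(fold)
        train_rows = [i for i in all_rows if i not in held]
        # Right method: redo the screening inside each fold, using only that fold's training samples.
        feats = screen(train_rows, Q) if screen_inside_folds else global_feats
        mistakes += centroid_errors(train_rows, fold, feats)
    return mistakes / N


wrong_way = cv_error(screen_inside_folds=False)
right_way = cv_error(screen_inside_folds=True)
print(f"CV error, screening on all samples: {wrong_way:.2f}")
print(f"CV error, screening within folds:   {right_way:.2f}")
```

With no real signal in the data, the left method should report an error rate well below 50%, while the right method should stay close to the true 50%.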
Recap principles from Statistics
– K-fold CV is a form of random sampling
Coursera Course, Data Analysis and Statistical Inference by Dr. Mine Çetinkaya-Rundel
ML algorithm performance depends on the underlying data
An Introduction to Statistical Learning, Ch 8 Tree methods
More issues to be covered in next talk
• Remedies for Severe Class Imbalance
• Measuring Predictor Importance
• Factors That Can Affect Model Performance
Back then, the prevailing wisdom
• MIT's Marvin Minsky: a “Society of Mind”
– To achieve AI, it was believed, engineers would
have to build and combine thousands of individual
computing units or agents.
– One group of agents, or module, would handle
vision, another language, and so on…
The Single Algorithm Hypothesis
• Human intelligence stems from a single learning
algorithm
– Vernon Mountcastle's 1978 paper “An Organizing Principle for Cerebral Function”
– Jeff Hawkins' “Memory-prediction framework”
• Origin
– Neuroplasticity during brain development
– The ability of other cortical areas to take over previously lost functions after brain injury (e.g., stroke)
Deep Learning - 1
• Single Algorithm
– neural networks to mimic human brain behavior
• A basic layer of artificial neurons that can detect simple
things like the edges of a particular shape
• The next layer could then piece together these edges
to identify the larger shape
• Then the shapes could be strung together to
understand an object
• Key: the software does all this on its own
– give the system a lot of data, so it can discover by
itself what some of the concepts in the world are
The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI, Wired
Deep Learning - 2
• This approach is inspired by how scientists believe that
humans learn.
– The algorithm didn’t know the word “cat” — Ng had to
supply that — but over time, it learned to identify the
furry creatures we know as cats, all on its own.
– As babies, we watch our environments and start to
understand the structure of objects we encounter, but
until a parent tells us what it is, we can’t put a name to it.
• Building High-level Features Using Large Scale
Unsupervised Learning
The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI, Wired
Building High-level Features Using Large Scale Unsupervised Learning, QV Le, et al
References
Stanford Andrew Ng course