3. Learning a Class from Examples
Class C of a “family car”
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Output: Positive (+) and negative (–) examples
Input representation: x1: price, x2: engine power
4. Training set X
X = { x^t, r^t }, t = 1, …, N

r^t = 1 if x^t is positive
      0 if x^t is negative

[Figure: training examples plotted as + and – points in the (x1: price, x2: engine power) plane]
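The training set above pairs each input x^t with its label r^t. A minimal sketch of that representation in Python; the specific prices and engine powers below are made up for illustration:

```python
# Training set X = {x^t, r^t}: each example pairs an input
# x = (price, engine_power) with a label r (1 = family car, 0 = not).
# All numbers are illustrative, not from the slides.
X = [
    ((13500, 110), 1),   # positive example: a family car
    ((14800, 120), 1),
    ((9000, 60), 0),     # negative example
    ((32000, 250), 0),
]

positives = [x for x, r in X if r == 1]
negatives = [x for x, r in X if r == 0]
print(len(positives), len(negatives))  # 2 2
```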
5. Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)

[Figure: class C drawn as an axis-aligned rectangle with corners (p1, e1) and (p2, e2) in the (price, engine power) plane]
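The class definition above is a simple conjunction of two interval tests. A sketch of it as a hypothesis function; the bounds p1, p2, e1, e2 are illustrative, not from the slides:

```python
# Class C as an axis-aligned rectangle:
# (p1 <= price <= p2) AND (e1 <= engine power <= e2).
# The default bounds are made up for illustration.
def h(x, p1=10000, p2=20000, e1=80, e2=160):
    """Return 1 if x = (price, engine_power) falls inside the rectangle."""
    price, power = x
    return 1 if (p1 <= price <= p2) and (e1 <= power <= e2) else 0

print(h((15000, 120)))  # inside the rectangle -> 1
print(h((30000, 250)))  # outside -> 0
```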
7. S, G, and the Version Space
most specific hypothesis, S
most general hypothesis, G
Any h ∈ H between S and G is consistent; together they make up the version space (Mitchell, 1997)
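For rectangle hypotheses, S is simply the tightest rectangle around the positive examples. A sketch of computing it (the example points are made up):

```python
# Most specific hypothesis S: the tightest axis-aligned rectangle
# containing all positive examples. Data is illustrative.
positives = [(13500, 110), (14800, 120), (12000, 100)]

p1 = min(p for p, e in positives)   # lowest positive price
p2 = max(p for p, e in positives)   # highest positive price
e1 = min(e for p, e in positives)   # lowest positive engine power
e2 = max(e for p, e in positives)   # highest positive engine power

S = (p1, p2, e1, e2)
print(S)  # (12000, 14800, 100, 120)
```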
9. VC Dimension
N points can be labeled in 2^N ways as +/–
H shatters N points if there exists an h ∈ H consistent with each of these labelings: VC(H) = N
An axis-aligned rectangle shatters 4 points only!
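The shattering claim can be checked by brute force: for axis-aligned rectangles, a labeling is realizable exactly when the tightest rectangle around the positives excludes every negative. A sketch, using a made-up "diamond" configuration of 4 points:

```python
from itertools import product

def rect_consistent(points, labels):
    """True if some axis-aligned rectangle classifies the labeled points
    correctly: the tightest rectangle around the positives (which any
    consistent rectangle must contain) must exclude every negative."""
    pos = [p for p, lab in zip(points, labels) if lab]
    if not pos:
        return True  # an empty rectangle fits the all-negative labeling
    x1 = min(p[0] for p in pos); x2 = max(p[0] for p in pos)
    y1 = min(p[1] for p in pos); y2 = max(p[1] for p in pos)
    return all(not (x1 <= p[0] <= x2 and y1 <= p[1] <= y2)
               for p, lab in zip(points, labels) if not lab)

def shatters(points):
    """True if rectangles realize all 2^N labelings of the points."""
    return all(rect_consistent(points, labels)
               for labels in product([0, 1], repeat=len(points)))

# A "diamond" of 4 points is shattered; adding a centre point breaks it,
# since the centre lies inside any rectangle covering the four corners.
four = [(0, 1), (0, -1), (1, 0), (-1, 0)]
print(shatters(four))             # True
print(shatters(four + [(0, 0)]))  # False
```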
10. Probably Approximately Correct (PAC) Learning
How many training examples N should we have, such that with
probability at least 1 ‒ δ, h has error at most ε ?
(Blumer et al., 1989)
Each strip is at most ε/4
Pr that we miss a strip ≤ 1 − ε/4
Pr that N instances miss a strip ≤ (1 − ε/4)^N
Pr that N instances miss 4 strips ≤ 4(1 − ε/4)^N
4(1 − ε/4)^N ≤ δ and (1 − x) ≤ exp(−x)
4 exp(−εN/4) ≤ δ and N ≥ (4/ε) log(4/δ)
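The final bound is directly computable. A sketch evaluating N ≥ (4/ε) log(4/δ) for an illustrative ε and δ:

```python
from math import ceil, log

def pac_sample_size(eps, delta):
    """N >= (4/eps) * log(4/delta): enough examples so that, with
    probability at least 1 - delta, the learned rectangle has error
    at most eps (natural log, as in the derivation)."""
    return ceil((4 / eps) * log(4 / delta))

print(pac_sample_size(0.1, 0.05))  # 176
```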
11. Noise and Model Complexity
Use the simpler one because:
Simpler to use (lower computational complexity)
Easier to train (lower space complexity)
Easier to explain (more interpretable)
Generalizes better (lower variance – Occam’s razor)
12. Multiple Classes, C_i, i = 1, …, K

X = { x^t, r^t }, t = 1, …, N

r_i^t = 1 if x^t ∈ C_i
        0 if x^t ∈ C_j, j ≠ i

Train hypotheses h_i(x), i = 1, …, K:

h_i(x^t) = 1 if x^t ∈ C_i
           0 if x^t ∈ C_j, j ≠ i
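Turning each class label r^t into the K indicator values r_i^t is a small mechanical step. A sketch, with K and the class indices made up for illustration:

```python
# K-class setup: each example's class label becomes a K-vector with
# r_i^t = 1 if x^t belongs to C_i and 0 otherwise, so K separate
# two-class hypotheses h_i can be trained. K and labels are illustrative.
K = 3
labels = [0, 2, 1, 0]          # class index of each training example

def one_vs_all(label, K):
    return [1 if i == label else 0 for i in range(K)]

R = [one_vs_all(lab, K) for lab in labels]
print(R)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```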
13. Regression

X = { x^t, r^t }, t = 1, …, N

r^t ∈ ℝ

r^t = f(x^t) + ε

g(x) = w1 x + w0

g(x) = w2 x² + w1 x + w0

E(g | X) = (1/N) Σ_{t=1}^{N} [ r^t − g(x^t) ]²

E(w1, w0 | X) = (1/N) Σ_{t=1}^{N} [ r^t − (w1 x^t + w0) ]²
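Minimizing E(w1, w0 | X) for the linear model has the familiar closed-form least-squares solution. A sketch on made-up data:

```python
# Fit g(x) = w1*x + w0 by minimizing
# E(w1, w0 | X) = (1/N) * sum_t (r^t - (w1*x^t + w0))^2
# via the closed-form least-squares solution. Data is illustrative.
xs = [1.0, 2.0, 3.0, 4.0]
rs = [2.1, 3.9, 6.0, 8.1]
N = len(xs)

x_mean = sum(xs) / N
r_mean = sum(rs) / N
w1 = (sum((x - x_mean) * (r - r_mean) for x, r in zip(xs, rs))
      / sum((x - x_mean) ** 2 for x in xs))
w0 = r_mean - w1 * x_mean

E = sum((r - (w1 * x + w0)) ** 2 for x, r in zip(xs, rs)) / N
print(round(w1, 2))  # slope, 2.01 for this data
```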
14. Model Selection & Generalization

Learning is an ill-posed problem; data is not sufficient to find a unique solution
The need for inductive bias: assumptions about H
Generalization: how well a model performs on new data
Overfitting: H more complex than C or f
Underfitting: H less complex than C or f
15. Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data
As N increases, E decreases
As c(H) increases, E first decreases and then increases
16. Cross-Validation
To estimate generalization error, we need data unseen during training. We split the data as:
Training set (50%)
Validation set (25%)
Test (publication) set (25%)
Resampling when there is little data
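The 50/25/25 split above can be sketched directly; the toy dataset and the shuffle seed are illustrative:

```python
import random

# Shuffle before splitting so each subset is representative;
# the data values and seed are made up for illustration.
data = list(range(100))
random.Random(0).shuffle(data)

n = len(data)
train = data[: n // 2]                 # 50% training set
valid = data[n // 2 : 3 * n // 4]      # 25% validation set
test  = data[3 * n // 4 :]             # 25% test (publication) set

print(len(train), len(valid), len(test))  # 50 25 25
```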
17. Dimensions of a Supervised Learner

1. Model: g(x | θ)
2. Loss function: E(θ | X) = Σ_t L( r^t, g(x^t | θ) )
3. Optimization procedure: θ* = arg min_θ E(θ | X)
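The three dimensions can be made concrete with a one-parameter model; a sketch where a crude grid search stands in for the optimization procedure (the model g(x|θ) = θx, the data, and the grid are all illustrative):

```python
# Three dimensions of a supervised learner for g(x|θ) = θ*x:
# model, loss/empirical error, and optimization. Data is made up.
X = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # pairs (x^t, r^t)

def g(x, theta):                 # 1. model g(x|θ)
    return theta * x

def L(r, y):                     # 2. per-example loss L(r, g(x|θ))
    return (r - y) ** 2

def E(theta, X):                 #    empirical error E(θ|X) = Σ_t L(r^t, g(x^t|θ))
    return sum(L(r, g(x, theta)) for x, r in X)

# 3. optimization procedure: θ* = argmin_θ E(θ|X) over a coarse grid
grid = [i / 100 for i in range(0, 401)]
theta_star = min(grid, key=lambda th: E(th, X))
print(theta_star)  # 1.99, the grid point nearest the least-squares slope
```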