Machine Learning for Language
Technology
Lecture 2: Basic Concepts
Marina Santini
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2014
Acknowledgement: Thanks to Prof. Joakim Nivre for course design and material
Outline
• Definition of Machine Learning
• Types of Machine Learning:
– Classification
– Regression
– Supervised Learning
– Unsupervised Learning
– Reinforcement Learning
• Supervised Learning:
– Supervised Classification
– Training set
– Hypothesis class
– Empirical error
– Margin
– Noise
– Inductive bias
– Generalization
– Model assessment
– Cross-Validation
– Classification in NLP
– Types of Classification
Lecture 2: Basic Concepts 2
What is Machine Learning
• Machine learning is programming computers to
optimize a performance criterion for some task
using example data or past experience
• Why learning?
– No known exact method – vision, speech recognition,
robotics, spam filters, etc.
– Exact method too expensive – statistical physics
– Task evolves over time – network routing
• Compare:
– No need to use machine learning for computing
payroll… we just need an algorithm
Lecture 2: Basic Concepts 3
Machine Learning – Data Mining –
Artificial Intelligence – Statistics
• Machine Learning: creation of a model that uses training data or
past experience
• Data Mining: application of learning methods to large datasets (e.g. physics, astronomy, biology, etc.)
– Text mining = machine learning applied to unstructured textual data (e.g. sentiment analysis, social media monitoring, etc.; see Text Mining, Wikipedia)
• Artificial intelligence: a model that can adapt to a changing
environment.
• Statistics: Machine learning uses the theory of statistics in building
mathematical models, because the core task is making inference from a
sample.
Lecture 2: Basic Concepts 4
The bio-cognitive analogy
• Imagine a learning algorithm as a single neuron.
• This neuron receives input from other neurons, one for each input feature.
• The strengths of these inputs are the feature values.
• Each input has a weight and the neuron simply sums
up all the weighted inputs.
• Based on this sum, the neuron decides whether to
“fire” or not. Firing is interpreted as being a positive
example and not firing is interpreted as being a
negative example.
Lecture 2: Basic Concepts 5
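To make the slide's analogy concrete, here is a minimal sketch of such a neuron as a weighted sum followed by a threshold. The feature values, weights and threshold below are invented for illustration, not taken from the lecture.

```python
# A minimal sketch of the "single neuron" analogy: weighted sum + threshold.
# Feature values, weights and the threshold are invented for illustration.

def neuron_fires(feature_values, weights, threshold=0.0):
    """Return True (positive example) if the weighted input sum exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, feature_values))
    return weighted_sum > threshold

print(neuron_fires([0.8, 0.3], weights=[1.5, -0.5]))  # True  -> the neuron "fires"
print(neuron_fires([0.1, 0.9], weights=[1.5, -0.5]))  # False -> it does not fire
```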
Elements of Machine Learning
1. Generalization:
– Generalize from specific examples
– Based on statistical inference
2. Data:
– Training data: specific examples to learn from
– Test data: (new) specific examples to assess performance
3. Models:
– Theoretical assumptions about the task/domain
– Parameters that can be inferred from data
4. Algorithms:
– Learning algorithm: infer model (parameters) from data
– Inference algorithm: infer predictions from model
Lecture 2: Basic Concepts 6
Types of Machine Learning
• Association
• Supervised Learning
– Classification
– Regression
• Unsupervised Learning
• Reinforcement Learning
Lecture 2: Basic Concepts 7
Learning Associations
• Basket analysis:
P(Y | X) = probability that somebody who buys X also buys Y, where X and Y are products/services
• Example: P(chips | beer) = 0.7
Lecture 2: Basic Concepts 8
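As an illustration, P(Y | X) can be estimated from transaction counts; the toy baskets below are made up, so the 0.75 they yield is unrelated to the 0.7 on the slide.

```python
# Estimating P(Y | X) from a toy list of market baskets (invented data).
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips"},
    {"beer", "chips"},
]

def conditional_probability(baskets, x, y):
    """P(y | x) = #(baskets containing both x and y) / #(baskets containing x)."""
    with_x = [b for b in baskets if x in b]
    with_both = [b for b in with_x if y in b]
    return len(with_both) / len(with_x) if with_x else 0.0

print(conditional_probability(baskets, "beer", "chips"))  # 0.75 on this toy data
```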
Classification
• Example: Credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
• Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
Lecture 2: Basic Concepts 9
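The discriminant above translates directly into code; the threshold values θ1 and θ2 below are hypothetical, in practice they would be learned from data.

```python
# The IF-THEN discriminant from the slide; theta1 and theta2 are assumed thresholds.
THETA1 = 30_000  # income threshold (hypothetical)
THETA2 = 10_000  # savings threshold (hypothetical)

def credit_risk(income, savings):
    """Classify a customer as 'low-risk' or 'high-risk' from income and savings."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(income=45_000, savings=20_000))  # low-risk
print(credit_risk(income=45_000, savings=2_000))   # high-risk
```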
Classification in NLP
• Binary classification:
– Spam filtering (spam vs. non-spam)
– Spelling error detection (error vs. no error)
• Multiclass classification:
– Text categorization (news, economy, culture, sport, ...)
– Named entity classification (person, location,
organization, ...)
• Structured prediction:
– Part-of-speech tagging (classes = tag sequences)
– Syntactic parsing (classes = parse trees)
Lecture 2: Basic Concepts 10
Regression
• Example: Price of a used car
• x: car attributes
y: price
• y = g(x | θ)
g(·): model, θ: parameters
• Linear model: y = wx + w0
Lecture 2: Basic Concepts 11
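As a sketch of how the linear model y = wx + w0 can be fitted, the snippet below estimates w and w0 by least squares on made-up (attribute, price) pairs; the data and the single attribute are assumptions for illustration.

```python
# Least-squares fit of y = w*x + w0 on made-up (car attribute, price) pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. age of the car in years (invented)
ys = [9.0, 7.8, 6.1, 4.9, 3.2]   # price in some currency unit (invented)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
w0 = mean_y - w * mean_x

print(f"g(x | w, w0) = {w:.2f} * x + {w0:.2f}")
print("predicted price for x = 3.5:", round(w * 3.5 + w0, 2))
```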
Uses of Supervised Learning
• Prediction of future cases:
– Use the rule to predict the output for future inputs
• Knowledge extraction:
– The rule is easy to understand
• Compression:
– The rule is simpler than the data it explains
• Outlier detection:
– Exceptions that are not covered by the rule, e.g., fraud
Lecture 2: Basic Concepts 12
Unsupervised Learning
• Finding regularities in data
• No mapping to outputs
• Clustering:
– Grouping similar instances
• Example applications:
– Customer segmentation in CRM
– Image compression: Color quantization
– NLP: Unsupervised text categorization
Lecture 2: Basic Concepts 13
Reinforcement Learning
• Learning a policy = sequence of
outputs/actions
• No supervised output but delayed reward
• Example applications:
– Game playing
– Robot in a maze
– NLP: Dialogue systems
Lecture 2: Basic Concepts 14
Supervised Classification
• Learning the class C of a “family car” from
examples
– Prediction: Is car x a family car?
– Knowledge extraction: What do people expect from a
family car?
• Output (labels):
Positive (+) and negative (–) examples
• Input representation (features):
x1: price, x2: engine power
Lecture 2: Basic Concepts 15
Training set X

X = { (x^t, r^t) }_{t=1}^{N}

r = 1 if x is positive, 0 if x is negative

x = (x1, x2)   (input feature vector)

Lecture 2: Basic Concepts 16
Hypothesis class H

h: (p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)

Lecture 2: Basic Concepts 17
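One hypothesis h from this class is fully specified by the four boundaries p1, p2, e1, e2; a small sketch with boundary values chosen arbitrarily for illustration:

```python
# One hypothesis from the class H of axis-aligned rectangles.
# The boundaries p1, p2, e1, e2 are assumed values, not learned ones.
P1, P2 = 10_000, 20_000   # price range
E1, E2 = 60, 120          # engine power range

def h(price, engine_power):
    """Return 1 (positive: family car) if x falls inside the rectangle, else 0."""
    return 1 if (P1 <= price <= P2) and (E1 <= engine_power <= E2) else 0

print(h(price=15_000, engine_power=90))   # 1 -> predicted family car
print(h(price=35_000, engine_power=200))  # 0 -> predicted not a family car
```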
Empirical (training) error

h(x) = 1 if h says x is positive, 0 if h says x is negative

Empirical error of h on X:

E(h | X) = Σ_{t=1}^{N} 1( h(x^t) ≠ r^t )

Lecture 2: Basic Concepts 18
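Computing E(h | X) is just counting disagreements between h and the labels; a sketch using an assumed rectangle hypothesis and a toy labelled set:

```python
# Empirical error E(h | X): number of training instances where h disagrees with the label r.

def h(price, engine_power):
    """An assumed rectangle hypothesis (boundaries chosen for illustration)."""
    return 1 if (10_000 <= price <= 20_000) and (60 <= engine_power <= 120) else 0

# Toy training set of (price, engine_power, r) triples with invented values.
X = [
    (15_000,  90, 1),
    (12_000, 100, 1),
    (25_000, 200, 0),
    (18_000,  70, 0),   # labelled negative although it falls inside the rectangle
]

E = sum(1 for price, power, r in X if h(price, power) != r)
print("E(h | X) =", E)  # 1 on this toy set
```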
S, G, and the Version Space
Lecture 2: Basic Concepts 19
most specific hypothesis, S
most general hypothesis, G
Any h ∈ H between S and G is consistent [E(h | X) = 0]; together these hypotheses make up the version space
Margin
• Margin: the distance between the boundary of h and the closest instances
• Choose the h with the largest margin
Lecture 2: Basic Concepts 20
Noise
Unwanted anomaly in data
• Imprecision in input attributes
• Errors in labeling data points
• Hidden attributes (relative to H)
Consequence:
• No h in H may be consistent!
Lecture 2: Basic Concepts 21
Noise and Model Complexity
Arguments for simpler model (Occam’s razor principle):
1. Easier to make predictions
2. Easier to train (fewer parameters)
3. Easier to understand
4. Generalizes better (if data is noisy)
Lecture 2: Basic Concepts 22
Inductive Bias
• Learning is an ill-posed problem
– Training data is never sufficient to find a unique
solution
– There are always infinitely many consistent
hypotheses
• We need an inductive bias:
– Assumptions that entail a unique h for a training set X
1. Hypothesis class H – axis-aligned rectangles
2. Learning algorithm – find consistent hypothesis with max-margin
3. Hyperparameters – trade-off between training error and
margin
Lecture 2: Basic Concepts 23
Model Selection and Generalization
• Generalization – how well a model performs
on new data
– Overfitting: H more complex than C
– Underfitting: H less complex than C
Lecture 2: Basic Concepts 24
Triple Trade-Off
• Trade-off between three factors:
1. Complexity of H, c(H)
2. Training set size N
3. Generalization error E on new data
• Dependencies:
– As N increases, E decreases
– As c(H) increases, E first decreases and then increases
Lecture 2: Basic Concepts 25
Model Selection and Generalization Error
• To estimate generalization error, we need data unseen during training: a validation set
V = { (x^t, r^t) }_{t=1}^{M}   (disjoint from the training set X)
Ê = E(h | V) = Σ_{t=1}^{M} 1( h(x^t) ≠ r^t )
• Given models (hypotheses) h1, ..., hk induced from the training set X, we can use E(hi | V) to select the model hi with the smallest generalization error
Lecture 2: Basic Concepts 26
Model Assessment
• To estimate the generalization error of the best
model hi, we need data unseen during training
and model selection
• Standard setup:
1. Training set X (50–80%)
2. Validation (development) set V (10–25%)
3. Test (publication) set T (10–25%)
• Note:
– Validation data can be added to training set before
testing
– Resampling methods can be used if data is limited
Lecture 2: Basic Concepts 27
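One common way to realise this setup is a random split; the sketch below uses a 60/20/20 split, which is one possible choice within the ranges above, on stand-in data.

```python
import random

# Random split into training / validation / test sets (60/20/20 here,
# one choice within the 50-80% / 10-25% / 10-25% ranges on the slide).
data = list(range(100))   # stand-in for 100 labelled instances
random.seed(0)            # fixed seed for reproducibility
random.shuffle(data)

n = len(data)
train = data[: int(0.6 * n)]
validation = data[int(0.6 * n): int(0.8 * n)]
test = data[int(0.8 * n):]

print(len(train), len(validation), len(test))  # 60 20 20
```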
Cross-Validation
V1 = X1,  T1 = X2 ∪ X3 ∪ ... ∪ XK
V2 = X2,  T2 = X1 ∪ X3 ∪ ... ∪ XK
...
VK = XK,  TK = X1 ∪ X2 ∪ ... ∪ XK-1

Lecture 2: Basic Concepts 28
• K-fold cross-validation: Divide X into X1, ..., XK
• Note:
– Generalization error estimated by means across K folds
– Training sets for different folds share K–2 parts
– Separate test set must be maintained for model
assessment
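A sketch of how the K folds yield K (training set, validation set) pairs; the toy data and K = 5 are arbitrary choices.

```python
# K-fold cross-validation: each part X_i serves once as validation set V_i,
# while the union of the remaining K-1 parts is the training set T_i.

def k_fold_splits(data, K):
    folds = [data[i::K] for i in range(K)]   # K roughly equal parts
    for i in range(K):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

data = list(range(10))
for i, (T, V) in enumerate(k_fold_splits(data, K=5), start=1):
    print(f"fold {i}: |T| = {len(T)}, V = {V}")
```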
Bootstrapping
• Generate new training sets of size N from X by random sampling with replacement
• Use the original training set as validation set (V = X)
• Probability that we do not pick a given instance after N draws:

(1 − 1/N)^N ≈ e^(−1) = 0.368

that is, only 36.8% of instances are new!
Lecture 2: Basic Concepts 29
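The 36.8% figure is easy to check empirically: draw N instances with replacement and count how many originals are never picked. N = 1000 and the seed below are arbitrary.

```python
import random

# Bootstrap sampling: draw N instances from X with replacement and measure
# the fraction of original instances that is never picked.
random.seed(42)
N = 1000
X = list(range(N))
bootstrap_sample = [random.choice(X) for _ in range(N)]

never_picked = len(set(X) - set(bootstrap_sample)) / N
print("simulated fraction never picked:", never_picked)       # roughly 0.37
print("analytic value (1 - 1/N)**N:   ", (1 - 1 / N) ** N)     # ~ e^-1 = 0.368
```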
Measuring Error
• Error rate = # of errors / # of instances = (FP+FN) / N
• Accuracy = # of correct / # of instances = (TP+TN) / N
• Recall = # of found positives / # of positives = TP / (TP+FN)
• Precision = # of found positives / # of found = TP / (TP+FP)
Lecture 2: Basic Concepts 30
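These four measures follow directly from the confusion-matrix counts; a small sketch on invented gold and predicted binary labels:

```python
# Error rate, accuracy, recall and precision from gold vs. predicted labels (invented).
gold      = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

TP = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
FP = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
FN = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
TN = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
N = len(gold)

print("error rate:", (FP + FN) / N)    # 0.25
print("accuracy:  ", (TP + TN) / N)    # 0.75
print("recall:    ", TP / (TP + FN))   # 0.75
print("precision: ", TP / (TP + FP))   # 0.75
```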
Statistical Inference
• Interval estimation to quantify the precision of our measurements, e.g. a 95% confidence interval:

m ± 1.96 σ/√N

• Hypothesis testing to assess whether differences between models are statistically significant, e.g. McNemar's test:

(|e01 − e10| − 1)^2 / (e01 + e10)  ~  χ²_1

Lecture 2: Basic Concepts 31
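As an illustration of both ideas, the sketch below computes a 95% confidence interval for a measured error rate and McNemar's statistic for two models; all numbers (m, s, N, e01, e10) are invented.

```python
import math

# 95% confidence interval for a mean error rate measured over N runs
# (sample mean m and sample standard deviation s are invented numbers).
m, s, N = 0.12, 0.03, 10
half_width = 1.96 * s / math.sqrt(N)
print(f"95% interval: [{m - half_width:.3f}, {m + half_width:.3f}]")

# McNemar's test: e01 = #instances misclassified by model 1 but not by model 2,
# e10 = the reverse (both counts invented). The statistic is compared with 3.84,
# the 0.05 critical value of the chi-square distribution with 1 degree of freedom.
e01, e10 = 20, 8
statistic = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
print("McNemar statistic:", round(statistic, 2), "-> significant:", statistic > 3.84)
```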
Supervised Learning – Summary
• Training data + learner → hypothesis
– Learner incorporates inductive bias
• Test data + hypothesis → estimated generalization
– Test data must be unseen
Lecture 2: Basic Concepts 32
Anatomy of a Supervised Learner
(Dimensions of a supervised machine learning algorithm)
• Model: g(x | θ)
• Loss function: E(θ | X) = Σ_t L( r^t, g(x^t | θ) )
• Optimization procedure: θ* = argmin_θ E(θ | X)
Lecture 2: Basic Concepts 33
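To show the three components side by side, here is a sketch for a one-dimensional linear model g(x | θ) = θ1·x + θ0 with squared loss, minimised by plain gradient descent; the toy data, learning rate and number of steps are all assumptions.

```python
# Model, loss function and optimization procedure made explicit for a
# one-dimensional linear model with squared loss (toy data, assumed settings).
xs = [1.0, 2.0, 3.0, 4.0]
rs = [2.1, 3.9, 6.2, 8.1]

def g(x, theta):                     # model: g(x | theta)
    return theta[1] * x + theta[0]

def E(theta):                        # loss: E(theta | X) = sum_t L(r^t, g(x^t | theta))
    return sum((r - g(x, theta)) ** 2 for x, r in zip(xs, rs))

def optimize(steps=2000, lr=0.01):   # optimization: theta* = argmin_theta E(theta | X)
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad0 = sum(-2 * (r - g(x, theta)) for x, r in zip(xs, rs))
        grad1 = sum(-2 * (r - g(x, theta)) * x for x, r in zip(xs, rs))
        theta = [theta[0] - lr * grad0, theta[1] - lr * grad1]
    return theta

theta_star = optimize()
print("theta* ~", [round(t, 2) for t in theta_star], " E(theta*) =", round(E(theta_star), 3))
```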
Supervised Classification: Extension
• Divide instances into (two or more) classes
– Instance (feature vector): x = (x1, ..., xm)
• Features may be categorical or numerical
– Class (label): y
– Training data: X = { (x^t, y^t) }_{t=1}^{N}
• Classification in Language Technology:
– Spam filtering (spam vs. non-spam)
– Spelling error detection (error vs. no error)
– Text categorization (news, economy, culture, sport, ...)
– Named entity classification (person, location, organization, ...)
Lec 2: Decision Trees - Nearest Neighbors 34
NLP: Classification (i)
NLP: Classification (ii)
NLP: Classification (iii)
Types of Classification (i)
Types of Classification (ii)
Reading
• Alpaydin (2010): chs. 1–2; 19
• Daumé III (2012): ch. 4, only sections 4.5–4.6
Lecture 2: Basic Concepts 40
End of Lecture 2
Lecture 2: Basic Concepts 41