Semi Supervised Learning
• Qiang Yang
– Adapted from…
• Thanks
– Zhi-Hua Zhou
– http://cs.nju.edu.cn/people/zhouzh/
– zhouzh@nju.edu.cn
– LAMDA Group,
– National Laboratory for Novel Software Technology, Nanjing University, China
Supervised learning
Supervised learning is a typical machine learning setting, where labeled examples are used as training examples.

Training data (the "Tenured" column is the label):

Name  Rank            Years  Tenured
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Training (decision trees, neural networks, support vector machines, etc.) produces a trained model.
Unseen data: (Jeff, Professor, 7, ?), label unknown. The trained model predicts ? = yes.
Labeled vs. Unlabeled
In many practical applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain because labeling the unlabeled examples requires human effort.
[Figure: one labeled web page (class = “war”) versus the (almost) infinite number of unlabeled web pages on the Internet]
Three main paradigms for Semi-supervised Learning:
• Transductive learning:
– Unlabeled examples are exactly the test examples
• Active learning:
– Assume that a user can continue to label data
– The learner actively selects some unlabeled examples to query from an oracle (assume the learner has some control over the input space)
• Multi-view Learning
– Unlabeled examples may be different from the test examples
– Regularization (minimize error and maximize smoothness)
– Multi-view Learning and Co-training
SSL: Why unlabeled data can be helpful?
Suppose the data is well-modeled by a mixture density:
$$f(x \mid \theta) = \sum_{l=1}^{L} \alpha_l f(x \mid \theta_l)$$
where $\sum_{l=1}^{L} \alpha_l = 1$ and $\theta = \{\theta_l\}$.
The class labels are viewed as random quantities and are assumed chosen conditioned on the selected mixture component $m_i \in \{1, 2, \ldots, L\}$ and possibly on the feature value, i.e. according to the probabilities $P[c_i \mid x_i, m_i]$.
Thus, the optimal classification rule for this model is the MAP rule:
$$S(x_i) = \arg\max_k \sum_j P[c_i = k \mid m_i = j, x_i] \, P[m_i = j \mid x_i]$$
where
$$P[m_i = j \mid x_i] = \frac{\alpha_j f(x_i \mid \theta_j)}{\sum_{l=1}^{L} \alpha_l f(x_i \mid \theta_l)}$$
Unlabeled examples can be used to help estimate this last term, since it does not involve the class labels.
[D.J. Miller & H.S. Uyar, NIPS’96]
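The MAP rule above can be illustrated numerically. The following is a minimal sketch, not the method of Miller & Uyar: it assumes a 1-D mixture of two Gaussians, and it assumes the label depends only on the mixture component (not on the feature value). All parameter values and function names (`gaussian_pdf`, `component_posterior`, `map_classify`) are made up for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as the component density f(x | theta_l)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def component_posterior(x, alphas, thetas):
    """P[m = j | x] = alpha_j f(x | theta_j) / sum_l alpha_l f(x | theta_l)."""
    weighted = [a * gaussian_pdf(x, mean, std) for a, (mean, std) in zip(alphas, thetas)]
    total = sum(weighted)
    return [w / total for w in weighted]

def map_classify(x, alphas, thetas, class_given_component):
    """S(x) = argmax_k sum_j P[c = k | m = j] P[m = j | x]."""
    post = component_posterior(x, alphas, thetas)
    n_classes = len(class_given_component[0])
    scores = [sum(class_given_component[j][k] * post[j] for j in range(len(post)))
              for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: scores[k])

# Hypothetical parameters: two components, two classes.
alphas = [0.5, 0.5]                    # mixing weights, sum to 1
thetas = [(-2.0, 1.0), (2.0, 1.0)]     # (mean, std) per component
class_given_component = [[0.9, 0.1],   # P[c = k | m = 0]
                         [0.2, 0.8]]   # P[c = k | m = 1]

label_left = map_classify(-2.0, alphas, thetas, class_given_component)
label_right = map_classify(2.0, alphas, thetas, class_given_component)
```

Note that `component_posterior` uses only the features, so in this sketch the mixture parameters could in principle be fit on unlabeled data; only the small table `class_given_component` needs labels.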
Transductive SVM
Transductive SVM: Taking into account a particular test
set and trying to minimize misclassifications of just those
particular examples
Figure reprinted from [T. Joachims, ICML99]
Concretely, using
unlabeled examples
to help identify the
maximum margin
hyperplanes
Active learning: Getting more from query
The labels of the training examples are obtained by
querying the oracle. Thus, for the same number of queries,
more helpful information can be obtained by actively
selecting some unlabeled examples to query
Key: To select the unlabeled examples on which the
labeling will convey the most helpful information
for the learner
Active Learning: Representative approaches
• Uncertainty sampling
Train a single learner and then query the unlabeled
instances on which the learner is the least confident
[Lewis & Gale, SIGIR’94]
• Committee-based sampling
Generate a committee of multiple learners and select the
unlabeled examples on which the committee members
disagree the most [Abe & Mamitsuka, ICML’98; Seung et al., COLT’92]
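The two selection strategies can be sketched in a few lines. This is an illustrative sketch only: it assumes the learner exposes class probabilities and the committee exposes one vote per member, and it uses vote entropy as one possible measure of disagreement (the cited papers may use different measures). The function names and toy predictions are hypothetical.

```python
import math

def least_confident(probabilities):
    """Uncertainty sampling: index of the instance whose top class probability is lowest."""
    return min(range(len(probabilities)), key=lambda i: max(probabilities[i]))

def vote_entropy(votes):
    """Entropy of the label votes cast by the committee members for one instance."""
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def most_disagreed(committee_votes):
    """Committee-based sampling: index of the instance with the highest vote entropy."""
    return max(range(len(committee_votes)), key=lambda i: vote_entropy(committee_votes[i]))

# Hypothetical predictions for three unlabeled instances.
probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
votes = [["+", "+", "+"], ["+", "-", "+"], ["-", "-", "-"]]

query_uncertainty = least_confident(probs)   # instance 1 has the lowest top probability
query_committee = most_disagreed(votes)      # instance 1 is the only one with split votes
```

In both cases the selected index is the instance that would be sent to the oracle for labeling.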
Active Learning Application: Image retrieval
To retrieve images from a (usually large) image database according to user interest; very useful in digital libraries, digital photo albums, etc.
[Figure: a user asks “Where are my photos taken at Guilin?” through a text interface to the database]
Active Learning: Text-based image retrieval
Text-based Retrieval Engine
− Every image is associated with a text annotation
− User poses a keyword
− The system retrieves images by matching the keyword with annotations
[Figure: the query “tiger” returns images annotated “tiger lily” and “white tiger”]
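Keyword matching against annotations can be sketched as follows. This is a deliberately naive illustration, assuming whole-word matching on lowercase text; the image ids and the `retrieve` function are hypothetical.

```python
def retrieve(keyword, annotations):
    """Return ids of images whose annotation contains the keyword as a whole word."""
    keyword = keyword.lower()
    return [image_id for image_id, text in annotations.items()
            if keyword in text.lower().split()]

# Hypothetical annotated database; the image ids are made up.
annotations = {
    "img1": "tiger lily",
    "img2": "white tiger",
    "img3": "Guilin river",
}

hits = retrieve("tiger", annotations)
```

Both "tiger lily" and "white tiger" match the keyword even though they describe different things, which is the kind of ambiguity that pure text matching cannot resolve.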
Co-training
In some applications, there are two sufficient and redundant views, i.e. two attribute sets, each of which is sufficient for learning and conditionally independent of the other given the class label.
e.g. two views for web page classification: 1) the text appearing on the page itself, and 2) the anchor text attached to hyperlinks pointing to this page from other pages
[Figure: learner1 (X1 view) and learner2 (X2 view) are trained on the labeled training examples; each labels some of the unlabeled training examples, which are then added to the training examples]
[A. Blum & T. Mitchell, COLT98]
Co-training (cont’d)
• Theoretical analysis [Blum & Mitchell, COLT’98; Dasgupta, NIPS’01; Balcan et al., NIPS’04; etc.]
• Experimental studies [Nigam & Ghani, CIKM’00]
• New algorithms
– Co-training without two views [Goldman & Zhou, ICML’00; Zhou & Li, TKDE’05]
– Semi-supervised regression [Zhou & Li, IJCAI’05]
• Applications
– Statistical parsing [Sarkar, NAACL01; Steedman et al., EACL03; R. Hwa et al., ICML03w]
– Noun phrase identification [Pierce & Cardie, EMNLP01]
– Image retrieval [Zhou et al., ECML’04; Zhou et al., TOIS06]
Multi-view Learning and Co-training
• Multi-view learning describes the setting of
learning from data where observations are
represented by multiple independent sets of
features.
An example of two views:
• Features can be split into two sets:
– The instance space: $X = X_1 \times X_2$
– Each instance: $x = (x_1, x_2)$
Inductive vs. Transductive
• Transductive: Produces labels only for the available unlabeled data.
– The output of the method is not a classifier.
• Inductive: Not only produces labels for the unlabeled data, but also produces a classifier.
An Example of two views
• Web-page classification: e.g.,
find homepages of faculty members.
– Page text: words occurring on that page:
e.g., “research interest”, “teaching”
– Hyperlink text: words occurring in hyperlinks
that point to that page:
e.g., “my advisor”
Another Example
X1 : job title
X2: job description
Classifying Jobs for FlipDog
Two Views
• $C_1$: the set of target functions over $X_1$.
• $C_2$: the set of target functions over $X_2$.
• $C$: the set of target functions over $X = X_1 \times X_2$.
• Instead of learning $f$ from $C$, multi-view learning aims to learn a pair of functions $(f_1, f_2)$ from $C_1 \times C_2$, such that $f(x) = f_1(x_1) = f_2(x_2)$.
Co-training
• Proposed by Blum and Mitchell (1998)
• Combines multi-view learning and semi-supervised learning.
• Related work:
– (Yarowsky 1995)
– (Nigam and Ghani, 2000)
– (Goldman and Zhou, 2000)
– (Abney, 2002)
– (Sarkar, 2002)
– …
• Used in document classification, parsing, etc.
The Yarowsky Algorithm (Yarowsky 1995)
Iteration 0: a classifier is trained by SL (supervised learning) on the labeled (+ / -) examples
– Choose instances labeled with high confidence
Iteration 1: add them to the pool of current labeled training data, and train again
Iteration 2: ……
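The iteration above is a self-training loop, which can be sketched as follows. This is not Yarowsky's word-sense algorithm itself: it swaps in a toy 1-D nearest-centroid classifier, and the confidence measure, the `threshold` parameter, and all function names are assumptions made for illustration. It also assumes the labeled pool contains at least two classes.

```python
def fit_centroids(labeled):
    """Train a toy 1-D nearest-centroid classifier: one mean per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is the gap between the two nearest centroids."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    return best, abs(x - centroids[runner_up]) - abs(x - centroids[best])

def self_train(labeled, unlabeled, threshold, max_iterations=10):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_iterations):
        centroids = fit_centroids(labeled)        # classifier trained on the current pool
        confident, remaining = [], []
        for x in unlabeled:
            label, conf = predict_with_confidence(centroids, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:
            break                                 # nothing is confident enough; stop
        labeled.extend(confident)                 # add self-labeled instances to the pool
        unlabeled = [x for x, _ in remaining]
    return labeled, unlabeled

# Hypothetical data: one labeled example per class, five unlabeled points.
seed = [(-3.0, "-"), (3.0, "+")]
pool = [-2.5, -1.0, 0.2, 1.5, 2.8]
final_labeled, final_unlabeled = self_train(seed, pool, threshold=1.0)
```

With these values, four of the five points are absorbed in the first iteration, and the point near the decision boundary stays unlabeled because its confidence never reaches the threshold.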
Co-training Assumption 1: compatibility
• The instance distribution $D$ is compatible with the target function $f = (f_1, f_2)$ if for any $x = (x_1, x_2)$ with non-zero probability, $f(x) = f_1(x_1) = f_2(x_2)$.
• Definition: compatibility of $f$ with $D$:
$$p = 1 - \Pr_D[(x_1, x_2) : f_1(x_1) \neq f_2(x_2)] > 0$$
→ Each set of features is sufficient for classification
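The compatibility value $p$ can be estimated on a finite sample as one minus the fraction of instances on which the two view functions disagree. The sketch below assumes a small hand-made sample in the style of the web-page example; the sample and the two rule-based functions `f1` and `f2` are hypothetical.

```python
def compatibility(sample, f1, f2):
    """Empirical estimate of p = 1 - Pr_D[(x1, x2) : f1(x1) != f2(x2)]."""
    disagreements = sum(1 for x1, x2 in sample if f1(x1) != f2(x2))
    return 1.0 - disagreements / len(sample)

# Hypothetical two-view sample: x1 = page text, x2 = anchor text of inbound links.
sample = [
    ("research interest teaching", "my advisor"),
    ("teaching schedule", "my advisor"),
    ("online shop", "buy here"),
    ("research interest", "buy here"),   # the two views disagree on this page
]

def f1(page_text):
    """Toy view-1 rule: faculty homepage if the page mentions research or teaching."""
    return "research" in page_text or "teaching" in page_text

def f2(anchor_text):
    """Toy view-2 rule: faculty homepage if an inbound link says 'advisor'."""
    return "advisor" in anchor_text

p = compatibility(sample, f1, f2)
```

A value of $p = 1$ on the full distribution corresponds to the fully compatible case, where every instance with non-zero probability is labeled identically by both views.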
Co-training Assumption 2: conditional independence
• Definition: A pair of views $(x_1, x_2)$ satisfies view independence when:
$$P(X_1 = x_1 \mid X_2 = x_2, Y = y) = P(X_1 = x_1 \mid Y = y)$$
$$P(X_2 = x_2 \mid X_1 = x_1, Y = y) = P(X_2 = x_2 \mid Y = y)$$
• A classification problem instance satisfies view independence when all pairs $(x_1, x_2)$ satisfy view independence.
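For a small discrete problem, view independence can be checked directly from a joint probability table. The sketch below assumes the joint distribution is given as a dictionary keyed by `(x1, x2, y)`; both example tables are made up. It tests only the first equality, since the second follows from it by the symmetry of conditional independence wherever the conditionals are defined.

```python
from itertools import product

def satisfies_view_independence(joint, tol=1e-9):
    """Check P(X1=x1 | X2=x2, Y=y) == P(X1=x1 | Y=y) wherever P(x2, y) > 0."""
    x1s = {k[0] for k in joint}
    x2s = {k[1] for k in joint}
    ys = {k[2] for k in joint}

    def prob(x1=None, x2=None, y=None):
        """Marginal probability of the given values; None means 'summed out'."""
        return sum(p for (a, b, c), p in joint.items()
                   if (x1 is None or a == x1)
                   and (x2 is None or b == x2)
                   and (y is None or c == y))

    for x1, x2, y in product(x1s, x2s, ys):
        if prob(x2=x2, y=y) == 0:
            continue  # the conditional probability is undefined here
        lhs = prob(x1, x2, y) / prob(x2=x2, y=y)     # P(x1 | x2, y)
        rhs = prob(x1=x1, y=y) / prob(y=y)           # P(x1 | y)
        if abs(lhs - rhs) > tol:
            return False
    return True

# Hypothetical joint built as P(y) P(x1 | y) P(x2 | y): independent by construction.
p_y = {0: 0.5, 1: 0.5}
p_x1 = {0: {"a": 0.8, "b": 0.2}, 1: {"a": 0.3, "b": 0.7}}
p_x2 = {0: {"u": 0.6, "v": 0.4}, 1: {"u": 0.1, "v": 0.9}}
independent = {(x1, x2, y): p_y[y] * p_x1[y][x1] * p_x2[y][x2]
               for y in p_y for x1 in p_x1[y] for x2 in p_x2[y]}

# Hypothetical joint where x2 is determined by x1 within each class: not independent.
dependent = {("a", "u", 0): 0.25, ("b", "v", 0): 0.25,
             ("a", "v", 1): 0.25, ("b", "u", 1): 0.25}

independent_ok = satisfies_view_independence(independent)
dependent_ok = satisfies_view_independence(dependent)
```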
Co-training Algorithm
Co-Training
• Instances contain two sufficient sets of features
– i.e. an instance is x=(x1,x2)
– Each set of features is called a View
• Two views are independent given the label: $P(x_1 \mid x_2, y) = P(x_1 \mid y)$ and $P(x_2 \mid x_1, y) = P(x_2 \mid y)$
• Two views are consistent: $f_1(x_1) = f_2(x_2)$
(Blum and Mitchell 1998)
Co-Training
Iteration t:
– C1: a classifier trained on view 1; C2: a classifier trained on view 2
– Allow C1 to label some instances
– Allow C2 to label some instances
– Add self-labeled instances to the pool of training data
Iteration t+1: ……
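The loop above can be sketched in code. This is a simplified illustration rather than the exact procedure of Blum and Mitchell: each view is a single number, each view's classifier is a toy nearest-centroid model, and each classifier labels only its `per_round` most confident unlabeled instances per iteration. All names, parameters, and data are hypothetical.

```python
def fit_centroids(values, labels):
    """Toy 1-D nearest-centroid classifier for one view: one mean per class."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    return {y: sum(vs) / len(vs) for y, vs in groups.items()}

def predict_with_confidence(centroids, v):
    """Return (label, confidence); confidence is the gap between the two nearest centroids."""
    ranked = sorted(centroids, key=lambda y: abs(v - centroids[y]))
    return ranked[0], abs(v - centroids[ranked[1]]) - abs(v - centroids[ranked[0]])

def co_train(labeled, unlabeled, per_round=1, rounds=10):
    """labeled: list of ((x1, x2), y); unlabeled: list of (x1, x2)."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        labels = [y for _, y in labeled]
        newly_labeled = {}
        for view in (0, 1):                       # C1 uses view 1, C2 uses view 2
            clf = fit_centroids([x[view] for x, _ in labeled], labels)
            ranked = sorted(unlabeled,
                            key=lambda x: predict_with_confidence(clf, x[view])[1],
                            reverse=True)
            for x in ranked[:per_round]:          # let this classifier label some instances
                newly_labeled.setdefault(x, predict_with_confidence(clf, x[view])[0])
        labeled.extend(newly_labeled.items())     # add self-labeled instances to the pool
        unlabeled = [x for x in unlabeled if x not in newly_labeled]
    return labeled, unlabeled

# Hypothetical data: one labeled instance per class, four unlabeled two-view instances.
seed = [((-3.0, -2.0), "-"), ((3.0, 2.0), "+")]
pool = [(-2.0, -0.1), (0.1, 1.8), (2.5, 0.2), (-0.2, -1.5)]
final_labeled, final_unlabeled = co_train(seed, pool)
```

In this toy run, the instance `(-0.2, -1.5)` is ambiguous in view 1 but clear in view 2, so it is the view-2 classifier that labels it. That is the intended benefit of having two views: each classifier can supply confident labels where the other is unsure.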
Agreement Maximization
• A side effect of the Co-Training: Agreement between
two views.
• Is it possible to pose agreement as the explicit goal?
– Yes. The resulting algorithm: Agreement Boost
(Leskes 2005)
What if the Co-training Assumption Is Not Perfectly Satisfied?
• Idea: Want classifiers that produce a maximally
consistent labeling of the data
• If learning is an optimization problem, what
function should we optimize?
Other Related Works
• Multi-view clustering (Bickel & Scheffer 2004)
Modified the co-training algorithm by replacing the class
variable (class label) with a mixture coefficient to obtain
a multi-view clustering algorithm.
• Manifold co-regularization (Sindhwani et al., 2005)
Extended Manifold regularization to multi-view learning.
• Active multi-view learning (Muslea 2002)
Combine active learning and multi-view learning.
• More related work can be found in the workshop on Multi-view learning at ICML 2005:
http://www-ai.cs.uni-dortmund.de/MULTIVIEW2005/index.html
References
• A. Blum and T. Mitchell. Combining Labeled and Unlabeled Data with Co-Training. In Proceedings of COLT, 1998.
• D. Yarowsky. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of ACL, 1995.
• K. Nigam and R. Ghani. Analyzing the Effectiveness and Applicability of Co-training. In Proceedings of CIKM, 2000.
• S. Abney. Bootstrapping. In Proceedings of ACL, 2002.
• U. Brefeld and T. Scheffer. Co-EM Support Vector Learning. In Proceedings of ICML, 2004.
• S. Bickel and T. Scheffer. Multi-View Clustering. In Proceedings of ICDM, 2004.
• V. Sindhwani, P. Niyogi, and M. Belkin. A Co-Regularization Approach to Semi-supervised Learning with Multiple Views. In Workshop on Learning with Multiple Views at ICML, 2005.
• I. Muslea. Active Learning with Multiple Views. PhD thesis, University of Southern California, 2002.

 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Semi-supervised Learning

  • 1. Semi Supervised Learning • Qiang Yang – Adapted from… • Thanks – Zhi-Hua Zhou – http://cs.nju.edu.cn/people/zhouzh/ – zhouzh@nju.edu.cn – LAMDA Group, National Laboratory for Novel Software Technology, Nanjing University, China
  • 2. Supervised learning: Supervised learning is a typical machine learning setting, where labeled examples are used as training examples. Learners: decision trees, neural networks, support vector machines, etc. Training data (Name, Rank, Years, Tenured), where Tenured is the label: (Mike, Assistant Prof, 3, no); (Mary, Assistant Prof, 7, yes); (Bill, Professor, 2, yes); (Jim, Associate Prof, 7, yes); (Dave, Assistant Prof, 6, no); (Anne, Associate Prof, 3, no). The trained model is then applied to unseen data whose label is unknown: (Jeff, Professor, 7, ?) → ? = yes.
  • 3. Labeled vs. Unlabeled: In many practical applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain, because labeling the unlabeled examples requires human effort. (Figure: a page labeled class = “war” next to the (almost) infinite number of unlabeled web pages on the Internet.)
  • 4. Three main paradigms for Semi-supervised Learning: • Transductive learning: unlabeled examples are exactly the test examples. • Active learning: assume that a user can continue to label data; the learner actively selects some unlabeled examples to query from an oracle (assume the learner has some control over the input space). • Multi-view learning: unlabeled examples may be different from the test examples; regularization (minimize error and maximize smoothness); multi-view learning and co-training.
  • 5. SSL: Why can unlabeled data be helpful? Suppose the data is well modeled by a mixture density $f(x \mid \theta) = \sum_{l=1}^{L} \alpha_l f(x \mid \theta_l)$, where $\sum_{l=1}^{L} \alpha_l = 1$ and $\theta = \{\theta_l\}$. The class labels are viewed as random quantities and are assumed chosen conditioned on the selected mixture component $m_i \in \{1, 2, \ldots, L\}$ and possibly on the feature value, i.e. according to the probabilities $P[c_i \mid x_i, m_i]$. Thus, the optimal classification rule for this model is the MAP rule $S(x) = \arg\max_k \sum_j P[c_i = k \mid m_i = j, x_i] \, P[m_i = j \mid x_i]$, where $P[m_i = j \mid x_i] = \frac{\alpha_j f(x_i \mid \theta_j)}{\sum_{l=1}^{L} \alpha_l f(x_i \mid \theta_l)}$. Unlabeled examples can be used to help estimate the term $P[m_i = j \mid x_i]$ [D.J. Miller & H.S. Uyar, NIPS’96].
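The MAP rule on slide 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component densities are taken to be 1-D Gaussians, the class probabilities are assumed not to depend on x, and the names `gaussian_pdf`, `component_posteriors`, `map_rule` and all parameter values are hypothetical rather than taken from Miller & Uyar.

```python
import math

def gaussian_pdf(x, mean, std):
    """1-D Gaussian density, standing in for the component density f(x | theta_l)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def component_posteriors(x, alphas, thetas):
    """P[m = j | x] = alpha_j f(x | theta_j) / sum_l alpha_l f(x | theta_l)."""
    weighted = [a * gaussian_pdf(x, mean, std) for a, (mean, std) in zip(alphas, thetas)]
    total = sum(weighted)
    return [w / total for w in weighted]

def map_rule(x, alphas, thetas, class_given_component):
    """S(x) = argmax_k sum_j P[c = k | m = j] P[m = j | x].

    class_given_component[j][k] is P[c = k | m = j]; this sketch drops the
    optional dependence of that term on x.
    """
    post = component_posteriors(x, alphas, thetas)
    n_classes = len(class_given_component[0])
    scores = [
        sum(class_given_component[j][k] * post[j] for j in range(len(post)))
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=lambda k: scores[k])

# Hypothetical two-component mixture. alphas and thetas need no labels, so they
# could be fitted on unlabeled data; only class_given_component needs labels.
alphas = [0.5, 0.5]
thetas = [(-2.0, 1.0), (2.0, 1.0)]          # (mean, std) per component
class_given_component = [[0.9, 0.1], [0.2, 0.8]]
```

Calling `map_rule(-2.0, alphas, thetas, class_given_component)` picks class 0 and `map_rule(2.0, ...)` picks class 1, because each point is assigned almost entirely to one component, and that assignment is the part unlabeled data helps to estimate.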
  • 6. Transductive SVM: Taking into account a particular test set and trying to minimize misclassifications of just those particular examples. Concretely, using unlabeled examples to help identify the maximum margin hyperplanes. Figure reprinted from [T. Joachims, ICML99].
  • 7. Active learning: Getting more from query. The labels of the training examples are obtained by querying the oracle. Thus, for the same number of queries, more helpful information can be obtained by actively selecting some unlabeled examples to query. Key: to select the unlabeled examples on which the labeling will convey the most helpful information for the learner.
  • 8. Active Learning: Representative approaches • Uncertainty sampling: train a single learner and then query the unlabeled instances on which the learner is the least confident [Lewis & Gale, SIGIR’94] • Committee-based sampling: generate a committee of multiple learners and select the unlabeled examples on which the committee members disagree the most [Abe & Mamitsuka, ICML’98; Seung et al., COLT’92]
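Both selection strategies on slide 8 reduce to ranking the unlabeled pool by a score. The sketch below is a minimal illustration, not the cited authors' procedures: confidence is taken as the probability of the top class, disagreement is measured by vote entropy (one common choice, assumed here), and all function names and data are hypothetical.

```python
import math

def least_confident(probabilities):
    """Uncertainty sampling: index of the instance whose most likely class
    has the lowest probability. probabilities[i] is a class distribution."""
    return min(range(len(probabilities)), key=lambda i: max(probabilities[i]))

def vote_entropy(votes):
    """Entropy of the labels that committee members assign to one instance."""
    n = len(votes)
    entropy = 0.0
    for label in set(votes):
        p = votes.count(label) / n
        entropy -= p * math.log(p)
    return entropy

def most_disagreed(committee_votes):
    """Committee-based sampling: index of the instance with the highest vote
    entropy. committee_votes[i] lists the members' labels for instance i."""
    return max(range(len(committee_votes)), key=lambda i: vote_entropy(committee_votes[i]))

# Hypothetical pool of three unlabeled instances.
probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]                    # single learner
votes = [["+", "+", "+"], ["+", "-", "+"], ["-", "-", "-"]]           # committee of three
```

Here both `least_confident(probs)` and `most_disagreed(votes)` select the second instance (index 1), which would then be sent to the oracle for labeling.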
  • 9. Active Learning Application: Image retrieval. To retrieve images from a (usually large) image database according to user interest; very useful in digital libraries, digital photo albums, etc. Example query: “Where are my photos taken at Guilin?”
  • 10. Active Learning: Text-based image retrieval. Components: text interface, text-based retrieval engine, database. − Every image is associated with a text annotation − User poses a keyword − The system retrieves images by matching the keyword with annotations. Example: the query “tiger” matches images annotated “tiger lily” and “white tiger”.
  • 11. Co-training: In some applications, there are two sufficient and redundant views, i.e. two attribute sets, each of which is sufficient for learning and conditionally independent of the other given the class label. E.g. two views for web page classification: 1) the text appearing on the page itself, and 2) the anchor text attached to hyperlinks pointing to this page from other pages.
  • 12. Co-training (con’t). (Figure: learner1 on the X1 view and learner2 on the X2 view are trained on the labeled training examples; each labels unlabeled training examples, and the labeled unlabeled examples are exchanged between the learners.) [A. Blum & T. Mitchell, COLT98]
  • 13. Co-training (con’t) • Theoretical analysis [Blum & Mitchell, COLT’98; Dasgupta, NIPS’01; Balcan et al., NIPS’04; etc.] • Experimental studies [Nigam & Ghani, CIKM’00] • New algorithms: – Co-training without two views [Goldman & Zhou, ICML’00; Zhou & Li, TKDE’05] – Semi-supervised regression [Zhou & Li, IJCAI’05] • Applications: – Statistical parsing [Sarkar, NAACL01; Steedman et al., EACL03; R. Hwa et al., ICML03w] – Noun phrase identification [Pierce & Cardie, EMNLP01] – Image retrieval [Zhou et al., ECML’04; Zhou et al., TOIS06]
  • 14. Multi-view Learning and Co-training • Multi-view learning describes the setting of learning from data where observations are represented by multiple independent sets of features. An example of two views: • Features can be split into two sets: – The instance space: $X = X_1 \times X_2$ – Each instance: $x = (x_1, x_2)$
  • 15. Inductive vs. Transductive • Transductive: Produce labels only for the available unlabeled data. – The output of the method is not a classifier. • Inductive: Not only produce labels for the unlabeled data, but also produce a classifier.
  • 16. An Example of two views • Web-page classification: e.g., find homepages of faculty members. – Page text: words occurring on that page: e.g., “research interest”, “teaching” – Hyperlink text: words occurring in hyperlinks that point to that page: e.g., “my advisor”
  • 17. Another Example: Classifying Jobs for FlipDog. X1: job title; X2: job description.
  • 18. Two Views • $C_1$: the set of target functions over $X_1$. • $C_2$: the set of target functions over $X_2$. • $C$: the set of target functions over $X = X_1 \times X_2$. • Instead of learning $f$ from $C$, multi-view learning aims to learn a pair of functions $(f_1, f_2)$ from $C_1 \times C_2$, such that $f(x) = f_1(x_1) = f_2(x_2)$.
  • 19. Co-training • Proposed by (Blum and Mitchell 1998). Combines multi-view learning & semi-supervised learning. • Related work: – (Yarowsky 1995) – (Nigam and Ghani, 2000) – (Goldman and Zhou, 2000) – (Abney, 2002) – (Sarkar, 2002) – … • Used in document classification, parsing, etc.
  • 20. The Yarowsky Algorithm (Yarowsky 1995). Iteration 0: a classifier trained by SL separates + from -. Choose instances labeled with high confidence. Iteration 1: add them to the pool of current labeled training data. Iteration 2: ……
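The loop on slide 20 (train, pick confidently labeled instances, add them to the labeled pool, repeat) can be sketched as generic self-training. This is a toy illustration and not Yarowsky's original word-sense algorithm: the base learner is a 1-D nearest-centroid classifier, confidence is a distance margin, and the names, threshold and data are all assumptions.

```python
def train_centroids(labeled):
    """Fit a toy classifier: one centroid (mean) per class from (x, label) pairs."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_with_confidence(centroids, x):
    """Return (label, confidence). Confidence is the distance margin between the
    nearest and second-nearest centroid (assumes at least two classes)."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[ranked[0]])
    return ranked[0], margin

def self_train(labeled, unlabeled, threshold=1.0, max_iterations=10):
    """Repeatedly retrain and move confidently labeled instances into the pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_iterations):
        centroids = train_centroids(labeled)
        confident, remaining = [], []
        for x in unlabeled:
            label, confidence = predict_with_confidence(centroids, x)
            if confidence >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:          # nothing new to add: stop early
            break
        labeled.extend(confident)
        unlabeled = remaining
    return train_centroids(labeled), labeled, unlabeled

# Hypothetical data: two labeled seeds and five unlabeled points.
seeds = [(0.0, "-"), (4.0, "+")]
pool = [0.5, 1.0, 3.0, 3.5, 2.1]
model, final_labeled, still_unlabeled = self_train(seeds, pool)
```

In this run the four points far from the boundary are absorbed in the first iteration, while 2.1 stays unlabeled because its margin never reaches the threshold. The threshold controls the usual trade-off: a lower value labels more data but risks reinforcing early mistakes.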
  • 21. Co-training Assumption 1: compatibility • The instance distribution $D$ is compatible with the target function $f = (f_1, f_2)$ if for any $x = (x_1, x_2)$ with non-zero probability, $f(x) = f_1(x_1) = f_2(x_2)$. • Definition: compatibility of $f$ with $D$: $p = 1 - \Pr_D[(x_1, x_2) : f_1(x_1) \neq f_2(x_2)] > 0$ • Each set of features is sufficient for classification.
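The compatibility on slide 21 is one minus the probability that the two view functions disagree, so it can be estimated from a sample of view pairs. The sketch below is a hedged illustration: `empirical_compatibility` is a hypothetical helper, and the two keyword classifiers and the job postings (loosely echoing the job title / job description example) are made up.

```python
def empirical_compatibility(f1, f2, sample):
    """Estimate p = 1 - Pr_D[f1(x1) != f2(x2)] from (x1, x2) pairs drawn from D."""
    if not sample:
        raise ValueError("sample must contain at least one (x1, x2) pair")
    disagreements = sum(1 for x1, x2 in sample if f1(x1) != f2(x2))
    return 1.0 - disagreements / len(sample)

def title_view(title):
    """Hypothetical f1: decides from the job title alone."""
    return "engineer" in title

def description_view(description):
    """Hypothetical f2: decides from the job description alone."""
    return "software" in description

# Made-up (job title, job description) pairs.
sample = [
    ("software engineer", "develop software services"),
    ("sales manager", "lead the regional sales team"),
    ("test engineer", "write software tests"),
    ("data engineer", "maintain reporting dashboards"),
]
compatibility = empirical_compatibility(title_view, description_view, sample)
```

The two views disagree only on the last pair, so the estimate is 0.75. Note that this needs no labels at all, which is why agreement between views can be measured on unlabeled data.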
  • 22. Co-training Assumption 2: conditional independence • Definition: A pair of views $(x_1, x_2)$ satisfies view independence when: $P(X_1 = x_1 \mid X_2 = x_2, Y = y) = P(X_1 = x_1 \mid Y = y)$ and $P(X_2 = x_2 \mid X_1 = x_1, Y = y) = P(X_2 = x_2 \mid Y = y)$. • A classification problem instance satisfies view independence when all pairs $(x_1, x_2)$ satisfy view independence.
  • 24. Co-Training (Blum and Mitchell 1998) • Instances contain two sufficient sets of features – i.e. an instance is x=(x1,x2) – Each set of features is called a View • Two views are independent given the label. • Two views are consistent. (Figure: an instance x split into the views x1 and x2.)
  • 25. Co-Training. Iteration t → Iteration t+1 → …… C1: a classifier trained on view 1. C2: a classifier trained on view 2. Allow C1 to label some instances; allow C2 to label some instances. Add the self-labeled instances to the pool of training data.
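The iteration on slide 25 can be sketched as follows. This is a simplified illustration and not the exact procedure of Blum and Mitchell: each view gets a toy nearest-centroid classifier over a single numeric feature, each classifier labels only its single most confident instance per round, and the function names and data are hypothetical.

```python
def fit_view(labeled, view):
    """Train a toy per-view classifier: one centroid per class on feature `view`."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x[view]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Return (label, confidence); confidence is the margin between the two
    nearest centroids (assumes at least two classes)."""
    ranked = sorted(centroids, key=lambda label: abs(value - centroids[label]))
    margin = abs(value - centroids[ranked[1]]) - abs(value - centroids[ranked[0]])
    return ranked[0], margin

def co_train(labeled, unlabeled, per_view=1, rounds=10):
    """labeled: list of ((x1, x2), label); unlabeled: list of (x1, x2) tuples."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        newly_labeled = {}
        for view in (0, 1):
            # C1 (view 0) and C2 (view 1) are both trained on the current pool.
            centroids = fit_view(labeled, view)
            ranked = sorted(unlabeled,
                            key=lambda x: predict(centroids, x[view])[1],
                            reverse=True)
            for x in ranked[:per_view]:
                # If both views pick the same instance, view 0's label is kept.
                newly_labeled.setdefault(x, predict(centroids, x[view])[0])
        # Add the self-labeled instances to the shared pool of training data.
        for x, label in newly_labeled.items():
            labeled.append((x, label))
            unlabeled.remove(x)
    return fit_view(labeled, 0), fit_view(labeled, 1), labeled

# Hypothetical data: each instance is clear-cut in one view and ambiguous in the other.
seeds = [((0.0, 0.0), "-"), ((8.0, 8.0), "+")]
pool = [(1.0, 3.5), (7.5, 5.0), (3.5, 0.5), (4.5, 7.0)]
c1, c2, final_labeled = co_train(seeds, pool)
```

On this data every pool instance ends up labeled after two rounds, and each instance is labeled by the view in which it is far from the boundary. That is the intended benefit of the scheme: an example that is easy in one view becomes a training example for the classifier on the other view.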
  • 26. Agreement Maximization • A side effect of the Co-Training: Agreement between two views. • Is it possible to pose agreement as the explicit goal? – Yes. The resulting algorithm: Agreement Boost (Leskes 2005)
  • 27. What if the Co-training Assumption Is Not Perfectly Satisfied? • Idea: Want classifiers that produce a maximally consistent labeling of the data • If learning is an optimization problem, what function should we optimize?
  • 28. Other Related Works • Multi-view clustering (Bickel & Scheffer 2004): modified the co-training algorithm by replacing the class variable (class label) with a mixture coefficient to obtain a multi-view clustering algorithm. • Manifold co-regularization (Sindhwani et al., 2005): extended manifold regularization to multi-view learning. • Active multi-view learning (Muslea 2002): combines active learning and multi-view learning. • More related works can be found in the workshop on Multi-view learning at ICML 2005: http://www-ai.cs.uni-dortmund.de/MULTIVIEW2005/index.html
  • 29. References • A. Blum and T. Mitchell, 1998. Combining Labeled and Unlabeled Data with Co-Training. In Proceedings of COLT 1998. • D. Yarowsky, 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of ACL 1995. • K. Nigam and R. Ghani, 2000. Analyzing the effectiveness and applicability of co-training. In Proceedings of CIKM 2000. • Steven Abney, 2002. Bootstrapping. In Proceedings of ACL 2002. • Ulf Brefeld and Tobias Scheffer, 2004. Co-EM support vector learning. In Proceedings of ICML 2004. • Steffen Bickel and Tobias Scheffer, 2004. Multi-view clustering. In Proceedings of ICDM 2004. • V. Sindhwani, P. Niyogi, and M. Belkin, 2005. A Co-Regularization Approach to Semi-supervised Learning with Multiple Views. In Workshop on Learning with Multiple Views at ICML 2005. • Ion Muslea, 2002. Active learning with multiple views. PhD thesis, University of Southern California.