Maximizing the Utility of Small Training Sets in Machine Learning. Raymond J. Mooney, Department of Computer Sciences, University of Texas at Austin
Computational Linguistics and Machine Learning
Demand for Annotated Corpora
Treebanks
Learning from Small Training Sets
Methods for Improving Results on Small Training Sets
Learning Ensembles [Diagram: the Training Data is used to derive Data 1, Data 2, ..., Data m; Learner 1 through Learner m turn these into Model 1 through Model m; a Model Combiner merges the models into the Final Model.]
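The pipeline in this diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base learner is a hypothetical one-dimensional threshold rule, the m datasets are leave-one-out subsets (chosen instead of random resampling so the result is deterministic), and the combiner is a plain majority vote. None of these choices is taken from the slide.

```python
from collections import Counter

def train_stump(data):
    """Hypothetical base learner: best single-threshold rule on a 1-D feature."""
    best_errors, best_rule = None, None
    for threshold, _ in data:
        for sign in (1, -1):
            def rule(v, t=threshold, s=sign):
                return '+' if s * (v - t) >= 0 else '-'
            errors = sum(rule(v) != y for v, y in data)
            if best_errors is None or errors < best_errors:
                best_errors, best_rule = errors, rule
    return best_rule

def make_datasets(data):
    """Assumed data-derivation step: one leave-one-out subset per example."""
    return [data[:i] + data[i + 1:] for i in range(len(data))]

def train_ensemble(data):
    """Train one model per derived dataset (Data 1 ... Data m)."""
    return [train_stump(subset) for subset in make_datasets(data)]

def combine(models, v):
    """Model combiner: majority vote over the individual predictions."""
    votes = Counter(model(v) for model in models)
    return votes.most_common(1)[0][0]

training_data = [(0.5, '-'), (1.0, '-'), (1.5, '-'),
                 (3.0, '+'), (3.5, '+'), (4.0, '+')]
ensemble = train_ensemble(training_data)
low_prediction = combine(ensemble, 0.0)
high_prediction = combine(ensemble, 5.0)
```

Swapping `make_datasets` for bootstrap resampling would give a bagging-style ensemble; the combiner stays the same.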
Value of Ensembles
Homogeneous Ensembles
DECORATE (Melville & Mooney, 2003)
Overview of DECORATE [Diagram, step 1: labeled Training Examples and labeled Artificial Examples feed the Base Learner, which produces classifier C1; the Current Ensemble is still empty.]
Overview of DECORATE [Diagram, step 2: the Current Ensemble holds C1; new Artificial Examples are labeled and, together with the Training Examples, feed the Base Learner, which produces C2.]
Overview of DECORATE [Diagram, step 3: the Current Ensemble holds C1 and C2; the same cycle produces C3.]
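The loop shown across these three diagrams can be sketched roughly as follows. This is a simplified illustration, not the published algorithm verbatim: it assumes a hypothetical one-dimensional threshold learner, draws artificial examples from a Gaussian fitted to the training values, gives each artificial example the label opposite to the current ensemble's vote, and keeps a new classifier only if the ensemble's training error does not rise. All function names and parameters are made up for the example.

```python
import random
from collections import Counter

def train_stump(data):
    """Hypothetical base learner: best single-threshold rule on a 1-D feature."""
    best_errors, best_rule = None, None
    for threshold, _ in data:
        for sign in (1, -1):
            def rule(v, t=threshold, s=sign):
                return '+' if s * (v - t) >= 0 else '-'
            errors = sum(rule(v) != y for v, y in data)
            if best_errors is None or errors < best_errors:
                best_errors, best_rule = errors, rule
    return best_rule

def vote(ensemble, v):
    """Majority vote of the current ensemble."""
    return Counter(c(v) for c in ensemble).most_common(1)[0][0]

def ensemble_error(ensemble, data):
    return sum(vote(ensemble, v) != y for v, y in data) / len(data)

def decorate(train, max_size=5, max_iters=20, n_artificial=4, seed=0):
    rng = random.Random(seed)
    values = [v for v, _ in train]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    ensemble = [train_stump(train)]          # C1 comes from the real data
    best_err = ensemble_error(ensemble, train)
    for _ in range(max_iters):
        if len(ensemble) >= max_size:
            break
        artificial = []
        for _ in range(n_artificial):
            v = rng.gauss(mean, std)
            # Label each artificial example against the ensemble's prediction.
            opposite = '-' if vote(ensemble, v) == '+' else '+'
            artificial.append((v, opposite))
        candidate = train_stump(train + artificial)
        trial = ensemble + [candidate]
        trial_err = ensemble_error(trial, train)
        if trial_err <= best_err:            # keep only if accuracy holds
            ensemble, best_err = trial, trial_err
    return ensemble

train = [(0.5, '-'), (1.0, '-'), (1.5, '-'),
         (3.0, '+'), (3.5, '+'), (4.0, '+')]
ensemble = decorate(train)
```

The contrarian labels push each new classifier to disagree with the ensemble away from the real data, which is where the added diversity comes from; the acceptance test keeps that diversity from costing training accuracy.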
Experimental Methodology
Learning Curve for Labor Contract Prediction
Learning Curve for Cancer Diagnosis
Active Learning
Ensembles and Active Learning
Active-DECORATE [Diagram: DECORATE builds the Current Ensemble (C1, C2, C3, C4) from the labeled Training Examples; the ensemble scores the Unlabeled Examples, giving the first one Utility = 0.1.]
Active-DECORATE [Diagram: the Unlabeled Examples receive utilities 0.1, 0.9, 0.3, 0.2, and 0.5; a label is then acquired for the selected example (Acquire Label) and it joins the Training Examples.]
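The selection step in this diagram can be illustrated with a small committee-based sketch. The exact utility measure is not recoverable from the slide, so the code assumes one common choice: utility is one minus the vote margin, meaning the example the ensemble disagrees on most gets the highest score. The committee members and data values are hypothetical.

```python
from collections import Counter

def utility(committee, x):
    """Assumed utility: 1 - (top vote share - runner-up vote share)."""
    votes = Counter(member(x) for member in committee)
    ranked = [count for _, count in votes.most_common(2)]
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0
    margin = (top - second) / len(committee)
    return 1.0 - margin

def select_query(committee, unlabeled):
    """Pick the unlabeled example with the highest utility."""
    return max(unlabeled, key=lambda x: utility(committee, x))

# Four hypothetical threshold classifiers standing in for C1..C4.
committee = [lambda x, t=t: '+' if x >= t else '-' for t in (2.0, 2.5, 3.0, 3.5)]
unlabeled = [0.5, 2.7, 4.0, 3.2]
query = select_query(committee, unlabeled)
```

Here the committee splits two against two on 2.7, so that example is the one sent for labeling; examples on which all four members agree have utility 0.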
Experimental Methodology
Learning Curve for Soybean Disease Diagnosis: ≈ 60% savings in supervision
Learning Curve for Spoken Vowel Recognition: ≈ 50% savings in supervision
Transfer Learning, a.k.a. Adaptation, Learning to Learn, Lifelong Learning
Using Source as a Prior
Corpus Mixing [Diagram: labeled Source Training Examples and labeled Target Training Examples are pooled and given to the Learner, which produces the Classifier.]
Corpus Mixing Results (Roark and Bacchiani, 2003)
Target Domain Training Size | Baseline F-Measure | Transfer F-Measure
2,000 sentences | 80.50% | 83.05%
4,000 sentences | 82.60% | 84.35%
10,000 sentences | 84.90% | 85.40%
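A quick calculation makes the trend in these figures explicit. The snippet assumes each row reads as target-domain training size, baseline F-measure, and transfer F-measure, and simply subtracts the two scores.

```python
# (target sentences, baseline F-measure %, transfer F-measure %)
rows = [
    (2000, 80.50, 83.05),
    (4000, 82.60, 84.35),
    (10000, 84.90, 85.40),
]

# Absolute gain from corpus mixing, in F-measure points.
gains = {size: round(transfer - baseline, 2) for size, baseline, transfer in rows}

for size, gain in gains.items():
    print(f"{size:>6} sentences: +{gain:.2f} points")
```

The gain shrinks from 2.55 points at 2,000 sentences to 0.50 points at 10,000, so the source corpus helps most when target-domain data is scarcest.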
Transferring from One Language to Another
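Projecting a POS Tagger (Yarowsky & Ngai, 2001)
English: a significant producer for crude oil
English POS tags: DT JJ NN IN JJ NN
French: un producteur important de petrole brut
Projected POS tags: DT NN JJ IN NN JJ
[Diagram: the English POS Tagger tags the English side; the tags are carried across the word alignment to the French side; a POS Tag Learner trains a French POS Tagger on the projected tags.]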
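The projection step on this slide can be reproduced with a few lines of Python. The word alignment below is an assumption, since the slide's alignment arrows did not survive extraction; it pairs each English word with its evident French translation. The function name `project_tags` is made up for the example.

```python
english = ["a", "significant", "producer", "for", "crude", "oil"]
english_tags = ["DT", "JJ", "NN", "IN", "JJ", "NN"]
french = ["un", "producteur", "important", "de", "petrole", "brut"]

# Assumed alignment: English word index -> French word index.
alignment = {0: 0, 1: 2, 2: 1, 3: 3, 4: 5, 5: 4}

def project_tags(source_tags, alignment, target_length):
    """Copy each source tag onto the target word it is aligned with."""
    projected = [None] * target_length   # unaligned words stay untagged
    for src, tgt in alignment.items():
        projected[tgt] = source_tags[src]
    return projected

french_tags = project_tags(english_tags, alignment, len(french))
print(list(zip(french, french_tags)))
```

With this alignment the projected sequence is DT NN JJ IN NN JJ, matching the tags shown on the slide. Real alignments are noisy and not one-to-one, which is why a tagger is trained on the projected data rather than using the projection directly.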
POS Tagging Transfer Results (Yarowsky & Ngai, 2001)
Model | Aligned French | Novel French
Project from English | Core: 76%, Full: 69% | N/A
Directly Trained on 100K French Words | Core: 97%, Full: 96% | Core: 98%, Full: 97%
Trained on Projected Data | Core: 96%, Full: 93% | Core: 94%, Full: 91%
Unsupervised Learning
Semi-Supervised Learning
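Self-Labeling [Diagram: the Learner trains a Classifier on the labeled Training Examples; the Classifier then assigns labels to the Unlabeled Examples.]
Self-Labeling: Classifier retrained on automatically labeled data is frequently more accurate. [Diagram: the automatically labeled examples join the Training Examples and the Learner retrains the Classifier.]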
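A minimal sketch of this self-labeling loop follows. It assumes a hypothetical nearest-centroid classifier on a single numeric feature and a fixed number of retraining rounds; the slide does not specify either.

```python
def train_centroids(data):
    """Hypothetical learner: classify by the nearest class mean."""
    sums, counts = {}, {}
    for v, y in data:
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda v: min(centroids, key=lambda y: abs(v - centroids[y]))

def self_label(labeled, unlabeled, rounds=3):
    """Train, label the unlabeled pool, retrain on both, and repeat."""
    classifier = train_centroids(labeled)
    for _ in range(rounds):
        auto_labeled = [(v, classifier(v)) for v in unlabeled]
        classifier = train_centroids(labeled + auto_labeled)
    return classifier

labeled = [(1.0, '-'), (2.0, '-'), (5.0, '+')]
unlabeled = [0.5, 1.5, 4.0, 6.0, 7.0, 8.0]

supervised_only = train_centroids(labeled)
self_trained = self_label(labeled, unlabeled)
```

In this toy data the unlabeled pool pulls the positive centroid upward, so the decision boundary moves and the point 3.5 changes from '+' to '-'. Whether such a shift improves accuracy depends on how reliable the automatic labels are.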
Semi-Supervised EM [Diagram, repeated over several frames: the Prob. Learner trains a Prob. Classifier on the labeled Training Examples; the Prob. Classifier assigns probabilistic labels to the Unlabeled Examples; the Prob. Learner retrains on the Training Examples together with the probabilistically labeled examples.]
Semi-Supervised EM: Continue retraining iterations until probabilistic labels on unlabeled data converge.
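The retraining cycle can be sketched as follows. The slide leaves the probabilistic learner unspecified, so this example assumes a two-class Gaussian model on one numeric feature with a fixed unit variance; labeled examples keep hard labels, unlabeled ones carry a probability of being '+', and iteration stops once those probabilities change by less than a tolerance.

```python
import math

def fit(points):
    """M-step. points: (value, p_plus) pairs; returns means and class prior."""
    w_plus = sum(p for _, p in points)
    w_minus = sum(1 - p for _, p in points)
    mean_plus = sum(v * p for v, p in points) / w_plus
    mean_minus = sum(v * (1 - p) for v, p in points) / w_minus
    return mean_plus, mean_minus, w_plus / len(points)

def prob_plus(model, v, var=1.0):
    """E-step for one example: posterior probability of class '+'."""
    mean_plus, mean_minus, prior_plus = model
    like_plus = prior_plus * math.exp(-(v - mean_plus) ** 2 / (2 * var))
    like_minus = (1 - prior_plus) * math.exp(-(v - mean_minus) ** 2 / (2 * var))
    return like_plus / (like_plus + like_minus)

def semi_supervised_em(labeled, unlabeled, tol=1e-6, max_iters=100):
    # Assumes both classes appear in `labeled` and `unlabeled` is non-empty.
    hard = [(v, 1.0 if y == '+' else 0.0) for v, y in labeled]
    model = fit(hard)                       # initial model from labeled data
    soft = [prob_plus(model, v) for v in unlabeled]
    for _ in range(max_iters):
        model = fit(hard + list(zip(unlabeled, soft)))
        new_soft = [prob_plus(model, v) for v in unlabeled]
        converged = max(abs(a - b) for a, b in zip(soft, new_soft)) < tol
        soft = new_soft
        if converged:
            break
    return model, soft

labeled = [(1.0, '-'), (2.0, '-'), (5.0, '+'), (6.0, '+')]
unlabeled = [0.5, 1.5, 5.5, 6.5]
model, soft_labels = semi_supervised_em(labeled, unlabeled)
```

Unlike self-labeling, each unlabeled example contributes to both classes in proportion to its current probability, so uncertain examples have less influence on the retrained model.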
Semi-Supervised EM Results
Semi-Supervised EM Example
Semi-Supervised Clustering
Semi-Supervised Clustering with Pairwise Constraints [Figure: a 2-way clustering of people who are Linguist or Computer Scientist and Prof or Student, plotted by # Publications and Programming Ability.]
Semi-Supervised Clustering with Pairwise Constraints [Figure: the same 2-way clustering with Must-link and Cannot-link constraints added between pairs of points.]
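The role of the pairwise constraints can be illustrated with a small check. The four people, the two candidate clusterings, and the specific must-link and cannot-link pairs below are hypothetical; the slide's actual constraint pairs are not recoverable.

```python
def violations(assignment, must_link, cannot_link):
    """Count constraints that a cluster assignment breaks."""
    broken = sum(assignment[a] != assignment[b] for a, b in must_link)
    broken += sum(assignment[a] == assignment[b] for a, b in cannot_link)
    return broken

# Two plausible 2-way clusterings of the same four people.
by_rank = {"ling_prof": 0, "cs_prof": 0, "ling_student": 1, "cs_student": 1}
by_field = {"ling_prof": 0, "ling_student": 0, "cs_prof": 1, "cs_student": 1}

# Hypothetical supervision supplied by a user.
must_link = [("cs_prof", "cs_student")]
cannot_link = [("ling_prof", "cs_prof")]

chosen = min((by_rank, by_field),
             key=lambda a: violations(a, must_link, cannot_link))
```

Both clusterings are reasonable on the raw features; the two constraints are enough to select the grouping by field over the grouping by rank.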
Semi-Supervised Clustering with Hidden Markov Random Fields (HMRFs)
Active Semi-Supervised Clustering on Classifying Messages from 3 Newsgroups (talk.politics.misc vs. talk.politics.guns vs. talk.politics.mideast): ≈ 80% savings in supervision!
Conclusions


Editor's Notes

  1. So far we have looked at learning curve statistics summarized over many datasets, but we can also look at learning curves for individual datasets. I'll present a couple of datasets that clearly demonstrate the improvements that Decorate can bring. We have plotted here the accuracies of Decorate, bagging, and boosting given varying amounts of training data from the Labor dataset. Note that Decorate achieves higher accuracy throughout the learning curve. This is primarily because Labor is quite a small dataset, with approximately 60 examples, so bagging and boosting are limited in the amount of diversity they can produce, as discussed earlier.
  2. More typically, the performance of the three ensemble methods converges given enough data, but in most cases Decorate achieves higher accuracy given fewer examples. In the breast cancer dataset, shown here, Decorate produces an accuracy greater than 92% with just 6 examples, whereas boosting and bagging produce almost no improvement over the base learner's accuracy of 75%.