Machine Learning-1
Presented by Skillslash
What is Machine Learning
www.skillslash.com
The subfield of computer science that “gives computers the ability to learn without being explicitly programmed”.
(Arthur Samuel, 1959)
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
(Tom Mitchell, 1997)
Using data for answering questions
High Bias and Low Variance
(Low Flexibility)
Low Bias and High Variance
(Too Much Flexibility)
Low Bias and Low Variance
(Balanced Flexibility)
Bias Error:
Bias is the difference between the model's average prediction and the correct value. High bias gives a large error on both training and test data.
Variance Error:
Variance is the amount by which the estimate of the target function would change if different training data were used.
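One rough way to see both error types numerically is to refit the same kind of model on many different training sets and measure how the fits behave. The sketch below is only illustrative: the sine target, the noise level, the polynomial degrees, and the sample sizes are all assumptions made for the demo, not values from the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_function(x):
    # Assumed ground truth for the demo.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=200, n_train=20, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of a given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    x_eval = np.linspace(0.05, 0.95, 50)
    predictions = np.empty((n_trials, x_eval.size))
    for t in range(n_trials):
        # Each trial draws a fresh noisy training set from the same process.
        y_train = true_function(x_train) + rng.normal(0.0, noise, n_train)
        model = Polynomial.fit(x_train, y_train, degree)
        predictions[t] = model(x_eval)
    mean_prediction = predictions.mean(axis=0)
    # Bias: gap between the average fitted model and the true function.
    bias_sq = float(np.mean((mean_prediction - true_function(x_eval)) ** 2))
    # Variance: how much the fits change from one training set to another.
    variance = float(np.mean(predictions.var(axis=0)))
    return bias_sq, variance


for degree in (1, 9):
    bias_sq, variance = bias_variance(degree)
    print(f"degree={degree}: bias^2={bias_sq:.3f}, variance={variance:.3f}")
```

Under these assumptions the straight line (degree 1) should show the larger bias, and the degree-9 polynomial the larger variance.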
Types of Machine Learning
Supervised Learning vs Unsupervised Learning
Unsupervised Learning
Clustering
Regression vs Classification
Semi-Supervised Learning
Types of Supervised ML
Classification
Output is a discrete variable (e.g., defaulter vs. non-defaulter, spam vs. non-spam, purchaser vs. non-purchaser)
Regression
Output is continuous (e.g., price of a house, temperature)
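The difference between the two is the type of the output, which a few lines of code can make concrete. This is a minimal sketch: the house data, the suspicious-word count, and the threshold rule are hypothetical stand-ins, and a real classifier would learn its decision boundary from labeled data.

```python
import numpy as np

# Regression: the output is a continuous number (hypothetical house prices).
areas = np.array([50.0, 80.0, 100.0, 120.0])     # square meters
prices = np.array([150.0, 240.0, 300.0, 360.0])  # thousands; exactly 3 * area here
slope, intercept = np.polyfit(areas, prices, 1)  # least-squares straight line


def predict_price(area):
    return slope * area + intercept


# Classification: the output is one label from a fixed, discrete set.
def classify_email(num_suspicious_words, threshold=2):
    # Hypothetical hand-written rule, used only to show the discrete output.
    return "spam" if num_suspicious_words >= threshold else "non-spam"


print(predict_price(90.0))  # a real-valued estimate
print(classify_email(3))    # a discrete label
```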
Types of Machine Learning Problems
Supervised
Learn through examples of which we know the desired output (what we want to predict).
• Is this a cat or a dog?
• Are these emails spam or not?
• Predict the market value of houses, given the square meters, number of rooms, neighborhood, etc.
Types of Machine Learning Problems
Unsupervised
There is no desired output. Learn something about the data. Latent relationships.
• I want to find anomalies in the credit card usage patterns of my customers.
• I have photos and want to put them in 20 groups.
Useful for learning structure in the data (clustering), hidden correlations, reducing dimensionality, etc.
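Grouping items without labels, as in the photo example, is what a clustering algorithm does. Below is a minimal sketch of k-means on a handful of made-up 2-D points. The data, the choice of k = 2, and the fixed iteration count are assumptions for illustration; real photos would first have to be turned into feature vectors.

```python
import numpy as np


def assign(points, centers):
    # Distance from every point to every center; pick the nearest center.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(points, k, n_iters=20, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = assign(points, centers)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assign(points, centers), centers


# Two visibly separate groups of hypothetical points; no labels are given.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(data, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which points end up sharing a number.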
Types of Machine Learning Problems
Reinforcement
An agent interacts with an environment and watches the result of the interaction.
Environment gives feedback via a positive or negative reward signal.
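The agent, environment, and reward loop can be sketched with tabular Q-learning on a toy problem. Everything specific here is an assumption for illustration: the five-state corridor, the +1 and -1 rewards, and the hyperparameters (alpha, gamma, epsilon) are invented for the demo.

```python
import numpy as np

N_STATES = 5         # states 0..4 on a line; 0 is a pit, 4 is the goal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = state + 1 if action == RIGHT else state - 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # positive reward signal
    if next_state == 0:
        return next_state, -1.0, True   # negative reward signal
    return next_state, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, 2))         # one value per (state, action) pair
    for _ in range(episodes):
        state = 2                       # every episode starts in the middle
        for _ in range(100):            # cap on episode length
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = int(rng.integers(2))
            else:
                action = int(np.argmax(q[state]))
            next_state, reward, done = step(state, action)
            # Q-learning update toward reward plus discounted best next value.
            target = reward if done else reward + gamma * q[next_state].max()
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = {s: ("right" if np.argmax(q_table[s]) == RIGHT else "left") for s in (1, 2, 3)}
print(policy)
```

The agent is never told that the goal is on the right; it only sees the reward signal and gradually prefers the actions that led to it.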
Data Gathering
Might depend on human work
• Manual labeling for supervised learning.
• Domain knowledge. Maybe even experts.
May come for free, or “sort of”
• E.g., Machine Translation.
The more the better: Some algorithms need large amounts of data to be useful (e.g., neural networks).
The quantity and quality of data dictate the model accuracy
Data Preprocessing
Is there anything wrong with the data?
• Missing values
• Outliers
• Bad encoding (for text)
• Wrongly-labeled examples
• Biased data
• Do I have many more samples of one class than the rest?
Need to fix/remove data?
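Several of these checks are easy to automate. The sketch below covers three of them (missing values, outliers, class imbalance) using only the standard library. The median-based outlier rule, the cutoff k = 5, and the sample data are assumptions chosen for illustration; other outlier rules are equally valid.

```python
import math
from collections import Counter
from statistics import median


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def quality_report(values, labels, k=5.0):
    """Flag missing values, outliers, and class imbalance in a small dataset."""
    missing_indices = [i for i, v in enumerate(values) if is_missing(v)]
    present = [v for v in values if not is_missing(v)]
    # Outliers: points far from the median, measured in units of the
    # median absolute deviation (MAD). If MAD is 0, any value that
    # differs from the median is flagged.
    center = median(present)
    mad = median([abs(v - center) for v in present])
    outliers = [v for v in present if abs(v - center) > k * mad]
    # Imbalance: size of the largest class relative to the smallest.
    counts = Counter(labels)
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return {
        "missing_indices": missing_indices,
        "outliers": outliers,
        "imbalance_ratio": imbalance_ratio,
    }


# Hypothetical transaction amounts and labels.
amounts = [10, 12, 11, 13, 12, 100, None]
classes = ["ok"] * 6 + ["fraud"]
report = quality_report(amounts, classes)
print(report)
```

Whether a flagged value should be fixed, removed, or kept is still a judgment call that depends on the data.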
Feature Engineering
What is a feature?
A feature is an individual measurable property of a phenomenon being observed
Our inputs are represented by a set of features.
To classify spam email, features could be:
• Number of words that have been ch4ng3d like this.
• Language of the email (0=English, 1=Spanish)
• Number of emojis
“Buy ch34p drugs from the ph4rm4cy now :) :) :)” → feature engineering → (2, 0, 3)
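The mapping from the example email to (2, 0, 3) can be reproduced with a short function. This is a sketch under stated assumptions: a "ch4ng3d" word is taken to mean a word containing both letters and digits, only the ":)" emoticon is counted as an emoji, and the language is passed in by the caller rather than detected.

```python
import re

LANGUAGE_CODES = {"english": 0, "spanish": 1}


def extract_features(text, language="english"):
    """Turn an email into (obfuscated words, language code, emoji count)."""
    words = text.split()
    # A word counts as obfuscated if it mixes letters and digits.
    obfuscated = sum(
        1 for w in words if re.search(r"[A-Za-z]", w) and re.search(r"\d", w)
    )
    emojis = len(re.findall(r":\)", text))
    return (obfuscated, LANGUAGE_CODES[language], emojis)


print(extract_features("Buy ch34p drugs from the ph4rm4cy now :) :) :)"))
```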
Feature Engineering
Extract more information from existing data, not adding “new” data per se
• Making it more useful
• With good features, most algorithms can learn faster
It can be an art
• Requires thought and knowledge of the data
Two steps:
• Variable transformation (e.g., dates into weekdays, normalizing)
• Feature creation (e.g., n-grams for texts, if word is capitalized to detect names, etc.)
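Each of the examples named in the two steps fits in a few lines. The helper names below are hypothetical, and min-max scaling is just one common way to normalize.

```python
from datetime import date


# Variable transformation: date -> weekday (Monday = 0 ... Sunday = 6).
def weekday_index(d):
    return d.weekday()


# Variable transformation: rescale values to the range [0, 1].
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical
    return [(v - lo) / (hi - lo) for v in values]


# Feature creation: sequences of n consecutive tokens.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Feature creation: capitalization as a hint that a word may be a name.
def is_capitalized(word):
    return word[:1].isupper()


print(weekday_index(date(2024, 1, 1)))
print(min_max_normalize([10, 20, 30]))
print(ngrams(["buy", "cheap", "drugs"], 2))
print(is_capitalized("Alice"))
```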
Algorithm Selection & Training
Supervised
• Linear classifier
• Naive Bayes
• Support Vector Machines (SVM)
• Decision Tree
• Random Forests
• k-Nearest Neighbors
• Neural Networks (Deep learning)
Unsupervised
• PCA
• t-SNE
• k-means
• DBSCAN
Reinforcement
• SARSA–λ
• Q-Learning
THE MACHINE LEARNING FRAMEWORK
y = f(x)
(y: output, f: prediction function, x: image feature)
● Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
● Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
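The training and testing steps can be sketched with the simplest possible prediction function, a straight line fitted by least squares. The data here is made up, and a linear f is an assumption for the demo; the framework itself does not prescribe any particular model.

```python
import numpy as np


def train(xs, ys):
    """Estimate f by minimizing the squared prediction error on the training set."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Design matrix with one column for x and one column of ones for the offset.
    design = np.column_stack([xs, np.ones(len(xs))])
    params, *_ = np.linalg.lstsq(design, ys, rcond=None)
    slope, offset = params

    def f(x):
        return slope * x + offset

    return f


# Training: labeled examples (x_i, y_i), here generated by y = 2x + 1.
f = train([0, 1, 2, 3], [1, 3, 5, 7])

# Testing: apply f to an x that was not in the training set.
print(f(10.0))
```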
Algorithm Selection & Training
Goal of training: making the correct prediction as often as possible
• Incremental improvement: Predict → Adjust
• Use of metrics for evaluating performance and comparing solutions
• Hyperparameter tuning: more an art than a science
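The predict and adjust cycle is what gradient descent does on every step. The sketch below fits a single weight w so that w * x matches y, tracking mean squared error as the metric. The data, the starting weight, the learning rate, and the step count are all assumptions for the demo; the learning rate is an example of a hyperparameter that has to be tuned by hand.

```python
def fit_by_gradient_descent(xs, ys, learning_rate=0.05, steps=50):
    w = 0.0
    losses = []
    n = len(xs)
    for _ in range(steps):
        predictions = [w * x for x in xs]                       # predict
        errors = [p - y for p, y in zip(predictions, ys)]
        losses.append(sum(e * e for e in errors) / n)           # metric: mean squared error
        gradient = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        w -= learning_rate * gradient                           # adjust
    return w, losses


# Hypothetical data generated by y = 3x.
w, losses = fit_by_gradient_descent([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
print(w, losses[0], losses[-1])
```

With a learning rate that is too large for this data the same loop would diverge instead of improving, which is one reason tuning matters.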
Summary
• Machine Learning is intelligent use of data to answer questions
• Enabled by an exponential increase in computing power and data availability
• Three big types of problems: supervised, unsupervised, reinforcement
• 5 steps to every machine learning solution:
1. Data Gathering
2. Data Preprocessing
3. Feature Engineering
4. Algorithm Selection & Training
5. Making Predictions
Generalization
● How well does a learned model generalize from the data it was trained on to a new test set?
Training set (labels known) Test set (labels unknown)
Generalization
● Components of generalization error
○ Bias: how much does the average model over all training sets differ from the true model?
■ Error due to inaccurate assumptions/simplifications made by the model
■ Using too few features
○ Variance: how much do models estimated from different training sets differ from each other?
● Underfitting: model is too “simple” to represent all the relevant class characteristics
○ High bias and low variance
○ High training error and high test error
● Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
○ Low bias and high variance
○ Low training error and high test error
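Comparing training error with test error is the usual practical check for underfitting and overfitting. The sketch below fits polynomials of increasing degree to one noisy sample and scores them on a separate test sample. The sine target, noise level, sample sizes, and degrees are assumptions for the demo, and the exact numbers depend on the random seed.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def sample(n, noise=0.3):
    # Assumed data-generating process: a sine curve plus Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)
    return x, y


def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


x_train, y_train = sample(30)    # labels used for fitting
x_test, y_test = sample(200)     # held out, used only for scoring

errors = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree={degree}: train={errors[degree][0]:.3f}, test={errors[degree][1]:.3f}")
```

Training error can only go down as the degree grows, so it cannot reveal overfitting on its own; the held-out test error is the number to watch.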
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Bias can also come from wrong assumptions.
• Leads to training error
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
• Leads to test error
THANK YOU
Machine Learning - Lecture2.pptx
Editor's Notes

  1. We use deep learning for training models on reinforcement learning
  2. http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_bias_variance.htm