UNIT-1
Basics:
Definition-Machine Learning, Classification, Supervised/Unsupervised
Learning; Probably Approximately Correct (PAC) Learning.
Bayesian Decision Theory:
Classification, Losses and Risks, Discriminant Functions, Utility
Theory, Evaluating an Estimator: Bias and Variance, The Bayes'
Estimator, Parametric Classification, Model Selection Procedures.
1. What is Machine Learning?
• Machine learning is the study of computer algorithms that allow computer
programs to automatically improve through experience.
• Machine learning is programming computers to optimize a performance
criterion using example data or past experience.
Traditional Programming and Machine Learning
Applications of Machine Learning
 Virtual Personal Assistants – Siri, Alexa, Google Assistant
 Traffic Predictions
 Social Media Services
 Google Translate
 Self-driving Cars
 Fraud Detection
 Video Surveillance
 Email spam and Malware Filtering
 Product Recommendations
Why Machine Learning
 Machine learning is a method of data analysis that automates analytical
model building. It is a branch of artificial intelligence based on the idea that
systems can learn from data, identify patterns and make decisions with
minimal human intervention.
 It is possible to quickly and automatically produce models that can analyze
bigger, more complex data and deliver faster, more accurate results.
 Building Precise models
 Avoiding unknown risks
Types of Machine Learning Algorithms
2. Classification
 Classification is the process of predicting the class of given data points.
 In machine learning, classification refers to a predictive modeling problem
where a class label is predicted for a given example of input data.
 Equivalently, it is the process of categorizing a given set of data into classes.
 The main goal is to identify which class/category new data will fall into.
Example:
A credit is an amount of money loaned by a financial institution, for example a
bank.
 To be paid back with interest, generally in installments.
 It is important for the bank to be able to predict in advance the risk
associated with a loan.
 In credit scoring the bank calculates the risk given the amount of credit and
information about the customer.
The information about the customer includes:
1) Income
2) Savings
3) Profession
4) Age
5) Past financial history, etc.
 The bank has a record of past loans containing such customer data and
whether the loan was paid back or not.
 From this data of particular applications, the aim is to infer a general rule
coding the association between a customer's attributes and his risk.
 This is an example of a classification problem with two classes: low-risk
customers and high-risk customers.
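A rule learned from such data often takes a simple threshold form. Below is a minimal sketch; the thresholds, feature values, and function name are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical learned rule for credit scoring:
# IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk.
def classify_applicant(income, savings, theta1=40_000, theta2=10_000):
    """Classify a loan applicant using illustrative thresholds."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(classify_applicant(income=55_000, savings=20_000))  # low-risk
print(classify_applicant(income=25_000, savings=5_000))   # high-risk
```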
3. Supervised Learning
 Supervised learning is the data mining task of inferring a function
from labeled training data.
 The training data consist of a set of training examples.
 In supervised learning, each example is a pair consisting of an input object
and a desired output value.
 The machine is trained using data that is well "labeled": each example is
already tagged with the correct answer.
 A supervised learning algorithm analyzes the training data and produces
an inferred function, which can be used for mapping new examples.
 An optimal scenario will allow the algorithm to correctly determine the
class labels for unseen instances.
Example:
 Suppose you have a basket and it is filled with different kinds of fruits.
 Your task is to arrange them into groups.
 For understanding, let me state the names of the fruits in our basket.
 We have four types of fruits: APPLE, BANANA, GRAPES, CHERRY.
Supervised Learning:
 You have already learned the physical characteristics of the fruits from your previous work.
 So arranging the same type of fruits in one place is easy now.
 Your previous work is called training data in data mining.
 You already learned things from your training data; this is because of the response
variable.
 A response variable is just a decision variable.
 You can observe the response variable below (FRUIT NAME).
No. | SIZE  | COLOR | SHAPE                                       | FRUIT NAME
1   | Big   | Red   | Rounded shape with a depression at the top  | Apple
2   | Small | Red   | Heart-shaped to nearly globular             | Cherry
3   | Big   | Green | Long curving cylinder                       | Banana
4   | Small | Green | Round to oval, bunch shape, cylindrical     | Grape
 Suppose you take a new fruit from the basket; you observe the size,
color, and shape of that particular fruit.
 If the size is Big, the color is Red, and the shape is rounded with a depression at the top,
you confirm the fruit name as apple and put it in the apple group.
 Likewise for the other fruits.
 The job of grouping fruits is done: happy ending.
 You can observe in the table that a column is labeled "FRUIT NAME". This
is called the response variable.
 Learning from training data and then applying that knowledge
to test data (the new fruit) is called Supervised
Learning.
 Classification comes under supervised learning.
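To make the fruit example concrete, here is a minimal supervised-learning sketch in Python using scikit-learn; the numeric encodings of SIZE and COLOR are our own assumption, not part of the original example:

```python
# Supervised learning on the fruit table: labeled training data -> classifier.
from sklearn.tree import DecisionTreeClassifier

# Encode SIZE (Big=1, Small=0) and COLOR (Red=1, Green=0).
X_train = [[1, 1],   # Big, Red     -> Apple
           [0, 1],   # Small, Red   -> Cherry
           [1, 0],   # Big, Green   -> Banana
           [0, 0]]   # Small, Green -> Grape
y_train = ["Apple", "Cherry", "Banana", "Grape"]  # the response variable

clf = DecisionTreeClassifier().fit(X_train, y_train)

# A new fruit from the basket: Big and Red -> predicted as Apple.
print(clf.predict([[1, 1]])[0])
```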
4. Unsupervised Learning
 The problem of unsupervised learning is that of trying to find hidden
structure in unlabeled data.
 Since the examples given to the learner are unlabeled, there is no error or
reward signal to evaluate a potential solution.
 Unsupervised learning problems possess only the input variables (x), with no
corresponding output variables.
 It uses unlabeled training data to model the underlying structure of the data.
 Unlike supervised learning, no teacher is provided, which means no labeled
training is given to the machine.
Unsupervised Learning:
 Suppose you have a basket and it is filled with some different types of fruits; your
task is to arrange them into groups.
 This time you don't know anything about the fruits; honestly speaking, this is the first
time you have seen them. You have no clue about them.
 So, how will you arrange them?
 What will you do first?
 You will take a fruit and arrange the fruits by considering the physical characteristics
of each particular fruit.
 Suppose you have considered color.
o Then you will arrange them on considering base condition as color.
o Then the groups will be something like this.
 RED COLOR GROUP: apples & cherry fruits.
 GREEN COLOR GROUP: bananas & grapes.
 So now you will take another physical character such as size.
o RED COLOR AND BIG SIZE: apple.
o RED COLOR AND SMALL SIZE: cherry fruits.
o GREEN COLOR AND BIG SIZE: bananas.
o GREEN COLOR AND SMALL SIZE: grapes.
 Job done: happy ending.
 Here you did not learn anything beforehand: there was no training data and no response
variable.
 This type of learning is known as unsupervised learning.
 Clustering comes under unsupervised learning.
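For contrast, here is a minimal unsupervised sketch of the same task: k-means clustering groups the fruits without ever seeing a fruit name. The encodings are the same assumed ones as in the supervised sketch above:

```python
# Unsupervised learning: cluster unlabeled fruits by size and color.
from sklearn.cluster import KMeans

# Unlabeled fruits: [size (Big=1/Small=0), color (Red=1/Green=0)].
X = [[1, 1], [0, 1], [1, 0], [0, 0],
     [1, 1], [0, 0]]  # a few repeated observations

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.labels_)  # a cluster index per fruit; no names, only groups
```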
5. Probably Approximately Correct (PAC) Learning
 In this framework, the learner receives samples and must select a
generalization function (called a hypothesis) from a certain class of possible
functions.
 The goal is that, with high probability (the "probably" part), the selected
function will have low generalization error (the "approximately correct"
part).
 Consider a concept class C defined over a set of instances X of length n and
a learner L using hypothesis space H. C is PAC-learnable by L using H if,
for every concept c in C and
every distribution D over X,
learner L will, with probability at least (1 − δ), output a hypothesis h
in H
such that error(h) < ε.
 Here ε is the error bound and δ is the probability of failure; both can be
made arbitrarily small.
Example:
 How many training examples N should we have, such that with probability
at least 1 − δ, h has error at most ε?
(Blumer et al., 1989)
 In the axis-aligned rectangle example, the tightest rectangle consistent with
the data can err only in four strips around the true rectangle.
 Each strip has probability at most ε/4.
 Pr that a random instance misses a strip: 1 − ε/4.
 Pr that all N instances miss a strip: (1 − ε/4)^N.
 Pr that the N instances miss any of the 4 strips: at most 4(1 − ε/4)^N.
 Requiring 4(1 − ε/4)^N ≤ δ and using (1 − x) ≤ exp(−x) gives
4 exp(−εN/4) ≤ δ, hence N ≥ (4/ε) log(4/δ).
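The bound is easy to evaluate numerically; a quick sketch with illustrative values:

```python
# Smallest N satisfying the PAC bound N >= (4/epsilon) * log(4/delta).
import math

def pac_sample_size(epsilon, delta):
    return math.ceil((4 / epsilon) * math.log(4 / delta))

# e.g. at most 5% error with probability at least 95%:
print(pac_sample_size(epsilon=0.05, delta=0.05))  # 351
```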
6. Bayesian Decision Theory: Classification
Bayesian decision theory is a fundamental statistical approach to the problem of
pattern classification.
This approach is based on quantifying the tradeoffs between various
classification decisions using probability and the costs that accompany
such decisions.
 Credit scoring: Inputs are income and savings.
Output is low-risk vs high-risk
 Input: x = [x1, x2]^T, Output: C ∈ {0, 1}
 Prediction:
choose C = 1 if P(C = 1 | x1, x2) > 0.5, and C = 0 otherwise;
or equivalently,
choose C = 1 if P(C = 1 | x1, x2) > P(C = 0 | x1, x2), and C = 0 otherwise.
Bayes formula:
P(C | x) = p(x | C) P(C) / p(x)
Bayes formula can be expressed informally as
posterior = (prior × likelihood) / evidence
For the two-class case, the evidence expands as
p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)
so that
P(C = 1 | x) = p(x | C = 1) P(C = 1) / [ p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0) ]
with P(C = 0) + P(C = 1) = 1 and P(C = 0 | x) + P(C = 1 | x) = 1.
Bayes' Rule: K > 2 Classes
P(Ci | x) = p(x | Ci) P(Ci) / p(x) = p(x | Ci) P(Ci) / Σk=1..K p(x | Ck) P(Ck)
with P(Ci) ≥ 0 and Σi=1..K P(Ci) = 1.
Decision rule: choose Ci if P(Ci | x) = maxk P(Ck | x).
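A minimal sketch of this K-class rule in Python; the priors and likelihood values are made up for illustration:

```python
# Bayes' rule for K classes: posteriors from priors and likelihoods,
# then choose the class with the maximum posterior.
def bayes_classify(priors, likelihoods):
    """priors[i] = P(Ci); likelihoods[i] = p(x | Ci) for the observed x."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return best, posteriors

cls, post = bayes_classify(priors=[0.5, 0.3, 0.2],
                           likelihoods=[0.10, 0.40, 0.05])
print(cls, [round(p, 3) for p in post])  # 1 [0.278, 0.667, 0.056]
```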
Example: Naive Bayes
 Naive Bayes is a classification technique based on Bayes' Theorem with an assumption
of independence among predictors.
 In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.
 For example, a fruit may be considered to be an apple if it is red, round, and
about 3 inches in diameter. Even if these features depend on each other or
upon the existence of the other features, all of these properties independently
contribute to the probability that this fruit is an apple and that is why it is
known as ‘Naive’.
 Naive Bayes model is easy to build and particularly useful for very large
data sets.
 Bayes theorem provides a way of calculating the posterior probability P(c|x)
from P(c), P(x), and P(x|c):
P(c | x) = P(x | c) P(c) / P(x)
In this equation,
 P(c|x) is the posterior probability of class (c, target)
given predictor (x, attributes).
 P(c) is the prior probability of class.
 P(x|c) is the likelihood which is the probability of predictor given class.
 P(x) is the prior probability of predictor.
How does the Naive Bayes algorithm work?
Let’s understand it using an example. Below I have a training data set of
weather and corresponding target variable ‘Play’ (suggesting possibilities of
playing). Now, we need to classify whether players will play or not based on
weather condition. Let’s follow the below steps to perform it.
Step 1: Convert the data set into a frequency table
Step 2: Create a likelihood table by finding probabilities such as P(Overcast)
= 4/14 ≈ 0.29 and P(Play = Yes) = 9/14 ≈ 0.64.
Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for
each class. The class with the highest posterior probability is the outcome of
prediction.
Problem: "Players will play if the weather is sunny." Is this statement correct?
We can solve it using above discussed method of posterior probability.
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14
= 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher posterior
probability, so we predict that players will play.
Naive Bayes uses the same method to predict the probability of different classes
based on various attributes. This algorithm is mostly used in text classification and
in problems with multiple classes.
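The worked computation above can be checked in a few lines (counts taken from the same standard 14-row weather data set):

```python
# P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_sunny_given_yes = 3 / 9   # P(Sunny | Yes)
p_yes = 9 / 14              # P(Yes)
p_sunny = 5 / 14            # P(Sunny)

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6 -> predict that players will play
```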
Applications of Naive Bayes Algorithms
 Real-time Prediction
 Multi-class Prediction
 Text Classification / Spam Filtering / Sentiment Analysis
 Recommendation Systems
7. Losses and Risks
 A financial institution when making a decision for a loan applicant should
take into account the potential gain and loss as well.
 An accepted low-risk applicant increases profit, while a rejected high-risk
applicant decreases loss.
 The loss for a high-risk applicant erroneously accepted may be different
from the potential gain for an erroneously rejected low-risk applicant.
 Loss is a function that measures how bad a particular guess is.
 Risk is the expected value of your loss over all of the guesses that you made.
 A loss function is a measure of the cost of not making the best decision.
 It is usually regarded as an opportunity loss, so if the best decision would
give you a profit of $20 and another decision would give a profit of $5, the
opportunity loss for that decision is $15.
 Typically a loss function is used for parameter estimation, and the event in
question is some function of the difference between estimated and true
values for an instance of data.
An expected loss is called a risk, and R(αi | x) is called the conditional risk:
R(αi | x) = Σk λik P(Ck | x)
where λik is the loss incurred for taking action αi when the true state is Ck.
Whenever we encounter a particular observation x, we can minimize our
expected loss by selecting the action that minimizes the conditional risk.
Thus, the Bayes decision rule states that to minimize the overall risk, compute the
conditional risk for every action and then select the action αi for which R(αi|x) is minimum.
The resulting minimum overall risk is called the Bayes risk, denoted R, and is the
best performance that can be achieved.
In the special case of 0/1 loss, minimizing the conditional risk reduces to choosing the most
probable class.
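A minimal sketch of the rule: given posteriors and a loss matrix λ, compute each action's conditional risk and pick the minimum. The numbers are illustrative, not from the notes:

```python
# Conditional risk R(a_i | x) = sum_k lambda[i][k] * P(C_k | x).
posteriors = [0.7, 0.3]   # P(C0 | x), P(C1 | x)
lam = [[0, 10],           # losses for action 0 (e.g. accept) per true class
       [1, 0],            # losses for action 1 (e.g. reject) per true class
       [2, 2]]            # losses for action 2 (e.g. defer for review)

risks = [sum(l * p for l, p in zip(row, posteriors)) for row in lam]
best = min(range(len(risks)), key=lambda i: risks[i])
print(risks, "-> choose action", best)  # [3.0, 0.7, 2.0] -> choose action 1
```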
8. Discriminant Functions
A useful way to represent classifiers is through discriminant functions gi(x),
i = 1, ..., K: choose class Ci if gi(x) = maxk gk(x). Common choices are
gi(x) = P(Ci | x), gi(x) = p(x | Ci) P(Ci), or gi(x) = log p(x | Ci) + log P(Ci).
Discriminant function analysis (DFA) is a statistical procedure that classifies
unknown individuals and gives the probability of their classification into a certain
group.
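A minimal sketch of classifying with discriminant functions, here using gi(x) = log p(x|Ci) + log P(Ci) with Gaussian class likelihoods; all parameter values are illustrative assumptions:

```python
import math

def log_gaussian(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# Illustrative per-class (mean, variance, prior).
params = [(2.0, 1.0, 0.6), (5.0, 2.0, 0.4)]

def discriminant_classify(x):
    g = [log_gaussian(x, mu, var) + math.log(prior)
         for mu, var, prior in params]
    return max(range(len(g)), key=lambda i: g[i])

print(discriminant_classify(2.5), discriminant_classify(6.0))  # 0 1
```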
9. Utility Theory
 Earlier we defined the expected risk and chose the action that minimizes it.
 We now generalize this to utility theory, which is concerned with making
rational decisions when we are uncertain about the state.
 Let us say that, given evidence x, the probability of state Sk is calculated as
P(Sk | x).
 We define a utility function, Uik, which measures how good it is to take
action αi when the state is Sk.
 The expected utility is
EU(αi | x) = Σk Uik P(Sk | x)
 A rational decision maker chooses the action that maximizes the expected
utility: choose αi if EU(αi | x) = maxj EU(αj | x).
 In the context of classification, decisions correspond to choosing one
of the classes, and maximizing the expected utility is equivalent to
minimizing expected risk.
 Uik are generally measured in monetary terms, and this gives us a way to
define the loss matrix λik as well.
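The computation mirrors the risk example, with max replacing min; the utility values below are illustrative:

```python
# Expected utility EU(a_i | x) = sum_k U[i][k] * P(S_k | x); choose the max.
posteriors = [0.7, 0.3]   # P(S_k | x)
U = [[20, -100],          # utilities of action 0 per state
     [0, 0]]              # utilities of action 1 (e.g. do nothing) per state

eu = [sum(u * p for u, p in zip(row, posteriors)) for row in U]
best = max(range(len(eu)), key=lambda i: eu[i])
print(eu, "-> choose action", best)  # [-16.0, 0.0] -> choose action 1
```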
10. Evaluating an Estimator: Bias and Variance
 We have discussed how to make optimal decisions when the uncertainty is modeled
using probabilities.
 We now see how we can estimate these probabilities from a given training
set. We start with the parametric approach for classification and regression.
 The basic idea behind the parametric method is that there is a set of fixed
parameters that determines a probability model, and these parameters are
estimated from data.
Unknown parameter θ
Estimator di = d(Xi) on sample Xi
Bias: bθ(d) = E[d] − θ
Variance: E[(d − E[d])²]
Mean square error:
r(d, θ) = E[(d − θ)²]
        = (E[d] − θ)² + E[(d − E[d])²]
        = Bias² + Variance
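These quantities can be estimated by simulation. A small sketch (our own illustration, not from the notes) checks that the sample mean is unbiased and has variance σ²/N:

```python
import random

random.seed(0)
theta, sigma, N, trials = 7.0, 2.0, 10, 20000

estimates = []
for _ in range(trials):
    sample = [random.gauss(theta, sigma) for _ in range(N)]
    estimates.append(sum(sample) / N)        # estimator d(X) = sample mean

e_d = sum(estimates) / trials
bias = e_d - theta                           # close to 0: unbiased
variance = sum((d - e_d) ** 2 for d in estimates) / trials  # ~ sigma^2/N = 0.4
print(round(bias, 3), round(variance, 3))
```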
11. Bayes' Estimator
We may have some prior information on the possible value range that a parameter,
θ, may take.
This information is quite useful and should be used, especially when the sample is
small.
The prior information does not tell us exactly what the parameter value is; we
model this uncertainty by viewing θ as a random variable and defining a prior
density for it, p(θ).
For example, let us say we are told that θ is approximately normal and, with 90
percent confidence, θ lies between 5 and 9, symmetrically around 7.
Then we can write p(θ) as normal with mean 7 and, because 90 percent of a normal
density lies within 1.64 standard deviations of its mean, with σ chosen so that
1.64σ = 2, i.e., σ ≈ 1.22.
 Treat θ as a random variable with prior p(θ)
 Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)
 Full Bayes: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
 Maximum a Posteriori (MAP): θMAP = argmaxθ p(θ|X)
 Maximum Likelihood (ML): θML = argmaxθ p(X|θ)
 Bayes' estimator: θBayes' = E[θ|X] = ∫ θ p(θ|X) dθ
 x^t ~ N(θ, σ0²) and θ ~ N(μ, σ²)
 θML = m (the sample average)
 θMAP = θBayes' = E[θ | X]
      = [ (N/σ0²) / (N/σ0² + 1/σ²) ] m + [ (1/σ²) / (N/σ0² + 1/σ²) ] μ
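A minimal sketch of this estimator: the posterior mean is a precision-weighted average of the sample mean m and the prior mean μ. The numbers reuse the prior from the example (mean 7, σ ≈ 1.22) together with assumed data values:

```python
def bayes_estimate(m, N, sigma0_sq, mu, sigma_sq):
    """Posterior mean for Gaussian data with a Gaussian prior on theta."""
    w_data = (N / sigma0_sq) / (N / sigma0_sq + 1 / sigma_sq)
    return w_data * m + (1 - w_data) * mu

# 5 observations with sample mean 9 and sigma0^2 = 4 (assumed values).
print(round(bayes_estimate(m=9, N=5, sigma0_sq=4.0, mu=7, sigma_sq=1.22**2), 2))
# 8.3 -- between the prior mean 7 and the sample mean 9; more data pulls
# the estimate toward m.
```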
12. Parametric Classification
Models of data with a categorical response are called classifiers. A classifier is
built from training data, for which classifications are known. The classifier assigns
new test data to one of the categorical levels of the response.
Parametric methods, like Discriminant Analysis Classification, fit a parametric
model to the training data and interpolate to classify test data.
A learning model that summarizes data with a set of parameters of fixed size
(independent of the number of training examples) is called a parametric model. No
matter how much data you throw at a parametric model, it won’t change its mind
about how many parameters it needs.
We can write the posterior probability of class Ci as
P(Ci | x) = p(x | Ci) P(Ci) / p(x)
where the class likelihood p(x | Ci) takes an assumed parametric form, for example
a Gaussian N(μi, σi²) whose parameters are estimated from the training data of class Ci.
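A minimal sketch of a parametric classifier: estimate each class's Gaussian parameters and prior from labeled training data, then classify new points with the log-posterior discriminant. The data values are illustrative:

```python
import math

data = {0: [1.8, 2.1, 2.4, 1.9], 1: [4.8, 5.2, 5.5]}  # class -> samples
n_total = sum(len(xs) for xs in data.values())

# Fixed-size parameter set per class: (mean, variance, prior).
params = {}
for c, xs in data.items():
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)  # ML estimates
    params[c] = (mu, var, len(xs) / n_total)

def classify(x):
    def g(c):  # g_c(x) = log p(x | C_c) + log P(C_c)
        mu, var, prior = params[c]
        return (-0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var) + math.log(prior))
    return max(params, key=g)

print(classify(2.0), classify(5.0))  # 0 1
```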
13. Model Selection Procedures