Probability for Data Scientists
Dr. Ferdin Joe John Joseph
Machine Learning
Machine Learning is an interdisciplinary field in Data Science that uses
• statistics
• probability
• algorithms
to learn from data and provide insights which can be used to build
intelligent applications.
Probability in Real Life
Probability for Data Science
• Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.
Terminology
• Event
• Random Variable
• Empirical Probability
• Theoretical Probability
• Joint Probability
• Conditional Probability
Event
• An event is a set of outcomes of an experiment to which a probability
is assigned.
• E denotes an event.
• P(E) is the probability that event E occurs.
• A situation where E might happen (success) or might not happen
(failure) is called a trial.
Examples of Events
• Tossing a coin
• Rolling a die
• Pulling a colored ball out of a bag
Random Variable
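• A random variable is a variable whose value is determined by the outcome of a random experiment (formally, a function that assigns a number to each outcome).
• E.g., getting a head or a tail when tossing a coin; coding heads as 1 and tails as 0 turns the outcome into a number.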
Random variable in tossing a coin
• If we toss a fair coin, the chances of getting a head or a tail are 50-50.
• The probability of getting a head is ½ (50%), and the probability of getting a tail is also ½.
• A probability always lies between 0 and 1.
Empirical Probability
• Also known as experimental (practical) probability
• It is the number of times the event occurs divided by the total number of trials observed.
• If we observe s successes in n trials, the empirical probability of success is s/n.
• Toss a coin 4 times. The outcomes are H, H, H, T
• P(Head) = 3/4 = 0.75
• P(Tail) = 1/4 = 0.25
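A minimal Python sketch of the s/n formula, applied to the four tosses above and to a simulated fair coin. The function name `empirical_probability` and the seed are illustrative choices, not part of the slides.

```python
import random
from fractions import Fraction


def empirical_probability(outcomes, event):
    """Return s/n: observed outcomes equal to `event` divided by the number of trials."""
    n = len(outcomes)
    s = sum(1 for outcome in outcomes if outcome == event)
    return Fraction(s, n)


# The four tosses from the slide
tosses = ["H", "H", "H", "T"]
print(empirical_probability(tosses, "H"))         # 3/4
print(float(empirical_probability(tosses, "T")))  # 0.25

# Simulated fair coin (arbitrary seed for reproducibility)
rng = random.Random(42)
many_tosses = [rng.choice("HT") for _ in range(100_000)]
print(float(empirical_probability(many_tosses, "H")))  # close to the theoretical 0.5
```

With only four trials the empirical value (0.75) is far from 1/2; with many trials it settles near the theoretical probability.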
Theoretical probability
• The number of ways the particular event can occur divided by the total number of possible outcomes, assuming all outcomes are equally likely.
• A head can occur in one way and there are two possible outcomes (head, tail), so the true (theoretical) probability of a head is 1/2.
Exercise 1
A die is rolled, find the probability that an even number is obtained.
Solution:
Let us first write the sample space S of the experiment.
S = {1,2,3,4,5,6}
Let E be the event "an even number is obtained" and write it down.
E = {2,4,6}
We now use the formula of the classical probability.
P(E) = n(E) / n(S) = 3 / 6 = 1 / 2
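The classical formula P(E) = n(E) / n(S) can be sketched in Python by representing the sample space and the event as sets. The helper name `classical_probability` is hypothetical; it assumes every outcome in S is equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(E) = n(E) / n(S); assumes every outcome in S is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


S = set(range(1, 7))                # sample space of one die roll
E = {s for s in S if s % 2 == 0}    # event: an even number is obtained
print(sorted(E))                    # [2, 4, 6]
print(classical_probability(E, S))  # 1/2
```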
Exercise 2
Two coins are tossed, find the probability that two heads are obtained.
Note: Each coin has two possible outcomes H (heads) and T (Tails).
The sample space S is given by:
S = {(H,T),(H,H),(T,H),(T,T)}
Let E be the event "two heads are obtained".
E = {(H,H)}
We use the formula of the classical probability.
P(E) = n(E) / n(S) = 1 / 4
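For several coins, the sample space can be generated rather than written out by hand. A short sketch using `itertools.product`, which lists every ordered pair of outcomes:

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of outcomes for two coins
S = list(product("HT", repeat=2))
print(S)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# Event E: both coins show heads
E = [outcome for outcome in S if outcome == ("H", "H")]
print(Fraction(len(E), len(S)))  # 1/4
```

Changing `repeat=2` to `repeat=3` gives the 8 outcomes for three coins in the same way.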
Exercise 3
A card is drawn at random from a standard deck of 52 cards. Find the probability of
getting the 3 of diamonds.
The sample space S of the experiment is the whole deck: 4 suits × 13 ranks = 52 cards.
Let E be the event "getting the 3 of diamonds". An examination of the
sample space shows that there is exactly one 3 of diamonds, so n(E) = 1
and n(S) = 52. Hence the probability of event E occurring is
P(E) = n(E) / n(S) = 1 / 52
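A sketch that builds the 52-card sample space as (rank, suit) pairs and counts the outcomes in E. The rank and suit labels are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # sample space: 52 (rank, suit) pairs

E = [card for card in deck if card == ("3", "diamonds")]
print(len(deck), len(E))            # 52 1
print(Fraction(len(E), len(deck)))  # 1/52
```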
Exercise 4
The blood groups of 200 people are distributed as follows:
50 have type A blood,
65 have type B blood,
70 have type O blood and
15 have type AB blood.
If a person from this group is selected at random, what is the
probability that this person has type O blood?
We construct a table of frequencies for the blood groups as follows:
Blood group   Frequency
A             50
B             65
O             70
AB            15
Total         200
We use the empirical formula for probability:
P(E) = Frequency of type O / Total frequency
= 70 / 200 = 0.35
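The same calculation as a short Python sketch, storing the frequency table in a dictionary. It also checks that the empirical probabilities of all groups sum to 1, as any probability distribution must.

```python
from fractions import Fraction

frequencies = {"A": 50, "B": 65, "O": 70, "AB": 15}
total = sum(frequencies.values())        # 200 people
p_o = Fraction(frequencies["O"], total)  # empirical P(type O)
print(total, p_o, float(p_o))            # 200 7/20 0.35

# The empirical probabilities of all groups sum to 1
print(sum(Fraction(f, total) for f in frequencies.values()))  # 1
```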
Classwork 1
What is the probability of rolling one die and getting a number
greater than 4?
Classwork 2
A customer wants to buy a loaf of bread and a can. There are 30 loaves of
bread in the shop, including 5 from the previous day, and 20 cans with
unreadable expiration dates, of which one has expired. What is the
probability that the customer buys a fresh loaf and a can that has not
expired?
Classwork 3
If we choose a group of three people at random from 19 boys and 12
girls, what is the probability that we get:
a) three boys
b) three girls
c) two boys and one girl?
Joint Probability
• The probability of events A and B, denoted P(A and B) or P(A ∩ B), is the
probability that events A and B both occur.
• P(A ∩ B) = P(A) · P(B)
• This product rule applies only if A and B are independent, which means that knowing A
occurred does not change the probability of B, and vice versa.
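The product rule can be checked by brute-force enumeration of two fair dice. This sketch uses illustrative events: one independent pair (the two dice do not influence each other) and one dependent pair (a sum of 12 forces the first die to be 6).

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally likely (first die, second die) outcomes
S = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))


def first_even(o):
    return o[0] % 2 == 0


def second_above_4(o):
    return o[1] > 4


def sum_is_12(o):
    return o[0] + o[1] == 12


def first_is_6(o):
    return o[0] == 6


# Independent events: the product rule holds
p_joint = prob(lambda o: first_even(o) and second_above_4(o))
print(p_joint, prob(first_even) * prob(second_above_4))  # 1/6 1/6

# Dependent events: the product rule fails
q_joint = prob(lambda o: sum_is_12(o) and first_is_6(o))
print(q_joint, prob(sum_is_12) * prob(first_is_6))  # 1/36 1/216
```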
Conditional Probability
• When A and B are not independent, it is often useful to compute the
conditional probability P(A|B).
• The probability of A given that B occurred: P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
• Similarly, P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0
• Rearranging gives the joint probability of A and B in general, whether or not they are independent:
• P(A ∩ B) = P(A) · P(B|A)
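A sketch of P(B|A) = P(A ∩ B) / P(A) by enumeration over two fair dice, with the illustrative events A = "first die is 6" and B = "sum is 8". It also confirms the multiplication rule P(A ∩ B) = P(A) · P(B|A).

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes


def prob(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))


def cond_prob(b, a):
    """P(B|A) = P(A ∩ B) / P(A); requires P(A) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(a)


def first_is_6(o):  # event A
    return o[0] == 6


def sum_is_8(o):  # event B
    return o[0] + o[1] == 8


print(prob(sum_is_8))                   # 5/36
print(cond_prob(sum_is_8, first_is_6))  # 1/6: only (6, 2) works once A is known
# Multiplication rule: P(A ∩ B) = P(A) · P(B|A)
print(prob(first_is_6) * cond_prob(sum_is_8, first_is_6))  # 1/36
```

Knowing that the first die is 6 raises the chance of a sum of 8 from 5/36 to 1/6, which is exactly what "not independent" means.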
Bayes Theorem
• Used in Naïve Bayes Classifier
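Bayes' theorem follows from the conditional probability formulas: since P(A ∩ B) = P(A|B) · P(B) = P(B|A) · P(A), we get P(A|B) = P(B|A) · P(A) / P(B). A minimal Python sketch with hypothetical screening-test numbers (prevalence 1%, sensitivity 90%, false-positive rate 5%), where P(B) comes from the law of total probability. The numbers are illustrative only.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical numbers (illustration only)
p_disease = Fraction(1, 100)            # P(A): prior
p_pos_given_disease = Fraction(9, 10)   # P(B|A): sensitivity
p_pos_given_healthy = Fraction(5, 100)  # P(B|not A): false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

posterior = bayes(p_pos_given_disease, p_disease, p_pos)
print(posterior, round(float(posterior), 4))  # 2/13 0.1538
```

Even with a fairly accurate test, a positive result here means only about a 15% chance of having the condition, because the prior is small.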
Types of Events
• Independent
• Mutually Exclusive
Independent Events
• Two or more events are independent when the outcome of one has no influence on the probability of the others.
Mutually Exclusive Events
• Mutually exclusive events cannot occur at the same time, so P(A ∩ B) = 0 (e.g. a single die roll cannot show both 1 and 6). If both events have nonzero probability, they are therefore dependent.
• If two events are NOT independent, then we say that they are dependent.
• Sampling may be done with replacement or without replacement.
• With replacement: If each member of a population is replaced after it is picked, then that member can be chosen more than once. When sampling is done with replacement, the events are considered independent: the result of the first pick does not change the probabilities for the second pick.
• Without replacement: When sampling is done without replacement, each member of a population may be chosen only once. In this case, the probabilities for the second pick are affected by the result of the first pick, so the events are dependent.
Sampling with replacement
• Suppose you pick three cards with replacement. The first card you pick out of the 52 cards is the Q of spades. You put this card back, reshuffle the cards and pick a second card from the 52-card deck. It is the ten of clubs. You put this card back, reshuffle the cards and pick a third card from the 52-card deck. This time, the card is the Q of spades again. Your picks are {Q of spades, ten of clubs, Q of spades}. You have picked the Q of spades twice. You pick each card from the 52-card deck.
Sampling without replacement
• Suppose you pick three cards without replacement. The first card you pick out of the 52 cards is the K of hearts. You put this card aside and pick the second card from the 51 cards remaining in the deck. It is the three of diamonds. You put this card aside and pick the third card from the remaining 50 cards in the deck. The third card is the J of spades. Your picks are {K of hearts, three of diamonds, J of spades}. Because you have picked the cards without replacement, you cannot pick the same card twice.
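In Python's standard library, `random.choices` samples with replacement and `random.sample` samples without replacement. This sketch draws three cards each way and compares the probability of two aces in two picks (the seed and card labels are illustrative):

```python
import random
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]

rng = random.Random(7)  # arbitrary seed for reproducibility
with_replacement = rng.choices(deck, k=3)    # every pick uses the full deck; repeats possible
without_replacement = rng.sample(deck, k=3)  # picked cards are set aside; always distinct
print(with_replacement)
print(without_replacement)

# P(two aces in two picks)
p_with = Fraction(4, 52) * Fraction(4, 52)     # independent picks
p_without = Fraction(4, 52) * Fraction(3, 51)  # second pick depends on the first
print(p_with, p_without)  # 1/169 1/221
```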
Probability Distribution
• A probability distribution is a list of all of the possible outcomes of a
random variable along with their corresponding probability values.
Discrete Probability Distribution
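• If we consider 1 and 2 as outcomes of rolling a six-sided die, then we can’t have an outcome in between (e.g. an outcome of 1.5 is impossible). A random variable with such separate, countable outcomes is discrete.
• The function that gives the probability of each discrete outcome is called the probability mass function (pmf).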
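A sketch of a probability mass function, using the sum of two fair dice as an illustrative discrete random variable. The pmf is built by counting equally likely outcomes, and its values sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = sum of two fair dice: count the 36 equally likely outcomes that give each value
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)           # e.g. "7 1/6", the most likely sum
print(sum(pmf.values()))  # 1: a pmf sums to one over all possible values
```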
Continuous Probability Distribution
• Sometimes we are concerned with the probabilities of random variables that have continuous outcomes.
• E.g., the height of an adult picked at random from a population, or the amount of time that a taxi driver has to wait before their next job.
• When we use a probability function to describe a continuous probability distribution, we call it a probability density function (commonly abbreviated as pdf).
• For a continuous random variable, the probability of any single exact value is 0; the probability that it falls in an interval is the area under the pdf over that interval.
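A sketch of "probability = area under the pdf" for the taxi example. It assumes, for illustration only, that waiting times follow an exponential distribution with a hypothetical rate of 0.5 jobs per minute. The area is computed numerically with the trapezoidal rule and compared with the closed-form CDF 1 − e^(−rate·t).

```python
import math

RATE = 0.5  # hypothetical: 0.5 jobs per minute on average (mean wait of 2 minutes)


def pdf(t, rate=RATE):
    """Exponential density f(t) = rate * exp(-rate * t) for t >= 0, else 0."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def prob_between(a, b, steps=10_000):
    """P(a <= X <= b) as the area under the pdf, using the trapezoidal rule."""
    h = (b - a) / steps
    area = 0.5 * (pdf(a) + pdf(b))
    area += sum(pdf(a + i * h) for i in range(1, steps))
    return area * h


numeric = prob_between(0, 3)   # P(wait of at most 3 minutes)
exact = 1 - math.exp(-RATE * 3)
print(numeric, exact)          # the two values agree closely
print(prob_between(2, 2))      # 0.0: a single exact value has probability 0
```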
Central Limit Theorem
• The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples (of size n) from the population with replacement, then the distribution of the sample means will be approximately normal, with mean μ and standard deviation σ/√n.
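A simulation sketch of the central limit theorem. The population is a single fair die, which is uniform rather than bell-shaped. Sample size n = 30 and 10,000 repetitions are illustrative choices. The sample means cluster around μ with spread close to σ/√n, and roughly the normal-curve share of them lies within one σ/√n of μ.

```python
import random
import statistics

rng = random.Random(0)            # arbitrary seed for reproducibility
faces = range(1, 7)               # population: one fair die (uniform, not normal)
mu = statistics.fmean(faces)      # 3.5
sigma = statistics.pstdev(faces)  # population standard deviation, about 1.708
n = 30                            # sample size, assumed "sufficiently large" here
num_samples = 10_000

# Draw many samples with replacement and record each sample mean
sample_means = [statistics.fmean(rng.choices(faces, k=n)) for _ in range(num_samples)]

standard_error = sigma / n ** 0.5
print(statistics.fmean(sample_means), mu)               # close to mu
print(statistics.stdev(sample_means), standard_error)   # close to sigma / sqrt(n)

# A normal distribution has about 68% of its mass within one standard deviation of the mean
within_one_se = sum(abs(m - mu) <= standard_error for m in sample_means) / num_samples
print(within_one_se)
```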
Normal Distribution
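• The central limit theorem explains why the normal distribution appears so often in practice: averages of many independent values tend toward it.
• Known as the bell curve, because of the shape of its density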
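Python's standard library provides `statistics.NormalDist` with `pdf` and `cdf` methods. This sketch uses hypothetical height parameters (mean 170 cm, standard deviation 10 cm) to show the 68-95-99.7 rule and the peak of the bell curve:

```python
from statistics import NormalDist

# Hypothetical example: adult heights with mean 170 cm and standard deviation 10 cm
mu, sigma = 170, 10
heights = NormalDist(mu=mu, sigma=sigma)

# Probability of falling within k standard deviations of the mean
for k in (1, 2, 3):
    p = heights.cdf(mu + k * sigma) - heights.cdf(mu - k * sigma)
    print(f"P(within {k} sd) = {p:.4f}")  # 0.6827, 0.9545, 0.9973

# The density peaks at the mean, with height 1 / (sigma * sqrt(2 * pi))
print(round(heights.pdf(mu), 4))  # 0.0399
```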
Case Study
Genetic Algorithm
A genetic algorithm is a search heuristic inspired by Charles
Darwin’s theory of natural evolution.
This algorithm reflects the process of natural selection, where the fittest
individuals are selected for reproduction in order to produce offspring
of the next generation.
Phases of Genetic Algorithm
• Initial population
• Fitness function
• Selection
• Crossover
• Mutation
Initial Population
The process begins with a set of individuals, which is called a
population. Each individual is a candidate solution to the problem you want to
solve.
An individual is characterized by a set of parameters (variables) known
as genes. Genes are joined into a string to form a chromosome
(solution).
In a genetic algorithm, the set of genes of an individual is represented
as a string over an alphabet. Usually, binary values are used
(a string of 1s and 0s). We say that we encode the genes in a
chromosome.
Fitness Function
The fitness function determines how fit an individual is (the ability of
an individual to compete with other individuals).
It gives a fitness score to each individual.
The probability that an individual will be selected for reproduction is
based on its fitness score.
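A minimal sketch of the first two phases with binary encoding. The chromosome length, population size and the "OneMax" fitness (count of 1 genes) are hypothetical choices for illustration; a real problem would supply its own fitness function.

```python
import random

CHROMOSOME_LENGTH = 8   # hypothetical: genes per individual
POPULATION_SIZE = 6     # hypothetical: individuals in the initial population
rng = random.Random(1)  # arbitrary seed for reproducibility


def random_chromosome(length=CHROMOSOME_LENGTH):
    """Encode an individual as a random string of binary genes."""
    return "".join(rng.choice("01") for _ in range(length))


def fitness(chromosome):
    """Hypothetical 'OneMax' fitness: the number of 1 genes (higher is fitter)."""
    return chromosome.count("1")


population = [random_chromosome() for _ in range(POPULATION_SIZE)]
for individual in population:
    print(individual, fitness(individual))
```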
Selection
The idea of the selection phase is to select the fittest individuals and let
them pass their genes to the next generation.
Parents are selected in pairs based on their fitness
scores. Individuals with higher fitness have a greater chance of being selected
for reproduction.
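The slides do not name a selection method. One common choice is roulette-wheel (fitness-proportionate) selection, sketched here with `random.choices` and its `weights` parameter. The population and the OneMax fitness are hypothetical.

```python
import random
from collections import Counter


def fitness(chromosome):
    """Hypothetical OneMax fitness: number of 1 genes."""
    return chromosome.count("1")


def select_parents(population, rng):
    """Roulette-wheel selection: pick two parents with probability proportional to fitness.

    Parents are drawn with replacement, so the same individual can be picked twice.
    """
    weights = [fitness(individual) for individual in population]
    if sum(weights) == 0:  # all fitness scores are zero: fall back to a uniform choice
        weights = None
    return rng.choices(population, weights=weights, k=2)


population = ["11110000", "11000000", "10000000", "11111111"]  # fitness 4, 2, 1, 8
rng = random.Random(3)  # arbitrary seed for reproducibility
print(select_parents(population, rng))

# Over many draws, each individual's share approaches fitness / total fitness (8/15 for the fittest)
counts = Counter(p for _ in range(30_000) for p in select_parents(population, rng))
print(counts["11111111"] / 60_000)
```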
Crossover
Crossover is the most significant phase in a genetic algorithm. For each
pair of parents to be mated, a crossover point is chosen at random
from within the genes.
For example, consider the crossover point to be 3 as shown below.
• Offspring are created by exchanging the genes of the parents with each other up to the crossover point.
• The new offspring A5 and A6 are added to the population.
Probability in crossover
• Choosing which chromosomes undergo crossover
• Choosing the pair of parents to cross
• Choosing the crossover point, i.e. which part of the chromosome is exchanged
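A sketch of single-point crossover that includes two of the random choices listed above: whether a pair is crossed at all (a hypothetical crossover rate) and where the crossover point falls. The names `crossover_at`, `single_point_crossover` and `CROSSOVER_RATE` are illustrative.

```python
import random

CROSSOVER_RATE = 0.9  # hypothetical probability that a selected pair is actually crossed


def crossover_at(parent1, parent2, point):
    """Exchange the genes before `point`; genes from `point` onward stay with their parent."""
    return parent2[:point] + parent1[point:], parent1[:point] + parent2[point:]


def single_point_crossover(parent1, parent2, rng, rate=CROSSOVER_RATE):
    """With probability `rate`, cross the parents at a random point; otherwise copy them."""
    if rng.random() >= rate:
        return parent1, parent2
    point = rng.randint(1, len(parent1) - 1)  # point strictly inside the chromosome
    return crossover_at(parent1, parent2, point)


print(crossover_at("000000", "111111", 3))  # ('111000', '000111')
rng = random.Random(5)  # arbitrary seed for reproducibility
print(single_point_crossover("000000", "111111", rng))
```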
Mutation
• In some newly formed offspring, some of their genes can be
subjected to mutation with a low random probability.
• With binary encoding, this means that some of the bits in the bit string are flipped.
Probability in mutation
• Choosing which chromosomes to mutate
• Choosing whether to perform a mutation or not
• Choosing which part of the chromosome (which genes) to mutate
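A sketch of bit-flip mutation, in which every gene independently decides whether to mutate using a low, hypothetical mutation rate. Over many offspring, the average number of flipped bits is about rate × length.

```python
import random

MUTATION_RATE = 0.05  # hypothetical per-gene mutation probability (kept low)


def mutate(chromosome, rng, rate=MUTATION_RATE):
    """Bit-flip mutation: flip each gene independently with probability `rate`."""
    genes = []
    for gene in chromosome:
        if rng.random() < rate:
            gene = "1" if gene == "0" else "0"
        genes.append(gene)
    return "".join(genes)


rng = random.Random(11)  # arbitrary seed for reproducibility
offspring = "1011001110"
print(mutate(offspring, rng))

# On average, about rate * length genes are flipped per offspring (0.05 * 10 = 0.5)
trials = 10_000
flips = sum(sum(a != b for a, b in zip(offspring, mutate(offspring, rng))) for _ in range(trials))
print(flips / trials)
```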
Sample Java Code
https://github.com/ferdinjoe/Genetic-Algorithm
Probability usage in programming
```python
# generate random floating point values
from random import seed
from random import random

# seed the random number generator so the output is reproducible
seed(1)
# generate random numbers in the half-open interval [0.0, 1.0)
for _ in range(10):
    value = random()
    print(value)
```
```python
# generate random integer values
from random import seed
from random import randint

# seed the random number generator so the output is reproducible
seed(1)
# generate integers between 0 and 10 (both ends inclusive)
for _ in range(10):
    value = randint(0, 10)
    print(value)
```
```python
# choose a random element from a list
from random import seed
from random import choice

# seed the random number generator
seed(1)
# prepare a sequence
sequence = list(range(20))
print(sequence)
# make choices from the sequence (the same element may be chosen more than once)
for _ in range(5):
    selection = choice(sequence)
    print(selection)
```
```python
# randomly shuffle a sequence
from random import seed
from random import shuffle

# seed the random number generator
seed(1)
# prepare a sequence
sequence = list(range(20))
print(sequence)
# randomly shuffle the sequence in place
shuffle(sequence)
print(sequence)
```
Slides are available at the link below
www.slideshare.net/ferdinjoe
More topics recommended to learn
• Queueing Theory
• Statistics
• Numerical Methods
• Discrete Mathematics
• Optimization problems in Operations Research
69
70

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Introduction to Probability and Probability Distributions
Introduction to Probability and Probability DistributionsIntroduction to Probability and Probability Distributions
Introduction to Probability and Probability Distributions
 
Bayes rule (Bayes Law)
Bayes rule (Bayes Law)Bayes rule (Bayes Law)
Bayes rule (Bayes Law)
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By Statisticians
 
Maximum likelihood estimation
Maximum likelihood estimationMaximum likelihood estimation
Maximum likelihood estimation
 
Exploratory Data Analysis using Python
Exploratory Data Analysis using PythonExploratory Data Analysis using Python
Exploratory Data Analysis using Python
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Basic concepts of probability
Basic concepts of probability Basic concepts of probability
Basic concepts of probability
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade off
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Probablity ppt maths
Probablity ppt mathsProbablity ppt maths
Probablity ppt maths
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
Outlier analysis and anomaly detection
Outlier analysis and anomaly detectionOutlier analysis and anomaly detection
Outlier analysis and anomaly detection
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Statistics and data science
Statistics and data scienceStatistics and data science
Statistics and data science
 
Intro to probability
Intro to probabilityIntro to probability
Intro to probability
 
Theorems And Conditional Probability
Theorems And Conditional ProbabilityTheorems And Conditional Probability
Theorems And Conditional Probability
 

Ähnlich wie Probability Theory for Data Scientists

Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
beshahashenafe20
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
beshahashenafe20
 
Binomial distribution good
Binomial distribution goodBinomial distribution good
Binomial distribution good
Zahida Pervaiz
 
Lab23 chisquare2007
Lab23 chisquare2007Lab23 chisquare2007
Lab23 chisquare2007
sbarkanic
 
Lecture Notes MTH302 Before MTT Myers.docx
Lecture Notes MTH302 Before MTT Myers.docxLecture Notes MTH302 Before MTT Myers.docx
Lecture Notes MTH302 Before MTT Myers.docx
RaghavaReddy449756
 

Ähnlich wie Probability Theory for Data Scientists (20)

Probability and Statistics - Week 1
Probability and Statistics - Week 1Probability and Statistics - Week 1
Probability and Statistics - Week 1
 
PROBABILITY THEORIES.pptx
PROBABILITY THEORIES.pptxPROBABILITY THEORIES.pptx
PROBABILITY THEORIES.pptx
 
powerpoints probability.pptx
powerpoints probability.pptxpowerpoints probability.pptx
powerpoints probability.pptx
 
1 - Probabilty Introduction .ppt
1 - Probabilty Introduction .ppt1 - Probabilty Introduction .ppt
1 - Probabilty Introduction .ppt
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Chapter7ppt.pdf
Chapter7ppt.pdfChapter7ppt.pdf
Chapter7ppt.pdf
 
probability
probabilityprobability
probability
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
 
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhChapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Chapter Five.ppthhjhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
 
Probability
ProbabilityProbability
Probability
 
5Enote5.ppt
5Enote5.ppt5Enote5.ppt
5Enote5.ppt
 
5Enote5.ppt
5Enote5.ppt5Enote5.ppt
5Enote5.ppt
 
probability.pptx
probability.pptxprobability.pptx
probability.pptx
 
random variable and distribution
random variable and distributionrandom variable and distribution
random variable and distribution
 
probability and its functions with purpose in the world's situation .pptx
probability and its functions with purpose in the world's situation .pptxprobability and its functions with purpose in the world's situation .pptx
probability and its functions with purpose in the world's situation .pptx
 
chapter five.pptx
chapter five.pptxchapter five.pptx
chapter five.pptx
 
Stat.pptx
Stat.pptxStat.pptx
Stat.pptx
 
Binomial distribution good
Binomial distribution goodBinomial distribution good
Binomial distribution good
 
Lab23 chisquare2007
Lab23 chisquare2007Lab23 chisquare2007
Lab23 chisquare2007
 
Lecture Notes MTH302 Before MTT Myers.docx
Lecture Notes MTH302 Before MTT Myers.docxLecture Notes MTH302 Before MTT Myers.docx
Lecture Notes MTH302 Before MTT Myers.docx
 

Mehr von Ferdin Joe John Joseph PhD

Mehr von Ferdin Joe John Joseph PhD (20)

Invited Talk DGTiCon 2022
Invited Talk DGTiCon 2022Invited Talk DGTiCon 2022
Invited Talk DGTiCon 2022
 
Week 12: Cloud AI- DSA 441 Cloud Computing
Week 12: Cloud AI- DSA 441 Cloud ComputingWeek 12: Cloud AI- DSA 441 Cloud Computing
Week 12: Cloud AI- DSA 441 Cloud Computing
 
Week 11: Cloud Native- DSA 441 Cloud Computing
Week 11: Cloud Native- DSA 441 Cloud ComputingWeek 11: Cloud Native- DSA 441 Cloud Computing
Week 11: Cloud Native- DSA 441 Cloud Computing
 
Week 10: Cloud Security- DSA 441 Cloud Computing
Week 10: Cloud Security- DSA 441 Cloud ComputingWeek 10: Cloud Security- DSA 441 Cloud Computing
Week 10: Cloud Security- DSA 441 Cloud Computing
 
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud ComputingWeek 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
Week 9: Relational Database Service Alibaba Cloud- DSA 441 Cloud Computing
 
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud ComputingWeek 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
Week 7: Object Storage Service Alibaba Cloud- DSA 441 Cloud Computing
 
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
Week 6: Server Load Balancer and Auto Scaling Alibaba Cloud- DSA 441 Cloud Co...
 
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
Week 5: Elastic Compute Service (ECS) with Alibaba Cloud- DSA 441 Cloud Compu...
 
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud ComputingWeek 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
Week 4: Big Data and Hadoop in Alibaba Cloud - DSA 441 Cloud Computing
 
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
Week 3: Virtual Private Cloud, On Premise, IaaS, PaaS, SaaS - DSA 441 Cloud C...
 
Week 2: Virtualization and VM Ware - DSA 441 Cloud Computing
Week 2: Virtualization and VM Ware - DSA 441 Cloud ComputingWeek 2: Virtualization and VM Ware - DSA 441 Cloud Computing
Week 2: Virtualization and VM Ware - DSA 441 Cloud Computing
 
Week 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
Week 1: Introduction to Cloud Computing - DSA 441 Cloud ComputingWeek 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
Week 1: Introduction to Cloud Computing - DSA 441 Cloud Computing
 
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculum
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculumSept 6 2021 BTech Artificial Intelligence and Data Science curriculum
Sept 6 2021 BTech Artificial Intelligence and Data Science curriculum
 
Hadoop in Alibaba Cloud
Hadoop in Alibaba CloudHadoop in Alibaba Cloud
Hadoop in Alibaba Cloud
 
Cloud Computing Essentials in Alibaba Cloud
Cloud Computing Essentials in Alibaba CloudCloud Computing Essentials in Alibaba Cloud
Cloud Computing Essentials in Alibaba Cloud
 
Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approach
 
Week 11: Programming for Data Analysis
Week 11: Programming for Data AnalysisWeek 11: Programming for Data Analysis
Week 11: Programming for Data Analysis
 
Week 10: Programming for Data Analysis
Week 10: Programming for Data AnalysisWeek 10: Programming for Data Analysis
Week 10: Programming for Data Analysis
 
Week 9: Programming for Data Analysis
Week 9: Programming for Data AnalysisWeek 9: Programming for Data Analysis
Week 9: Programming for Data Analysis
 
Week 8: Programming for Data Analysis
Week 8: Programming for Data AnalysisWeek 8: Programming for Data Analysis
Week 8: Programming for Data Analysis
 

Kürzlich hochgeladen

怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Kürzlich hochgeladen (20)

怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 

Probability Theory for Data Scientists

  • 1. Probability for Data Scientists Dr. Ferdin Joe John Joseph
  • 2. Machine Learning Machine Learning is an interdisciplinary field in Data Science that uses • statistics • probability • algorithms to learn from data and provide insights which can be used to build intelligent applications. 2
  • 8. Probability for Data Science
    • Probability deals with predicting the likelihood of future events, while statistics involves analyzing the frequency of past events.
  • 9. Terminology
    • Event
    • Random Variable
    • Empirical Probability
    • Theoretical Probability
    • Joint Probability
    • Conditional Probability
  • 10. Event
    • An event is a set of outcomes of an experiment to which a probability is assigned.
    • E denotes an event, and P(E) is the probability that event E occurs.
    • A single run of an experiment, in which E might happen (success) or might not happen (failure), is called a trial.
  • 13. Event
    • Pulling a colored ball out of a bag
  • 14. Random Variable
    • A variable whose value represents the outcome of a random event is called a random variable.
    • E.g., getting heads or tails when tossing a coin (which can be encoded as 1 for heads and 0 for tails).
  • 15. Random variable in tossing a coin
    • If we toss a fair coin, the chances of getting heads or tails are 50-50.
    • The probability of getting heads (or of getting tails) is ½, or 50%.
    • Every probability lies between 0 and 1.
  • 16. Empirical Probability
    • Also known as experimental (practical) probability.
    • It is the number of times the event occurs divided by the total number of trials observed.
    • If we observe s successes in n trials, the empirical probability of success is s/n.
    • Example: toss a coin 4 times with outcomes H, H, H, T. Then P(Head) = 3/4 = 0.75 and P(Tail) = 1/4 = 0.25.
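The s/n formula can be sketched in a few lines of Python. This is an illustrative sketch only: `empirical_probability` is a hypothetical helper name, and the 10,000 simulated tosses (with seed 1) are an assumption added to show s/n drifting toward 1/2 for a fair coin.

```python
import random

def empirical_probability(outcomes, event):
    """Hypothetical helper: s/n, the fraction of observed outcomes equal to `event`."""
    n = len(outcomes)
    s = sum(1 for o in outcomes if o == event)
    return s / n

# The four tosses from the slide: H, H, H, T
tosses = ["H", "H", "H", "T"]
print(empirical_probability(tosses, "H"))  # 0.75
print(empirical_probability(tosses, "T"))  # 0.25

# Assumed simulation: with many tosses of a fair coin, s/n tends toward 1/2
rng = random.Random(1)  # seeded so the run is reproducible
many = [rng.choice("HT") for _ in range(10_000)]
print(empirical_probability(many, "H"))
```

With only four tosses the empirical value (0.75) is far from the theoretical 1/2; the larger simulated run narrows that gap.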
  • 17. Theoretical Probability
    • The number of ways the particular event can occur divided by the total number of possible outcomes, assuming all outcomes are equally likely.
    • Heads can occur in one way, and there are two possible outcomes (heads, tails), so the true (theoretical) probability of heads is 1/2.
  • 19. Exercise 1
    A die is rolled. Find the probability that an even number is obtained.
    Solution: First, write the sample space S of the experiment:
    S = {1, 2, 3, 4, 5, 6}
    Let E be the event "an even number is obtained":
    E = {2, 4, 6}
    Using the formula for classical probability:
    P(E) = n(E) / n(S) = 3 / 6 = 1 / 2
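The classical formula P(E) = n(E) / n(S) from Exercise 1 can be checked with exact fractions. A minimal sketch: `classical_probability` is a hypothetical helper, and it assumes every outcome in S is equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Hypothetical helper: P(E) = n(E) / n(S) for equally likely outcomes."""
    event = set(event) & set(sample_space)  # E must be a subset of S
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}              # faces of a fair die
E = {x for x in S if x % 2 == 0}    # "an even number is obtained"
print(E)                            # {2, 4, 6}
print(classical_probability(E, S))  # 1/2
```

Using `Fraction` keeps the answer as 1/2 rather than a rounded decimal.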
  • 21. Exercise 2
    Two coins are tossed. Find the probability that two heads are obtained.
    Note: each coin has two possible outcomes, H (heads) and T (tails).
    The sample space S is:
    S = {(H,T), (H,H), (T,H), (T,T)}
    Let E be the event "two heads are obtained":
    E = {(H,H)}
    Using the formula for classical probability:
    P(E) = n(E) / n(S) = 1 / 4
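For Exercise 2, the sample space can be enumerated instead of written out by hand. A small sketch using `itertools.product`, which builds every ordered pair of coin faces:

```python
from fractions import Fraction
from itertools import product

# Sample space for two coins: every ordered pair of H/T
S = list(product("HT", repeat=2))
print(S)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# Event E: both coins show heads
E = [outcome for outcome in S if outcome == ("H", "H")]
print(Fraction(len(E), len(S)))  # 1/4
```

The same pattern scales to more coins by changing `repeat`, which is where listing outcomes by hand becomes error-prone.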
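  • 23. Exercise 3
    A card is drawn at random from a standard deck of 52 cards. Find the probability of getting the 3 of diamonds.
    The sample space S of this experiment, shown below, is the full deck: 13 ranks (A, 2 to 10, J, Q, K) in each of the four suits (clubs, diamonds, hearts, spades).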
  • 25. Exercise 3
    Solution: Let E be the event "getting the 3 of diamonds". Examining the sample space shows that there is exactly one 3 of diamonds, so n(E) = 1 and n(S) = 52. Hence:
    P(E) = n(E) / n(S) = 1 / 52
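The 52-card sample space of Exercise 3 can be built from ranks and suits. A sketch, assuming a standard deck with no jokers; the `RANKS` and `SUITS` names are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]

# Sample space S: all (rank, suit) pairs of a standard deck
deck = list(product(RANKS, SUITS))
E = [card for card in deck if card == ("3", "diamonds")]

print(len(deck))                    # 52
print(Fraction(len(E), len(deck)))  # 1/52
```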
  • 26. Exercise 4
    The blood groups of 200 people are distributed as follows: 50 have type A blood, 65 have type B, 70 have type O and 15 have type AB. If a person from this group is selected at random, what is the probability that this person has type O blood?
  • 27. Exercise 4
    We construct a frequency table for the blood groups:
      Group   Frequency
      A       50
      B       65
      O       70
      AB      15
      Total   200
    Using the empirical formula for probability:
    P(E) = frequency of type O / total frequency = 70 / 200 = 0.35
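The empirical probabilities for all four blood groups in Exercise 4 follow from the same frequency / total rule. A short sketch using the counts given in the exercise:

```python
# Frequencies from Exercise 4
frequencies = {"A": 50, "B": 65, "O": 70, "AB": 15}
total = sum(frequencies.values())  # 200

# Empirical probability of each group = frequency / total
probabilities = {group: count / total for group, count in frequencies.items()}
print(probabilities["O"])  # 0.35
```

Because every person falls in exactly one group, the four probabilities add up to 1 (up to floating-point rounding).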
  • 28. Classwork 1
    What is the probability of rolling one die and getting a number greater than 4?
  • 29. Classwork 2
    A customer wants to buy a loaf of bread and a can. The shop has 30 loaves of bread, 5 of which are from the previous day, and 20 cans with unreadable expiration dates, one of which has expired. What is the probability that the customer buys fresh bread and a can that has not expired?
  • 30. Classwork 3
    A group of three is chosen at random from 19 boys and 12 girls. What is the probability that the group consists of:
      a) three boys
      b) three girls
      c) two boys and one girl?
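One way to check answers to Classwork 3 is to count combinations with `math.comb` (Python 3.8+). This sketch assumes every group of three is equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

boys, girls = 19, 12
total = comb(boys + girls, 3)  # ways to choose any 3 of the 31 children

p_three_boys = Fraction(comb(boys, 3), total)
p_three_girls = Fraction(comb(girls, 3), total)
p_two_boys_one_girl = Fraction(comb(boys, 2) * comb(girls, 1), total)

print(total)
print(p_three_boys, p_three_girls, p_two_boys_one_girl)
```

Each answer is "number of favorable groups / number of possible groups", which is the classical formula from Exercise 1 applied to groups instead of single outcomes.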
  • 31. Joint Probability
    • The probability of events A and B, denoted P(A and B) or P(A ∩ B), is the probability that events A and B both occur.
    • If A and B are independent, P(A ∩ B) = P(A) · P(B).
    • This product rule applies only when A and B are independent, meaning that the occurrence of A does not change the probability of B, and vice versa.
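The product rule for independent events can be verified by enumeration. This sketch uses an illustrative experiment not taken from the slides: tossing one coin and rolling one die, with A = "heads" and B = "the die shows 6".

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: one coin and one die (12 equally likely pairs)
S = list(product("HT", range(1, 7)))

def prob(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def heads(o):  # event A: the coin shows heads
    return o[0] == "H"

def six(o):    # event B: the die shows 6
    return o[1] == 6

p_joint = prob(lambda o: heads(o) and six(o))
print(p_joint)                  # 1/12
print(prob(heads) * prob(six))  # 1/12, matching P(A ∩ B) = P(A) · P(B)
```

The coin cannot influence the die, so the directly counted joint probability equals the product of the two marginal probabilities.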
  • 32. Conditional Probability
    • When A and B are not independent, it is often useful to compute the conditional probability P(A|B), the probability of A given that B has occurred:
      P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
    • Similarly, P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0
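  • 33. Joint probability in general
    • Rearranging the conditional probability formula gives the joint probability of A and B whether or not they are independent:
      P(A ∩ B) = P(A) · P(B|A)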
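Conditional probability and the general joint-probability rule can be checked on a single die roll. The events are illustrative choices, not from the slides: A = "roll is at least 4" and B = "roll is even", which turn out to be dependent.

```python
from fractions import Fraction

S = range(1, 7)  # one roll of a fair six-sided die, all faces equally likely

def prob(event):
    """Classical probability: favorable outcomes / all outcomes."""
    return Fraction(sum(1 for x in S if event(x)), len(S))

def at_least_4(x):  # event A (illustrative choice)
    return x >= 4

def even(x):        # event B (illustrative choice)
    return x % 2 == 0

p_a_and_b = prob(lambda x: at_least_4(x) and even(x))  # outcomes {4, 6}
p_a_given_b = p_a_and_b / prob(even)        # P(A|B) = P(A ∩ B) / P(B)
p_b_given_a = p_a_and_b / prob(at_least_4)  # P(B|A) = P(A ∩ B) / P(A)

print(p_a_given_b, prob(at_least_4))  # 2/3 1/2, so knowing B changes P(A)
print(prob(at_least_4) * p_b_given_a == p_a_and_b)  # True: P(A ∩ B) = P(A) · P(B|A)
```

Since P(A|B) differs from P(A), the simple product P(A) · P(B) would give the wrong joint probability here, while P(A) · P(B|A) gives the right one.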
  • 35. Bayes' Theorem
    • Used in the Naïve Bayes classifier
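Bayes' theorem, P(A|B) = P(B|A) · P(A) / P(B), follows from the two conditional probability formulas on slide 32. A minimal sketch on an illustrative card example (not from the slides): A = "the card is a king" and B = "the card is a face card (J, Q or K)" in a standard 52-card deck.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# A = "card is a king", B = "card is a face card" (illustrative events)
p_a = Fraction(4, 52)      # 4 kings
p_b = Fraction(12, 52)     # 12 face cards
p_b_given_a = Fraction(1)  # every king is a face card

print(bayes(p_b_given_a, p_a, p_b))  # 1/3
```

The result agrees with direct counting: of the 12 face cards, 4 are kings. Naïve Bayes classifiers apply the same formula with class labels in place of A and observed features in place of B.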
  • 37. Types of Events
    • Independent
    • Mutually Exclusive
  • 38. Independent Events
    • Two or more events are independent when the outcome of one has no effect on the probability of the others.
  • 39. Mutually Exclusive Events
    • Two events are mutually exclusive if they cannot both occur at the same time, so P(A ∩ B) = 0 (e.g., rolling a 1 and rolling a 6 on a single roll of a die).
    • Mutually exclusive events with nonzero probabilities are not independent: if one occurs, the other cannot. If two events are NOT independent, we say that they are dependent.
    • Sampling may be done with replacement or without replacement.
    • With replacement: if each member of a population is replaced after it is picked, that member may be chosen more than once. When sampling is done with replacement, the events are considered independent: the result of the first pick does not change the probabilities for the second pick.
    • Without replacement: each member of a population may be chosen only once. The probabilities for the second pick are affected by the result of the first pick, so the events are dependent.
  • 40. Sampling with replacement
    • Suppose you pick three cards with replacement. The first card you pick out of the 52 cards is the Q of spades. You put this card back, reshuffle the cards and pick a second card from the 52-card deck. It is the ten of clubs. You put this card back, reshuffle the cards and pick a third card from the 52-card deck. This time, the card is the Q of spades again. Your picks are {Q of spades, ten of clubs, Q of spades}. You have picked the Q of spades twice because you pick each card from the full 52-card deck.
  • 41. Sampling without replacement
    • Suppose you pick three cards without replacement. The first card you pick out of the 52 cards is the K of hearts. You put this card aside and pick the second card from the 51 cards remaining in the deck. It is the three of diamonds. You put this card aside and pick the third card from the remaining 50 cards in the deck. The third card is the J of spades. Your picks are {K of hearts, three of diamonds, J of spades}. Because you have picked the cards without replacement, you cannot pick the same card twice.
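Python's `random` module has a function for each sampling scheme: `choices` samples with replacement and `sample` samples without replacement. The sketch below draws three cards each way and then compares the exact probability of two aces in a row under both schemes (the two-ace calculation is an added illustration, and seed 42 is arbitrary).

```python
import random
from fractions import Fraction

deck = [(rank, suit)
        for suit in ("clubs", "diamonds", "hearts", "spades")
        for rank in ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")]

rng = random.Random(42)  # seeded for reproducibility

with_replacement = rng.choices(deck, k=3)    # the same card may appear twice
without_replacement = rng.sample(deck, k=3)  # always three distinct cards
print(with_replacement)
print(without_replacement)

# Probability of drawing two aces in a row
p_with = Fraction(4, 52) * Fraction(4, 52)     # independent picks
p_without = Fraction(4, 52) * Fraction(3, 51)  # second pick depends on the first
print(p_with, p_without)  # 1/169 1/221
```

Without replacement, the first ace leaves only 3 aces among 51 cards, which is exactly the dependence described on slide 39.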
  • 42. Probability Distribution
    • A probability distribution is a list of all of the possible outcomes of a random variable along with their corresponding probability values.
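  • 43. Discrete Probability Distribution
    • A discrete random variable can take only separate, countable values. For example, 1 and 2 are outcomes of rolling a six-sided die, but there is no outcome in between them (you cannot roll a 1.5).
    • The function that gives the probability of each possible value of a discrete random variable is called a probability mass function (PMF).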
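A PMF for a fair die can be represented as a dictionary from outcomes to probabilities. A minimal sketch, assuming a fair six-sided die:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face 1..6 has probability 1/6
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(sum(pmf.values()))  # 1: the probabilities of all outcomes sum to 1
print(pmf.get(1.5, 0))    # 0: 1.5 is not a possible outcome
print(sum(p for x, p in pmf.items() if x % 2 == 0))  # P(X is even) = 1/2
```

The probability of any event is the sum of the PMF over the outcomes in that event.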
  • 44. Continuous Probability Distribution
    • Sometimes we are concerned with the probabilities of random variables that have continuous outcomes.
    • E.g., the height of an adult picked at random from a population, or the amount of time a taxi driver has to wait before their next job.
    • A function that describes a continuous probability distribution is called a probability density function (commonly abbreviated as PDF). Probabilities are areas under the PDF over an interval; the probability of any single exact value is 0.
  • 45. Central Limit Theorem
    • The central limit theorem states that if you take sufficiently large random samples (with replacement) from a population with mean μ and finite standard deviation σ, the distribution of the sample means will be approximately normal, with mean μ and standard deviation σ/√n, where n is the sample size.
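The central limit theorem can be illustrated with a quick simulation. This sketch assumes a fair die as the population (μ = 3.5, σ = √(35/12)), a sample size of n = 30 and 5,000 samples; these numbers are illustrative choices, not from the slides.

```python
import random
import statistics

rng = random.Random(0)  # seeded for reproducibility
n = 30                  # size of each sample
num_samples = 5000      # how many sample means to collect

# Population: a fair die, mean mu = 3.5 and standard deviation sigma = sqrt(35/12)
mu = 3.5
sigma = (35 / 12) ** 0.5

# Draw many samples (with replacement) and record each sample's mean
means = [statistics.fmean(rng.choices(range(1, 7), k=n)) for _ in range(num_samples)]

print(statistics.fmean(means))  # close to mu
print(statistics.stdev(means))  # close to sigma / sqrt(n)
print(sigma / n ** 0.5)         # about 0.312
```

Individual die rolls are uniformly distributed, yet a histogram of `means` would look bell-shaped and centered at 3.5.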
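  • 47. Normal Distribution
    • Known as the bell curve, after the shape of its density
    • By the Central Limit Theorem, it approximates the distribution of sample means drawn from any population with finite variance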
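Python's standard library includes `statistics.NormalDist` (Python 3.8+) for working with the normal distribution. The sketch uses the standard normal (mean 0, standard deviation 1) to show that a PDF value is a density, while probabilities come from areas, computed here as differences of the CDF.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal ("bell curve")

# A PDF value is a density, not a probability
print(z.pdf(0))  # peak of the bell curve, 1/sqrt(2*pi), about 0.3989

# Probabilities are areas under the curve: differences of the CDF
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)
print(round(within_1_sd, 4), round(within_2_sd, 4))  # about 0.6827 and 0.9545
```

These are the familiar figures of roughly 68% of values lying within one standard deviation of the mean and roughly 95% within two.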
  • 50. Genetic Algorithm
    • A genetic algorithm is a search heuristic inspired by Charles Darwin's theory of natural evolution. It reflects the process of natural selection, in which the fittest individuals are selected for reproduction to produce the offspring of the next generation.
  • 52. Phases of Genetic Algorithm
    • Initial population
    • Fitness function
    • Selection
    • Crossover
    • Mutation
  • 53. Initial Population
    • The process begins with a set of individuals called a population. Each individual is a candidate solution to the problem you want to solve.
    • An individual is characterized by a set of parameters (variables) known as genes. Genes are joined into a string to form a chromosome (solution).
    • In a genetic algorithm, the set of genes of an individual is represented as a string over some alphabet. Usually binary values are used (a string of 1s and 0s); we say that we encode the genes in a chromosome.
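A random initial population of binary chromosomes can be sketched as follows. The helper name `initial_population`, the population size of 4 and the chromosome length of 6 genes are hypothetical choices for illustration.

```python
import random

def initial_population(pop_size, num_genes, rng):
    """Hypothetical helper: pop_size random chromosomes, each a string of num_genes bits."""
    return ["".join(rng.choice("01") for _ in range(num_genes))
            for _ in range(pop_size)]

rng = random.Random(7)  # seeded for reproducibility
population = initial_population(pop_size=4, num_genes=6, rng=rng)
print(population)
```

Each gene is 0 or 1 with equal probability, so the starting population is spread across the search space.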
  • 55. Fitness Function
    • The fitness function determines how fit an individual is (its ability to compete with other individuals) by giving each individual a fitness score.
    • The probability that an individual is selected for reproduction is based on its fitness score.
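The fitness function is problem-specific. As a stand-in, this sketch assumes the classic "OneMax" toy problem, where fitness is simply the number of 1 bits. It also shows one common way (fitness-proportionate) to turn scores into selection probabilities; the population strings are made up.

```python
def fitness(chromosome):
    """Assumed OneMax fitness: the number of 1 bits (more 1s = fitter)."""
    return chromosome.count("1")

population = ["110100", "000001", "111111", "010101"]  # hypothetical chromosomes
scores = {c: fitness(c) for c in population}
print(scores)  # {'110100': 3, '000001': 1, '111111': 6, '010101': 3}

# Fitness-proportionate selection probability: score / total score
total = sum(scores.values())
selection_prob = {c: s / total for c, s in scores.items()}
print(selection_prob)
```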
  • 56. Selection
    • The idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation.
    • Two pairs of individuals (parents) are selected based on their fitness scores. Individuals with high fitness have a higher chance of being selected for reproduction.
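Fitness-based selection is often implemented as "roulette wheel" selection, where each individual's chance of being picked is its share of the total fitness. The slides do not say which selection scheme is used, so this is one assumed option; the OneMax fitness and the chromosomes are hypothetical. `random.choices` with `weights` does the weighted draw.

```python
import random

def fitness(chromosome):
    """Assumed OneMax fitness: count of 1 bits."""
    return chromosome.count("1")

def roulette_select(population, rng, k=2):
    """Roulette-wheel selection: chance of picking c is fitness(c) / total fitness."""
    weights = [fitness(c) for c in population]
    return rng.choices(population, weights=weights, k=k)

population = ["110100", "000001", "111111", "010101"]  # hypothetical chromosomes
rng = random.Random(3)  # seeded for reproducibility
parents = roulette_select(population, rng)
print(parents)
```

Selection is with replacement here, so a very fit individual can be chosen as both parents; an individual with zero fitness is never chosen.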
  • 57. Crossover
    • Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be mated, a crossover point is chosen at random from within the genes.
    • For example, consider the crossover point to be 3 as shown below.
  • 58. Crossover
    • Offspring are created by exchanging the genes of the parents among themselves up to the crossover point.
    • The new offspring A5 and A6 are added to the population.
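Single-point crossover, which exchanges the genes before the crossover point, can be written with string slicing. The parent bit strings below are made up; only the crossover point of 3 comes from the slide. The names `a5` and `a6` mirror the slide's offspring labels.

```python
import random

def single_point_crossover(parent1, parent2, point):
    """Exchange the genes before index `point` between two equal-length chromosomes."""
    child1 = parent2[:point] + parent1[point:]
    child2 = parent1[:point] + parent2[point:]
    return child1, child2

# Crossover point 3 as on the slide; the parent strings are hypothetical
a5, a6 = single_point_crossover("111000", "000111", point=3)
print(a5, a6)  # 000000 111111

# In practice the point is drawn at random, between 1 and len - 1
rng = random.Random(5)
point = rng.randint(1, 5)  # randint includes both endpoints
print(point, single_point_crossover("110011", "001100", point))
```

Keeping the point between 1 and len - 1 ensures each child inherits genes from both parents.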
  • 59. Probability in crossover
    • Choosing which chromosomes undergo crossover
    • Choosing the pair to perform crossover
    • Choosing the part of the chromosome where crossover is performed
  • 60. Mutation
    • In some newly formed offspring, some of their genes can be subjected to mutation with a low random probability.
    • This means that some of the bits in the bit string can be flipped.
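Bit-flip mutation can be sketched by deciding independently, for each bit, whether to flip it with a small probability. The helper name `mutate` and the rate of 0.1 are hypothetical choices.

```python
import random

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = []
    for bit in chromosome:
        if rng.random() < rate:  # random() is in [0.0, 1.0)
            flipped.append("1" if bit == "0" else "0")
        else:
            flipped.append(bit)
    return "".join(flipped)

rng = random.Random(11)  # seeded for reproducibility
print(mutate("110100", rate=0.1, rng=rng))
```

A low rate keeps most offspring unchanged while still occasionally introducing new genes into the population.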
  • 61. Probability in mutation
    • Choosing which chromosome to mutate
    • Choosing whether or not to perform mutation
    • Choosing which part of the chromosome to mutate
  • 63. Probability usage in programming
  • 64. Probability usage in programming
    # generate random floating-point values
    from random import seed
    from random import random
    # seed the random number generator so the output is reproducible
    seed(1)
    # generate 10 random numbers, each in the half-open range [0.0, 1.0)
    for _ in range(10):
        value = random()
        print(value)
  • 65. Probability usage in programming
    # generate random integer values
    from random import seed
    from random import randint
    # seed the random number generator
    seed(1)
    # generate 10 integers, each between 0 and 10 inclusive
    for _ in range(10):
        value = randint(0, 10)
        print(value)
  • 66. Probability usage in programming
    # choose a random element from a list
    from random import seed
    from random import choice
    # seed the random number generator
    seed(1)
    # prepare a sequence of the integers 0 to 19
    sequence = list(range(20))
    print(sequence)
    # make 5 choices from the sequence
    for _ in range(5):
        selection = choice(sequence)
        print(selection)
  • 67. Probability usage in programming
    # randomly shuffle a sequence
    from random import seed
    from random import shuffle
    # seed the random number generator
    seed(1)
    # prepare a sequence of the integers 0 to 19
    sequence = list(range(20))
    print(sequence)
    # randomly shuffle the sequence in place
    shuffle(sequence)
    print(sequence)
  • 68. Slides are available at: www.slideshare.net/ferdinjoe
  • 69. More topics recommended to learn
    • Queueing Theory
    • Statistics
    • Numerical Methods
    • Discrete Mathematics
    • Optimization problems in Operations Research