Fitting a probability distribution to data is essential knowledge for researchers in any discipline. I hope these presentation slides contribute to scientific research.
1. 1
Fitting Data into Probability Distributions
by
Sarkar Nikhil Chandra, M.S.
PhD Student
2. 2
Problem Statement
• Consider a vector of N=40 values that are the results of an
experiment.
• We want to find a probability distribution that can describe (i.e.,
model) the outcome of the sample data from the experiment.
Q. How do we determine which distribution fits the data best?
3. 3
Probability distribution
• Continuous: Normal, t, Chi-square, F, Exponential, Uniform, Beta, Cauchy, Logistic, Lognormal, Gamma, Weibull, Pareto
• Discrete: Binomial, Poisson, Geometric, Hypergeometric, Negative binomial
7. 7
The Gamma Distribution
Probability density function:
$$f(x;\alpha,\lambda)=\begin{cases}\dfrac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}, & x\ge 0\\[4pt] 0, & x<0\end{cases}$$
Fig4: The Gamma distribution PDF
The quantity Γ(α) is called the Gamma function and is given by
$$\Gamma(\alpha)=\int_{0}^{\infty} e^{-x}\,x^{\alpha-1}\,dx$$
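As a quick numerical check of the Gamma density above, the following minimal Python sketch evaluates f(x; α, λ) using the standard-library `math.gamma` and confirms that it integrates to approximately 1. The function names `gamma_pdf` and `integrate` and the parameter values are illustrative assumptions, not taken from the slides.

```python
import math

def gamma_pdf(x, alpha, lam):
    """Gamma density f(x; alpha, lam) with shape alpha and rate lam."""
    if x < 0:
        return 0.0
    if x == 0:
        # (lam*x)**(alpha-1) is 1 for alpha == 1, 0 for alpha > 1,
        # and unbounded for alpha < 1
        if alpha == 1:
            return lam
        return 0.0 if alpha > 1 else math.inf
    return lam * math.exp(-lam * x) * (lam * x) ** (alpha - 1) / math.gamma(alpha)

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Illustrative parameters (assumed): shape 2.5, rate 1.5
area = integrate(lambda x: gamma_pdf(x, 2.5, 1.5), 0.0, 40.0)
print(round(area, 4))
```

With α = 1 the formula reduces to the exponential density λe^(−λx), which is a convenient sanity check on the parameterization.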
8. 8
Fitting Procedure: Overview
• Fit a distribution to the estimated data (i.e., determine the parameters of
a probability distribution that best fit the estimated data)
• Determine the goodness of fit (i.e., how well the estimated data fit a specific
distribution) by using:
o Histogram and theoretical densities plot
o Empirical and theoretical CDFs plot
o Q-Q plot
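The parameter-estimation step above can be sketched in Python using only the standard library. This sketch assumes a synthetic sample of N = 40 values, since the experimental data are not part of these slides. It estimates the Gamma parameters by the method of moments, which is a simple stand-in and not necessarily the estimator used to produce the figures. The helper names `fit_gamma_moments` and `fit_normal` are hypothetical.

```python
import random
import statistics

def fit_gamma_moments(data):
    """Method-of-moments estimates (alpha, lam) for a Gamma(shape=alpha, rate=lam).

    Uses mean = alpha / lam and variance = alpha / lam**2.
    """
    m = statistics.fmean(data)
    v = statistics.variance(data)  # sample variance (n - 1 denominator)
    return m * m / v, m / v

def fit_normal(data):
    """Normal fit from the sample mean and sample standard deviation."""
    return statistics.NormalDist.from_samples(data)

# Synthetic stand-in for the N = 40 experimental values (assumed, illustration only)
rng = random.Random(0)
sample = [rng.gammavariate(4.0, 0.5) for _ in range(40)]  # shape 4, scale 0.5

alpha_hat, lam_hat = fit_gamma_moments(sample)
normal_hat = fit_normal(sample)
print(f"gamma:  alpha={alpha_hat:.2f}, lambda={lam_hat:.2f}")
print(f"normal: mean={normal_hat.mean:.2f}, sd={normal_hat.stdev:.2f}")
```

Each fitted candidate can then be checked against the data with the three graphical tools listed above.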
10. 10
Probability Density Histogram
Fig2(a) Probability density histogram for desired deceleration; (b) probability density histogram for maximum acceleration
11. 11
Probability Density Histogram
Fig3(a) Probability density histogram for linear jam distance; (b) probability density histogram for non-linear jam distance
12. 12
Notes for Probability Density Histogram
• The visual impression given by a density histogram varies with the
bin width.
• It is not clear which bin width to choose. Consider the
cumulative distribution function (CDF) to overcome this issue.
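The bin-width sensitivity noted above can be reproduced with a small Python sketch. It assumes synthetic data and two arbitrary bin counts chosen only for illustration; `density_histogram` is a hypothetical helper.

```python
import random

def density_histogram(data, bins):
    """Return (edges, densities) for equal-width bins; densities integrate to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp the index so the maximum value falls in the last bin
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    n = len(data)
    densities = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, densities

# Synthetic sample of N = 40 values (assumed, illustration only)
rng = random.Random(1)
sample = [rng.gauss(30.0, 5.0) for _ in range(40)]

for bins in (5, 15):
    edges, dens = density_histogram(sample, bins)
    width = edges[1] - edges[0]
    print(bins, "bins, width", round(width, 2), "peak density", round(max(dens), 3))
```

The same 40 values produce differently shaped histograms under the two bin counts, which is why a bin-free summary such as the CDF is easier to compare against a fitted distribution.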
13. 13
Histogram and theoretical densities
Fig4: Goodness-of-fit plot for various distributions fitted to estimated desired speed
14. 14
Histogram and theoretical densities
Fig5: Goodness-of-fit plot for various distributions fitted to estimated maximum acceleration
15. 15
Histogram and theoretical densities
Fig6: Goodness-of-fit plot for various distributions fitted to estimated desired deceleration
16. 16
Histogram and theoretical densities
Fig7: Goodness-of-fit plot for various distributions fitted to estimated safe time headway
17. 17
Histogram and theoretical densities
Fig8: Goodness-of-fit plot for various distributions fitted to estimated linear jam distance
18. 18
Histogram and theoretical densities
Fig9: Goodness-of-fit plot for various distributions fitted to estimated non-linear jam distance
19. 19
Empirical and theoretical CDFs
Fig10: Goodness-of-fit plot for various distributions fitted to estimated desired speed
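The comparison drawn in the empirical-versus-theoretical CDF plots can be computed directly. The minimal Python sketch below assumes a synthetic sample and a fitted normal distribution as the single candidate; `ecdf` is a hypothetical helper, and the gap it reports is only a rough numeric companion to the visual check.

```python
import random
import statistics

def ecdf(data):
    """Return the sorted values and the ECDF height (i / n) at each sorted value."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

# Synthetic sample of N = 40 values (assumed, illustration only)
rng = random.Random(2)
sample = [rng.gauss(30.0, 5.0) for _ in range(40)]

fitted = statistics.NormalDist.from_samples(sample)
xs, heights = ecdf(sample)

# Largest absolute gap between the ECDF value at each data point and the fitted CDF
max_gap = max(abs(h - fitted.cdf(x)) for x, h in zip(xs, heights))
print(round(max_gap, 3))
```

Plotting `xs` against `heights` as a step function, with `fitted.cdf` overlaid, gives the kind of figure shown on these slides, and no bin width has to be chosen.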
20. 20
Empirical and theoretical CDFs
Fig11: Goodness-of-fit plot for various distributions fitted to estimated maximum acceleration
21. 21
Empirical and theoretical CDFs
Fig12: Goodness-of-fit plot for various distributions fitted to estimated desired deceleration
22. 22
Empirical and theoretical CDFs
Fig13: Goodness-of-fit plot for various distributions fitted to estimated safe time headway
23. 23
Empirical and theoretical CDFs
Fig14: Goodness-of-fit plot for various distributions fitted to estimated linear jam distance
24. 24
Empirical and theoretical CDFs
Fig15: Goodness-of-fit plot for various distributions fitted to estimated non-linear jam distance
25. 25
The Quantile-Quantile plot
• The plot of theoretical quantiles versus sample quantiles is generally
known as a Q-Q plot.
• A Q-Q plot provides a visual comparison for assessing the
goodness of fit of a specific probability distribution to the sample
data.
• If the plot produces an approximately straight line, this suggests that
the data follow that specific probability distribution.
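The points of a Q-Q plot can be computed as in the following minimal Python sketch. It assumes a synthetic sample, a fitted normal distribution as the candidate, and the plotting positions (i − 0.5)/n, which are one common convention among several. The helpers `qq_points` and `pearson_r` are hypothetical, and the correlation coefficient is used here only as a rough numeric indication of how straight the plot is.

```python
import random
import statistics

def qq_points(data, dist):
    """Pair theoretical quantiles of `dist` with the sorted sample values."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i - 0.5) / n for i = 1..n
    probs = [(i + 0.5) / n for i in range(n)]
    return [(dist.inv_cdf(p), x) for p, x in zip(probs, xs)]

def pearson_r(pairs):
    """Pearson correlation of the (theoretical, sample) quantile pairs."""
    mt = statistics.fmean(t for t, _ in pairs)
    ms = statistics.fmean(s for _, s in pairs)
    cov = sum((t - mt) * (s - ms) for t, s in pairs)
    var_t = sum((t - mt) ** 2 for t, _ in pairs)
    var_s = sum((s - ms) ** 2 for _, s in pairs)
    return cov / (var_t * var_s) ** 0.5

# Synthetic sample of N = 40 values (assumed, illustration only)
rng = random.Random(3)
sample = [rng.gauss(30.0, 5.0) for _ in range(40)]

fitted = statistics.NormalDist.from_samples(sample)
pairs = qq_points(sample, fitted)
print(round(pearson_r(pairs), 3))
```

A correlation close to 1 corresponds to points lying near a straight line. Systematic curvature at the ends of the plot indicates that the tails of the candidate distribution do not match the data.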
26. 26
Q-Q plot for Normal distribution
Fig16: Goodness-of-fit plot for normal distribution fitted to data
Panels: Linear jam distance, Maximum acceleration, Desired deceleration, Safety time headway, Desired speed, Non-linear jam distance
27. 27
Q-Q plot for Lognormal distribution
Fig17: Goodness-of-fit plot for Lognormal distribution fitted to data
Panels: Linear jam distance, Maximum acceleration, Desired deceleration, Safety time headway, Desired speed, Non-linear jam distance
28. 28
Q-Q plot for GEV distribution
Fig18: Goodness-of-fit plot for GEV distribution fitted to data where desired speed meets the assumption
Panels: Linear jam distance, Maximum acceleration, Desired deceleration, Safety time headway, Desired speed, Non-linear jam distance
29. 29
Q-Q plot for Weibull distribution
Fig19: Goodness-of-fit plot for Weibull distribution fitted to data where safe time headway meets the assumption
Panels: Linear jam distance, Maximum acceleration, Desired deceleration, Safety time headway, Desired speed, Non-linear jam distance
30. 30
Q-Q plot for Gamma distribution
Fig20: Goodness-of-fit plot for Gamma distribution fitted to data
Panels: Linear jam distance, Maximum acceleration, Desired deceleration, Safety time headway, Desired speed, Non-linear jam distance
31. 31
Q-Q plot for Generalized Pareto distribution
Fig21: Goodness-of-fit plot for Generalized Pareto distribution fitted to data where desired speed meets the assumption
Panels: Linear jam distance, Maximum acceleration, Desired deceleration, Safety time headway, Desired speed, Non-linear jam distance