
# Input analysis



Modelling Simulation and Analysis


### Input analysis

1. Input Analysis

   - Subject: Modelling Simulation and Analysis (TE504)
   - Faculty guide: Prof. Dr. L.B. Zala
   - Prepared by: Mansi C. Rupani (17TS808), Bhavik A. Shah (17TS809)
   - Civil Engineering Department, Birla Vishvakarma Mahavidyalaya Engineering College, Vallabh Vidyanagar 388120
   - M.Tech, Transportation Engineering
   - K. Salah
2. Driving simulation models

   Stochastic simulation models use random variables to represent inputs such as inter-arrival times, service times, probabilities of storms, or the proportion of ATM customers making a deposit. We need to know the distribution family and the parameters of each of these random variables. Some methods to do this:

   - Collect real data and feed this data to the simulation model. This is called trace-driven simulation.
   - Collect real data, build an empirical distribution of the data, and sample from this distribution in the simulation model.
   - Collect real data, fit a theoretical distribution to the data, and sample from that distribution in the simulation model.

   We will examine the last two of these methods.
3. Why Is This an Issue?

   [Figure: histogram "Service Times (in Days)"; x-axis: Days (1 to 29), y-axis: Number (0 to 150); N = 546.]

   The graph shows service times (length of stay, LOS) for a hospital in Ontario. How would you model a patient's length of stay?
4. Option 1: Trace

   | Patient | LOS |
   |---|---|
   | 1 | 4 |
   | 2 | 1 |
   | 3 | 7 |
   | 4 | 6 |
   | 5 | 31 |
   | 6 | 1 |
   | 7 | 4 |
   | 8 | 2 |
   | 9 | 6 |
   | 10 | 5 |
   | 11 | 1 |
   | 12 | 1 |
   | 13 | 6 |
   | 14 | 1 |
   | 15 | 2 |

   We could attempt a trace. In this scheme, we would hold this list of numbers in a file. When we generate the first patient, we would assign him or her an LOS of 4; the 2nd, 1; the 3rd, 7; and so on.

   Traces have the advantage of being simple and of reproducing the observed behaviour exactly. However, traces do not allow us to generate values outside of our sample, and the sample is usually of limited size, so we may not have observed the system in all states.
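The trace scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the list is copied from the slide's table, while the function name `trace_los` is hypothetical, and a real model would read the values from a file.

```python
# LOS values (days) in observation order, copied from the trace table on the slide.
TRACE_LOS = [4, 1, 7, 6, 31, 1, 4, 2, 6, 5, 1, 1, 6, 1, 2]


def trace_los(trace):
    """Yield the recorded values one by one, in the order they were observed."""
    for value in trace:
        yield value


patients = trace_los(TRACE_LOS)
first_three = [next(patients) for _ in range(3)]
print(first_three)  # [4, 1, 7]
```

The generator stops once the 15 recorded values are used up, which makes the limitation noted on the slide concrete: a trace can never produce an LOS it has not already seen.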
5. Option 2: Empirical Distribution

   | LOS | Count | Frequency | Cum. Freq. |
   |---|---|---|---|
   | 0 | 0 | 0.000 | 0.000 |
   | 1 | 139 | 0.255 | 0.255 |
   | 2 | 120 | 0.220 | 0.474 |
   | 3 | 69 | 0.126 | 0.601 |
   | 4 | 48 | 0.088 | 0.689 |
   | 5 | 30 | 0.055 | 0.744 |
   | 6 | 25 | 0.046 | 0.789 |
   | 7 | 21 | 0.038 | 0.828 |
   | 8 | 12 | 0.022 | 0.850 |
   | 9 | 11 | 0.020 | 0.870 |
   | 10 | 8 | 0.015 | 0.885 |
   | 11 | 9 | 0.016 | 0.901 |
   | 12 | 10 | 0.018 | 0.919 |
   | 13 | 7 | 0.013 | 0.932 |
   | 14 | 0 | 0.000 | 0.932 |
   | 15 | 1 | 0.002 | 0.934 |
   | 16 | 2 | 0.004 | 0.938 |
   | 17 | 4 | 0.007 | 0.945 |
   | 18 | 2 | 0.004 | 0.949 |
   | 19 | 3 | 0.005 | 0.954 |
   | 20 | 12 | 0.022 | 0.976 |
   | 21 | 5 | 0.009 | 0.985 |
   | 22 | 0 | 0.000 | 0.985 |
   | 23 | 2 | 0.004 | 0.989 |
   | 24 | 0 | 0.000 | 0.989 |
   | 25 | 2 | 0.004 | 0.993 |
   | 26 | 0 | 0.000 | 0.993 |
   | 27 | 1 | 0.002 | 0.995 |
   | 28 | 2 | 0.004 | 0.998 |
   | 29 | 1 | 0.002 | 1.000 |
   | 30 | 0 | 0.000 | 1.000 |
   | Total | 546 | | |

   A second idea might be to use an empirical distribution to model LOS. We use the LOS and the cumulative frequency as inputs to our model: the LOS represents x, and the cumulative frequency represents F(x). For instance, assume we pick a random number (say 0.62) as F(x). This corresponds to an x between 3 and 4 (about 3.2 by linear interpolation between F(3) = 0.601 and F(4) = 0.689).

   Empirical distributions may, however, be based on a small sample and may have irregularities. The empirical distribution cannot generate an x value outside of the range [lowest observed value, highest observed value].
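A minimal sketch of sampling from this empirical distribution by interpolation, assuming LOS is treated as a continuous quantity between table rows. The counts are taken from the slide's table; the function name `empirical_los` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Counts for LOS = 0, 1, ..., 30 days, taken from the slide's table (N = 546).
COUNTS = [0, 139, 120, 69, 48, 30, 25, 21, 12, 11, 8, 9, 10, 7, 0, 1,
          2, 4, 2, 3, 12, 5, 0, 2, 0, 2, 0, 1, 2, 1, 0]
TOTAL = sum(COUNTS)
# CUM[x] is the cumulative frequency F(x) for LOS = x.
CUM = [c / TOTAL for c in accumulate(COUNTS)]


def empirical_los(u):
    """Return x such that F(x) = u, interpolating linearly between table rows."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    i = bisect_left(CUM, u)       # first row with CUM[i] >= u, so CUM[i-1] < u
    lo, hi = CUM[i - 1], CUM[i]
    return (i - 1) + (u - lo) / (hi - lo)


x = empirical_los(0.62)
print(round(x, 2))
```

In a simulation, `u` would come from a uniform random number generator. For u = 0.62 the function returns a value between 3 and 4, a little above 3.2.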
6. Option 3: Fitted Distribution

   [Figure: frequency-comparison plot of the data against a fitted logarithmic series distribution; 31 intervals of width 1 between 1 and 31; x-axis: interval midpoint, y-axis: proportion.]
7. Why Theoretical Distributions?

   - Theoretical distributions "smooth out" the irregularities that may be present in trace and empirical distributions.
   - They give the simulation the ability to generate a wider range of values, for example to test extreme conditions.
   - There may be a compelling reason to use a theoretical distribution.
   - Theoretical distributions are a compact way to represent very large datasets. They are easy to change and very practical.
8. Describing Distributions

   A probability distribution is described by its family and its parameters. The choice of family is usually made by examining the density function, because this tends to have a unique shape for each distribution family.

   Distribution parameters are of three types:

   - location parameter(s) γ
   - scale parameter(s) β
   - shape parameter(s) α

   [Figure: three panels of f(x) against x, showing the effect of varying the location (γ1, γ2, γ3), the scale (β1, β2, β3), and the shape (α1, α2, α3).]
9. Examples: continuous distributions

   Uniform distribution (a is the location parameter; b − a is the scale parameter)

   $$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}
   \qquad
   F(x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$$

   - Range: $[a, b]$
   - Mean: $(a+b)/2$
   - Variance: $(b-a)^2/12$

   [Figure: density f(x), constant at 1/(b − a) between a and b.]

   Uses: a first model for a quantity when only the bounds a and b are known. Essential for the generation of other distributions.
10. Examples: continuous distributions

    Exponential distribution (one scale parameter, β)

    $$f(x) = \begin{cases} \dfrac{1}{\beta} e^{-x/\beta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}
    \qquad
    F(x) = \begin{cases} 1 - e^{-x/\beta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

    - Range: $[0, \infty)$
    - Mean: $\mu = \beta$
    - Variance: $\sigma^2 = \beta^2$

    Uses: inter-arrival times.

    Notes: special case of the Weibull and gamma distributions (α = 1, β = β). If X1, X2, …, Xm are independent expo(β), then X1 + X2 + … + Xm is distributed as an m-Erlang or gamma(m, β).
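Because F(x) has a closed form here, exponential values can be generated by inverting it, in the same spirit as picking a random number for F(x) in the empirical case. The sketch below is illustrative only: the function names, the seed, and the parameter values β = 2 and m = 3 are assumptions, not from the slides.

```python
import math
import random


def sample_expo(beta, rng):
    """Inverse transform: solve u = 1 - exp(-x / beta) for x."""
    u = rng.random()                      # uniform on [0, 1)
    return -beta * math.log(1.0 - u)


def sample_erlang(m, beta, rng):
    """Sum of m independent expo(beta) values, i.e. an m-Erlang variate."""
    return sum(sample_expo(beta, rng) for _ in range(m))


rng = random.Random(42)                   # fixed seed so the run is repeatable
beta, m, n = 2.0, 3, 20000                # illustrative values only
expo_mean = sum(sample_expo(beta, rng) for _ in range(n)) / n
erlang_mean = sum(sample_erlang(m, beta, rng) for _ in range(n)) / n
print(expo_mean, erlang_mean)
```

With these settings the two sample means should land near β = 2 and mβ = 6, matching the mean of expo(β) and of gamma(m, β).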
11. Examples: continuous distributions ...

    Gamma distribution (one scale parameter β, one shape parameter α)

    $$f(x) = \begin{cases} \dfrac{\beta^{-\alpha} x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

    If α is a positive integer:

    $$F(x) = \begin{cases} 1 - e^{-x/\beta} \displaystyle\sum_{j=0}^{\alpha-1} \frac{(x/\beta)^j}{j!} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

    Note: F(x) has no closed-form solution if α is not a positive integer.

    $$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt, \qquad \Gamma(\alpha) = (\alpha-1)! \text{ if } \alpha \text{ is a positive integer}$$

    - Range: $[0, \infty)$
    - Mean: $\alpha\beta$
    - Variance: $\alpha\beta^2$

    [Figure: densities f(x) for α = 1, 2, 3 with β = 1.]

    Uses: task completion time.

    Notes: for a positive integer m, gamma(m, β) is an m-Erlang. If X1, X2, …, Xn are independent gamma(αi, β), then X1 + X2 + … + Xn is distributed as gamma(α1 + α2 + … + αn, β).
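12. Examples: continuous distributions ...

    Weibull distribution (one scale parameter β, one shape parameter α)

    $$f(x) = \begin{cases} \alpha \beta^{-\alpha} x^{\alpha-1} e^{-(x/\beta)^\alpha} & x > 0 \\ 0 & \text{otherwise} \end{cases}
    \qquad
    F(x) = \begin{cases} 1 - e^{-(x/\beta)^\alpha} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

    - Range: $[0, \infty)$
    - Mean: $\dfrac{\beta}{\alpha}\,\Gamma\!\left(\dfrac{1}{\alpha}\right)$
    - Variance: $\dfrac{\beta^2}{\alpha}\left\{ 2\,\Gamma\!\left(\dfrac{2}{\alpha}\right) - \dfrac{1}{\alpha}\left[\Gamma\!\left(\dfrac{1}{\alpha}\right)\right]^2 \right\}$

    [Figure: densities f(x) for α = 1 and α = 2 with β = 1.]

    Uses: task completion time; equipment failure time.

    Notes: the expo(β) and the Weibull(1, β) are the same distribution.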
13. Examples: continuous distributions ...

    Normal distribution (one location parameter μ, one scale parameter σ)

    $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$

    F(x) has no closed form.

    - Range: $(-\infty, \infty)$
    - Mean: $\mu$
    - Variance: $\sigma^2$

    Uses: errors from a set point; quantities that are the sum of a large number of other quantities.
14. Examples: discrete distribution

    Bernoulli distribution

    $$p(x) = \begin{cases} 1-p & \text{if } x = 0 \\ p & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}
    \qquad
    F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1-p & \text{if } 0 \le x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$$

    - Range: $\{0, 1\}$
    - Mean: $p$
    - Variance: $p(1-p)$

    [Figure: mass function p(x), with mass 1 − p at x = 0 and mass p at x = 1.]

    Uses: outcome of an experiment that either succeeds or fails.

    Notes: if X1, X2, …, Xt are independent Bernoulli trials, then X1 + X2 + … + Xt is binomially distributed with parameters (t, p).
15. Examples: discrete distribution

    Binomial distribution

    $$p(x) = \begin{cases} \dbinom{t}{x} p^x (1-p)^{t-x} & \text{if } x \in \{0, 1, \ldots, t\} \\ 0 & \text{otherwise} \end{cases}
    \qquad
    F(x) = \begin{cases} 0 & \text{if } x < 0 \\ \displaystyle\sum_{i=0}^{\lfloor x \rfloor} \binom{t}{i} p^i (1-p)^{t-i} & \text{if } 0 \le x \le t \\ 1 & \text{if } x > t \end{cases}$$

    - Range: $\{0, 1, \ldots, t\}$
    - Mean: $tp$
    - Variance: $tp(1-p)$

    [Figure: mass function p(x) for t = 5, p = 0.5.]

    Uses: number of defectives in a batch of size t.

    Notes: if X1, X2, …, Xm are independent and distributed bin(ti, p), then X1 + X2 + … + Xm is binomially distributed with parameters (t1 + t2 + … + tm, p). The binomial distribution is symmetric only if p = 0.5.
16. Further distributions

    - Poisson
    - Pareto
17. Selecting an Input Distribution - Family

    - The 1st step in any input distribution fit procedure is to hypothesize a family (e.g. exponential).
    - Prior knowledge about the distribution and its use in the simulation can provide useful clues. For example, the normal distribution shouldn't be used for service times, since negative values can be returned.
    - Mostly, we will use heuristics to settle on a distribution family.
18. Summary Statistics

    | Function | Summary statistic (estimate) | Dist. type | Comments |
    |---|---|---|---|
    | Minimum, maximum | $X_{(1)},\ X_{(n)}$ | C, D | Estimates the range |
    | Mean $\mu$ | $\bar{X}(n)$ | C, D | Central tendency |
    | Median $x_{0.5}$ | $\hat{x}_{0.5}(n) = X_{((n+1)/2)}$ if n is odd; $\left[X_{(n/2)} + X_{(n/2+1)}\right]/2$ if n is even | C, D | Central tendency |
    | Variance $\sigma^2$ | $S^2(n)$ | C, D | Measure of variability |
    | Coefficient of variation $cv = \sigma/\mu$ | $\sqrt{S^2(n)}\,/\,\bar{X}(n)$ | C | Measure of variability. cv = 1 implies exponential. If cv > 1, then lognormal is a better model than gamma or Weibull. |
    | Lexis ratio $\tau = \sigma^2/\mu$ | $S^2(n)\,/\,\bar{X}(n)$ | D | Measure of variability. τ = 1 implies Poisson; τ < 1 implies binomial; τ > 1 implies negative binomial. |
    | Skewness $\nu = E[(X-\mu)^3]\,/\,(\sigma^2)^{3/2}$ | $\hat{\nu}(n) = \dfrac{\sum_{i=1}^{n} [X_i - \bar{X}(n)]^3 / n}{[S^2(n)]^{3/2}}$ | C, D | Measure of symmetry. ν = 0 for normal distributions; ν > 0 for right-skewed distributions; ν < 0 for left-skewed distributions. |

    C = continuous, D = discrete.
19. Summary Example

    Consider the following sample of continuous data:

    | Sample | Value | Sample | Value |
    |---|---|---|---|
    | 1 | 0.0065 | 16 | 0.0826 |
    | 2 | 0.0118 | 17 | 0.0964 |
    | 3 | 0.0135 | 18 | 0.1010 |
    | 4 | 0.0241 | 19 | 0.1083 |
    | 5 | 0.0281 | 20 | 0.1258 |
    | 6 | 0.0285 | 21 | 0.1319 |
    | 7 | 0.0347 | 22 | 0.1458 |
    | 8 | 0.0372 | 23 | 0.1884 |
    | 9 | 0.0380 | 24 | 0.1930 |
    | 10 | 0.0428 | 25 | 0.1958 |
    | 11 | 0.0496 | 26 | 0.1985 |
    | 12 | 0.0521 | 27 | 0.2017 |
    | 13 | 0.0544 | 28 | 0.2517 |
    | 14 | 0.0650 | 29 | 0.2616 |
    | 15 | 0.0652 | 30 | 0.2714 |

    - Min: 0.0065
    - Max: 0.2714
    - Mean: 0.1035
    - Median: 0.0739
    - Variance: 0.0066
    - CV: 0.7875
    - Lexis ratio: 0.0642
    - Skewness: 0.7190

    The CV (below 1) suggests the data are NOT exponential. The positive skewness suggests a right-skewed family. Conclusion: gamma? Weibull? Beta?
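The summary statistics for this sample can be recomputed with the Python standard library. This is a sketch under two assumptions: the variance uses the n − 1 divisor, and the skewness uses the estimator from the summary statistics slide. The variable names are illustrative.

```python
import math
import statistics

# The 30 sample values from the slide, in ascending order.
SAMPLE = [0.0065, 0.0118, 0.0135, 0.0241, 0.0281, 0.0285, 0.0347, 0.0372,
          0.0380, 0.0428, 0.0496, 0.0521, 0.0544, 0.0650, 0.0652, 0.0826,
          0.0964, 0.1010, 0.1083, 0.1258, 0.1319, 0.1458, 0.1884, 0.1930,
          0.1958, 0.1985, 0.2017, 0.2517, 0.2616, 0.2714]

n = len(SAMPLE)
mean = statistics.mean(SAMPLE)
median = statistics.median(SAMPLE)
var = statistics.variance(SAMPLE)        # sample variance, divisor n - 1 (assumed)
cv = math.sqrt(var) / mean               # coefficient of variation
lexis = var / mean                       # Lexis ratio (meant for discrete data)
skew = (sum((x - mean) ** 3 for x in SAMPLE) / n) / var ** 1.5

print(f"mean={mean:.4f} median={median:.4f} var={var:.4f}")
print(f"cv={cv:.4f} lexis={lexis:.4f} skew={skew:.4f}")
```

A CV below 1 together with a positive skewness is what points the example toward a right-skewed family such as gamma or Weibull rather than the exponential.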
20. Draw a Histogram

    [Figure: histogram of the sample; x-axis: Value (0.00 to 0.60), y-axis: Count (0 to 12).]

    - Law & Kelton suggest drawing a histogram.
    - Use the plot to "eyeball" the family.
    - Law and Kelton suggest trying several plots with varying bar width.
    - Pick a bar width such that the plot is neither too boxy nor too scraggly.
21. Histogram Example I

    [Figure: histogram with bin width 0.025; x-axis: Value, y-axis: Count.]
22. Histogram Example II

    [Figure: histogram with bin width 0.05; x-axis: Value, y-axis: Count.]
23. Histogram Example III

    [Figure: histogram with bin width 0.1; x-axis: Value, y-axis: Count.]
24. Sturges' Rule

    [Figure: histogram with bin width 0.06; x-axis: Value (0.060 to 0.300), y-axis: Count.]

    Select the number of bins k as

    $$k = \operatorname{Int}\!\left(1 + 3.322 \log_{10} n\right),$$

    where n is the number of samples. We will guess at a gamma distribution.
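As a quick check of the rule, the following sketch evaluates it for the 30-point sample used in the example. The function name `sturges_bins` is hypothetical.

```python
import math


def sturges_bins(n):
    """Number of histogram bins suggested by Sturges' rule for n samples."""
    return int(1 + 3.322 * math.log10(n))


k = sturges_bins(30)
print(k)  # 5
```

Five bins over the range 0 to 0.30 correspond to the bin width of 0.06 used in the slide's histogram.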
25. Parameter Estimation

    - To estimate parameters for our distribution, we use the maximum likelihood estimators (MLEs) for the selected distribution.
    - For a gamma we'll use the following approximation.

    Calculate T:

    $$T = \left[\ln \bar{X}(n) - \frac{1}{n}\sum_{i=1}^{n} \ln X_i\right]^{-1} = \left[\ln(0.1035) - (-2.6534)\right]^{-1} = 2.64$$

    Use Table 6.20 to obtain $\hat{\alpha}$: $\hat{\alpha} = 1.464$.

    Calculate $\hat{\beta}$:

    $$\hat{\beta} = \frac{\bar{X}(n)}{\hat{\alpha}} = \frac{0.1035}{1.464} = 0.069$$
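The two calculations on this slide can be reproduced from the 30 sample values. In the sketch below, `ALPHA_HAT` is simply the value the slide reads from Table 6.20; the table lookup itself is not reimplemented, and the variable names are illustrative.

```python
import math

# The 30 sample values from the summary example.
SAMPLE = [0.0065, 0.0118, 0.0135, 0.0241, 0.0281, 0.0285, 0.0347, 0.0372,
          0.0380, 0.0428, 0.0496, 0.0521, 0.0544, 0.0650, 0.0652, 0.0826,
          0.0964, 0.1010, 0.1083, 0.1258, 0.1319, 0.1458, 0.1884, 0.1930,
          0.1958, 0.1985, 0.2017, 0.2517, 0.2616, 0.2714]

n = len(SAMPLE)
mean = sum(SAMPLE) / n
mean_log = sum(math.log(x) for x in SAMPLE) / n   # (1/n) * sum of ln(X_i)
T = 1.0 / (math.log(mean) - mean_log)

ALPHA_HAT = 1.464            # read from Table 6.20 on the slide, not computed here
beta_hat = mean / ALPHA_HAT

print(f"mean_log={mean_log:.4f} T={T:.3f} beta_hat={beta_hat:.4f}")
```

Evaluated directly, the mean of the logs matches the slide's −2.6534, while T comes out at about 2.60 and the ratio 0.1035/1.464 at about 0.071. These are close to, but not identical with, the 2.64 and 0.069 quoted on the slide, which may reflect rounding or a slightly different intermediate value in the original.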
26. A Note on MLEs

    - Maximum likelihood estimator (MLE): a method for determining the parameters of a hypothesized distribution.
    - We assume that we have collected n IID observations.

    We define the likelihood function:

    $$L(\theta) = p_\theta(X_1)\, p_\theta(X_2) \cdots p_\theta(X_n)$$

    The MLE is simply the value of θ that maximizes L(θ) over all values of θ. In simple terms, we want to pick an MLE that will give us a good estimate of the underlying parameter of interest.
27. MLE for an Exponential Distribution

    $$L(\theta) = p_\theta(X_1)\, p_\theta(X_2) \cdots p_\theta(X_n)$$

    We want to estimate β:

    $$L(\beta) = \left(\frac{1}{\beta} e^{-X_1/\beta}\right)\left(\frac{1}{\beta} e^{-X_2/\beta}\right) \cdots \left(\frac{1}{\beta} e^{-X_n/\beta}\right) = \beta^{-n} \exp\!\left(-\frac{1}{\beta}\sum_{i=1}^{n} X_i\right)$$

    So the problem is to maximize the right-hand side of the above equation. To make things simpler, we'll maximize ln L(β):

    $$\max_\beta \left[-n \ln \beta - \frac{1}{\beta}\sum_{i=1}^{n} X_i\right]$$

    Take the 1st derivative, set it equal to 0, and solve for β:

    $$-\frac{n}{\beta} + \frac{1}{\beta^2}\sum_{i=1}^{n} X_i = 0 \quad\Longrightarrow\quad \hat{\beta} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}(n)$$

    In general, the MLEs are difficult to calculate for most distributions. See Chapter 6 of Law and Kelton.
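The result that the sample mean maximizes the exponential log-likelihood can be checked numerically. The observations below are hypothetical values chosen only for illustration, and `log_likelihood` is an illustrative helper name.

```python
import math


def log_likelihood(beta, xs):
    """ln L(beta) for IID expo(beta) data: -n ln(beta) - sum(x) / beta."""
    return -len(xs) * math.log(beta) - sum(xs) / beta


xs = [0.8, 1.9, 0.3, 2.4, 1.1]           # hypothetical observations
beta_hat = sum(xs) / len(xs)             # the MLE: the sample mean

best = log_likelihood(beta_hat, xs)
# Log-likelihood at values of beta on either side of the sample mean.
others = [log_likelihood(beta_hat * f, xs) for f in (0.5, 0.9, 1.1, 2.0)]
print(beta_hat, best, others)
```

Every value in `others` is smaller than `best`, consistent with the derivative condition solved on the slide.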
28. Determining the "Goodness" of the Model

    - As you might imagine, determining whether our hypothesized model is "good" is fraught with difficulties.
    - Law and Kelton suggest both heuristic and analytical tests to determine "goodness".
    - Heuristic tests:
      - Density/histogram overplots.
      - Frequency comparisons.
      - Distribution function difference plots.
      - Probability-probability plots (P-P plots).
    - Analytical tests:
      - Chi-squared tests.
      - Kolmogorov-Smirnov tests.
29. Frequency Comparisons

    [Figure: frequency comparison of data (h_j) and fitted distribution (r_j) by interval; x-axis: x (0.06 to 0.30), y-axis: h_j, r_j (proportion). The fitted proportion for interval j is F(b_j) − F(b_{j−1}).]

    Cumulative distribution functions:

    | x | Data | Dist |
    |---|---|---|
    | 0 | 0 | 0 |
    | 0.060 | 0.433 | 0.385 |
    | 0.120 | 0.633 | 0.688 |
    | 0.180 | 0.767 | 0.850 |
    | 0.240 | 0.900 | 0.931 |
    | 0.300 | 1.000 | 0.968 |
30. Distribution Function Difference Plot

    Plot the CDF of the hypothesized distribution minus the CDF as observed in the data. If the fit is perfect, this plot should be a straight line at 0. L&K suggest the difference should be less than 0.10 in absolute value at all points.

    [Figure: distribution function difference plot; x-axis: x (0 to 0.5), y-axis: F̂(x) − Fn(x) (−0.2 to 0.2).]
31. P-P Plot

    [Figure: P-P plot; x-axis: Fn(x) (data), y-axis: F(x) (fitted distribution), both from 0 to 1.]

    Note: we are plotting F(x) for the fitted distribution against Fn(x) for the data.
32. Q-Q Plot

    L&K note that Q-Q plots tend to emphasize errors in the tail. Our fitted distribution doesn't look appropriate in the upper tail.

    [Figure: Q-Q plot; x-axis: $F^{-1}((i - 0.5)/n)$, y-axis: the ordered observations $X_{(i)}$, both from 0 to 0.35.]
33. Analytic Test - Chi-Squared Test

    1. Divide the range into k adjacent intervals.
    2. Let $N_j$ be the number of $X_i$'s in the jth interval.
    3. Determine the proportion $p_j$ of $X_i$'s that should fall into the jth interval.

    For continuous data the equal-probability approach is recommended: the $p_j$'s are set to equal values.

    For a continuous random variable:

    $$n p_j = n \int_{a_{j-1}}^{a_j} \hat{f}(x)\,dx = n\left[\hat{F}(a_j) - \hat{F}(a_{j-1})\right]$$

    For a discrete random variable, $p_j$ is the total probability that the random variable takes values in the jth interval:

    $$p_j = \sum_{a_{j-1} \le x \le a_j} \hat{p}(x)$$
34. Analytic Test - Chi-Squared Test

    1. The test statistic is

       $$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

       where $O_i$ is the observed number of samples in the ith interval and $E_i$ is the expected number of samples in the ith interval.
    2. A number of authors suggest that the intervals be grouped such that $E_i \ge 5$ in all cases.
    3. H0: X conforms to the assumed distribution. H1: X does not conform to the assumed distribution. Reject H0 if $\chi_0^2 > \chi^2_{k-1,\,1-\alpha}$.
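35. Example

    k = 5, p_j = 1/k = 0.2

    | Cumulative probability | a(j) | Observed frequency | Expected frequency | (O − E)²/E |
    |---|---|---|---|---|
    | 0.2 | 0.033599 | 6 | 6 | 0 |
    | 0.4 | 0.063509 | 7 | 6 | 0.166667 |
    | 0.6 | 0.101155 | 5 | 6 | 0.166667 |
    | 0.8 | 0.160816 | 4 | 6 | 0.666667 |
    | 0.999999 | infinity | 8 | 6 | 0.666667 |
    | Sum | | | | 1.666667 |

    Critical value $\chi^2$(5 − 1, 0.95) = 9.48. The test statistic is below the critical value, so H0 is accepted (not rejected).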
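The test statistic in this example can be recomputed directly from the observed counts. This sketch takes the critical value 9.48 from the slide rather than computing it; the variable names are illustrative.

```python
# Observed counts in the k = 5 equal-probability intervals, from the slide.
observed = [6, 7, 5, 4, 8]
n = sum(observed)                         # 30 samples
k = len(observed)
expected = [n / k] * k                    # n * p_j with p_j = 1/k

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL = 9.48                           # chi-square critical value quoted on the slide
reject = chi2 > CRITICAL
print(round(chi2, 4), reject)             # 1.6667 False
```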
36. Example 6.15

    - n = 219 (data is in Table 6.7).
    - The fitted distribution is exponential with $\hat{\beta} = 0.399$.
    - Set k = 20: we want 20 intervals, so $p_j = 1/k = 0.05$ for each interval.
    - $n p_j = 219 \times 0.05 = 10.95$ (≥ 5, so we are ok).
    - $a_0 = 0$ and $a_{20} = \infty$ (the exponential is an unbounded distribution).
    - Each $a_j$ must satisfy $\hat{F}(a_j) = 1 - e^{-a_j/0.399} = j/k$, so

      $$a_j = -0.399 \ln(1 - j/20), \qquad j = 1, 2, \ldots, 19$$

      $$a_1 = -0.399 \ln(1 - 1/20) = 0.020, \qquad a_2 = -0.399 \ln(1 - 2/20) = 0.042$$
    - Critical value for α = 0.1: $\chi^2_{20-1,\,1-0.10} = 27.204$ (from Table T.2).
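The equal-probability interval boundaries in this example follow from inverting the fitted exponential CDF. A minimal sketch, using the slide's values for β̂, k, and n; the function and variable names are hypothetical.

```python
import math

BETA_HAT = 0.399      # fitted exponential parameter from the slide
K = 20                # number of equal-probability intervals
N = 219               # sample size


def boundary(j):
    """Solve 1 - exp(-a / BETA_HAT) = j / K for a."""
    return -BETA_HAT * math.log(1 - j / K)


# a_0 = 0, a_1 .. a_19 from the formula, a_20 = infinity.
edges = [0.0] + [boundary(j) for j in range(1, K)] + [math.inf]
expected_per_interval = N / K

print(round(edges[1], 3), round(edges[2], 3), expected_per_interval)
```

The first two boundaries reproduce the 0.020 and 0.042 shown on the slide, and each interval has an expected count of 10.95.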
37. Example 6.15

    [Table: interval boundaries $a_j$ with observed and expected counts for the k intervals.]

    The test statistic is less than the critical value.
38. Example 6.16

    - n = 156 (data is in Table 6.9, demand size).
    - The fitted distribution is geom(0.346).
    - We cannot make exactly equal-probability intervals.
    - We try to make the $p_j$'s roughly equal by grouping the data into intervals.
    - P(X = 0) = 0.346 (the mode of the fitted distribution); the $p_j$'s must be greater than this value.
    - We end up with 3 intervals, given in Table 6.13.
    - Critical value for α = 0.1: $\chi^2_{3-1,\,1-0.10} = 4.605$ (from Table T.2).
39. Example 6.16

    Determine the intervals so that the $p_j$'s are more or less equal.

    The test statistic is less than the critical value.
40. P value

    [Figure: chi-square density with k − 1 d.f., marking alpha, the critical value, and the calculated test statistic.]

    The P value is the cumulative probability to the right of the test statistic. If the P value > alpha, don't reject H0.
41. The Kolmogorov-Smirnov (K-S) Test

    The K-S test compares an empirical distribution function $F_n(x)$ with a hypothesized (fitted) distribution function $\hat{F}(x)$.

    The K-S test is somewhat stronger than the chi-squared test. This test doesn't require that we aggregate any of our samples into a minimum batch size.

    We define a test statistic $D_n$, the largest difference over all x:

    $$D_n = \sup_x \left| F_n(x) - \hat{F}(x) \right|$$
42. K-S Example

    Let's say we had collected 10 samples (0.09, 0.23, 0.24, 0.26, 0.36, 0.38, 0.55, 0.62, 0.65, and 0.76) and have developed an empirical cdf. We want to test the hypothesis that our sample is drawn from a U(0,1) distribution:

    $$f(x) = 1, \quad 0 \le x \le 1 \qquad\qquad F(x) = x, \quad 0 \le x \le 1$$

    where

    $$H_0: F_n(x) = \hat{F}(x) \qquad H_1: F_n(x) \ne \hat{F}(x)$$
43. K-S Example

    [Figure: the fitted cdf F̂(x) and the empirical step function Fn(x) plotted against x.]
44. K-S Example

    [Figure: F̂(x) and Fn(x) plotted against x, with the gaps between them marked.]

    $D_n$ is simply the largest of the gaps between F̂(x) and Fn(x). Remember: we need to find $D_n^-$ and $D_n^+$ at every discontinuity.
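45. K-S Example

    | Sample | x | F̂(x) (fitted distribution) | Fn(x) (empirical) | Dn− | Dn+ |
    |---|---|---|---|---|---|
    | 0 | 0 | 0 | 0 | | |
    | 1 | 0.09 | 0.09 | 0.1 | 0.09 | 0.01 |
    | 2 | 0.23 | 0.23 | 0.2 | 0.13 | 0.03 |
    | 3 | 0.24 | 0.24 | 0.3 | 0.04 | 0.06 |
    | 4 | 0.26 | 0.26 | 0.4 | 0.04 | 0.14 |
    | 5 | 0.36 | 0.36 | 0.5 | 0.04 | 0.14 |
    | 6 | 0.38 | 0.38 | 0.6 | 0.12 | 0.22 |
    | 7 | 0.55 | 0.55 | 0.7 | 0.05 | 0.15 |
    | 8 | 0.62 | 0.62 | 0.8 | 0.08 | 0.18 |
    | 9 | 0.65 | 0.65 | 0.9 | 0.15 | 0.25 |
    | 10 | 0.76 | 0.76 | 1.0 | 0.14 | 0.24 |

    Here Dn− is the gap between F̂(x) and the empirical cdf just below the jump at x, and Dn+ is the gap just above it (both shown as absolute values). Our Dn is the largest value in the Dn− and Dn+ columns. In this case Dn = 0.25.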
46. K-S Test

    - The value of Dn, when appropriately adjusted (see next slide), is found to be less than the critical point (1.224) for α = 0.10.
    - Thus we cannot reject H0.
47. Adjusted Critical Values

    - Law and Kelton present critical values for the K-S test that are non-standard.
    - They use a compressed table based on work from Stevens (1962).

    | Case | Adjusted test statistic |
    |---|---|
    | All parameters in F̂(x) known | $\left(\sqrt{n} + 0.12 + \dfrac{0.11}{\sqrt{n}}\right) D_n$ |
    | Normal distribution | $\left(\sqrt{n} - 0.01 + \dfrac{0.85}{\sqrt{n}}\right) D_n$ |
    | Exponential distribution | $\left(D_n - \dfrac{0.2}{n}\right)\left(\sqrt{n} + 0.26 + \dfrac{0.5}{\sqrt{n}}\right)$ |

    For the example (all parameters known, n = 10, Dn = 0.25):

    $$\left(\sqrt{10} + 0.12 + \frac{0.11}{\sqrt{10}}\right)(0.25) = (3.32)(0.25) = 0.83$$

    For tables of critical values see pp. 365-367 of L&K.
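The K-S example can be reproduced end to end for the 10 samples and the U(0,1) hypothesis. This is a sketch: the critical point 1.224 and the adjustment for the all-parameters-known case are taken from the slides, and the variable names are illustrative.

```python
import math

xs = sorted([0.09, 0.23, 0.24, 0.26, 0.36, 0.38, 0.55, 0.62, 0.65, 0.76])
n = len(xs)


def fitted_cdf(x):
    """Hypothesized U(0,1) cdf: F(x) = x on [0, 1]."""
    return x


# Gap above the fitted cdf just after each jump, and below it just before.
d_plus = max(i / n - fitted_cdf(x) for i, x in enumerate(xs, start=1))
d_minus = max(fitted_cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
dn = max(d_plus, d_minus)

# Adjustment for the case where all parameters of the fitted cdf are known.
adjusted = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * dn

CRITICAL = 1.224                          # critical point for alpha = 0.10, from the slides
print(round(dn, 2), round(adjusted, 2), adjusted > CRITICAL)   # 0.25 0.83 False
```

The code works with signed gaps, whereas the slide's table lists absolute values; the maximum is the same, and since the adjusted statistic is below 1.224, H0 is not rejected.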
48. Thank you for bearing.

    Mansi C. Rupani (17TS808), Bhavik A. Shah (17TS809)