EN505 Engineering Statistics
Fernando Tovia, Ph.D.

1 RANDOM VARIABLES AND PROBABILITY


1.1 Random Variables
Definition 1.1 A random experiment is an experiment such that the outcome cannot be
predicted in advance with absolute precision.
Definition 1.2 The set of all possible outcomes of a random experiment is called the
sample space. The sample space is denoted by Ω. An element of the sample space is
denoted by ω.
Example 1.1 Construct the sample space for each of the following random experiments:
1. flip a coin
2. toss a die
3. flip a coin twice

Definition 1.3 A subset of Ω is called an event. Events are denoted by italicized, capital
letters.
Example 1.2 Consider the random experiment consisting of tossing a die. Describe the
following events.
       1. A = the event that 2 appears
       2. B = the event that an even number appears
       3. C = the event that an odd number appears
       4. D = the event that a number appears
       5. E = the event that no number appears
The particular set that we are interested in depends on the problem being considered.
However, a good thing to do when beginning any probability modeling problem is to
clearly define all the events of interest.
One graphical method of describing events defined on a sample space is the Venn
diagram. The representation of an event using a Venn diagram is given in Figure 1.1. Note
that the rectangle corresponds to the sample space, and the shaded region corresponds to
the event of interest.




                          Figure 1.1 Venn Diagram for Event A



Definition 1.4 Let A and B be two events defined on a sample space Ω. A is a subset of B,
denoted by A ⊂ B, if and only if (iff) ∀ ω ∈ A, ω ∈ B. (Figure 1.2)




                          Figure 1.2 Venn Diagram for A ⊂ B
Definition 1.5 Let A be an event defined on a sample space Ω. ω ∈ Ac iff ω ∉ A. Ac is
called the complement of A. (Figure 1.3)




                            Figure 1.3 Venn Diagram for Ac
Definition 1.6 Let A and B be two events defined on the sample space Ω. ω ∈ A ∪ B iff
ω ∈ A or ω ∈ B (or both). A ∪ B is called the union of A and B (see Figure 1.4).



                               Figure 1.4 Venn Diagram for A ∪ B

Let {A1, A2, …} be a collection of events defined on a sample space. Then

ω ∈ ∪_{j=1}^{∞} Aj iff ∃ some j = 1, 2, … ∋ ω ∈ Aj

∪_{j=1}^{∞} Aj is called the union of {A1, A2, …}.

Definition 1.7 Let A and B be two events defined on the sample space Ω. ω ∈ A ∩ B iff
ω ∈ A and ω ∈ B. A ∩ B is called the intersection of A and B (see Figure 1.5).




                              Figure 1.5 Venn Diagram for A ∩ B
Let {A1, A2, …} be a collection of events defined on a sample space. Then

ω ∈ ∩_{j=1}^{∞} Aj iff ω ∈ Aj ∀ j = 1, 2, …

∩_{j=1}^{∞} Aj is called the intersection of {A1, A2, …}.

Example 1.3 (Example 1.2 continued)
   1. Bc = C
   2. B ∪ C = D
   3. A ∩ B = A
Theorem 1.1 Properties of Complements
Let A be an event defined on a sample space Ω. Then

(a)
(b)



Theorem 1.2 Properties of the Unions
Let A, B, C be events defined on a sample space Ω. Then
(a)
(b)
(c)
(d)
(e)
Example Prove Theorem 1.2 (c)




Theorem 1.3 Properties of the Intersection
Let A, B, and C be events defined on the sample space Ω. Then
(a)
(b)
(c)
(d)
(e)

Example 1.6 Prove Theorem 1.3 (b)




Theorem 1.4 Distributive Laws of Union and Intersection
Let A, B and C be events defined on the sample space Ω. Then
       (a) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
       (b) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Theorem 1.5 DeMorgan's Laws
Let A and B be events defined on the sample space Ω. Then
       (a) (A ∪ B)c = Ac ∩ Bc
       (b) (A ∩ B)c = Ac ∪ Bc
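As a quick numerical sanity check of DeMorgan's laws, here is a minimal sketch using Python's built-in sets; the die-toss sample space and the particular events A and B are illustrative choices, not from the text:

```python
# Check DeMorgan's laws on the die-toss sample space.
omega = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}                # even outcomes
B = {1, 2}                   # an arbitrary second event

def complement(E):
    return omega - E         # set difference gives the complement

# (A U B)^c == A^c n B^c
assert complement(A | B) == complement(A) & complement(B)
# (A n B)^c == A^c U B^c
assert complement(A & B) == complement(A) | complement(B)
print("DeMorgan's laws hold for this example")
```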
Definition 1.8 Let A and B be two events defined in the sample space Ω. A and B are said
to be mutually exclusive or disjoint iff A ∩ B = Ø (Figure 1.6). A collection of events
{A1, A2, … }, defined on a sample space Ω, is said to be disjoint iff every pair of events
in the collection is mutually exclusive.




                 Figure 1.6 Venn Diagram for Mutually Exclusive Events
Definition 1.9 A collection of events {A1, A2, …, An} defined on a sample space Ω, is
said to be a partition (Figure 1.7) of Ω iff
        (a) the collection is disjoint
        (b) ∪_{j=1}^{n} Aj = Ω




                      Figure 1.7 Venn Diagram for a Partition
Example 1.7 (Example 1.2 continued) Using the defined events, identify:
     (a) a set of mutually exclusive events



       (b) a partition of the sample space




Definition 1.10 A collection of events, F, defined on a sample space Ω, is said to be a
field iff
         (a) Ω ∈ F,
         (b) if A ∈ F, then Ac ∈ F,
         (c) if A1, A2, …, An ∈ F, then ∪_{j=1}^{n} Aj ∈ F.


We use fields to represent all the events that we are interested in studying. To construct a
field:
       1. we start with Ω
       2. Ø is inserted by implication (Definition 1.10 (a) and (b))
       3. we then add the events of interest
       4. we then add complements and unions



Example 1.8 Suppose we perform a random experiment which consists of observing the
type of shirt worn by the next person entering a room. Suppose we are interested in the
following events.
        L = the shirt has long sleeves
        S = the shirt has short sleeves
        N = the shirt has no sleeves
Assuming that {L, S, N} is a partition of Ω, construct an appropriate field.



Theorem 1.6 Intersections are in Fields
Let F be a field of events defined on the sample space Ω. Then if A1, A2, …, An ∈ F,
then

∩_{j=1}^{n} Aj ∈ F


Example 1.9 Prove that if A, B ∈ F, then A ∩ B ∈ F.




Any meaningful expression containing events of interest, ∪, ∩, and c can be shown to be
in the field.
Definition 1.11 Consider a set of elements, such as S = {a, b, c}. A permutation of the
elements is an ordered sequence of the elements. The number of permutations of n
different elements is n!, where
                            n! = n × (n−1) × (n−2) × … × 2 × 1
Example 1.10 List all the permutations of the elements of S.


Definition 1.12 The number of permutations of subsets of r elements selected from a set
of n different elements is

Another counting problem of interest is the number of subsets of r elements that can be
selected from a set of n elements. Here the order is not important, and such subsets are
called combinations.
       Definition 1.13 The number of combinations, subsets of size r that can be selected from a
       set of n elements, is denoted as



Example 1.11 The EN505 class has 13 students. If teams of 2 students are to be selected,
how many different teams are possible?
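A quick computational check (a sketch using Python's standard math module, available in Python 3.8+):

```python
import math

# Ordered selections (permutations) of r = 2 from n = 13 students
print(math.perm(13, 2))   # 13!/(13-2)! = 156

# Unordered teams (combinations) of size 2 from 13 students
print(math.comb(13, 2))   # 13!/(2! 11!) = 78
```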




1.2 Probability
       Probability is used to quantify the likelihood, or chance, that an outcome of a random
       experiment will occur.


       Definition 1.14 A random variable is a real-valued function defined on a sample space.
       Random variables are typically denoted by italicized capital letters. Specific values taken
       on by a random variable are typically denoted by italicized, lower-case letters.

       Definition 1.15 A random variable that can take on a countable number of values is said
       to be a discrete random variable.

       Definition 1.16 A random variable that can take on an uncountable number of values is
       said to be a continuous random variable.




Definition 1.17 The set of possible values for a random variable is referred to as the range
of the random variable.
Example 1.12 For each of the following random experiments, define a random variable,
identify the range of the random variable, and classify it as discrete or continuous.
       1. flip a coin




       2. toss a die until a 6 appears




       3. quality inspection of a shipment of manufactured items.




       4. arrival of customer to a bank




Definition 1.18 Let Ω be the sample space for some random experiment. For any event
defined on Ω, Pr(·) is a function which assigns a number to the event. Pr(A) is called the
probability of event A provided the following conditions hold:

(a)

(b)

(c)




Probability is used to quantify the likelihood, or chance, that an event will occur within
the sample space.


Whenever a sample space consists of N equally likely outcomes, the probability of each
outcome is 1/N.

Theorem 1.7 Probability Computational Rules
Let A and B be events defined on a sample space Ω, and let {A1, A2, …, An} be a collection
of events defined on Ω. Then

(a)

(b)

(c)

(d)

(e)


(f)



Corollary 1.1 Union of Three or More Events
Let A, B, C and D be events defined on a sample space Ω. Then,

Pr( A ∪ B ∪ C ) = Pr( A) + Pr( B ) + Pr(C ) − Pr( A ∩ B ) − Pr( A ∩ C ) − Pr( B ∩ C ) + Pr( A ∩ B ∩ C )
and

Pr(A ∪ B ∪ C ∪ D) = Pr(A) + Pr(B) + Pr(C) + Pr(D) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(A ∩ D)
− Pr(B ∩ C) − Pr(B ∩ D) − Pr(C ∩ D) + Pr(A ∩ B ∩ C) + Pr(A ∩ B ∩ D) + Pr(A ∩ C ∩ D)
+ Pr(B ∩ C ∩ D) − Pr(A ∩ B ∩ C ∩ D)

Example 1.11 Let A, B and C be events defined on a sample space Ω ∋
Pr(A) = 0.30
Pr(Bc) = 0.60
Pr(C) = 0.20
Pr(A ∪ B) = 0.50
Pr( B ∩ C ) = 0.05
A and C are mutually exclusive

Compute the following probabilities

      (a) Pr(B)




(b) Pr( B ∪ C ) =


   (c) Pr( A ∩ B )




   (d) Pr( A ∪ C )




   (e) Pr( A ∩ C )



   (f) Pr( B ∩ C c )




   (g) Pr( A ∪ B ∪ C ) =




1.3 Independence
Two events are independent if any one of the following equivalent statements is true



(1) P(A|B) = P(A)

   (2) P(B|A) = P(B)

   (3) P ( A ∩ B ) = P ( A) P ( B )


   Example 2.29 (book work in class)




1.4 Conditional Probability
Definition 1.19 Let A and B be events defined on a sample space Ω ∋ B ≠ Ø. We refer to
Pr(A|B) as the conditional probability of event A given the occurrence of event B, where




Pr(Ac | B) =
probability of not A given B


Note that Pr(A|Bc) ≠ 1 − Pr(A|B)

Example 1.12
A semiconductor manufacturing facility is controlled in a manner such that 2% of
manufactured chips are subjected to high levels of contamination. If a chip is subjected to
high levels of contamination, there is a 12% chance that it will fail testing. What is the
probability that a chip is subjected to high levels of contamination and fails testing?

c=

f=

Pr(High c level) =

Pr(Fail | high c level) =


Pr(F∩C) =
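The answer follows from the multiplication rule Pr(F ∩ C) = Pr(F | C) Pr(C); a minimal sketch in Python:

```python
# Multiplication rule: Pr(F n C) = Pr(F | C) * Pr(C)
p_c = 0.02           # Pr(high contamination level)
p_f_given_c = 0.12   # Pr(fail | high contamination level)

p_f_and_c = p_f_given_c * p_c
print(p_f_and_c)     # 0.0024
```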



Example 1.13
An air quality test is designed to detect the presence of two molecules (molecule 1 and
molecule 2). 17% of all samples contain both molecules, and 48% of all samples contain
molecule 1. If a sample contains molecule 1, what is the probability that it also contains
molecule 2?

M1 = molecule 1
M2 = molecule 2

Pr(M1∩M2) =

Pr(M1) =

Pr(M2|M1) =



Theorem 1.8 Properties of Conditional Probability

Let A and B be non-empty events defined on a sample space Ω. Then

     (a) If A and B are mutually exclusive, then Pr(A|B) = 0

     (b) If A ⊂ B, then Pr(A|B) ≥ Pr(A)



(c) If B ⊂ A, then Pr(A|B) = 1

Theorem 1.9 Law of Total Probability – Part 1

Let A and B be events defined on a sample space Ω ∋ A ≠ Ø, B ≠ Ø, Bc ≠ Ø. Then




Example 1.14
A certain machine's performance can be characterized by the quality of a key component.
94% of machines with a defective key component will fail, whereas only 1% of
machines with a non-defective key component will fail. 4% of machines have a defective
key component. What is the probability that the machine will fail?

F = fail
D = defective

Pr(D) =

Pr(F|D) =

Pr(F|Dc) =

Pr(F) =



Theorem 1.11 Bayes’ Theorem – Part 1

Let A and B be events defined on a sample space Ω ∋ A ≠ Ø, B ≠ Ø, Bc ≠ Ø. Then




Example 1.15 (Example 1.14 continued)
Suppose the machine fails. What is the probability that the key component was
defective?

Pr(D|F) =
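A sketch of the computation for Examples 1.14 and 1.15, combining the law of total probability with Bayes' theorem:

```python
# Law of total probability, then Bayes' theorem
p_d = 0.04            # Pr(defective key component)
p_f_given_d = 0.94    # Pr(fail | defective)
p_f_given_dc = 0.01   # Pr(fail | non-defective)

p_f = p_f_given_d * p_d + p_f_given_dc * (1 - p_d)
print(p_f)                        # 0.0472

p_d_given_f = p_f_given_d * p_d / p_f
print(round(p_d_given_f, 4))      # ~0.7966
```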




Theorem 1.12 Law of Total Probability – Part 2

Let A be a non-empty event defined on a sample space Ω, and let {B1, B2, …, Bn} be a
partition of Ω ∋ Bj ≠ Ø ∀ j = 1, 2, …, n. Then


Pr( A) =



Theorem 1.13 Bayes’ Theorem – Part 2

Let A be a non-empty event defined on a sample space Ω, and let {B1, B2, …, Bn} be a
partition of Ω ∋ Bj ≠ Ø ∀ j = 1, 2, …, n. Then


Pr(Bj | A) = Pr(A | Bj) Pr(Bj) / Pr(A) = Pr(A | Bj) Pr(Bj) / ∑_{i=1}^{n} Pr(A | Bi) Pr(Bi)




2 DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

A discrete random variable is a random variable that can take on at most a countable
number of values.

Definition 2.1 Let X be a discrete random variable having cumulative distribution
function F. Let x1, x2, … denote the possible values of X. Then f(x) is the probability
mass function (pmf) of X if

a) f(x) = P(X = x)
b) f(xj) > 0, j = 1, 2, …
c) f(x) = 0, if x ≠ xj, j = 1, 2, …
d) ∑_{j=1}^{∞} f(xj) = 1



Definition 2.2 The cumulative distribution function of a discrete random variable X is
denoted by F(x) and is given by

F(x) = ∑_{xj ≤ x} f(xj)

and satisfies the following properties:

a) F(x) = P(X ≤ x) = ∑_{xj ≤ x} f(xj)
b) 0 ≤ F(x) ≤ 1
c) if x ≤ y, then F(x) ≤ F(y)


Example 2.1 Suppose X is a discrete random variable having pmf f and cdf F, where
f(1) = 0.1, f(2) = 0.4, f(3) = 0.2, f(4) = 0.3.

1. Construct the cumulative distribution function of X.




2. Compute Pr(X ≤ 2).




3. Compute Pr(X < 4).



4. Compute Pr(X ≥ 2).
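A sketch of these calculations in plain Python (the pmf is stored as a dictionary; no libraries needed):

```python
# pmf of X from Example 2.1
f = {1: 0.1, 2: 0.4, 3: 0.2, 4: 0.3}

def F(x):
    # cumulative distribution function: F(x) = sum of f(xj) for xj <= x
    return sum(p for xj, p in f.items() if xj <= x)

print(F(2))        # Pr(X <= 2) = 0.5
print(F(3))        # Pr(X < 4) = Pr(X <= 3) = 0.7
print(1 - F(1))    # Pr(X >= 2) = 1 - Pr(X <= 1) = 0.9
```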

Definition 2.3 The mean or expected value of X, denoted as µ or E(X), is

                                   µ = E(X) = ∑_x x f(x)

The variance of X, denoted by V(X), is given by

                σ² = V(X) = E(X − µ)² = ∑_x (x − µ)² f(x) = ∑_x x² f(x) − µ²

The standard deviation of X is σ = √σ².

Definition 2.4 Let X be a discrete random variable with a probability mass function f(x).
The expected value of X is denoted by E(X) and given by

                                 E(X) = ∑_{j=1}^{∞} xj f(xj)




2.1 Discrete Distributions

2.1.1   Discrete Uniform Distribution
Suppose a random experiment has a finite set of equally likely outcomes. If X is a random
variable such that there is a one-to-one correspondence between the outcomes and the set
of integers {a, a + 1, …, b}, then X is a discrete uniform random variable having
parameters a and b.
Notation



Range


Probability Mass Function



Parameters




Mean



Variance.


Example 2.2 Let X ~ DU(1, 6).

        1. Compute Pr(X = 2).




        2. Compute Pr(X > 4)




2.1.2   The Bernoulli Random Variable
Consider a random experiment that either “succeeds” or “fails”. If the probability of
success is p, and we let X = 0 if the experiment fails and X = 1 if it succeeds, then X is a
Bernoulli random variable with probability p. Such a random experiment is referred to
as a Bernoulli trial.

Notation


Range


Probability Mass Function



Parameter




Mean




Variance



2.1.3   The Binomial Distribution
The binomial distribution denotes the number of success in n independent Bernoulli
trials with probability p of success on each trial.
Notation



Range



Probability Mass Function




Cumulative Distribution Function



Parameters




Mean



Variance



Comments             If n = 1, then X ~ Bernoulli(p)



Example 2.3 Each sample of air has a 10% chance of containing a particular rare
molecule.

1. Find the probability that in the next 18 samples, exactly 2 contain the rare molecule.




2. Determine the probability that at least four samples contain the rare molecule




3. Determine the probability that at least one but fewer than four samples contain the rare
molecule.
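A sketch of all three parts using scipy.stats (an assumed dependency; the binomial tables or Excel give the same values):

```python
from scipy.stats import binom

n, p = 18, 0.10   # 18 samples, 10% chance each contains the molecule

print(binom.pmf(2, n, p))                        # 1. exactly 2
print(1 - binom.cdf(3, n, p))                    # 2. at least 4
print(binom.cdf(3, n, p) - binom.pmf(0, n, p))   # 3. at least 1 but fewer than 4
```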




2.1.4   The Negative Binomial Random Variable

The negative binomial random variable denotes the number of trials until the kth success
in a sequence of independent Bernoulli trials with probability p of success on each trial.

Notation

Range


Probability Mass Function




Cumulative Distribution Function




Parameters


Mean


Variance


Example 2.4 A high-performance aircraft contains three identical computers. Only one is
used to operate the aircraft; the other two are spares that can be activated in case the
primary system fails. During one hour of operation, the probability of a failure in the
primary computer is 0.0005.

        1. Assuming that each hour represents an independent trial, what is the mean
time until the failure of all three computers?




2. What is the probability that all three computers fail within a 5-hour flight?
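A sketch using scipy.stats.nbinom. Note that scipy parameterizes the negative binomial by the number of failures before the kth success, so if X is the number of trials until the kth success, then Pr(X ≤ x) corresponds to nbinom.cdf(x − k, k, p):

```python
from scipy.stats import nbinom

k, p = 3, 0.0005   # a computer failure plays the role of a Bernoulli "success"

# 1. Mean number of one-hour trials (hours) until the 3rd computer fails
print(k / p)                     # 6000 hours

# 2. Pr(all three computers fail within 5 hours) = Pr(X <= 5)
print(nbinom.cdf(5 - k, k, p))   # ~1.25e-09
```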




Comments: If k = 1, then X ~ geom(p), i.e. X is a geometric random variable having a
probability of success p


2.1.5   The Geometric Distribution
In a series of independent Bernoulli trials, with constant probability p of a success, let the
random variable X denote the number of trials until the first success. Then X has a
geometric distribution.

Notation


Range



Probability Mass Function




Cumulative Distribution Function




Parameters



Mean



Variance




Example 2.3 Consider a sequence of independent Bernoulli trials with a probability of
success p = 0.2 for each trial.

(a) What is the expected number of trials to obtain the first success?




(b) After the eighth success occurs, what is the expected number of trials to obtain
       the ninth success?



2.1.6   The Hypergeometric Random Variable
Consider a population consisting of N members, K of which are denoted as successes.
Consider a random experiment during which n members are selected at random from the
population, and let X denote the number of successes in the random sample. If the
members in the sample are selected from the population without replacement, then X is
a hypergeometric random variable having parameters N, K and n.

Notation




Range




Probability mass function




Parameters




Comments       If the sample is taken from the population with replacement, then
               X ~ bin(n, K/N). Therefore, if n << N, we can use the approximation
               HG(N, K, n) ≈ bin(n, K/N).
Example 2.4 Suppose a shipment of 5000 batteries is received, 150 of them being
defective. A sample of 100 is taken from the shipment without replacement. Let X denote
the number of defective batteries in the sample.

1. What kind of random variable is X, and what is the range of X?




2. Compute Pr(X = 5).




3. Approximate Pr(X = 5) using the binomial approximation to the hypergeometric.
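A sketch of parts 2 and 3 with scipy.stats; scipy's hypergeom takes the population size, the number of successes in the population, and the sample size, in that order:

```python
from scipy.stats import binom, hypergeom

N, K, n = 5000, 150, 100   # population, defectives, sample size

print(hypergeom.pmf(5, N, K, n))   # 2. exact Pr(X = 5)
print(binom.pmf(5, n, K / N))      # 3. bin(100, 0.03) approximation, valid since n << N
```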




2.1.7   The Poisson Random Variable
The Poisson random variable denotes the number of events that occur in an interval of
length t when events occur at a constant average rate λ.
Notation




Probability Mass Function




Cumulative Distribution Function




Parameters




Comments
The Poisson random variable X equals the number of counts in the time interval t. The
counts in disjoint subintervals are independent of one another.

If n is large and p is small, then we can use the approximation bin(n, p) ≈ Poisson(λ = np).


Mean



Variance




It is important to use consistent units in the calculations of probabilities, means, and
variances involving Poisson random variables.

Example 2.5 Contamination is a problem in the manufacture of optical storage disks. The
number of particles of contamination that occur on an optical disk has a Poisson
distribution, and the average number of particles per square centimeter of media surface
is 0.1. The area of a disk under study is 100 square centimeters.

a) Find the probability that 12 particles occur in the area of a disk under study.




b) Find the probability that zero particles occur in the area of the disk under the study.




c) Find the probability that 12 or fewer particles occur in the area of a disk under study.
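A sketch of all three parts with scipy.stats.poisson; the mean count for the disk is λ = 0.1 particles/cm² × 100 cm² = 10:

```python
from scipy.stats import poisson

mu = 0.1 * 100   # mean number of particles on a 100 cm^2 disk

print(poisson.pmf(12, mu))   # a) exactly 12 particles, ~0.095
print(poisson.pmf(0, mu))    # b) zero particles, e**-10 ~ 4.5e-05
print(poisson.cdf(12, mu))   # c) 12 or fewer particles, ~0.792
```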




2.1.8   Poisson Process
Up to this point in the course, we have discussed the assignment of probabilities to events
and random variables, and by manipulating these probabilities we can analyze snapshots
of system behavior at certain points in time, or under certain conditions. Now we are
going to study one of the most commonly recognized continuous-time stochastic
processes, which allows us to study important aspects of system behavior over a time
interval t.

Definition 2.5 Let {N(t), t ≥ 0} be a counting process. Then {N(t), t ≥ 0} is said to be a
Poisson process having rate λ, λ > 0, iff

               a. N(0) = 0, i.e., we start counting from zero.

               b. The number of outcomes occurring in one time interval (or specified
                  region) is independent of the number that occurs in any other disjoint
                  time interval (or region of space); that is, the Poisson process has no
                  memory.

               c. The number of events in any interval (s, s + t) is a Poisson random
                  variable with mean λt.

               d. The probability that more than one outcome will occur at the same
                  time is negligible.

This is denoted by N(t) ~ PP(λ), where λ refers to the average rate at which events occur.
Part (c) of the definition implies that

               1)




               2)

3)


Note that in order for a process to be a Poisson process, the average event occurrence rate
MUST BE CONSTANT over time; otherwise the Poisson process would be an inappropriate
model. Also note that t can be interpreted as the specific "time", "distance", "area", or
"volume" of interest.

Example 2.6 Customers arrive to a facility according to a Poisson process with rate λ =
120 customers per hour. Suppose we begin observing the facility at some point in time.

   a) What is the probability that 8 customers arrive during a 5-minute interval?




   b) On average, how many customers will arrive during a 3.2-minute interval?



c) What is the probability that more than 2 customers arrive during a 1-minute
   interval?




d) What is the probability that 4 customers arrive during the interval that begins 3.3
   minutes after we start observing and ends 6.7 minutes after we start observing?




e) On average, how many customers will arrive during the interval that begins 16
   minutes after we start observing and ends 17.8 minutes after we start observing?




f) What is the probability that 7 customers arrive during the first 12.2 minutes we
   observe, given that 5 customers arrive during the first 8 minutes?




g) If 3 customers arrive during the first 1.2 minutes of our observation period, on
   average, how many customers will arrive during the first 3.7 minutes?




h) If 1 customer arrives during the first 6 seconds of our observations, what is the
   probability that 2 customers arrive during the interval that begins 12 seconds after
   we start observing and ends 30 seconds after we start observing?




i) If 5 customers arrive during the first 30 seconds of our observations, on average,
   how many customers will arrive during the interval that begins 1 minute after we
   start observing and ends 3 minutes after we start observing?




j) If 3 customers arrive during the interval that starts 1 minute after we start
   observing and ends 2.2 minutes after we start observing, on average, how many
   customers will arrive during the first 3.7 minutes?
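The key facts for all of these parts are stationary and independent increments: the count over any interval of length t is Poisson with mean λt regardless of where the interval starts, and counts over disjoint intervals are independent, so conditioning on arrivals in an earlier disjoint interval changes nothing. A sketch of parts (a), (c) and (d), with λ = 120/60 = 2 customers per minute:

```python
from scipy.stats import poisson

lam = 120 / 60   # 2 customers per minute

# a) 8 arrivals in a 5-minute interval: N ~ Poisson(2 * 5)
print(poisson.pmf(8, lam * 5))

# c) more than 2 arrivals in a 1-minute interval
print(1 - poisson.cdf(2, lam * 1))

# d) 4 arrivals in (3.3, 6.7): only the interval length 6.7 - 3.3 = 3.4 matters
print(poisson.pmf(4, lam * 3.4))
```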



Example 2.7 (Binomial approximation)
In a manufacturing process where glass products are produced, defects or bubbles occur,
occasionally rendering the piece undesirable for marketing. It is known that, on average,
1 in every 1000 of these items produced has one or more bubbles. What is the probability
that a random sample of 8000 will yield fewer than 7 items possessing bubbles?
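Here λ = np = 8000 × 0.001 = 8; a sketch comparing the Poisson approximation with the exact binomial value:

```python
from scipy.stats import binom, poisson

n, p = 8000, 1 / 1000
lam = n * p                  # 8

print(poisson.cdf(6, lam))   # Pr(X < 7) by the Poisson approximation, ~0.313
print(binom.cdf(6, n, p))    # exact binomial value, for comparison
```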




3 CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

As stated earlier, a continuous random variable is a random variable that can take on an
uncountable number of values.
Definition 3.1 The probability density function of a continuous random variable X is a
nonnegative function f(x) defined ∀ real x ∋ for any set A of real numbers




Theorem 3.1 Integral of a Density Function
The function f is a density function iff




All probability computations for a continuous random variable can be answered using the
density function.
Theorem 3.2 Probability Computational Rules for Continuous Random Variables
Let X be a continuous random variable having cumulative distribution function F and
probability density function f. Then

(a)



(b)




(c)




(d)




(e)




The mean or expected value of X, denoted as µ or E(X), is




The variance of X, denoted by Var (X) and given by




Example 3.1 Consider a continuous random variable X having the following density
function, where c is a constant:

        f(x) = c(1 − x²)   0 ≤ x ≤ 1
               0            otherwise
          1. What is the value of c?




       2. Construct the cumulative distribution function of X.




3. Compute Pr(0.2< X ≤ 0.8) =




        4. Compute Pr(X >0.5) =




Part (d) of Theorem 3.2 states that the probability density function is the derivative of
the cumulative distribution function. Although this is true, it does not provide adequate
intuition as to the interpretation of the density function. For a discrete random variable,
the probability mass function actually assigns probabilities to the possible values of the
random variable. Theorem 3.2 (b) states that the probability of any specific value for a
continuous random variable is 0. The probability density function is not the probability of
a specific value. It is, however, the relative likelihood (as compared to other possible
values) that the random variable will be near a certain value.
Continuous random variables are typically specified in terms of the form of their
probability density functions. In addition, some continuous random variables have been
widely used in probability modeling. We will consider some of these more commonly
used random variables, including:
        1. the uniform random variable,
        2. the exponential random variable,
        3. the gamma random variable,
        4. the Weibull random variable,
        5. the normal random variable,
        6. the lognormal random variable,
        7. the beta random variable.

3.1 The Uniform Continuous Random Variable
Notation

Range

Probability Density Function




Cumulative Distribution Function




Parameters




Mean



Variance



Comments        As its name implies, the uniform random variable is used to represent
quantities that occur randomly over some interval of the real line.
An observation of a U(0,1) random variable is referred to as a random number.
Example 3.2 Verify that the equation for the cumulative distribution of the uniform
random variable is correct.




Example 3.3 The magnitude (measured in N) of a load applied to a steel beam is
believed to be a U(2000, 5000) random variable. What is the probability that the load
exceeds 4200 N?




3.2 The Exponential Distribution
The random variable X that equals the distance (time) between successive events of a
Poisson process with rate λ (events per time unit, e.g., arrivals per hour, failures per
day) has an exponential distribution with parameter λ.

Notation

Range

Probability Density Function




Cumulative Distribution Function




Parameters




Mean




Variance




Comments λ is called the rate of the exponential distribution.




Example 3.4 In a large computer network, user log-ons to the system can be modeled as
a Poisson process with a mean of 25 log-ons per hour. What is the probability that there
are no log-ons in an interval of 6 minutes?




What is the probability that the time until the next log-on is between 2 and 3 minutes?
Upon converting all units to hours,




Determine the interval of time such that the probability that no log-on occurs in the
interval is 0.90. The question asks for the length of time x such that Pr(X > x) = 0.90.




What is the mean time until the next log-on?




What is the standard deviation of the time until the next log-on?
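A sketch of the whole example with scipy.stats.expon, working in hours (scipy's scale parameter is 1/λ):

```python
from math import log
from scipy.stats import expon

lam = 25                   # log-ons per hour
X = expon(scale=1 / lam)   # time until the next log-on, in hours

print(X.sf(6 / 60))                    # Pr(no log-on in 6 min) = Pr(X > 0.1), ~0.082
print(X.cdf(3 / 60) - X.cdf(2 / 60))   # Pr(2 min < X < 3 min), ~0.148
print(-log(0.90) / lam * 60)           # x (minutes) with Pr(X > x) = 0.90, ~0.25
print(X.mean() * 60, X.std() * 60)     # mean and std. dev., both 2.4 minutes
```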




Theorem 3.3 The Memoryless Property of the Exponential Distribution
Let X be a continuous random variable. Then X is an exponential random variable iff




Theorem 3.4 The Conditional Form of the Memoryless Property
Let X be a continuous random variable. Then X is an exponential random variable iff




Furthermore, no other continuous random variable possesses this property. There are
several implications of the memoryless property of the exponential random variable.
            First, if the exponential random variable is used to model the lifetime of a
               device, then at every point in time until it fails, the device is as good as
               new (from a probabilistic standpoint).
              Second, if the exponential random variable is used to model an arrival time,
               then at every point in time until the arrival occurs, it is as if we just began
               "waiting" for the arrival.
Example 3.5 Suppose that the life length of a component is an exponential random
variable with rate 0.0001. Note that time units are hours. Determine the following.
        a) What is the probability that the component lasts more than 2000 hours?




       b) Given that the component lasts at least 1000 hours, what is the probability that
           it lasts more than 2000 hours?




Theorem 3.5 Expectation under the Memoryless Property
      Let X be an exponential random variable. Then




Example 3.6 (Example 3.5 continued)
     a) Given that the component lasts at least 1000 hours, what is the expected value
           of its life length?




        b) Given that the component has survived 1000 hours, on average, how much
           longer will it survive?




3.3 The Normal Distribution
Notation

Range


Probability Density Function




Cumulative Distribution Function no closed form expression
Parameters




Mean



Variance



Comments



Standard Normal Random Variable
If µ = 0 and σ = 1, then X is referred to as the standard normal random variable. The
standard normal random variable is often denoted by Z.


The cumulative distribution of the standard normal random variable is denoted as

                                     Φ ( z ) = Pr( Z ≤ z )

Appendix A Table I provides cumulative probabilities for a standard normal random
variable. For example, assume that Z is a standard normal random variable. Appendix A
Table I provides probabilities of the form Pr(Z ≤ 1.53): find the row labeled 1.5 and the
column labeled 0.03; then Pr(Z ≤ 1.53) = 0.93699.




The same value can be obtained in Excel: click the function icon (fx), choose the
Statistical category, select NORMSDIST(z), enter 1.53, and Excel returns the result in
the cell: =NORMSDIST(1.53) = 0.936992.
The function



is denoted a probability from Appendix A Table I. It is the cumulative distribution
function of a standard normal random variable (see Figure 4-13, page 124, of the
Montgomery book).

Example 3.7 (Example 4-12, Montgomery)

Some useful results concerning a normal distribution are summarized in Fig. 4-14 of the
textbook. For any normal random variable,




1)




2)




3)




4)




If X ~ N(µ, σ2), then (X − µ)/σ ~ N(0, 1), which is known as the z-transformation. That is,
Z is a standard normal random variable.
Suppose X is a normal random variable with mean µ and standard deviation σ. Then,




Example 3.8 One key characteristic of a certain type of drive shaft is its diameter; the
diameter is a normally distributed random variable having µ = 5 cm and σ = 0.08 cm.
a)        What is the probability that the diameter of a given drive shaft is between 4.9
          and 5.05 cm?




b)        What diameter is exceeded by 90% of drive shafts?




c) Provide tolerances, symmetric about the mean, that capture 99% of drive shafts.
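A sketch of all three parts with scipy.stats.norm:

```python
from scipy.stats import norm

mu, sigma = 5.0, 0.08   # drive shaft diameter, cm
X = norm(mu, sigma)

# a) Pr(4.9 < X < 5.05)
print(X.cdf(5.05) - X.cdf(4.9))          # ~0.628

# b) diameter exceeded by 90% of shafts = the 10th percentile
print(X.ppf(0.10))                       # ~4.897 cm

# c) symmetric tolerances capturing 99%: mu +/- z_{0.995} * sigma
z = norm.ppf(0.995)                      # ~2.576
print(mu - z * sigma, mu + z * sigma)    # ~(4.794, 5.206) cm
```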




Example 3.9
The diameter of a shaft in an optical storage drive is normally distributed with mean
0.2508 inch and standard deviation 0.0005 inch. The specifications on the shaft are
±0.0015 inch. What proportion of shafts conforms to specifications?




3.3.1   Normal Approximation to the Binomial and Poisson Distributions
Binomial Approximation

If X is a binomial random variable with parameters n and p, then




is approximately a standard normal random variable. To approximate a binomial
probability with a normal distribution, a continuity correction is applied.



The approximation is good for np > 5 and n(1-p) > 5.

Poisson Approximation
If X is a Poisson random variable with E(X) = λ and V(X) = λ, then




is approximately a standard normal random variable. The approximation is good for λ > 5.


Example 3.10
The manufacturing of semiconductor chips produces 2% defective chips. Assume that
chips are independent and that a lot contains 1000 chips.

a) Approximate the probability that more than 25 chips are defective.




b) Approximate the probability that between 20 and 30 chips are defective
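A sketch comparing the normal approximation (with the continuity correction) against the exact binomial answers; part (b) is read here as Pr(20 ≤ X ≤ 30):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 1000, 0.02
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 20 and ~4.43

# a) Pr(X > 25) = Pr(X >= 26) ~ Pr(Z > (25.5 - mu)/sigma)
print(1 - norm.cdf((25.5 - mu) / sigma))   # approximation
print(1 - binom.cdf(25, n, p))             # exact, for comparison

# b) Pr(20 <= X <= 30) ~ Pr((19.5 - mu)/sigma < Z < (30.5 - mu)/sigma)
print(norm.cdf((30.5 - mu) / sigma) - norm.cdf((19.5 - mu) / sigma))
print(binom.cdf(30, n, p) - binom.cdf(19, n, p))   # exact
```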




3.4 Lognormal Distribution
Variables in a system sometimes follow an exponential relationship X = exp(W), where
the exponent W is a random variable. If W has a normal distribution, then the distribution
of X is called a lognormal distribution.
Notation


Range




Probability density function




Cumulative Distribution Function          no closed form expression
Parameters




Comments              If Y ~ N(µ, σ2) and X = eY, then X ~ LN(µ, σ2).
                              The lognormal random variable is often used to represent
                              elapsed times, especially equipment repair times, and
                              material properties.


Mean



Variance


Example 3.11 A wood floor system can be evaluated in one way by measuring its
modulus of elasticity (MOE), measured in 10^6 psi. One particular type of system is such
that its MOE is a lognormal random variable having µ = 0.375 and σ = 0.25.

       1. What is the probability that a system’s MOE is less than 2?




2. Find the value of MOE that is exceeded by only 1% of the systems.
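A sketch with scipy.stats.lognorm; scipy's shape parameter s is σ and its scale is e^µ:

```python
from math import exp
from scipy.stats import lognorm

mu, sigma = 0.375, 0.25
X = lognorm(s=sigma, scale=exp(mu))   # MOE, in 10^6 psi

print(X.cdf(2.0))     # 1. Pr(MOE < 2), ~0.898
print(X.ppf(0.99))    # 2. MOE exceeded by only 1% of systems, ~2.60
```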




3.5 The Weibull Distribution
The Weibull distribution is often used to model the time until failure of many different
physical systems. It is used in reliability models with time-dependent failure rates, and
can represent both increasing and decreasing failure rates.
Notation




Range




Probability Density Function




Cumulative Distribution Function



Parameters



Mean



Variance




Comments              If β = 1, then X ~ expon(1/η)
                              The Weibull random variable is most often used to
                              represent elapsed time, especially time to failure of a unit
                              of equipment.


Example 3.12 The time to failure of a power supply is a Weibull random variable having
β = 2.0 and η = 1000.0 hours. The manufacturer sells a warranty such that only 5% of the
power supplies fail before the warranty expires. What is the time period of the warranty?
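Setting the Weibull cdf equal to 0.05 and solving gives t = η(−ln 0.95)^(1/β); a sketch with scipy.stats.weibull_min:

```python
from scipy.stats import weibull_min

beta, eta = 2.0, 1000.0            # shape and scale (hours)
X = weibull_min(beta, scale=eta)   # time to failure

print(X.ppf(0.05))   # warranty period: 5th percentile, ~226.5 hours
```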




4 JOINT PROBABILITY DISTRIBUTIONS

Up to this point we have considered issues related to a single random variable. Now we
are going to consider situations in which we have two or more random variables that we
are interested in studying.

4.1 Two or more discrete random variables
Definition 4.1 The function f(x, y) is a joint probability distribution or probability
mass function of discrete random variables X and Y if
1.


2.



3.



Example 4.1 Let X denote the number of times a certain numerical control machine will
malfunction: 1, 2 or 3 times on a given day. Let Y denote the number of times a
technician is called on an emergency call. Their joint probability distribution is given as

        f(x, y)      x = 1    x = 2    x = 3
        y = 1         0.05     0.05     0.10
        y = 2         0.05     0.10     0.35
        y = 3         0.00     0.20     0.10

a) Find P(X<3, Y = 1)




b) Find the probability that the technician is called at least 2 times and the machine fails
no more than 1 time.




c) Find P(X>Y)
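A sketch of all three parts with the joint pmf stored as a numpy array (rows indexed by y = 1, 2, 3 and columns by x = 1, 2, 3):

```python
import numpy as np

# f[y-1, x-1] = f(x, y)
f = np.array([[0.05, 0.05, 0.10],
              [0.05, 0.10, 0.35],
              [0.00, 0.20, 0.10]])

# a) Pr(X < 3, Y = 1): row y = 1, columns x = 1, 2
print(f[0, :2].sum())   # 0.10

# b) Pr(Y >= 2, X <= 1): rows y = 2, 3, column x = 1
print(f[1:, 0].sum())   # 0.05

# c) Pr(X > Y): sum the cells with x > y
print(sum(f[y, x] for y in range(3) for x in range(3) if x > y))   # 0.50
```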




When studying a joint probability distribution, we are also interested in the probability
distribution of each variable individually, which is referred to as the marginal probability
distribution.

Theorem 4.1 Let X and Y be discrete random variables having joint probability mass
function f(x, y). Let x1, x2, … denote the possible values of X, and let y1, y2, … denote the
possible values of Y. Let fX(x) denote the marginal probability mass function of X, and
let fY(y) denote the (marginal) probability mass function of Y. Then,




Example 4.2 Let X and Y be discrete random variables such that
f(1, 1) = 1/9 f(1, 2) = 1/6 f(1, 3) = 1/8
f(2, 1) = 1/18 f(2, 2) = 1/9 f(2, 3) = 1/9
f(3, 1) = 1/9 f(3, 2) = 1/9 f(3, 3) = 1/6

Find the marginal probability mass function of X and Y




Definition 4.2 The function f(x, y) is a joint probability density function of continuous
random variables X and Y if
1.


2.



3.




Example 4.3 A candy company distributes boxes of chocolates with a mixture of creams,
toffees, and nuts coated in both light and dark chocolates. For a randomly selected box,
let X and Y, respectively, be the proportions of the light and dark chocolates that are
creams and suppose that joint density function is

             2                                  
              (2 x + 3 y ), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f ( x, y ) =  5                                 
             0,
                                    elsewhere  

a) verify condition 2




b) Find P[(X, Y) ∈ A], where A = {(x, y) | 0 ≤ x ≤ 1/2, 1/4 ≤ y ≤ 1/2}




Theorem 4.2 Marginal Probability Density Function
Let X and Y be continuous random variables having joint probability density function
f(x, y). Let fX(x) denote the marginal probability density function of X, and let fY(y) denote
the (marginal) probability density function of Y. Then,




Example 4.4 Let X and Y be continuous random variables such that

       f(x, y) = 0.75 e^(−0.3y)

Find the marginal probability density function of X and Y.




      Theorem 4.3 The Law of the Unconscious Statistician
Let X and Y be discrete (continuous) random variables having joint probability mass
(density) function f(x,y). Let x1, x2, … denote the possible values of X, and let y1, y2, …
denote the possible values of Y. Let g(X,Y) be a real-valued function. Then




Example 4.5 Suppose X and Y are discrete random variables having joint probability
mass function f(x,y). Let x1, x2, … denote the possible values of X, and let y1, y2, … denote
the possible values of Y. What is E(X+Y)?




Theorem 4.4 Expectation of a Sum of Random Variables
Let X1, X2, …, Xn be random variables, and let a1, a2, …an be constants. Then




Example 4.6 What is the E(3X – 2Y + 4)?




Theorem 4.5 Independent Discrete Random Variables Let X and Y be random
variables having joint probability mass function f(x, y). Let fx(x) denote the marginal
probability mass function of X, and let fy(y) denote the marginal probability mass
function of Y. Then X and Y are said to be independent iff




Theorem 4.6 Independent Continuous Random Variables Let X and Y be random
variables having joint probability density function f(x, y). Let fx(x) denote the marginal
probability density function of X, and let fy(y) denote the marginal probability density
function of Y. Then X and Y are said to be independent iff




Example 4.6 Consider Example 4.2. Are X and Y independent?




Example 4.7 Consider Example 4.4. Are X and Y independent?




Definition 4.3 Let X and Y be random variables. The covariance of X and Y is denoted as
Cov(X,Y) and given by




A positive covariance indicates that X tends to increase (decrease) as Y increases
(decreases). A negative covariance indicates that X tends to decrease (increase) as Y
increases (decreases).


Example 4.8 Example 4.2 continued. Find the covariance of X and Y.




Theorem 4.7 Covariance of Independent Random Variables
Let X and Y be random variables. If X and Y are independent, then

Cov(X, Y) = 0.


Theorem 4.8 Variance of the Sum of Random Variables
Let X1, X2, … , XN be random variables. Then




Theorem 4.9 Variance of the Sum of Independent Random Variables
Let X1, X2, … , XN be independent random variables. Then




Definition 4.4 Let X and Y be two random variables. The correlation between X and Y is
denoted by ρxy and given by




Note that correlation and covariance have the same interpretation regarding the
relationship between the two variables. However, correlation does not have units and is
restricted to the range [−1, 1]. Therefore, the magnitude of the correlation provides some
idea of the strength of the relationship between the two random variables.




5 RANDOM SAMPLES, STATISTICS AND THE CENTRAL LIMIT THEOREM

Definition 5.1 Independent random variables X1, X2, …, Xn are called a random sample.
A randomly selected sample means that if a sample of n objects is selected, each subset
of size n is equally likely to be selected. If the number of objects in the population is
much larger than n, the random variables X1, X2, …, Xn that represent the observations
from the sample can be shown to be approximately independent random variables with
the same distribution.

Definition 5.2 A statistic is a function of the random variables in a random sample.
Given the data, we calculate statistics all the time, such as the sample mean X̄ and the
sample standard deviation S. Each statistic has a distribution, and it is the distribution that
determines how well it estimates a quantity such as µ.

We begin our discussions by focusing on a single random variable, X. To perform any
meaningful statistical analysis regarding X, we must have data.

Let X be some random variable of interest. A random sample on X consists of n
observations on X: x1, x2, … , xn. We assume that these observations are independent and
identically distributed. The value of n is referred to as the sample size.


Definition 5.3 Descriptive statistics refers to the process of collecting data on a random
variable and computing meaningful quantities (statistics) that characterize the underlying
probability distribution of the random variable.

There are three points of interest regarding this definition.
   • Performing any type of statistical analysis requires that we collect data on one or
       more random variables.
   • A statistic is nothing more than a numerical quantity computed using collected
       data.
   • If we knew the probability distribution which governed the random variable of
       interest, collecting data would be unnecessary.

Types of Descriptive Statistics

       1. measures of central tendency
           • sample mean (sample average)
           • sample median
           • sample mode (discrete random variables only)




2. measures of variability
           • sample range
           • sample variance
           • sample standard deviation
           • sample quartiles

     Microsoft Excel has a Descriptive Statistics tool within its Data Analysis
     ToolPak.
Computing the Sample Mean




       • Most of your calculators have a built-in method for entering data and computing
          the sample mean.
       • Note the sample mean is a point estimate of the true mean of X. In other
       words,




Computing the Sample Median

       To compute the sample median, we first rank the data in ascending order and re-
       number it: x(1), x(2), …. , x(n).

       The sample median corresponds to the value that has 50% of the data above it and
       50% of the data below it.




Computing the Sample Mode

       The sample mode is the most frequently occurring value in the sample. It is
       typically only of interest in sample data from a discrete random variable, because
       sample data on a continuous random variable often does not have any repeated
       values.




Compute the Sample Range




Computing the Sample Variance




        • Why do we divide by n − 1? We divide by n − 1 because we have n − 1 degrees
           of freedom. This refers to the fact that if we know the sample mean and n − 1
           of the data values, we can compute the remaining data point.
        • Note that the sample variance is a point estimate of the true variance. In
           other words,




Computing the Sample Standard Deviation




• Note that the sample standard deviation is a point estimate of the true standard
deviation.

Theorem 5.1 If X1, X2, …, Xn is a random sample of size n taken from a population with
mean µ and variance σ2, and if X̄ is the sample mean, then the limiting form of the
distribution of




as n→∞, is the standard normal distribution.

       5.1    Populations and Random Samples
       The field of statistical inference consists of those methods used to draw
conclusions about a population. These methods utilize the information contained in a
random sample of observations from the population.


       Statistical inference may be divided into two major areas:

       • parameter estimation
       • hypothesis testing

Both of these areas require a random sample of observations from one or more
populations, therefore, we will begin our discussions by addressing the concepts of
random sampling.

   Definition 5.4 A population consists of the totality of the observations with which
we are concerned.

   •   We almost always use a random variable/probability distribution to model the
       behavior of a population.

   Definition 5.5 The number of observations in the population is called the size of the
population.

   •   Populations may be finite or infinite. However, we can typically assume the
       population is infinite.
   •   In some cases, a population is conceptual. For example, the population of items
       to be manufactured is a conceptual population.

   Definition 5.6 A sample is a subset of observations selected from a population.

   •   We model these observations using random variables.
   •   If our inferences are to be statistically valid, then the sample must be
       representative of the entire population. In other words, we want to ensure that we
       take a random sample.




Definition 5.7 The random variables X1, X2, … , Xn are a random sample of size n
if X1, X2, … , Xn are independent and identically distributed.

   •   After the data has been collected, the numerical values of the observations are
       denoted as x1, x2, … , xn.
   •   The next step in statistical inference is to use the collected data to compute one or
       more statistics of interest.


   5.2 Point Estimates

   Definition 5.8 A statistic, Θ̂, is any function of the observations in a random sample.

   •   In parameter estimation, statistics are used to estimate quantities of interest.
   •   The measures of central tendency and variability we considered in “Descriptive
       Statistics” are all statistics.

   Definition 5.9 A point estimate of some population parameter θ is a single
   numerical value θ̂ of a statistic Θ̂.

Estimation problems occur frequently in engineering. The quantities that we will focus
on are:

   •   the mean µ of a population
   •   the standard deviation σ of a population
   •   the proportion p of items in a population that belong to a class of interest – p is the
       probability of success for a Bernoulli trial
The point estimates that we use are:

   •
   •
   •




5.3 Sampling Distributions
    A statistic is a function of the observations in the random sample. These observations
are random variables, therefore, the statistic itself is a random variable. All random
variables have probability distributions.

       Definition 5.10 The probability distribution of a statistic is called a sampling
distribution.

       •   The sampling distribution of a statistic depends on the probability distribution
           which governs the entire population, the size of the random sample, and the
           method of sample selection.

        Theorem 5.3 The Sampling Distribution of the Mean
        If X1, X2, …, Xn are IID N(µ, σ2) random variables, then the sample mean is a
normal random variable having mean          and variance          .
Thus, if we are sampling from a normal population then the sampling distribution of the
mean is normal. But what if we are not sampling from a normal population?

       Theorem 5.4 The Central Limit Theorem

       If X1, X2, … , Xn is a random sample of size n taken from a population with mean
µ and variance σ2, then as n → ∞,




is a standard normal random variable.

       •   The quality of the normal approximation depends on the true probability
           distribution governing the population and the sample size.
       •   For most cases of practical interest, n ≥ 30 ensures a relatively good
           approximation.
       •   If n < 30, then the underlying probability distribution must not be severely
           non-normal.

       Example 5.1 A plastics company produces cylindrical tubes for various industrial
applications. One of their production processes is such that the diameter of a tube is
normally distributed with a mean of 1 inch and a standard deviation of 0.02 inch.


       (a)    What is the probability that a single tube has a diameter of more than
       1.015 inches?



X = diameter of a tube (measured in inches) ~ N(                  )




       (b)    What is the probability that the average diameter of five tubes is more than
       1.015 inches?

       n=             X = average diameter ~ N(               )




       (c)    What is the probability that the average diameter of 25 tubes is more than
       1.015 inches?


       n=             X = average diameter ~ N(                     )
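A sketch of all three parts; by Theorem 5.3 the average of n diameters is N(1, (0.02)²/n), so only the standard error changes from part to part:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma = 1.0, 0.02   # inches

for n in (1, 5, 25):
    se = sigma / sqrt(n)   # standard deviation of the sample mean
    print(n, 1 - norm.cdf(1.015, mu, se))
# n = 1: ~0.2266,  n = 5: ~0.0468,  n = 25: ~8.8e-05
```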




        Example 5.2 The life length of an electronic component, T, is exponentially
distributed with a mean of 10,000 hours.

       (a)    What is the probability that a single component lasts more than 7500
       hours?




       (b)    What is the probability that the average life length for 200 components is
       more than 9500 hours?

       E(T) =                    hours

       σT =             hours




Note that                    .




       (c)    What is the probability that the average life length for 10 components is
       more than 9500 hours?


       n is too small to use the CLT approximation


       Note that T̄ = S10 / 10.




       If we had tried to use the CLT:




        Now consider the case in which we are interested in studying two independent
populations. Let the first population have mean µ1 and standard deviation σ1, and let the
second population have mean µ2 and standard deviation σ2.
If we are interested in comparing the two means, then the obvious point estimate of
interest is

        µ̂1 − µ̂2 = X̄1 − X̄2.

       What is the sampling distribution of this statistic?


Theorem 5.5 The Sampling Distribution of the Difference in Two Means

        If we have two independent populations with means µ1 and µ2 and standard
deviations σ1 and σ2, and if a random sample of size n1 is taken from the first population
and a random sample of size n2 is taken from the second population, then the sampling
distribution of

        Z = (X̄1 − X̄2 − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)




is standard normal as n1 and n2 → ∞. If the two populations are normal, then the
sampling distribution of Z is exactly standard normal.

   •     Again, the approximation is relatively accurate if n1 ≥ 30 and n2 ≥ 30.

   Example 5.3 The life length of batteries produced by Battery Manufacturer A is a
continuous random variable having a mean of 1500 hours and a standard deviation of 100
hours. The life length of batteries produced by Battery Manufacturer B is a continuous
random variable having a mean of 1400 hours and a standard deviation of 200 hours.

   (a) Suppose 50 batteries of each type are tested. What is the probability that Battery
       Manufacturer A’s sample average life length exceeds Battery Manufacturer B’s
       by more than 75 hours?

       X̄A − X̄B is approximately normal with mean 1500 − 1400 = 100 hours and
       standard deviation √(100²/50 + 200²/50) = √1000 = 31.62 hours, so

       P(X̄A − X̄B > 75) = P(Z > (75 − 100)/31.62) = P(Z > −0.79) = 0.7852
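
A numerical check of part (a), added as an illustration (assuming scipy is available; the tiny difference from the table answer is z-rounding):

    from math import sqrt
    from scipy.stats import norm

    mu_diff = 1500 - 1400                        # mean of XbarA - XbarB
    sd_diff = sqrt(100**2 / 50 + 200**2 / 50)    # std. deviation of the difference
    print(round(norm.sf(75, loc=mu_diff, scale=sd_diff), 4))   # 0.7854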




   (b)      How would your answer change if only 12 batteries of each type were tested?

         There is not enough information to answer the question. If we assume normality,
         then we could proceed.


5.4 Confidence Intervals
       A point estimate provides only a single number for drawing conclusions about a
parameter. And if another random sample were selected, this point estimate would
almost certainly be different. In fact, this difference could be drastic.

       For this reason, a point estimate typically does not supply adequate information to
an engineer. In such cases, it may be possible and useful to construct a confidence
interval which expresses the degree of uncertainty associated with a point estimate.

       Definition 5.11 If θ is the parameter of interest, then the point estimate and
sampling distribution of θ can be used to identify a 100(1 − α )% confidence interval on
θ. This interval is of the form:

        L ≤ θ ≤ U.

L and U are called the lower-confidence limit and upper-confidence limit. If L and U
are constructed properly, then

        P(L ≤ θ ≤ U) = 1 − α.
   The quantity (1 − α) is called the confidence coefficient.

   •   The confidence coefficient is a measure of the accuracy of the confidence
       interval. For example, if a 90% confidence interval is constructed, then the
       probability that the true value of θ is contained in the interval is 0.9.

   •   The length of the confidence interval is a measure of the precision of the point
       estimate. A general rule of thumb is that increasing the sample size improves the
       precision of a point estimate.

Confidence intervals are closely related to hypothesis testing. Therefore, we will address
confidence intervals within the context of hypothesis testing.




6      FORMULATING STATISTICAL HYPOTHESES


For many engineering problems, a decision must be made as to whether a particular
statement about a population parameter is true or false. In other words, we must either
accept the statement as being true or reject the statement as being false.

Example 6.1 Consider the following statements regarding the population of engineering
students at the Philadelphia University.

   1. The average GPA is 3.0.
   2. The standard deviation of age is 5 years.
   3. 30% are afraid to fly
   4. The average age of mothers is the same as the average age of fathers.

Definition 6.1 A statistical hypothesis is a statement about the parameters of one or
more populations.

   •    It is worthwhile to note that a statistical hypothesis is a statement about the
        underlying probability distributions, not the sample data.

Example 6.2 (Ex. 6.1 continued) Convert each of the statements into a statistical
hypothesis.

   1. H: µ = 3.0
   2. H: σ = 5 years
   3. H: p = 0.30
   4. H: µmothers = µfathers

To perform a test of hypotheses, we must have a contradictory statement about the
parameters of interest.

Example 6.3 Consider the following contradictory statements.

   1. No, it’s more than that.
   2. No, it’s not.
   3. No, it’s less than that.



4. No, fathers are older.

Together, our original statement and our contradictory statement form a set of two
hypotheses.

Example 6.4 (Ex. 6.1 continued) Combine the two statements for each of the examples.

        1.      H0: µ = 3.0            H1: µ > 3.0

        2.      H0: σ = 5              H1: σ ≠ 5

        3.      H0: p = 0.30           H1: p < 0.30

        4.      H0: µmothers = µfathers          H1: µmothers < µfathers



Our original statement is referred to as the null hypothesis (H0).

   •   The value specified in the null hypothesis may be a previously established value
       (in which case we are trying to detect changes to that value), a theoretical value
       (in which case we are trying to verify the theory), or a design specification (in
       which case we are trying to determine if the specification has been met).

The contradictory statement (H1) is referred to as the alternative hypothesis.

   •   Note that an alternative hypothesis can be one-sided (1, 3, 4) or two-sided (2).
   •   The decision as to whether the alternative hypothesis should be one-sided or two-
       sided depends on the problem of interest.

Type I Error

Rejecting the null hypothesis H0 when it is true




For example, suppose the true mean GPA (Example 6.1) is 3.0. For a randomly selected
sample, we could still observe a test statistic X̄ that falls into the critical region, and we
would then mistakenly reject the null hypothesis in favor of the alternative hypothesis H1.

Type II Error
Failing to reject the null hypothesis when it is false.




       6.1 Performing a Hypothesis Test
        Definition 6.2 A procedure leading to a decision about a particular null and
alternative hypothesis is called a hypothesis test.



   •   Hypothesis testing involves the use of sample data on the population(s) of
       interest.
   •   If the sample data is consistent with a hypothesis, then we “accept” that
       hypothesis and conclude that the corresponding statement about the population is
       true.
   •   We “reject” the other hypothesis, and conclude that the corresponding statement
       is false. However, the truth or falsity of the statements can never be known with
       certainty, so we need to define our procedure so that we limit the probability of
       making an erroneous decision.
   •   The burden of proof is placed upon the alternative hypothesis.

Basic Hypothesis Testing Procedure

A random sample is collected on the population(s) of interest, a test statistic is computed
based on the sample data, and the test statistic is used to make the decision to either
accept (some people say “fail to reject”) or reject the null hypothesis.

Example 6.5 A manufactured product is used in such a way that its most important
dimension is its width. Let X denote the width of a manufactured product. Suppose
historical data suggests that X is a normal random variable having σ = 4 cm. However,
the mean can change due to fluctuations in the manufacturing process. Therefore, we
wish to perform the following hypothesis test.


        H0: µ = 190 cm
        H1: µ ≠ 190 cm




The following procedure has been proposed.

Inspect a random sample of 25 products. Measure the width of each product. If the
sample mean is less than 188 cm or more than 192 cm, reject H0.

For the proposed procedure, identify the following:

(a)      sample size

         n = 25

(b)      test statistic

         the sample mean X̄

(c)      critical region

         X̄ < 188 cm or X̄ > 192 cm

(d)      acceptance region

         188 cm ≤ X̄ ≤ 192 cm

Is the procedure defined in Ex. 6.5 a good procedure? Since we are only taking a random
sample, we cannot guarantee that the results of the hypothesis test will lead to us making
the correct decision. Therefore, the question “Is this a good procedure?” can be broken
down into two additional questions.

      1. If the null hypothesis is true, what is the probability that we accept H0?
      2. If the null hypothesis is not true, what is the probability that we accept H0?


Example 6.6 (Ex. 6.5 continued) If the null hypothesis is true, what is the probability
that we accept H0?

        P(188 ≤ X̄ ≤ 192 | µ = 190) = P((188 − 190)/(4/√25) ≤ Z ≤ (192 − 190)/(4/√25))
                                    = P(−2.5 ≤ Z ≤ 2.5) = 0.9876

(note the assumptions: X is normal with σ = 4 cm known, so X̄ ~ N(µ, 4²/25))




Therefore, if the null hypothesis is true, then there is a 98.76% chance that we will make
the correct decision. However, that also means that there is a 1.24% chance that we will
make the incorrect decision (reject H0 when H0 is true).

    •   Such a mistake is called a Type I error, or a false positive.
    •   α = P(Type I error) = level of significance
    •   In our example, α = 0.0124. When constructing a hypothesis test, we get to
        specify α.

If the null hypothesis is not true (i.e. the alternative hypothesis is true), then accepting H0
would be a mistake.

    •   Accepting H0 when H0 is false is called a Type II error, or a false negative.

    •   β = P(Type II error)

    •   The second question above ("If the null hypothesis is not true, what is the
        probability that we accept H0?") is asking us to find β.


    •   Unfortunately, we can’t answer this question (find a value for β) in general. Since
        the alternative hypothesis is µ ≠ 190 cm, there are an uncountable number of
        situations in which the alternative hypothesis is true.
    •   We must identify specific situations of interest and analyze each one individually.



Example 6.7 (Ex. 6.5 continued) Find the probability of a Type II error when µ = 189 cm
and µ = 193 cm.

        For µ = 189 cm:

        β = P(188 ≤ X̄ ≤ 192 | µ = 189) = P(−1.25 ≤ Z ≤ 3.75) = 0.8943




For µ = 193 cm:

        β = P(188 ≤ X̄ ≤ 192 | µ = 193) = P(−6.25 ≤ Z ≤ −1.25) = 0.1056
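
Both error probabilities for this procedure can be reproduced with a short script (an added illustration, not part of the original notes):

    from math import sqrt
    from scipy.stats import norm

    mu0, sigma, n = 190.0, 4.0, 25
    lo, hi = 188.0, 192.0              # acceptance region for the sample mean
    se = sigma / sqrt(n)               # standard error = 0.8

    alpha = norm.cdf(lo, mu0, se) + norm.sf(hi, mu0, se)
    print(round(alpha, 4))             # 0.0124

    for mu in (189.0, 193.0):          # Type II error at specific true means
        beta = norm.cdf(hi, mu, se) - norm.cdf(lo, mu, se)
        print(mu, round(beta, 4))      # 0.8943 and 0.1056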




        Note that as µ moves away from the hypothesized value (190 cm), β decreases.
If we experiment with other sample sizes and critical/acceptance regions, we will see that
the values of α and β can change significantly. However, there are some general “truths”
for hypothesis testing.

   1.   We can explicitly control α (given that the underlying assumptions are true).
   2.   Type I and Type II error are inversely related.
   3.   Increasing the sample size is the only way to simultaneously reduce α and β.
   4.   We can only control β for one specific situation.

Since we can explicitly control α, the probability of a Type I error, rejecting H0 is a
strong conclusion. However, we can only control Type II errors in a very limited
fashion. Therefore, accepting H0 is a weak conclusion. In fact, many statisticians use
the terminology, “fail to reject H0” as opposed to “accept H0.”

   •    Since “reject H0” is a strong conclusion, we should put the statement about which
        it is important to make a strong conclusion in the alternative hypothesis.

Example 6.8 How would the procedure change if we wished to perform the following
hypothesis test?

H0: µ ≥ 190 cm


H1: µ < 190 cm

Proposed hypothesis testing procedure: Inspect a random sample of 25 observations on
the width of a product. If the sample mean is less than 188 cm, reject H0.


        6.1.1   Generic Hypothesis Testing Procedure
All hypothesis tests follow a common procedure. The textbook identifies eight steps in this
procedure.

   1. From the problem context and assumptions, identify the parameter of interest.
   2. State the null hypothesis, H0.
   3. Specify an appropriate alternative hypothesis, H1.
   4. Choose a significance level α.
   5. State an appropriate test statistic.
   6. State the critical region for that statistic.
   7. Collect a random sample of observations on the random variable (or from the
      population) of interest, and compute the test statistic.
   8. Compare the test statistic value to the critical region and decide whether or not to
      reject H0.
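
For tests on µ with σ known (the subject of the next section), this whole procedure reduces to a few lines of code. A minimal Python sketch, added for illustration and anticipating Example 6.9:

    from math import sqrt
    from scipy.stats import norm

    def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
        """One-sample z-test: return the test statistic and True if H0 is rejected."""
        z0 = (xbar - mu0) / (sigma / sqrt(n))
        if alternative == "two-sided":
            reject = abs(z0) > norm.ppf(1 - alpha / 2)
        elif alternative == "greater":      # H1: mu > mu0
            reject = z0 > norm.ppf(1 - alpha)
        else:                               # H1: mu < mu0
            reject = z0 < norm.ppf(alpha)
        return z0, reject

    print(z_test(3.18, 3.0, 0.5, 25))       # (≈1.8, False): fail to reject H0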



        6.2 Performing Hypothesis Tests on µ when σ is Known

In this section, we consider making inferences about the mean µ of a single population
where the population standard deviation σ is known.

   •    We will assume that a random sample X1, X2, … , Xn has been taken from the
        population.
   •    We will also assume that either the population is normal or the conditions of the
        Central Limit Theorem apply.




Suppose we wish to perform the following hypothesis test.

        H0: µ = µ0
        H1: µ ≠ µ0
It is somewhat obvious that inferences regarding µ would be based on the value of the
sample mean. However, it is usually more convenient to standardize the sample mean.
Using what we know about the sampling distribution of the mean, it is reasonable to
conclude that the test statistic will be

        Z0 = (X̄ − µ0) / (σ/√n)




If the null hypothesis is true, then the test statistic is a standard normal random variable.
Therefore, we only reject the null hypothesis if the value of Z0 is unusual for an
observation on a standard normal random variable.

Specifically, we reject H0 if:

        Z0 > Zα/2    or    Z0 < −Zα/2




where α is the specified level of significance. The acceptance region is therefore

        −Zα/2 ≤ Z0 ≤ Zα/2




Obviously, the acceptance and critical regions can be converted to expressions in terms of
the sample mean.

        Reject H0 if X̄ > a or X̄ < b, where

        a = µ0 + Zα/2 σ/√n        and        b = µ0 − Zα/2 σ/√n




Example 6.9 Let X denote the GPA of an engineering student at the Philadelphia
University. It is widely known that, for this population, σ = 0.5. The population mean is
not widely known, however, it is commonly believed that the average GPA is 3.0. We
wish to test this hypothesis using a sample of size 25 and a level of significance of 0.05.

(a)    Identify the null and alternative hypotheses.

        H0: µ = 3.0        H1: µ ≠ 3.0




(b)    List any required assumptions.

        Since n = 25 < 30, we must assume that GPA is normally distributed; we also
        assume σ = 0.5 is known.




(c)    Identify the test statistic and the critical region.

        Z0 = (X̄ − 3.0)/(0.5/√25)

        Reject H0 if Z0 < −Z0.025 = −1.96 or Z0 > Z0.025 = 1.96



(d)    Suppose 25 students are sampled and the sample average GPA is 3.18. State and
       interpret the conclusion of the test.

        Z0 = (3.18 − 3.0)/(0.5/√25) = 0.18/0.1 = 1.8

        Since −1.96 ≤ 1.8 ≤ 1.96, we fail to reject H0: at the 0.05 level, there is not a
        statistically significant difference between the average GPA and 3.0.




(e)    What is the probability of a Type I error for this test?

        α = 0.05




(f)    How would the results change if we had used α = 0.10?




The critical region changes: reject H0 if Z0 < −Z0.05 = −1.645 or Z0 > Z0.05 = 1.645.
Since Z0 = 1.8 > 1.645, we would now reject H0.




We may also modify this procedure if the test is one-sided. This modification only
requires a change in the critical/acceptance regions. If the alternative hypothesis is

        H1: µ > µ0

then a negative value of Z0 would not indicate a need to reject H0. Therefore, we
only reject H0 if

        Z0 > Zα

Likewise, if the alternative hypothesis is

        H1: µ < µ0

then we only reject H0 if

        Z0 < −Zα
Example 6.10 The Glass Bottle Company (GBC) manufactures brown glass beverage
containers that are sold to breweries. One of the key characteristics of these bottles is
their volume. GBC knows that the standard deviation of volume is 0.08 oz. They wish to
ensure that the mean volume is not more than 12.2 oz using a sample size of 30 and a
level of significance of 0.01.

(a)    Identify the null and alternative hypotheses.

        H0: µ ≤ 12.2        H1: µ > 12.2




(b)    Identify the test statistic and the critical region.

        Z0 = (X̄ − 12.2)/(0.08/√30)

        Reject H0 if Z0 > Z0.01 = 2.3263




(c)    Suppose 30 bottles are measured and the sample mean is 12.23. State and
       interpret the conclusion of the test.

        Z0 = (12.23 − 12.2)/(0.08/√30) = 2.05

        Since 2.05 < 2.3263, we fail to reject H0: at the 0.01 level, there is not
        significant evidence that the mean volume exceeds 12.2 oz.




       6.2.1      Computing P-Values


        We have already seen that the choice of the value for the level of significance can
impact the conclusions derived from a test of hypotheses. As a result, we may be
interested in answering the question: How close did we come to making the opposite
conclusion? We answer this question using an equivalent decision approach that can be
used as an alternative to the critical/acceptance regions. This approach is called the P-
value approach.

Definition 6.3 The P-value for a hypothesis test is the smallest level of significance that
would lead to rejection of the null hypothesis.
How we compute the P-value depends on the form of the alternative hypothesis.

        H1: µ ≠ µ0:        P = 2[1 − Φ(|Z0|)]
        H1: µ > µ0:        P = 1 − Φ(Z0)
        H1: µ < µ0:        P = Φ(Z0)

We reject H0 if P ≤ α.
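
A sketch of these P-value formulas in code (added for illustration, assuming scipy is available):

    from scipy.stats import norm

    def p_value(z0, alternative="two-sided"):
        if alternative == "two-sided":
            return 2 * norm.sf(abs(z0))
        if alternative == "greater":        # H1: mu > mu0
            return norm.sf(z0)
        return norm.cdf(z0)                 # H1: mu < mu0

    print(round(p_value(1.80), 4))              # 0.0719 (Example 6.11)
    print(round(p_value(2.05, "greater"), 4))   # 0.0202 (Example 6.12)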


Example 6.11 (Ex. 6.9 continued) Compute the P-value for the test.

        Z0 = 1.8 and H1: µ ≠ 3.0, so P = 2[1 − Φ(1.8)] = 2(0.0359) = 0.0719

        Note when α = 0.05, P > α, so we fail to reject H0.

        But, when α = 0.10, P < α, so we reject H0.

Example 6.12 (Ex. 6.10 continued) Compute the P-value for the test.

        Z0 = 2.05 and H1: µ > 12.2, so P = 1 − Φ(2.05) = 0.0202

        Since α = 0.01, P > α, so we fail to reject H0.



       6.2.2   Type II Error

In hypothesis testing, we get to specify the probability of a Type I error (α). However,
the probability of a Type II error (β) depends on the choice of sample size (n).
Consider first the case in which the alternative hypothesis is H1: µ ≠ µ0.

        β = P(accept H0 | H0 is false) = P(−Zα/2 ≤ Z0 ≤ Zα/2 | H0 is false)

Before we can proceed, we must be more specific about “H0 is false”. We will
accomplish this by saying:

        µ = µ0 + δ

where δ ≠ 0.




        β = P(−Zα/2 ≤ (X̄ − µ0)/(σ/√n) ≤ Zα/2 | µ = µ0 + δ)

        β = P(µ0 − Zα/2 σ/√n ≤ X̄ ≤ µ0 + Zα/2 σ/√n | µ = µ0 + δ)

        β = P( [µ0 − Zα/2 σ/√n − (µ0 + δ)]/(σ/√n) ≤ Z ≤ [µ0 + Zα/2 σ/√n − (µ0 + δ)]/(σ/√n) )

        β = P(−Zα/2 − δ√n/σ ≤ Z ≤ Zα/2 − δ√n/σ)




If the alternative hypothesis is H1: µ > µ0, then

        β = P(Z0 ≤ Zα | µ = µ0 + δ) = P(Z ≤ Zα − δ√n/σ).

If the alternative hypothesis is H1: µ < µ0, then

        β = P(Z0 ≥ −Zα | µ = µ0 + δ) = P(Z ≥ −Zα − δ√n/σ).
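
These β expressions translate directly into code. A sketch (added illustration) that reproduces the two worked examples below:

    from math import sqrt
    from scipy.stats import norm

    def beta_two_sided(delta, sigma, n, alpha):
        za2 = norm.ppf(1 - alpha / 2)
        shift = delta * sqrt(n) / sigma
        return norm.cdf(za2 - shift) - norm.cdf(-za2 - shift)

    def beta_upper(delta, sigma, n, alpha):     # for H1: mu > mu0
        return norm.cdf(norm.ppf(1 - alpha) - delta * sqrt(n) / sigma)

    print(round(beta_two_sided(0.2, 0.5, 25, 0.05), 4))    # 0.4840 (Example 6.13a)
    print(round(beta_upper(0.07, 0.08, 30, 0.01), 4))      # 0.0068 (Example 6.14a)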

Example 6.13 (Ex. 6.9 continued) Let X denote the GPA of an engineering student at
the Philadelphia University. It is widely known that, for this population, σ = 0.5. The
population mean is not widely known; however, it is commonly believed that the average
GPA is 3.0. We wish to test this hypothesis using a sample of size 25 and a level of
significance of 0.05. In Example 6.9, we formulated this hypothesis test as

        H0: µ = 3.0
        H1: µ ≠ 3.0
The corresponding test statistic and critical region are given by

        Z0 = (X̄ − 3.0)/(0.5/√25)

        Reject H0 if Z0 < −Z0.025 = −1.96 or Z0 > Z0.025 = 1.96
(a)    If µ = 3.2, what is the Type II error probability for this test?

        δ = µ − µ0 = 3.2 − 3.0 = 0.2, so δ√n/σ = 0.2(√25)/0.5 = 2




        β = P( − 3.96 ≤ Z ≤ −0.04 ) = 0.4840

(b)    If µ = 2.68, what is the Type II error probability for this test?

        δ = µ − µ0 = 2.68 − 3.0 = −0.32, so δ√n/σ = −3.2




        β = P(1.24 ≤ Z ≤ 5.16 ) = 0.1075

(c)    If µ = 2.68, what is the power of the test?

        power = 1 − β = 1 − 0.1075 = 0.8925




(d)    If µ = 3.32, what is the power of the test?

       power = 0.8925

Example 6.14 (Ex. 6.10 continued) The Glass Bottle Company (GBC) manufactures
brown glass beverage containers that are sold to breweries. One of the key characteristics
of these bottles is their volume. GBC knows that the standard deviation of volume is
0.08 oz. They wish to ensure that the mean volume is not more than 12.2 oz using a
sample size of 30 and a level of significance of 0.01. In Example 6.10, we formulated this
hypothesis test as

       H0: µ ≤ 12.2
       H1: µ > 12.2

The corresponding test statistic and critical region are given by

        Z0 = (X̄ − 12.2)/(0.08/√30)

        Reject H0 if Z0 > Z0.01 = 2.3263

(a)    If µ = 12.27 oz, what is the Type II error probability for this test?

       δ = µ − µ0 = 0.07

                           0.07 30 
        β = P Z ≤ 2.3263 −          = P( Z ≤ −2.47 ) = 0.0068
                             0.08 
                                   


(b)    If µ = 12.15 oz, what is the Type II error probability for this test?

       This is a poor question. If µ = 12.15 oz, then “technically” the null hypothesis is
       true. If we are truly concerned with detecting this, we should have used a two-
       sided alternative hypothesis.




        6.2.3   Choosing the Sample Size


The expressions for β allow the determination of an appropriate sample size. To choose
the proper sample size for our test, we must specify a value of β for a specified value of
δ.
For the case in which H1: µ ≠ µ0, the symmetry of the test allows us to always specify a
positive value of δ. If we specify a relatively small value of β (≤ 0.1), then the lower-tail
term P(Z ≤ −Zα/2 − δ√n/σ) becomes negligible, and the expression for β reduces to:


        β = P(Z ≤ Zα/2 − δ√n/σ)



This yields:

                                       δ n
        Z 1− β = − Z β = Z α / 2 −
                                        σ

        δ n
            = Zα / 2 + Z β
         σ

               (Z   α/2   + Zβ ) σ 2
                                   2

        n=
                          δ2

For both cases in which the alternative hypothesis is one-sided:

        n = (Zα + Zβ)² σ² / δ²
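
A sketch of these sample-size formulas in code (added illustration), rounding up to the next integer as the examples below do; the two calls reproduce Example 6.15(a) and Example 6.16:

    from math import ceil
    from scipy.stats import norm

    def sample_size(delta, sigma, alpha, beta, two_sided=True):
        za = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
        zb = norm.ppf(1 - beta)
        return ceil(((za + zb) * sigma / delta) ** 2)

    print(sample_size(0.2, 0.5, 0.05, 0.10))                     # 66
    print(sample_size(0.05, 0.08, 0.01, 0.05, two_sided=False))  # 41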

Example 6.15 (Ex. 6.9 continued) Let X denote the GPA of an engineering student at
the Philadelphia University. It is widely known that, for this population, σ = 0.5. The
population mean is not widely known; however, it is commonly believed that the average
GPA is 3.0. We wish to test this hypothesis using a sample of size n and a level of
significance of 0.05. In Example 6.9, we formulated this hypothesis test as

       H0: µ = 3.0
       H1: µ ≠ 3.0

The corresponding test statistic and critical region are given by

        Z0 = (X̄ − 3.0)/(0.5/√n)

       Reject H0 if Z0 < −Zα/2 = −Z0.025 = −1.96 or if Z0 > Zα/2 = 1.96

(a)    If we want β = 0.10 at µ = 3.2, what sample size should we use?

       δ = 0.2


        n = (Z0.025 + Z0.10)² 0.5² / 0.2² = (1.96 + 1.282)² 0.5² / 0.2² = 65.7

       n = 66

(b)    If we want β = 0.10 at µ = 3.25, what sample size should we use?


δ = 0.25


        n = (Z0.025 + Z0.10)² 0.5² / 0.25² = (1.96 + 1.282)² 0.5² / 0.25² = 42.04

       n = 43

(c)    If we want β = 0.05 at µ = 3.2, what sample size should we use?

       δ = 0.2


        n = (Z0.025 + Z0.05)² 0.5² / 0.2² = (1.96 + 1.645)² 0.5² / 0.2² = 81.2

       n = 82

Example 6.16 (Ex. 6.10 continued) The Glass Bottle Company (GBC) manufactures
brown glass beverage containers that are sold to breweries. One of the key characteristics
of these bottles is their volume. GBC knows that the standard deviation of volume is
0.08 oz. They wish to ensure that the mean volume is not more than 12.2 oz using a
sample size of n and a level of significance of 0.01. In Example 6.10, we formulated this
hypothesis test as
hypothesis test as

       H0: µ ≤ 12.2
       H1: µ > 12.2


The corresponding test statistic and critical region are given by

        Z0 = (X̄ − 12.2)/(0.08/√n)

       Reject H0 if Z0 > Zα = Z0.01 = 2.3263

If we wish to have a test power of 0.95 at µ = 12.25 oz, what is the required sample size
for this test?

       δ = 0.05

       β = 0.05


        n = (Z0.01 + Z0.05)² 0.08² / 0.05² = (2.326 + 1.645)² 0.08² / 0.05² = 40.4


n = 41


        6.3 Statistical Significance
        A hypothesis test is a test for statistical significance. When we reject H0, we are
stating that the data indicate a statistically significant difference between the true mean
and the hypothesized value of the mean. When we accept H0, we are stating that
there is not a statistically significant difference. Statistical significance and practical
significance are not the same. This is especially important to recognize when the sample
size is large.


        6.3.1    Introduction to Confidence Intervals


        As we have previously discussed, the sample mean is the most often used point
estimate for the population mean. However, we also pointed out that two different
samples would most likely result in two different sample means. Therefore, we define
confidence intervals as a means of quantifying the uncertainty in our point estimate.


If θ is the parameter of interest, then the point estimate and sampling distribution of θ can
be used to identify a 100(1 − α )% confidence interval on θ. This interval is of the
form:

                 L ≤ θ ≤ U.

L and U are called the lower-confidence limit and upper-confidence limit.

        If L and U are constructed properly, then

                 P(L ≤ θ ≤ U) = 1 − α.

The quantity (1 − α) is called the confidence coefficient. The confidence coefficient is a
measure of the accuracy of the confidence interval. For example, if a 90% confidence
interval is constructed, then the probability that the true value of θ is contained in the
interval is 0.9.




The length of the confidence interval is a measure of the precision of the point
estimate.     A general rule of thumb is that increasing the sample size improves the
precision of a point estimate.

          6.3.2   Confidence Interval on µ when σ is Known

          We can use what we have learned to construct a 100(1 − α )% confidence
interval on the mean, assuming that (a) the population standard deviation is known, and
(b) the population is normally distributed (or the conditions of the Central Limit Theorem
apply).

        P(−Zα/2 ≤ Z ≤ Zα/2) = 1 − α

        P(−Zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ Zα/2) = 1 − α

        P(X̄ − Zα/2 σ/√n ≤ µ ≤ X̄ + Zα/2 σ/√n) = 1 − α

Such a confidence interval is called a two-sided confidence interval.       We can also
construct one-sided confidence intervals for the same set of assumptions (σ known,
normal population or Central Limit Theorem conditions apply).

The 100(1 − α)% upper-confidence interval is given by

        P(µ ≤ X̄ + Zα σ/√n) = 1 − α

and the 100(1 − α)% lower-confidence interval is given by

        P(µ ≥ X̄ − Zα σ/√n) = 1 − α.
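
These interval formulas are straightforward to compute. A sketch (added illustration, assuming scipy is available) reproducing the values in the example below:

    from math import sqrt
    from scipy.stats import norm

    def ci_mean(xbar, sigma, n, conf=0.95, kind="two-sided"):
        se = sigma / sqrt(n)
        if kind == "two-sided":
            z = norm.ppf(1 - (1 - conf) / 2)
            return xbar - z * se, xbar + z * se
        z = norm.ppf(conf)
        if kind == "upper":                 # upper-confidence interval: mu <= U
            return None, xbar + z * se
        return xbar - z * se, None          # lower-confidence interval: mu >= L

    print(ci_mean(3.18, 0.5, 25))                 # ≈ (2.984, 3.376)
    print(ci_mean(3.18, 0.5, 25, kind="upper"))   # ≈ (None, 3.3445)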

Example 6.17 Let X denote the GPA of an engineering student at the Philadelphia
University. It is widely known that, for this population, σ = 0.5. The population mean is
not widely known; however, we have collected a sample of size 25 from the population.
The resulting sample mean was 3.18.



(a)   What assumptions, if any, are required to use this data to construct a confidence
      interval on the mean GPA?

      GPA is normally distributed.

(b)   Construct a 95% confidence interval on µ and interpret its meaning.

       X̄ ± Z0.025 σ/√n = 3.18 ± 1.96(0.5/√25) = 3.18 ± 0.196

       2.984 ≤ µ ≤ 3.376

      P( 2.984 ≤ µ ≤ 3.376 ) = 0.95

(c)   Construct a 99% confidence interval on µ and compare it to the confidence
      interval obtained in part (b).

       X̄ ± Z0.005 σ/√n = 3.18 ± 2.58(0.5/√25) = 3.18 ± 0.258

       2.922 ≤ µ ≤ 3.438

      more accurate, but less precise

(d)   Construct a 95% upper-confidence interval on µ and interpret its meaning.

       X̄ + Z0.05 σ/√n = 3.18 + 1.645(0.5/√25) = 3.3445

       µ ≤ 3.3445

      P( µ ≤ 3.3445) = 0.95

(e)   Construct a 95% lower-confidence interval on µ and interpret its meaning.

       X̄ − Z0.05 σ/√n = 3.18 − 1.645(0.5/√25) = 3.0155

       µ ≥ 3.0155

      P( µ ≥ 3.0155) = 0.95

(f)   Combine the two confidence intervals obtained in parts (d) and (e). Is this
      confidence interval superior to the one constructed in part (b)?


3.0155 ≤ µ ≤ 3.3445

        No, it is only a 90% confidence interval.


6.3.3   Choosing the Sample Size for a Confidence Interval on µ when σ is Known

        The confidence level (the percentage) of a confidence interval is a measure of the
accuracy of the confidence interval.
        The half-width of the confidence interval, E, is a measure of the precision of the
confidence interval. For a two-sided confidence interval, E = (U – L)/2. For an upper-
confidence interval, E = U − θ and for a lower-confidence interval, E = θ − L.
        For a given level of accuracy (α), we can control the precision of the confidence
interval using the sample size. For the two-sided confidence interval on µ, we specify a
value of E and note that:

        E = Zα/2 σ/√n.

        Then, we can solve for n.
        n = (Zα/2 σ / E)²

        For the one-sided confidence intervals:
        n = (Zα σ / E)².
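
In code (added illustration), again rounding up; the two calls reproduce parts (a) and (b) of the example below:

    from math import ceil
    from scipy.stats import norm

    def n_for_halfwidth(E, sigma, conf=0.95, two_sided=True):
        z = norm.ppf(1 - (1 - conf) / 2) if two_sided else norm.ppf(conf)
        return ceil((z * sigma / E) ** 2)

    print(n_for_halfwidth(0.1, 0.5))                    # 97
    print(n_for_halfwidth(0.1, 0.5, two_sided=False))   # 68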

Example 6.18 (Ex. 6.17 continued)

(a)     If we wish to construct a 95% confidence interval on µ that has a half-width of
        0.1, how many students should we survey?

        n = (Z0.025 σ / E)² = (1.96 × 0.5 / 0.1)² = 96.04

        n = 97



(b)     If we wish to construct a 95% upper-confidence interval on µ that has a half-
        width of 0.1, how many students should we survey?

        n = (Z0.05 σ / E)² = (1.645 × 0.5 / 0.1)² = 67.65

        n = 68


(c)     If we wish to construct a 90% confidence interval on µ that has a half-width of
        0.1, how many students should we survey?

        n = (Z0.05 σ / E)² = (1.645 × 0.5 / 0.1)² = 67.65

        n = 68


6.3.4   Using Confidence Intervals to Perform Hypothesis Tests on µ when σ is
        Known
        Thus far, we have considered two methods of evaluating hypothesis tests: critical
regions and P-values. A third, equivalent method is to use a confidence interval.

        1.       Specify µ0, α, and n.

        2.       If H1: µ ≠ µ0, construct a 100(1 − α)% confidence interval on µ.
                 If H1: µ > µ0, construct a 100(1 − α)% lower-confidence interval on µ.
                 If H1: µ < µ0, construct a 100(1 − α)% upper-confidence interval on µ.

        3.       Reject H0 if µ0 is not contained in that confidence interval.
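
A sketch of this confidence-interval approach in code (added illustration); the two calls reproduce Examples 6.19 and 6.20 below:

    from math import sqrt, inf
    from scipy.stats import norm

    def test_via_ci(xbar, mu0, sigma, n, alpha, alternative="two-sided"):
        se = sigma / sqrt(n)
        if alternative == "two-sided":
            z = norm.ppf(1 - alpha / 2)
            lo, hi = xbar - z * se, xbar + z * se
        elif alternative == "greater":      # H1: mu > mu0 -> lower-confidence interval
            lo, hi = xbar - norm.ppf(1 - alpha) * se, inf
        else:                               # H1: mu < mu0 -> upper-confidence interval
            lo, hi = -inf, xbar + norm.ppf(1 - alpha) * se
        return "reject H0" if not lo <= mu0 <= hi else "fail to reject H0"

    print(test_via_ci(3.18, 3.0, 0.5, 25, 0.05))                # fail to reject H0
    print(test_via_ci(12.23, 12.2, 0.08, 30, 0.01, "greater"))  # fail to reject H0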

Example 6.19 (Ex. 6.9 continued) Let X denote the GPA of an engineering student at
the Philadelphia University. It is widely known that, for this population, σ = 0.5. The
population mean is not widely known; however, it is commonly believed that the average
GPA is 3.0. We wish to test this hypothesis using a sample of size 25 and a level of
significance of 0.05.

From Ex. 6.9:

        H0: µ = 3.0
        H1: µ ≠ 3.0




Suppose the sample mean is 3.18. Use a confidence interval to evaluate the hypothesis
test.




        α = 0.05 and H1 is two-sided (µ ≠ 3.0), so we construct a

               95% confidence interval

        From Ex. 6.17:

               2.984 ≤ µ ≤ 3.376

       3.0 is in the confidence interval

               fail to reject H0

Example 6.20 (Ex. 6.10 continued) The Glass Bottle Company (GBC) manufactures
brown glass beverage containers that are sold to breweries. One of the key characteristics
of these bottles is their volume. GBC knows that the standard deviation of volume is
0.08 oz. They wish to ensure that the mean volume is not more than 12.2 oz using a
sample size of 30 and a level of significance of 0.01.

From Ex. 6.10:

       H0: µ ≤ 12.2
       H1: µ > 12.2

Suppose the sample mean is 12.23. Use a confidence interval to evaluate the hypothesis
test.

        α = 0.01 and H1 is one-sided (µ > 12.2), so we construct a

               99% lower-confidence interval

                X̄ − Z0.01 σ/√n = 12.23 − 2.3263(0.08/√30) = 12.1960

               µ ≥ 12.1960

       12.2 is in the confidence interval

               fail to reject H0




6.4 Hypothesis Tests on µ when σ is Unknown
       What if σ is Unknown?

        Suppose we are interested in studying the mean of a population, but we do not
know the value of the population standard deviation σ.

   •   We can use the procedures defined in section 6.2 and replace σ with S, provided
       that the sample size is large (n ≥ 30).
   •   When the sample size is small and σ is unknown, then we must assume that the
       population is normally distributed.


                               The t Distribution


       Suppose we wish to perform the following hypothesis test.

       H0: µ = µ0
       H1: µ ≠ µ0

       Suppose we have collected a random sample of size n and that we have used this
sample data to compute the sample mean X and the sample standard deviation S.
       If σ were known then we would compute the test statistic:


        Z0 = (X̄ − µ0)/(σ/√n).

Therefore, a logical approach is to replace σ with S. The resulting test statistic is:

        T0 = (X̄ − µ0)/(S/√n).

Before we can proceed, we should analyze the sampling distribution of this test statistic.

       Theorem 6.1 The t Distribution
       Let X1, X2, … , Xn be a random sample from a normal population having mean µ.
       The quantity
        T = (X̄ − µ)/(S/√n)


has a t distribution with n – 1 degrees of freedom.

       While we won’t discuss the details of the t distribution, it is important to recognize
two points regarding the t probability density function.

      •   First, it is symmetric about 0.
       •   Second, as the number of degrees of freedom increases, the t distribution
           approaches the standard normal distribution. This explains why it is OK to use
           the procedures from section 6.2 when n ≥ 30 (at 29 degrees of freedom there is
           little difference between t and Z).

Example 6.21 Suppose T has a t distribution with 7 degrees of freedom. Find the
following:

(a)       P(T > 2.365)

          Using the Excel function TDIST(x, degrees_freedom, tails): TDIST(2.365, 7, 1) = 0.025.

          Note that Excel returns the upper-tail probability P(T > x).

(b)       P(T > 1.415)

          0.10

(c)       P(T < −3.499)

          P(T > 3.499) = 0.005

(d)       P(T > −2.8) = P(T < 2.8) = 1 − P(T > 2.8) = 0.9867


(e)       the value a such that P(T >a) = 0.05

          a = t0.05,7 = 1.895

(f)       the value of a such that P(T > a) = 0.01

          a = t0.01,7 = 2.998

(g)       the value of a such that P(T < a) = 0.9975

          a = t0.0025,7 = 4.029
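
The same lookups can be done with scipy.stats.t (a sketch added for illustration):

    from scipy.stats import t

    df = 7
    print(round(t.sf(2.365, df), 3))    # (a) 0.025
    print(round(t.sf(1.415, df), 2))    # (b) 0.10
    print(round(t.cdf(-3.499, df), 4))  # (c) 0.005
    print(round(t.cdf(2.8, df), 4))     # (d) P(T > -2.8) = P(T < 2.8) ≈ 0.9867
    print(round(t.ppf(0.95, df), 3))    # (e) 1.895
    print(round(t.ppf(0.9975, df), 3))  # (g) 4.029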



  • 2. Definition 1.4 Let A and B be two event defined on a sample space Ω. A is a subset of B, denoted by A ⊂ B, if an only if (iff), ∀ ω ∈ A, ω ∈ B. (Figure 1.2) Figure 1.2 Venn Diagram for A ⊂ B Definition 1.5 Let A be an event defined on a sample space Ω. ω ∈ Ac iff ω ∉ A. Ac is called the complement of A. (Figure 1.3) Figure 1.3 Venn Diagram for Ac Definition 1.6 Let A and B be two events defined on the sample space Ω. ω ∈ A ∪ B iff ω ∈ A or ω ∈ B (or both). A ∪ B is called the union of A and B (see Figure 1.4). Figure 1.4 Venn Diagram for A ∪ B Let {A1, A2, …} be a collection of events defined on a sample space. ∞ ω ∈ U Aj j =1 iff ∃ some j = 1, 2, … ∋ ω ∈ Aj ∞ UA j =1 j is called the union of {A1, A2, …} Definition 1.7 Let A and B be two events defined on the sample space Ω. ω ∈ A ∩ B iff ω ∈ A and ω ∈ B. A ∩ B is called the intersection of A and B (see Figure 1.5). Figure 1.5 Venn Diagram for Let {A1, A2, …} be a collection of events defined on a sample space. ∞ ω ∈ I Aj j =1 iff ω ∈ A ∀ j = 1, 2, … ∞ IA j =1 j is called the intersection of {A1, A2, …} Example 1.3 (example 1.2 continued) 1. Bc = C 2
Example 1.6 Prove Theorem 1.3(b).
Theorem 1.4 Distributive Laws of Union and Intersection
Let A, B and C be events defined on the sample space Ω. Then
(a) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(b) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

Theorem 1.5 De Morgan’s Laws
Let A and B be events defined on the sample space Ω. Then
(a) (A ∪ B)c = Ac ∩ Bc
(b) (A ∩ B)c = Ac ∪ Bc

Definition 1.8 Let A and B be two events defined on the sample space Ω. A and B are said to be mutually exclusive, or disjoint, iff A ∩ B = Ø (Figure 1.6). A collection of events {A1, A2, …}, defined on a sample space Ω, is said to be disjoint iff every pair of events in the collection is mutually exclusive.

Figure 1.6 Venn Diagram for Mutually Exclusive Events

Definition 1.9 A collection of events {A1, A2, …, An} defined on a sample space Ω is said to be a partition of Ω (Figure 1.7) iff
(a) the collection is disjoint
(b) $\bigcup_{j=1}^{n} A_j = \Omega$

Figure 1.7 Venn Diagram for a Partition

Example 1.7 (Example 1.2 continued) Using the defined events, identify:
(a) a set of mutually exclusive events
(b) a partition of the sample space
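As a quick sanity check of Theorems 1.4 and 1.5, the identities can be verified numerically with Python sets. This is an illustrative sketch using the die-toss events of Example 1.2, not part of the original notes.

```python
# Numerical check of the distributive and De Morgan laws using Python sets.
omega = {1, 2, 3, 4, 5, 6}
A = {2}                      # a 2 appears
B = {2, 4, 6}                # an even number appears
C = {1, 3, 5}                # an odd number appears

def comp(E):                 # complement relative to the sample space
    return omega - E

# Theorem 1.4 (distributive laws)
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# Theorem 1.5 (De Morgan's laws)
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)
print("All set identities verified.")
```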
Definition 1.10 A collection of events, F, defined on a sample space Ω, is said to be a field iff
(a) Ω ∈ F,
(b) if A ∈ F, then Ac ∈ F,
(c) if A1, A2, …, An ∈ F, then $\bigcup_{j=1}^{n} A_j \in F$.

We use fields to represent all the events that we are interested in studying. To construct a field:
1. we start with Ω
2. Ø is inserted by implication (Definition 1.10(b))
3. we then add the events of interest
4. we then add complements and unions

Example 1.8 Suppose we perform a random experiment which consists of observing the type of shirt worn by the next person entering a room. Suppose we are interested in the following events.
L = the shirt has long sleeves
S = the shirt has short sleeves
N = the shirt has no sleeves
Assuming that {L, S, N} is a partition of Ω, construct an appropriate field.

Theorem 1.6 Intersections are in Fields
Let F be a field of events defined on the sample space Ω. If A1, A2, …, An ∈ F, then $\bigcap_{j=1}^{n} A_j \in F$.

Example 1.9 Prove that if A, B ∈ F, then A ∩ B ∈ F.
Any meaningful expression containing events of interest, ∪, ∩, and c can be shown to be in the field.

Definition 1.11 Consider a set of elements, such as S = {a, b, c}. A permutation of the elements is an ordered sequence of the elements. The number of permutations of n different elements is n!, where
n! = n × (n−1) × (n−2) × … × 2 × 1

Example 1.10 List all the permutations of the elements of S.

Definition 1.12 The number of permutations of subsets of r elements selected from a set of n different elements is

Another counting problem of interest is the number of subsets of r elements that can be selected from a set of n elements. Here the order is not important; such subsets are called combinations.

Definition 1.13 The number of combinations, subsets of size r that can be selected from a set of n elements, is denoted as

Example 1.11 The EN505 class has 13 students. If teams of 2 students are selected, how many different teams are possible?
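The counting quantities in Definitions 1.11–1.13 are available in Python's standard library (math.perm and math.comb assume Python 3.8+). A minimal sketch, including the team count of Example 1.11:

```python
from math import factorial, perm, comb

print(factorial(3))   # permutations of S = {a, b, c}: 3! = 6
print(perm(13, 2))    # ordered selections of 2 students from 13: 156
print(comb(13, 2))    # unordered teams of 2 from 13: 78
```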
1.2 Probability
Probability is used to quantify the likelihood, or chance, that an outcome of a random experiment will occur.

Definition 1.14 A random variable is a real-valued function defined on a sample space. Random variables are typically denoted by italicized capital letters. Specific values taken on by a random variable are typically denoted by italicized, lower-case letters.

Definition 1.15 A random variable that can take on a countable number of values is said to be a discrete random variable.

Definition 1.16 A random variable that can take on an uncountable number of values is said to be a continuous random variable.

Definition 1.17 The set of possible values of a random variable is referred to as the range of the random variable.

Example 1.12 For each of the following random experiments, define a random variable, identify its range, and classify it as discrete or continuous.
1. flip a coin
2. toss a die until a 6 appears
3. quality inspection of a shipment of manufactured items
4. arrival of customers to a bank

Definition 1.18 Let Ω be the sample space for some random experiment. For any event defined on Ω, Pr(·) is a function which assigns a number to the event. Pr(A) is called the probability of event A provided the following conditions hold:
(a)
(b)
(c)

Probability is used to quantify the likelihood, or chance, that an event will occur within the sample space.
Whenever a sample space consists of N equally likely outcomes, the probability of each outcome is 1/N.

Theorem 1.7 Probability Computational Rules
Let A and B be events defined on a sample space Ω, and let {A1, A2, …, An} be a collection of events defined on Ω. Then
(a)
(b)
(c)
(d)
(e)
(f)

Corollary 1.1 Union of Three or More Events
Let A, B, C and D be events defined on a sample space Ω. Then

$$\Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C)$$

and

$$\begin{aligned}
\Pr(A \cup B \cup C \cup D) = {} & \Pr(A) + \Pr(B) + \Pr(C) + \Pr(D) \\
& - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(A \cap D) - \Pr(B \cap C) - \Pr(B \cap D) - \Pr(C \cap D) \\
& + \Pr(A \cap B \cap C) + \Pr(A \cap B \cap D) + \Pr(A \cap C \cap D) + \Pr(B \cap C \cap D) \\
& - \Pr(A \cap B \cap C \cap D)
\end{aligned}$$

Example 1.11 Let A, B and C be events defined on a sample space Ω ∋
Pr(A) = 0.30, Pr(Bc) = 0.60, Pr(C) = 0.20, Pr(A ∪ B) = 0.50, Pr(B ∩ C) = 0.05,
and A and C are mutually exclusive.
Compute the following probabilities.
(a) Pr(B)
(b) Pr(B ∪ C)
(c) Pr(A ∩ B)
(d) Pr(A ∪ C)
(e) Pr(A ∩ C)
(f) Pr(B ∩ Cc)
(g) Pr(A ∪ B ∪ C)
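A worked sketch of Example 1.11's computations (the notes leave the answers blank; the values below follow from the complement, addition, and inclusion–exclusion rules):

```python
pA, pBc, pC = 0.30, 0.60, 0.20
pAuB, pBnC = 0.50, 0.05

pB = 1 - pBc                      # (a) complement rule: 0.40
pBuC = pB + pC - pBnC             # (b) addition rule: 0.55
pAnB = pA + pB - pAuB             # (c) addition rule rearranged: 0.20
pAnC = 0.0                        # (e) A and C are mutually exclusive
pAuC = pA + pC - pAnC             # (d) 0.50
pBnCc = pB - pBnC                 # (f) Pr(B) = Pr(B∩C) + Pr(B∩C^c): 0.35
# (g) inclusion-exclusion; A∩B∩C ⊆ A∩C = Ø, so that term is 0
pAuBuC = pA + pB + pC - pAnB - pAnC - pBnC + 0.0   # 0.65
print(pB, pBuC, pAnB, pAuC, pAnC, pBnCc, pAuBuC)
```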
1.3 Independence
Two events A and B are independent if any one of the following equivalent statements is true:
(1) Pr(A|B) = Pr(A)
(2) Pr(B|A) = Pr(B)
(3) Pr(A ∩ B) = Pr(A) Pr(B)

Example 2.29 (book work in class)

1.4 Conditional Probability
Definition 1.19 Let A and B be events defined on a sample space Ω ∋ B ≠ Ø. We refer to Pr(A|B) as the conditional probability of event A given the occurrence of event B, where

Pr(Ac|B) = probability of not A given B
Note that Pr(A|Bc) ≠ 1 − Pr(A|B).

Example 1.12 A semiconductor manufacturing facility is controlled in a manner such that 2% of manufactured chips are subjected to high levels of contamination. If a chip is subjected to high levels of contamination, there is a 12% chance that it will fail testing. What is the probability that a chip is subjected to high levels of contamination and fails upon testing?
c =
f =
Pr(high c level) =
Pr(fail | high c level) =
Pr(F ∩ C) =

Example 1.13 An air quality test is designed to detect the presence of two molecules (molecule 1 and molecule 2). 17% of all samples contain both molecules, and 48% of all samples contain molecule 1. If a sample contains molecule 1, what is the probability that it also contains molecule 2?
M1 = molecule 1
M2 = molecule 2
Pr(M1 ∩ M2) =
Pr(M1) =
Pr(M2|M1) =

Theorem 1.8 Properties of Conditional Probability
Let A and B be non-empty events defined on a sample space Ω. Then
(a) If A and B are mutually exclusive, then Pr(A|B) = 0.
(b) If A ⊂ B, then Pr(A|B) ≥ Pr(A).
(c) If B ⊂ A, then Pr(A|B) = 1.

Theorem 1.9 Law of Total Probability – Part 1
Let A and B be events defined on a sample space Ω ∋ A ≠ Ø, B ≠ Ø, Bc ≠ Ø. Then

Example 1.14 A certain machine’s performance can be characterized by the quality of a key component. 94% of machines with a defective key component will fail, whereas only 1% of machines with a non-defective key component will fail. 4% of machines have a defective key component. What is the probability that the machine will fail?
F = fail
D = defective
Pr(D) =
Pr(F|D) =
Pr(F|Dc) =
Pr(F) =

Theorem 1.11 Bayes’ Theorem – Part 1
Let A and B be events defined on a sample space Ω ∋ A ≠ Ø, B ≠ Ø, Bc ≠ Ø. Then

Example 1.15 (Example 1.14 continued) Suppose the machine fails. What is the probability that the key component was defective?
Pr(D|F) =
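A sketch of the total-probability and Bayes computations for Examples 1.14 and 1.15 (the notes leave the numerical answers blank):

```python
pD = 0.04          # Pr(D): defective key component
pF_D = 0.94        # Pr(F|D)
pF_Dc = 0.01       # Pr(F|D^c)

pF = pF_D * pD + pF_Dc * (1 - pD)    # law of total probability: 0.0472
pD_F = pF_D * pD / pF                # Bayes' theorem: ~0.7966
print(f"Pr(F) = {pF:.4f}, Pr(D|F) = {pD_F:.4f}")
```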
Theorem 1.12 Law of Total Probability – Part 2
Let A be a non-empty event defined on a sample space Ω, and let {B1, B2, …, Bn} be a partition of Ω ∋ Bj ≠ Ø ∀ j = 1, 2, …, n. Then
Pr(A) =

Theorem 1.13 Bayes’ Theorem – Part 2
Let A be a non-empty event defined on a sample space Ω, and let {B1, B2, …, Bn} be a partition of Ω ∋ Bj ≠ Ø ∀ j = 1, 2, …, n. Then

$$\Pr(B_j \mid A) = \frac{\Pr(A \mid B_j)\Pr(B_j)}{\Pr(A)} = \frac{\Pr(A \mid B_j)\Pr(B_j)}{\sum_{i=1}^{n} \Pr(A \mid B_i)\Pr(B_i)}$$
2 DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

A discrete random variable is a random variable that can take on at most a countable number of values.

Definition 2.1 Let X be a discrete random variable having cumulative distribution function F. Let x1, x2, … denote the possible values of X. Then f(x) is the probability mass function (pmf) of X if
a) f(x) = Pr(X = x)
b) f(xj) > 0, j = 1, 2, …
c) f(x) = 0 if x ≠ xj, j = 1, 2, …
d) $\sum_{j=1}^{\infty} f(x_j) = 1$

Definition 2.2 The cumulative distribution function of a discrete random variable X is denoted by F(x) and given by
$$F(x) = \sum_{x_j \le x} f(x_j)$$
and satisfies the following properties:
a) F(x) = Pr(X ≤ x) = $\sum_{x_j \le x} f(x_j)$
b) 0 ≤ F(x) ≤ 1
c) if x ≤ y, then F(x) ≤ F(y)

Example 2.1 Suppose X is a discrete random variable having pmf f and cdf F, where f(1) = 0.1, f(2) = 0.4, f(3) = 0.2, f(4) = 0.3.
1. Construct the cumulative distribution function of X.
2. Compute Pr(X ≤ 2).
3. Compute Pr(X < 4).
4. Compute Pr(X ≥ 2).
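One way to check Example 2.1 is to build the cdf directly from the pmf; a minimal Python sketch (not part of the original notes):

```python
pmf = {1: 0.1, 2: 0.4, 3: 0.2, 4: 0.3}

def cdf(x):
    return sum(p for xj, p in pmf.items() if xj <= x)

print(cdf(2))                                      # Pr(X <= 2) = 0.5
print(sum(p for xj, p in pmf.items() if xj < 4))   # Pr(X < 4) = 0.7
print(1 - cdf(1))                                  # Pr(X >= 2) = 0.9
```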
Definition 2.3 The mean or expected value of X, denoted as µ or E(X), is
$$\mu = E(X) = \sum_{x} x f(x)$$
The variance of X, denoted by Var(X), is given by
$$\sigma^2 = V(X) = E\left[(X - \mu)^2\right] = \sum_{x} (x - \mu)^2 f(x) = \sum_{x} x^2 f(x) - \mu^2$$
The standard deviation of X is σ = √σ².

Definition 2.4 Let X be a discrete random variable with probability mass function f(x). The expected value of X is denoted by E(X) and given by
$$E(X) = \sum_{j=1}^{\infty} x_j f(x_j)$$

2.1 Discrete Distributions

2.1.1 Discrete Uniform Distribution
Suppose a random experiment has a finite set of equally likely outcomes. If X is a random variable such that there is a one-to-one correspondence between the outcomes and the set of integers {a, a + 1, …, b}, then X is a discrete uniform random variable having parameters a and b.

Notation
Range
Probability Mass Function
Parameters
Mean
Variance

Example 2.2 Let X ~ DU(1, 6).
1. Compute Pr(X = 2).
2. Compute Pr(X > 4).

2.1.2 The Bernoulli Random Variable
Consider a random experiment that either “succeeds” or “fails”. If the probability of success is p, and we let X = 0 if the experiment fails and X = 1 if it succeeds, then X is a Bernoulli random variable with probability p. Such a random experiment is referred to as a Bernoulli trial.

Notation
Range
Probability Mass Function
Parameter
Mean
Variance

2.1.3 The Binomial Distribution
The binomial random variable denotes the number of successes in n independent Bernoulli trials with probability p of success on each trial.

Notation
Range
Probability Mass Function
Cumulative Distribution Function
Parameters
Mean
Variance
Comments If n = 1, then X ~ Bernoulli(p).
Example 2.3 Each sample of air has a 10% chance of containing a particular rare molecule.
1. Find the probability that exactly 2 of the next 18 samples contain the rare molecule.
2. Determine the probability that at least four samples contain the rare molecule.
3. Determine the probability that at least one but fewer than four samples contain the rare molecule.
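A sketch for Example 2.3, treating the count as X ~ bin(18, 0.1); scipy is an assumed dependency:

```python
from scipy.stats import binom

n, p = 18, 0.1
print(binom.pmf(2, n, p))                        # 1. exactly 2 samples
print(1 - binom.cdf(3, n, p))                    # 2. at least 4 samples
print(binom.cdf(3, n, p) - binom.pmf(0, n, p))   # 3. at least 1 but fewer than 4
```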
2.1.4 The Negative Binomial Random Variable
The negative binomial random variable denotes the number of trials until the kth success in a sequence of independent Bernoulli trials with probability p of success on each trial.

Notation
Range
Probability Mass Function
Cumulative Distribution Function
Parameters
Mean
Variance

Example 2.4 A high-performance aircraft contains three identical computers. Only one is used to operate the aircraft; the other two are spares that can be activated in case the primary system fails. During one hour of operation, the probability of a failure in the primary computer is 0.0005.
1. Assuming that each hour represents an independent trial, what is the mean time to failure of all three computers?
2. What is the probability that all three computers fail within a 5-hour flight?
Comments: If k = 1, then X ~ geom(p), i.e., X is a geometric random variable having probability of success p.

2.1.5 The Geometric Distribution
In a series of independent Bernoulli trials with constant probability p of success, let the random variable X denote the number of trials until the first success. Then X has a geometric distribution.

Notation
Range
Probability Mass Function
Cumulative Distribution Function
Parameters
Mean
Variance

Example 2.3 Consider a sequence of independent Bernoulli trials with a probability of success p = 0.2 for each trial.
(a) What is the expected number of trials to obtain the first success?
(b) After the eighth success occurs, what is the expected number of trials to obtain the ninth success?

2.1.6 The Hypergeometric Random Variable
Consider a population consisting of N members, K of which are denoted as successes. Consider a random experiment during which n members are selected at random from the population, and let X denote the number of successes in the random sample. If the members in the sample are selected from the population without replacement, then X is a hypergeometric random variable having parameters N, K and n.

Notation
Range
Probability Mass Function
Parameters
Comments If the sample is taken from the population with replacement, then X ~ bin(n, K/N). Therefore, if n << N, we can use the approximation bin(n, K/N) ≈ HG(N, K, n).

Example 2.4 Suppose a shipment of 5000 batteries is received, 150 of them being defective. A sample of 100 is taken from the shipment without replacement. Let X denote the number of defective batteries in the sample.
1. What kind of random variable is X, and what is the range of X?
2. Compute Pr(X = 5).
3. Approximate Pr(X = 5) using the binomial approximation to the hypergeometric.
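A sketch comparing the exact hypergeometric probability with its binomial approximation for Example 2.4 (scipy assumed; note that scipy's argument order is population size, successes in the population, sample size):

```python
from scipy.stats import hypergeom, binom

N, K, n = 5000, 150, 100
print(hypergeom.pmf(5, N, K, n))   # 2. exact Pr(X = 5)
print(binom.pmf(5, n, K / N))      # 3. binomial approximation, p = K/N = 0.03
```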
2.1.7 The Poisson Random Variable
The Poisson random variable denotes the number of events that occur in an interval of length t when events occur at a constant average rate λ.

Notation
Probability Mass Function
Cumulative Distribution Function
Parameters

Comments The Poisson random variable X equals the number of counts in the time interval t. The counts in disjoint subintervals are independent of one another. If n is large and p is small, we can use the approximation bin(n, p) ≈ Poisson(np).

Mean
Variance

It is important to use consistent units in calculations of probabilities, means, and variances involving Poisson random variables.
Example 2.5 Contamination is a problem in the manufacture of optical storage disks. The number of particles of contamination that occur on an optical disk has a Poisson distribution, and the average number of particles per centimeter squared of media surface is 0.1. The area of a disk under study is 100 squared centimeters.
a) Find the probability that 12 particles occur in the area of a disk under study.
b) Find the probability that zero particles occur in the area of the disk under study.
c) Find the probability that 12 or fewer particles occur in the area of a disk under study.
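A sketch for Example 2.5: with 0.1 particles per squared centimeter and a 100 squared-centimeter disk, the count is Poisson with mean 10 (scipy assumed):

```python
from scipy.stats import poisson

lam = 0.1 * 100
print(poisson.pmf(12, lam))   # a) Pr(X = 12)
print(poisson.pmf(0, lam))    # b) Pr(X = 0) = e^{-10}
print(poisson.cdf(12, lam))   # c) Pr(X <= 12)
```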
2.1.8 Poisson Process
Up to this point in the course, we have discussed the assignment of probabilities to events and random variables; by manipulating these probabilities we can analyze “snapshots” of system behavior at certain points in time, or under certain conditions. Now we are going to study one of the most commonly used continuous-time stochastic processes, which allows us to study important aspects of system behavior over a time interval t.

Definition 2.5 Let {N(t), t ≥ 0} be a counting process. Then {N(t), t ≥ 0} is said to be a Poisson process having rate λ, λ > 0, iff
a. we start counting from zero;
b. the number of outcomes occurring in one time interval (or specific region) is independent of the number that occurs in any other disjoint time interval or region of space, which can be interpreted to mean that the Poisson process has no memory;
c. the number of events in any interval (s, s + t) is a Poisson random variable with mean λt;
d. the probability that more than one outcome occurs at the same instant is negligible.

This is denoted by N(t) ~ PP(λ), where λ refers to the average rate at which events occur.

Part (c) of the definition implies that
1)
2)
3)

Note that in order to have a Poisson process the average event-occurrence rate MUST BE CONSTANT over time; otherwise the Poisson process would be an inappropriate model. Also note that t can be interpreted as the specific “time”, “distance”, “area”, or “volume” of interest.

Example 2.6 Customers arrive to a facility according to a Poisson process with rate λ = 120 customers per hour. Suppose we begin observing the facility at some point in time.
a) What is the probability that 8 customers arrive during a 5-minute interval?
b) On average, how many customers will arrive during a 3.2-minute interval?
c) What is the probability that more than 2 customers arrive during a 1-minute interval?
d) What is the probability that 4 customers arrive during the interval that begins 3.3 minutes after we start observing and ends 6.7 minutes after we start observing?
e) On average, how many customers will arrive during the interval that begins 16 minutes after we start observing and ends 17.8 minutes after we start observing?
f) What is the probability that 7 customers arrive during the first 12.2 minutes we observe, given that 5 customers arrive during the first 8 minutes?
g) If 3 customers arrive during the first 1.2 minutes of our observation period, on average, how many customers will arrive during the first 3.7 minutes?
h) If 1 customer arrives during the first 6 seconds of our observations, what is the probability that 2 customers arrive during the interval that begins 12 seconds after we start observing and ends 30 seconds after we start observing?
i) If 5 customers arrive during the first 30 seconds of our observations, on average, how many customers will arrive during the interval that begins 1 minute after we start observing and ends 3 minutes after we start observing?
j) If 3 customers arrive during the interval that starts 1 minute after we start observing and ends 2.2 minutes after we start observing, on average, how many customers will arrive during the first 3.7 minutes?
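A sketch for the first few parts of Example 2.6; with λ = 120 per hour = 2 per minute, independent increments mean only the length of each interval matters (scipy assumed):

```python
from scipy.stats import poisson

rate = 120 / 60                            # customers per minute
print(poisson.pmf(8, rate * 5))            # a) 8 arrivals in a 5-minute interval
print(rate * 3.2)                          # b) mean arrivals in 3.2 minutes: 6.4
print(1 - poisson.cdf(2, rate * 1))        # c) more than 2 arrivals in 1 minute
print(poisson.pmf(4, rate * (6.7 - 3.3)))  # d) 4 arrivals in a 3.4-minute window
```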
Example 2.7 (Binomial approximation) In a manufacturing process where glass products are produced, defects or bubbles occur, occasionally rendering the piece undesirable for marketing. It is known that, on average, 1 in every 1000 of these items produced has one or more bubbles. What is the probability that a random sample of 8000 will yield fewer than 7 items possessing bubbles?
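A sketch for Example 2.7: X ~ bin(8000, 0.001), approximated by a Poisson with λ = np = 8 (scipy assumed):

```python
from scipy.stats import binom, poisson

n, p = 8000, 0.001
print(binom.cdf(6, n, p))        # exact Pr(X < 7)
print(poisson.cdf(6, n * p))     # Poisson approximation, lambda = 8
```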
3 CONTINUOUS RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

As stated earlier, a continuous random variable is a random variable that can take on an uncountable number of values.

Definition 3.1 The probability density function of a continuous random variable X is a nonnegative function f(x) defined ∀ real x ∋ for any set A of real numbers,

Theorem 3.1 Integral of a Density Function
The function f is a density function iff

All probability computations for a continuous random variable can be answered using the density function.

Theorem 3.2 Probability Computational Rules for Continuous Random Variables
Let X be a continuous random variable having cumulative distribution function F and probability density function f. Then
(a)
(b)
(c)
(d)
(e)

The mean or expected value of X, denoted as µ or E(X), is

The variance of X, denoted by Var(X), is given by

Example 3.1 Consider a continuous random variable X having the following density function, where c is a constant:

$$f(x) = \begin{cases} c(1 - x^2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

1. What is the value of c?
2. Construct the cumulative distribution function of X.
3. Compute Pr(0.2 < X ≤ 0.8).
4. Compute Pr(X > 0.5).
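A sketch for Example 3.1 using sympy (an assumed dependency) to solve for c and to evaluate the cdf:

```python
import sympy as sp

x, t, c = sp.symbols("x t c", positive=True)
# Solve the normalization condition: integral of c*(1 - x^2) over [0, 1] is 1.
cval = sp.solve(sp.integrate(c * (1 - t**2), (t, 0, 1)) - 1, c)[0]  # c = 3/2
F = sp.integrate(cval * (1 - t**2), (t, 0, x))                      # cdf on [0, 1]
print(cval)
print(F.subs(x, 0.8) - F.subs(x, 0.2))   # Pr(0.2 < X <= 0.8)
print(1 - F.subs(x, 0.5))                # Pr(X > 0.5)
```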
Part (d) of Theorem 3.2 states that the probability density function is the derivative of the cumulative distribution function. Although this is true, it does not provide adequate intuition as to the interpretation of the density function. For a discrete random variable, the probability mass function actually assigns probabilities to the possible values of the random variable. Theorem 3.2(b) states that the probability of any specific value for a continuous random variable is 0. The probability density function is not the probability of a specific value. It is, however, the relative likelihood (as compared to other possible values) that the random variable will be near a certain value.

Continuous random variables are typically specified in terms of the form of their probability density functions. In addition, some continuous random variables have been widely used in probability modeling. We will consider some of these more commonly used random variables, including:
1. the uniform random variable,
2. the exponential random variable,
3. the gamma random variable,
4. the Weibull random variable,
5. the normal random variable,
6. the lognormal random variable,
7. the beta random variable.

3.1 The Uniform Continuous Random Variable
Notation
Range
Probability Density Function
Cumulative Distribution Function
Parameters
Mean
Variance

Comments As its name implies, the uniform random variable is used to represent quantities that occur randomly over some interval of the real line. An observation of a U(0, 1) random variable is referred to as a random number.

Example 3.2 Verify that the equation for the cumulative distribution function of the uniform random variable is correct.

Example 3.3 The magnitude (measured in N) of a load applied to a steel beam is believed to be a U(2000, 5000) random variable. What is the probability that the load exceeds 4200 N?
3.2 The Exponential Random Variable
The random variable X that equals the distance (time) between successive counts of a Poisson process with rate λ (events per time unit, e.g. arrivals per hour, failures per day, etc.) has an exponential distribution with parameter λ.

Notation
Range
Probability Density Function
Cumulative Distribution Function
Parameters
Mean
Variance

Comments λ is called the rate of the exponential distribution.
Example 3.4 In a large computer network, user log-ons to the system can be modeled as a Poisson process with a mean of 25 log-ons per hour.

What is the probability that there are no log-ons in an interval of 6 minutes?

What is the probability that the time until the next log-on is between 2 and 3 minutes? Upon converting all units to hours,

Determine the length of the interval of time such that the probability that no log-on occurs in the interval is 0.90. The question asks for the length of time x such that Pr(X > x) = 0.90.

What is the mean time until the next log-on?

What is the standard deviation of the time until the next log-on?
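A sketch for Example 3.4 with the rate expressed in log-ons per hour (standard library only):

```python
import math

lam = 25.0                        # log-ons per hour
print(math.exp(-lam * 0.1))       # Pr(no log-on in 6 min = 0.1 hr)
t2, t3 = 2 / 60, 3 / 60
print(math.exp(-lam * t2) - math.exp(-lam * t3))  # Pr(2 min < X < 3 min)
print(-math.log(0.90) / lam * 60)  # x (in minutes) with Pr(X > x) = 0.90
print(1 / lam)                     # mean (hours); the std dev equals the mean
```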
Theorem 3.3 The Memoryless Property of the Exponential Distribution
Let X be a continuous random variable. Then X is an exponential random variable iff

Theorem 3.4 The Conditional Form of the Memoryless Property
Let X be a continuous random variable. Then X is an exponential random variable iff

Furthermore, no other continuous random variable possesses this property. There are several implications of the memoryless property of the exponential random variable.
 First, if the exponential random variable is used to model the lifetime of a device, then at every point in time until it fails, the device is as good as new (from a probabilistic standpoint).
 If the exponential random variable is used to model an arrival time, then at every point in time until the arrival occurs, it is as if we just began “waiting” for the arrival.

Example 3.5 Suppose that the life length of a component is an exponential random variable with rate 0.0001. Note that time units are hours. Determine the following.
a) What is the probability that the component lasts more than 2000 hours?
b) Given that the component lasts at least 1000 hours, what is the probability that it lasts more than 2000 hours?
Theorem 3.5 Expectation under the Memoryless Property
Let X be an exponential random variable. Then

Example 3.6 (Example 3.5 continued)
a) Given that the component lasts at least 1000 hours, what is the expected value of its life length?
b) Given that the component has survived 1000 hours, on average, how much longer will it survive?

3.3 The Normal Distribution
Notation
Range
Probability Density Function
Cumulative Distribution Function no closed-form expression
Parameters
Mean
Variance

Comments

Standard Normal Random Variable
If µ = 0 and σ = 1, then X is referred to as the standard normal random variable. The standard normal random variable is often denoted by Z. The cumulative distribution function of the standard normal random variable is denoted by
Φ(z) = Pr(Z ≤ z)

Appendix A Table I provides cumulative probabilities for a standard normal random variable. For example, assume that Z is a standard normal random variable and we want Pr(Z ≤ 1.53). Find 1.5 in the z column and 0.03 in the row; then Pr(Z ≤ 1.53) = 0.93699. The same value can be obtained in Excel: click the function icon (fx), choose Statistical, NORMSDIST(z), enter 1.53, and Excel will give you the result in the cell: =NORMSDIST(1.53) = 0.936992.

The function Φ(z) denotes a probability from Appendix A Table I. It is the cumulative distribution function of a standard normal random variable (see Figure 4-13, page 124, of the Montgomery book).

Example 3.7 (Example 4-12, Montgomery)

Some useful results concerning a normal distribution are summarized in Fig. 4-14 of the textbook. For any normal random variable,
1)
2)
3)
4)

If X ~ N(µ, σ2), then (X − µ)/σ ~ N(0, 1); this is known as the z-transformation. That is, Z is a standard normal random variable.

Suppose X is a normal random variable with mean µ and standard deviation σ. Then,
Example 3.8 One key characteristic of a certain type of drive shaft is its diameter, and the diameter is a normally distributed random variable having µ = 5 cm and σ = 0.08 cm.
a) What is the probability that the diameter of a given drive shaft is between 4.9 and 5.05 cm?
b) What diameter is exceeded by 90% of drive shafts?
c) Provide tolerances, symmetric about the mean, that capture 99% of drive shafts.
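A sketch for Example 3.8 (scipy assumed; part (b) asks for the 10th percentile, since 90% of shafts exceed it):

```python
from scipy.stats import norm

mu, sigma = 5.0, 0.08
print(norm.cdf(5.05, mu, sigma) - norm.cdf(4.9, mu, sigma))  # a)
print(norm.ppf(0.10, mu, sigma))     # b) 10th percentile
half = norm.ppf(0.995) * sigma       # c) z_{0.995} * sigma
print(mu - half, mu + half)          # symmetric 99% tolerance limits
```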
Example 3.9 The diameter of a shaft in an optical storage drive is normally distributed with mean 0.2508 inch and standard deviation 0.0005 inch. The specifications on the shaft are ±0.0015 inch. What proportion of shafts conforms to specifications?

3.3.1 Normal Approximation to the Binomial and Poisson Distributions

Binomial Approximation
If X is a binomial random variable with parameters n and p, then
$$Z = \frac{X - np}{\sqrt{np(1-p)}}$$
is approximately a standard normal random variable. To approximate a binomial probability with a normal distribution, a correction (continuity) factor is applied.
The approximation is good for np > 5 and n(1 − p) > 5.

Poisson Approximation
If X is a Poisson random variable with E(X) = λ and V(X) = λ, then
$$Z = \frac{X - \lambda}{\sqrt{\lambda}}$$
is approximately a standard normal random variable. The approximation is good for λ > 5.

Example 3.10 The manufacturing of semiconductor chips produces 2% defective chips. Assume that chips are independent and that a lot contains 1000 chips.
a) Approximate the probability that more than 25 chips are defective.
b) Approximate the probability that between 20 and 30 chips are defective.
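A sketch for Example 3.10 using the normal approximation with a continuity correction (scipy assumed; part (b) is read here as 20 ≤ X ≤ 30):

```python
from math import sqrt
from scipy.stats import norm

n, p = 1000, 0.02
mu, sd = n * p, sqrt(n * p * (1 - p))                    # mean 20, sd ~ 4.43
print(1 - norm.cdf((25.5 - mu) / sd))                    # a) Pr(X > 25)
print(norm.cdf((30.5 - mu) / sd) - norm.cdf((19.5 - mu) / sd))  # b)
```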
3.4 Lognormal Distribution
Variables in a system sometimes follow an exponential relationship, where the exponent is a random variable, say W: X = exp(W). If W has a normal distribution, then the distribution of X is called a lognormal distribution.

Notation
Range
Probability Density Function
Cumulative Distribution Function no closed-form expression
Parameters

Comments If Y ~ N(µ, σ2) and X = eY, then X ~ LN(µ, σ2). The lognormal random variable is often used to represent elapsed times, especially equipment repair times, and material properties.

Mean
Variance
Example 3.11 A wood floor system can be evaluated in one way by measuring its modulus of elasticity (MOE), measured in 10^6 psi. One particular type of system is such that its MOE is a lognormal random variable having µ = 0.375 and σ = 0.25.
1. What is the probability that a system’s MOE is less than 2?
2. Find the value of MOE that is exceeded by only 1% of the systems.
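A sketch for Example 3.11, using the fact that ln(MOE) ~ N(0.375, 0.25²) (scipy assumed):

```python
from math import log, exp
from scipy.stats import norm

mu, sigma = 0.375, 0.25
print(norm.cdf((log(2) - mu) / sigma))        # 1. Pr(MOE < 2)
print(exp(mu + sigma * norm.ppf(0.99)))       # 2. 99th percentile of MOE
```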
3.5 The Weibull Distribution
The Weibull distribution is often used to model the time until failure of many different physical systems. It is used in reliability for time-dependent failure models, where the failure distribution may be used to model both increasing and decreasing failure rates.

Notation
Range
Probability Density Function
Cumulative Distribution Function
Parameters
Mean
Variance

Comments If β = 1, then X ~ expon(1/η). The Weibull random variable is most often used to represent elapsed time, especially time to failure of a unit of equipment.

Example 3.12 The time to failure of a power supply is a Weibull random variable having β = 2.0 and η = 1000.0 hours. The manufacturer sells a warranty such that only 5% of the power supplies fail before the warranty expires. What is the time period of the warranty?
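A sketch for Example 3.12: setting the Weibull cdf F(t) = 1 − exp(−(t/η)^β) equal to 0.05 and solving for t:

```python
from math import log

beta, eta = 2.0, 1000.0
# F(t) = 0.05  =>  t = eta * (-ln(0.95))^(1/beta)
print(eta * (-log(0.95)) ** (1 / beta))   # warranty period, ~226.5 hours
```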
4 JOINT PROBABILITY DISTRIBUTIONS

Up to this point we have considered issues related to a single random variable. Now we are going to consider situations in which we have two or more random variables that we are interested in studying.

4.1 Two or More Discrete Random Variables
Definition 4.1 The function f(x, y) is a joint probability distribution or probability mass function of the discrete random variables X and Y if
1.
2.
3.

Example 4.1 Let X denote the number of times a certain numerical control machine will malfunction: 1, 2 or 3 times on a given day. Let Y denote the number of times a technician is called on an emergency call. Their joint probability distribution is given as

f(x, y)          x = 1    x = 2    x = 3
   y = 1         0.05     0.05     0.10
   y = 2         0.05     0.10     0.35
   y = 3         0.00     0.20     0.10

a) Find Pr(X < 3, Y = 1)
b) Find the probability that the technician is called at least 2 times and the machine fails no more than 1 time.
c) Find Pr(X > Y)
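A sketch for Example 4.1 that enumerates the joint pmf directly (the notes leave the answers blank):

```python
pmf = {(1, 1): 0.05, (2, 1): 0.05, (3, 1): 0.10,
       (1, 2): 0.05, (2, 2): 0.10, (3, 2): 0.35,
       (1, 3): 0.00, (2, 3): 0.20, (3, 3): 0.10}

print(sum(p for (x, y), p in pmf.items() if x < 3 and y == 1))  # a) 0.10
print(sum(p for (x, y), p in pmf.items() if y >= 2 and x <= 1)) # b) 0.05
print(sum(p for (x, y), p in pmf.items() if x > y))             # c) 0.50
```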
When studying a joint probability distribution, we are also interested in the probability distribution of each variable individually, which is referred to as the marginal probability distribution.

Theorem 4.1 Let X and Y be discrete random variables having joint probability mass function f(x, y). Let x1, x2, … denote the possible values of X, and let y1, y2, … denote the possible values of Y. Let fx(x) denote the marginal probability mass function of X, and let fy(y) denote the marginal probability mass function of Y. Then,
Example 4.2 Let X and Y be discrete random variables such that
f(1, 1) = 1/9    f(1, 2) = 1/6    f(1, 3) = 1/8
f(2, 1) = 1/18   f(2, 2) = 1/9    f(2, 3) = 1/9
f(3, 1) = 1/9    f(3, 2) = 1/9    f(3, 3) = 1/6
Find the marginal probability mass functions of X and Y.
Definition 4.2 The function f(x, y) is a joint probability density function of the continuous random variables X and Y if
1.
2.
3.

Example 4.3 A candy company distributes boxes of chocolates with a mixture of creams, toffees, and nuts coated in both light and dark chocolate. For a randomly selected box, let X and Y, respectively, be the proportions of the light and dark chocolates that are creams, and suppose that the joint density function is

$$f(x, y) = \begin{cases} \dfrac{2}{5}(2x + 3y) & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$

a) Verify condition 2.
c) Find Pr[(X, Y) ∈ A], where A = {(x, y) | 0 ≤ x ≤ 1/2, 1/4 ≤ y ≤ 1/2}.

Theorem 4.2 Marginal Probability Density Function
Let X and Y be continuous random variables having joint probability density function f(x, y). Let fx(x) denote the marginal probability density function of X, and let fy(y) denote the marginal probability density function of Y. Then,
Example 4.4 Let X and Y be continuous random variables such that
f(x, y) = 0.75 e^(−0.3y)
Find the marginal probability density functions of X and Y.

Theorem 4.3 The Law of the Unconscious Statistician
Let X and Y be discrete (continuous) random variables having joint probability mass (density) function f(x, y). Let x1, x2, … denote the possible values of X, and let y1, y2, … denote the possible values of Y. Let g(X, Y) be a real-valued function. Then
Example 4.5 Suppose X and Y are discrete random variables having joint probability mass function f(x, y). Let x1, x2, … denote the possible values of X, and let y1, y2, … denote the possible values of Y. What is E(X + Y)?
Theorem 4.4 Expectation of a Sum of Random Variables
Let X1, X2, …, Xn be random variables, and let a1, a2, …, an be constants. Then

Example 4.6 What is E(3X − 2Y + 4)?

Theorem 4.5 Independent Discrete Random Variables
Let X and Y be random variables having joint probability mass function f(x, y). Let fx(x) denote the marginal probability mass function of X, and let fy(y) denote the marginal probability mass function of Y. Then X and Y are said to be independent iff
Theorem 4.6 Independent Continuous Random Variables
Let X and Y be random variables having joint probability density function f(x, y). Let fx(x) denote the marginal probability density function of X, and let fy(y) denote the marginal probability density function of Y. Then X and Y are said to be independent iff

Example 4.6 Consider Example 4.2. Are X and Y independent?

Example 4.7 Consider Example 4.4. Are X and Y independent?

Definition 4.3 Let X and Y be random variables. The covariance of X and Y is denoted by Cov(X, Y) and given by
A positive covariance indicates that X tends to increase (decrease) as Y increases (decreases). A negative covariance indicates that X tends to decrease (increase) as Y increases (decreases).

Example 4.8 (Example 4.2 continued) Find the covariance of X and Y.
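A sketch of the covariance computation pattern Cov(X, Y) = E(XY) − E(X)E(Y); for a concrete illustration it reuses the joint pmf of Example 4.1, leaving Example 4.8 itself to the reader:

```python
# Joint pmf from Example 4.1, keyed by (x, y).
pmf = {(1, 1): 0.05, (2, 1): 0.05, (3, 1): 0.10,
       (1, 2): 0.05, (2, 2): 0.10, (3, 2): 0.35,
       (1, 3): 0.00, (2, 3): 0.20, (3, 3): 0.10}

EX  = sum(x * p for (x, y), p in pmf.items())
EY  = sum(y * p for (x, y), p in pmf.items())
EXY = sum(x * y * p for (x, y), p in pmf.items())
print(EXY - EX * EY)   # covariance of X and Y
```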
Theorem 4.7 Covariance of Independent Random Variables
Let X and Y be random variables. If X and Y are independent, then Cov(X, Y) = 0.

Theorem 4.8 Variance of the Sum of Random Variables
Let X1, X2, …, XN be random variables. Then

Theorem 4.9 Variance of the Sum of Independent Random Variables
Let X1, X2, …, XN be independent random variables. Then

Definition 4.4 Let X and Y be two random variables. The correlation between X and Y is denoted by ρxy and given by

Note that correlation and covariance have the same interpretation regarding the relationship between the two variables. However, correlation has no units and is restricted to the range [−1, 1]. Therefore, the magnitude of the correlation provides some idea of the strength of the relationship between the two random variables.
5 RANDOM SAMPLES, STATISTICS AND THE CENTRAL LIMIT THEOREM

Definition 5.1 Independent random variables X1, X2, …, Xn are called a random sample. A randomly selected sample means that if a sample of n objects is selected, each subset of size n is equally likely to be selected. If the number of objects in the population is much larger than n, the random variables X1, X2, …, Xn that represent the observations from the sample can be shown to be approximately independent random variables with the same distribution.

Definition 5.2 A statistic is a function of the random variables in a random sample. Given the data, we calculate statistics all the time, such as the sample mean X̄ and the sample standard deviation S. Each statistic has a distribution, and it is the distribution that determines how well it estimates a quantity such as µ.

We begin our discussion by focusing on a single random variable, X. To perform any meaningful statistical analysis regarding X, we must have data. Let X be some random variable of interest. A random sample on X consists of n observations on X: x1, x2, …, xn. We assume that these observations are independent and identically distributed. The value of n is referred to as the sample size.

Definition 5.3 Descriptive statistics refers to the process of collecting data on a random variable and computing meaningful quantities (statistics) that characterize the underlying probability distribution of the random variable.

There are three points of interest regarding this definition.
• Performing any type of statistical analysis requires that we collect data on one or more random variables.
• A statistic is nothing more than a numerical quantity computed using collected data.
• If we knew the probability distribution which governed the random variable of interest, collecting data would be unnecessary.

Types of Descriptive Statistics
1. measures of central tendency
• sample mean (sample average)
• sample median
• sample mode (discrete random variables only)
2. measures of variability
• sample range
• sample variance
• sample standard deviation
• sample quartiles

Microsoft Excel has a Descriptive Statistics tool within its Data Analysis ToolPak.

Computing the Sample Mean
• Most of your calculators have a built-in method for entering data and computing the sample mean.
• Note that the sample mean is a point estimate of the true mean of X. In other words,

Computing the Sample Median
To compute the sample median, we first rank the data in ascending order and re-number it: x(1), x(2), …, x(n). The sample median corresponds to the value that has 50% of the data above it and 50% of the data below it.

Computing the Sample Mode
The sample mode is the most frequently occurring value in the sample. It is typically only of interest in sample data from a discrete random variable, because sample data on a continuous random variable often does not have any repeated values.
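A minimal sketch of the statistics named above, using Python's standard library on a made-up sample:

```python
from statistics import mean, median, mode, variance, stdev

data = [2, 3, 3, 5, 7, 8, 9]          # hypothetical sample
print(mean(data), median(data), mode(data))
print(max(data) - min(data))          # sample range
print(variance(data), stdev(data))    # both divide by n - 1 (see below)
```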
Computing the Sample Range

Computing the Sample Variance
• Why do we divide by n − 1? We divide by n − 1 because we have n − 1 degrees of freedom. This refers to the fact that if we know the sample mean and n − 1 of the data values, we can compute the remaining data point.
• Note that the sample variance is a point estimate of the true variance. In other words,

Computing the Sample Standard Deviation
• Note that the sample standard deviation is a point estimate of the true standard deviation.

Theorem 5.1 If X1, X2, …, Xn is a random sample of size n taken from a population with mean µ and variance σ2, and if X̄ is the sample mean, the limiting form of the distribution of
as n → ∞, is the standard normal distribution.

5.1 Populations and Random Samples
The field of statistical inference consists of those methods used to draw conclusions about a population. These methods utilize the information contained in a random sample of observations from the population. Statistical inference may be divided into two major areas:
• parameter estimation
• hypothesis testing
Both of these areas require a random sample of observations from one or more populations; therefore, we will begin our discussion by addressing the concepts of random sampling.

Definition 5.4 A population consists of the totality of the observations with which we are concerned.
• We almost always use a random variable/probability distribution to model the behavior of a population.

Definition 5.5 The number of observations in the population is called the size of the population.
• Populations may be finite or infinite. However, we can typically assume the population is infinite.
• In some cases, a population is conceptual. For example, the population of items to be manufactured is a conceptual population.

Definition 5.6 A sample is a subset of observations selected from a population.
• We model these observations using random variables.
• If our inferences are to be statistically valid, then the sample must be representative of the entire population. In other words, we want to ensure that we take a random sample.
Definition 5.7 The random variables X1, X2, …, Xn are a random sample of size n if X1, X2, …, Xn are independent and identically distributed.
• After the data has been collected, the numerical values of the observations are denoted as x1, x2, …, xn.
• The next step in statistical inference is to use the collected data to compute one or more statistics of interest.

5.2 Point Estimates
Definition 5.8 A statistic, Θ̂, is any function of the observations in a random sample.
• In parameter estimation, statistics are used to estimate quantities of interest.
• The measures of central tendency and variability we considered in “Descriptive Statistics” are all statistics.

Definition 5.9 A point estimate of some population parameter θ is a single numerical value θ̂ of a statistic Θ̂.
• Estimation problems occur frequently in engineering. The quantities that we will focus on are:
• the mean µ of a population
• the standard deviation σ of a population
• the proportion p of items in a population that belong to a class of interest – p is the probability of success for a Bernoulli trial
The point estimates that we use are:
•
•
•
5.3 Sampling Distributions
A statistic is a function of the observations in the random sample. These observations are random variables; therefore, the statistic itself is a random variable. All random variables have probability distributions.

Definition 5.10 The probability distribution of a statistic is called a sampling distribution.
• The sampling distribution of a statistic depends on the probability distribution which governs the entire population, the size of the random sample, and the method of sample selection.

Theorem 5.3 The Sampling Distribution of the Mean
If X1, X2, …, Xn are IID N(µ, σ2) random variables, then the sample mean is a normal random variable having mean and variance .

Thus, if we are sampling from a normal population, then the sampling distribution of the mean is normal. But what if we are not sampling from a normal population?

Theorem 5.4 The Central Limit Theorem
If X1, X2, …, Xn is a random sample of size n taken from a population with mean µ and variance σ2, then as n → ∞, is a standard normal random variable.
• The quality of the normal approximation depends on the true probability distribution governing the population and the sample size.
• For most cases of practical interest, n ≥ 30 ensures a relatively good approximation.
• If n < 30, then the underlying probability distribution must not be severely non-normal.

Example 5.1 A plastics company produces cylindrical tubes for various industrial applications. One of their production processes is such that the diameter of a tube is normally distributed with a mean of 1 inch and a standard deviation of 0.02 inch.
(a) What is the probability that a single tube has a diameter of more than 1.015 inches?
X = diameter of a tube (measured in inches) ~ N( )

(b) What is the probability that the average diameter of five tubes is more than 1.015 inches?
n =
X̄ = average diameter ~ N( )

(c) What is the probability that the average diameter of 25 tubes is more than 1.015 inches?
n =
X̄ = average diameter ~ N( )
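A sketch for Example 5.1: since X ~ N(1, 0.02²), the sample mean of n tubes is N(1, 0.02²/n) (scipy assumed):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma = 1.0, 0.02
for n in (1, 5, 25):
    # Pr(sample mean of n tubes exceeds 1.015 inches)
    print(n, 1 - norm.cdf(1.015, mu, sigma / sqrt(n)))
```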
Example 5.2 The life length of an electronic component, T, is exponentially distributed with a mean of 10,000 hours.
(a) What is the probability that a single component lasts more than 7500 hours?
(b) What is the probability that the average life length for 200 components is more than 9500 hours?
E(T) = hours
σT = hours
Note that .

(c) What is the probability that the average life length for 10 components is more than 9500 hours?
n is too small to use the CLT approximation. Note that T̄ = S10/10.
If we had tried to use the CLT:

Now consider the case in which we are interested in studying two independent populations. Let the first population have mean µ1 and standard deviation σ1, and let the second population have mean µ2 and standard deviation σ2. If we are interested in comparing the two means, then the obvious point estimate of interest is
$$\widehat{\mu_1 - \mu_2} = \bar{X}_1 - \bar{X}_2$$
What is the sampling distribution of this statistic?
Theorem 5.4 The Sampling Distribution of the Difference in Two Means
If we have two independent populations with means µ1 and µ2 and standard deviations σ1 and σ2, and if a random sample of size n1 is taken from the first population and a random sample of size n2 is taken from the second population, then the sampling distribution of is standard normal as n1 and n2 → ∞. If the two populations are normal, then the sampling distribution of Z is exactly standard normal.
• Again, the approximation is relatively accurate if n1 ≥ 30 and n2 ≥ 30.

Example 5.3 The life length of batteries produced by Battery Manufacturer A is a continuous random variable having a mean of 1500 hours and a standard deviation of 100 hours. The life length of batteries produced by Battery Manufacturer B is a continuous random variable having a mean of 1400 hours and a standard deviation of 200 hours.
(a) Suppose 50 batteries of each type are tested. What is the probability that Battery Manufacturer A’s sample average life length exceeds Battery Manufacturer B’s by more than 75 hours?
(b) How would your answer change if only 12 batteries of each type were tested? There is not enough information to answer the question. If we assume normality, then we could proceed.
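A sketch for Example 5.3(a): the difference in sample means has mean 1500 − 1400 = 100 and standard error √(100²/50 + 200²/50), with approximate normality via the CLT (scipy assumed):

```python
from math import sqrt
from scipy.stats import norm

se = sqrt(100**2 / 50 + 200**2 / 50)     # standard error of Xbar_A - Xbar_B
print(1 - norm.cdf((75 - 100) / se))     # Pr(difference exceeds 75 hours)
```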
5.4 Confidence Intervals
A point estimate provides only a single number for drawing conclusions about a parameter, and if another random sample were selected, this point estimate would almost certainly be different. In fact, this difference could be drastic. For this reason, a point estimate typically does not supply adequate information to an engineer. In such cases, it may be possible and useful to construct a confidence interval which expresses the degree of uncertainty associated with a point estimate.

Definition 5.11 If θ is the parameter of interest, then the point estimate and sampling distribution of θ can be used to identify a 100(1 − α)% confidence interval on θ. This interval is of the form:

L and U are called the lower-confidence limit and upper-confidence limit. If L and U are constructed properly, then . The quantity (1 − α) is called the confidence coefficient.
• The confidence coefficient is a measure of the accuracy of the confidence interval. For example, if a 90% confidence interval is constructed, then the probability that the true value of θ is contained in the interval is 0.9.
• The length of the confidence interval is a measure of the precision of the point estimate. A general rule of thumb is that increasing the sample size improves the precision of a point estimate.

Confidence intervals are closely related to hypothesis testing. Therefore, we will address confidence intervals within the context of hypothesis testing.
6 FORMULATING STATISTICAL HYPOTHESES

For many engineering problems, a decision must be made as to whether a particular statement about a population parameter is true or false. In other words, we must either accept the statement as being true or reject the statement as being false.

Example 6.1 Consider the following statements regarding the population of engineering students at Philadelphia University.
1. The average GPA is 3.0.
2. The standard deviation of age is 5 years.
3. 30% are afraid to fly.
4. The average age of mothers is the same as the average age of fathers.

Definition 6.1 A statistical hypothesis is a statement about the parameters of one or more populations.
• It is worthwhile to note that a statistical hypothesis is a statement about the underlying probability distributions, not the sample data.

Example 6.2 (Ex. 6.1 continued) Convert each of the statements into a statistical hypothesis.
1.
2.
3.
4.

To perform a test of hypotheses, we must have a contradictory statement about the parameters of interest.

Example 6.3 Consider the following contradictory statements.
1. No, it’s more than that.
2. No, it’s not.
3. No, it’s less than that.
4. No, fathers are older.

The result of our original statement and our contradictory statement is a set of two hypotheses.

Example 6.4 (Ex. 6.1 continued) Combine the two statements for each of the examples.
1.
2.
3.
4.

Our original statement is referred to as the null hypothesis (H0).
• The value specified in the null hypothesis may be a previously established value (in which case we are trying to detect changes to that value), a theoretical value (in which case we are trying to verify the theory), or a design specification (in which case we are trying to determine if the specification has been met).

The contradictory statement (H1) is referred to as the alternative hypothesis.
• Note that an alternative hypothesis can be one-sided (1, 3, 4) or two-sided (2).
• The decision as to whether the alternative hypothesis should be one-sided or two-sided depends on the problem of interest.

Type I Error: rejecting the null hypothesis H0 when it is true.
For example, suppose the true mean in Example 6.1 is 3.0. However, for the randomly selected sample we could observe that the test statistic x̄ falls into the critical region. Therefore, we would reject the null hypothesis in favor of the alternative hypothesis H1.

Type II Error: failing to reject the null hypothesis when it is false.

6.1 Performing a Hypothesis Test
Definition 6.2 A procedure leading to a decision about a particular null and alternative hypothesis is called a hypothesis test.
• Hypothesis testing involves the use of sample data on the population(s) of interest.
• If the sample data is consistent with a hypothesis, then we “accept” that hypothesis and conclude that the corresponding statement about the population is true.
• We “reject” the other hypothesis and conclude that the corresponding statement is false. However, the truth or falsity of the statements can never be known with certainty, so we need to define our procedure so that we limit the probability of making an erroneous decision.
• The burden of proof is placed upon the alternative hypothesis.

Basic Hypothesis Testing Procedure
A random sample is collected on the population(s) of interest, a test statistic is computed based on the sample data, and the test statistic is used to make the decision to either accept (some people say “fail to reject”) or reject the null hypothesis.

Example 6.5 A manufactured product is used in such a way that its most important dimension is its width. Let X denote the width of a manufactured product. Suppose historical data suggest that X is a normal random variable having σ = 4 cm. However, the mean can change due to fluctuations in the manufacturing process. Therefore, we wish to perform the following hypothesis test.
H0:
H1:
The following procedure has been proposed: inspect a random sample of 25 products and measure the width of each product. If the sample mean is less than 188 cm or more than 192 cm, reject H0.

For the proposed procedure, identify the following:
(a) sample size
(b) test statistic
(c) critical region
(d) acceptance region

Is the procedure defined in Ex. 6.5 a good procedure? Since we are only taking a random sample, we cannot guarantee that the results of the hypothesis test will lead us to making the correct decision. Therefore, the question “Is this a good procedure?” can be broken down into two additional questions.
1. If the null hypothesis is true, what is the probability that we accept H0?
2. If the null hypothesis is not true, what is the probability that we accept H0?

Example 6.6 (Ex. 6.5 continued) If the null hypothesis is true, what is the probability that we accept H0?
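A sketch for Examples 6.6 and 6.7: under a true mean µ, the sample mean is N(µ, (4/5)²), and H0 is accepted when 188 ≤ X̄ ≤ 192 (scipy assumed):

```python
from math import sqrt
from scipy.stats import norm

sigma, n = 4.0, 25
se = sigma / sqrt(n)                 # standard error of the sample mean: 0.8

def accept_prob(mu):
    # probability the sample mean falls in the acceptance region [188, 192]
    return norm.cdf(192, mu, se) - norm.cdf(188, mu, se)

print(accept_prob(190))       # ~0.9876 under H0, so alpha ~ 0.0124
print(accept_prob(189))       # beta when mu = 189
print(accept_prob(193))       # beta when mu = 193
```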
P(accept H0 | H0 true) = P(188 ≤ X̄ ≤ 192 | µ = 190)
 = P((188 − 190)/0.8 ≤ Z ≤ (192 − 190)/0.8)
 = P(−2.5 ≤ Z ≤ 2.5) = 0.9876

(Note the assumptions: X is normal with σ = 4 and n = 25, so X̄ is normal with standard deviation σ/√n = 0.8.)

Therefore, if the null hypothesis is true, there is a 98.76% chance that we will make the correct decision. However, that also means there is a 1.24% chance that we will make the incorrect decision (reject H0 when H0 is true).
• Such a mistake is called a Type I error, or a false positive.
• α = P(Type I error) = level of significance
• In our example, α = 0.0124. When constructing a hypothesis test, we get to specify α.

If the null hypothesis is not true (i.e., the alternative hypothesis is true), then accepting H0 would be a mistake.
• Accepting H0 when H0 is false is called a Type II error, or a false negative.
• β = P(Type II error)
• Unfortunately, we cannot answer question 2 (find a value for β) in general. Since the alternative hypothesis is µ ≠ 190 cm, there are an uncountable number of situations in which the alternative hypothesis is true.
• We must identify specific situations of interest and analyze each one individually.

Example 6.7 (Ex. 6.5 continued) Find the probability of a Type II error when µ = 189 cm and µ = 193 cm.

For µ = 189 cm:
β = P(188 ≤ X̄ ≤ 192 | µ = 189) = P(−1.25 ≤ Z ≤ 3.75) = 0.8943

For µ = 193 cm:

β = P(188 ≤ X̄ ≤ 192 | µ = 193) = P(−6.25 ≤ Z ≤ −1.25) = 0.1056

Note that as µ moves away from the hypothesized value (190 cm), β decreases.

If we experiment with other sample sizes and critical/acceptance regions, we will see that the values of α and β can change significantly. However, there are some general "truths" for hypothesis testing.
1. We can explicitly control α (given that the underlying assumptions are true).
2. Type I and Type II errors are inversely related.
3. Increasing the sample size is the only way to simultaneously reduce α and β.
4. We can only control β for one specific situation.

Since we can explicitly control α, the probability of a Type I error, rejecting H0 is a strong conclusion. However, we can only control Type II errors in a very limited fashion. Therefore, accepting H0 is a weak conclusion. In fact, many statisticians use the terminology "fail to reject H0" as opposed to "accept H0."
• Since "reject H0" is a strong conclusion, we should put the statement about which it is important to make a strong conclusion in the alternative hypothesis.

Example 6.8 How would the procedure change if we wished to perform the following hypothesis test?
H0: µ ≥ 190 cm
H1: µ < 190 cm

Proposed hypothesis testing procedure: inspect a random sample of 25 observations on the width of a product. If the sample mean is less than 188 cm, reject H0.

6.1.1 Generic Hypothesis Testing Procedure

All hypothesis tests have a common procedure. The textbook identifies eight steps in this procedure.
1. From the problem context and assumptions, identify the parameter of interest.
2. State the null hypothesis, H0.
3. Specify an appropriate alternative hypothesis, H1.
4. Choose a significance level α.
5. State an appropriate test statistic.
6. State the critical region for that statistic.
7. Collect a random sample of observations on the random variable (or from the population) of interest, and compute the test statistic.
8. Compare the test statistic value to the critical region and decide whether or not to reject H0.

6.2 Performing Hypothesis Tests on µ when σ is Known

In this section, we consider making inferences about the mean µ of a single population when the population standard deviation σ is known.
• We will assume that a random sample X1, X2, … , Xn has been taken from the population.
• We will also assume that either the population is normal or the conditions of the Central Limit Theorem apply.

Suppose we wish to perform the following hypothesis test.
H0: µ = µ0
H1: µ ≠ µ0
It is somewhat obvious that inferences regarding µ should be based on the value of the sample mean. However, it is usually more convenient to standardize the sample mean. Using what we know about the sampling distribution of the mean, it is reasonable to conclude that the test statistic will be

Z0 = (X̄ − µ0) / (σ/√n)

If the null hypothesis is true, then the test statistic is an observation on a standard normal random variable. Therefore, we only reject the null hypothesis if the value of Z0 is unusual for an observation on a standard normal random variable. Specifically, we reject H0 if

Z0 > Zα/2 or Z0 < −Zα/2

where α is the specified level of significance. The acceptance region is therefore

−Zα/2 ≤ Z0 ≤ Zα/2

Obviously, the acceptance and critical regions can be converted to expressions in terms of the sample mean: reject H0 if X̄ > a or X̄ < b, where

a = µ0 + Zα/2 (σ/√n) and b = µ0 − Zα/2 (σ/√n)
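As a sketch of this recipe in code (not from the notes; the function names and the use of scipy are our own), the two-sided test and the equivalent thresholds on the X̄ scale look like this:

from scipy.stats import norm

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    # two-sided z-test on mu with sigma known; returns (z0, reject H0?)
    z0 = (xbar - mu0) / (sigma / n**0.5)
    return z0, abs(z0) > norm.ppf(1 - alpha / 2)

def xbar_limits(mu0, sigma, n, alpha=0.05):
    # equivalent acceptance limits (b, a) on the X-bar scale
    half = norm.ppf(1 - alpha / 2) * sigma / n**0.5
    return mu0 - half, mu0 + half

print(z_test_two_sided(3.18, 3.0, 0.5, 25))    # previews Ex. 6.9: (1.8, False)
print(xbar_limits(190, 4, 25, alpha=0.0124))   # ~(188, 192), as in Ex. 6.5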
Example 6.9 Let X denote the GPA of an engineering student at Philadelphia University. It is widely known that, for this population, σ = 0.5. The population mean is not widely known; however, it is commonly believed that the average GPA is 3.0. We wish to test this hypothesis using a sample of size 25 and a level of significance of 0.05.

(a) Identify the null and alternative hypotheses.
H0: µ = 3.0
H1: µ ≠ 3.0

(b) List any required assumptions.
GPA is normally distributed; σ = 0.5 is known.

(c) Identify the test statistic and the critical region.
Z0 = (X̄ − 3.0) / (0.5/√25)
Reject H0 if Z0 > Z0.025 = 1.96 or Z0 < −1.96.

(d) Suppose 25 students are sampled and the sample average GPA is 3.18. State and interpret the conclusion of the test.
Z0 = (3.18 − 3.0) / (0.5/√25) = 1.8
Since −1.96 ≤ 1.8 ≤ 1.96, we fail to reject H0; the data do not provide significant evidence that the mean GPA differs from 3.0.

(e) What is the probability of a Type I error for this test?
α = 0.05

(f) How would the results change if we had used α = 0.10?
The critical region changes: we now reject H0 if Z0 > Z0.05 = 1.645 or Z0 < −1.645. Since Z0 = 1.8 > 1.645, we would reject H0 at the 0.10 level.

We may also modify this procedure if the test is one-sided. This modification only requires a change in the critical/acceptance regions. If the alternative hypothesis is

H1: µ > µ0

then a negative value of Z0 would not indicate a need to reject H0. Therefore, we only reject H0 if

Z0 > Zα

Likewise, if the alternative hypothesis is H1: µ < µ0, we only reject H0 if Z0 < −Zα.

Example 6.10 The Glass Bottle Company (GBC) manufactures brown glass beverage containers that are sold to breweries. One of the key characteristics of these bottles is their volume. GBC knows that the standard deviation of volume is 0.08 oz. They wish to ensure that the mean volume is not more than 12.2 oz, using a sample size of 30 and a level of significance of 0.01.

(a) Identify the null and alternative hypotheses.
H0: µ ≤ 12.2
H1: µ > 12.2
(b) Identify the test statistic and the critical region.
Z0 = (X̄ − 12.2) / (0.08/√30)
Reject H0 if Z0 > Z0.01 = 2.3263.

(c) Suppose 30 bottles are measured and the sample mean is 12.23. State and interpret the conclusion of the test.
Z0 = (12.23 − 12.2) / (0.08/√30) = 2.05
Since 2.05 ≤ 2.3263, we fail to reject H0; there is not significant evidence that the mean volume exceeds 12.2 oz.

6.2.1 Computing P-Values

We have already seen that the choice of the level of significance can impact the conclusions derived from a test of hypotheses. As a result, we may be interested in answering the question: how close did we come to making the opposite conclusion? We answer this question using an equivalent decision approach that can be used as an alternative to the critical/acceptance regions. This approach is called the P-value approach.

Definition 6.3 The P-value for a hypothesis test is the smallest level of significance that would lead to rejection of the null hypothesis.

How we compute the P-value depends on the form of the alternative hypothesis:
for H1: µ ≠ µ0, P = 2P(Z > |Z0|);
for H1: µ > µ0, P = P(Z > Z0);
for H1: µ < µ0, P = P(Z < Z0).

We reject H0 if P ≤ α.
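These P-value formulas are easy to script. A sketch (our own helper, not from the notes; scipy's norm.sf(x) is P(Z > x)), whose outputs match the two examples that follow:

from scipy.stats import norm

def p_value(z0, alternative):
    # P-value for a z-test; alternative is '!=', '>', or '<'
    if alternative == '!=':
        return 2 * norm.sf(abs(z0))   # 2 P(Z > |z0|)
    if alternative == '>':
        return norm.sf(z0)            # P(Z > z0)
    return norm.cdf(z0)               # P(Z < z0)

print(p_value(1.8, '!='))   # Ex. 6.11 (GPA):  ~0.0719
print(p_value(2.05, '>'))   # Ex. 6.12 (GBC):  ~0.020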
Example 6.11 (Ex. 6.9 continued) Compute the P-value for the test.
P = 2P(Z > |Z0|) = 2P(Z > 1.8) = 2(0.0359) = 0.0719
Note that when α = 0.05, P > α, so we fail to reject H0. But when α = 0.10, P < α, so we reject H0.

Example 6.12 (Ex. 6.10 continued) Compute the P-value for the test.
P = P(Z > Z0) = P(Z > 2.05) ≈ 0.02
Since α = 0.01, P > α, so we fail to reject H0.

6.2.2 Type II Error

In hypothesis testing, we get to specify the probability of a Type I error (α). However, the probability of a Type II error (β) depends on the choice of sample size (n). Consider first the case in which the alternative hypothesis is H1: µ ≠ µ0. Before we can proceed, we must be more specific about what "H0 is false" means. We will accomplish this by saying

µ = µ0 + δ

where δ ≠ 0.
β = P(−Zα/2 ≤ (X̄ − µ0)/(σ/√n) ≤ Zα/2 | µ = µ0 + δ)
 = P(µ0 − Zα/2(σ/√n) ≤ X̄ ≤ µ0 + Zα/2(σ/√n) | µ = µ0 + δ)
 = P([µ0 − Zα/2(σ/√n) − (µ0 + δ)]/(σ/√n) ≤ Z ≤ [µ0 + Zα/2(σ/√n) − (µ0 + δ)]/(σ/√n))
 = P(−Zα/2 − δ√n/σ ≤ Z ≤ Zα/2 − δ√n/σ)

If the alternative hypothesis is H1: µ > µ0, then β = P(Z ≤ Zα − δ√n/σ). If the alternative hypothesis is H1: µ < µ0, then β = P(Z ≥ −Zα − δ√n/σ).

Example 6.13 (Ex. 6.9 continued) Let X denote the GPA of an engineering student at Philadelphia University. It is widely known that, for this population, σ = 0.5. The population mean is not widely known; however, it is commonly believed that the average GPA is 3.0. We wish to test this hypothesis using a sample of size 25 and a level of significance of 0.05. In Example 6.9, we formulated this hypothesis test as
H0: µ = 3.0
H1: µ ≠ 3.0
The corresponding test statistic and critical region are given by

Z0 = (X̄ − 3.0) / (0.5/√25)

Reject H0 if Z0 > 1.96 or Z0 < −1.96.

(a) If µ = 3.2, what is the Type II error probability for this test?
δ = µ − µ0 = 0.2, so δ√n/σ = 0.2(5)/0.5 = 2
β = P(−1.96 − 2 ≤ Z ≤ 1.96 − 2) = P(−3.96 ≤ Z ≤ −0.04) = 0.4840

(b) If µ = 2.68, what is the Type II error probability for this test?
δ = µ − µ0 = −0.32, so δ√n/σ = −3.2
β = P(−1.96 + 3.2 ≤ Z ≤ 1.96 + 3.2) = P(1.24 ≤ Z ≤ 5.16) = 0.1075

(c) If µ = 2.68, what is the power of the test?
power = 1 − β = 1 − 0.1075 = 0.8925
(d) If µ = 3.32, what is the power of the test?
By symmetry with part (c) (δ = +0.32), power = 0.8925.

Example 6.14 (Ex. 6.10 continued) The Glass Bottle Company (GBC) manufactures brown glass beverage containers that are sold to breweries. One of the key characteristics of these bottles is their volume. GBC knows that the standard deviation of volume is 0.08 oz. They wish to ensure that the mean volume is not more than 12.2 oz, using a sample size of 30 and a level of significance of 0.01. In Example 6.10, we formulated this hypothesis test as
H0: µ ≤ 12.2
H1: µ > 12.2

The corresponding test statistic and critical region are given by
Z0 = (X̄ − 12.2) / (0.08/√30)
Reject H0 if Z0 > Z0.01 = 2.3263.

(a) If µ = 12.27 oz, what is the Type II error probability for this test?
δ = µ − µ0 = 0.07
β = P(Z ≤ 2.3263 − 0.07√30/0.08) = P(Z ≤ −2.47) = 0.0068

(b) If µ = 12.15 oz, what is the Type II error probability for this test?
This is a poor question. If µ = 12.15 oz, then "technically" the null hypothesis is true. If we were truly concerned with detecting this, we should have used a two-sided alternative hypothesis.
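The β calculations above all follow the same pattern, so they can be scripted. A sketch (our own function, assuming the σ-known z-test of this section):

from scipy.stats import norm

def beta_z_test(delta, sigma, n, alpha, alternative='!='):
    # P(Type II error) for a z-test when the true mean is mu0 + delta
    shift = delta * n**0.5 / sigma
    if alternative == '!=':
        z = norm.ppf(1 - alpha / 2)
        return norm.cdf(z - shift) - norm.cdf(-z - shift)
    if alternative == '>':
        return norm.cdf(norm.ppf(1 - alpha) - shift)
    return norm.sf(-norm.ppf(1 - alpha) - shift)   # H1: mu < mu0

print(beta_z_test(0.2, 0.5, 25, 0.05))           # Ex. 6.13(a): ~0.4840
print(beta_z_test(-0.32, 0.5, 25, 0.05))         # Ex. 6.13(b): ~0.1075
print(1 - beta_z_test(-0.32, 0.5, 25, 0.05))     # Ex. 6.13(c) power: ~0.8925
print(beta_z_test(0.07, 0.08, 30, 0.01, '>'))    # Ex. 6.14(a): ~0.0068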
6.2.3 Choosing the Sample Size

The expressions for β allow the determination of an appropriate sample size. To choose the proper sample size for our test, we must specify a value of β for a specified value of δ. For the case in which H1: µ ≠ µ0, the symmetry of the test allows us to always specify a positive value of δ. If we specify a relatively small value of β (≤ 0.1), then the lower-tail term of the probability becomes negligible, so the equation for β reduces to:

β = P(Z ≤ Zα/2 − δ√n/σ)
This yields:

Z1−β = −Zβ = Zα/2 − δ√n/σ
δ√n/σ = Zα/2 + Zβ
n = (Zα/2 + Zβ)² σ² / δ²

For both cases in which the alternative hypothesis is one-sided:

n = (Zα + Zβ)² σ² / δ²

Example 6.14 (Ex. 6.9 continued) Let X denote the GPA of an engineering student at Philadelphia University. It is widely known that, for this population, σ = 0.5. The population mean is not widely known; however, it is commonly believed that the average GPA is 3.0. We wish to test this hypothesis using a sample of size n and a level of significance of 0.05. In Example 6.9, we formulated this hypothesis test as
H0: µ = 3.0
H1: µ ≠ 3.0

The corresponding test statistic and critical region are given by

Z0 = (X̄ − 3.0) / (0.5/√n)

Reject H0 if Z0 < −Zα/2 = −Z0.025 = −1.96 or if Z0 > Zα/2 = 1.96.

(a) If we want β = 0.10 at µ = 3.2, what sample size should we use?
δ = 0.2
n = (Z0.025 + Z0.10)² (0.5)² / (0.2)² = (1.96 + 1.282)² (0.5)² / (0.2)² = 65.7
n = 66

(b) If we want β = 0.10 at µ = 3.25, what sample size should we use?
δ = 0.25
n = (Z0.025 + Z0.10)² (0.5)² / (0.25)² = (1.96 + 1.282)² (0.5)² / (0.25)² = 42.04
n = 43

(c) If we want β = 0.05 at µ = 3.2, what sample size should we use?
δ = 0.2
n = (Z0.025 + Z0.05)² (0.5)² / (0.2)² = (1.96 + 1.645)² (0.5)² / (0.2)² = 81.2
n = 82

Example 6.15 (Ex. 6.10 continued) The Glass Bottle Company (GBC) manufactures brown glass beverage containers that are sold to breweries. One of the key characteristics of these bottles is their volume. GBC knows that the standard deviation of volume is 0.08 oz. They wish to ensure that the mean volume is not more than 12.2 oz, using a sample size of n and a level of significance of 0.01. In Example 6.10, we formulated this hypothesis test as
H0: µ ≤ 12.2
H1: µ > 12.2

The corresponding test statistic and critical region are given by
Z0 = (X̄ − 12.2) / (0.08/√n)
Reject H0 if Z0 > Zα = Z0.01 = 2.3263.

If we wish to have a test power of 0.95 at µ = 12.25 oz, what is the required sample size for this test?
δ = 0.05, β = 1 − power = 0.05
n = (Z0.01 + Z0.05)² (0.08)² / (0.05)² = (2.326 + 1.645)² (0.08)² / (0.05)² = 40.4
n = 41
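The sample-size formulas translate directly into code. A sketch (our own helper, not from the notes; it rounds up, as done in the examples above):

from math import ceil
from scipy.stats import norm

def n_for_z_test(delta, sigma, alpha, beta, two_sided=True):
    # smallest n giving Type II error <= beta when mu = mu0 + delta
    za = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    zb = norm.ppf(1 - beta)
    return ceil((za + zb) ** 2 * sigma ** 2 / delta ** 2)

print(n_for_z_test(0.20, 0.5, 0.05, 0.10))                    # Ex. 6.14(a): 66
print(n_for_z_test(0.25, 0.5, 0.05, 0.10))                    # Ex. 6.14(b): 43
print(n_for_z_test(0.05, 0.08, 0.01, 0.05, two_sided=False))  # Ex. 6.15: 41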
6.3 Statistical Significance

A hypothesis test is a test for statistical significance. When we reject H0, we are stating that the data indicate a statistically significant difference between the true mean and the hypothesized value of the mean. When we accept H0, we are stating that there is not a statistically significant difference.

Statistical significance and practical significance are not the same. This is especially important to recognize when the sample size is large.

6.3.1 Introduction to Confidence Intervals

As we have previously discussed, the sample mean is the most often used point estimate for the population mean. However, we also pointed out that two different samples would most likely result in two different sample means. Therefore, we define confidence intervals as a means of quantifying the uncertainty in our point estimate.

If θ is the parameter of interest, then the point estimate and sampling distribution of θ can be used to identify a 100(1 − α)% confidence interval on θ. This interval is of the form L ≤ θ ≤ U, where L and U are called the lower-confidence limit and upper-confidence limit. If L and U are constructed properly, then P(L ≤ θ ≤ U) = 1 − α.

The quantity (1 − α) is called the confidence coefficient, and it is a measure of the accuracy of the confidence interval. For example, if a 90% confidence interval is constructed, then the probability that the true value of θ is contained in the interval is 0.9.
The length of the confidence interval is a measure of the precision of the point estimate. A general rule of thumb is that increasing the sample size improves the precision of a point estimate.

6.3.2 Confidence Interval on µ when σ is Known

We can use what we have learned to construct a 100(1 − α)% confidence interval on the mean, assuming that (a) the population standard deviation is known, and (b) the population is normally distributed (or the conditions of the Central Limit Theorem apply).

P(−Zα/2 ≤ Z ≤ Zα/2) = 1 − α
P(−Zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ Zα/2) = 1 − α
P(X̄ − Zα/2(σ/√n) ≤ µ ≤ X̄ + Zα/2(σ/√n)) = 1 − α

Such a confidence interval is called a two-sided confidence interval. We can also construct one-sided confidence intervals under the same set of assumptions (σ known, normal population or Central Limit Theorem conditions apply). The 100(1 − α)% upper-confidence interval is given by

P(µ ≤ X̄ + Zα(σ/√n)) = 1 − α

and the 100(1 − α)% lower-confidence interval is given by

P(µ ≥ X̄ − Zα(σ/√n)) = 1 − α.

Example 6.16 Let X denote the GPA of an engineering student at Philadelphia University. It is widely known that, for this population, σ = 0.5. The population mean is not widely known; however, we have collected a sample of size 25 from the population. The resulting sample mean was 3.18.
(a) What assumptions, if any, are required to use this data to construct a confidence interval on the mean GPA?
GPA is normally distributed.

(b) Construct a 95% confidence interval on µ and interpret its meaning.
X̄ ± Z0.025(σ/√n) = 3.18 ± 1.96(0.5/√25)
2.984 ≤ µ ≤ 3.376
P(2.984 ≤ µ ≤ 3.376) = 0.95

(c) Construct a 99% confidence interval on µ and compare it to the confidence interval obtained in part (b).
X̄ ± Z0.005(σ/√n) = 3.18 ± 2.58(0.5/√25)
2.922 ≤ µ ≤ 3.438
more accurate, but less precise

(d) Construct a 95% upper-confidence interval on µ and interpret its meaning.
X̄ + Z0.05(σ/√n) = 3.18 + 1.645(0.5/√25)
µ ≤ 3.3445
P(µ ≤ 3.3445) = 0.95

(e) Construct a 95% lower-confidence interval on µ and interpret its meaning.
X̄ − Z0.05(σ/√n) = 3.18 − 1.645(0.5/√25)
µ ≥ 3.0155
P(µ ≥ 3.0155) = 0.95

(f) Combine the two confidence intervals obtained in parts (d) and (e). Is this confidence interval superior to the one constructed in part (b)?
3.0155 ≤ µ ≤ 3.3445
No, it is only a 90% confidence interval.

6.3.3 Choosing the Sample Size for a Confidence Interval on µ when σ is Known

The percentage of a confidence interval is a measure of its accuracy. The half-width of the confidence interval, E, is a measure of its precision. For a two-sided confidence interval, E = (U − L)/2. For an upper-confidence interval, E = U − θ, and for a lower-confidence interval, E = θ − L.

For a given level of accuracy (α), we can control the precision of the confidence interval using the sample size. For the two-sided confidence interval on µ, we specify a value of E and note that

E = Zα/2 (σ/√n).

Then, we can solve for n:

n = (Zα/2 σ / E)²

For the one-sided confidence intervals:

n = (Zα σ / E)².

Example 6.17 (Ex. 6.16 continued)
(a) If we wish to construct a 95% confidence interval on µ that has a half-width of 0.1, how many students should we survey?
n = (Z0.025 σ / E)² = (1.96 · 0.5 / 0.1)² = 96.04
n = 97
(b) If we wish to construct a 95% upper-confidence interval on µ that has a half-width of 0.1, how many students should we survey?
n = (Z0.05 σ / E)² = (1.645 · 0.5 / 0.1)² = 67.65
n = 68

(c) If we wish to construct a 90% confidence interval on µ that has a half-width of 0.1, how many students should we survey?
n = (Z0.05 σ / E)² = (1.645 · 0.5 / 0.1)² = 67.65
n = 68

6.3.4 Using Confidence Intervals to Perform Hypothesis Tests on µ when σ is Known

Thus far, we have considered two methods of evaluating hypothesis tests: critical regions and P-values. A third, equivalent method is to use a confidence interval.
1. Specify µ0, α, and n.
2. If H1: µ ≠ µ0, construct a 100(1 − α)% confidence interval on µ. If H1: µ > µ0, construct a 100(1 − α)% lower-confidence interval on µ. If H1: µ < µ0, construct a 100(1 − α)% upper-confidence interval on µ.
3. Reject H0 if µ0 is not contained in that confidence interval.

Example 6.17 (Ex. 6.9 continued) Let X denote the GPA of an engineering student at Philadelphia University. It is widely known that, for this population, σ = 0.5. The population mean is not widely known; however, it is commonly believed that the average GPA is 3.0. We wish to test this hypothesis using a sample of size 25 and a level of significance of 0.05. From Ex. 6.9:
H0: µ = 3.0
H1: µ ≠ 3.0
Suppose the sample mean is 3.18. Use a confidence interval to evaluate the hypothesis test.
Since α = 0.05 and H1 is two-sided, we construct a 95% confidence interval.
From Ex. 6.16: 2.984 ≤ µ ≤ 3.376
3.0 is in the confidence interval, so we fail to reject H0.

Example 6.18 (Ex. 6.10 continued) The Glass Bottle Company (GBC) manufactures brown glass beverage containers that are sold to breweries. One of the key characteristics of these bottles is their volume. GBC knows that the standard deviation of volume is 0.08 oz. They wish to ensure that the mean volume is not more than 12.2 oz, using a sample size of 30 and a level of significance of 0.01. From Ex. 6.10:
H0: µ ≤ 12.2
H1: µ > 12.2

Suppose the sample mean is 12.23. Use a confidence interval to evaluate the hypothesis test.
Since α = 0.01 and H1 is µ > µ0, we construct a 99% lower-confidence interval.
X̄ − Z0.01(σ/√n) = 12.23 − 2.3263(0.08/√30)
µ ≥ 12.1960
12.2 is in the confidence interval, so we fail to reject H0.
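A sketch of these σ-known confidence intervals in code (the function name and scipy usage are our own, not from the notes), which also reproduces the two CI-based test decisions above:

from scipy.stats import norm

def ci_mu_sigma_known(xbar, sigma, n, alpha, kind='two-sided'):
    # CI on mu with sigma known: 'two-sided', 'upper', or 'lower'
    se = sigma / n**0.5
    if kind == 'two-sided':
        h = norm.ppf(1 - alpha / 2) * se
        return xbar - h, xbar + h
    h = norm.ppf(1 - alpha) * se
    return (float('-inf'), xbar + h) if kind == 'upper' else (xbar - h, float('inf'))

lo, hi = ci_mu_sigma_known(3.18, 0.5, 25, 0.05)             # ~(2.984, 3.376)
print(lo <= 3.0 <= hi)                                      # True -> fail to reject H0
lo, hi = ci_mu_sigma_known(12.23, 0.08, 30, 0.01, 'lower')  # ~(12.196, inf)
print(lo <= 12.2)                                           # True -> fail to reject H0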
6.4 Hypothesis Tests on µ when σ is Unknown

What if σ is unknown? Suppose we are interested in studying the mean of a population, but we do not know the value of the population standard deviation.
• We can use the procedures defined in section 6.2 and replace σ with S, provided that the sample size is large (n ≥ 30).
• When the sample size is small and σ is unknown, we must assume that the population is normally distributed.

The t Distribution

Suppose we wish to perform the following hypothesis test.
H0: µ = µ0
H1: µ ≠ µ0

Suppose we have collected a random sample of size n and used the sample data to compute the sample mean X̄ and the sample standard deviation S. If σ were known, we would compute the test statistic

Z0 = (X̄ − µ0) / (σ/√n).

Therefore, a logical approach is to replace σ with S. The resulting test statistic is

T0 = (X̄ − µ0) / (S/√n).

Before we can proceed, we should analyze the sampling distribution of this test statistic.

Theorem 6.1 The t Distribution
Let X1, X2, … , Xn be a random sample from a normal population having mean µ. The quantity

T = (X̄ − µ) / (S/√n)
has a t distribution with n − 1 degrees of freedom.

While we won't discuss the details of the t distribution, it is important to recognize two points regarding the t probability density function.
• First, it is symmetric about 0.
• Second, as the number of degrees of freedom increases, the t distribution approaches the standard normal distribution. This explains why it is acceptable to use the procedures from section 6.2 when n ≥ 30 (at 29 degrees of freedom there is little difference between t and Z).

Example 6.19 Suppose T has a t distribution with 7 degrees of freedom. Find the following:
(a) P(T > 2.365)
Excel: TDIST(x, degrees of freedom, tails) returns P(T > x) when tails = 1, so TDIST(2.365, 7, 1) = 0.025.
(b) P(T > 1.415) = 0.10
(c) P(T < −3.499) = P(T > 3.499) = 0.005 (by symmetry)
(d) P(T > −2.8) = 1 − P(T > 2.8) = 0.9867
(e) the value a such that P(T > a) = 0.05
a = t0.05,7 = 1.895
(f) the value a such that P(T > a) = 0.01
a = t0.01,7 = 2.998
(g) the value a such that P(T < a) = 0.9975
P(T > a) = 0.0025, so a = t0.0025,7 = 4.029
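For reference, the same t-distribution quantities can also be computed outside Excel; a sketch using scipy (t.sf(x, df) is P(T > x), and t.ppf is the inverse CDF):

from scipy.stats import t

df = 7
print(t.sf(2.365, df))    # (a) P(T > 2.365)  ~0.025  (Excel TDIST(2.365, 7, 1))
print(t.sf(1.415, df))    # (b) ~0.10
print(t.cdf(-3.499, df))  # (c) ~0.005
print(t.sf(-2.8, df))     # (d) ~0.9867
print(t.ppf(0.95, df))    # (e) t_{0.05,7}  ~1.895
print(t.ppf(0.99, df))    # (f) t_{0.01,7}  ~2.998
print(t.ppf(0.9975, df))  # (g) ~4.029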