Chapters 20 and 21 Combined: Testing Hypotheses About Proportions (2013)
1. Hypotheses
In Statistics, a hypothesis proposes a model for
the world and then we look at the data.
If the data are consistent with that model, we
have no reason to disbelieve the hypothesis.
◦ Data consistent with the model lend support
to the hypothesis, but do not prove it.
But if the facts are inconsistent with the model,
we need to make a choice as to whether they are
inconsistent enough to disbelieve the model.
◦ If they are inconsistent enough, we can reject
the model.
2. Hypotheses Testing
Think about the logic of jury trials:
◦ To prove someone is guilty, we start by
assuming they are innocent.
◦ We retain that hypothesis until the
facts make it unlikely beyond a
reasonable doubt.
◦ Then, and only then, we reject the
hypothesis of innocence and declare the
person guilty.
3. Hypotheses (cont.)
The statistical twist is that we can
quantify our level of doubt.
◦ We can use the model proposed by our
hypothesis to calculate the probability
that the event we’ve witnessed could
happen.
◦ That’s just the probability we’re looking
for—it quantifies exactly how surprised
we are to see our results.
◦ This probability is called a P-value.
4. Our Problem
Suppose we tossed a coin 100 times
and we have obtained 38 heads and
62 tails. Is the coin biased toward
tails?
There is no way to say yes or no
with 100% certainty.
But we can evaluate the strength of
the evidence for the hypothesis that “the
coin is biased”.
5. Hypotheses (cont.)
Null hypothesis, H0: established fact, no
change of parameters; the statement that we
expect the data to contradict (status quo).
Alternative hypothesis, HA: new
conjecture, change of parameters; your
claim, a statement that needs strong
support from the data before we can assert it.
Our problem: testing a hypothesis about
p = proportion of times it turns tails (in the
long run)
H0: coin is fair, p = 0.5 (or p ≤ 0.5)
HA: coin is biased, p > 0.5
6. Ex: A statistics professor wants to see if more than
80% of her students enjoyed taking her class. At the end
of the term, she takes a random sample of students from
her large class and asks, in an anonymous survey, if the
students enjoyed taking her class. Which set of
hypotheses should she test?
A. H0: p < 0.80 HA: p > 0.80
B. H0: p = 0.80 HA: p > 0.80
C. H0: p > 0.80 HA: p = 0.80
D. H0: p = 0.80 HA: p < 0.80
7. Ex: An online catalog company wants on-time delivery
for 90% of the orders they ship. They have been shipping
orders via UPS and FedEx but will switch to a
new, cheaper delivery service (ShipFast) unless there is
evidence that this service cannot meet the
90% on-time goal. As a test the company sends a
random sample of orders via ShipFast, and then makes
follow-up phone calls to see if these orders arrived on
time. Which hypotheses should they test?
A. H0: p < 0.90 HA: p > 0.90
B. H0: p = 0.90 HA: p > 0.90
C. H0: p > 0.90 HA: p = 0.90
D. H0: p = 0.90 HA: p < 0.90
8. Hypotheses (cont.)
When the data are consistent with the model from
the null hypothesis, the P-value is high and we are
unable to reject the null hypothesis.
◦ In that case, we have to “retain” the null
hypothesis we started with.
◦ We can’t claim to have proved it; instead we “fail
to reject the null hypothesis” when the data are
consistent with the null hypothesis model and in
line with what we would expect from natural
sampling variability.
If the P-value is low enough, we’ll “reject the null
hypothesis,” since what we observed would be very
unlikely were the null model true.
Assume that the null hypothesis H0 is true and
uphold it, unless the data strongly speak against it.
9. Testing Hypotheses
The null hypothesis, which we denote H0,
specifies a population model parameter of
interest and proposes a value for that
parameter.
We want to compare our data to what we would
expect given that H0 is true.
◦ We can do this by finding out how many
standard deviations away from the proposed
value we are.
We then ask how likely it is to get results like
we did if the null hypothesis were true.
10. The Reasoning of Hypothesis Testing
1. Hypotheses
◦ The null hypothesis: To perform a
hypothesis test, we must first
translate our question of interest
into a statement about model
parameters.
◦ In general, we have
H0: parameter = hypothesized value.
◦ The alternative hypothesis: The
alternative hypothesis, HA, contains
the values of the parameter we
accept if we reject the null.
11. The Reasoning of Hypothesis Testing
(cont.)
2. Model
◦ The test about proportions is called a
one-proportion z-test.
12. One-Proportion z-Test
The conditions for the one-proportion z-test are the same
as for the one proportion z-interval. We test the
hypothesis
H0: p = p0
using the statistic
$$z = \frac{\hat{p} - p_0}{SD(\hat{p})}, \quad \text{where } SD(\hat{p}) = \sqrt{\frac{p_0 q_0}{n}} \text{ and } q_0 = 1 - p_0.$$
When the conditions are met and the null hypothesis is
true, this statistic follows the standard Normal model, so
we can use that model to obtain a P-value.
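The test statistic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `one_proportion_z_test` is made up for this example, it handles only the upper-tail alternative HA: p > p0, and it assumes the conditions for the test have already been checked. It is applied here to the coin data from slide 4 (62 tails in 100 tosses).

```python
import math

def one_proportion_z_test(successes, n, p0):
    """Return (z, upper-tail P-value) for H0: p = p0 versus HA: p > p0.

    Hypothetical helper for illustration; assumes the test conditions hold.
    """
    p_hat = successes / n
    # SD(p-hat) is computed from the null value p0, not from p-hat.
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    # P(Z > z) under the standard Normal model.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Coin example: 62 tails in 100 tosses, null value p0 = 0.5.
z, p_value = one_proportion_z_test(successes=62, n=100, p0=0.5)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")  # z = 2.40, P-value = 0.0082
```

Note that the standard deviation uses p0 and q0 because the whole calculation is carried out under the assumption that the null hypothesis is true.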
13. The Reasoning of Hypothesis Testing
(cont.)
3. Mechanics
◦ Under “mechanics” we place the
actual calculation of our test statistic
from the data.
◦ Different tests will have different
formulas and different test
statistics.
◦ Usually, the mechanics are handled by
a statistics program or calculator,
but it’s good to know the formulas.
14. The Reasoning of Hypothesis Testing
(cont.)
3. Mechanics
◦ If the difference between what we
have observed and what is expected
under the null model H0 is
statistically significant (large enough),
then we reject H0 in favor of HA.
16. The Reasoning of Hypothesis Testing
(cont.)
3. Mechanics continued
◦ The ultimate goal of the calculation is
to obtain a P-value.
The P-value is the probability that the
observed statistic value (or an even more
extreme value) could occur if the null
model were correct.
If the P-value is small enough, we’ll
reject the null hypothesis.
Note: The P-value is a conditional
probability—it’s the probability that the
observed results could have happened if
the null hypothesis is true.
17. The Reasoning of Hypothesis Testing
P-value
The probability that the test statistic takes
the observed or more extreme value, when
the null hypothesis H0 is true.
Our Problem:
P-value = P(z > 2.4)= .0082
For a fair coin the probability of seeing 62 or more
tails in 100 tosses is less than 0.01 (1%).
The smaller the p-value, the stronger evidence
against H0 (that is in favor of HA). So we reject
the null hypothesis that this is a fair coin and
support the alternative that it is biased towards
tails.
18. Just Checking
1. An allergy drug has been tested and found to give
relief to 75% of the patients in a large clinical trial.
Now the scientists want to see if the new improved
version works even better. What would the null
hypothesis and alternative hypothesis be?
2. The new drug is tested and the P-value is 0.0001.
What would you conclude about the new drug?
19. P-value info (Ch 21)
We can use an alpha level, α, to set a threshold on our P-value.
◦ Alpha level is also called the significance level.
If our P-value is less than our alpha level, we will reject the null
hypothesis.
If our P-value is greater than our alpha level, we fail to
reject the null hypothesis.
We can define a “rare event” arbitrarily by setting a threshold for
our P-value.
When the P-value falls below that threshold, we say that the results are statistically significant.
Alpha levels are represented using the symbol α.
Typically we use α = 0.1, 0.05, or 0.01.
When in doubt, we use α = 0.05.
Partially depends on importance of claim being made.
◦ The more important the claim or higher the stakes, the lower an
alpha level you would use, because you demand stronger evidence before rejecting H0.
20. Statistically Significant (Ch 21)
When we get a P-value below our alpha
level (let’s assume 0.05), we can say “we
reject the null hypothesis at the 5% level
of significance”.
Sometimes, statistical significance doesn’t
mean the difference is important in the
context of the situation.
On the other hand, sometimes a practically
important difference may turn out not to be
statistically significant.
◦ Sometimes a larger sample size can fix this.
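As a rough illustration of how sample size interacts with significance, the sketch below tests the same observed proportion with two different sample sizes. The numbers (an observed 52% against a null value of 50%, with n = 100 and n = 10,000) are hypothetical and chosen only for this example, and `upper_tail_p_value` is an illustrative helper name.

```python
import math

def upper_tail_p_value(p_hat, n, p0):
    """P-value for H0: p = p0 versus HA: p > p0 under the Normal model."""
    sd = math.sqrt(p0 * (1 - p0) / n)  # SD(p-hat) under the null
    z = (p_hat - p0) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Same observed proportion, two hypothetical sample sizes.
p_small = upper_tail_p_value(p_hat=0.52, n=100, p0=0.5)
p_large = upper_tail_p_value(p_hat=0.52, n=10_000, p0=0.5)

print(f"n = 100:    P-value = {p_small:.4f}")
print(f"n = 10,000: P-value = {p_large:.6f}")
```

With 100 observations the P-value is roughly 0.34, so the 2-point difference is not statistically significant; with 10,000 observations it falls to about 0.00003. Whether a 2-point difference matters in context is a separate, practical question that the P-value does not answer.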
21. Statistically Significant (Ch 21)
The all-or-nothing choice between rejecting and
failing to reject may make you uncomfortable.
If your P-value falls just slightly above your
alpha level, you’re not allowed to reject the null
hypothesis.
Yet a P-value just barely below the alpha level
leads to rejection.
When you decide to declare a verdict, it is a
good idea to report the P-value as an indication
of the strength of the evidence.
22. The Reasoning of Hypothesis Testing
(cont.)
4. Conclusion/Decision
The conclusion/decision in a
hypothesis test is always a statement
about the null hypothesis.
The conclusion must state either
◦ Reject H0, or
◦ Fail to reject H0 (uphold H0).
And, as always, the conclusion should
be stated in context.
23. The Reasoning of Hypothesis Testing
(cont.)
4. Conclusion
◦ Your conclusion about the null
hypothesis should never be the end
of a testing procedure.
◦ Often there are actions to take or
policies to change.
24. Alternative Hypotheses
There are three possible alternative
hypotheses:
HA: parameter < hypothesized value
HA: parameter ≠ hypothesized value
HA: parameter > hypothesized value
25. Alternative Hypotheses (cont.)
HA: parameter ≠ value is known as a two-sided
alternative because we are equally interested in
deviations on either side of the null hypothesis
value.
For two-sided alternatives, the P-value is the
probability of deviating in either direction from
the null hypothesis value.
26. Alternative Hypotheses (cont.)
The other two alternative hypotheses are called
one-sided alternatives.
A one-sided alternative focuses on deviations
from the null hypothesis value in only one
direction.
Thus, the P-value for one-sided alternatives is
the probability of deviating only in the direction
of the alternative away from the null hypothesis
value.
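The three kinds of alternative lead to three ways of turning a z-score into a P-value. The following is a minimal sketch using Python's standard library; the function name `p_value` and the labels "greater", "less", and "two-sided" are conventions chosen for this example, not notation from the slides.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value for a z-statistic under the standard Normal model.

    alternative: "greater"   for HA: parameter > hypothesized value
                 "less"      for HA: parameter < hypothesized value
                 "two-sided" for HA: parameter != hypothesized value
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        return 1 - std_normal.cdf(z)             # upper tail only
    if alternative == "less":
        return std_normal.cdf(z)                 # lower tail only
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 2.4  # z-score from the coin example
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: {p_value(z, alt):.4f}")
```

For z = 2.4 the upper-tail P-value is about 0.0082 and the two-sided P-value is twice that, about 0.0164, because deviations in both directions count as evidence against the null.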
28. Critical Values for Hypothesis
Testing
Just like we used critical values in
confidence intervals, we will use
them with alpha levels.
If our z-score is more extreme than
the critical value, then we will have
a P-value smaller than our alpha
level.
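The link between critical values and alpha levels can be checked numerically. The sketch below is an illustration only: `critical_value` is a hypothetical helper, and the z-score of 2.4 is borrowed from the coin example with an upper-tail alternative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal model

def critical_value(alpha, two_sided=False):
    """z* that cuts off alpha in one tail, or alpha/2 in each of two tails."""
    tail = alpha / 2 if two_sided else alpha
    return std_normal.inv_cdf(1 - tail)

alpha = 0.05
z_star_one = critical_value(alpha)                  # about 1.645
z_star_two = critical_value(alpha, two_sided=True)  # about 1.960

# Upper-tail test: the two decision rules always agree.
z = 2.4
p = 1 - std_normal.cdf(z)
print(f"one-sided z* = {z_star_one:.3f}, two-sided z* = {z_star_two:.3f}")
print("z beyond critical value:", z > z_star_one)
print("P-value below alpha:    ", p < alpha)
```

Comparing z to z* and comparing the P-value to α are two views of the same decision, so either rule can be used.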
29. Just Checking cont.
3. A bank is testing a new method for getting
delinquent customers to pay their past-due
credit card bills. The standard way was to
send a letter (costing about $0.40 each)
asking the customer to pay. That worked
30% of the time. They want to test a new
method that involves sending a video tape to
the customer encouraging them to contact
the bank and set up a payment plan.
Developing and sending the video costs
about $10.00 per customer. What is the
parameter of interest? What are the null
and alternative hypotheses?
30. Just Checking cont.
4. The bank sets up an experiment to test the
effectiveness of the video tape. They mail
it out to several randomly selected
delinquent customers and keep track of how
many actually do contact the bank to
arrange payments. The bank’s statistician
calculates a P-value of 0.003. What does
this P-value suggest about the video tape?
31. 5. Some people are concerned that new
tougher standards and high-stakes tests
may drive up the high school dropout
rate. The National Center for Education
Statistics reported that the high school
dropout rate for the year 2004 was
10.3%. One school district, whose
dropout rate has always been very close
to the national average, reports that
210 of their 1782 students dropped out
last year. Is their experience evidence
that the dropout rate is increasing?
32. 6. In a study of 11,000 car crashes, it was
found that 5720 of them occurred
within 5 miles of home. Is this
significant evidence to show that more
than 50% of car crashes occur within 5
miles of home?
33. Confidence Intervals and
Hypothesis Tests
Confidence intervals and hypothesis tests are built on the
same calculations with the same assumptions and
conditions.
Our conclusion about the null should be consistent with
whether or not the proportion in the claim falls within the
confidence interval.
A 95% confidence interval corresponds with a two-sided
hypothesis test with α = 5%.
34. Confidence Levels and
Hypothesis Testing
A confidence interval with a confidence level of C%
corresponds to a two-sided hypothesis test with an α
level of (100 – C)%.
A confidence interval with a confidence level of C%
corresponds to a one-sided hypothesis test with an α level
of ½(100 – C)%.
◦ Think about it: A one-sided test with α = 5%
corresponds to a confidence interval with 5% on each
side, giving 90% confidence level.
35. Example: Is Euro a fair coin?
Soon after the Euro was introduced as currency in
Europe, it was widely reported that someone had
spun a Euro 250 times and gotten heads 140 times.
a. Estimate the true proportion of heads using a 95%
confidence interval. (remember to check conditions)
$$\text{CI}: \hat{p} \pm z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.56 \pm 1.96 \sqrt{\frac{(0.56)(0.44)}{250}} = 0.56 \pm 0.062$$
CI: (0.498, 0.622)
b. Does your confidence interval provide evidence that
the coin is unfair when spun? Explain.
c. What is the significance level?
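The interval in part (a) can be reproduced with a short Python sketch. This is an illustration under the usual assumptions (independent spins, success/failure condition met); the function name `one_proportion_z_interval` is made up for this example.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for a one-proportion z-interval."""
    p_hat = successes / n
    # Critical value z* leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The interval uses p-hat (not a null value) in the standard error.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Euro example: 140 heads in 250 spins.
low, high = one_proportion_z_interval(successes=140, n=250)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.498, 0.622)
print("0.50 inside interval:", low < 0.50 < high)
```

Unlike the z-test, the interval has no hypothesized value to plug in, so the standard error is estimated from the sample proportion itself.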
36. Just Checking
7. An experiment to test the fairness of a roulette
wheel gives a z-score of 0.62. What would you
conclude?
8. We encountered a bank that wondered if it could
get more customers to make payments on delinquent
balances by sending them a DVD urging them to set
up a payment plan. Well, the bank just got back the
results on their tests of this strategy. A 90%
confidence interval for the success rate is
(0.29, 0.45). Their old send-a-letter method had
worked 30% of the time. Can you reject the null
hypothesis that the proportion is still 30% at
α = 0.05? Explain.
9. Given the confidence interval the bank found in
their trial of DVDs, what would you recommend that
they do? Should they scrap the DVD strategy?
37. Errors in Hypothesis Testing
Even with our careful analysis and lots of
evidence, we can make an incorrect decision.
Two ways we can make mistakes with hypothesis
testing:
Type I: null hypothesis is true, but we reject
it. (HOT)
Type II: null hypothesis is false, but we fail
to reject it. (HAT)
Which error is more serious depends on the
situation.
38. Type I Error- HOT
In medical terms, this would be a
false positive.
◦ A healthy person is diagnosed with a
disease incorrectly.
In jury terms, this would mean an
innocent person is convicted.
39. Type II Error- HAT
In medical terms, this would be a
false negative.
◦ An infected person goes undiagnosed.
In jury terms, this would mean a
guilty person is not convicted.
41. Just Checking continued
10. Remember our bank? It is looking
for evidence that the costlier DVD
strategy produces a higher success
rate than the letters it has been
sending. Explain what a Type I error
is in this context. What would the
consequences be for the bank?
11. What’s a Type II error in the bank
experiment context, and what would
the consequences be?
42. Example: Spam Filter
12. Suppose a spam filter uses a point system to score each email
based on sender, subject, and keywords. The higher the point
total, the more likely that the message is spam. We can think of
the filter’s decision as a hypothesis test. The null hypothesis is
that the email is a real message. A high point score would be
evidence that it is junk, so the filter will reject the null
hypothesis and classify it as spam.
a. When the filter allows spam to slip through into your
inbox, which kind of error is this?
b. Which kind of error is it when a real message gets classified as
junk?
c. If the filter has a default cutoff score of 50 , but you reset it
to 60, is that analogous to choosing a higher or lower value of α
for a hypothesis test?
43. Probability of Errors
To reject H0, the P-value must fall below α.
When H0 is true, that happens with probability
exactly α, so when you choose the level α, you
are setting the probability of a Type I error to
α.
When H0 is false and we fail to reject it, we
have made a Type II error. We assign the letter
β to the probability of this mistake.
44. Reducing Errors
We can reduce α to lower the
chance of a Type I Error, but then
that will have the effect of raising
β.
The only way to really reduce both
Type I and Type II errors
simultaneously is to increase our
sample size, which will reduce our
standard deviations.
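The trade-off between α and β, and the effect of sample size, can be made concrete with an approximate calculation. The sketch below is an illustration only: it assumes an upper-tail one-proportion z-test of H0: p = 0.5, a hypothetical true proportion of 0.6, and the Normal model for the sample proportion. None of these numbers come from the slides, and `type_ii_error` is an illustrative helper name.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def type_ii_error(p0, p_true, n, alpha):
    """Approximate beta for the upper-tail test of H0: p = p0 when p = p_true."""
    z_star = std_normal.inv_cdf(1 - alpha)
    # We reject H0 when p-hat exceeds this cutoff (computed under the null).
    cutoff = p0 + z_star * math.sqrt(p0 * (1 - p0) / n)
    # Spread of p-hat when the true proportion is p_true.
    sd_true = math.sqrt(p_true * (1 - p_true) / n)
    # beta = P(p-hat <= cutoff | p = p_true) = P(fail to reject a false H0)
    return std_normal.cdf((cutoff - p_true) / sd_true)

for n in (100, 400):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(p0=0.5, p_true=0.6, n=n, alpha=alpha)
        print(f"n = {n:3d}, alpha = {alpha:.2f}: beta = {beta:.3f}")
```

At a fixed sample size, lowering α from 0.05 to 0.01 raises β; at a fixed α, going from n = 100 to n = 400 lowers β sharply.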
45. What Can Go Wrong?
Don’t interpret the P-value as the
probability that H0 is true.
Don’t believe too strongly in
arbitrary alpha levels.
Don’t confuse practical and
statistical significance.
Don’t forget that in spite of all your
care, you might make a wrong
decision.
Editor's notes
1. The null hypothesis is H0: p = 0.75; the alternative is HA: p > 0.75.
2. With a P-value of 0.0001, this is very strong evidence against the null hypothesis. We can reject H0 and conclude that the improved version of the drug gives relief to a higher proportion of patients.
3. The parameter of interest is the proportion, p, of all delinquent customers who will pay their bills. H0: p = 0.30 and HA: p > 0.30.
4. The very low P-value leads us to reject the null hypothesis. There is strong evidence that the video tape is more effective in getting people to start paying their debts than just sending a letter had been.
p = proportion of students in districts like this one who drop out.
H0: p = 0.103 (or p ≤ 0.103)
HA: p > 0.103
p̂ = 210/1782 = 0.118, p0 = 0.103, q0 = 0.897
SD(p̂) = sqrt((0.103)(0.897)/1782) = 0.007
z = (0.118 − 0.103)/0.007 = 2.14
P-value = normalcdf(2.14, 99) = 0.016 = 1.6%
This P-value is really low, so we reject the null hypothesis and conclude that ______________.
p = proportion of car crashes occurring within 5 miles of home.
H0: p = 0.50 (or p ≤ 0.50)
HA: p > 0.50
Calculator: STAT, TESTS, #5, then enter the information.
z = 4.195
P-value = 0.0000136
This P-value is extremely small, so we reject the null hypothesis and conclude …
a) Independence assumption: The Euro spins are independent. One spin is not going to affect the others. (With true independence, it doesn’t make sense to try to check the randomization condition or the 10% condition. These verify our assumption of independence, and we don’t need to do that!)
Success/Failure condition: np̂ = 140 and nq̂ = 110 are both greater than 10, so the sample is large enough.
Since the conditions are met, we can use a one-proportion z-interval to estimate the proportion of heads in Euro spins.
We are 95% confident that the true proportion of heads when a Euro is spun is between 0.498 and 0.622.
b) Since 0.50 is within the interval, there is no evidence that the coin is unfair. 50% is a plausible value for the true proportion of heads. (That having been said, I’d want to spin this coin a few hundred more times. It’s close!)
c) The significance level is α = 0.05. It’s a two-tail test based on a 95% confidence interval.
7. With a z-score of 0.62, you can’t reject the null hypothesis. The experiment shows no evidence that the wheel is not fair.
8. At α = 0.05, you can’t reject the null hypothesis because 0.30 is contained in the 90% confidence interval. It’s plausible that sending the DVDs is no more effective than just sending letters.
9. The confidence interval is from 29% to 45%. The DVD strategy is more expensive and may not be worth it. We can’t distinguish the success rate from 30% given the results of this experiment, but 45% would represent a large improvement. The bank should consider another trial, increasing their sample size to get a narrower confidence interval.
10. A Type I error would mean deciding that the DVD success rate is higher than 30% when it really isn’t. They would adopt a more expensive method for collecting payments that’s no better than the less expensive strategy.
11. A Type II error would mean deciding that there’s not enough evidence to say that the DVD strategy works when in fact it does. The bank would fail to discover an effective method for increasing their revenue from delinquent accounts.
a) Type II. The filter decided that the message was safe, when in fact it was spam.
b) Type I. The filter decided that the message was spam, when in fact it was not.
c) This is analogous to lowering alpha. It takes more evidence to classify a message as spam.