CPS08_Description.pdf
Documentation for CPS08 Data
Each month the Bureau of Labor Statistics in the U.S. Department of Labor
conducts the “Current Population Survey” (CPS), which provides data on labor force
characteristics of the population, including the level of employment, unemployment, and
earnings. Approximately 65,000 randomly selected U.S. households are surveyed each
month. The sample is chosen by randomly selecting addresses from a database
comprised of addresses from the most recent decennial census augmented with data on
new housing units constructed after the last census. The exact random sampling scheme
is rather complicated (first small geographical areas are randomly selected, then housing
units within these areas are randomly selected); details can be found in the Handbook of
Labor Statistics and are described on the Bureau of Labor Statistics website
(www.bls.gov).
The survey conducted each March is more detailed than in other months and asks
questions about earnings during the previous year. The file CPS08 contains the data for
2008 (from the March 2009 survey). These data are for full-time workers, defined as
workers employed more than 35 hours per week for at least 48 weeks in the previous
year. Data are provided for workers whose highest educational achievement is either (1) a high
school diploma or (2) a bachelor’s degree.
Series in Data Set:
FEMALE: 1 if female; 0 if male
YEAR: Year
AHE: Average Hourly Earnings
BACHELOR: 1 if worker has a bachelor’s degree; 0 if worker has a high school degree
PS2 ECO480 F2015(1).pdf
ECO 480 Econometrics I
Problem Set 2
Due: Wednesday, October 14, 2015 (beginning of the class)
Instructions: The problem sets are designed to be difficult and very time-intensive, so plan ahead. The
problem sets consist of solving theoretical problems and analyzing real data. You may discuss the
questions with your classmates, but you are required to hand in your own independently written solutions.
For problems that require you to use Stata, submit independently written do-files and log-files. No late
work will be accepted, and I do NOT accept any electronic copy. All the data necessary for the problem
set are available under UBlearns.
Important: It is extremely important to write a clean well-commented program for transparency and
replication purposes. In any empirical work, you should always be able to reproduce your result from raw
data to support your claim.
What to hand in: A typed write-up answering the assigned questions and interpreting your findings, plus the do-file
and log-file for problems that require you to use Stata. For questions involving data analysis, you will
NOT get any credit if you do not provide your program code. You may NOT use Excel.
1. Suppose the following equation describes the relationship
between the average number of classes
missed during a semester (missed) and the distance from school
(distance, measured in miles) (Total 4
points):
missed = 3 + 0.2 distance
a. Sketch this line, being sure to label the axes. How do you
interpret the intercept in this equation?
(2 points)
b. What is the average number of classes missed for someone
who lives five miles away? (1 point)
c. What is the difference in the average number of classes
missed for someone who lives 10 miles
away and someone who lives 20 miles away? (1 point)
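The line in Problem 1 can be evaluated mechanically. The short Python sketch below is only an illustration (the function name `expected_missed` is made up here, and the course itself requires Stata for data work); it plugs distances into missed = 3 + 0.2 distance.

```python
def expected_missed(distance_miles: float) -> float:
    """Average number of classes missed implied by missed = 3 + 0.2 * distance."""
    return 3 + 0.2 * distance_miles


# Intercept: the predicted value when distance is zero.
at_zero_miles = expected_missed(0)
# Prediction for a student living five miles away.
at_five_miles = expected_missed(5)
# Difference between 20 and 10 miles: the slope times the 10-mile gap.
gap_10_to_20 = expected_missed(20) - expected_missed(10)

print(at_zero_miles, at_five_miles, gap_10_to_20)
```

Because the relationship is linear, the difference between any two distances depends only on the gap between them, not on where the gap starts.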
2. Use COLLDIST.dta for this problem. A detailed description of
the data is given in
COLLDIST_Description.pdf. This contains data from a random
sample of high school seniors
interviewed in 1980 and re-interviewed in 1986. In this
exercise, you will use these data to investigate
the relationship between the number of completed years of
education for young adults and the
distance from each student’s high school to the nearest four-
year college. (Proximity to college lowers
the cost of education, so that students who live closer to a four-
year college should, on average,
complete more years of higher education.) (Total 12 points)
a. Run a regression of years of completed education (ed) on
distance to the nearest college (dist),
where dist is measured in tens of miles. (For example, dist = 2
means that the distance is 20
miles.) What is the estimated intercept? What is the estimated
slope? Use the estimated regression
to answer this question: How does the average value of years of
completed schooling change
when colleges are built close to where students go to high
school? (4 points)
b. Bob’s high school was 20 miles from the nearest college.
Predict Bob’s years of completed
education using the estimated regression. How would the
prediction change if Bob lived 10 miles
from the nearest college? (2 points)
c. If the distance is measured in kilometers, what are your new
estimates, and how do you interpret the
result? (4 points)
d. Beware the omitted variable. List five possible omitted
variables. Are they all measurable? (2
points) [Hint: Omitted variables from the regression may or may
not be measurable by
econometricians.]
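The effect of changing the units of the regressor (part c) can be checked numerically. The Python sketch below is an illustration under stated assumptions: the five data points are made up and are NOT from COLLDIST, the helper `ols` is a hypothetical name, and distance is assumed to be rescaled from tens of miles to tens of kilometers. It fits a one-regressor least-squares line before and after the rescaling.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up values for illustration only; they are NOT taken from COLLDIST.
dist_tens_miles = [0.5, 1.0, 2.0, 3.5, 6.0]
ed_years = [14.5, 14.0, 13.8, 13.5, 13.0]

KM_PER_MILE = 1.609344
dist_tens_km = [d * KM_PER_MILE for d in dist_tens_miles]

b0_miles, b1_miles = ols(dist_tens_miles, ed_years)
b0_km, b1_km = ols(dist_tens_km, ed_years)

print(b0_miles, b1_miles)
print(b0_km, b1_km)
```

Multiplying the regressor by a constant divides the slope by that constant and leaves the intercept and the fitted values unchanged, so the substantive interpretation does not change with the units.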
3. Use CPS08.dta for this problem. A detailed description of the
data is given in
CPS08_Description.pdf. In this exercise, you will investigate
the relationship between a worker’s age
and earnings. (Generally, older workers have more job
experience, leading to higher productivity and
earnings.) (Total 10 points)
a. Report the mean, median, and standard deviation of workers’ age
and earnings. (3 points)
b. Run a regression of average hourly earnings (AHE) on age
(Age). What is the estimated intercept?
What is the estimated slope? Use the estimated regression to
answer this question: How much do
earnings increase as workers age by 1 year? (4 points)
c. Bob is a 26-year-old worker. Predict Bob’s earnings using the
estimated regression. Alexis is a
30-year-old worker. Predict Alexis’s earnings using the
estimated regression. (1 point)
d. Does age account for a large fraction of the variance in
earnings across individuals? Why? (2
points)
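Part d asks about the fraction of the variance in earnings accounted for by age, which is the regression R-squared. The Python sketch below is only illustrative: the seven observations are made up and are NOT from CPS08, and `fit_and_r_squared` is a hypothetical helper. It shows how the summary statistics, the predictions, and R-squared relate to one another.

```python
import statistics


def fit_and_r_squared(x, y):
    """Return (intercept, slope, R^2) for a one-regressor least-squares fit."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Made-up values for illustration only; they are NOT taken from CPS08.
age = [25, 26, 28, 30, 31, 33, 34]
ahe = [14.0, 21.5, 16.0, 25.0, 17.5, 19.0, 27.0]

summary = {
    "mean": statistics.mean(ahe),
    "median": statistics.median(ahe),
    "sd": statistics.stdev(ahe),  # sample standard deviation (divides by n - 1)
}

b0, b1, r2 = fit_and_r_squared(age, ahe)
predicted_26 = b0 + b1 * 26  # prediction for a 26-year-old
predicted_30 = b0 + b1 * 30  # prediction for a 30-year-old

print(summary, b0, b1, r2, predicted_26, predicted_30)
```

An R-squared near zero means age explains little of the variation in earnings even if the slope is clearly positive; the two questions are separate.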
4. Battery packs in electric go-carts need to last a fairly long
time. The run-time (time until it needs to be
recharged) of the battery packs made by a particular company
is Normally distributed with a mean
of 2 hours and a standard deviation of 20 minutes. (Total 3
points)
a. What percentage of these battery packs lasts longer than 3
hours? Show your work. (1 point)
b. What is the third quartile for the run-time distribution? Show
your work. (1 point)
c. Battery packs that have a run-time in the highest 10% of the
run-time distribution are highly
sought after by go-cart drivers. How long does the battery pack
have to last for it to fall in this
highly sought-after class? Show your work. (1 point)
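The Normal-distribution calculations in Problem 4 can be cross-checked in software after working them by hand. The Python sketch below uses `statistics.NormalDist` from the standard library and assumes run-time is expressed in minutes (mean 120, standard deviation 20); the variable names are illustrative.

```python
from statistics import NormalDist

# Run-time in minutes: mean of 2 hours, standard deviation of 20 minutes.
run_time = NormalDist(mu=120, sigma=20)

# Share of packs lasting longer than 3 hours (180 minutes).
p_over_3_hours = 1 - run_time.cdf(180)

# Third quartile: the run-time below which 75% of packs fall.
third_quartile = run_time.inv_cdf(0.75)

# Cutoff for the top 10%: the 90th percentile of the distribution.
top_10_cutoff = run_time.inv_cdf(0.90)

print(p_over_3_hours, third_quartile, top_10_cutoff)
```

Keeping every quantity in the same unit (minutes here) avoids the most common mistake in this kind of problem, which is mixing hours for the mean with minutes for the standard deviation.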
5. In the language of government statistics, you are “in the labor
force” if you are available for work and
either working or actively seeking work. The unemployment
rate is the proportion of the labor force
(not of the entire population) who are unemployed. Here are
data from the Current Population Survey
(CPS) for the civilian population aged 25 years and over. The
table entries are counts in thousands of
people. You must show your work in answering the following
questions. (Total 5 points)
Highest Education                        Total Population   In Labor Force   Employed
Did not finish high school                         28,021           12,623     11,552
High school but no college                         59,844           38,210     36,249
Some college, but no bachelor's degree             46,777           33,928     32,429
College graduate                                   51,568           40,414     39,250
a. Find the unemployment rate for people with each level of
education. How does the
unemployment rate change with education? Explain carefully
why your results show that level of
education and being employed are not independent. (1 point)
b. What is the probability that a randomly chosen person 25
years of age or older is in the labor
force? (1 point)
c. If you know that the person chosen is a college graduate,
what is the conditional probability that
he or she is in the labor force? (1 point)
d. Are the events “in the labor force” and “college graduate”
independent? How do you know? (1
point)
e. You know that a person is employed. What is the conditional
probability that he or she is a
college graduate? You know that a second person is a college
graduate. What is the conditional
probability that he or she is employed? (1 point)
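The rates and conditional probabilities in Problem 5 are all ratios of counts from the table. The Python sketch below is an illustration with made-up variable names; it encodes the table and computes each ratio from its definition, with the unemployment rate taken as (labor force minus employed) divided by labor force.

```python
# (total population, in labor force, employed), in thousands, from the table.
cps = {
    "Did not finish high school": (28021, 12623, 11552),
    "High school but no college": (59844, 38210, 36249),
    "Some college, but no bachelor's degree": (46777, 33928, 32429),
    "College graduate": (51568, 40414, 39250),
}

# Unemployment rate: unemployed as a share of the labor force, by education.
unemployment_rate = {
    level: (labor_force - employed) / labor_force
    for level, (_, labor_force, employed) in cps.items()
}

total_pop = sum(pop for pop, _, _ in cps.values())
total_lf = sum(lf for _, lf, _ in cps.values())
total_emp = sum(emp for _, _, emp in cps.values())
grad_pop, grad_lf, grad_emp = cps["College graduate"]

p_lf = total_lf / total_pop             # P(in labor force)
p_lf_given_grad = grad_lf / grad_pop    # P(in labor force | college graduate)
p_grad_given_emp = grad_emp / total_emp  # P(college graduate | employed)
p_emp_given_grad = grad_emp / grad_pop   # P(employed | college graduate)

for level, rate in unemployment_rate.items():
    print(f"{level}: {rate:.4f}")
print(p_lf, p_lf_given_grad, p_grad_given_emp, p_emp_given_grad)
```

Two events are independent exactly when the conditional probability equals the unconditional one, so comparing `p_lf_given_grad` with `p_lf` is the check that part d asks for.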
6. Do problems 4.30, 4.32, 4.118, 4.119, and 4.130. (Total 12
points)
a. 4.30 (p.250) (2 points)
b. 4.32 (p.250) (3 points)
c. 4.118 (p.295) (2 points)
d. 4.119 (p.119) (2 points)
e. 4.130 (p.297) (3 points)
7. Suppose 40% of adults get enough sleep, 46% get enough
exercise, and 24% do both. You must show
your work in answering the following questions. (Total 3
points)
a. Draw a Venn diagram showing the probabilities for exercise
and sleep. (1 point)
b. Find the probabilities of the following events (2 points):
i. Enough sleep and not enough exercise
ii. Not enough sleep and enough exercise
iii. Not enough sleep and not enough exercise
iv. For each of parts i, ii, iii, state the rule that you used to
find your answer.
8. Facebook provides a variety of statistics on their Web site that
detail the growth and popularity of the
site. One such statistic is that the average user has 130 friends.
This distribution only takes integer
values, so it is certainly not Normal. We will also assume it is
skewed to the right with a standard
deviation σ = 85. Consider a SRS of 30 Facebook users. You
must show your work in answering the
following questions. (Total 3 points)
a. What are the mean and standard deviation of the total number
of friends in this sample? (1 point)
b. What are the mean and standard deviation of the mean
number of friends per user? (1 point)
c. Use the central limit theorem to find the probability that the
average number of friends in 30
Facebook users is greater than 140. (1 point)
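The three parts of Problem 8 differ only in how the sample size enters. The Python sketch below is an illustration with made-up variable names: the total scales the mean by n and the standard deviation by the square root of n, the sample mean divides the standard deviation by the square root of n, and the central limit theorem supplies an approximate Normal distribution for the sample mean.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 130, 85, 30  # population mean, population sd, sample size

# Total number of friends in the sample.
total_mean = n * mu
total_sd = sigma * sqrt(n)

# Mean number of friends per user in the sample.
xbar_mean = mu
xbar_sd = sigma / sqrt(n)

# CLT approximation: treat the sample mean as Normal(xbar_mean, xbar_sd).
p_xbar_above_140 = 1 - NormalDist(xbar_mean, xbar_sd).cdf(140)

print(total_mean, total_sd, xbar_mean, xbar_sd, p_xbar_above_140)
```

With a right-skewed population and n = 30, the Normal approximation is only approximate, which is why the problem frames the answer as a central limit theorem result.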
9. North Carolina State University posts the grade distribution
for its courses online. Students in one
section of English 210 in the Fall 2008 semester received 33%
A’s, 24% B’s, 18% C’s, 16% D’s, and
9% F’s. You must show your work in answering the following
questions. (Total 3 points)
a. Using the common scale A=4, B=3, C=2, D=1, F=0, take X to
be the grade of a randomly chosen
English 210 student. Use the definition of the mean and
standard deviation for discrete random
variables to find the mean µ and the standard deviation σ of the
grades in the course. (1 point)
b. English 210 is a large course. We can take the grades of a
simple random sample of 50 students to
be independent of each other. If x̄ is the average of these 50
grades, what are the mean and
standard deviation of x̄? (1 point)
c. What is the probability P(x̄ ≥ 3) that the grade point
average for 50 randomly chosen English 210
students is a B or better? (1 point)
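The definitions used in Problem 9 translate directly into sums over the grade distribution. The Python sketch below is an illustration with made-up variable names; it computes the mean and standard deviation of a discrete random variable from its probabilities and then applies the Normal approximation to the average of 50 independent grades.

```python
from math import sqrt
from statistics import NormalDist

# Grade points mapped to their probabilities (A=4, B=3, C=2, D=1, F=0).
grade_probs = {4: 0.33, 3: 0.24, 2: 0.18, 1: 0.16, 0: 0.09}

# Mean: probability-weighted sum of the values.
mu = sum(x * p for x, p in grade_probs.items())
# Variance: probability-weighted sum of squared deviations from the mean.
variance = sum((x - mu) ** 2 * p for x, p in grade_probs.items())
sigma = sqrt(variance)

# Average of 50 independent grades: same mean, sd shrinks by sqrt(50).
n = 50
xbar_sd = sigma / sqrt(n)
p_xbar_at_least_3 = 1 - NormalDist(mu, xbar_sd).cdf(3)

print(mu, sigma, xbar_sd, p_xbar_at_least_3)
```

The same two sums give the mean and standard deviation of any discrete payoff distribution, provided the probabilities add up to 1.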
10. A $1 bet in a state lottery’s Pick 3 game pays $500 if the
three-digit number you choose exactly
matches the winning number, which is drawn at random. Here is
the distribution of the payoff X:
Payoff X $0 $500
Probability 0.999 0.001
Each day’s drawing is independent of other drawings. You must
show your work in answering the
following questions. (Total 4 points)
a. What are the mean and standard deviation of X? (1 point)
b. Joe buys a Pick 3 ticket twice a week. What does the law of
large numbers say about the average
payoff Joe receives from his bets? (1 point)
c. What does the central limit theorem say about the distribution
of Joe’s average payoff after 104
bets in a year? (1 point)
d. Joe comes out ahead for the year if his average payoff is
greater than $1 (the amount he spent each
day on a ticket). What is the probability that Joe ends the year
ahead? (1 point)
11. A selective college would like to have an entering class of
950 students. Because not all students who
are offered admission accept, the college admits more than 950
students. Past experience shows that
about 75% of the students admitted will accept. The college
decides to admit 1,200 students.
Assuming that students make their decisions independently, the
number who accept has the
B(1200, 0.75) distribution. If this number is less than 950, the
college will admit students from its
waiting list. You must show your work in answering the
following questions. (Total 4 points)
a. What are the mean and the standard deviation of the number
X of students who accept? (1 point)
b. Use the Normal approximation to find the probability that at
least 800 students accept. (1 point)
c. The college does not want more than 950 students. What is
the probability that more than 950
will accept? (1 point)
d. If the college decides to increase the number of admission
offers to 1,300, what is the probability
that more than 950 will accept? (1 point)
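The Normal approximation to a binomial count replaces B(n, p) with a Normal distribution that has mean np and standard deviation sqrt(np(1 - p)). The Python sketch below is an illustration under two assumptions: it uses the 75% acceptance rate stated in the problem, and it applies no continuity correction. The helper name `binomial_normal_approx` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def binomial_normal_approx(n: int, p: float) -> NormalDist:
    """Normal distribution with the same mean and sd as a B(n, p) count."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))


# Assumption: acceptance probability 0.75, from "about 75% ... will accept".
accepts_1200 = binomial_normal_approx(1200, 0.75)
p_at_least_800 = 1 - accepts_1200.cdf(800)
p_more_than_950 = 1 - accepts_1200.cdf(950)

# Same question after raising the number of offers to 1,300.
accepts_1300 = binomial_normal_approx(1300, 0.75)
p_more_than_950_with_1300 = 1 - accepts_1300.cdf(950)

print(accepts_1200.mean, accepts_1200.stdev)
print(p_at_least_800, p_more_than_950, p_more_than_950_with_1300)
```

Changing `n` or `p` in the helper call is enough to rerun the calculation under a different admission policy or acceptance rate.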
12. Here is a simple probability model for multiple-choice tests.
Suppose that each student has probability
p of correctly answering a question chosen at random from a
universe of possible questions. (A strong
student has a higher p than a weak student.) The correctness of
an answer to a question is independent
of the correctness of answers to other questions. Jodi is a good
student for whom p = 0.88. You must
show your work in answering the following questions. (Total 5
points)
a. Use the Normal approximation to find the probability that
Jodi scores 85% or lower on a 100-
question test. (1 point)
b. If the test contains 250 questions, what is the probability that
Jodi will score 85% or lower? (1
point)
c. How many questions must the test contain in order to reduce
the standard deviation of Jodi’s
proportion of correct answers to half its value for a 100-item
test? (2 points)
d. Lisa is a weaker student for whom p = 0.72. Does the answer
you gave in part c for the standard
deviation of Jodi’s score apply to Lisa’s standard deviation
also? Why or why not? (1 point)
13. According to genetic theory, the blossom color in the
second generation of a certain cross of sweet
peas should be red or white in a 3:1 ratio. That is, each plant
has probability ¾ of having red
blossoms, and the blossom colors of separate plants are
independent. Show your work. (3 points)
a. What is the probability that exactly 9 out of 12 of these
plants have red blossoms? (1 point)
b. What is the mean number of red-blossomed plants when 120
plants of this type are grown from
seeds? (1 point)
c. What is the probability of obtaining at least 80 red-blossomed
plants when 120 plants are grown
from seeds? (1 point)
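For Problem 13 the binomial probabilities can also be computed exactly rather than approximated. The Python sketch below is an illustration (the helper name `binom_pmf` is made up); it evaluates the binomial probability formula with `math.comb` and sums the upper tail for the 120-plant case.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_red = 0.75  # probability 3/4 of red blossoms, from the 3:1 ratio

# Exactly 9 red-blossomed plants out of 12.
p_exactly_9_of_12 = binom_pmf(9, 12, p_red)

# Mean number of red-blossomed plants among 120: n * p.
mean_red_of_120 = 120 * p_red

# At least 80 red-blossomed plants out of 120: sum the exact upper tail.
p_at_least_80_of_120 = sum(binom_pmf(k, 120, p_red) for k in range(80, 121))

print(p_exactly_9_of_12, mean_red_of_120, p_at_least_80_of_120)
```

The exact tail sum is a useful check on a hand-computed Normal approximation, which should land close to it when np and n(1 - p) are both large.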
COLLDIST.dta
COLLDIST_Description.pdf
Documentation for CollegeDistance Data
These data are taken from the High School and Beyond survey
conducted by the
Department of Education in 1980, with a follow-up in 1986.
The survey included
students from approximately 1100 high schools.
The data used here were supplied by Professor Cecilia Rouse of
Princeton University and
were used in her paper “Democratization or Diversion? The
Effect of Community
Colleges on Educational Attainment,” Journal of Business and
Economic Statistics, April
1995, Vol. 12, No. 2, pp 217-224.
The data in CollegeDistance exclude students in the western
states. The data in
CollegeDistanceWest include only those students in the
western states.
Series in Data Set
Name      Description
ed        Years of Education Completed (See below)
female    1 = Female / 0 = Male
black     1 = Black / 0 = Not-Black
Hispanic  1 = Hispanic / 0 = Not-Hispanic
bytest    Base Year Composite Test Score. (These are achievement tests given to high school seniors in the sample)
dadcoll   1 = Father is a College Graduate / 0 = Father is not a College Graduate
momcoll   1 = Mother is a College Graduate / 0 = Mother is not a College Graduate
incomehi  1 = Family Income > $25,000 per year / 0 = Income ≤ $25,000 per year
ownhome   1 = Family Owns Home / 0 = Family Does not Own Home
urban     1 = School in Urban Area / 0 = School not in Urban Area
cue80     County Unemployment rate in 1980
stwmfg80  State Hourly Wage in Manufacturing in 1980
dist      Distance from 4yr College in 10's of miles
tuition   Avg. State 4yr College Tuition in $1000's
Years of Education: Rouse computed years of education by
assigning 12 years to all
members of the senior class. Each additional year of secondary
education counted as
one year. Students with vocational degrees were assigned 13
years, AA degrees were
assigned 14 years, BA degrees were assigned 16 years, those
with some graduate
education were assigned 17 years, and those with a graduate
degree were assigned 18
years.