Quantitative Methods for Lawyers - Class #15 - R Boot Camp - Part 2 - Professor Daniel Martin Katz
1. Quantitative Methods for Lawyers
Statistical Tests Using R
R Boot Camp - Part 2
Class #15
@computational
computationallegalstudies.com
Professor Daniel Martin Katz, danielmartinkatz.com
lexpredict.com, slideshare.net/DanielKatz
2. My Challenge to You
Use R to Download and Clean this Simple Dataset
10. # Here is the Data - It Looks Okay
# Here is the Problem - Our Data are Factors not Numeric
# Thus we get this when trying to calculate a mean
We have two problems -
(1) the fact that our data are non-numeric
(2) and the commas
# Okay This Is What We Need
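The two problems above can be handled in one pass. The sketch below is only an illustration: the slides' dataset is not reproduced here, so `raw` is a hypothetical stand-in for a downloaded column whose numbers arrived as a factor with comma separators.

```r
# Hypothetical stand-in for the downloaded column (assumption, not the slides' data)
raw <- factor(c("1,200", "3,450", "980"))

# mean(raw) would return NA with a warning, because a factor is not numeric

# Fix both problems: factor -> character, strip the commas, then character -> numeric
clean <- as.numeric(gsub(",", "", as.character(raw)))

is.numeric(clean)  # TRUE
mean(clean)        # now a real mean can be calculated
```

Converting through `as.character()` first matters: calling `as.numeric()` directly on a factor returns the internal level codes rather than the values shown on screen.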
15. Binomial Distribution
“A binomial experiment (also known as a Bernoulli trial) is a statistical experiment that has the following properties:
The experiment consists of n repeated trials.
Each trial can result in just two possible outcomes.
The probability of success, denoted by P, is the same on every trial.
The trials are independent.”
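Under these four properties, the probability of exactly x successes in n trials is choose(n, x) * p^x * (1 - p)^(n - x). As a rough check, the sketch below compares that formula with R's built-in `dbinom()`. The values n = 10, p = 0.3 and x = 4 are arbitrary assumptions chosen for illustration; they do not come from the slides.

```r
# Arbitrary illustration values (assumptions, not from the slides)
n <- 10   # number of trials
p <- 0.3  # probability of success on each trial
x <- 4    # number of successes

# Binomial probability written out by hand
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same probability from R's built-in binomial density
by_dbinom <- dbinom(x, size = n, prob = p)

all.equal(by_formula, by_dbinom)  # TRUE
```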
16. Example: Coin Flip Nostradamus
Predicting Coin Flips -
Does Your Friend Have the General Ability to Actually Predict Coin Flips?
How Would You Evaluate This Proposition?
How Many Predictions Would Your Friend Have to Get Right For You To Believe They Actually Have Real Ability?
17. Example: Coin Flip Nostradamus
H0: Cannot Actually Predict Coin Flips
H1: Can Actually Predict Coin Flips (i.e. do so at a rate greater than chance)
H0 is the Null Hypothesis
H1 is the Alternative Hypothesis
18. Reject the Null versus Failing to Reject the Null
If We Fail to Reject the Null, we are left with the assumption of no relationship
In the Coin Flip Example, We might have enough evidence to reject the null
Remember the default (null) is that there is no relationship, Although a Relationship might actually exist
19. Example: Coin Flip Nostradamus
If He Were Guessing - what is the Probability Coin Flip Nostradamus Predicts at least 3 of 4 Coin Tosses?
x (number of successes) = 3 or 4
n (number of trials) = 4
p (probability of success) = 1/2
24. Example: Coin Flip Nostradamus
If He Were Guessing - what is the Probability Coin Flip Nostradamus Predicts at least 3 of 4 Coin Tosses?
#Here We Get Only For X=3
#Now We Get a Vector if X=3, X=4
#Now We Get The Sum of X=3, X=4
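The R output that accompanied these three comments did not survive in this transcript. A plausible reconstruction, assuming the slides used `dbinom()` with x = 3 or 4, n = 4 and p = 1/2, looks like this:

```r
# Here we get only X = 3
only_three <- dbinom(3, size = 4, prob = 0.5)
only_three      # 0.25

# Now we get a vector for X = 3 and X = 4
three_and_four <- dbinom(3:4, size = 4, prob = 0.5)
three_and_four  # 0.2500 0.0625

# Now we get the sum of X = 3 and X = 4
at_least_three <- sum(three_and_four)
at_least_three  # 0.3125
```

So a pure guesser gets at least 3 of 4 right about 31% of the time, which is far too often for that result to count as evidence of real ability.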
27. Does 30 heads in 50 flips imply an unfair coin?
Assuming a Fair Coin - what is the 95% Confidence Interval for the Number of Heads in 50 flips?
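One way to answer this in R, offered as a sketch because the slide's own code is not in the transcript, is to ask `qbinom()` for the 2.5% and 97.5% quantiles of the number of heads under a fair coin and then check whether 30 falls between them.

```r
# Central 95% range for the number of heads in 50 flips of a fair coin
interval <- qbinom(c(0.025, 0.975), size = 50, prob = 0.5)
interval  # 18 32

# Is the observed count of 30 heads inside that range?
inside <- interval[1] <= 30 && 30 <= interval[2]
inside    # TRUE
```

Because 30 heads sits inside the range a fair coin typically produces, this result alone does not justify rejecting the hypothesis that the coin is fair.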
33. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question.
Using random guessing, what is the probability of getting exactly 7 questions correct?
x (number of successes) = 7
n (number of trials) = 15
p (probability of success) = 1/5
This is the exact probability for 7
But What About 7 or Greater?
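The exact probability for 7 can be sketched with `dbinom()`, using the x, n and p values given on this slide. The call itself is an assumption about what the slide showed, since the output image is missing from the transcript.

```r
# Probability of exactly 7 correct out of 15 when each guess has a 1/5 chance
exactly_seven <- dbinom(7, size = 15, prob = 1/5)
round(exactly_seven, 4)  # about 0.0138
```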
34. Imagine that I gave out a 15 question multiple choice test with 5 possible answers per question.
Using random guessing, what is the probability of getting 7 or more questions correct?
This is our prior answer
Here we are summing 7:15
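Summing over 7:15 can be sketched as follows. The second line is an equivalent cross-check using the cumulative function `pbinom()`; it is an addition for illustration and is not shown on the slide.

```r
# Sum the exact probabilities for 7, 8, ..., 15 correct answers
seven_or_more <- sum(dbinom(7:15, size = 15, prob = 1/5))

# Cross-check: 1 minus the probability of 6 or fewer correct
via_pbinom <- 1 - pbinom(6, size = 15, prob = 1/5)

round(seven_or_more, 4)               # about 0.0181
all.equal(seven_or_more, via_pbinom)  # TRUE
```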
40. Imagine that a population of 100 students takes a test with an average score of 78 and a standard deviation of 9.
Assuming the test scores are normally distributed, how many students received a 90 or higher?
This is the Syntax: pnorm(q, mean = , sd = , lower.tail = TRUE)
What We Want: pnorm(90, mean = 78, sd = 9, lower.tail = FALSE)
because we want the upper tail
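Running that call and scaling by the 100 students gives the expected head count. The multiplication step is a small addition here to finish answering "how many students".

```r
# Share of a Normal(78, 9) population scoring 90 or higher (upper tail)
share_90_plus <- pnorm(90, mean = 78, sd = 9, lower.tail = FALSE)
round(share_90_plus, 4)  # about 0.0912

# Expected number of students out of 100
expected_students <- 100 * share_90_plus
round(expected_students) # about 9
```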
42. In the 2011-2012 year the national average on the LSAT was 150.66 with a Standard Deviation of 10.19
Assuming those scores are normally distributed, what percentage of test takers scored 160 or above?
http://www.lsac.org/docs/default-source/research-%28lsac-resources%29/tr-12-03.pdf
Table 1 on Page 9
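The same upper-tail `pnorm()` pattern applies here. This is a sketch under the slide's own normality assumption, using the mean and standard deviation quoted above.

```r
# Share of test takers at 160 or above under Normal(150.66, 10.19)
share_160_plus <- pnorm(160, mean = 150.66, sd = 10.19, lower.tail = FALSE)

# Express as a percentage
round(100 * share_160_plus, 1)  # roughly 18 percent
```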
46. H0: There is No Difference Between the Mean Damage Award in Bloom County and the Mean Damage Award in the Rest of the State
                          Num. of Obs.   Mean        Std. Dev.
GROUP 1 (Rest of State)   21             $371,621    $289,823
GROUP 2 (Bloom County)    25             $547,784    $703,314
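The transcript does not show which test the slides apply to this table. As one possible approach, the sketch below computes a two-sample t statistic directly from the summary figures, assuming Welch's unequal-variance version because the two standard deviations differ so much. The variable names are illustrative.

```r
# Summary statistics from the table above
n1 <- 21; m1 <- 371621; s1 <- 289823  # Rest of State
n2 <- 25; m2 <- 547784; s2 <- 703314  # Bloom County

# Standard error of the difference in means (Welch: variances not pooled)
se <- sqrt(s1^2 / n1 + s2^2 / n2)

# t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (m2 - m1) / se
df <- se^4 / ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)

round(c(t = t_stat, df = df, p = p_value), 3)
```

Under these assumptions the p-value is well above 0.05, so the test would fail to reject H0 despite the large gap between the two sample means.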
52.
                     Male   Female   Totals
Not Research Asst     319      323      642
Research Assistant     60       34       94
Total                 379      357      736
RA’s Hired at a School are mostly Men
60 out of 94 RA’s are Men (See Above)
Could this just be chance or is it too large to be explained by chance?
Chi-Square (χ²) Statistic
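A minimal sketch of the chi-square test on this table uses R's `chisq.test()`. The matrix layout and names are illustrative choices. Note that `chisq.test()` applies Yates' continuity correction to 2x2 tables by default, which may differ from a hand calculation.

```r
# 2x2 table of counts (matrix() fills column by column: Male first, then Female)
ra <- matrix(c(319, 60, 323, 34), nrow = 2,
             dimnames = list(Role = c("Not Research Asst", "Research Assistant"),
                             Sex  = c("Male", "Female")))

# Chi-square test of independence between role and sex
result <- chisq.test(ra)

result$statistic  # chi-square statistic (with continuity correction)
result$p.value    # probability of a gap this large arising by chance alone
```

Under these assumptions the p-value falls below 0.05, which suggests the imbalance is larger than chance alone would comfortably explain.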