R introduction v2
1. THE WRIGHT LAB COMPUTATION LUNCHES
An introduction to R
Gene expression from DEVA to differential expression
Handling massively parallel sequencing data
28. Exercise: Make a scatterplot of weight and horn length in green unicorns.
Write all code in the unicorn_analysis.R script.
Save the plots as variables so you can refer back to them.
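Here is a minimal sketch of the exercise. It assumes the unicorn data sit in a data frame called `data` with columns `weight`, `horn.length` and `colour`. The column names come from the model formulas later on. The values below are made up and only stand in for the real data. The sketch uses the full `ggplot()` form; `qplot(x=weight, y=horn.length, data=green)` should give the same scatterplot. Assigning the plot to a variable stores it without drawing it.

```r
library(ggplot2)

# Made-up toy data standing in for the unicorn data set (hypothetical values)
data <- data.frame(
  weight      = c(250, 310, 280, 400, 360, 290),
  horn.length = c(28, 30, 29, 33, 31, 27),
  colour      = c("green", "green", "green", "pink", "green", "pink")
)

# Keep only the green unicorns
green <- subset(data, colour == "green")

# Assigning the plot to a variable stores it without drawing it
weight.horn.plot <- ggplot(green, aes(x = weight, y = horn.length)) +
  geom_point()

# Print the variable whenever you want to see the plot again
print(weight.horn.plot)
```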
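32. Small multiples
split the plot into multiple subplots, one per group
useful for spotting patterns that differ between groups
qplot(x=x.var, y=y.var, data=data, facets=~variable)
one panel per level of variable, wrapped into rows
qplot(x=x.var, y=y.var, data=data, facets=variable1~variable2)
a grid: rows by variable1, columns by variable2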
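The same two layouts can also be written with `ggplot()` and the explicit facet functions. `facet_wrap()` matches the one-sided formula and `facet_grid()` matches the two-sided one. This is a sketch on made-up data with hypothetical `diet` and `colour` columns:

```r
library(ggplot2)

# Made-up toy data (hypothetical values, for illustration only)
data <- data.frame(
  weight      = c(250, 310, 280, 400, 360, 290, 330, 270),
  horn.length = c(28, 30, 29, 33, 31, 27, 30, 26),
  colour      = rep(c("green", "pink"), each = 4),
  diet        = rep(c("flowers", "candy"), times = 4)
)

# One-sided formula: one panel per colour, wrapped (like facets=~colour)
wrapped <- ggplot(data, aes(x = weight, y = horn.length)) +
  geom_point() +
  facet_wrap(~colour)

# Two-sided formula: rows by diet, columns by colour (like facets=diet~colour)
gridded <- ggplot(data, aes(x = weight, y = horn.length)) +
  geom_point() +
  facet_grid(diet ~ colour)

print(wrapped)
print(gridded)
```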
33. Exercise: Again, make a boxplot of diet and horn length, but separated into small multiples by colour.
43. model <- lm(horn.length ~ weight, data=data)
summary(model)

Call:
lm(formula = horn.length ~ weight, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-6.5280 -2.0230 -0.1902  2.5459  7.3620

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.41236    3.85774   5.291 1.76e-05 ***
weight       0.03153    0.01093   2.886  0.00793 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.447 on 25 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.2499,  Adjusted R-squared:  0.2199
F-statistic: 8.327 on 1 and 25 DF,  p-value: 0.007932
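A quick arithmetic check shows how some of the numbers in the summary relate to each other. It uses only values copied from the output of slide 43. The results should agree with the printed ones up to rounding, because the printout shows rounded estimates.

```r
# Numbers copied from the summary(model) output on slide 43
est    <- c(intercept = 20.41236, weight = 0.03153)
se     <- c(intercept = 3.85774,  weight = 0.01093)
r2     <- 0.2499
f.stat <- 8.327
df.res <- 25   # residual degrees of freedom
n.coef <- 2    # intercept + slope

# t value = Estimate / Std. Error
t.values <- est / se
print(round(t.values, 3))

# With a single predictor, the overall F-statistic is the slope's t value squared
print(round(t.values[["weight"]]^2, 2))

# R-squared can be recovered from F and the residual df
print(round(f.stat / (f.stat + df.res), 4))

# Observations used = residual df + number of coefficients
n <- df.res + n.coef
print(n)

# Adjusted R-squared penalises R-squared for the number of predictors
adj.r2 <- 1 - (1 - r2) * (n - 1) / df.res
print(round(adj.r2, 4))
```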
44. What have we actually fitted?
model.matrix(horn.length ~ weight, data)
What were the results?
coef(model)
How uncertain?
confint(model)
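Here is a sketch of the three calls on a small made-up data set, so it runs on its own. The values are hypothetical and include one missing horn length.

```r
# Made-up toy data with one missing horn length (hypothetical values)
data <- data.frame(
  weight      = c(250, 310, 280, 400, 360, 290, 330, 270),
  horn.length = c(28, 30, NA, 33, 31, 27, 30, 26)
)

model <- lm(horn.length ~ weight, data = data)

# The design matrix: a column of 1s for the intercept plus the weight column.
# With R's default na.action (na.omit), the row with a missing response is
# dropped here too, just as it is when fitting the model
X <- model.matrix(horn.length ~ weight, data)
print(X)

# Point estimates of the intercept and slope
print(coef(model))

# 95% confidence intervals for the coefficients
print(confint(model))
```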
45. Plotting the model
regression equation y = a + b x
a is the intercept
b is the slope of the line
pull out coefficients with coef( )
a plot with two layers: scatterplot
with added geom_abline( )
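Here is a sketch of the two-layer plot, using made-up data in place of the unicorn data. It fits the model, pulls a and b out with `coef()`, and passes them to `geom_abline()`. The variable names `a`, `b` and `model.plot` are for illustration only.

```r
library(ggplot2)

# Made-up toy data (hypothetical values, for illustration only)
data <- data.frame(
  weight      = c(250, 310, 280, 400, 360, 290, 330, 270),
  horn.length = c(28, 30, 29, 33, 31, 27, 30, 26)
)

model <- lm(horn.length ~ weight, data = data)

# Pull out a (intercept) and b (slope) from the fitted model
a <- coef(model)[["(Intercept)"]]
b <- coef(model)[["weight"]]

# Layer 1: scatterplot; layer 2: the regression line y = a + b x
model.plot <- ggplot(data, aes(x = weight, y = horn.length)) +
  geom_point() +
  geom_abline(intercept = a, slope = b)

print(model.plot)
```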
47. A regression diagnostic
the linear model rests on several
assumptions, in particular linearity and
equal error variance
the residuals vs fitted plot can help spot
gross deviations from these
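Here is a sketch of the residuals vs fitted plot on made-up data. It uses base R graphics rather than ggplot2. Calling `plot()` on an `lm` object with `which = 1` draws the plot directly. The manual version shows what goes on each axis. Curvature around zero hints at non-linearity, and a funnel shape hints at unequal variance.

```r
# Made-up toy data (hypothetical values, for illustration only)
data <- data.frame(
  weight      = c(250, 310, 280, 400, 360, 290, 330, 270),
  horn.length = c(28, 30, 29, 33, 31, 27, 30, 26)
)
model <- lm(horn.length ~ weight, data = data)

# Base R's built-in diagnostic: plot 1 is residuals vs fitted values
plot(model, which = 1)

# The same plot by hand, with a dashed reference line at zero
diagnostics <- data.frame(fitted = fitted(model), residual = residuals(model))
plot(residual ~ fitted, data = diagnostics)
abline(h = 0, lty = 2)
```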
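52. We want a two-way anova with an F-test for diet.
model.int <- aov(weight ~ diet * colour, data=data)
drop1(model.int, test="F")
drop1 respects marginality: with diet:colour in the model it only tests
the interaction, so to test diet itself, refit without the interaction.
model.add <- aov(weight ~ diet + colour, data=data)
drop1(model.add, test="F")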
Single term deletions
Model:
weight ~ diet + colour
       Df Sum of Sq   RSS    AIC F value  Pr(>F)
<none>              85781 223.72
diet    1     471.1 86252 221.87  0.1318 0.71975
colour  1   13479.7 99260 225.66  3.7714 0.06396 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
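Here is a sketch on simulated data that shows which rows `drop1()` returns for each model. The data set, effect size and seed are all made up. The interaction model yields only a `diet:colour` row, while the additive model yields one row per main effect.

```r
set.seed(1)
# Made-up toy data: 2 diets x 2 colours, 6 unicorns per cell (hypothetical)
data <- expand.grid(diet   = c("flowers", "candy"),
                    colour = c("green", "pink"),
                    rep    = 1:6)
data$weight <- 300 + 20 * (data$colour == "pink") + rnorm(nrow(data), sd = 30)

# With the interaction in the model, drop1() only offers to drop diet:colour
model.int <- aov(weight ~ diet * colour, data = data)
int.table <- drop1(model.int, test = "F")
print(int.table)

# Without the interaction, each main effect gets its own F-test
model.add <- aov(weight ~ diet + colour, data = data)
add.table <- drop1(model.add, test = "F")
print(add.table)
```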
56. Black magic
none of this is really R’s fault, but these are things
that come up along the way
missing values: na.rm=TRUE, na.exclude( )
type I, II and III sums of squares in anova
floating-point arithmetic, e.g. sin(pi) is not exactly 0
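Here are small sketches of the three items, on made-up toy data. `anova()` on an `lm` fit gives sequential (type I) sums of squares. Type II and III tests are commonly obtained with `Anova()` from the add-on car package, which is not used here.

```r
# 1. Missing values: most summaries return NA unless told to skip them
x <- c(2, 4, NA, 6)
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 4

# na.exclude keeps residuals aligned with the original rows (padded with NA),
# whereas the default na.omit simply drops the incomplete row
d <- data.frame(y = c(1.1, 2.3, NA, 3.8, 5.2), x = 1:5)
fit.omit    <- lm(y ~ x, data = d)
fit.exclude <- lm(y ~ x, data = d, na.action = na.exclude)
length(residuals(fit.omit))     # 4
length(residuals(fit.exclude))  # 5, with NA in position 3

# 2. anova() uses sequential (type I) sums of squares, so with an
#    unbalanced design the order of the terms changes the table
u <- data.frame(
  diet   = factor(c("a", "a", "a", "b", "b", "a", "b")),
  colour = factor(c("g", "g", "p", "p", "p", "p", "g")),
  y      = c(5.1, 4.8, 6.9, 7.4, 7.9, 6.2, 5.5)
)
ss.diet.first  <- anova(lm(y ~ diet + colour, data = u))["diet", "Sum Sq"]
ss.diet.second <- anova(lm(y ~ colour + diet, data = u))["diet", "Sum Sq"]
c(ss.diet.first, ss.diet.second)   # not equal: diet and colour are unbalanced

# 3. Floating-point arithmetic: sin(pi) is tiny but not exactly zero
sin(pi)                         # about 1.2e-16
sin(pi) == 0                    # FALSE
isTRUE(all.equal(sin(pi), 0))   # TRUE: compare with a tolerance instead
```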
57. Reading
Dalgaard, Introductory Statistics with R,
electronic resource at the library
Faraway, Linear Models with R
Gelman & Hill, Data Analysis Using Regression
and Multilevel/Hierarchical Models
Wickham, ggplot2: Elegant Graphics for Data Analysis
tons of tutorials online, for instance
http://martinsbioblogg.wordpress.com/a-slightlydifferent-introduction-to-r/
58. Exercise
More (and some of the same) analysis of the
unicorn data set.
Use the R documentation and Google.
I will post solutions.