6. Get the data
We will use the Global Summary of Day (GSOD)
data from NCDC.
ftp://ftp.ncdc.noaa.gov/pub/data/gsod/
Downloading the full data set takes a lot of time.
However, we can automatically download only the
part of the data that we need.
We will show how to do it with a toy example.
Then we will use data from disk to continue.
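Selective download comes down to building one file address per station and year. A minimal sketch, assuming the server keeps one gzipped file per station and year, named USAF-WBAN-YEAR.op.gz inside a folder per year. This layout and the helper name gsodUrl are assumptions for illustration, not part of the package, and the station identifiers are made up.

```r
# Hypothetical helper: build the address of one GSOD station-year file.
# The folder and file layout is an assumption about the FTP server.
gsodUrl <- function(usaf, wban, year,
                    base = "ftp://ftp.ncdc.noaa.gov/pub/data/gsod") {
  sprintf("%s/%d/%s-%s-%d.op.gz", base, year, usaf, wban, year)
}

# Made-up station identifiers, kept as character to preserve leading zeros
toyIds <- data.frame(USAF = c("012345", "067890"),
                     WBAN = c("99999", "99999"),
                     stringsAsFactors = FALSE)

urls <- gsodUrl(toyIds$USAF, toyIds$WBAN, 2010L)
print(urls)
```

Each address could then be passed to download.file(), one station at a time.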
7. Selecting stations first
Select stations within a geographic extent
data(stations)
locsExtent <- c(0, 20, 40, 60)
stationsSelected <- stationsExtent(locsExtent, stations)
Show on a map
plot(stationsSelected[c("LON","LAT")], pch=3, cex=.5)
library(maps)
map("world", add=TRUE, interior=FALSE)
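To see what selecting by extent amounts to, here is a minimal base-R sketch on a toy station table. It assumes the extent is ordered as minimum longitude, maximum longitude, minimum latitude, maximum latitude; the toy table and its values are made up, and this is not the package's own code.

```r
# Toy station table; the real one comes from data(stations)
toyStations <- data.frame(
  ID  = c("A", "B", "C", "D"),
  LON = c(5, 25, 10, -3),
  LAT = c(50, 45, 65, 52)
)

# Extent as c(lonMin, lonMax, latMin, latMax): assumed ordering
ext <- c(0, 20, 40, 60)

# Keep only the stations whose coordinates fall inside the box
inside <- toyStations$LON >= ext[1] & toyStations$LON <= ext[2] &
          toyStations$LAT >= ext[3] & toyStations$LAT <= ext[4]
toySelected <- toyStations[inside, ]
print(toySelected)
```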
8. Download the data
Make a working directory first.
setwd("yourFolder")
Now download the files to this working directory.
downloadGSOD(2010, 2010, stations = stationsSelected,
             silent = FALSE, tries = 2, overwrite = FALSE)
After a few downloads, kill the process by pressing “Esc”.
Inspect what you have in “yourFolder” and delete the
downloaded files.
9. Read the data into R
Copy the data we have provided you into
“yourFolder”.
The following lines will make a table and remove
missing observations.
weather <- makeTableGSOD()
weather <- na.omit(weather)
fix(weather)
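Note that na.omit() drops a whole row as soon as any one of its columns is missing. A small sketch on a made-up table (the column names are assumptions) shows the effect:

```r
# Toy weather table with gaps in different columns
toyWeather <- data.frame(
  ID   = c("A", "A", "B"),
  TMAX = c(25.1, NA, 30.4),
  PRCP = c(0, 2.5, NA)
)

clean <- na.omit(toyWeather)

print(clean)                            # only fully observed rows remain
print(nrow(toyWeather) - nrow(clean))   # number of rows dropped
```

If only one variable is needed later on, it can pay to select that column before removing missing values, so that fewer days are lost.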
10. Getting some trial data
The idea is to link weather data to crop trial data.
We get some trial data that was incorporated in
the package.
trial <- read.csv(system.file("external/trialsCA.csv",
                              package="cropData"))
locs <- read.csv(system.file("external/locationsCA.csv",
                             package="cropData"))
11. Make a quick map
stationsSelected <- stationsExtent(c(-110, -60, 5, 25), stations)
plot(stationsSelected[c("LON","LAT")], pch=3, cex=.5)
points(locs[c("LON","LAT")], pch=15)
map("world", add=TRUE, interior=FALSE)
12. Interpolation
We have already seen interpolation at work.
Now we use interpolation to estimate weather
variables for the trial locations.
The function interpolateDailyWeather()
automatically interpolates the weather
surface for each day and extracts the values
for each trial location.
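To make the idea concrete, here is a minimal sketch that estimates one weather variable for one day at two trial locations. It uses inverse distance weighting as a simple stand-in; this is an assumption for illustration and not necessarily the method used inside interpolateDailyWeather(). All values are made up.

```r
# Inverse distance weighting for one day: a stand-in, not the package's method
idw <- function(stLon, stLat, stValue, lon, lat, power = 2) {
  d <- sqrt((stLon - lon)^2 + (stLat - lat)^2)    # planar distance in degrees
  if (any(d == 0)) return(mean(stValue[d == 0]))  # location sits on a station
  w <- 1 / d^power                                # nearer stations weigh more
  sum(w * stValue) / sum(w)
}

# Toy stations with one day's maximum temperature, and toy trial locations
toySt   <- data.frame(LON = c(0, 2, 0), LAT = c(0, 0, 2), TMAX = c(20, 30, 40))
toyLocs <- data.frame(ID = c("T1", "T2"), LON = c(1, 0), LAT = c(0, 0))

toyLocs$TMAX <- mapply(
  function(x, y) idw(toySt$LON, toySt$LAT, toySt$TMAX, x, y),
  toyLocs$LON, toyLocs$LAT
)
print(toyLocs)
```

Repeating this for every day in the season gives a daily weather series per trial location.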
13. Interpolate
Interpolate weather for the years 2003, 2004 and
2005.
ipW2003 <- interpolateDailyWeather(
  tableGSOD = weatherCA,
  locations = locs[c("ID", "LON", "LAT", "ALT")],
  startDate = "2003-5-15",
  endDate = "2003-9-25",
  stations = stationsSelected)
Repeat for the other years and then combine:
ipW <- rbind(ipW2003,ipW2004,ipW2005)
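Rather than copying the call three times, the years can be looped over. The sketch below shows the pattern only: interpolateYear is a hypothetical stand-in that returns the date range, so that the code runs on its own. In practice its body would hold the interpolateDailyWeather() call shown above. Using the same start and end day in every year is also an assumption.

```r
years <- 2003:2005

# Stand-in for the interpolateDailyWeather() call; it only returns the
# season's date range so that the sketch runs without the package.
interpolateYear <- function(year) {
  startDate <- sprintf("%d-5-15", year)
  endDate   <- sprintf("%d-9-25", year)
  data.frame(YEAR = year, START = startDate, END = endDate)
}

ipWList <- lapply(years, interpolateYear)   # one result per year
ipWToy  <- do.call(rbind, ipWList)          # stack the results
print(ipWToy)
```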
14. Thermal stress
[Figure: daily temperature curve, Temperature (°C) against Time. The minimum is assumed to be at sunrise and the maximum 2 h after solar noon. Example shown: duration of T > 30 °C = 4.8 h.]
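The duration above a threshold can be approximated from the daily minimum and maximum alone. A minimal sketch, assuming a piecewise-cosine temperature curve, sunrise at 6 h, solar noon at 12 h, and a next-day minimum equal to today's. These are all assumptions for illustration, the function name hoursAbove is hypothetical, and the package function may use a different curve.

```r
# Hours per day above a threshold under an assumed piecewise-cosine curve.
# sunrise and noon are in hours; the next day's minimum is taken equal to tmin.
hoursAbove <- function(tmin, tmax, threshold, sunrise = 6, noon = 12) {
  peak <- noon + 2                                # maximum 2 h after solar noon
  h <- seq(sunrise, sunrise + 24, by = 1 / 60)    # one day in 1-minute steps
  h <- h[-length(h)]                              # drop the duplicated end point
  frac <- ifelse(h <= peak,
                 (h - sunrise) / (peak - sunrise),          # rising: 0 to 1
                 1 - (h - peak) / (sunrise + 24 - peak))    # falling: 1 to 0
  temp <- tmin + (tmax - tmin) * (1 - cos(pi * frac)) / 2
  sum(temp > threshold) / 60                      # minutes above, in hours
}

print(hoursAbove(tmin = 18, tmax = 34, threshold = 30))
```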
15. Derive ecophysiological vars
?thermalStressDaily
Run the example to see how this works.
Then:
TEMPSTRESS30 <- thermalStressSeasonal(30, ipW, trial, locs)
PREC <- precipitationSeasonal(ipW, trial)
RADIATION <- radiationSeasonal(ipW, trial, locs)
trial <- cbind(trial, TEMPSTRESS30, PREC, RADIATION)
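A seasonal variable is a daily series summarised over each trial's growing period. The sketch below sums toy daily precipitation between planting and harvest. The column names (DATE, PRCP, PLANT, HARVEST) and all values are assumptions for illustration, not the package's data layout.

```r
# Toy daily weather for one location (column names are assumptions)
daily <- data.frame(
  ID   = "L1",
  DATE = as.Date("2003-05-01") + 0:9,
  PRCP = c(0, 5, 0, 12, 3, 0, 0, 8, 1, 0)
)

# Toy trial table with hypothetical planting and harvest dates
toyTrial <- data.frame(
  ID      = "L1",
  PLANT   = as.Date("2003-05-03"),
  HARVEST = as.Date("2003-05-08")
)

# For each trial, sum the daily values that fall inside its season
toyTrial$PREC <- sapply(seq_len(nrow(toyTrial)), function(i) {
  inSeason <- daily$ID == toyTrial$ID[i] &
              daily$DATE >= toyTrial$PLANT[i] &
              daily$DATE <= toyTrial$HARVEST[i]
  sum(daily$PRCP[inSeason])
})
print(toyTrial)
```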
16. Do RDA on residuals
Instead of a normal PCA, we constrain the
axes of the PCA with linear combinations of
the ecophysiological variables.
This type of constrained PCA is called
redundancy analysis (RDA).
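RDA can be written in two steps: regress the response matrix on the constraining variables, then run a PCA on the fitted values. The base-R sketch below does this on simulated data; the variable names and the simulated relationships are made up. In practice a dedicated function such as rda() in the vegan package would be the usual choice.

```r
set.seed(1)
n <- 12                                           # toy number of locations
X <- cbind(TEMPSTRESS = rnorm(n), PREC = rnorm(n))    # constraining variables
Y <- cbind(V1 = 2 * X[, 1] + rnorm(n, sd = 0.3),      # responses, for example
           V2 = -X[, 1] + X[, 2] + rnorm(n, sd = 0.3),  # one column per variety
           V3 = rnorm(n))

# Centre both matrices column by column
Yc <- sweep(Y, 2, colMeans(Y))
Xc <- sweep(X, 2, colMeans(X))

fit    <- lm(Yc ~ Xc)                     # step 1: multivariate regression
Yhat   <- fitted(fit)                     # part of Y explained by X
rdaToy <- prcomp(Yhat, center = FALSE)    # step 2: PCA of the fitted values

# At most ncol(X) axes carry variance: these are the constrained axes
print(round(rdaToy$sdev^2, 4))
# Share of the total variance in Y captured by the constraints
constrainedShare <- sum(Yhat^2) / sum(Yc^2)
print(constrainedShare)
```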
17. Do ANOVA
m <- lm(Yield ~ Variety + Location + Plant.m2,
        data=tr2005)
The fitted effects are filtered out; the residuals keep what the model does not explain, including GxE.
tr2005$Yield <- residuals(m)
tr2005 <- tr2005[,c("Variety","Location","Yield")]
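The same filtering step can be tried on simulated data. In this sketch the trial table, its effect sizes and the object names toyTr and mToy are all made up. It fits the additive model, replaces yield by the residuals, and tabulates the mean residual per variety and location.

```r
set.seed(42)
toyTr <- expand.grid(Variety  = paste0("V", 1:3),
                     Location = paste0("L", 1:4),
                     Rep      = 1:2)
toyTr$Plant.m2 <- runif(nrow(toyTr), 4, 6)

# Simulated yield: location effect + density effect + noise (all made up)
toyTr$Yield <- 5 + as.integer(factor(toyTr$Location)) +
               0.5 * toyTr$Plant.m2 + rnorm(nrow(toyTr), sd = 0.5)

mToy <- lm(Yield ~ Variety + Location + Plant.m2, data = toyTr)
toyTr$Resid <- residuals(mToy)

# Residuals average to zero within every level of a fitted factor
locMeans <- tapply(toyTr$Resid, toyTr$Location, mean)
print(round(locMeans, 10))

# Mean residual per variety-by-location cell: the input for the later analysis
print(round(tapply(toyTr$Resid, list(toyTr$Variety, toyTr$Location), mean), 2))
```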
20. Putting GxE on map
It is possible to use the resulting RDA model to
predict for any location.
The steps would be:
1. Interpolate weather variables for the new location
2. Derive ecophysiological variables
3. Predict the yield value for this new location (not taking into account the additive environmental effect)
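Step 3 can be sketched with the regression part of the model alone. The example below fits residual yields of two toy varieties on two environmental variables and predicts for one new location. All data are simulated and the values for the new location are made up; an actual RDA model would additionally reduce the fitted values to a few axes.

```r
set.seed(7)
n <- 10
env <- data.frame(TEMPSTRESS30 = runif(n, 0, 50),
                  PREC         = runif(n, 200, 800))

# Toy residual yields, one column per variety (made-up relationships)
Yres <- cbind(V1 = -0.02 * env$TEMPSTRESS30 + rnorm(n, sd = 0.1),
              V2 = 0.001 * env$PREC + rnorm(n, sd = 0.1))
Yres <- sweep(Yres, 2, colMeans(Yres))   # centre each variety's column

fitEnv <- lm(Yres ~ TEMPSTRESS30 + PREC, data = env)

# Steps 1 and 2 would supply these values for the new location
newLoc <- data.frame(TEMPSTRESS30 = 35, PREC = 450)

# Step 3: predicted relative yield per variety at the new location
pred <- predict(fitEnv, newdata = newLoc)
print(pred)
```

Because the additive environmental effect is left out, the prediction ranks varieties at the new location rather than giving absolute yields.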
21. Final remarks
Trial data are often noisy – extracting the signal from the data is the objective.
Many environmental variables are difficult to measure, but can be taken to be “random” in the analysis.
Many statistical tools exist to link weather data to crop trial data.