Speaker: James W Jones (Jim) Emeritus Distinguished Professor, Agricultural and Biological Engineering, University of Florida.
The talk gives an overall perspective on both genetics and crop modeling, covers advanced methods for working with genetic and phenotypic data in crop models, and offers a perspective on promising future approaches.
2. Gene-Based Crop Modeling
J. W. Jones, M. J. Correll, K. J. Boote, S. Gezan, and C. E. Vallejos
CIAT
Aug 4, 2015
Source: Monica Ozores-Hampton
3. Crop models can be considered as non-linear functions
Estimate GSPs (genetic coefficients), then fit a linear statistical
model relating GSPs to QTLs
Develop new statistical linear mixed effects models of G, E, and
GxE for different processes
• E.g., flowering date, node addition rate, leaf size, max number of
MS nodes, …
Integrate new relationships into existing DSSAT CROPGRO-
Bean model
Develop component process modules using linear or nonlinear
mixed effects models of traits vs. QTLs and environmental
factors, combine them to demo modular approach
Future: compare genomic prediction for beans, similar to
Technow et al. (PLoS ONE, 2015)
Discussion
Outline: Our Work in Modeling
4. Dynamic Crop Models
Dynamic, variables of interest change over time (state variables)
Environment also changes over time
System of equations & not just a single variable to predict
Variables interact, typically in highly non-linear ways, varying
over time
There is not a single equation to calculate the response that one
is interested in (e.g., final yield of a crop)
Final yield (and other variables) may reach their final values in
many different ways, depending on genetics and environment
5. General Form of a Dynamic System Model
Discrete Time/Difference Equation
Difference equation form, when time step
equals 1 (e.g., 1 day):
$U_{1,t+1} = U_{1,t} + g_1[U_t, X_t, \theta]$
$U_{2,t+1} = U_{2,t} + g_2[U_t, X_t, \theta]$
$\vdots$
$U_{S,t+1} = U_{S,t} + g_S[U_t, X_t, \theta]$
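The difference-equation form above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two state variables, the rate functions `g_nodes` and `g_leaf_area`, and all parameter values are hypothetical and are not taken from any DSSAT model.

```python
def simulate(u0, rate_fns, env, theta):
    """Iterate U[t+1] = U[t] + g(U[t], X[t], theta) with a time step of 1 day.

    u0       : initial values of the S state variables
    rate_fns : one rate function g_i(U_t, X_t, theta) per state variable
    env      : sequence of daily explanatory variables X_t
    theta    : model parameters
    """
    trajectory = [list(u0)]
    for x_t in env:
        u_t = trajectory[-1]
        # Every rate is evaluated from the state at time t before any update.
        u_next = [u + g(u_t, x_t, theta) for u, g in zip(u_t, rate_fns)]
        trajectory.append(u_next)
    return trajectory


# Hypothetical rate functions, for illustration only.
def g_nodes(u, x, theta):
    return theta["max_node_rate"] * x["temp_factor"]


def g_leaf_area(u, x, theta):
    # Leaf area growth depends on the current node number: states interact.
    return theta["area_per_node"] * u[0] * x["temp_factor"]


env = [{"temp_factor": 1.0}, {"temp_factor": 0.5}, {"temp_factor": 1.0}]
theta = {"max_node_rate": 0.3, "area_per_node": 2.0}
traj = simulate([1.0, 0.0], [g_nodes, g_leaf_area], env, theta)
```

Because each rate function sees the whole state vector, the second state variable responds to the first, which is the kind of interaction the slide refers to.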
6. Example
Final yield response to all variables during a season
Y = f(X; θ)
where
X represents all explanatory variables during a season,
θ represents all parameters of the dynamic model
f represents a function (typically implicit function)
• We could write this as
Y = simulated final grain biomass at harvest time, T, as affected by
explanatory variables (e.g., irrigation applied during a season) and by
all parameters
Dynamic System Model as a Response Model
8. How Simulation Computes Responses
Figure 1.3. Computer program flow diagram showing how a simulation model is used
as a function: any time a response is needed, the simulation is run to
calculate the state variables for every time step, but it returns only the value of the
selected state variable at the time of interest. In this case, we are interested in Y at time t = 140.
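A minimal sketch of this idea, assuming a toy daily model: the function below runs every time step but returns only Y at the time of interest. The growth rule and the parameter names `rue`, `flower_day`, and `partition` are hypothetical stand-ins, not the equations of a real crop model.

```python
def response(weather, theta, t_of_interest=140):
    """Y = f(X; theta): run the daily simulation, return only Y at time T."""
    if len(weather) < t_of_interest:
        raise ValueError("weather record is shorter than the time of interest")
    biomass = 0.0  # state variable: total biomass (toy units)
    grain = 0.0    # state variable: grain biomass, the response Y
    for day, x_t in enumerate(weather, start=1):
        growth = theta["rue"] * x_t["srad"]
        biomass += growth
        if day > theta["flower_day"]:
            # After flowering, a fixed fraction of growth goes to grain.
            grain += theta["partition"] * growth
        if day == t_of_interest:
            return grain  # all states were computed, only Y is returned


weather = [{"srad": 10.0} for _ in range(150)]
theta = {"rue": 1.0, "flower_day": 100, "partition": 0.5}
y_140 = response(weather, theta)
```

Wrapping the simulation this way lets it be passed to any routine that expects an ordinary function, such as an optimizer or a sensitivity analysis.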
9. Quantities in the model that represent variations in crop
performance across cultivars or lines
GSPs are the same as “cultivar coefficients” that have been used
routinely in the models contained in DSSAT
Examples
• Phenology – e.g., duration to first flower under optimal conditions
• Size of leaves on the main stem
• Maximum rate of node appearance on the main stem under optimal conditions
• Number of seeds per pod (or per ear in maize)
Must be known for each cultivar to simulate its performance
Genotype-Specific Parameters (GSPs)
10. Example, DSSAT CROPGRO-Bean Model using GSPs
[Figure: leaf, stem, or seed mass (y-axis) vs. days after sowing (x-axis). Simulated lines and observed points (Obs Leaf, Obs Stem, Obs Seed) for Jatu-Rong and Porrillo S., with markers for Flw, Sd, and R7 for each cultivar.]
Figure 6. Time course of leaf, stem, and seed mass accumulation of Jatu-Rong (Andean)
and Porrillo Sintetico (Meso-American) cultivars relative to time of first flower (Flw),
first seed (Sd), and beginning maturity (R7), grown at Palmira, Colombia (data from Sexton et al., 1994, 1997).
11. Application of Crop Models
[Diagram boxes: Genotypes; GSPs; Environment, Management Data; Bean Crop Model; Sim Phenotypic Responses; G, M Selection for Optimal Responses; loop label: Iterative Exploration]
12. TRIFL is a GSP in the existing bean model
TRIFL is the maximum rate of node appearance on the main stem,
number per day
Temperature has a major effect on how rapidly new nodes appear on
the main stem
The model* in the DSSAT common bean model is:
GSP Example - TRIFL
$$NAR(t) = TRIFL \cdot \frac{1}{24} \sum_{h=1}^{24} \frac{T_h^* - T_{base}}{T_{opt1} - T_{base}}$$
where
NAR(t) = rate of new node or leaf appearance on the main stem on day t, #/day,
TRIFL = maximum node/main stem leaf addition rate, number per day,
Tbase = base temperature, below which the rate is 0.0, °C,
Topt1 = temperature above which node addition rate remains at its maximum value, °C,
Thour = hourly temperature in the field where the crop is growing, °C, and
$$T_h^* = \begin{cases} T_{base} & \text{if } T_{hour} < T_{base} \\ T_{hour} & \text{if } T_{base} \le T_{hour} \le T_{opt1} \\ T_{opt1} & \text{if } T_{opt1} < T_{hour} \end{cases}$$
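The TRIFL relationship can be sketched as a small function. It assumes that the hourly temperature factor is averaged over the 24 hours of day t, as the 1/24 factor suggests. The function name and the numeric values used in the example (TRIFL = 0.35, Tbase = 10, Topt1 = 25) are illustrative assumptions, not values from the DSSAT species or cultivar files.

```python
def node_addition_rate(hourly_temps, trifl, t_base, t_opt1):
    """Daily node addition rate (#/day) from 24 hourly temperatures (deg C)."""
    if len(hourly_temps) != 24:
        raise ValueError("expected 24 hourly temperatures")
    total = 0.0
    for t_hour in hourly_temps:
        # Clamp the hourly temperature to [t_base, t_opt1]: this is T_h*.
        t_star = min(max(t_hour, t_base), t_opt1)
        total += (t_star - t_base) / (t_opt1 - t_base)
    return trifl * total / 24.0


# Illustrative (assumed) parameter values.
TRIFL, T_BASE, T_OPT1 = 0.35, 10.0, 25.0

nar_warm = node_addition_rate([30.0] * 24, TRIFL, T_BASE, T_OPT1)  # every hour above Topt1
nar_cold = node_addition_rate([5.0] * 24, TRIFL, T_BASE, T_OPT1)   # every hour below Tbase
nar_mid = node_addition_rate([17.5] * 24, TRIFL, T_BASE, T_OPT1)   # halfway between
```

With every hour above Topt1 the rate equals TRIFL, and with every hour below Tbase it is zero, which matches the definitions of the two temperature parameters.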
13. TRIFL is a GSP
Tbase and Topt1 are not GSPs, but are species-dependent
parameters in the current bean model
Also, TRIFL has been held fixed across cultivars in the past due
to lack of information
We now know that TRIFL varies significantly across lines/cultivars,
based on our NSF study
What about Tbase and Topt1?
Example will be given later in the week on how this new information
is affecting how we model beans
TRIFL Example (continued)
14. Data are needed for each cultivar or genotype
In our NSF study, we had over 180 genotypes, and for each of them,
we had observations in the field at 5 locations
These data were used to estimate GSPs, as will be shown later in
the workshop
The basic idea is that we use the multi-location experiment
phenotypic data:
• Set initial GSPs as input to the simulation,
• Compare simulated and observed phenotypic data,
• Compute a measure of how close the simulated phenotypic data are to the observed data,
• Vary the GSPs and search the range of feasible values until a criterion is met,
such as minimizing the sum of squared differences (errors) (e.g., MSE basis)
or maximizing a likelihood function
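The steps above can be sketched with a deliberately simple search. This is an illustration only: the one-line "crop model", the site data, and the use of a grid search are assumptions made here for brevity; the study itself may use other optimizers or Bayesian MCMC.

```python
def simulate_node_number(gsp, temp_factors):
    """Toy stand-in for a crop model: nodes added over a run of days."""
    return sum(gsp * f for f in temp_factors)


def sum_squared_error(gsp, sites):
    """Closeness measure: sum of squared (simulated - observed) over sites."""
    return sum(
        (simulate_node_number(gsp, s["temp_factors"]) - s["observed"]) ** 2
        for s in sites
    )


def estimate_gsp(sites, lower, upper, steps=500):
    """Search the feasible range [lower, upper] for the GSP minimizing SSE."""
    best_gsp, best_err = None, None
    for i in range(steps + 1):
        candidate = lower + (upper - lower) * i / steps
        err = sum_squared_error(candidate, sites)
        if best_err is None or err < best_err:
            best_gsp, best_err = candidate, err
    return best_gsp, best_err


# Hypothetical multi-location data for one genotype (made-up numbers).
sites = [
    {"temp_factors": [1.0] * 20, "observed": 6.0},
    {"temp_factors": [0.8] * 25, "observed": 6.0},
    {"temp_factors": [0.6] * 30, "observed": 5.4},
]
gsp_hat, sse_hat = estimate_gsp(sites, lower=0.1, upper=0.6)
```

The made-up observations were generated with a GSP of 0.30, so the search should land very close to that value. Replacing `sum_squared_error` with a negative log-likelihood would give the likelihood-based variant of the same loop.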
Estimating GSPs
15. GSP Estimation: Various Approaches, including
Bayesian MCMC for Model Development, Genomic
Prediction, etc.
[Diagram boxes: RILs; Multi-Location Experiments; Phenotypic Data; QTLs (~traits); Environment, Management Data; GSPs; Bean Crop Model; Sim Phenotypic Responses; Error/Likelihood; GSP* & QTL effects; loop label: Iterative Estimation]
16. Adding Genetic Information for Application of Crop
Models (Ideotype Design, Selection of G, M for E,
Genomic Prediction)
[Diagram boxes: Genotypes; QTLs; GSPs; Environment, Management Data; Bean Crop Model; Sim Phenotypic Responses; G, M Selection for Optimal Responses; loop label: Iterative Exploration]
17. Current approaches – develop relationships between GSPs
and QTLs (e.g., White and Hoogenboom, 1996, 2003;
Messina et al., 2006; etc.)
Why not continue this?
• Current models do not include GSPs for all processes and traits that
we now know are under genetic control (examples from this study)
• May need to modify environmental effects, interactions, in the model
• Current crop models are not ideally structured to make all of the
changes that are needed.
• Major changes are likely needed in many places, although some
code may be reusable
• Although some existing crop models are modular, new fine-grained modules are
needed that are designed around what we are now learning about
genetic control of processes, and that can be easily
modified as more is learned
Need for a new gene-based model
18. Example Results After Incorporating* Gene-Based Component in CROPGRO-Bean
[Figure, two panels, x-axis: Days after Planting. Panel "Main Stem Node Number": Leaf number (Jamapa QTLs (-1) 0.3 m ro) and Leaf number (Calima QTLs (+1) 0.3 m ro). Panel "Biomass and Pod Mass, kg/ha": Grain wt kg/ha and Tops wt kg/ha for Jamapa QTLs (-1) and Calima QTLs (+1), 0.3 m ro.]
* Incorporated NAR to compute TRIFL only
19. Need to account for G x E x M interactions on processes
Need to design for evolution as more knowledge about
genetic effects on crop components is obtained
Example Gene-based Model of bean leaf area
Design modules with QTL effects on CM processes
Still a work in progress
New Modular Approach
20. Linear Mixed Effects Model for NAR(t)
$$\begin{aligned} NAR(t) = {} & 0.252 + 0.021 \cdot (TEMP - 21.51) - 0.005 \cdot (SRAD - 17.38) - 0.004 \cdot (DL - 12.74) \\ & - 0.010 \cdot Bng072 - 0.032 \cdot FIN + 0.009 \cdot Bng083 - 0.008 \cdot Dim7\text{-}7 \\ & - 0.004 \cdot Bng072 \cdot (DL - 12.74) - 0.003 \cdot FIN \cdot (TEMP - 21.51) \end{aligned}$$
where
Bng072 = marker for QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
Bng083 = marker for QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
DL = average daylength during the time when nodes were being added in genotype g at site s, h,
DLmean = average daylength across sites in the experiment during node addition, h,
Dim7-7 = gene or QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
FIN = gene or QTL found to influence NAR, equal to +1 for Calima and -1 for Jamapa parental lines,
NAR(t) = node addition rate, nodes per day added to the main stem for genotype g grown at site s,
SRAD = average SRAD across sites in the experiments, MJ m$^{-2}$ d$^{-1}$,
TEMP = average of daily mean temperature during the time when nodes were added, °C
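As a quick check of how the marker terms act, the fitted equation can be evaluated for the two parental lines. The coefficients and centering constants below are the ones in the equation above, read with each environmental variable centered as in its final term, (TEMP - 21.51). The function name and the dictionary layout are assumptions made for this sketch.

```python
def nar_linear(temp, srad, dl, markers):
    """Node addition rate (nodes/day) from the linear mixed effects model.

    markers: dict with keys Bng072, Bng083, Dim7_7, FIN, each +1 (Calima
    allele) or -1 (Jamapa allele).
    """
    t_c = temp - 21.51   # centered temperature, deg C
    s_c = srad - 17.38   # centered solar radiation, MJ m-2 d-1
    d_c = dl - 12.74     # centered daylength, h
    return (
        0.252
        + 0.021 * t_c
        - 0.005 * s_c
        - 0.004 * d_c
        - 0.010 * markers["Bng072"]
        - 0.032 * markers["FIN"]
        + 0.009 * markers["Bng083"]
        - 0.008 * markers["Dim7_7"]
        - 0.004 * markers["Bng072"] * d_c
        - 0.003 * markers["FIN"] * t_c
    )


JAMAPA = {"Bng072": -1, "Bng083": -1, "Dim7_7": -1, "FIN": -1}
CALIMA = {"Bng072": +1, "Bng083": +1, "Dim7_7": +1, "FIN": +1}

# Both parents evaluated at the centering values of the environment.
nar_jamapa = nar_linear(21.51, 17.38, 12.74, JAMAPA)
nar_calima = nar_linear(21.51, 17.38, 12.74, CALIMA)
# Jamapa again, 4 deg C warmer: the FIN x TEMP interaction now contributes.
nar_jamapa_warm = nar_linear(25.51, 17.38, 12.74, JAMAPA)
```

At the centering values all environmental and interaction terms vanish, so the difference between the parents comes only from the four marker main effects.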
21. NAR vs. Temperature
[Figure, two panels, y-axis: Node Addition Rate, #/d; x-axis: Temperature, °C. Panel (a), Parent Lines: Jamapa (-1) and Calima (+1). Panel (b): Jamapa (-1), Jamapa with Calima FIN, Calima (+1), Calima with Jamapa FIN.]
22. Modular Approach
Example of a module: model that computes node addition rate on day t (NAR(t))
23. We know that temperature effects on most crop
growth processes are nonlinear
Also, this linear model uses the mean temperature
during the observation period, whereas plants
respond nonlinearly to temperature, so the response should be
computed hourly
So, modules need to be dynamic and include
nonlinear effects
But, is Linear Model Adequate?
25. What are the GSPs in the above equation?
Are they constant across environments?
Does this nonlinear formulation make sense relative to physiological
process and what we know?
Is it sufficiently robust? How can we determine this?
Will the GSPs in this equation remain fixed across genotypes?
Environments? Management?
Will “calibration” be needed after fitting these equations to field
data? If so, how will this differ from what we now do?
We should formulate nonlinear models based on mechanistic
knowledge, then estimate their parameters using data from a genetic family
grown across diverse environments.
What About GSPs?
29. CommonBean Model: Integrating modules
[Flowchart: Integrated Modules, CommonBean Model]
• Initialize T storage matrix
• Initialize d = VEDAP
• Set initial state variables
• Set hour h = 1
• Read T for hour h
• Calculate & update mean T
• Hour = 24? If No: h = h + 1, then read T for the next hour
• Run MSNOD.max Module
• Run NAR Module
• Run LAMS Module
• Day (d) = End point (DAY)? If No: d = d + 1, then start the next day's hourly loop
• End CommonBean Model
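The control flow of the flowchart can be sketched as a driver loop that accumulates the daily mean temperature hour by hour and then runs the modules in order. The slide does not give the module equations, so the three module functions below are placeholders that only record that they ran. The function signatures, the `state` dictionary, and the day numbers are assumptions made for this sketch.

```python
def run_common_bean(hourly_temps_by_day, first_day, last_day, modules, state):
    """Daily loop with an inner hourly loop, following the flowchart order."""
    d = first_day                      # "Initialize d = VEDAP"
    while True:
        mean_t = 0.0
        for h in range(1, 25):         # hours 1..24
            t_hour = hourly_temps_by_day[d][h - 1]   # "Read T for hour h"
            mean_t += (t_hour - mean_t) / h          # running mean of T
        for module in modules:         # MSNOD.max, then NAR, then LAMS
            module(state, mean_t)
        if d == last_day:              # "Day (d) = End point (DAY)?"
            break
        d += 1
    return state


# Placeholder modules: real ones would update node number, leaf area, etc.
def msnod_max_module(state, mean_t):
    state["calls"].append("MSNOD.max")


def nar_module(state, mean_t):
    state["calls"].append("NAR")


def lams_module(state, mean_t):
    state["calls"].append("LAMS")
    state["mean_t"] = mean_t           # keep the last daily mean for inspection


temps = {day: [20.0] * 24 for day in range(10, 13)}   # made-up hourly data
final_state = run_common_bean(
    temps, first_day=10, last_day=12,
    modules=[msnod_max_module, nar_module, lams_module],
    state={"calls": [], "mean_t": None},
)
```

Passing the modules in as a list keeps the driver unchanged when a module is replaced or a new one is added, which is the point of the modular design.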
34. Crop Model-Based Genomic Prediction
outperforms GBLUP
QTLs estimate Yield via
Crop Model function using GSPs
QTLs estimate Yield via
GBLUP
Yield=f(4 GSPs,Env)
35. Discussion
Demonstrated benefits of merging crop modeling
and genetics
Various methods are reasonable
Need new G,E nonlinear functions estimated using
mixed effects models, physiologically based with G
and E components (management also)
Modularity is important, short and long term
Paper in Special Issue
Genomic Prediction with crop models likely to
perform better than other methods (GBLUP)