Program on Mathematical and Statistical Methods for Climate and the Earth System Opening Workshop, A Multivariate Dynamic Spatial Factor Model for Speciated Pollutants and Adverse Birth Outcomes - Montse Fuentes, Aug 23, 2017
Evidence suggests that exposure to elevated concentrations of air pollution during pregnancy may increase risks of birth defects and other adverse birth outcomes. While current regulations put limits on total PM2.5 concentrations, there are many speciated pollutants within this size class that likely have distinct effects on perinatal health. However, due to correlations between these speciated pollutants it can be difficult to decipher their effects in a model for birth outcomes. To combat this difficulty we develop a new multivariate spatio-temporal Bayesian model for speciated particulate matter using dynamic spatial factors. These spatial factors can then be interpolated to the pregnant women’s homes to be used in a birth outcomes model. The model for birth outcomes allows the impact of pollutants to vary across different weeks of the pregnancy in order to identify susceptible periods. The proposed innovative methodology is implemented using pollutant monitoring data from the Environmental Protection Agency and birth records from the National Birth Defect Prevention Study.
Work in collaboration with Kimberly Kaufeld, Brian Reich, Amy Herring, Gary Shaw and Maria Terres.
1. A Multivariate Dynamic Spatial Factor Model for
Speciated Pollutants and Adverse Birth Outcomes
Montse Fuentes
Virginia Commonwealth University
August 23, 2017
Joint with Kimberly Kaufeld, Brian Reich, Amy Herring, Gary Shaw and Maria Terres
Montse Fuentes Multivariate Factor Model for Birth Defects
2. Why Birth Defects?
Birth defects: a physical or biochemical abnormality that is
present at birth and that may be inherited or the result of
environmental influence.
Around 3% of all births result in a birth defect. Orofacial
defects are the most common.
Cause of cleft palate defects is unknown
Contributes to long-term disability, which may have significant
impacts on individuals, families, health-care systems, and
societies.
Figure: Cleft lip and palate http://www.wetherallgroup.com/cleft-palate/
3. Scientific Motivation
Maternal exposure to air pollutants has been related to
adverse birth outcomes
Preterm birth
Low birth weight
Birth defects, e.g. cleft lip/palate
Researchers believe that exposure to high concentrations of air
pollution during pregnancy may significantly increase the risk
of birth defects and other adverse birth outcomes.
4. Challenges:
Estimating mother’s exposure to pollutants with sparse
monitor locations
Current regulations limit total PM2.5 concentrations, but
many speciated pollutants within this size class
likely have varying effects on perinatal health.
Correlations between speciated pollutants make it difficult to
decipher their effects in a model for birth outcomes.
5. Background-Birth Defects and Pollution
Orofacial cleft defects (cleft palate and/or lip) appear in the first
trimester, weeks 3-8.
Orofacial cleft associations with air pollution
Gilboa et al. (2005) found a weak association between PM10
and isolated cleft lip with or without cleft palate.
Association between air pollution (CO and ozone, O3) exposure and
oral clefts (Ritz et al., 2002).
Association between traffic density and cleft lip with or
without palate (Padula et al., 2013)
6. Pollutant exposure assessment
Pollutant Data and Birth Defects
Mother’s pollutant exposure is typically assigned from the nearest air pollution
monitor (Ritz et al., 2002; Vrijheid et al., 2011).
Generally, mother’s residence is based on the home at time of
birth, which doesn’t necessarily correspond to the home during the first
trimester.
Monitoring stations assign similar exposure over large
areas, capturing only community-wide variation in air pollution.
7. Statistical Challenges
Incorporate spatial analysis of environmental health data,
typically not considered in classical birth defect epidemiological
studies.
Incorporate spatial prediction for pollution to provide
measurements at the pregnant mothers’ homes to be used in a
model for birth outcomes.
Model multivariate birth defects to include multi-pollutant
level data at the same time.
Create statistical models to identify the specific
critical windows during the pregnancy, and the spatial locations,
where high exposures to pollutants most increase the risk of
birth defects.
Characterize different sources of uncertainty in data and
models.
8. California National Birth Defects Data
Live births with and without any known birth defects, 2003-2006
Woman’s date of conception is based upon the estimated date
of delivery (the due date that the woman received from her
physician), reported at the study interview
Cases included live births, and controls are live births without
any known birth defects, identified randomly from selected
hospitals in California
Self-reported survey in which demographic and behavioral
information was collected
There were a total of 208 cleft lip or cleft palate defects
reported and 358 controls
9. Birth Defect Geocoded Data
As part of the study women reported their complete residential
history during pregnancy.
Residences were geocoded each week during the first eight
weeks of pregnancy to assign exposure levels, to account for
mobility of the mothers during the study period
Woman’s date of conception is based upon the estimated date of
delivery (the due date that the woman received from her physician),
which was reported at the study interview
Most of the data were collected in the San Joaquin Valley in
California, the initial study region
Approximately 15% of the women moved during the study to
areas outside of the initial study area, with the majority of the
women moving to the San Francisco Bay and Los Angeles areas.
10. Birth Defect Geocoded Data
Figure: Representative BD residences and pollutant monitor locations. Map of California (longitude −124 to −114, latitude 32 to 42) showing IMPROVE monitors, STN monitors, and BD residences, with Fresno, Long Beach, Los Angeles, San Diego, San Francisco, and San Jose labeled.
11. NBDPS data
Covariates include:
Sex of the infant
Maternal age classifications
19 and under (baseline)
20-24
25-29
30-34
35 and older
Prediabetes
Maternal education (high school, some college and college)
High blood pressure during pregnancy
Smoking while pregnant (Yes/No)
Alcohol use while pregnant (Yes/No)
13. Pollutant Sources
Speciated Pollutants, PM2.5
Figure: PM 2.5 components, source: Guaita et al., 2011
14. Pollution Data
The Air Quality System (AQS) monitoring data
Ambient air pollution data collected by EPA, state, local, and
tribal air pollution control agencies in California
Measurements collected every three to six days from
Interagency Monitoring of Protected Visual Environments
(IMPROVE) - rural sites
STN sites - urban sites
40 monitors in CA, of which 20 are in the study region
15. Pollutant Data Sources
California, 2003-2006
Ammonium - Weekly average (µg/m³)
Nitrate (NO3) - Weekly average (µg/m³)
Sulfate (SO4) - Weekly average (µg/m³)
Total Carbon - Weekly average (µg/m³)
Calcium (Ca) - Weekly average (µg/m³)
Iron (Fe) - Weekly average (µg/m³)
Potassium (K) - Weekly average (µg/m³)
Silicon (Si) - Weekly average (µg/m³)
Sulfur (S) - Weekly average (µg/m³)
16. STN and IMPROVE Data Monitors
Figure: Active monitors, 2003-2006 (blue: IMPROVE; black: STN)
17. Weekly Pollutant Data
Figure: STN and IMPROVE speciated components, 2003-2006 weekly
averages for each site, where the vertical lines separate each year.
Nine panels plot log concentration against week (0 to 200) for ammonium, nitrate, sulfate, total carbon mass, calcium, iron, potassium, silicon, and sulfur; the legend distinguishes STN and IMPROVE sites.
18. Speciated Pollutant Correlations
Table: Pearson correlation coefficients of weekly speciated pollutants
from stations relevant to the study in the state of California, 2003-2006.

                    Nitrate  Sulfate  Total Carbon Mass  Calcium  Iron  Potassium  Silicon  Sulfur
Sulfate                0.41
Total Carbon Mass      0.38     0.05
Calcium                0.03     0.08               0.20
Iron                   0.19     0.11               0.48     0.77
Potassium              0.07     0.17               0.08     0.16  0.12
Silicon               -0.03     0.01               0.05     0.85  0.78       0.13
Sulfur                 0.41     0.97               0.03     0.10  0.12       0.20     0.04
Ammonium               0.96     0.60               0.34     0.05  0.19       0.06    -0.01    0.59
19. Multivariate Spatial-Temporal Model
Our model is a dynamic linear model with observation and
evolution equations.
Let $Y_{tp}(s)$ be the observation at location $s$ on day $t = 1, \ldots, T$, for
pollutant $p = 1, \ldots, P$, and $Y_t(s) = [Y_{t1}(s), \cdots, Y_{tP}(s)]^T$.
$$Y_t(s) = \mu_t(s) + \Lambda(s)\delta_t(s) + \epsilon_t(s)$$
$$\delta_t(s) = \Gamma(s)\delta_{t-1}(s) + w_t(s)$$
where
$\epsilon_t(s) = [\epsilon_{t1}(s), \cdots, \epsilon_{tP}(s)]^T$ are errors with $\epsilon_{tj}(s) \sim N(0, \sigma^2)$
$w_t(s) = [w_{t1}(s), \cdots, w_{tM}(s)]^T$, with $w_{tl} \sim GP[0, s_w^2 R(\phi_w)]$
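As a rough illustration of the observation and evolution equations, the sketch below simulates the model at a single site, so the spatial Gaussian process structure is ignored and the innovations are plain independent normals. The function name `simulate_dlm` and all parameter values are made up for the example and are not taken from the talk.

```python
import numpy as np

def simulate_dlm(T, mu, Lam, Gam, sigma2, s2_w, rng):
    """Simulate Y_t = mu + Lam @ delta_t + eps_t, delta_t = Gam @ delta_{t-1} + w_t.

    Single-site simplification (an assumption): eps_t and w_t are independent
    normals rather than spatial Gaussian processes.
    """
    P, M = Lam.shape
    delta = np.zeros((T + 1, M))   # delta[0] is the initial state, fixed at 0 here
    Y = np.zeros((T, P))
    for t in range(1, T + 1):
        w_t = rng.normal(0.0, np.sqrt(s2_w), size=M)       # evolution noise
        delta[t] = Gam @ delta[t - 1] + w_t                 # evolution equation
        eps_t = rng.normal(0.0, np.sqrt(sigma2), size=P)    # observation noise
        Y[t - 1] = mu + Lam @ delta[t] + eps_t              # observation equation
    return Y, delta

rng = np.random.default_rng(0)
Lam = np.array([[1.0], [0.8], [0.3]])   # hypothetical loadings: P = 3 pollutants, M = 1 factor
Gam = np.array([[0.9]])                 # hypothetical evolution coefficient inside (-1, 1)
Y, delta = simulate_dlm(200, np.zeros(3), Lam, Gam, sigma2=0.1, s2_w=0.2, rng=rng)
```

With one factor, every simulated pollutant series moves with the same latent series, scaled by its loading.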
20. Factor model
The factor loading matrix, Λ(s) is:
The effect of factor $f$ on pollutant $p$ is determined by the
$(p, f)$ element of $\Lambda(s)$, denoted by $\lambda_{pf}(s)$.
To ensure identification we fix $\lambda_{pf}(s) = 0$ for $f < p$ and
$\lambda_{pp}(s) > 0$ for $p = 1, \ldots, M$.
To induce spatial smoothness in the loadings, $\log[\lambda_{pp}(s)]$ and
$\lambda_{pf}(s)$ for $f > p$ are $GP[0, s^2_{\Lambda_{f,p}} R(\phi_{\Lambda_{f,p}})]$.
21. Factor model
The propagation matrix Γ(s) is:
Diagonal with diagonal elements γ1(s), ..., γM(s)
To ensure stationarity in time the factor evolution coefficients
are restricted to the interval (−1, 1)
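$$Y_t(s) = \mu_t(s) + \Lambda(s)\delta_t(s) + \epsilon_t(s)$$
$$\delta_t(s) = \Gamma(s)\delta_{t-1}(s) + w_t(s)$$
$\gamma_f \sim TN_{(-1,1)}[0_N, s^2_{\Gamma_f} R(\phi_{\Gamma_f})]$
$\lambda_{pf}(s) \sim GP[0, s^2_{\Lambda_{f,p}} R(\phi_{\Lambda_{f,p}})]$
$\delta_{f,0} \sim GP[0, s^2_{\delta_f} R(\phi_{\delta_f})]$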
22. Spatial Dynamic Factor Analysis Model
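$$
\begin{bmatrix} Y_t^1 \\ Y_t^2 \\ \vdots \\ Y_t^P \end{bmatrix}
=
\begin{bmatrix} \mu^1 1_N \\ \mu^2 1_N \\ \vdots \\ \mu^P 1_N \end{bmatrix}
+
\begin{bmatrix}
\Lambda_1^1 & \Lambda_2^1 & \cdots & \Lambda_m^1 \\
0 & \Lambda_2^2 & \cdots & \Lambda_m^2 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Lambda_m^P
\end{bmatrix}
\begin{bmatrix} \delta_{1,t} \\ \delta_{2,t} \\ \vdots \\ \delta_{m,t} \end{bmatrix}
+
\begin{bmatrix} \epsilon_t^1 \\ \epsilon_t^2 \\ \vdots \\ \epsilon_t^P \end{bmatrix}
$$
$$
\begin{bmatrix} \delta_{1,t} \\ \delta_{2,t} \\ \vdots \\ \delta_{m,t} \end{bmatrix}
=
\begin{bmatrix}
\Gamma_1 & 0 & \cdots & 0 \\
0 & \Gamma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Gamma_m
\end{bmatrix}
\begin{bmatrix} \delta_{1,t-1} \\ \delta_{2,t-1} \\ \vdots \\ \delta_{m,t-1} \end{bmatrix}
+
\begin{bmatrix} w_{1,t} \\ w_{2,t} \\ \vdots \\ w_{m,t} \end{bmatrix}
$$
where $\mu^p$ is the mean of pollutant $p$ across all time points and locations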
23. Factor updates: FFBS
The factors are updated through a Forward Filtering Backwards
Sampling (FFBS) algorithm (Carter & Kohn 1994,
Frühwirth-Schnatter 1994):
Forward Filtering: For $t = 1, \ldots, T$, compute $m_t = a_t + A_t(Y_t - \tilde{Y}_t)$ and
$C_t = R_t - A_t Q_t A_t^T$, where $a_t = \Gamma m_{t-1}$, $A_t = R_t \Lambda^T Q_t^{-1}$,
$Q_t = \Lambda R_t \Lambda^T + \Sigma_\epsilon$, $R_t = \Gamma C_{t-1} \Gamma^T + \Sigma_w$, and $\tilde{Y}_t = \mu + \Lambda a_t$.
Then sample $\delta_T \sim N(m_T, C_T)$.
Backwards Sampling: For $t = (T - 1), \ldots, 0$ sample $\delta_t \sim N(\tilde{a}_t, \tilde{C}_t)$ where
$\tilde{a}_t = m_t + B_t(\delta_{t+1} - a_{t+1})$, $\tilde{C}_t = C_t - B_t R_{t+1} B_t^T$, and
$B_t = C_t \Gamma^T R_{t+1}^{-1}$.
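The recursions above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: a single site, time-invariant $\Lambda$, $\Gamma$, $\Sigma_\epsilon$ and $\Sigma_w$, and a known initial state $\delta_0 \sim N(m_0, C_0)$. The function name `ffbs` and the example inputs are hypothetical and do not come from the talk.

```python
import numpy as np

def ffbs(Y, mu, Lam, Gam, Sig_eps, Sig_w, m0, C0, rng):
    """Draw delta_0, ..., delta_T by forward filtering, backward sampling."""
    T, M = Y.shape[0], m0.shape[0]
    m = np.zeros((T + 1, M)); C = np.zeros((T + 1, M, M))
    a = np.zeros((T + 1, M)); R = np.zeros((T + 1, M, M))
    m[0], C[0] = m0, C0
    # Forward filtering: Kalman recursions for t = 1, ..., T
    for t in range(1, T + 1):
        a[t] = Gam @ m[t - 1]
        R[t] = Gam @ C[t - 1] @ Gam.T + Sig_w
        Q = Lam @ R[t] @ Lam.T + Sig_eps
        A = R[t] @ Lam.T @ np.linalg.inv(Q)
        m[t] = a[t] + A @ (Y[t - 1] - (mu + Lam @ a[t]))
        C[t] = R[t] - A @ Q @ A.T
    # Backward sampling: draw delta_T, then t = T - 1, ..., 0
    delta = np.zeros((T + 1, M))
    delta[T] = rng.multivariate_normal(m[T], (C[T] + C[T].T) / 2)
    for t in range(T - 1, -1, -1):
        B = C[t] @ Gam.T @ np.linalg.inv(R[t + 1])
        a_tilde = m[t] + B @ (delta[t + 1] - a[t + 1])
        C_tilde = C[t] - B @ R[t + 1] @ B.T
        C_tilde = (C_tilde + C_tilde.T) / 2   # symmetrize against round-off
        delta[t] = rng.multivariate_normal(a_tilde, C_tilde)
    return delta

rng = np.random.default_rng(1)
P, M, T = 3, 1, 50
Lam = np.array([[1.0], [0.8], [0.3]])   # hypothetical loadings
Gam = np.array([[0.9]])                 # hypothetical evolution coefficient
Y = rng.normal(size=(T, P))             # placeholder data, not the AQS measurements
draw = ffbs(Y, np.zeros(P), Lam, Gam, 0.1 * np.eye(P), 0.2 * np.eye(M),
            np.zeros(M), np.eye(M), rng)
```

In a full sampler this draw would be one step inside an MCMC loop, alternating with updates of the loadings, evolution coefficients, and variances.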
24. Pollutant Means
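The pollutant data are from two observation networks, STN for
urban sites, and IMPROVE (IMP) for rural sites. We allow the
mean and error variances to differ by monitoring network
in the model, as follows,
$$\mu_{tp}(s) = \begin{cases} \bar{\mu}_{0p} & \text{if } s \text{ is an IMPROVE site} \\ \bar{\mu}_{0p} + \bar{\mu}_{1p} & \text{if } s \text{ is a STN site} \end{cases}$$
and $\epsilon_{tp}(s) \sim N[0, \sigma_p^2(s)]$ where
$$\sigma_p^2(s) = \begin{cases} \sigma^2_{STN,p} & \text{if } s \text{ is a STN site} \\ \sigma^2_{IMP,p} & \text{if } s \text{ is an IMPROVE site} \end{cases}$$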
25. Birth Defect Model
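The California birth defect data have two binary responses denoted
$$\tilde{Y}_{i1} = \begin{cases} 0 & \text{no cleft palate defect} \\ 1 & \text{cleft palate defect for individual } i \end{cases}$$
$$\tilde{Y}_{i2} = \begin{cases} 0 & \text{no cleft lip defect} \\ 1 & \text{cleft lip defect for individual } i \end{cases}$$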
26. Defect Model
To model the probability of having a cleft defect we assume a set of latent
variables, $Z_i = (Z_{i1}, Z_{i2})$ such that $\tilde{Y}_{ij} = I(Z_{ij} > 0)$ and
$$Z_i = \beta^T x_i + \sum_{m=1}^{M} \sum_{\ell=3}^{L=8} w_{\ell m}^T \delta_{i\ell m} + \epsilon_i$$
where
$x_i$ contains individual-level covariates such as maternal age
$\beta$ is a matrix of regression coefficients specific to the birth
defect
$w_{\ell m}^T$ represents the effect of exposure factor $m$ during
gestation week $\ell$
$\epsilon_i \sim N(0, \Omega)$
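To make the latent-variable construction concrete, the sketch below evaluates the linear predictor for one individual and thresholds it at zero. The array shapes are assumptions chosen for the illustration: `beta` is K x 2 (one column per defect), `w` holds one coefficient per factor, week and defect, and `delta_i` holds the interpolated factor values for weeks 3 to 8. All numbers are randomly generated, not estimates from the study.

```python
import numpy as np

def latent_mean(x_i, beta, w, delta_i):
    """Mean of Z_i: beta^T x_i plus the factor-by-week exposure terms.

    Assumed shapes: x_i (K,), beta (K, 2), w (M, L, 2), delta_i (M, L).
    """
    exposure = np.einsum("mlj,ml->j", w, delta_i)   # sum over factors m and weeks l
    return beta.T @ x_i + exposure

rng = np.random.default_rng(0)
K, M, L = 4, 1, 6                      # L = 6 gestation weeks (3 through 8)
x_i = rng.normal(size=K)               # hypothetical covariates
beta = rng.normal(size=(K, 2))         # hypothetical regression coefficients
w = rng.normal(size=(M, L, 2))         # hypothetical weekly exposure effects
delta_i = rng.normal(size=(M, L))      # hypothetical factor values at the residence
Omega = np.array([[1.0, 0.5], [0.5, 1.0]])   # hypothetical error covariance

Z_i = latent_mean(x_i, beta, w, delta_i) + rng.multivariate_normal(np.zeros(2), Omega)
Y_i = (Z_i > 0).astype(int)            # indicators: (cleft palate, cleft lip)
```

The off-diagonal entry of `Omega` is what lets the two defect indicators be correlated for the same individual.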
27. Weekly Coefficients on Pollutant Indices
$$Z_i = \beta^T x_i + \sum_{m=1}^{M} \sum_{\ell=3}^{L=8} w_{\ell m}^T \delta_{i\ell m} + \epsilon_i$$
The temporal covariates from the M factors are accounted for by
$$w_{\ell m}^T \sim GP[0, R(\phi)]$$
where the coefficients are allowed to change smoothly across
pregnancy weeks.
In this case, $w_{\ell m}^T$ are the coefficients, one for each of the clefts in $Z_i$ (cleft
lip and cleft palate), on the weekly measurements of pollutant index
(factor) $m$
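A small sketch of what a Gaussian process prior over weeks implies for the coefficients. The slides do not state the form of $R(\phi)$, so an exponential correlation function is assumed here purely for illustration, and the value of `phi` is arbitrary.

```python
import numpy as np

def exp_corr(weeks, phi):
    """Exponential correlation matrix exp(-|l - l'| / phi) (assumed form of R(phi))."""
    d = np.abs(weeks[:, None] - weeks[None, :])
    return np.exp(-d / phi)

weeks = np.arange(3, 9)            # gestation weeks 3 through 8
R = exp_corr(weeks, phi=2.0)       # phi = 2.0 is an arbitrary illustrative range
rng = np.random.default_rng(0)
w_draw = rng.multivariate_normal(np.zeros(len(weeks)), R)   # one prior draw of weekly coefficients
```

Larger values of `phi` give higher correlation between adjacent weeks, so prior draws vary more smoothly across the pregnancy.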
28. Correlated cleft defects
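$$Z_i = \beta^T x_i + \sum_{m=1}^{M} \sum_{\ell=3}^{L=8} w_{\ell m}^T \delta_{i\ell m} + \epsilon_i$$
The defects are correlated; to account for this we use an inverse
Wishart prior for the covariance structure in $\epsilon_i \sim N(0, \Omega)$
$$\Omega \sim IW(\Sigma, \nu)$$
where $\Sigma = I_2$ and $\nu = 2$.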
29. Model comparison
We compare M = 1, 2, and 3 factor models using the deviance
information criterion (DIC) (Spiegelhalter et al., 2002)
Factors DIC
1 1624.06
2 1628.15
3 1629.98
In addition to yielding the lowest DIC, the one-factor model more clearly
defines the impact of the pollutants at the weekly level.
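For reference, DIC is the posterior mean deviance plus the effective number of parameters, where the latter is the mean deviance minus the deviance at the posterior mean of the parameters. The sketch below implements that definition on made-up deviance values and then ranks the three DIC values reported in the table; the helper name `dic` is hypothetical.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) with DIC = D_bar + p_D and p_D = D_bar - D(theta_bar)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d

# Toy deviance draws, only to show the arithmetic
toy_dic, toy_pd = dic([10.0, 12.0, 14.0], 11.0)

# DIC values as reported in the table above, keyed by number of factors
reported = {1: 1624.06, 2: 1628.15, 3: 1629.98}
best = min(reported, key=reported.get)
gaps = {m: round(d - reported[best], 2) for m, d in reported.items()}
```

Lower DIC is preferred, so the gaps give each model's distance from the preferred one-factor model.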
30. Model comparison
Classification of defects:
Analyzed the predicted probabilities to assess the performance
of the one factor model.
Checked to see how well it classified defects.
Percent of defects correctly identified as defects,
$P(\tilde{Y} = 1 \mid Y = 1) = 0.92$.
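The quantity reported here is the sensitivity of the classifier: among true defect cases, the share predicted as defects. A minimal sketch of that calculation follows; the 0.5 probability cutoff is an assumption, since the slides do not say which threshold was used, and the input values are invented.

```python
def sensitivity(y_true, p_hat, threshold=0.5):
    """Share of true cases (y = 1) whose predicted probability exceeds the threshold."""
    case_probs = [p for y, p in zip(y_true, p_hat) if y == 1]
    return sum(p > threshold for p in case_probs) / len(case_probs)

# Invented labels and predicted probabilities, for illustration only
rate = sensitivity([1, 1, 0, 1], [0.9, 0.4, 0.8, 0.7])
```

Sensitivity alone does not show how many controls are misclassified, so it is usually read alongside specificity.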
31. Impacts on Oral Cleft Risks
Figure: Covariate effects for the multivariate binary probit model. The posterior
means (dots) and 95% credible intervals (lines) for a change in cleft defects with a
one-standard deviation increase in maternal age, education, maternal smoking, and
maternal alcohol consumption in the California National Birth Defects Prevention
Study, 2003-2006.
32. Latent Factor Results
Figure: Weekly association of latent pollutant factors ($w_{\ell m}$) for the one-factor
model.
33. Pollutants
Figure: The standardized pollutants, Ammonium, Total Carbon Mass and
Sulfate. STN and IMPROVE monitoring sites with actual data values
shaded in the dots.
34. Pollutants
Figure: The standardized pollutants, Silicon, Iron, and Latent Factor
Values. STN and IMPROVE monitoring sites with actual data values
shaded in the dots.
35. Factor Loadings
Figure: California factor loadings. The Λp(s) for Ammonium, Nitrate, and
Sulfate. The dots represent the STN and IMPROVE monitoring sites.
36. Factor Loadings
Figure: California factor loadings. The Λp(s) for Total Carbon Mass,
Calcium, and Iron. The dots represent the STN and IMPROVE monitoring
sites.
37. Factor Loadings
Figure: California factor loadings. The Λp(s) for Potassium, Silicon, and
Sulfur. The dots represent the STN and IMPROVE monitoring sites.
38. Conclusions
Our multivariate spatiotemporal model
Accounts for the bias in the monitoring stations, STN and
IMPROVE
Incorporates factors that are both spatial and temporal
through Gaussian processes, unlike some previous work
Uses weekly averages of speciated pollutants to identify the impacts
of pollutants and the weeks when the fetus is more susceptible to
airborne pollutants, which we take to be weeks 3-8
Accounts for mothers’ mobility at the time of pregnancy
39. Health model conclusions
The health component shared information across two types of
cleft defects, lip and palate
Identified factors that increase the risks of cleft lip and cleft
palate defects, in particular for mothers who had diabetes.
Identified gestation weeks where the fetus is at the greatest
risk of birth defects based upon the latent factor of pollutants
that mothers were exposed to at the time of gestation
Identified pollutants in the study region that weighted more
heavily than other pollutants