Modeling Ground Ozone Levels Across the Contiguous United States

Modeling Ground Ozone for the Contiguous United States
By
Michael Tuffly, Ph.D.
ERIA Consultants, LLC
GIS in the Rockies 2013
Cable Center
Denver, Colorado
10/9/2013

http://www.eriaconsultants.com
mtuffly@eriaconsultants.com

What Is Ozone?


Chemically


It is a molecule containing three oxygen atoms (triatomic oxygen, O3).



Ozone is a powerful oxidizer (i.e. it readily takes electrons from other substances, oxidizing them).


Examples of Oxidation


Rust on metal objects



Fire

“Oxidation is an increase in the oxidation number or a real or
apparent loss of one or more electrons.” (Miller 1981).
Miller, G. T. 1981. Chemistry: A Basic Introduction, Second Edition. Wadsworth Publishing Company, Belmont, California, USA.

Ozone’s Location


Ozone located in the lower stratosphere (20–50 km in altitude) is beneficial to life on Earth.


In the lower stratosphere, ozone molecules form a protective layer that filters out much of the high-energy solar ultraviolet radiation.

3 O2 + ultraviolet radiation → 2 O3



Ground Ozone

Ozone at ground level can harm the health of plants and animals.


One way ground ozone forms is through a reaction of NOx, VOCs, and sunlight.


The primary sources of NOx are internal combustion engines (i.e. cars) and coal-fired power plants.



There are many sources of VOCs:


Methane, CFCs, benzene, methylene chloride, etc.



VOCs have a high vapor pressure and, correspondingly, low boiling points.



Their low boiling points allow VOCs to evaporate readily into the atmosphere.

Some Effects of Ground Ozone


In animals




Lung tissue damage can result from inhalation of ozone

In plants


Leaf surface damage (oxidation)



Disruption of stomatal cell function


This can cause excessive water loss that mimics drought conditions (Smith et al. 2008).

Smith, G. C., J. W. Coulston, and B. M. O'Connell. 2008. Ozone Bioindicators and Forest Health: A Guide to the Evaluation, Analysis and Interpretation of the Ozone Injury Data in the Forest Inventory and Analysis Program. United States Department of Agriculture, Forest Service, General Technical Report 34.

Other ways ozone can be formed


Lightning (natural; small contributor)



Shorts in electrical equipment (anthropogenic)




Produces that distinctive smell (very small contributor)

Ozone is also used as a replacement for chlorine (a potentially high contributor, but really unknown):


In swimming pools



In sewage treatment plants



In domestic water supply as a disinfectant

Modeling Ozone


Source ozone data are from the EPA Clean Air Status and Trends Network (CASTNET):




ftp://ftp.epa.gov/castnet/data/

Data are from a single year: 2010.


Only the summer months within the “Ozone Activity Envelope” (OAE) were used:


June–August, 1:00 PM to 5:00 PM



Base ozone data are recorded hourly.



Only 73 ground ozone collection sites were used.




This is part of a larger study covering a ten-year period; these 73 sites were the only sites with consistent records from 2002 to 2011.

Five variables were extracted from these data for the OAE and averaged:


Ozone (PPB)



Wind Speed (m/s)



Relative Humidity (% * 100)



Solar Radiation (Watts per m2)



Temperature (degrees C * 10)
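The OAE filtering and averaging step can be sketched as follows. This is a minimal Python sketch, not the author's code: it assumes hourly records arrive as (site_id, timestamp, value) tuples with missing hours stored as None, which is not CASTNET's actual file layout, and it assumes the 1:00 PM to 5:00 PM window means the hours beginning 13:00 through 16:00. The names `in_oae` and `oae_site_means` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed OAE definition: June-August, hours beginning 13:00 through 16:00
OAE_MONTHS = {6, 7, 8}
OAE_START_HOUR, OAE_END_HOUR = 13, 17  # half-open window [13, 17)


def in_oae(ts: datetime) -> bool:
    """Return True if an hourly timestamp falls inside the assumed OAE."""
    return ts.month in OAE_MONTHS and OAE_START_HOUR <= ts.hour < OAE_END_HOUR


def oae_site_means(records):
    """Average one variable per site over the OAE.

    records: iterable of (site_id, datetime, value); missing values are None.
    Returns {site_id: mean of the OAE values for that site}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for site, ts, value in records:
        if value is None or not in_oae(ts):
            continue  # skip missing hours and hours outside the envelope
        sums[site] += value
        counts[site] += 1
    return {site: sums[site] / counts[site] for site in sums}
```

The same function would be run once per variable (ozone, wind speed, relative humidity, solar radiation, temperature) to build the five site-level averages.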

Modeling Methods


Four different modeling methods were investigated:



Ordinary Kriging



Generalized Linear Model (GLM)





Inverse Distance Weighting (IDW)

Geographically Weighted Regression (GWR)

Results for all four modeling methods were compared against a set of sample data not used in model creation, using the Mean Squared Error Predicted (MSEP) method.
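The MSEP comparison on held-out points amounts to averaging the squared differences between observed and predicted values at those points. A minimal sketch, assuming the held-out observations and each model's predictions at the same locations are already aligned lists, and assuming the sum of squared errors is divided by n; `msep` and `rank_models` are hypothetical helper names.

```python
def msep(observed, predicted):
    """Mean squared error of prediction over held-out (validation) points."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rank_models(observed, predictions_by_model):
    """Return (model name, MSEP) pairs sorted from best (lowest) to worst."""
    scores = {name: msep(observed, preds) for name, preds in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Because every model is scored on the same points that none of them saw during fitting, the lowest MSEP identifies the model that generalizes best to new locations.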

Autocorrelation


First, we need to know whether the data are autocorrelated.



If the data are autocorrelated, then we can use:





IDW
Kriging

Results from Moran’s I (a test for spatial autocorrelation) (Moran 1950):


Data have a strong positive autocorrelation





Data points that are close together have similar values.
Index = 0.421; p-value ≈ 0

If the data were not autocorrelated, our best estimate using IDW or kriging would be the mean for the whole study site.


Moran, P. A. P. 1950. Notes on continuous stochastic phenomena. Biometrika 37: 17–23.
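Global Moran’s I compares each site’s deviation from the mean with the deviations of its neighbors, weighted by proximity. The slides do not state which spatial weights produced the index of 0.421, so this Python/NumPy sketch assumes inverse-distance weights (w_ij = 1/d_ij, w_ii = 0) and a one-sided permutation test for the p-value; it will not necessarily reproduce the reported value. Locations are assumed to be distinct, and the function names are hypothetical.

```python
import numpy as np


def inverse_distance_weights(coords):
    """Spatial weights w_ij = 1 / d_ij with w_ii = 0 (assumed weighting scheme)."""
    pts = np.asarray(coords, dtype=float)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    w = np.zeros_like(d)
    off_diag = ~np.eye(len(pts), dtype=bool)
    w[off_diag] = 1.0 / d[off_diag]
    return w


def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()  # deviations from the study-area mean
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def permutation_p_value(values, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation by shuffling values."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = morans_i(x, w)
    hits = sum(morans_i(rng.permutation(x), w) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)
```

A positive I means nearby sites carry similar values; values near the expected value of about -1/(n-1) indicate no spatial pattern.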

IDW


A deterministic interpolation method


The same input parameters always produce the same results.



The data need to be spatially autocorrelated.



Three basic parameters are required:


Number of nearest neighbors



Power



Study area boundary



Useful for continuous data (e.g. rainfall, elevation)



Not useful for categorical, binary, or ordinal data

Identifying IDW Parameters


Cross Validation


Remove one data point at one location


Calculate a new value for that point using the neighboring points


Repeat this for all points


Calculate the mean squared error and variance

Mean Squared Error Predicted (MSEP) gives:


The best number of nearest neighbors



The best power



Fewer nearest neighbors produce good local estimates but poor global ones.



More nearest neighbors produce good global estimates but poor local ones.



A balance is needed between local and global estimates.
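The leave-one-out procedure above, repeated over candidate neighbor counts and powers, can be sketched in Python with NumPy. The plot labels in the deck suggest the original work was done in R; this is an independent sketch, not that code. The candidate grids and the names `idw_predict`, `loocv_mse`, and `grid_search` are hypothetical, and ties are broken toward the smaller neighbor count and power.

```python
import numpy as np


def idw_predict(train_xy, train_z, target_xy, k, power):
    """IDW estimate at one target from its k nearest training points."""
    d = np.hypot(*(train_xy - target_xy).T)
    nearest = np.argsort(d)[:k]
    d_k, z_k = d[nearest], train_z[nearest]
    if d_k[0] == 0.0:  # target coincides with a sample: return it exactly
        return float(z_k[0])
    w = 1.0 / d_k ** power
    return float((w * z_k).sum() / w.sum())


def loocv_mse(xy, z, k, power):
    """Leave-one-out cross-validation (MSE, residual variance) for one setting."""
    xy, z = np.asarray(xy, float), np.asarray(z, float)
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i  # remove one data point
        est = idw_predict(xy[keep], z[keep], xy[i], k, power)  # re-estimate it
        errors.append(est - z[i])
    errors = np.asarray(errors)
    return float((errors ** 2).mean()), float(errors.var())


def grid_search(xy, z, ks, powers):
    """Return (best_k, best_power, mse) minimizing the LOOCV MSE."""
    results = [(loocv_mse(xy, z, k, p)[0], k, p) for k in ks for p in powers]
    mse, k, p = min(results)
    return k, p, mse
```

Scanning `ks` from a handful up to most of the sites makes the local-versus-global trade-off visible: the MSE curve typically drops, flattens, and then rises again as more distant neighbors dilute local detail.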

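IDW

$$\hat{Z}_x = \frac{\sum_{i=1}^{n} Z_i / D_i^{y}}{\sum_{i=1}^{n} 1 / D_i^{y}}$$

where Z_i is the value at sample point i, D_i is the distance from sample point i to the prediction location x, n is the number of neighbors, and y is the exponent (power), usually 1 or 2.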

Distance is calculated using the Pythagorean theorem:

$$a^2 + b^2 = c^2$$

For the distance from A to x (c):

$$1.58^2 + 1.58^2 = 2.4964 + 2.4964 = 4.9928, \qquad c = \sqrt{4.9928} \approx 2.23$$

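[Figure: sample points A, B, C, and D around the prediction location x; legs a and b and hypotenuse c of the right triangle from A to x]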
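The IDW formula and the Pythagorean distance can be checked with a short pure-Python sketch. It uses every sample (no nearest-neighbor cutoff) and names the exponent `y` to match the formula; the function name `idw_estimate` is hypothetical.

```python
import math


def idw_estimate(target, samples, y=2):
    """Estimate Z at `target` as sum(Z_i / D_i**y) / sum(1 / D_i**y).

    target:  (x, y) coordinates of the unsampled location.
    samples: list of ((x, y), Z_i) pairs; every sample is used.
    y:       the distance exponent (power), usually 1 or 2.
    """
    num = den = 0.0
    for (u, v), z in samples:
        d = math.hypot(u - target[0], v - target[1])  # Pythagorean distance
        if d == 0.0:
            return z  # IDW reproduces the observed value at a sample location
        num += z / d ** y
        den += 1.0 / d ** y
    return num / den
```

For example, `math.hypot(1.58, 1.58)` evaluates to about 2.234, matching the distance from A to x, and two samples at distances 1 and 2 with values 10 and 40 give an estimate of 20.0 when `y=1`.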

[Figure: IDW cross-validation MSE of residuals vs. number of nearest neighbors (num_neighbors, 0 to 70) for 2010; panels "Power = 1" (out.mse) and "Power = 2" (out2.mse); annotated values 41.8 and 43.3]

Ordinary Kriging


(Krige 1951) (Matheron 1962)

A stochastic interpolation method (as opposed to deterministic IDW)


Estimates at an unobserved location are made from a weighted average of the values at observed locations.



Weights are based on:





The distance separating points
The function for the variogram

A variogram is used to identify key kriging parameters:


Sill, range, nugget, and covariance


Ordinary kriging assumes an unknown stationary mean.

A stationary mean means that the mean is constant across the study area, so it behaves predictably from place to place.

Conditions considered:


Unbiasedness: the mean residual is zero
Minimum variance: the error variance is minimized

BLUE


Best Linear Unbiased Estimator (Isaaks and Srivastava 1989)

Isaaks, E. H., and R. Srivastava. 1989. An Introduction to Applied Geostatistics. Oxford University Press, Oxford, UK.
Krige, D. G. 1951. A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa 52(6): 119–139.
Matheron, G. 1962. Traité de géostatistique appliquée. Éditions Technip.
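The ordinary kriging weights come from a small linear system: variogram values between samples on the left, variogram values from each sample to the target on the right, plus one extra row of ones that forces the weights to sum to one (the unbiasedness condition). This Python/NumPy sketch assumes an exponential variogram in a practical-range form and returns the estimate, the weights, and the kriging variance; the function names and the parameters used in the test are illustrative, not the fitted 2010 model.

```python
import numpy as np


def exponential_variogram(h, nugget, sill, a):
    """Exponential model with practical range a; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / a))
    return np.where(h > 0, g, 0.0)


def ordinary_kriging(xy, z, target, variogram):
    """Solve the ordinary kriging system for one target location.

    The extra row/column of ones enforces sum(weights) == 1 (unbiasedness);
    mu is the Lagrange multiplier. Returns (estimate, weights, kriging variance).
    """
    xy, z = np.asarray(xy, float), np.asarray(z, float)
    n = len(z)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    a_mat = np.ones((n + 1, n + 1))
    a_mat[:n, :n] = variogram(d)
    a_mat[n, n] = 0.0
    d0 = np.sqrt(((xy - np.asarray(target, float)) ** 2).sum(axis=-1))
    b = np.append(variogram(d0), 1.0)
    sol = np.linalg.solve(a_mat, b)  # minimum-variance weights subject to the constraint
    weights, mu = sol[:n], sol[n]
    estimate = float(weights @ z)
    variance = float(weights @ b[:n] + mu)
    return estimate, weights, variance
```

Minimizing the error variance under the sum-to-one constraint is what makes the estimator the Best Linear Unbiased Estimator; at a sampled location the solution puts all weight on that sample, so the kriging variance there is zero.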

R Output from Variogram Fitting (least squares estimates)

Model         Nugget    Sill       Range (m)   AICc
Spherical     7.7377    47.48165   1,100,000   125.5306
Gaussian      13.6845   52.25631   1,100,000   128.4038
Exponential   9.2776    71.61078   1,100,000   132.1289

Estimates (starting values): Nugget = 15, Sill = 30, Range = 1,100,000

The spherical and Gaussian AICc values are less than 3 units apart (128.4038 - 125.5306 = 2.87), so there is no meaningful difference between the two models.
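To compare the three fitted models, one can evaluate each model form at a given lag and compute ΔAICc relative to the lowest value. The formulas below use a common practical-range parameterization; whether the R package that produced this output uses the same form, and whether its "Sill" is the total or the partial sill, are assumptions (here: total sill, practical range). The function and variable names are hypothetical.

```python
import math


def spherical(h, nugget, sill, a):
    """Spherical model: reaches the sill exactly at the range a."""
    if h == 0:
        return 0.0
    if h >= a:
        return sill
    r = h / a
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, a):
    """Exponential model: about 95% of the sill at the practical range a."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / a))


def gaussian(h, nugget, sill, a):
    """Gaussian model: parabolic (smooth) behavior near the origin."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * (h / a) ** 2))


# Fitted values from the variogram output (Sill treated as the total sill).
FITS = {
    "Spherical": (spherical, 7.7377, 47.48165, 1_100_000, 125.5306),
    "Gaussian": (gaussian, 13.6845, 52.25631, 1_100_000, 128.4038),
    "Exponential": (exponential, 9.2776, 71.61078, 1_100_000, 132.1289),
}

best_aicc = min(fit[4] for fit in FITS.values())
for name, (model, nug, sill, a, aicc) in FITS.items():
    delta = aicc - best_aicc  # Delta AICc relative to the lowest-AICc model
    print(f"{name:12s} gamma(500 km) = {model(500_000, nug, sill, a):6.2f}  dAICc = {delta:5.2f}")
```

The Gaussian model's ΔAICc of about 2.87 is the basis for treating it and the spherical model as equivalent.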

[Figure: Graphic R output, "Year = 2010 Krig Raw Data": ozone values (y-axis) vs. distance in meters (x-axis, 0 to 1e+06) with fitted Gaussian (Gau), spherical (Sph), and exponential (Exp) models; annotated sill 52.7, nugget 13.6, and range]

[Figure: Kriging Cross Validation, Gaussian Model: var(crossidw$resid) (about 35 to 41) vs. number of nearest neighbors (5 to 30)]

Generalized Linear Models (GLM)


Similar to linear regression



Different from IDW and kriging:


Needs predictor (input) variables


Solar radiation and relative humidity proved to be significant predictor variables.



The solar radiation and relative humidity surfaces need to be created via IDW as input to the GLM equation.



The GLM equation is:
Ozone = 45.35 + (SR * 0.0332) + (RH * -0.235)
where SR is solar radiation and RH is relative humidity.


R² = 0.58



The GLM describes the “Large Scale Variability”



The “Small Scale Variability” is computed as the differences between the observed values and the GLM-predicted values (the residuals).



Adding the “Large Scale Variability” to the “Small Scale Variability” can produce a good predictive surface.

Geographically Weighted Regression (GWR)


A powerful modeling method that includes:





Linear Regression
Space

In a nutshell


GWR creates a series of local linear equations based on the spatial parameters of the independent variables:



Kernel Function





Fixed Search Radius
Variable (number of neighbors), also known as adaptive

Bandwidth Method (fixed radius)





Cells located within the search radius will have the same coefficients.
Best if sample points are located systematically (e.g. on a grid with fixed distances).

Bandwidth Method (Adaptive or variable search radius)


One uses a user-specified number of nearest neighbors



One uses a cross-validation method that attempts to minimize the prediction error


Best if sample points are randomly located in the study area.






A sample point will be used multiple times to construct multiple linear equations
Each cell may contain different regression coefficients

Each linear equation (fixed radius or adaptive) uses the same global predictor variables as the GLM.


Solar Radiation and Relative Humidity proved to be the best global independent variables.
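The local regression at one prediction location can be sketched as weighted least squares. The slides do not name the kernel shape, so this Python/NumPy sketch assumes a Gaussian kernel; a fixed bandwidth is a distance, and an adaptive bandwidth is taken as the distance to the k-th nearest sample (one common convention). Repeating the fit at every cell yields the cell-by-cell coefficients; the function name is hypothetical.

```python
import numpy as np


def gwr_local_coefficients(xy, X, y, target, bandwidth=None, k=None):
    """Weighted least-squares coefficients for one target location.

    xy: (n, 2) sample coordinates; X: (n, p) predictors (e.g. SR, RH);
    y: (n,) response. Pass either a fixed `bandwidth` (distance units)
    or `k` for an adaptive bandwidth equal to the distance to the k-th
    nearest sample. Weights use a Gaussian kernel exp(-0.5 * (d / b)**2).
    Returns [intercept, b_1, ..., b_p].
    """
    if (bandwidth is None) == (k is None):
        raise ValueError("give exactly one of bandwidth or k")
    xy, X, y = np.asarray(xy, float), np.asarray(X, float), np.asarray(y, float)
    d = np.sqrt(((xy - np.asarray(target, float)) ** 2).sum(axis=1))
    b = bandwidth if bandwidth is not None else np.sort(d)[k - 1]
    if b <= 0:
        raise ValueError("bandwidth must be positive")
    w = np.exp(-0.5 * (d / b) ** 2)  # nearby samples get weight close to 1
    design = np.column_stack([np.ones(len(y)), X])
    wd = design * w[:, None]
    # Solve the weighted normal equations (D^T W D) beta = D^T W y
    return np.linalg.solve(design.T @ wd, wd.T @ y)
```

Because each sample enters many local fits with different weights, the same point contributes to many equations, and the coefficients can drift smoothly across the study area.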

Results

Test                       Residuals autocorrelated?   MSE (observed points)   MSE (new points)
GLM + IDW                  No                          0.54                    196.06
GWR using AICc and 25 nn   No                          21.98                   265.09
GWR using CV               No                          38.43                   241.2
IDW                        No                          0.6                     204.45
Kriging                    No                          6.48                    191.86

Data Issues
1) There should be more data points to create and test the models.
2) Data points should be better distributed over the study area (e.g. there are no points in Oregon, Idaho, etc., and few points in the center of the nation).
3) The IDW MSE for the observed points should be zero, because IDW reproduces the observed values at sample locations; the nonzero value is likely due to cell size and rounding errors.
4) Temperature and wind speed were also tested as covariates in the GWR model, using both the CV method and the number-of-nearest-neighbors method. Results were very poor and are not shown here.

Take-Home Message


Statistical models are an abstraction of reality.



No statistical model is perfect (i.e. every model has errors).


Some models are better than others (Crawley 2007).



The correct model can never be known with complete certainty (Crawley 2007).



The simpler the model, the better (Crawley 2007).


Models should follow the Principle of Parsimony (Occam’s Razor):





Use the fewest variables possible
The correct explanation is the simplest explanation

Make sure that the assumptions of the model are met:





Are the data IID?
Are the data spatially autocorrelated?

Are the input variables correct?





Errors in measurement
Using temperature when solar radiation is a better independent variable.

How were the data collected?





Random sample, systematic, etc.
Is there bias in the sample data?

Always ask yourself: does this model make sense?


Is the model predicting something where it should not?


Example: a fish population on land.

Crawley, M. J. 2007. The R Book. John Wiley & Sons, Chichester, UK.

Final Quote

“Son, you’re going to drive me to drinking… if you don’t stop driving that hot rod Lincoln.”
(1971)
