Some engineering and scientific computer models with high-dimensional input spaces are in fact driven by only a few essential input variables. Identifying these active variables would reduce the computation needed to estimate the Gaussian process (GP) model and help
researchers understand the system modeled by the computer simulation. More importantly, reducing the input dimension would also improve prediction accuracy, as it alleviates the "curse of dimensionality".
In this talk, we propose a new approach to reducing the input dimension of the Gaussian process model. Specifically, we develop an optimization method that identifies a convex combination of lower-dimensional kernels, drawn from a large candidate set, to serve as the correlation function of the GP model. To ensure a sparse subset is selected, we add a penalty on the kernel weights. Several numerical examples demonstrate the advantages of the
method. The proposed method has many connections with existing methods in the Uncertainty Quantification literature, including active subspace, additive GP, and composite GP models.
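As a rough illustration of the kind of objective involved (a sketch only, not the talk's actual method; the one-dimensional RBF candidates, the nonnegative weights without a sum-to-one constraint, and all parameter values are assumptions), one can form a weighted sum of low-dimensional candidate kernels and add a sparsity penalty on the weights to the GP's negative log marginal likelihood:

```python
import numpy as np

def rbf_kernel_1d(x, dim, lengthscale=1.0):
    """RBF kernel matrix using only input dimension `dim` of x (n x p)."""
    d = x[:, dim:dim + 1] - x[:, dim:dim + 1].T
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def penalized_nll(w, kernels, y, noise=1e-2, lam=0.1):
    """GP negative log marginal likelihood for K(w) = sum_k w_k K_k,
    plus an L1 penalty on the kernel weights to encourage sparsity."""
    K = sum(wk * Kk for wk, Kk in zip(w, kernels))
    K += noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + lam * np.abs(w).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                      # 5-dimensional inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)   # only dimension 0 is active
kernels = [rbf_kernel_1d(X, d) for d in range(5)] # candidate set: one kernel per dim
w = np.full(5, 0.2)                               # uniform initial weights
obj = penalized_nll(w, kernels, y)                # objective to be minimized over w
```

In practice the objective would be minimized over w subject to nonnegativity; the sparsity of the optimal w then selects the active kernels.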
Climate Extremes Workshop - Networks and Extremes: Review and Further Studies - Ansu Chatterjee, May 17, 2018
1. “Networks and Extremes”
sub-group of the Extremes working group,
SAMSI Program on Mathematical and Statistical Methods for Climate
and the Earth System (CLIM)
Members:
Whitney Huang, Adway Mitra, Chen Chen,
Zhonglei Wang, Imme Ebert-Uphoff, Dan Cooley,
Ansu Chatterjee, …
2. The 8th International Workshop
on Climate Informatics
Topics included: any research combining climate
science with approaches from statistics, machine learning and
data mining, including position papers and work in progress.
Submissions due (abstracts / short papers): June 30, 2018.
Workshop: Sept 20-21, 2018 @ NCAR MESA Lab (Boulder, CO).
For more information: www.climateinformatics.org
Organizing committee includes:
Chairs: Dan Cooley & Eniko Szekely
PC chairs: Chen Chen & Jakob Runge
Local chair: Dorit Hammerling
Steering committee: Doug Nychka, C. Monteleoni, I. Ebert-Uphoff
3. Our goals about 9 months back
• This is more of an exploratory, “learn as you go”
working group.
• Get folks who know extremes (networks)
educated on networks (extremes).
• Review existing literature on climate
networks, find open problems.
• Find measures of relating “vertices” that may
be pertinent for understanding climate
extremes, causality.
4. Our main activities:
• Event synchronization network for extreme
rainfall during the Indian summer monsoons.
• Chi network for Gulf coast hurricane related
rainfall extremes.
• Life cycle of extreme events revealed by
networks.
• Resampling methods for extremes and tails.
5. “Complex networks” have recently found much use in investigating
the spatial structure of climate phenomena.
But: many of them use measures related to correlation,
and those do not properly capture phenomena related to the tail of
the distribution, i.e. extreme events.
Thus the most common existing complex network types are not
applicable to extreme event analyses.
That fact motivated this WG to ask the following question:
What kinds of networks can be used to
analyse/model relationships between extremes?
Networks for Extremes
6. Q1: Which network types already exist that focus on extreme
events?
Answer: Identified “event synchronization” networks as most closely
related existing type.
Q2: Which measure from extreme value analysis is suitable to
identify connectivity between nodes in an “extreme value” sense?
Answer: The χ-measure is a natural choice of extreme value dependency
measure to try for a new network type.
Q3: How do those two measures compare?
See next page.
Findings
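For concreteness, a minimal empirical estimate of the χ-measure, χ(u) = P(F_Y(Y) > u | F_X(X) > u), can be sketched as follows (the rank transform to pseudo-uniform margins and the quantile level u = 0.95 are illustrative assumptions):

```python
import numpy as np

def empirical_chi(x, y, u=0.95):
    """Empirical chi(u): P(Y exceeds its u-quantile | X exceeds its u-quantile).
    As u -> 1, a limit above 0 indicates asymptotic dependence of the extremes."""
    # rank-transform both series to (pseudo-)uniform margins
    fx = (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)
    fy = (np.argsort(np.argsort(y)) + 1) / (len(y) + 1)
    joint = np.mean((fx > u) & (fy > u))   # joint exceedance frequency
    marginal = np.mean(fx > u)             # marginal exceedance frequency
    return joint / marginal

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)
x = z + 0.1 * rng.normal(size=n)    # x and y share a strong common signal
y = z + 0.1 * rng.normal(size=n)
indep = rng.normal(size=n)          # independent of x
chi_dep = empirical_chi(x, y)       # large: extremes co-occur
chi_ind = empirical_chi(x, indep)   # near 1 - u: no tail dependence
```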
7. Q3: How do those two measures compare? (paper in progress)
1) We defined a new “language of building blocks” that can be used
to express both event synchronization measures and the χ-measure
in the same framework.
2) Connection: Event synchronization consists of 2 measures, a
directed and an undirected measure. It turns out that the
undirected measure is extremely similar to applying some data
pre-processing PLUS using the χ-measure.
3) That connection helps us
• To better understand and interpret existing work on event
synchronization measures in an extreme value sense.
• To define a new network type based on the χ-measure that
is grounded in solid extreme value theory.
What we have found
8. 1) Event synchronization -
existing
Origin: Analysis of brain
connectivity for seizures
Designed from scratch, does not
seem to be connected to any
theory.
Used to construct climate
networks related to extremes.
Appears to be only tool used for
that purpose to date!
We identified two primary measures
2) Chi measure – new in this
context
Origin: Extreme value theory
Well supported by theory, so its
properties are well understood.
Not yet used to construct climate
networks.
9. Origin of Event Synchronization Measure
2002: Event Synchronization proposed for brain signal analysis
Purpose: Study relationship between signals obtained from different
electrode locations, to locate origin of seizures (Quiroga et al. 2002).
Paper: Quiroga, R. Q., Kreuz, T., & Grassberger, P. (2002). Event synchronization:
a simple and fast method to measure synchronicity and time delay
patterns. Physical review E, 66(4), 041904.
2012: Exact same measure adopted to define climate networks.
Purpose: Identify connections between different locations around
the globe.
Paper: Malik, N., Bookhagen, B., Marwan, N., & Kurths, J. (2012). Analysis of
spatial and temporal extreme monsoonal rainfall over South Asia using complex
networks. Climate dynamics, 39(3-4), 971-987.
Origin of Event Synchronization
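A simplified sketch of the event synchronization measure (after Quiroga et al. 2002) may help fix ideas. The fixed window tau is an assumption: the original paper uses an adaptive local window and gives exactly simultaneous events a weight of 1/2, both omitted here for brevity:

```python
import numpy as np

def event_sync(tx, ty, tau=2.0):
    """Simplified event synchronization for two event-time arrays tx, ty.
    Returns (Q, q): undirected strength Q in [0, 1], and a direction
    indicator q that is positive when events in x tend to precede events in y."""
    def c(a, b):
        # number of events in `a` preceded within tau by some event in `b`
        return sum(np.any((t - b > 0) & (t - b <= tau)) for t in a)
    cxy, cyx = c(tx, ty), c(ty, tx)
    norm = np.sqrt(len(tx) * len(ty))
    Q = (cxy + cyx) / norm
    q = (cyx - cxy) / norm
    return Q, q

tx = np.array([1.0, 5.0, 9.0, 14.0])
ty = np.array([2.0, 6.0, 10.0])   # each y-event follows an x-event by 1 time unit
Q, q = event_sync(tx, ty, tau=2.0)
```

Here every event in ty is preceded within tau by an event in tx, so Q = 3/sqrt(12) and q is positive (x leads y).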
10. Pros of the χ-measure:
• Properties of χ are well understood from extreme value theory,
while event synchronization is more ad hoc (it was developed to
analyze seizures in the brain).
• χ can also be applied on an annual scale, while event
synchronization needs a much larger sample size.
Pro of event synchronization:
• Offers a directed measure, while the χ-measure does not.
Work in progress:
• Demonstrating and comparing the properties of both network
types using real-world examples, namely precipitation in India and
during hurricane season in the US.
What we have found
11. Example of Block language – Event Synchronization
Both time series, X_S from source S and X_T from target T, pass through the
same pipeline of building blocks: extract extremes (discrete or continuous
representation), then stretch events by D_S or D_T with weights w_S or w_T.
A bivariate function then combines the two processed series into the
connectivity measure for X_S, X_T.
12. Networks and extremes: detailed
presentations
Coming up shortly
Whitney
Adway
Chen
A brief round-up of other related work on
networks:
14. Ongoing work on networks
• We use a gridded dataset from NOAA:
https://www.esrl.noaa.gov/psd/gcos_wgsp/Gridded/data.noaa.erslp.html
• The data is on an 89x180 grid of spatial
locations, with a monthly time series at each
location spanning about 150 years.
• We adjust for trends and seasonality at the
pre-processing step.
15. Ongoing work on networks
• The aim is to divide the space into many
homogeneous regions, and then study
relations between such homogeneous regions.
• Long-distance (negative) relations: tele-
connections.
• Causal relations are of special interest.
16. Ongoing work on networks
• Pairwise association-based networks:
– Identify local neighborhoods using a lower bound
on positive correlations between time series on
nearby locations.
– Summarize the neighborhoods (smooth spatially,
take an average time series), then find out if there
is a pair of neighborhoods with a decent negative
correlation value.
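The pairwise construction can be sketched as thresholding a correlation matrix. This is a simplification of the steps above (which bound positive correlations within neighborhoods and then look for negative cross-neighborhood correlations); the absolute-value threshold and its value here are assumptions:

```python
import numpy as np

def correlation_network(ts, threshold=0.5):
    """Adjacency matrix linking locations whose time series correlate
    above `threshold` in absolute value (self-links excluded).
    ts has shape (n_locations, n_times)."""
    c = np.corrcoef(ts)
    adj = np.abs(c) >= threshold
    np.fill_diagonal(adj, False)
    return adj

rng = np.random.default_rng(2)
base = rng.normal(size=120)
ts = np.vstack([
    base + 0.2 * rng.normal(size=120),   # locations 0 and 1 share a signal
    base + 0.2 * rng.normal(size=120),
    rng.normal(size=120),                # location 2 is independent
])
adj = correlation_network(ts, threshold=0.5)   # links only 0 <-> 1
```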
17. Ongoing work on networks
• More than pairwise:
– Identify local neighborhoods using a lower bound
on positive correlations between time series on
nearby locations.
– Summarize the neighborhoods (smooth spatially,
take an average time series).
– Use canonical correlations.
18. Ongoing work on networks
• Other approaches for involving more than two
locations together: Use the entire correlation
matrix, and then
– Get a sparse precision matrix.
– Use the spectrum.
– Use canonical correlation.
19. More ongoing work: Regularized
regression/VAR undirected network
• Regress the time series at each location on the
time series of every other location, and its own
lags.
• Penalize for too large a local neighborhood, too
many tele-connections, too much of a lagged
dependency.
• Penalize for spatial incoherence (difficult).
• Causality (to be) derived from the noise precision
matrix.
• (Future work idea:) Put this on a manifold.
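A minimal sketch of the regression step, using a plain ridge penalty in place of the structured penalties described above (the structured terms for neighborhood size, tele-connections, lag depth, and spatial coherence would need purpose-built regularizers; all parameter values here are assumptions):

```python
import numpy as np

def var_ridge_network(ts, lag=1, lam=0.1, edge_tol=0.2):
    """Fit a lag-`lag` VAR by ridge regression, one location at a time,
    and link locations whose lag-1 cross-coefficients are large.
    ts has shape (n_locations, n_times)."""
    n_loc, n_t = ts.shape
    # stack lagged predictors: columns are all locations at lags 1..lag
    X = np.hstack([ts[:, lag - 1 - k: n_t - 1 - k].T for k in range(lag)])
    gram = X.T @ X + lam * np.eye(X.shape[1])
    coefs = np.zeros((n_loc, n_loc * lag))
    for i in range(n_loc):
        coefs[i] = np.linalg.solve(gram, X.T @ ts[i, lag:])
    adj = np.abs(coefs[:, :n_loc]) > edge_tol   # keep only lag-1 block
    np.fill_diagonal(adj, False)
    return adj, coefs

rng = np.random.default_rng(3)
n_loc, n_t = 4, 500
ts = np.zeros((n_loc, n_t))
ts[:, 0] = rng.normal(size=n_loc)
for t in range(1, n_t):
    e = 0.1 * rng.normal(size=n_loc)
    ts[0, t] = 0.5 * ts[0, t - 1] + e[0]   # location 0: AR(1)
    ts[1, t] = 0.8 * ts[0, t - 1] + e[1]   # location 0 drives location 1
    ts[2, t] = e[2]                        # locations 2, 3: pure noise
    ts[3, t] = e[3]
adj, coefs = var_ridge_network(ts, lag=1, lam=0.1, edge_tol=0.2)
```

The recovered edge set should contain the 0 -> 1 link and little else; deriving causal direction from the noise precision matrix, as in the slide, would be a further step on top of this fit.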