Statistical estimation and inference for large data sets require computationally efficient optimization methods. Remote sensing retrievals are, in fact, estimates of the underlying true state, and their optimization routines must necessarily make compromises in order to keep up with large data volumes. A sub-group of the Remote Sensing Working Group of the SAMSI Program on Mathematical and Statistical Methods for Climate and the Earth System is investigating how optimization in Bayesian-inspired retrievals and off-line statistical methods could be made more computationally efficient. We will report on discussions held to date and describe how progress in the theory of data systems research can positively impact optimization methodologies.
CLIM Program: Remote Sensing Workshop, Optimization Methods in Remote Sensing - Jessica Matthews, Feb 12, 2018
1. 1
Optimization methods in remote sensing
Jessica Matthews
SAMSI-JPL Workshop
Remote Sensing, Uncertainty Quantification, and the Theory of Data Systems
February 12-14, 2018
2. 2
Remote Sensing Working Group
Subgroups: Spatial X, Spatial Y, Optimization, Theory of Data Systems, Emulators
Program on Mathematical and Statistical Methods for Climate and the Earth System
3. 3
Optimization subgroup
• Felix Alcantara (CA State U)
• Hans Engler (Georgetown U)
• Yawen Guan (SAMSI)
• Jon Hobbs (JPL)
• Emily Kang (U of Cincinnati)
• Georgios Karagiannis (Durham U)
• Alex Konomi (U of Cincinnati)
• Pulong Ma (U of Cincinnati)
• Jessica Matthews (CICS-NC)
• Gavino Puggioni (U of RI)
• Christian Sampson (SAMSI)
• Zhengyuan Zhu (Iowa State)
4. 4
Definition of optimization
• An optimization problem consists of minimizing (or maximizing) a real function by systematically choosing input values from within an allowed set and computing the value of the function.
• The artful and interesting pieces are in the design of the cost function and the choice of algorithm to traverse the cost function space.
www.mathworks.com
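The definition above can be illustrated with a minimal sketch. The cost function, its gradient, the allowed interval, and the step size are all hypothetical choices made for illustration; the algorithm shown (gradient descent with clipping to the allowed set) is just one of many ways to traverse a cost function.

```python
def cost(x):
    """Hypothetical cost function with its minimum at x = 3."""
    return (x - 3.0) ** 2 + 1.0


def gradient(x):
    """Derivative of the hypothetical cost function."""
    return 2.0 * (x - 3.0)


def minimize(cost_gradient, x0, lower, upper, step=0.1, iterations=200):
    """Gradient descent that keeps x inside the allowed set [lower, upper]."""
    x = x0
    for _ in range(iterations):
        x = x - step * cost_gradient(x)   # move downhill
        x = min(max(x, lower), upper)     # stay within the allowed set
    return x


x_min = minimize(gradient, x0=-5.0, lower=-10.0, upper=10.0)
print(x_min, cost(x_min))
```

Changing either the cost function or the update rule changes the result and the run time, which is where the design effort goes.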
5. 5
Role of optimization in Remote Sensing
• Retrievals are essentially inverse problem formulations
• Given radiance observations, through use of physical or
statistical models, derive geophysical information via
optimization
[Diagram: satellite data and other inputs feed a MODEL; image panels labeled Band 2, Band 4, Band 14]
6. 6
Getting started…
• This sub-group was born out of discussions at
the Opening Workshop
• We spent much of the weekly fall meetings
with member presentations of their past work
and discussions to choose a project focus
7. 7
Hans Engler
• Described experiences at the Joint Center for Satellite Data Assimilation (NASA/NOAA/Navy/Air Force)
• Retrievals from microwave domain (temperature,
water vapor, surface emissivity, …)
8. 8
Emily Kang
• Led discussion on the optimization element of the unmixing problem in “Hyperspectral Remote Sensing Data Analysis and Future Challenges” (Bioucas-Dias et al., 2013)
• Hyperspectral unmixing = determining what
materials are present in the pixels directly from the
respective measured spectral vectors
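As a rough illustration of the optimization inside unmixing, the sketch below assumes a linear mixing model (each pixel spectrum is a non-negative combination of known material spectra) and estimates the abundances by projected gradient descent. The mixing model, the spectra in `E`, and the abundances are assumptions made for this example and are not taken from the talk; the sum-to-one constraint that is often added is left out for brevity.

```python
import numpy as np


def unmix(pixel, endmembers, iterations=2000):
    """Estimate abundances a >= 0 that minimize ||endmembers @ a - pixel||^2."""
    n_materials = endmembers.shape[1]
    a = np.full(n_materials, 1.0 / n_materials)
    # Step size 1/L, where L is the largest eigenvalue of E^T E.
    step = 1.0 / np.linalg.norm(endmembers.T @ endmembers, 2)
    for _ in range(iterations):
        grad = endmembers.T @ (endmembers @ a - pixel)
        a = np.maximum(a - step * grad, 0.0)  # project onto a >= 0
    return a


# Hypothetical spectra: 5 bands (rows) x 3 materials (columns).
E = np.array([[0.9, 0.1, 0.1],
              [0.8, 0.2, 0.1],
              [0.1, 0.9, 0.2],
              [0.1, 0.8, 0.3],
              [0.1, 0.1, 0.9]])
true_abundances = np.array([0.6, 0.3, 0.1])
pixel = E @ true_abundances          # noise-free synthetic measurement
estimate = unmix(pixel, E)
print(np.round(estimate, 4))
```

With real data the material spectra are usually unknown as well, which makes the problem considerably harder than this sketch suggests.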
9. 9
Christian Sampson
• Sea ice concentration (SIC)
retrievals based on passive
microwave data
• Melt ponds atop the surface of
sea ice during the Arctic summer
mimic the appearance of open water and result in
underestimating SIC
• As temperatures rise during the Arctic summer, water
content of surface snow increases, impacting
emissivity, resulting in overestimating SIC
• Potential project ideas to improve optimal
parameterization or to reformulate the retrieval itself
10. 10
Alex Konomi
• Parallel and Interacting Stochastic Approximation
Annealing algorithms for global optimization
• A method to quickly locate the global minimum, especially in situations where the cost function may be complex
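To give a feel for this family of methods, here is a sketch of plain simulated annealing on a cost function with many local minima. This is a much simpler relative and is not the Parallel and Interacting Stochastic Approximation Annealing algorithm itself; the cost function, cooling schedule, and proposal width are all hypothetical.

```python
import math
import random


def cost(x):
    """Hypothetical multimodal cost; the global minimum is at x = 0."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def simulated_annealing(start, iterations=20000, t_start=10.0, t_end=1e-3, seed=0):
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for i in range(iterations):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (i / (iterations - 1))
        candidate = current + rng.gauss(0.0, 1.0)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


best_x, best_cost = simulated_annealing(start=4.5)
print(best_x, best_cost)
```

Accepting occasional uphill moves is what lets the search escape local minima that would trap a purely downhill method.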
11. 11
Jessica Matthews
• Land surface albedo (physics-based retrieval)
• Atmospheric temperature and humidity
profiles (non-physics-based retrieval)
• Among the group members, I had fairly easy
access to data and code for several remotely
sensed products
12. 12
Temperature and humidity profiles
• High-resolution Infrared Radiation Sounder (HIRS)
• Aboard NOAA polar-orbiting satellite series
• Swath width: 2160 km
• Spatial resolution: 20 km
NOAA-19 satellite. Image credit: www.ospo.noaa.gov
HIRS/3 instrument. Image credit: NOAA, NASA
13. 13
Temperature and humidity profiles
• From 12 longwave HIRS infrared channels, CO2 data, and emissivity information:
– Derive temperature at 12 different altitudes
• Surface, 2 m, 1000, 850, 700, 600, 500, 400, 300, 200, 100, 50 mb
– Derive humidity at 8 different altitudes
• 2 m, 1000, 850, 700, 600, 500, 400, 300 mb
14. 14
Temperature and humidity profiles
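Channel | Center wavelength (µm) | Principal absorbing constituent | Measurement
1 | 14.95 | CO2 | Temperature sounding
2 | 14.71 | CO2 | Temperature sounding
3 | 14.49 | CO2 | Temperature sounding
4 | 14.22 | CO2 | Temperature sounding
5 | 13.97 | CO2 | Temperature sounding
6 | 13.64 | CO2/H2O | Temperature sounding
7 | 13.35 | CO2/H2O | Temperature sounding
8 | 11.11 | | Surface temperature and cloud detection
9 | 9.71 | Total ozone | Total ozone
10 | 12.47 | H2O | Water vapor
11 | 7.33 | H2O | Water vapor
12 | 6.52 | H2O | Water vapor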
15. 15
Temperature and humidity profiles
• What is a neural network?
• The k-th layer has nodes:
$N_{1,k} = f_{1,k}(N_{1,k-1}, \ldots, N_{n_{k-1},k-1})$
$N_{2,k} = f_{2,k}(N_{1,k-1}, \ldots, N_{n_{k-1},k-1})$
$\vdots$
$N_{n_k,k} = f_{n_k,k}(N_{1,k-1}, \ldots, N_{n_{k-1},k-1})$
Image credit: codeproject.com
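The layer definition above leaves the node functions general. A common special case, assumed here purely for illustration, is that each node applies a transfer function (tanh below) to a weighted sum of the previous layer's nodes plus a bias. The layer sizes, random weights, and input vector are hypothetical stand-ins.

```python
import numpy as np


def layer(previous_nodes, weights, biases, transfer=np.tanh):
    """Compute every node of layer k from all nodes of layer k-1."""
    return transfer(weights @ previous_nodes + biases)


rng = np.random.default_rng(0)
sizes = [12, 8, 12]  # hypothetical: 12 inputs, 8 hidden nodes, 12 outputs
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

nodes = rng.normal(size=sizes[0])  # stand-in for scaled input values
for W, b in zip(weights, biases):
    nodes = layer(nodes, W, b)     # layer k depends only on layer k-1
print(nodes.shape)
```

Training then amounts to optimizing the entries of `weights` and `biases` against a chosen performance function.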
16. 16
Neural Network
• “Truth” comparison
– Radiative transfer model that simulates the physics as the satellite would view it
• Items to optimize:
– How many layers?
– How many nodes per layer?
– Definitions of functions?
Image credit: NASA
17. 17
Neural Network
• The optimization (neural network training) occurs offline before processing
• Training currently uses MATLAB built-in tools
– 14 options for transfer functions
– Choice of performance functions (e.g., sse, mse)
– Choice of training algorithms (e.g., Levenberg-Marquardt, gradient descent; see nntrain for options)
– Using BIC to decide on best network
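One way the BIC comparison could look is sketched below. It assumes Gaussian errors, so that BIC reduces (up to an additive constant) to n ln(SSE/n) + k ln(n), and it counts weights plus biases of a fully connected network as the k parameters. The candidate architectures and their SSE values are invented for illustration.

```python
import math


def bic(sse, n_samples, n_parameters):
    """BIC under Gaussian errors, up to a constant: n*ln(SSE/n) + k*ln(n)."""
    return n_samples * math.log(sse / n_samples) + n_parameters * math.log(n_samples)


def parameter_count(layer_sizes):
    """Weights plus biases for a fully connected network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


# Hypothetical candidates: (layer sizes, sum of squared errors after training).
candidates = [
    ([12, 4, 1], 900.0),
    ([12, 8, 1], 410.0),
    ([12, 16, 1], 405.0),
]
n_samples = 1000
scores = {tuple(sizes): bic(sse, n_samples, parameter_count(sizes))
          for sizes, sse in candidates}
best = min(scores, key=scores.get)  # lowest BIC wins
print(best, scores[best])
```

In this made-up example the largest network fits slightly better but is penalized for its extra parameters, so the middle one is selected.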
18. 18
Cloud-screening
• Using co-located PATMOS-x CDR
– Cloud_fraction
– Cloud_probability
• Using 850 mb data
• Comparing to 2008-2012 COSMIC2013
• Evaluated “either” and “or” scenarios
• Minimizing the percentage of points excluded, std(hirsTemp-cosmicTemp), and 1-corr(hirsTemp,cosmicTemp) to identify cloud_fraction and cloud_probability thresholds
• Indicated with quality flags of: clear, partial cloudy, likely cloudy, no cloud info available
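The threshold search described on this slide could be sketched as a grid search. Everything below is an assumption made for illustration: the data are synthetic, the three quantities are combined as an equal-weight sum (the slide does not say how they are weighted, and their units differ), and a pixel is kept only when both cloud variables are at or below their thresholds.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for co-located data (all values hypothetical).
cloud_fraction = rng.uniform(0.0, 1.0, n)
cloud_probability = np.clip(cloud_fraction + rng.normal(0.0, 0.2, n), 0.0, 1.0)
cosmic_temp = rng.normal(280.0, 5.0, n)
# Cloud contamination adds a cold bias and extra noise to the HIRS-like values.
hirs_temp = (cosmic_temp - 8.0 * cloud_fraction
             + rng.normal(0.0, 0.5 + 3.0 * cloud_fraction))


def screening_cost(frac_max, prob_max):
    """Equal-weight sum of the three quantities (the weighting is assumed)."""
    keep = (cloud_fraction <= frac_max) & (cloud_probability <= prob_max)
    if keep.sum() < 10:
        return np.inf  # too few points to compute meaningful statistics
    excluded = 1.0 - keep.mean()
    spread = np.std(hirs_temp[keep] - cosmic_temp[keep])
    corr = np.corrcoef(hirs_temp[keep], cosmic_temp[keep])[0, 1]
    return excluded + spread + (1.0 - corr)


grid = np.linspace(0.1, 1.0, 10)
best_frac, best_prob = min(itertools.product(grid, grid),
                           key=lambda t: screening_cost(*t))
print(best_frac, best_prob)
```

The chosen thresholds depend strongly on how the three terms are weighted, so that weighting is itself a design decision.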
19. 19
Temperature bias correction
• Using co-located PATMOS-x CDR variables, use only clear-sky HIRS data for bias correction
• Each pressure level done separately
• Each hemisphere done separately
• Each 10-degree latitude bin done separately
• RS92: 1000-400 mb; COSMIC2013: 300-50 mb
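A binned correction of this kind could be sketched as follows for a single pressure level. The per-bin mean offset is a deliberate simplification standing in for whatever correction model is actually used, and the match-up data, bias shape, and function names are hypothetical.

```python
import numpy as np


def bin_index(latitude):
    """Map latitude in [-90, 90] to one of 18 ten-degree bins."""
    return np.minimum(((latitude + 90.0) // 10.0).astype(int), 17)


def fit_bias(latitude, retrieved, reference):
    """Mean (retrieved - reference) per bin; NaN where a bin has no match-ups."""
    idx = bin_index(latitude)
    bias = np.full(18, np.nan)
    for b in range(18):
        in_bin = idx == b
        if in_bin.any():
            bias[b] = np.mean(retrieved[in_bin] - reference[in_bin])
    return bias


def apply_bias(latitude, retrieved, bias):
    """Subtract the bias of the bin each observation falls in."""
    return retrieved - bias[bin_index(latitude)]


# Synthetic clear-sky match-ups at one pressure level (values hypothetical).
rng = np.random.default_rng(1)
lat = rng.uniform(-90.0, 90.0, 5000)
reference = rng.normal(270.0, 10.0, 5000)
retrieved = reference + 0.02 * lat + rng.normal(0.0, 0.3, 5000)
bias = fit_bias(lat, retrieved, reference)
corrected = apply_bias(lat, retrieved, bias)
print(np.round(bias, 2))
```

Because bins 0-8 and 9-17 cover the southern and northern hemispheres respectively, the hemispheres are handled separately by construction; repeating the fit per pressure level completes the scheme.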
22. 22
Project ideas
• Incorporating UQ
• Improving inter-satellite calibration
• Atmospheric profiles can differ significantly depending on surface elevation (currently training separate networks for 2 different bins of surface heights). Is there a better way to handle this?
• Cloud screening is important and is currently done by matching with another cloud-based product. The thresholds for different cloud parameters are optimized. Is there a better way to do this? Can we incorporate spatial/temporal dependence of cloud-flagged pixels?
• After initial training, there is another step to apply bias corrections based on radiosonde datasets. Currently this is a multiple regression approach with bins based on latitudes and measurement values. Is there a better way to include this data in the initial training?
25. 25
Optimization ⋂ ToDS
• Although not the focus of our selected problem, it is easy to see how optimization approaches could be impacted by the study of the Theory of Data Systems (ToDS)
[Diagram: data stores X and Y, centralized environment C]
$\min_{\theta} f(X, Y, \theta)$
Optimization ⋂ ToDS
• Periodic intercommunication
– Independent calculations (assuming processing capabilities on stores): $\min_{\theta} f'(X, \theta)$ and $\min_{\theta} f''(Y, \theta)$
– During optimization, have some infrequent communication to exchange intermediate results and reach the global minimum
• Improved sampling
– Devise method to find subsets x ⊂ X and y ⊂ Y where x, y are somehow superior data points of the larger sets X, Y
– Calculations completed in a centralized environment (C)
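The periodic intercommunication idea can be sketched with a toy problem: two stores each run a few gradient steps on their own data, then exchange and average their current estimates. The scalar parameter (called theta here, an assumed name), the squared-error cost, the equal-sized stores, and the averaging rule are all assumptions made for illustration.

```python
def local_steps(theta, data, steps, rate):
    """A few gradient steps on the local cost: mean of (x - theta)^2 over one store."""
    for _ in range(steps):
        grad = sum(2.0 * (theta - x) for x in data) / len(data)
        theta -= rate * grad
    return theta


def periodic_averaging(store_x, store_y, rounds=50, steps_per_round=5, rate=0.1):
    theta = 0.0
    for _ in range(rounds):
        # Independent work on each store, with no communication.
        theta_x = local_steps(theta, store_x, steps_per_round, rate)
        theta_y = local_steps(theta, store_y, steps_per_round, rate)
        # Infrequent communication: exchange and average the estimates.
        theta = 0.5 * (theta_x + theta_y)
    return theta


store_x = [1.0, 2.0, 3.0, 4.0]
store_y = [10.0, 11.0, 12.0, 13.0]
theta = periodic_averaging(store_x, store_y)
pooled_mean = sum(store_x + store_y) / len(store_x + store_y)
print(theta, pooled_mean)
```

For this particular cost and equal-sized stores, the averaged estimate converges to the minimizer of the pooled problem. In general, running local steps between synchronizations can bias the result, and how often to communicate becomes a tuning question.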
27. 27
Optimization ⋂ ToDS
• Informing storage partitions
– As a contributing design element, prior analysis of the data's spatial and temporal relationships may determine the ideal partitioning for later processing.