10,00 Modelling and analysis of geophysical data using geostatistics and machine learning
Vasily Demyanov – Heriot–Watt Institute, Edinburgh (U.K.)
Intelligent Analysis of Environmental Data (S4 ENVISA Workshop 2009)
1. Uncertainty Quantification of Geoscience Prediction Models Based on Support Vector Regression
V. Demyanov (1), A. Pozdnoukhov (2), M. Kanevski (3), M. Christie (1)
(1) Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh, UK, vasily.demyanov@pet.hw.ac.uk
(2) National Centre for Geocomputation, National University of Ireland, Maynooth
(3) Institute of Geomatics and Risk Analysis, University of Lausanne
2. Outline
• Geoscience modelling under uncertainty
• Machine learning based geomodels
• Semi-supervised SVR reservoir model
– Case study
– Robustness to noise
– Predictions with uncertainty
• Conclusions
5. Adaptive Stochastic Optimisation for UQ
[Diagram: iterative loop. Sampling from the prior distribution gives an ensemble of models (Model 1, Model 2, Model 3, …, Model n). Evaluation: simulation and mismatch calculation. Ranking, then reproduction of a new population for the next iteration. The inferred ensemble of models is used for prediction inference.]
Sampling algorithms:
• Genetic algorithms
• Particle swarm optimisation
• Ant Colony optimisation
• Neighbourhood approximation
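The loop on this slide (sampling, evaluation, ranking, reproduction) could be sketched roughly as below. This is a minimal illustration only, not any of the listed algorithms: the toy `misfit` function, the uniform prior, the Gaussian perturbation step and all parameter names are assumptions made for the example. In a real workflow the misfit would come from a forward simulation.

```python
import random

def misfit(params, target):
    """Toy mismatch (assumption): squared distance to a hypothetical 'true' parameter set."""
    return sum((p - t) ** 2 for p, t in zip(params, target))

def adaptive_sampling(target, bounds, n_init=20, n_keep=5, n_new=20,
                      n_iter=10, step=0.1, seed=0):
    """Sample the prior, then repeatedly rank by misfit and resample near the best models."""
    rng = random.Random(seed)
    # Sampling: draw the initial ensemble from a uniform prior over the bounds.
    ensemble = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    scored = [(misfit(m, target), m) for m in ensemble]      # evaluation
    for _ in range(n_iter):
        scored.sort(key=lambda pair: pair[0])                 # ranking
        parents = [m for _, m in scored[:n_keep]]
        for _ in range(n_new):                                # reproduction
            parent = rng.choice(parents)
            child = [min(hi, max(lo, p + rng.gauss(0.0, step * (hi - lo))))
                     for p, (lo, hi) in zip(parent, bounds)]
            scored.append((misfit(child, target), child))
    scored.sort(key=lambda pair: pair[0])
    return scored  # whole ensemble of (misfit, parameters), best first

ensemble = adaptive_sampling(target=[0.3, 0.7], bounds=[(0.0, 1.0), (0.0, 1.0)])
best_misfit, best_model = ensemble[0]
```

The function returns every generated model rather than only the best one, because the ensemble as a whole is what the inference step uses.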
6. Search for Matching Models Challenge
• Forward (FW) simulation of multiple models generated
for different combinations of parameter values
is computationally expensive
• High-dimensional parameter space remains
fairly empty and poorly described despite
thousands of generated models
[Figure: number of parameters against number of points per axis, with the region of computational efficiency marked at 100-10,000 FW runs]
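The emptiness of a high-dimensional parameter space can be illustrated with simple arithmetic: a regular grid with n points per axis in d dimensions needs n^d forward runs. The sketch below assumes such a regular grid (the slide does not specify one) and asks how fine a grid fits in a budget of 10,000 runs. The function names are illustrative.

```python
def grid_runs(points_per_axis, n_params):
    """Number of forward runs needed to fill a regular grid."""
    return points_per_axis ** n_params

def max_points_per_axis(budget, n_params):
    """Largest grid resolution whose full grid still fits within the run budget."""
    n = 1
    while grid_runs(n + 1, n_params) <= budget:
        n += 1
    return n

# Resolution affordable with 10,000 forward runs in 2, 4 and 8 dimensions.
resolution = {d: max_points_per_axis(10_000, d) for d in (2, 4, 8)}
print(resolution)
```

With 10,000 runs this gives 100 points per axis in 2D, 10 in 4D, and only 3 in 8D.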
8. Challenges in Geomodelling
• Improve representation of the reality with
geologically realistic models based on identifiable
parameters.
• More effective use of information from
various sources by incorporating prior geological
and expert knowledge with associated uncertainty
• Uncertainty propagation from data into
the model without “freezing” assumptions and
predefined model dependencies.
9. Aims
Uncertainty quantification with a geomodel
which is able to improve geological realism
by more effective use of prior information
• Model petrophysical properties in a fluvial reservoir
using a robust machine learning approach –
semi-supervised Support Vector Regression (SVR)
• Reproduce realistic geological structures and inherent
uncertainty of the geomodel
• Integrate additional spatial data that are non-linearly
correlated with reservoir properties.
11. Support Vector Regression (SVR)
• Linear regression in hyperspace
• Complexity control with training errors: $\min \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i$
SVR is formulated in terms of dot products of input data: $(x \cdot x') \to K(x, x')$,
where $K(x, x_i)$ is a symmetric, positive-definite kernel function.
The kernel trick projects data into a sufficiently high-dimensional space:
$f(x) = w \cdot x + b \;\to\; f(x) = \sum_{i=1}^{L} y_i \alpha_i K(x, x_i) + b$ (sum over the support vectors)
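To make the kernel expansion concrete, the sketch below evaluates f(x) as a weighted sum of Gaussian kernels centred on the support vectors, plus a bias. The support vectors and coefficients are hypothetical numbers chosen for illustration, not fitted values, and the Gaussian form exp(-||x - x'||² / (2σ²)) is one common parameterisation assumed here.

```python
import math

def gaussian_kernel(x, x_prime, sigma):
    """Gaussian kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_predict(x, support_vectors, coeffs, b, sigma):
    """Evaluate f(x) = sum_i coeff_i * K(x, x_i) + b over the support vectors."""
    return sum(c * gaussian_kernel(x, sv, sigma)
               for c, sv in zip(coeffs, support_vectors)) + b

# Hypothetical support vectors and coefficients (not fitted to any data).
support_vectors = [(0.0, 0.0), (1.0, 0.0)]
coeffs = [0.5, -0.25]  # each plays the role of y_i * alpha_i in the formula
value = kernel_predict((0.0, 0.0), support_vectors, coeffs, b=0.1, sigma=1.0)
```

Only the support vectors enter the sum, so prediction cost scales with their number rather than with the size of the training set.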
12. Semi-supervised Learning Concept
• Supervised learning with a tutor
– Learn from known input and output
(e.g. multi-layer perceptron neural network)
• Unsupervised learning without a tutor
– Learn from known inputs only, no outputs are
available (e.g. Kohonen classification maps)
• Semi-supervised learning
– Learn from a combination of data:
• Labelled with both known input and output
• Unlabelled with only input available (manifold)
13. Kernel Methods on Geo-manifolds
• Data-driven models incorporate prior knowledge on the domain
of the problem using graph models of natural manifolds
• Kernel function enforces continuity along the graph model –
manifold – obtained from the prior information
[Figure, three panels: (1) spiral manifold represented by unlabelled points (+); (2) conventional regression estimate based on labelled data only (●); (3) semi-supervised regression estimate, which follows the smoothness along the graph]
14. Semi-supervised Approach
• Manifold assumption: data actually lie on the
low-dimensional manifold in the input space
• Geometry of the manifold can be estimated with
unlabelled data:
– incorporate natural similarities in data
– enforce smoothness on the manifold
• Manifold carries physical information and
incorporates prior physical knowledge
• Geo-manifold can reflect stochastic nature of the
inherent model uncertainty
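The idea of enforcing smoothness along a manifold described by unlabelled points can be illustrated with a much simpler stand-in than the semi-supervised SVR used in the talk: harmonic smoothing on a k-nearest-neighbour graph. In this sketch, which is an assumption-laden toy and not the authors' kernel construction, labelled values are held fixed and each unlabelled point is repeatedly set to the mean of its graph neighbours.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph, returned as a list of neighbour sets."""
    n = len(points)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours

def harmonic_smoothing(points, labels, k=2, n_sweeps=500):
    """Fill unlabelled nodes so that each equals the mean of its graph neighbours."""
    neighbours = knn_graph(points, k)
    values = [labels.get(i, 0.0) for i in range(len(points))]
    for _ in range(n_sweeps):
        for i in range(len(points)):
            if i not in labels:  # labelled values stay fixed
                values[i] = sum(values[j] for j in neighbours[i]) / len(neighbours[i])
    return values

# Five points along a line; only the two end points are labelled.
points = [(float(i), 0.0) for i in range(5)]
labels = {0: 0.0, 4: 1.0}
values = harmonic_smoothing(points, labels)
```

On this symmetric toy graph the unlabelled points receive values that vary evenly between the two labelled ends. That is the sense in which unlabelled data shape the estimate.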
15. Sources of Geo-manifold for Reservoir Models
Geo-manifold for reservoir model can be elicited
from prior information:
– on-site spatial data (seismic, well logs)
– other relevant data
(outcrops, modern analogues, lab experiments)
– expert knowledge in a non-parametric form
– parametric geological models
(object shapes, process models)
– training image based models
16. Semi-supervised SVR Geomodel
[Diagram: Stanford VI synthetic case study. Prior information (seismic data + geo-manifold) provides the unlabelled data; wells provide the labelled poro & perm data. Both feed the semi-supervised SVR learning machine.]
18. Case Study
Stanford VI: a realistic synthetic reservoir data set
• Fluvial clastic reservoir:
- sinuous channels
- meandering channels
- delta front
• Geomodel:
- multi-point statistics models
- sedimentation process model
• “Hard” poro/perm data from wells
• Synthetic seismic data:
- 6 attributes: AI, EI, λ, μ, Sw, Poisson ratio
S. Castro, J. Caers and T. Mukerji
19. Variability in Facies Modelling
Multi-point simulation realisations
[Figure panels: training image; hard well data; soft probabilistic data based on seismic]
20. Case Study
2D layer slices from different geological sections (porosity truth case):
• sinuous channels
• delta front
SVR geomodel (tuneable or fixed parameters):
• Spatial correlation size
– Gaussian kernel width σ
• Continuity strength
– Impact of unlabelled data of the manifold
• Smoothness along the manifold
– Number of unlabelled points in the manifold
– Number of neighbours in kernel regression
• Prior belief level for seismic data
– Weight of additional seismic input (scaling parameter)
• Trade-off between goodness of fit and complexity
– Regularisation term C determines the balance between training error and margin maximisation
– Classification error
21. Stochastic Sampling for Matched Models
• 640 models generated in 8D parameter space
• 40 good fitting models with misfit < 250
[Figure: misfit minimisation]
[Figure: generated models home in on the regions of good fit. Plots over channel porosity, channel permeability, shale porosity and shale permeability, with a misfit scale from 170 to 5000]
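Selecting the good-fitting models by a misfit threshold, as in the "misfit < 250" criterion on this slide, can be sketched as follows. The slide does not define the misfit, so the least-squares form below (squared residuals divided by 2σ²) is an assumption, and the model names and production numbers are invented for illustration.

```python
def least_squares_misfit(observed, simulated, sigma):
    """Assumed misfit: sum of squared residuals divided by 2 * sigma^2."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated)) / (2.0 * sigma ** 2)

def select_good_fits(models, observed, sigma, threshold):
    """Return (misfit, name) pairs with misfit below the threshold, best first."""
    scored = [(least_squares_misfit(observed, sim, sigma), name)
              for name, sim in models.items()]
    return sorted(pair for pair in scored if pair[0] < threshold)

# Invented history data and two invented simulated responses.
observed = [100.0, 90.0, 80.0]
models = {
    "model_a": [102.0, 91.0, 79.0],
    "model_b": [150.0, 120.0, 60.0],
}
good = select_good_fits(models, observed, sigma=1.0, threshold=250.0)
```

Here `model_a` has a misfit of 3.0 and is kept, while `model_b` has a misfit of 1900.0 and is rejected.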
22. Fitted Model: Property Distribution
Realistic reproduction of geological structures detected from the prior data:
– fluvial channels
– thin mud channel boundaries
– point-bars
porosity truth case
23. Fitted Model Forecast: Fluvial Channels case
Oil and water production from 7 largest producing wells:
● History data (truth case + noise)
○ Validation truth case forecast data
Matched model
24. Variability of Uncertain Model Properties
• Correlation: kernel size σ (panels: channel, sands, shale)
• Smoothness along the manifold: number of unlabelled points N (panels: channel, sands, shale)
• Impact of additional data (seismic) on the predicted variables (panels: scaling for porosity, scaling for permeability)
• Seismic interpretation uncertainty: amplitude threshold for channel/shale boundary
25. Non-uniqueness of Semi-supervised SVR
Stochastic realisations, based on geo-manifolds generated with
different random seeds, represent inherent non-uniqueness of
the model with the given combination of the parameter values
[Figure panels: Realisation 1; Realisation 2; Truth case]
26. Impact of Noise in Seismic Data
[Figure panels: original seismic data with injected noise N(0, σ) (● unlabelled data); semi-SVM porosity; truth case porosity; semi-SVM porosity for N(0, 2σ) added noise]
27. Production: Stochastic Realisations
Realisations of a single fitted model with a unique set of parameters
Oil production profiles for 10 stochastic realisations for 6 wells:
● History data (truth case + noise)
○ Validation truth case forecast data
Oil production profiles for semi-SVR model realisations
28. Multiple matching models vs Truth case porosity
[Figure panels: multiple good fitting φ models; truth case φ]
The river delta front structure is very similar for different models due to the very
clean synthetic seismic with no noise.
29. Fitted Model Forecast: Delta Front case
Oil and water production from 7 largest producing wells:
● History data (truth case + noise)
Fitted model
Truth case
30. Fitted Model Forecast: Delta Front case
Oil production from 7 largest producing wells:
● History data (truth case + noise)
Fitted model
Truth case
31. Forecast with Uncertainty
Confidence P10/P90 interval for production forecast based on multiple models:
Total oil and water production profiles:
● History data (truth case + noise)
○ Validation truth case forecast data
P10/P90 production forecast confidence bounds
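A P10/P90 envelope from multiple models can be sketched as per-time-step percentiles across the ensemble of production profiles. The example below is a simplification under stated assumptions: every model is weighted equally (the talk bases the interval on posterior probability, which would require weights), percentiles use linear interpolation, and the profiles are invented numbers.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac

def forecast_envelope(profiles, low=10, high=90):
    """Per-time-step (low, high) percentile bounds across an ensemble of profiles."""
    return [(percentile(step, low), percentile(step, high))
            for step in zip(*profiles)]

# Eleven invented profiles with two time steps each.
profiles = [[float(m), 2.0 * m] for m in range(11)]
envelope = forecast_envelope(profiles)
```

For these profiles the envelope is (1.0, 9.0) at the first time step and (2.0, 18.0) at the second.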
32. Uncertainty of Model Parameters
Posterior probability distribution of the geomodel parameters:
• Kernel width (correlation) for poro & perm in sand or shale
• Continuity in sand and shale bodies, by N unlab
• Impact of seismic data on poro & perm (weight)
34. Conclusions
• A novel learning-based model of a petroleum reservoir, built on
capturing complex dependencies from data.
• Semi-supervised SVR geomodel takes into account natural similarities in space
and data relations:
– Reproduction of geological structures and anisotropy of fluvial systems in a
realistic way based on prior information on geo-manifold represented by
unlabelled data
– Robustness to noise and flexible control of signal/noise levels in data to detect
geologically interpretable information
– Stochastic non-uniqueness inherent to the model is represented by the
distribution of unlabelled data
• Multiple fitted models match both production history and the
validation data in the forecast
• Uncertainty of the SVR model is quantified by inference of the multiple
generated models, which provide uncertainty forecast envelope based on
posterior probability
35. Further work
• Extension to 3D case by adding one more input to the SVR model
• Integrate other relevant data from outcrops and lab experiments
• Apply the SVR modelling approach with the Bayesian UQ framework
in different fields: environmental and climate modelling,
epidemiology, etc.
• 2 PhD positions in the Uncertainty Quantification project:
– Geologist, data integration
– Uncertainty modelling with machine learning
Apply to vasily.demyanov@pet.hw.ac.uk
36. Acknowledgments
• J. Caers and S. Castro of Stanford University for providing Stanford
VI case study
• UK EPSRC grant (GR/T24838/01)
• Swiss National Science Foundation for funding “GeoKernels: kernel-
based methods for geo- and environmental sciences”
• Sponsors of Heriot-Watt Uncertainty Quantification project:
37. Research Summary
• Developed a novel model for petroleum reservoir based on capturing
complex dependencies from data with learning methods.
• The novel model provides multiple history-matched (HM) models for different fluvial reservoirs:
sinuous channels, delta front
– both production history and the validation data in the forecast are matched
• Benefits of the novel data driven geomodelling approach:
– Reproduce realistic geological structure and anisotropy of property
distribution.
– Robust to noise in prior data
– Relate to identifiable properties: continuity, correlation, prior belief in data,
etc.
• Model uncertainty is described by the inference of multiple models
– Posterior confidence intervals describe the forecast uncertainty
– Uncertainty of the model parameters is quantified by posterior probability
distributions
38. Multiple good fitting φ models
[Diagram: labelled (●) & unlabelled (+) data and seismic data (prior information) feed the learning machine (SVR)]
39. Next Steps
• Production uncertainty forecasting based on the inference of the
generated HM models.
• Extension to 3D case by adding one more input to the SVR model
• Integrate other relevant data from outcrops and lab experiments
40. Aims
Uncertainty quantification with a geomodel
which is able to improve geological realism
by more effective use of prior information
• Explore robustness of semi-supervised SVR
geomodel to noisy data
• Develop a way to reproduce inherent uncertainty
of the semi-supervised SVR geomodel by
stochastic realisations
• Integrate semi-supervised SVR geomodel into the
Bayesian uncertainty quantification framework
41. Content
• Motivation and Aims
• Semi-supervised learning concept
– Support Vector Machine (SVM) recap
• Machine learning based geomodel
– Noise pollution experiment
– Inherent non-uniqueness of SVR-based model
– SVR geomodel in Bayesian sampling framework
• Conclusions
42. Impact of Noise in Seismic Data
In a real case additional data (seismic) are usually noisier
than in our synthetic case
Seismic is processed through a low pass filter to build a
manifold of unlabelled points:
[Figure panels: elastic impedance; filtering low frequency component from seismic; channel geo-manifold defined by unlabelled points]
43. Seismic Data Polluted with Noise
Gaussian noise with zero mean and three different standard deviations (σ, 2σ, 3σ) is added.
[Figure panels: N(0, σ); N(0, 2σ); N(0, 3σ); truth case]
44. Filtering
Only a low frequency component is left after filtering
[Figure panels: N(0, σ); N(0, 2σ); N(0, 3σ); truth case]
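The noise experiment (add zero-mean Gaussian noise at one, two and three times σ, then keep only the low-frequency component) can be mimicked on a 1D toy profile. The talk does not say which low-pass filter was used, so the centred moving average here is an assumed stand-in. The step-shaped "channel" profile and the value of σ are likewise invented.

```python
import random

def add_gaussian_noise(signal, std, seed=0):
    """Return the signal with independent N(0, std) noise added to each sample."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, std) for s in signal]

def moving_average(signal, half_width):
    """Simple low-pass filter: mean over a centred window, truncated at the edges."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        window = signal[max(0, i - half_width):min(n, i + half_width + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed

def mean_squared_error(a, b):
    """Mean squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

truth = [1.0 if 40 <= i < 60 else 0.0 for i in range(200)]  # toy "channel" profile
sigma = 0.5
for level in (1, 2, 3):
    noisy = add_gaussian_noise(truth, level * sigma, seed=level)
    filtered = moving_average(noisy, half_width=5)
    print(level, mean_squared_error(noisy, truth), mean_squared_error(filtered, truth))
```

Printing both errors for each noise level shows how much of the channel signal the filter recovers, at the cost of blurring the channel edges.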
46. Porosity SVR Estimates for Noisy Data
[Figure panels: noise level 1σ; noise level 2σ; noise level 3σ; truth case]
The geo-manifold becomes less concentrated
and the channel “erodes” as the
noise level increases
47. Prediction with a Large Noise Level
Noise level: 3σ
Even with large noise levels the channel
continuity can be traced in SVR prediction
although it is barely visible in the input data
Truth case
48. Impact of Inherent Non-uniqueness
Stochastic realisations of water production from 6 largest producing wells
49. NA Sampling: Misfit Distribution
Misfit of models generated by NA
Lowest misfit = 188
50. NA Sampling: Parameter Distributions
Histogram of parameter values for the generated models
Models generated by NA home in on the regions of good fit
51. Support Vector Machine (SVM)
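Linear separation problem
[Figure: two classes in the plane with the margin lines $w \cdot x + b = 1$ and $w \cdot x + b = -1$; slack ξ marked for samples beyond the margin]
• $\alpha_i = 0$: normal samples
• $0 < \alpha_i < C$: support vectors (SV)
• $\alpha_i = C$: support vectors, untypical or noisy
Soft margin: $\min \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i$
$\xi_i \ge 0$: slack variables to allow noisy samples & outliers to lie inside or on the outer side of the margin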
Trade-off between: margin maximisation & training error minimisation
Increase space dimension to solve separation problem linearly
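The soft-margin trade-off can be checked numerically for a given hyperplane. The sketch below uses the standard identity that, for fixed w and b, the smallest feasible slack is ξ_i = max(0, 1 - y_i (w · x_i + b)), and evaluates the objective ½‖w‖² + C Σ ξ_i. The samples, labels, w, b and C are invented for illustration and no optimisation is performed.

```python
def decision_value(w, b, x):
    """Linear decision function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack_variables(w, b, samples, labels):
    """Slack xi_i = max(0, 1 - y_i (w . x_i + b)) for labels y_i in {-1, +1}."""
    return [max(0.0, 1.0 - y * decision_value(w, b, x))
            for x, y in zip(samples, labels)]

def soft_margin_objective(w, b, samples, labels, C):
    """Soft-margin objective 0.5 * ||w||^2 + C * sum_i xi_i."""
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + C * sum(slack_variables(w, b, samples, labels))

# Invented samples: the second one lies inside the margin.
samples = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
labels = [1, 1, -1]
w, b = (1.0, 0.0), 0.0
slacks = slack_variables(w, b, samples, labels)
objective = soft_margin_objective(w, b, samples, labels, C=10.0)
```

Only the sample inside the margin gets a non-zero slack (0.5), so the objective is 0.5 + 10 × 0.5 = 5.5. A larger C penalises that violation more heavily relative to the margin term.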