1. DEEP LEARNING APPROACH IN CHARACTERIZING SALT BODY ON SEISMIC IMAGES
LICHENG ZHANG, MENG ZHANG, ZHENZHEN ZHONG*, TIANXIA ZHAO, YUE WU, VARUN TYAGI, JIA WEI, CHENG ZHAN
3. DOMAIN KNOWLEDGE
Salt body characterization is crucial for reservoir exploration, yet hands-on horizon picking is a very time-consuming task. With the growing size of seismic volumes and growing computing power, a highly efficient automatic salt segmentation tool is very desirable.
Sound waves are bounced off underground rock formations, and the waves that reflect back to the surface are captured by recording sensors.
Salt boundary interpretation is important for understanding geo-structures and critical for hydrocarbon exploration.
4. EVOLUTION OF SALT BODY HORIZON PICKING
References:
https://www.geoteric.com/blog/geoteric-2017.1-release-redefine-your-seismic-interpretation-with-adaptive-horizons-0
https://saytosid.github.io/segnet/
Figure: the evolution from conventional hand-engineered horizon picking, to auto seed tracking based on correlation, to the current stage where computer vision meets seismic images.
5. DATA AT A GLIMPSE
The world's leading geoscience company TGS released a collection of 22,000 subsurface images on the Kaggle platform. In this work, we developed a framework that automatically and accurately identifies whether a subsurface target is salt or not.
Dataset
• Images chosen at various locations
• 4,000 training images, 18,000 blind test images
• 101 x 101 pixels; each pixel of a training image is labeled as either salt or sediment
• Depth information is provided for each image
Figure: example images with their labels (salt masks).
https://www.kaggle.com/c/tgs-salt-identification-challenge
6. CONVOLUTIONAL NEURAL NETWORK
The CNN architecture allows the network to concentrate on low-level features in the first hidden layer, then assemble them into higher-level features in the next hidden layer, and so on.
The goal of a pooling layer is to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters, thereby limiting the risk of overfitting.
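A minimal sketch of this idea, assuming TensorFlow/Keras; the filter counts and depth are illustrative assumptions, not the network used in this work:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # First hidden layer concentrates on low-level features (edges, textures).
    layers.Conv2D(16, 3, padding="same", activation="relu",
                  input_shape=(101, 101, 1)),
    # Pooling subsamples the feature maps: less compute, memory, and parameters.
    layers.MaxPooling2D(2),
    # Deeper layers assemble low-level features into higher-level ones.
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
])
model.summary()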
7. METRIC – INTERSECTION OVER UNION (IOU)
The predicted bounding box is drawn in red, while the ground-truth bounding box is drawn in green. Intersection over Union is the ratio of the area of overlap to the area of union.
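A minimal sketch of the metric for binary masks, assuming NumPy arrays; the empty-mask convention is an assumption:

import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """IoU: area of overlap divided by area of union."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Both masks empty: treat as a perfect match (a convention, assumed here).
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)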
8. DATA AUGMENTATION
Data augmentation consists of generating new training instances from existing ones, artificially boosting the size of the training set. This reduces overfitting.
It is often preferable to generate training instances on the fly during training rather than wasting storage space and network bandwidth. TensorFlow offers several image manipulation operations such as transposing (shifting), rotating, resizing, flipping, and cropping, as well as adjusting the brightness, contrast, saturation, and hue; a sketch follows the citation below.
-- Hands-On Machine Learning with Scikit-Learn and TensorFlow
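A minimal sketch of on-the-fly augmentation with tf.image, not the exact pipeline used in this work; note that geometric transforms must be applied to the image and its salt mask together so the labels stay aligned:

import tensorflow as tf

def augment(image, mask):
    # Random horizontal flip, applied jointly to image and mask.
    if tf.random.uniform(()) > 0.5:
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    # Photometric changes apply to the image only; the mask is unaffected.
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, 0.9, 1.1)
    return image, mask

# Applied per element during training, e.g.: dataset = dataset.map(augment)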
9. EXAMPLES OF AUGMENTED IMAGES FOR TRAINING
Figure: original images overlaid with salt masks, and augmented versions produced by horizontal flip, shear rotation, and horizontal flip combined with shear rotation.
10. PREPROCESSING
Equalized histogram, gamma, edge, Frangi, and Laplacian filters, among others, were tested to enhance image contrast, identify discontinuities, and sharpen features; a sketch of these filters follows the figure caption below.
Figure: original image and overlay with salt masks, followed by image normalization, brightness correction, edge detection, vessel-shape detection, and the resulting binary image.
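A minimal sketch of these filters using scikit-image (the library choice and parameter values are assumptions; the slide does not name an implementation):

import numpy as np
from skimage import exposure, filters

def preprocess_variants(img: np.ndarray) -> dict:
    """img: 2-D float array in [0, 1]; returns one enhanced copy per filter."""
    return {
        "equalized": exposure.equalize_hist(img),  # contrast via histogram equalization
        "gamma": exposure.adjust_gamma(img, 0.8),  # brightness correction
        "edges": filters.sobel(img),               # edge / discontinuity detection
        "frangi": filters.frangi(img),             # elongated, vessel-like structures
        "laplacian": filters.laplace(img),         # second-derivative sharpening
    }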
11. MODEL ARCHITECTURE
The contracting branch implements a standard convolutional architecture with alternating convolution and pooling operations and progressively downsampled feature maps. Every step in the expansive path performs upsampling of the current feature map (a transposed convolution) followed by a convolution, thus gradually increasing the resolution of the output. The expansive branch combines these upsampled maps with high-resolution features from the contracting branch via inter-connections. A two-dimensional softmax assigns each pixel a probability of belonging to each of the classes. A minimal sketch follows the reference below.
https://arxiv.org/pdf/1505.04597.pdf
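A minimal U-Net sketch in Keras, shallower than the real model; the 128 x 128 input (101 x 101 patches assumed padded), filter counts, and sigmoid output (binary salt/sediment instead of a two-class softmax) are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

inputs = layers.Input(shape=(128, 128, 1))

# Contracting branch: convolutions with progressively downsampled feature maps.
c1 = conv_block(inputs, 16)
p1 = layers.MaxPooling2D(2)(c1)
c2 = conv_block(p1, 32)
p2 = layers.MaxPooling2D(2)(c2)
b = conv_block(p2, 64)  # bottleneck

# Expansive branch: transposed convolutions upsample; skip connections
# re-inject the high-resolution features from the contracting branch.
u2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
c3 = conv_block(layers.Concatenate()([u2, c2]), 32)
u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(c3)
c4 = conv_block(layers.Concatenate()([u1, c1]), 16)

# Per-pixel probability of the salt class.
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
model = Model(inputs, outputs)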
12. RESNET AS UNET BACKBONE
Identity block: input shape = output shape. Conv block: input shape != output shape.
A pretrained ResNet is used on both the contracting and expansive sides of the U-Net.
https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
The layers learn the residual between output and input, F(x) = H(x) - x, so the block outputs H(x) = F(x) + x. A minimal identity block is sketched below.
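A minimal identity-block sketch in Keras illustrating F(x) + x; the filter sizes are illustrative, and the block assumes the input already has `filters` channels (otherwise a conv block with a projection shortcut is needed):

import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    shortcut = x
    # These layers learn the residual F(x) = H(x) - x.
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # The shortcut adds x back, so the block outputs H(x) = F(x) + x.
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)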
13. LOSS FUNCTION
• Binary cross-entropy (BCE) log loss is used for training.
• Dice loss.
• Lovász loss, directly related to IoU, is used to fine-tune the training process.
Reference: The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks, https://arxiv.org/abs/1705.08790
Log loss increases as the predicted probability diverges from the actual label: predicting a probability of 0.012 when the actual label is 1 would be bad and result in a high loss value. As the predicted probability approaches 1, log loss slowly decreases. A perfect model would have a log loss of 0.
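A minimal sketch of the BCE and Dice terms in TensorFlow (the Lovász loss is omitted here; see the cited paper); combining them by simple addition is an assumption, not necessarily the weighting used in this work:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def dice_loss(y_true, y_pred, smooth=1.0):
    # Dice loss: 1 - 2|A ∩ B| / (|A| + |B|), computed on soft predictions.
    y_true = tf.reshape(y_true, [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

def bce_dice_loss(y_true, y_pred):
    return bce(y_true, y_pred) + dice_loss(y_true, y_pred)

# Usable directly, e.g.: model.compile(optimizer="adam", loss=bce_dice_loss)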
14. SQUEEZE AND EXCITATION BLOCKS
By default, a network weights each of its channels equally when creating the output feature maps. SE blocks change this by adding a content-aware mechanism that weights each channel adaptively: in effect, a single learned scalar per channel indicates how relevant that channel is.
1. Squeeze each channel to a single numeric value using average pooling.
2. Add a fully connected layer followed by a ReLU.
3. A second fully connected layer followed by a sigmoid activation gives each channel a smooth gating function.
4. Weight each feature map of the convolutional block based on the result of this side network (a sketch follows the reference below).
https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
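A minimal SE-block sketch in Keras following steps 1-4 above; the reduction ratio of 16 is the common choice from the SE paper and assumes the input has at least that many channels:

import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=16):
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                         # 1. squeeze per channel
    s = layers.Dense(channels // reduction, activation="relu")(s)  # 2. FC + ReLU
    s = layers.Dense(channels, activation="sigmoid")(s)            # 3. FC + sigmoid gate
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                               # 4. reweight feature maps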
15. HYPERCOLUMN
Reference: Hypercolumns for Object Segmentation and Fine-grained Localization, https://arxiv.org/abs/1411.5752
The bottom image is the input, and above it are the feature maps of different layers in the CNN. The hypercolumn at a pixel is the vector of activations of all units that lie above that pixel.
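A minimal sketch of building hypercolumns in Keras: feature maps from several layers are resized to a common resolution and concatenated, so each pixel's vector stacks the activations above it. The bilinear interpolation and target size are assumptions:

import tensorflow as tf
from tensorflow.keras import layers

def hypercolumn(feature_maps, target_size=(128, 128)):
    # Upsample every feature map to the same spatial resolution,
    # then stack them along the channel axis.
    upsampled = [
        layers.Resizing(*target_size, interpolation="bilinear")(f)
        for f in feature_maps
    ]
    return layers.Concatenate(axis=-1)(upsampled)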
17. CYCLIC LEARNING RATE
• Why use a cyclical learning rate? A gradually decayed or step-decay schedule can get stuck at a local minimum: with a typical learning-rate schedule, the model converges to a single minimum at the end of training, whereas with a cyclical schedule the model undergoes several learning-rate cycles, converging to and escaping from multiple local minima. A sketch of such a schedule follows.
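A minimal sketch of a cyclical schedule as a Keras callback; the cosine shape, learning-rate bounds, and cycle length are illustrative assumptions:

import math
import tensorflow as tf

BASE_LR, MAX_LR, CYCLE_LEN = 1e-4, 1e-3, 10  # illustrative values

def cyclic_lr(epoch, lr=None):
    # Decay from MAX_LR to BASE_LR within each cycle, then restart,
    # letting the model escape poor local minima.
    t = (epoch % CYCLE_LEN) / CYCLE_LEN
    return BASE_LR + 0.5 * (MAX_LR - BASE_LR) * (1 + math.cos(math.pi * t))

callback = tf.keras.callbacks.LearningRateScheduler(cyclic_lr)
# model.fit(..., callbacks=[callback])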
19. JIGSAW PUZZLE
• We hypothesized that the training and testing images are smaller blocks of one big seismic image.
• A jigsaw puzzle is solved to connect the images together (22,000 images overlaid with their masks) and obtain larger patches:
1. Define the dissimilarity between two images over an edge as the distance between the pixels along their common edge (a sketch of this step follows the list).
2. For each image, find the most promising candidate neighbors; this is done using k-NN.
3. Order the candidates by similarity score and filter out the low-score images.
4. Combine the images in vertical and horizontal sequences.
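A minimal sketch of step 1's edge dissimilarity for a left-right placement, assuming grayscale NumPy arrays; the distance metric (mean absolute difference) is an assumption:

import numpy as np

def edge_dissimilarity(img_left: np.ndarray, img_right: np.ndarray) -> float:
    # Compare the rightmost column of one image with the leftmost
    # column of its candidate neighbor.
    left_edge = img_left[:, -1].astype(float)
    right_edge = img_right[:, 0].astype(float)
    return float(np.mean(np.abs(left_edge - right_edge)))

# Step 2 can then use k-NN (e.g. sklearn.neighbors.NearestNeighbors) fit on
# every image's edge columns to shortlist candidate neighbors.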
21. IOU SCORES ON THE TEST DATA
Cross-validation folds:
1 fold: 0.828
2 folds: 0.839
5 folds: 0.846
Since the amount of training data is small, the bias of a single fold is large; ensembling across different folds cancels out this bias.
Architecture comparison:
Original resnet34: 0.810
Modified resnet34: 0.818
Modified resnet34 + hypercolumn + SE blocks: 0.828
The modified resnet34 simply uses stride = 1 instead of 2, so the bottleneck size is 8x8 instead of 4x4; the idea is that the increased resolution might improve the result.
Noise augmentation:
1 fold with random noise added: 0.834
22. CONCLUSION
• We integrated ResNet and U-Net to solve image segmentation tasks. The underlying architecture is dedicated to restoring pixel location information before outputting the segmentation map. With the help of computer vision, seismic image interpreters may look forward to spending their time on less tedious things than picking complex salt bodies.
• Lessons learned
• Data augmentation
• Model architecture (test and learn)
• Cross-validation and ensembling
• QC approach
23. FUTURE WORK
• Semi supervised deep learning
• Model architecture
• Ensemble approach
• Domain knowledge leverage
24. SEMI SUPERVISED DEEP LEARNING
• Take the same model that you used with your training set and that gave you good results.
• Use it on your unlabeled test set to predict the outputs (pseudo-labels). We don't know whether these predictions are correct, but we now have reasonably accurate labels, which is the aim of this step.
• Concatenate the training labels with the test-set pseudo-labels.
• Concatenate the features of the training set with the features of the test set.
• Finally, train the model in the same way you did before with the training set (a sketch follows the reference below).
https://www.analyticsvidhya.com/blog/2017/09/pseudo-labelling-semi-supervised-learning-technique/
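A minimal sketch of this pseudo-labelling loop; `model`, `x_train`, `y_train`, and `x_test` are hypothetical placeholders for the trained Keras model and data arrays used elsewhere in this work, and the 0.5 binarization threshold is an assumption:

import numpy as np

# Predict pseudo-labels on the unlabeled test set.
pseudo_labels = (model.predict(x_test) > 0.5).astype(np.float32)

# Concatenate features and labels of the training and test sets.
x_combined = np.concatenate([x_train, x_test], axis=0)
y_combined = np.concatenate([y_train, pseudo_labels], axis=0)

# Retrain the model the same way as before, now on the combined data.
model.fit(x_combined, y_combined)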
25. ACKNOWLEDGEMENT
• Thanks to all the team members for their great effort and excellent teamwork
• Thanks to Yan XU for organizing the workshop
• Thanks to the Kaggle platform for featuring this competition and to GCP for the free GPU credit