Video Forgery Detection: Literature review
An overview of modern video forgery detection
A Literature Review
(Kumara M.P.T.R., Fernando W.M.S., Perera J.M.C.U., Philips C.H.C., Jayawardane A.L.H.S)
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka
Table of Contents
1. Introduction
2. Literature survey
   2.1 Double MPEG compression
      2.1.1 Intra Coding principle
      2.1.2 Inter Coding Principle
      2.1.3 Detecting static artifacts in a double compressed MPEG video
      2.1.4 Detecting temporal artifacts in a double compressed MPEG video
         2.1.4.1 Discrete Cosine Transform and Support Vector Machine
         2.1.4.2 Double Compression Detection Algorithm
            2.1.4.2.1 First digit distribution in MPEG video
   2.2 Detecting Duplication
      2.2.1 Detecting duplicated frames
      2.2.2 Detecting duplicated regions across frames
   2.3 Extending image forgery detection techniques for videos
      2.3.1 An approach for JPEG resize and image splicing detection
      2.3.2 JPEG compression analysis and algorithm for forgery detection
      2.3.3 Method based on directional filter using JPEG image analysis
   2.4 Combing artifact in screen shots
   2.5 Multimodal feature fusion
   2.6 Detecting Video Forgery by Ghost Shadow Artifact
      2.6.1 Video Inpainting
      2.6.2 Ghost Shadow Artifact
      2.6.3 Detecting Video Forgery
3. Conclusion
4. References
1. Introduction
Video data has become increasingly prevalent with the advancement of digital cameras and high-bandwidth networking technologies. As a result, many systems make use of video data and rely on its accuracy. An inevitable adverse effect of this reliance is video forgery.
Many software tools available on the internet facilitate video editing. With these resources, video editing has become so easy that even a novice can produce an edited video stream within minutes. This introduces serious security concerns, so detecting video forgery has become a critical requirement for ensuring the integrity of video data.
There are two major techniques for protecting video data against tampering: active and passive methods. Traditionally, active protection methods such as digital signatures and watermarking have been used to maintain integrity and authenticate video data. The problem with these methods is that the situations in which they can be applied are greatly limited; they also reduce video quality and require specialized hardware [1]. Therefore, more practical passive techniques are used to detect forgeries in video data without relying on previously embedded information such as digital signatures. Passive protection methods use statistical features of the video data itself to assess its validity. The overall objective of video forgery detection is to ensure that video data has not been changed after recording time.
Video forgery detection has evolved considerably, and in this literature survey these methods are assessed based on their applicability and how well each method fits a given situation.
2. Literature survey
As described in the introduction, video forgers exploit a wide range of tampering techniques. Some of the most common are:
● Inpainting - image inpainting, video inpainting and motion inpainting (inserting or removing frames from the actual scene)
● Object computing - motion estimation, motion compensation and motion tracking
● Frame and region duplication
Video forgery also involves finishing techniques applied after inpainting and object computing, integrating the edits into a high-quality forged video. Methods have been developed to identify such tampering, and they are discussed throughout this survey.
2.1 Double MPEG compression
Digital videos are usually encoded with the MPEG-x or H.26x coding standards. An MPEG video is organized into a hierarchy of layers, as follows.
Figure 1. The hierarchy of MPEG video
The algorithm and corresponding experiments in this paper focus on MPEG-2 Constant Bit Rate (CBR) video, but they are backward compatible with MPEG-1. There are two principles to be discussed.
1. Intra coding principle
2. Inter coding principle
2.1.1 Intra Coding principle
When an image is compressed without reference to any other images, the time axis naturally
does not come into play and therefore, this type of compression is referred to as “intra-coded”
compression. [16]
Intra coding includes three technologies.
1. DCT
2. Quantization
3. Entropy Coding (Variable length coding)
The DCT, introduced above, is used to achieve energy concentration and de-correlation of the signal data. The low-frequency DCT coefficients, including the Direct Current (DC) coefficient, are more significant than the high-frequency DCT coefficients, which are almost zero. Because of this energy concentration, the higher-frequency DCT coefficients can be more coarsely quantized. The values in the quantization matrix are pre-scaled by multiplying with a quantizer scale, which takes effect on a macroblock basis. The quantizer scale is an encoder parameter in Variable Bit Rate (VBR) video, while in CBR video it is adjusted adaptively by the output bit rate controller. Each DCT coefficient is divided by the respective quantization matrix value, and entropy coding is then applied to the quantized DCT coefficients.
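The three steps can be sketched as follows. This is a minimal NumPy illustration, not an MPEG encoder; the quantization table shown is the standard JPEG luminance table, used here only as a stand-in for an MPEG intra matrix.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix: T @ block @ T.T gives the 2-D DCT.
    T = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            c = np.sqrt(1 / n) if k == 0 else np.sqrt(2 / n)
            T[k, i] = c * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return T

# Standard JPEG luminance quantization table (stand-in for an intra matrix).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def intra_code_block(block, qscale=1.0):
    T = dct_matrix()
    coeffs = T @ (block - 128.0) @ T.T       # energy concentrates top-left
    return np.round(coeffs / (Q * qscale))   # coarse quantization

# A smooth gradient block, typical of natural image content.
x = np.arange(8)
block = 128 + 8 * np.add.outer(x, x).astype(float)
q = intra_code_block(block)
print(int(np.count_nonzero(q)))  # only a few low-frequency coefficients survive
```

Because the ramp block is smooth, only the DC coefficient and a few low-frequency terms survive quantization; this is the energy concentration that entropy coding then exploits.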
2.1.2 Inter Coding Principle
Double MPEG compression is a widely used basis for video forgery detection. Here, double compression means encoding the original video data twice: first by the recorder itself, and later by the tamperer after introducing changes to the video stream. This double compression introduces unique static and temporal statistical perturbations, which can serve as evidence of video tampering.
Motion compensation is an algorithmic technique employed in video compression schemes, including MPEG-x. It describes the transformation from a reference picture to the current picture; the reference picture may lie in the past or in the future. Compression efficiency can be improved when images are synthesized from previously transmitted or stored images.
For many frames of a movie, the only difference between one frame and the next is the result of either the camera moving or an object in the frame moving. This means that much of the information representing one frame is the same as the information used in the next frame. Motion compensation exploits this fact.
Inter coding applies motion compensation to exploit this temporal redundancy. To identify redundancy, a spatial search is performed within a search range. If a relatively good match is found, the macroblock is inter coded; otherwise it is intra coded.
After predicting frames using motion compensation, the coder finds the prediction error (residual), which is then compressed and transmitted. The predicted macroblock is subtracted from the real macroblock, leaving a less complicated residual error.
As with intra coding, the residual error is spatially coded. The difference is that the quantization matrix is a flat matrix with a constant value. Most of the quantized DCT coefficients are zero because the residual error carries little information, as described above.
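A minimal sketch of the inter-coding idea (hypothetical NumPy code, with an exhaustive search over a small range standing in for a real encoder's motion search): find the best-matching block in the reference frame, then keep only the motion vector and the low-information residual.

```python
import numpy as np

def best_match(ref, block, top, left, search=4):
    """Exhaustive motion search: minimize the sum of absolute differences (SAD)."""
    h, w = block.shape
    best = (0, 0, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                sad = np.abs(ref[y:y + h, x:x + w] - block).sum()
                if sad < best[2]:
                    best = (dy, dx, sad)
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32)).astype(float)

# Current frame: the same content shifted by (2, 3) pixels -> pure motion.
cur = np.roll(ref, (2, 3), axis=(0, 1))
block = cur[8:16, 8:16]

dy, dx, sad = best_match(ref, block, 8, 8)
residual = block - ref[8 + dy:16 + dy, 8 + dx:16 + dx]
print(dy, dx, residual.any())  # -2 -3 False
```

For pure translation the residual is exactly zero, so only the motion vector needs to be coded; real residuals are small but nonzero and are DCT coded with the flat quantization matrix described above.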
Figure 2: Schematic view of the double MPEG compression [1]
The MPEG compression process can be described as follows. When an MPEG video is produced from an original video stream, a coding sequence of three different frame types is formed [2], normally following a repeating pattern.
● Intra (I) frames - These are typically the highest quality frames, and consequently the least compression happens in I frames. In this type of frame, compression is achieved by removing spatial redundancies within a single frame; no reference is made to neighboring frames. [2]
● Predictive (P) frames - These frames are generated by removing temporal redundancies across frames, predicting the motion of the current frame with respect to the preceding frames. [2]
● Bidirectionally predictive (B) frames - Like P frames, B frames are encoded with motion compensation. But unlike P frames, B frames use both past and future I and P frames (preceding and succeeding frames) for motion prediction, so B frames have the highest compression level. [2][16]
As mentioned earlier, these I, P and B frames are packed in a repeating pattern called a Group Of Pictures (GOP). A GOP starts with an I frame and ends just before the next I frame; this defines the GOP length N. Within the GOP, the gap between two P frames is denoted by M. Consider the following frame sequence as an example with N=12, M=3.
I1 B1 B2 P4 B5 B6 P7 B8 B9 P10 B11 B12 I13 B14 …
If someone tampers with this video and saves it again in MPEG format, this pattern is distorted and hence can be recognized. The distortion adds both spatial and temporal artifacts to the resulting video stream.
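The pattern distortion can be illustrated with a toy simulation (hypothetical Python, not an actual MPEG codec): label each frame with its GOP pattern position, delete a few frames, relabel, and observe that frames from different original GOPs end up sharing a GOP.

```python
# Toy GOP model: with N = 12, M = 3 the repeating type pattern is
# I B B P B B P B B P B B, restarting at every I frame.
N = 12

def frame_type(i):
    pos = i % N
    return "I" if pos == 0 else ("P" if pos % 3 == 0 else "B")

# (original index, type, original GOP number) for a 4-GOP clip.
frames = [(i, frame_type(i), i // N) for i in range(48)]

# A tamperer deletes frames 10..12 (not a multiple of N) and re-encodes,
# which reassigns types purely by the new position in the stream.
kept = [f for f in frames if not (10 <= f[0] <= 12)]
reencoded = [(orig, frame_type(j), gop) for j, (orig, _, gop) in enumerate(kept)]

# Frames from different original GOPs now share a re-encoded GOP...
mixed = any(len({gop for _, _, gop in reencoded[k:k + N]}) > 1
            for k in range(0, len(reencoded), N))
# ...and some frames change type (e.g. an original P frame becomes I or B).
retyped = sum(old[1] != new[1] for old, new in zip(kept, reencoded))
print(mixed, retyped > 0)  # True True
```

Deleting a number of frames that is not a multiple of N guarantees this desynchronization, which is exactly the fingerprint the detection methods below look for.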
Figure 3 describes how double MPEG compression introduces evidence of tampering into the video stream.
Figure 3: How the original GOP changes after double MPEG compression [2]
The topmost frame sequence is the original MPEG video stream. Now assume that the three shaded frames are deleted from the video; the second row shows the resulting frame sequence.
The following sections describe how these static and temporal artifacts are detected in a double compressed MPEG video.
2.1.3 Detecting static artifacts in a double compressed MPEG video
As mentioned earlier, I frames are based entirely on static compression; the amount of compression in an I frame does not depend on neighboring frames. So the static artifacts of double compression are introduced by the double compression of the I frames. Further details about this process and double quantization are described in section 2.3.
2.1.4 Detecting temporal artifacts in a double compressed MPEG video
P and B frames are responsible for introducing temporal artifacts into a double compressed MPEG video, because the amount of compression in P and B frames depends on neighboring frames. When a sequence of frames is altered, removed or added, the existing P and B frames are also altered as a result of this accumulative compression procedure. It is noteworthy that the temporal dependency on neighboring frames only prevails within a single GOP, because every GOP is surrounded by two I frames whose compression is self-contained. Once a sequence of frames is altered and re-encoded, frames from different GOPs are grouped together in the same GOP.
In the original video sequence, P frames are strongly correlated with the respective I frames in their GOP. Once P frames shift into other GOPs as a result of frame removal or addition, this correlation becomes weaker, increasing the motion error [2]. When the motion errors are large enough, they can be identified by performing a Fourier domain analysis on the video stream.
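A sketch of that Fourier analysis (synthetic data; a real detector would measure the mean motion error of each P frame from the decoded stream): after tampering, the P frames that shifted across GOP boundaries produce a periodic spike in the motion error sequence, which shows up as a strong peak in its magnitude spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
num_p = 96        # mean motion error of each P frame, across many GOPs
p_per_gop = 3

baseline = rng.normal(1.0, 0.05, num_p)   # untampered: roughly flat errors
tampered = baseline.copy()
tampered[::p_per_gop] += 2.0              # one desynchronized P frame per GOP

def spectrum_peak(errors):
    # Ratio of the strongest non-DC component to the average magnitude.
    mag = np.abs(np.fft.rfft(errors - errors.mean()))
    return float(mag[1:].max() / mag[1:].mean())

print(spectrum_peak(baseline) < spectrum_peak(tampered))  # True
```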
The advantage of this technique is that it detects both static and temporal artifacts in the video stream. Unlike active video protection techniques, this method relies on statistical analysis of the video data, so it does not introduce any quality degradation.
On the other hand, the method has certain limitations. It fails to detect a tampered video stream if the number of removed frames is a multiple of the GOP length, because removing an entire GOP produces no net change in the GOP structure. In addition, the methodology is applicable only to videos encoded with an MPEG compression scheme, although this issue is not critical because MPEG is the most widely used video encoding family. That said, one can find a number of countermeasures that can hide video tampering. Newer methods have been proposed to overcome these limitations; one such method exploits Markov statistics to detect double MPEG compression and has been shown to provide average detection accuracy over 90% [3].
Another method of detecting double MPEG compression is described below.
2.1.4.1. Discrete Cosine Transform and Support Vector Machine
Discrete Cosine Transform (DCT)
It expresses a finite sequence of data points, represented as a sum of cosine functions which
have different frequencies. (Like in Cosine Fourier Series. ) Use of cosine series instead of sine
series is critical in these applications.
Double compression disturb DCT coefficients, which reflects the violation of the parametric
logarithmic law for first digit distribution of quantized Alternating Current coefficients.
Support Vector Machine (SVM)
An SVM is a machine learning algorithm that learns by example to assign labels to objects. For instance, it can learn to detect fraudulent bank cheques by examining hundreds or thousands of fraudulent and legitimate cheques.
2.1.4.2 Double Compression Detection Algorithm
2.1.4.2.1 First digit distribution in MPEG video
The paper [15] argues that it is reasonable to deduce that the first digit distribution in MPEG video has the same characteristics as in JPEG. Since it has been proved that the first digit distribution can be utilized well in JPEG double compression detection, the same is assumed for MPEG. This is a reasonable assumption and can be supported with an example.
Parametric logarithmic law (generalized Benford's law):
y = N log10( 1 + 1 / (s + x^q) ), x = 1, 2, …, 9
where x is the first digit, y the corresponding probability, N a normalization constant, and q ∈ [0.1, 3] and s ∈ [-1, 1] are model parameters.
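As a small illustration (synthetic data; real detectors read the quantized AC coefficients from the bitstream), the snippet below draws Laplacian-distributed DCT coefficients, quantizes them once, and checks that the first digits of the nonzero coefficients are heavily skewed toward 1, as the logarithmic law predicts.

```python
import numpy as np

rng = np.random.default_rng(2)

# Laplacian-distributed AC coefficients, quantized once with step 4.
coeffs = rng.laplace(0.0, 40.0, 200_000)
quantized = np.round(coeffs / 4).astype(int)

nonzero = np.abs(quantized[quantized != 0])
first_digits = np.array([int(str(v)[0]) for v in nonzero])

freq = np.bincount(first_digits, minlength=10)[1:] / len(first_digits)
print(int(np.argmax(freq)) + 1)   # digit 1 dominates -> prints 1
```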
Figure 4.Fitting results for original MPEG video
(a) Intra (I) frame. (b) Non-intra (P) frame.[15]
Figure 5. Fitting results for doubly compressed MPEG video [15]
(a) Target bit rate is larger than original bit rate. (b) Target bit rate is smaller than original bit rate.
Figure 6. Fitting results in log-log scale.
Solid, dashed and dash-dotted lines stand for the original video, the doubly compressed video with larger bit rate, and the doubly compressed video with smaller bit rate, respectively. [15]
The first digit distributions of an original MPEG video are considered in Figure 4. The intra frames follow the parametric logarithmic law closely, but the first digit distribution of the non-intra frames differs: AC coefficients with first digit 1 take a bigger proportion, while first digits 7, 8 and 9 take smaller proportions. So non-intra frames may not follow the logarithmic law as well as intra frames do. This is because of the inter coded macroblocks in P and B frames.
The fitting results for a doubly compressed MPEG video are shown in Figure 5. The violation is very pronounced when the target bit rate of the second compression is larger than the original bit rate; it can be seen clearly by the naked eye. The violation is not as obvious when the target bit rate is smaller than the original bit rate; there, the first digit distribution still tends to follow the law. The results of that situation are re-plotted on a log-log scale in Figure 6, where it is clearly shown that the doubly compressed MPEG video drifts away from the first digit distribution of the original MPEG video. [15]
From this point onwards, the algorithm is as follows.
1) For both the query and training videos, extract the first digit distribution of the quantized AC coefficients.
2) Test the first digit distribution against the parametric logarithmic law. Three goodness-of-fit statistics are calculated: the sum of squares due to error (SSE), the root mean squared error (RMSE) and R-square. SSE and RMSE closer to zero and R-square closer to one indicate a good fit.
3) Combine the nine first digit probabilities and the three goodness-of-fit statistics to compose a 12-D feature. Only I frames are taken into consideration, because the fitting results for intra frames are better than those for non-intra frames.
4) Treat each GOP, with its 12-D feature, as a detection unit, so the SVM classifier judges on a GOP basis. The GOP proportion D is defined as D = M/N.
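The 12-D feature of step 3 can be sketched as follows (hypothetical code; the fitted-law probabilities would normally come from estimating N, q and s on the observed distribution, whereas example parameter values are plugged in here):

```python
import numpy as np

def logarithmic_law(x, N, q, s):
    # Parametric logarithmic law for first digits x = 1..9.
    return N * np.log10(1.0 + 1.0 / (s + x ** q))

digits = np.arange(1, 10)

# Observed first digit probabilities (an illustrative, Benford-like distribution).
observed = np.log10(1.0 + 1.0 / digits)
observed /= observed.sum()

# Fitted law with example parameters (a real detector would estimate N, q, s).
fitted = logarithmic_law(digits, N=1.0, q=1.0, s=0.0)
fitted /= fitted.sum()

residual = observed - fitted
sse = float((residual ** 2).sum())
rmse = float(np.sqrt(sse / len(digits)))
r2 = float(1.0 - sse / ((observed - observed.mean()) ** 2).sum())

feature_12d = np.concatenate([observed, [sse, rmse, r2]])
print(feature_12d.shape)  # (12,)
```

One such 12-D vector per GOP is what the SVM classifier judges.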
According to [16], current video and image forensic approaches fall into three categories:
● Source identification or camera sensor fingerprinting.
● Image and video tampering detection.
● Image and video hidden content detection and recovery.
The present method falls into the image and video tampering detection category. The techniques used in this category include detection of cloned regions, analysis of feature variations (comparing the original and tampered image/video), inconsistencies in features, video splicing and cloning detection, inconsistencies in the acquisition process, structural inconsistencies introduced by targeted attacks, and lighting inconsistencies. [16]
The method implemented in [15] falls under video splicing and cloning detection. According to [16], there are two ways to implement video splicing and cloning for MPEG-2:
● Decoding the MPEG-2 encoded file with a video editor, editing the frames, and re-encoding the modified video as a new MPEG-2 file.
● Directly editing the information stored in the MPEG-2 encoded bit stream, i.e., the .mpeg file, without going through an intermediate video editor.
The method described in [15] addresses the first category, which corresponds to the double compression scenario discussed above. The second approach is seldom used, because the attacker cannot visualize what the video frames look like while the MPEG-2 file is in its encoded, compressed form.
The benefits of using these methods over the alternatives are:
● There is no need to analyze the hardware devices used to record the video, such as cameras or voice recorders.
● Tracking inconsistencies such as lighting, image or voice inconsistencies can consume considerable resources and effort, and may require expert experience in the field.
● Tampering that involves double compression is the most frequent case, so this method can detect a large share of forgeries.
● A lot of research has been done, and is ongoing, on double compression detection techniques, so the method can be improved further using that knowledge. For example, analysis of video histograms, which are also used to detect double compression, can complement the described method.
● Human interaction is not mandatory.
The important caveat is that this is not the only tampering method: if no double compression is detected, one cannot conclude that the video is not forged. It is therefore necessary to run other analyses for other types of forgery and rule them out as well.
Use of the first digit distribution is a solid practice, but it can yield misleading interpretations. For example, when the target bit rate is smaller than the original bit rate, as shown in Figure 5(b), an inexperienced analyst may conclude that the video is not forged. It is advisable to use other methods alongside the first digit distribution rather than relying on it alone.
One such complementary approach is to take the Fast Fourier Transform (FFT) of the mean motion errors of the P frames: for a doubly compressed video, spikes appear when frames have been deleted or added, although this may not be accurate at all times. This is a more general technique than the first digit distribution. The reason for the change in motion error is that the P frames within a single GOP are correlated with its initial I frame; compression artifacts propagate through the P frames due to motion compensation encoding, so the motion errors of each P frame are correlated with its neighboring P frames and I frame. [16] This method too has limitations: the spikes are visible only as long as the number of deleted (or added) P frames is not a multiple of the GOP length, so the technique fails if, for example, an entire GOP is deleted. A forensic investigator therefore cannot rely on this technique alone either. According to [16], some encoders can even adaptively change the GOP length while encoding. Another limitation is that human interaction is needed to spot the spikes.
Another method to be considered is a counter-forensics algorithm, in which the attacker first constructs a target P-frame motion error sequence that is free from the temporal fingerprints, and then selectively alters the video's predicted frames so that the motion error sequence of the tampered video matches the target. This is done by setting the motion vectors of certain macroblocks in the P frames to zero and then recalculating the associated motion error values for the affected macroblocks. [16]
The rate at which a forensic investigator detects tampered videos, whether or not a counter-forensics technique has been applied, depends on two things:
● How careful the attacker is when crafting a tampered video.
● How solid a threshold the forensic investigator sets for detecting tampered videos.
It is therefore recommended that a forensic investigator try every available method and technique for detecting video forgeries. No technique is 100% reliable, and it is not good practice to rely on a single technique alone.
2.2 Detecting Duplication
One very common form of altering video data is duplication. Duplication can be used to remove people, objects and undesired events from a video sequence. It is relatively easy to perform and, when done with care, hard to detect. Over the years, several methods have been developed to detect duplication, but their main problem is computational cost: a video of even modest length can generate thousands of frames [8]. It is therefore important to develop duplication detection methods that are computationally efficient.
Detecting Duplication can be divided into two parts
● Detecting duplicated frames
● Detecting duplicated regions across frames
2.2.1 Detecting duplicated frames
The basic approach is to first divide the full length video sequence into short overlapping subsequences. The temporal and spatial correlations in each subsequence are then computed and compared throughout the video, and similarities in these correlation values are used to detect duplication. Representing the images in vector form, the correlation coefficient between two vectors u and v is given by:

c(u, v) = Σ_i (u_i − ū)(v_i − v̄) / √( Σ_i (u_i − ū)² · Σ_i (v_i − v̄)² )

where u_i and v_i are the i-th elements of u and v, and ū and v̄ are the respective means of u and v [8].
For a video subsequence with n frames, it can be denoted by

S(t) = { f_t, f_{t+1}, …, f_{t+n−1} }

where t denotes the sequence starting time. The temporal correlation matrix can then be defined as the symmetric n × n matrix whose (i, j) element is the correlation coefficient between the i-th and j-th frames of the subsequence. Similarly, the spatial correlation matrix is computed by tiling each frame with m non-overlapping blocks; in this m × m matrix, the (i, j) element gives the correlation coefficient between the i-th and j-th blocks.
In the first stage of the detection process, the temporal correlation matrices of all overlapping subsequences are computed, and then the correlation coefficient between pairs of these matrices is computed. A value above a threshold close to 1 is taken as an indicator of possible duplication. In the next stage, the spatial correlation matrices of the candidate subsequences are computed and compared; again, a value above a specified threshold close to 1 is used as an indicator of duplication.
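The first stage can be sketched as follows (hypothetical NumPy code on a synthetic grayscale "video"; a real detector works on decoded frames with tuned thresholds and adds the spatial-matrix stage):

```python
import numpy as np

rng = np.random.default_rng(3)
frames = [rng.random((16, 16)) for _ in range(30)]
frames[20:25] = [f.copy() for f in frames[5:10]]   # duplicated subsequence

def temporal_corr_matrix(sub):
    # (i, j) element: correlation between the i-th and j-th frames.
    flat = np.array([f.ravel() for f in sub])
    return np.corrcoef(flat)

n = 5  # subsequence length
subs = [temporal_corr_matrix(frames[i:i + n]) for i in range(len(frames) - n + 1)]

# Compare correlation matrices of every pair of subsequences.
hits = []
for a in range(len(subs)):
    for b in range(a + n, len(subs)):        # skip trivially overlapping pairs
        c = np.corrcoef(subs[a].ravel(), subs[b].ravel())[0, 1]
        if c > 0.9999:                        # threshold close to 1
            hits.append((a, b))
print((5, 20) in hits)  # True
```

The duplicated subsequences at starting times 5 and 20 have identical temporal correlation matrices, so their pairwise correlation reaches the threshold.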
2.2.2 Detecting duplicated regions across frames
Two cases of region duplication can be considered.
● Stationary Camera
● Moving Camera
In the case of a stationary camera, the normalized cross power spectrum of two frames is first defined as

P(ω) = F1(ω) F2*(ω) / ||F1(ω) F2*(ω)||

where F1(ω) and F2(ω) are the Fourier transforms of the two frames, * denotes the complex conjugate and ||.|| is the complex magnitude. In this case, a significant peak is expected at the origin (0, 0) of the inverse transform of P, compared to other positions; peaks at other positions can be used as indicators of duplication.
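This test is essentially phase correlation, sketched below (hypothetical NumPy code): for two identical frames the peak of the inverse-transformed normalized cross power spectrum sits at the origin, while a shifted copy moves the peak to the displacement.

```python
import numpy as np

def phase_corr_peak(f1, f2):
    # Normalized cross power spectrum; its inverse transform peaks at the shift.
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    cross = np.conj(F1) * F2
    p = np.fft.ifft2(cross / (np.abs(cross) + 1e-12))
    return tuple(int(v) for v in np.unravel_index(np.argmax(np.abs(p)), p.shape))

rng = np.random.default_rng(4)
frame = rng.random((32, 32))

print(phase_corr_peak(frame, frame))                                # (0, 0)
print(phase_corr_peak(frame, np.roll(frame, (0, 5), axis=(0, 1))))  # (0, 5)
```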
In the case of a moving camera, a rough measure of the camera motion is computed to determine whether the field of view between two frames is sufficiently different to ensure that they do not contain any overlap. Camera motion can be approximated as a global translation [8].
A main advantage of the described methods is computational efficiency. In addition to their low computational cost, these methods can detect duplications in both high and low quality compressed video. One disadvantage of frame duplication detection is that a stationary surveillance camera generally records a static scene, and is therefore likely to generate a large number of near-duplicate frames, which produce values close to the threshold; the method cannot differentiate static frames from duplicated frames. The problem with region duplication detection is that the method is not designed to be robust to geometric transformations [9].
2.3 Extending image forgery detection techniques for videos
Another popular approach to video forgery detection is to treat a video as a sequence of still images, so that image forgery detection techniques can be applied to detect video forgeries.
2.3.1 An approach for JPEG resize and image splicing detection
Forged images can be detected using the correlation of neighboring discrete cosine transform (DCT) coefficients. Using these DCT statistics, both image resizing and image splicing can be detected, and studies have shown the method to be highly effective for detecting JPEG image resizing and splicing. One drawback of this method is that its performance depends on the image complexity and the resize scaling factor.
Studies have shown that adaptively varying the two parameters of the generalized Gaussian distribution (GGD) [4] can achieve a good probability density function (PDF). The PDF used in this method is

p(x) = ( β / (2 α Γ(1/β)) ) exp( −(|x| / α)^β )

where α, the scale parameter, models the width of the PDF peak, and β, the shape parameter, models the shape of the distribution. [13]
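A quick numerical check of the generalized Gaussian density (sketch code; the α and β values are arbitrary examples) confirms it is a valid probability density for different shape parameters:

```python
import math
import numpy as np

def ggd_pdf(x, alpha, beta):
    # Generalized Gaussian density: beta/(2*alpha*Gamma(1/beta)) * exp(-(|x|/alpha)**beta)
    coef = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coef * np.exp(-(np.abs(x) / alpha) ** beta)

def integrate(y, x):
    # Trapezoidal rule, written out to avoid version-specific NumPy helpers.
    return float(((y[1:] + y[:-1]) / 2.0 * np.diff(x)).sum())

x = np.linspace(-50.0, 50.0, 200_001)
for beta in (0.8, 2.0):                    # beta = 2 recovers the Gaussian shape
    area = integrate(ggd_pdf(x, alpha=2.0, beta=beta), x)
    print(round(area, 3))                  # 1.0 in both cases
```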
To study the dependency between the compressed DCT coefficients and their neighbors, the neighboring joint distribution of the DCT coefficients is examined; a multivariate GGD model can achieve a good approximation of the joint probability density function, in which a normalizing constant, a covariance matrix and an expectation vector play the roles of the scalar parameters above. [13]
Regarding the neighboring joint probability density, if the left adjacent DCT coefficient is denoted by the random vector x1 and the right adjacent DCT coefficient by x2, we can define x = (x1, x2). If different parts of a JPEG image are found to have different resize histories, the image may be a tampered one.
Recent studies of JPEG steganalysis show that most information hiding techniques modify the neighboring joint density of the DCT coefficients [7]. Based on those studies, the neighboring DCT coefficient method assumes that when a JPEG image is resized or spliced, the neighboring joint probability density functions of the DCT coefficients are affected, and it uses these neighboring joint density probabilities as detection features.
Other existing methods to detect JPEG image resizing and splicing include:
● Image splicing detection based on bipolar signal perturbation.
● Image splicing detection based on consistency checking against the camera characteristics of the image.
● Image splicing detection based on statistical moment generating functions.
Compared to the above methods, the DCT analysis of neighboring coefficients gives much better accuracy in detecting spliced images. The main disadvantage of this method is that the performance of splicing detection is strongly correlated with the image compression.
2.3.2 JPEG compression analysis and algorithm for forgery detection
Another approach to image forgery detection is format-based forgery detection. Formats are additive tags for every file system, and format-based detection can be used to detect both copy-paste and copy-move image forgery. A crafty individual who wants to perfect an image forgery, and who has unlimited time, can produce an image in which the forgery is very hard to detect; for such images this method might not work. But most image forgeries made with simple tools can be easily detected by the format-based method.
Photo image forgery is classified into two categories:
● Copying one area of the image and pasting it onto another area (copy-move forgery or cloning).
● Copying areas from one or more images and pasting them onto the image being forged (copy-create image forgery).
Block-based image processing is a popular and efficient way to process an image: the image is broken into equal square sub-parts.
A simple algorithm for forgery detection based on JPEG compression analysis is given below. [14]
Figure 7: JPEG Image Compression Algorithm
The steps of the detection algorithm are given below.
Step 1: Divide the image into disjoint 8 x 8 compression blocks (i, j). For each 8 x 8 JPEG compression block (i, j) within bounds, sample the four pixel values straddling the block corner:
A = pixelValue(8i+1, 8j), B = pixelValue(8i, 8j+1), C = pixelValue(8i, 8j), D = pixelValue(8i+1, 8j+1)
Step 2: For each 8 x 8 JPEG compression block (i, j) within bounds, compute from these samples the boundary difference measures Dright(i, j) and Dbottom(i, j) across the block's right and bottom boundaries.
Step 3: For each 8 x 8 JPEG compression block (i, j) within bounds:
    If Dright(i, j) or Dbottom(i, j) exceeds a threshold:
        Set all pixel values in (i, j) to white
    Else:
        Set all pixel values in (i, j) to black
    End
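A toy version of this idea (hypothetical code; the exact difference measures of [14] are not reproduced, a mean absolute jump across each block boundary stands in for Dright) marks blocks whose boundary behavior is inconsistent with the rest of a JPEG-compressed image:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate a decoded JPEG: each 8x8 block is flat, so block boundaries show jumps.
img = np.kron(rng.integers(50, 200, (8, 8)).astype(float), np.ones((8, 8)))

# Paste a smooth (never-JPEG-compressed) patch over blocks (2..3, 2..3).
yy, xx = np.mgrid[0:16, 0:16]
img[16:32, 16:32] = 100.0 + yy + xx

def right_boundary_diff(img, i, j):
    # Mean absolute jump across the right boundary of block (i, j).
    return np.abs(img[8*i:8*i+8, 8*j+7] - img[8*i:8*i+8, 8*j+8]).mean()

suspicious = {(i, j) for i in range(8) for j in range(7)
              if right_boundary_diff(img, i, j) < 1.5}   # boundary too smooth
print((2, 2) in suspicious)  # True: the pasted block lacks JPEG blockiness
```

Blocks inside the pasted region lack the blocking jumps that genuine JPEG content exhibits, which is the kind of inconsistency the thresholding step exposes.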
Here are the results of the JPEG compression based forgery detection.
Figure 8: Forgery Detection Result for different threshold values.
One drawback of this method is that it is applicable only to JPEG images. The method described next can detect image forgeries independently of the image format.
2.3.3 Method based on directional filter using JPEG image analysis
Forgery detection methods based on a JPEG compression threshold work only for the JPEG image format, which is a significant disadvantage, since today's digital cameras support various image formats. The method described here can detect forgeries in any image format, which gives more freedom in detecting image forgeries.
The steps involved in this algorithm are as follows. [14]
Step 1, image preprocessing: If the image is not represented in the HSV color space, convert it to HSV using the appropriate transformations.
Step 2, edge detection: This step focuses attention on areas where tampering may have occurred. A simple method is used to convert the grayscale image into an edge image.
Step 3, localization of the tampered part: Horizontal and vertical projections are calculated, and with the help of horizontal and vertical thresholds, edges in other directions are removed.
Figure 9: Edge detection of the tampered area.
STEP 4: Calculate the horizontal and vertical projection profiles.
STEP 5: Find the boundary pixel values that differ from the projection profile in the X and Y directions.
STEP 6: Calculate the feature map.
STEP 7: Identify the forgery region.
STEP 8: Display the forgery region.
STEP 9: Extract the forgery region.
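Steps 3 through 7 can be sketched as a projection-profile analysis of the edge image. The threshold values below are illustrative assumptions, expressed as fractions of the maximum projection; the paper's actual thresholds are not given in this survey:

```python
import numpy as np

def localize_tampered_region(edge_img, h_thresh=0.5, v_thresh=0.5):
    """Project a binary edge image onto the horizontal and vertical axes
    and keep the rows/columns whose projection exceeds a threshold,
    returning the bounding box of the suspected region."""
    h_proj = edge_img.sum(axis=1).astype(float)  # one value per row
    v_proj = edge_img.sum(axis=0).astype(float)  # one value per column
    rows = np.where(h_proj > h_thresh * h_proj.max())[0]
    cols = np.where(v_proj > v_thresh * v_proj.max())[0]
    if rows.size == 0 or cols.size == 0:
        return None
    # bounding box (top, bottom, left, right) of the suspected region
    return rows.min(), rows.max(), cols.min(), cols.max()
```

The returned bounding box corresponds to the forgery region that Steps 7-9 identify, display and extract.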
2.4 Combing artifact in screen shots
Video forgery detection is not only about detecting video tampering. With the rise of modern social media and numerous content-sharing services, copyrighted video data is violated in many ways. One such scenario is illegal screen shots obtained from copyrighted video content. Modern video playback software provides frame-capturing facilities, which has introduced some adverse effects as well. Nowadays, it is a common sight that people share screen shots taken from various video streams (e.g. movies) without the owner's consent, which is a serious copyright violation. So it has become a main concern in video forgery detection, and methods have been developed to identify whether a still image was obtained from a particular video stream.
In this method, the properties of interlaced video are exploited to detect illegal screen shots. In contrast to progressive-scan mode, in interlaced mode a video frame is represented as the combination of two components: the odd field and the even field. Surveys have shown that many illegal screen shots are taken from television, and TVs use interlaced mode to render frames [5]. The interlacing process adds certain artifacts to video frames; one such instance is called the combing artifact.
Figure 10: Combing artifact in interlaced video [5]
The following section explains how the combing artifact arises in an interlaced video. A video frame can be broken into two parts: the odd field and the even field. The odd field is obtained by selecting only the odd horizontal lines of the full-resolution frame, while the even field is obtained by extracting the even lines.
Figure 11: Odd and Even fields in a video frame [5]
With that in place, video interlacing can be explained. An interlaced video frame is formed by combining an odd field and an even field, but these two fields do not correspond to the same full-resolution frame: if the odd field is taken from the frame at time t (F(x, y, t)), the even field is taken from the frame at time t-1 (F(x, y, t-1)).
Figure 12: Combining odd and even fields to obtain an interlaced video frame [6]
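The field-weaving just described can be sketched in a few lines of NumPy. Which line parity maps to which frame is an assumption here; real broadcast standards differ in field order:

```python
import numpy as np

def interlace(frame_prev, frame_curr):
    """Weave an interlaced frame: odd lines from the frame at time t,
    even lines from the frame at time t-1 (0-indexed rows; the parity
    convention is an assumption of this sketch)."""
    out = np.empty_like(frame_curr)
    out[1::2] = frame_curr[1::2]  # odd field from F(x, y, t)
    out[0::2] = frame_prev[0::2]  # even field from F(x, y, t-1)
    return out
```

When there is motion between the frames at t-1 and t, vertical edges in the woven frame show the comb pattern of Figure 10.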
It is obvious that the two fields have a motion difference, because they are taken from two different video frames, t-1 and t. The motion between the odd field and the even field therefore introduces an artifact [6], called the combing artifact (Figure 10). In this method, the idea is to capture the traces of the combing artifact in a screen shot as evidence of interlaced video capturing. The initial step is to extract a good set of features that emphasizes the areas rich in combing artifact. It is observed that the combing artifact is predominant in areas of the screen shot where there are vertical edges. [6]
Eight features have been identified, and they are divided into two categories: one from the four sub-bands (Low-Low, Low-High, High-Low, High-High) obtained from the Discrete Wavelet Transform, and the other from the differential histograms (vertical and horizontal) of the screen shot. [6]
Figure 13: Feature extraction procedure for the combing artifact [6]
The standard deviation, skewness and kurtosis are calculated in both the LH and HL bands, which provides 3 x 2 = 6 features.
Standard deviation
s = sqrt( (1 / N^2) * Σ_{x,y} ( I(x, y) - μ )^2 ), where I has N x N pixels and μ is the mean of I
Skewness
skew(I) = (1 / N^2) * Σ_{x,y} ( ( I(x, y) - μ ) / s )^3, where s is the standard deviation of I
Kurtosis
kurt(I) = (1 / N^2) * Σ_{x,y} ( ( I(x, y) - μ ) / s )^4, where s is the standard deviation of I
Two other features are extracted from the vertical histogram (Hv) and the horizontal histogram (Hh) of the screen shot. Each histogram is accumulated by looping over every pixel of each block and binning the difference to its neighbouring pixel in the corresponding direction. Based on these histograms, two further features are defined.
Manhattan distance
d1 = Σ_k | Hv(k) - Hh(k) |
Euclidean distance
d2 = sqrt( Σ_k ( Hv(k) - Hh(k) )^2 )
Altogether we now have eight features, which are then fed into an SVM that recognizes the amount of combing artifact. The SVM classifier is trained on features extracted from various screen shots and original images. [6]
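A sketch of the eight-feature extraction is given below. It assumes a one-level Haar transform as a minimal stand-in for the DWT and a 16-bin histogram of absolute neighbour differences for the differential histograms; the bin count and binning details are assumptions, since the survey does not reproduce them:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar transform, returning LL, LH, HL, HH sub-bands."""
    a = img[0::2, :] + img[1::2, :]   # vertical low-pass
    d = img[0::2, :] - img[1::2, :]   # vertical high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / 4.0
    hl = (a[:, 0::2] - a[:, 1::2]) / 4.0
    lh = (d[:, 0::2] + d[:, 1::2]) / 4.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 4.0
    return ll, lh, hl, hh

def moments(band):
    """Standard deviation, skewness and kurtosis of one sub-band."""
    x = band.ravel().astype(float)
    s = x.std()
    z = (x - x.mean()) / (s if s > 0 else 1.0)
    return s, (z ** 3).mean(), (z ** 4).mean()

def combing_features(gray):
    """Assemble the 8-feature vector: three moments from each of the LH
    and HL bands, plus Manhattan and Euclidean distances between the
    vertical and horizontal differential histograms."""
    _, lh, hl, _ = haar_dwt2(gray)
    feats = list(moments(lh)) + list(moments(hl))
    dv = np.abs(np.diff(gray.astype(float), axis=0))  # vertical differences
    dh = np.abs(np.diff(gray.astype(float), axis=1))  # horizontal differences
    hv, _ = np.histogram(dv, bins=16, range=(0, 256), density=True)
    hh_, _ = np.histogram(dh, bins=16, range=(0, 256), density=True)
    feats.append(np.abs(hv - hh_).sum())            # Manhattan distance d1
    feats.append(np.sqrt(((hv - hh_) ** 2).sum()))  # Euclidean distance d2
    return np.array(feats)
```

The resulting vectors would then train a classifier such as scikit-learn's sklearn.svm.SVC, as in [6].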
The robustness and reliability of this method come from its use of a support vector machine for screen shot detection. Unlike purely statistical methods, the classifier can be retrained as more labelled examples become available, so its average accuracy improves as the training set grows. According to [6], the test achieved an accuracy of 97.77%, so this method has proven successful at identifying whether a screen shot was taken from a video stream.
The main drawback of this method is that it relies entirely on the combing artifact of a screen shot. There can be situations where the relative motion between the odd and even fields is not significant, and consequently the combing artifact is weak; in such scenarios the SVM would not produce satisfactory results. This implies that a more robust set of features is needed to handle these extreme cases.
This method has been used to detect screen shots in many popular image formats such as JPEG, BMP and TIFF, and video formats such as MPEG-2, MPEG-4 and H.264. Possible improvements involve making the system compatible with other image and video formats as well.
2.5 Multimodal feature fusion
This method, based on combining local and global image features, offers an opportunity to discriminate a genuine image from a tampered or forged one. In video footage depicting human communication and interaction tasks such as speaking, acting or expressing emotions, different regions of the face such as the lips, eyes and eyebrows undergo different levels of motion, and by exploiting the spatio-temporal dynamics of these regions in face images, it is possible to discriminate a genuine video from a tampered or forged one.
The region of interest (ROI) segmentation for detecting faces and regions within faces (lips,
eyes, eyebrows, nose) is done in the first frame of the video sequence. The tracking of the face
and lip region in subsequent frames is done by projecting the markers from the first frame. This
is followed by measurements on the lip region boundaries based on pseudo-hue edge detection
and tracking. The advantage of this segmentation technique, which exploits alternative color spaces, is that it is simpler and more powerful than other methods used to segment the regions of interest.
The first stage is to classify each pixel in the given image as a skin or non-skin pixel. The second stage is to identify the different skin regions in the image through connectivity analysis. The last stage is to determine, for each of the identified skin regions, whether it represents a face. This is done using two parameters: the aspect ratio (height to width) of the skin-colored blob, and template matching with an average face image at different scales and orientations.
With a Gaussian skin-color model based on the red-blue chromatic space, a "skin likelihood" is computed for every pixel of the image.
Figure 14: Face detection by skin color in red-blue
chromatic space [18]
The skin-probability image obtained is thresholded to produce a binary image. The morphological segmentation involves the image-processing steps of erosion, dilation and connected-component analysis to isolate the skin-colored blob. By applying an aspect-ratio rule and template matching with an average face, it is ascertained whether the skin-colored blob is indeed a face. The figure above shows an original image from a facial video database and the corresponding skin-likelihood and skin-segmented images, obtained by the statistical skin-color analysis and morphological segmentation. Once the ROI is segmented and the face region is localized, the local features from different sections of the face (the lip region, for example) are detected using another color space.
Figure 15 shows ROI region extraction (lip region) based on hue-saturation thresholding.
Figure 15: Lip region localization using hue-saturation
thresholding [18]
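The Gaussian skin-likelihood computation can be sketched as follows. The mean and covariance of the skin model must be estimated from training skin pixels; the actual values used in [18] are not given in this survey:

```python
import numpy as np

def skin_likelihood(rgb, mean, cov_inv):
    """Gaussian skin-likelihood in red-blue (chromatic) space.

    `mean` and `cov_inv` are the mean and inverse covariance of the
    (r, b) chromaticities of training skin pixels (assumed known here).
    Returns an unnormalised likelihood in [0, 1] per pixel.
    """
    rgb = rgb.astype(float)
    total = rgb.sum(axis=2) + 1e-9   # avoid division by zero
    r = rgb[..., 0] / total          # chromatic red
    b = rgb[..., 2] / total          # chromatic blue
    d = np.stack([r - mean[0], b - mean[1]], axis=-1)
    # squared Mahalanobis distance -> Gaussian likelihood
    m = np.einsum('...i,ij,...j->...', d, cov_inv, d)
    return np.exp(-0.5 * m)
```

Thresholding the returned likelihood map gives the binary image to which erosion, dilation and connected-component analysis are then applied.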
An illustration of the geometry of the extracted features and measured key points is shown in the figure below. The extracted local features from the lip region are used as the visual feature vector. This method is applied to the video sequences showing a speaking face for all subjects in the database.
Figure 16: Lip-ROI key points for different lip openings of
a speaking face [18]
For extracting global features, Principal Component Analysis (PCA), or eigen-analysis, is performed on the face and on the segmented facial regions corresponding to the lips, eyes, eyebrows, forehead and nose. The feature fusion involves concatenating the local and global features corresponding to each facial region.
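The eigen-projection and concatenation steps can be sketched with plain NumPy, using SVD for the principal components; this is a minimal sketch, assuming each facial region has already been segmented and flattened to a vector:

```python
import numpy as np

def pca_project(region_vectors, k=10):
    """Project flattened facial-region images onto their first k
    principal components (the global 'eigen projections'), via SVD
    of the mean-centred data matrix."""
    X = region_vectors - region_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:k].T

def fuse(local_feats, global_feats):
    """Feature fusion as described: concatenate the local geometric
    features of a region with its global eigen projections."""
    return np.concatenate([local_feats, global_feats], axis=-1)
```

One fused vector per facial region would then feed the two-class (genuine vs. tampered) classifier described below.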
To evaluate the performance of the proposed local feature extraction and feature fusion technique, experiments were conducted on a facial video sequence corpus. The data was recorded with a broadcast-quality digital video camera in a noisy office environment. The fusion of local and global features is evaluated in three experimental scenarios. In the first experiment, the local feature vector for each facial part (lips, eyes, eyebrows, forehead and nose) is used. In the second experiment, global features, in terms of 10 eigen projections of each facial part based on principal component analysis, are used. In the third experiment, the local features from each part of the face are concatenated with the global features.
Discriminating a forged/tampered image from a genuine one is done in a Bayesian framework, approached as a two-class classification task based on hypothesis testing. It should be noted that decision scores are computed for each facial part separately and combined using a late-fusion approach with equal weights. Instead of making the decision from one image, the decision about a genuine versus tampered/forged image sequence is made using at least 100 correlated image frames.
The proposed fusion of local features extracted from alternative color spaces, such as the red-blue chrominance color space and the pseudo-hue color space, with global features obtained through principal component analysis shows a significant improvement in tamper/forgery detection performance compared to single global or local features. The technique demonstrates a simple and powerful method of verifying the authenticity of images. Further investigations include extending the proposed modeling techniques to blindly extract and estimate the internal camera-processing characteristics, targeting complex video surveillance, secure access control and forensic investigation scenarios.
2.6 Detecting Video Forgery by Ghost Shadow Artifact
2.6.1 Video Inpainting
Video inpainting is a field of computer vision that deals with removing objects, or restoring missing or infected regions in a video sequence, using temporal and spatial information gained from neighboring scenes. The objective of video inpainting is to generate an inpainted area that merges into the video so that consistency is maintained. Therefore, when the video is played as a sequence, a human eye will not be able to detect any distortion in the affected areas [12].
Video inpainting techniques can be classified into two categories [11].
● Patch based techniques
● Object based techniques
Patch based methods extend the image inpainting methods to video inpainting. Some of these
methods are,
● Navier-Stokes video inpainting algorithm
● Video Completion using global optimization
● Video Completion using tracking and fragment merging
Object-based methods usually produce higher-quality results than the patch-based methods [11].
Some of these methods are,
● Video repairing under variable illumination using cycle motion
● Rank minimization approach
● Human object inpainting using manifold learning-based posture sequence estimation
2.6.2 Ghost Shadow Artifact
The problem associated with video inpainting is that temporal discontinuities in the inpainted area produce flickers. The ghost shadow artifact is the visual annoyance caused by these flickers.
Even though efforts are made to remove ghost shadows, complicated video and camera motions degrade video inpainting performance, which leads to the ghost shadow artifact [10].
2.6.3 Detecting Video Forgery
To detect video forgeries, each frame is first segmented into static background and moving foreground by block matching: a rough motion-confidence mask is computed for a frame by comparing it with the following frame using block matching. The camera shift can be estimated as the median shift of all the blocks in the image, and blocks that still have a significant shift after subtracting the camera motion are identified as moving foreground. Then all frames are aligned with the camera motion and the foreground mosaic is built; the foreground mosaic is the panoramic image obtained by stitching a number of frames together. Next, the binary accumulative difference image (ADI) is formed by taking a reference frame and comparing the other frames against it. The speed and direction of a moving object can be obtained from the accumulative difference image. However, because of distortions caused by MPEG compression, the binary ADI of an inpainted video sequence can contain some isolated points, which can easily be removed with mathematical morphological operations. Finally, inconsistencies in the foreground mosaic and in the track are used as indications of a forged video [10].
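The ADI construction and the morphological clean-up can be sketched as follows; the difference threshold and the 3 x 3 cross structuring element are illustrative assumptions, not values from [10]:

```python
import numpy as np

def accumulative_difference_image(frames, ref_index=0, diff_thresh=25):
    """Binary accumulative difference image (ADI): every frame is compared
    against one reference frame, and each pixel counts how often it
    differed significantly; any difference marks the pixel in the ADI."""
    ref = frames[ref_index].astype(float)
    count = np.zeros(ref.shape, dtype=int)
    for k, f in enumerate(frames):
        if k == ref_index:
            continue
        count += (np.abs(f.astype(float) - ref) > diff_thresh)
    return count > 0

def remove_isolated_points(adi):
    """Morphological opening with a 3x3 cross to drop the isolated points
    that MPEG compression noise leaves in the ADI."""
    padded = np.pad(adi, 1)
    # erosion: keep a pixel only if it and all 4-neighbours are set
    er = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
          & padded[1:-1, :-2] & padded[1:-1, 2:])
    padded = np.pad(er, 1)
    # dilation: grow the eroded regions back
    return (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
            | padded[1:-1, :-2] | padded[1:-1, 2:])
```

After clean-up, the surviving regions trace the moving object; inconsistencies between this track and the foreground mosaic indicate inpainting.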
The advantage of this method is that it can detect video inpainting even under MPEG compression and recompression. The limitation of the approach is that it can currently only detect inpainting in video sequences with static backgrounds; modifications are needed to apply it to video sequences with complicated motion and different camera angles [10].
3. Conclusion
Throughout this literature survey, a number of video/image forgery detection mechanisms and techniques have been discussed from different perspectives. Video tampering is done using different methods, so it is obvious that there should be different methods to detect these different types of video forgery. Modern video forgery detection mechanisms are at a much more advanced level than what we had several years ago, which shows how rapidly this area of study is evolving.
No single detection method works best for every situation, so which video forgery detection method is appropriate for a given situation depends on a number of factors.
● Techniques used for video forgery
● Available technology
● Computational restrictions
● Video/Image quality
● Video/Image formats
So it is essential to understand the requirements and the environmental parameters described above when performing video forgery detection.
In contrast to the early days of video forgery detection, researchers are now more interested in intelligent detection mechanisms, mainly because of the appeal of intelligent algorithms that are capable of learning. Most recent research is oriented around these concepts, and it has shown promising results, which encourages the use of more and more learning algorithms in video forgery detection.
With that being said, one might think that classical image processing techniques are no longer relevant to modern approaches. But this is not true, for a very good reason.
Even though detection methods have developed rapidly, the basic structure of video and images has stayed almost the same. For example, televisions have been broadcasting interlaced video for a very long time. So even though detection mechanisms change rapidly, the underlying theoretical concepts persist for many years; as a result, the validity of fundamental image processing techniques is preserved.
With video recorders and digital cameras easily accessible to the public, video forgery detection has become one of the most challenging topics. Various forms of optical data are used in almost every application nowadays, from simple photography and videography to advanced applications like face recognition, security access clearance and many other security-focused fields. The use of video data in such critical fields has added even more value to the field of video forgery detection.
4. References
[1] Wen Chen, Yun Q. Shi, "Detection of Double MPEG Compression Based on First Digit Statistics", in Digital Watermarking, Springer Berlin Heidelberg, 2008, pp. 16-30
[2] Weihong Wang, Hany Farid, “Exposing digital forgeries in video by detecting double MPEG
compression”, MM&Sec, 2006
[3] Xinghao Jiang, Wan Wang, Tanfeng Sun, Yun Q. Shi, Fellow, IEEE, and Shilin Wang,
“Detection of Double Compression in MPEG-4 Videos Based on Markov Statistic”, IEEE Signal
Processing Letters, Vol 20, No 5, May 2013
[4] Sharifi K and Garcia AL (1995). Estimation of shape parameter for generalized Gaussian
distributions in subband decompositions of video. IEEE Trans. Circuits Syst. Video Technol. 5:
52–56.
[5] “Interlacing- Luke’s Video Guide” [Online]. Available:
http://www.neuron2.net/LVG//interlacing.html [Accessed: 15-Aug-2013]
[6] Ji-Won Lee, Min-Jeong Lee, Tae-Woo Oh, Seung-Jin Ryu, Heung-Kyu Lee, “Screenshot
identification using combining artifact from interlaced video”, MM&Sec, 2010
[7] Liu Q, Sung AH, and Qiao M (2009). Improved detection and evaluation for JPEG steganalysis. ACM-MM 2009, Beijing, China, October 19-24, 2009.
[8] W. Wang and H. Farid, “Exposing digital forgeries in video by detecting duplication,” in
Proceedings of the 9th workshop on Multimedia & security, 2007
[9] W. Wang, “Digital video forensics,” 2009
[10] Jing Zhang, Yuting Su, Mingyu Zhang, "Exposing Digital Video Forgery by Ghost Shadow Artifact", MiFor '09: Proceedings of the First ACM Workshop on Multimedia in Forensics
[11] Anu Rachel Abraham, A. Kethsy Prabhavathy, J. Devi Shree, PhD, “A Survey on Video
Inpainting”, International Journal of Computer Applications (0975 – 8887) Volume 55– No.9,
October 2012
[12] Sean Moran, “Video Inpainting”, April 2012
[13] Qingzhong Liu, Andrew H. Sung , “A New Approach for JPEG Resize and Image Splicing
Detection”, MiFor, 2009
[14] S. Murali, Govindraj B. Chittapur, H. S. Prabhakar, "Format Based Photo Forgery Image Detection", CCSEIT '12: Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology, 2012
[15] Tanfeng Sun, Wan Wang, Xinghao Jiang, “Exposing Video Forgeries by Detecting MPEG
Double Compression” Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE
International Conference,2012
[16] Ho Hee-Meng, “Digital Video Forensics Detecting MPEG-2 Video Tampering through
Motion Errors” , Technical Report RHUL–MA–2013–5, 01 May 2013
[17] Michihiro Kobayashi, Takahiro Okabe, Yoichi Sato, “Detecting Video Forgeries Based on
Noise Characteristics”, PSIVT, 2006
[18] Girija Chetty, Matthew Lipton, “Multimodal feature fusion for video forgery detection”,
Information Fusion, 2010