phd-mark4

Estimating Video Authenticity via
the Analysis of Visual Quality and
Video Structure
(Japanese title: A Study on Estimating the Reliability of Video Based on the Analysis of Image Quality and Video Structure)
2015/07/31
Laboratory of Media Dynamics
Graduate School of Information Science and Technology
Michael Penkov

Introduction :: Background
Increase of video sharing: cheaper phones/cameras, faster networks, free sharing services
• Many independent uploaders
• Upload rate: 300 h/min [*]
• No screening of content
• Much video is duplicated
[Figure: Parent video → (editing, upload) → Edited video (1st gen.) → (editing, reupload) → Edited video (2nd gen.) → …; authenticity of video falls from (High) to (Low) along the chain]
Need to distinguish between the parent and edited videos
How similar is the edited video to the parent video?
Authenticity of video of an important event (e.g. news): most objective, most reliable, closest to the truth, least edited
Applications: search result summarization, content tracking, content aggregation
[*] http://www.youtube.com/t/press_statistics/ (accessed 2015/06/29)

[1] P. Marziliano et al, “Perceptual blur and ringing metrics: Application to
JPEG2000,” in Elsevier Signal Processing: Image Communication, vol. 19, pp.
163–172, 2004.
[2] B. Coskun, B. Sankur, and N. Memon, “Spatio-Temporal Transform Based
Video Hashing,” IEEE Transactions on Multimedia, vol. 8, no. 6, pp. 1190–1208,
Dec. 2006.
[3] Z. Dias, A. Rocha, and S. Goldenstein, “Video Phylogeny: Recovering near-
duplicate video relationships,” in 2011 IEEE International Workshop on Information
Forensics and Security. IEEE, Nov. 2011, pp. 1–6.
[4] F. Battisti, M. Carli, and A. Neri, “Image forgery detection by means of no-
reference quality metrics,” in SPIE Vol. 8303, 2012.
[5] S. Lameri, P. Bestagini, A. Melloni, S. Milani, A. Rocha, M. Tagliasacchi, and S.
Tubaro, “Who is my parent? Reconstructing video sequences from partially
matching shots,” in IEEE International Conference on Image Processing (ICIP),
2014.
Introduction :: Related Research
Existing methods can…
• Estimate visual quality [1]
• Quantify video similarity [2]
• Estimate hierarchical relationships between videos [3]
• Detect copy-paste forgeries through inconsistencies in visual quality [4]
• Estimate parent video from edited videos [5]
[Figure: related work by field (Visual Quality Assessment, Video Similarity, Digital Forensics) on a timeline from 2010 to 2015: No-reference VQA [1], Robust video hash [2], Video Phylogeny [3], Forgery Detection [4], Parent Video Estimation [5], Our Research]
Our Research
Estimate authenticity of edited videos
Visual quality ∝ Authenticity: low visual quality → low authenticity
Video structure → estimate deleted shots: many deleted shots → low authenticity

Introduction :: Research Map
[Figure: research map of four fields (Digital Forensics, Shot Segmentation, Video Similarity, Visual Quality Assessment), locating related work [4], [5], [6] and Our Research]
Contribution: bridging Visual Quality Assessment and Digital Forensics
[4] F. Battisti, M. Carli, and A. Neri, “Image forgery detection by means of no-reference quality metrics,” in SPIE Vol. 8303, 2012.
[5] S. Lameri, P. Bestagini, A. Melloni, S. Milani, A. Rocha, M. Tagliasacchi, and S. Tubaro, “Who is my parent? Reconstructing video sequences from partially matching shots,” in IEEE International Conference on Image Processing (ICIP), 2014.
[6] M. Penkov, T. Ogawa, and M. Haseyama, “Fidelity estimation of online video based on video quality measurement and Web information,” 26th Signal Processing Symposium, vol. A3-5, pp. 70–74, 2011.

Introduction :: Our Contribution
Authenticity degree: the proportion of information retained by an edited video.
A relative scale for ranking edited videos.
Bridge digital forensics and visual quality assessment.
[Figure: Authenticity Degree scale from 1 to 0, ranking the parent video (not available) and three edited videos by the information they retain. Higher visual quality and few deleted shots → higher authenticity degree; lower visual quality and many deleted shots → lower authenticity degree.]
Information: the message contained by the video; the reason why people watch the video.

Introduction :: Thesis Contents
• Chapter 1: Introduction
• Chapter 2: Visual Quality Assessment
• Our contribution: Visual quality ∝ Authenticity
• Reviews conventional algorithms
• Enables the proposed method to quantify information loss
• Chapter 3: Shot Identification
• Utilizes the structure of videos to
1. Enable the reconstruction of the parent video when it is not available
2. Enable detecting deleted shots
3. Enable applications of conventional visual quality assessment algorithms from Chapter 2
• Chapter 4: The Video Authenticity Degree
• The proposed method for estimating video authenticity
• Chapter 5: Conclusion

Chapter 2 :: Visual Quality Assessment
An Overview
[Figure: a reference image is compressed into a target image, whose quality is assessed by Subjective Evaluation [8] (human subjects), Full-reference Algorithms [9], Reduced-reference Algorithms [7] (extracted features) and No-reference Algorithms [1]]
[1] P. Marziliano et al, “Perceptual blur and ringing metrics: Application to JPEG2000,” in Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
[7] Z. Wang and A. C. Bovik, “Modern Image Quality Assessment,” Synthesis Lectures on Image, Video, and Multimedia Processing, vol. 2, no. 1, pp. 1–156, Jan. 2006.
[8] ITU-T Recommendation BT.500: “Methodology for the subjective assessment of the quality of television pictures”
[9] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13,
no. 4, pp. 600–612, Apr. 2004.

Chapter 2 :: Visual Quality Assessment
Known Problems and Limitations of No-reference Algorithms
Example:
[Figure: blurring strength [1] of each shot of 𝑉0 (parent video) and 𝑉1 (edited video, one shot deleted), and the mean blurring strength for each entire video. Low → good quality; high → poor quality. Judged by the mean blurring strength, the edited video has better quality (!) and the parent video has worse quality (!).]
Problem: algorithms do not consider deleted shots.
Problem: algorithm output is relative to the visual content.
Our solution: shot identifiers (Chapter 3) and a shot-based penalty model (Chapter 4).
• Enable detection of deleted shots
• Group visually similar shots together
• Normalize algorithm outputs
[1] P. Marziliano et al., “Perceptual blur and ringing metrics: Application to JPEG2000,” in Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
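The first problem on this slide can be illustrated with a few lines of Python. This is only a sketch: the per-shot blurring strengths are hypothetical numbers chosen for illustration, not values from the thesis, and a plain mean stands in for "mean blurring strength for entire video".

```python
from statistics import mean

# Hypothetical per-shot blurring strengths (lower = better quality).
parent_blur = [1.0, 1.5, 0.5, 2.0]

# Hypothetical edited video: the first three shots are recompressed
# (each gets slightly blurrier) and the blurriest shot is deleted.
edited_blur = [1.1, 1.6, 0.6]

parent_mean = mean(parent_blur)
edited_mean = mean(edited_blur)

# The edited video has lost a whole shot and every remaining shot is
# worse, yet its mean blurring strength is lower (i.e. "better").
print(f"parent mean: {parent_mean:.2f}, edited mean: {edited_mean:.2f}")
```

Because the deleted shot simply disappears from the average, a no-reference score alone can rank the edited video above its parent, which is why deleted shots need to be detected separately.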


Chapter 3 :: Shot Identification :: Summary
Aim
1. Enable the reconstruction of the parent video
3. Enable applications of algorithms from Chapter 2
Method
Represent each unique shot as a unique integer.
[Figure: the shots of videos 𝑉1, 𝑉2 and 𝑉3, each labeled with one of the shot identifiers ID = 1, ID = 2, ID = 3, ID = 4]

Chapter 3 :: Shot Identification :: Details
Shot segmentation [10] → Visual similarity calculation [2] → Connected components computation [11] → Shot ID assignment
[Figure: videos 𝑉1, 𝑉2 and 𝑉3 are segmented into shots, and the shots are grouped by visual similarity]
[2] B. Coskun, B. Sankur, and N. Memon, “Spatio-Temporal Transform Based Video Hashing,” IEEE Transactions on Multimedia, vol. 8, no. 6, pp. 1190–1208, Dec. 2006.
[10] A. Nagasaka and Y. Tanaka, “Automatic Video Indexing and Full-Video Search for Object Appearances”, North Holland Publishing Co., 1992
[11] Hopcroft, J.; Tarjan, R. (1973). "Efficient algorithms for graph manipulation". Communications of the ACM 16 (6): 372–378.
Visually similar shots → equal shot identifiers.
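The last two steps of the pipeline (connected components, then shot ID assignment) might look like the following sketch. It is not the thesis implementation: `assign_shot_ids` is a hypothetical helper, the similarity test is a stand-in for the robust video hash comparison [2], and a union-find structure is used here in place of the graph algorithm cited as [11].

```python
from itertools import combinations


def assign_shot_ids(shots, is_similar):
    """Give visually similar shots the same integer ID via connected components."""
    parent = list(range(len(shots)))  # union-find forest over shot indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an edge between every pair of shots judged visually similar.
    for a, b in combinations(range(len(shots)), 2):
        if is_similar(shots[a], shots[b]):
            parent[find(a)] = find(b)

    # Number the components 1, 2, ... in order of first appearance.
    ids, labels = {}, []
    for i in range(len(shots)):
        root = find(i)
        ids.setdefault(root, len(ids) + 1)
        labels.append(ids[root])
    return labels


# Hypothetical stand-in for a video hash: one number per shot.
hashes = [0.10, 0.52, 0.11, 0.50, 0.90, 0.12]
shot_ids = assign_shot_ids(hashes, lambda p, q: abs(p - q) < 0.05)
print(shot_ids)  # [1, 2, 1, 2, 3, 1]
```

Using connected components means similarity is treated as transitive: two shots share an ID if a chain of pairwise-similar shots links them, even when their direct similarity falls below the threshold.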


Chapter 4 :: The Video Authenticity Degree
Our Strategy
Problem: the parent video 𝑉0 is usually unavailable.
Solution: estimate the parent video 𝑉0 from the available edited videos.
Editing turns 𝑉0 (parent video) into 𝑉𝑗 (edited video) and causes information loss:
1. Shot removal (full loss)
2. Recompression (partial loss)
Proposed method: Detect information loss → Calculate penalties → Aggregate penalties → Authenticity degree of edited video 𝑉𝑗
Information: the message contained by the video; the reason why people watch the video.

An Example
Steps: Calculate shot identifiers → Estimate parent video → Detect removed shots → Calculate penalties → Aggregate penalties → Authenticity degree for each edited video
Input: a set of edited videos
𝑉0 (estimate of parent), shot IDs: 1 2 3 4
Shot IDs of the edited videos (X = removed shot):
𝑉1: 1 2 3 X
𝑉2: X 2 3 4
𝑉3: 1 X X 4
𝑉4: X 2 3 4 5 6 (shots 5 and 6 are not in the parent: no penalties)
[Figure: the penalty for each shot (1.0 for each removed shot, 0.1 to 0.3 for each remaining shot) and the resulting authenticity degrees (0.65, 0.60, 0.40, 0.63)]


Chapter 4 :: Estimating the Parent Video 𝑉0
Problem: how can we estimate 𝑉0 from the available edited videos?
Solution: examine the frequently-occurring shot identifiers.
Edited videos (shot IDs):
𝑉1: 1 2 3
𝑉2: 2 3 4
𝑉3: 1 4
𝑉4: 2 3 4 5 6
[Figure: shot ID histogram (horizontal axis: shot ID, 1 to 6; vertical axis: frequency, 0 to 4) with a threshold line]
Estimated result, 𝑉0 (estimate of parent video): 1 2 3 4
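The histogram-and-threshold idea on this slide can be sketched as follows. The function name `estimate_parent` and the threshold value `min_count=2` are assumptions made for illustration (the slide does not state the threshold), and returning the shot IDs in sorted order is a simplification that ignores shot order.

```python
from collections import Counter


def estimate_parent(edited_videos, min_count):
    """Keep the shot IDs that occur in at least `min_count` edited videos."""
    # Count each shot ID once per video, even if a video repeats a shot.
    frequency = Counter(
        shot_id for ids in edited_videos.values() for shot_id in set(ids)
    )
    return sorted(sid for sid, count in frequency.items() if count >= min_count)


# Shot IDs of the edited videos from the slide.
edited = {
    "V1": [1, 2, 3],
    "V2": [2, 3, 4],
    "V3": [1, 4],
    "V4": [2, 3, 4, 5, 6],
}

# Assumed threshold: a shot must appear in at least two edited videos.
estimated_parent = estimate_parent(edited, min_count=2)
print(estimated_parent)  # [1, 2, 3, 4]
```

With this threshold, shots 5 and 6 (each present only in 𝑉4) are treated as additions rather than as part of the parent, which matches the estimated result shown on the slide.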


Chapter 4 :: Penalizing Information Loss (1)
Solving the Problem of Relativity to Visual Content
Problem: algorithm output is relative to the visual content.
Solution: normalize visual quality for each unique shot ID individually.
Information loss is proportional to visual quality loss.
Utilize visual quality algorithms to estimate information loss.
Normalization, where 𝑥 is the direct output and 𝑧 is the normalized output:
$z = \frac{x - \mu}{\sigma}$
Example:
Shot ID | Video | Direct [1] | Normalized | Penalty
1 (𝜇 = 1.5, 𝜎 = 0.4) | 𝑉2 | 1.5 | 0.0 | 0.2
1 | 𝑉3 | 1.0 | -1.2 | 0.0
1 | 𝑉4 | 2.0 | 1.2 | 0.5
2 (𝜇 = 4.5, 𝜎 = 0.5) | 𝑉1 | 4.0 | -1.0 | 0.0
2 | 𝑉2 | 5.0 | 1.0 | 0.5
[Figure: penalty calculation function, mapping the normalized output 𝑧 to a penalty]
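The per-shot normalization can be reproduced with a short Python sketch. The helper name `normalize_by_shot` is hypothetical, and the pairing of scores with videos follows the order of the labels on the slide, which is an assumption. The population standard deviation is used because it reproduces the slide's 𝜎 values (0.4 and 0.5). The penalty calculation function itself is only shown as a figure, so it is not implemented here.

```python
from statistics import mean, pstdev


def normalize_by_shot(scores):
    """Z-score each direct output against the other instances of the same shot ID."""
    normalized = {}
    for shot_id, by_video in scores.items():
        mu = mean(by_video.values())
        sigma = pstdev(by_video.values())  # population standard deviation
        normalized[shot_id] = {
            # Guard against identical scores, where sigma would be zero.
            video: (x - mu) / sigma if sigma > 0 else 0.0
            for video, x in by_video.items()
        }
    return normalized


# Direct algorithm outputs from the slide, grouped by shot ID.
direct = {
    1: {"V2": 1.5, "V3": 1.0, "V4": 2.0},
    2: {"V1": 4.0, "V2": 5.0},
}

z = normalize_by_shot(direct)
for shot_id, by_video in z.items():
    print(shot_id, {video: round(value, 1) for video, value in by_video.items()})
```

After normalization, the two shots are on a common scale: a direct output of 2.0 for shot 1 and 5.0 for shot 2 both indicate an instance that is about one standard deviation worse than the typical copy of that shot.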

Chapter 4 :: Penalizing Information Loss (2)
A Model for Penalizing Shot Removal
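Problem: algorithms do not consider deleted shots.
Solution: model shot removal as complete information loss.
Information loss is proportional to visual quality loss.
Utilize visual quality algorithms to estimate information loss.
Example (X = deleted shot, which receives the maximum penalty of 1.0):
𝑉0 (parent), shot IDs: 1 2 3 4
𝑉2: X 2 3 4 (penalty 1.0 for shot 1)
𝑉3: 1 X X 4 (penalty 1.0 for each of shots 2 and 3)
[Figure: penalty calculation function]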
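A minimal sketch of the shot removal model, assuming shot IDs are already available for the estimated parent and for each edited video. The function name `removal_penalties` is hypothetical; only the rule that a deleted shot receives the maximum penalty of 1.0 comes from the slide.

```python
def removal_penalties(parent_ids, edited_ids, max_penalty=1.0):
    """Return {shot_id: penalty} for parent shots missing from the edited video."""
    present = set(edited_ids)
    # Shots that exist only in the edited video are ignored: they are
    # not in the parent, so they carry no penalty.
    return {sid: max_penalty for sid in parent_ids if sid not in present}


parent_shots = [1, 2, 3, 4]
print(removal_penalties(parent_shots, [2, 3, 4]))        # {1: 1.0}
print(removal_penalties(parent_shots, [1, 4]))           # {2: 1.0, 3: 1.0}
print(removal_penalties(parent_shots, [2, 3, 4, 5, 6]))  # {1: 1.0}
```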

Chapter 4 :: Experiments :: Summary
Exp. 1
Purpose: Demonstrate that the method correctly detects editing operations
Videos: 10
Data type: Artificial
Editing operations: Scaling, Recompression, Remove whole shots, Remove parts of shots, Reverse shot order, Add logo
Exp. 2
Purpose: Demonstrate that the method can correctly estimate the parent video for a large variety of videos
Videos: 272
Editing operations: Recompression, Remove whole shots
Exp. 3
Purpose: Demonstrate the effectiveness of the proposed method in a real-life situation
Videos: 175
Data type: Real
Editing operations: Unknown

Chapter 4 :: Experiments :: Evaluation Method
Given 𝑛 videos, the output of a proposed/comparative method:
$x = [x_1, x_2, \ldots, x_n]$
Ground truth:
$y = [y_1, y_2, \ldots, y_n]$
Sample correlation coefficient:
$r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Rank-order correlation coefficient, where 𝑋 and 𝑌 are the ranks of 𝑥 and 𝑦:
$\rho(x, y) = r(X, Y)$
Example: 𝑥 = [2.4, 3.9, 4.1, 3.5, 4.0, 5.9, 6.3] has ranks 𝑋 = [1, 3, 5, 2, 4, 6, 7]
High correlation coefficient corresponds to a good result.
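Both evaluation criteria can be computed with the standard library alone. The sketch below uses the slide's example vector for 𝑥; the ground truth `y` is a hypothetical ranking added for illustration, and the `ranks` helper assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Sample correlation coefficient r(x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def ranks(values):
    """Rank 1 = smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Rank-order correlation coefficient: r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


x = [2.4, 3.9, 4.1, 3.5, 4.0, 5.9, 6.3]  # method output (from the slide)
y = [1, 2, 3, 4, 5, 6, 7]                # hypothetical ground truth

print(f"r = {pearson(x, y):.2f}, rho = {spearman(x, y):.2f}")
```

The rank-order coefficient depends only on the ordering of the outputs, so it is the more natural criterion when a method is used to rank edited videos rather than to predict an exact score.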


Chapter 4 :: Experiment 1 :: Overview
Data: 1 dataset, 10 videos
Video | Comments | Ground truth
𝑉0 | Parent video (consists of 4 shots) | 1
𝑉1 | Reuploaded 𝑉0 to YouTube | 2
𝑉2 | Removed 10 frames from each shot of 𝑉1 | 3
𝑉3 | Reversed order of shots of 𝑉1 | 4
𝑉4 | Added a shot to 𝑉1 | 5
𝑉5 | Added a logo to 𝑉1 | 6
𝑉6 | Downsampled 𝑉1 to 720p | 7
𝑉7 | Removed one shot from 𝑉0 | 8
𝑉8 | Removed two shots from 𝑉0 | 9
𝑉9 | Removed 60 frames from each shot of 𝑉0 | 10
Evaluation criteria: sample correlation coefficient between the ranks of the output of the proposed method and the ground truth

Chapter 4 :: Experiment 1 :: Results
Output of the proposed method for each value of 𝛾, with its rank in parentheses:
Video | Comments | ER | 𝛾 = 0.13 | 𝛾 = 0.25 | 𝛾 = 0.50 | 𝛾 = 0.75
𝑉0 | Parent video | 1 | 0.99 (1) | 0.97 (1) | 0.94 (1) | 0.92 (1)
𝑉1 | Reuploaded 𝑉0 to YouTube | 2 | 0.97 (3) | 0.93 (3) | 0.87 (3) | 0.80 (3)
𝑉2 | Removed 10 frames from each shot of 𝑉1 | 3 | 0.96 (4) | 0.92 (4) | 0.84 (4) | 0.75 (4)
𝑉3 | Reversed order of shots of 𝑉1 | 4 | 0.94 (6) | 0.88 (6) | 0.77 (6) | 0.65 (6)
𝑉4 | Added a shot to 𝑉1 | 5 | 0.95 (5) | 0.89 (5) | 0.78 (5) | 0.67 (5)
𝑉5 | Added a logo to 𝑉1 | 6 | 0.97 (2) | 0.94 (2) | 0.88 (2) | 0.81 (2)
𝑉6 | Downsampled 𝑉1 to 720p | 7 | 0.88 (7) | 0.75 (7) | 0.50 (8) | 0.25 (9)
𝑉7 | Removed one shot from 𝑉0 | 8 | 0.72 (8) | 0.68 (8) | 0.61 (7) | 0.55 (7)
𝑉8 | Removed two shots from 𝑉0 | 9 | 0.48 (9) | 0.45 (9) | 0.40 (9) | 0.36 (8)
𝑉9 | Removed 60 frames from each shot of 𝑉0 | 10 | 0.00 (10) | 0.00 (10) | 0.00 (10) | 0.00 (10)
𝑟 | | | 0.88 | 0.88 | 0.87 | 0.85
Different values of 𝛾 change the output values but have little effect on the ranks: only the ranks of 𝑉6, 𝑉7 and 𝑉8 change, and only for 𝛾 ≥ 0.50.
Proposed method estimates the authenticity of this dataset effectively (𝑟 ≥ 0.85 for every tested 𝛾).
Proposed method does not penalize partial shot removal or changes in shot order.


Data: 16 datasets, 272 videos, 12 subjects
16 parent videos from #PopularOnYouTube
Genres: movie trailers, documentaries, comedy, sports, etc.
Each parent video edited to create 17 edited videos
Ground truth (𝑦): subjective evaluation by 12 individuals
Editing operation | Parameter type | Parameter values
Downsampling | Resolution | 720p, 480p, 360p
H.264 recompression | CRF | 18, 26, 34, 40
Shot removal | Percentage | 10%, 20%, …, 90%
Comparative methods (𝑥): no-reference visual quality assessment algorithm [1]
Criteria:
Sample correlation coefficient: 𝑟(𝑥, 𝑦)
Rank-order correlation coefficient: 𝜌(𝑥, 𝑦)

Chapter 4 :: Experiment 2
Obtaining Subjective Evaluation Scores
Problem: many videos and parameters → objective evaluation is difficult.
Solution: obtain ground truth through subjective evaluations.
For each experiment subject:
  For each video:
    1. Score visual quality (1 = worst, 5 = best)
    2. Score removed shots (1 = most, 5 = least)
    3. Score authenticity (1 = lowest, 5 = highest)
For each video:
  Ground truth score = mean of (3) across all subjects.
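The aggregation step is a per-video mean over subjects, as in the sketch below. The subject names, video names and scores are hypothetical; only the 1 to 5 authenticity scale and the use of the mean come from the slide.

```python
from statistics import mean

# Hypothetical authenticity scores (item 3): scores[subject][video],
# on the slide's scale of 1 = lowest to 5 = highest.
scores = {
    "subject_a": {"V1": 5, "V2": 3, "V3": 1},
    "subject_b": {"V1": 4, "V2": 3, "V3": 2},
    "subject_c": {"V1": 5, "V2": 2, "V3": 1},
}


def ground_truth(scores):
    """Mean authenticity score of each video across all subjects."""
    videos = next(iter(scores.values())).keys()
    return {video: mean(s[video] for s in scores.values()) for video in videos}


y = ground_truth(scores)
print({video: round(value, 2) for video, value in y.items()})
```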

Chapter 4 :: Experiment 2
Subjective Evaluation Interface
Demo
available

Proposed method is more effective than the comparative method for most datasets.
Comparative method is not sensitive to editing other than recompression & resampling.
[Figure: two bar charts, one for the sample correlation coefficient (𝑟) and one for the rank-order correlation coefficient (𝜌); vertical axes from 0 to 1; the comparative method is labeled "Comp."]


Data
5 search queries
5 to 76 videos downloaded from YouTube for each query
Ground truth (𝑦): subjective evaluation by 20 individuals
Name Videos Total duration Shots Unique IDs
Bolt 68 4 h 42 min 1933 275
Kerry 5 0 h 47 min 103 24
Klaus 76 1 h 16 min 253 61
Lagos 8 0 h 6 min 17 17
Russell 18 2 h 50 min 1748 103
Total 175 9 h 41 min 4116 480
5 datasets, 175 videos, 20 subjects
Comparative methods (𝑥):
(1) View count
(2) Upload timestamp
(3) No-reference visual quality assessment algorithm [1]
Criteria:
Sample correlation coefficient: 𝑟(𝑥, 𝑦)
Rank-order correlation coefficient: 𝜌(𝑥, 𝑦)

Estimating authenticity for real data is a difficult task, even for humans.
Proposed method outperforms the comparative methods for most datasets.
[Figure: two bar charts, one for the sample correlation coefficient (𝑟) and one for the rank-order correlation coefficient (𝜌); vertical axes from 0 to 1; bars labeled View #, Time, Edge W., Prop. and Ideal]

Chapter 4 :: Demo :: Summary
Video | Editing operations | Authenticity degree
Parent video | None | 1.00
Edited video 1 | H.264 recompression (CRF = 40) | 0.70
Edited video 2 | Removed shots (60% of all shots removed) | 0.43
Parent video available at: http://youtu.be/xAsjRRMMg_Q (July 21)


Chapter 4 :: Demo :: Parent Video Screenshot
Authenticity Degree (AD) = 1.00

Chapter 4 :: Demo :: Edited Video 1 Screenshot

Chapter 4 :: Demo :: Zoomed Comparison
Parent video (AD = 1.00) Edited video 1 (AD = 0.70)


Chapter 4 :: Demo :: Parent Video Shots

Chapter 4 :: Demo :: Edited Video 2 Shots
Full videos
available

Conclusion and Future Work
Many applications require a method for determining video authenticity: search result summarization, content tracking, content aggregation.
Information loss is proportional to visual quality loss.
Utilize visual quality algorithms to estimate information loss.
Future work:
‒ Consider shot order
‒ Consider inter-frame differences
‒ Detect partial shot removal
‒ Focus on the audio signal as well
