1. Estimating Video Authenticity via
the Analysis of Visual Quality and
Video Structure
A Study on Estimating the Authenticity of Video Based on the Analysis of Visual Quality and Video Structure
2015/07/31
Laboratory of Media Dynamics
Graduate School of Information Science and Technology
Michael Penkov
2. Introduction :: Background
Increase of video sharing:
• Cheaper phones/cameras
• Faster networks
• Free sharing services
Consequences:
• Many independent uploaders
• Upload rate: 300 h/min [*]
• No screening of content
• Much video is duplicated
Editing chain: Parent video → (editing, upload) → Edited video (1st gen.) → (editing, reupload) → Edited video (2nd gen.) → …
Authenticity of video: a scale from (Low) to (High).
Authenticity of video of an important event (e.g. news): most objective, most reliable, closest to the truth, least edited.
Need to distinguish between the parent and edited videos.
How similar is the edited video to the parent video?
Applications: search result summarization, content tracking, content aggregation.
[*] http://www.youtube.com/t/press_statistics/ (accessed 2015/06/29)
3. Introduction :: Related Research
Existing methods can…
• Estimate visual quality [1]
• Quantify video similarity [2]
• Estimate hierarchical relationships between videos [3]
• Detect copy-paste forgeries through inconsistencies in visual quality [4]
• Estimate parent video from edited videos [5]
Fields: Visual Quality Assessment, Video Similarity, Digital Forensics
Methods shown on the timeline (2010 to 2015): No-reference VQA [1], Robust video hash [2], Video Phylogeny [3], Forgery Detection [4], Parent Video Estimation [5], Our Research
Our Research: estimate authenticity of edited videos
• Visual quality ∝ Authenticity: low visual quality → low authenticity
• Video structure (estimate deleted shots): many deleted shots → low authenticity
[1] P. Marziliano et al., “Perceptual blur and ringing metrics: Application to JPEG2000,” Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
[2] B. Coskun, B. Sankur, and N. Memon, “Spatio-Temporal Transform Based Video Hashing,” IEEE Transactions on Multimedia, vol. 8, no. 6, pp. 1190–1208, Dec. 2006.
[3] Z. Dias, A. Rocha, and S. Goldenstein, “Video Phylogeny: Recovering near-duplicate video relationships,” in 2011 IEEE International Workshop on Information Forensics and Security. IEEE, Nov. 2011, pp. 1–6.
[4] F. Battisti, M. Carli, and A. Neri, “Image forgery detection by means of no-reference quality metrics,” in SPIE Vol. 8303, 2012.
[5] S. Lameri, P. Bestagini, A. Melloni, S. Milani, A. Rocha, M. Tagliasacchi, and S. Tubaro, “Who is my parent? Reconstructing video sequences from partially matching shots,” in IEEE International Conference on Image Processing (ICIP), 2014.
4. Introduction :: Research Map
Fields on the map: Digital Forensics, Shot Segmentation, Video Similarity, Visual Quality Assessment
Works placed on the map: Our Research, [4], [5], [6]
Contribution: bridging Visual Quality Assessment and Digital Forensics
[4] F. Battisti, M. Carli, and A. Neri, “Image forgery detection by means of no-reference quality metrics,” in SPIE Vol. 8303, 2012.
[5] S. Lameri, P. Bestagini, A. Melloni, S. Milani, A. Rocha, M. Tagliasacchi, and S. Tubaro, “Who is my parent? Reconstructing video sequences from partially matching shots,” in IEEE International Conference on Image Processing (ICIP), 2014.
[6] M. Penkov, T. Ogawa, and M. Haseyama, “Fidelity estimation of online video based on video quality measurement and Web information,” in 26th Signal Processing Symposium, vol. A3-5, pp. 70–74, 2011.
5. Introduction :: Our Contribution
Authenticity degree: the proportion of information retained by an edited video.
Information: the message contained by the video; the reason why people watch the video.
• A relative scale for ranking edited videos.
• Bridge digital forensics and visual quality assessment.
Authenticity Degree scale (from 1 to 0): Parent (not available), then Edited, Edited, Edited
• Toward 1: higher visual quality, few deleted shots
• Toward 0: lower visual quality, many deleted shots
6. Introduction :: Thesis Contents
• Chapter 1: Introduction
• Chapter 2: Visual Quality Assessment
• Our contribution: Visual quality ∝ Authenticity
• Reviews conventional algorithms
• Enables the proposed method to quantify information loss
• Chapter 3: Shot Identification
• Utilizes the structure of videos to
1. Enable the reconstruction of the parent video when it is not available
2. Enable detecting deleted shots
3. Enable applications of conventional visual quality assessment algorithms from Chapter 2
• Chapter 4: The Video Authenticity Degree
• The proposed method for estimating video authenticity
• Chapter 5: Conclusion
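8. Chapter 2 :: Visual Quality Assessment
An Overview
• Subjective Evaluation [8]: human subjects rate the target image.
• Full-reference Algorithms [9]: compare the target image with the reference image.
• Reduced-reference Algorithms [7]: compare the target image with features extracted from the reference image.
• No-reference Algorithms [1]: use the target image only.
In the diagram, compression turns the reference image into the target image.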
[1] P. Marziliano et al, “Perceptual blur and ringing metrics: Application to JPEG2000,” in Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
[7] Z. Wang and A. C. Bovik, “Modern Image Quality Assessment,” Synthesis Lectures on Image, Video, and Multimedia Processing, vol. 2, no. 1, pp. 1–156, Jan. 2006.
[8] ITU-R Recommendation BT.500: “Methodology for the subjective assessment of the quality of television pictures”
[9] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13,
no. 4, pp. 600–612, Apr. 2004.
9. Chapter 2 :: Visual Quality Assessment
Known Problems and Limitations of No-reference Algorithms
Measure: blurring strength [1] (low = good quality, high = poor quality), averaged into a mean blurring strength for the entire video.
Problem: algorithm output is relative to the visual content.
Problem: algorithms do not consider deleted shots.
Example (figure): per-shot blurring strengths and the mean blurring strength for 𝑉0 (parent video) and 𝑉1 (edited video, with one shot deleted, marked X). The comparison is annotated “Better quality (!)” and “Worse quality (!)”. Values shown in the figure: 1.0, 1.5, 0.5, 1.4, 1.2, 1.0, X (deleted), 1.2.
Our solution: shot identifiers (Chapter 3) and a shot-based penalty model (Chapter 4).
• Enable detection of deleted shots
• Group visually similar shots together
• Normalize algorithm outputs
[1] P. Marziliano et al., “Perceptual blur and ringing metrics: Application to JPEG2000,” Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
11. Chapter 3 :: Shot Identification :: Summary
Aim
1. Enable the reconstruction of the parent video
2. Enable detecting deleted shots
3. Enable applications of algorithms from Chapter 2
Method
Represent each unique shot as a unique integer.
Example (figure): the shots of videos 𝑉1, 𝑉2, 𝑉3 are labelled with the shot IDs 1, 2, 3, 4.
12. Chapter 3 :: Shot Identification :: Details
Processing steps:
1. Shot segmentation [10]
2. Visual similarity calculation [2]
3. Connected components computation [11]
4. Shot ID assignment
Example (figure): shots taken from videos 𝑉1, 𝑉2, 𝑉3 are grouped into ID = 1, ID = 2, ID = 3, ID = 4.
[2] B. Coskun, B. Sankur, and N. Memon, “Spatio-Temporal Transform Based Video Hashing,” IEEE Transactions on Multimedia, vol. 8, no. 6, pp. 1190–1208, Dec. 2006.
[10] A. Nagasaka and Y. Tanaka, “Automatic Video Indexing and Full-Video Search for Object Appearances”, North Holland Publishing Co., 1992
[11] Hopcroft, J.; Tarjan, R. (1973). "Efficient algorithms for graph manipulation". Communications of the ACM 16 (6): 372–378.
Visually similar shots receive equal shot identifiers.
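The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: it assumes the shots have already been segmented, stands in a toy bit-string Hamming test for the robust video hash of [2] (the `hamming_similar` function and the sample hashes are hypothetical), and finds connected components with union-find rather than the depth-first search of [11].

```python
from itertools import combinations


def assign_shot_ids(shots, is_similar):
    """Give visually similar shots the same integer ID via connected components."""
    parent = list(range(len(shots)))  # union-find forest over shot indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of shots that the similarity test accepts.
    for a, b in combinations(range(len(shots)), 2):
        if is_similar(shots[a], shots[b]):
            parent[find(a)] = find(b)

    # Number the components 1, 2, 3, ... in order of first appearance.
    ids, result = {}, []
    for i in range(len(shots)):
        root = find(i)
        if root not in ids:
            ids[root] = len(ids) + 1
        result.append(ids[root])
    return result


def hamming_similar(h1, h2, max_distance=1):
    """Hypothetical stand-in for a video hash comparison (assumption)."""
    return sum(c1 != c2 for c1, c2 in zip(h1, h2)) <= max_distance


shot_hashes = ["0000", "0001", "1111", "1110", "0000"]  # illustrative values
shot_ids = assign_shot_ids(shot_hashes, hamming_similar)
print(shot_ids)
```

Because IDs come from connected components, similarity is treated as transitive: two shots that are each similar to a third shot share an ID even if they fail the pairwise test themselves.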
14. Chapter 4 :: The Video Authenticity Degree
Our Strategy
Problem: the parent video 𝑉0 is usually unavailable.
Solution: estimate the parent video 𝑉0 from the available edited videos.
Information: the message contained by the video; the reason why people watch the video.
Editing 𝑉0 (parent video) into 𝑉𝑗 (edited video) loses information through:
1. Shot removal (full loss)
2. Recompression (partial loss)
Proposed method:
• Detect information loss
• Calculate penalties
• Aggregate penalties
Output: authenticity degree of edited video 𝑉𝑗
15. Chapter 4 :: The Video Authenticity Degree
An Example
Steps: Calculate shot identifiers → Estimate parent video → Detect removed shots → Calculate penalties → Aggregate penalties → Authenticity degree for each edited video
Input: a set of edited videos.
Shot IDs (X = removed shot):
𝑉0 (estimate of parent): 1 2 3 4
𝑉1: 1 2 3 X
𝑉2: X 2 3 4
𝑉3: 1 X X 4
𝑉4: X 2 3 4 5 6 (shots 5 and 6 are not in the parent: no penalties)
Penalties: 1.0 for each removed shot; between 0.1 and 0.3 for each retained shot.
Authenticity degrees shown in the figure: 0.65, 0.60, 0.40, 0.63
17. Chapter 4 :: Estimating the Parent Video 𝑉0
Problem: how can we estimate 𝑉0 from the available edited videos?
Solution: examine the frequently-occurring shot identifiers.
Edited videos (shot IDs):
𝑉1: 1 2 3
𝑉2: 2 3 4
𝑉3: 1 4
𝑉4: 2 3 4 5 6
Shot ID histogram: frequency (0 to 4) against shot ID (1 to 6), with a threshold line.
Estimated result: 𝑉0 (estimate of parent video): 1 2 3 4
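The histogram-and-threshold idea can be sketched as follows. The slide does not give the threshold value, so `min_count=2` is an assumption chosen because it reproduces the estimate 1 2 3 4 for this example; returning the parent's shots sorted by ID is likewise an illustrative simplification.

```python
from collections import Counter


def estimate_parent(edited_videos, min_count):
    """Return the shot IDs found in at least min_count videos, plus the histogram."""
    histogram = Counter()
    for shot_ids in edited_videos.values():
        histogram.update(set(shot_ids))  # count each ID once per video
    parent = sorted(sid for sid, n in histogram.items() if n >= min_count)
    return parent, histogram


# Shot IDs of the four edited videos from the slide's example.
edited_videos = {
    "V1": [1, 2, 3],
    "V2": [2, 3, 4],
    "V3": [1, 4],
    "V4": [2, 3, 4, 5, 6],
}

# min_count is an assumed threshold, not a value taken from the slides.
parent_estimate, histogram = estimate_parent(edited_videos, min_count=2)
print(parent_estimate)
print(dict(histogram))
```

Shots 5 and 6 occur in only one video, so they fall below the assumed threshold and are left out of the estimated parent.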
19. Chapter 4 :: Penalizing Information Loss (1)
Solving the Problem of Relativity to Visual Content
Problem: algorithm output is relative to the visual content.
Solution: normalize visual quality for each unique shot ID individually.
Information loss is proportional to visual quality loss. Utilize visual quality algorithms to estimate information loss.
Normalization: $z = \frac{x - \mu}{\sigma}$
𝑥: direct output
𝑧: normalized output
𝜇, 𝜎: mean and standard deviation of the direct outputs for one shot ID
Example (penalties are the output of the penalty calculation function):
Shot | Shot ID | Direct [1] | Normalized | Penalty
𝑉2, shot 1 | 1 | 1.5 | 0.0 | 0.2
𝑉3, shot 1 | 1 | 1.0 | -1.2 | 0.0
𝑉4, shot 1 | 1 | 2.0 | 1.2 | 0.5
𝑉1, shot 2 | 2 | 4.0 | -1.0 | 0.0
𝑉2, shot 2 | 2 | 5.0 | 1.0 | 0.5
Shot ID = 1: 𝜇 = 1.5, 𝜎 = 0.4
Shot ID = 2: 𝜇 = 4.5, 𝜎 = 0.5
[1] P. Marziliano et al., “Perceptual blur and ringing metrics: Application to JPEG2000,” Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
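The per-shot-ID normalization can be checked with a short sketch. It assumes the population standard deviation (`statistics.pstdev`), which matches the 𝜎 = 0.4 and 𝜎 = 0.5 quoted in the example; the guard for a zero standard deviation is an added assumption for shot IDs with a single shot. The penalty calculation function itself is not specified on the slide and is not modelled here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_shot_id(scores):
    """scores: list of (shot_id, direct_output); returns z-scores in the same order."""
    groups = defaultdict(list)
    for shot_id, x in scores:
        groups[shot_id].append(x)

    # Mean and population standard deviation for each shot ID separately.
    stats = {sid: (mean(xs), pstdev(xs)) for sid, xs in groups.items()}

    normalized = []
    for shot_id, x in scores:
        mu, sigma = stats[shot_id]
        # Assumption: a shot ID with no spread gets z = 0 instead of dividing by zero.
        normalized.append(0.0 if sigma == 0 else (x - mu) / sigma)
    return normalized


# Direct outputs from the slide's example: three shots with ID 1, two with ID 2.
scores = [(1, 1.5), (1, 1.0), (1, 2.0), (2, 4.0), (2, 5.0)]
z = normalize_by_shot_id(scores)
print([round(v, 1) for v in z])
```

After normalization, the shots with ID 2 are no longer treated as worse than the shots with ID 1 merely because their content yields larger raw values.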
20. Chapter 4 :: Penalizing Information Loss (2)
A Model for Penalizing Shot Removal
Problem: algorithms do not consider deleted shots.
Solution: model shot removal as complete information loss.
Information loss is proportional to visual quality loss. Utilize visual quality algorithms to estimate information loss.
Penalty calculation function: each deleted shot receives the maximum penalty (1.0).
Example shot IDs (X = deleted shot):
𝑉0 (estimate of parent): 1 2 3 4
𝑉2: X 2 3 4 (penalty 1.0 for shot 1)
𝑉3: 1 X X 4 (penalty 1.0 each for shots 2 and 3)
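A minimal sketch of the shot-removal penalty, combined with one possible aggregation step, is given below. The maximum penalty of 1.0 for a deleted shot comes from the slide; the aggregation (one minus the mean penalty over the shots of the estimated parent) and the 0.2 quality penalties are assumptions made for illustration, and the parameter 𝛾 that appears in the experiment results is not modelled.

```python
MAX_PENALTY = 1.0  # a deleted shot counts as complete information loss


def shot_penalties(parent_ids, video_ids, quality_penalty):
    """One penalty per parent shot: MAX_PENALTY if missing, else its quality penalty."""
    present = set(video_ids)
    return [
        quality_penalty.get(sid, 0.0) if sid in present else MAX_PENALTY
        for sid in parent_ids
    ]


def authenticity_degree(penalties):
    """Assumed aggregation: one minus the mean penalty over the parent's shots."""
    return 1.0 - sum(penalties) / len(penalties)


parent_ids = [1, 2, 3, 4]  # estimated parent from the example

# V3 keeps shots 1 and 4 only; the 0.2 quality penalties are illustrative values.
penalties_v3 = shot_penalties(parent_ids, [1, 4], {1: 0.2, 4: 0.2})
degree_v3 = authenticity_degree(penalties_v3)
print(penalties_v3, round(degree_v3, 2))
```

Only shots of the estimated parent are iterated, so shots that an edited video adds (such as IDs 5 and 6 in the example) contribute no penalty.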
21. Chapter 4 :: Experiments :: Summary
Exp. | Purpose | Videos | Data type | Editing operations
1 | Demonstrate that the method correctly detects editing operations | 10 | Artificial | Scaling; recompression; remove whole shots; remove parts of shots; reverse shot order; add logo
2 | Demonstrate that the method can correctly estimate the parent video for a large variety of videos | 272 | Artificial | Scaling; recompression; remove whole shots
3 | Demonstrate the effectiveness of the proposed method in a real-life situation | 175 | Real | Unknown
24. Chapter 4 :: Experiment 1 :: Overview
Data (1 dataset, 10 videos)
Video | Comments | Ground truth
𝑉0 | Parent video (consists of 4 shots) | 1
𝑉1 | Reuploaded 𝑉0 to YouTube | 2
𝑉2 | Removed 10 frames from each shot of 𝑉1 | 3
𝑉3 | Reversed order of shots of 𝑉1 | 4
𝑉4 | Added a shot to 𝑉1 | 5
𝑉5 | Added a logo to 𝑉1 | 6
𝑉6 | Downsampled 𝑉1 to 720p | 7
𝑉7 | Removed one shot from 𝑉0 | 8
𝑉8 | Removed two shots from 𝑉0 | 9
𝑉9 | Removed 60 frames from each shot of 𝑉0 | 10
Evaluation criteria
Sample correlation coefficient between the ranks of the output of the proposed method and the ground truth.
High correlation coefficient corresponds to a good result.
25. Chapter 4 :: Experiment 1 :: Results
Each cell gives the output of the proposed method, with its rank in parentheses.
Video | Comments | ER | 𝛾 = 0.13 | 𝛾 = 0.25 | 𝛾 = 0.50 | 𝛾 = 0.75
𝑉0 | Parent video | 1 | 0.99 (1) | 0.97 (1) | 0.94 (1) | 0.92 (1)
𝑉1 | Reuploaded 𝑉0 to YouTube | 2 | 0.97 (3) | 0.93 (3) | 0.87 (3) | 0.80 (3)
𝑉2 | Removed 10 frames from each shot of 𝑉1 | 3 | 0.96 (4) | 0.92 (4) | 0.84 (4) | 0.75 (4)
𝑉3 | Reversed order of shots of 𝑉1 | 4 | 0.94 (6) | 0.88 (6) | 0.77 (6) | 0.65 (6)
𝑉4 | Added a shot to 𝑉1 | 5 | 0.95 (5) | 0.89 (5) | 0.78 (5) | 0.67 (5)
𝑉5 | Added a logo to 𝑉1 | 6 | 0.97 (2) | 0.94 (2) | 0.88 (2) | 0.81 (2)
𝑉6 | Downsampled 𝑉1 to 720p | 7 | 0.88 (7) | 0.75 (7) | 0.50 (8) | 0.25 (9)
𝑉7 | Removed one shot from 𝑉0 | 8 | 0.72 (8) | 0.68 (8) | 0.61 (7) | 0.55 (7)
𝑉8 | Removed two shots from 𝑉0 | 9 | 0.48 (9) | 0.45 (9) | 0.40 (9) | 0.36 (8)
𝑉9 | Removed 60 frames from each shot of 𝑉0 | 10 | 0.00 (10) | 0.00 (10) | 0.00 (10) | 0.00 (10)
𝒓 | | | 0.88 | 0.88 | 0.87 | 0.85
Different values of 𝛾 change the output values but barely change their ranks: only 𝑉6, 𝑉7, and 𝑉8 change rank, at 𝛾 = 0.50 and 𝛾 = 0.75.
Proposed method estimates the authenticity of this dataset effectively.
Proposed method does not penalize partial shot removal or changes in shot order.
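The evaluation criterion, a sample correlation coefficient computed on ranks, can be sketched as below for the 𝛾 = 0.13 column. This is an illustration under stated assumptions: it uses the integer ranks printed in parentheses in the table and a plain Pearson formula. It gives roughly 0.87, close to but not identical with the reported 0.88; the slides do not say how ties or rounding were handled, so an exact match should not be expected.

```python
from math import sqrt


def pearson(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# ER column (ground-truth ranks) for V0 ... V9.
expected_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Ranks printed in parentheses in the gamma = 0.13 column for V0 ... V9.
method_rank = [1, 3, 4, 6, 5, 2, 7, 8, 9, 10]

r = pearson(method_rank, expected_rank)
print(round(r, 3))
```

The largest single contribution to the gap from a perfect score is the logo video, which the method ranks second while the ground truth ranks it sixth.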
27. Chapter 4 :: Experiment 2 :: Overview
Data (16 datasets, 272 videos, 12 subjects)
• 16 parent videos from #PopularOnYouTube
• Genres: Movie trailers, documentaries, comedy, sports, etc.
• Each parent video edited to create 17 edited videos
• Ground truth (𝑦): subjective evaluation by 12 individuals
Editing operation | Parameter type | Parameter values
Downsampling | Resolution | 720p, 480p, 360p
H.264 recompression | CRF | 18, 26, 34, 40
Shot removal | Percentage | 10%, 20%, …, 90%
Comparative methods (𝑥)
• No-reference visual quality assessment algorithm [1]
Criteria
• Sample correlation coefficient: 𝑟(𝑥, 𝑦)
• Rank-order correlation coefficient: 𝜌(𝑥, 𝑦)
High correlation coefficient corresponds to a good result.
[1] P. Marziliano et al., “Perceptual blur and ringing metrics: Application to JPEG2000,” Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
28. Chapter 4 :: Experiment 2
Obtaining Subjective Evaluation Scores
Problem: with many videos and parameters, objective evaluation is difficult.
Solution: obtain ground truth through subjective evaluations.
For each experiment subject:
For each video:
1. Score visual quality (1 = worst, 5 = best)
2. Score removed shots (1 = most, 5 = least)
3. Score authenticity (1 = lowest, 5 = highest)
For each video:
Ground truth = mean of score (3) across all subjects.
30. Chapter 4 :: Experiment 2 :: Results
Proposed method is more effective than the comparative method for most datasets.
Comparative method is not sensitive to editing other than recompression & resampling.
[Figure: two bar charts, sample correlation coefficient (𝑟) and rank-order correlation coefficient (𝜌), vertical axis from 0 to 1; “Comp.” marks the comparative method.]
32. Chapter 4 :: Experiment 3 :: Overview
Data (5 datasets, 175 videos, 20 subjects)
• 5 search queries
• 8 to 76 videos downloaded from YouTube for each query
• Ground truth (𝑦): subjective evaluation by 20 individuals
Name | Videos | Total duration | Shots | Unique IDs
Bolt | 68 | 4 h 42 min | 1933 | 275
Kerry | 5 | 0 h 47 min | 103 | 24
Klaus | 76 | 1 h 16 min | 253 | 61
Lagos | 8 | 0 h 6 min | 17 | 17
Russell | 18 | 2 h 50 min | 1748 | 103
Total | 175 | 9 h 41 min | 4116 | 480
Comparative methods (𝑥)
(1) View count
(2) Upload timestamp
(3) No-reference visual quality assessment algorithm [1]
Criteria
• Sample correlation coefficient: 𝑟(𝑥, 𝑦)
• Rank-order correlation coefficient: 𝜌(𝑥, 𝑦)
High correlation coefficient corresponds to a good result.
[1] P. Marziliano et al., “Perceptual blur and ringing metrics: Application to JPEG2000,” Elsevier Signal Processing: Image Communication, vol. 19, pp. 163–172, 2004.
33. Chapter 4 :: Experiment 3 :: Results
Estimating authenticity for real data is a difficult task, even for humans.
Proposed method outperforms the comparative methods for most datasets.
[Figure: two bar charts, sample correlation coefficient (𝑟) and rank-order correlation coefficient (𝜌), vertical axis from 0 to 1; series: View #, Time, Edge W., Prop., Ideal.]
34. Chapter 4 :: Demo :: Summary
Video | Editing operations | Authenticity Degree
Parent video | None | 1.00
Edited video 1 | H.264 recompression (CRF = 40) | 0.70
Edited video 2 | Removed shots (60% of all shots removed) | 0.43
Parent video available at: http://youtu.be/xAsjRRMMg_Q (July 21)
41. Chapter 4 :: Demo :: Edited Video 2 Shots
Authenticity Degree (AD) = 0.43
Full videos
available
42. Conclusion and Future Work
Many applications require a method for determining video authenticity:
• Search result summarization
• Content tracking
• Content aggregation
Information loss is proportional to visual quality loss. Utilize visual quality algorithms to estimate information loss.
Future work:
‒ Consider shot order
‒ Consider inter-frame differences
‒ Detect partial shot removal
‒ Focus on the audio signal as well
Editor's notes
Define “authenticity” here
Similarity to the original? How well does the video convey the message of the parent video?
Explain meaning of blue line.
[4] is different direction, but similar in general.