Trends and Recent Developments in Video Coding Standardization
ICME 2018 Tutorial, San Diego, 23.07.2018
Jens-Rainer Ohm, Institute of Communication Engineering, RWTH Aachen University, Germany, ohm@ient.rwth-aachen.de
Mathias Wien, Institute of Imaging and Computer Vision, RWTH Aachen University, Germany, wien@lfb.rwth-aachen.de
2. Trends and Recent Developments in Video Coding Standardization | Tutorial at ICME 2018 | San Diego, CA, USA |
Jens-Rainer Ohm and Mathias Wien | RWTH Aachen University | Institut für Nachrichtentechnik | Lehrstuhl für Bildverarbeitung | 23.07.2018
Outline
1. Introduction and history of video coding standardization (Jens)
2. Source formats and resolutions (Mathias)
3. State of the art in video compression (Mathias)
4. Versatile Video Coding (Jens)
5. Exploratory trends and perspectives (Jens)
6. Coding tools for multi-camera captures (Jens)
7. Summary and outlook
Part I: Introduction and history of video coding standardization
Video coding standardization organisations
• ISO/IEC MPEG = “Moving Picture Experts Group”
(ISO/IEC JTC 1/SC 29/WG 11 = International Organization for Standardization and International Electrotechnical Commission,
Joint Technical Committee 1, Subcommittee 29, Working Group 11)
• ITU-T VCEG = “Video Coding Experts Group”
(ITU-T SG16/Q6 = International Telecommunication Union – Telecommunication Standardization Sector (ITU-T,
a United Nations agency, formerly CCITT),
Study Group 16, Working Party 3, Question 6)
• JVT = “Joint Video Team”, collaborative team of MPEG & VCEG responsible for developing AVC
(discontinued in 2009)
• JCT-VC = “Joint Collaborative Team on Video Coding”, team of MPEG & VCEG responsible for
developing HEVC (established January 2010)
• JVET = “Joint Video Experts Team” exploring potential for new technology beyond HEVC (established Oct.
2015 as Joint Video Exploration Team, renamed Apr. 2018)
History of international video coding standardization (1985-2020)
[Figure: timeline of ITU-T and ISO/IEC video coding standards; application areas along the timeline: Videotelephony, Computer, SD, HD, 4K UHD, 8K, 360, ...]
• ITU-T: H.120 (1984-1988), H.261 (1990+), H.263/+/++ (1995-2000+)
• ISO/IEC: MPEG-1 (1993), MPEG-4 Visual (1998-2001+)
• Joint ITU-T / ISO/IEC:
H.262 / 13818-2 (MPEG-2) (1994/95-1998+)
H.264 / 14496-10 AVC (Advanced Video Coding, developed by JVT) (2003-2018+)
H.265 / 23008-2 HEVC (High Efficiency Video Coding, developed by JCT-VC) (2013-2018+)
H.26x / 23090-3 VVC (Versatile Video Coding, to be developed by JVET) (2020-...)
The scope of video standardization
• Only the specifications of the bitstream, syntax, and decoder are standardized:
• Permits optimization beyond the obvious
• Permits complexity reduction for implementability
• Provides no guarantees of quality
[Figure: Source → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Destination; the scope of the standard covers the decoding stage]
Hybrid Coding Concept
Basis of every standard since H.261
Hybrid video coding concept
Used since the early days of video compression standards, e.g. H.261, MPEG-1/-2/-4, H.263, AVS, H.264/AVC and HEVC, and also in most proprietary codecs (VC-1, VP8, etc.)
[Figure sequence, built up step by step:
1. Input Signal → DCT → Quantized → bitstream (010011101001…) → Inverse DCT → Reconstruction
2. Next Input Signal vs. Reconstruction of the previous picture
3. Input Signal – MC Prediction = Residual (compared with the residual w/o MC)
4. Residual → DCT → Quantized → bitstream (010011101001…) → Inverse DCT
5. Residual + MC Prediction = Reconstruction, etc.]
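The loop above can be condensed into a short sketch. The Python code below runs one pass of the hybrid loop for a single square block: residual = input – MC prediction, 2-D DCT, quantization, inverse DCT, and reconstruction = MC prediction + decoded residual. It assumes a floating-point orthonormal DCT-II and a plain uniform quantizer with step `qstep`; the standards use integer transform approximations and more elaborate quantization, and all function names are hypothetical.

```python
# Sketch of one hybrid-coding pass for a square block; function names are hypothetical.
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis C (n x n): forward Y = C X C^T, inverse X = C^T Y C."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def hybrid_code_block(block, prediction, qstep):
    """Return (levels, reconstruction) for one block.

    levels: quantized coefficients that would be entropy coded;
    reconstruction: what the decoder (and the encoder's own loop) obtains.
    """
    n = len(block)
    c = dct_matrix(n)
    residual = [[block[i][j] - prediction[i][j] for j in range(n)] for i in range(n)]
    coeffs = matmul(matmul(c, residual), transpose(c))            # forward 2-D DCT
    levels = [[round(v / qstep) for v in row] for row in coeffs]  # uniform quantization
    dequant = [[lv * qstep for lv in row] for row in levels]      # inverse quantization
    rec_residual = matmul(matmul(transpose(c), dequant), c)       # inverse 2-D DCT
    recon = [[prediction[i][j] + rec_residual[i][j] for j in range(n)] for i in range(n)]
    return levels, recon


block = [[108] * 4 for _ in range(4)]  # current block: flat, 8 above the prediction
pred = [[100] * 4 for _ in range(4)]   # MC prediction
levels, recon = hybrid_code_block(block, pred, qstep=4)
print(levels[0][0])  # 8 (only the DC coefficient is non-zero)
```

Because the encoder reconstructs exactly what the decoder will see, the next picture is predicted from the decoded picture rather than from the original, which keeps encoder and decoder in sync.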
Performance history of standard generations
[Figure: PSNR (dB) vs. bit rate (kbit/s) for Foreman, QCIF, 10 Hz, 100 frames; curves for JPEG, H.261, H.262/MPEG-2, H.263+, MPEG-4 Visual, AVC and HEVC; annotation "Bit-rate Reduction: 50%" marked at 35 dB]
What made this happen over the years?
• Improvements of motion compensation
Variable partitions & merged partitions
Flexible frame referencing & combined prediction
Sub-sample precision and high-performance sub-sample interpolation
More efficient vector prediction & coding, supporting large vector ranges
• Improvements of 2D coding
Efficient intra prediction and intra mode coding
Design of transform bases and variable transform block sizes
• Loop filtering for artifact reduction
Deblocking, sample-adaptive offset
• Improvements of entropy coding
Flexible binarization of syntax elements
Arithmetic coding
Adaptation and usage of context information
• These are coupled with encoder optimization
Rate-distortion optimization – spend bits where they give the best benefit in terms of distortion reduction
Adaptive rate control and perceptually tuned quantization
Reference picture structures
• Group of Pictures (GoP) structures allowing random access (used since MPEG-1)
• Bi-(directional) prediction for better compression performance (used since MPEG-1)
[Figure: pictures 1-7; (a) uni-directional prediction with previous and pre-previous picture references; (b) bi-directional prediction with picture types I|P B B P B B P]
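Because a B picture in the I|P B B P … pattern also refers to the following anchor picture, coding order differs from display order. A minimal sketch, assuming each B picture references only the two surrounding I/P pictures; the helper name is hypothetical.

```python
# Sketch: display order -> coding order for a simple I/P/B pattern (hypothetical helper).
def coding_order(display_types):
    """Map a display-order pattern such as "IBBPBBP" to coding (decoding) order.

    Each I/P anchor is coded before the B pictures that precede it in display order,
    because those B pictures use it as their backward reference.
    """
    order, pending_b = [], []
    for idx, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append(idx)      # wait until the next anchor has been coded
        else:
            order.append(idx)          # anchor (I or P)
            order.extend(pending_b)    # B pictures between the two anchors
            pending_b = []
    order.extend(pending_b)            # trailing B pictures (no later anchor available)
    return order


print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

The price of the better prediction is the structural delay: picture 1 cannot be output before picture 3 has been decoded.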
Reference picture structures
• Hierarchical prediction structures for frame rate scalability and further improved compression performance (used in AVC and HEVC)
[Figure: hierarchical prediction with prediction levels L0 (I/P0) to L3; (a) hierarchical P pictures P1, P2, P3; (b) hierarchical B pictures B1, B2, B3]
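In a dyadic hierarchical structure each picture belongs to a temporal level, and discarding the highest level halves the picture rate, which is what provides frame rate scalability. A minimal sketch, assuming a GoP size of 8 with levels 0 to 3 (matching the four prediction levels in the figure) and picture numbers in display order; the function names are hypothetical.

```python
# Sketch: temporal levels in a dyadic hierarchical GoP (hypothetical function names).
def temporal_level(poc, gop_size=8):
    """Temporal level of picture `poc` (gop_size must be a power of two)."""
    level, step = 0, gop_size
    while poc % step != 0:   # halve the step until poc lies on the grid of this level
        step //= 2
        level += 1
    return level


def frames_up_to_level(n_pictures, max_level, gop_size=8):
    """Pictures kept when all levels above max_level are discarded."""
    return [p for p in range(n_pictures) if temporal_level(p, gop_size) <= max_level]


print([temporal_level(p) for p in range(9)])  # [0, 3, 2, 3, 1, 3, 2, 3, 0]
```

For example, `frames_up_to_level(17, 2)` keeps every second picture, i.e. half the original picture rate, and the kept pictures never reference a discarded one.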
Coder control
Coder control is a non-normative part of video codecs
Choose coding parameters at the encoder side:
“What part of the video signal should be coded using what method and parameter settings?”
D – Distortion, R – Rate, p – Parameter vector
Constrained problem:
$\mathbf{p}_{\mathrm{opt}} = \arg\min_{\mathbf{p}} D(\mathbf{p}) \quad \text{s.t.} \quad R(\mathbf{p}) \le R_{\mathrm{Target}}$
Unconstrained Lagrangian formulation:
$\mathbf{p}_{\mathrm{opt}} = \arg\min_{\mathbf{p}} \left[ D(\mathbf{p}) + \lambda \, R(\mathbf{p}) \right]$
λ depends on the slope of the rate-distortion function:
Small value: high rate, low distortion
High value: low rate, high distortion
Can be applied in motion parameter estimation, mode decision, transform coefficient quantization, …; typically a fixed relationship between λ and the QP value is set
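The Lagrangian formulation turns each encoder decision into a comparison of costs J = D + λ·R. A minimal mode-decision sketch, assuming the candidate distortions and rates have already been measured. The λ–QP relation used here, λ = 0.85 · 2^((QP−12)/3), is the one commonly quoted for H.264/AVC reference-software mode decision and serves only as an example, since the constant is encoder-specific. Function names, candidate names and numbers are hypothetical.

```python
# Sketch of Lagrangian mode decision; names and numbers are hypothetical.
def lambda_from_qp(qp, c=0.85):
    """Lagrange multiplier from QP (JM-style relation; c is encoder-specific)."""
    return c * 2.0 ** ((qp - 12) / 3.0)


def rd_choose(candidates, lam):
    """Return the candidate name minimizing J = D + lam * R.

    candidates: dict name -> (distortion, rate_in_bits)
    """
    return min(candidates, key=lambda name: candidates[name][0] + lam * candidates[name][1])


modes = {"intra": (100, 50), "inter": (150, 10), "skip": (400, 1)}
for lam in (1, 10, 100):
    print(lam, rd_choose(modes, lam))  # 1 intra / 10 inter / 100 skip
```

As λ grows, the decision moves from the low-distortion, high-rate candidate towards the cheapest one, exactly as described for small and high λ values above.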
Motivation for improved video compression
• Video resolution is continually increasing
HD established, UHD (4K×2K, 8K×4K) appearing
Mobile services moving towards HD/UHD
Stereo, multi-view, 360° video
• Devices available to record and display ultra-high resolutions
Becoming affordable for home and mobile consumers
• Video data rate can grow along multiple dimensions
Frame resolution, temporal resolution
Color resolution, bit depth
Multi-view
Visible distortion still an issue with existing networks
• Necessary video data rate grows faster than feasible network transport capacities
Video compression better than current HEVC needed in the next decade, even after availability of 5G
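The growth along these dimensions multiplies directly in the uncompressed data rate. A small worked calculation, assuming the average number of samples per pixel for each chroma format (1.5 for 4:2:0, 2 for 4:2:2, 3 for 4:4:4); the table and function names are hypothetical.

```python
# Sketch: uncompressed video data rate (hypothetical names).
SAMPLES_PER_PIXEL = {"4:0:0": 1.0, "4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}


def raw_bitrate(width, height, fps, bit_depth, chroma_format="4:2:0"):
    """Uncompressed data rate in bit/s: pixels x samples/pixel x bits/sample x pictures/s."""
    return width * height * SAMPLES_PER_PIXEL[chroma_format] * bit_depth * fps


print(raw_bitrate(3840, 2160, 60, 10))  # 7464960000.0 (about 7.46 Gbit/s)
```

For comparison, 1080p50 with 8-bit 4:2:0 gives about 1.24 Gbit/s, so moving to 2160p60 10-bit multiplies the raw rate by six.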
Part II: Source formats and resolutions
Structure of a Video Sequence
• Sequence of pictures successively captured or rendered
• Progressive and interlaced formats
• Picture rate measured in pictures per second, unit Hertz (Hz)
• Minimum picture rate of 24 Hz for the impression of fluent motion [Po12]
Standard Definition TV at 50/60 Hz interlaced
High Definition (HD) video at 50/60 Hz progressive
Ultra HD (UHD) video up to 120 Hz
Up to 300 Hz considered
[Po12] Charles Poynton. Digital Video and HD: Algorithms and Interfaces. Waltham, MA, USA: Morgan Kaufmann Publishers, 2012.
Pictures, Frames, and Fields
• Picture
Set of arrays or a single array of samples with intensity values
Monochrome picture: single intensity array
Color video: usually three intensity arrays
⇒ three color components representing the color
Color sample (all three components) also referred to as a pixel
(derived from "picture element", sometimes also denoted as pel)
Optional alpha channel to indicate opacity (transparency) for mixing applications
Pixel Shape
• Picture
Set of pixel lines with a defined number of pixels per line
Shape of pixels not necessarily square; depends on the picture format
Examples: [Figure: pixel shapes of different picture formats]
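The pixel shape follows from the display aspect ratio and the sample grid: pixel aspect ratio = display aspect ratio × height / width. A short sketch using exact fractions; it assumes the full stored width maps to the display aspect ratio (real SD systems use a slightly narrower active width, so published ratios differ slightly), and the function name is hypothetical.

```python
# Sketch: pixel aspect ratio from picture size and display aspect ratio (hypothetical name).
from fractions import Fraction


def pixel_aspect_ratio(width, height, display_aspect):
    """Pixel width/height ratio if width x height samples fill the display aspect ratio."""
    return Fraction(display_aspect) * Fraction(height, width)


print(pixel_aspect_ratio(1920, 1080, Fraction(16, 9)))  # 1 (square pixels)
print(pixel_aspect_ratio(720, 576, Fraction(4, 3)))     # 16/15 (wider than tall)
```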
Chroma Sub-Sampling
• Human visual system less sensitive to color than to structure and texture
⇒ full resolution luma, lower resolution chroma
• Chroma sub-sampling types commonly specified by the relation between the number of luma and chroma samples
YCbCr Y : X1 : X2
• Y: number of luma pixels (reference width, typically 4)
• Sub-sampling format of chroma components specified by X1 and X2
• X1: number of chroma samples per Y luma pixels horizontally (horizontal sub-sampling)
• X2 = 0: vertical sub-sampling identical to horizontal sub-sampling
• X2 = X1: no vertical sub-sampling
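Applying these rules to a picture gives the size of each chroma plane. A minimal sketch, assuming a reference width Y = 4 and covering only the two cases described above (X2 = 0 or X2 = X1); the function name is hypothetical.

```python
# Sketch: chroma plane size for a Y:X1:X2 format (hypothetical function name).
def chroma_plane_size(width, height, fmt):
    """Return (chroma_width, chroma_height) for a 'Y:X1:X2' format string."""
    y, x1, x2 = (int(v) for v in fmt.split(":"))
    if x1 == 0:
        raise ValueError("no chroma samples (monochrome)")
    sub_w = y // x1                  # horizontal sub-sampling factor
    if x2 == x1:
        sub_h = 1                    # X2 = X1: no vertical sub-sampling
    elif x2 == 0:
        sub_h = sub_w                # X2 = 0: vertical same as horizontal
    else:
        raise ValueError("format not covered by this sketch")
    return width // sub_w, height // sub_h


print(chroma_plane_size(1920, 1080, "4:2:0"))  # (960, 540)
```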
Representation of Color
• Color Impression
Visible spectral range from approx. 380 nm to 780 nm
Impression of color: intensity distribution over the visible spectral range
Colors corresponding to a single wavelength: spectral (monochromatic) colors
Human visual system has three color receptors (cone cells)
Maximum sensitivity in the wavelength ranges of red, green and blue
Additional 'gray-scale' receptors (rod cells): responsive in low lighting conditions
Picture source: Wikipedia, artwork by Holly Fischer
The CIE Standard Observer
• Visual perception split into perception of brightness (light and dark) and chromaticity (color impression)
Brightness is driven by the total intensity of the observed spectrum
Color impression is driven by the shape of the intensity distribution
• Functional expression representing perceived color by a mathematical description, first standardized in the CIE 1931 Standard Observer
• Color as a point in a three-dimensional XYZ space
• X, Y, Z values derived from the observed spectrum
• Three color matching functions
CIE: Commission internationale de l’éclairage, http://www.cie.co.at
Standard Observer specified in ISO 11664-1
The CIE Standard Observer
• Normalization to express the chromaticity independently of the observed brightness:
$x = \frac{X}{X+Y+Z}, \quad y = \frac{Y}{X+Y+Z}, \quad z = \frac{Z}{X+Y+Z}$
• Since $x + y + z = 1$, therefore $z = 1 - x - y$
• Chromaticity specified by (x,y)-pair
• Definition of a standardized white point, e.g. 'white C', 'white D65'
[Po12] Charles Poynton. Digital Video and HD: Algorithms and Interfaces. Waltham, MA, USA: Morgan Kaufmann Publishers, 2012.
[Hu04] Robert W. G. Hunt. The Reproduction of Colour. 6th ed. Chichester, West Sussex, England: Wiley, 2004.
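The normalization can be checked numerically. A minimal sketch computing (x, y, z) from tristimulus values, using the commonly quoted XYZ of the D65 white point (Y normalized to 100) as input; the function name is hypothetical.

```python
# Sketch: CIE 1931 chromaticity from XYZ (hypothetical function name).
def chromaticity(X, Y, Z):
    """Chromaticity coordinates (x, y, z) from tristimulus values X, Y, Z."""
    s = X + Y + Z
    if s == 0:
        raise ValueError("X + Y + Z must be non-zero")
    x, y = X / s, Y / s
    return x, y, 1.0 - x - y   # z follows from x + y + z = 1


x, y, z = chromaticity(95.047, 100.0, 108.883)  # D65 white, Y = 100
print(round(x, 4), round(y, 4))  # 0.3127 0.329
```

Scaling X, Y and Z by the same factor (brighter or darker white) leaves (x, y) unchanged, which is exactly the brightness independence the normalization is meant to provide.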
Color Spaces: Standard and High Dynamic Range / Wide Color Gamut
• Colour space
Standard Dynamic Range (SDR) video
Contrast approx. 1000 : 1
ITU-R BT.709 colour space
High Dynamic Range (HDR) video
Contrast approx. 1 000 000 : 1
ITU-R BT.2100 colour space
Figure from: Ajay Luthra, Edouard Francois, and Walt Husak (Eds.). Requirements and Use Cases for HDR and
WCG Content Coding. Doc. N15084. Geneva, CH, 111th meeting: MPEG, Feb. 2015.
ITU-R BT.709: Parameter values for the HDTV standards for production and international programme exchange. ITU-R,
Apr. 2004. URL: http://www.itu.int/rec/R-REC-BT.709/en
ITU-R BT.2020: Parameter values for ultra-high definition television systems for production and international programme
exchange. ITU-R, Oct. 2015. URL: http://www.itu.int/rec/R-REC-BT.2020/en
ITU-R BT.2100: Image parameter values for high dynamic range television for use in production and international
programme exchange. ITU-R, Jun. 2017. URL: http://www.itu.int/rec/R-REC-BT.2100-1-201706-I/en
Color Spaces: Standard and High Dynamic Range / Wide Color Gamut
Figure from: Ajay Luthra, Edouard Francois,
and Walt Husak (Eds.). Requirements and Use
Cases for HDR and WCG Content Coding. Doc.
N15084. Geneva, CH, 111th meeting: MPEG,
Feb. 2015.
HDR/WCG Conversion Practices: Scope
ITU-T H Suppl. 15 | ISO/IEC TR 23008-14, Conversion and Coding Practices for HDR/WCG Y′CbCr 4:2:0 Video with PQ Transfer Characteristics.
ITU-T H Suppl. 18 | ISO/IEC TR 23008-15, Signalling, backward compatibility and display adaptation for HDR/WCG video coding.
Figure from: Jonatan Samuelsson et al.: Conversion and Coding Practices for HDR/WCG Y′CbCr 4:2:0 Video with PQ Transfer Characteristics (Draft 4). Doc. JCTVC-Z1017. 26th meeting,
Geneva, CH: Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Jan 2017.
Part III: State of the Art in Video Compression
Outline and Concept for Part III
Comparison of HEVC and the Joint Exploration Test Model (JEM) of JVET
• A glimpse at high-level syntax (HEVC)
• Coding structures
• Walk-through of the coding loop
Intra coding
Inter coding
Transform coding
Loop filters
Entropy coding
Network Abstraction Layer and Video Coding Layer
• Coded Video Sequence (CVS)
Starts with a random access point (intra-coded picture)
One or more CVSs in a bitstream
→ Coded Video Sequence Group (CVSG)
• Network Abstraction Layer (NAL)
Encapsulation of the coded video sequence for transport and storage
Video coding layer (VCL) NAL units
Information directly used for the reconstruction of samples and pictures
Non-VCL NAL units
Parameter sets
Supplemental enhancement information
...
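In HEVC every NAL unit starts with a two-byte header: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits) and nuh_temporal_id_plus1 (3 bits). Types 0 to 31 are VCL NAL units, types 32 to 63 non-VCL (e.g. 32/33/34 for VPS/SPS/PPS, 39/40 for prefix/suffix SEI). A minimal parsing sketch; the function names are hypothetical, and start-code detection and emulation prevention are not handled.

```python
# Sketch: HEVC NAL unit header parsing (hypothetical function names).
def parse_hevc_nal_header(b0, b1):
    """Split the two header bytes into their fields."""
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        "nuh_temporal_id_plus1": b1 & 0x07,
    }


def is_vcl(nal_unit_type):
    """VCL NAL unit types are 0..31, non-VCL types 32..63."""
    return nal_unit_type < 32


hdr = parse_hevc_nal_header(0x42, 0x01)
print(hdr["nal_unit_type"], is_vcl(hdr["nal_unit_type"]))  # 33 False (an SPS)
```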
NAL Unit Structure
[Figure: NAL unit structure with NAL unit header]
• RBSP: Raw byte sequence payload
Sequence of bytes comprising the coded NAL unit payload
RBSP stop bit (='1') plus zero bits for byte alignment
• SODB: String of data bits
Concatenation of the bits in the RBSP bytes from MSB to LSB, excluding the RBSP stop bit and alignment bits
All bits needed for the decoding process
Only the bits needed for the decoding process
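Going from RBSP to SODB amounts to dropping the trailing RBSP stop bit and the zero bits that follow it. A minimal sketch operating on bit strings for readability, assuming emulation prevention bytes have already been removed from the NAL unit payload; the function name is hypothetical.

```python
# Sketch: extract the SODB from an RBSP (hypothetical function name).
def rbsp_to_sodb_bits(rbsp):
    """Return the SODB of an RBSP as a string of '0'/'1' characters."""
    bits = "".join(f"{byte:08b}" for byte in rbsp)   # MSB first, byte by byte
    stop = bits.rfind("1")                            # RBSP stop bit = last '1'
    if stop < 0:
        raise ValueError("no RBSP stop bit found")
    return bits[:stop]                                # drop stop bit and alignment zeros


print(rbsp_to_sodb_bits(bytes([0xA5, 0x80])))  # 10100101
```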
HEVC Spatial Coding Structures
• Blocks and Units
Block: Square or rectangular area in a color component array
Unit: Collocated blocks of the (three) color components, associated syntax elements and prediction data (e.g. motion vectors)
• Picture partitioning
Coding Tree Blocks / Coding Tree Units (CTBs / CTUs)
Each CTU in exactly one slice segment
Independent slice segment: full header, independently decodable
Dependent slice segment: very short header, relies on the corresponding independent slice segment, inherits CABAC state
• Slice types
I-slice: Intra prediction only
P-slice: Intra prediction and motion compensation with one reference picture list
B-slice: Intra prediction and motion compensation with one or two reference picture lists
CABAC: Context-based Adaptive Binary Arithmetic Coding
Tiles in HEVC
• Tiles change the scanning order of CTBs in the picture (raster scan within each tile, tiles in raster order)
• Slices in tiles, or tiles in slices
• Reset of prediction and entropy coding at tile boundaries → parallel processing
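The changed scanning order can be made concrete by listing the raster-scan CTB addresses in the order they are coded. A minimal sketch, assuming tile boundaries are given explicitly in CTB units; the function name and boundary-list convention are hypothetical.

```python
# Sketch: CTB tile-scan order (hypothetical function name and boundary convention).
def tile_scan_order(pic_w_ctb, pic_h_ctb, col_bd, row_bd):
    """Return CTB raster-scan addresses in tile-scan order.

    col_bd / row_bd: tile boundary positions in CTB units, including 0 and the picture
    width / height, e.g. col_bd = [0, 2, 4] for two tile columns of 2 CTBs each.
    """
    if col_bd[0] != 0 or col_bd[-1] != pic_w_ctb or row_bd[0] != 0 or row_bd[-1] != pic_h_ctb:
        raise ValueError("tile boundaries must span the whole picture")
    order = []
    for ty in range(len(row_bd) - 1):            # tiles in raster order ...
        for tx in range(len(col_bd) - 1):
            for y in range(row_bd[ty], row_bd[ty + 1]):      # ... CTBs in raster order in each tile
                for x in range(col_bd[tx], col_bd[tx + 1]):
                    order.append(y * pic_w_ctb + x)
    return order


print(tile_scan_order(4, 2, [0, 2, 4], [0, 2]))  # [0, 1, 4, 5, 2, 3, 6, 7]
```

Since prediction and entropy coding are reset at tile boundaries, the two runs [0, 1, 4, 5] and [2, 3, 6, 7] can be processed in parallel.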
HEVC: Coding Tree Blocks and Coding Blocks (CBs)
• Maximum CTU size: 64×64 pixels
• Quadtree partitioning of CTB into CBs
• If the picture size is not an integer multiple of the CTB size:
Implicit CTB partitioning to meet the picture size (which must be a multiple of 8×8 pixels)
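The quadtree and the implicit partitioning at the picture border can be sketched as one recursive function. Assumptions: the picture size is a multiple of the minimum CB size, and the encoder's split decision is passed in as a hypothetical callback `want_split`; all names are hypothetical.

```python
# Sketch: CTB quadtree with implicit splits at the picture border (hypothetical names).
def quadtree_cbs(x, y, size, pic_w, pic_h, min_cb, want_split):
    """Return the CBs (x, y, size) of a CTB at (x, y).

    Blocks crossing the picture boundary are split implicitly; blocks completely
    inside are split only if want_split(x, y, size) returns True (encoder decision).
    Assumes pic_w and pic_h are multiples of min_cb.
    """
    if x >= pic_w or y >= pic_h:
        return []                                    # outside the picture: nothing coded
    inside = x + size <= pic_w and y + size <= pic_h
    if size > min_cb and (not inside or want_split(x, y, size)):
        half = size // 2
        cbs = []
        for dy in (0, half):                         # z-order: top-left, top-right, ...
            for dx in (0, half):
                cbs += quadtree_cbs(x + dx, y + dy, half, pic_w, pic_h, min_cb, want_split)
        return cbs
    return [(x, y, size)]


def no_split(x, y, size):
    return False


bottom_ctb = quadtree_cbs(0, 1024, 64, 1920, 1080, 8, no_split)
print(len(bottom_ctb))  # 14 CBs cover the 64x56 remainder of the last CTB row
```

With 1080 = 16 · 64 + 56 lines, the last CTB row of a 1080p picture is only 56 lines high, so even without any encoder-driven split its CTBs end up as 32×32, 16×16 and 8×8 CBs.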
HEVC: Prediction Blocks (PBs) and Transform Blocks (TBs)
• Prediction block partitioning of a 2N×2N CB
• Transform block partitioning of a CB
Quadtree partitioning of CB → Residual Quad Tree (RQT)
Transform (TB) size 4×4 to 32×32
RQT root = CB of up to 64×64; a 64×64 block is implicitly split into 32×32 TBs
PB boundaries inside TBs allowed
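HEVC defines eight ways to split a 2N×2N CB into PBs: 2N×2N, 2N×N, N×2N, N×N and the asymmetric modes 2N×nU, 2N×nD, nL×2N, nR×2N (split at one quarter). A sketch returning the PB rectangles (x, y, width, height) relative to the CB; the restrictions on when each mode may be used (e.g. intra vs. inter, minimum CB size) are not modelled, and the names are hypothetical.

```python
# Sketch: PB rectangles for the HEVC partition modes (hypothetical names).
PART_MODES = ("2Nx2N", "2NxN", "Nx2N", "NxN", "2NxnU", "2NxnD", "nLx2N", "nRx2N")


def pb_partitions(cb, mode):
    """PB rectangles (x, y, width, height) inside a cb x cb coding block."""
    n, q = cb // 2, cb // 4            # half and quarter of the CB size
    table = {
        "2Nx2N": [(0, 0, cb, cb)],
        "2NxN":  [(0, 0, cb, n), (0, n, cb, n)],
        "Nx2N":  [(0, 0, n, cb), (n, 0, n, cb)],
        "NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "2NxnU": [(0, 0, cb, q), (0, q, cb, cb - q)],       # asymmetric: upper quarter
        "2NxnD": [(0, 0, cb, cb - q), (0, cb - q, cb, q)],  # lower quarter
        "nLx2N": [(0, 0, q, cb), (q, 0, cb - q, cb)],       # left quarter
        "nRx2N": [(0, 0, cb - q, cb), (cb - q, 0, q, cb)],  # right quarter
    }
    return table[mode]


print(pb_partitions(32, "2NxnU"))  # [(0, 0, 32, 8), (0, 8, 32, 24)]
```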
JEM: Quad-Tree plus Binary Tree Partitioning (QTBT)
• QTBT structure removes the concept of multiple partition types (TU = PU = CU)
• Maximum CTU size: 256×256 pixels (128×128 used in common test conditions)
• Binary trees starting from the leaves of the quadtree (with horizontal / vertical split indication)
→ CU can have either square or rectangular shape
• Configuration
MinQTSize, MaxBTSize: minimum quadtree leaf node size / maximum binary tree root node size
MaxBTDepth, MinBTSize: maximum binary tree depth / minimum binary tree leaf node size
[Figure: QTBT partitioning example and its tree; binary split flags 0 / 1 indicate horizontal / vertical split]
Figure from: Jianle Chen et al. Algorithm Description of Joint Exploration Test Model 7. Doc. JVET-G1001. Torino, IT, 7th meeting: Joint Video
Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Jul. 2017.
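The configuration parameters act as constraints on which split a node may use. A simplified sketch of that check; the default values (MinQTSize 16, MaxBTSize 64, MaxBTDepth 4, MinBTSize 4) follow the example configuration quoted in the JEM algorithm description, but the rule set is our simplified reading, and the function and split names are hypothetical.

```python
# Sketch: allowed QTBT splits for a node (simplified reading; hypothetical names).
def qtbt_allowed_splits(w, h, qt_stage, bt_depth,
                        min_qt=16, max_bt=64, max_bt_depth=4, min_bt=4):
    """List the splits a node of size w x h may use.

    qt_stage: True while no binary split has occurred on the path from the CTU.
    """
    splits = []
    if qt_stage and w == h and w // 2 >= min_qt:
        splits.append("QT")                        # quadtree leaves must stay >= MinQTSize
    if w <= max_bt and h <= max_bt and bt_depth < max_bt_depth:
        if w // 2 >= min_bt:
            splits.append("BT_HALVE_WIDTH")        # two w/2 x h blocks
        if h // 2 >= min_bt:
            splits.append("BT_HALVE_HEIGHT")       # two w x h/2 blocks
    return splits


print(qtbt_allowed_splits(128, 128, True, 0))  # ['QT'] (too large for a binary tree)
print(qtbt_allowed_splits(8, 4, False, 2))     # ['BT_HALVE_WIDTH']
```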
Intra Prediction
HEVC Intra Prediction Modes
Intra prediction modes
• Planar prediction: mode 0
• DC intra prediction: mode 1
• Angular modes numbered from diagonal-up to diagonal-down
• Modes 2–18: horizontal
• Modes 19–34: vertical
• Horizontal: mode 10
• Vertical: mode 26
Intra prediction block size
• Intra prediction mode coded per CU
• Prediction block size derived from residual quadtree
• Boundary samples of neighboring block used for prediction
• Efficient representation
• Local update of prediction source
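Each angular mode corresponds to a displacement per row (vertical modes) or per column (horizontal modes) in units of 1/32 sample, given by the intraPredAngle table of HEVC. A lookup sketch, using the slide's split of modes 2–18 as horizontal and 19–34 as vertical; the names are hypothetical.

```python
# Sketch: HEVC intra mode -> (category, angle) lookup (hypothetical names).
# intraPredAngle for angular modes 2..34, in 1/32 sample per row/column
INTRA_PRED_ANGLE = [32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26,
                    -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32]


def describe_intra_mode(mode):
    """Return (category, angle) for an HEVC intra mode 0..34."""
    if mode == 0:
        return ("planar", None)
    if mode == 1:
        return ("DC", None)
    if not 2 <= mode <= 34:
        raise ValueError("HEVC intra modes are 0..34")
    category = "horizontal" if mode <= 18 else "vertical"   # slide's split: 2-18 / 19-34
    return (category, INTRA_PRED_ANGLE[mode - 2])


print(describe_intra_mode(10), describe_intra_mode(26))  # ('horizontal', 0) ('vertical', 0)
```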
JEM Intra Prediction Modes
• Concept of HEVC as basis
Higher number of prediction modes (65 angular modes plus planar and DC)
Larger maximum block size
• Chroma
Prediction modes from neighbors
Derived modes from collocated luma
Figure from: Jianle Chen et al. Algorithm Description of Joint Exploration Test Model 7. Doc. JVET-G1001. Torino, IT, 7th meeting: Joint Video
Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Jul. 2017.
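The 65 JEM angular modes (numbered 2 to 66) double the HEVC angular resolution, so every HEVC angular mode m has a JEM counterpart 2m − 2 (horizontal 10 → 18, vertical 26 → 50, diagonal 34 → 66), while planar and DC keep numbers 0 and 1. A sketch with a hypothetical function name:

```python
# Sketch: HEVC -> JEM intra mode numbering (hypothetical function name).
def hevc_to_jem_intra_mode(mode):
    """Map an HEVC intra mode (0..34) to the JEM numbering (0..66)."""
    if mode in (0, 1):           # planar, DC: unchanged
        return mode
    if 2 <= mode <= 34:
        return 2 * mode - 2      # JEM inserts one new direction between each HEVC pair
    raise ValueError("HEVC intra modes are 0..34")


print([hevc_to_jem_intra_mode(m) for m in (2, 10, 18, 26, 34)])  # [2, 18, 34, 50, 66]
```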
Interpolation Filters for Directional Intra Prediction Modes
• HEVC
2-tap filters
Weight derived from prediction direction
• JEM
4-tap filters
Cubic interpolation for blocks with ≤ 64 samples
Gaussian interpolation filters elsewhere
Parameters fixed according to block size
Same filter for all predicted samples, all modes
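The HEVC 2-tap filter is a linear interpolation between two neighbouring reference samples at 1/32-sample accuracy, with the weight given by the fractional part of the projected position. A sketch for vertical modes with non-negative angle (modes 26 to 34) only; negative angles, which need an extended reference array, and the boundary filtering are not covered, and the function name is hypothetical.

```python
# Sketch: HEVC-style 2-tap angular intra prediction, vertical modes, angle >= 0 (hypothetical name).
def angular_pred_vertical(ref, angle, size):
    """Predict a size x size block from the row above.

    ref[0]: top-left corner sample; ref[1..2*size]: above and above-right samples.
    angle: intraPredAngle in 1/32 sample per row, 0..32.
    """
    if not 0 <= angle <= 32:
        raise ValueError("sketch covers non-negative vertical angles only")
    if len(ref) < 2 * size + 1:
        raise ValueError("need 2*size+1 reference samples")
    pred = []
    for y in range(size):
        pos = (y + 1) * angle
        idx, fact = pos >> 5, pos & 31          # integer and 1/32 fractional displacement
        row = []
        for x in range(size):
            a = ref[x + idx + 1]
            if fact == 0:
                row.append(a)                   # full-sample position: copy
            else:
                b = ref[x + idx + 2]
                row.append(((32 - fact) * a + fact * b + 16) >> 5)   # 2-tap interpolation
        pred.append(row)
    return pred


ref = [10, 1, 2, 3, 4, 5, 6, 7, 8]
print(angular_pred_vertical(ref, 32, 4)[0])  # [2, 3, 4, 5] (mode 34: pure diagonal shift)
```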
Intra Prediction Boundary Filtering
• HEVC
Boundary sample filtering for intra prediction modes 10, 26 (horizontal / vertical)
Local, 1-sample-wide update of the first row or column along the block boundary (first column for vertical, first row for horizontal)
• JEM
Extended to directional modes
Boundary samples up to four columns or rows
2-tap filter for intra modes 2 & 34
3-tap filter for intra modes 3–6 & 30–33
Figure from: JVET-G1001: Algorithm Description of Joint Exploration Test Model 7.
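For the vertical mode 26, the HEVC boundary filter adjusts only the first column of the prediction by half the gradient of the left reference samples, pred[y][0] = clip(top[0] + ((left[y] − corner) >> 1)); mode 10 does the same for the first row. HEVC applies this to luma blocks smaller than 32×32. A sketch for the vertical case with 8-bit clipping by default; the function name is hypothetical.

```python
# Sketch: HEVC-style vertical prediction with first-column boundary filter (hypothetical name).
def vertical_pred_with_boundary_filter(top, left, corner, bit_depth=8):
    """Mode-26 prediction of an n x n block with the first-column boundary filter.

    top: n samples above the block, left: n samples left of it, corner: top-left sample.
    """
    n = len(top)
    max_val = (1 << bit_depth) - 1
    pred = [list(top) for _ in range(n)]          # pure vertical copy of the top row
    for y in range(n):
        v = top[0] + ((left[y] - corner) >> 1)    # half the left-column gradient
        pred[y][0] = min(max(v, 0), max_val)      # clip to the sample range
    return pred


p = vertical_pred_with_boundary_filter([100] * 4, [120, 110, 100, 90], 100)
print([row[0] for row in p])  # [110, 105, 100, 95]
```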
JEM: Cross-Component Linear Model Prediction (CCLM)
• Chroma samples predicted using the corresponding reconstructed luma samples
$\mathrm{pred}_C(i,j) = \alpha \cdot \mathrm{rec}_L'(i,j) + \beta$
• Parameters α and β: minimize the regression error between neighbouring reconstructed luma and chroma samples around the current block
• Further prediction between chroma components with updated parameters
$\mathrm{pred}_{Cr}^{*}(i,j) = \mathrm{pred}_{Cr}(i,j) + \alpha \cdot \mathrm{resi}_{Cb}'(i,j)$
Multiple model CCLM mode (MMLM)
• Neighbouring luma samples and neighbouring chroma samples classified into two groups
• Linear model for each group
Figures from: JVET-G1001: Algorithm Description of Joint Exploration Test Model 7.
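Minimizing the regression error over the N neighbouring sample pairs (L, C) gives the usual least-squares solution α = (N·ΣLC − ΣL·ΣC) / (N·ΣL² − (ΣL)²) and β = (ΣC − α·ΣL) / N. A floating-point sketch, assuming the luma neighbours have already been down-sampled to the chroma grid (rec_L'); the codec itself uses integer arithmetic, and the function names are hypothetical.

```python
# Sketch: CCLM parameter derivation and prediction (floating point, hypothetical names).
def cclm_params(luma, chroma):
    """Least-squares alpha, beta such that chroma ~ alpha * luma + beta over the neighbours."""
    n = len(luma)
    sum_l, sum_c = sum(luma), sum(chroma)
    sum_lc = sum(l * c for l, c in zip(luma, chroma))
    sum_ll = sum(l * l for l in luma)
    den = n * sum_ll - sum_l * sum_l
    alpha = (n * sum_lc - sum_l * sum_c) / den if den != 0 else 0.0  # flat luma: chroma mean
    beta = (sum_c - alpha * sum_l) / n
    return alpha, beta


def cclm_predict(rec_luma_ds, alpha, beta):
    """pred_C(i, j) = alpha * rec_L'(i, j) + beta for a block of down-sampled luma."""
    return [[alpha * v + beta for v in row] for row in rec_luma_ds]


print(cclm_params([10, 20, 30, 40], [23, 43, 63, 83]))  # (2.0, 3.0)
```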
JEM: Position Dependent Intra Prediction Combination for Planar Mode (PDPC)
• Combination of the unfiltered boundary reference samples and HEVC-style intra prediction with filtered boundary reference samples
Position-dependent weighting of filtered and unfiltered reference, configurable by four weighting parameters (hor/ver + corner)
Filtered reference: linear combination of the unfiltered reference and its lowpass-filtered version, configurable weight
Three predefined lowpass filters selectable (3-tap, 5-tap, 7-tap)
Prediction parameters stored per block size
Figure from: JVET-G1001: Algorithm Description of Joint Exploration Test Model 7.
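The position-dependent weighting can be illustrated with four integer weights: c1v and c1h for the unfiltered top and left reference samples, c2v and c2h for the top-left corner, halving every dy rows or dx columns, with the remaining weight going to the prediction so that all weights sum to 128. This weight structure follows our reading of the PDPC description in JVET-G1001 and should be checked against it; all parameter values are hypothetical, the per-block-size parameter tables are not reproduced, and the function name is hypothetical.

```python
# Sketch of a PDPC-like position-dependent combination (hypothetical names and parameters).
def pdpc_like(q, top, left, corner, c1v, c2v, c1h, c2h, dx=1, dy=1):
    """Combine an n x n prediction q with unfiltered boundary reference samples.

    top/left: n unfiltered reference samples; corner: unfiltered top-left sample.
    Weights halve every dy rows (vertical terms) or dx columns (horizontal terms).
    """
    n = len(q)
    out = []
    for y in range(n):
        wv1, wv2 = c1v >> (y // dy), c2v >> (y // dy)
        row = []
        for x in range(n):
            wh1, wh2 = c1h >> (x // dx), c2h >> (x // dx)
            wq = 128 - wv1 + wv2 - wh1 + wh2          # remaining weight for q
            acc = (wv1 * top[x] - wv2 * corner + wh1 * left[y] - wh2 * corner
                   + wq * q[y][x] + 64)
            row.append(acc >> 7)                     # divide by 128 with rounding
        out.append(row)
    return out


flat = pdpc_like([[50] * 4] * 4, [50] * 4, [50] * 4, 50, 32, 8, 32, 8)
print(flat[0])  # [50, 50, 50, 50] (a flat signal stays flat)
```

The influence of the boundary references fades with distance from the block edge, so samples far from the boundary keep the original prediction.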
Mode-dependent Intra Reference Sample Smoothing (MDIS)
• HEVC
Bi-linear smoothing
Depending on prediction block size
• Temporarily adopted in JEM (removed in JEM7)
Adaptive reference sample smoothing (ARSS)
3-tap LPF with the coefficients [1, 2, 1] / 4
5-tap LPF with the coefficients [2, 3, 6, 3, 2] / 16
Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
Inter Prediction
Prediction from reference picture lists
• Uni-prediction
P-slices only with List0, B-slices with List0 or List1
HEVC: Minimum PB size 8×4 or 4×8
• Bi-prediction, only in B-slices
One predictor from List0, one predictor from List1
HEVC: Minimum prediction block size 8×8
Motion Compensated Prediction
• Merge mode
Motion vector (MV) derived from candidate set
(spatial and temporal neighborhood)
Merge mode candidate index coded
No motion vector difference encoded
• Advanced motion vector prediction
Predictor derived from candidate set
(spatial and temporal neighborhood)
Predictor index coded
Motion vector difference encoded
• Skip mode
Only merge candidate signaled, no residual
HEVC: Motion Vector Representation
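The difference between merge and AMVP comes down to how the final motion vector is formed. The sketch assumes the candidate lists are already built (their construction from the spatial/temporal neighbourhood is not shown) and keeps only the motion vector, whereas real merge mode also inherits reference indices and prediction direction; `MV`, `merge_mv` and `amvp_mv` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MV:
    x: int  # horizontal component, quarter-sample units
    y: int  # vertical component, quarter-sample units


def merge_mv(candidates, merge_idx):
    """Merge/skip: copy the motion of the indexed candidate; no MVD is coded."""
    return candidates[merge_idx]


def amvp_mv(predictors, mvp_idx, mvd):
    """AMVP: indexed predictor plus the transmitted motion vector difference."""
    p = predictors[mvp_idx]
    return MV(p.x + mvd.x, p.y + mvd.y)
```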
• CU: at most one set of motion parameters for each prediction direction
• Option to split large CU into sub-CUs
Alternative temporal motion vector prediction (ATMVP)
Fetch multiple sets of motion information from multiple blocks in collocated reference picture
Spatial-temporal motion vector prediction (STMVP)
Sub-CU motion vectors derived recursively from the temporal motion vector predictor and spatial neighbouring motion vectors
• ATMVP and STMVP: additional merge candidates (list extended to max 7)
JEM: Sub-CU based motion vector prediction
Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
• Locally adaptive motion vector resolution (LAMVR)
motion vector difference (MVD) coded in units of
quarter luma samples,
integer luma samples, or
four luma samples
• Higher motion vector storage accuracy
Internal motion vector storage and merge candidate at 1/16 pel (skip and merge modes only)
SHVC upsampling interpolation filters for the additional fractional pel positions
JEM Motion Vector Representation
SHVC: Scalable High Efficiency Video Coding, HEVC Annex G
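With LAMVR, a coded MVD component only has to be scaled back to the quarter-sample grid, and internally stored motion moves to the 1/16-sample grid. The helper names and resolution labels below are hypothetical, and rounding of the MV predictors to the selected resolution is not shown.

```python
# Left shift from each MVD resolution to quarter-sample units (hypothetical labels).
MVD_SHIFT = {"quarter": 0, "integer": 2, "four": 4}


def mvd_to_quarter(coded_mvd, resolution):
    """Decoder side: scale a coded MVD component back to quarter-sample units."""
    return coded_mvd * (1 << MVD_SHIFT[resolution])


def quarter_to_mvd(mvd_quarter, resolution):
    """Encoder side: express a quarter-sample MVD in the selected resolution."""
    step = 1 << MVD_SHIFT[resolution]
    if mvd_quarter % step:
        raise ValueError("MVD not representable at this resolution")
    return mvd_quarter // step


def quarter_to_sixteenth(mv_quarter):
    """Internal storage at 1/16-sample accuracy."""
    return mv_quarter * 4
```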
• Overlapped Block Motion Compensation (OBMC), previously used in ITU-T H.263
• Switchable on CU level
Applied at all motion compensation block boundaries except the right and bottom boundaries of the CU
Applied for both the luma and chroma components
Performed at sub-block level for all MC block boundaries
JEM: Overlapped Block Motion Compensation
Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
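A sketch of the OBMC blending for the rows next to the upper boundary of a sub-block, where `p_n` is the prediction obtained with the motion of the neighbour above. The weights {1/4, 1/8, 1/16, 1/32} for `p_n` (and the complementary 3/4 ... 31/32 for the current prediction) follow our reading of the JEM description and should be treated as an assumption; the left neighbour would be blended column-wise in the same way (not shown).

```python
# Weights of the neighbour-MV prediction for rows 0..3 from the boundary
# (assumed from the JEM description); the current prediction gets 1 - w.
OBMC_NEIGHBOUR_WEIGHTS = [1 / 4, 1 / 8, 1 / 16, 1 / 32]


def obmc_blend_top(p_c, p_n, rows=4):
    """Blend the current prediction p_c with p_n (predicted with the above
    neighbour's MV) in the `rows` rows adjacent to the top boundary."""
    out = [list(row) for row in p_c]
    for y in range(min(rows, len(p_c))):
        w = OBMC_NEIGHBOUR_WEIGHTS[y]
        out[y] = [(1 - w) * c + w * n for c, n in zip(p_c[y], p_n[y])]
    return out
```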
• Linear model for illumination changes using a scaling factor a and an offset b (concept taken from 3D-HEVC)
• Enabled or disabled adaptively for each inter-mode coded coding unit (CU)
• Least-squares method employed to derive the parameters a and b
• CU in 2N×2N merge mode
LIC flag copied from neighbouring blocks (like merge)
Otherwise, LIC flag at CU level
JEM: Local Illumination Compensation (LIC)
Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
• Motion vector field (MVF) for CU, applicable MV derived for each
4×4 block at 1/16 pel resolution
Control point motion vector (CPMV)
• AF INTER mode
Signaling CPMV difference from predictor
Block width and height ≥ 8 required
• AF MERGE mode
Derivation of CPMV from neighborhood
JEM: Affine Motion Vector Derivation for MC
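$v_x = \frac{v_{1x} - v_{0x}}{w}\,x - \frac{v_{1y} - v_{0y}}{w}\,y + v_{0x}$
$v_y = \frac{v_{1y} - v_{0y}}{w}\,x + \frac{v_{1x} - v_{0x}}{w}\,y + v_{0y}$
with $(v_{0x}, v_{0y})$ the top-left CPMV, $(v_{1x}, v_{1y})$ the top-right CPMV and $w$ the CU width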
Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
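The affine equations can be evaluated per 4×4 sub-block as below, assuming CPMVs in 1/16-sample units and evaluation at each sub-block's centre; Python's `round` (round-half-to-even) stands in for the JEM's integer rounding.

```python
def affine_mv_field(v0, v1, width, height, sub=4):
    """4-parameter affine model: v0 = top-left CPMV, v1 = top-right CPMV.
    Returns one MV per sub x sub block, evaluated at the sub-block centre."""
    (v0x, v0y), (v1x, v1y) = v0, v1
    a = (v1x - v0x) / width  # zoom/rotation parameters of the model
    b = (v1y - v0y) / width
    field = []
    for by in range(0, height, sub):
        row = []
        for bx in range(0, width, sub):
            x, y = bx + sub / 2, by + sub / 2
            vx = a * x - b * y + v0x
            vy = b * x + a * y + v0y
            row.append((round(vx), round(vy)))
        field.append(row)
    return field
```

Equal CPMVs reduce the model to pure translation, which is a quick sanity check.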
• Special merge mode based on Frame-Rate Up Conversion (FRUC) techniques
Options for
Bilateral matching
Template matching (applicable also for AMVP mode, CU level only)
• Motion vector derivation process
Initial motion vector for CU of size 𝑊 × 𝐻
Sub-CU motion refinement for blocks of size 𝑀 × 𝑀
$M = \max\{4, \min\{W/2^{D}, H/2^{D}\}\}$, with predefined splitting depth $D$
JEM: Pattern Matched Motion Vector Derivation (PMMVD)
Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
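The sub-CU size rule is a one-liner for power-of-two CU sizes; the default depth D = 3 used here is an assumption based on the JEM description.

```python
def sub_cu_size(width, height, depth=3):
    """M = max(4, min(W / 2**D, H / 2**D)) for power-of-two CU sizes."""
    return max(4, min(width >> depth, height >> depth))
```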
• Sample-wise motion refinement on top of block-wise motion compensation for bi-prediction
• No extra signaling, applied on 4×4 block basis
• MVF determined by minimizing difference Δ between points 𝐴 and 𝐵 on trajectory
by Taylor expansion
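$\Delta = \left(I^{(0)} - I^{(1)}\right) + v_x\left(\tau_1 \frac{\partial I^{(1)}}{\partial x} + \tau_0 \frac{\partial I^{(0)}}{\partial x}\right) + v_y\left(\tau_1 \frac{\partial I^{(1)}}{\partial y} + \tau_0 \frac{\partial I^{(0)}}{\partial y}\right)$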
• Limited search window
• Optimized search
First vertical, then horizontal search
Memory usage: only access samples
inside block
JEM: Bi-directional optical flow (BIO)
Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
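A sketch of the BIO velocity estimate for one window, assuming the motion-compensated predictions I(0), I(1) and their gradients are given as flat lists and τ0, τ1 are the temporal distances to the two references. Following the slide's "first vertical, then horizontal" order, v_y is solved with v_x = 0 and v_x is then solved given v_y; the regulariser `reg` is a hypothetical safeguard for flat windows, and the JEM's clipping and integer arithmetic are omitted.

```python
def bio_refine(i0, i1, gx0, gx1, gy0, gy1, tau0, tau1, reg=1e-6):
    """Estimate (vx, vy) for one window by minimising sum(delta**2), where
    delta = (I0 - I1) + vx*(tau1*Gx1 + tau0*Gx0) + vy*(tau1*Gy1 + tau0*Gy0)."""
    d = [a - b for a, b in zip(i0, i1)]
    gx = [tau1 * g1 + tau0 * g0 for g0, g1 in zip(gx0, gx1)]
    gy = [tau1 * g1 + tau0 * g0 for g0, g1 in zip(gy0, gy1)]
    s_xx = sum(g * g for g in gx)
    s_yy = sum(g * g for g in gy)
    s_xy = sum(a * b for a, b in zip(gx, gy))
    s_dx = sum(a * b for a, b in zip(d, gx))
    s_dy = sum(a * b for a, b in zip(d, gy))
    vy = -s_dy / (s_yy + reg)                 # vertical first (vx assumed 0)
    vx = -(s_dx + vy * s_xy) / (s_xx + reg)   # then horizontal, given vy
    return vx, vy
```

When the horizontal and vertical gradient patterns are uncorrelated, this sequential solution coincides with the joint least-squares solution.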
• MVs of bi-prediction refined by bilateral template matching process
• Search between bilateral template and reference pictures
⇒ refined MV without further signaling
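• Applied only if one reference picture precedes and the other follows the current picture: $\mathrm{POC}_{\mathrm{ref},i} < \mathrm{POC}_{\mathrm{curr}} < \mathrm{POC}_{\mathrm{ref},j}$
• Not applied if any of the following is enabled in the CU: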
LIC,
Affine motion,
FRUC, or
sub-CU merge candidate
JEM: Decoder-side Motion Vector Refinement (DMVR)
Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
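An integer-sample sketch of DMVR: the bilateral template is the rounded average of the two initial predictions, and each list's MV is replaced by the best of nine candidates (the original MV and its eight one-sample neighbours) by SAD against the template. The nine-candidate pattern reflects our reading of the JEM description; fractional-sample interpolation and picture-boundary handling are not modelled, so the search window must stay inside `ref`.

```python
def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def fetch(ref, x, y, w, h):
    """w x h block of `ref` (list of rows) with top-left corner at (x, y)."""
    return [row[x:x + w] for row in ref[y:y + h]]


def dmvr(ref0, ref1, x, y, w, h, mv0, mv1):
    """Refine integer MVs mv0/mv1 = (dx, dy) of a bi-predicted w x h block at (x, y)."""
    p0 = fetch(ref0, x + mv0[0], y + mv0[1], w, h)
    p1 = fetch(ref1, x + mv1[0], y + mv1[1], w, h)
    # Bilateral template: rounded average of the two initial predictions.
    template = [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]
    offsets = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    refined = []
    for ref, (mx, my) in ((ref0, mv0), (ref1, mv1)):
        def cost(o):
            block = fetch(ref, x + mx + o[0], y + my + o[1], w, h)
            return (sad(template, block), o != (0, 0))  # ties keep the original MV
        best = min(offsets, key=cost)
        refined.append((mx + best[0], my + best[1]))
    return refined
```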
Residual Coding
• Transform block sizes 4×4, 8×8, 16×16, and 32×32
Integer approximations of the DCT-II transform matrix
• Additionally, integer approximation of 4×4 DST-VII transform matrix
• ’Single-norm’ design per transform block size → simple quantizer implementation
• Not all perfectly orthogonal, leakage below normalization threshold
HEVC Core Transforms
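The "not all perfectly orthogonal" remark can be checked on the 4×4 HEVC matrices. The sketch computes the Gram matrix M·Mᵀ of the 4-point integer DCT-II and of the 4×4 DST-VII matrix: all row norms are close to 128², and the DCT matrix is exactly orthogonal, while the DST-VII matrix shows small off-diagonal terms.

```python
# HEVC 4x4 core transform matrices (rows are basis functions, scaled by ~128).
DCT4 = [[64, 64, 64, 64],
        [83, 36, -36, -83],
        [64, -64, -64, 64],
        [36, -83, 83, -36]]
DST7_4 = [[29, 55, 74, 84],
          [74, 74, 0, -74],
          [84, -29, -74, 55],
          [55, -84, 74, -29]]


def gram(m):
    """Return M * M^T; an orthogonal matrix with equal row norms gives a scaled identity."""
    return [[sum(a * b for a, b in zip(r, s)) for s in m] for r in m]


for name, m in (("DCT-II", DCT4), ("DST-VII", DST7_4)):
    g = gram(m)
    print(name, "norms:", [g[i][i] for i in range(4)],
          "max off-diagonal:", max(abs(g[i][j]) for i in range(4) for j in range(4) if i != j))
```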
• Quantizer step size Δq derived from quantization parameter QP
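• Exponential relation of quantizer step sizes
• Double step size every 6 QP
$\Delta_q(\mathrm{QP}+1) = \sqrt[6]{2}\,\Delta_q(\mathrm{QP})$
• Definition: Δq = 1 for QP = 4, thereby
$\Delta_{q,0} = \left(2^{-4/6}, 2^{-3/6}, 2^{-2/6}, 2^{-1/6}, 1, 2^{1/6}\right)$
• Quantizer step sizes for given QP
$\Delta_q(\mathrm{QP}) = \Delta_{q,0}(\mathrm{QP} \bmod 6) \cdot 2^{\lfloor \mathrm{QP}/6 \rfloor}$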
Quantizer Implementation
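The step-size formula can be checked numerically. The sketch also compares Δq,0 with the HEVC dequantization scale factors {40, 45, 51, 57, 64, 72}, which are 64·Δq,0 rounded to integers; the integer scaling and shifts of the actual specification are not shown.

```python
# Delta_q,0 for QP mod 6 = 0..5, normalised so that Delta_q = 1 at QP = 4.
DELTA_Q0 = [2 ** ((k - 4) / 6) for k in range(6)]
# HEVC dequantization scale factors (levelScale), i.e. roughly 64 * DELTA_Q0.
LEVEL_SCALE = [40, 45, 51, 57, 64, 72]


def step_size(qp):
    """Quantizer step size: Delta_q,0(QP mod 6) * 2**floor(QP / 6)."""
    return DELTA_Q0[qp % 6] * 2 ** (qp // 6)


for qp in (0, 4, 10, 22, 37, 51):
    print(qp, round(step_size(qp), 3))
```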
• Large block-size transforms with high-frequency zeroing
Maximum transform size up to 128 × 128
Coefficients with column (row) index > 32 set to 0 if block width (height) > 64
• Adaptive multiple core transform (AMT)
Transform matrices quantized more accurately
Applicable for block sizes ≤ 64 × 64
Indicated by CU flag
Mode-dependent transform-set selection
for intra prediction modes
JEM Transforms
Tables from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
• Motivation
Remaining correlation between coefficients after primary transform!
Dependency on intra prediction mode!
• Approach: mode-dependent transforms (had been studied as a tool during HEVC development)
• MDNSST Structure:
35×3 non-separable secondary transforms for both 4×4 and 8×8 block size
3 NSST candidates for each intra prediction mode
Application of transposed transform blocks for modes > 34
JEM: Mode-Dependent Non-separable Secondary Transforms (MDNSST)
Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
• Only applied to the low frequency coefficients after the primary transform
For blocks ≥ 8 × 8, application of 8 × 8 transform to lowest frequency coefficients of primary transform
For blocks < 8 × 8, application of 4 × 4 transform to lowest frequency coefficients of primary transform
• Implementation by Hypercube-Givens Transform (HyGT)
• Two rounds for 4 × 4, four rounds for 8 × 8 secondary transforms
JEM: Mode-Dependent Non-separable Secondary Transforms (MDNSST)
Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
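A structural sketch of a Hypercube-Givens transform on a 16-element vector (a 4×4 block scanned row by row): each round has log2(16) = 4 passes, and pass p rotates all 8 index pairs that differ only in bit p. The angles below are made up (the JEM uses trained angles per transform), and any permutation stages of the actual JEM design are omitted. Because every step is a rotation, signal energy is preserved.

```python
import math


def givens_pass(x, angles, bit):
    """Rotate every index pair (i, i | 2**bit) with bit `bit` of i cleared."""
    y = list(x)
    lows = [i for i in range(len(x)) if not i & (1 << bit)]
    for theta, i in zip(angles, lows):
        j = i | (1 << bit)
        c, s = math.cos(theta), math.sin(theta)
        y[i] = c * x[i] - s * x[j]
        y[j] = s * x[i] + c * x[j]
    return y


def hygt(x, rounds):
    """rounds[r][p] lists the len(x)/2 angles of pass p in round r; each round has
    log2(len(x)) passes, one per hypercube dimension."""
    n_bits = len(x).bit_length() - 1
    for round_angles in rounds:
        for bit in range(n_bits):
            x = givens_pass(x, round_angles[bit], bit)
    return x


def flatten_4x4(block, transpose=False):
    """Row-major 16-vector; transpose first, e.g. for intra modes above 34."""
    if transpose:
        block = [list(col) for col in zip(*block)]
    return [v for row in block for v in row]
```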
• Searching N similar patches in reconstructed region of picture, based on template
• Scheme of KLT matrix derivation:
Collection of N prediction residuals: $U = (u_1, u_2, \ldots, u_N)$
Covariance matrix $\Sigma = U U^{T}$
Eigenvectors are KLT bases
• Application of proposed KLT on 4×4, 8×8, 16×16 and 32×32 coding blocks
• Note: Tool not activated in JVET Common Testing Conditions [JVET-G1010]
JEM: Signal dependent transform
Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
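A sketch of the KLT derivation from N residual vectors, with a small cyclic Jacobi eigensolver standing in for a library routine. Patch search and template matching are not shown; the residuals are assumed to be given as flattened vectors.

```python
import math


def covariance(residuals):
    """Sigma = U U^T for residual vectors u_1..u_N (each a list of length d)."""
    d = len(residuals[0])
    return [[sum(u[i] * u[j] for u in residuals) for j in range(d)] for i in range(d)]


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix.
    Returns (eigenvalues, eigenvectors as columns)."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for m in (a, v):  # column update: M <- M J
                    for k in range(n):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # row update: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def klt_basis(residuals):
    """KLT basis vectors (rows) sorted by decreasing eigenvalue, i.e. energy."""
    vals, vecs = jacobi_eigh(covariance(residuals))
    order = sorted(range(len(vals)), key=lambda k: -vals[k])
    basis = [[vecs[i][k] for i in range(len(vals))] for k in order]
    return basis, [vals[k] for k in order]
```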
Loop Filtering
• HEVC deblocking filter also used in JEM
Filtering at prediction and transform block edges on an 8 × 8 grid
Independent operation on 8 × 8 blocks enables parallel processing
• Deblocking filtering
Boundary processed in 4-sample sections (edges)
Filter strength determined based on analysis of top
and bottom rows of edge
Normal: Filtering of maximum two samples into block
Strong: Up to four samples into block
Deblocking Filter
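The on/off and normal/strong decisions for one 4-line edge segment can be sketched as below, following the HEVC decision rules as we understand them: second-derivative activity is measured on the first and last lines of the segment and compared with β, and the strong filter additionally requires flat sides and a small step relative to tC. β and tC are plain inputs here (in HEVC they come from QP-indexed tables).

```python
def hevc_edge_decision(p, q, beta, tc):
    """Decide 'off', 'normal' or 'strong' for one 4-line edge segment.
    p[i][k] / q[i][k]: sample k (k = 0 next to the edge) on line i, k = 0..3."""
    def dp(i):
        return abs(p[i][2] - 2 * p[i][1] + p[i][0])

    def dq(i):
        return abs(q[i][2] - 2 * q[i][1] + q[i][0])

    d0, d3 = dp(0) + dq(0), dp(3) + dq(3)  # first and last line of the segment
    if d0 + d3 >= beta:
        return "off"  # too much texture: the edge is likely real content

    def strong_ok(i, d):
        return (2 * d < (beta >> 2)
                and abs(p[i][3] - p[i][0]) + abs(q[i][0] - q[i][3]) < (beta >> 3)
                and abs(p[i][0] - q[i][0]) < ((5 * tc + 1) >> 1))

    return "strong" if strong_ok(0, d0) and strong_ok(3, d3) else "normal"
```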
• HEVC SAO filtering also used in JEM
• Local processing of samples
Depending on local neighborhood (edge offset)
Direction signaled, smoothing only
Depending on sample value (band offset)
Configurable correction of sample intensity
values for four transition bands
• Operation independent of processed samples
→ parallel processing
• Local filter parameter adaptation
• Four different offset values available (plus SAO off)
• Dedicated SAO parameters for Y, Cb, Cr
Common SAO mode for chroma components
Sample Adaptive Offset Filter (SAO)
edge offset
band offset
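A sketch of the two SAO classifiers. For edge offset, HEVC constrains the offsets to be positive for categories 1 and 2 and negative for categories 3 and 4, which is why edge offset only smooths. For band offset, the sample range is split into 32 equal bands and offsets are applied to four consecutive bands starting at the signalled band position; the 10-bit default is an assumption.

```python
def sao_eo_category(a, c, b):
    """Edge-offset category of sample c given its two neighbours a, b along the
    signalled direction (0 = no offset)."""
    if c < a and c < b:
        return 1  # local minimum
    if (c < a and c == b) or (c == a and c < b):
        return 2  # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3  # convex corner
    if c > a and c > b:
        return 4  # local maximum
    return 0


def sao_band_offset(sample, band_position, offsets, bit_depth=10):
    """Band offset: 32 equal bands; only four consecutive bands get an offset."""
    k = (sample >> (bit_depth - 5)) - band_position
    if 0 <= k < 4:
        sample += offsets[k]
    return max(0, min((1 << bit_depth) - 1, sample))
```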
• First loop filter in the decoding process chain of JEM
• Each luma sample in reconstructed TU is replaced by weighted average of itself and its neighbours within TU
Sample located at (i, j), neighbouring sample at (k, l)
I(i, j) and I(k, l): reconstructed intensity values
$\sigma_d$: spatial parameter (depends on transform size and prediction mode)
$\sigma_r$: range parameter (depends on QP)
$\omega(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \frac{\left(I(i,j) - I(k,l)\right)^2}{2\sigma_r^2}\right)$
$I_F(i,j) = \frac{\sum_{k,l} I(k,l)\,\omega(i,j,k,l)}{\sum_{k,l} \omega(i,j,k,l)}$
Integer implementation with look-up table
for division
JEM: Bilateral filter
Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
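A floating-point sketch of the bilateral filter formula. The 4-connected (plus-shaped) neighbourhood is an assumption about the JEM's filter support, and σd and σr are passed in directly instead of being derived from transform size, prediction mode and QP.

```python
import math


def bilateral_filter(block, sigma_d, sigma_r):
    """Replace each sample by the weighted average of itself and its
    4-connected neighbours inside the block."""
    h, w = len(block), len(block[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num = den = 0.0
            for k, l in ((i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= k < h and 0 <= l < w:
                    wgt = math.exp(-((i - k) ** 2 + (j - l) ** 2) / (2 * sigma_d ** 2)
                                   - (block[i][j] - block[k][l]) ** 2 / (2 * sigma_r ** 2))
                    num += block[k][l] * wgt
                    den += wgt
            out[i][j] = num / den
    return out
```

A small σr keeps strong edges intact (the range term drives their weights to zero), while a large σr lets isolated spikes be smoothed.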
• Luma component
25 filters available for each 2×2 block, based on direction and activity of local gradients
Diamond filter shapes (3 × 3, 5 × 5, 7 × 7)
Classification into 25 classes, based on
Activity index
Directionality index
• Chroma components
Diamond filter shape 5 × 5
No classification
Single set of filter coefficients
• Geometric transformations based on data from classification
Transpose, vertical flip, rotation
• Filter coefficients signaled with 1st CTU, FIFO buffering for temporal prediction in inter pictures, 16 candidate
sets for intra pictures
JEM: Adaptive loop filter (ALF)
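A sketch of the luma classification step, assuming the gradient sums g_h, g_v, g_d0, g_d1 over the local window and the quantized activity Â ∈ {0, ..., 4} are already computed. The thresholds t1 = 2 and t2 = 4.5 and the class index C = 5D + Â follow our reading of the JEM description.

```python
def alf_directionality(g_h, g_v, g_d0, g_d1, t1=2.0, t2=4.5):
    """Directionality D in {0..4} from horizontal, vertical and two diagonal gradient sums."""
    hv_max, hv_min = max(g_h, g_v), min(g_h, g_v)
    d_max, d_min = max(g_d0, g_d1), min(g_d0, g_d1)
    if hv_max <= t1 * hv_min and d_max <= t1 * d_min:
        return 0  # no dominant direction
    # Compare hv_max/hv_min with d_max/d_min by cross-multiplication (no division).
    if hv_max * d_min > d_max * hv_min:
        return 2 if hv_max > t2 * hv_min else 1  # horizontal/vertical (strong/weak)
    return 4 if d_max > t2 * d_min else 3  # diagonal (strong/weak)


def alf_class(direction, activity):
    """Class index C = 5 * D + A_hat, giving 25 classes for D, A_hat in 0..4."""
    return 5 * direction + activity
```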
Entropy Coding
• Fixed length and variable length codes (FLC, VLC)
High-level syntax
Parameter sets, slice segment header
SEI messages
Fixed-length codes, Exp-Golomb codes
• Arithmetic coding
Slice level, CTUs
Context-based adaptive coding
Bypass coding (complexity, throughput)
Entropy Coding
CTU = Coding Tree Unit
SEI = Supplemental Enhancement Information
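The Exp-Golomb codes used for high-level syntax can be sketched as follows: ue(v) writes v + 1 in binary, preceded by one fewer zeros than that binary string has bits, and se(v) maps the signed values 1, −1, 2, −2, ... onto ue code numbers 1, 2, 3, 4, .... Bit strings are used for readability instead of a bit writer, and the function names are illustrative.

```python
def ue(v):
    """ue(v): (bit length of v+1) - 1 leading zeros, then v+1 in binary."""
    code = v + 1
    return "0" * (code.bit_length() - 1) + format(code, "b")


def se(v):
    """se(v): map 1, -1, 2, -2, ... onto ue code numbers 1, 2, 3, 4, ..."""
    return ue(2 * v - 1 if v > 0 else -2 * v)


def decode_ue(bits, pos=0):
    """Parse one ue(v) codeword from a bit string; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end
```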
• VCL NAL Unit
FLC, VLC for header information
CABAC for CTUs
Byte alignment in case of multiple tiles, or with wavefront parallel
processing (not present otherwise)
Fixed and Variable Length Coding
NAL = Network Abstraction Layer
VCL = Video Coding Layer
CABAC = Context-based Adaptive Binary Arithmetic Coding
ba = byte alignment
• Arithmetic coding engine
Binarization
Context model selection
Binary arithmetic coding
Optimized binarization design
Reduced number of non-bypass
bins compared to H.264 | AVC
• JEM
Modified context modeling for transform coefficients
Multi-hypothesis probability estimation with context-dependent updating speed
Adaptive initialization for context models
Context-Based Adaptive Binary Arithmetic Coding (CABAC)
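A sketch of a two-rate probability estimate in the spirit of the JEM's multi-hypothesis scheme: two 15-bit estimates are updated with different shifts and averaged. The shift values 4 and 7 are illustrative assumptions (the slide notes that the JEM makes the update speed context-dependent), and the arithmetic coding engine itself is not shown.

```python
PROB_BITS = 15
ONE = 1 << PROB_BITS  # probability 1.0 in 15-bit fixed point


class TwoRateContext:
    """Probability that the next bin is 1, kept as two estimates with different
    adaptation speeds (larger shift = slower, more stable)."""

    def __init__(self, p_init=ONE // 2, fast_shift=4, slow_shift=7):
        self.p = [p_init, p_init]
        self.shifts = (fast_shift, slow_shift)

    def prob_one(self):
        """Combined estimate: average of the two hypotheses."""
        return (self.p[0] + self.p[1]) >> 1

    def update(self, bin_val):
        """Move each estimate towards the observed bin by a fraction 2**-shift."""
        target = ONE if bin_val else 0
        for k, shift in enumerate(self.shifts):
            self.p[k] += (target - self.p[k]) >> shift
```

The fast estimate tracks local statistics quickly, while the slow one damps noise; averaging them trades off the two.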
81. Part IV: Versatile Video Coding
• Experimental software “Joint Exploration Model“ (JEM) developed by JVET
Intended to investigate potential for better compression beyond HEVC
Initially created by extending the HEVC software with additional compression tools or by replacing existing tools (see previous section)
• Substantial benefit was shown over HEVC, both in subjective quality and objective metrics
Demonstrated in the "Call for Evidence" (July 2017)
JEM was however not designed for becoming a standard (regarding all design tradeoffs)
Call for Proposals was issued by MPEG and VCEG (October 2017)
• Call for Proposals very successful (responses received by April 2018)
32 companies in 21 proponent groups responded
46 category-specific submissions: 22 in SDR, 12 each in HDR and 360° video
All responses clearly better than HEVC, some evidently better than JEM
This marked the starting point for VVC development
Steps towards next generation standard – Versatile Video Coding (VVC)
• Document JVET-H1002
• Test categories
Standard dynamic range (SDR): 5 UHD and 5 HD sequences
High dynamic range (HDR): 3 HLG and 5 PQ sequences
360° video (360): 5 sequences in ERP format
• Constraint sets
Constraint set 1 (C1): Random access configuration
Max 1.1s random access intervals, structural delay max 16 pictures
Constraint set 2 (C2): Low delay configuration only evaluated for SDR HD sequences
No picture reordering between input and output
• Encoding constraints
No pre-processing, post-processing only within the coding loop
Static quantizer setting with one-time change to meet target bitrate
Relevant optimization methods to be reported
Joint Call for Proposals (CfP) on Video Compression with Capability beyond HEVC
UHD = Ultra High Definition, HD = High Definition, HLG = Hybrid Log-Gamma, PQ = Perceptual Quantizer (both specified in ITU-R BT.2100), ERP = Equirectangular Projection
• SDR-A: 3840×2160
• SDR-B: 1920×1080
• HDR (PQ HD, HLG 4K)
• 360 Video (8K, 6K)
VVC CfP Test Sequences
FoodMarket4 60p CatRobot1 60p DaylightRoad2 60p ParkRunning3 50p Campfire 30p
BasketballDrive 50p Cactus 50p BQTerrace 60p RitualDance 60p MarketPlace 60p
Market3 HD50p Hurdles HD50p Starting HD50p ShowGirls2 HD25p Cosmos1 HD24p
DayStreet 60p PeopleInShop... SunsetBeach 60p
ChairliftRide 30p KiteFlite 30p Harbor 30p Trolley 30p Balboa 60p
• Category-specific submissions (total 46):
SDR: 22 submissions (8 of which are registered only in this category)
HDR: 12 submissions
360°: 12 submissions (2 of which are registered only in this category)
For all categories: HEVC anchors (HM) and JEM anchors
• Proposals
Described in JVET input documents JVET-J0011...JVET-J0033
Participation of 32 institutions
VVC CfP Responses
JVET documents available at http://phenix.it-sudparis.eu/jvet
• Submissions had to provide coded/decoded sequences
4 rate points each, two constraint conditions "low delay" (LD) and "random access" (RA)
SDR: 5x HD (both LD and RA), 5x UHD-4K (only RA)
HDR: 5x HD (PQ grading), 3x UHD-4K (HLG grading)
360°: 5 sequences 6K/8K for the full panorama
• Double stimulus test with two hidden anchors HEVC-HM & JEM
Rate points defined so that the lowest rate typically gave less than "fair" quality for HEVC, but was still codable
Quality was judged to be distinguishable when confidence intervals were non-overlapping
• Evaluation: Three ways of judging benefit:
Mean MOS over all test cases (28×4 test points: 23×4 C1, 5×4 C2)
Count cases where a proposal was visually better/worse than JEM
Count cases where a proposal was visually better than HEVC (HEVC at higher rate point)
• Reports: Input subjective test [JVET-J0080], output CfP results [JVET-J1003]
Performance
• Measured by objective performance (PSNR), best performers report >40% bit rate reduction compared
to HEVC, >10% compared to JEM (for SDR case)
Similar ranges for HDR and 360°
As expected, proposals with more elements show better performance
Some proposals showed similar performance as JEM with significant complexity/run time reduction
2 proposals used some degree of subjective optimization, not measurable by PSNR
• Results of subjective tests generally show similar (or even better) tendency
Benefit over HEVC very clear
Benefit over JEM visible at various points
Proposals with subjective optimization also showing benefit in some cases
Performance
• JVET-J1003 (report of subjective evaluation) contains 28 plots as shown, one per sequence
• Count significant cases of positive/negative benefit against JEM, i.e. cases with non-overlapping confidence intervals
Performance
[Figure: MOS per rate point for the HM and JEM anchors and the proposals ranked by MOS; +1 credit / -1 credit for a proposal significantly better / worse than JEM]
• The "mean" and "significance-count" methods both indicated at least 7 proposals that were clearly better than JEM
Performance SDR
Mean MOS (ranked): Pxx 6.53, Pxx 6.46, Pxx 6.41, Pxx 6.37, Pxx 6.33, Pxx 6.33, Pxx 6.26, Pnn 6.23, Pnn 6.17, Pnn 6.15, Pnn 6.13, Pnn 6.11, Pnn 6.04, Pnn 6.04, Pnn 6.03, Pnn 6.03, Pnn 6.01, JEM 6.01, Pnn 6.00, Pnn 5.96, Pnn 5.94, Pnn 5.88, Pnn 5.86, HM 4.57
Significance count vs. JEM (ranked, scale -60 ... +60): Pxx 10, Pxx 8, Pxx 8, Pxx 6, Pxx 6, Pxx 6, Pxx 6, Pnn 3, Pnn 3, Pnn 2, Pnn 2, Pnn 1, Pnn 1, JEM 0, Pnn 0, Pnn -1, Pnn -1, Pnn -1, Pnn -2, Pnn -2, Pnn -2, Pnn -3, Pnn -4, HM -36
• Similar tendency in HDR and 360° categories
• Mostly the same coding tools as in SDR provide good benefit
Performance HDR / 360°
HDR
Mean MOS (ranked): Pxx 6.04, Pxx 6.00, Pxx 5.94, Pxx 5.93, Pxx 5.86, Pnn 5.85, Pnn 5.80, Pnn 5.67, JEM 5.62, Pnn 5.60, Pnn 5.59, Pnn 5.45, Pnn 5.11, HM 4.14
Significance count vs. JEM (ranked, scale -32 ... +32): Pxx 7, Pxx 3, Pxx 2, Pxx 2, Pxx 2, Pnn 1, Pnn 1, JEM 0, Pnn 0, Pnn 0, Pnn -1, Pnn -1, Pnn -6, HM -20
360°
Mean MOS (ranked): Pxx 6.20, Pxx 6.19, Pxx 6.06, Pxx 6.03, Pxx 5.99, Pxx 5.96, Pxx 5.86, Pnn 5.69, Pnn 5.67, Pnn 5.51, Pnn 5.45, JEM 5.11, HM 3.79, Pnn 3.45
Significance count vs. JEM (ranked, scale -20 ... +20): Pxx 9, Pxx 9, Pxx 8, Pnn 7, Pxx 7, Pxx 6, Pxx 5, Pxx 4, Pnn 2, Pnn 1, Pnn 1, JEM 0, HM -9, Pnn -12
• How often is the best-performing proposal better than HEVC at a higher rate?
• Note: R1: 1 Mbit/s; R2: 1.6 Mbit/s; R3: 2.8 Mbit/s; R4: 4.6 Mbit/s
Performance compared to HEVC
Pbest vs HM R1 vs R2 R1 vs R3 R1 vs R4 R2 vs R3 R2 vs R4 R3 vs R4
SDR UHD 60% 40% 0% 80% 0% 20%
SDR HD/RA 40% 0% 0% 20% 0% 20%
SDR HD/LD 40% 0% 0% 0% 0% 0%
HLG 67% 0% 0% 67% 0% 33%
PQ 40% 0% 0% 40% 0% 20%
360° 40% 20% 0% 20% 0% 60%
Rate saving 37.5% 65% 78% 43% 65% 39%
• How often is HEVC better than the best-performing proposal at a lower rate?
Note: 100% minus the table value is the share of cases in which the best-performing proposal is equal or better
• Note: R1: 1 Mbit/s; R2: 1.6 Mbit/s; R3: 2.8 Mbit/s; R4: 4.6 Mbit/s
Performance compared to HEVC
HM vs Pbest R1 vs R2 R1 vs R3 R1 vs R4 R2 vs R3 R2 vs R4 R3 vs R4
SDR UHD 0% 0% 60% 0% 0% 0%
SDR HD/RA 0% 60% 100% 0% 80% 0%
SDR HD/LD 0% 60% 80% 0% 80% 0%
HLG 0% 0% 100% 0% 67% 0%
PQ 0% 60% 100% 0% 60% 0%
360° 0% 40% 80% 0% 40% 0%
Rate saving 37.5% 65% 78% 43% 65% 39%
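The "Rate saving" rows follow directly from the rate points in the note: comparing a proposal at rate Ra with HEVC at a higher rate Rb saves 1 − Ra/Rb. The sketch reproduces the row to within rounding (R1 vs R3 gives about 64%).

```python
RATES_MBPS = {"R1": 1.0, "R2": 1.6, "R3": 2.8, "R4": 4.6}


def rate_saving(low, high):
    """Fraction of rate saved when rate point `low` matches quality at `high`."""
    return 1 - RATES_MBPS[low] / RATES_MBPS[high]


for low, high in (("R1", "R2"), ("R1", "R3"), ("R1", "R4"),
                  ("R2", "R3"), ("R2", "R4"), ("R3", "R4")):
    print(f"{low} vs {high}: {100 * rate_saving(low, high):.1f}%")
```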
• The subjective quality of the best-performing proposals is always equal to, and sometimes (~1/3 of cases) better than, HEVC at the next higher rate point, over all categories (with approx. 40% less rate)
• The subjective quality of the best-performing proposals is always equal to, and sometimes (~1/5 of cases) better than, HEVC two rate points higher, in the SDR-UHD category (with approx. 65% less rate)
• Though it is not always the same proposal that performs best at a given rate point, it can be anticipated that
merits of different proposals can be combined
• 50% (or more) bit rate reduction with same quality will probably be achievable by the new standard
Performance compared to HEVC
• New elements (some come with high complexity):
Decoder side estimation for mode/MV derivation and sample prediction both in intra and inter coding (JEM)
Finer partitioning: Asymmetric, geometric
Neural networks for prediction, loop filtering, upsampling, (encoder control)
Additional elements using template matching
Intra block copy / current picture referencing
Additional non-linear, de-noising and statistics-based loop filters
Additional linear and non-linear elements in prediction
• HDR specific:
New adaptive reshaping and quantization, also in-loop
HDR-specific modifications of existing tools, e.g. deblocking
• 360-video specific:
Variants of projection formats, geometry-corrected face boundary padding
Modification and disabling of existing tools at face boundaries
CfP analysis: What was proposed?
• VVC Working Draft 1 / Test Model 1 (VTM1): basic approach
built on "reduced HEVC" starting point
• VTM Block structure
Unified tree (coding block unites prediction and transform)
CTU size 128x128, rectangular blocks (dyadic sizes),
smallest luma size 4x4
Maximum transform size 64x64
• VTM: Some removed elements of HEVC:
Mode dependent transform (DST-VII), mode dependent scan
Strong intra smoothing
Sign data hiding in transform coding
Unnecessary high-level syntax (e.g. VPS)
Tiles and wavefront
Quantization weighting
VVC Working Draft and Test Model 1
• Report of Results from the Call for Proposals on Video Compression with Capability beyond HEVC
[JVET-J1003]
Documentation of results per sequence, marking HM and JEM anchors, not identifying individual proponents
Assessment of qualitative (and as far as possible quantitative) benefit of submitted technology compared to
anchors
• Working Draft 1 of Versatile Video Coding [JVET-J1001]
"Reduced" HEVC plus quad/binary/ternary tree structure
• Test Model 1 of Versatile Video Coding (VTM 1) [JVET-J1002]
Corresponding encoder and algorithm description
Documents issued after CfP Results
• Benchmark Set (BMS) was defined in addition to VTM, including the following well-known JEM tools:
• 65 intra prediction modes
• Coefficient coding
• AMT + 4x4 NSST
• Affine motion
• Geometry based adaptive loop filter
• Subblock merge candidate (ATMVP)
• Adaptive motion vector precision
• Decoder motion vector refinement
• LM Chroma mode
• Purpose: Testing benefit of technology against better performing set
Holds additional candidate features whose benefit is not yet fully established
Superset of VTM; should have significant gain over the VTM
Unveils in CEs whether gains are independent, or how much gain remains when a tool is combined with a
set of more performant tools
Can be a common basis for further CE tests of modified versions of features
Not necessarily ultra-low complexity, but encoder needs to be runnable in reasonable amount of time
Benchmark Set and its role
• The only fundamental new element of version 1
• Simple multi-type tree split, can be alternated
Quad/binary/ternary partitioning
Example:
Figures from: JVET-J1001
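As a rough illustration of the multi-type tree, the hypothetical helper below computes the sub-block rectangles produced by each split type, with the ternary split dividing a block in the ratio 1:2:1. It is a geometric sketch only, assuming the mode names shown; split signalling and the VTM size restrictions are not modelled.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in luma samples

def split_block(x: int, y: int, w: int, h: int, mode: str) -> List[Rect]:
    """Return the sub-blocks of a QT/BT/TT split (geometry only, hypothetical helper)."""
    if mode == "QT":  # quad split into four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_HOR":  # binary split with a horizontal boundary: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_VER":  # binary split with a vertical boundary: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_HOR":  # ternary split, heights in the ratio 1:2:1
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_VER":  # ternary split, widths in the ratio 1:2:1
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")

# Splits can be nested: quad-split a 128x128 CTU, then ternary-split its first quadrant.
quadrants = split_block(0, 0, 128, 128, "QT")
leaves = split_block(*quadrants[0], "TT_VER")
print(leaves)  # [(0, 0, 16, 64), (16, 0, 32, 64), (48, 0, 16, 64)]
```

Alternating split types in this way is what allows the partitioning to follow rectangular structures that a pure quadtree cannot represent.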
• PSNR-based Common Test Conditions (CTC) BD-Rate savings relative to HEVC reference software (10 bit)
• Note that the gain over HEVC with CTC is lower than with the CfP test set (other sequences, higher rates, lower resolutions)
Performance of VTM1 and initial BMS compared to HEVC
vs. HM16.18     VTM     BMS
4K UHD          10%     28%
1080p            8%     22%
WVGA             6%     19%
Average          8%     23%
Decode time     0.8×     2×
Encode time      2×      9×
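The rate figures in the table are Bjøntegaard delta rates (BD-rate). A minimal sketch of the classic cubic-polynomial variant is shown below, assuming four rate/PSNR points per codec; the JVET reporting tools may differ in interpolation details, so this is illustrative rather than the reference computation.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent; negative values mean the test codec saves bits.

    log(rate) is fitted as a cubic polynomial of PSNR for each RD curve, and the
    average horizontal gap between the two fits over the common PSNR range is taken.
    """
    p_anchor = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))  # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int_anchor = np.polyint(p_anchor)
    int_test = np.polyint(p_test)
    area_anchor = np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    avg_log_diff = (area_test - area_anchor) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Illustrative numbers only: a test codec needing 10% less rate at every PSNR point.
psnr = [34.0, 36.0, 38.0, 40.0]
anchor = [1000.0, 1800.0, 3200.0, 6000.0]
test = [0.9 * r for r in anchor]
print(round(bd_rate(anchor, psnr, test, psnr), 2))  # -10.0
```

Working in the log-rate domain makes the metric a relative (percentage) saving, which is why a uniform 10% rate reduction yields exactly -10%.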
• Working Draft 2 of Versatile Video Coding [JVET-K1001]
Normative text specification
No descriptive text for building blocks "borrowed" from HEVC: these would be placeholders anyway and are likely to be replaced later
Starting from this meeting, precise specification text is being added for the more substantial newly adopted building blocks (see subsequent slides)
• Test Model 2 of Versatile Video Coding (VTM 2) [JVET-K1002]
Encoder and algorithm description
Has corresponding software implementation
Latest status (from last week)
• QT/BT/TT no longer “placeholder”
• Remove unnecessary partitioning restrictions
• Implicit splitting at picture boundaries
• Separate trees for intra slices
• Position Dependent Prediction Combination
• Cross Component Linear Model
• 87 intra modes (wide angles included), 3 MPM, TU binarization
• Affine MC (4x4 fixed subblock size, 4/6 parameter model switching at CU level)
• Affine MV coding
candidate list construction with inherited and derived (spatial/temporal) candidates
improved MV difference coding
• Adaptive motion vector resolution (AMVR)
• Subblock MC (4x4) from ATMVP merge, 8x8 granularity motion vector storage [High precision]
Latest status (from last week): New elements of WD2 / VTM2
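A floating-point sketch of how per-subblock motion vectors follow from the control-point MVs in the 4- and 6-parameter affine models: one MV is evaluated at the centre of each 4x4 subblock. The function name is hypothetical, and the VTM uses fixed-point arithmetic and specific MV precision, which this sketch does not reproduce.

```python
def affine_subblock_mvs(cpmvs, w, h, sb=4):
    """Map each sb x sb subblock's top-left corner to the MV at its centre.

    cpmvs: [(v0x, v0y), (v1x, v1y)] for the 4-parameter model (top-left and
    top-right corners), plus (v2x, v2y) at the bottom-left corner for 6 parameters.
    """
    (v0x, v0y), (v1x, v1y) = cpmvs[0], cpmvs[1]
    a = (v1x - v0x) / w  # d(mv_x)/dx
    b = (v1y - v0y) / w  # d(mv_y)/dx
    if len(cpmvs) == 3:
        v2x, v2y = cpmvs[2]
        c = (v2x - v0x) / h  # d(mv_x)/dy
        d = (v2y - v0y) / h  # d(mv_y)/dy
    else:
        # 4-parameter model (rotation + zoom): vertical gradients follow from a and b
        c, d = -b, a
    mvs = {}
    for y0 in range(0, h, sb):
        for x0 in range(0, w, sb):
            x, y = x0 + sb / 2, y0 + sb / 2
            mvs[(x0, y0)] = (a * x + c * y + v0x, b * x + d * y + v0y)
    return mvs

mvs = affine_subblock_mvs([(0, 0), (4, 0)], 16, 16)  # zoom: MV grows with distance from corner
print(mvs[(0, 0)], mvs[(12, 12)])  # (0.5, 0.5) (3.5, 3.5)
```

Switching between the two models at CU level only changes whether the vertical gradients are derived (4 parameters) or signalled through a third CPMV (6 parameters).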
• Multiple transform selection (all are DCT/DST types) for intra and inter
• Increase max QP from 51 to 63
• Modified entropy coding supporting dependent quantization
• Sign data hiding reinstated from HEVC
• Adaptive loop filter
4x4 block classification for luma (based on gradient strength & orientation)
7x7 luma and 5x5 chroma filters
enabling flag at CTU level
• Basic high-level syntax (SPS, PPS, slice)
• The BMS update includes
generalized bi-prediction (GBi, a kind of local weighted prediction)
Decoder-side estimation: BIO, simplified bilateral matching
Current picture referencing (aka intra block copy)
Latest status (from last week): New elements of WD2 / VTM2
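Generalized bi-prediction replaces the equal-weight average of two motion-compensated predictions by a weighted one. The sketch assumes integer weights w in units of 1/8 from the set {-2, 3, 4, 5, 10}; this set, the rounding offset and the helper name are assumptions for illustration, not quoted from the VTM2/BMS description.

```python
GBI_WEIGHTS = (-2, 3, 4, 5, 10)  # assumed weight set for P1, in units of 1/8

def gbi_predict(p0, p1, w):
    """Weighted bi-prediction per sample: ((8 - w) * P0 + w * P1 + 4) >> 3."""
    if w not in GBI_WEIGHTS:
        raise ValueError(f"weight {w} not in assumed set {GBI_WEIGHTS}")
    return [((8 - w) * a + w * b + 4) >> 3 for a, b in zip(p0, p1)]

print(gbi_predict([100, 200], [104, 200], 4))   # [102, 200]: plain average
print(gbi_predict([100, 200], [104, 200], 10))  # [105, 200]: extrapolates beyond P1
```

With w = 4 the scheme falls back to conventional bi-prediction; weights outside [0, 8] let a block extrapolate, which is what makes it a local form of weighted prediction.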
• For rectangular blocks, prediction directions with angles beyond 45/135 degrees are reasonable
• This can be implemented by adding modes at both ends
• VTM2 uses a total of 85 directional intra modes now
(plus DC and planar)
Wide angular modes
Figures from JVET-K0500
• Alternating between two quantizers based on a state transition rule allows selecting an optimum sequence of reconstruction values (e.g. by trellis-like search)
• The decoder needs to implement the sequential state transition rule
• CABAC contexts need to be modified as well for this case
(greater than 0/1/2/... flags would have a different meaning depending on Q0/Q1)
Dependent quantization
States 0 and 1 use quantizer Q0, states 2 and 3 use Q1; the start state is 0; k is the quantization index of the current coefficient.
current state | next state for (k & 1) == 0 | next state for (k & 1) == 1
0 | 0 | 2
1 | 2 | 0
2 | 1 | 3
3 | 3 | 1
[Figure: reconstruction levels of Q0 (even multiples of Δ) and Q1 (zero and odd multiples of Δ) on an axis from -9Δ to 9Δ, with quantization indexes and subset labels A, B (Q0) and C, D (Q1)]
Figures from JVET-K0071
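The decoder-side rule can be sketched as follows, assuming that Q0 reconstructs even multiples of the step size Δ and Q1 reconstructs zero and odd multiples, with states 0 and 1 selecting Q0 and states 2 and 3 selecting Q1. The transition table is the one from the slide; the function names are hypothetical, and the encoder's trellis search is not shown.

```python
# Next state indexed by [current_state][k & 1], from the state transition table
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))

def sign(k: int) -> int:
    return (k > 0) - (k < 0)

def dequantize_dependent(levels, delta):
    """Reconstruct a coefficient sequence coded with dependent quantization (sketch)."""
    state = 0  # start state
    out = []
    for k in levels:
        if state < 2:
            out.append(2 * k * delta)              # Q0: even multiples of delta
        else:
            out.append((2 * k - sign(k)) * delta)  # Q1: zero and odd multiples of delta
        state = STATE_TRANSITION[state][k & 1]     # parity of k drives the state machine
    return out

print(dequantize_dependent([1, 1, 0, -2], 1.0))  # [2.0, 1.0, 0.0, -3.0]
```

Because each reconstruction depends on the parities of all previous indexes, the decoder must process coefficients sequentially, which is the dependency noted in the bullets above.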
• Ongoing investigations on
Improved merge, intra prediction, etc.
Decoder-side estimation with low complexity
Multi-hypothesis prediction and OBMC
Diagonal and other geometric partitioning
Secondary transforms
New approaches of loop filtering, reconstruction and prediction filtering
(denoising, non-local, diffusion based, bilateral, etc.)
Current picture referencing, template matching, palette mode
Neural networks for loop filtering and prediction
• Core experiments (CE) process
coordinated effort to investigate performance and complexity impact of proposed elements
typically based on a specific proposed technology, or a combination of several technologies
allows detailed study / cross-checks by other interested parties
allows identifying which elements of a proposal are useful, whether it is not useful at all, or whether further improvements are needed
Further promising fields
• Motivation: Towards object-oriented coding
Follow object boundaries more closely
Fewer coding artifacts where it matters
• Prediction, transform and coding driven by actual object shape under RD constraint
Inter- and intra-predicted segments for handling of disocclusions
Overlapped wedge based filtering at partition boundary
Shape-adaptive DCT for spatially localized transform coding
Geometric Partitioning (GEO)
Source: M. Bläser, J. Sauer, and M. Wien, “Description of SDR and 360° video coding technology proposal by RWTH Aachen University,” Doc. JVET-J0023, Joint Video Experts Team of ITU-T VCEG and ISO/IEC MPEG, San Diego, USA, 10th meeting, Apr. 2018
• GEO available for all block sizes ≥ 8×8 luma samples
• Partitioning is represented by two coordinate points 𝑃0 and 𝑃1 on the block boundary
• Prediction of two coordinate points 𝑃0 and 𝑃1 from 16 pre-defined templates (scaled for non-square blocks)
Alternative: Spatial or temporal prediction
Refinement: block size dependent offset
• Integration with AMVP, MERGE, FRUC
(no AFFINE (yet))
GEO: Partitioning Coding and Prediction
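A sketch of how two boundary points P0 and P1 can define the two GEO segments: each sample is assigned according to the side of the line P0P1 on which its centre lies. The binary mask and helper names are illustrative assumptions; the overlapped wedge filtering at the partition boundary is not modelled.

```python
def geo_mask(w, h, p0, p1):
    """Return an h x w mask: 1 where the sample centre lies on the positive side of P0->P1."""
    (x0, y0), (x1, y1) = p0, p1
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            cx, cy = x + 0.5, y + 0.5
            # z-component of the cross product (P1 - P0) x (C - P0)
            cross = (x1 - x0) * (cy - y0) - (y1 - y0) * (cx - x0)
            row.append(1 if cross > 0 else 0)
        mask.append(row)
    return mask

def geo_combine(pred0, pred1, mask):
    """Take segment-1 samples from pred1 and the rest from pred0 (no boundary blending)."""
    return [[b if m else a for a, b, m in zip(r0, r1, rm)]
            for r0, r1, rm in zip(pred0, pred1, mask)]

m = geo_mask(8, 8, (0, 0), (8, 8))  # diagonal from top-left to bottom-right corner
print(sum(map(sum, m)))  # 28 samples strictly below the diagonal
```

Each segment can then carry its own motion (or intra) prediction, which is how disoccluded regions get a prediction separate from the foreground object.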
Results for GEO
JEM 7.0 JEM 7.0 + GEO
• Visual improvements at object boundaries
Sharper contours
Less staircase effect
More background details
• Objective gains (BD-rate savings)
Against HEVC: ~33% on C1, ~25% on C2
Against JEM: ~0.8% for both C1 and C2
JEM 7.0
JEM 7.0 + GEO
• CE1: Partitioning
• CE2: Adaptive loop filter
• CE3: Intra prediction and mode coding
• CE4: Inter prediction and MV coding
• CE5: Arithmetic coding engine
• CE6: Transforms and transform signalling
• CE7: Quantization and coefficient coding
• CE8: Current picture referencing
• CE9: Decoder side MV derivation
• CE10: Combined and multi-hypothesis prediction
• CE11: Deblocking
• CE12: Mapping for HDR content
• CE13: Coding tools for omnidirectional video
• CE14: Post-reconstruction filtering
• CE15: Palette mode
Current Core Experiments
• Technically similar elements to HEVC/JEM/VVC or JVET study
Partitioning: 128x128 "superblock" with sub-splits equivalent to quad/binary splits (no 1:2:1 ternary)
Directional intra prediction, 56 directional modes, DC and "true motion" mode
Chroma from luma prediction
Intra block copy
Up to 7 reference frames (allows similar structure to hierarchical B)
Spatial/temporal motion vector referencing
Affine motion compensation (pixel based)
OBMC
DCT/DST based transforms, and skip
Adaptive arithmetic coder
Context-based transform coefficient coding
Film grain synthesis
Adaptive loop filter (Wiener like)
Deblocking
AOM's AV1
• Other elements
Recursive-filtering intra predictor
Prediction based on color palette
Wedge-based prediction, 16 diagonal/asymmetric modes for square/rectangular blocks, similar to GEO
Difference-modulated prediction (based on difference between two references)
Contrast enhancement/deringing loop filter
Self-guided filter (somewhat similar to bilateral & diffusion filters)
Super-resolution coding mode (with coding at lower res.)
• Performance
Owners report 20% average bit rate reduction (PSNR based)
compared to an x265-style HEVC encoder, on a set of full HD sequences
Other reports indicate much less gain, or even losses compared
to HM encoder (using sequences from JVET's CTC)
According to the same reports, JEM performs significantly better than AV1
Some of those may not have used the newest JEM version, though
AOM's AV1
113. Part V: Exploratory trends and perspectives
• PSNR is mostly used for video quality assessment
It targets pixel fidelity, which does not necessarily reflect subjective quality
• Specific artifacts produced by video codecs:
blockiness, blur and banding
motion jerkiness
time-varying edge noise ("mosquito effect")
• Alternative metrics may be clustered into
full reference quality metrics
reduced reference quality metrics
no-reference quality metrics
• Note that subjective testing methods also require some reference (e.g. impairment compared to the original or another anchor)
full reference metrics are most reliable and are also typically used for encoder decisions
• Note: Subsequent slide gives an example (SSIM) – not claimed that this is the best!
Quality metrics
• Example of another full-reference metric which better matches subjective quality at least for images
• Structural SIMilarity Index (SSIM) [Wang et al. 2004] measures the structural distortion by exploring three
components: Luminance, Contrast and Structural changes.
Luminance: $l(x,y) = \dfrac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$
Contrast: $c(x,y) = \dfrac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$
Structure comparison: $s(x,y) = \dfrac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$
Combined: $\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha} \cdot [c(x,y)]^{\beta} \cdot [s(x,y)]^{\gamma}$
• Numerous variants:
Computation separately for regions
Weighting by amount of motion and frame averaging for video
Computation in complex wavelet domain for frequency weighting (MS-SSIM, multi-scale)
Perceptually adapted quality metrics example: SSIM
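A minimal sketch of the three SSIM terms computed from global statistics of two equally sized images, assuming α = β = γ = 1, C1 = (0.01·L)², C2 = (0.03·L)² and C3 = C2/2 for dynamic range L (the constants suggested by Wang et al.). The published index evaluates the terms in local windows and averages the results; that windowing is omitted here. PSNR is included for comparison.

```python
import numpy as np

def ssim_global(x, y, dynamic_range=255.0):
    """Single-window SSIM with alpha = beta = gamma = 1 (no local windowing)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return luminance * contrast * structure

def psnr(x, y, dynamic_range=255.0):
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(dynamic_range ** 2 / mse)

ref = np.arange(64, dtype=float).reshape(8, 8) * 4.0
shifted = ref + 10.0  # uniform brightness offset
print(round(ssim_global(ref, ref), 6))   # 1.0
print(round(psnr(ref, shifted), 2))      # 28.13
```

A uniform offset only lowers the luminance term while contrast and structure stay at 1, whereas PSNR penalizes it like any other error; this illustrates why the two metrics can rank distortions differently.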
• Textures with large amount of detail and/or motion are often extremely challenging for video codecs
• On the other hand, the exact pixel-wise appearance is largely irrelevant for human observers, whereas
degradation of visual quality is critical
• Textures in videos can be static or dynamically changing over time
Static textures basically rigid (but may be moving globally)
Dynamic textures have high amount of irregular local motion
Examples: water, smoke, head-and-shoulder sequences
• Both categories should have some stationarity properties in space and/or time, to allow modelling as a random process expressed by a parametric description; examples:
Spectral properties
Moments (marginal statistics and covariance statistics)
Random field models
• In case of dynamic texture, modelling the motion properties is relevant as well; the motion can also be understood as a random field with a certain amount of variation
Perceptual coding: Texture analysis and synthesis
• Example below is based on a parametric statistical description in complex wavelet domain (steerable
pyramid), with lowpass baseband and four directional orientations in bandpass layers
[Portilla, Simoncelli 2000]
• Efficient coding of parameters needed for synthesis by [Thakur, Ray 2016]
• Marginal statistics expressed as scalar values
• Auto and cross correlation statistics compressed via DCT
Static texture synthesis
[Figure: reference; HEVC intra coding at 0.223 bpp; Thakur et al. at 0.213 bpp]
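To make the parameter set more concrete, the sketch below computes marginal statistics (mean, variance, skewness, kurtosis) and a small window of central autocorrelation values for one subband, in the spirit of the Portilla/Simoncelli description. The steerable pyramid decomposition, the cross-correlation terms and the DCT-based compression of the correlation values are not shown; the function names are hypothetical.

```python
import numpy as np

def marginal_stats(band):
    """Mean, variance, skewness and kurtosis of one subband (scalar parameters)."""
    x = np.asarray(band, dtype=float).ravel()
    mu = x.mean()
    var = x.var()
    d = x - mu
    skew = (d ** 3).mean() / var ** 1.5
    kurt = (d ** 4).mean() / var ** 2
    return float(mu), float(var), float(skew), float(kurt)

def central_autocorr(band, m=3):
    """(2m+1) x (2m+1) central lags of the circular autocorrelation, via the FFT."""
    x = np.asarray(band, dtype=float)
    x = x - x.mean()
    f = np.fft.fft2(x)
    ac = np.real(np.fft.ifft2(f * np.conj(f))) / x.size
    ac = np.fft.fftshift(ac)  # zero lag moves to index shape // 2
    cy, cx = np.array(ac.shape) // 2
    return ac[cy - m:cy + m + 1, cx - m:cx + m + 1]

print(marginal_stats([[1, 2], [3, 4]]))  # mean 2.5, variance 1.25, skewness 0, kurtosis 1.64
```

The centre value of the autocorrelation window equals the variance, so the window and the marginal statistics partly overlap; a compact parameter coder such as the one by Thakur and Ray can exploit this kind of redundancy.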
[Figure: analysis and synthesis block diagram. Analysis: dense optical flow between adjacent frames of the T original frames gives motion vector fields (MVF); the motion distribution is analysed, non-probable MV combinations are discarded, and a compressed motion model (MCM) is obtained; intermediate frames are discarded. Synthesis: motion vectors are derived from the model, the MVF is inverted, and frame warping and blending produce T-2 synthesized frames. Source: Chubach et al. 2017]
Dynamic texture synthesis method
[Figure: HEVC (left) vs. 6 of 8 frames synthesized (right)]
Dynamic texture synthesis vs. HEVC at same rate
• Recently, many signal processing tasks are solved by employing machine learning, deep learning and
convolutional neural networks (CNN)
• Advantages for video compression could be as follows:
• Systematic approach of optimizing with big data sets (rather than hand-crafted design)
• Detection and exploitation of nonlinear dependencies in images and video
• Inclusion of perceptual criteria by mimicking human observer behaviour
• On the downside, both training and running CNN algorithms (e.g. for encoder decisions or at the decoder) may be overly complex
• Types of NN that have been proposed for image/video compression
• Autoencoders
• Adversarial networks
• Recurrent networks, particularly based on LSTM (long short-term memory) elements
Learning based approaches: Overview
• An autoencoder is a deep (convolutional) neural network with a sparse hidden layer that represents the code
• The encoder typically performs successive filtering and downsampling steps on input x per layer (note the conceptual similarity with transform coding!)
• The decoder performs complementary upsampling steps and generates output y
• Encoder and decoder are trained jointly such that
• The difference between x and y is minimized w.r.t. some distortion
• Code z is as sparse (minimum amount of information) as possible
• Use the Bayes formula P(z|x) ∝ P(x|z)P(z) and minimize the Kullback-Leibler divergence of the conditional probabilities to achieve the latter [Kingma, Welling 2014]
Convolutional Neural Networks: Autoencoders (AE)
Source: Wikipedia
[Figure: input x, encoder z = F(x), decoder y = G(z), output y]
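For the common case of a Gaussian encoder output and a standard normal prior (the setting of Kingma and Welling), the KL term has a closed form. The sketch below combines it with a squared-error distortion; the Gaussian assumptions, the unit weighting between the two terms and the function names are illustrative choices.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def vae_loss(x, y, mu, log_var):
    """Squared reconstruction error between x and y = G(z) plus the KL regularizer on z."""
    distortion = sum((a - b) ** 2 for a, b in zip(x, y))
    return distortion + kl_to_standard_normal(mu, log_var)

print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5
```

The KL term penalizes codes that deviate from the prior, which plays the role of the "minimum amount of information" requirement on z.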
• Generator net G generates samples y from random variables z (G would be the decoder, z the code)
• Discriminator net D decides whether the samples could match with real-world images x which stem from an
unknown distribution P(x)
• Generator and discriminator nets are trained iteratively, optimizing a value function V(D, G)
• Minimax optimization:
• Train D such that V is maximized
• Train G such that V is minimized
• Problem: There is no corresponding
mapping from x to z (no encoder)
• Solution (e.g. [Santurkar et al. 2017]):
Combine AE and GAN, i.e. train the AE encoder F(x) jointly with G(z) and D(⋅)
Convolutional Neural Networks: Generative Adversarial Networks (GAN)
Source: Slideshare.net – K. McGuinness
[Figure: generator G maps z to y = G(z); discriminator D is applied to real images, D(x), or to generated samples, D(y)]
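For reference, the standard GAN value function (Goodfellow et al. 2014), with D maximizing and G minimizing it as in the minimax bullets, can be written in the slide's notation as:

```latex
V(D, G) = \mathbb{E}_{x \sim P(x)}\left[\log D(x)\right]
        + \mathbb{E}_{z \sim P(z)}\left[\log\left(1 - D(G(z))\right)\right],
\qquad \min_{G} \max_{D} V(D, G)
```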
• Variable-rate and variable-size coding not straightforward
• Option to operate over small patches / blocks
• Train separately for different content complexity
• Code residual differences
• Cost functions for rate distortion optimization not straightforward to implement
• Option to re-formulate rate constraint as energy minimization problem
• Hybrid solutions where conventional entropy coding is operated after network output at encoder
• None of these solutions is guaranteed to reach a consistent optimum, and they may need to be driven by some external decision mechanism
Convolutional Neural Networks: General problems and possible solutions
• Autoencoder could be interpreted as a monolithic non-linear transform (though operating with local kernels) – see previously used notation in light green below
• A similar approach is proposed in [Ballé et al. 2017], with additional criteria for rate distortion optimization and quantization / entropy coding on the sparse representation (called y here)
• Perceptual optimization based on nonlinear "generalized divisive normalization" and L2 norm minimization in nonlinear space
• Authors report significant improvement on detail structures, and also improved MS-SSIM compared to conventional codecs; the transform is optimized based on a rate-distortion cost criterion
Trained non-linear transforms
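[Figure: analysis transform F(x) producing the code, quantized code z' and synthesis transform G(z'), in the notation of the previous slides]
Source: Ballé et al. 2017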
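A minimal sketch of generalized divisive normalization across channels at a single spatial position, y_i = x_i / sqrt(β_i + Σ_j γ_ij x_j²), the GDN form used by Ballé et al. In a trained codec β and γ are learned parameters; here they are passed in directly, and the function name is hypothetical.

```python
import numpy as np

def gdn(x, beta, gamma):
    """Generalized divisive normalization over channels: x has shape (channels,)."""
    x = np.asarray(x, dtype=float)
    norm = np.sqrt(np.asarray(beta, float) + np.asarray(gamma, float) @ (x ** 2))
    return x / norm

print(gdn([1.0, 2.0], beta=[1.0, 1.0], gamma=[[0.0, 0.0], [0.0, 0.0]]))  # [1. 2.]
print(gdn([1.0], beta=[1.0], gamma=[[3.0]]))  # [0.5]
```

Each output is divided by a weighted energy of all channels, so strong responses suppress each other; this joint normalization is what makes the transform non-linear even though the convolutions are linear.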
• All methods discussed so far were developed for still image coding, and could be used in intra coding for video
• Main problem: Motion compensation is a very effective tool, and can hardly be trained into a network (or would
be tremendously more complex than conventional motion estimation)
• Some work on using CNN for
Sub-pel interpolation
Resolution up-conversion
Post-processing
Texture synthesis and inpainting
• It is also not as simple to train for perceptual criteria in video
NN for video
• NN-based approaches have so far been more successful in still image coding than in video coding
Perceptual criteria also better understood for images
• In video coding, motion compensation is a most effective key component
Requires motion estimation for which "conventional" algorithms appear to be less complex
Analogy: Eye tracking – the brain processes a motion compensated input
• CNN have been demonstrated to provide benefit in context of video coding for
Resolution up-conversion
Post-processing and loop filtering
Intra coding
Encoder optimization, in particular partitioning which is basically a segmentation problem
NN for video
• Switching to lower resolution is common (and necessary) when the data rate is low
• Video varies locally in detail, and may not require encoding at full resolution everywhere
• Lower resolution may also be useful with high motion, motion blur, etc.
• Coding less information in such less relevant areas can save data rate
• Tools "Reduced Resolution Update" or "Dynamic Resolution Conversion" were included in MPEG-4 part 2 and
H.263+, but not well understood by that time
• Requires tools for
downsampling when generating prediction from reference
signalling the coding with variable resolution
upsampling for generating full-resolution picture
• Three examples shown subsequently:
Down/Up-sampling using neural networks / conventional filters
Coding B pictures of dynamic texture with low resolution
Dictionary-based super-resolution upsampling
Variable-resolution coding
• Basic idea of dynamic resolution coding:
Downsample and code by lower resolution (less bitrate cost)
Upsample at decoder side to full resolution
Encoder decides between full resolution and conventional or CNN-based down- and upsampling
CNN-based could generate super-resolution upsampling, sharper edges, etc.
• Can be implemented in combination with intra and inter prediction coding
• Operated on block by block basis
CNN for resolution up-conversion
Figure from JVET-J0032
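A simplified sketch of the block-level decision: the low-resolution path is simulated by 2x2 averaging and pixel-replication upsampling (a CNN would replace the upsampler), and the encoder picks the option with the lower Lagrangian cost D + λR. The distortion and rate of full-resolution coding and the rate of low-resolution coding are hypothetical inputs, the coding loss of the low-resolution block itself is ignored, and block sizes are assumed to be even.

```python
import numpy as np

def down2(block):
    """2x2 averaging (assumed conventional downsampling filter)."""
    b = np.asarray(block, dtype=float)
    return 0.25 * (b[0::2, 0::2] + b[1::2, 0::2] + b[0::2, 1::2] + b[1::2, 1::2])

def up2(block):
    """Pixel replication; a CNN-based upsampler could be substituted here."""
    return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1)

def choose_resolution(block, d_full, r_full, r_low, lam):
    """Return 'low' or 'full' by comparing Lagrangian costs D + lam * R."""
    b = np.asarray(block, dtype=float)
    d_low = float(np.mean((b - up2(down2(b))) ** 2))
    return "low" if d_low + lam * r_low < d_full + lam * r_full else "full"

flat = np.full((8, 8), 128.0)
checker = np.indices((8, 8)).sum(axis=0) % 2 * 255.0
print(choose_resolution(flat, d_full=10.0, r_full=100.0, r_low=30.0, lam=1.0))     # low
print(choose_resolution(checker, d_full=10.0, r_full=100.0, r_low=30.0, lam=1.0))  # full
```

Smooth blocks lose nothing by resampling and take the cheaper low-resolution path, while fine detail such as the checkerboard is destroyed by downsampling and stays at full resolution.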
• Loop filtering is common in video coding
removes compression artifacts from reconstruction
improves prediction from reconstructed frames
• Generally, signal-adaptive and non-linear filters
e.g., de-blocking, de-ringing, de-banding
edge-adaptive & Wiener optimized
bilateral filters
...
• CNN reconstruction provides additional gain (3-5% rate reduction) and might replace some conventional filters
• Can be operated on a block basis; parallel processing is possible
CNN for loop filtering
Figures from JVET-I0022
[Figure: the picture is divided into process units (Block1 to Block20), each processed with padding_size extra samples around it (2*padding_size between neighbouring blocks). Network: normalized Y/U/V input concatenated with a normalized QP map, followed by Conv1 (5, 5, 45), Conv2 (3, 3, 54), Conv3 (3, 3, 58), Conv4 (3, 3, 48), Conv5 (3, 3, 51), Conv6 (3, 3, 40), Conv7 (3, 3, 31), each followed by ReLU, and Convolution8 (3, 3, 1); the network output is added to the input (summation). Notation ConvL (M, N, KL): M kernel width, N kernel height, KL kernel number]
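Block-wise processing gives the same result as filtering the whole picture as long as each block is extended by at least the receptive-field radius of the filter. For the listed network (one 5x5 and seven 3x3 kernels) that radius is 2 + 7 = 9 samples. The sketch below checks the principle with a 3x3 mean filter standing in for the CNN; the function names and the block and padding sizes are arbitrary assumptions.

```python
import numpy as np

def mean3x3(x):
    """Stand-in for the CNN: 3x3 mean filter with edge replication, same output size."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / 9.0

def filter_blockwise(pic, filt, block=64, pad=9):
    """Apply filt to overlapping padded blocks and keep only each block's centre."""
    pic = np.asarray(pic, dtype=float)
    h, w = pic.shape
    ext = np.pad(pic, pad, mode="edge")  # picture-boundary extension
    out = np.empty_like(pic)
    for y in range(0, h, block):
        for x in range(0, w, block):
            bh, bw = min(block, h - y), min(block, w - x)
            tile = ext[y:y + bh + 2 * pad, x:x + bw + 2 * pad]
            out[y:y + bh, x:x + bw] = filt(tile)[pad:pad + bh, pad:pad + bw]
    return out

pic = (np.arange(90).reshape(10, 9) ** 2) % 17
print(np.allclose(filter_blockwise(pic, mean3x3, block=4, pad=1), mean3x3(pic)))  # True
```

Since every block only reads its own padded tile, the blocks can be processed in parallel, at the cost of recomputing the overlapping margins.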
• Neural networks were demonstrated to provide improved intra prediction, compared to conventional
directional and planar modes
• Mostly fully connected networks have been used for this purpose (no convolutional layers)
• Average rate reductions of 4-5% (for intra coding) have been reported
• Examples of prediction demonstrate the benefit of non-linear processing
Neural networks for intra prediction
Figure from JVET-J0037
Figures from Li et al. IEEE-TCSVT, July 2018
• Key pictures coded with full resolution
• Non-key pictures coded with reduced resolution
• Upsampling based on motion-compensated steerable pyramid
Variable-resolution coding for dynamic texture (Thakur et al. 2017)
[Figure: original pictures; reconstructed key pictures (Ref pic L0, Ref pic L1) and lowpass versions used for predicting non-key pictures]
• Motion vectors initially estimated from downsampled lowpass key pictures, refined and applied in bandpass
and highpass components of non-key pictures
• Authors report significant bit rate savings (20-30% average) for dynamic texture content, while subjective
quality is preserved compared to full-resolution coding
Variable-resolution coding for dynamic texture (Thakur et al. 2017)
[Figure: motion estimation between the lowpass bands of the key (reference) and non-key (current) pictures; motion compensation applied to the bandpass and highpass bands]
• Low and high-resolution dictionaries trained jointly with sparsity constraint (large data base)
• The up-converter searches a small number of matching dictionary bases in low resolution, and applies the corresponding bases from the high-resolution dictionary
Low-resolution coding with dictionary-based up-conversion (Schneider et al. 2017)
• Scheme run with overlapping blocks
• Provides sharp reconstruction of structures and edges
• Authors report 2-3% rate gain when used in upsampling for HEVC scalable coding
Low-resolution coding with dictionary-based up-conversion (Schneider et al. 2017)
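The up-conversion step can be sketched with orthogonal matching pursuit: a few atoms of the low-resolution dictionary are selected to represent the low-resolution patch, and the same coefficients are applied to the coupled high-resolution atoms. Joint dictionary training, overlapping-block blending and the sparsity level used by Schneider et al. are not reproduced; k = 2, the toy dictionaries and the function names are assumptions.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k columns of D (unit-norm atoms)."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # residual already explained; no new atom helps
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def upconvert_patch(D_low, D_high, patch_low, k=2):
    """Sparse-code the low-res patch in D_low and synthesize with the coupled D_high atoms."""
    alpha = omp(D_low, np.ravel(patch_low), k)
    return D_high @ alpha

D_low = np.eye(4)                         # toy orthonormal low-res dictionary (4 atoms)
D_high = np.arange(64.0).reshape(16, 4)   # toy coupled high-res atoms (16 samples each)
patch = np.array([0.0, 3.0, 0.0, -1.0])
print(upconvert_patch(D_low, D_high, patch))  # 8 * [0, 1, ..., 15]
```

Because the coefficients are shared between the two dictionaries, the high-resolution detail comes from the trained atoms rather than from interpolation, which is why edges stay sharp.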