Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning

Video Coding Enhancements
for HTTP Adaptive Streaming
Using Machine Learning
Ekrem Çetinkaya
Supervisor: Univ.-Prof. DI Dr. Christian Timmerer
Co-Supervisor: Assoc.-Prof. DI Dr. Klaus Schöffmann
07.06.2023
Klagenfurt am Wörthersee, Austria

Motivation
2013: buffering 20 seconds to watch a 480p video
2023: buffering 2 seconds to watch a 4K video

Motivation
2013: shooting a 10-minute YouTube video
2023: shooting 8-second YouTube Shorts

Motivation
Group chat in 2013 vs. 2023 (comparison image)

Motivation
Assistant in 2013 vs. 2023 (comparison image)

Motivation
Expectations, quality, and demand keep rising, so video coding needs to improve.

Statistics
Video: 66% share of the entire Internet traffic
Increasing content resolution:
Year   SD    HD    UHD
2016   51%   47%   2%
2021   22%   57%   21%
Sandvine. Global Internet Phenomena Report 2023. [Online] Available: https://www.sandvine.com/phenomena. Accessed: 2023-02-13.

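Background
Increasing video codec complexity
AVC (2003): 16x16 block size; 4K support
HEVC (2013): quaternary tree; 8K support
VVC (2020): quaternary tree + multi-type tree; 16K support
AVC → HEVC: 170% encoding complexity increase, 37% bitrate reduction
HEVC → VVC: 954% encoding complexity increase, 35% bitrate reduction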
C. Feldmann, “State of Compression Standards - VVC”, 2020, https://bitmovin.com/compression-standards-vvc-2020/
Vanne et al., “Comparative Rate-Distortion-Complexity Analysis of HEVC and AVC Video Codecs”, TCSVT, 2012

Background
Figure: hybrid video coding pipeline (source video → encoded video → decoded video) annotated with machine learning (ML) opportunities.
Stages: block partitioning; inter or intra prediction; motion compensation; transformation & quantization; entropy coding; entropy decoding; inverse transformation & inverse quantization; in-loop filtering; picture buffer.
ML opportunities: partitioning prediction; mode and prediction decision; optical flow detection; deblocking and denoising; super-resolution.

Research Questions
RQ-1
How to efficiently provide multi-bitrate representations over a wide range of resolutions for HAS?
RQ-2
How to improve the performance of video codecs using machine learning?
RQ-3
How to improve the visual quality of videos using machine learning?
RQ-4
How to use machine learning to improve perceptual quality assessment for videos?

Contributions
P1: Codec improvement for multi-bitrate with machine learning
P2: Codec improvement for multi-resolution with machine learning
P3: Studying codec components to enhance with machine learning
P4: Improve visual quality of HAS videos with machine learning
P5: Platform to evaluate machine-learning-based quality enhancement
P6: Improve visual quality of videos with machine learning on mobile devices
P7: Improve multi-bitrate light field image coding quality with machine learning
P8: Improve perceptual quality assessment using machine learning
The contributions address RQ-1 to RQ-4.

Contributions
P1: FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning – IEEE VCIP, 2020
P2: Fast multi-resolution and multi-rate encoding for HTTP adaptive streaming using machine learning – IEEE OJSP, 2021
P3: CTU Depth Decision Algorithms for HEVC: A Survey – SP:IC, 2021
P4: Super-Resolution Based Bitrate Adaptation for HTTP Adaptive Streaming for mobile devices – ACM MHV, 2022
P5: MoViDNN: A Mobile Platform for Evaluating Video Quality Enhancement with Deep Neural Networks – MMM, 2022
P6: LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices – IEEE IVMSP, 2022
P7: LFC-SASR: Light Field Coding Using Spatial and Angular Super-Resolution – IEEE ICME, 2022
P8: BQ-ViT: Blind Visual Quality Assessment Using Vision Transformers – IEEE Access, 2023

Literature Review
1 D. Schroeder et al., "Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 143-157, 2016.
2 B. Guo, Y. Han, J. Wen, "Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, July 2018.
3 H. Amirpour, E. Çetinkaya, C. Timmerer and M. Ghanbari, "Fast Multi-rate Encoding for Adaptive HTTP Streaming," in 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 2020, pp. 358-358.

Literature Review
No improvement for parallel encoding
Block partitioning takes too long

Literature Review
Bottleneck

Literature Review
Brute force: a single 64x64 CU has 83,522 possible partitionings
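The 83,522 figure can be reproduced with a short recursion, assuming an HEVC-style 64x64 CU that a quadtree may split down to 8x8 (three split levels) and ignoring prediction-unit and mode choices: each CU is either kept whole or split into four sub-CUs that are partitioned independently. The function name is illustrative.

```python
def count_partitions(split_levels: int) -> int:
    """Count quadtree partitionings of a CU that may be split `split_levels` more times."""
    if split_levels == 0:
        return 1  # smallest CU size: it can only be kept whole
    # Keep the CU whole (1 option) or split it into 4 independently partitioned sub-CUs
    return 1 + count_partitions(split_levels - 1) ** 4


# 64x64 -> 32x32 -> 16x16 -> 8x8 gives three split levels
for levels in range(4):
    print(levels, count_partitions(levels))  # prints 0 1, 1 2, 2 17, 3 83522 on separate lines
```

This growth is why exhaustive enumeration is impractical; a recursive RD search instead evaluates each of the 1 + 4 + 16 + 64 = 85 candidate CUs once, but every evaluation still involves many prediction-mode tests.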

Proposed Method
P1: FaME-ML
• Multi-rate encoding
• CNN to guide depth decisions
• Focus on parallel encoding
P2: FaRes-ML
• Multi-resolution encoding
• Extended CNN structure
• More features & in-depth evaluation

Proposed Method
P1
FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
Çetinkaya, E., Amirpour, H., Timmerer, C., & Ghanbari, M.
2020 - IEEE International Conference on Visual Communications and Image Processing (VCIP)

Proposed Method
FaME-ML (P1)
• CNN-guided solution
• Utilize reference encoding information
• Speed up dependent representations
• Improve parallel encoding

Proposed Method
FaME-ML (P1)
Figure: an input video is encoded into representations at multiple qualities (multi-rate) and multiple resolutions (multi-resolution); FaME-ML targets the multi-rate case.
Figure: a reference encoding of the input video provides information to a CNN, which guides the encoder for the remaining qualities.

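Proposed Method
FaME-ML (P1)
Figure: texture processing CNN. Inputs: Y (1@64x64), U (1@32x32), and V (1@32x32). Conv blocks produce 4@32x32 and 8@32x32 maps for Y and 4@32x32 maps for each of U and V; these are concatenated into a 16@32x32 YUV tensor and processed through 32@16x16, 64@8x8, 128@4x4, and 256@2x2 maps. Fully connected (FC) layers of sizes 256, 64, and 2 with softmax produce P(S) and P(N); the network also takes a 14-element feature vector.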
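The P(S) and P(N) outputs can be read as the CNN's estimate of whether a CU should be split further. A minimal sketch, assuming P(S) is a split probability and that sub-CUs are skipped whenever it falls below a threshold, shows how such a prediction prunes the recursive RD search. `count_rd_evaluations`, `predict_split`, and the 0.5 threshold are hypothetical stand-ins for the actual network and encoder integration.

```python
from typing import Callable

# Hypothetical stand-in for the CNN: (x, y, size) -> P(S), the split probability
SplitPredictor = Callable[[int, int, int], float]


def count_rd_evaluations(x: int, y: int, size: int, min_size: int,
                         predict_split: SplitPredictor,
                         threshold: float = 0.5) -> int:
    """Count CU RD evaluations in a quadtree search pruned by a split predictor."""
    evaluations = 1  # the unsplit CU at (x, y) is always evaluated
    if size <= min_size or predict_split(x, y, size) < threshold:
        return evaluations  # predicted "no split": skip all sub-CUs
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            evaluations += count_rd_evaluations(x + dx, y + dy, half, min_size,
                                                predict_split, threshold)
    return evaluations


# Always split: full search of a 64x64 CU down to 8x8 (1 + 4 + 16 + 64 CUs)
print(count_rd_evaluations(0, 0, 64, 8, lambda x, y, s: 1.0))  # 85
# A predictor that only splits at the 64x64 level
print(count_rd_evaluations(0, 0, 64, 8, lambda x, y, s: 0.9 if s == 64 else 0.1))  # 5
```

A wrong "no split" prediction removes the better partitioning from the search, which is the source of the small BD-Rate increases reported for such schemes.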

Proposed Method
FaME-ML (P1)
Feature vector (14 elements):
Feature          Symbol   Size
RD cost          F_RD     5
Variance         F_V      5
Motion vectors   F_MV     1
Depth level      F_D      1
Frame QP         F_QP     1
PU decision      F_PU     1
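A sketch of assembling the 14-element feature vector, assuming the features are concatenated in the listed order with the listed sizes (5, 5, 1, 1, 1, 1). The slide does not say what the five RD-cost and five variance values correspond to, so they are treated as opaque lists; the motion-vector feature is assumed to be already reduced to one scalar. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReferenceFeatures:
    """Hypothetical container for per-CU features taken from the reference encoding."""
    rd_cost: List[float]   # F_RD: 5 values
    variance: List[float]  # F_V: 5 values
    motion_vector: float   # F_MV: 1 value (assumed scalar summary)
    depth: int             # F_D: depth level
    frame_qp: int          # F_QP: frame QP
    pu_decision: int       # F_PU: PU decision (assumed integer code)

    def to_vector(self) -> List[float]:
        """Concatenate all features into the 14-element input vector."""
        if len(self.rd_cost) != 5 or len(self.variance) != 5:
            raise ValueError("F_RD and F_V must each have 5 values")
        return [*self.rd_cost, *self.variance,
                float(self.motion_vector), float(self.depth),
                float(self.frame_qp), float(self.pu_decision)]


features = ReferenceFeatures(rd_cost=[1.0] * 5, variance=[0.5] * 5,
                             motion_vector=2.0, depth=1, frame_qp=22, pu_decision=0)
print(len(features.to_vector()))  # 14
```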

Proposed Method
P2
FaRes-ML: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning
Çetinkaya, E., Amirpour, H., Timmerer, C., & Ghanbari, M.
2021 - IEEE Open Journal of Signal Processing

Proposed Method
FaRes-ML (P2)
• Extended feature set: 2D representation of features
• Improved CNN: utilizes more features and adds a feature processing component
• Targets both parallel and serial encoding

Proposed Method
FaRes-ML (P2)
Figure: the input video is encoded at multiple qualities and resolutions; a reference encoding provides information to a CNN, which guides the encoder.

Results
FaME-ML (P1)
Normalized encoding time:
              QP22   QP26   QP30   QP34   QP38
HM 16.21      1.00   0.72   0.59   0.51   0.45
Lower Bound   0.88   0.64   0.59   0.51   0.45
FaME-ML       0.51   0.55   0.59   0.51   0.45

Results
FaRes-ML (P2)
Normalized encoding time:
Resolution   QP   HM 16.21   Lower Bound   FaRes-ML
2160p        22   1.00       0.95          0.72
2160p        27   0.82       0.80          0.48
2160p        32   0.70       0.68          0.25
2160p        37   0.63       0.61          0.18
1080p        22   0.28       0.24          0.21
1080p        27   0.22       0.21          0.14
1080p        32   0.19       0.17          0.08
1080p        37   0.17       0.16          0.05
540p         22   0.08       0.08          0.08
540p         27   0.06       0.06          0.04
540p         32   0.05       0.05          0.02
540p         37   0.04       0.04          0.02
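Under the simplest timing model, the speed-ups quoted later in the discussion follow from the normalized encoding times on this slide. The model assumes serial encoding time is the sum over all twelve representations and parallel encoding time is set by the slowest representation; any waiting for the reference encoding is ignored, which is an assumption. The helper names are illustrative.

```python
# Normalized encoding times from the FaRes-ML chart, ordered
# 2160p QP22..QP37, 1080p QP22..QP37, 540p QP22..QP37
hm = [1.00, 0.82, 0.70, 0.63, 0.28, 0.22, 0.19, 0.17, 0.08, 0.06, 0.05, 0.04]
fares = [0.72, 0.48, 0.25, 0.18, 0.21, 0.14, 0.08, 0.05, 0.08, 0.04, 0.02, 0.02]


def serial_speedup(base, fast):
    # Representations encoded one after another: total time is the sum
    return 1 - sum(fast) / sum(base)


def parallel_speedup(base, fast):
    # Representations encoded simultaneously: the slowest one sets the time
    return 1 - max(fast) / max(base)


print(f"overall:  {serial_speedup(hm, fares):.1%}")    # 46.5%
print(f"parallel: {parallel_speedup(hm, fares):.1%}")  # 28.0%
```

The overall figure is close to, but not identical with, the 46.27% average reported in the FaRes-ML results table, presumably because that table averages per-sequence results rather than these rounded chart values.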

Results
FaRes-ML (P2)
Resolution   ΔT        BD-Rate (PSNR)   BD-Rate (VMAF)
3840x2160    52.53%    3.02%            3.16%
1920x1080    49.63%    2.30%            2.46%
960x540      36.65%    0.83%            1.48%
Average      46.27%    2.05%            2.36%
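The BD-Rate columns report the average bitrate change at equal quality (Bjøntegaard delta rate); a positive value means FaRes-ML needs slightly more bitrate than the baseline encoder for the same PSNR or VMAF. The sketch below is a generic, self-contained version of the classic procedure (cubic fit of log-bitrate as a function of quality, averaged over the overlapping quality range), not the evaluation script used in the paper; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares cubic fit ys ~ c0 + c1*x + c2*x^2 + c3*x^3 via normal equations."""
    n = 4
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return coef


def _integral(coef: Sequence[float], lo: float, hi: float) -> float:
    """Definite integral over [lo, hi] of the polynomial with coefficients `coef`."""
    def antiderivative(x: float) -> float:
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coef))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(rate_ref, quality_ref, rate_test, quality_test) -> float:
    """Average relative bitrate change of `test` vs `ref` at equal quality.

    Returns e.g. 0.02 for +2% (test needs more bitrate); needs >= 4 points per curve.
    """
    # Rescale quality to about [-1, 1] so the fit is well conditioned; the
    # average over an interval is unchanged by this linear substitution.
    allq = list(quality_ref) + list(quality_test)
    mid, half = (max(allq) + min(allq)) / 2, (max(allq) - min(allq)) / 2

    def norm(qs):
        return [(q - mid) / half for q in qs]

    qr, qt = norm(quality_ref), norm(quality_test)
    p_ref = _fit_cubic(qr, [math.log(r) for r in rate_ref])
    p_test = _fit_cubic(qt, [math.log(r) for r in rate_test])
    lo, hi = max(min(qr), min(qt)), min(max(qr), max(qt))  # overlapping range
    avg_log_diff = (_integral(p_test, lo, hi) - _integral(p_ref, lo, hi)) / (hi - lo)
    return math.exp(avg_log_diff) - 1


rates = [1000, 2000, 4000, 8000]
psnr = [32.0, 35.0, 38.0, 41.0]
print(f"{bd_rate(rates, psnr, [1.1 * r for r in rates], psnr):.1%}")  # 10.0%
```

A curve that needs 10% more bitrate at every quality point yields exactly +10%, which is a convenient sanity check for any BD-Rate implementation.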

Discussion
FaME-ML (P1)
• Fast multi-rate encoding scheme guided by a CNN
• 49% speed-up in parallel encoding with a 0.9% BD-Rate increase
FaRes-ML (P2)
• Fast multi-resolution encoding scheme guided by a CNN
• 47% speed-up in overall encoding and 28% in parallel encoding with a 2% BD-Rate increase

Motivation
70% of YouTube watch time is from mobile devices [1]
26% of smartphone users encounter video streaming problems every day [2]
[1] “YouTube by the Numbers: Stats, Demographics & Fun Facts”, Omnicore
[2] “Experience Shapes Mobile Customer Loyalty”, Ericsson

Motivation
ML execution score (https://ai-benchmark.com/ranking.html. Accessed: 02 February 2023):
Galaxy S10 (2019)   170
Galaxy S20 (2020)   220
Galaxy S21 (2021)   655
Galaxy S22 (2022)   1295

Literature Review
Super-resolution networks are designed for powerful hardware
Mobile solutions for HAS are limited

Proposed Method
P4: SR-ABR
• ABR algorithm for SR application
• Lightweight SR network for fast execution on mobile devices
P6: LiDeR
• Lightweight SR network
• Real-time video processing on mobile devices

Proposed Method
P4
Super-resolution based bitrate adaptation for HTTP
adaptive streaming for mobile devices
Nguyen, M., Çetinkaya, E., Hellwagner, H., & Timmerer, C.
2022 - ACM Mile-High Video Conference

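Proposed Method
SR-ABR (P4)
Figure: the ABR logic uses buffer, network, and device information to choose between high-resolution (HR) and low-resolution (LR) segments; SR upscaling factors X2, X3, and X4 are considered.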

Proposed Method
SR-ABR (P4)
Cost terms: throughput cost, buffer cost, quality cost, SR-quality cost
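The slide names four cost terms but not their formulas. Assuming the client scores each candidate segment with a weighted sum of these terms and picks the cheapest, a toy version could look like the sketch below; every formula, weight, and name in it is a hypothetical illustration, not the SR-ABR definition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """Hypothetical description of one segment the client could request."""
    bitrate_kbps: float  # encoded bitrate of the segment
    quality: float       # quality as delivered, e.g. VMAF on a 0-100 scale
    sr_quality: float    # quality after on-device super-resolution (0-100)
    use_sr: bool         # whether the client would upscale this segment


def cost_terms(c: Candidate, throughput_kbps: float, buffer_s: float,
               segment_s: float) -> Dict[str, float]:
    """Toy versions of the four cost terms named on the slide."""
    download_s = c.bitrate_kbps * segment_s / throughput_kbps
    displayed = c.sr_quality if c.use_sr else c.quality
    return {
        # bitrate above the measured throughput, relative to the throughput
        "throughput": max(0.0, c.bitrate_kbps - throughput_kbps) / throughput_kbps,
        # download time exceeding the buffered playback time (stall risk)
        "buffer": max(0.0, download_s - buffer_s) / segment_s,
        # gap between delivered quality and the top of the scale
        "quality": (100.0 - c.quality) / 100.0,
        # gap between displayed (possibly upscaled) quality and the top of the scale
        "sr_quality": (100.0 - displayed) / 100.0,
    }


def choose(candidates: List[Candidate], throughput_kbps: float, buffer_s: float,
           segment_s: float, weights: Dict[str, float]) -> Candidate:
    """Pick the candidate with the lowest weighted total cost."""
    def total(c: Candidate) -> float:
        terms = cost_terms(c, throughput_kbps, buffer_s, segment_s)
        return sum(weights[name] * value for name, value in terms.items())
    return min(candidates, key=total)


weights = {"throughput": 1.0, "buffer": 1.0, "quality": 1.0, "sr_quality": 1.0}
candidates = [
    Candidate(4000, quality=95, sr_quality=95, use_sr=False),  # HR segment
    Candidate(1500, quality=70, sr_quality=88, use_sr=True),   # LR segment + SR
    Candidate(1500, quality=70, sr_quality=70, use_sr=False),  # LR segment, no SR
]
best = choose(candidates, throughput_kbps=2000, buffer_s=10, segment_s=4, weights=weights)
print(best.bitrate_kbps, best.use_sr)  # 1500 True
```

With these made-up numbers the HR segment is penalized for exceeding the throughput, and the SR-quality term lets the upscaled LR segment beat the plain LR segment at the same bitrate.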

Proposed Method
P6
LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices
Çetinkaya, E., Nguyen, M., & Timmerer, C.
2022 - IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop

Proposed Method
Figure: LiDeR architecture: LR frames → DenseRes → DenseRes → convolution → ClipReLU → pixel shuffle → HR frames
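Pixel shuffle (sub-pixel convolution) turns low-resolution feature maps into a larger frame: C·r² channels of size H×W are rearranged into C channels of size rH×rW. A pure-Python sketch using the same channel ordering as PyTorch's `nn.PixelShuffle`, together with a ClipReLU that, as an assumption, clips activations to [0, 1] (the slide does not state the clipping range); the function names are illustrative.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def clip_relu(t: Tensor3, max_value: float = 1.0) -> Tensor3:
    """ReLU clipped to [0, max_value], applied element-wise (upper bound assumed)."""
    return [[[min(max(v, 0.0), max_value) for v in row] for row in ch] for ch in t]


def pixel_shuffle(t: Tensor3, r: int) -> Tensor3:
    """Rearrange C*r*r channels of size HxW into C channels of size (H*r)x(W*r)."""
    cin, h, w = len(t), len(t[0]), len(t[0][0])
    if cin % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    cout = cin // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(cout)]
    for c in range(cout):
        for i in range(r):          # vertical sub-pixel offset
            for j in range(r):      # horizontal sub-pixel offset
                src = t[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


print(pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2))  # [[[0, 1], [2, 3]]]
print(clip_relu([[[-0.5, 0.25, 1.5]]]))               # [[[0.0, 0.25, 1.0]]]
```

Because pixel shuffle only moves values, all upscaling work happens in the preceding convolutions, which keeps the final layers cheap on mobile hardware.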

Proposed Method
Figure: DenseRes block: convolution → ReLU, followed by three convolution → ReLU → add stages, followed by a 1x1 convolution

Results
SR-ABR (P4)
Execution speed (FPS):
            X2    X3    X4
SR-ABR      24    30    36
CARN-M      5     9     14
VMAF:
            X2      X3      X4
SR-ABR      90.93   52.83   39.00
CARN-M      91.13   54.11   41.56
Bilinear    82.10   42.91   24.32
CARN-M = Ahn, N., Kang, B., & Sohn, K. A. (2018). Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 252-268).

Results
SR-ABR (P4)
Method   Average bitrate (kbps)   QoE score (ITU-T P.1203)   VMAF per kbps
BBA-0    3098                     3.54                       0.029
EP       1818                     4.051                      0.045
SQUAD    2670                     3.35                       0.032
SR-ABR   1738                     4.09                       0.049
BBA-0 = T.-Y. Huang, R. Johari, N. McKeown, M. Trunnell, and M. Watson. A buffer-based approach to rate adaptation: Evidence from a large video streaming service. In ACM SIGCOMM Computer Communication Review, volume 44, pages 187–198. ACM, 2014.
EP = https://github.com/google/ExoPlayer
SQUAD = C. Wang, A. Rizk, and M. Zink. SQUAD: A spectrum-based quality adaptation for dynamic adaptive streaming over HTTP. In Proceedings of the 7th International Conference on Multimedia Systems, pages 1–12, 2016.

Results
LiDeR (P6)
Execution speed (FPS):
Method   X4 (180p–720p)   X2 (360p–720p)
LiDeR    139              60
FSRCNN   45               14
ESPCN    52               17
EDSR     4                2
FSRCNN = Dong, Chao, Chen Change Loy, and Xiaoou Tang. "Accelerating the super-resolution convolutional neural network." European conference on computer vision. Springer, Cham, 2016.
ESPCN = Shi, Wenzhe, et al. "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
EDSR = Lim, Bee, et al. "Enhanced deep residual networks for single image super-resolution." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017.

Results
LiDeR (P6)
           PSNR (dB)                         SSIM
Method     X4 (180p–720p)  X2 (360p–720p)    X4 (180p–720p)  X2 (360p–720p)
Bilinear   25.77           29.74             0.699           0.862
SR-ABR     27.51           33.05             0.769           0.928
FSRCNN     27.80           33.47             0.776           0.933
ESPCN      27.85           33.41             0.778           0.934
EDSR       27.90           34.08             0.782           0.939

Discussion
SR-ABR (P4)
• ABR algorithm that leverages SR to improve quality
• Up to 44% bandwidth reduction while providing 15% higher QoE (compared to BBA-0)
• SR network for mobile devices
• Up to 4.8x speed-up (compared to CARN-M) while improving VMAF by 60% (compared to bilinear upscaling)
LiDeR (P6)
• Lightweight neural network for mobile video super-resolution
• Real-time (> 60 FPS) execution on mobile devices
• On-par visual quality with state-of-the-art SR networks
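The SR-ABR percentages in this discussion can be traced back to the result slides, assuming the "up to" values compare SR-ABR against BBA-0 for bandwidth and QoE, against CARN-M for speed, and against bilinear upscaling for VMAF, with the best case taken over the X2/X3/X4 factors. The variable names are illustrative.

```python
# Values read from the SR-ABR result slides (P4); lists are ordered X2, X3, X4
fps = {"SR-ABR": [24, 30, 36], "CARN-M": [5, 9, 14]}
vmaf = {"SR-ABR": [90.93, 52.83, 39.0], "Bilinear": [82.1, 42.91, 24.32]}
avg_bitrate_kbps = {"BBA-0": 3098, "SR-ABR": 1738}
qoe_p1203 = {"BBA-0": 3.54, "SR-ABR": 4.09}

# "Up to" = best case over the scaling factors
speedup = max(a / b for a, b in zip(fps["SR-ABR"], fps["CARN-M"]))
vmaf_gain = max(a / b - 1 for a, b in zip(vmaf["SR-ABR"], vmaf["Bilinear"]))
bandwidth_saving = 1 - avg_bitrate_kbps["SR-ABR"] / avg_bitrate_kbps["BBA-0"]
qoe_gain = qoe_p1203["SR-ABR"] / qoe_p1203["BBA-0"] - 1

print(f"speed-up vs CARN-M:        {speedup:.1f}x")           # 4.8x
print(f"VMAF gain vs bilinear:     {vmaf_gain:.1%}")          # 60.4%
print(f"bandwidth saving vs BBA-0: {bandwidth_saving:.1%}")   # 43.9%
print(f"QoE gain vs BBA-0:         {qoe_gain:.1%}")           # 15.5%
```

Both best cases come from the X2 factor for speed and the X4 factor for VMAF, so the two headline numbers do not describe the same operating point.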

Conclusion
P1 FaME-ML: fast multi-rate encoding scheme guided by a CNN
P2 FaRes-ML: fast multi-resolution encoding scheme guided by a CNN
P4 SR-ABR: ABR algorithm that leverages SR to improve quality
P6 LiDeR: lightweight neural network for mobile video super-resolution

Conclusion
P1 FaME-ML: 49% speed-up in parallel encoding with a 0.9% BD-Rate increase
P2 FaRes-ML: 47% speed-up in overall encoding and 28% in parallel encoding, with a 2% BD-Rate increase
P4 SR-ABR: up to 44% bandwidth reduction while providing 15% higher QoE
P6 LiDeR: real-time (> 60 FPS) execution on mobile devices
