Thesis Presentation - Ekrem Çetinkaya
Video is becoming a crucial tool as daily life increasingly centers on visual communication. Demand for better video content keeps rising, from entertainment to business meetings, which makes the delivery of video content to users critically important. HTTP adaptive streaming (HAS), in which the video content adapts to changing network conditions, has become the de facto method for delivering internet video. As video technology advances, it presents a number of challenges, one of which is the large amount of data required to describe a video accurately. Addressing this requires powerful video encoding tools. Historically, these have relied on hand-crafted tools and heuristics. However, with recent advances in machine learning, there has been increasing exploration of these techniques to enhance video coding performance. This thesis proposes eight contributions that enhance video coding performance for HTTP adaptive streaming using machine learning.
Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning
1. Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning
Ekrem Çetinkaya
Supervisor: Univ.-Prof. DI Dr. Christian Timmerer
Co-Supervisor: Assoc.-Prof. DI Dr. Klaus Schöffmann
07.06.2023
Klagenfurt am Wörthersee, Austria
13. Research Questions
RQ-1: How to efficiently provide multi-bitrate representations over a wide range of resolutions for HAS?
RQ-2: How to improve the performance of video codecs using machine learning?
RQ-3: How to improve the visual quality of videos using machine learning?
RQ-4: How to use machine learning to improve perceptual quality assessment for videos?
14. Contributions
P1: Codec improvement for multi-bitrate with machine learning
P2: Codec improvement for multi-resolution with machine learning
P3: Studying codec components to enhance with machine learning
P4: Improve visual quality of HAS videos with machine learning
P5: Platform to evaluate machine learning based quality enhancement
P6: Improve visual quality of videos with machine learning on mobile devices
P7: Improve multi-bitrate light field image coding quality with machine learning
P8: Improve perceptual quality assessment using machine learning
Research questions covered: RQ-1, RQ-2, RQ-3, RQ-4
15. Contributions
P1: FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning (IEEE VCIP, 2020)
P2: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (IEEE OJSP, 2021)
P3: CTU Depth Decision Algorithms for HEVC: A Survey (SP:IC, 2021)
P4: Super-Resolution Based Bitrate Adaptation for HTTP Adaptive Streaming for Mobile Devices (ACM MHV, 2022)
P5: MoViDNN: A Mobile Platform for Evaluating Video Quality Enhancement with Deep Neural Networks (MMM, 2022)
P6: LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices (IEEE IVMSP, 2022)
P7: LFC-SASR: Light Field Coding Using Spatial and Angular Super-Resolution (IEEE ICME, 2022)
P8: BQ-ViT: Blind Visual Quality Assessment Using Vision Transformers (IEEE Access, 2023)
20. Literature Review
[1] Schroeder, Damien, et al. "Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming." IEEE Transactions on Circuits and Systems for Video Technology 28.1 (2016): 143-157.
[2] B. Guo, Y. Han, J. Wen, "Fast Block Structure Determination in AV1-based Multiple Resolutions Video Encoding," in 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, July 2018.
[3] H. Amirpour, E. Çetinkaya, C. Timmerer and M. Ghanbari, "Fast Multi-rate Encoding for Adaptive HTTP Streaming," 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 2020, pp. 358-358.
27. Proposed Method
FaME-ML (P1)
• Multi-rate encoding
• CNN to guide depth decisions
• Focus on parallel encoding
FaRes-ML (P2)
• Multi-resolution encoding
• Extended CNN structure
• More features & in-depth evaluation
28. Proposed Method
P1
FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
Çetinkaya, E., Amirpour, H., Timmerer, C., & Ghanbari, M.
2020 - IEEE International Conference on Visual Communications and Image Processing (VCIP)
29. Proposed Method
FaME-ML (P1)
• CNN-guided solution: utilize reference encoding information
• Speed up dependent representations
• Improve parallel encoding
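To make the idea of guided depth decisions concrete, here is a minimal sketch of a quadtree partition of a 64x64 CTU in which a predictor decides whether to split each block, instead of the encoder testing every depth. The slides do not describe the CNN's inputs or outputs, so `predictor`, `reference_depth`, and the depth values below are hypothetical stand-ins and not the FaME-ML network itself.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a coding unit inside the CTU


def partition_ctu(x: int, y: int, size: int, depth: int,
                  should_split: Callable[[int, int, int, int], bool],
                  min_size: int = 8) -> List[Block]:
    """Recursively partition a CTU, asking a predictor whether to split."""
    if size > min_size and should_split(x, y, size, depth):
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += partition_ctu(x + dx, y + dy, half, depth + 1,
                                        should_split, min_size)
        return blocks
    return [(x, y, size)]


# ASSUMPTION: made-up depths of the co-located 32x32 quadrants in the
# reference encoding; a real system would read these from the encoder.
reference_depth = {(0, 0): 2, (32, 0): 1, (0, 32): 1, (32, 32): 1}


def predictor(x: int, y: int, size: int, depth: int) -> bool:
    """Hypothetical stand-in for the CNN: split only while the reference
    encoding went deeper than the current depth in this quadrant."""
    quadrant = (x // 32 * 32, y // 32 * 32)
    return depth < reference_depth[quadrant]


blocks = partition_ctu(0, 0, 64, 0, predictor)
for block in blocks:
    print(block)
```

With these made-up depths, only the top-left quadrant is split down to 16x16 blocks, so the dependent encoding skips the rate-distortion checks for all other depths.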
36. Proposed Method
P2
FaRes-ML: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning
Çetinkaya, E., Amirpour, H., Timmerer, C., & Ghanbari, M.
2021 - IEEE Open Journal of Signal Processing
37. Proposed Method
FaRes-ML (P2)
• Extended feature set: 2D representation of features
• Improved CNN: utilize more features; feature processing component
• Target both parallel and serial encoding
42. Discussion
FaME-ML (P1)
• Fast multi-rate encoding scheme guided by a CNN
• 49% speed-up in parallel encoding with 0.9% BD-Rate increase
FaRes-ML (P2)
• Fast multi-resolution encoding scheme guided by a CNN
• 47% speed-up in overall encoding and 28% in parallel encoding with 2% BD-Rate increase
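For readers unfamiliar with the BD-Rate figures quoted here, the sketch below shows one common way to compute a Bjøntegaard delta rate from four rate/PSNR points per encoder: interpolate log-bitrate as a cubic in PSNR, average the gap over the shared PSNR range, and convert back to a percentage. It assumes exactly four points per curve, and the sample numbers are invented for illustration; they are not measurements from the thesis.

```python
import math
from typing import Sequence


def _cubic_through(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the cubic passing through four (x, y) points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_value(xs: Sequence[float], ys: Sequence[float],
                lo: float, hi: float) -> float:
    """Mean of the cubic over [lo, hi]; Simpson's rule is exact for cubics."""
    mid = 0.5 * (lo + hi)
    return (_cubic_through(xs, ys, lo)
            + 4.0 * _cubic_through(xs, ys, mid)
            + _cubic_through(xs, ys, hi)) / 6.0


def bd_rate(rate_anchor: Sequence[float], psnr_anchor: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Average bitrate change (%) of the test encoder at equal PSNR."""
    if not (len(rate_anchor) == len(psnr_anchor) == 4
            and len(rate_test) == len(psnr_test) == 4):
        raise ValueError("this sketch expects exactly four points per curve")
    log_anchor = [math.log(r) for r in rate_anchor]
    log_test = [math.log(r) for r in rate_test]
    lo = max(min(psnr_anchor), min(psnr_test))  # shared PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    gap = (_mean_value(psnr_test, log_test, lo, hi)
           - _mean_value(psnr_anchor, log_anchor, lo, hi))
    return (math.exp(gap) - 1.0) * 100.0


# Invented example: the fast encoder needs 0.9% more bits at every quality level.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]   # kbps
psnrs = [32.0, 35.0, 38.0, 41.0]                  # dB
fast_rates = [r * 1.009 for r in anchor_rates]
result = bd_rate(anchor_rates, psnrs, fast_rates, psnrs)
print(f"BD-Rate: {result:.2f}%")
```

A positive value means the faster encoder spends more bits for the same quality, which is the trade-off reported against the encoding speed-up.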
44. Motivation
• 70% of YouTube watch time is from mobile devices [1]
• 26% of smartphone users encounter video streaming problems every day [2]
[1] “YouTube by the Numbers: Stats, Demographics & Fun Facts”, Omnicore.
[2] “Experience Shapes Mobile Customer Loyalty”, Ericsson.
48. Proposed Method
SR-ABR (P4)
• ABR algorithm for SR application
• Lightweight SR network for fast execution on mobile devices
LiDeR (P6)
• Lightweight SR network
• Real-time video processing on mobile devices
49. Proposed Method
P4
Super-Resolution Based Bitrate Adaptation for HTTP Adaptive Streaming for Mobile Devices
Nguyen, M., Çetinkaya, E., Hellwagner, H., & Timmerer, C.
2022 - ACM Mile-High Video Conference
53. Proposed Method
P6
LiDeR: Lightweight Dense Residual Network for Video Super-Resolution on Mobile Devices
Çetinkaya, E., Nguyen, M., & Timmerer, C.
2022 - IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop
56. Results
SR-ABR (P4)
Execution Speed (FPS)
Method     X2      X3      X4
SR-ABR     24      30      36
CARN-M     5       9       14
VMAF
Method     X2      X3      X4
SR-ABR     90.93   52.83   39
CARN-M     91.13   54.11   41.56
Bilinear   82.1    42.91   24.32
CARN-M = Ahn, N., Kang, B., & Sohn, K. A. (2018). Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 252-268).
57. Results
SR-ABR (P4)
Metric                     BBA-0   EP      SQUAD   SR-ABR
Average Bitrate (kbps)     3098    1818    2670    1738
QoE Score (ITU-T P.1203)   3.54    4.051   3.35    4.09
VMAF / Bitrate (1 kbps)    0.029   0.045   0.032   0.049
BBA-0 = T.-Y. Huang, R. Johari, N. McKeown, M. Trunnell, and M. Watson. A buffer-based approach to rate adaptation: Evidence from a large video streaming service. In ACM SIGCOMM Computer Communication Review, volume 44, pages 187-198. ACM, 2014.
EP = https://github.com/google/ExoPlayer
SQUAD = C. Wang, A. Rizk, and M. Zink. SQUAD: A spectrum-based quality adaptation for dynamic adaptive streaming over HTTP. In Proceedings of the 7th International Conference on Multimedia Systems, pages 1-12, 2016.
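As a quick check of how the headline percentages relate to these results, the sketch below recomputes the relative changes of SR-ABR against BBA-0. It assumes the chart values on this slide are listed in legend order (BBA-0, EP, SQUAD, SR-ABR); the helper name `percent_change` is ours.

```python
# Values from the slide, ASSUMING they follow the legend order
# BBA-0, EP, SQUAD, SR-ABR.
avg_bitrate_kbps = {"BBA-0": 3098, "EP": 1818, "SQUAD": 2670, "SR-ABR": 1738}
qoe_score = {"BBA-0": 3.54, "EP": 4.051, "SQUAD": 3.35, "SR-ABR": 4.09}


def percent_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


# Bandwidth reduction is the negative of the bitrate change.
bandwidth_reduction = -percent_change(avg_bitrate_kbps["SR-ABR"],
                                      avg_bitrate_kbps["BBA-0"])
qoe_gain = percent_change(qoe_score["SR-ABR"], qoe_score["BBA-0"])

print(f"Bandwidth reduction vs BBA-0: {bandwidth_reduction:.1f}%")
print(f"QoE gain vs BBA-0: {qoe_gain:.1f}%")
```

Under that ordering assumption, the comparison against BBA-0 gives roughly a 44% lower average bitrate and a QoE score about 15% higher.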
58. Results
LiDeR (P6)
Execution Speed (FPS)
Method   X4 (180P to 720P)   X2 (360P to 720P)
LiDeR    139                 60
FSRCNN   45                  14
ESPCN    52                  17
EDSR     4                   2
FSRCNN = Dong, Chao, Chen Change Loy, and Xiaoou Tang. "Accelerating the super-resolution convolutional neural network." European conference on computer vision. Springer, Cham, 2016.
ESPCN = Shi, Wenzhe, et al. "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
EDSR = Lim, Bee, et al. "Enhanced deep residual networks for single image super-resolution." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017.
59. Results
LiDeR (P6)
PSNR
Method     X4 (180P to 720P)   X2 (360P to 720P)
Bilinear   25.77               29.74
SR-ABR     27.51               33.05
FSRCNN     27.8                33.47
ESPCN      27.85               33.41
EDSR       27.9                34.08
SSIM
Method     X4 (180P to 720P)   X2 (360P to 720P)
Bilinear   0.699               0.862
SR-ABR     0.769               0.928
FSRCNN     0.776               0.933
ESPCN      0.778               0.934
EDSR       0.782               0.939
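For reference, PSNR as reported on this slide is conventionally defined from the mean squared error between a reference frame and its upscaled version. The sketch below uses the standard definition for 8-bit samples; it is a generic illustration, not the evaluation script used in the thesis, and the sample pixel values are invented.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], distorted: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Invented example: two pixels, each off by 10 levels (MSE = 100).
example = psnr([100, 100], [110, 90])
print(f"{example:.2f} dB")
```

Because the scale is logarithmic, a gap of a few tenths of a dB between networks corresponds to a small difference in mean squared error.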
60. Discussion
SR-ABR (P4)
• ABR algorithm that leverages SR to improve quality
• Up to 44% bandwidth reduction while providing 15% higher QoE
• SR network for mobile devices
• Up to 4.8x speed-up while improving VMAF by 60%
LiDeR (P6)
• Lightweight neural network for mobile video super-resolution
• Real-time (> 60 FPS) execution on mobile devices
• On-par visual quality with SoTA SR networks
62. Conclusion
FaME-ML (P1): Fast multi-rate encoding scheme guided by a CNN
FaRes-ML (P2): Fast multi-resolution encoding scheme guided by a CNN
SR-ABR (P4): ABR algorithm that leverages SR to improve quality
LiDeR (P6): Lightweight neural network for mobile video super-resolution
63. Conclusion
FaME-ML (P1): 49% speed-up in parallel encoding with 0.9% BD-Rate increase
FaRes-ML (P2): 47% speed-up in overall encoding and 28% in parallel encoding with 2% BD-Rate increase
SR-ABR (P4): Up to 44% bandwidth reduction while providing 15% higher QoE
LiDeR (P6): Real-time (> 60 FPS) execution on mobile devices