The visual computing world is moving into an era of ultra HD (UHD) and wide-color-gamut (WCG) deep color. The new Gen9 graphics engine in 6th generation Intel® Core™ processors is positioned as the developer's platform of choice for creating visual excellence in 4K and deep color. Gen9 processor graphics offers high-quality, low-power video scaling solutions that handle UHD and WCG. First, we introduce a hardware fixed-function scaler inside the new SFC (scaling and format conversion) module that provides high-quality scaling on low-power platforms. Second, we present a super-resolution scaling solution based on a convolutional neural network that can be implemented in OpenCL™ running on the execution units (EUs). We discuss the merits of each solution in different user environments.
6. UHD End-to-End Support in Gen9 Intel® Processor Graphics
UHD Decode, Encode, Display
UHD Content
UHD Display
UHD Capture
UHD Video Scaling Support
• Upscale from HD to UHD
• Downscale from UHD to HD
DisplayPort* (DP), Embedded DisplayPort* (eDP), Miracast* and other names and brands may be claimed as the property of others
* GPU Accelerated; Media Codec support may not be available on all operating systems and applications.
7. Why Is UHD Scaling Different?
SD to HD Scaling
• Pixel Resolution from 720x480 to 1920x1080
• Aspect Ratio from 4:3 to 16:9
• SD Video in Low Quality, often requiring De-interlace, De-noise, De-blocking, Sharpening, etc.
• 345,600 pixels to 2,073,600 pixels
FHD to 4K UHD Scaling
• Pixel Resolution from 1920x1080 to 3840x2160
• Aspect Ratio stays at 16:9
• FHD Video already in High Quality with Crisp Details
• 2,073,600 pixels to 8,294,400 pixels
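The pixel counts above can be checked with a few lines of arithmetic. This is a minimal Python sketch; the names `RESOLUTIONS`, `pixel_count`, and `growth` are illustrative and not from the talk.

```python
# Resolutions quoted on the slide, as (width, height).
RESOLUTIONS = {
    "SD": (720, 480),
    "FHD": (1920, 1080),
    "UHD": (3840, 2160),
}


def pixel_count(name):
    """Total number of pixels in one frame of the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def growth(src, dst):
    """Number of output pixels the scaler must produce per input pixel."""
    return pixel_count(dst) / pixel_count(src)


for src, dst in (("SD", "FHD"), ("FHD", "UHD")):
    print(f"{src} -> {dst}: {pixel_count(src):,} -> {pixel_count(dst):,} "
          f"pixels ({growth(src, dst):g}x)")
```

SD to FHD multiplies the pixel count by 6 while also changing the aspect ratio; FHD to UHD is a clean 4x (2x per axis) at the same aspect ratio.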
9. 9
[Figure: Gen9 block diagram showing Unslice, Geometry, Slice Common, Subslice, and FF Media in Unslice]
• 6th Generation Intel Core Processor Graphics on 14nm Process
• Support of Latest APIs
o DirectX* 12/11.3
o OpenCL 2.0
o OpenGL* 4.4
• Scalable uArch Partitioning similar to 5th Generation Intel® Core™ Architecture
o Unslice, Slice, Subslice, etc.
• Improved Design for Better Energy Efficiency
• Flexible and Finer-grain Power Management
* Other names and brands may be claimed as the property of others
10. Multi-Format Codec (MFX)
• HEVC Decode
• HEVC Encode
• HEVC 10bit Decode (GPU Accelerated)
• JPEG / MJPEG Decode
• JPEG / MJPEG Encode
• MPEG2 Decode and Encode
• AVC Decode and Encode
• VP8 Decode and Encode
11. Video Quality Engine (VQE)
• Video Processing and Enhancement
• 16bit per channel processing pipe
• RAW image processing pipe
• De-noise
• De-interlace
• Contrast/Saturation Enhancement
• Skin-tone Detection and Enhancement
• Color Space Conversion (BT2020)
• Color Correction
12. Scaler and Format Conversion (SFC)
• Dedicated Media FF HW
• Advanced Video Scaler (AVS)
• Sharpness Enhancement
• Color Space Conversion
• Chroma Sampling
• Rotation and other Format Conversions
Media Sampler
• Video Motion Estimation (VME)
• Advanced Video Scaler (AVS)
• Sharpness Enhancement
13. SFC (Scaler and Format Converter)
Low-Power UHD Video Playback
• New SFC HW pipe is added to deliver Ultra Low Power media playback experience
• SFC is connected inline (without memory read/write) to MFX (video decode) and VQE (video processing)
14. SFC AVS Example #1
Video Decode → Scaling → Display (or Encode)
GEN8 without SFC: MFX (Video Decode) → Media Sampler (AVS) → VQE (Video Enhancement) → MFX (Video Encode)
GEN9 with SFC: MFX (Video Decode) → SFC AVS as VD-SFC (Video Decode SFC) → VQE (Video Enhancement) → MFX (Video Encode)
[Figure: the pipeline diagram marks three memory read/write transfers between stages]
15. SFC AVS Example #2
Video Quality Enhancement → Scaling → Display (or Encode)
GEN8 without SFC: MFX (Video Decode) → VQE (Video Enhancement) → Media Sampler (AVS) → MFX (Video Encode)
GEN9 with SFC: MFX (Video Decode) → VQE (Video Enhancement) → SFC AVS as VE-SFC (Video Enhance SFC) → MFX (Video Encode)
[Figure: the pipeline diagram marks three memory read/write transfers between stages]
16. SFC (Scaler and Format Converter)
Low-Power UHD Video Playback
SFC pipeline delivers many benefits:
• Inline Connection: Reduced bandwidth and power consumption
• SFC handles scaling, detail enhancement, color space conversion, and other format conversion on the fly
• 12bit Data Path ready for Ultra-HD (UHD), High Dynamic Range (HDR), Wide Color Gamut (WCG)
• Frees up EU resources (slice/subslice) from media use cases, so they can be power-gated when not used
• SFC can process UHD Video (3840x2160 @ 60fps) operating at power-efficient low-frequency mode
17. AVS (Advanced Video Scaler) in SFC
AVS is a low-power fixed-function hardware block in SFC
• Real-time video scaling in a 12 bits per channel data path
• Consists of a pair of spatial filters, Sharp Filter and Smooth Filter
Adaptive Mode
• The results of the two filters are alpha-blended to generate the output pixel value
• The alpha blending factor, α, is computed for each pixel from neighboring pixels
[Figure: Input Pixel → Sharp Filter and Smooth Filter, combined (+) using the Blending Factor from the Blending Factor Computation → Output Pixel]
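A minimal Python sketch of per-pixel alpha blending between the two filter outputs. The talk does not disclose how the hardware computes the blending factor, so `blending_factor` below is a purely hypothetical placeholder (local contrast with made-up thresholds `low` and `high`), and the convention that alpha weights the sharp filter is also an assumption.

```python
def clamp01(value):
    """Clamp a number to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def blending_factor(neighbors, low=32.0, high=128.0):
    """Hypothetical alpha in [0, 1]; NOT the hardware algorithm.

    Assumption of this sketch: a large local contrast hints at a hard
    graphics edge where a sharp filter may ring, so the weight moves
    toward the smooth filter as contrast rises from `low` to `high`.
    """
    contrast = max(neighbors) - min(neighbors)
    return 1.0 - clamp01((contrast - low) / (high - low))


def adaptive_pixel(sharp, smooth, neighbors):
    """Alpha-blend the sharp and smooth filter outputs for one pixel."""
    alpha = blending_factor(neighbors)
    return alpha * sharp + (1.0 - alpha) * smooth


# Low-contrast (natural-looking) neighborhood: output follows the sharp filter.
print(adaptive_pixel(sharp=120.0, smooth=100.0, neighbors=[100, 104, 98]))
# Hard black/white edge: output follows the smooth filter.
print(adaptive_pixel(sharp=120.0, smooth=100.0, neighbors=[0, 255]))
```

The blend itself is the part stated on the slide; any real blending-factor rule would replace the placeholder function.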
18. AVS Smooth Filter
Reference Ground Truth (1440x960) Smooth Filter (720x480 to 1440x960)
** Blurrier than Reference Ground Truth **
19. AVS Sharp Filter
Reference Ground Truth (1440x960) Sharp Filter (720x480 to 1440x960)
** Similar to Reference Ground Truth **
20. AVS Sharper Filter
Reference Ground Truth (1440x960) Sharper Filter (720x480 to 1440x960)
** Sharper than Reference Ground Truth **
visual artifact
22. Adaptive Mode in AVS
Sharp Filter
• Sharp and Crisp Output on Natural Scenes
• Ringing on Computer Graphics
Smooth Filter
• Blurrier Output on Natural Scenes
• Ringing-free Output on Computer Graphics
Adaptive Mode
• Best of Both Filters possible based on Per-Pixel Adjustment
• Sharp Output on Natural Scenes
• Ringing-free Output on Computer Graphics
27. Media Scaler Interface
Interface | Video Scaler
Intel® Media Server Studio SDK (https://software.intel.com/en-us/media-sdk)
• Microsoft Windows* DXVA | SFC AVS (default)
• LibVA (Android/Linux) | SFC AVS (default)
macOS* | SFC and AVS
• Application SW specifies input/output formats:
o conf.vpp.In.Width, Height, CropX, CropY, CropW, CropH
o conf.vpp.Out.Width, Height, CropX, CropY, CropW, CropH
• MSDK then configures the video processing pipeline accordingly
* Other names and brands may be claimed as the property of others
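To make the input/output description concrete, here is a small Python sketch that only mirrors the field names listed on the slide (Width, Height, CropX, CropY, CropW, CropH). It is not the Media SDK API: the classes `FrameInfo` and `VppConfig`, the full-frame crop default, and `scale_factors` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    """Hypothetical mirror of the per-frame fields named on the slide."""
    Width: int
    Height: int
    CropX: int = 0
    CropY: int = 0
    CropW: int = 0
    CropH: int = 0

    def __post_init__(self):
        # Assumption of this sketch: an unset crop means the full frame.
        if self.CropW == 0:
            self.CropW = self.Width
        if self.CropH == 0:
            self.CropH = self.Height


@dataclass
class VppConfig:
    """Hypothetical stand-in for the conf.vpp.In / conf.vpp.Out pair."""
    In: FrameInfo
    Out: FrameInfo

    def scale_factors(self):
        """Horizontal and vertical scale implied by the crop rectangles."""
        return (self.Out.CropW / self.In.CropW,
                self.Out.CropH / self.In.CropH)


conf = VppConfig(In=FrameInfo(1920, 1080), Out=FrameInfo(3840, 2160))
print(conf.scale_factors())
```

The point of the sketch is that the application only describes the source and destination rectangles; choosing the scaler that performs the conversion is left to the pipeline.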
32. Neuron
A neuron
• Is a nerve cell in brains, spinal cords, etc.
• Processes and transmits data through electrical/chemical signals
• Can give rise to multiple dendrites, but not more than one axon
• Signals travel from the axon of one neuron to a dendrite of another (with many exceptions to these rules) via a synapse
• Neurons connect to each other to form neural networks
• A human brain contains about 100 billion neurons
• Each has 5K~100K synaptic connections to other neurons
[Figure: neuron diagram labeled with input signals, dendrites, cell body, nucleus, axon, axon terminals, and output signal]
33. Artificial Neuron
• A Neuron has a single Axon and multiple Dendrites
o Dendrites receive incoming electrical signals
o Electrical signal is sent out from an Axon to Dendrites
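$f = b + \sum_{i=0}^{n} w_i x_i$ and $out = \begin{cases} 0 & \text{if } f < 0 \\ 1 & \text{if } f \ge 0 \end{cases}$
[Figure: artificial neuron with inputs x0, x1, ..., xn, weights w0, w1, ..., wn, and bias b feeding a summation node Σ that produces f and then out, shown beside the biological neuron diagram (dendrites, cell body, nucleus, axon, axon terminals)]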
34. Artificial Neuron – what does it do?
NAND gate is universal for computation: any logic can be built up out of NAND gates
NAND Gate
x0 | x1 | x0 AND x1 | x0 NAND x1
0 | 0 | 0 | 1
0 | 1 | 0 | 1
1 | 0 | 0 | 1
1 | 1 | 1 | 0
An artificial neuron (perceptron with 2 inputs) can implement a NAND gate:
• input = (x0, x1)
• weights = (w0, w1) = (-2, -2)
• bias b = 3
• $f = b + \sum_{i=0}^{n} w_i x_i$ and $out = \begin{cases} 0 & \text{if } f < 0 \\ 1 & \text{if } f \ge 0 \end{cases}$
Artificial Neuron
x0 | x1 | f | out
0 | 0 | 3 | 1
0 | 1 | 1 | 1
1 | 0 | 1 | 1
1 | 1 | -1 | 0
[Figure: two-input artificial neuron with inputs x0, x1, weights w0, w1, and bias b feeding a summation node Σ that produces f and then out]
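The NAND perceptron can be reproduced directly from the weights and bias on the slide. This is a minimal Python sketch; the function names are illustrative, and `and_gate` is an added example of building other logic from NAND.

```python
def perceptron(inputs, weights, bias):
    """Return (f, out) with f = bias + sum(w_i * x_i) and a step at f >= 0."""
    f = bias + sum(w * x for w, x in zip(weights, inputs))
    return f, (1 if f >= 0 else 0)


def nand(x0, x1):
    """Two-input NAND using the slide's weights (-2, -2) and bias 3."""
    return perceptron((x0, x1), (-2, -2), 3)[1]


def and_gate(x0, x1):
    """AND built only from NAND neurons: NAND(n, n) inverts n = NAND(x0, x1)."""
    n = nand(x0, x1)
    return nand(n, n)


for x0 in (0, 1):
    for x1 in (0, 1):
        f, out = perceptron((x0, x1), (-2, -2), 3)
        print(x0, x1, f, out, and_gate(x0, x1))
```

Each printed row lists x0, x1, f, the NAND output, and the AND output obtained by chaining two NAND neurons.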
39. Super-Resolution
Super-Resolution
• The term has been used by many to mean many different things over the years
• We will define what we mean by it in this talk, and then move on
Super-Resolution as Upscaling
• Input = Low-resolution Image (e.g., 1920x1080 RGB picture)
• Output = High-resolution Image (e.g., 3840x2160 RGB picture)
• Super-Resolution Requirements:
o Use a single input image to generate a single output image, i.e., Single-frame (Spatial) SR
o Output image quality is better than traditional scalers based on interpolation (bilinear, bicubic, etc.)
o No visual artifacts are introduced by SR upscaling
40. Publications on CNN-based SR
SCN from University of Illinois – Urbana Champaign
1. Image Super-Resolution via Sparse Representation, Huang et al., TIP 2010
2. Coupled Dictionary Training for Image Super-Resolution, Huang et al., TIP 2012
3. Deep Networks for Image Super-Resolution with Sparse Prior, Huang et al., ICCV 2015
4. Self-Tuned Deep Super Resolution, Huang et al., CVPR 2015
5. Robust Single Image Super-Resolution via Deep Networks with Sparse Prior, Huang et al., TIP 2016
SRCNN from The Chinese University of Hong Kong
1. Learning a deep convolutional network for image super-resolution, Tang et al., ECCV 2014
2. Image Super-Resolution using Deep Convolutional Networks, Tang et al., TPAMI 2016
DRCN from Seoul National University
1. Deeply-Recursive Convolutional Network for Image Super-Resolution, Kim et al., CVPR 2016
2. Accurate Image Super-Resolution using Very Deep Convolutional Networks, Kim et al., CVPR 2016
Technische Universität München, Image Super-Resolution with Fast Approximate Convolutional Sparse Coding, Smagt et al., ICONIP 2014
Huaqiao University, Deep Network Cascade for Image Super-Resolution, Chen et al., ECCV 2014
[Callout: compared to all SFSR (CNN-based or not) solutions]
42. From Sparse Coding to CNN-based SR
[Figure: roadmap linking Neuron to CNN, Scaling to Super Resolution, and Sparse Coding to Sparse Coding Super Resolution, converging on CNN-based SR (Sparse Coding Deep Network)]
43. Sparse Coding
• Reconstruct the input signal x as a linear combination of basis vectors of a Dictionary D with sparse coefficients α
o $x = D \cdot \alpha$
• where x is an n x 1 input vector,
D is an n x m matrix, an overcomplete (m > n) Dictionary with m basis vectors, and
α is an m x 1 sparse code vector
• Sparse = most of the sparse code coefficients in α are zero, i.e., α is a sparse representation of x
• The optimal sparse code is obtained as $\alpha = \arg\min_z E(x, z)$, where $E(x, z) = \frac{1}{2}\lVert x - Dz \rVert_2^2 + \lambda \lVert z \rVert_1$
Encoder
• Dictionary D
• ISTA/CoD (iterative)
• LISTA/LCoD (approximate)
[Figure: Input Vector x → Encoder → Sparse Code α]
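The iterative ISTA encoder named above can be sketched in plain Python. This is a minimal, unoptimized illustration of the textbook ISTA update for the objective 1/2·||x − Dz||² plus an L1 penalty; the L1 weight `lam`, the iteration count, and the step-size bound are assumptions of this sketch rather than settings from the talk.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(x, D, lam, iterations=200):
    """Sparse code z minimizing 0.5 * ||x - D z||^2 + lam * ||z||_1.

    x is a list of n numbers and D is an n x m list of rows.
    """
    n, m = len(D), len(D[0])
    # Step size 1/L: the squared Frobenius norm of D is an upper bound on
    # the largest eigenvalue of D^T D, so it is a safe (if slow) choice.
    L = sum(d * d for row in D for d in row)
    z = [0.0] * m
    for _ in range(iterations):
        residual = [x[i] - sum(D[i][j] * z[j] for j in range(m))
                    for i in range(n)]
        # Gradient step on the quadratic term, then shrinkage.
        stepped = [z[j] + sum(D[i][j] * residual[i] for i in range(n)) / L
                   for j in range(m)]
        z = [soft_threshold(s, lam / L) for s in stepped]
    return z


# Overcomplete 2 x 3 dictionary (m > n), chosen only for illustration.
D = [[1.0, 0.0, 0.7071],
     [0.0, 1.0, 0.7071]]
code = ista([3.0, 0.05], D, lam=0.1)
print(code)
```

LISTA replaces this fixed loop with a small number of learned layers that approximate the same mapping, which is what makes the encoder practical inside a network.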
44. Sparse Coding Super-Resolution
Super-Resolution Reconstruction
• $y = D_y \cdot \alpha_y$, $\alpha_y = \alpha_x$, $D_x \cdot \alpha_x = x$
[Figure: 3x3 LR Image Patch y → Sparse Code Encoder → LR Sparse Representation α_y = HR Sparse Representation α_x → Linear Combination of Dictionary Elements → 9x9 HR Image Patch x]
• Overcomplete LR Dictionary D_y (m = 1024) and Overcomplete HR Dictionary D_x (m = 1024)
• Joint Dictionary Training: Iterative Optimization using 100,000 random image patch pairs
45. SCN (Sparse Coding based Network)
Sparse Coding Super-Resolution → Deep Network Super-Resolution
1. Layer #1 (Convolutional Layer H): image patch/feature y is extracted from the LR image $I_y$ with $m_y$ filters
2. Layer #2 and #3 (Sparse Code Encoder as k iterations of a LISTA network): sparse code α is computed from y
3. Layer #4 (Reconstruction): sparse code α is multiplied with HR Dictionary $D_x$ to reconstruct HR image patch x
4. Layer #5 (Convolutional Layer G): all HR patches x are combined into HR image $I_x$
[Figure legend: $I_y$ LR Image, y LR Image Patch, α Sparse Code (output of the Sparse Code Encoder), x HR Image Patch, $I_x$ HR Image]
Fig. 2 from “Robust Single Image Super-Resolution via Deep Networks with Sparse Prior”, IEEE Transactions on Image Processing, Vol. 25. Issue 7, pp 3194-3207, 2016
46. SCN: 5-Layer Deep Network for Super-Resolution
Deep Network Architecture
• 2 Convolutional Layers (H and G) and 3 Layers for Sparse Coding Encoder
• All parameters trained via back-propagation using MSE cost function
• Network learns more complex function beyond the sparse coding model
• Performs better than sparse coding results even with dictionary size reduced from 1024 to 128
Advantages of SCN
• LISTA sub-network to enforce sparse representation, i.e., better interpretation of filter responses and parameter initialization based on domain knowledge in sparse coding
• Better SR results, faster training speed and smaller model size
Subjective Quality Assessment
• Best Visual Quality against other SFSR solutions (sharper boundaries, richer textures, no ringing)
• Scale ratio is fixed for the network → Use a cascade of multiple SCNs + bicubic downscaler
• Cascade of multiple networks is better than a single network trained with a large scale factor
48. Table of Contents
[Figure: talk roadmap. Gen9 Intel® Processor Graphics (SFC Media HW FF, Advanced Video Scaler in SFC) and Convolutional Neural Network (Super-Resolution Scaling using CNN) are compared for Super-Resolution Scaling by PSNR, MSE, and Visual Inspection]
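The MSE and PSNR metrics named on this slide follow their standard definitions. Below is a minimal Python sketch over flat lists of pixel values; the 8-bit peak of 255 is an assumption (a 12-bit path would use 4095), and the talk does not state which color channels were measured.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / error)


ground_truth = [52, 55, 61, 59]
upscaled = [54, 55, 60, 57]
print(mse(ground_truth, upscaled), psnr(ground_truth, upscaled))
```

Higher PSNR means the upscaled output is numerically closer to the ground truth, which is why the slide pairs these metrics with visual inspection rather than relying on them alone.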
49. Capturing LR and HR Test Images
1. Camera Capture
• LR: Camera Capture in FHD Mode at 1936x1288, then cropped to 720x480
• HR: Camera Capture in UHD Mode at 3888x2592, then cropped to 1440x960
2. Optical Scanner
• LR: Scan a letter-size printed document in 300dpi Mode at 2478x3228, then cropped to 720x480
• HR: Scan the same printed document in 600dpi Mode at 4956x6456, then cropped to 1440x960
3. Screen Capture (www.intel.com)
• LR: Screen Capture of Intel Website at 100% Zoom, then cropped to 720x480
• HR: Screen Capture of the same Intel Website at 200% Zoom, then cropped to 1440x960
Test Image #1 Test Image #2 Test Image #3
50. SR Test Scenarios
Scaling Solutions
• SFC AVS: Gen9 Intel® Processor Graphics Media HW FF SFC AVS in SW Simulation
• SCN: Sparse-Coding Network (SCN), the CNN-based SR from Huang et al.; MATLAB code and network parameters are available at http://www.ifp.illinois.edu/~dingliu2/iccv15/
2x Upscaling for 1920x1080 to 3840x2160
• SFC AVS: 2x
• SCN: 2x
4x Upscaling for 1920x1080 to 7680x4320
• SFC AVS: 4x
• SCN: 2x (SCN) → 2x (SCN)
1.3x Upscaling for 1920x1080 to 2560x1440
• SFC AVS: 1.3x
• SCN: 2x (SCN) → 0.65x (MATLAB Bicubic)
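Because the network scales by a fixed factor, an arbitrary ratio is reached by cascading fixed-factor stages and then downscaling the remainder. The Python sketch below plans such a cascade; `cascade_plan` is a hypothetical helper, not code from the talk. For 1920 to 2560 the exact ratio is 4/3, so the computed residual is about 0.667, close to the 0.65x listed on the slide.

```python
def cascade_plan(src_width, dst_width, stage_factor=2):
    """Return (stages, residual) for a fixed-factor upscaler cascade.

    `stages` is how many fixed-factor passes are needed to reach or
    exceed the target; `residual` is the final conventional scale factor
    (1.0 or less) that lands exactly on the target width.
    """
    ratio = dst_width / src_width
    reached = 1
    stages = 0
    while reached < ratio:
        reached *= stage_factor
        stages += 1
    return stages, ratio / reached


for target in (3840, 7680, 2560):
    stages, residual = cascade_plan(1920, target)
    print(f"1920 -> {target}: {stages} x 2x stage(s), "
          f"then {residual:.3f}x downscale")
```

A residual of 1.0 means no bicubic step is needed, which is the case for the 2x and 4x scenarios.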
55. SR Test Results
Upscaling Ratio | Test 1 | Test 2 | Test 3
1.3x | SFC AVS | SFC AVS | SCN
2x | SFC AVS | SFC AVS | SCN
4x | SFC AVS / SCN | SFC AVS | SFC AVS
Overall
• SFC AVS and SCN performed well against the ground truth and quite closely to each other in 3 test examples
• SFC AVS seems to have a slight advantage over SCN on these 3 test examples
But, Why...?
• SCN has not been trained on a wide range of non-natural scenes / computer graphics contents
• Test input images are high-quality LR images, but SCN is trained on very blurry LR input images (Gaussian Blurring + Downsample + Bicubic Upsample)
• Better understanding of CNN architecture, training database, and training strategies is required
56. Summary
1. Gen9 Intel® Processor Graphics (SFC Media HW FF, Advanced Video Scaler in SFC)
• Gen9 Intel® Processor Graphics adds a new HW FF called SFC
• SFC AVS provides a high-quality video scaling solution at low power
• Adaptive mode in AVS combines the benefits of smooth and sharp filters on a per-pixel basis for superior output quality
2. Convolutional Neural Network (Super-Resolution Scaling using CNN)
• Super-Resolution scaling solutions have been developed using the CNN framework and present great potential for high-quality video scaling
3. Compare
• SFC AVS produces very high quality output that is comparable to current state-of-the-art CNN-based SR solutions
• CNN-based SR scaling can be further improved with more intelligent training and architecture in the future
• Use the Gen9 Intel HW FF Scaler for Low-Power, High-Performance, High-Quality UHD 4K60 Scaling
• Use Gen9 Intel® Processor Graphics for CNN-based SR running on OpenCL for enhanced UHD picture quality
61. Acknowledgement
Many thanks go to the following individuals from Intel
• Yi-jen Chiu
• Keith Rowe
• Niranjan S Mulay
• Ping Liu
• Furong Zhang
• Wen-fu Kao
• Vidhya Krishnan
• Sungye Kim
• Charles Lingle, Jon Kennedy and other tech reviewers
• Michaelle Gonzalez, Naomi Pitfield, and the SIGGRAPH Team