Paralleling Variable Block Size Motion
Estimation of HEVC On
CPU plus GPU Platform
Xiangwen Wang1, Li Song2, Yanan Zhao2, Min Chen1
1Shanghai University of Electric Power
2 Shanghai Jiao Tong University
Outline
Introduction
The proposed parallel VBSME of HEVC for
GPU+CPU
Preliminary results and future work
Introduction
HEVC is the newest video coding standard introduced by ITU-T VCEG and
ISO/IEC MPEG.
Compared with H.264/AVC, HEVC decreases the bitrate by 50%
on average while maintaining the same visual quality.
BQMall_832x480. Left: HEVC 1.5Mbps, right: x264 3.0Mbps
Introduction
However, HEVC encoding is several times more complex than H.264 encoding.
• RDO: iterate over all mode and partition combinations to decide the best coding
information
• RDOQ: iterate over many QP candidates for each block
• Intra: prediction modes increased to 35 for luma
• SAO: works pixel by pixel
• Quadtree structure: bigger block sizes and numerous partition modes
• Some other computationally intensive modules ...
As a result, the traditional method, which performs the encoding
sequentially, can no longer meet real-time requirements, especially
for HD (1920x1080) and UHD (3840x2160) videos.
Parallelism in the encoding procedure must be extensively utilized.
Overview of VBSME in HEVC
• Three independent block concepts
• CU - Coding Unit
• PU - Prediction Unit
• TU - Transform Unit
• The total number of allowed PU
sizes is 12 (from 64x64 to 4x8/8x4)
→ up to 425 ME runs for one
64x64 CTU
(5 + 4x5 + 16x5 + 64x5 = 425)
[Figure: CU size and PU partition structure]
Depth1: 64x64
Depth2: 32x32
Depth3: 16x16
Depth4: 8x8
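The count of 425 can be checked with a short sketch. It assumes that the factor 5 on the slide means five PUs per CU (one 2Nx2N, two 2NxN and two Nx2N); the slide does not spell this out, so treat that reading and the helper names as assumptions.

```python
# Assumption: each CU is tested with 2Nx2N, 2NxN and Nx2N partitions,
# i.e. 1 + 2 + 2 = 5 PUs per CU (our reading of the "5" on the slide).
CTU_SIZE = 64
CU_SIZES = [64, 32, 16, 8]  # depth 1 to depth 4


def pu_sizes(cu):
    """Return the distinct (width, height) PU sizes tested for a cu x cu CU."""
    return [(cu, cu), (cu, cu // 2), (cu // 2, cu)]


def me_calls_per_ctu():
    """Count the ME runs needed for one 64x64 CTU."""
    total = 0
    for cu in CU_SIZES:
        cus_in_ctu = (CTU_SIZE // cu) ** 2  # 1, 4, 16, 64 CUs
        pus_per_cu = 1 + 2 + 2              # one 2Nx2N, two 2NxN, two Nx2N
        total += cus_in_ctu * pus_per_cu
    return total


# Set of distinct PU sizes over all CU depths.
distinct_sizes = {size for cu in CU_SIZES for size in pu_sizes(cu)}

print(me_calls_per_ctu())    # total ME runs per CTU
print(len(distinct_sizes))   # number of distinct PU sizes
```

Under this reading the total matches the slide's 5 + 4x5 + 16x5 + 64x5, and the number of distinct sizes matches the 12 allowed PU sizes.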
Mode Decision with VBSME in HM
Two stages:
• ME to select the best MV for candidate PUs
• CU depth and PU partition mode decision
MV selection criterion for each PU:
Jpred,SAD = SA(T)D + λpred * Rpred
CU size and PU partition mode decision:
Jmode = SSD + λmode * Rmode
To calculate Jmode for each PU, reconstruction and
entropy coding of all syntax elements are necessary; this
complexity is beyond the computational capability of
common computers in real applications.
The proposed parallel encoding framework
[Figure: flowchart of the CPU+GPU encoding framework.
CPU side: Read Img, Copy to GPU, Launch new LCU line ME, Sync to last LCU line ME, Mode Decision, Encode & Reconstruct (MC), Entropy coding, All LCU lines?, Launch Interpolate, Sync to Interpolate, Next frame.
GPU side: ME Kernel, Interpolate & Border Pad.
Buffers in CPU-MEM and GPU-MEM: fEnc, fRec, half/quarter pixel image buffers, 64x64~8x8 PU MVs.
Loops: frame loop, LCU line loop.]
Launch ME for one CTU line
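The overlap suggested by the "Launch new LCU line ME" and "Sync to last LCU line ME" boxes can be sketched as below. This is only an illustration under stated assumptions: a single worker thread stands in for the GPU, and `gpu_me_for_line`, `cpu_mode_decision_and_encode` and `encode_frame` are hypothetical names, not functions from the authors' encoder.

```python
from concurrent.futures import ThreadPoolExecutor


def gpu_me_for_line(line_idx):
    """Hypothetical stand-in for the GPU ME kernel on one CTU line."""
    return {"line": line_idx, "mvs": [(0, 0)] * 4}


def cpu_mode_decision_and_encode(me_result):
    """Hypothetical stand-in for CPU mode decision, reconstruction and entropy coding."""
    return f"encoded line {me_result['line']}"


def encode_frame(num_ctu_lines):
    """Overlap ME of line i+1 (worker thread) with CPU encoding of line i."""
    encoded = []
    with ThreadPoolExecutor(max_workers=1) as gpu:  # one worker plays the GPU
        pending = gpu.submit(gpu_me_for_line, 0)    # launch ME for the first line
        for line in range(num_ctu_lines):
            current = pending
            if line + 1 < num_ctu_lines:
                # Launch ME for the next CTU line before waiting on this one.
                pending = gpu.submit(gpu_me_for_line, line + 1)
            me_result = current.result()            # sync to the last launched line
            encoded.append(cpu_mode_decision_and_encode(me_result))
    return encoded


print(encode_frame(3))
```

In a real CUDA implementation the submit and result calls would correspond to an asynchronous kernel launch and a stream or event synchronization; the thread pool is used here only to keep the sketch runnable without a GPU.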
Fast PU partition mode decision scheme
[Figure: flowchart of the fast decision scheme.
Decision boxes: SKIP?, CBF_fast?, PART_2Nx2N?, CU depth == 4, CU_idx == 4?
Processing boxes: Fast CU partition, CU partition or next CU, RD cost calculation, MC, Sync to last LCU line ME.
Inputs (CPU-MEM): half/quarter pixel image buffer, 64x64~8x8 PU MVs.]
The MV and residual information are employed for PU
partition decision
Two edge feature parameters, V and H:
V: computed from S00, S01, S10, S11, with the terms 8, QP_step and N
H: computed from S00, S10, S01, S11, with the terms 8, QP_step and N
[Formulas: the operators and signs did not survive extraction from the slide.]
if (H == V && H != 0)
    PART_2Nx2N
else if (H == V && H == 0)
    PART_NxN
else if (H > V)
    PART_Nx2N
else
    PART_2NxN
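The decision rule above can be written as a small function. This is a direct transcription of the four branches; the function name `choose_partition` is made up for the sketch, and H and V are taken as already computed.

```python
def choose_partition(h, v):
    """Map the edge feature parameters H and V to a PU partition mode."""
    if h == v and h != 0:
        return "PART_2Nx2N"
    if h == v:            # remaining equal case: h == v == 0
        return "PART_NxN"
    if h > v:
        return "PART_Nx2N"
    return "PART_2NxN"


for h, v in [(1, 1), (0, 0), (2, 1), (1, 2)]:
    print(h, v, choose_partition(h, v))
```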
Parallel realization of VBSME on CUDA
[Figure: processing stages of the CUDA implementation. Box labels:
8x8 block size SAD calculation; 16x16 block size Jpred calculation;
integer pixel Jpred comparison (16x16); fractional pixel MV refinement;
variable block size Jpred generation; variable block size Jpred calculation;
integer pixel Jpred comparison; four 16x16 lines.]
Variable block size ME
The MV selection criterion is as follows:
Jpred = SAD + λpred * DMV = SAD + λpred * (MV_C - PMV)
where MV_C: MV of the current search point,
PMV: predicted MV (see next slide)
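A minimal sketch of this criterion follows. The slide writes the MV cost only as (MV_C - PMV); the sketch assumes the L1 distance between the two vectors as a stand-in for that term, and all function names are hypothetical.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(c - r) for cur_row, ref_row in zip(cur, ref)
               for c, r in zip(cur_row, ref_row))


def mv_cost(mv, pmv):
    """Assumed stand-in for DMV: L1 distance between the MV and the PMV."""
    return abs(mv[0] - pmv[0]) + abs(mv[1] - pmv[1])


def j_pred(cur, ref, mv, pmv, lam):
    """Jpred = SAD + lambda_pred * DMV."""
    return sad(cur, ref) + lam * mv_cost(mv, pmv)


def best_mv(cur, candidates, pmv, lam):
    """candidates maps each MV (dx, dy) to the reference block it points at."""
    return min(candidates,
               key=lambda mv: j_pred(cur, candidates[mv], mv, pmv, lam))


cur = [[10, 10], [10, 10]]
candidates = {
    (0, 0): [[12, 12], [12, 12]],  # SAD = 8, MV cost = 0
    (4, 0): [[10, 10], [10, 11]],  # SAD = 1, MV cost = 4
}
print(best_mv(cur, candidates, pmv=(0, 0), lam=1))
print(best_mv(cur, candidates, pmv=(0, 0), lam=4))
```

With a small λpred the candidate with the lower SAD is selected; with a larger λpred the candidate closer to the PMV is selected, which is the trade-off the criterion expresses.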
PMV for MV cost calculation
[Figure: layout of MV0, MV1, MV2, MV3 and MV4]
PMV = median(MV0, MV1, MV2, MV3, MV4)
• One CTU (64x64) line is divided into four 16x16 block lines;
• The ME process of each 16x16 line is done by the GPU sequentially;
• The MVs of the 16x16 block size are used as the MV predictions for
all other block sizes.
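The median prediction can be sketched as follows. The slide does not say how the median of five vectors is taken; the sketch assumes a component-wise median, which is common in video codecs, and `predict_mv` is a hypothetical name.

```python
from statistics import median


def predict_mv(neighbors):
    """Component-wise median of the neighboring MVs (assumed interpretation)."""
    xs = [mv[0] for mv in neighbors]
    ys = [mv[1] for mv in neighbors]
    return (median(xs), median(ys))


# Five example MVs standing in for MV0 to MV4.
neighbors = [(1, 5), (3, -2), (2, 0), (10, 1), (-4, 7)]
print(predict_mv(neighbors))
```

Note that with a component-wise median the resulting PMV need not be equal to any single neighboring MV.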
Variable block size SAD Generation
[Figure: SAD generation hierarchy with levels 8x8 (blocks indexed 0 to 63), 16x16, 32x32 and 64x64]
Variable block size SAD Generation on CUDA
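One common way to generate SADs for larger blocks is to sum the SADs of the 8x8 blocks they contain. The sketch below assumes that scheme, a raster-ordered 8x8 grid of 8x8-block SADs for one CTU, and square blocks only; it runs on the CPU in plain Python and is not the authors' CUDA kernel.

```python
def merge_sads(sads):
    """Sum each 2x2 group of an n x n SAD grid into an (n/2) x (n/2) grid."""
    half = len(sads) // 2
    return [[sads[2 * r][2 * c] + sads[2 * r][2 * c + 1]
             + sads[2 * r + 1][2 * c] + sads[2 * r + 1][2 * c + 1]
             for c in range(half)]
            for r in range(half)]


def sad_pyramid(sad8):
    """Build 16x16, 32x32 and 64x64 SADs from the 8x8-block SADs of one CTU.

    sad8 is an 8x8 grid (64 values) of 8x8-block SADs, all measured for the
    same candidate MV.
    """
    levels = {8: sad8}
    size, grid = 8, sad8
    while len(grid) > 1:
        grid = merge_sads(grid)
        size *= 2
        levels[size] = grid
    return levels


sad8 = [[1] * 8 for _ in range(8)]   # toy input: every 8x8 block has SAD 1
levels = sad_pyramid(sad8)
print(levels[16][0][0], levels[32][0][0], levels[64][0][0])
```

Rectangular PUs such as 16x8 or 32x16 could be obtained the same way by summing two neighboring blocks instead of four.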
Experimental Results
Platform: Z620 with NVIDIA Tesla C2050 + i7 @ 2.6 GHz, Windows 7.
The CUDA driver version of the GPU is 5.0 and the CUDA
compute capability is 2.0.
The search range is 64x64 with the full search strategy for the integer MV (IMV) and
24 fractional-pixel positions around the IMV.
Sequence                   CPU (fps)   GPU (fps)   Speedup ratio
Traffic_2560x1600_crop     0.21        23.77       113.2
ParkScreen_1920x1080_24    0.69        77.76       112.7
The speed-up ratio is about 113 times
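The speedup ratios follow directly from the reported frame rates (GPU fps divided by CPU fps). A short check, using only the numbers from the results above:

```python
# (CPU fps, GPU fps) as reported in the results.
results = {
    "Traffic_2560x1600_crop": (0.21, 23.77),
    "ParkScreen_1920x1080_24": (0.69, 77.76),
}


def speedup(cpu_fps, gpu_fps):
    """Speedup ratio of the GPU version over the CPU version."""
    return gpu_fps / cpu_fps


for name, (cpu_fps, gpu_fps) in results.items():
    print(name, round(speedup(cpu_fps, gpu_fps), 1))
```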
Experimental Results: RD comparison
Note 1: The proposed algorithm is implemented on top of the x265 encoder, a first open source
encoder implementation of HEVC ("x265 project, http://code.google.com/p/x265/");
Note 2: Cactus_Proposed denotes the RD curve generated by the x265 encoder with
the proposed algorithm.
Conclusion
We present a parallel-friendly VBSME (variable block
size motion estimation) scheme which makes full use of
the available computation resources of both CPU and GPU.
Preliminary results are reported with a speedup ratio of over
100 times compared to a single-thread CPU-only solution.
We will continue to exploit parallelism, targeting a
4K@30fps real-time HEVC encoder on a multicore CPU
and GPGPU platform.
Thanks!

Paralleling Variable Block Size Motion Estimation of HEVC On CPU plus GPU Platform

  • 1. Paralleling Variable Block Size Motion Estimation of HEVC On CPU plus GPU Platform Xiangwen Wang1, Li Song2, Yanan Zhao2, Min Chen1 1Shanghai University of Electric Power 2 Shanghai Jiao Tong University
  • 2. Outline Introduction The proposed paralleling VBSME of HEVC for GPU+CPU Preliminary result and Future work
  • 3. Introduction HEVC is the newest video coding standard introduced by ITU-T VCEG and ISO/IEC MEPG. Compared with H.264/AVC, HEVC decreases the bitrate by 50% percent on average while maintaning the same visual quality. BQMall_832x480. Left: HEVC 1.5Mbps, right: x264 3.0Mbps
  • 4. Introduction However, encoding complexity is several times more complex than H.264. • RDO: iterate over all mode and partition combinations to dicide the best coding information • RDoQ: iterate over many QP candidates for each block • Intra: prediction modes increased to 35 for luma • SAO: works pixel by pixel • Quadtree structure: bigger block sizes and numerous partition manners • Some other highly computational modules ... As a result, the traditional method which performs the encoding in a sequential way could no longer provide a real-time demand, especially when it comes to HD (1920x1080) and UHD (3840x2160) videos. Parallellism in the encoding procedure must be extensively utilized.
  • 5. Overview of VBSME in HEVC
      • Three independent block concepts: CU (Coding Unit), PU (Prediction Unit), TU (Transform Unit)
      • The total number of allowed PU sizes is 12 (from 64x64 down to 4x8/8x4), so up to 425 ME operations are needed for one 64x64 CTU (5 + 4x5 + 16x5 + 64x5 = 425)
      • [Figure: CU size and PU partition structure. Depth 1: 64x64; Depth 2: 32x32; Depth 3: 16x16; Depth 4: 8x8]
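The count on slide 5 can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes, as the slide's sum suggests, 5 ME operations per CU and a quadtree in which each depth holds four times as many CUs as the one above. The function name and its parameters are illustrative, not taken from the slides.

```python
# Assumption: 5 ME operations per CU at every depth, as in the slide's sum
# 5 + 4x5 + 16x5 + 64x5. The constant and function names are hypothetical.
ME_PER_CU = 5


def me_calls_per_ctu(depths=4, me_per_cu=ME_PER_CU):
    """Total ME operations for one CTU with a quadtree of the given depth."""
    total = 0
    for d in range(depths):
        cus_at_depth = 4 ** d          # 1, 4, 16, 64 CUs for 64x64 .. 8x8
        total += cus_at_depth * me_per_cu
    return total


print(me_calls_per_ctu())  # 425
```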
  • 6. Mode Decision with VBSME in HM. Two stages:
      • ME to select the best MV for each candidate PU
      • CU depth and PU partition mode decision
    MV selection criterion for each PU: J_pred,SAD = SA(T)D + λ_pred * R_pred
    CU size and PU partition mode decision: J_mode = SSD + λ_mode * R_mode
    Calculating J_mode for each PU requires reconstruction and entropy coding of all syntax elements; this complexity is beyond the computational capability of common computers in real applications.
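Both criteria on slide 6 have the same Lagrangian form, a distortion term plus λ times a rate term, and the candidate with the lowest cost wins. The sketch below illustrates that selection step only; the candidate labels, distortion values and bit counts are made-up numbers, and the helper names are hypothetical rather than HM functions.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_candidate(candidates, lam):
    """Return the (label, distortion, rate_bits) tuple with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates: (partition label, distortion, rate in bits).
candidates = [
    ("PART_2Nx2N", 1000, 20),
    ("PART_2NxN", 900, 40),
    ("PART_NxN", 850, 90),
]

print(best_candidate(candidates, lam=4.0)[0])  # PART_2NxN
print(best_candidate(candidates, lam=0.5)[0])  # PART_NxN
```

A larger λ penalizes rate more heavily, which is why the cheaper-to-signal partition wins in the first call and the lower-distortion one wins in the second.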
  • 7. The proposed parallel encoding framework. [Figure: CPU/GPU block diagram. Processing blocks: Read Img; Copy to GPU; Interpolate & Border Pad; ME Kernel; Mode Decision; Encode & Reconstruct; MC; Entropy coding. Buffers: fEnc; fRec; half/quarter-pixel image buffers; 64x64~8x8 PU MVs (in GPU-MEM and CPU-MEM). Control flow: Launch Interpolate; Sync to Interpolate; Launch new LCU line ME; Sync to last LCU line ME; "All LCU lines?" check; LCU line loop; frame loop; next frame. Caption: launch ME for one CTU line.]
  • 8. Fast PU partition mode decision scheme. [Figure: flowchart. Labels: SKIP?; CBF_fast?; Fast CU partition; PART_2Nx2N?; CU partition or next CU; CU depth == 4; CU_idx == 4?; RD cost calculation; Sync to last LCU line ME; MC. Inputs: half/quarter-pixel image buffer and 64x64~8x8 PU MVs in CPU-MEM.] The MV and residual information are employed for the PU partition decision. Two edge feature parameters, V and H, are computed from the four terms S00, S01, S10, S11 and a normalization involving 8, QP_step and N (V takes the terms in the order S00, S01, S10, S11; H takes them in the order S00, S10, S01, S11). The partition is then chosen as follows: if H == V and H != 0, PART_2Nx2N; else if H == V and H == 0, PART_NxN; else if H > V, PART_Nx2N; otherwise PART_2NxN.
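The decision rule at the end of slide 8 translates directly into code. The sketch below takes H and V as already-computed numbers, since the formulas that produce them are not reproduced here; the function name is hypothetical.

```python
def fast_partition(h, v):
    """Map the edge features H and V to a PU partition mode (slide 8 rule)."""
    if h == v and h != 0:
        return "PART_2Nx2N"
    if h == v:                 # remaining equal case: h == v == 0
        return "PART_NxN"
    if h > v:
        return "PART_Nx2N"
    return "PART_2NxN"         # h < v


for h, v in [(2, 2), (0, 0), (3, 1), (1, 3)]:
    print(h, v, fast_partition(h, v))
```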
  • 9. Parallel realization of VBSME on CUDA. [Figure: processing pipeline. Labels: 8x8 block size SAD calculation; 16x16 block size Jpred calculation; integer-pixel Jpred comparison; 16x16 fractional-pixel MV refinement; variable block size Jpred generation; variable block size Jpred calculation; integer-pixel Jpred comparison; four 16x16 lines; variable block size ME.] The MV selection criterion is as follows: Jpred = SAD + λpred * DMV = SAD + λpred * (MV_C - PMV), where MV_C is the MV of the current search point and PMV is the predicted MV (defined on the next slide).
  • 10. PMV for MV cost calculation. [Figure: neighbouring MVs MV0, MV1, MV2, MV3, MV4.] PMV = median(MV0, MV1, MV2, MV3, MV4)
      • One CTU (64x64) line is divided into four 16x16 block lines.
      • The ME process of each 16x16 line is done by the GPU sequentially.
      • The MVs of the 16x16 block size are used as the MV predictions for all other block sizes.
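The PMV and the Jpred cost from slides 9 and 10 can be sketched as follows. Two points are assumptions, because the slides do not spell them out: the median is taken per component (x and y separately), and the MV difference is turned into a scalar as the sum of absolute component differences. The function names and sample vectors are hypothetical.

```python
from statistics import median


def median_pmv(neighbour_mvs):
    """Component-wise median of the neighbouring MVs (assumed interpretation)."""
    xs = [mv[0] for mv in neighbour_mvs]
    ys = [mv[1] for mv in neighbour_mvs]
    return (median(xs), median(ys))


def j_pred(sad, lam, mv_c, pmv):
    """Jpred = SAD + lambda * DMV, with DMV assumed to be |dx| + |dy|."""
    d_mv = abs(mv_c[0] - pmv[0]) + abs(mv_c[1] - pmv[1])
    return sad + lam * d_mv


# Hypothetical neighbouring MVs MV0..MV4 as (x, y) pairs.
mvs = [(1, 0), (2, -1), (0, 3), (5, 2), (2, 1)]
pmv = median_pmv(mvs)
print(pmv)                                           # (2, 1)
print(j_pred(sad=100, lam=4, mv_c=(3, 1), pmv=pmv))  # 104
```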
  • 11. Variable block size SAD generation. [Figure: variable block size SAD generation on CUDA. The 64 8x8 SADs (indexed 0 to 63) are combined into 16x16, 32x32 and 64x64 SADs.]
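The idea behind slide 11 is that the SAD of a larger block, for a given candidate MV, is the sum of the 8x8 SADs it covers, so the larger sizes can be derived without touching the pixels again. The sketch below shows that bottom-up merge on the CPU in plain Python for the square sizes only. It assumes the 64 SADs are stored as an 8x8 grid in raster order, which may differ from the index layout used in the CUDA kernel; the function names are hypothetical.

```python
def merge_2x2(grid):
    """Sum each 2x2 group of SADs to get the SADs of the next larger block size."""
    n = len(grid) // 2
    return [
        [
            grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
            + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
            for c in range(n)
        ]
        for r in range(n)
    ]


def sad_pyramid(sad8):
    """Build SAD grids for 8x8, 16x16, 32x32 and 64x64 blocks of one CTU."""
    levels = {8: sad8}
    size, grid = 8, sad8
    while len(grid) > 1:
        grid = merge_2x2(grid)
        size *= 2
        levels[size] = grid
    return levels


# Hypothetical input: every 8x8 block has SAD 1 for the same candidate MV.
levels = sad_pyramid([[1] * 8 for _ in range(8)])
print(levels[16][0][0], levels[32][0][0], levels[64][0][0])  # 4 16 64
```

Rectangular PU sizes could be formed the same way by summing pairs instead of 2x2 groups.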
  • 12. Experimental results. Platform: Z620 (NVIDIA Tesla C2050 + i7 @ 2.6 GHz) running Windows 7. The CUDA driver version is 5.0 and the CUDA compute capability is 2.0. The search range is 64x64 with a full-search strategy for the integer MV (IMV), followed by 24 fractional-pixel positions around the IMV.
      Sequence                   CPU (fps)   GPU (fps)   Speedup ratio
      Traffic_2560x1600_crop     0.21        23.77       113.2
      ParkScene_1920x1080_24     0.69        77.76       112.7
    The speedup ratio is about 113 times.
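The speedup ratios on slide 12 are simply the GPU frame rate divided by the CPU frame rate. The short check below recomputes them from the reported fps figures; the dictionary and function names are illustrative only.

```python
# Reported frame rates from slide 12: (CPU fps, GPU fps).
results = {
    "Traffic_2560x1600_crop": (0.21, 23.77),
    "ParkScene_1920x1080_24": (0.69, 77.76),
}


def speedup(cpu_fps, gpu_fps):
    """Speedup ratio of the GPU path over the CPU-only path."""
    return gpu_fps / cpu_fps


for name, (cpu, gpu) in results.items():
    print(f"{name}: {speedup(cpu, gpu):.1f}x")
```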
  • 13. Experimental results: RD comparison. Note 1: the proposed algorithm is implemented on top of the x265 encoder, one of the first open-source encoder implementations of HEVC ("x265 project, http://code.google.com/p/x265/"). Note 2: Cactus_Proposed denotes the RD curve generated by the x265 encoder with the proposed algorithm.
  • 14. Conclusion. We present a parallel-friendly VBSME (variable block size motion estimation) scheme that makes full use of the computation resources available on both the CPU and the GPU. Preliminary results show a speedup ratio of over 100 times compared with a single-threaded CPU-only solution. We will continue to exploit parallelism, targeting a real-time 4K@30fps HEVC encoder on a multicore CPU and GPGPU platform.