H.264 in CUDA (presentation)
What is H.264?

• Video compression standard

• Official name: Advanced Video Coding (AVC) for generic
  audiovisual services
   o aka: MPEG-4 Part 10 or MPEG-4 AVC
• It's in your iPod
   o Current-generation standardized format
   o Compression efficiency: H.264 >> XviD and DivX (MPEG-4 Part 2)
How H.264 Compresses Video

[Figure: Frames 1-5 of the Foreman sequence (QCIF @ 25 fps). Spatial redundancy exists within a frame; temporal redundancy exists between consecutive frames.]

• Three redundancy reduction principles:
   1. Spatial redundancy (Intra-frame prediction)
   2. Temporal redundancy (Inter-frame prediction)
   3. Entropy coding (Mapping more common symbols to shorter codes)
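
The third principle can be illustrated with the unsigned Exp-Golomb code, which H.264 uses for many syntax elements: small values, which tend to be the more common ones, get shorter codewords. This is a minimal sketch; the function name `exp_golomb_ue` is illustrative and does not come from the slides.

```python
def exp_golomb_ue(value: int) -> str:
    """Return the unsigned Exp-Golomb codeword for a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]        # binary representation of value + 1
    prefix = "0" * (len(binary) - 1)   # leading zeros announce how many bits follow
    return prefix + binary
```

For example, `exp_golomb_ue(0)` is the single bit `1`, while `exp_golomb_ue(3)` needs five bits (`00100`).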
Simple Video Encoder
Intra-frame Prediction
• Prediction block is formed from previously encoded blocks in
  the same frame
• Use spatial similarities to compress each frame
   o Use neighboring pixels to make a prediction on a block
   o Transmit the difference between actual and predicted
   o Tradeoff: prediction accuracy vs. # control bits
• Compression efficiency is relatively low in most areas of a
  typical scene

• Relatively low computation cost




[Figure: frame divided into 16x16 macroblocks (MBs)]
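
As a rough illustration of intra-frame prediction, the sketch below uses a DC-style predictor: every pixel of the block is predicted as the mean of the already coded neighbors above and to the left, and only the residual is passed on. This is just one possible prediction mode, shown on a 4x4 block for brevity; the helper names `dc_predict` and `residual` are assumptions made for this example.

```python
def dc_predict(top, left):
    """Predict every pixel of a square block as the rounded mean of its top and left neighbors."""
    neighbors = list(top) + list(left)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    size = len(top)
    return [[dc] * size for _ in range(size)]


def residual(block, prediction):
    """Difference between the actual block and its prediction; this is what gets coded."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(block, prediction)]
```

When the block resembles its neighbors, the residual values are small and cheap to code; the choice of prediction mode itself costs control bits, which is the tradeoff noted on the slide.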
Inter-frame Prediction

• Temporal locality
• Use previous frame as prediction for current frame
• Record movements
   o "motion vectors" (MVs)
Motion Vectors
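
A motion vector tells the decoder where in the reference frame to fetch the prediction for a block. The sketch below shows that step (motion compensation) and the residual that remains. It assumes integer-pel vectors given as `(dx, dy)` and a displaced block that lies fully inside the reference frame; the function names are illustrative.

```python
def motion_compensate(reference, top, left, mv, size):
    """Fetch the size x size block that the motion vector points to in the reference frame."""
    dx, dy = mv
    return [row[left + dx:left + dx + size]
            for row in reference[top + dy:top + dy + size]]


def block_residual(current, prediction, top, left):
    """Per-pixel difference between the current block and its motion-compensated prediction."""
    size = len(prediction)
    return [[current[top + y][left + x] - prediction[y][x] for x in range(size)]
            for y in range(size)]
```

If the content of the block simply shifted between the two frames, the correct vector makes the residual all zeros, so only the vector needs to be transmitted.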
Motion Estimation Algorithms

• Block Matching
   o 16 pixel x 16 pixel macroblocks
   o Estimate the movement of each macroblock
• Phase Correlation
   o Perform the search in the frequency domain
   o Only works well for translational motion
• Bayesian methods
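
Block matching can be sketched as an exhaustive (full) search that scores every candidate displacement with the sum of absolute differences (SAD). This is a plain-Python illustration under simplifying assumptions (integer-pel only, candidates outside the frame are skipped), not the code used in the project; `sad` and `full_search` are names chosen for this example.

```python
def sad(current, reference, top, left, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + y][left + x]
                         - reference[top + dy + y][left + dx + x])
    return total


def full_search(current, reference, top, left, size, search_range):
    """Try every displacement within +/- search_range and return (best_mv, best_sad)."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(current, reference, top, left, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```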
[Figure: Frame 1 (reference) and Frame 2 (current), with the macroblock to be coded marked in the current frame. Between the two frames, the tree moved down and to the right, and the people moved farther to the right than the tree.]
Big (Computational) Problem
• HD video: 1080p (1920×1080) = 8,160 macroblocks (120 × 68, with the height padded to 1088)
• Search window: how far we search for the original block
  o   Normally ±16 pixels; sometimes ±32 pixels
  o   (2*16+1)*(2*16+1) = 1089 candidate positions per macroblock




[Figure: an ME block in the current frame and its search space in the reference frame]
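
The arithmetic behind these numbers can be reproduced with a few lines. The sketch assumes a full search at integer-pel positions only and 16x16 SADs; the helper names are illustrative.

```python
def macroblock_count(width, height, mb_size=16):
    """Number of macroblocks when the frame is padded up to a multiple of mb_size."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols * rows


def search_positions(search_range):
    """Integer-pel candidates in a square window of +/- search_range pixels."""
    return (2 * search_range + 1) ** 2


mbs = macroblock_count(1920, 1080)              # 120 x 68 macroblocks
positions = search_positions(16)                # 33 x 33 candidates
sad_evaluations = mbs * positions               # SADs per frame for a full search
pixel_differences = sad_evaluations * 16 * 16   # absolute differences per frame
```

Under these assumptions a single 1080p frame needs 8,886,240 SAD evaluations, or about 2.27 billion absolute differences, before sub-pel refinement or sub-partitions are considered.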
Profiling Results

• Motion estimation (ME) dominates the encoding time!




[Figure: profiling results from the JM H.264 reference code]
Amdahl's Law

• Limits the overall speedup
• Eventually, the speedup is limited by the unparallelized portion of
  the code
   o An already optimized ME implementation (like x264) spends a smaller
     share of its time in ME, so parallelizing ME generally yields a
     lower overall speedup
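
Amdahl's law can be written as a one-line function. The fractions used in the note below are hypothetical and only show the trend; they are not profiling results from the slides.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)
```

If ME took 90% of the encoding time, a 56x faster ME would give roughly an 8.6x overall speedup (`amdahl_speedup(0.9, 56)`); if ME took only 50%, the same kernel would give less than 2x (`amdahl_speedup(0.5, 56)`).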
Previous Implementations

• x264
   o CPU
   o Open source
   o C and hand-coded assembly
   o VERY optimized
       MMX, SSE2, SSE3, SSE4
   o Considered the fastest implementation of H.264
   o Multithreaded (pthread support)
   o Still slow: slower than last-generation encoders
In CUDA
     • Several published articles describe H.264 encoders
       implemented in CUDA.
     • All of them target ME for parallelization
     • An example*
        o ME = 5 kernels
        o Full-search (i.e., unoptimized ME)
        o Sub-pel MV support
        o Sub-partition support




* Wei-Nien Chen and Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)," 2008 IEEE International Conference on Multimedia and Expo, pp. 697-700, June 23-26, 2008.
Problems with Previous Work

• Do not address inter-block dependencies
  o Sacrifice quality for parallelizability (i.e. speed)




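[Figure: MVp dependencies. MVp is the predicted motion vector, which is derived from the MVs of neighboring blocks.]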
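
The dependency comes from how the predicted motion vector is formed. In H.264 it is typically the component-wise median of the MVs of the left, top, and top-right neighbors. The sketch below shows only that common case and ignores the special cases (unavailable neighbors, some partition shapes); the function names are illustrative.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(left_mv, top_mv, top_right_mv):
    """Component-wise median of three neighboring motion vectors given as (dx, dy)."""
    return (median3(left_mv[0], top_mv[0], top_right_mv[0]),
            median3(left_mv[1], top_mv[1], top_right_mv[1]))
```

Because `predict_mv` needs the final MVs of three neighbors, a block cannot be completed until those neighbors are done, which is what makes processing all macroblocks independently in parallel problematic.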
Our Project

• H.264 specifies how the decoder will work
   o Flexibility in encoder
       e.g. other CUDA implementations
• Solve motion estimation problem in parallel
   1. Deal with the dependency between blocks
   2. Best guess of MVp
Direct Approach: Wavefront
Our Approach: Pyramid ME

• Also known as "Hierarchical" ME
• Perform ME at a number of resolutions, from coarsest to finest
   o Use the MV found at the higher (coarser) level as an estimate of
     the MVp in the lower (finer) level
[Figure: motion vector on a frame sub-sampled 16x]
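
A coarse-to-fine search of this kind can be sketched in plain Python. The sketch makes several assumptions that are not in the slides: each level halves the resolution by 2x2 averaging, the MV from the coarser level is doubled and used as the center of a small search at the finer level, and SAD is the matching cost. All function names are illustrative.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    return [[(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, len(frame[0]) - 1, 2)]
            for y in range(0, len(frame) - 1, 2)]


def sad(cur, ref, top, left, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))


def search(cur, ref, top, left, size, center, radius):
    """Best (dx, dy) within `radius` of `center`, ignoring out-of-frame candidates."""
    best = None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            if (0 <= top + dy <= len(ref) - size
                    and 0 <= left + dx <= len(ref[0]) - size):
                cost = sad(cur, ref, top, left, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best[1] if best else center


def pyramid_me(cur, ref, top, left, size, levels=2, radius=1):
    """Estimate an MV coarse-to-fine; each level seeds the search at the next finer one."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))
    mv = (0, 0)
    for level in range(levels - 1, -1, -1):
        scale = 2 ** level
        c, r = pyramid[level]
        mv = search(c, r, top // scale, left // scale, max(size // scale, 1), mv, radius)
        if level > 0:
            mv = (mv[0] * 2, mv[1] * 2)   # express the MV in the next finer level's pixels
    return mv
```

With two levels and a search radius of 1, this sketch can recover a displacement of 2 pixels that a single-level search with the same radius would miss. Because each block starts from the coarser level's result rather than from its neighbors' final MVs, the blocks within a level can be processed independently.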
Using Pyramid ME to Solve MVp Problem
Our Prototyping Framework

• Originally MATLAB + nvmex
• Now pyCUDA + matplotlib
• Motivation
  o Simplicity
  o Flexibility (output images, graphs, etc.)
  o pyCUDA == awesome
  o Automatic tuning in the future
Our Prototyping Framework
Our CUDA Implementation

• CUDA + C
• One kernel per level of the hierarchy
• One block per macroblock
• One thread per search position
   o With the limit of 512 threads per block, search window size <= 11
   o Can perform an argmin reduction to find the best MV
• Texture memory for reference and current frame
   o Allows for sub-pixel interpolation
   o Handles border clamping
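
The argmin reduction can be simulated on the CPU to show the idea: each "thread" holds the cost of one search position, and pairs are combined in log2(n) steps until slot 0 holds the winner. This is a sequential Python sketch of the pattern, not the project's kernel. The flat index layout in `index_to_mv` (a symmetric window of ±r pixels, row-major) is an assumption.

```python
def argmin_reduce(costs):
    """Tree reduction over (cost, index) pairs, mimicking a shared-memory reduction."""
    # Pad to a power of two so every step can pair slot i with slot i + stride.
    n = 1
    while n < len(costs):
        n *= 2
    slots = [(c, i) for i, c in enumerate(costs)]
    slots += [(float("inf"), -1)] * (n - len(costs))
    stride = n // 2
    while stride > 0:
        for i in range(stride):   # on a GPU these iterations would run in parallel
            slots[i] = min(slots[i], slots[i + stride])
        stride //= 2
    return slots[0]               # (best cost, index of the best search position)


def index_to_mv(index, search_range):
    """Map a flat thread index to a displacement (dx, dy) in a +/- search_range window."""
    side = 2 * search_range + 1
    return (index % side - search_range, index // side - search_range)
```

Under the assumed symmetric layout, a range of ±10 needs 21 x 21 = 441 threads and fits within 512, while ±11 would need 23 x 23 = 529; the bound of 11 on the slide may therefore count the window differently (for example 22 x 22 = 484 positions).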
Results

Implementation    Time
Gold              203.3 msec
CUDA              3.6 msec      (speedup over Gold = 56)
x264              11.6 msec

• It is not appropriate to compare the CUDA time to the x264 time.
• x264 performs a more accurate search.
   o The CUDA implementation will be made more accurate in
     the future.
   o We implemented only a small subset of the ME features
Conclusions

• H.264 ME in CUDA is viable, but will not be easy
   o Competing against very well written CPU code
• Full encoding process of H.264 is very complicated
   o Complex control flow and data dependencies
Future Work

• Improve estimate for MVp
• Pipeline data transfers
• Downsample on GPU vs. CPU
   o Data access concerns
• Process multiple frames together
   o Improve occupancy
• More than ME in CUDA
   o More dependency constraints
CUDA as a Development Framework

• Opened up GPU
   o Took less than a month!
• Documentation is sparse
• Right way isn't always known
• Debugging is a pain
• Emulation mode is VERY slow
• CUDA servers can become locked and need rebooting
Acknowledgements

Dark_Shikari (x264 dev)
Various other people in #x264 channel @ Freenode.net
H.264 Encoder Block Diagram

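[Figure: H.264 encoder block diagram. Forward path: Video Input -> subtraction of the block prediction -> Transform & Quantization -> Entropy Coding -> Bitstream Output. Reconstruction path: Inverse Quantization & Inverse Transform -> addition of the block prediction -> Deblocking Filter -> Picture Buffering -> Motion Estimation and Motion Compensation. The Intra/Inter Mode Decision selects the block prediction from Motion Compensation or Intra Prediction.]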
References

Iain E. G. Richardson (2003). H.264 and MPEG-4 Video Compression: Video Coding for Next-generation
Multimedia. Chichester: John Wiley & Sons Ltd.

Wei-Nien Chen and Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified
Device Architecture (CUDA)," 2008 IEEE International Conference on Multimedia and Expo, pp. 697-700,
June 23-26, 2008.

S Ryoo, CI Rodrigues, SS Baghsorkhi, SS Stone, DB. "Optimization Principles and Application Performance
Evaluation of a Multithreaded GPU Using CUDA," 2008.

http://www.cs.cf.ac.uk/Dave/Multimedia/node256.html

http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0405/ZAMPOGLU/Hierarchicalestimation.html

H 264 in cuda presentation

  • 1. What is H.264? • Video compression standard • Official name: Advanced Video Coding (AVC) for generic audiovisual services o aka: MPEG-4/Part 10 or MPEG-4 AVC • It's in your iPod o Current generation standardized format o Compression efficiency: H.264 >> XviD and DivX
  • 2. How H.264 Compresses Video Frame 1 Frame 2 Frame 3 Frame 4 Frame 5 Spatial Temporal <Source: Foreman, QCIF @ 25 fps> Redundancy Redundancy • Three redundancy reduction principles: 1. Spatial redundancy (Intra-frame prediction) 2. Temporal redundancy (Inter-frame prediction) 3. Entropy coding (Mapping more common symbols to shorter codes)
  • 4. Intra-frame Prediction • Prediction block is formed from previously encoded blocks in the same frame • Use spatial similarities to compress each frame o Use neighboring pixels to make a prediction on a block o Transmit the difference between actual and predicted o Tradeoff: prediction accuracy vs. # control bits • Compression efficiency is relatively low in most areas of a typical scene • Relatively low computation cost Divide into 16x16 macroblocks (MBs)
  • 5. Inter-frame Prediction • Temporal locality • Use previous frame as prediction for current frame • Record movements o "motion vectors" (MVs)
  • 7. Motion Estimation Algorithms • Block Matching o 16 pixel x 16 pixel macroblocks o Estimate the movement of each macroblock • Phase Correlation o Perform the search in the frequency domain o Only works well for translational motion • Bayesian methods
  • 8. tree moved down people moved farther to and to the right the right than tree Frame 1 (reference) Frame 2 (current) Macroblock to be coded
  • 9. Big (Computational) Problem • HD Video- 1080p (1920×1080) = 8,160 macroblocks • Search window-how far we search for original block o Normally 16 pixels; sometimes 32 pixels o (2*16+1)*(2*16+1) = 1089 positions ME block Reference Current Frame Search Frame Space
  • 10. Profiling Results • Motion estimation (ME) dominates the encoding time! Results from JM H.264 Reference Code
  • 11. Amdahl's Law • Limits the overall speedup • Eventually, the speedup limited by unparallized portion of the code o Optimized ME implementation (like x264) generally results in lower overall speedup
  • 12. Previous Implementations • x264 o CPU o Open source o C and hand-coded assembly o VERY optimized: MMX, SSE2, SSE3, SSE4 o Considered the fastest implementation of H.264 o Multithreaded (pthread support) o Slow! Slower than last generation encoders.
  • 13. In CUDA • Several published articles have implemented an H.264 encoder in CUDA. • All of them target ME for parallelization • An example* o ME = 5 kernels o Full-search (i.e., unoptimized ME) o Sub-pel MV support o Sub-partition support * Wei-Nien Chen; Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)," Multimedia and Expo, 2008 IEEE International Conference on, pp. 697-700, June 23-26, 2008.
  • 14. Problems with Previous Work • They do not address inter-block dependencies o Sacrifice quality for parallelizability (i.e. speed) [Figure: MVp dependencies]
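The slides do not expand "MVp"; the sketch below assumes it denotes the predicted motion vector, which in H.264 is commonly formed as the component-wise median of the motion vectors of the left, top and top-right neighboring blocks. That is where the inter-block dependency comes from: a block's predictor is not known until its neighbors have been estimated. The function names are hypothetical and the special cases of the standard (unavailable neighbors, 16x8 and 8x16 partitions) are left out.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(left, top, top_right):
    """Component-wise median of three neighboring motion vectors (dx, dy)."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))


# Hypothetical neighbor vectors for one macroblock.
mvp = predict_mv(left=(1, 0), top=(3, -2), top_right=(2, 5))
print(mvp)
```

Because each macroblock waits on its left and upper neighbors, estimating all macroblocks of a frame fully in parallel requires either ignoring the predictor or guessing it some other way.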
  • 15. Our Project • H.264 specifies how the decoder will work o Flexibility in the encoder, e.g. other CUDA implementations • Solve the motion estimation problem in parallel 1. Deal with the dependency between blocks 2. Best guess of MVp
  • 17. Our Approach: Pyramid ME • Also known as "Hierarchical" ME • Perform ME at a number of resolutions in increasing order o Use the MV found at the higher level as an estimate of the MVp in the lower level
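A coarse-to-fine search of this kind can be sketched as follows. This is an illustrative plain-Python version under several assumptions that are not from the slides: frames are downsampled by 2x2 averaging, the cost is SAD, and the vector found at the coarser level is doubled and used directly as the search center at the next finer level (standing in for the MVp estimate). All names and the synthetic frames are hypothetical.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
              frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for one candidate displacement."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def search(cur, ref, bx, by, n, center, radius):
    """Best (dx, dy) within `radius` of `center`; out-of-frame candidates are skipped."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = center, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def pyramid_me(cur, ref, bx, by, n, levels, radius):
    """Search from the coarsest level down to full resolution (level 0)."""
    cur_pyr, ref_pyr = [cur], [ref]
    for _ in range(levels - 1):
        cur_pyr.append(downsample(cur_pyr[-1]))
        ref_pyr.append(downsample(ref_pyr[-1]))
    mv = (0, 0)
    for level in range(levels - 1, -1, -1):
        scale = 2 ** level
        mv = search(cur_pyr[level], ref_pyr[level],
                    bx // scale, by // scale, max(n // scale, 1), mv, radius)
        if level > 0:
            mv = (mv[0] * 2, mv[1] * 2)  # rescale for the next finer level
    return mv


# Synthetic 32x32 frames: content moved right by 4 and down by 2.
ref = [[x * x + 3 * y * y + x * y for x in range(32)] for y in range(32)]
cur = [[ref[max(y - 2, 0)][max(x - 4, 0)] for x in range(32)] for y in range(32)]

mv = pyramid_me(cur, ref, bx=16, by=16, n=8, levels=2, radius=2)
print(mv)
```

With a radius of 2 at each of the two levels, the search reaches a displacement of 4 pixels that a single-level search of the same radius would miss. The coarse-level vector depends only on the downsampled frames, not on neighboring blocks, which is what makes it attractive as a parallel-friendly estimate.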
  • 19. Using Pyramid ME to Solve MVp Problem
  • 20. Our Prototyping Framework • Originally MATLAB + nvmex • Now pyCUDA + matplotlib • Motivation o Simplicity o Flexibility (output images, graphs, etc.) o pyCUDA == awesome o Automatic tuning in the future
  • 22. Our CUDA Implementation • CUDA + C • One kernel / level of hierarchy • One block per macroblock • One thread per search position o With 512 thread limit, search window size <= 11 o Can perform argmin reduction to find the best MV • Texture memory for reference and current frame o Allows for sub-pixel interpolation o Handles border clamping
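The "one thread per search position" mapping and the argmin reduction can be simulated sequentially to show the data flow. The sketch below is plain Python rather than CUDA, and it is only an assumption about how such a kernel might be organized: each "thread" index maps to one displacement, and a tree reduction of the kind usually done in shared memory halves the number of active entries at every step while carrying the winning index along. How the slide counts "search window size" is not stated, so the search range is left as a parameter.

```python
def thread_to_mv(tid, search_range):
    """Map a linear thread index to a displacement (dx, dy), row-major."""
    side = 2 * search_range + 1
    return (tid % side - search_range, tid // side - search_range)


def argmin_reduction(costs):
    """Tree reduction returning (index_of_minimum, minimum).
    The inner loop stands in for threads that would run in parallel."""
    n = 1
    while n < len(costs):
        n *= 2                      # pad to a power of two
    vals = list(costs) + [float("inf")] * (n - len(costs))
    idx = list(range(n))
    stride = n // 2
    while stride > 0:
        for t in range(stride):     # one simulated thread per pair
            if vals[t + stride] < vals[t]:
                vals[t] = vals[t + stride]
                idx[t] = idx[t + stride]
        stride //= 2                # a barrier would sit here on a GPU
    return idx[0], vals[0]


search_range = 2
threads = (2 * search_range + 1) ** 2
# Hypothetical per-thread costs with a single minimum at displacement (1, -2).
costs = []
for tid in range(threads):
    dx, dy = thread_to_mv(tid, search_range)
    costs.append(abs(dx - 1) + abs(dy + 2))

best_tid, best_cost = argmin_reduction(costs)
print(best_tid, thread_to_mv(best_tid, search_range), best_cost)
```

The reduction takes about log2 of the thread count steps instead of a linear scan, which is why it suits a block in which every thread has already computed one cost.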
  • 23. Results • Gold: 203.3 msec • CUDA: 3.6 msec (speedup of CUDA over Gold = 56) • x264: 11.6 msec • It is not appropriate to compare the CUDA time to the x264 time. • x264 performs a more accurate search. o The CUDA implementation will be made more accurate in the future. o We implemented a small subset of the ME features
  • 24. Conclusions • H.264 ME in CUDA is viable, but will not be easy o Competing against very well written CPU code • Full encoding process of H.264 is very complicated o Complex control flow and data dependencies
  • 25. Future Work • Improve estimate for MVp • Pipeline data transfers • Downsample on GPU vs. CPU o Data access concerns • Process multiple frames together o Improve occupancy • More than ME in CUDA o More dependency constraints
  • 26. CUDA as a Development Framework • Opened up GPU o Took less than a month! • Documentation is sparse • Right way isn't always known • Debugging is a pain • Emulation mode is VERY slow • CUDA servers can become locked and need rebooting
  • 27. Acknowledgements • Dark_Shikari (x264 dev) • Various other people in the #x264 channel @ Freenode.net
  • 28. H.264 Encoder Block Diagram [Figure: encoder block diagram. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Intra/Inter Mode Decision, Motion Compensation, Intra Prediction, Picture Buffering, Deblocking Filter, Motion Estimation. Signal label: block prediction.]
  • 29. References • Richardson, Iain E. G. (2003). H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Chichester: John Wiley & Sons Ltd. • Wei-Nien Chen; Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)," Multimedia and Expo, 2008 IEEE International Conference on, pp. 697-700, June 23-26, 2008. • S Ryoo, CI Rodrigues, SS Baghsorkhi, SS Stone, DB. "Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA," 2008. • http://www.cs.cf.ac.uk/Dave/Multimedia/node256.html • http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0405/ZAMPOGLU/Hierarchicalestimation.html