MCW currently works in a number of compute-intensive application areas, such as HEVC, VP9, OpenCV, imaging, video, codecs, H.264, H.265, broadcast video, and ADAS.
Languages include OpenCL, CUDA, RenderScript, C/C++, and Assembly (Intel, ARM).
Platforms supported include Intel, AMD, ARM, Qualcomm, and Imagination, across the CPU, GPU, and DSP of these platforms.
A deep bench in LLVM technology with significant compiler optimization is also available for licensing and customization.
MulticoreWare Inc - Accelerating Video and Imaging Applications
2. Global Team
Largest Independent OpenCL Team
Founded in 2008
225 Strong and Growing
High Ratio of PhDs, Masters
Parallel Processing Leaders
Locations: Chennai, St. Louis, Champaign, Sunnyvale, Changchun, Beijing
Dr. Wen-mei Hwu, MCW CTO and PI for the UIUC Blue Waters supercomputer, accepts the Second Annual Achievement Award at GTC 2013
Copyrights 2014, Confidential, MulticoreWare Inc.,
February 3, 2014
3. Industry Leadership
Tools leadership role on HSA Foundation
Khronos Contributor Member
Strategic Relationship with University of Illinois at Urbana-Champaign, USA
Partnerships with CPU/GPU/FPGA Vendors
5. Professional Services
Parallel Processing tools
• Complete OpenCL stack for AMD Fusion
• C++ AMP
• Renderscript
Clients globally have used MulticoreWare to maximize performance and portability of their software
Video Encoding
Video Processing
• Scaling, color space conversion
• Resizing and rate-conversion
• De-interlacing and re-interlacing
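As an illustration of why an operation such as color space conversion suits parallel hardware, here is a minimal sketch of a per-pixel RGB to YCbCr conversion. It assumes the full-range BT.601 coefficients used by JFIF; the function names `clamp8`, `rgb_to_ycbcr`, and `convert_frame` are hypothetical and are not taken from any MulticoreWare library.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (BT.601 / JFIF, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def convert_frame(pixels):
    """Convert a flat list of (r, g, b) tuples; every pixel is independent."""
    return [rgb_to_ycbcr(r, g, b) for (r, g, b) in pixels]
```

Because no output pixel depends on any other, each loop iteration could in principle become one GPU work-item or one SIMD lane, which is the property an accelerated implementation would exploit.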
Video Game Engine Acceleration
Image Processing
• Semiconductor wafer defect inspection
• Raster Image Processor engine parallelization
Bioinformatics
• Accelerated BLAST algorithm for gene sequencing
• 3500X faster than NIH reference model
6. Domain Expertise
Video Processing
Video Transcoding
Video Game Engines
Image Processing
Medical Imaging
Seismic data analysis
Compression
Encryption
Fluid Dynamics
Compilers (LLVM)
Device drivers
7. Platform Expertise
Video and Imaging implementations done across many platforms
Experience across heterogeneous compute platforms
• Mobile device platforms to workstations and cloud based platforms
• x86 Assembly Code optimization
• ARM Mali and NEON optimization
Experience across heterogeneous programming models
• CUDA
• OpenCL
• Renderscript
• C++AMP
• MARE
• HSA
8. Video Expertise
x264 – Open Source H.264 Encoder accelerated for Telestream’s Vantage Encoder
- MulticoreWare’s H.265 Encoder
- MulticoreWare’s H.265 Decoder
VP9 Acceleration
Accelerated Video Processing Library – Super-resolution, image stabilization, detection and recognition
HandBrake
FFmpeg
VLC
9. HEVC – Commercially Supported Open Source
Compute intensive
• Larger block size: 64x64 vs. 16x16 in H.264
• More transform sizes
• New intra prediction modes
• Quadtree structure for processing Coding Units (CUs)
• Sample Adaptive Offset (SAO) filter in addition to the deblocking filter
New ideas to facilitate parallel processing of data – Tiles, Wavefront Parallel Processing (WPP)
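To make the WPP idea concrete, the sketch below computes an idealized wavefront schedule. It assumes the usual WPP dependency pattern, in which a block row may start once the first two blocks of the row above are finished, and it further assumes uniform per-block cost and unlimited threads. The names `wpp_schedule` and `wpp_stats` are hypothetical, and this is not a description of how any particular encoder schedules its work.

```python
def wpp_schedule(rows, cols, lag=2):
    """Earliest time step at which each block can start.

    Block (r, c) waits for its left neighbor (r, c - 1) and for the
    above-right block (r - 1, c + 1), which gives step = c + lag * r
    with lag = 2 under the assumptions stated in the lead-in.
    """
    return [[c + lag * r for c in range(cols)] for r in range(rows)]


def wpp_stats(rows, cols, lag=2):
    """Return (total time steps, peak number of blocks running at once)."""
    schedule = wpp_schedule(rows, cols, lag)
    total_steps = schedule[-1][-1] + 1
    per_step = {}
    for row in schedule:
        for step in row:
            per_step[step] = per_step.get(step, 0) + 1
    return total_steps, max(per_step.values())
```

For a 1920x1080 frame split into 64x64 blocks (17 rows by 30 columns), `wpp_stats(17, 30)` gives 62 steps with at most 15 blocks in flight, compared with 510 steps when the blocks are processed serially.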
10. Renderscript
MCW
• Developed Renderscript infrastructure for ARM Mali
• Developed 2 marquee APKs using ARM A15 & Mali
- Photo processing: 2-15x speedup over ARM core
- Video transcoder with filtering and motion stabilization
Working closely with Google
• Enabling VP9 video codec
11. Image Processing Expertise
Cinema DNG (debayering, noise reduction, etc.)
GIMP/GEGL – open source Photoshop alternative
• MCW parallelized GIMP
• Accelerated kernels for color space conversion
• Improved calling and data transfer mode between GIMP and GEGL
• Streamlined redundant operations for improved efficiency of image processing
• More than 20 algorithms (e.g. image scaling, brightness/contrast control, gamma correction, edge enhancement, color correction, etc.) implemented
JPEG in browser
• Implemented accelerated libjpeg-turbo for JPEG decoding
• Integration of libjpeg-turbo in Chromium
• Implementation of parallel progressive mode JPEG decoding
• Implementation of Huffman decoding algorithm
OpenCV
• Author and performance optimizer of many functions
• Key contributor
13. Automotive Algorithms in OpenCV
Lane keeping
• Canny
AEB
• HOG
• Haar
• Optical flow
Traffic sign recognition
• Hough transform
• Haar
• SURF
Driver monitor
• Face/eye detect/tracking
Pedestrian detection and avoidance
• HOG
• StereoMatch
14. Other OpenCV Algorithms
MCW - lead contributor of OpenCL-accelerated OpenCV
• Face detection
• HOG pedestrian detection
• PyrLK/TVL1 optical flow
• Square detection
• SURF matcher
• Stereo matcher
• CLAHE
Extensive optimizations are applied to these algorithms
15. Mobile Application Acceleration
MulticoreWare MobileComputeMark
Android benchmark App
Parallel Path Analyzer for Android
Renderscript / OpenCL Stack Development
Photo editing App for ARM
Video transcode App for ARM
16. Accelerated Libraries
OpenCL
VPL
• Video Processing Library
• Nearly 80 video kernels for broadcast standards-conversion
Other Languages
VP9 Codec for Google
RenderScript
OpenCL
VFL
• Video pre-processing Filters Library
IPL
• Image Processing Library
XAL
• H.264 Acceleration Library
Crypto
• Crypto++
• AES
Compression
• XXX_Zip
17. PPA – Parallel Path Analyzer
A performance-visualization tool that identifies performance bottlenecks, application critical paths, and system-wide dependencies.
Provides flexible, globally time-stamped runtime data collection and post-processing procedures that generate meaningful performance analysis results and display them in intuitive graphical and textual form.
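As a rough illustration of what finding an application critical path involves, the sketch below computes the longest chain of dependent tasks from measured durations. It assumes the dependencies form an acyclic graph. The function name, the input layout, and the task names in the usage note are hypothetical; this is not PPA's data format or algorithm.

```python
def critical_path(durations, deps):
    """Return (length, path) of the longest dependency chain.

    durations: {task: elapsed time}
    deps:      {task: [tasks that must finish first]}
    """
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor that determines each task's start

    def visit(task):
        if task in finish:
            return finish[task]
        start = 0.0
        best_pred[task] = None
        for pred in deps.get(task, []):
            t = visit(pred)
            if t > start:
                start, best_pred[task] = t, pred
        finish[task] = start + durations[task]
        return finish[task]

    for task in durations:
        visit(task)

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    path.reverse()
    return finish[path[-1]], path
```

For example, with durations `{"decode": 4, "scale": 2, "denoise": 5, "encode": 3}`, where `scale` and `denoise` both wait on `decode` and `encode` waits on both, the chain decode, denoise, encode is the critical path, so speeding up `scale` alone would not shorten the run.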
18. MxPA Source-2-Source Translator
OpenCL to C on Intel x86, OpenMP, others
*…
• Maintain a common code base in OpenCL
• Support OpenCL enabled devices or go direct to other compilers as needed
Generates C source code for vendor specific compiler tools
• Integrated code sequencing and resource utilization for highest performance
• Highest performance automated code generation method available today
• Takes advantage of Intel SSE, TBB
Translation
Close to ASM code performance out of the box
• No need for OpenCL driver support
• Leverages silicon vendor tool optimizations
* = NDA needed for more details
upcrc.illinois.edu
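One general way a per-work-item kernel can run without an OpenCL driver is to wrap its body in ordinary loops over the index space. The sketch below shows that idea only. The names `vec_add_kernel` and `run_serialized` are hypothetical, the sketch is written in Python rather than generated C, and it should not be read as a description of MxPA's actual translation, whose details are under NDA.

```python
def vec_add_kernel(gid, a, b, out):
    """Body of a per-work-item kernel: each work-item writes one element."""
    out[gid] = a[gid] + b[gid]


def run_serialized(kernel, global_size, local_size, *args):
    """Execute every work-item with plain nested loops.

    The outer loop walks work-groups and the inner loop walks the
    work-items inside one group, mirroring the OpenCL index space.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    for group in range(global_size // local_size):
        for lid in range(local_size):
            kernel(group * local_size + lid, *args)
```

In a C version of the same structure, the inner loop is a natural candidate for compiler vectorization and the outer loop for distribution across cores. Kernels that use barriers need extra handling, which this sketch omits.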