1. Introducing PgOpenCL
A New PostgreSQL
Procedural Language
Unlocking the Power of the GPU!
By
Tim Child
2. Bio
Tim Child
⢠35 years experience of software development
⢠Formerly
⢠VP Oracle Corporation
⢠VP BEA Systems Inc.
⢠VP Informix
⢠Leader at Illustra, Autodesk, Navteq, Intuit, âŚ
⢠30+ years experience in 3D, CAD, GIS and DBMS
3. Terminology
Term Description
Procedure Language Language for SQL Procedures (e.g. PgPLSQL, Perl, TCL, Java, ⌠)
GPU Graphics Processing Unit (highly specialized CPU for graphics)
GPGPU General Purpose GPU (non-graphics programming on a GPU)
CUDA Nvidiaâs GPU programming environment
APU Accelerated Processing Unit (AMDâs Hybrid CPU & GPU chip)
ISO C99 Modern standard version of the C language
OpenCL Open Compute Language
OpenMP Open Multi-Processing (parallelizing compilers)
SIMD Single Instruction Multiple Data (Vector instructions )
SSE x86, x64 (Intel, AMD) Streaming SIMD Extensions
xPU Any Processing Unit device (CPU, GPU, APU)
Kernel Functions that execute on a OpenCL Device
Work Item Instance of a Kernel
Workgroup A group of Work Items
FLOP Floating Point Operation (single = SQL real type )
MIC Many Integrated Cores (Intelâs 50+ x86 Core chip architecture)
4. Some Technology Trends
Impacting DBMS
⢠Solid State Storage
â Reduced Access Time, Lower Power, Increasing in capacity
⢠Virtualization
â Server consolidation, Specialized VMâs, lowers direct costs
⢠Cloud Computing
â EC2, Azure, ⌠lowers capital requirements
⢠Multi-Core
â 2,4,6,8, 12, âŚ. Lots of benefits to multi-threaded applications
⢠xPU (GPU/APU)
â GPU >1000 Cores
â > 1T FLOP /s @ âŹ2500
â APU = CPU + GPU Chip Hybrids due in Mid 2011
â 2 T FLOP /s for $2.10 per hour (AWS EC2)
â Intel MIC âKnights Corner â > 50 x86 Cores
6. GPU vs CPU
Vendor NVidia ATI Radeon Intel
Architecture Fermi Evergreen Nehalem
Cores 448 1600 4
Simple Simple Complex
Transistors 3.1 B 2.15 B 731 M
Clock 1.5 G Hz 851 M Hz 3 G Hz
Peak Float 1500 G 2720 G 96 G
Performance FLOP / s FLOP / s FLOP / s
Peak Double 750 G 544 G 48 G
Performance FLOP / s FLOP / s FLOP / s
Memory ~ 190 G / s ~ 153 G / s ~ 30 G / s
Bandwidth
Power 250 W > 250 W 80 W
Consumption
SIMD / Vector Many Many SSE4+
Instructions
8. Future (Mid 2011)
APU Based PC
APU (Accelerated Processing Unit)
APU Chip
CPU CPU ~20 GB/s System RAM
North Bridge
~20 GB/s APUâs
PCIE ~12 GB/s
PCIE ~12 GB/s
Adds an Embedded
Embedded GPU
GPU
Discrete
150 GB/s Graphic RAM
GPU
Source AMD
9. Scalar vs. SIMD
Scalar Instruction
C=A+B 1 + 2 = 3
SIMD Instruction 1 3 5 7
+
Vector C = Vector A + Vector B 2 4 6 8
=
3 7 11 15
OpenCL
Vector lengths 2,4,8,16 for char, short, int, float, double
10. Summarizing xPU
Trends
⢠Many more xPU Cores in our Future
⢠Compute Environment becoming Hybrid
â CPU and GPUâs
â Need CPU to give access to GPU power
⢠GPU Capabilities
â Lots of cores
â Vector/SIMD Instructions
â Fast Memory
⢠GPU Futures
â Virtual Memory
â Multi-tasking / Pre-emption
11. Scaling PostgreSQL Queries
on xPUâs
Multi-Core CPU Many Core GPU
PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL
Threads Threads Threads Threads Threads Threads Threads Thread Thread
PgOpenCL PgOpenCL PgOpenCL PgOpenCL PgOpenCL
Postgres Threads Threads Threads Thread Thread
Process
PgOpenCL PgOpenCL PgOpenCL PgOpenCL
PgOpenCL
Threads Threads Thread Thread
Threads
Using More
Transistors
12. Parallel
Programming Systems
Category CUDA OpenMP OpenCL
Language C C, Fortran C
Cross Platform X â â
Standard Vendor OpenMP Khronos
CPU X â â
GPU â X â
Clusters X â X
Compilation / Link Static Static Dynamic
13. What is OpenCL?
⢠OpenCL - Open Compute Language
â Subset of C 99
â Open Specification
â Proposed by Apple
â Many Companies Collaborated on the Specification
â Portable, Device Agnostic
â Specification maintained by Khronos Group
⢠PgOpenCL
â OpenCL as a PostgreSQL Procedural Language
14. System Overview
DBMS Server
PgOpenCL
PgOpenCL
Web HTTP Web SQL SQL
SQL
Browser Server Statement Procedure
Procedure
PCIe X2 Bus
TCP/IP
App
PostgreSQL GPGPU
Server
Disk I/O Tables
TCP/IP
PostgreSQL
Client
15. OpenCL
Language
⢠A subset of ISO C99
â - But without some C99 features such as standard C99 headers,
â function pointers, recursion, variable length arrays, and bit fields
⢠A superset of ISO C99 with additions for:
â - Work-items and Workgroups
â - Vector types
â - Synchronization
â - Address space qualifiers
⢠Also includes a large set of built-in functions
â - Image manipulation
â - Work-item manipulation,
â - Specialized math routines, etc.
16. PgOpenCL
Components
⢠New PostgreSQL Procedural Language
â Language handler
⢠Maps arguments
⢠Calls function
⢠Returns results
â Language validator
⢠Creates Function with parameter & syntax checking
⢠Compiles Function to a Binary format
⢠New data types
â cl_double4, cl_double8, âŚ.
⢠System Admin Pseudo-Tables
â Platform, Device, Run-Time, âŚ
18. PGOpenCL
Function Declaration
CREATE or REPLACE FUNCTION VectorAdd(IN a float[], IN B float[], OUT c float[])
AS $BODY$
#pragma PGOPENCL Platform : ATI Stream
#pragma PGOPENCL Device : CPU
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void VectorAdd( __global const float *a, __global const float *b, __global float *c)
{
int i = get_global_id(0);
c[i] = a[i] + b[i];
}
$BODY$
Language PgOpenCL;
19. PgOpenCL
Execution Model
A
Table
B
Select Table 100âs - 1000âs of
to Array Threads (Kernels)
xPU
VectorAdd(A, B)
A + B Returns C = C
Copy Unnest Array
Copy To Table
Table
C C C C C C C C C C C C C
20. Using
Re-Shaped Tables
100âs - 1000âs of
Table of Threads (Kernels) Table of
Arrays Arrays
A + B = C
A
C C C C
B
xPU
VectorAdd(A, B)
Returns C
A
C C C C
B
Copy
Copy
21. Todayâs GPGPU
Challenges
⢠No Pre-emptive Multi-Tasking
⢠No Virtual Memory
⢠Limited Bandwidth to discrete GPGPU
â 1 â 8 G/s over PCIe Bus
⢠Hard to Program
â New Parallel Algorithms and constructs
â âNewâ C language dialect
⢠Immature Tools
â Compilers, IDE, Debuggers, Profilers - early years
⢠Data organization really matters
â Types, Structure, and Alignment
â SQL needs to Shape the Data
⢠Profiling and Debugging is not easy
Solves Well for Problem Sets with the Right Shape!
22. Making a Problem
Work for You
⢠Determine % Parallelism Possible
for ( i = 0, i < â, i++)
for ( j = 0; j < â; j++ )
for ( k = 0; k < â; k++ )
⢠Arrange data to fit available GPU RAM
⢠Ensure calculation time >> I/O transfer overhead
⢠Learn about Parallel Algorithms and the OpenCL language
⢠Learn new tools
⢠Carefully choose Data Types, Organization and Alignments
⢠Profile and Measure at Every Stage
23. PgOpenCL
System Requirements
⢠PostgreSQL 9.x
⢠For GPUâs
â AMD ATI OpenCL Stream SDK 2.x
â NVidia CUDA 3.x SDK
â Recent Macs with O/S 11.6
⢠For CPUâs (Pentium M or more recent)
â AMD ATI OpenCL Stream SDK 2.x
â Intel OpenCL SDK Alpha Release (x86)
â Recent Macs with O/S 11.6
24. PGOpenCL
Status
Today 1Q 2011
Prototype Beta
2010 2011
⢠Wish List
⢠Beta Testers
â Existing OpenCL App?
â Have a GPU App?
⢠Contributors
â Code server side functions?
⢠Sponsors & Supporters
â AMD Fusion Fund?
â Khronos?
26. Using the
Whole Brain
APU Chip
PgOpenCl PgOpenCl
PgOpenCL PgOpenCL
CPU
CPU CPU
Postgres You canât be in a
parallel universe
with a single
brain!
North Bridge
~20 GB/s
⢠Heterogeneous Compute Environments
PgOpenCl
PgOpenCl ⢠CPUâs, GPUâs, APUâs
Embedded PgOpenCl ⢠Expect 100âs â 1000âs of cores
PgOpenCl
GPU PgOpenCL
The Future Is Parallel: What's a Programmer to Do?
27. Summarizing
PgOpenCL
⢠Supports Heterogeneous Parallel Compute Environments
⢠CPUâs, GPUâs, APUâs
⢠OpenCL
⢠Portable and high-performance framework
âIdeal for computationally intensive algorithms
âAccess to all compute resources (CPU, APU, GPU)
âWell-defined computation/memory model
â˘Efficient parallel programming language
âC99 with extensions for task and data parallelism
âRich set of built-in functions
â˘Open standard for heterogeneous parallel computing
⢠PgOpenCL
⢠Integrates PostgreSQL with OpenCL
⢠Provides Easy SQL Access to xPUâs
⢠APU, CPU, GPGPU
⢠Integrates OpenCL
⢠SQL + Web Apps(PHP, Ruby, ⌠)
28. More
Information
⢠PGOpenCL
⢠Twitter @3DMashUp
⢠OpenCL
⢠www.khronos.org/opencl/
⢠www.amd.com/us/products/technologies/stream-technology/opencl/
⢠http://software.intel.com/en-us/articles/intel-opencl-sdk
⢠http://www.nvidia.com/object/cuda_opencl_new.html
⢠http://developer.apple.com/technologies/mac/snowleopard/opencl.html
29. Q&A
⢠Using Parallel Applications?
⢠Benefits of OpenCL / PgOpenCL?
⢠Want to Collaborate on PgOpenCL?