YaCF: The accULL Compiler
Undergraduate Thesis Project
Juan José Fumero Alfonso
Universidad de La Laguna
22 June 2012
Outline
1 Introduction
2 YaCF
3 Experiments
4 Conclusions
5 Future Work
1 Introduction
Moore's Law
Roughly every 18 months, the number of transistors on a chip doubles.
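As a rough illustration of the doubling rule, the sketch below assumes an exact doubling for every complete 18-month period (real transistor counts only follow this trend approximately). The function name `transistors_after` is ours, not taken from the thesis.

```c
/* Idealized Moore's Law: the count doubles once per complete 18-month period.
 * Partial periods are ignored, so 17 months gives no growth. */
unsigned long long transistors_after(unsigned long long initial, unsigned int months)
{
    unsigned int periods = months / 18;   /* number of complete periods */
    unsigned long long count = initial;
    for (unsigned int p = 0; p < periods; ++p)
        count *= 2;                       /* one doubling per period */
    return count;
}
```

For example, 180 months correspond to 10 doublings, that is, a factor of 1024.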
Parallel Architectures Nowadays
[Figure]
Parallel Architectures
The solution:
• More processors.
• More cores per processor.
Current systems are hybrid and combine all of these options.
[Figure]
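OpenMP: Shared Memory Programming
• An API that supports shared-memory multiprocessing (SMP) programming.
• Multi-platform.
• A directive-based approach.
• A set of compiler directives, library routines and environment variables for parallel programming.
OpenMP example (computing π by numerical integration)
    #include <stdio.h>
    #include <omp.h>   /* compile with OpenMP enabled, e.g. gcc -fopenmp */
    #define NUM_STEPS 100000000   /* number of integration steps (value chosen for illustration) */
    int main(void)
    {
        double step = 1.0 / (double) NUM_STEPS;
        double x, sum = 0.0, pi = 0.0;
        int i, nthreads = 1;
        #pragma omp parallel
        {
            #pragma omp master
            {
                nthreads = omp_get_num_threads();
            }
            /* Iterations are shared among threads; each thread keeps a private x and a
             * partial sum. schedule(runtime) takes the schedule from OMP_SCHEDULE. */
            #pragma omp for private(x) reduction(+:sum) schedule(runtime)
            for (i = 0; i < NUM_STEPS; ++i) {
                x = (i + 0.5) * step;
                sum = sum + 4.0 / (1.0 + x * x);
            }
            /* The implicit barrier at the end of "omp for" makes the reduced sum visible here. */
            #pragma omp master
            {
                pi = step * sum;
            }
        }
        printf("pi = %.12f computed with %d threads\n", pi, nthreads);
        return 0;
    }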
MPI: Message Passing Interface
• A language-independent communication protocol used to program parallel applications.
• MPI's goals are high performance, scalability and portability.
MPI example
    /* N, w, local, pi_mpi (initialised to 0.0) and gpi_mpi are declared elsewhere. */
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    w = 1.0 / N;
    /* Cyclic distribution: this process handles i = rank, rank + num_procs, ... */
    for (i = rank; i < N; i += num_procs) {
        local = (i + 0.5) * w;
        pi_mpi = pi_mpi + 4.0 / (1.0 + local * local);
    }
    pi_mpi *= w;   /* scale the partial sum by the interval width */
    /* Add the partial results of all processes; every process receives gpi_mpi. */
    MPI_Allreduce(&pi_mpi, &gpi_mpi, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
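To make the cyclic distribution and the final reduction concrete, the following sequential sketch emulates several MPI processes inside one C program: `partial_pi` computes what a single rank contributes, and `allreduce_pi` stands in for `MPI_Allreduce` with `MPI_SUM`. Both function names are hypothetical and no MPI library is involved.

```c
/* Partial result of one process under the cyclic distribution of the MPI loop:
 * rank r handles iterations r, r + num_procs, r + 2*num_procs, ... */
double partial_pi(long n, int rank, int num_procs)
{
    double w = 1.0 / (double) n, sum = 0.0;
    for (long i = rank; i < n; i += num_procs) {
        double local = (i + 0.5) * w;        /* midpoint of interval i */
        sum += 4.0 / (1.0 + local * local);
    }
    return sum * w;
}

/* Sequential stand-in for MPI_Allreduce(..., MPI_SUM, ...): adds the partial
 * results of all ranks (each real rank would receive this same value). */
double allreduce_pi(long n, int num_procs)
{
    double total = 0.0;
    for (int r = 0; r < num_procs; ++r)
        total += partial_pi(n, r, num_procs);
    return total;
}
```

Because the ranks visit disjoint sets of iterations that together cover 0..n-1, the reduced value does not depend on the number of processes, apart from rounding caused by the different summation order.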
High Performance Computing
• The most powerful computers available at any given time.
• Systems with a massive number of processors: thousands of processors and cores.
• High calculation speed.
• Very expensive systems that consume a huge amount of energy.
TOP500: High Performance Computing
• The TOP500 project ranks and details the 500 most powerful known (non-distributed) computer systems in the world.
• The project publishes an updated list of supercomputers twice a year.
Languages for Heterogeneous Programming
CUDA
Developed by NVIDIA.
• Pros: performance; it is easier to use than OpenCL.
• Con: it only works on NVIDIA hardware.
CUDA
    /* Column-major a (n x m) = b (n x p) * c (p x m); one thread per element of a. */
    __global__ void mmkernel(float *a, float *b, float *c, int n, int m, int p)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   /* row of a (the original hard-coded blockDim.x = 32) */
        int j = blockIdx.y;                              /* column of a */
        if (i >= n || j >= m) return;                    /* skip threads outside the matrix */
        float sum = 0.0f;
        for (int k = 0; k < p; ++k)
            sum += b[i + n * k] * c[k + p * j];
        a[i + n * j] = sum;
    }
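The sketch below checks, on the CPU, how the kernel's thread indices map onto the matrices. `mm_emulated_launch` visits every (blockIdx.x, threadIdx.x, blockIdx.y) triple of a hypothetical launch with a grid of ceil(n/block) × m blocks of `block` threads each, while `mm_reference` is the plain triple loop. The column-major layout is inferred from the kernel's indexing; the function names and the launch shape are assumptions, not taken from the thesis.

```c
/* Plain C version of the product computed by mmkernel:
 * column-major a (n x m) = b (n x p) * c (p x m). */
void mm_reference(float *a, const float *b, const float *c, int n, int m, int p)
{
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < n; ++i) {
            float sum = 0.0f;
            for (int k = 0; k < p; ++k)
                sum += b[i + n * k] * c[k + p * j];
            a[i + n * j] = sum;
        }
}

/* Sequential emulation of a launch with grid (ceil(n / block), m) and block (block, 1):
 * each (bx, tx, by) triple plays the role of one CUDA thread. */
void mm_emulated_launch(float *a, const float *b, const float *c,
                        int n, int m, int p, int block)
{
    int grid_x = (n + block - 1) / block;          /* round up so every row is covered */
    for (int by = 0; by < m; ++by)
        for (int bx = 0; bx < grid_x; ++bx)
            for (int tx = 0; tx < block; ++tx) {
                int i = bx * block + tx;           /* blockIdx.x * blockDim.x + threadIdx.x */
                int j = by;                        /* blockIdx.y */
                if (i >= n || j >= m) continue;    /* same guard as the kernel */
                float sum = 0.0f;
                for (int k = 0; k < p; ++k)
                    sum += b[i + n * k] * c[k + p * j];
                a[i + n * j] = sum;
            }
}
```

With n = 3 and block = 2 the emulated grid has 4 threads per column, so the guard is what keeps the fourth thread from writing past the matrix.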
OpenCL
A framework developed by the Khronos Group.
• Pros: it can target any device with an OpenCL implementation; it is a standard.
• Cons: more complex than CUDA; still immature.
OpenCL
    /* Despite its name, this kernel computes the N x N matrix-matrix product a = b * c (row-major). */
    __kernel void matvecmul(__global float *a,
                            const __global float *b, const __global float *c,
                            const uint N) {
        float R;
        int k;
        int xid = get_global_id(0);   /* row of a */
        int yid = get_global_id(1);   /* column of a */
        if (xid < N && yid < N) {     /* discard work-items outside the matrix */
            R = 0.0f;
            for (k = 0; k < N; k++)
                R += b[xid * N + k] * c[k * N + yid];
            a[xid * N + yid] = R;
        }
    }
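The host code is not shown on the slide, but a common reason for the `xid < N` and `yid < N` guards is that the global NDRange is rounded up to a multiple of the work-group size. The hypothetical sketch below emulates such a launch sequentially and counts the work-items that pass the guard; `round_up`, `matmul_ndrange` and the rounding policy are assumptions for illustration only.

```c
#include <stddef.h>

/* Smallest multiple of local that is >= n (a typical way to size the global range). */
static size_t round_up(size_t n, size_t local)
{
    return ((n + local - 1) / local) * local;
}

/* Emulates the 2D NDRange of the kernel above; returns how many work-items did useful work. */
size_t matmul_ndrange(float *a, const float *b, const float *c, size_t N, size_t local)
{
    size_t global_size = round_up(N, local);
    size_t active = 0;
    for (size_t xid = 0; xid < global_size; ++xid)
        for (size_t yid = 0; yid < global_size; ++yid) {
            if (xid < N && yid < N) {          /* same guard as the kernel */
                float R = 0.0f;
                for (size_t k = 0; k < N; ++k)
                    R += b[xid * N + k] * c[k * N + yid];
                a[xid * N + yid] = R;
                ++active;
            }
        }
    return active;
}
```

For N = 3 and work-groups of 2 × 2, the rounded global range is 4 × 4 = 16 work-items, of which only 9 fall inside the matrix.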
Pros
1 The programmer can use all of the machine's devices.
2 The GPU and the CPU can work in parallel.
Cons
1 The programmer needs to know low-level details of the architecture.
2 Source code needs to be rewritten:
• One version for OpenMP/MPI.
• A different version for the GPU.
3 Good performance requires a great effort in parameter tuning.
4 These languages (CUDA/OpenCL) are complex and new to non-experts.
GPGPU (General-Purpose GPU) Computing
Can we use GPUs for parallel computing? Is this efficient?
The NBody Problem
• An N-body simulation numerically approximates the evolution of a system of bodies.
• Each body continuously interacts with every other body.
• Applications include fluid flow simulations.
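NBody description
Acceleration
$$\mathbf{a}_i = \frac{\mathbf{F}_i}{m_i} \approx G \cdot \sum_{1 \le j \le N} \frac{m_j \, \mathbf{r}_{ij}}{\left( \lVert \mathbf{r}_{ij} \rVert^2 + \varepsilon^2 \right)^{3/2}}$$
where G is the gravitational constant, m_j is the mass of body j, r_ij = x_j - x_i is the vector from body i to body j, and ε is a softening factor that keeps the denominator from vanishing when two bodies (or a body and itself) coincide.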
CUDA implementation
• The method is particle-to-particle.
• Its computational complexity is O(n²).
• It evaluates all pair-wise interactions directly, so it is exact (no approximation scheme is involved).
CUDA implementation: blocks and grids
[Figure]
CUDA Kernel: Tile calculation
    __device__ float3 gravitation(float4 myPos, float3 accel) {
        /* Positions of the current tile, loaded into shared memory by the calling kernel. */
        extern __shared__ float4 sharedPos[];
        unsigned long i = 0;
        /* Accumulate the interaction with every body of the tile (one body per thread of the block). */
        for (unsigned int counter = 0; counter < blockDim.x; counter++)
        {
            /* SX (a macro indexing sharedPos) and bodyBodyInteraction are defined elsewhere. */
            accel = bodyBodyInteraction(accel, SX(i++), myPos);
        }
        return accel;
    }
CUDA Kernel: calculate forces
    __global__ void calculate_forces(float4 *globalX, float4 *globalA)
    {
        /* A shared memory buffer to store the body positions. */
        extern __shared__ float4 shPosition[];
        int i, tile;
        float3 acc = {0.0f, 0.0f, 0.0f};
        /* Global thread ID (represents the unique body index in the simulation). */
        int gtid = blockIdx.x * blockDim.x + threadIdx.x;
        /* This is the position of the body we are computing the acceleration for. */
        float4 myPosition = globalX[gtid];
        /* N (number of bodies) is assumed to be a multiple of blockDim.x: there is no bounds check. */
        for (i = 0, tile = 0; i < N; i += blockDim.x, tile++)
        {
            int idx = tile * blockDim.x + threadIdx.x;
            shPosition[threadIdx.x] = globalX[idx];   /* each thread loads one body of the tile */
            __syncthreads();                           /* wait until the whole tile is loaded */
            acc = tile_calculation(myPosition, acc);   /* per-tile loop (cf. gravitation() above) */
            __syncthreads();                           /* wait before the tile is overwritten */
        }
        /* Store the result in global memory for the integration step. */
        globalA[gtid] = make_float4(acc.x, acc.y, acc.z, 0.0f);
    }
Results
• GPU: Tesla C1060 (compute capability 1.3).
• Sequential code: Intel Core i7 930.
• Benchmark: the NBody sample from the CUDA SDK.
• CUDA Runtime/CUDA Driver: 4.0.
• 400000 bodies.
• 200 iterations.
Device              Cores   Memory   Peak performance (GFLOPS)
Tesla C1060         240     4 GB     933 (single), 78 (double)
Intel Core i7 930   4       4 GB     44.8 (11.2 per core)
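The CPU figure is consistent with the usual peak-rate formula. The 4 FLOP/cycle value is not stated on the slide; it is inferred here from 11.2 GFLOPS per core at the 2.80 GHz clock of the Core i7 930 (the clock quoted for the Garoe platform), so treat it as an assumption.

```latex
$$P_{\text{peak}} = n_{\text{cores}} \times f_{\text{clock}} \times \frac{\text{FLOP}}{\text{cycle}}
  = 4 \times 2.80\ \text{GHz} \times 4 = 44.8\ \text{GFLOPS}
  \quad (2.80\ \text{GHz} \times 4 = 11.2\ \text{GFLOPS per core})$$
```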
Results
• Sequential code: ≈ 147202512.40 ms ≈ 40.89 hours (about 41 hours).
• Parallel CUDA code: 1392029.6 ms ≈ 23.2 minutes.
• The speedup is ≈ 105.7×.
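The unit conversions and the speedup follow directly from the two measured times. The helper names below are illustrative.

```c
/* Convert milliseconds to hours and to minutes. */
double ms_to_hours(double ms)   { return ms / (1000.0 * 60.0 * 60.0); }
double ms_to_minutes(double ms) { return ms / (1000.0 * 60.0); }

/* Speedup of the parallel version over the sequential one. */
double speedup(double t_seq, double t_par) { return t_seq / t_par; }
```

With t_seq = 147202512.40 ms and t_par = 1392029.6 ms this gives about 40.89 hours, 23.2 minutes and a speedup of about 105.7.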
At the Present Time
• Some applications can be accelerated with GPUs.
• Users need to learn new programming languages and tools.
• The CUDA model and its architecture have to be understood.
• Non-expert users have to write programs for a new model.
GPGPU Languages
OpenACC: introduced in November 2011 at SuperComputing 2011.
A directive-based language.
• Aimed at becoming a standard.
• Supported by Cray, NVIDIA, PGI and CAPS.
• One single source code for all versions.
• Platform independent.
• Easier for beginners.
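To illustrate the "one single source code" point, here is a minimal sketch (our own example, not taken from accULL) of the π integration loop annotated with an OpenACC `parallel loop` directive and a `reduction` clause. A compiler with OpenACC support can offload the loop to an accelerator; a compiler without it ignores the unknown pragma (usually with a warning) and runs the same loop sequentially.

```c
/* Midpoint-rule approximation of pi; the directive is the only accelerator-specific line. */
double pi_openacc(long num_steps)
{
    double step = 1.0 / (double) num_steps;
    double sum = 0.0;
    /* Distribute the iterations on the device and combine the partial sums. */
    #pragma acc parallel loop reduction(+:sum)
    for (long i = 0; i < num_steps; ++i) {
        double x = (i + 0.5) * step;   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Unlike the CUDA and OpenCL versions, no kernel, launch configuration or explicit data transfer appears in the source; those decisions are left to the compiler and its runtime.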
[Figure: OpenACC, a directive-based language]
A New Dimension for HPC
[Figure]
accULL: our OpenACC Implementation
accULL = compiler + runtime library.
accULL = YaCF + Frangollo.
Initial Objectives of this Project
• To integrate C99 into the YaCF project.
• To implement a new class hierarchy for new YaCF frontends.
• To implement an OpenACC frontend.
• To complete the OpenMP grammar with the OpenMP 3.0 directives.
• To test the new C99 interface.
Source-to-source Compilers
• ROSE Compiler Framework.
• Cetus Compiler.
• Mercurium.
2 YaCF
accULL: our OpenACC Implementation
[Figures]
YaCF: Yet Another Compiler Framework
YaCF
• A source-to-source compiler that translates C code with OpenMP, llc and OpenACC annotations into code with Frangollo calls.
• Integrates code analysis tools.
• Written entirely in Python.
• Based on widely known object-oriented software patterns.
• Based on the pycparser Python module.
• Implementing a code transformation only requires writing a few lines of code.
YaCF: Architecture
[Figures]
YaCF: Preprocessor
[Figures]
YaCF: Architecture
[Figures]
YaCF: Statistics
• 20683 lines of Python code.
• 2158 functions and methods.
• My contribution amounts to about 25 % of the YaCF project.
3 Experiments
Experiments
• ScaLAPACK benchmark: testing C99 support.
• Blocked matrix multiplication in accULL.
• Three different problems from the Rodinia benchmark suite:
  • HotSpot.
  • SRAD.
  • Needleman–Wunsch.
ScaLAPACK
• ScaLAPACK (Scalable LAPACK) is a library that includes a subset of LAPACK routines redesigned for distributed-memory MIMD parallel computers.
• ScaLAPACK is designed for heterogeneous computing.
• It is portable to any computer that supports MPI.
• Its scalability depends on the PBLAS operations.
ScaLAPACK: results in YaCF
Directory           Total C files   Success   Failures
PBLAS/SRC           123             123       0
REDIST/SRC          21              21        0
PBLAS/SRC/PTOOLS    102             101       1
PBLAS/TESTING       2               1         1
PBLAS/TIMING        2               1         1
REDIST/TESTING      10              0         10
SRC                 9               9         0
TOOLS               2               2         0
Total               271             258       13
About 95 % of the ScaLAPACK C files (258 of 271) are parsed correctly by YaCF.
Platforms
• Garoe: a desktop computer with an Intel Core i7 930 processor (2.80 GHz), 1 MB of L2 cache and 8 MB of L3 cache shared by the four cores. The system has 4 GB of RAM and a Tesla C2050 with 4 GB of memory attached.
• Drago: a second cluster node. It is a shared-memory system with 4 Intel Xeon E7 processors of 10 cores each. In this case, the accelerator platform is the Intel OpenCL SDK 1.5, which runs on the CPU.
MxM in accULL
• MxM is a basic kernel frequently used to showcase the peak performance of GPU computing.
• We compare the performance of the accULL implementation with that of:
  • OpenMP.
  • CUDA.
  • OpenCL.
MxM OpenACC code
    #pragma acc kernels name("mxm") copy(a[L*N]) copyin(b[L*M], c[M*N])
    {
        /* Row-major layout: a is L x N, b is L x M, c is M x N (sizes from the copy clauses). */
        #pragma acc loop private(i, j) collapse(2)
        for (i = 0; i < L; i++)
            for (j = 0; j < N; j++)
                a[i * N + j] = 0.0;
        /* Iterate over blocks */
        for (ii = 0; ii < L; ii += tile_size)
            for (jj = 0; jj < N; jj += tile_size)
                for (kk = 0; kk < M; kk += tile_size) {
                    /* Iterate inside a block; min() is assumed to be defined elsewhere. */
                    #pragma acc loop collapse(2) private(i, j, k)
                    for (j = jj; j < min(N, jj + tile_size); j++)
                        for (i = ii; i < min(L, ii + tile_size); i++)
                            for (k = kk; k < min(M, kk + tile_size); k++)
                                a[i * N + j] += b[i * M + k] * c[k * N + j];
                }
    }
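A sequential C sketch of the same blocked loop nest is handy for validating the tiled indexing against a naive triple loop. It assumes row-major matrices with a of size L × N, b of size L × M and c of size M × N, as suggested by the copy clauses. The `MIN` macro and the function names are ours.

```c
#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Blocked product a = b * c, visiting tile_size x tile_size blocks as in the OpenACC code. */
void mxm_tiled(double *a, const double *b, const double *c,
               int L, int M, int N, int tile_size)
{
    for (int i = 0; i < L; i++)
        for (int j = 0; j < N; j++)
            a[i * N + j] = 0.0;
    for (int ii = 0; ii < L; ii += tile_size)
        for (int jj = 0; jj < N; jj += tile_size)
            for (int kk = 0; kk < M; kk += tile_size)
                for (int j = jj; j < MIN(N, jj + tile_size); j++)
                    for (int i = ii; i < MIN(L, ii + tile_size); i++)
                        for (int k = kk; k < MIN(M, kk + tile_size); k++)
                            a[i * N + j] += b[i * M + k] * c[k * N + j];
}

/* Naive triple loop used as the reference result. */
void mxm_naive(double *a, const double *b, const double *c, int L, int M, int N)
{
    for (int i = 0; i < L; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < M; k++)
                s += b[i * M + k] * c[k * N + j];
            a[i * N + j] = s;
        }
}
```

Using non-square sizes such as L = 3, M = 5, N = 4 in such a check exposes indexing mistakes that square matrices would hide.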
MxM in accULL (Garoe)
[Figure]
MxM in accULL (Drago)
[Figure]
SRAD: an Image Filtering Code
[Figure]
SRAD (Garoe)
[Figure]
The CUDA version running on Frangollo performs better than the native CUDA version.
NW: Needleman-Wunsch, a Sequence Alignment Code
[Figure]
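For readers unfamiliar with the benchmark, the sketch below shows the Needleman-Wunsch dynamic-programming recurrence for the global alignment score. The simple scoring scheme (match +1, mismatch -1, gap -1) and the fixed maximum sequence length are our own choices for illustration and differ from the scoring used in the Rodinia version.

```c
#include <limits.h>
#include <string.h>

#define NW_MAX 64   /* illustrative limit on sequence length */

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Optimal global alignment score of s and t, or INT_MIN if a sequence exceeds NW_MAX. */
int nw_score(const char *s, const char *t)
{
    static int score[NW_MAX + 1][NW_MAX + 1];
    int n = (int) strlen(s), m = (int) strlen(t);
    if (n > NW_MAX || m > NW_MAX)
        return INT_MIN;
    for (int i = 0; i <= n; ++i) score[i][0] = -i;   /* prefix of s aligned with gaps */
    for (int j = 0; j <= m; ++j) score[0][j] = -j;   /* prefix of t aligned with gaps */
    /* Cell (i, j) needs (i-1, j-1), (i-1, j) and (i, j-1): only cells on the same
     * anti-diagonal are independent and can be computed in parallel. */
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j) {
            int diag = score[i - 1][j - 1] + (s[i - 1] == t[j - 1] ? 1 : -1);
            score[i][j] = max3(diag, score[i - 1][j] - 1, score[i][j - 1] - 1);
        }
    return score[n][m];
}
```

For example, aligning "ACG" with "AG" scores 1: two matches and one gap.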
NW (Garoe)
[Figure]
Poor results, although better than OpenMP on 4 cores.
HotSpot: a Thermal Simulation Tool for Estimating Processor Temperature
[Figure]
HotSpot (Garoe)
[Figure]
As good as the native versions.
4 Conclusions
Conclusions: Compiler Technologies
• Compiler technologies increasingly use source-to-source compilers to generate, transform and optimize source code.
• Parallelizing source code is easier with AST transformations.
• AST transformations allow programmers to easily generate code for any platform.
Conclusions: Programming Model
• Directive-based programming languages allow non-expert programmers to abstract away architectural details and write programs more easily.
• The OpenACC standard is a starting point for programming heterogeneous systems.
• Future versions of the OpenMP standard will include support for accelerators.
• The results we are obtaining with accULL, our early OpenACC implementation, are promising.
5 Future Work
Future Work
• Add support for MPI combined with CUDA and OpenCL.
• Perform new experiments with OpenACC.
• Compare our accULL approach with PGI-OpenACC and CAPS-HMPP.
• Add support for vectorization.
• Explore FPGAs in combination with CUDA and OpenCL.
• Introduce the LLVM Compiler Framework in the frontend.
Thank you for your attention
Juan José Fumero Alfonso
jfumeroa@ull.edu.es