YaCF: The accULL Compiler
Undergraduate Thesis Project

Juan José Fumero Alfonso
Universidad de La Laguna

June 22, 2012

1 Introduction
2 YaCF
3 Experiments
4 Conclusions
5 Future Work

1 Introduction

Moore’s Law

The number of transistors on a chip doubles roughly every two years (often quoted as every 18 months).

Parallel Architectures Today

Parallel Architectures

The solution
  • More processors
  • More cores per processor

Current systems are hybrid, combining all of these options.

OpenMP: Shared Memory

  • An API that supports SMP programming.
  • Multi-platform.
  • A directive-based approach.
  • A set of compiler directives, library routines and environment variables for parallel programming.

OpenMP example

    #pragma omp parallel
    {
        #pragma omp master
        {
            nthreads = omp_get_num_threads();
        }
        #pragma omp for private(x) reduction(+:sum) schedule(runtime)
        for (i = 0; i < NUM_STEPS; ++i) {
            x = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        #pragma omp master
        {
            pi = step * sum;
        }
    }

MPI: Message Passing Interface

  • A language-independent communications protocol used to program parallel applications.
  • MPI’s goals are high performance, scalability and portability.

MPI example

    MPI_Comm_size(MPI_COMM_WORLD, &MPI_NUMPROCESSORS);
    MPI_Comm_rank(MPI_COMM_WORLD, &MPI_NAME);
    w = 1.0 / N;
    for (i = MPI_NAME; i < N; i += MPI_NUMPROCESSORS) {
        local = (i + 0.5) * w;
        pi_mpi = pi_mpi + 4.0 / (1.0 + local * local);
    }
    /* gpi_mpi receives the sum of all partial sums; pi is approximately w * gpi_mpi */
    MPI_Allreduce(&pi_mpi, &gpi_mpi, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
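The MPI loop gives each rank a cyclic share of the N intervals: rank r handles i = r, r + P, r + 2P, and so on. The following is a minimal sketch in plain C, with no MPI dependency, that emulates the ranks one after another to show that the partial sums combine into an approximation of pi. The names `partial_sum` and `pi_from_ranks` are assumptions made for illustration, and the accumulation loop only stands in for `MPI_Allreduce` with `MPI_SUM`.

```c
/* Partial midpoint-rule sum for one emulated rank.
 * rank, nprocs and n play the roles of MPI_NAME, MPI_NUMPROCESSORS and N
 * in the slide; the function name is hypothetical. */
double partial_sum(int rank, int nprocs, long n)
{
    double w = 1.0 / (double)n;
    double sum = 0.0;
    for (long i = rank; i < n; i += nprocs) {   /* cyclic distribution */
        double x = ((double)i + 0.5) * w;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum;
}

/* Stand-in for MPI_Allreduce with MPI_SUM: add up every rank's partial
 * sum, then scale by the interval width 1/n to obtain pi. */
double pi_from_ranks(int nprocs, long n)
{
    double total = 0.0;
    for (int rank = 0; rank < nprocs; ++rank)
        total += partial_sum(rank, nprocs, n);
    return total / (double)n;
}
```

Calling `pi_from_ranks(4, 1000000)` emulates four processes; the result should not depend on the number of emulated ranks beyond floating-point rounding.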

High Performance Computing

  • The most powerful computers currently available.
  • Massive systems containing thousands of processors and cores.
  • Very high calculation speed.
  • Very expensive systems that consume a huge amount of energy.

TOP 500: High Performance

  • The TOP500 project ranks and details the 500 most powerful known (non-distributed) computer systems in the world.
  • The project publishes an updated list of the supercomputers twice a year.

Accelerators Era

Languages for Heterogeneous

CUDA: developed by NVIDIA.
  • Pros: good performance; easier to use than OpenCL.
  • Con: only works with NVIDIA hardware.

    __global__ void mmkernel(float *a, float *b, float *c, int n,
                             int m, int p)
    {
        int i = blockIdx.x * 32 + threadIdx.x;
        int j = blockIdx.y;
        float sum = 0.0f;
        for (int k = 0; k < p; ++k) sum += b[i + n * k] * c[k + p * j];
        a[i + n * j] = sum;
    }
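The indexing in this kernel (`a[i + n*j]`, `b[i + n*k]`, `c[k + p*j]`) corresponds to column-major storage, with a of size n x m, b of size n x p and c of size p x m. A sequential reference is a common way to check such a kernel's output; the sketch below is one possible version in plain C. The name `mm_reference` is hypothetical. Since the kernel has no bounds check, it appears to assume that n is a multiple of 32 and that the grid matches the matrix shape.

```c
/* CPU reference for the kernel's indexing, assuming column-major storage:
 * a is n x m, b is n x p, c is p x m. mm_reference is a hypothetical name. */
void mm_reference(float *a, const float *b, const float *c,
                  int n, int m, int p)
{
    for (int j = 0; j < m; ++j) {        /* column of a: blockIdx.y in the kernel */
        for (int i = 0; i < n; ++i) {    /* row of a: blockIdx.x*32 + threadIdx.x */
            float sum = 0.0f;
            for (int k = 0; k < p; ++k)
                sum += b[i + n * k] * c[k + p * j];
            a[i + n * j] = sum;
        }
    }
}
```

Comparing the device result with `mm_reference` element by element, within a small tolerance, is usually enough to catch indexing mistakes.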

OpenCL: a framework developed by the Khronos Group.
  • Pros: can be used with any device; it is a standard.
  • Cons: more complex than CUDA; immature.

    __kernel void matvecmul(__global float *a,
            const __global float *b, const __global float *c,
            const uint N) {
        float R;
        int k;
        int xid = get_global_id(0);
        int yid = get_global_id(1);
        if (xid < N) {
            if (yid < N) {
                R = 0.0;
                for (k = 0; k < N; k++)
                    R += b[xid * N + k] * c[k * N + yid];
                a[xid * N + yid] = R;
            }
        }
    }

Pros
  1 The programmer can use all of the machine’s devices.
  2 The GPU and the CPU can work in parallel.

Problems
  1 The programmer needs to know low-level details of the
  2 Source code needs to be rewritten:
      • One version for OpenMP/MPI.
      • A different version for GPU.
  3 Good performance requires a great effort in parameter tuning.
  4 These languages (CUDA/OpenCL) are complex and new for

GPGPU (General Purpose GPU)

Can we use GPUs for parallel computing? Is it efficient?

The NBody Problem

  • Simulation numerically approximates the evolution of a system of
  • Each body continuously interacts with other
  • Fluid flow simulations.

NBody description

$$a_i \approx G \cdot \sum_{1 \le j \le N} \frac{m_j \, \mathbf{r}_{ij}}{\left( \lVert \mathbf{r}_{ij} \rVert^2 + \varepsilon^2 \right)^{3/2}}$$

CUDA implementation

  • The method is particle-to-particle.
  • Its computational complexity is O(n²).
  • It evaluates all pairwise interactions, so it is exact.

CUDA implementation: blocks and

CUDA Kernel: Tile calculation

    __device__ float3 gravitation(float4 myPos, float3 accel) {
        extern __shared__ float4 sharedPos[];
        unsigned long i = 0;

        for (unsigned int counter = 0; counter < blockDim.x; counter++)
        {
            accel = bodyBodyInteraction(accel, SX(i++), myPos);
        }
        return accel;
    }

CUDA Kernel: calculate forces

    __global__ void calculate_forces(float4 *globalX, float4 *globalA)
    {
        // A shared memory buffer to store the body positions.
        extern __shared__ float4 shPosition[];
        int i, tile;
        float3 acc = {0.0f, 0.0f, 0.0f};
        // Global thread ID (represents the unique body index in the simulation).
        int gtid = blockIdx.x * blockDim.x + threadIdx.x;
        // This is the position of the body we are computing the acceleration for.
        float4 myPosition = globalX[gtid];
        for (i = 0, tile = 0; i < N; i += blockDim.x, tile++)
        {
            int idx = tile * blockDim.x + threadIdx.x;
            shPosition[threadIdx.x] = globalX[idx];
            __syncthreads();
            acc = tile_calculation(myPosition, acc);
            __syncthreads();
        }
        // return (the slide elides how acc is stored in globalA)
    }

  • Tesla C1060 (compute capability 1.3).
  • Sequential source code: Intel Core i7 930.
  • NBody SDK.
  • CUDA Runtime / CUDA Driver: 4.0.
      • 400000 bodies.
      • 200 interactions.

  Device          Cores   Memory   Performance (GFLOPS)
  Tesla C1060      240     4 GB    933 (single), 78 (double)
  Intel Core i7      4     4 GB    44.8 (11.2 per core)

  • Sequential code: ≈ 147202512.40 ms ≈ 40.89 hours.
  • Parallel CUDA code: 1392029.6 ms ≈ 23.2 minutes.
  • The speedup is 105.7 (about 105×).
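The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below, in plain C, computes the speedup as the ratio of sequential to parallel time and converts the millisecond timings into hours and minutes. The function names are illustrative only.

```c
/* Speedup of the parallel run over the sequential one. */
double speedup(double t_seq_ms, double t_par_ms)
{
    return t_seq_ms / t_par_ms;
}

/* Unit conversions for the timings reported in milliseconds. */
double ms_to_hours(double ms)
{
    return ms / (1000.0 * 60.0 * 60.0);
}

double ms_to_minutes(double ms)
{
    return ms / (1000.0 * 60.0);
}
```

With the measured values, `speedup(147202512.40, 1392029.6)` is the quantity reported as the speedup.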

At the Present Time

  • Some applications can be accelerated with GPUs.
  • Users need to learn new programming languages and tools.
  • The CUDA model and its architecture have to be understood.
  • Non-expert users have to write programs for a new model.

GPGPU Languages

OpenACC: introduced last November in
A directive-based language.
  • Aims to be a standard.
  • Supported by Cray, NVIDIA, PGI and CAPS.
  • A single source code for all versions.
  • Platform independent.
  • Easier for beginners.
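To illustrate the single-source idea, here is a minimal sketch of a loop annotated with standard OpenACC directives. It is not taken from the thesis and does not use any accULL-specific extension; the function name `saxpy` and its parameters are chosen for illustration. A compiler without OpenACC support ignores the pragmas and runs the loop sequentially, so the same file serves as both the CPU and the accelerator version.

```c
/* y = alpha * x + y.
 * With an OpenACC compiler the loop may be offloaded to an accelerator;
 * otherwise the pragmas are ignored and the loop runs on the CPU. */
void saxpy(int n, float alpha, const float *x, float *y)
{
    /* x is only read on the device; y is copied in and back out. */
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; ++i)
            y[i] = alpha * x[i] + y[i];
    }
}
```

The data clauses state which arrays move to and from the device, which is the kind of detail that CUDA or OpenCL code would otherwise handle with explicit allocation and copy calls.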

A New Dimension for HPC

accULL: our OpenACC

accULL = compiler + runtime library.
accULL = YaCF + Frangollo.

Initial Objectives of this Project

  • To integrate C99 in the YaCF project.
  • To implement a new class hierarchy for new YaCF frontends.
  • To implement an OpenACC frontend.
  • To complete the OpenMP grammar with directives in OpenMP
  • To test the new C99 interface.

Source-to-source Compilers

  • Rose Compiler Framework.
  • Cetus Compiler.
  • Mercurium.

2 YaCF

accULL: our OpenACC

YaCF: Yet Another Compiler

  • A source-to-source compiler that translates C code with OpenMP, llc and OpenACC annotations into code with Frangollo calls.
  • Integrates code analysis tools.
  • Written entirely in Python.
  • Based on widely known object-oriented software patterns.
  • Based on the pycparser Python module.
  • Implementing a code transformation takes only a few lines of code.

YaCF: Architecture

YaCF: Preprocessor

YaCF: Statistics

  • 20683 lines of Python code.
  • 2158 functions and methods.
  • My contribution has been about 25% of the YaCF project.

3 Experiments

  • Benchmark ScaLAPACK: testing
  • Block Matrix Multiplication in
  • Three different problems from the Rodinia Benchmark:
      • HotSpot.
      • SRAD.
      • Needleman–Wunsch.

  • ScaLAPACK (Scalable LAPACK) is a library that includes a subset of LAPACK routines redesigned for distributed-memory MIMD parallel computers.
  • ScaLAPACK is designed for heterogeneous computing.
  • It is portable to any computer that supports MPI.
  • ScaLAPACK depends on PBLAS operations.

ScaLAPACK: results in YaCF

  Directory          Total C files   Success   Failures
  PBLAS/SRC               123          123         0
  REDIST/SRC               21           21         0
  PBLAS/SRC/PTOOLS        102          101         1
  PBLAS/TESTING             2            1         1
  PBLAS/TIMING              2            1         1
  REDIST/TESTING           10            0        10
  SRC                       9            9         0
  TOOLS                     2            2         0
  Total                   271          258        13

95% of the ScaLAPACK C files are parsed correctly by YaCF.

  • Garoe: a desktop computer with an Intel Core i7 930 processor (2.80 GHz), with 1 MB of L2 cache and 8 MB of L3 cache shared by the four cores. The system has 4 GB of RAM and a Tesla C2050 with 4 GB of memory attached.

  • Drago: a second cluster node. It is a shared-memory system with four Intel Xeon E7 processors, each with 10 cores. In this case, the accelerator platform is the Intel OpenCL SDK 1.5, which runs on the CPU.

MxM in accULL

  • MxM is a basic kernel frequently used to showcase the peak performance of GPU computing.
  • We compare the performance of the accULL implementation with that of:
      • OpenMP.
      • CUDA.
      • OpenCL.

MxM OpenACC code

    /* Row-major storage: a is L x N, b is L x M, c is M x N */
    #pragma acc kernels name("mxm") copy(a[L*N]) copyin(b[L*M], c[M*N])
    {
    #pragma acc loop private(i, j) collapse(2)
    for (i = 0; i < L; i++)
        for (j = 0; j < N; j++)
            a[i * N + j] = 0.0;
    /* Iterate over blocks */
    for (ii = 0; ii < L; ii += tile_size)
      for (jj = 0; jj < N; jj += tile_size)
        for (kk = 0; kk < M; kk += tile_size) {
          /* Iterate inside a block */
          #pragma acc loop collapse(2) private(i, j, k)
          for (j = jj; j < min(N, jj + tile_size); j++)
            for (i = ii; i < min(L, ii + tile_size); i++)
              for (k = kk; k < min(M, kk + tile_size); k++)
                a[i * N + j] += (b[i * M + k] * c[k * N + j]);
        }
    }
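A sequential version of the same blocked loop nest is useful as a correctness baseline for the accelerated runs. The sketch below is one possible reference in plain C, assuming row-major storage with a of size L x N, b of size L x M and c of size M x N, as the sizes in the `copy` and `copyin` clauses suggest. The names `tiled_mxm` and `min_int` are hypothetical and not part of accULL.

```c
static int min_int(int a, int b) { return a < b ? a : b; }

/* Tiled a = b * c with row-major storage:
 * a is L x N, b is L x M, c is M x N. */
void tiled_mxm(double *a, const double *b, const double *c,
               int L, int M, int N, int tile_size)
{
    for (int i = 0; i < L; i++)
        for (int j = 0; j < N; j++)
            a[i * N + j] = 0.0;

    /* Iterate over blocks */
    for (int ii = 0; ii < L; ii += tile_size)
        for (int jj = 0; jj < N; jj += tile_size)
            for (int kk = 0; kk < M; kk += tile_size)
                /* Iterate inside a block; min_int clips tiles at the edges */
                for (int i = ii; i < min_int(L, ii + tile_size); i++)
                    for (int j = jj; j < min_int(N, jj + tile_size); j++)
                        for (int k = kk; k < min_int(M, kk + tile_size); k++)
                            a[i * N + j] += b[i * M + k] * c[k * N + j];
}
```

Each (i, j, k) triple is visited exactly once regardless of `tile_size`, so the tile size affects cache behaviour and the order of the additions rather than which products are accumulated.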

MxM in accULL (Garoe)

MxM in accULL (Drago)

SRAD: an Image Filtering Code

SRAD (Garoe)

CUDA in Frangollo performs better than native CUDA.

SRAD (Drago)

NW: Needleman-Wunsch, a Sequence Alignment Code

NW (Garoe)

Poor results (but better than OpenMP with 4 cores).

NW (Drago)

HotSpot: a Thermal Simulation Tool for Estimating Processor Temperature

HotSpot (Garoe)

As good as the native versions.

HotSpot (Drago)

4 Conclusions

Conclusions: Compiler

  • Compiler technologies tend to use and optimize source-to-source compilers to generate and transform source code.
  • It is easier to parallelize source code with AST transformations.
  • AST transformations enable programmers to easily generate code for any platform.

Conclusions: Programming Model

  • The use of directive-based programming languages allows non-expert programmers to abstract from architectural details and write programs more easily.
  • The OpenACC standard is a starting point for heterogeneous systems programming.
  • Future versions of the OpenMP standard will include support for
  • The results we are obtaining with accULL, our early OpenACC implementation, are promising.





                  1 Introduction
                  2 YaCF

                  3 Experiments

                  4 Conclusions

                  5 Future Work

                                                   Future Work




                  • Add support for MPI combined with CUDA and OpenCL.
                  • Perform new experiments with OpenACC.
                  • Compare our accULL approach with PGI-OpenACC and
                  • Add support for vectorization.
                  • Explore combining FPGAs with CUDA and OpenCL.
                  • Introduce the LLVM compiler framework in the frontend.

                  Thank you for your attention




                    Juan José Fumero Alfonso

                  YaCF: The accULL Compiler
                     Undergraduate Thesis Project

                     Juan José Fumero Alfonso
                      Universidad de La Laguna

                         June 22, 2012

YaCF: The accULL Compiler Thesis Analyzes Parallelization

  • 1. YaCF: The accULL Compiler Juan J. Fumero Introduction YaCF Experiments Conclusions Future Work YaCF: The accULL Compiler Undergraduate Thesis Project Juan Jos´ Fumero Alfonso e Universidad de La Laguna 22 de junio de 2012 1 / 85
  • 2. YaCF: The accULL Compiler Juan J. Fumero Outline Introduction YaCF Experiments Conclusions 1 Introduction Future Work 2 YaCF 3 Experiments 4 Conclusions 5 Future Work 2 / 85
  • 3. YaCF: The accULL Compiler Juan J. Fumero Outline Introduction YaCF Experiments Conclusions 1 Introduction Future Work 2 YaCF 3 Experiments 4 Conclusions 5 Future Work 3 / 85
  • 4. YaCF: The accULL Compiler Juan J. Fumero Moore’s Law Introduction YaCF Experiments Conclusions Future Work Every 18 months the number of transistors could be doubled. 4 / 85
  • 5. YaCF: The accULL Compiler Juan J. Fumero Nowadays Parallel Architectures Introduction YaCF Experiments Conclusions Future Work 5 / 85
  • 6. YaCF: The accULL Compiler Juan J. Fumero Parallel Architectures Introduction YaCF Experiments Conclusions Future Work The solution • More processors • More cores per processor 6 / 85
  • 7. YaCF: The accULL Compiler Juan J. Fumero Parallel Architectures Introduction YaCF Experiments Conclusions Future Work The systems are hybrid using all options. 7 / 85
  • 8. YaCF: The accULL Compiler Juan J. Fumero Parallel Architectures Introduction YaCF Experiments Conclusions Future Work 8 / 85
  • 9. YaCF: The accULL Compiler Juan J. Fumero OpenMP: Shared Memory Introduction YaCF Programming Experiments • API that support SMP programming. Conclusions • Multi-platform. Future Work • A directive-based approach. • A set of compiler directives, library routines and environment variables for parallel programming. OpenMP example 1 #pragma omp p a r a l l e l 2 { 3 #pragma omp master 4 { 5 nthreads = o m p _ g e t _ n u m _ t h r e a d s ( ) ; 6 } 7 #pragma omp f o r p r i v a t e ( x ) reduction (+: sum ) schedule ( runtime ) 8 f o r ( i =0; i < NUM_STEPS ; ++i ) { 9 x = ( i +0.5)∗step ; 10 sum = sum + 4 . 0 / ( 1 . 0 + x∗x ) ; 11 } 12 #pragma omp master 13 { 14 pi = step ∗ sum ; 15 } 16 } 9 / 85
  • 10. YaCF: The accULL Compiler Juan J. Fumero MPI: Message Passing Interface Introduction YaCF Experiments Conclusions Future Work • A language-independent communications protocol used to program parallel applications. • MPI’s goals are high performance, scalability and portability. MPI example 1 MPI_Comm_size ( MPI_COMM_WORLD , &M P I _ N U M P R O C E S S O R S ) ; 2 MPI_Comm_rank ( MPI_COMM_WORLD , &MPI_NAME ) ; 3 w = 1.0 / N ; 4 f o r ( i = MPI_NAME ; i < N ; i += M P I _ N U M P R O C E S S O R S ) { 5 local = ( i + 0 . 5 ) ∗ w ; 6 pi_mpi = pi_mpi + 4 . 0 / ( 1 . 0 + local ∗ local ) ; 7 } 8 MPI_Allreduce (&pi_mpi , &gpi_mpi , 1 , MPI_DOUBLE , MPI_SUM , MPI_C OMM_WOR LD ) ; 10 / 85
  • 11. YaCF: The accULL Compiler Juan J. Fumero High Performance Computing Introduction YaCF Experiments • The most powerful computers at the moment. Conclusions • Systems with a massive number of processors. Future Work • High speed of calculation. • It contains thousands of processors and cores. • Systems very expensive and consuming a huge amount of energy. 11 / 85
  • 12. YaCF: The accULL Compiler Juan J. Fumero TOP 500: High Performance Introduction YaCF Computing Experiments Conclusions • The TOP500 project ranks and details the 500 (non-distributed) Future Work most powerful known computer systems in the world. • The project publishes an updated list of the supercomputers twice a year. 12 / 85
  • 13. YaCF: The accULL Compiler Juan J. Fumero Accelerators Era Introduction YaCF Experiments Conclusions Future Work 13 / 85
  • 14. YaCF: The accULL Compiler Juan J. Fumero Languages for Heterogeneous Introduction YaCF Programming Experiments Conclusions CUDA Future Work Developed by NVIDIA. • Pros: its performance, it is easier than OpenCL. • Con: only works with NVIDIA hardware. 14 / 85
  • 15. YaCF: The accULL Compiler Juan J. Fumero Languages for Heterogeneous Introduction YaCF Programming Experiments Conclusions Future Work CUDA 1 __global__ v o i d mmkernel ( f l o a t ∗ a , f l o a t ∗ b , f l o a t ∗ c , i n t n , 2 int m , int p) 3 { 4 i n t i = blockIdx . x∗32 + threadIdx . x ; 5 i n t j = blockIdx . y ; 6 f l o a t sum = 0 . 0 f ; 7 f o r ( i n t k = 0 ; k < p ; ++k ) sum += b [ i+n∗k ] ∗ c [ k+p∗j ] ; 8 a [ i+n∗j ] = sum ; 9 } 15 / 85
  • 16. YaCF: The accULL Compiler Juan J. Fumero Languages for Heterogeneous Introduction YaCF Programming Experiments Conclusions Future Work OpenCL A framework developed by the Khronos Group. • Pros: can be used with any device, it is a standard. • Cons: more complex than CUDA, immature. 16 / 85
  • 17. YaCF: The accULL Compiler Juan J. Fumero Languages for Heterogeneous Introduction YaCF Programming Experiments Conclusions Future Work OpenCL 1 __kernel v o i d matvecmul ( __global f l o a t ∗a , 2 c o n s t __global f l o a t ∗b , c o n s t __global f l o a t ∗c , 3 c o n s t uint N ) { 4 float R; 5 int k; 6 i n t xid = get_global_id ( 0 ) ; 7 i n t yid = get_global_id ( 1 ) ; 8 i f ( xid < N ) { 9 i f ( yid < N ) { 10 R = 0.0; 11 f o r ( k = 0 ; k < N ; k++) 12 R += b [ xid ∗ N + k ] ∗ c [ k∗N + yid ] ; 13 a [ xid∗N+yid ] = R ; 14 } 15 } 16 } 17 / 85
  • 18. Languages for Heterogeneous Programming: Pros 1 The programmer can use all of the machine's devices. 2 The GPU and the CPU can work in parallel.
  • 19. Languages for Heterogeneous Programming: Problems 1 The programmer needs to know low-level details of the architecture.
  • 20. Languages for Heterogeneous Programming: Cons 1 The programmer needs to know low-level details of the architecture. 2 Source code needs to be rewritten: • One version for OpenMP/MPI. • A different version for the GPU. 3 Good performance requires great effort in parameter tuning. 4 These languages (CUDA/OpenCL) are complex and unfamiliar to non-experts.
  • 21. GPGPU (General Purpose GPU) Computing: Can we use GPUs for parallel computing? Is this efficient?
  • 22. The NBody Problem • The simulation numerically approximates the evolution of a system of bodies. • Each body continuously interacts with the other bodies. • Used, for example, in fluid flow simulations.
  • 23. NBody description: Acceleration
      $$a_i = \frac{F_i}{m_i} \approx G \cdot \sum_{1 \le j \le N} \frac{m_j \, \mathbf{r}_{ij}}{\left(\lVert \mathbf{r}_{ij} \rVert^2 + \varepsilon^2\right)^{3/2}}$$
  • 24. CUDA implementation • The method is particle-to-particle. • Its computational complexity is O(n^2). • It evaluates all pair-wise interactions, so it is exact.
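The all-pairs method can be sketched sequentially in plain C. This is an illustration, not the SDK code used in the slides: the `Body` and `Vec3` types, the function names and the softening parameter `eps` are assumptions, and the square root is computed with a few Newton iterations only to keep the sketch free of library dependencies (production code would call `sqrt` from `math.h`).

```c
#include <assert.h>

typedef struct { double x, y, z, mass; } Body;  /* hypothetical layout */
typedef struct { double x, y, z; } Vec3;

/* Newton iteration for sqrt(v), v > 0; adequate for this sketch only. */
static double newton_sqrt(double v)
{
    double g = v > 1.0 ? v : 1.0;   /* start at or above the root */
    for (int it = 0; it < 64; ++it)
        g = 0.5 * (g + v / g);
    return g;
}

/* O(n^2) particle-to-particle evaluation of the softened acceleration:
 * a_i = G * sum_j m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2), r_ij = x_j - x_i. */
void all_pairs_acceleration(const Body *bodies, Vec3 *accel,
                            int n, double G, double eps)
{
    assert(n > 0 && eps > 0.0);
    for (int i = 0; i < n; ++i) {
        double ax = 0.0, ay = 0.0, az = 0.0;
        for (int j = 0; j < n; ++j) {
            double dx = bodies[j].x - bodies[i].x;
            double dy = bodies[j].y - bodies[i].y;
            double dz = bodies[j].z - bodies[i].z;
            double d2 = dx * dx + dy * dy + dz * dz + eps * eps;
            double w = bodies[j].mass / (d2 * newton_sqrt(d2));
            /* For j == i the distance vector is zero, so nothing is added. */
            ax += dx * w;
            ay += dy * w;
            az += dz * w;
        }
        accel[i].x = G * ax;
        accel[i].y = G * ay;
        accel[i].z = G * az;
    }
}
```

Because of the softening term, the self-interaction needs no special case, which is what lets the GPU version run the same inner loop for every thread.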
  • 25. CUDA implementation: blocks and grids
  • 26. CUDA Kernel: Tile calculation
      __device__ float3 gravitation(float4 myPos, float3 accel) {
          extern __shared__ float4 sharedPos[];
          unsigned long i = 0;

          for (unsigned int counter = 0; counter < blockDim.x; counter++)
          {
              accel = bodyBodyInteraction(accel, SX(i++), myPos);
          }
          return accel;
      }
  • 27. CUDA Kernel: calculate forces
      __global__ void calculate_forces(float4 *globalX, float4 *globalA)
      {
          // A shared memory buffer to store the body positions.
          extern __shared__ float4 shPosition[];
          int i, tile;
          float3 acc = {0.0f, 0.0f, 0.0f};
          // Global thread ID (represents the unique body index in the simulation).
          int gtid = blockIdx.x * blockDim.x + threadIdx.x;
          // This is the position of the body we are computing the acceleration for.
          float4 myPosition = globalX[gtid];
          for (i = 0, tile = 0; i < N; i += blockDim.x, tile++)
          {
              int idx = tile * blockDim.x + threadIdx.x;
              shPosition[threadIdx.x] = globalX[idx];
              __syncthreads();
              acc = tile_calculation(myPosition, acc);
              __syncthreads();
          }
          // return
      }
  • 28. Results • Tesla C1060 (compute capability 1.3). • Sequential source code: Intel Core i7 930. • NBody SDK. • CUDA Runtime / CUDA Driver: 4.0. • 400000 bodies. • 200 interactions.
      Device          Cores   Memory   Performance (GFLOPS)
      Tesla C1060     240     4 GB     933 (single), 78 (double)
      Intel Core i7   4       4 GB     44.8 (11.2 per core)
  • 29. Results • Sequential code: ≈ 147202512.40 ms ≈ 41 hours (40.89 hours). • Parallel CUDA code: 1392029.6 ms ≈ 23.2 minutes. • The speedup is 105.7 (≈ 105×).
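The reported figures can be checked directly from the two raw timings. The helpers below are a minimal sketch with hypothetical names; they only encode the usual definition of speedup (sequential time divided by parallel time) and the unit conversions.

```c
#include <assert.h>

/* Speedup = sequential time / parallel time, both in the same unit. */
double speedup(double t_seq_ms, double t_par_ms)
{
    assert(t_seq_ms > 0.0 && t_par_ms > 0.0);
    return t_seq_ms / t_par_ms;
}

/* Unit conversions for the millisecond timings on the slide. */
double ms_to_hours(double ms)   { return ms / (1000.0 * 3600.0); }
double ms_to_minutes(double ms) { return ms / (1000.0 * 60.0); }
```

Calling `speedup(147202512.40, 1392029.6)` gives a ratio between 105.7 and 105.8, in line with the speedup stated on the slide.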
  • 30. At the Present Time • Some applications can be accelerated with GPUs. • The user needs to learn new programming languages and tools. • The CUDA model and its architecture have to be understood. • Non-expert users have to write programs for a new model.
  • 31. GPGPU Languages. OpenACC: introduced in November 2011 at SuperComputing 2011. A directive-based language. • Aims to become a standard. • Supported by Cray, NVIDIA, PGI and CAPS. • A single source code for all versions. • Platform independent. • Easier for beginners.
  • 32. GPGPU Languages: OpenACC. A directive-based language.
  • 33. A New Dimension for HPC
  • 35. accULL: our OpenACC Implementation. accULL = compiler + runtime library. accULL = YaCF + Frangollo.
  • 36. Initial Objectives of this Project • To integrate C99 into the YaCF project. • To implement a new class hierarchy for new YaCF frontends. • To implement an OpenACC frontend. • To complete the OpenMP grammar with the directives of OpenMP 3.0. • To test the new C99 interface.
  • 37. Source-to-source Compilers • Rose Compiler Framework. • Cetus Compiler. • Mercurium.
  • 38. Outline: 1 Introduction, 2 YaCF, 3 Experiments, 4 Conclusions, 5 Future Work
  • 39. accULL: our OpenACC implementation
  • 43. YaCF: Yet Another Compiler Framework
  • 44. YaCF • A source-to-source compiler that translates C code with OpenMP, llc and OpenACC annotations into code with Frangollo calls. • Integrates code analysis tools. • Written entirely in Python. • Based on widely known object-oriented software patterns. • Built on the pycparser Python module. • Implementing a code transformation is only a matter of writing a few lines of code.
  • 45. YaCF: Architecture
  • 53. YaCF: Preprocessor
  • 57. YaCF: Architecture
  • 59. YaCF: Statistics • 20683 lines of Python code. • 2158 functions and methods. • My contribution amounts to about 25% of the YaCF project.
  • 60. Outline: 1 Introduction, 2 YaCF, 3 Experiments, 4 Conclusions, 5 Future Work
  • 61. Experiments • ScaLAPACK benchmark: testing C99. • Block matrix multiplication in accULL. • Three different problems from the Rodinia Benchmark: • HotSpot. • SRAD. • Needleman–Wunsch.
  • 62. ScaLAPACK • ScaLAPACK (Scalable LAPACK) is a library that includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers. • ScaLAPACK is designed for heterogeneous computing. • It is portable to any computer that supports MPI. • ScaLAPACK depends on PBLAS operations.
  • 64. ScaLAPACK: results in YaCF
      Directory          Total C files   Success   Failures
      PBLAS/SRC          123             123       0
      REDIST/SRC         21              21        0
      PBLAS/SRC/PTOOLS   102             101       1
      PBLAS/TESTING      2               1         1
      PBLAS/TIMING       2               1         1
      REDIST/TESTING     10              0         10
      SRC                9               9         0
      TOOLS              2               2         0
      Total              271             258       13
      95% of the ScaLAPACK C files are correctly parsed in YaCF.
  • 65. Platforms • Garoe: a desktop computer with an Intel Core i7 930 processor (2.80 GHz), with 1 MB of L2 cache and 8 MB of L3 cache shared by the four cores. The system has 4 GB of RAM and an attached Tesla C2050 with 4 GB of memory.
  • 66. Platforms • Drago: a second cluster node. It is a shared memory system with 4 Intel Xeon E7 processors, each with 10 cores. In this case, the accelerator platform is the Intel OpenCL SDK 1.5, which runs on the CPU.
  • 67. MxM in accULL • MxM is a basic kernel frequently used to showcase the peak performance of GPU computing. • We compare the performance of the accULL implementation with that of: • OpenMP. • CUDA. • OpenCL.
  • 68. MxM in accULL: MxM OpenACC code
      #pragma acc kernels name("mxm") copy(a[L*N]) copyin(b[L*M], c[M*N])
      {
          #pragma acc loop private(i, j) collapse(2)
          for (i = 0; i < L; i++)
              for (j = 0; j < N; j++)
                  a[i * L + j] = 0.0;
          /* Iterate over blocks */
          for (ii = 0; ii < L; ii += tile_size)
              for (jj = 0; jj < N; jj += tile_size)
                  for (kk = 0; kk < M; kk += tile_size) {
                      /* Iterate inside a block */
                      #pragma acc loop collapse(2) private(i, j, k)
                      for (j = jj; j < min(N, jj + tile_size); j++)
                          for (i = ii; i < min(L, ii + tile_size); i++)
                              for (k = kk; k < min(M, kk + tile_size); k++)
                                  a[i * L + j] += (b[i * L + k] * c[k * M + j]);
                  }
      }
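The blocking scheme of the OpenACC code can be tried on the CPU without any directives. The following is a sketch under assumptions rather than the benchmark source: `mxm_tiled` and `min_int` are hypothetical names, the matrices are taken to be row-major with `a` of size L x N, `b` of size L x M and `c` of size M x N, and the row strides N and M are chosen here so that non-square matrices are also handled.

```c
#include <assert.h>

static int min_int(int x, int y) { return x < y ? x : y; }

/* Blocked product a = b * c on the CPU (hypothetical helper).
 * Row-major storage assumed: a is L x N, b is L x M, c is M x N. */
void mxm_tiled(double *a, const double *b, const double *c,
               int L, int M, int N, int tile_size)
{
    assert(L > 0 && M > 0 && N > 0 && tile_size > 0);
    for (int i = 0; i < L; i++)
        for (int j = 0; j < N; j++)
            a[i * N + j] = 0.0;
    /* Iterate over blocks */
    for (int ii = 0; ii < L; ii += tile_size)
        for (int jj = 0; jj < N; jj += tile_size)
            for (int kk = 0; kk < M; kk += tile_size)
                /* Iterate inside a block */
                for (int i = ii; i < min_int(L, ii + tile_size); i++)
                    for (int j = jj; j < min_int(N, jj + tile_size); j++)
                        for (int k = kk; k < min_int(M, kk + tile_size); k++)
                            a[i * N + j] += b[i * M + k] * c[k * N + j];
}
```

The `min_int` bounds handle matrix dimensions that are not multiples of `tile_size`, the same role `min` plays in the OpenACC version.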
  • 69. MxM in accULL (Garoe)
  • 70. MxM in accULL (Drago)
  • 71. SRAD: an Image Filtering Code
  • 72. SRAD (Garoe): CUDA in Frangollo performs better than native CUDA.
  • 73. SRAD (Drago)
  • 74. NW: Needleman-Wunsch, a Sequence Alignment Code
  • 75. NW (Garoe): Poor results (but better than OpenMP on 4 cores).
  • 76. NW (Drago)
  • 77. HotSpot: a Thermal Simulation Tool for Estimating Processor Temperature
  • 78. HotSpot (Garoe): As good as the native versions.
  • 79. HotSpot (Drago)
  • 80. Outline: 1 Introduction, 2 YaCF, 3 Experiments, 4 Conclusions, 5 Future Work
  • 81. Conclusions: Compiler Technologies • Compiler technologies tend to use and optimize source-to-source compilers to generate and transform source code. • It is easier to parallelize source code with AST transformations. • AST transformations enable programmers to easily generate code for any platform.
  • 82. Conclusions: Programming Model • Directive-based programming languages allow non-expert programmers to abstract from architectural details and write programs more easily. • The OpenACC standard is a starting point for heterogeneous systems programming. • Future versions of the OpenMP standard will include support for accelerators. • The results we are obtaining with accULL, our early OpenACC implementation, are promising.
  • 83. References I
      Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande. accULL: An OpenACC implementation with CUDA and OpenCL support. International European Conference on Parallel and Distributed Computing 2012.
      Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande. Directive-based Programming for GPUs: A Comparative Study. The 14th IEEE International Conference on High Performance Computing and Communications.
      Ruymán Reyes, Iván López, Juan J. Fumero, F. de Sande. accULL: an user-directed Approach to Heterogeneous Programming. The 10th IEEE International Symposium on Parallel and Distributed Processing with Applications.
  • 84. Outline: 1 Introduction, 2 YaCF, 3 Experiments, 4 Conclusions, 5 Future Work
  • 90. Future Work • Add support for MPI together with CUDA and OpenCL. • Perform new experiments with OpenACC. • Compare our accULL approach with PGI-OpenACC and CAPS-HMPP. • Add support for vectorization. • Explore combining FPGAs with CUDA and OpenCL. • Introduce the LLVM Compiler Framework in the frontend.
  • 91. Thank you for your attention. Juan José Fumero Alfonso
  • 92. YaCF: The accULL Compiler. Undergraduate Thesis Project. Juan José Fumero Alfonso, Universidad de La Laguna. June 22, 2012