SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Introduction to OpenMP


Presenter: Vengada Karthik Rangaraju

           Fall 2012 Term

       September 13th, 2012
What is openMP?

•   Open Standard for Shared Memory Multiprocessing
•   Goal: Exploit multicore hardware with shared memory
•   Programmer’s view: The openMP API
•   Structure: Three primary API components:
    – Compiler directives,
    – Runtime Library routines and
    – Environment Variables
Shared Memory Architecture in a
    Multi-Core Environment
The key components of the API and its
             functions

• Compiler Directives
   - Spawning parallel regions (threads)
   - Synchronizing
   - Dividing blocks of code among threads
   - Distributing loop iterations
The key components of the API and its
             functions

• Runtime Library Routines
   - Setting & querying no. of threads
   - Nested parallelism
   - Control over locks
   - Thread information
The key components of the API and its
             functions

• Environment Variables
   - Setting no. of threads
   - Specifying how loop iterations are divided
   - Thread processor binding
   - Enabling/Disabling dynamic threads
   - Nested parallelism
Goals
• Standardization
• Ease of Use
• Portability
Paradigm for using openMP
          Write sequential
              program


         Find parallelizable
        portions of program

                                       Insert calls to
               Insert                 runtime library
        directives/pragmas     +   routines and modify
         into existing code            environment
                                    variables, if desired

          Use openMP’s
        extended Compiler
                                      What happens
                                         here?

        Compile and run !
Compiler translation


#pragma omp <directive-type> <directive-clauses></n>
{
……
…..// Block of code executed as per instruction !
}
Basic Example in C
{
… //Sequential
}
 #pragma omp parallel //fork
{
printf(“Hello from thread
   %d.n”,omp_get_thread_num());
} //join
{
… //Sequential
}
What exactly happens when lines of
    code are executed in parallel?


• A team of threads are created
• Each thread can have its own set of private
  variables
• All threads can have shared variables
• Original thread : Master Thread
• Fork-Join Model
• Nested Parallelism
openMP LifeCycle – Petrinet model
Compiler directives – The Multi Core
           Magic Spells !
  <directive type>   Description
  parallel           Each thread will perform
                     same computation as
                     others(replicated
                     computations)
  for / sections     These are called workshare
                     directives. Portions of
                     overall work divided among
                     threads(different
                     computations). They don’t
                     create threads. It has to be
                     enclosed inside a parallel
                     directive for threads to
                     takeover the divided work.
Compiler directives – The Multi Core
             Magic Spells !

• Types of workshare directives

   for                      Countable iteration[static]

   sections                 One or more sequential
                            sections of code, executed
                            by a single thread

   single                   Serializes a section of code
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive


    <directive type>       <directive clause>
    parallel               If(expression)
                           private(var1,var2,…)
                           firstprivate(var1,var2,..)
                           lastprivate(var1,var2,..)
                           shared(var1,var2,..)
                           NUM_THREADS(integer value)
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive

   <directive type>       <directive clause>
   for                    schedule(type, chunk)
                          private(var1,var2,…)
                          firstprivate(var1,var2,..)
                          lastprivate(var1,var2,..)
                          shared(var1,var2,..)
                          collapse(n)
                          nowait
                          Reduction(operator:list)
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive



   <directive type>       <directive clause>
   sections               private(var1,var2,…)
                          firstprivate(var1,var2,..)
                          lastprivate(var1,var2,..)
                          reduction(operator:list)
                          nowait
Matrix Multiplication using loop
                directive
 #pragma omp parallel private(i,j,k)
{
  #pragma omp for
  for(i=0;i<N;i++)
      for(k=0;k<K;k++)
            for(j=0;j<M;j++)
                  C[i][j]=C[i][j]+A[i][k]*B[k][j];
}
Scheduling Parallel Loops
•   Static
•   Dynamic
•   Guided
•   Automatic
•   Runtime
Scheduling Parallel Loops
• Static - Amount of work/iteration - same
         - Set of contiguous chunks in RR fashion
         - 1 Chunk = x iterations
Scheduling Parallel Loops
• Dynamic - Amount of work/iteration - Varies
           - Each thread will grab chunk of
             iterations and return to grab another
             chunk when it has executed them.
• Guided - Same as dynamic, only difference,
         - a good proportion of iterations
            remaining are shared among each
            thread.
Scheduling Parallel Loops
• Runtime - Schedule determined using an
            environment variable. Library
            routine provided !
• Automatic - Implementation chooses any
               schedule
Matrix Multiplication using loop
      directive – with a schedule
 #pragma omp parallel private(i,j,k)
{
  #pragma omp for schedule(static)
  for(i=0;i<N;i++)
      for(k=0;k<K;k++)
            for(j=0;j<M;j++)
                  C[i][j]=C[i][j]+A[i][k]*B[k][j];
}
openMP worshare directive – sections
 int g;
 void foo(int m, int n)
{
      int p,i;
        #pragma omp sections firstprivate(g) nowait
        {
            #pragma omp section
            {
               p=f1(g);
               for(i=0;i<m;i++)
               do_stuff;
            }
            #pragma omp section
            {
               p=f2(g);
               for(i=0;i<n;i++)
               do_other_stuff;
            }
        }
return;
}
Parallelizing when the no.of Iterations
        is unknown[dynamic] !


• openMP has a directive called task
Explicit Tasks
 void processList(Node* list)
{
    #pragma omp parallel
    pragma omp single
    {
       Node *currentNode = list;
       while(currentNode)
        {
           #pragma omp task firstprivate(currentNode)
           doWork(currentNode);
          currentNode=currentNode->next;
        }
     }
}
Explicit Tasks – Petrinet Model
Synchronization
•   Barrier
•   Critical
•   Atomic
•   Flush
Performing Reductions
• A loop containing reduction will always be
  sequential, since each iteration would form a
  result depending on previous iteration.
• openMP allows these loops to be parallelized
  as long as the developer says, loop contains
  reduction and indicates the variable and kind
  of reduction via “Clauses”
Without using reduction
#pragma omp parallel shared(array,sum)
firstprivate(local_sum)
{
    #pragma omp for private(i,j)
    for(i=0;i<max_i;i++)
    {
          for(j=0;j<max_j;++j)
          local_sum+=array[i][j];
    }
}
#pragma omp critical
sum+=local_sum;
}
Using Reductions in openMP
sum=0;
#pragma omp parallel shared(array)
{
  #pragma omp for reduction(+:sum) private(i,j)
  for(i=0;i<max_i;i++)
  {
       for(j=0;j<max_j;++j)
       sum+=array[i][j];
  }
}
Programming for performance
• Use of IF clause before creating parallel
  regions
• Understanding Cache Coherence
• Judicious use of parallel and flush
• Critical and atomic - know the difference !
• Avoid unnecessary computations in critical
  region
• Use of barrier - a starvation alert !
References
• NUMA UMA

   http://vvirtual.wordpress.com/2011/06/13/what-is-numa/

   http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/

• openMP basics

   https://computing.llnl.gov/tutorials/openMP/

• Workshop on openMP SMP, by Tim Mattson from Intel (video)

  http://www.youtube.com/watch?v=TzERa9GA6vY
Interesting links

• openMP official page

   http://openmp.org/wp/

• 32 openMP Traps for C++ Developers

   http://www.viva64.com/en/a/0054/#ID0EMULM

Weitere ähnliche Inhalte

Was ist angesagt?

Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessor
Kishan Panara
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
Uday Sharma
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
Yasir Khan
 
Parallel Programing Model
Parallel Programing ModelParallel Programing Model
Parallel Programing Model
Adlin Jeena
 
Virtualization (Distributed computing)
Virtualization (Distributed computing)Virtualization (Distributed computing)
Virtualization (Distributed computing)
Sri Prasanna
 

Was ist angesagt? (20)

Distributed file system
Distributed file systemDistributed file system
Distributed file system
 
Multivector and multiprocessor
Multivector and multiprocessorMultivector and multiprocessor
Multivector and multiprocessor
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
 
Multiprocessor architecture
Multiprocessor architectureMultiprocessor architecture
Multiprocessor architecture
 
Aca11 bk2 ch9
Aca11 bk2 ch9Aca11 bk2 ch9
Aca11 bk2 ch9
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architectures
 
Aca2 01 new
Aca2 01 newAca2 01 new
Aca2 01 new
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
 
Load balancing in cloud computing.pptx
Load balancing in cloud computing.pptxLoad balancing in cloud computing.pptx
Load balancing in cloud computing.pptx
 
Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updated
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 
Parallel Programing Model
Parallel Programing ModelParallel Programing Model
Parallel Programing Model
 
Lecture 2 more about parallel computing
Lecture 2   more about parallel computingLecture 2   more about parallel computing
Lecture 2 more about parallel computing
 
Shared-Memory Multiprocessors
Shared-Memory MultiprocessorsShared-Memory Multiprocessors
Shared-Memory Multiprocessors
 
Virtualization (Distributed computing)
Virtualization (Distributed computing)Virtualization (Distributed computing)
Virtualization (Distributed computing)
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
Kernel module in linux os.
Kernel module in linux os.Kernel module in linux os.
Kernel module in linux os.
 

Andere mochten auch

Biref Introduction to OpenMP
Biref Introduction to OpenMPBiref Introduction to OpenMP
Biref Introduction to OpenMP
JerryHe
 

Andere mochten auch (13)

Intro to OpenMP
Intro to OpenMPIntro to OpenMP
Intro to OpenMP
 
OpenMP Tutorial for Beginners
OpenMP Tutorial for BeginnersOpenMP Tutorial for Beginners
OpenMP Tutorial for Beginners
 
OpenMp
OpenMpOpenMp
OpenMp
 
OpenMP
OpenMPOpenMP
OpenMP
 
Open mp intro_01
Open mp intro_01Open mp intro_01
Open mp intro_01
 
Openmp combined
Openmp combinedOpenmp combined
Openmp combined
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
 
Biref Introduction to OpenMP
Biref Introduction to OpenMPBiref Introduction to OpenMP
Biref Introduction to OpenMP
 
Openmp
OpenmpOpenmp
Openmp
 
Parallel-kmeans
Parallel-kmeansParallel-kmeans
Parallel-kmeans
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
OpenMP
OpenMPOpenMP
OpenMP
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 

Ähnlich wie Presentation on Shared Memory Parallel Programming

Programming using Open Mp
Programming using Open MpProgramming using Open Mp
Programming using Open Mp
Anshul Sharma
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU Computing
Jeff Larkin
 
2008 07-24 kwpm-threads_and_synchronization
2008 07-24 kwpm-threads_and_synchronization2008 07-24 kwpm-threads_and_synchronization
2008 07-24 kwpm-threads_and_synchronization
fangjiafu
 

Ähnlich wie Presentation on Shared Memory Parallel Programming (20)

Lecture7
Lecture7Lecture7
Lecture7
 
MPI n OpenMP
MPI n OpenMPMPI n OpenMP
MPI n OpenMP
 
Open MP cheet sheet
Open MP cheet sheetOpen MP cheet sheet
Open MP cheet sheet
 
Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)
 
Lecture8
Lecture8Lecture8
Lecture8
 
Programming using Open Mp
Programming using Open MpProgramming using Open Mp
Programming using Open Mp
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
Lecture6
Lecture6Lecture6
Lecture6
 
Nbvtalkataitamimageprocessingconf
NbvtalkataitamimageprocessingconfNbvtalkataitamimageprocessingconf
Nbvtalkataitamimageprocessingconf
 
Parallel and Distributed Computing Chapter 5
Parallel and Distributed Computing Chapter 5Parallel and Distributed Computing Chapter 5
Parallel and Distributed Computing Chapter 5
 
openmp.ppt
openmp.pptopenmp.ppt
openmp.ppt
 
openmp.ppt
openmp.pptopenmp.ppt
openmp.ppt
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Parllelizaion
ParllelizaionParllelizaion
Parllelizaion
 
(3) cpp procedural programming
(3) cpp procedural programming(3) cpp procedural programming
(3) cpp procedural programming
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU Computing
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkR
 
Cc module 3.pptx
Cc module 3.pptxCc module 3.pptx
Cc module 3.pptx
 
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
Distributed and concurrent programming with RabbitMQ and EventMachine Rails U...
 
2008 07-24 kwpm-threads_and_synchronization
2008 07-24 kwpm-threads_and_synchronization2008 07-24 kwpm-threads_and_synchronization
2008 07-24 kwpm-threads_and_synchronization
 

Kürzlich hochgeladen

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 

Kürzlich hochgeladen (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 

Presentation on Shared Memory Parallel Programming

  • 1. Introduction to OpenMP Presenter: Vengada Karthik Rangaraju Fall 2012 Term September 13th, 2012
  • 2. What is openMP? • Open Standard for Shared Memory Multiprocessing • Goal: Exploit multicore hardware with shared memory • Programmer’s view: The openMP API • Structure: Three primary API components: – Compiler directives, – Runtime Library routines and – Environment Variables
  • 3. Shared Memory Architecture in a Multi-Core Environment
  • 4. The key components of the API and its functions • Compiler Directives - Spawning parallel regions (threads) - Synchronizing - Dividing blocks of code among threads - Distributing loop iterations
  • 5. The key components of the API and its functions • Runtime Library Routines - Setting & querying no. of threads - Nested parallelism - Control over locks - Thread information
  • 6. The key components of the API and its functions • Environment Variables - Setting no. of threads - Specifying how loop iterations are divided - Thread processor binding - Enabling/Disabling dynamic threads - Nested parallelism
  • 7. Goals • Standardization • Ease of Use • Portability
  • 8. Paradigm for using openMP Write sequential program Find parallelizable portions of program Insert calls to Insert runtime library directives/pragmas + routines and modify into existing code environment variables, if desired Use openMP’s extended Compiler What happens here? Compile and run !
  • 9. Compiler translation #pragma omp <directive-type> <directive-clauses></n> { …… …..// Block of code executed as per instruction ! }
  • 10. Basic Example in C { … //Sequential } #pragma omp parallel //fork { printf(“Hello from thread %d.n”,omp_get_thread_num()); } //join { … //Sequential }
  • 11. What exactly happens when lines of code are executed in parallel? • A team of threads are created • Each thread can have its own set of private variables • All threads can have shared variables • Original thread : Master Thread • Fork-Join Model • Nested Parallelism
  • 12. openMP LifeCycle – Petrinet model
  • 13. Compiler directives – The Multi Core Magic Spells ! <directive type> Description parallel Each thread will perform same computation as others(replicated computations) for / sections These are called workshare directives. Portions of overall work divided among threads(different computations). They don’t create threads. It has to be enclosed inside a parallel directive for threads to takeover the divided work.
  • 14. Compiler directives – The Multi Core Magic Spells ! • Types of workshare directives for Countable iteration[static] sections One or more sequential sections of code, executed by a single thread single Serializes a section of code
  • 15. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> parallel If(expression) private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) shared(var1,var2,..) NUM_THREADS(integer value)
  • 16. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> for schedule(type, chunk) private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) shared(var1,var2,..) collapse(n) nowait Reduction(operator:list)
  • 17. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> sections private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) reduction(operator:list) nowait
  • 18. Matrix Multiplication using loop directive #pragma omp parallel private(i,j,k) { #pragma omp for for(i=0;i<N;i++) for(k=0;k<K;k++) for(j=0;j<M;j++) C[i][j]=C[i][j]+A[i][k]*B[k][j]; }
  • 19. Scheduling Parallel Loops • Static • Dynamic • Guided • Automatic • Runtime
  • 20. Scheduling Parallel Loops • Static - Amount of work/iteration - same - Set of contiguous chunks in RR fashion - 1 Chunk = x iterations
  • 21. Scheduling Parallel Loops • Dynamic - Amount of work/iteration - Varies - Each thread will grab chunk of iterations and return to grab another chunk when it has executed them. • Guided - Same as dynamic, only difference, - a good proportion of iterations remaining are shared among each thread.
  • 22. Scheduling Parallel Loops • Runtime - Schedule determined using an environment variable. Library routine provided ! • Automatic - Implementation chooses any schedule
  • 23. Matrix Multiplication using loop directive – with a schedule #pragma omp parallel private(i,j,k) { #pragma omp for schedule(static) for(i=0;i<N;i++) for(k=0;k<K;k++) for(j=0;j<M;j++) C[i][j]=C[i][j]+A[i][k]*B[k][j]; }
  • 24. openMP worshare directive – sections int g; void foo(int m, int n) { int p,i; #pragma omp sections firstprivate(g) nowait { #pragma omp section { p=f1(g); for(i=0;i<m;i++) do_stuff; } #pragma omp section { p=f2(g); for(i=0;i<n;i++) do_other_stuff; } } return; }
  • 25. Parallelizing when the no.of Iterations is unknown[dynamic] ! • openMP has a directive called task
  • 26. Explicit Tasks void processList(Node* list) { #pragma omp parallel pragma omp single { Node *currentNode = list; while(currentNode) { #pragma omp task firstprivate(currentNode) doWork(currentNode); currentNode=currentNode->next; } } }
  • 27. Explicit Tasks – Petrinet Model
  • 28. Synchronization • Barrier • Critical • Atomic • Flush
  • 29. Performing Reductions • A loop containing reduction will always be sequential, since each iteration would form a result depending on previous iteration. • openMP allows these loops to be parallelized as long as the developer says, loop contains reduction and indicates the variable and kind of reduction via “Clauses”
  • 30. Without using reduction #pragma omp parallel shared(array,sum) firstprivate(local_sum) { #pragma omp for private(i,j) for(i=0;i<max_i;i++) { for(j=0;j<max_j;++j) local_sum+=array[i][j]; } } #pragma omp critical sum+=local_sum; }
  • 31. Using Reductions in openMP sum=0; #pragma omp parallel shared(array) { #pragma omp for reduction(+:sum) private(i,j) for(i=0;i<max_i;i++) { for(j=0;j<max_j;++j) sum+=array[i][j]; } }
  • 32. Programming for performance • Use of IF clause before creating parallel regions • Understanding Cache Coherence • Judicious use of parallel and flush • Critical and atomic - know the difference ! • Avoid unnecessary computations in critical region • Use of barrier - a starvation alert !
  • 33. References • NUMA UMA http://vvirtual.wordpress.com/2011/06/13/what-is-numa/ http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/ • openMP basics https://computing.llnl.gov/tutorials/openMP/ • Workshop on openMP SMP, by Tim Mattson from Intel (video) http://www.youtube.com/watch?v=TzERa9GA6vY
  • 34. Interesting links • openMP official page http://openmp.org/wp/ • 32 openMP Traps for C++ Developers http://www.viva64.com/en/a/0054/#ID0EMULM