Molecular Models, Threads and You
Optimizing the TINKER classical molecular dynamics code
while maintaining code readability

Jiahao Chen
Martínez Group
Dept. Chemistry, CATMS, MRL and Beckman

CS 498 MG presentation: 2007-12-07
Molecular models/force fields
Typical energy function

E = covalent bond effects + noncovalent interactions
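
$$
E = \underbrace{\sum_{b \in \mathrm{bonds}} k_b \left(r_b - r_{\mathrm{eq},b}\right)^2}_{\text{bond stretch}}
  + \underbrace{\sum_{a \in \mathrm{angles}} \kappa_a \left(\theta_a - \theta_{\mathrm{eq},a}\right)^2}_{\text{angle torsion}}
  + \underbrace{\sum_{d \in \mathrm{dihedrals}} \sum_n l_{nd} \cos(n\pi)}_{\text{dihedrals}}
  + \underbrace{\sum_{i<j \in \mathrm{atoms}} \frac{q_i q_j}{r_{ij}}}_{\text{electrostatics}}
  + \underbrace{\sum_{i<j \in \mathrm{atoms}} \epsilon_{ij} \left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]}_{\text{dispersion}}
$$

Computation cost = O(N²), from the sums over atom pairs i < j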
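A minimal Python sketch of the two pairwise terms, to show where the O(N²) cost comes from. The function name `nonbonded_energy`, the single `epsilon`/`sigma` pair shared by every atom pair, and the unit-free charges are assumptions made for illustration; this is not how TINKER stores or evaluates its parameters.

```python
import math
from itertools import combinations

def nonbonded_energy(coords, charges, epsilon=1.0, sigma=1.0):
    """Sum electrostatic and dispersion terms over all pairs i < j.

    Hypothetical helper: one epsilon/sigma for every pair, no units,
    no cutoff. The loop over all pairs is what makes this O(N^2).
    """
    coulomb = 0.0
    dispersion = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        coulomb += charges[i] * charges[j] / r
        sr6 = (sigma / r) ** 6
        dispersion += epsilon * (sr6 * sr6 - sr6)
    return coulomb, dispersion

# Three atoms on a line, spaced one sigma apart (made-up test geometry).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
charges = [1.0, -1.0, 1.0]
coulomb, dispersion = nonbonded_energy(coords, charges)
n_pairs = len(list(combinations(range(len(coords)), 2)))
```

For N atoms the loop visits N(N-1)/2 pairs, so doubling the system size roughly quadruples the work in these two terms, while the bonded terms grow only with the number of bonds, angles and dihedrals.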
Problem description
• The state of the system is given by the position and
  momentum of every atom (of mass m_i):
  $$(x_1, p_1, x_2, p_2, \cdots, x_N, p_N) \in \mathbb{R}^{3 \times 2 \times N}$$
• Solve the system of coupled first-order differential equations
  $$\frac{\partial x_i}{\partial t} = \frac{p_i}{m_i}, \qquad \frac{\partial p_i}{\partial t} = -\frac{\partial E}{\partial x_i}, \qquad i = 1, \cdots, N$$
• with user-specified initial conditions (e.g. with
  constant temperature and pressure)
• subject to (user-specified) constraints, e.g. fixed
  bond angles
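One common way to integrate equations of this form is the velocity Verlet scheme. The sketch below applies it to a one-dimensional harmonic oscillator, without thermostat, barostat or constraints. The names `velocity_verlet` and `total_energy` and the test system are assumptions for illustration; the slides do not say which integrator TINKER was run with.

```python
def velocity_verlet(x, p, mass, force, dt, n_steps):
    """Integrate dx/dt = p/m, dp/dt = F(x) = -dE/dx with velocity Verlet."""
    f = force(x)
    for _ in range(n_steps):
        p_half = p + 0.5 * dt * f      # half kick with the old force
        x = x + dt * p_half / mass     # full drift
        f = force(x)                   # forces at the new position
        p = p_half + 0.5 * dt * f      # half kick with the new force
    return x, p

# Hypothetical test system: E(x) = k x^2 / 2, so F(x) = -k x.
k, mass = 1.0, 1.0

def total_energy(x, p):
    return p * p / (2.0 * mass) + 0.5 * k * x * x

x0, p0 = 1.0, 0.0
x1, p1 = velocity_verlet(x0, p0, mass, lambda x: -k * x, dt=0.01, n_steps=1000)
energy_drift = abs(total_energy(x1, p1) - total_energy(x0, p0))
```

The force evaluation inside the loop is the only place the energy function enters, which is why that step dominates the run time of a real simulation.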
Many parallel and serial implementations

Package name   Threads     MPI   Global Arrays
NAMD           Charm++
GROMACS        ✓           ✓
TINKER
AMBER          partly ✓    ✓
CHARMM                     ✓
LAMMPS                     ✓
NWChem                     ✓     ✓
Things I tried

• Compiler flags optimization
• Cache miss reduction
• Lookup tables
• Parallelization with OpenMP
Compiler flag optimization

flags        gfortran 4.1.2                  ifort 10.0.023
             time         speedup            time         speedup
-O0          29.95(2) s   -                  36.30(2) s   -
-Os          29.92(3) s   +0.77(3) %         32.59(4) s   +10.22(2) %
-O1          30.22(1) s   -0.90(4) %         32.12(3) s   +11.51(1) %
-O2          29.66(3) s   +0.96(1) %         30.30(2) s   +16.54(2) %
-O3          29.84(2) s   +0.38(2) %         30.83(2) s   +15.06(2) %
CE search    28.77(2) s   +3.62(3) % [1]     28.96(2) s   +20.22(1) % [2]

1. FFLAGS="-falign-functions -falign-jumps -falign-labels -falign-loops -fvpt -fcse-skip-blocks -fdelete-null-pointer-checks -ffast-math -fforce-addr -fgcse -fgcse-lm -fgcse-sm -floop-optimize -fkeep-static-consts -fmerge-constants -fno-defer-pop -fno-guess-branch-probability -fno-math-errno -funsafe-math-optimizations -fno-trapping-math -foptimize-register-move -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop -fno-sched-spec -fsched-spec-load -fsched-stalled-insns -fsignaling-nans -fsingle-precision-constant -fstrength-reduce -fthread-jumps -funroll-all-loops"
2. FFLAGS="-xN -no-prec-div -static -inline-level=1 -ip -fno-alias -fno-fnalias -fno-omit-frame-pointer -fkeep-static-consts -nolib-inline -heap-arrays 1 -pad -O3 -scalar-rep -funroll-loops -complex-limited-range"
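The slide does not define "speedup". A short sketch, assuming it means the relative reduction in run time against the -O0 timing of the same compiler; the helper name `speedup_percent` is hypothetical. Under that assumption the ifort column is reproduced to within about 0.01 percentage points, which is consistent with the quoted timing uncertainties.

```python
def speedup_percent(t_ref, t_new):
    """Relative reduction in run time, in percent of the reference time."""
    return 100.0 * (t_ref - t_new) / t_ref

# ifort 10.0.023 timings from the table, in seconds; -O0 (36.30 s) is the reference.
ifort_times = {"-Os": 32.59, "-O1": 32.12, "-O2": 30.30, "-O3": 30.83, "CE search": 28.96}
ifort_speedups = {flag: speedup_percent(36.30, t) for flag, t in ifort_times.items()}

# Values quoted on the slide, for comparison.
quoted = {"-Os": 10.22, "-O1": 11.51, "-O2": 16.54, "-O3": 15.06, "CE search": 20.22}
max_difference = max(abs(ifort_speedups[f] - quoted[f]) for f in quoted)
```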
Algorithm and time profile (N = 6, gfortran 4.1.2)

Initialize model and parameters
  → for each time step (>98%): Move one time step → Remove unphysical motions → Flush I/O
  → End

Move one time step:
  Enforce temp. & pressure
  → Update state by half a time step (t/2)
  → Calculate potential energy and forces: O(N²), >59%
  → Update state by half a time step (t/2)
  → Enforce temp. & pressure
  → Calculate & record kinetic energy and properties: O(N), <31%

Calculate potential energy and forces:
  Calculate bond interactions          9%
  Calculate angle interactions        12%
  Calculate dihedral interactions     (no value shown)
  Calculate dispersion interactions    8%
  Calculate charge interactions       37%
  ...
  Add up all components: O(N)         26%
An unexpected cost
[Figure: the algorithm and time profile flow chart for N = 6, overlaid with the question below]
Q: Why is 15% of total execution time spent adding numbers!?
A: many L2 cache misses
c     zero out each of the first derivative components
      do i = 1, n                         ! 7 cache misses
         do j = 1, 3
            deb(j,i) = 0.0d0              ! 42 cache misses
            ...                           ! (22 other terms)
         end do
      end do
      ...
c     sum up to get the total energy and first derivatives
      energy = eb + ...
      do i = 1, n
         do j = 1, 3
            desum(j,i) = deb(j,i) + ...   ! 19 cache misses (22 other terms)
            derivs(j,i) = desum(j,i)      ! 2 cache misses
         end do
      end do
70 of 91 cache misses per time step (n = 6) shown
A simple solution
c     zero out each of the first derivative components
      do i = 1, n                         ! 7 cache misses
         do j = 1, 3
            deb(j,i) = 0.0d0              ! 26 cache misses (was 42)
            ...
         end do
      end do
      ...
c     sum up to get the total energy and first derivatives
      energy = eb + ...
      do i = 1, n
         do j = 1, 3
            temp = deb(j,i) + ...         ! 6 cache misses
            desum(j,i) = temp             ! 1 cache miss (was 19)
            derivs(j,i) = temp            ! 1 cache miss (was 2)
         end do
      end do
reduced cache misses from 92 to 41 per time step
Speedup from reducing L2 cache misses

flags                      gfortran 4.1.2   ifort 10.0.023
original                   29.95(2) s       28.96(2) s
with scalar replacement    27.43(3) s       28.95(1) s
speedup                    +8.44(1) %       +0.03(2) %

ifort already called with scalar replacement flag
Lookup tables (LUTs)

• Calculations of sqrt() and exp() take up
  23.8% of execution time
• Idea: pre-compute values of sqrt() and
  exp() in an array and recall them from
  memory when needed
• Caution: LUT should not displace too much
  data from L2 cache
sqrt() with LUT
[Figure: two panels, "direct LUT" and "LUT with linear interpolation"]
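The two variants named on this slide can be sketched in a few lines of Python. The class name `SqrtTable`, the uniform grid, and the range and table size used below are assumptions for illustration; they are not the values or layout used in the modified TINKER code.

```python
import math

class SqrtTable:
    """Lookup table for sqrt(x) on [x_min, x_max] with uniform spacing."""

    def __init__(self, x_min, x_max, n_entries):
        self.x_min = x_min
        self.step = (x_max - x_min) / (n_entries - 1)
        self.values = [math.sqrt(x_min + i * self.step) for i in range(n_entries)]

    def direct(self, x):
        """Direct LUT: return the nearest tabulated value, no refinement."""
        i = int((x - self.x_min) / self.step + 0.5)
        return self.values[i]

    def interpolated(self, x):
        """LUT with linear interpolation between the two neighbouring entries."""
        t = (x - self.x_min) / self.step
        i = min(int(t), len(self.values) - 2)
        frac = t - i
        return self.values[i] + frac * (self.values[i + 1] - self.values[i])

table = SqrtTable(1.0, 100.0, 1000)   # hypothetical range and size
x = 42.0
err_direct = abs(table.direct(x) - math.sqrt(x))
err_interp = abs(table.interpolated(x) - math.sqrt(x))
```

Interpolation buys extra accuracy for the same table size at the cost of one more memory read and a multiply-add per lookup, so a direct table with more entries can be the better choice when the required precision is modest. The sketch does no range checking; inputs outside [x_min, x_max] would need a fallback to `math.sqrt`.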
exp() with LUT
[Figure: two panels, "direct LUT" and "LUT with first-order Taylor series refinement*"]

* $$e^{x} = e^{x_0} + (x - x_0)\, e^{x_0} + O\!\left((x - x_0)^2\right)$$
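A sketch of the Taylor-refined table, using the expansion above around the nearest tabulated point x0. The class name `ExpTable` and the range and size chosen here are assumptions for illustration, not the parameters of the actual implementation.

```python
import math

class ExpTable:
    """Lookup table for exp(x) on [x_min, x_max] with optional Taylor refinement."""

    def __init__(self, x_min, x_max, n_entries):
        self.x_min = x_min
        self.step = (x_max - x_min) / (n_entries - 1)
        self.values = [math.exp(x_min + i * self.step) for i in range(n_entries)]

    def _nearest(self, x):
        """Index and abscissa x0 of the tabulated point closest to x."""
        i = int((x - self.x_min) / self.step + 0.5)
        return i, self.x_min + i * self.step

    def direct(self, x):
        """Direct LUT: exp(x0) for the nearest tabulated x0."""
        i, _ = self._nearest(x)
        return self.values[i]

    def taylor(self, x):
        """First-order refinement: e^x0 + (x - x0) e^x0 = e^x0 * (1 + (x - x0))."""
        i, x0 = self._nearest(x)
        return self.values[i] * (1.0 + (x - x0))

table = ExpTable(-10.0, 0.0, 2001)    # hypothetical range and size
x = -1.2345
rel_direct = abs(table.direct(x) / math.exp(x) - 1.0)
rel_taylor = abs(table.taylor(x) / math.exp(x) - 1.0)
```

Because the derivative of exp is exp itself, the refinement reuses the same table entry and needs no second table of slopes; the relative error drops from roughly |x - x0| to roughly (x - x0)²/2.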
Choice of implementation

function   desired precision   table size (doubles)   refinement   expected speedup
sqrt()     10^-4               10,764                 none         +118%
exp()      10^-8               6,836                  Taylor       +151%

LUT aligned to 128 bits
L2 cache = 4 MB = 512K doubles
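A quick check of how much of the L2 cache these tables occupy, assuming 8-byte doubles and 1 MB = 2^20 bytes (both assumptions, though they match the "512K doubles" figure). The variable names are made up for the example.

```python
BYTES_PER_DOUBLE = 8
L2_CACHE_BYTES = 4 * 1024 * 1024               # 4 MB, as quoted on the slide

table_sizes = {"sqrt()": 10_764, "exp()": 6_836}   # entries, in doubles

l2_doubles = L2_CACHE_BYTES // BYTES_PER_DOUBLE
table_bytes = {name: n * BYTES_PER_DOUBLE for name, n in table_sizes.items()}
total_fraction = sum(table_bytes.values()) / L2_CACHE_BYTES
```

Together the two tables take 140,800 bytes, about 3.4% of the 4 MB cache, which leaves most of the cache available for the simulation data.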
Speedup from LUT use
      flags           gfortran 4.1.2   ifort 10.0.023

     original          29.95(2) s       28.96(2) s

with lookup tables     26.89(1) s       25.87(2) s

    speedup           +10.23(2) %      +7.22(3) %
Summary of serial improvements

Improvement               gfortran 4.1.2   ifort 10.0.023
Best compiler flags       +3.62(3) %       +20.22(1) %
L2 cache miss reduction   +8.44(2) %       +0.03(1) %
Lookup tables             +10.23(1) %      +7.22(2) %
Total                     23.91(3) s       26.86(2) s
                          +20.17(4) %      +26.00(2) %
Parallelization targets
[Figure: the algorithm and time profile flow chart for N = 6, repeated from the earlier slide]
Parallelization strategy

Calculate potential energy and forces: omp sections (100%)
• omp section (50%)
• omp section (50%)

Each interaction routine below is an omp parallel do:
  Calculate charge interactions       50%
  Calculate angle interactions        16%
  Calculate dihedral interactions      2%
  Calculate dispersion interactions   11%
  Calculate bond interactions         12%
  ...
  Add up all components
Parallelization results
[Figure: execution time in seconds (axis from 5 to 35) versus number of cores (1 to 4), gfortran 4.1.2; series: N = 6, N = 1000, Ideal]
Summary
• Free software can sometimes be better
  than non-free software
• L2 cache misses can significantly degrade
  performance
• Lookup tables are an effective tradeoff
  between speed and memory vs. precision
• Simple OpenMP parallelization is effective
  for small numbers of processors

 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Molecular models, threads and you

  • 1. Molecular Models, Threads and You: Optimizing the TINKER classical molecular dynamics code while maintaining code readability. Jiahao Chen, Martínez Group, Dept. Chemistry, CATMS, MRL and Beckman. CS 498 MG presentation: 2007-12-07
  • 2. Molecular models/force fields. Typical energy function: E = covalent bond effects + noncovalent interactions
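  • 3. Molecular models/force fields. Typical energy function:
      $$E = \sum_{b \in \mathrm{bonds}} k_b (r_b - r_{\mathrm{eq},b})^2 + \sum_{a \in \mathrm{angles}} \kappa_a (\theta_a - \theta_{\mathrm{eq},a})^2 + \sum_{d \in \mathrm{dihedrals}} \sum_n l_{nd} \cos(n\pi) + \sum_{i<j \in \mathrm{atoms}} \frac{q_i q_j}{r_{ij}} + \sum_{i<j \in \mathrm{atoms}} \epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]$$
      Terms, in order: bond stretch, angle torsion, dihedrals, electrostatics, dispersion. Computation cost = O(N^2).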
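The O(N^2) cost on slide 3 comes from the two sums over atom pairs i < j. The following is a minimal Python sketch of those two sums, not TINKER's Fortran implementation. The function name `noncovalent_energy`, the unit system (Coulomb constant set to 1) and the use of a single prefactor `eps` and a single `sigma` for every pair are assumptions made for illustration.

```python
import itertools
import math

def noncovalent_energy(positions, charges, eps=1.0, sigma=1.0):
    """Sum electrostatic and dispersion terms over all atom pairs i < j.

    Hypothetical helper: one eps and one sigma are shared by all pairs,
    and charges are in units where the Coulomb constant is 1.
    """
    e_elec = 0.0
    e_disp = 0.0
    pairs = 0
    for i, j in itertools.combinations(range(len(positions)), 2):
        r = math.dist(positions[i], positions[j])
        e_elec += charges[i] * charges[j] / r
        sr6 = (sigma / r) ** 6
        e_disp += eps * (sr6 * sr6 - sr6)
        pairs += 1
    return e_elec, e_disp, pairs
```

The loop visits N(N-1)/2 pairs: 15 pairs for N = 6 and 499,500 pairs for N = 1000, which is why these two terms dominate for large systems.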
  • 4. Problem description
      • The state of the system is given by the position and momentum of every atom (of mass $m_i$): $(x_1, p_1, x_2, p_2, \cdots, x_N, p_N) \in \mathbb{R}^{3 \times 2 \times N}$
      • Solve the system of partial differential equations $\frac{\partial x_i}{\partial t} = \frac{p_i}{m_i}, \quad \frac{\partial p_i}{\partial t} = -\frac{\partial E}{\partial x_i}, \quad i = 1, \cdots, N$
      • with user-specified initial conditions (e.g. with constant temperature and pressure)
      • Subject to (user-specified) constraints, e.g. fixed bond angles
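As a rough illustration of how the equations of motion on slide 4 can be stepped forward in time, here is a sketch in Python. It uses the velocity Verlet scheme, which updates the momentum in two half steps around each force evaluation. The choice of integrator is an assumption (the slides do not name the one TINKER used), and the one-dimensional harmonic energy E = k x^2 / 2 is a stand-in for the full force field.

```python
def velocity_verlet(x, p, mass, force, dt, n_steps):
    """Integrate dx/dt = p/m, dp/dt = force(x) with half-step momentum updates."""
    f = force(x)
    for _ in range(n_steps):
        p += 0.5 * dt * f      # first half step in momentum
        x += dt * p / mass     # full step in position
        f = force(x)           # force = -dE/dx at the new position
        p += 0.5 * dt * f      # second half step in momentum
    return x, p

def harmonic_force(x, k=1.0):
    """Stand-in force: -dE/dx for E = k x^2 / 2 (not a real force field)."""
    return -k * x

def total_energy(x, p, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the stand-in oscillator."""
    return p * p / (2.0 * mass) + 0.5 * k * x * x
```

For example, `velocity_verlet(1.0, 0.0, 1.0, harmonic_force, 0.01, 1000)` advances the oscillator to t = 10 while keeping `total_energy` close to its initial value of 0.5. Only one force evaluation is needed per step, which matters because the force evaluation is the expensive part.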
  • 5. Many parallel and serial implementations (table columns: package name, threads, MPI, Global Arrays)
      NAMD: CHARM++
      GROMACS: ✓ ✓
      TINKER: (no entries)
      AMBER: threads partly, MPI ✓, Global Arrays ✓
      CHARMM: ✓
      LAMMPS: ✓
      NWChem: ✓ ✓
  • 6. Things I tried • Compiler flags optimization • Cache miss reduction • Lookup tables • Parallelization with OpenMP
  • 7. Compiler flag optimization
      flags       gfortran 4.1.2                  ifort 10.0.023
                  time         speedup            time         speedup
      -O0         29.95(2) s   -                  36.30(2) s   -
      -Os         29.92(3) s   +0.77(3) %         32.59(4) s   +10.22(2) %
      -O1         30.22(1) s   -0.90(4) %         32.12(3) s   +11.51(1) %
      -O2         29.66(3) s   +0.96(1) %         30.30(2) s   +16.54(2) %
      -O3         29.84(2) s   +0.38(2) %         30.83(2) s   +15.06(2) %
      CE search   28.77(2) s   +3.62(3) % [1]     28.96(2) s   +20.22(1) % [2]
      1. FFLAGS="-falign-functions -falign-jumps -falign-labels -falign-loops -fvpt -fcse-skip-blocks -fdelete-null-pointer-checks -ffast-math -fforce-addr -fgcse -fgcse-lm -fgcse-sm -floop-optimize -fkeep-static-consts -fmerge-constants -fno-defer-pop -fno-guess-branch-probability -fno-math-errno -funsafe-math-optimizations -fno-trapping-math -foptimize-register-move -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop -fno-sched-spec -fsched-spec-load -fsched-stalled-insns -fsignaling-nans -fsingle-precision-constant -fstrength-reduce -fthread-jumps -funroll-all-loops"
      2. FFLAGS="-xN -no-prec-div -static -inline-level=1 -ip -fno-alias -fno-fnalias -fno-omit-frame-pointer -fkeep-static-consts -nolib-inline -heap-arrays 1 -pad -O3 -scalar-rep -funroll-loops -complex-limited-range"
  • 8. Algorithm and time profile (flowchart; N = 6, gfortran 4.1.2)
      Top level: Initialize model and parameters → for each time step (>98%): Move one time step, Remove unphysical motions, Flush I/O → End
      Within a time step: Update state by t/2; Calculate potential energy and forces; Update state by t/2; Calculate & record kinetic energy and properties; Enforce temp. & pressure (two boxes)
      Cost labels on the chart: O(N), O(N^2), >59%, <31%
      Potential energy and forces: Calculate bond interactions (9%), angle interactions (12%), dihedral interactions (8%), dispersion interactions (37%), charge interactions (26%), ..., Add up all components
  • 9. An unexpected cost. Q: Why is 15% of total execution time spent adding numbers!? (The flowchart from slide 8 is repeated on this slide.)
  • 10. A: many L2 cache misses (per-line cache-miss counts given as trailing comments; 70 of 91 cache misses per time step shown, n = 6)
      c     zero out each of the first derivative components
            do i = 1, n                          ! 7 cache misses
               do j = 1, 3
                  deb(j,i) = 0.0d0               ! 42 cache misses
                  ...                            ! 22 other terms
               end do
            end do
            ...
      c     sum up to get the total energy and first derivatives
            energy = eb + ...
            do i = 1, n
               do j = 1, 3
                  desum(j,i) = deb(j,i) + ...    ! 19 cache misses (... = 22 other terms)
                  derivs(j,i) = desum(j,i)       ! 2 cache misses
               end do
            end do
  • 11. A simple solution (per-line cache-miss counts given as trailing comments; reduced cache misses from 92 to 41 per time step)
      c     zero out each of the first derivative components
            do i = 1, n                          ! 7 cache misses
               do j = 1, 3
                  deb(j,i) = 0.0d0               ! 26 cache misses (was 42)
                  ...
               end do
            end do
            ...
      c     sum up to get the total energy and first derivatives
            energy = eb + ...
            do i = 1, n
               do j = 1, 3
                  temp = deb(j,i) + ...          ! 6 cache misses
                  desum(j,i) = temp              ! 1 cache miss (was 19)
                  derivs(j,i) = temp             ! 1 cache miss (was 2)
               end do
            end do
  • 12. Speedup from reducing L2 cache misses
      flags                     gfortran 4.1.2   ifort 10.0.023
      original                  29.95(2) s       28.96(2) s
      with scalar replacement   27.43(3) s       28.95(1) s
      speedup                   +8.44(1) %       +0.03(2) %
      Note: ifort was already called with the scalar replacement flag.
  • 13. Lookup tables (LUTs) • Calculations of sqrt() and exp() take up 23.8% of execution time • Idea: pre-compute values of sqrt() and exp() in an array and recall them from memory when needed • Caution: LUT should not displace too much data from L2 cache
  • 14. sqrt() with LUT (plot panels: direct LUT; LUT with linear interpolation)
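The two lookup variants named on slide 14 can be sketched as follows. This is an illustrative Python version rather than the Fortran used in TINKER. The function names, the evenly spaced grid and the table range are assumptions, and the caller is assumed to pass x inside the tabulated range.

```python
import math

def build_sqrt_table(x_min, x_max, n):
    """Tabulate sqrt on n evenly spaced points in [x_min, x_max]."""
    step = (x_max - x_min) / (n - 1)
    table = [math.sqrt(x_min + k * step) for k in range(n)]
    return table, step

def sqrt_direct(x, table, x_min, step):
    """Direct LUT: return the nearest tabulated value, no refinement."""
    return table[int(round((x - x_min) / step))]

def sqrt_interp(x, table, x_min, step):
    """LUT with linear interpolation between the two neighbouring entries."""
    t = (x - x_min) / step
    k = min(int(t), len(table) - 2)   # keep k + 1 inside the table
    frac = t - k
    return table[k] + frac * (table[k + 1] - table[k])
```

With a grid spacing h, the direct lookup has an error proportional to h, while linear interpolation has an error proportional to h^2, at the price of one extra memory read and a multiply-add per call.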
  • 15. exp() with LUT (plot panels: direct LUT; LUT with first-order Taylor series refinement*)
      * $e^{x} = e^{x_0} + (x - x_0)\, e^{x_0} + O\left( (x - x_0)^2 \right)$
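The first-order Taylor refinement on slide 15 reuses the tabulated value e^{x0} as its own derivative, so the correction costs one multiply-add. A minimal Python sketch follows; the function names, grid and range are assumptions for illustration, and x is assumed to lie inside the tabulated range.

```python
import math

def build_exp_table(x_min, x_max, n):
    """Tabulate exp on n evenly spaced points in [x_min, x_max]."""
    step = (x_max - x_min) / (n - 1)
    table = [math.exp(x_min + k * step) for k in range(n)]
    return table, step

def exp_taylor(x, table, x_min, step):
    """Nearest table entry e^x0, refined by e^x ~ e^x0 * (1 + (x - x0))."""
    k = int(round((x - x_min) / step))
    x0 = x_min + k * step
    return table[k] * (1.0 + (x - x0))
```

Because |x - x0| is at most half the grid spacing h, the relative error of the refined value is bounded by roughly h^2 / 8, compared with roughly h / 2 for the unrefined lookup.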
  • 16. Choice of implementation
      function   desired precision   table size (doubles)   refinement   expected speedup
      sqrt()     10^-4               10,764                 none         +118%
      exp()      10^-8               6,836                  Taylor       +151%
      LUT aligned to 128 bits. L2 cache = 4 MB = 512K doubles.
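The caution about not displacing too much data from the L2 cache can be checked with a short calculation from the slide-16 figures. The sketch below assumes 8-byte doubles and a binary megabyte (4 MB = 4 × 1024 × 1024 bytes), which is consistent with the slide's "512K doubles".

```python
BYTES_PER_DOUBLE = 8
L2_BYTES = 4 * 1024 * 1024                     # 4 MB L2 cache (slide 16)
table_sizes = {"sqrt": 10_764, "exp": 6_836}   # table entries in doubles (slide 16)

l2_doubles = L2_BYTES // BYTES_PER_DOUBLE      # capacity of L2 in doubles
total_doubles = sum(table_sizes.values())      # both tables together
total_bytes = total_doubles * BYTES_PER_DOUBLE
fraction = total_doubles / l2_doubles          # share of L2 taken by the tables
```

Both tables together hold 17,600 doubles (140,800 bytes), about 3.4% of the 524,288 doubles that fit in L2, so most of the cache remains available for the simulation data.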
  • 17. Speedup from LUT use
      flags                gfortran 4.1.2   ifort 10.0.023
      original             29.95(2) s       28.96(2) s
      with lookup tables   26.89(1) s       25.87(2) s
      speedup              +10.23(2) %      +7.22(3) %
  • 18. Summary of serial improvements
      Improvement               gfortran 4.1.2   ifort 10.0.023
      Best compiler flags       +3.62(3) %       +20.22(1) %
      L2 cache miss reduction   +8.44(2) %       +0.03(1) %
      Lookup tables             +10.23(1) %      +7.22(2) %
      Total time                23.91(3) s       26.86(2) s
      Total speedup             +20.17(4) %      +26.00(2) %
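The totals on slide 18 can be reproduced with a short calculation. The sketch below assumes that speedup is defined as the fractional reduction in execution time, (t_before - t_after) / t_before, and that the baselines are the -O0 timings from slide 7 (29.95 s for gfortran, 36.30 s for ifort). Both are inferences that happen to match the quoted totals, not definitions stated on the slides.

```python
def speedup_percent(t_before, t_after):
    """Fractional reduction in execution time, in percent (assumed definition)."""
    return 100.0 * (t_before - t_after) / t_before

gfortran_total = speedup_percent(29.95, 23.91)   # -O0 baseline vs. final time
ifort_total = speedup_percent(36.30, 26.86)      # -O0 baseline vs. final time
```

These evaluate to about 20.17% and 26.01%, in agreement with the slide within the quoted uncertainties. Under this definition the individual speedups are not additive, which is why the gfortran total is smaller than 3.62% + 8.44% + 10.23%.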
  • 19. Parallelization targets (The flowchart from slide 8 is repeated on this slide, N = 6.)
  • 20. Parallelization strategy (flowchart)
      Calculate potential energy and forces (100%): omp sections, split into two omp section blocks of 50% each
      Energy terms, each parallelized with omp parallel do: Calculate charge interactions (50%), angle interactions (16%), dihedral interactions (2%), dispersion interactions (12%), bond interactions (11%), ..., Add up all components
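The coarse-grained part of the strategy on slide 20, where groups of energy terms run as separate sections and their results are added up afterwards, can be mimicked in Python with a thread pool. This is only a structural analogy: the slides use OpenMP directives in Fortran, the helper names below are hypothetical, and pure-Python threads will not show the same speedup as OpenMP threads.

```python
from concurrent.futures import ThreadPoolExecutor

def run_section(terms, coords):
    """Evaluate every energy term in one section and return the partial sum."""
    return sum(term(coords) for term in terms)

def potential_energy(sections, coords):
    """Run each section on its own worker, then add up the partial sums."""
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        partials = list(pool.map(lambda terms: run_section(terms, coords), sections))
    return sum(partials)
```

A caller would pass one list of term functions per section, for example `potential_energy([[charge_term], [bond_term, angle_term]], coords)` with placeholder term functions. Grouping the terms so that each section carries a similar share of the work keeps one worker from sitting idle while the other finishes.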
  • 21. Parallelization results (plot, gfortran 4.1.2: execution time in s vs. number of cores; series: N=6, N=1000, Ideal)
  • 22. Summary • Free software can sometimes be better than non-free software • L2 cache misses can significantly degrade performance • Lookup tables are an effective tradeoff between speed and memory vs. precision • Simple OpenMP parallelization is effective for small numbers of processors