This document discusses the formation of an institute called S2I2C2M2 that aims to promote more sustainable software development practices for computational chemistry. It outlines challenges with current software, including complexity, lack of parallelism, and outdated development practices. The institute will focus on developing portable parallel infrastructures, general tensor algebra algorithms, standards for code and data sharing, and education initiatives to train the next generation of computational chemists. The goal is to overcome barriers and change the culture around computational chemistry software to better support collaborative open science.
2. Outline
Overview
Challenges and Current State
Overcoming the Barriers—technical and cultural
3. Computational Chemistry
Long history – over 50 years
Underpins broad array of scientific applications—Grand
Challenges
Efficient combustion systems
Drug design
Understanding biological systems
Semiconductor design
Water sustainability
CO2 Sequestration
Efficient lighting (quantum dots)
…
Full partner with experiments—”computational experiments”
may be more reliable than lab measurements
4. Scientific Software Innovation Institute for Computational
Chemistry and Materials Modeling -- S2I2C2M2
Collaboration between computational chemists,
computer scientists, applied mathematicians, and
computer engineers
Goal: overcome obstacles of algorithms and culture
and change the nature of computational chemistry
software development.
Year long conceptualization phase has been funded by
NSF
First meeting scheduled for January 2013
5. People
Daniel Crawford (Virginia Tech) Vijay Pande (Stanford)
Robert Harrison (Tennessee, ORNL) Manish Parashar (Rutgers)
Anna I. Krylov (U.S.C.) Ram Ramanujam (LSU)
Theresa Windus (Iowa State) Beverly Sanders (Florida)
Emily Carter (Princeton) Bernhard Schlegel (Wayne State)
Edmund Chow (Georgia Tech) David Sherrill (Georgia Tech)
Erik Deumens (Florida) Lyudmila Slipchenko (Purdue)
Mark Gordon (Iowa State) Masha Sosonkina (Iowa State)
Martin Head-Gordon (Berkeley) Edward Valeev (Virginia Tech)
Todd Martinez (Stanford), Ross Walker (San Diego
David McDowell (Georgia Tech) Supercomputing Center)
+ others
6. Current State of Computational
Chemistry
Long history--legacies of modern molecular
dynamics and quantum chemistry packages span
decades
Both open source and commercial
Amalgam of programming languages
Domain specific methods
Multi-dimensional integral engines
General purpose
Davidson method for computing eigenvalues of large
matrices (ranks in tens of billions)
7. Software is extremely complex
Example: Modern ab initio quantum chemistry
simulations
Computations scale as O(N7) or higher
Where N represents size of molecular system (number of atoms,
electrons, or basis functions)
Code complexity arises naturally from problems, but
is an obstacle to long-term sustainability
is an obstacle to exploitation of (ever changing) HPC
hardware
hinders education of next generation of scientists
only a handful of very senior students can make a contribution
8. Much recent software development focused on
exploiting parallel architectures
Varying degrees of success
With a few exceptions, still not fully exploiting
available systems
Utilizing exascale will require rethinking of approach
Desperate need for tools to generate high
performance massively parallel code from high
level specifications
9. Developers
Mostly grad students and post-docs
Training in software engineering left to individual
research groups
Extent to which this is done varies
Large burden for small groups: community approach
has potential benefits for both software and the
students
Education tends to be narrow: students learn about
software their advisors are involved with
10. Science Drivers
Catalysis
Catalysts facilitate control of chemical reactions by raising rates
that chemical bonds are formed or broken
Improve selectivity and control over unwanted byproducts
Decreased energy consumption
Reduction of waste stream
Rational design of catalysts for a specific application is one of
the Holy Grails of of modern chemistry and chemical engineering
Requires quantitative information about transition states
Intermediates low concentration and short lifetimes—thwart
experimental evaluation
Will require state-of-the-art computation combined with
experiements
11. Science Drivers
Organic photovoltaic cells
Potential applications: thin-film transistors, LEDs, solar
cells, optical switches
Advantages
Devicescan be flexible
Inexpensive to produce
Limitations
Reduced power conversion energy
But,process leading to current generation not well
understood
12. Overcoming the Barriers
1 year conceptualization phase funded by NSF
First
meeting Jan 2013
3 working groups
Highest priority
Portable parallel infrastructure
General-purpose tensor algebra algorithms
Protocols for information exchange and code
interoperability
Education and training
13. Portable parallel infrastructure
Technology trends
Massive concurrency on a chip
Massive number of sockets in largest supercomputers
Heterogeneity (CPU + GPU)
Deep, complex memory hierarchies
Memory and communication bandwidth limited
Bleeding edge applications may need to
Coordinate over 109 threads
Tolerate faults
Explicitly manage energy consumption
14. Sustainability of large and widely distributed
chemistry codes
Enable most computations in chemistry
Likely will run on leadership class machines
Working group will include computational chemists,
parallel programming experts, and reps from major
tech providers (NVIDIA, Intel, IBM)
15. Sustainability of software developed by smaller
research groups
Need to understand programming models and tools
Need to understand how both community and
software can be better organized
Accelerate testing of new ideas at sufficient scale to
determine their worth
Key: being able to write code and integrate into
existing software.
Currently, new developments take months or years
to migrate from developers software to other
packages
16. General-purpose tensor algebra
algorithms
Tensor algebra ubiquitous in science and
engineering
Need new approaches for computing with high
dimensional tensors
Currentsoftware—8 or fewer
Emerging methods require 3N dimensions where N (the
number of electrons) may be O(100) or more.
Need common framework of reusable software
elements.
17. Challenges for high-rank tensors
Challenges
Develop robust implementations of algorithms
Standardize data structures and algorithms, APIs,
software elements, and frameworks
Automate the derivation, transformation, and
implementation of tensor expressions.
Will require cross-disciplinary collaborations
Infrastructure will include DSL, runtimes, compilers as
well as static and dynamic algorithm analyses
18. Protocols for Information Exchange and
Code Interoperability
Historically culture has been competitive rather than
collaborative
Sharing of code and data limited
Theoretical methods driving code towards greater
size and complexity
Currently, progress may require substantial
duplication of effort—code that could be reused is
not due to lack of standards
Impedes new science, wastes human labor
19. Information Sharing
Standards for data shared between codes
Standards (or methods to convert between
standards) for stored data to facilitate mining.
Data provenance
Cannot expect all code to use the same format, but
transformation leads to errors, computational
inefficiencies, complex interfaces
20. Code Sharing
Need to establish level of interoperability
Coarse-grained
Hartree-Fock code from one app, MP2 code from another
Fine-grained
Calculate most of one electron contributions to a Fock matrix
in one program, relativistic and solvent terms in another
Need architecture
21. Education and Training
Mastering existing codes is daunting task for grad students
Chemical education culture worse than many STEM fields
PhD only requires modest coursework
OK for most fields of chemistry where undergrad training is
adequate preparation for hands-on lab research
Most students unprepared for research in computational chemistry
Ad hoc training by individual research groups is innefficient
How should students be prepared for 2020 and beyond?
Programming models and tools
Multidisciplinary foundation to computational science
Reasoning about software
Manipulating software with confidence and facility
22. Summer school
Summer school for grad students supported by
community
Fundamental algorithms of computational chemistry
Software best practices
23. If S 2I2C2M2 is successful
Open access to new software tools and infrastructure
Training and educational opportunities for grad
students and post-docs in
Algorithms
Code standards
Software best practices
NOT the goal to produce monolithic computational
chemistry package to replace existing ones
Healthy competition is good
Sets of robust and properly validated software
components that can be shared will benefit the entire
community