Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID - Jeff Wagner

Larry Smarr, Institute Director, Calit2 at California Institute for Telecommunications and Information Technology
Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID
Jeff Wagner, Open Force Field Scavenger King Technical Lead
*and panel spots
www.openforcefield.org
OPEN Software, OPEN Data, OPEN Science is rapidly facilitating force field science!
OPEN SOFTWARE: Automated infrastructure enables rapid experimentation with minimum human intervention
OPEN DATA: Access to large, high-quality experimental and quantum chemical data facilitates easy curation of balanced train / test sets
OPEN SCIENCE: Exploring new force field science (hypothesis - build software - train - test - iterate) is now almost routine
What is a force field?
[Figure label: $r_{eq}$]
$U = k_{OH}(r - r_{eq})^2$
Many more terms…
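The bond-stretch term on this slide can be evaluated directly. The sketch below is only an illustration of that one term: the function names and the parameter values are made up for the example and are not taken from any OpenFF force field, and the units are left abstract.

```python
def harmonic_bond_energy(r, k_oh, r_eq):
    """Energy in the form written on the slide, U = k_OH * (r - r_eq)^2."""
    return k_oh * (r - r_eq) ** 2


def harmonic_bond_force(r, k_oh, r_eq):
    """Force along the bond, F = -dU/dr = -2 * k_OH * (r - r_eq)."""
    return -2.0 * k_oh * (r - r_eq)


# Illustrative, made-up parameters (assumptions, not real force field values).
K_OH = 500.0  # energy units per length^2
R_EQ = 0.96   # length units

for r in (0.90, 0.96, 1.02):
    energy = harmonic_bond_energy(r, K_OH, R_EQ)
    force = harmonic_bond_force(r, K_OH, R_EQ)
    print(f"r={r:.2f}  U={energy:.3f}  F={force:+.1f}")
```

The force is the quantity a simulation needs at every timestep: it pushes a compressed bond apart and pulls a stretched bond back toward r_eq.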
Training new force fields requires new data
Large molecule datasets (SMILES strings)
Quantum chemical calculation results
(times thousands of molecules)
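As a toy picture of what "training" on reference data means, the sketch below fits the two parameters of a single harmonic bond term, the force constant and the equilibrium length, to a handful of reference energies by least squares. Everything here is an assumption made for illustration: the reference energies are synthetic (generated from known parameters rather than from quantum chemistry), the function names are hypothetical, and real force field fitting optimizes many coupled parameters with dedicated software.

```python
def harmonic_energy(r, k, r_eq):
    """Bond-stretch energy in the form U = k * (r - r_eq)^2."""
    return k * (r - r_eq) ** 2


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_harmonic(distances, energies):
    """Least-squares fit of k and r_eq to (distance, energy) reference points."""
    r_mean = sum(distances) / len(distances)
    xs = [r - r_mean for r in distances]  # center the data for numerical stability
    # Fit E = a*x^2 + b*x + c through the normal equations.
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(e * x ** p for x, e in zip(xs, energies)) for p in range(3)]
    a, b, _c = solve_3x3(
        [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]],
        [t[2], t[1], t[0]],
    )
    # The curvature of the parabola gives k; the position of its minimum gives r_eq.
    return a, r_mean - b / (2.0 * a)


# Synthetic "reference data" generated from made-up parameters.
TRUE_K, TRUE_R_EQ = 500.0, 0.96
distances = [0.90, 0.95, 1.00, 1.05, 1.10]
energies = [harmonic_energy(r, TRUE_K, TRUE_R_EQ) for r in distances]

k_fit, r_eq_fit = fit_harmonic(distances, energies)
print(f"fitted k = {k_fit:.4f}, fitted r_eq = {r_eq_fit:.4f}")
```

Because the synthetic energies are exactly harmonic, the fit recovers the generating parameters; with real quantum chemical data the fit would only approximate the reference energies.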
What Open Force Field does
PRP is capable of running enormous quantum chemistry workloads
[Figure annotations: "OpenFF-1.0.0 released", "OpenFF-2.0.0 released", "OpenFF begins using Nautilus"]
OpenFF force fields are state-of-the-art
OpenFF 2.0.0 outperforms other public small molecule force fields
and OpenFF force fields continue to improve
OpenFF 2.0.0 outperforms OpenFF 1.2.1, GAFF 2.1, and CGENFF in free energy calculations. The proprietary OPLS3e force field shows the best performance.
The datasets on QCArchive are fully open!
Dataset listing: https://qcarchive.molssi.org/apps/ml_datasets/
Python example notebooks for data access: https://qcarchive.molssi.org/examples/
OpenFF’s dataset lifecycle: https://github.com/openforcefield/qca-dataset-submission/projects/1

Editor's notes

  1. Consider a water molecule moving around in space. Think of it like a ball and stick model. It’s going to have a list of different inter- and intramolecular motions and interactions. Let’s just consider the H-O bonds. In liquid water, these bonds stretch a bit back and forth, more or less symmetrically around an equilibrium distance we’ll call r_eq. This is modelled with a harmonic oscillator, the same one from physics class. Here there are two fitted parameters, one value for the equilibrium bond length and another for the force constant of the spring. In this case we can reliably fit these values to a combination of experimental data and highly accurate quantum chemical calculations, so we can model this particular detail of the motion of water with pretty good accuracy. There are approximations that can be made for the other forces that atoms feel, which I won’t go into here. But when you add them all up, you get a physical model of a molecule that you can propagate forward through time, on the scale of femtoseconds per timestep. If these models are accurate enough, you can use them to guide research into chemicals with desired properties, for example finding drug candidates that bind to a protein or polymers that have a desired property.
  2. The basic task here is to take a molecule and run QM calculations on it to determine its optimized geometry, which serves as a sort of baseline reference from first-principles physics that can be used in fitting and, later, in benchmarking. It’s not such a fundamentally difficult thing to do with one of the many off-the-shelf tools, but we need to do this at massive scale. Depending on the direction that fitting experiments go, we might have data sets of thousands to tens of thousands of molecules that are used in individual fits, so generating these data sets must be automated and standardized. There are a lot of different methods and settings that scientists might want to use in the QM space, so there must be a way to coherently communicate with different programs. Each of these calculations takes on the order of a few hours to days per molecule, so we’re talking about a large amount of compute to produce this data. And finally, the actual results must be stored in a way that’s rapidly accessible for future fitting experiments - ideally in a publicly accessible database so that the community can make use of our compute without needing to ask for an account on one of our clusters or ship hard drives around.
  3. What we do: we create new, comprehensive force fields and systematically improve their accuracy through scientific innovation and large, high-quality datasets. This schematic gives an overview of that process. We begin by generating, curating, and sharing the datasets necessary for producing and benchmarking high-accuracy force fields. We create and maintain open-source software for systematic, automated parameter optimization to our curated datasets. Finally, we benchmark the force field to evaluate whether it has been significantly improved. If force fields do not meet our standards, they return to the pipeline. All infrastructure, datasets, and force fields created during this process are released openly with permissive licenses so users can rapidly use, modify, and extend our work.
  4. PRP is used heavily in this force field creation workflow during the QC data generation stage.
  5. This is all a pretty tall task, but fortunately it’s mostly a solved problem. We use QCArchive, which is a project out of the Molecular Sciences Software Institute (MolSSI) at Virginia Tech. It’s basically a public archive of QM calculations but also includes a lot of infrastructure for generating new data, including talking to different QM engines via a unified interface (QCEngine). Two of the key contributors to the project (Daniel Smith and Lori Burns) gave a SciPy talk about it a few years ago. The scale of the project has grown and the backend has been partially rewritten since then, but the talk holds up today. QCArchive handles most of the hard stuff - storing results in a database, running QM calculations with Psi4 - and we built a tool that makes our communication with it a little easier, since there’s some pre-processing we need to do before sending stuff off to QM. This tool is called QCSubmit and pretty elegantly handles the tasks of “I have a bunch of molecules, please run QM calculations on these” and, later, “please go fetch for me QM calculations on these molecules”. Some of our calculations are run by grad students and postdocs running “QC managers” as cluster jobs at their respective universities. But something like half of our total compute is run on Nautilus. It’s a uniquely suitable compute backend to pair with the NSF-funded MolSSI QCArchive project, which was designed to take advantage of preemptible compute. Our datasets consist of hundreds to millions of jobs, each requiring tens to thousands of CPU-hours and 8-32 GB of RAM.
  6. And so, in combination with the efforts of modeling scientists and engineers, the enormous amount of training data generated by PRP has helped us release continuous improvements to our force fields, such that our models are now comparable to other academic models with decades of development, and we’re closing in on the accuracy of models developed by for-profit chemical modeling software vendors.
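
Note 5 says the compute backend pairs well with a system designed to take advantage of preemptible compute. The general pattern behind that idea is a central task queue in which a task whose worker is evicted simply goes back into the queue for another worker to pick up. The sketch below is a toy simulation of that pattern only: it is not QCArchive's implementation, and every name in it (`Preempted`, `run_task`, `drain_queue`) is hypothetical.

```python
import random
from collections import deque


class Preempted(Exception):
    """Raised when a simulated worker loses its node before finishing a task."""


def run_task(task_id, rng, preempt_probability):
    """Pretend to run one calculation; with some probability it is preempted."""
    if rng.random() < preempt_probability:
        raise Preempted(task_id)
    return f"result-for-{task_id}"


def drain_queue(task_ids, preempt_probability=0.3, seed=0):
    """Process every task, re-queueing any task whose worker was preempted."""
    rng = random.Random(seed)
    queue = deque(task_ids)
    results = {}
    attempts = {task_id: 0 for task_id in task_ids}
    while queue:
        task_id = queue.popleft()
        attempts[task_id] += 1
        try:
            results[task_id] = run_task(task_id, rng, preempt_probability)
        except Preempted:
            # The task is not lost; it goes back to wait for another worker.
            queue.append(task_id)
    return results, attempts


tasks = [f"molecule-{i}" for i in range(10)]
results, attempts = drain_queue(tasks)
print(f"{len(results)} tasks finished in {sum(attempts.values())} attempts")
```

The design choice this illustrates is that workers hold no irreplaceable state: because each task can be restarted from the queue, losing a node costs only the time already spent on that one task.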