
The Materials Project: Experiences from running a million computational science simulations and sharing the results with tens of thousands of researchers


Presentation given at the MolSSI workshop held by Autodesk, San Francisco, Sept 2017


  1. The Materials Project: Experiences from running a million computational materials science simulations and sharing the results with tens of thousands of researchers
     Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Lab, Berkeley, CA
     MolSSI workflow workshop
     Slides (already) posted to: http://www.slideshare.net/anubhavster
  2. Talk outline
     •  What we did
     •  How we did it
     •  Things that worked for us
  3. Materials development is a key bottleneck for new technologies
     •  Si for solar cells since the 1950s
     •  graphite + Li{Co,Mn,Ni}O2 for batteries since 1990
     Technologies are often limited by the properties of their component materials, and new materials take decades to discover and about 20 years to commercialize. How can we find new materials more quickly and reliably?
  4. Today, one can calculate many materials properties from scratch with density functional theory (DFT)
     A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  5. High-throughput DFT uses supercomputers to calculate the properties of tens of thousands of materials
     •  Automate the DFT procedure: FireWorks, software for programming general computational workflows that can be scaled across large supercomputers
     •  Supercomputing power: NERSC supercomputing center, with a processor count equivalent to ~100,000 desktop machines. Other centers are also viable.
     •  High-throughput materials screening
     G. Ceder & K. A. Persson, Scientific American (2015)
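The idea of a workflow as a set of dependent calculation steps can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the FireWorks API, and the step names (`relax`, `static`, `band_structure`), the `run_workflow` helper, and all numbers are hypothetical placeholders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, dependencies):
    """Run each step once, after all of the steps it depends on.

    steps: dict mapping a step name to a callable that receives the dict of
           results so far and returns this step's result.
    dependencies: dict mapping a step name to the set of step names that
                  must finish first.
    """
    results = {}
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = steps[name](results)
        order.append(name)
    return order, results


# Hypothetical three-step workflow; the "calculations" return placeholders.
steps = {
    "relax": lambda done: {"volume": 1.0},
    "static": lambda done: {"energy": 2.0, "volume": done["relax"]["volume"]},
    "band_structure": lambda done: {"from_energy": done["static"]["energy"]},
}
dependencies = {
    "relax": set(),
    "static": {"relax"},
    "band_structure": {"static"},
}

order, results = run_workflow(steps, dependencies)
print(order)
```

A real workflow engine adds what this sketch leaves out: persistent state, queue submission, and restarts across many machines.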
  6. What we did
     •  We started with known databases of chemical compositions, for which the crystal structure was known but the properties of the material were unknown
     •  We ran density functional theory simulations to predict the properties of those materials (~65,000 compounds)
     •  We put the results online on a site called “The Materials Project”
     •  We built APIs to the data and released our software stack for generating new data
  7. Materials Project database
     •  Online resource of density functional theory simulation data for ~65,000 inorganic materials
     •  Over 35,000 registered users
        –  We also published a review paper showing how people used the database to solve real research problems
     •  Includes band structures, elastic tensors, piezoelectric tensors, battery properties, and more
     •  RESTful API
     •  www.materialsproject.org (free)
     References: Jain et al., Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 11002 (2013). Jain, A., Persson, K. A. & Ceder, G., Research Update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases. APL Mater. 4, 53102 (2016).
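To make the "RESTful API" bullet concrete, here is a minimal sketch of what a programmatic query could look like. The base URL path, the `materials/<formula>/vasp/<property>` layout, the property name `elasticity`, and the `X-API-KEY` header are all assumptions for illustration and are not taken from the slides; check the site's API documentation before relying on them. The request is built but never sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: base URL and path layout are illustrative, not from the slides.
BASE_URL = "https://www.materialsproject.org/rest/v2"


def build_property_request(formula, prop, api_key):
    """Build (but do not send) a GET request for one property of one material."""
    url = f"{BASE_URL}/materials/{quote(formula)}/vasp/{quote(prop)}"
    # Assumption: the API key travels in an X-API-KEY header.
    return Request(url, headers={"X-API-KEY": api_key}, method="GET")


request = build_property_request("GaAs", "elasticity", api_key="YOUR_API_KEY")
print(request.full_url)
```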
  8. Many “largest ever” data sets – efforts combined are >1 million DFT simulations!
     •  >4,500 elastic tensors: M. de Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. van der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
     •  >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
     •  >48,000 electronic transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci. Data, 2017, 4, 170085.
     •  >150 Wulff shapes + surface characterizations: R. Tran, Z. Xu, B. Radhakrishnan, D. Winston, W. Sun, K. A. Persson, and S. P. Ong, Sci. Data, 2016, 3, 160080.
  9. Talk outline
     •  What we did
     •  How we did it
     •  Things that worked for us
  10. The web site is the tip of the iceberg – we’ve built and released an entire software stack underlying the effort: pymatgen, FireWorks, custodian, atomate, REST API
  11. A “black-box” view of performing a calculation
      [Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results]
  12. Unfortunately, the inside of the “black box” is usually tedious and “low-level”
      [Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work… → results. Callouts: input file flags, SLURM format, how to fix ZPOTRF?]
      •  set up the structure coordinates
      •  write input files, double-check all the flags
      •  copy to supercomputer
      •  submit job to queue
      •  deal with supercomputer headaches
      •  monitor job
      •  fix error jobs, resubmit to queue, wait again
      •  repeat process for subsequent calculations in workflow
      •  parse output files to obtain results
      •  copy and organize results, e.g., into Excel
  13. What would be a better way?
      [Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results]
  14. What would be a better way?
      [Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → a button! → results]
  15. We built software for automatically doing calculations
      •  atomate (automatic materials science workflows)
      •  Custodian (calculation error recovery)
      •  pymatgen (materials analysis framework)
      •  FireWorks (workflow definition & execution)
      The stack is organized into base packages and derived packages. These are all open-source.
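The "calculation error recovery" role can be illustrated with a small retry loop: run the job, look for a known error signature in the output, apply the matching fix, and resubmit. This is a sketch under stated assumptions and does not use the Custodian API; `run_with_recovery`, `fake_job`, and the `algo = "safe"` fix are hypothetical, and only the ZPOTRF error name comes from the slides.

```python
def run_with_recovery(job, handlers, max_attempts=3):
    """Run job(params) until it succeeds or the attempts run out.

    job: callable taking a params dict and returning (ok, output_text).
    handlers: list of (error_marker, fix) pairs; fix(params) returns the
              corrected params to use on the next attempt.
    """
    params = {}
    history = []
    for attempt in range(1, max_attempts + 1):
        ok, output = job(params)
        if ok:
            return {"attempts": attempt, "params": params, "history": history}
        for marker, fix in handlers:
            if marker in output:
                params = fix(params)
                history.append(marker)
                break
        else:
            # No handler recognized the failure, so stop instead of looping.
            raise RuntimeError(f"unhandled failure: {output}")
    raise RuntimeError("gave up after max_attempts")


def fake_job(params):
    """Stand-in job: fails with a ZPOTRF message until a setting is changed."""
    if params.get("algo") != "safe":  # hypothetical setting name and value
        return False, "LAPACK: Routine ZPOTRF failed"
    return True, "converged"


handlers = [("ZPOTRF", lambda p: {**p, "algo": "safe"})]
report = run_with_recovery(fake_job, handlers)
print(report)
```

Keeping each fix as a separate (marker, fix) pair means a newly discovered failure mode becomes one more entry in the list, which is how accumulated know-how can be shared across a group.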
  16. MPComplete on Materials Project works as a simple “one-click DFT”
      [Diagram: custom material → Submit! → input generation (parameter choice) → workflow mapping → supercomputer submission / monitoring → error handling → file transfer → file parsing / DB insertion → www.materialsproject.org]
      “Crystal Toolkit”: anyone can find, edit, and submit (suggest) structures
      Currently, this feature is available for:
      •  structure optimization
      •  band structures
      •  elastic tensors
      •  about 10 more in the Python interface
  17. MPComplete on Materials Project works as a simple “one-click DFT” (continued)
      One can also use the same infrastructure to conduct customized research studies via a Python interface that provides access to high-level operations
  18. Workflow parameters can be customized at multiple levels of detail
      1.  Workflows have various high-level options
      2.  Fireworks also have options / flags (not shown)
      3.  Firetasks have the most detailed options / flags (not shown)
      Example 1: “VASP input set” controls the rules that set DFT parameters (pseudopotentials, cutoffs, grid densities, etc.) via pymatgen
      Example 2: If “stability_check” is enabled, the later parts of the workflow are skipped if the structure is determined to be unstable, to save computer time on uninteresting structures
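The “stability_check” example amounts to an early exit controlled by a high-level option. The sketch below shows that control flow only; it is not the atomate implementation. The function name, the step names, the `e_above_hull` field (a precomputed stand-in for a real stability analysis), and the tolerance value are all assumptions.

```python
def run_property_workflow(structure, stability_check=False, stability_tol=0.1):
    """Run a relax step, then optionally skip the rest for unstable inputs.

    structure: dict with a precomputed 'e_above_hull' value standing in for
               the result of a real stability analysis (an assumption).
    """
    completed = ["relax"]
    later_steps = ["deformations", "elastic_fit"]  # hypothetical step names
    if stability_check and structure["e_above_hull"] > stability_tol:
        # Unstable structure: save computer time by skipping the later steps.
        return {"completed": completed, "skipped": later_steps}
    return {"completed": completed + later_steps, "skipped": []}


# Placeholder input; the numbers are not real data.
unstable = {"formula": "hypothetical-A", "e_above_hull": 0.5}
checked = run_property_workflow(unstable, stability_check=True)
unchecked = run_property_workflow(unstable, stability_check=False)
print(checked["completed"], unchecked["completed"])
```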
  19. You can build workflows from scratch or reuse components to assemble workflows
      Multiple workflows are built with the same components stacked together in different ways, like Legos
      [Diagram: two workflows that share almost all of their code]
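The Lego analogy can be sketched as function composition: two workflows built from one shared pool of steps. This is an illustration only and not how atomate defines workflows; `compose`, `record`, and the step names are hypothetical, and each step just logs its name instead of running a calculation.

```python
def compose(*steps):
    """Chain steps into one workflow; each step takes and returns a state dict."""
    def workflow(state):
        for step in steps:
            state = step(state)
        return state
    return workflow


def record(name):
    """Make a placeholder step that only appends its name to state['log']."""
    def step(state):
        return {**state, "log": state.get("log", []) + [name]}
    return step


# Shared components (hypothetical names).
optimize = record("optimize_structure")
static = record("static_calculation")
bands = record("band_structure")
deform = record("deformations")
fit = record("fit_elastic_tensor")

# Two workflows that reuse the first two components unchanged.
band_structure_wf = compose(optimize, static, bands)
elastic_wf = compose(optimize, static, deform, fit)

band_log = band_structure_wf({"formula": "GaAs"})["log"]
elastic_log = elastic_wf({"formula": "GaAs"})["log"]
print(band_log)
print(elastic_log)
```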
  20. Software allows you to leverage the prior efforts and knowledge of many researchers past + present
      All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations
      Contributors: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain, and others
  21. Talk outline
      •  What we did
      •  How we did it
      •  Things that worked for us
  22. Things that worked for us (1) – BDFLs
      •  At first, we tried to make every coding decision by committee, e.g., getting all the developers to sit in a room and agree on a solution
      •  Later, we assigned each codebase a strong BDFL (benevolent dictator for life) who would consider all options but could simply make decisions on behalf of that codebase
      •  We found that, even though the BDFL was not always right, we were able to progress much faster, much better, and, surprisingly, with much less conflict than the old committee way
      •  Note: If you were BDFL of a codebase, you got to do things your way. But you were also signing up for a ton of extra work for that privilege. Thus, BDFLs must care a lot about the code, be very detail oriented, and be willing to work overtime. Not everyone is a candidate!
  23. Things that worked for us (2) – forced collaboration
      •  The tendency for most scientists, at least at first, is to write their own individual scripts in their own corner
      •  At first, we needed a strong authority figure (i.e., the center lead) to force collaboration
         –  “All code must go in pymatgen!” – Kristin Persson
      •  Once the code builds enough momentum and is big / established enough, forced collaboration can be dropped and researchers naturally put code there
  24. Things that worked for us (3) – MongoDB
      •  When most people think databases, they think “SQL”
         –  We were also of that mentality from 2006-2011
      •  We built a beautiful, intricate schema (database blueprint) for simulation data that was a wonder to behold
         –  But only the “database master” really knew how to modify / expand it
         –  Any time a new type of data needed to be included in the database, the “database master” had to design schema updates
      •  A computer science colleague thought we might want to experiment with MongoDB
      •  Result: we can move so much faster with MongoDB due to its flexibility and easy learning curve
         –  These days, we don’t really use SQL for anything
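The schema point can be shown with a small contrast. The sketch below uses the standard-library `sqlite3` module for the fixed-schema side and plain Python dicts as a stand-in for MongoDB-style documents, so it does not use MongoDB or its driver. Table, column, and field names are hypothetical, and every number is a placeholder, not real data.

```python
import sqlite3

# Fixed schema: every row must fit the columns declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE materials (formula TEXT, band_gap REAL)")
conn.execute("INSERT INTO materials VALUES (?, ?)", ("A", 1.0))

# A new kind of data needs a schema change before it can be stored.
conn.execute("ALTER TABLE materials ADD COLUMN bulk_modulus REAL")
conn.execute(
    "UPDATE materials SET bulk_modulus = ? WHERE formula = ?", (2.0, "A")
)
columns = [row[1] for row in conn.execute("PRAGMA table_info(materials)")]

# Document style: each record is free-form, so a new kind of data is just
# a new key on the records that have it.
documents = [
    {"formula": "A", "band_gap": 1.0},
    {"formula": "B", "band_gap": 3.0, "elastic": {"bulk_modulus": 2.0}},
]
with_elastic = [doc["formula"] for doc in documents if "elastic" in doc]

print(columns)
print(with_elastic)
```

The trade-off is that the document style moves consistency checks out of the database and into the code that reads and writes the records.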
  25. Things that worked for us (4) – day 1 open source
      •  Early in the project, we felt there was commercial and “research advantage” value in all our automation software
         –  “Let’s release open source in the future, when the code is cleaner and we have finished getting our own research mileage out of it” – Materials Project, circa 2011
      •  One BDFL experimented with day 1 open source for a new and experimental code that rewrote a major, closed-source legacy Java code in Python
         –  That code, pymatgen, grew very quickly and displaced the old legacy code in record time. It’s been cited ~300 times in just 4 years since publication!
      •  Today all our codes are open source from day 1
         –  Incidentally, if a code is not open source from day 1, we almost never see it become open source. The “clean it up and release as open source later” approach never works for us.
  26. Thank you!
      •  Prof. Kristin Persson and Prof. Gerbrand Ceder, founders of Materials Project, and their teams
      •  Prof. Shyue Ping Ong, pymatgen BDFL
      •  NERSC computing center and staff
      •  Funding: U.S. Department of Energy
      •  …and everyone who contributed to these codes!
      Slides (already) posted to: http://www.slideshare.net/anubhavster
