Computational materials design with high-throughput and machine learning methods
1. Computational materials design with high-throughput and machine learning methods
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Presentation at Apple, Sept 21 2018
Slides (already) posted to hackingmaterials.lbl.gov
2. New materials discovery for devices is difficult
• Novel materials with enhanced performance characteristics
could make a big dent in sustainability, scalability, and cost
• In practice, we tend to re-use the same fundamental materials
for decades
– solar power w/Si since 1950s
– graphite/LiCoO2 (basis of today’s Li battery electrodes) since 1990
– Bi2Te3 and PbTe thermoelectrics first studied ~1910
• Although there are lots of improvements to manufacturing,
microstructure, etc., there are not many new basic compositions
• Why is discovering better materials such a challenge?
3. What constrains traditional experimentation?
“[The Chevrel] discovery resulted from a lot of
unsuccessful experiments of Mg ions insertion
into well-known hosts for Li+ ions insertion, as
well as from the thorough literature analysis
concerning the possibility of divalent ions
intercalation into inorganic materials.”
-Aurbach group, on discovery of Chevrel cathode
for multivalent (e.g., Mg2+) batteries
Levi, Levi, Chasid, Aurbach
J. Electroceramics (2009)
4. Outline
① Density functional theory and “high-throughput”
screening of materials
– Intro to high-throughput density functional theory
– Materials Project database
– atomate
② Data mining approaches to materials design
– matminer
– matbench
– Text mining
③ Conclusion
5. What is density functional theory (DFT)?
• 1920s: The Schrödinger equation essentially contains all of chemistry
embedded within it
• it is almost always too complicated to solve due to the numerous electron
interactions and complexity of the wave function entity
• 1960s: DFT is developed and reframes the problem for ground state
properties of the system to be in terms of the charge density, not
wavefunction
• makes solutions tractable while in principle not sacrificing accuracy for
the ground state!
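The reframing can be written compactly. The following is a sketch in standard textbook notation; the symbols (Ψ for the many-electron wavefunction, n for the charge density, N for the number of electrons, E[n] for the energy functional) are not taken from the slides.

```latex
% Many-electron wavefunction: a function of 3N spatial coordinates
\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)

% Charge density: a function of only 3 spatial coordinates
n(\mathbf{r}) = N \int |\Psi(\mathbf{r}, \mathbf{r}_2, \ldots, \mathbf{r}_N)|^2
                \, d\mathbf{r}_2 \cdots d\mathbf{r}_N

% Ground-state energy as a minimum over densities (Hohenberg-Kohn)
E_0 = \min_{n} E[n]
```

The exact form of E[n] is not known, so practical calculations rely on approximate functionals such as the GGA-PBE functional mentioned later in the talk.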
6. How does one use DFT to design new materials?
A. Jain, Y. Shin, and K. A.
Persson, Nat. Rev. Mater.
1, 15004 (2016).
7. How accurate is DFT in practice?
Shown are typical DFT results for (i) Li battery voltages, (ii) electronic band gaps,
and (iii) bulk modulus
[Figure panels: (i) battery voltages, (ii) band gaps, (iii) bulk modulus]
(i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder,
Phys. Rev. B 82, 075122 (2010).
(ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010).
(iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst,
M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S.
Curtarolo, G. Ceder, K.A. Persson, and M. Asta, Sci. Data 2, 150009
(2015).
9. High-throughput DFT: a key idea
Automate the DFT procedure + Supercomputing power → High-throughput materials screening
• FireWorks: software for programming general computational workflows that can be
scaled across large supercomputers
• NERSC: supercomputing center whose processor count is equivalent to ~100,000 desktop
machines. Other centers are also viable.
G. Ceder & K.A.
Persson, Scientific
American (2015)
10. Examples of (early) high-throughput studies
Application                Researcher           Search space   Candidates   Hit rate
Scintillators              Klintenberg et al.   22,000         136          1/160
                           Curtarolo et al.     11,893         ?            ?
Topological insulators     Klintenberg et al.   60,000         17           1/3500
                           Curtarolo et al.     15,000         28           1/535
High TC superconductors    Klintenberg et al.   60,000         139          1/430
Thermoelectrics – ICSD     Curtarolo et al.     2,500          20           1/125
 - Half Heusler systems                         80,000         75           1/1055
 - Half Heusler best ZT                         80,000         18           1/4400
1-photon water splitting   Jacobsen et al.      19,000         20           1/950
2-photon water splitting   Jacobsen et al.      19,000         12           1/1585
Transparent shields        Jacobsen et al.      19,000         8            1/2375
Hg adsorbers               Bligaard et al.      5,581          14           1/400
HER catalysts              Greeley et al.       756            1            1/756*
Li ion battery cathodes    Ceder et al.         20,000         4            1/5000*
Entries marked with * have experimentally verified the candidates.
See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
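The hit rates in the table can be read as the search space divided by the number of candidates. The short sketch below recomputes three of them from the table's own numbers; the function name is illustrative, and the slide's values appear to be rounded, so small differences are expected.

```python
# Recompute hit rates from the table as (search space) / (candidates).
# The rows are copied from the table above; `one_in_n` is an illustrative name.

def one_in_n(search_space: int, candidates: int) -> float:
    """Return n such that the hit rate is 1/n."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    return search_space / candidates

rows = {
    "Scintillators (Klintenberg et al.)": (22_000, 136),
    "Topological insulators (Curtarolo et al.)": (15_000, 28),
    "Li ion battery cathodes (Ceder et al.)": (20_000, 4),
}

for name, (space, hits) in rows.items():
    print(f"{name}: about 1 in {one_in_n(space, hits):.0f}")
```

For the battery cathode row, 20,000 / 4 gives exactly 1 in 5000, matching the table.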
11. Computations predict, experiments confirm
Sidorenkite-based Li-ion battery cathodes
Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang,
Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite
(Na3MnPO4CO3): A New Intercalation Cathode Material
for Na-Ion Batteries, Chem. Mater., 2013
YCuTe2 thermoelectrics
Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs,
ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M;
Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric
Properties of Intrinsically Doped YCuTe2 with CuTe4-based
Layered Structure. J. Mat. Chem C, 2016
Li-M-O CO2 capture compounds
Dunstan, M. T., Jain, A., Liu, W., Ong, S. P., Liu, T., Lee,
J., Persson, K. A., Scott, S. A., Dennis, J. S. & Grey, C. P.
Energy and Environmental Science (2016)
More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
13. Materials Project database
• Online resource of density
functional theory simulation data
for ~85,000 inorganic materials
• Includes band structures, elastic
tensors, piezoelectric tensors,
battery properties and more
• Nearly 55,000 registered users
• Free
• www.materialsproject.org
Jain et al. Commentary: The Materials Project: A
materials genome approach to accelerating
materials innovation. APL Mater. 1, 11002 (2013).
14. Here’s an MP example that we put together three years ago but that
hasn’t yet made it to the web site
16. With HT-DFT, we can generate data rapidly – what to do next?
• >4500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine,
A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo,
G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
• >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A.
Persson, Sci. Data, 2015, 2, 150053.
• >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder,
Rignanese, Jain, & Hautier (in submission)
17. With HT-DFT, we can generate data rapidly – what to do next?
Goal: make it easy to
generate comparable
data sets on your own
18. A “black-box” view of performing a calculation
Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?”
→ “something” → Results!
19. Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?”
→ lots of tedious, low-level work… → Results!
Low-level concerns: input file flags; SLURM format; how to fix ZPOTRF?
☐ set up the structure coordinates
☐ write input files, double-check all the flags
☐ copy to supercomputer
☐ submit job to queue
☐ deal with supercomputer headaches
☐ monitor job
☐ fix error jobs, resubmit to queue, wait again
☐ repeat process for subsequent calculations in workflow
☐ parse output files to obtain results
☐ copy and organize results, e.g., into Excel
20. What would be a better way?
Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?”
→ “something” → Results!
21. What would be a better way?
Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?” → Results!
Workflows to run:
☐ band structure
☐ surface energies
☑ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
22. Ideally the method should scale to millions of calculations
Researcher asks: “Start with all binary oxides, replace O->S, run several different
properties” → Results!
Workflows to run:
☑ band structure
☑ surface energies
☑ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
☐ spin-orbit coupling
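The request on this slide (take all binary oxides, replace O with S, run several properties) can be sketched in a few lines. This is only an illustration under stated assumptions: compositions are plain dictionaries rather than pymatgen objects, the four example compositions are placeholders for a real database query, and the helper names are hypothetical.

```python
# Sketch: build (material, property) jobs by replacing O with S in binary oxides.
# Compositions are plain dicts {element: amount}; this is NOT the pymatgen API.

def is_binary_oxide(comp: dict) -> bool:
    """True if the composition has exactly two elements, one of them oxygen."""
    return len(comp) == 2 and "O" in comp

def substitute(comp: dict, old: str, new: str) -> dict:
    """Return a copy of comp with element `old` replaced by `new`.

    Amounts are summed if `new` is already present in the composition.
    """
    out = {}
    for el, amt in comp.items():
        key = new if el == old else el
        out[key] = out.get(key, 0) + amt
    return out

def formula(comp: dict) -> str:
    """Format a composition dict as a simple formula string, e.g. Ti1S2."""
    return "".join(f"{el}{amt:g}" for el, amt in comp.items())

# Placeholder input; a real study would pull compositions from a database.
compositions = [
    {"Mg": 1, "O": 1},
    {"Ti": 1, "O": 2},
    {"Li": 1, "Co": 1, "O": 2},  # ternary, so it is skipped
    {"Ga": 1, "As": 1},          # not an oxide, so it is skipped
]
properties = ["band structure", "surface energies", "elastic tensor"]

# One (material, property) pair per calculation to be queued.
jobs = [
    (formula(substitute(c, "O", "S")), prop)
    for c in compositions if is_binary_oxide(c)
    for prop in properties
]
```

The number of jobs grows as (materials) × (properties), which is why the slide stresses that the method should scale to millions of calculations.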
23. Atomate tries to make it easy, automatic, and flexible to
generate data with existing simulation packages
Researcher asks: “Run many different properties of many different materials!” → Results!
24. Each simulation procedure translates high-level instructions
into a series of low-level tasks
Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?”
Quickly and automatically translate high-level (minimal)
specifications into well-defined FireWorks workflows
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al.,
Charting the complete elastic properties of inorganic crystalline compounds,
Sci. Data. 2 (2015).
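To make the idea of translating a high-level request into low-level tasks concrete, here is a minimal sketch in plain Python. It does not use the atomate or FireWorks APIs; the `Request` class, the `RECIPES` table, and the step names are all assumptions chosen for illustration, loosely paraphrasing the manual checklist from the earlier slide.

```python
from dataclasses import dataclass

@dataclass
class Request:
    """A high-level (minimal) specification, as a researcher might state it."""
    formula: str
    functional: str
    prop: str

# Hypothetical recipes: property-specific steps, assumed for illustration only.
RECIPES = {
    "elastic tensor": [
        "relax structure",
        "run calculations on strained cells",
        "fit elastic tensor from stress-strain data",
    ],
}
# Steps shared by every recipe (also assumptions).
COMMON_PREFIX = ["set up structure coordinates", "write input files"]
COMMON_SUFFIX = ["parse output files", "store results in database"]

def expand(request: Request) -> list:
    """Translate a high-level request into an ordered list of low-level tasks."""
    if request.prop not in RECIPES:
        raise KeyError(f"no recipe for property: {request.prop}")
    steps = COMMON_PREFIX + RECIPES[request.prop] + COMMON_SUFFIX
    return [f"{step} [{request.formula}, {request.functional}]" for step in steps]

tasks = expand(Request("GaAs", "GGA-PBE", "elastic tensor"))
for task in tasks:
    print(task)
```

The design point is that the researcher supplies only three fields, while the expert knowledge about how to run the calculation lives in the recipe table.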
25. Atomate contains a library of simulation procedures
VASP-based
• band structure
• spin-orbit coupling
• hybrid functional calcs
• elastic tensor
• piezoelectric tensor
• Raman spectra
• NEB
• GIBBS method
• QH thermal expansion
• AIMD
• ferroelectric
• surface adsorption
• work functions
Other
• BoltzTraP
• FEFF method
• Q-Chem
Mathew, K. et al Atomate: A high-level interface to generate, execute, and analyze
computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
26. Full operation diagram
structure → workflow → database of all workflows → automatically submit + execute
(job 1, job 2, job 3, job 4) → output files + database
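A workflow of dependent jobs can be executed in dependency order with a topological sort. The sketch below uses Python's standard-library `graphlib` rather than FireWorks; the dependency edges between the four jobs are an assumption (the transcript does not preserve the diagram's arrows), and `fake_job` stands in for a real calculation.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: job 1 -> job 2 -> {job 3, job 4}.
# Each key maps a job to the set of jobs that must finish before it.
dependencies = {
    "job 1": set(),
    "job 2": {"job 1"},
    "job 3": {"job 2"},
    "job 4": {"job 2"},
}

def run_workflow(deps, run_job):
    """Run every job after its prerequisites and collect the results."""
    results = {}
    for job in TopologicalSorter(deps).static_order():
        inputs = {parent: results[parent] for parent in deps[job]}
        results[job] = run_job(job, inputs)
    return results

def fake_job(name, inputs):
    """Stand-in for a real calculation; records which parents fed it."""
    return f"{name} done (after {sorted(inputs)})"

# The returned dict plays the role of the results database in the diagram.
database = run_workflow(dependencies, fake_job)
```

A production workflow manager additionally handles queue submission, failures, and reruns, which this sketch leaves out.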
27. Atomate thus encodes and standardizes knowledge about
running various kinds of simulations from domain experts
All past and present knowledge, from everyone in the group,
everyone previously in the group, and our collaborators, about
how to run calculations
Contributors: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong,
B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood,
Z.K. Liu, J. Neaton, K. Persson, A. Jain
29. Machine learning: the big problem in my view is connecting
data to ML algorithms through features
• Lots of data on complex objects that you want to interrelate
• Well-developed data-mining routines (clustering, regression, feature extraction,
model-building, etc.) that work only on numbers (ideally ones with high
relevance to your problem)
• Need to transform materials science objects into a set of
physically relevant numerical data (“features” or “descriptors”)
30. Currently, it can be hard to get started with ML in materials
How can we make this transformation?
Where do we get the output data?
31. Matminer connects materials data with data mining
algorithms and data visualization libraries
Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
32. Matminer contains a library of descriptors for various
materials science entities
>40 featurizer classes can generate thousands of potential
descriptors that are described in the literature
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-learn pipelining
• automatically deploy multiprocessing to parallelize over data
• include citations to methodology papers
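The featurizer pattern shown on the slide (construct an object, then call `featurize` on an input) can be illustrated with a toy class. The class below is a hypothetical stand-in written in plain Python, not a matminer featurizer; it turns a composition into three numbers based on atomic numbers, using a small hard-coded lookup table.

```python
# Toy featurizer with a featurize() / feature_labels() interface.
# Hypothetical stand-in for illustration; not a matminer class.

ATOMIC_NUMBER = {"Li": 3, "O": 8, "Co": 27, "Ga": 31, "As": 33}

class AtomicNumberStats:
    """Map a composition {element: amount} to simple atomic-number statistics."""

    def feature_labels(self):
        return ["mean Z", "min Z", "max Z"]

    def featurize(self, comp):
        total = sum(comp.values())
        # Amount-weighted mean atomic number over the composition.
        mean_z = sum(ATOMIC_NUMBER[el] * amt for el, amt in comp.items()) / total
        zs = [ATOMIC_NUMBER[el] for el in comp]
        return [mean_z, min(zs), max(zs)]

feat = AtomicNumberStats()
materials = [{"Ga": 1, "As": 1}, {"Li": 1, "Co": 1, "O": 2}]

# Each row of X is a numeric feature vector that an ML algorithm can consume.
X = [feat.featurize(comp) for comp in materials]
```

Keeping the labels next to the values makes it straightforward to load the result into a table and hand it to a regression or classification routine.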
33. Interactive Jupyter notebooks demonstrate use cases
https://github.com/hackingmaterials/matminer_examples
Many examples available:
• Retrieving data from various databases
• Predicting bulk / shear modulus
• Predicting formation energies:
– from composition alone
– with Voronoi-based structure features included
– with Coulomb matrix and Orbital Field matrix descriptors (reproducing
previous studies in the literature)
• Making interactive visualizations
• Creating an ML pipeline
36. Performance against literature best on two regression problems
Regression performance = (variability of test set) / (average predictive error)
Problem 1 (DFT Eform): DFT-based formation energy of bulk materials based
on composition + structure
– Dataset: 24,597 crystalline materials
– Scoring: 10% held-out validation set
– Test set variability (MAD): 0.81 eV/atom
– Literature MAE: 0.12 eV/atom (best) [1]
– Matbench MAE: 0.122 ± 0.024 eV/atom
Problem 2 (Exp. Eg): experimental band gap prediction based on composition only
– Dataset: 6,354 materials
– Scoring: 20% held-out validation set
– Test set variability (SD): 1.5 eV
– Literature RMSE: 0.45 eV (best) [2]
– Matbench RMSE: 0.48 ± 0.07 eV
[1] Choudhary et al. Physical Review Materials, 2, 083801 (2018)
[2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
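The performance ratio on this slide divides the spread of the test set by the model's average error, so larger values are better. The sketch below defines the two error metrics named on the slide and applies the ratio to the quoted Matbench numbers; the function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def performance_ratio(variability, error):
    """Regression performance = variability of test set / average predictive error."""
    return variability / error

# Numbers quoted on the slide.
eform_ratio = performance_ratio(0.81, 0.122)  # MAD / Matbench MAE, about 6.6
gap_ratio = performance_ratio(1.5, 0.48)      # SD / Matbench RMSE, 3.125
```

By this measure the formation energy model removes a larger share of the test-set variability than the band gap model, although the two problems use different variability and error metrics.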
37. Performance against literature best on two classification problems
Performance metric: receiver operating characteristic area under curve (ROC-AUC)
Problem 1 (Exp. Eg = 0?): experimental gap = 0 (is metal?) based on
composition alone
– Dataset: 6,354 materials
– Scoring: 20% held-out validation set
– Literature ROC-AUC: 0.970 (best) [2]
– Matbench ROC-AUC: 0.984 ± 0.005
Problem 2 (Exp. metallic glass formation): is a composition metallic glass forming?
– Dataset: 5,313 materials
– Scoring: 20% held-out validation set
– Literature ROC-AUC: 0.952 (best) [3]
– Matbench ROC-AUC: 0.953 ± 0.006
[2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
[3] Ren, Fang et al. Science Advances, 4, 1566 (2018)
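ROC-AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal pure-Python sketch of that pairwise definition follows; it is meant as an illustration of the metric, not as the evaluation code behind the numbers above, and the four-point example is toy data.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels; scores: predicted scores.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; rank-based implementations are preferable for large datasets.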
39. An engine to label the content of scientific abstracts
Collect, clean, and extract information from millions of
published materials science journal abstracts
40. Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
42. How about new materials discovery?
• Predicting thermoelectric compositions
– Step 1: Start with all chemical compositions in our text
library
– Step 2: Identify compositions with high correlation to
the word “thermoelectric” (details TBA)
– Step 3: (optional) Filter out compositions explicitly
studied as thermoelectrics to yield only new
predictions
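The three steps can be sketched with a simple stand-in for the correlation measure. The slide leaves the actual method as "details TBA", so the code below swaps in an assumed technique: cosine similarity between the bag of words surrounding a composition and the bag of words surrounding the keyword. The material names and abstracts are made-up placeholders, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def context_vector(term, abstracts):
    """Count the words in abstracts that mention `term`, excluding the term itself."""
    term = term.lower()
    counts = Counter()
    for text in abstracts:
        tokens = tokenize(text)
        if term in tokens:
            counts.update(t for t in tokens if t != term)
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(abstracts, compositions, keyword="thermoelectric", only_new=True):
    """Steps 1-3: score each composition against the keyword, optionally dropping known ones."""
    keyword = keyword.lower()
    key_vec = context_vector(keyword, abstracts)
    ranked = []
    for comp in compositions:                      # Step 1: all compositions
        mentions = [t for t in abstracts if comp.lower() in tokenize(t)]
        already_studied = any(keyword in tokenize(t) for t in mentions)
        if only_new and already_studied:           # Step 3: keep new predictions only
            continue
        score = cosine(context_vector(comp, abstracts), key_vec)  # Step 2
        ranked.append((score, comp))
    return [comp for score, comp in sorted(ranked, reverse=True)]

# Made-up placeholder abstracts; MatA, MatB, MatC are not real materials.
toy_abstracts = [
    "MatA is a thermoelectric with high seebeck coefficient",
    "MatB shows high seebeck coefficient",
    "MatC is a transparent insulator",
]
predictions = rank_candidates(toy_abstracts, ["MatA", "MatB", "MatC"])
```

In this toy corpus MatB ranks first because it shares property words with the abstract that mentions the keyword, even though it never appears alongside the keyword itself. A realistic version would at least remove common stop words, which otherwise inflate the similarity scores.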
43. This method can predict thermoelectric materials
years in advance of actual discovery - 1
solid lines – yet unreported as thermoelectric
dashed lines – already reported in literature as thermoelectric
Note: each year is trained only on abstracts published until that year
44. This method can predict thermoelectric materials
years in advance of actual discovery - 2
Top materials are significantly more likely to be
studied as thermoelectrics in later years
Note: each year is trained only on abstracts published until that year
48. Conclusions
• High-throughput density functional theory and machine learning are a new
set of tools for doing materials science
• We are developing many methods and software implementations to try to
advance the field
– pymatgen (materials analysis) -- www.pymatgen.org
– FireWorks (workflow management) -- https://materialsproject.github.io/fireworks
– atomate (materials science workflows) -- https://hackingmaterials.github.io/atomate
– matminer (materials data mining) -- https://hackingmaterials.github.io/matminer
– matbench (automatic data mining) -- https://hackingmaterials.github.io/matbench
– text mining tools under development
• If you are interested, give the software a try!
– basic support available via help forums (see code docs)
– enterprise support also available
Timeline: Quantum mechanics (1920s) → Density functional theory (1960s) →
High-throughput DFT (2000s) → Materials databases (2010s) → Machine learning (2010s)
49. Thank you!
• Materials Project
– K. Persson (director)
• Atomate
– K. Mathew
• Matminer
– L. Ward
• Matbench
– A. Dunn
• Text mining
– V. Tshitoyan, J. Dagdelen, L. Weston
• Funding:
– DOE-BES (MP)
– DOE-BES (ECRP)
– Toyota Research Institute
• Computing: NERSC