Computational materials design with high-throughput and machine learning methods
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Presentation at Apple, Sept 21 2018
Slides (already) posted to hackingmaterials.lbl.gov
New materials discovery for devices is difficult
•  Novel materials with enhanced performance characteristics
could make a big dent in sustainability, scalability, and cost
•  In practice, we tend to re-use the same fundamental materials
for decades
–  solar power w/Si since 1950s
–  graphite/LiCoO2 (basis of today’s Li battery electrodes) since 1990
–  Bi2Te3 and PbTe thermoelectrics first studied ~1910
•  Although there are lots of improvements to manufacturing,
microstructure, etc., there are not many new basic compositions
•  Why is discovering better materials such a challenge?
2
What constrains traditional experimentation?
3
“[The Chevrel] discovery resulted from a lot of
unsuccessful experiments of Mg ions insertion
into well-known hosts for Li+ ions insertion, as
well as from the thorough literature analysis
concerning the possibility of divalent ions
intercalation into inorganic materials.”
-Aurbach group, on discovery of Chevrel cathode
for multivalent (e.g., Mg2+) batteries
Levi, Levi, Chasid, Aurbach
J. Electroceramics (2009)
Outline
4
①  Density functional theory and “high-throughput”
screening of materials
–  Intro to high-throughput density functional theory
–  Materials Project database
–  atomate
②  Data mining approaches to materials design
–  matminer
–  matbench
–  Text mining
③  Conclusion
What is density functional theory (DFT)?
5
•  1920s: The Schrödinger equation essentially contains all of chemistry
embedded within it
–  but it is almost always too complicated to solve, because of the many
electron interactions and the complexity of the wavefunction
•  1960s: DFT is developed and reframes the problem of ground-state
properties in terms of the charge density rather than the wavefunction
–  this makes solutions tractable while, in principle, not sacrificing
accuracy for the ground state!
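For orientation, the reframing can be written in one line. This is the standard textbook statement of the Hohenberg-Kohn variational principle, not a formula taken from the slides; the symbols (n for the electron density, F[n] for the universal functional, v_ext for the external potential, N for the number of electrons) are introduced here. The point is that the unknown is a function of three coordinates rather than a wavefunction of 3N coordinates.

```latex
E_0 = \min_{n} \left\{ F[n] + \int v_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r} \right\},
\qquad \int n(\mathbf{r})\, d\mathbf{r} = N
```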
How does one use DFT to design new materials?
6
A. Jain, Y. Shin, and K. A.
Persson, Nat. Rev. Mater.
1, 15004 (2016).
How accurate is DFT in practice?
7
Shown are typical DFT results for (i) Li battery voltages, (ii) electronic band gaps,
and (iii) bulk modulus
[Figure panels: (i) battery voltages, (ii) band gaps, (iii) bulk modulus]
(i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder,
Phys. Rev. B 82, 075122 (2010).
(ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010).
(iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst,
M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S.
Curtarolo, G. Ceder, K.A. Persson, and M. Asta, Sci. Data 2, 150009
(2015).
Some limitations of DFT are addressed by other techniques
8
Source: NASA
High-throughput DFT: a key idea
9
Automate the DFT procedure + supercomputing power → high-throughput materials screening
•  FireWorks: software for programming general computational workflows that can be
scaled across large supercomputers
•  NERSC: supercomputing center with a processor count equivalent to ~100,000 desktop
machines; other centers are also viable
G. Ceder & K.A. Persson, Scientific American (2015)
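To make "automate the procedure" concrete, here is a minimal sketch of a dependency-ordered workflow runner. It uses only the Python standard library (graphlib, Python 3.9+) and is not the FireWorks API; the function `run_workflow`, the task names, and the placeholder numbers are all invented for illustration.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run every task once, after all of the tasks it depends on.

    tasks:        dict mapping a task name to a function that receives the results so far
    dependencies: dict mapping a task name to the set of task names that must run first
    """
    results = {}
    order = []
    # static_order() yields task names so that prerequisites always come first
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


# Hypothetical three-step "calculation"; the numbers are placeholders, not DFT output.
tasks = {
    "relax": lambda done: {"volume": 45.0},
    "static": lambda done: {"energy": -2.0 * done["relax"]["volume"]},
    "parse": lambda done: round(done["static"]["energy"], 1),
}
dependencies = {"relax": set(), "static": {"relax"}, "parse": {"static"}}

order, results = run_workflow(tasks, dependencies)
print(order)
print(results["parse"])
```

A real workflow manager adds what this sketch leaves out: queue submission, error recovery, and a database of job states.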
Examples of (early) high-throughput studies
10
Application                 Researcher          Search space  Candidates  Hit rate
Scintillators               Klintenberg et al.  22,000        136         1/160
Scintillators               Curtarolo et al.    11,893        ?           ?
Topological insulators      Klintenberg et al.  60,000        17          1/3500
Topological insulators      Curtarolo et al.    15,000        28          1/535
High TC superconductors     Klintenberg et al.  60,000        139         1/430
Thermoelectrics – ICSD      Curtarolo et al.    2,500         20          1/125
  - Half Heusler systems    Curtarolo et al.    80,000        75          1/1055
  - Half Heusler best ZT    Curtarolo et al.    80,000        18          1/4400
1-photon water splitting    Jacobsen et al.     19,000        20          1/950
2-photon water splitting    Jacobsen et al.     19,000        12          1/1585
Transparent shields         Jacobsen et al.     19,000        8           1/2375
Hg adsorbers                Bligaard et al.     5,581         14          1/400
HER catalysts               Greeley et al.      756           1           1/756*
Li ion battery cathodes     Ceder et al.        20,000        4           1/5000*
Entries marked with * have experimentally verified the candidates.
See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
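The hit-rate column is the search-space size divided by the number of candidates, rounded on the slide. A small sketch of that arithmetic follows; the function name `hit_rate` is introduced here, and the inputs are a few rows copied from the table.

```python
def hit_rate(search_space, candidates):
    """Return N such that roughly one candidate emerges per N compounds screened."""
    return search_space / candidates


# (search space, candidates) for a few rows of the table above
rows = {
    "Scintillators (Klintenberg et al.)": (22000, 136),
    "1-photon water splitting (Jacobsen et al.)": (19000, 20),
    "HER catalysts (Greeley et al.)": (756, 1),
    "Li ion battery cathodes (Ceder et al.)": (20000, 4),
}

for name, (space, found) in rows.items():
    print(f"{name}: about 1/{hit_rate(space, found):.0f}")
```

The slide rounds some values (for example 22,000/136 is closer to 1/162 than 1/160), so small differences from the table are expected.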
Computations predict, experiments confirm
11
•  Sidorenkite-based Li-ion battery cathodes
–  Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang,
Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite
(Na3MnPO4CO3): A New Intercalation Cathode Material
for Na-Ion Batteries, Chem. Mater., 2013
•  YCuTe2 thermoelectrics
–  Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs,
ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M;
Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric
Properties of Intrinsically Doped YCuTe2 with CuTe4-based
Layered Structure. J. Mat. Chem C, 2016
•  Li-M-O CO2 capture compounds
–  Dunstan, M. T., Jain, A., Liu, W., Ong, S. P., Liu, T., Lee,
J., Persson, K. A., Scott, S. A., Dennis, J. S. & Grey, C. P.
Energy and Environmental Science (2016)
More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
Materials Project database
•  Online resource of density
functional theory simulation data
for ~85,000 inorganic materials
•  Includes band structures, elastic
tensors, piezoelectric tensors,
battery properties and more
•  Nearly 55,000 registered users
•  Free
•  www.materialsproject.org
13
Jain et al. Commentary: The Materials Project: A
materials genome approach to accelerating
materials innovation. APL Mater. 1, 011002 (2013).
Here’s an MP example that we put together three years ago but
that hasn’t yet made it to the web site
14
With HT-DFT, we can generate data rapidly – what to do next?
16
•  >4500 elastic tensors
–  M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K.
Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson,
and M. Asta, Sci. Data, 2015, 2, 150009.
•  >900 piezoelectric tensors
–  M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2,
150053.
•  >48000 Seebeck coefficients + cRTA transport
–  Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
Goal: make it easy to
generate comparable
data sets on your own
A “black-box” view of performing a calculation
18
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
19
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work… → Results!
Also on the researcher’s mind: input file flags, SLURM format, how to fix ZPOTRF?
☐  set up the structure coordinates
☐  write input files, double-check all the flags
☐  copy to supercomputer
☐  submit job to queue
☐  deal with supercomputer headaches
☐  monitor job
☐  fix error jobs, resubmit to queue, wait again
☐  repeat process for subsequent calculations in workflow
☐  parse output files to obtain results
☐  copy and organize results, e.g., into Excel
What would be a better way?
21
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → Results!
Workflows to run:
☐  band structure
☐  surface energies
✓  elastic tensor
☐  Raman spectrum
☐  QH thermal expansion
Ideally the method should scale to millions of calculations
22
researcher: “Start with all binary oxides, replace O->S, run several different properties” → Results!
Workflows to run:
✓  band structure
✓  surface energies
✓  elastic tensor
☐  Raman spectrum
☐  QH thermal expansion
☐  spin-orbit coupling
Atomate tries to make it easy, automatic, and flexible to
generate data with existing simulation packages
23
researcher: “Run many different properties of many different materials!” → Results!
Each simulation procedure translates high-level instructions
into a series of low-level tasks
24
Quickly and automatically translate high-level (minimal)
specifications into well-defined FireWorks workflows
Example request: “What is the GGA-PBE elastic tensor of GaAs?”
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al.,
Charting the complete elastic properties of inorganic crystalline compounds,
Sci. Data. 2 (2015).
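As a rough illustration of what "translating a high-level request into low-level tasks" can look like, the sketch below expands one request into an ordered task list. It is not atomate's implementation or API: the function `elastic_tensor_tasks`, the task dictionaries, and the strain grid (6 strain components at four magnitudes) are assumptions chosen for illustration.

```python
from itertools import product


def elastic_tensor_tasks(formula, functional="GGA-PBE",
                         strain_components=6,
                         strain_magnitudes=(-0.01, -0.005, 0.005, 0.01)):
    """Expand a one-line request into the individual calculations it implies."""
    # 1. relax the input structure first
    tasks = [{"step": "relax structure", "formula": formula, "functional": functional}]
    # 2. one static calculation per (strain component, strain magnitude) pair
    for component, magnitude in product(range(strain_components), strain_magnitudes):
        tasks.append({
            "step": "static calculation on deformed structure",
            "formula": formula,
            "functional": functional,
            "strain_component": component,
            "strain": magnitude,
        })
    # 3. a final analysis step that uses all of the stress-strain data
    tasks.append({"step": "fit stress-strain data to elastic tensor", "formula": formula})
    return tasks


tasks = elastic_tensor_tasks("GaAs")
print(len(tasks))
print(tasks[0]["step"], "...", tasks[-1]["step"])
```

The researcher supplies only the formula and the functional; everything else is filled in from defaults that can be overridden when needed.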
Atomate contains a library of simulation procedures
25
VASP-based
•  band structure
•  spin-orbit coupling
•  hybrid functional calcs
•  elastic tensor
•  piezoelectric tensor
•  Raman spectra
•  NEB
•  GIBBS method
•  QH thermal expansion
•  AIMD
•  ferroelectric
•  surface adsorption
•  work functions
Other
•  BoltzTraP
•  FEFF method
•  Q-Chem
Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze
computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
26
Full operation diagram
structure → workflow → database of all workflows → automatically submit + execute
(job 1, job 2, job 3, job 4) → output files + database
Atomate thus encodes and standardizes knowledge about
running various kinds of simulations from domain experts
27
All past and present knowledge, from everyone in the group,
everyone previously in the group, and our collaborators, about
how to run calculations
K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong,
B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood,
Z.K. Liu, J. Neaton, K. Persson, A. Jain, and others
Machine learning: the big problem in my view is connecting
data to ML algorithms through features
29
•  Lots of data on complex objects that you want to interrelate
•  Well-developed data-mining routines (clustering, regression, feature
extraction, model-building, etc.) that work only on numbers (ideally ones
with high relevance to your problem)
Need to transform materials science objects into a set of
physically relevant numerical data (“features” or “descriptors”)
30
Currently, it can be hard to get started with ML in materials
How can we make
this transformation?
Where do we get
the output data?
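A toy example of the transformation in question: turning a chemical formula into a short vector of numbers that a generic ML routine can consume. This is a minimal sketch using only the standard library, not matminer's API; the functions `parse_formula` and `featurize`, the choice of features, and the tiny hard-coded element table are all introduced here for illustration.

```python
import re

# Atomic numbers for the few elements used below (a tiny hard-coded table).
ATOMIC_NUMBER = {"Li": 3, "O": 8, "Si": 14, "Co": 27, "Ga": 31,
                 "As": 33, "Te": 52, "Pb": 82, "Bi": 83}


def parse_formula(formula):
    """Parse a simple formula such as 'Bi2Te3' (no parentheses) into element amounts."""
    amounts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        amounts[element] = amounts.get(element, 0.0) + (float(count) if count else 1.0)
    return amounts


def featurize(formula):
    """Return [number of elements, fraction-weighted mean Z, max Z, min Z]."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    numbers = [ATOMIC_NUMBER[element] for element in amounts]
    mean_z = sum(ATOMIC_NUMBER[el] * amount / total for el, amount in amounts.items())
    return [float(len(amounts)), mean_z, float(max(numbers)), float(min(numbers))]


for formula in ("GaAs", "LiCoO2", "Bi2Te3"):
    print(formula, featurize(formula))
```

Real descriptors are usually richer (electronegativity, ionic radii, structural fingerprints), but the shape of the problem is the same: object in, fixed-length list of numbers out.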
Matminer connects materials data with data mining
algorithms and data visualization libraries
31
Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
32
Matminer contains a library of descriptors for various
materials science entities
>40 featurizer classes can generate thousands of potential
descriptors that are described in the literature
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
•  compatible with scikit-learn pipelining
•  automatically deploy multiprocessing to parallelize over data
•  include citations to methodology papers
33
Interactive Jupyter notebooks demonstrate use cases
https://github.com/hackingmaterials/matminer_examples
Many examples available:
•  Retrieving data from various databases
•  Predicting bulk / shear modulus
•  Predicting formation energies:
–  from composition alone
–  with Voronoi-based structure features included
–  with Coulomb matrix and Orbital Field matrix descriptors (reproducing
previous studies in the literature)
•  Making interactive visualizations
•  Creating an ML pipeline
35
Matbench: use matminer to create a black-box optimizer
Performance against literature best on two
regression problems
36
•  Problem 1 (DFT Eform): DFT-based formation energy of bulk materials based
on composition + structure
–  Dataset: 24,597 crystalline mats
–  Scoring: 10% held validation set
–  Test set variability (MAD): 0.81 eV/atom
–  Literature MAE: 0.12 eV/atom (best) [1]
–  Matbench MAE: 0.122 ± 0.024 eV/atom
•  Problem 2 (Exp. Eg): experimental band gap prediction based on
composition only
–  Dataset: 6,354 mats
–  Scoring: 20% held validation set
–  Test set variability (SD): 1.5 eV
–  Literature RMSE: 0.45 eV (best) [2]
–  Matbench RMSE: 0.48 ± 0.07 eV
Regression performance = (variability of test set) / (average predictive error)
[1] Choudhary et al. Physical Review Materials, 2, 083801 (2018)
[2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
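The regression-performance ratio on this slide (variability of the test set divided by average predictive error) can be evaluated directly from the quoted numbers. A minimal sketch, with the function name `regression_performance` introduced here; higher values mean the model's error is small relative to the spread of the data.

```python
def regression_performance(test_set_variability, average_error):
    """Ratio of test-set variability to average predictive error (same units for both)."""
    return test_set_variability / average_error


# Values quoted on the slide: (variability, literature error, Matbench error)
problems = {
    "DFT formation energy (eV/atom, MAD vs MAE)": (0.81, 0.12, 0.122),
    "Experimental band gap (eV, SD vs RMSE)": (1.5, 0.45, 0.48),
}

for name, (variability, literature, matbench) in problems.items():
    lit = regression_performance(variability, literature)
    mb = regression_performance(variability, matbench)
    print(f"{name}: literature {lit:.2f}, Matbench {mb:.2f}")
```

With these inputs the Matbench ratios come out to roughly 6.6 for formation energy and 3.1 for band gap, close to the literature-best ratios in both cases.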
Performance against literature best on two
classification problems
37
•  Problem 1 (Exp. Eg = 0?): experimental gap = 0 (is metal?) based on
composition alone
–  Dataset: 6,354 mats
–  Scoring: 20% held validation set
–  Literature ROC-AUC: 0.970 (best) [2]
–  Matbench ROC-AUC: 0.984 ± 0.005
•  Problem 2 (Exp. Metallic Glass Formation): is a composition
metallic glass forming?
–  Dataset: 5,313 mats
–  Scoring: 20% held validation set
–  Literature ROC-AUC: 0.952 (best) [3]
–  Matbench ROC-AUC: 0.953 ± 0.006
Performance: Receiver Operating Characteristic Area Under Curve (ROC-AUC)
[2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
[3] Ren, Fang et al. Science Advances, 4, 1566 (2018)
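For readers unfamiliar with the metric, ROC-AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below computes it by direct pair counting; the function name `roc_auc` and the toy labels and scores are invented for illustration and are unrelated to the datasets above.

```python
def roc_auc(labels, scores):
    """ROC-AUC by pair counting: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = "is a metal", score = model confidence
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why scores of 0.95 to 0.98 indicate strong classifiers. The pair-counting loop is quadratic in dataset size; sorting-based implementations are preferable for large sets.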
39
An engine to label the content of scientific abstracts
Collect, clean, and extract information from millions of
published materials science journal abstracts
40
Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
41
Application: materials compositions of interest …
A search for thermoelectrics that do not have Pb or Bi
•  Predicting thermoelectric compositions
–  Step 1: Start with all chemical compositions in our text
library
–  Step 2: Identify compositions with high correlation to
the word “thermoelectric” (details TBA)
–  Step 3: (optional) Filter out compositions explicitly
studied as thermoelectrics to yield only new
predictions
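The three steps above can be sketched on a toy corpus. The slides leave the correlation measure unspecified ("details TBA"), so the scoring used here, cosine similarity between the context words of a composition and the context words of "thermoelectric", is a stand-in chosen for illustration and should not be read as the authors' method. The placeholder compositions (A2B3, CD, EF2, GH), the tokenized abstracts, and all function names are invented.

```python
from collections import Counter
from math import sqrt


def context_vector(word, abstracts, exclude):
    """Count words that share an abstract with `word`, skipping anything in `exclude`."""
    counts = Counter()
    for tokens in abstracts:
        if word in tokens:
            counts.update(t for t in tokens if t not in exclude)
    return counts


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_compositions(abstracts, compositions, target="thermoelectric", only_new=True):
    # Step 1: start from all compositions in the text library
    exclude = set(compositions) | {target}
    target_vec = context_vector(target, abstracts, exclude)
    ranked = []
    for comp in compositions:
        # Step 3 (optional): drop compositions already mentioned alongside the target word
        studied = any(comp in tokens and target in tokens for tokens in abstracts)
        if only_new and studied:
            continue
        # Step 2: score by similarity of surrounding vocabulary
        ranked.append((comp, cosine(context_vector(comp, abstracts, exclude), target_vec)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


abstracts = [
    ["A2B3", "thermoelectric", "seebeck", "thermal", "conductivity"],
    ["CD", "thermoelectric", "seebeck", "band", "gap"],
    ["EF2", "seebeck", "thermal", "conductivity"],
    ["GH", "cathode", "voltage", "capacity"],
]
compositions = ["A2B3", "CD", "EF2", "GH"]

ranked = rank_compositions(abstracts, compositions)
print(ranked)
```

In this toy corpus EF2 is never called a thermoelectric, yet it ranks first because its abstracts share vocabulary with thermoelectric abstracts; GH, described in battery terms, scores zero.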
42
How about new materials discovery?
43
This method can predict thermoelectric materials
years in advance of actual discovery - 1
solid lines – not yet reported as thermoelectric
dashed lines – already reported in literature as thermoelectric
Note: each year is trained only on abstracts published until that year
44
This method can predict thermoelectric materials
years in advance of actual discovery - 2
Top materials are significantly more likely to be
studied as thermoelectrics in later years
Note: each year is trained only on abstracts published until that year
45
Independent computations also support the promise of
text-mining based composition predictions
46
How does this work? (schematic)
Conclusions
48
•  High-throughput density functional theory and machine learning are a new
set of tools for doing materials science
•  We are developing many methods and software implementations to try to
advance the field
–  pymatgen (materials analysis) -- www.pymatgen.org
–  FireWorks (workflow management) -- https://materialsproject.github.io/fireworks
–  atomate (materials science workflows) -- https://hackingmaterials.github.io/atomate
–  matminer (materials data mining) -- https://hackingmaterials.github.io/matminer
–  matbench (automatic data mining) -- https://hackingmaterials.github.io/matbench
–  text mining tools under development
•  If you are interested, give the software a try!
–  basic support available via help forums (see code docs)
–  enterprise support also available
Timeline: quantum mechanics (1920s) → density functional theory (1960s) →
high-throughput DFT (2000s) → materials databases (2010s) → machine learning (2010s)
•  Materials Project
–  K. Persson (director)
•  Atomate
–  K. Mathew
•  Matminer
–  L. Ward
•  Matbench
–  A. Dunn
•  Text mining
–  V. Tshitoyan, J. Dagdelen, L. Weston
•  Funding:
–  DOE-BES (MP)
–  DOE-BES (ECRP)
–  Toyota Research Institute
•  Computing: NERSC
49
Thank you!
Areesha Ahmad
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
AlMamun560346
 

Kürzlich hochgeladen (20)

High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 

Computational materials design with high-throughput and machine learning methods

  • 1. Computational materials design with high-throughput and machine learning methods. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. Presentation at Apple, Sept 21 2018. Slides (already) posted to hackingmaterials.lbl.gov
  • 2. New materials discovery for devices is difficult
      •  Novel materials with enhanced performance characteristics could make a big dent in sustainability, scalability, and cost
      •  In practice, we tend to re-use the same fundamental materials for decades
          –  solar power w/Si since 1950s
          –  graphite/LiCoO2 (basis of today’s Li battery electrodes) since 1990
          –  Bi2Te3 and PbTe thermoelectrics first studied ~1910
      •  Although there are lots of improvements to manufacturing, microstructure, etc., there are not many new basic compositions
      •  Why is discovering better materials such a challenge?
  • 3. What constrains traditional experimentation? “[The Chevrel] discovery resulted from a lot of unsuccessful experiments of Mg ions insertion into well-known hosts for Li+ ions insertion, as well as from the thorough literature analysis concerning the possibility of divalent ions intercalation into inorganic materials.” (Aurbach group, on discovery of Chevrel cathode for multivalent (e.g., Mg2+) batteries; Levi, Levi, Chasid, Aurbach, J. Electroceramics (2009))
  • 4. Outline: ① Density functional theory and “high-throughput” screening of materials (intro to high-throughput density functional theory; Materials Project database; atomate) ② Data mining approaches to materials design (matminer; matbench; text mining) ③ Conclusion
  • 5. What is density functional theory (DFT)?
      •  1920s: The Schrödinger equation essentially contains all of chemistry embedded within it
      •  it is almost always too complicated to solve due to the numerous electron interactions and complexity of the wave function entity
      •  1960s: DFT is developed and reframes the problem for ground state properties of the system to be in terms of the charge density, not wavefunction
      •  makes solutions tractable while in principle not sacrificing accuracy for the ground state!
  • 6. How does one use DFT to design new materials? A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  • 7. How accurate is DFT in practice? Shown are typical DFT results for (i) Li battery voltages, (ii) electronic band gaps, and (iii) bulk modulus.
      (i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder, Phys. Rev. B 82, 075122 (2010).
      (ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010).
      (iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data 2, 150009 (2015).
  • 8. Some limitations of DFT are addressed by other techniques. Source: NASA
  • 9. High-throughput DFT: a key idea. Automate the DFT procedure and add supercomputing power to enable high-throughput materials screening.
      •  FireWorks: software for programming general computational workflows that can be scaled across large supercomputers
      •  NERSC: supercomputing center; processor count is equivalent to ~100,000 desktop machines. Other centers are also viable.
      •  G. Ceder & K.A. Persson, Scientific American (2015)
  • 10. Examples of (early) high-throughput studies
      Application | Researcher | Search space | Candidates | Hit rate
      Scintillators | Klintenberg et al. | 22,000 | 136 | 1/160
      Scintillators | Curtarolo et al. | 11,893 | ? | ?
      Topological insulators | Klintenberg et al. | 60,000 | 17 | 1/3500
      Topological insulators | Curtarolo et al. | 15,000 | 28 | 1/535
      High TC superconductors | Klintenberg et al. | 60,000 | 139 | 1/430
      Thermoelectrics – ICSD | Curtarolo et al. | 2,500 | 20 | 1/125
      Thermoelectrics – Half Heusler systems | Curtarolo et al. | 80,000 | 75 | 1/1055
      Thermoelectrics – Half Heusler best ZT | Curtarolo et al. | 80,000 | 18 | 1/4400
      1-photon water splitting | Jacobsen et al. | 19,000 | 20 | 1/950
      2-photon water splitting | Jacobsen et al. | 19,000 | 12 | 1/1585
      Transparent shields | Jacobsen et al. | 19,000 | 8 | 1/2375
      Hg adsorbers | Bligaard et al. | 5,581 | 14 | 1/400
      HER catalysts | Greeley et al. | 756 | 1 | 1/756*
      Li ion battery cathodes | Ceder et al. | 20,000 | 4 | 1/5000*
      Entries marked with * have experimentally verified the candidates. See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
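The hit rates quoted on slide 10 are consistent with dividing the search space by the number of candidates. The following is a minimal sketch of that arithmetic using three rows from the slide; the function name `hit_rate_denominator` is illustrative and not part of any package mentioned in the talk.

```python
def hit_rate_denominator(search_space: int, candidates: int) -> float:
    """Return N such that the hit rate is 1/N (compounds screened per candidate)."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    return search_space / candidates


# (search space, candidates) pairs copied from the slide.
rows = {
    "HER catalysts (Greeley et al.)": (756, 1),
    "1-photon water splitting (Jacobsen et al.)": (19_000, 20),
    "Li ion battery cathodes (Ceder et al.)": (20_000, 4),
}

for name, (space, hits) in rows.items():
    print(f"{name}: 1/{hit_rate_denominator(space, hits):.0f}")
```

Some rows on the slide appear to be rounded, so the computed value may differ slightly from the quoted one.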
  • 11. Computations predict, experiments confirm
      •  Sidorenkite-based Li-ion battery cathodes: Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang, Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite (Na3MnPO4CO3): A New Intercalation Cathode Material for Na-Ion Batteries, Chem. Mater., 2013
      •  YCuTe2 thermoelectrics: Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs, ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M; Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric Properties of Intrinsically Doped YCuTe2 with CuTe4-based Layered Structure. J. Mat. Chem C, 2016
      •  Li-M-O CO2 capture compounds: Dunstan, M. T., Jain, A., Liu, W., Ong, S. P., Liu, T., Lee, J., Persson, K. A., Scott, S. A., Dennis, J. S. & Grey, C. Energy and Environmental Science (2016)
      •  More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  • 12. Outline: ① Density functional theory and “high-throughput” screening of materials (intro to high-throughput density functional theory; Materials Project database; atomate) ② Data mining approaches to materials design (matminer; matbench; text mining) ③ Conclusion
  • 13. Materials Project database
      •  Online resource of density functional theory simulation data for ~85,000 inorganic materials
      •  Includes band structures, elastic tensors, piezoelectric tensors, battery properties and more
      •  Nearly 55,000 registered users
      •  Free
      •  www.materialsproject.org
      Jain et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 11002 (2013).
  • 14. Here’s an MP example we put together three years ago but that hasn’t yet made it to the web site
  • 15. Outline: ① Density functional theory and “high-throughput” screening of materials (intro to high-throughput density functional theory; Materials Project database; atomate) ② Data mining approaches to materials design (matminer; matbench; text mining) ③ Conclusion
  • 16. With HT-DFT, we can generate data rapidly – what to do next?
      •  >4500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
      •  >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
      •  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
  • 17. With HT-DFT, we can generate data rapidly – what to do next?
      •  >4500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
      •  >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
      •  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
      •  Goal: make it easy to generate comparable data sets on your own
  • 18. A “black-box” view of performing a calculation. Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results
  • 19. Unfortunately, the inside of the “black box” is usually tedious and “low-level”. Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work (input file flags, SLURM format, how to fix ZPOTRF?) → results
      •  set up the structure coordinates
      •  write input files, double-check all the flags
      •  copy to supercomputer
      •  submit job to queue
      •  deal with supercomputer headaches
      •  monitor job
      •  fix error jobs, resubmit to queue, wait again
      •  repeat process for subsequent calculations in workflow
      •  parse output files to obtain results
      •  copy and organize results, e.g., into Excel
  • 20. What would be a better way? Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results
  • 21. What would be a better way? Researcher asks: “What is the GGA-PBE elastic tensor of GaAs?” → results. Workflows to run: [ ] band structure, [ ] surface energies, [x] elastic tensor, [ ] Raman spectrum, [ ] QH thermal expansion
  • 22. Ideally the method should scale to millions of calculations. Researcher asks: “Start with all binary oxides, replace O->S, run several different properties” → results. Workflows to run: [x] band structure, [x] surface energies, [x] elastic tensor, [ ] Raman spectrum, [ ] QH thermal expansion, [ ] spin-orbit coupling
  • 23. Atomate tries to make it easy, automatic, and flexible to generate data with existing simulation packages. Researcher asks: “Run many different properties of many different materials!” → results
  • 24. Each simulation procedure translates high-level instructions into a series of low-level tasks: quickly and automatically translate high-level (minimal) specifications, such as “What is the GGA-PBE elastic tensor of GaAs?”, into well-defined FireWorks workflows. M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al., Charting the complete elastic properties of inorganic crystalline compounds, Sci. Data. 2 (2015).
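The idea on slide 24, expanding one high-level request into an ordered series of low-level tasks, can be sketched in plain Python. This is only a toy illustration: the `Task` class, the `elastic_tensor_workflow` function, and the three step names are hypothetical and do not reproduce the atomate or FireWorks APIs.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    """One low-level step; depends_on lists the names of prerequisite steps."""
    name: str
    depends_on: list[str] = field(default_factory=list)


def elastic_tensor_workflow(formula: str) -> list[Task]:
    """Hypothetical expansion of a single high-level request into steps."""
    relax = f"{formula}: relax structure"
    deform = f"{formula}: run deformed-structure calculations"
    fit = f"{formula}: fit elastic tensor"
    return [Task(relax), Task(deform, [relax]), Task(fit, [deform])]


def execution_order(tasks: list[Task]) -> list[str]:
    """Order tasks so that every prerequisite runs before its dependents."""
    graph = {task.name: set(task.depends_on) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


for step in execution_order(elastic_tensor_workflow("GaAs")):
    print(step)
```

Representing dependencies explicitly is what lets a workflow manager decide which jobs can be submitted next, which is the role the slides assign to FireWorks.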
  • 25. Atomate contains a library of simulation procedures
      •  VASP-based: band structure, spin-orbit coupling, hybrid functional calcs, elastic tensor, piezoelectric tensor, Raman spectra, NEB, GIBBS method, QH thermal expansion, AIMD, ferroelectric, surface adsorption, work functions
      •  Other: BoltzTraP, FEFF method, Q-Chem
      Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
  • 26. Full operation diagram: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database
  • 27. Atomate thus encodes and standardizes knowledge about running various kinds of simulations from domain experts
      •  All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations
      •  K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain
  • 28. Outline: ① Density functional theory and “high-throughput” screening of materials (intro to high-throughput density functional theory; Materials Project database; atomate) ② Data mining approaches to materials design (matminer; matbench; text mining) ③ Conclusion
  • 29. Machine learning: the big problem in my view is connecting data to ML algorithms through features
      •  Lots of data on complex objects that you want to interrelate
      •  Need to transform materials science objects into a set of physically relevant numerical data (“features” or “descriptors”)
      •  Well developed data-mining routines that work only on numbers (ideally ones with high relevance to your problem): clustering, regression, feature extraction, model-building, etc.
  • 30. Currently, it can be hard to get started with ML in materials. How can we make this transformation? Where do we get the output data?
  • 31. Matminer connects materials data with data mining algorithms and data visualization libraries. Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
  • 32. Matminer contains a library of descriptors for various materials science entities
      •  >40 featurizer classes can generate thousands of potential descriptors that are described in the literature
      •  usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data])
      •  compatible with scikit-learn pipelining
      •  automatically deploy multiprocessing to parallelize over data
      •  include citations to methodology papers
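The `feat = ...; y = feat.featurize(...)` call pattern on slide 32 can be illustrated with a toy featurizer that does not import matminer. Everything below is an assumption made for illustration: the class `MeanAtomicNumber`, its `feature_labels` method, and the small `ATOMIC_NUMBER` table are hypothetical and only mirror the shape of the interface shown on the slide.

```python
# Atomic numbers for the handful of elements used in this toy example.
ATOMIC_NUMBER = {"Li": 3, "O": 8, "Co": 27, "Ga": 31, "As": 33}


class MeanAtomicNumber:
    """Hypothetical featurizer: composition dict -> [mean Z, max Z]."""

    def feature_labels(self) -> list[str]:
        return ["mean atomic number", "max atomic number"]

    def featurize(self, composition: dict[str, float]) -> list[float]:
        # Weight each element's atomic number by its amount in the formula.
        total = sum(composition.values())
        mean_z = sum(ATOMIC_NUMBER[el] * n for el, n in composition.items()) / total
        max_z = max(ATOMIC_NUMBER[el] for el in composition)
        return [mean_z, float(max_z)]


feat = MeanAtomicNumber()
y = feat.featurize({"Ga": 1, "As": 1})
print(dict(zip(feat.feature_labels(), y)))
```

The point of the pattern is that a materials object goes in and a fixed-length list of numbers comes out, which is the form that generic data-mining routines expect.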
  • 33. Interactive Jupyter notebooks demonstrate use cases: https://github.com/hackingmaterials/matminer_examples
      Many examples available:
      •  Retrieving data from various databases
      •  Predicting bulk / shear modulus
      •  Predicting formation energies:
          –  from composition alone
          –  with Voronoi-based structure features included
          –  with Coulomb matrix and Orbital Field matrix descriptors (reproducing previous studies in the literature)
      •  Making interactive visualizations
      •  Creating an ML pipeline
  • 34. Outline: ① Density functional theory and “high-throughput” screening of materials (intro to high-throughput density functional theory; Materials Project database; atomate) ② Data mining approaches to materials design (matminer; matbench; text mining) ③ Conclusion
  • 35. Matbench: use matminer to create a black-box optimizer
  • 36. Performance against literature best on two regression problems
      Regression performance = (variability of test set) / (average predictive error)
      •  Problem 1 (DFT Eform): DFT-based formation energy of bulk materials based on composition + structure
          –  Dataset: 24,597 crystalline mats
          –  Scoring: 10% held validation set
          –  Test set variability (MAD): 0.81 eV/atom
          –  Literature MAE: 0.12 eV/atom (best) [1]
          –  Matbench MAE: 0.122 ± 0.024 eV/atom
      •  Problem 2 (Exp. Eg): experimental band gap prediction based on composition only
          –  Dataset: 6,354 mats
          –  Scoring: 20% held validation set
          –  Test set variability (SD): 1.5 eV
          –  Literature RMSE: 0.45 eV (best) [2]
          –  Matbench RMSE: 0.48 ± 0.07 eV
      [1] Choudhary et al. Physical Review Materials, 2, 083801 (2018)
      [2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
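Reading the regression performance definition on slide 36 as a ratio (test set variability divided by average predictive error), the two Matbench results can be put on a common, unitless scale. This is a small worked calculation using the numbers from the slide; the function name is illustrative, and note that the slide pairs MAD with MAE for the first problem and SD with RMSE for the second.

```python
def regression_performance(test_set_variability: float, average_error: float) -> float:
    """Ratio of test set variability to average predictive error (higher is better)."""
    if average_error <= 0:
        raise ValueError("average_error must be positive")
    return test_set_variability / average_error


# Problem 1: DFT formation energy, MAD = 0.81 eV/atom, Matbench MAE = 0.122 eV/atom.
formation_energy = regression_performance(0.81, 0.122)

# Problem 2: experimental band gap, SD = 1.5 eV, Matbench RMSE = 0.48 eV.
band_gap = regression_performance(1.5, 0.48)

print(f"formation energy: {formation_energy:.2f}")
print(f"band gap: {band_gap:.2f}")
```

A ratio near 1 would mean the model is no better than the spread of the data itself, so larger values indicate more useful predictions.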
  • 37. Performance against literature best on two classification problems
      Classification performance: Receiver Operating Characteristic Area Under Curve (ROC-AUC)
      •  Problem 1 (Exp. Eg = 0?): experimental gap = 0 (is metal?) based on composition alone
          –  Dataset: 6,354 mats
          –  Scoring: 20% held validation set
          –  Literature ROC-AUC: 0.970 (best) [2]
          –  Matbench ROC-AUC: 0.984 ± 0.005
      •  Problem 2 (Exp. metallic glass formation): is a composition metallic glass forming?
          –  Dataset: 5,313 mats
          –  Scoring: 20% held validation set
          –  Literature ROC-AUC: 0.952 (best) [3]
          –  Matbench ROC-AUC: 0.953 ± 0.006
      [2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
      [3] Ren, Fang et al. Science Advances, 4, 1566 (2018)
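For readers unfamiliar with the ROC-AUC metric used on slide 37, it can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses that pairwise definition on made-up toy labels and scores; it is not the evaluation code behind the numbers on the slide.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = "is metal", 0 = "is not metal"; scores are model outputs.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why results such as 0.984 on the slide indicate a strong classifier.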
  • 38. Outline: ① Density functional theory and “high-throughput” screening of materials (intro to high-throughput density functional theory; Materials Project database; atomate) ② Data mining approaches to materials design (matminer; matbench; text mining) ③ Conclusion
  • 39. An engine to label the content of scientific abstracts: collect, clean, and extract information from millions of published materials science journal abstracts
  • 40. Application: a revised materials search engine. Auto-generated summaries of materials based on text mining
  • 41. Application: materials compositions of interest … A search for thermoelectrics that do not have Pb or Bi
  • 42. How about new materials discovery?
      •  Predicting thermoelectric compositions
          –  Step 1: Start with all chemical compositions in our text library
          –  Step 2: Identify compositions with high correlation to the word “thermoelectric” (details TBA)
          –  Step 3: (optional) Filter out compositions explicitly studied as thermoelectrics to yield only new predictions
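The slide leaves the correlation measure in Step 2 unspecified (“details TBA”), so the following is only a sketch under a stated assumption: that each term is represented by a numeric vector and that association is measured by cosine similarity. The vectors, the placeholder names `composition_A` to `composition_C`, and both function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_candidates(target, candidates, already_reported=frozenset()):
    """Steps 1-3: score every composition, drop known ones, sort by score."""
    scored = [
        (name, cosine(target, vector))
        for name, vector in candidates.items()  # Step 1: all compositions
        if name not in already_reported         # Step 3: optional filter
    ]
    # Step 2: highest association with the target word first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up 2D vectors standing in for learned representations (assumption).
thermoelectric = [1.0, 0.0]
library = {
    "composition_A": [1.0, 0.0],
    "composition_B": [1.0, 1.0],
    "composition_C": [0.0, 1.0],
}
ranking = rank_candidates(thermoelectric, library, already_reported={"composition_A"})
print([name for name, _ in ranking])
```

Filtering out compositions already reported as thermoelectrics, as in Step 3, leaves only candidates that would count as new predictions.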
  • 43. This method can predict thermoelectric materials years in advance of actual discovery (part 1). Solid lines: yet unreported as thermoelectric; dashed lines: already reported in literature as thermoelectric. Note: each year is trained only on abstracts published until that year
  • 44. This method can predict thermoelectric materials years in advance of actual discovery (part 2). Top materials are significantly more likely to be studied as thermoelectrics in later years. Note: each year is trained only on abstracts published until that year
  • 45. Independent computations also support the promise of text-mining based composition predictions
  • 46. How does this work? (schematic)
  • 47. Outline: ① Density functional theory and “high-throughput” screening of materials (intro to high-throughput density functional theory; Materials Project database; atomate) ② Data mining approaches to materials design (matminer; matbench; text mining) ③ Conclusion
  • 48. Conclusions
      •  High-throughput density functional theory and machine learning are a new set of tools for doing materials science
      •  We are developing many methods and software implementations to try to advance the field
          –  pymatgen (materials analysis) -- www.pymatgen.org
          –  FireWorks (workflow management) -- https://materialsproject.github.io/fireworks
          –  atomate (materials science workflows) -- https://hackingmaterials.github.io/atomate
          –  matminer (materials data mining) -- https://hackingmaterials.github.io/matminer
          –  matbench (automatic data mining) -- https://hackingmaterials.github.io/matbench
          –  text mining tools under development
      •  If you are interested, give the software a try!
          –  basic support available via help forums (see code docs)
          –  enterprise support also available
      Timeline: quantum mechanics (1920s), density functional theory (1960s), high-throughput DFT (2000s), materials databases (2010s), machine learning (2010s)
  • 49. Thank you!
      •  Materials Project: K. Persson (director)
      •  Atomate: K. Mathew
      •  Matminer: L. Ward
      •  Matbench: A. Dunn
      •  Text mining: V. Tshitoyan, J. Dagdelen, L. Weston
      •  Funding: DOE-BES (MP), DOE-BES (ECRP), Toyota Research Institute
      •  Computing: NERSC
      Slides (already) posted to hackingmaterials.lbl.gov