Software tools, crystal descriptors, and
machine learning applied to materials design
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Presentation to Computational Materials at Berkeley,
Oct 16 2018
Slides (already) posted to hackingmaterials.lbl.gov
2
Materials and their properties decide what is
technologically possible
Electric vehicles and solar power
are two technologies that have
been dreamed about for many
decades, but did not have much
real impact for a long time …
1910	
1956
3
Materials and their properties decide what is
technologically possible
Today’s revolution in clean energy technologies is largely
due to advancements in materials –
science, engineering, and manufacturing.
Much else might be possible with better materials …
but, as past examples demonstrate, it can take a long time.
What constrains traditional approaches to materials design?
4
“[The Chevrel] discovery resulted from a lot of
unsuccessful experiments of Mg ions insertion
into well-known hosts for Li+ ions insertion, as
well as from the thorough literature analysis
concerning the possibility of divalent ions
intercalation into inorganic materials.”
-Aurbach group, on discovery of Chevrel cathode
for multivalent (e.g., Mg2+) batteries
Levi, Levi, Chasid, Aurbach
J. Electroceramics (2009)
Outline
5
①  Density functional theory and “high-
throughput” screening of materials
–  High-throughput density functional theory
–  Searching for new thermoelectrics
②  Software to accelerate materials design
–  atomate
–  matminer
③  Text mining for materials design
–  “Materials Scholar”
Density functional theory (DFT) models materials properties
from first principles
6
•  1920s: The Schrödinger equation for quantum mechanics essentially contains
all of chemistry embedded within it
•  but it is almost always too complicated to solve, owing to the numerous electron
interactions and the complexity of the wave function
•  1960s: DFT is developed; it reframes the ground-state problem so that the
interactions are separated and everything is written in terms of the
charge density, not the wave function
•  this makes solutions tractable while, in principle, not sacrificing accuracy for
the ground state!
[Figure: schematic of interacting electrons (e–)]
How does one use DFT to design new materials?
7
A. Jain, Y. Shin, and K. A.
Persson, Nat. Rev. Mater.
1, 15004 (2016).
How accurate is density functional theory in practice?
8
Shown are typical DFT results for (i) Li battery voltages, (ii) electronic band gaps,
and (iii) bulk modulus.
(i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder,
Phys. Rev. B 82, 075122 (2010).
(ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010).
(iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst,
M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S.
Curtarolo, G. Ceder, K.A. Persson, and M. Asta, Sci. Data 2, 150009
(2015).
DFT is limited in length + time scale
(addressed by other techniques)
9
Source: NASA
High-throughput DFT: a key idea
10
Automate the DFT
procedure
Supercomputing
Power
FireWorks
Software for programming
general computational
workflows that can be
scaled across large
supercomputers.
NERSC
Supercomputing center; its processor count is
equivalent to ~100,000 desktop
machines. Other centers
are also viable.
High-throughput
materials screening
G. Ceder & K.A.
Persson, Scientific
American (2015)
•  “DFT is too computationally expensive”
–  Quote from a critic in ~2008: “It would be much too
expensive to ever apply DFT to the entire ICSD database”
–  DFT implementations got faster, and computers got much
faster and less expensive: people failed to extrapolate where
computing would be in 5-10 years.
–  e.g., DOE INCITE program awarded 5 million CPU-hours in
2003. It awarded 6 billion CPU-hours in 2018.
–  Things rarely get 1000X more powerful in 15 years.
Exponential growth is hard to visualize …
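To make the INCITE numbers concrete, here is a small sketch that turns the two award figures quoted above into a growth factor, an implied doubling time, and an annual growth rate. The function name `growth_stats` is only illustrative, and the calculation assumes smooth exponential growth between the two data points.

```python
import math


def growth_stats(start_value, end_value, years):
    """Return (growth factor, doubling time in years, annual growth rate)."""
    factor = end_value / start_value
    doubling_time = years * math.log(2) / math.log(factor)
    annual_rate = factor ** (1 / years) - 1
    return factor, doubling_time, annual_rate


# 5 million CPU-hours in 2003, 6 billion CPU-hours in 2018 (figures from the slide)
factor, doubling_time, annual_rate = growth_stats(5e6, 6e9, 2018 - 2003)
print(f"growth factor: {factor:.0f}x")
print(f"doubling time: {doubling_time:.2f} years")
print(f"annual growth: {annual_rate:.0%}")
```

Under that assumption, the allocation roughly doubles every year and a half, which is the kind of trend that is easy to underestimate when extrapolating by eye.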
11
Reasons why this wasn’t always just really obvious
•  “DFT is too computationally expensive”
•  “Managing all those calculations would be hard”
–  G. Ceder, high-throughput pioneer, ~2006: “If you can build
a system that even just keeps track of where all our
calculations are, I would already be impressed”.
12
Reasons why this wasn’t always just really obvious
Facebook in 2005 was valued at ~$100 million
for building something that looked like this …
•  “DFT is too computationally expensive”
•  “Managing all those calculations would be hard”
•  “DFT requires a trained researcher to set the
correct parameters, monitor job, & fix errors”
–  Standard calculations are actually quite routine. Many
things are in fact better as “automatic” – after all, shouldn’t
ab initio mean no tunable parameters?
–  Error correction is also better automated with a big
database of fixes rather than ad-hoc.
–  However – expanding the scope of what can be automated
without researcher intervention (e.g., defect calculations,
GW) is still a major research topic.
13
Reasons why this wasn’t always just really obvious
Examples of (early) high-throughput studies
14
Application                 Researcher           Search space   Candidates   Hit rate
Scintillators               Klintenberg et al.   22,000         136          1/160
                            Curtarolo et al.     11,893         ?            ?
Topological insulators      Klintenberg et al.   60,000         17           1/3500
                            Curtarolo et al.     15,000         28           1/535
High TC superconductors     Klintenberg et al.   60,000         139          1/430
Thermoelectrics – ICSD      Curtarolo et al.     2,500          20           1/125
 - Half Heusler systems     Curtarolo et al.     80,000         75           1/1055
 - Half Heusler best ZT     Curtarolo et al.     80,000         18           1/4400
1-photon water splitting    Jacobsen et al.      19,000         20           1/950
2-photon water splitting    Jacobsen et al.      19,000         12           1/1585
Transparent shields         Jacobsen et al.      19,000         8            1/2375
Hg adsorbers                Bligaard et al.      5,581          14           1/400
HER catalysts               Greeley et al.       756            1            1/756*
Li ion battery cathodes     Ceder et al.         20,000         4            1/5000*
Entries marked with * have experimentally verified the candidates.
See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
Computations predict, experiments confirm
15
Sidorenkite-based Na-ion battery
cathodes
YCuTe2 thermoelectrics
Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang,
Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite
(Na3MnPO4CO3): A New Intercalation Cathode Material
for Na-Ion Batteries, Chem. Mater., 2013
Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs,
ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M;
Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric
Properties of Intrinsically Doped YCuTe2 with CuTe4-based
Layered Structure. J. Mat. Chem C, 2016
More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
Li-M-O CO2 capture compounds
Dunstan, M. T., Jain, A., Liu, W., Ong, S. P., Liu, T., Lee,
J., Persson, K. A., Scott, S. A., Dennis, J. S. & Grey, C. P.
Energy and Environmental Science (2016)
Thermoelectric materials convert heat to electricity
•  A thermoelectric material
generates a voltage in response
to a thermal gradient
•  Applications
–  Heat to electricity
–  Refrigeration
•  Advantages include:
–  Reliability
–  Easy to scale to different
sizes (including compact)
17
www.alphabetenergy.com
Alphabet Energy – 25kW generator
Thermoelectric figure of merit
18
•  Many materials properties are important for thermoelectrics
•  Focus is usually on finding materials that possess a high “figure
of merit”, or zT, for high efficiency
•  Target: zT at least 1, ideally >2
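$zT = \alpha^2 \sigma T / \kappa$
power factor ($\alpha^2 \sigma$): > 2 mW/(m·K²) (PbTe = 10 mW/(m·K²))
Seebeck coefficient ($\alpha$): > 100 μV/K, from band structure + BoltzTraP
electrical conductivity ($\sigma$): > 10³ /(ohm·cm), from band structure + BoltzTraP
thermal conductivity ($\kappa$): < 1 W/(m·K)
•  electronic part κ_e from BoltzTraP
•  lattice part κ_l difficult (phonon-phonon scattering)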
Very difficult to balance these properties using intuition
alone!
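As a worked example of the figure of merit, the sketch below evaluates the power factor and zT for one set of inputs, with the unit conversions written out. The function names and the input values (200 μV/K, 10³ /(ohm·cm), 1 W/(m·K), 300 K) are illustrative assumptions chosen to sit near the targets listed above; they do not describe any particular material.

```python
def power_factor(seebeck_uV_per_K, conductivity_S_per_cm):
    """Power factor alpha^2 * sigma, returned in mW/(m*K^2)."""
    alpha = seebeck_uV_per_K * 1e-6          # microvolt/K -> V/K
    sigma = conductivity_S_per_cm * 100.0    # S/cm, i.e. 1/(ohm*cm) -> S/m
    return alpha**2 * sigma * 1e3            # W/(m*K^2) -> mW/(m*K^2)


def figure_of_merit(seebeck_uV_per_K, conductivity_S_per_cm,
                    thermal_conductivity_W_per_mK, temperature_K):
    """Dimensionless zT = alpha^2 * sigma * T / kappa."""
    pf_W = power_factor(seebeck_uV_per_K, conductivity_S_per_cm) * 1e-3
    return pf_W * temperature_K / thermal_conductivity_W_per_mK


pf = power_factor(200.0, 1000.0)
zT = figure_of_merit(200.0, 1000.0, 1.0, 300.0)
print(f"power factor = {pf:.2f} mW/(m*K^2), zT = {zT:.2f}")
```

Because the Seebeck coefficient enters squared while the conductivity and thermal conductivity enter linearly, small changes in the Seebeck coefficient move zT more than equal relative changes in the other two quantities.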
Example: Seebeck and e– conductivity tradeoff
19
Heavy band:
✓  Large DOS (higher Seebeck and more carriers)
✗  Large effective mass (poor mobility)
Light band:
✓  Small effective mass (improved mobility)
✗  Small DOS (lower Seebeck, fewer carriers)
Multiple bands, off symmetry:
✓  Large DOS with small effective mass
✗  Difficult to design!
[Figure: band diagrams, energy E vs. wave vector k]
Finding good thermoelectrics is tough –
can computations help?
•  Thermoelectric (TE) materials must exhibit properties that are
difficult to obtain simultaneously
•  Can theory / computation help? As proposed as early as 2003 by
Blake and Metiu1:
20
“With the cost of computing become relatively inexpensive one can
envisage a time where one runs multiple computer test tube
reactions like these on large Beowulf clusters - as a means of
screening for new TE materials. Certainly it appears that in the
future theory may be a very competent dance partner for what has
previously been a solo experimental effort in searching for ever
better TE materials.”
1. Blake and Metiu. Can theory help in the search for better thermoelectric materials? Chemistry, Physics,
and Materials Science of Thermoelectric Materials: Beyond Bismuth Telluride, 2003
But screening not trivial – difficult to predict properties
21
Chen, W. et al. Understanding thermoelectric properties from high-throughput
calculations: trends, insights, and comparisons with
experiment. J. Mater. Chem. C 4, 4414–4426 (2016).
$zT = \sigma S^2 T / \kappa$ (S is the Seebeck coefficient, written α above)
power factor from constant, fixed relaxation time approximation and
GGA band structures
minimum thermal conductivity from GGA elastic constants
Getting more accurate results is not easy …
•  More accurate methods
exist, but they are:
–  not automatic
–  too computationally
expensive to run on many
compounds
•  Developing better
computational models is a
major research effort!
22
[Figure: mobility (cm²/V·s, log scale from 1.0E+02 to 1.0E+08) vs. temperature (200–1100 K), comparing experiment, AMSET, and BoltzTraP]
“AMSET” model being developed in our group for Seebeck coefficient
and mobility
In the meantime – work with what we have …
23
All data (~300GB total) is
available for direct download
through the Dryad repository
linked in the following
publication:
F. Ricci, W. Chen, U. Aydemir, G.J. Snyder, G.-M.
Rignanese, A. Jain, et al., An ab initio electronic
transport database for inorganic materials, Sci.
Data. 4 (2017) 170085.
New Materials from screening – TmAgTe2 (calcs)
24
Zhu, H.; Hautier, G.; Aydemir, U.; Gibbs, Z. M.; Li, G.; Bajaj, S.; Pöhls, J.-H.; Broberg, D.; Chen, W.; Jain, A.; White, M. A.; Asta,
M.; Snyder, G. J.; Persson, K.; Ceder, G. Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a
new group of thermoelectric materials identified by first-principles high-throughput screening, J. Mater. Chem. C, 2015, 3
•  Calculations: trigonal p-TmAgTe2 could have power factor up to 8 mW/(m·K²)
•  requires 10²⁰/cm³ carriers
TmAgTe2 (experiments)
25
Zhu, H.; Hautier, G.; Aydemir, U.; Gibbs, Z. M.; Li, G.; Bajaj, S.; Pöhls, J.-H.; Broberg, D.; Chen, W.; Jain, A.; White, M. A.; Asta,
M.; Snyder, G. J.; Persson, K.; Ceder, G. Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a
new group of thermoelectric materials identified by first-principles high-throughput screening, J. Mater. Chem. C, 2015, 3
•  Expt: p-type zT only 0.35 despite very low thermal conductivity (~0.25 W/mK)
•  Limitation: carrier concentration (~10¹⁷/cm³)
•  likely limited by Tm_Ag defects, as determined by follow-up calculations
YCuTe2 – friendlier elements, higher zT (0.75)
26
Aydemir, U.; Pöhls, J.-H.; Zhu, H., Hautier, G.; Bajaj, S.; Gibbs, Z. M.; Chen, W.; Li, G.; Broberg, D.;
Kang, S.D.; White, M. A.; Asta, M.; Ceder, G.; Persson, K.; Jain, A.; Snyder, G. J. YCuTe2: A Member of
a New Class of Thermoelectric Materials with CuTe4-Based Layered Structure. J. Mat Chem C, 2016
experiment
computation
•  Calculations: p-YCuTe2 could only reach PF of 0.4 mW/(m·K²)
•  SOC inhibits PF
•  if thermal conductivity is low (e.g., 0.4 W/mK), we get zT ~1
•  Expt: zT ~0.75 – not too far from calculation limit
•  carrier concentration of 10¹⁹
•  Decent performance, but unlikely to be improved with
further optimization
Thermoelectrics screening: lessons so far
•  There are issues in being able to calculate the
information we need:
–  existing high-throughput techniques need to be more
accurate without greatly increasing computational cost
–  More importantly, computing doping limits is hard and
requires better methods of estimation
•  We and others are working to improve the state of
these issues – more ideas are needed!
•  We are also exploring new ways of searching for
thermoelectrics based on text mining (later in talk)
27
With HT-DFT, we can generate data rapidly – what to do next?
29
•  >7500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
•  >1000 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
•  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
Goal: make it easy to
generate comparable
data sets on your own
A “black-box” view of performing a calculation
31
Researcher: “What is the GGA-PBE elastic tensor of GaAs?”
→ “something” → Results!
Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
32
Researcher: “What is the GGA-PBE elastic tensor of GaAs?”
→ lots of tedious, low-level work… → Results!
On the researcher's mind: input file flags, SLURM format, how to fix ZPOTRF?
☐  set up the structure coordinates
☐  write input files, double-check all the flags
☐  copy to supercomputer
☐  submit job to queue
☐  deal with supercomputer headaches
☐  monitor job
☐  fix error jobs, resubmit to queue, wait again
☐  repeat process for subsequent calculations in workflow
☐  parse output files to obtain results
☐  copy and organize results, e.g., into Excel
What would be a better way?
34
Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → Results!
Workflows to run:
☐  band structure
☐  surface energies
✓  elastic tensor
☐  Raman spectrum
☐  QH thermal expansion
Ideally the method should scale to millions of calculations
35
Researcher: “Start with all binary oxides, replace O->S, run several different properties” → Results!
Workflows to run:
✓  band structure
✓  surface energies
✓  elastic tensor
☐  Raman spectrum
☐  QH thermal expansion
☐  spin-orbit coupling
Atomate tries to make it easy, automatic, and flexible to
generate data with existing simulation packages
36
Researcher: “Run many different properties of many different materials!” → Results!
Each simulation procedure translates high-level instructions
into a series of low-level tasks
37
quickly and automatically translate high-level (minimal)
specifications into well-defined FireWorks workflows
“What is the GGA-PBE elastic tensor of GaAs?”
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al.,
Charting the complete elastic properties of inorganic crystalline compounds,
Sci. Data. 2 (2015).
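The idea of translating a one-line request into a dependency graph of low-level jobs can be sketched with the Python standard library alone. This is not atomate or FireWorks code: `build_workflow`, `execution_order`, the job names, and the choice of six deformations are hypothetical stand-ins, used only to show the pattern of "optimize, deform, then fit".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_workflow(material, prop, n_deformations=6):
    """Expand a high-level request into low-level jobs.

    Returns a dict mapping each job name to the set of jobs it must wait for.
    The recipe is illustrative only.
    """
    if prop != "elastic tensor":
        raise ValueError(f"no recipe for {prop!r} in this sketch")
    relax = f"{material}: optimize structure"
    fit = f"{material}: fit elastic tensor"
    jobs = {relax: set(), fit: set()}
    for i in range(1, n_deformations + 1):
        deform = f"{material}: deformed structure {i}"
        jobs[deform] = {relax}   # each deformation starts from the relaxed cell
        jobs[fit].add(deform)    # the fit needs every deformation result
    return jobs


def execution_order(jobs):
    """Return one valid order in which the jobs could be run."""
    return list(TopologicalSorter(jobs).static_order())


order = execution_order(build_workflow("GaAs", "elastic tensor"))
for step, job in enumerate(order, start=1):
    print(step, job)
```

Storing only the dependencies, rather than a fixed script, is what lets a scheduler run independent jobs (here, the deformations) in parallel and rerun a single failed job without repeating the rest.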
Atomate contains a library of simulation procedures
38
VASP-based
•  band structure
•  spin-orbit coupling
•  hybrid functional
calcs
•  elastic tensor
•  piezoelectric tensor
•  Raman spectra
•  NEB
•  GIBBS method
•  QH thermal
expansion
•  AIMD
•  ferroelectric
•  surface adsorption
•  work functions
Other
•  BoltzTraP
•  FEFF method
•  Q-Chem
Mathew, K. et al Atomate: A high-level interface to generate, execute, and analyze
computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
39
Full operation diagram
[Figure: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database]
Atomate thus encodes and standardizes knowledge about
running various kinds of simulations from domain experts
40
K. Mathew J. Montoya S. Dwaraknath A. Faghaninia
All past and present knowledge, from everyone in the group,
everyone previously in the group, and our collaborators, about
how to run calculations
M. Aykol
S.P. Ong
B. Bocklund T. Smidt
H. Tang I.H. Chu M. Horton J. Dagdelen B. Wood
Z.K. Liu J. Neaton K. Persson A. Jain
+
Atomate now powers the Materials Project
•  Online resource of density
functional theory simulation data
for ~85,000 inorganic materials
•  Includes band structures, elastic
tensors, piezoelectric tensors,
battery properties and more
•  >60,000 registered users
•  Free
•  www.materialsproject.org
41
Jain et al. Commentary: The Materials Project: A
materials genome approach to accelerating
materials innovation. APL Mater. 1, 11002 (2013).
42
Getting started with atomate
Paper: Mathew, K. et al. Atomate: A high-level interface to generate, execute,
and analyze computational materials science workflows.
Comput. Mater. Sci. 139, 140–152 (2017).
Docs: hackingmaterials.github.io/atomate
Support: https://groups.google.com/forum/#!forum/atomate
44
Bottom-up vs top-down approach
Small number of
general principles
Large number of
specific cases
•  Conventional theory starts
with a small number of
principles and keeps
extending / simplifying to
tackle more and more cases
(growing the theory)
•  Data mining starts from a
*very* large space of
possible models and
removes ones that are
inconsistent with the data
(“trimming” the theories)
45
What is needed to do machine learning on materials?
How can we represent
chemistry and structure as
vectors?
How do we get
enough output
data for training?
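One very simple answer to "how can we represent chemistry as a vector?" is a list of element fractions over a fixed element ordering. The sketch below is a minimal illustration under that assumption; `parse_formula`, `composition_vector`, and the short `ELEMENTS` list are hypothetical names, the parser handles only flat formulas without parentheses, and this is far cruder than the descriptors discussed in the talk.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a flat formula such as 'Fe2O3' into element amounts."""
    counts = Counter()
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] += float(amount) if amount else 1.0
    return counts


def composition_vector(formula, elements):
    """Return the atomic fraction of each element in a fixed order."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


ELEMENTS = ["Li", "O", "Fe", "Ga", "As", "Te"]  # tiny stand-in for the periodic table
vector = composition_vector("Fe2O3", ELEMENTS)
print(dict(zip(ELEMENTS, vector)))
```

Every material then maps to a vector of the same length, which is the basic requirement for feeding materials into standard machine learning algorithms.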
Matminer connects materials data with data mining
algorithms and data visualization libraries
46
Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
>40 featurizer classes can
generate thousands of potential
descriptors that are described in
the literature
47
Matminer contains a library of descriptors for various
materials science entities
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
•  compatible with scikit-
learn pipelining
•  automatically deploy
multiprocessing to
parallelize over data
•  include citations to
methodology papers
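The featurizer pattern on this slide (construct an object, call `featurize`, and get labels and citations alongside the numbers) can be imitated in a few lines. The classes below are a toy sketch and not matminer's implementation: `ToyFeaturizer` and `ElementCountStats` are hypothetical, the descriptor is deliberately trivial, and a thread pool stands in for the process-based parallelism mentioned in the bullets.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyFeaturizer:
    """Hypothetical base class: one input object -> list of numbers."""

    def featurize(self, x):
        raise NotImplementedError

    def feature_labels(self):
        raise NotImplementedError

    def citations(self):
        return []

    def featurize_many(self, inputs, n_jobs=2):
        # Thread pool used as a simple stand-in for parallelizing over data.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            return list(pool.map(self.featurize, inputs))


class ElementCountStats(ToyFeaturizer):
    """Toy descriptor for a composition given as {element: amount}."""

    def featurize(self, comp):
        amounts = list(comp.values())
        return [len(amounts), max(amounts) / sum(amounts)]

    def feature_labels(self):
        return ["n_elements", "max_fraction"]

    def citations(self):
        return ["(toy descriptor, no reference)"]


feat = ElementCountStats()
X = feat.featurize_many([{"Ga": 1, "As": 1}, {"Fe": 2, "O": 3}])
print(feat.feature_labels(), X)
```

Keeping labels and citations on the same object as the numerical code means a feature matrix can always be traced back to what each column means and where the method came from.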
49
The crystal structure is a core entity that
machine learning algorithms should know about
Step 1: Describe each site as a
fingerprint telling you how similar it is
to each of 22 known local
environments (e.g., tetrahedral,
octahedral, etc.)
•  site → vector
Step 2: Describe each structure as
the average of its site fingerprints
(other site stats can be added)
•  structure → vector
[Figure: example environments: tetrahedron, octahedron, distorted 8-coordinated cube]
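The two steps above can be sketched as follows: average the per-site fingerprints into one structure vector, then compare structures by the distance between their vectors. The fingerprint values are made up for illustration, only three of the 22 environments are shown, and the use of Euclidean distance is an assumption of this sketch.

```python
import math

ENVIRONMENTS = ["tetrahedron", "octahedron", "cube"]  # 3 of 22, for brevity


def structure_fingerprint(site_fingerprints):
    """Step 2: average the site vectors component by component."""
    n_sites = len(site_fingerprints)
    return [sum(column) / n_sites for column in zip(*site_fingerprints)]


def fingerprint_distance(a, b):
    """Euclidean distance between two structure fingerprints."""
    return math.dist(a, b)


# Made-up site fingerprints: similarity of each site to each environment.
struct_a = structure_fingerprint([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
struct_b = structure_fingerprint([[0.0, 0.0, 1.0]])
struct_c = structure_fingerprint([[0.8, 0.1, 0.0], [0.1, 0.9, 0.0]])

print("A vs B:", round(fingerprint_distance(struct_a, struct_b), 3))
print("A vs C:", round(fingerprint_distance(struct_a, struct_c), 3))
```

Structures built from similar local environments end up close together (A and C), while a structure with a different environment (B) lands far away, independent of how many sites each unit cell contains.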
Defining local order parameters for various environments
50
Use a given local order parameter with a threshold
for motif recognition:

If q_tet > q_thresh,
    then motif is tetrahedron.
Else
    not (too much) a tetrahedron.
Tetrahedral order parameter, q_tet, [1]:
[1] Zimmermann et al., J. Am. Chem. Soc., 2017, 10.1021/jacs.5b08098
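The threshold rule can be written as a short function. The sketch below assumes the order parameters have already been computed (the formula for q_tet is not reproduced here), uses a made-up threshold of 0.5, and extends the single-motif rule to "pick the best-scoring motif if it clears the threshold", which is an assumption of this example rather than something stated on the slide.

```python
def recognize_motif(order_parameters, q_thresh=0.5):
    """Return the best-matching motif name, or None if nothing clears q_thresh.

    order_parameters maps motif name -> order parameter value for one site.
    """
    motif, q = max(order_parameters.items(), key=lambda item: item[1])
    return motif if q > q_thresh else None


# Made-up order parameter values for two sites.
site_1 = {"tetrahedron": 0.92, "octahedron": 0.10}
site_2 = {"tetrahedron": 0.30, "octahedron": 0.25}

print(recognize_motif(site_1))  # clears the threshold
print(recognize_motif(site_2))  # not (too much) any known motif
```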
We have now developed mathematical order parameters for
22 different local environments
51
Applications of order parameters
52
Defect / interstitial site identification
[1] Zimmermann et al., Frontiers of Materials, 2017, doi: 10.3389/fmats.2017.00034
Diffusion path characterization
Deployed on Materials Project: similar structure matching
53
https://www.materialsproject.org/materials/mp-91/
Target: W
similar structures (distance near 0):
Cs3Sb
TiGaFeCo
CeMg2Cu
54
Can cluster
crystal structures
by “local
environment
similarity”
55
Matbench: use matminer to create a black-box optimizer
DFT E_form:
•  Dataset: 24,597 crystalline mats
•  Scoring: 10% held validation set
•  Test set variability (MAD): 0.81 eV/atom
•  Literature MAE: 0.12 eV/atom (best) [1]
•  Matbench MAE: 0.122 ± 0.024 eV/atom
Exp. E_g:
•  Dataset: 6,354 mats
•  Scoring: 20% held validation set
•  Test set variability (SD): 1.5 eV
•  Literature RMSE: 0.45 eV (best) [2]
•  Matbench RMSE: 0.48 ± 0.07 eV
$\text{Regression performance} = \dfrac{\text{variability of test set}}{\text{average predictive error}}$
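Plugging the reported numbers into the ratio above gives a quick sense of scale. The helper name `regression_performance` is just illustrative; the inputs are the Matbench and literature values quoted on this slide.

```python
def regression_performance(test_set_variability, average_error):
    """Ratio of test-set spread to average prediction error (higher is better)."""
    return test_set_variability / average_error


results = {
    "DFT formation energy, Matbench (MAD / MAE)": regression_performance(0.81, 0.122),
    "DFT formation energy, literature (MAD / MAE)": regression_performance(0.81, 0.12),
    "Exp. band gap, Matbench (SD / RMSE)": regression_performance(1.5, 0.48),
    "Exp. band gap, literature (SD / RMSE)": regression_performance(1.5, 0.45),
}
for name, ratio in results.items():
    print(f"{name}: {ratio:.1f}")
```

Since the two problems use different spread and error measures (MAD with MAE, SD with RMSE), the ratios are best compared within a problem rather than across problems.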
56
Performance against literature best on two
regression problems
Problem 1:
DFT-based
formation energy of
bulk materials based
on composition +
structure
Problem 2:
experimental band
gap prediction based
on composition only
[1] Choudhary et al. Physical Review Materials, 2, 083801 (2018)
[2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
57
Matminer – getting started
Paper: Ward et al. Matminer: An open source toolkit for materials data
mining. Computational Materials Science, 152, 60–69 (2018).
Docs: hackingmaterials.github.io/matminer
Support: https://groups.google.com/forum/#!forum/matminer
59
An engine to label the content of scientific abstracts
Collect, clean, and extract information from millions of
published materials science journal abstracts
60
Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
61
Application: materials compositions of interest …
A search for thermoelectrics that do not have Pb or Bi
•  Predicting thermoelectric compositions
–  Step 1: Start with all chemical compositions in our text
library
–  Step 2: Identify compositions with high correlation to
the word “thermoelectric” (details TBA)
–  Step 3: (optional) Filter out compositions explicitly
studied as thermoelectrics to yield only new
predictions
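The three steps can be mimicked on a toy corpus. Since the talk leaves the correlation measure as "details TBA", the sketch below substitutes a deliberately simple stand-in: each term is described by the words that share an abstract with it, and compositions are ranked by the cosine similarity between their context and that of the keyword. The abstracts, the names `MatA`/`MatB`/`MatC`, and all function names are invented for illustration and are not the authors' method or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[A-Za-z0-9]+", text)


def context_vector(target, abstracts):
    """Count the words that appear in the same abstracts as `target`."""
    context = Counter()
    for text in abstracts:
        tokens = tokenize(text)
        if target in tokens:
            context.update(t.lower() for t in tokens if t != target)
    return context


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_compositions(abstracts, compositions, keyword="thermoelectric", only_new=True):
    keyword_context = context_vector(keyword, abstracts)       # what the keyword "looks like"
    scores = {}
    for comp in compositions:                                   # Step 1: all compositions
        comp_context = context_vector(comp, abstracts)
        if only_new and keyword in comp_context:                # Step 3: already studied as TE
            continue
        scores[comp] = cosine(comp_context, keyword_context)    # Step 2: stand-in correlation
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


toy_abstracts = [
    "MatA is a thermoelectric with high Seebeck coefficient and low thermal conductivity",
    "MatB shows high Seebeck coefficient and low thermal conductivity",
    "MatC is a transparent insulator with a wide band gap",
]
ranking = rank_compositions(toy_abstracts, ["MatA", "MatB", "MatC"])
print(ranking)
```

In this toy corpus, `MatB` is never described as a thermoelectric, yet it ranks first because it is discussed in the same terms as one; `MatA` is dropped by the optional filter because it already co-occurs with the keyword.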
62
How about new materials discovery?
63
This method can predict thermoelectric materials
years in advance of actual discovery
Each year is trained only on abstracts published until that year
64
Text mining predictions correspond to materials with very
promising computed properties – without any simulations
[Figure: panels (a) and (b) showing computed properties of predicted materials; legible labels include Li2CuSb]
65
How does this work? (schematic)
•  Many new methods are now being
developed that combine theory, computation,
and machine learning to accelerate materials
design
•  The rate of change is accelerating and new ideas
are needed to bring things even further!
66
Conclusions
1920s: Quantum mechanics
1960s: Density functional theory
2000s: High-throughput DFT
2010s: Materials databases
2010s: Machine learning
•  High-throughput DFT & Materials Project
–  K.A. Persson, G. Ceder, S.P. Ong & many others
•  Thermoelectrics screening
–  G. Snyder, G. Hautier, M.A. White, & many others on the team
•  Atomate
–  K. Mathew & development team
•  Matminer
–  L. Ward & development team
•  Text mining
–  V. Tshitoyan, J. Dagdelen, L. Weston & collaborators
•  Funding:
–  DOE-BES (MP)
–  DOE-BES (ECRP)
–  Toyota Research Institute
•  Computing: NERSC
67
Thank you!
Slides (already) posted to hackingmaterials.lbl.gov

Weitere ähnliche Inhalte

Was ist angesagt?

Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...Anubhav Jain
 
Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...
Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...
Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...Boniface Y. Antwi
 
Lecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine LearningLecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine LearningDanielSchwalbeKoda
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...Anubhav Jain
 
BIOS 203 Lecture 4: Ab initio molecular dynamics
BIOS 203 Lecture 4: Ab initio molecular dynamicsBIOS 203 Lecture 4: Ab initio molecular dynamics
BIOS 203 Lecture 4: Ab initio molecular dynamicsbios203
 
Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Anubhav Jain
 
Density functional theory
Density functional theoryDensity functional theory
Density functional theorysandhya singh
 
Density Functional Theory
Density Functional TheoryDensity Functional Theory
Density Functional TheoryWesley Chen
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
The thermo electric effect
The thermo electric effectThe thermo electric effect
The thermo electric effectANANDHU THAMPI
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Scienceaimsnist
 
Density functional theory (DFT) and the concepts of the augmented-plane-wave ...
Density functional theory (DFT) and the concepts of the augmented-plane-wave ...Density functional theory (DFT) and the concepts of the augmented-plane-wave ...
Density functional theory (DFT) and the concepts of the augmented-plane-wave ...ABDERRAHMANE REGGAD
 
Quantum-Espresso_10_8_14
Quantum-Espresso_10_8_14Quantum-Espresso_10_8_14
Quantum-Espresso_10_8_14cjfoss
 
josphenson junction
josphenson junctionjosphenson junction
josphenson junctionKumar Vivek
 

Was ist angesagt? (20)

Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...
 
Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...
Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...
Acceptor–donor–acceptor small molecules based on derivatives of 3,4-ethylened...
 
Lecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine LearningLecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine Learning
 
NANO266 - Lecture 4 - Introduction to DFT
NANO266 - Lecture 4 - Introduction to DFTNANO266 - Lecture 4 - Introduction to DFT
NANO266 - Lecture 4 - Introduction to DFT
 
Introduction to DFT Part 2
Introduction to DFT Part 2Introduction to DFT Part 2
Introduction to DFT Part 2
 
Single Electron Transistor
Single Electron TransistorSingle Electron Transistor
Single Electron Transistor
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
Software tools, crystal descriptors, and machine learning applied to materials design

  • 1. Software tools, crystal descriptors, and machine learning applied to materials design. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. Presentation to Computational Materials at Berkeley, Oct 16 2018. Slides (already) posted to hackingmaterials.lbl.gov
  • 2. Materials and their properties decide what is technologically possible. Electric vehicles and solar power are two technologies that have been dreamed about for many decades, but did not have much real impact for a long time … [Figure: images labeled 1910 and 1956]
  • 3. Materials and their properties decide what is technologically possible. Today’s revolution in clean energy technologies is largely due to advancements in materials: science, engineering, and manufacturing. Much else might be possible with better materials … but, as past examples demonstrate, it can take a long time.
  • 4. What constrains traditional approaches to materials design? 4 “[The Chevrel] discovery resulted from a lot of unsuccessful experiments of Mg ions insertion into well-known hosts for Li+ ions insertion, as well as from the thorough literature analysis concerning the possibility of divalent ions intercalation into inorganic materials.” -Aurbach group, on discovery of Chevrel cathode for multivalent (e.g., Mg2+) batteries Levi, Levi, Chasid, Aurbach J. Electroceramics (2009)
  • 5. Outline 5 ①  Density functional theory and “high- throughput” screening of materials –  High-throughput density functional theory –  Searching for new thermoelectrics ②  Software to accelerate materials design –  atomate –  matminer ③  Text mining for materials design –  “Materials Scholar”
  • 6. Density functional theory (DFT) models materials properties from first principles •  1920s: The Schrödinger equation for quantum mechanics essentially contains all of chemistry embedded within it •  but it is almost always too complicated to solve, due to the numerous electron interactions and the complexity of the wave function •  1960s: DFT is developed and reframes the problem for ground-state properties of the system in terms of separated interactions and the charge density, not the wave function •  this makes solutions tractable while, in principle, not sacrificing accuracy for the ground state! [Figure: schematic of interacting electrons]
  • 7. How does one use DFT to design new materials? 7 A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  • 8. How accurate is density functional theory in practice? Shown are typical DFT results for (i) Li battery voltages, (ii) electronic band gaps, and (iii) bulk modulus. [Figure panels: (i) battery voltages, (ii) band gaps, (iii) bulk modulus] (i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder, Phys. Rev. B 82, 075122 (2010). (ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010). (iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K.A. Persson, and M. Asta, Sci. Data 2, 150009 (2015).
  • 9. DFT is limited in length + time scale (addressed by other techniques) 9 Source: NASA
  • 10. High-throughput DFT: a key idea •  Automate the DFT procedure •  Supercomputing power •  High-throughput materials screening •  FireWorks: software for programming general computational workflows that can be scaled across large supercomputers. •  NERSC supercomputing center: the processor count is equivalent to ~100,000 desktop machines. Other centers are also viable. G. Ceder & K.A. Persson, Scientific American (2015)
  • 11. Reasons why this wasn’t always just really obvious •  “DFT is too computationally expensive” –  Quote from a critic in ~2008: “It would be much too expensive to ever apply DFT to the entire ICSD database” –  DFT implementations got faster, and computers got much faster and less expensive: people failed to extrapolate where computing would be in 5-10 years. –  e.g., the DOE INCITE program awarded 5 million CPU-hours in 2003. It awarded 6 billion CPU-hours in 2018. –  Things rarely get 1000X more powerful in 15 years. Exponential growth is hard to visualize …
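The growth quoted on slide 11 can be checked with a few lines of arithmetic. The sketch below uses only the two INCITE figures from the slide and assumes steady exponential growth between 2003 and 2018, which is a simplification; the function name is illustrative.

```python
import math

def doubling_time_years(start, end, years):
    """Doubling time implied by steady exponential growth from start to end."""
    doublings = math.log2(end / start)
    return years / doublings

# Figures quoted on the slide: INCITE awards in 2003 and 2018.
cpu_hours_2003 = 5e6
cpu_hours_2018 = 6e9

growth_factor = cpu_hours_2018 / cpu_hours_2003
t_double = doubling_time_years(cpu_hours_2003, cpu_hours_2018, 2018 - 2003)

print(f"growth factor: {growth_factor:.0f}x")
print(f"implied doubling time: {t_double:.2f} years")
```

Under that assumption the award size grew by roughly 1200X, which corresponds to doubling about every year and a half.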
  • 12. Reasons why this wasn’t always just really obvious •  “DFT is too computationally expensive” •  “Managing all those calculations would be hard” –  G. Ceder, high-throughput pioneer, ~2006: “If you can build a system that even just keeps track of where all our calculations are, I would already be impressed”. –  Facebook in 2005 was valued at ~$100 million for building something that looked like this … [image on slide]
  • 13. Reasons why this wasn’t always just really obvious •  “DFT is too computationally expensive” •  “Managing all those calculations would be hard” •  “DFT requires a trained researcher to set the correct parameters, monitor job, & fix errors” –  Standard calculations are actually quite routine. Many things are in fact better as “automatic” – after all, shouldn’t ab initio mean no tunable parameters? –  Error correction is also better automated with a big database of fixes rather than ad hoc. –  However, expanding the scope of what can be automated without researcher intervention (e.g., defect calculations, GW) is still a major research topic.
  • 14. Examples of (early) high-throughput studies
    Application | Researcher | Search space | Candidates | Hit rate
    Scintillators | Klintenberg et al. | 22,000 | 136 | 1/160
    Scintillators | Curtarolo et al. | 11,893 | ? | ?
    Topological insulators | Klintenberg et al. | 60,000 | 17 | 1/3500
    Topological insulators | Curtarolo et al. | 15,000 | 28 | 1/535
    High TC superconductors | Klintenberg et al. | 60,000 | 139 | 1/430
    Thermoelectrics – ICSD | Curtarolo et al. | 2,500 | 20 | 1/125
    Thermoelectrics – Half Heusler systems | Curtarolo et al. | 80,000 | 75 | 1/1055
    Thermoelectrics – Half Heusler best ZT | Curtarolo et al. | 80,000 | 18 | 1/4400
    1-photon water splitting | Jacobsen et al. | 19,000 | 20 | 1/950
    2-photon water splitting | Jacobsen et al. | 19,000 | 12 | 1/1585
    Transparent shields | Jacobsen et al. | 19,000 | 8 | 1/2375
    Hg adsorbers | Bligaard et al. | 5,581 | 14 | 1/400
    HER catalysts | Greeley et al. | 756 | 1 | 1/756*
    Li ion battery cathodes | Ceder et al. | 20,000 | 4 | 1/5000*
    Entries marked with * have experimentally verified the candidates. See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
  • 15. Computations predict, experiments confirm •  Sidorenkite-based Li-ion battery cathodes: Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang, Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite (Na3MnPO4CO3): A New Intercalation Cathode Material for Na-Ion Batteries, Chem. Mater., 2013 •  YCuTe2 thermoelectrics: Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs, ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M; Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric Properties of Intrinsically Doped YCuTe2 with CuTe4-based Layered Structure. J. Mat. Chem C, 2016 •  Li-M-O CO2 capture compounds: Dunstan, M. T., Jain, A., Liu, W., Ong, S. P., Liu, T., Lee, J., Persson, K. A., Scott, S. A., Dennis, J. S. & Grey, C. P. Energy and Environmental Science (2016) •  More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  • 16. Outline 16 ①  Density functional theory and “high- throughput” screening of materials –  High-throughput density functional theory –  Searching for new thermoelectrics ②  Software to accelerate materials design –  atomate –  matminer ③  Text mining for materials design –  “Materials Scholar”
  • 17. Thermoelectric materials convert heat to electricity •  A thermoelectric material generates a voltage based on thermal gradient •  Applications –  Heat to electricity –  Refrigeration •  Advantages include: –  Reliability –  Easy to scale to different sizes (including compact) 17 www.alphabetenergy.com Alphabet Energy – 25kW generator
  • 18. Thermoelectric figure of merit •  Many materials properties are important for thermoelectrics •  Focus is usually on finding materials that possess a high “figure of merit”, or zT, for high efficiency •  Target: zT at least 1, ideally >2 •  $zT = \alpha^{2}\sigma T/\kappa$ •  power factor ($\alpha^{2}\sigma$): >2 mW/mK² (PbTe = 10 mW/mK²) •  Seebeck coefficient ($\alpha$): >100 µV/K (band structure + BoltzTraP) •  electrical conductivity ($\sigma$): >10³ /(ohm-cm) (band structure + BoltzTraP) •  thermal conductivity ($\kappa$): <1 W/(m·K); the electronic part $\kappa_e$ comes from BoltzTraP, the lattice part $\kappa_l$ is difficult (phonon-phonon scattering) •  Very difficult to balance these properties using intuition alone!
  • 19. Example: Seebeck and e– conductivity tradeoff •  Heavy band: ✓ Large DOS (higher Seebeck and more carriers) ✗ Large effective mass (poor mobility) •  Light band: ✓ Small effective mass (improved mobility) ✗ Small DOS (lower Seebeck, fewer carriers) •  Multiple bands, off symmetry: ✓ Large DOS with small effective mass ✗ Difficult to design! [Figure: band diagrams, E vs. k]
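To make the figure of merit on slide 18 concrete, here is a minimal sketch that evaluates zT = α²σT/κ with explicit unit conversions. The function name and the input values are illustrative assumptions, not data from the talk.

```python
def thermoelectric_zt(seebeck_uV_per_K, conductivity_S_per_cm,
                      thermal_conductivity_W_per_mK, temperature_K):
    """Return (power factor in mW/(m K^2), zT) using zT = alpha^2 * sigma * T / kappa."""
    alpha = seebeck_uV_per_K * 1e-6          # µV/K -> V/K
    sigma = conductivity_S_per_cm * 100.0    # S/cm -> S/m
    power_factor = alpha ** 2 * sigma        # W/(m K^2)
    zt = power_factor * temperature_K / thermal_conductivity_W_per_mK
    return power_factor * 1e3, zt

# Illustrative inputs only (not measurements from the talk).
pf, zt = thermoelectric_zt(200.0, 1000.0, 1.0, 600.0)
print(f"power factor = {pf:.2f} mW/(m K^2), zT = {zt:.2f}")
```

Note that a material sitting exactly at the slide's minimum thresholds (100 µV/K and 10³ /(ohm-cm)) only reaches a power factor of 1 mW/mK², so the thresholds have to be exceeded jointly to hit the power factor target.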
  • 20. Finding good thermoelectrics is tough – can computations help? •  Thermoelectric (TE) materials must exhibit properties that are difficult to obtain simultaneously •  Can theory / computation help? As proposed as early as 2003 by Blake and Metiu [1]: “With the cost of computing become relatively inexpensive one can envisage a time where one runs multiple computer test tube reactions like these on large Beowulf clusters - as a means of screening for new TE materials. Certainly it appears that in the future theory may be a very competent dance partner for what has previously been a solo experimental effort in searching for ever better TE materials.” [1] Blake and Metiu. Can theory help in the search for better thermoelectric materials? Chemistry, Physics, and Materials Science of Thermoelectric Materials: Beyond Bismuth Telluride, 2003
  • 21. But screening is not trivial – properties are difficult to predict. Chen, W. et al. Understanding thermoelectric properties from high-throughput calculations: trends, insights, and comparisons with experiment. J. Mater. Chem. C 4, 4414–4426 (2016). [Figure annotations: $zT = \sigma S^{2} T/\kappa$, where S is the Seebeck coefficient; power factor from the constant, fixed relaxation time approximation and GGA band structures; minimum thermal conductivity from GGA elastic constants]
  • 22. Getting more accurate results is not easy … •  More accurate methods exist, but they are: –  not automatic –  too computationally expensive to run on many compounds •  Developing better computational models is a major research effort! •  “AMSET” model being developed in our group for Seebeck coefficient and mobility [Figure: mobility (cm²/V·s, log scale from 10² to 10⁸) vs. temperature (200–1100 K), comparing experiment, AMSET, and BoltzTraP]
  • 23. In the meantime – work with what we have … 23 All data (~300GB total) is available for direct download through the Dryad repository linked in the following publication: F. Ricci, W. Chen, U. Aydemir, G.J. Snyder, G.-M. Rignanese, A. Jain, et al., An ab initio electronic transport database for inorganic materials, Sci. Data. 4 (2017) 170085.
  • 24. New materials from screening – TmAgTe2 (calcs). Zhu, H.; Hautier, G.; Aydemir, U.; Gibbs, Z. M.; Li, G.; Bajaj, S.; Pöhls, J.-H.; Broberg, D.; Chen, W.; Jain, A.; White, M. A.; Asta, M.; Snyder, G. J.; Persson, K.; Ceder, G. Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a new group of thermoelectric materials identified by first-principles high-throughput screening, J. Mater. Chem. C, 2015, 3 •  Calculations: trigonal p-TmAgTe2 could have power factor up to 8 mW/mK² •  requires 10²⁰/cm³ carriers
  • 25. TmAgTe2 (experiments). Zhu, H.; Hautier, G.; Aydemir, U.; Gibbs, Z. M.; Li, G.; Bajaj, S.; Pöhls, J.-H.; Broberg, D.; Chen, W.; Jain, A.; White, M. A.; Asta, M.; Snyder, G. J.; Persson, K.; Ceder, G. Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a new group of thermoelectric materials identified by first-principles high-throughput screening, J. Mater. Chem. C, 2015, 3 •  Expt: p-type zT only 0.35 despite very low thermal conductivity (~0.25 W/mK) •  Limitation: carrier concentration (~10¹⁷/cm³) •  likely limited by Tm_Ag defects (Tm on Ag sites), as determined by followup calculations
  • 26. YCuTe2 – friendlier elements, higher zT (0.75). Aydemir, U.; Pöhls, J.-H.; Zhu, H., Hautier, G.; Bajaj, S.; Gibbs, Z. M.; Chen, W.; Li, G.; Broberg, D.; Kang, S.D.; White, M. A.; Asta, M.; Ceder, G.; Persson, K.; Jain, A.; Snyder, G. J. YCuTe2: A Member of a New Class of Thermoelectric Materials with CuTe4-Based Layered Structure. J. Mat Chem C, 2016 [Figure panels: experiment, computation] •  Calculations: p-YCuTe2 could only reach PF of 0.4 mW/mK² •  SOC inhibits PF •  if thermal conductivity is low (e.g., 0.4 W/mK), we get zT ~1 •  Expt: zT ~0.75 – not too far from calculation limit •  carrier concentration of 10¹⁹/cm³ •  Decent performance, but unlikely to be improved with further optimization
  • 27. Thermoelectrics screening: lessons so far •  There are issues in being able to calculate the information we need: –  existing high-throughput techniques need to be more accurate without greatly increasing computational cost –  More importantly, computing doping limits is hard and requires better methods of estimation •  We and others are working to improve the state of these issues – more ideas are needed! •  We are also exploring new ways of searching for thermoelectrics based on text mining (later in talk) 27
  • 28. Outline 28 ①  Density functional theory and “high- throughput” screening of materials –  High-throughput density functional theory –  Searching for new thermoelectrics ②  Software to accelerate materials design –  atomate –  matminer ③  Text mining for materials design –  “Materials Scholar”
  • 29. With HT-DFT, we can generate data rapidly – what to do next? •  >7500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009. •  >1000 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053. •  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
  • 30. With HT-DFT, we can generate data rapidly – what to do next? •  >7500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009. •  >1000 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053. •  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission) •  Goal: make it easy to generate comparable data sets on your own
  • 31. A “black-box” view of performing a calculation. Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
  • 32. Unfortunately, the inside of the “black box” is usually tedious and “low-level”. Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work… (input file flags, SLURM format, how to fix ZPOTRF?) → Results! ☐ set up the structure coordinates ☐ write input files, double-check all the flags ☐ copy to supercomputer ☐ submit job to queue ☐ deal with supercomputer headaches ☐ monitor job ☐ fix error jobs, resubmit to queue, wait again ☐ repeat process for subsequent calculations in workflow ☐ parse output files to obtain results ☐ copy and organize results, e.g., into Excel
  • 33. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
  • 34. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → Workflows to run: ☐ band structure ☐ surface energies ✓ elastic tensor ☐ Raman spectrum ☐ QH thermal expansion → Results!
  • 35. Ideally the method should scale to millions of calculations. Researcher: “Start with all binary oxides, replace O->S, run several different properties” → Workflows to run: ✓ band structure ✓ surface energies ✓ elastic tensor ☐ Raman spectrum ☐ QH thermal expansion ☐ spin-orbit coupling → Results!
  • 36. Atomate tries to make it easy, automatic, and flexible to generate data with existing simulation packages. Researcher: “Run many different properties of many different materials!” → Results!
  • 37. Each simulation procedure translates high-level instructions into a series of low-level tasks 37 quickly and automatically translate high-level (minimal) specifications into well-defined FireWorks workflows What is the GGA-PBE elastic tensor of GaAs? M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al., Charting the complete elastic properties of inorganic crystalline compounds, Sci. Data. 2 (2015).
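The idea on slide 37, expanding a one-line request into a chain of dependent low-level jobs, can be illustrated with a toy sketch. This is not atomate's or FireWorks' API: the `Task` class, the function name, the task breakdown, and the default of six deformations are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One low-level job in a workflow (hypothetical class, not atomate's API)."""
    name: str
    depends_on: list = field(default_factory=list)

def elastic_tensor_workflow(formula, n_deformations=6):
    """Expand a one-line request into an ordered list of dependent tasks."""
    relax = Task(f"{formula}: optimize structure")
    strained = [
        Task(f"{formula}: static calculation, deformation {i}", [relax.name])
        for i in range(1, n_deformations + 1)
    ]
    fit = Task(f"{formula}: fit elastic tensor", [t.name for t in strained])
    return [relax, *strained, fit]

# High-level request: "What is the elastic tensor of GaAs?"
workflow = elastic_tensor_workflow("GaAs")
for task in workflow:
    print(task.name, "<-", task.depends_on)
```

The point of the design is that the researcher supplies only the material and the property; the dependency structure, which a workflow manager needs in order to submit and retry jobs, is generated rather than written by hand.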
  • 38. Atomate contains a library of simulation procedures 38 VASP-based •  band structure •  spin-orbit coupling •  hybrid functional calcs •  elastic tensor •  piezoelectric tensor •  Raman spectra •  NEB •  GIBBS method •  QH thermal expansion •  AIMD •  ferroelectric •  surface adsorption •  work functions Other •  BoltzTraP •  FEFF method •  Q-Chem Mathew, K. et al Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
  • 39. Full operation diagram: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database
  • 40. Atomate thus encodes and standardizes knowledge about running various kinds of simulations from domain experts. All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain, +
  • 41. Atomate now powers the Materials Project •  Online resource of density functional theory simulation data for ~85,000 inorganic materials •  Includes band structures, elastic tensors, piezoelectric tensors, battery properties and more •  >60,000 registered users •  Free •  www.materialsproject.org Jain et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 11002 (2013).
  • 42. Getting started with atomate •  Paper: Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017). •  Docs: hackingmaterials.github.io/atomate •  Support: https://groups.google.com/forum/#!forum/atomate
  • 43. Outline 43 ①  Density functional theory and “high- throughput” screening of materials –  High-throughput density functional theory –  Searching for new thermoelectrics ②  Software to accelerate materials design –  atomate –  matminer ③  Text mining for materials design –  “Materials Scholar”
  • 44. 44 Bottom-up vs top-down approach Small number of general principles Large number of specific cases •  Conventional theory starts with a small number of principles and keeps extending / simplifying to tackle more and more cases (growing the theory) •  Data mining starts from a *very* large space of possible models and removes ones that are inconsistent with the data (“trimming” the theories)
  • 45. 45 What is needed to do machine learning on materials? How can we represent chemistry and structure as vectors? How do we get enough output data for training?
  • 46. Matminer connects materials data with data mining algorithms and data visualization libraries 46 Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
  • 47. Matminer contains a library of descriptors for various materials science entities. >40 featurizer classes can generate thousands of potential descriptors that are described in the literature. Usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data]) •  compatible with scikit-learn pipelining •  automatically deploy multiprocessing to parallelize over data •  include citations to methodology papers
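The two-line usage pattern on slide 47 (construct a featurizer, then call `featurize`) can be mimicked with a self-contained toy class. This is a sketch of the interface idea only: `MeanAtomicNumber` is a hypothetical class, not part of matminer, and the dict-based composition input is an assumption chosen to avoid external dependencies.

```python
class MeanAtomicNumber:
    """Toy featurizer (hypothetical; not a matminer class)."""

    # Small lookup table, just enough for the examples below.
    ATOMIC_NUMBERS = {"Li": 3, "O": 8, "Ga": 31, "As": 33}

    def feature_labels(self):
        """Names of the values returned by featurize, in the same order."""
        return ["mean atomic number"]

    def featurize(self, composition):
        """composition: dict mapping element symbol -> amount in the formula."""
        total = sum(composition.values())
        weighted = sum(self.ATOMIC_NUMBERS[el] * amt
                       for el, amt in composition.items())
        return [weighted / total]

feat = MeanAtomicNumber()
y = feat.featurize({"Ga": 1, "As": 1})
print(dict(zip(feat.feature_labels(), y)))
```

Keeping every descriptor behind the same small interface is what lets a library apply many descriptors to a whole dataset in a uniform way.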
  • 49. The crystal structure is a core entity that machine learning algorithms should know about. Step 1: Describe each site as a fingerprint telling you how similar it is to each of 22 known local environments (e.g., tetrahedral, octahedral, etc.) •  site → vector Step 2: Describe each structure as the average of its site fingerprints (other site stats can be added) •  structure → vector [Figure: example environments: tetrahedron, octahedron, distorted 8-coordinated cube]
  • 50. Defining local order parameters for various environments. Tetrahedral order parameter, $q_{tet}$ [1]: [equation shown on slide]. Use a given local order parameter with a threshold for motif recognition: if $q_{tet} > q_{thresh}$, then the motif is a tetrahedron; else it is not (too much) a tetrahedron. [1] Zimmermann et al., J. Am. Chem. Soc., 2017, 10.1021/jacs.5b08098
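Step 2 on slide 49 (a structure fingerprint as the average of its site fingerprints) is simple enough to sketch directly. The example below uses two environments instead of the 22 mentioned on the slide, and the site values are made-up toy numbers.

```python
def structure_fingerprint(site_fingerprints):
    """Average per-site fingerprints into a single structure-level vector."""
    n_sites = len(site_fingerprints)
    n_features = len(site_fingerprints[0])
    return [
        sum(site[i] for site in site_fingerprints) / n_sites
        for i in range(n_features)
    ]

# Toy fingerprints: [tetrahedral-likeness, octahedral-likeness] for each site.
sites = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]
fingerprint = structure_fingerprint(sites)
print(fingerprint)
```

Averaging makes the vector length independent of the number of sites, so structures with different cell sizes can be compared; as the slide notes, other site statistics could be appended in the same way.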
  • 51. We have now developed mathematical order parameters for 22 different local environments 51
  • 52. Applications of order parameters 52 Defect / interstitial site identification [1] Zimmermann et al., Frontiers of Materials, 2017, doi: 10.3389/fmats.2017.00034 Diffusion path characterization
  • 53. Deployed on Materials Project: similar structure matching. https://www.materialsproject.org/materials/mp-91/ •  Target: W •  Similar structures (distance near 0): Cs3Sb, TiGaFeCo, CeMg2Cu
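The "distance near 0" criterion on slide 53 can be sketched as a nearest-neighbor ranking over structure fingerprints. The Euclidean metric, the function names, and the toy vectors below are assumptions for illustration; the slide does not state which distance the Materials Project feature uses.

```python
import math

def fingerprint_distance(a, b):
    """Euclidean distance between two equal-length fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(target, candidates):
    """Return candidate names sorted from most to least similar to the target."""
    return sorted(candidates,
                  key=lambda name: fingerprint_distance(target, candidates[name]))

# Made-up fingerprints with placeholder names (not real materials data).
target = [1.0, 0.0, 0.0]
candidates = {
    "structure A": [0.9, 0.1, 0.0],
    "structure B": [0.0, 1.0, 0.0],
    "structure C": [1.0, 0.0, 0.0],
}
ranking = rank_by_similarity(target, candidates)
print(ranking)
```

Because the fingerprint describes local environments rather than chemistry, chemically different compounds can still come out as close matches, which is consistent with the slide's example of matches for W.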
  • 54. 54 Can cluster crystal structures by “local environment similarity”
  • 55. 55 Matbench: use matminer to create a black-box optimizer
  • 56. Performance against literature best on two regression problems. Regression performance = (variability of test set) / (average predictive error). •  Problem 1: DFT-based formation energy (DFT Eform) of bulk materials based on composition + structure –  Dataset: 24,597 crystalline materials –  Scoring: 10% held-out validation set –  Test set variability (MAD): 0.81 eV/atom –  Literature MAE: 0.12 eV/atom (best) [1] –  Matbench MAE: 0.122 ± 0.024 eV/atom •  Problem 2: experimental band gap (Exp. Eg) prediction based on composition only –  Dataset: 6,354 materials –  Scoring: 20% held-out validation set –  Test set variability (SD): 1.5 eV –  Literature RMSE: 0.45 eV (best) [2] –  Matbench RMSE: 0.48 ± 0.07 eV [1] Choudhary et al. Physical Review Materials, 2, 083801 (2018) [2] Zhuo et al. The Journal of Physical Chemistry Letters, 9, 1668 (2018)
  • 57. Matminer – getting started •  Paper: Ward et al. Matminer: An open source toolkit for materials data mining. Computational Materials Science, 152, 60–69 (2018). •  Docs: hackingmaterials.github.io/matminer •  Support: https://groups.google.com/forum/#!forum/matminer
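The performance ratio defined on slide 56 (test-set variability divided by average predictive error) can be evaluated from the numbers on the slide. The sketch below uses only those quoted values; the function and dictionary key names are illustrative.

```python
def regression_performance(test_set_variability, average_error):
    """Ratio of test-set spread to model error; larger means a more useful model."""
    return test_set_variability / average_error

# Values quoted on the slide (eV/atom for formation energy, eV for band gap).
scores = {
    "Eform, literature best": regression_performance(0.81, 0.12),
    "Eform, Matbench": regression_performance(0.81, 0.122),
    "Eg, literature best": regression_performance(1.5, 0.45),
    "Eg, Matbench": regression_performance(1.5, 0.48),
}
for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

Normalizing by the spread of the test set makes the two problems comparable even though their errors are reported in different units and with different metrics (MAE vs. RMSE).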
  • 58. Outline 58 ①  Density functional theory and “high- throughput” screening of materials –  High-throughput density functional theory –  Searching for new thermoelectrics ②  Software to accelerate materials design –  atomate –  matminer ③  Text mining for materials design –  “Materials Scholar”
  • 59. 59 An engine to label the content of scientific abstracts Collect, clean, and extract information from millions of published materials science journal abstracts
  • 60. 60 Application: a revised materials search engine Auto-generated summaries of materials based on text mining
  • 61. 61 Application: materials compositions of interest … A search for thermoelectrics that do not have Pb or Bi
  • 62. How about new materials discovery? •  Predicting thermoelectric compositions –  Step 1: Start with all chemical compositions in our text library –  Step 2: Identify compositions with high correlation to the word “thermoelectric” (details TBA) –  Step 3: (optional) Filter out compositions explicitly studied as thermoelectrics to yield only new predictions
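The three steps on slide 62 can be sketched as a ranking-and-filtering routine. The slide leaves the correlation measure unspecified ("details TBA"), so this sketch substitutes cosine similarity between vector representations purely as a stand-in assumption. The vectors, composition names, and function names are all made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_compositions(target_vector, composition_vectors, already_studied, top_n=2):
    """Rank compositions by similarity to the target word, skipping known ones."""
    scored = [
        (cosine_similarity(target_vector, vec), name)
        for name, vec in composition_vectors.items()   # Step 1: all compositions
        if name not in already_studied                 # Step 3: keep only new ones
    ]
    scored.sort(reverse=True)                          # Step 2: highest score first
    return [name for _, name in scored[:top_n]]

# Made-up vectors with placeholder names (not real embeddings or materials).
thermoelectric = [1.0, 0.0]
compositions = {
    "X1": [0.9, 0.1],
    "X2": [0.1, 0.9],
    "X3": [0.8, 0.3],
    "X4": [1.0, 0.0],
}
predictions = predict_compositions(thermoelectric, compositions, already_studied={"X4"})
print(predictions)
```

In this toy run the best-scoring entry, "X4", is excluded because it is marked as already studied, which is the purpose of the optional third step.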
  • 63. 63 This method can predict thermoelectric materials years in advance of actual discovery Each year is trained only on abstracts published until that year
  • 64. Text mining predictions correspond to materials with very promising computed properties – without any simulations [Figure: panels a and b; recoverable label: Li2CuSb]
  • 65. 65 How does this work? (schematic)
  • 66. Conclusions •  Recently, many new methods are being developed that combine theory, computation, and machine learning to accelerate materials design problems •  The rate of change is accelerating and new ideas are needed to bring things even further! •  Timeline: quantum mechanics (1920s) → density functional theory (1960s) → high-throughput DFT (2000s) → materials databases (2010s) → machine learning (2010s)
  • 67. •  High-throughput DFT & Materials Project –  K.A. Persson, G. Ceder, S.P. Ong & many others •  Thermoelectrics screening –  G. Snyder, G. Hautier, M.A. White, & many others on the team •  Atomate –  K. Mathew & development team •  Matminer –  L. Ward & development team •  Text mining –  V. Tshitoyan, J. Dagdelen, L. Weston & collaborators •  Funding: –  DOE-BES (MP) –  DOE-BES (ECRP) –  Toyota Research Institute •  Computing: NERSC 67 Thank you! Slides (already) posted to hackingmaterials.lbl.gov