Accelerated Materials Discovery Using Theory,
Optimization, and Natural Language Processing
Anubhav Jain
Presentation given at MRS Fall 2019, Boston MA, Dec 2019

  1. Accelerated Materials Discovery Using Theory, Optimization, and Natural Language Processing. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. MRS Fall Meeting 2019. Slides (already) posted to hackingmaterials.lbl.gov
  2. Today, computer-aided design of products is ubiquitous
  3. The software for CAD has progressed by leaps and bounds over the years
  4. Materials theory is like CAD for materials, but some of the software tools may need upgrades
      (Diagram labels: Think of solution; Manually run some calculations)
  5. We’ve been building a comprehensive software pipeline for virtual materials design
      (Diagram labels: Researcher ideas; ML-based ideas; Visual interface for exploring all results; experiments)
  6. What are the different components of the pipeline?
      (Diagram labels: Researcher ideas; ML-based ideas; Visual interface for exploring all results; experiments)
  7. Given a search domain, the goal of our “rocketsled” software is to find the best solutions in as few calculations as possible
      https://github.com/hackingmaterials/rocketsled
  8. There exist many packages for optimization already (shown: BayesOpt, Scikit-optimize), but rocketsled can offload expensive calculations to HPC
  9. Rocketsled also allows you to insert your own descriptors into the optimization
      At each point in the search space, you can add a vector of physical descriptors to help the optimizer
  10. What optimizers are available in rocketsled?
      • Rocketsled uses scikit-optimize as the default backend, which implements:
          – Gaussian Process
          – Random Forest
          – Gradient Boosted trees
      • You can choose your acquisition function
          – Expected improvement
          – Probability of Improvement
          – Greedy algorithm
          – etc…
      • You can write your own custom optimizer in Python and use it, so anything is allowed!
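The loop described on the optimizer slides (fit a surrogate model to the results so far, let an acquisition rule pick the next calculation, repeat) can be sketched in plain Python. This is a minimal illustration and not the rocketsled or scikit-optimize API: the `objective` function, the 20 x 20 grid, and the k-nearest-neighbor surrogate are all assumptions chosen to keep the example dependency-free. The acquisition rule is the greedy one named on the slide.

```python
import math
import random


def objective(x):
    """Hypothetical stand-in for an expensive calculation: a peak at (14, 6)."""
    a, b = x
    return -((a - 14) ** 2 + (b - 6) ** 2)


def predict(x, observed, k=3):
    """Toy surrogate: mean score of the k evaluated points closest to x."""
    nearest = sorted(observed, key=lambda p: math.dist(x, p))[:k]
    return sum(observed[p] for p in nearest) / len(nearest)


def guided_search(candidates, budget, n_initial=5, seed=0):
    """Evaluate `budget` candidates, choosing each new one with the surrogate."""
    rng = random.Random(seed)
    # Start from a few random evaluations so the surrogate has data to fit.
    observed = {x: objective(x) for x in rng.sample(candidates, n_initial)}
    while len(observed) < budget:
        remaining = [x for x in candidates if x not in observed]
        # Greedy acquisition: run the candidate with the best predicted score.
        x_next = max(remaining, key=lambda x: predict(x, observed))
        observed[x_next] = objective(x_next)
    return observed


candidates = [(a, b) for a in range(20) for b in range(20)]
observed = guided_search(candidates, budget=40)
best = max(observed, key=observed.get)
print(f"evaluated {len(observed)} of {len(candidates)} candidates; best = {best}")
```

Swapping `predict` for a Gaussian process or random forest, and the `max` call for expected improvement, gives the kind of configuration the slide lists.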
  11. We’ve tested rocketsled on a “mock” problem in which answers were pre-computed with density functional theory
      Can rocketsled find the good solutions with fewer calculations than a benchmark?
      Search space: 18,928 cubic perovskites, ABX3
          A: 1 of 52 metal cations
          B: 1 of 52 metal cations
          X3: 1 of 7 anions
      Search space ordered according to atomic no. rank. Scores of compounds are represented by color.
      Solutions: 20 possible one-photon solar water splitters, based on:
          1. Enthalpy of formation < 0.2 eV
          2. Band gap* 1.5-3.0 eV
          3. Band* edges straddle H+/H2 and H2O/O2 E levels
      *Either direct or indirect band gap can be used.
      Image credit: solarchoice.net.au/blog/news/perovskites-the-next-solar-pv-revolution-240714
  12. What are some good benchmarks to compare against?
      • Random
          – Obvious, but too easy to beat
          – Let’s also try harder …
      • Prior genetic algorithm study on the same problem
      • Chemical rules
          – Compound must (i) be charge balanced and (ii) have an even number of e- (for gap)
              • This eliminates 60% of the search space outright!!
          – Rank remaining compounds by distance of Goldschmidt tolerance factor to the ideal value of 1.
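The "chemical rules" benchmark can be sketched as a filter followed by a ranking. The Goldschmidt tolerance factor for ABX3 is t = (r_A + r_X) / (sqrt(2) (r_B + r_X)). In the sketch below, the ionic radii are approximate Shannon-type values assumed for illustration, NaTiO3 is a hypothetical entry included only to exercise the charge-balance filter, and the even-electron rule from the slide is omitted for brevity.

```python
import math


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t for an ABX3 perovskite (radii in angstrom)."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def is_charge_balanced(q_a, q_b, q_x):
    """ABX3 is charge balanced when q_A + q_B + 3 * q_X equals zero."""
    return q_a + q_b + 3 * q_x == 0


# name: (r_A, r_B, r_X, q_A, q_B, q_X); radii are illustrative assumptions.
candidates = {
    "BaTiO3": (1.61, 0.605, 1.40, 2, 4, -2),
    "CaTiO3": (1.34, 0.605, 1.40, 2, 4, -2),
    "SrTiO3": (1.44, 0.605, 1.40, 2, 4, -2),
    "NaTiO3": (1.39, 0.605, 1.40, 1, 4, -2),  # hypothetical, not charge balanced
}


def rank_by_tolerance(candidates):
    """Drop unbalanced compounds, then sort by |t - 1| (closest to ideal first)."""
    balanced = {
        name: values
        for name, values in candidates.items()
        if is_charge_balanced(*values[3:])
    }
    return sorted(
        balanced,
        key=lambda name: abs(tolerance_factor(*balanced[name][:3]) - 1.0),
    )


ranked = rank_by_tolerance(candidates)
print(ranked)
```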
  13. Rocketsled can find solutions much faster than other methods
  14. The “speedup” over random search can be 15-30x
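The slides do not define how the speedup is measured. One plausible definition, assumed here, is the expected number of random calculations needed to find k solutions divided by the number the optimizer needed. For sampling without replacement from N candidates containing K solutions, the expected number of draws to find k of them is k(N + 1)/(K + 1). The search-space size and solution count come from the perovskite problem above; the optimizer's calculation count is a hypothetical placeholder, not a result from the talk.

```python
def expected_random_draws(n_total, n_solutions, k):
    """Expected draws (without replacement) until k of the solutions are found."""
    return k * (n_total + 1) / (n_solutions + 1)


def speedup(n_total, n_solutions, k, n_optimizer_calcs):
    """Ratio of expected random-search cost to the optimizer's observed cost."""
    return expected_random_draws(n_total, n_solutions, k) / n_optimizer_calcs


N_TOTAL = 18_928      # cubic perovskites in the search space (from the slides)
N_SOLUTIONS = 20      # known water-splitting solutions (from the slides)
HYPOTHETICAL_CALCS = 450  # assumed optimizer cost to find 10 solutions

random_cost = expected_random_draws(N_TOTAL, N_SOLUTIONS, k=10)
ratio = speedup(N_TOTAL, N_SOLUTIONS, k=10, n_optimizer_calcs=HYPOTHETICAL_CALCS)
print(f"random search: ~{random_cost:.0f} calculations, speedup: ~{ratio:.1f}x")
```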
  15. Visualization of search space sampled with and without optimization on a “superhard” materials design problem
      Search space: 7,394 materials with elastic tensors calculated

      Common name                  K (GPa)    G (GPa)
      Lonsdaleite                  435.661    522.922
      Diamond                      435.686    520.267
      β-C3N4                       408.925    312.428
      Rhenium Nitride              379.804    253.458
      Tungsten carbide             385.194    278.96
      Osmium                       401.328    258.697
      w-BN                         373.241    383.285
      Diamondlike-Boron Carbide    378        347

  16. Potential benefits and downsides of optimization in high-throughput computational searches
      • Do more with less computational budget
          – e.g., confidently find the best solutions when you have far fewer calculations to spend than possibilities
      • Get good results faster
          – Even if you plan to compute everything, why not get the best answers in week 1 instead of week 30?
      • The main downside is added complexity
          – If you are using our automation tools (FireWorks, atomate, etc.) then rocketsled removes the complexity of incorporating optimization
  17. More information on Rocketsled
      Paper: Dunn, A., Brenneck, J. & Jain, A. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. 2, 034002 (2019).
      Docs: hackingmaterials.github.io/rocketsled
      Support: https://discuss.matsci.org (use FireWorks forum)
  18. What are the different components of the pipeline?
      (Diagram labels: Researcher ideas; ML-based ideas; Visual interface for exploring all results; experiments)
  19. How might natural language processing help us in computational screening?
      • I was at first interested in the potential of NLP to save us from the tedious task of figuring out which of our “predictions” were already studied
      • For example, we would manually go through a list of 100 predictions, doing a literature review for every single one, looking for similar compounds as well, etc.
          – Mainly for our search for novel thermoelectrics
  20. “Solution v1”: manually make a list of all the thermoelectrics I could find and write an algorithm for similarity
  21. “Solution v1”: manually make a list of all the thermoelectrics I could find and write an algorithm for similarity
  22. “Solution v1”: manually make a list of all the thermoelectrics I could find and write an algorithm for similarity
      There had to be a better way!!
  23. Instead, use computers to compile the lists on our behalf
      Extracted ~2 million abstracts of relevant scientific articles
      Use natural language processing algorithms to try to extract knowledge from all this data
  24. Developed algorithms to automatically tag keywords in the abstracts based on word2vec and LSTM networks
      Weston, L. et al. Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature. J. Chem. Inf. Model. (2019).
  25. Now we can search! Live on www.matscholar.com
  26. Application: materials compositions of interest …
      A search for thermoelectrics that do not have Pb or Bi
  27. Application: a revised materials search engine
      Auto-generated summaries of materials based on text mining
  28. Could these techniques also be used to predict which materials we might want to screen for an application?
      (Diagram labels: papers to read “someday”; NLP algorithms)
  29. Key concept 1: the word2vec algorithm
      • We use the word2vec algorithm (Google) to turn each unique word in our corpus into a 200-dimensional vector
      • These vectors encode the meaning of each word, learned by trying to predict the context words around the target
  30. Key concept 1: the word2vec algorithm
      • We use the word2vec algorithm (Google) to turn each unique word in our corpus into a 200-dimensional vector
      • These vectors encode the meaning of each word, learned by trying to predict the context words around the target
      “You shall know a word by the company it keeps” - John Rupert Firth (1957)
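"Predicting context words around the target" means the training data consists of (target, context) word pairs taken from a sliding window over each sentence. The sketch below only generates those pairs; it does not train embeddings. The example sentence and the window size are assumptions for illustration, not values from the talk.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical abstract fragment, already tokenized on whitespace.
sentence = "Bi2Te3 is a promising thermoelectric material".split()
pairs = skipgram_pairs(sentence, window=1)
for target, context in pairs:
    print(f"{target} -> {context}")
```

A word2vec model is then trained so that each target's vector is good at predicting its context words, which is why words used in similar contexts end up with similar vectors.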
  31. Key concept 2: vector dot products measure similarity
      • Dot product of a composition word with the word “thermoelectric” essentially predicts how likely that word is to appear in an abstract with the word thermoelectric
      • Compositions with high dot products are typically known thermoelectrics
      • Sometimes, compositions have a high dot product with “thermoelectric” but have never been studied as a thermoelectric
      • These compositions usually have high computed power factors! (BoltzTraP)
      Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).
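Ranking compositions by their dot product with the “thermoelectric” vector can be sketched as follows. The three-dimensional vectors are made-up toy values (the real embeddings are 200-dimensional and learned from the corpus), so only the mechanics of the ranking are meaningful here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy embeddings, assumed for illustration only.
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "Bi2Te3": [0.8, 0.2, 0.4],
    "PbTe": [0.7, 0.0, 0.5],
    "NaCl": [-0.2, 0.9, 0.1],
}


def rank_compositions(embeddings, query="thermoelectric"):
    """Sort every non-query word by dot product with the query, highest first."""
    q = embeddings[query]
    others = [word for word in embeddings if word != query]
    return sorted(others, key=lambda word: dot(embeddings[word], q), reverse=True)


ranked = rank_compositions(embeddings)
print(ranked)
```

In the workflow the slide describes, the interesting candidates are high-ranked compositions that have never appeared alongside the query word in the literature.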
  32. Can we predict future thermoelectrics discoveries with this method?
      “Go back in time” approach:
          – For every year since 2001, see which compounds we would have predicted using only literature data until that point in time
          – Make predictions of what materials are the most promising thermoelectrics for data until that year
          – See if those materials were actually studied as thermoelectrics in subsequent years
      Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).
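The “go back in time” check can be sketched as a hit-rate calculation: for each cutoff year, count how many of that year's predictions were first reported as thermoelectrics within some later window. The material labels, years, and the five-year horizon below are hypothetical placeholders, and in a real analysis the predictions for each year would come from embeddings trained only on abstracts published up to that year.

```python
def hit_rate(predictions_by_year, first_reported_year, horizon):
    """Fraction of predictions first reported within `horizon` years of the cutoff."""
    hits = 0
    total = 0
    for cutoff, predicted in predictions_by_year.items():
        for material in predicted:
            total += 1
            reported = first_reported_year.get(material)
            # A hit must appear strictly after the cutoff and within the horizon.
            if reported is not None and cutoff < reported <= cutoff + horizon:
                hits += 1
    return hits / total if total else 0.0


# Hypothetical predictions made with data up to each cutoff year.
predictions_by_year = {
    2005: ["material_A", "material_B"],
    2010: ["material_C", "material_D"],
}
# Hypothetical year each material was first reported as a thermoelectric.
first_reported_year = {"material_A": 2008, "material_C": 2020, "material_D": 2011}

rate = hit_rate(predictions_by_year, first_reported_year, horizon=5)
print(f"hit rate within 5 years: {rate:.2f}")
```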
  33. How about “forward” predictions?
      • Thus far, 2 of our top 20 predictions made in ~August 2018 have already been reported in the literature for the first time as thermoelectrics
          – Li3Sb was the subject of a computational study (predicted zT=2.42) in Oct 2018 [1]
          – SnTe2 was experimentally found to be a moderately good thermoelectric (expt zT=0.71) in Dec 2018 [2]
      • We are working with an experimentalist on one of the predictions (but “spare time” project)
      [1] Yang et al. "Low lattice thermal conductivity and excellent thermoelectric behavior in Li3Sb and Li3Bi." Journal of Physics: Condensed Matter 30.42 (2018): 425401
      [2] Wang et al. "Ultralow lattice thermal conductivity and electronic properties of monolayer 1T phase semimetal SiTe2 and SnTe2." Physica E: Low-dimensional Systems and Nanostructures 108 (2019): 53-59
  34. Conclusions
      • We’ve been building many software tools for better computer-aided materials design
      • Optimization algorithms and NLP will play roles in these next-generation tools
      • Hopefully, these will further improve the applicability of materials theory to real materials design
      (Diagram labels: Researcher ideas; ML-based ideas; Visual interface for exploring all results; experiments)
  35. Acknowledgements
      Slides (already) posted to hackingmaterials.lbl.gov
      • Rocketsled
          – Alex Dunn
          – U.S. Department of Energy, Materials Science Division
      • Matscholar
          – Vahe Tshitoyan, Leigh Weston, John Dagdelen, Amalie Trewartha, Alex Dunn
          – Gerbrand Ceder & Kristin Persson
          – Toyota Research Institutes
