Open Source Tools for Materials Informatics
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Presentation given at MRS Fall, Boston MA, Dec 2019

Open Source Tools for Materials Informatics

  1. Open Source Tools for Materials Informatics. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. MRS Fall Meeting 2019. Slides (already) posted to hackingmaterials.lbl.gov
  2. Staffing interdisciplinary research. I find a recurring dilemma and asymmetry in staffing materials informatics research. (Diagram labels: Machine Learning, Materials Science, Materials Informatics.)
  3. Who has a tougher job to get started?
     MS&E major:
     • Already has background in the materials science aspects of the project
     • But needs to learn the machine learning and software engineering aspects
     CS major:
     • Already has background in software engineering and appropriate machine learning
     • But needs to learn the materials science aspects
  4. Who has a tougher job to get started? MS&E major versus CS major: my experience is that the CS major typically has the tougher road ahead of them.
  5. Who has a tougher job to get started? MS&E major versus CS major: my experience is that the CS major typically has the tougher road ahead of them. It is easier to pick up or self-learn random forests and neural networks than phase diagrams and crystal structures.
  6. There is an asymmetry in resources available.
     MS&E major:
     • Hands-on code and examples to run and modify
     • Hundreds of YouTube videos and online courses
     • Code reviews from collaborators
     • And the standard books, etc.
     CS major:
     • Books and research articles
     • Conversations with colleagues, impromptu lectures
     • Practice problems? Worked examples? Interactive code?
  7. Outline
     ① Matminer: data and descriptors for producing ML structure-property relationships
     ② Matscholar: applying natural language processing to materials science information retrieval
  8. How can we make it easy to develop and test ML models for composition-structure-property relationships?
     • How can we quickly represent chemistry and structure as vectors?
     • How do we get labeled training/test data?
     • How do we know if our ML model is extraordinary?
  9. How can we make it easy to develop and test ML models for composition-structure-property relationships? How can we quickly represent chemistry and structure as vectors?
  10. Matminer contains a library of descriptors for various materials science entities. More than 60 featurizer classes can generate thousands of potential descriptors that are described in the literature.
      feat = EwaldEnergy([options])
      y = feat.featurize([input_data])
      The featurizers:
      • are compatible with scikit-learn pipelining
      • automatically deploy multiprocessing to parallelize over data
      • include citations to methodology papers
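The `feat.featurize(...)` pattern on slide 10 can be illustrated with a small stand-in. The sketch below is not matminer code: `MassStatsFeaturizer`, `parse_formula`, and the tiny `ATOMIC_MASS` table are hypothetical names introduced here only to show how a featurizer object might turn a composition string into a fixed-length vector.

```python
import re

# Illustrative subset of standard atomic masses (u); not a complete table.
ATOMIC_MASS = {"H": 1.008, "O": 15.999, "Na": 22.990,
               "Cl": 35.45, "Ti": 47.867, "Fe": 55.845}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Handles only element symbols followed by optional numbers (no parentheses).
    """
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


class MassStatsFeaturizer:
    """Hypothetical featurizer that mimics the feat.featurize(input) pattern."""

    def feature_labels(self):
        return ["mean_atomic_mass", "max_atomic_mass", "n_elements"]

    def featurize(self, formula):
        amounts = parse_formula(formula)
        total = sum(amounts.values())
        # Composition-weighted mean of the atomic masses.
        weighted_mean = sum(ATOMIC_MASS[el] * n for el, n in amounts.items()) / total
        heaviest = max(ATOMIC_MASS[el] for el in amounts)
        return [weighted_mean, heaviest, len(amounts)]

    def featurize_many(self, formulas):
        """Featurize a list of formulas, one feature vector per formula."""
        return [self.featurize(f) for f in formulas]


feat = MassStatsFeaturizer()
X = feat.featurize_many(["NaCl", "Fe2O3", "TiO2"])
for formula, row in zip(["NaCl", "Fe2O3", "TiO2"], X):
    print(formula, dict(zip(feat.feature_labels(), row)))
```

Each row of `X` has the same length and column order, which is what lets such vectors feed directly into a standard ML model.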
  11. How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we get labeled training/test data?
  12. What about data?
      • Typically, a lot of attention is given to advanced algorithms for machine learning
        – e.g., deep neural networks versus standard ML
      • But perhaps there is not enough emphasis on developing the appropriate data sets
        – with enough information to train ML algorithms
        – with sufficient data quality
        – easy enough for anyone to at least get started without specialized knowledge
  13. The importance of data. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
  14. What is ImageNet? The ImageNet data set was collected and hand-labeled (e.g., via Amazon Mechanical Turk). The latest version has over 14 million hand-annotated images, organized into ~20,000 categories.
  15. How data stimulates new algorithms
  16. How data stimulates new algorithms. How can we create an ImageNet for materials science?
  17. An “ImageNet” for materials science
      • We want a test set that contains a diverse array of problems
        – Smaller data versus larger data
        – Different applications (electronic, mechanical, etc.)
        – Composition-only or structure information available
        – Classification or regression
      • We also want a cross-validation metric that gives reliable error estimates
        – i.e., less dependent on specific choice of splits
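One common way to make an error estimate less dependent on a particular split is to average over several reshuffled k-fold splits. The sketch below uses that approach as an assumption; the slides do not say it is the protocol Matbench uses. The data are synthetic, the "model" is a trivial mean predictor, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def kfold_indices(n_samples, n_folds, seed):
    """Shuffle indices with a fixed seed and deal them into n_folds test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def mean_predictor_mae(y, test_idx):
    """MAE of a baseline that predicts the training-set mean for every test sample."""
    test = set(test_idx)
    train_mean = mean(v for i, v in enumerate(y) if i not in test)
    return mean(abs(y[i] - train_mean) for i in test_idx)


def repeated_kfold_mae(y, n_folds=5, n_repeats=10):
    """Average the per-fold MAE over several reshuffled k-fold splits.

    Returns the mean score and its spread (population std. dev.) across repeats.
    """
    per_repeat = []
    for seed in range(n_repeats):
        folds = kfold_indices(len(y), n_folds, seed)
        per_repeat.append(mean(mean_predictor_mae(y, fold) for fold in folds))
    return mean(per_repeat), pstdev(per_repeat)


# Synthetic target values, for illustration only (not Matbench data).
rng = random.Random(0)
y = [rng.gauss(100.0, 20.0) for _ in range(200)]

score, spread = repeated_kfold_mae(y)
print(f"MAE = {score:.2f} (spread across repeats: {spread:.3f})")
```

A small spread across repeats suggests the reported error does not hinge on one lucky or unlucky split. Fixing the seeds keeps the splits reproducible, so different algorithms can be compared on identical folds.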
  18. Overview of Matbench test set

      | Target Property | Data Source | Samples | Method |
      |---|---|---|---|
      | Bulk Modulus | Materials Project | 10,987 | DFT-GGA |
      | Shear Modulus | Materials Project | 10,987 | DFT-GGA |
      | Band Gap | Materials Project | 106,113 | DFT-GGA |
      | Metallicity | Materials Project | 106,113 | DFT-GGA |
      | Band Gap | Zhuo et al. [1] | 6,354 | Experiment |
      | Metallicity | Zhuo et al. [1] | 6,354 | Experiment |
      | Bulk Metallic Glass formation | Landolt-Börnstein | 7,190 | Experiment |
      | Refractive index | Materials Project | 4,764 | DFPT-GGA |
      | Formation Energy | Materials Project | 132,752 | DFT-GGA |
      | Perovskite Formation Energy | Castelli et al. [2] | 18,928 | DFT-GGA |
      | Freq. at Last Phonon PhDOS Peak | Materials Project | 1,296 | DFPT-GGA |
      | Exfoliation Energy | JARVIS-2D | 636 | DFT-vdW-DF |
      | Steel yield strength | Citrine Informatics | 312 | Experiment |

      [1] doi.org/10.1021/acs.jpclett.8b00124
      [2] doi.org/10.1039/C2EE22341D
  19. Diversity of benchmark suite
      • application: mechanical, electronic, stability, optical, thermal
      • data size: <1K, 1K-10K, 10K-100K, >100K
      • problem type: classification, regression
      • data type: experiment (composition only), DFT (structure)
  20. How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we know if our ML model is extraordinary?
  21. How about a benchmark algorithm? Automatminer is a “black box” machine learning model. Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention).
  22. Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties)
      Featurizer: MagPie, SOAP, Sine Coulomb Matrix, + many, many more
      • Dropping features with many errors
      • Missing value imputation
      • One-hot encoding
      • PCA-based
      • Correlation
      • Model-based (tree)
      Uses genetic algorithms to find the best machine learning model + hyperparameters
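Three of the steps listed on slide 22 (dropping features with many errors, missing value imputation, and correlation-based reduction) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not automatminer's implementation: the function names, the thresholds, and the toy feature table are all hypothetical, and failed featurizations are represented as `None`.

```python
from statistics import mean


def drop_sparse_columns(rows, max_missing_fraction=0.5):
    """Drop columns where more than max_missing_fraction of values are None."""
    n_rows = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(r[j] is None for r in rows) / n_rows <= max_missing_fraction]
    return [[r[j] for j in keep] for r in rows], keep


def impute_mean(rows):
    """Replace each None with the mean of the observed values in its column."""
    col_means = [mean(r[j] for r in rows if r[j] is not None)
                 for j in range(len(rows[0]))]
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def drop_correlated_columns(rows, threshold=0.95):
    """Greedily keep a column only if |r| with every already-kept column is <= threshold."""
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return [[r[j] for j in kept] for r in rows], kept


# Toy feature table (hypothetical values); None marks a failed featurization.
rows = [
    [1.0, 2.0, None, 5.0],
    [2.0, 4.0, None, None],
    [3.0, 6.0, None, 6.0],
    [4.0, 8.0, 1.0, 1.0],
]

dense, kept_after_sparse = drop_sparse_columns(rows)   # third column is mostly missing
imputed = impute_mean(dense)                           # fill the remaining gap
reduced, kept = drop_correlated_columns(imputed)       # second column duplicates the first
print(reduced)
```

In this toy table the third column is dropped for being mostly missing, and the second is dropped because it is an exact multiple of the first. The remaining features would then go to the model search step.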
  23. Can actually do apples-to-apples competition between algorithms
  24. If we can get a well-established “benchmark”, perhaps interdisciplinary teams can start hammering on accuracy. A lower barrier to entry in the field means more ideas can be tested from more researchers. (Figure: Matbench test set average error versus time: today, 5 years, 10 years.)
  25. Matminer, matbench, and automatminer can all be accessed, used, and modified by anyone
      Code / Examples all on GitHub
      • github.com/hackingmaterials/matminer
      • github.com/hackingmaterials/matminer_examples
      • github.com/hackingmaterials/automatminer
      Matbench data on Figshare
      • (coming soon, still finalizing)
      Free support via Discourse
      • https://discuss.matsci.org
  26. Outline
      ① Matminer: data and descriptors for producing ML structure-property relationships
      ② Matscholar: applying natural language processing to materials science information retrieval
  27. Goal: collect and organize knowledge embedded in the materials science literature. We have extracted ~2 million abstracts of relevant scientific articles. We use natural language processing algorithms to try to extract knowledge from all this data.
  28. We’ve developed algorithms to automatically tag keywords in the abstracts
  29. Application: a revised materials search engine. Auto-generated summaries of materials based on text mining.
  30. Application: materials compositions of interest … A search for thermoelectrics that do not have Pb or Bi
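The element-exclusion search on slide 30 can be sketched as a filter over composition strings. This is only an illustration of the idea, not Matscholar's search backend: the function names are hypothetical, and the candidate list is a short hand-written set of known thermoelectric compounds rather than text-mined output.

```python
import re


def elements_in(formula):
    """Return the set of element symbols appearing in a chemical formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def exclude_elements(formulas, banned):
    """Keep only the formulas that contain none of the banned elements."""
    banned = set(banned)
    return [f for f in formulas if not (elements_in(f) & banned)]


# Hand-written candidates for illustration (not Matscholar results).
candidates = ["PbTe", "Bi2Te3", "SnSe", "Mg2Si", "CoSb3", "BiCuSeO"]

result = exclude_elements(candidates, {"Pb", "Bi"})
print(result)  # ['SnSe', 'Mg2Si', 'CoSb3']
```

Parsing element symbols rather than matching substrings matters here: a plain substring test for "B" would wrongly reject every Bi-containing formula in a search meant to exclude only boron.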
  31. Using a web site as a “gateway” into the algorithms
      • How do we get more people benefitting from this work and involved in improving it?
      • One solution: expose an easy-to-use web frontend, with links to all the backend codes in case people want to dive further
        – New tools like Plotly Dash make this easier than ever
  32. https://www.matscholar.com (demo 1)
  33. https://www.matscholar.com (demo 2)
  34. Matscholar MRS! https://matscholar-mrs.herokuapp.com
  35. Hopefully these frontend demos get you interested enough to check the “About page”
  36. Concluding thoughts
      • We need more resources to help computer scientists learn about materials science topics through hands-on examples and interactive demos
      • Some things that can help:
        – Open-source implementations of materials science methods
        – Interactive examples (e.g., Jupyter)
        – Documentation and support(!)
        – Labeled data sets
        – Front-ends for easy exploration
  37. Funding acknowledgements. Slides (already) posted to hackingmaterials.lbl.gov
      • Matminer
        – U.S. Department of Energy, Materials Science Division
      • Matscholar
        – Toyota Research Institutes