Open Source Tools for Materials Informatics
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
MRS Fall Meeting 2019
Slides (already) posted to hackingmaterials.lbl.gov
Staffing interdisciplinary research
Machine learning
Materials Science
I find a recurring dilemma and asymmetry in
staffing materials informatics research
Materials Informatics
3
Who has a tougher job to get started?
MS&E major
• Already has a background in the materials science aspects of the project
• But needs to learn the machine learning and software engineering aspects
CS major
• Already has a background in software engineering and appropriate machine learning
• But needs to learn the materials science aspects
5
Who has a tougher job to get started?
MS&E major vs. CS major
My experience is that the CS major typically has the tougher road ahead of them: it is easier to pick up or self-learn random forests and neural networks than phase diagrams and crystal structures.
6
There is an asymmetry in resources available
MS&E major
• Hands-on code and examples to run and modify
• Hundreds of YouTube videos and online courses
• Code reviews from collaborators
• And the standard books, etc.
CS major
• Books and research articles
• Conversations with colleagues, impromptu lectures
• Practice problems? Worked examples? Interactive code?
Outline
7
① Matminer: data and descriptors for producing ML structure-property relationships
② Matscholar – applying natural language processing to materials science information retrieval
8
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
• How can we quickly represent chemistry and structure as vectors?
• How do we get labeled training/test data?
• How do we know if our ML model is extraordinary?
9
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
How can we quickly represent chemistry and structure as vectors?
10
Matminer contains a library of descriptors for various materials science entities
>60 featurizer classes can generate thousands of potential descriptors that are described in the literature
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-learn pipelining
• automatically deploy multiprocessing to parallelize over data
• include citations to methodology papers
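The featurizer pattern on this slide (construct an object, then call featurize on an input) can be tried out without installing anything. The sketch below is a toy stand-in and not matminer's implementation: the class name MeanAtomicMass, the ATOMIC_MASS table, and the featurize_many helper are hypothetical names chosen for illustration, and the only descriptor computed is the mean atomic mass of a composition.

```python
from typing import Dict, List

# Standard atomic masses (u) for a handful of elements; illustration only.
ATOMIC_MASS = {"O": 15.999, "Na": 22.990, "Cl": 35.45, "Fe": 55.845}


class MeanAtomicMass:
    """Toy featurizer (hypothetical): composition -> [mean atomic mass]."""

    def feature_labels(self) -> List[str]:
        # One label per value returned by featurize().
        return ["mean atomic mass"]

    def citations(self) -> List[str]:
        # A real featurizer would list its methodology paper(s) here.
        return []

    def featurize(self, composition: Dict[str, float]) -> List[float]:
        # composition maps element symbol -> number of atoms, e.g. {"Fe": 2, "O": 3}
        total_atoms = sum(composition.values())
        total_mass = sum(ATOMIC_MASS[el] * n for el, n in composition.items())
        return [total_mass / total_atoms]

    def featurize_many(self, compositions: List[Dict[str, float]]) -> List[List[float]]:
        # Serial loop; a production library could parallelize this step.
        return [self.featurize(c) for c in compositions]


feat = MeanAtomicMass()
y = feat.featurize({"Fe": 2, "O": 3})  # Fe2O3
```

Keeping labels and citations next to the featurize method is what lets a descriptor be dropped into a larger pipeline while staying traceable to its source.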
11
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
How do we get labeled training/test data?
12
What about data?
• Typically, a lot of attention is given to advanced algorithms for machine learning
– e.g., deep neural networks versus standard ML
• But perhaps there is not enough emphasis on developing the appropriate data sets
– with enough information to train ML algorithms
– with sufficient data quality
– easy enough for anyone to at least get started without specialized knowledge
The importance of data
13
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
14
What is ImageNet?
The ImageNet data set was collected and hand-labeled (e.g., via Amazon Mechanical Turk). The latest version has over 14 million hand-annotated images, organized into ~20,000 categories.
How data stimulates new algorithms
15
How data stimulates new algorithms
16
How can we create an ImageNet for materials science?
• We want a test set that contains a diverse array of problems
– Smaller data versus larger data
– Different applications (electronic, mechanical, etc.)
– Composition-only or structure information available
– Classification or regression
• We also want a cross-validation metric that gives reliable error estimates
– i.e., less dependent on specific choice of splits
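One common way to make an error estimate less dependent on a specific choice of splits is to repeat K-fold cross-validation with different shuffles and average the result. The slides do not say which scheme Matbench uses, so the sketch below is an assumption for illustration only: the function names are hypothetical, and the "model" is a trivial baseline that predicts the mean of the training targets.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs from one seeded shuffle of range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits


def repeated_kfold_mae(y: Sequence[float], k: int = 5, repeats: int = 10) -> float:
    """Mean absolute error of a predict-the-training-mean baseline,
    averaged over `repeats` differently shuffled K-fold splits."""
    errors = []
    for seed in range(repeats):
        for train, test in kfold_indices(len(y), k, seed):
            prediction = mean(y[i] for i in train)
            errors.extend(abs(y[i] - prediction) for i in test)
    return mean(errors)


targets = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 6.0, 7.5, 8.0, 9.0]
baseline_mae = repeated_kfold_mae(targets, k=5, repeats=10)
```

Swapping the mean baseline for a real model leaves the splitting logic unchanged, which is why the error metric can be fixed once and reused across algorithms.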
17
An “ImageNet” for materials science
18
Overview of Matbench test set
Target Property                   Data Source            Samples    Method
Bulk Modulus                      Materials Project      10,987     DFT-GGA
Shear Modulus                     Materials Project      10,987     DFT-GGA
Band Gap                          Materials Project      106,113    DFT-GGA
Metallicity                       Materials Project      106,113    DFT-GGA
Band Gap                          Zhuo et al. [1]        6,354      Experiment
Metallicity                       Zhuo et al. [1]        6,354      Experiment
Bulk Metallic Glass Formation     Landolt-Börnstein      7,190      Experiment
Refractive Index                  Materials Project      4,764      DFPT-GGA
Formation Energy                  Materials Project      132,752    DFT-GGA
Perovskite Formation Energy       Castelli et al. [2]    18,928     DFT-GGA
Freq. at Last Phonon PhDOS Peak   Materials Project      1,296      DFPT-GGA
Exfoliation Energy                JARVIS-2D              636        DFT-vdW-DF
Steel Yield Strength              Citrine Informatics    312        Experiment
[1] doi.org/10.1021/acs.jpclett.8b00124
[2] doi.org/10.1039/C2EE22341D
Sample-size bins: <1K, 1K-10K, 10K-100K, >100K
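As a quick check on how the test set is spread across the four size bins in the legend, the sample counts from the table can be tallied. This is only a sketch: the short task labels and the handling of bin boundaries (lower bound inclusive) are assumptions made for illustration.

```python
from collections import Counter

# (task label, number of samples) copied from the table above; labels are shorthand.
TASKS = [
    ("bulk modulus", 10987), ("shear modulus", 10987),
    ("band gap (DFT)", 106113), ("metallicity (DFT)", 106113),
    ("band gap (expt)", 6354), ("metallicity (expt)", 6354),
    ("glass formation", 7190), ("refractive index", 4764),
    ("formation energy", 132752), ("perovskite formation energy", 18928),
    ("last phonon peak", 1296), ("exfoliation energy", 636),
    ("steel yield strength", 312),
]


def size_bin(n: int) -> str:
    """Map a sample count to one of the four legend bins."""
    if n < 1_000:
        return "<1K"
    if n < 10_000:
        return "1K-10K"
    if n < 100_000:
        return "10K-100K"
    return ">100K"


bin_counts = Counter(size_bin(n) for _, n in TASKS)
```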
19
Diversity of benchmark suite
• application: mechanical, electronic, stability, optical, thermal
• data size
• problem type: classification, regression
• data type: experiment (composition only), DFT (structure)
20
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
How do we know if our ML model is extraordinary?
21
How about a benchmark algorithm?
Automatminer is a “black box” machine learning model
Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention)
22
Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties)
Featurizer: MagPie, SOAP, Sine Coulomb Matrix, + many, many more
Data cleaning: dropping features with many errors; missing value imputation; one-hot encoding
Feature reduction: PCA-based; correlation; model-based (tree)
Model search: uses genetic algorithms to find the best machine learning model + hyperparameters
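Two of the steps named on this slide, dropping features with many errors and correlation-based feature reduction (with mean imputation in between), are simple enough to sketch in plain Python. This is not automatminer's code: the function names, the 50% missing-value cutoff, and the 0.95 correlation threshold are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Optional


def clean(features: Dict[str, List[Optional[float]]],
          max_missing: float = 0.5) -> Dict[str, List[float]]:
    """Drop columns where more than `max_missing` of the values failed (None),
    then fill the remaining gaps with the column mean."""
    cleaned = {}
    for name, values in features.items():
        present = [v for v in values if v is not None]
        if len(present) / len(values) < 1 - max_missing:
            continue  # too many featurization errors in this column
        fill = mean(present)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 for constant columns."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def drop_correlated(features: Dict[str, List[float]],
                    threshold: float = 0.95) -> Dict[str, List[float]]:
    """Keep a column only if it is not strongly correlated with one already kept."""
    kept: Dict[str, List[float]] = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


raw = {
    "a": [1.0, 2.0, None, 4.0],    # one failed value, imputed
    "b": [2.0, 4.0, 6.0, 8.0],     # nearly proportional to "a"
    "c": [None, None, None, 1.0],  # mostly errors
    "d": [1.0, 0.0, 1.0, 0.0],
}
reduced = drop_correlated(clean(raw))
```

In this toy input, "c" is removed for having too many errors and "b" is removed as redundant with "a", leaving "a" and "d" for the model search stage.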
23
Can actually do an apples-to-apples competition between algorithms
24
If we can get a well-established “benchmark”, perhaps interdisciplinary teams can start hammering on accuracy
[Figure: Matbench test set average error versus time (today, 5 years, 10 years)]
A lower barrier to entry in the field means more ideas can be tested by more researchers
25
Matminer, matbench, and automatminer can all be
accessed, used, and modified by anyone
Code / Examples all on GitHub
• github.com/hackingmaterials/matminer
• github.com/hackingmaterials/matminer_examples
• github.com/hackingmaterials/automatminer
Matbench data on Figshare
• (coming soon, still finalizing)
Free support via Discourse
• https://discuss.matsci.org
Outline
26
① Matminer: data and descriptors for producing ML structure-property relationships
② Matscholar – applying natural language processing to materials science information retrieval
We have extracted ~2 million abstracts of relevant scientific articles
We use natural language processing algorithms to try to extract knowledge from all this data
27
Goal: collect and organize knowledge embedded in the
materials science literature
28
We’ve developed algorithms to automatically tag keywords
in the abstracts
29
Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
30
Application: materials compositions of interest …
A search for thermoelectrics that do not have Pb or Bi
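The exclusion part of such a search (materials that do not contain Pb or Bi) comes down to a set test on the elements of each formula. The sketch below is not how Matscholar implements its search: the function names are hypothetical, the candidate list is a hand-picked illustration rather than text-mined output, and the regular expression only handles plain element symbols.

```python
import re
from typing import Iterable, List, Set

# An element symbol is one uppercase letter optionally followed by one lowercase letter.
ELEMENT = re.compile(r"[A-Z][a-z]?")


def elements_in(formula: str) -> Set[str]:
    """Extract the set of element symbols from a simple chemical formula."""
    return set(ELEMENT.findall(formula))


def exclude_elements(formulas: Iterable[str], banned: Iterable[str]) -> List[str]:
    """Keep only formulas that contain none of the banned elements."""
    banned_set = set(banned)
    return [f for f in formulas if not (elements_in(f) & banned_set)]


# Illustrative candidates only.
candidates = ["PbTe", "Bi2Te3", "SnSe", "Mg2Si", "CoSb3", "BiCuSeO"]
hits = exclude_elements(candidates, ["Pb", "Bi"])
```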
• How do we get more people benefiting from this work and involved in improving it?
• One solution: expose an easy-to-use web frontend, with links to all the backend codes in case people want to dive further
– New tools like Plotly Dash make this easier than ever
31
Using a web site as a “gateway” into the algorithms
frontend
backend
32
https://www.matscholar.com – demo 1
33
https://www.matscholar.com – demo 2
34
Matscholar MRS!
https://matscholar-mrs.herokuapp.com
35
Hopefully these frontend demos get you interested enough to check the “About” page
• We need more resources to help computer scientists learn about materials science topics through hands-on examples and interactive demos
• Some things that can help:
– Open-source implementations of materials science methods
– Interactive examples (e.g., Jupyter)
– Documentation and support(!)
– Labeled data sets
– Front-ends for easy exploration
36
Concluding thoughts
37
Funding acknowledgements
• Matminer
– U.S. Department of Energy, Materials Science Division
• Matscholar
– Toyota Research Institute

Open Source Tools for Materials Informatics

  • 1. Open Source Tools for Materials Informatics Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA MRS Fall Meeting 2019 Slides (already) posted to hackingmaterials.lbl.gov
  • 2. Staffing interdisciplinary research Machine learningMaterials Science I find a recurring dilemma and asymmetry in staffing materials informatics research Materials Informatics
  • 3. 3 Who has a tougher job to get started? MS&E major CS major • Already has background in the material science aspects of the project • But needs to learn the machine learning and software engineering aspects • Already has background in software engineering and appropriate machine learning • But needs to learn the materials science aspects
  • 4. 4 MS&E major CS major My experience is that the CS major typically has the tougher road ahead of them Who has a tougher job to get started?
  • 5. 5 MS&E major CS major My experience is that the CS major typically has the tougher road ahead of them Who has a tougher job to get started? easier to pick up / self-learn random forests & neural networks than phase diagrams & crystal structures
  • 6. 6 There is an asymmetry in resources available MS&E major CS major • Hands-on code and examples to run and modify • Hundreds of Youtube videos and online courses • Code reviews from collaborators • And the standard books, etc. • Books and research articles • Conversations with colleagues, impromptu lectures • Practice problems? Worked examples? Interactive code?
  • 7. Outline: ① Matminer: data and descriptors for producing ML structure-property relationships. ② Matscholar: applying natural language processing to materials science information retrieval.
  • 8. How can we make it easy to develop and test ML models for composition-structure-property relationships? How can we quickly represent chemistry and structure as vectors? How do we get labeled training/test data? How do we know if our ML model is extraordinary?
  • 9. How can we make it easy to develop and test ML models for composition-structure-property relationships? How can we quickly represent chemistry and structure as vectors?
  • 10. Matminer contains a library of descriptors for various materials science entities. More than 60 featurizer classes can generate thousands of potential descriptors that are described in the literature. Usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data]). Featurizers are compatible with scikit-learn pipelining, automatically deploy multiprocessing to parallelize over data, and include citations to methodology papers.
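The construct-then-featurize pattern on this slide can be illustrated with a toy class. This is only a minimal sketch in plain Python, not matminer code: the class name `ElementFractions`, its `feature_labels` method, and the formula parser are assumptions made for illustration, and the parser handles only simple formulas without parentheses.

```python
import re
from collections import Counter


class ElementFractions:
    """Toy featurizer (hypothetical, not part of matminer).

    Maps a simple formula string such as "Fe2O3" to the atomic
    fraction of each element in a fixed list.
    """

    def __init__(self, elements):
        self.elements = list(elements)

    def feature_labels(self):
        # One label per output column, in the same order as featurize().
        return [f"frac_{el}" for el in self.elements]

    def featurize(self, formula):
        # Count atoms per element; a missing amount means one atom.
        counts = Counter()
        for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
            counts[symbol] += float(amount) if amount else 1.0
        total = sum(counts.values())
        return [counts[el] / total for el in self.elements]


# Same two-step call pattern as on the slide: configure, then featurize.
feat = ElementFractions(["Fe", "O", "Ti"])
y = feat.featurize("Fe2O3")
```

Keeping the options in the constructor and the data in `featurize` is what lets one configured object be reused across a whole data set, which is presumably also what makes pipelining and parallelization straightforward.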
  • 11. How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we get labeled training/test data?
  • 12. What about data? Typically, a lot of attention is given to advanced algorithms for machine learning (e.g., deep neural networks versus standard ML). But perhaps there is not enough emphasis on developing the appropriate data sets: with enough information to train ML algorithms; with sufficient data quality; easy enough for anyone to at least get started without specialized knowledge.
  • 13. The importance of data: https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
  • 14. What is ImageNet? The ImageNet data set was collected and hand-labeled (e.g., via Amazon Mechanical Turk). The latest version has over 14 million hand-annotated images, organized into ~20,000 categories.
  • 15. How data stimulates new algorithms
  • 16. How data stimulates new algorithms. How can we create an ImageNet for materials science?
  • 17. An “ImageNet” for materials science. We want a test set that contains a diverse array of problems: smaller data versus larger data; different applications (electronic, mechanical, etc.); composition-only or structure information available; classification or regression. We also want a cross-validation metric that gives reliable error estimates, i.e., one that is less dependent on the specific choice of splits.
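One common way to make an error estimate less sensitive to a particular split is to repeat K-fold cross-validation with several shuffles and average the results. The sketch below assumes that approach for illustration only; the slide does not say which protocol is used. The function names, the predict-the-training-mean baseline, and the target values are all made up for the example.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_splits, seed):
    """Shuffle indices with a fixed seed and deal them into n_splits folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def repeated_kfold_mae(targets, n_splits=5, n_repeats=3):
    """Mean absolute error of a trivial baseline over repeated K-fold splits.

    The "model" simply predicts the mean of the training targets; a real
    study would fit an actual regressor at this point.
    """
    fold_errors = []
    for repeat in range(n_repeats):
        # A different seed per repeat gives a different shuffle of the data.
        for test_idx in kfold_indices(len(targets), n_splits, seed=repeat):
            held_out = set(test_idx)
            train = [t for i, t in enumerate(targets) if i not in held_out]
            prediction = mean(train)
            errors = [abs(targets[i] - prediction) for i in test_idx]
            fold_errors.append(mean(errors))
    return mean(fold_errors), fold_errors


# Made-up target values, for illustration only.
targets = [0.5, 1.1, 1.8, 2.4, 3.0, 0.0, 4.2, 1.3, 2.2, 0.9]
score, per_fold = repeated_kfold_mae(targets)
```

The spread of `per_fold` shows how much a single split can move the estimate, while `score` averages that variation out. Fixing the seeds keeps the splits identical for every algorithm being compared.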
  • 18. Overview of Matbench test set
      Target Property | Data Source | Samples | Method
      Bulk Modulus | Materials Project | 10,987 | DFT-GGA
      Shear Modulus | Materials Project | 10,987 | DFT-GGA
      Band Gap | Materials Project | 106,113 | DFT-GGA
      Metallicity | Materials Project | 106,113 | DFT-GGA
      Band Gap | Zhuo et al. [1] | 6,354 | Experiment
      Metallicity | Zhuo et al. [1] | 6,354 | Experiment
      Bulk Metallic Glass formation | Landolt-Börnstein | 7,190 | Experiment
      Refractive index | Materials Project | 4,764 | DFPT-GGA
      Formation Energy | Materials Project | 132,752 | DFT-GGA
      Perovskite Formation Energy | Castelli et al. [2] | 18,928 | DFT-GGA
      Freq. at Last Phonon PhDOS Peak | Materials Project | 1,296 | DFPT-GGA
      Exfoliation Energy | JARVIS-2D | 636 | DFT-vdW-DF
      Steel yield strength | Citrine Informatics | 312 | Experiment
      [1] doi.org/10.1021/acs.jpclett.8b00124
      [2] doi.org/10.1039/C2EE22341D
  • 19. Diversity of benchmark suite. Application: mechanical, electronic, stability, optical, thermal. Data size: <1K, 1K-10K, 10K-100K, >100K. Problem type: classification, regression. Data type: experiment (composition only), DFT (structure).
  • 20. How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we know if our ML model is extraordinary?
  • 21. How about a benchmark algorithm? Automatminer is a “black box” machine learning model. Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention).
  • 22. Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties). Featurizers: MagPie, SOAP, Sine Coulomb Matrix, and many, many more. Cleaning: dropping features with many errors; missing value imputation; one-hot encoding. Feature reduction: PCA-based; correlation; model-based (tree). Uses genetic algorithms to find the best machine learning model + hyperparameters.
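Three of the steps listed on this slide (dropping features with many errors, missing value imputation, and correlation-based reduction) can be sketched in plain Python. This is an illustration under stated assumptions, not automatminer's implementation: the function names, the thresholds `max_missing` and `max_corr`, mean imputation as the fill strategy, and the feature values are all hypothetical. One-hot encoding, PCA, tree-based selection, and the genetic model search are left out.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def clean_and_reduce(columns, max_missing=0.5, max_corr=0.95):
    """columns maps feature name -> list of floats, with None for a failed featurization."""
    kept = {}
    for name, values in columns.items():
        # Step 1: drop features where too many samples failed to featurize.
        missing = sum(v is None for v in values) / len(values)
        if missing > max_missing:
            continue
        # Step 2: fill the remaining gaps with the column mean.
        fill = mean(v for v in values if v is not None)
        imputed = [fill if v is None else v for v in values]
        # Step 3: skip features nearly collinear with one already kept.
        if any(abs(pearson(imputed, other)) > max_corr for other in kept.values()):
            continue
        kept[name] = imputed
    return kept


# Made-up feature table, for illustration only.
features = {
    "mean_atomic_mass": [10.0, 20.0, 30.0, 40.0],
    "mass_copy": [20.0, 40.0, 60.0, 80.0],
    "mostly_failed": [None, None, None, 1.0],
    "electronegativity_range": [1.5, None, 0.2, 0.9],
}
reduced = clean_and_reduce(features)
```

Because the correlation check only compares against features already kept, the result depends on column order. That is a simplification of this sketch, not a claim about how automatminer resolves correlated pairs.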
  • 23. Can actually do an apples-to-apples competition between algorithms
  • 24. If we can get a well-established “benchmark”, perhaps interdisciplinary teams can start hammering on accuracy. A lower barrier to entry in the field means more ideas can be tested from more researchers. [Chart: Matbench test set average error versus time (today, 5 years, 10 years)]
  • 25. Matminer, matbench, and automatminer can all be accessed, used, and modified by anyone. Code and examples are all on GitHub: github.com/hackingmaterials/matminer, github.com/hackingmaterials/matminer_examples, github.com/hackingmaterials/automatminer. Matbench data on Figshare: (coming soon, still finalizing). Free support via Discourse: https://discuss.matsci.org
  • 26. Outline: ① Matminer: data and descriptors for producing ML structure-property relationships. ② Matscholar: applying natural language processing to materials science information retrieval.
  • 27. Goal: collect and organize knowledge embedded in the materials science literature. We have extracted ~2 million abstracts of relevant scientific articles. We use natural language processing algorithms to try to extract knowledge from all this data.
  • 28. We’ve developed algorithms to automatically tag keywords in the abstracts
  • 29. Application: a revised materials search engine. Auto-generated summaries of materials based on text mining
  • 30. Application: materials compositions of interest … A search for thermoelectrics that do not have Pb or Bi
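The element-exclusion part of such a search can be sketched as a filter over formula strings. This is a toy illustration, not the Matscholar backend: the helper names and the candidate list are assumptions (the list is hand-written with well-known thermoelectric compounds, not search output), and the parser handles only simple formulas.

```python
import re


def elements_in(formula):
    """Return the set of element symbols in a simple formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def exclude_elements(formulas, banned):
    """Keep only the formulas that contain none of the banned elements."""
    banned = set(banned)
    return [f for f in formulas if not (elements_in(f) & banned)]


# Hand-written candidate list, for illustration only.
candidates = ["PbTe", "Bi2Te3", "SnSe", "Mg2Si", "CoSb3"]
hits = exclude_elements(candidates, banned={"Pb", "Bi"})
# hits == ["SnSe", "Mg2Si", "CoSb3"]
```

Matching on parsed element symbols rather than raw substrings avoids false hits, for example a substring search for "B" would also match "Bi".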
  • 31. Using a web site as a “gateway” into the algorithms (frontend, backend). How do we get more people benefitting from this work and involved in improving it? One solution: expose an easy-to-use web frontend, with links to all the backend codes in case people want to dive further. New tools like Plotly Dash make this easier than ever.
  • 35. Hopefully these frontend demos get you interested enough to check the “About page”
  • 36. Concluding thoughts. We need more resources to help computer scientists learn about materials science topics through hands-on examples and interactive demos. Some things that can help: open-source implementations of materials science methods; interactive examples (e.g., Jupyter); documentation and support(!); labeled data sets; front-ends for easy exploration.
  • 37. Funding acknowledgements. Matminer: U.S. Department of Energy, Materials Science Division. Matscholar: Toyota Research Institutes.