Evaluating Machine Learning Algorithms
for Materials Science using the
Matbench Protocol
Anubhav Jain
Staff Scientist, Lawrence Berkeley National Laboratory
Deputy Director, Materials Project
materialsproject.org
The Materials Project
Slides (already) uploaded to https://hackingmaterials.lbl.gov
Outline of talk
1. A quick introduction to the Materials Project
2. Engaging the community: The MPContribs data platform
3. Benchmarking machine learning algorithms using the Matbench protocol
A quick introduction to the
Materials Project
The core of Materials Project is a free database of
calculated materials properties and crystal structures
Free, public resource
• www.materialsproject.org
Data on ~150,000 materials,
including information on:
• electronic structure
• phonon and thermal
properties
• elastic / mechanical properties
• magnetic properties
• ferroelectric properties
• piezoelectric properties
• dielectric properties
Powered by hundreds of millions of CPU-hours invested into high-quality calculations
The core data set keeps growing with time …
Apps give insight into data
Materials Explorer
Phase Stability Diagrams
Pourbaix Diagrams
(Aqueous Stability)
Battery Explorer
The code powering the Materials Project is
available open source (BSD/MIT licenses)
• just-in-time error correction, fixing your calculations so you don’t have to
• ‘recipes’ for common materials science simulation tasks
• making materials science web apps easy
• workflow management software for high-throughput computing
• materials science analysis code: make, transform and analyze crystals, phase diagrams and more
& more … MP team members also contribute to several other non-MP codes, e.g. matminer for machine learning featurization
Example: calculation workflows implemented by dozens of collaborators
• Phonons
• Elasticity
• Defects
• Magnetism
• Band Structures
• Stability
• Grain Boundaries
• Equations of State
• X-ray Absorption Spectra
• Piezoelectric
• Dielectric
• Surfaces
• & more …
Requirements: VASP license and a big computer
ABINIT planned in future w/ G.-M. Rignanese
Example 2: matminer allows researchers to generate
diverse feature sets for machine learning
>60 featurizer classes can generate thousands of potential descriptors that are described in the literature
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-learn pipelining
• automatically deploy multiprocessing to parallelize over data
• include citations to methodology papers
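The featurizer pattern on this slide (construct a featurizer, then call featurize on an input) can be illustrated with a toy class. This is only a sketch: `ElementCountFeaturizer` and its regex-based formula parsing are hypothetical and are not part of matminer; the method names `featurize`, `feature_labels`, and `citations` are chosen to mirror the interface described above.

```python
import re
from typing import Dict, List


class ElementCountFeaturizer:
    """Toy featurizer (hypothetical, not a matminer class)."""

    # Matches an element symbol followed by an optional integer count, e.g. "Fe2"
    _TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

    def featurize(self, formula: str) -> List[float]:
        """Return [number of distinct elements, total atoms in the formula]."""
        counts: Dict[str, int] = {}
        for symbol, number in self._TOKEN.findall(formula):
            counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
        return [float(len(counts)), float(sum(counts.values()))]

    def feature_labels(self) -> List[str]:
        """Names of the values returned by featurize, in the same order."""
        return ["n_elements", "n_atoms"]

    def citations(self) -> List[str]:
        """A real featurizer would point to its methodology paper here."""
        return []


feat = ElementCountFeaturizer()
features = [feat.featurize(f) for f in ["Fe2O3", "NaCl", "SrTiO3"]]
```

Each row of `features` lines up with `feat.feature_labels()`, which is what makes this kind of object easy to drop into a tabular ML pipeline.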
The Materials Project is used heavily by the research
community
> 180,000 registered users
> 40,000 new users last year
~100 new registrations/day
~5,000-10,000 users log on every day
> 2M records downloaded through API each day; 1.8 TB of data served per month
A large fraction of users are from industry
• Student: 44%
• Academia: 36%
• Industry: 10%
• Government: 5%
• Other: 5%
3.5%
Schrödinger: “Many of our customers are active users of the Materials Project and use MP databases for their projects. Enabling direct access to MP databases from within Schrödinger software is a powerful addition that will be appreciated by our users.”
Toyota: “Materials Project is a wonderful project. Please accept my appreciation to you to release it free and easy to access.”
Hazen Research: “Amazing and well done data base. I still remember searching Landolt-Börnstein series during my PhD for similar things.”
Engaging the community:
the MPContribs data platform
How can we use Materials Project to build a
community of materials researchers?
Materials Project now has
high visibility (e.g., by search
engines)
How can we use this
platform to help add value to
the community of materials
researchers?
Beyond calculations: MPContribs allows the research
community to contribute their own data
A “materials detail page,”
containing all the information MP
has calculated about a specific
material
Experimental data on a
material (either specific
phase, composition, or
chemical system)
“MPContribs” bridges
the gap
From Google search to your data and your research, via MP
1. Google links to Materials Project page
2. Materials Project links to your contribution
3. Your data set and paper are linked
MPContribs is open for contributions
You can now apply to contribute
your data set and we will work
with you to disseminate via MP
Designed for:
• smaller data sets (e.g., MBs to
GBs); for large data files see
NOMAD or other repos
• Linking to MP compositions
Available via mpcontribs.org
Benchmarking machine learning
methods using the Matbench protocol
MP is now involved in an effort to benchmark
various machine learning algorithms
Without standardized benchmarks, ML models can be difficult to compare
[Figure: Model 1 + Dataset 1 vs. Model 2 + Dataset 2 vs. Model 3 + Dataset 3]
Reported scores: RMSE (test set) = 0.05 eV vs. MAE (5-fold CV) = 0.021 eV vs. validation loss = 0.005
Dataset annotations: no structures; no AB2C3 compositions; 4k samples; structures available; 100k samples; E above hull < 0.050 eV
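To see why scores reported with different metrics cannot be compared directly, the sketch below computes MAE and RMSE for one and the same set of predictions. The numbers are made up for illustration and are not taken from any of the models on the slide.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Illustrative values only (eV); one prediction has a large error.
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.10, 0.20, 0.30, 0.80]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(f"MAE  = {mae_value:.3f} eV")
print(f"RMSE = {rmse_value:.3f} eV")
```

Here the same predictions give an RMSE that is twice the MAE, so a model reported by RMSE on one data set and another reported by MAE on a different data set cannot be ranked from the numbers alone.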
What’s needed –
an “ImageNet” for materials science
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
Can we make the same
advancements in materials
as in computer vision?
One of the reasons computer science
/ machine learning seems to advance
so quickly is that they decouple
data generation from algorithm
development
This allows groups to focus on
algorithm development without all
the data generation, data cleaning,
etc. that often is the majority of an
end-to-end data science project
Clear comparisons also move the field forward and measure progress
The ingredients of the Matbench
benchmark
☐ Standard data sets
☐ Standard test splits according to nested cross-validation procedure
☐ An online leaderboard that encourages reproducible results
Matbench includes 13 different ML tasks
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer
Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
The tasks encompass a variety of problems
13 Ready-to-use ML tasks ranging in training size, target property, inputs, task type.
• Pre-cleaned datasets from literature and
online repositories (such as Materials Project)
• Wide range of practical solid state ML tasks
• Experimental and computed properties
• Standardized error evaluation (nested CV)
Browse datasets and tasks with Materials Project MPContribsML
https://ml.materialsproject.org
The ingredients of the Matbench
benchmark
✓ Standard data sets
☐ Standard test splits according to nested cross-validation procedure
☐ An online leaderboard that encourages reproducible results
Most commonly used test split procedure
• Training/validation is used for model selection
• Test / hold-out is used only for error estimation
(The test set should not inform model selection; it is reserved for the “final answer”)
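A minimal sketch of the hold-out procedure described above, using only the standard library. The function name `holdout_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple


def holdout_split(
    n_samples: int, val_frac: float = 0.2, test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices and split them into train / validation / test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test = indices[:n_test]                 # used once, for error estimation only
    val = indices[n_test:n_test + n_val]    # used for model selection
    train = indices[n_test + n_val:]        # used for fitting
    return train, val, test


train_idx, val_idx, test_idx = holdout_split(100)
```

The test indices are set aside before any model selection happens, which is the property the slide emphasizes.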
Nested CV: like hold-out, but varies the hold-out set
Think of it as N different “universes”: we have a different training
of the model in each universe and a different hold-out.
“A nested CV procedure provides an almost unbiased estimate of the true error.”
Varma and Simon, Bias in error estimation when using cross-validation for model
selection (2006)
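The nested procedure can be sketched generically as an outer loop for error estimation wrapped around an inner loop for model selection. Everything specific here is an assumption for illustration: the toy 1D k-nearest-neighbour model, the candidate values of k, the fold counts, and the synthetic data. Matbench's own splits are defined by the benchmark itself, not by this code.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs that together cover all n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, f in enumerate(folds) if m != i for j in f], folds[i])
        for i in range(k)
    ]


def knn_predict(train_x: Sequence[float], train_y: Sequence[float], x: float, k: int) -> float:
    """Average the targets of the k nearest training points (1D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)


def fold_mae(xs, ys, train: List[int], test: List[int], k: int) -> float:
    """MAE on `test` for a k-NN model fitted on `train` (both are index lists)."""
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    return mean(abs(knn_predict(tx, ty, xs[i], k) - ys[i]) for i in test)


def nested_cv(xs, ys, candidates=(1, 3, 5), outer_k=5, inner_k=4, seed=0) -> List[float]:
    """Return one held-out MAE per outer fold."""
    outer_scores = []
    for outer_train, outer_test in k_fold_indices(len(xs), outer_k, seed):
        # Inner loop: choose k using only the outer-training data.
        def inner_score(k: int) -> float:
            scores = []
            for tr, va in k_fold_indices(len(outer_train), inner_k, seed + 1):
                tr_idx = [outer_train[i] for i in tr]
                va_idx = [outer_train[i] for i in va]
                scores.append(fold_mae(xs, ys, tr_idx, va_idx, k))
            return mean(scores)

        best_k = min(candidates, key=inner_score)
        # Outer loop: the hold-out fold is touched once, for error estimation.
        outer_scores.append(fold_mae(xs, ys, outer_train, outer_test, best_k))
    return outer_scores


rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
fold_maes = nested_cv(xs, ys)
```

Each entry of `fold_maes` corresponds to one "universe": a separate model selection and training run, scored on a hold-out set that played no part in either.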
The ingredients of the Matbench
benchmark
✓ Standard data sets
✓ Standard test splits according to nested cross-validation procedure
☐ An online leaderboard that encourages reproducible results
Matbench has an online leaderboard – matbench.materialsproject.org
Complete and reproducible results on standardized ML tasks
Sample-by-sample predictions of all
algorithms on all tasks, notebooks and
scripts for reproduction
Aggregate scores across nested CV folds
Complete model metadata,
hyperparameters, required compute,
academic references
.json .ipynb .py
Algorithm comparison across individual tasks OR complete benchmark
Example: matbench_dielectric
Compare both specialized and general-purpose
algorithms across multiple error metrics
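Aggregating scores across folds, as mentioned above, amounts to summarizing one score per fold. The sketch below shows one plausible way to do it; the function name, the choice of summary statistics, and the per-fold values are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean, pstdev
from typing import Dict, List


def aggregate_fold_scores(fold_scores: List[float]) -> Dict[str, float]:
    """Summarize per-fold errors with mean, spread, and extremes."""
    return {
        "mean": mean(fold_scores),
        "std": pstdev(fold_scores),  # population standard deviation over the folds
        "min": min(fold_scores),
        "max": max(fold_scores),
    }


# Hypothetical per-fold MAE values for a single task
per_fold_mae = [0.30, 0.32, 0.28, 0.31, 0.29]
summary = aggregate_fold_scores(per_fold_mae)
```

Reporting the spread alongside the mean helps show whether a difference between two algorithms is larger than the fold-to-fold variation.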
Evaluation of ML paradigms drives research and development
Traditional paradigms:
• Traditional models (e.g., RF + MagPie [1] features)
• AutoML inside “traditional ML” space (Automatminer)
Advancements in deep neural networks:
• Attention networks (e.g., CRABNet [2])
• Optimal descriptor networks (e.g., MODNet [3])
• Crystal graph networks (e.g., CGCNN, MEGNet [4])
References:
1. doi.org/10.1038/npjcompumats.2016.28
2. doi.org/10.1038/s41524-021-00545-1
3. doi.org/10.1038/s41524-021-00552-2
4. doi.org/10.1021/acs.chemmater.9b01294
Matbench compares these ML model paradigms
✓ - in Matbench
✓ - in Matbench
✓ - in Matbench
✓ - CGCNN in Matbench
✓ - MEGNet in progress
✓ - PR in review
Contribute your model to the body of knowledge
Matbench Python package
Evaluate an entire benchmark with ~10 lines of code
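$: pip install matbench
from matbench.bench import MatbenchBenchmark

# my_model: your own model object with train_and_validate() and predict()
mb = MatbenchBenchmark(autoload=False)
for task in mb.tasks:
    task.load()
    for fold in task.folds:
        # Train and tune using only this fold's training + validation data
        train_inputs, train_outputs = task.get_train_and_val_data(fold)
        my_model.train_and_validate(train_inputs, train_outputs)
        # Predict on the held-out test inputs (targets withheld)
        test_inputs = task.get_test_data(fold, include_target=False)
        predictions = my_model.predict(test_inputs)
        task.record(fold, predictions)
# Save all recorded predictions to a file
mb.to_file("my_models_benchmark.json.gz")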
Your model needs to have:
• a function that trains it based on training data
• a function that makes a prediction based on the trained model
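The two requirements above can be met by a very small object. The sketch below is a hypothetical baseline, not a model from the leaderboard: `MeanBaselineModel` is an invented name, and only the method names `train_and_validate` and `predict` are taken from the benchmark loop shown on this slide.

```python
from statistics import mean
from typing import List, Optional, Sequence


class MeanBaselineModel:
    """Hypothetical baseline: always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self._mean: Optional[float] = None

    def train_and_validate(self, train_inputs: Sequence, train_outputs: Sequence[float]) -> None:
        """'Train' by memorizing the mean target; the inputs are ignored."""
        self._mean = mean(train_outputs)

    def predict(self, test_inputs: Sequence) -> List[float]:
        """Return the stored mean once for every test input."""
        if self._mean is None:
            raise RuntimeError("call train_and_validate() before predict()")
        return [self._mean for _ in test_inputs]


my_model = MeanBaselineModel()
my_model.train_and_validate(["Fe2O3", "NaCl", "SrTiO3"], [1.0, 2.0, 3.0])
predictions = my_model.predict(["MgO", "TiO2"])
```

A trivial baseline like this is mainly useful as a sanity check: any real model should beat it on every fold.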
Submit model file along with your desired model metadata via GitHub PR
The ingredients of the Matbench
benchmark
✓ Standard data sets
✓ Standard test splits according to nested cross-validation procedure
✓ An online leaderboard that encourages reproducible results
Results so far: graph neural networks for large data sets, conventional ML for small data sets
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer
Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
Overall and upcoming goals for
Matbench
• We have introduced a method that allows researchers to evaluate their machine learning models on a standard benchmark, if they so choose
• The “Matbench” resource also provides metadata and code examples that allow others to reproduce and use community ML models more easily, as well as discover new ML models
• In the future, we hope to expand the types of tasks, perform meta-analyses on what kinds of algorithms work best for certain problems, and plot progress on these tasks over time
Concluding thoughts
The Materials Project is a free resource providing data and tools to
help perform research and development of new materials
Even more can be accomplished as a unified community to push
forward data dissemination as well as the capabilities of machine
learning
We encourage you to give Matbench a try, and look forward to
seeing your algorithm on the leaderboard!
Thank you!
The team
Kristin Persson, MP Director
Patrick Huck, Staff Scientist (MPContribs)
Alex Dunn, Grad Student (Matbench / matminer)
Slides (already) uploaded to https://hackingmaterials.lbl.gov

Weitere ähnliche Inhalte

Was ist angesagt?

Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsAnubhav Jain
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Anubhav Jain
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryIan Foster
 
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...Punit Sharnagat
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesJason Hattrick-Simpers
 
Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Anubhav Jain
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsAnubhav Jain
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material Designaimsnist
 
Natural Language Processing for Materials Design - What Can We Extract From t...
Natural Language Processing for Materials Design - What Can We Extract From t...Natural Language Processing for Materials Design - What Can We Extract From t...
Natural Language Processing for Materials Design - What Can We Extract From t...Anubhav Jain
 
Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Anubhav Jain
 
Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Anubhav Jain
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsAnubhav Jain
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningAnubhav Jain
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliersaimsnist
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...aimsnist
 
Accelerating materials design through natural language processing
Accelerating materials design through natural language processingAccelerating materials design through natural language processing
Accelerating materials design through natural language processingAnubhav Jain
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML modelaimsnist
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...aimsnist
 

Was ist angesagt? (20)

Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
 
Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and Analytics
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material Design
 
Natural Language Processing for Materials Design - What Can We Extract From t...
Natural Language Processing for Materials Design - What Can We Extract From t...Natural Language Processing for Materials Design - What Can We Extract From t...
Natural Language Processing for Materials Design - What Can We Extract From t...
 
Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...
 
Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data sets
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and Python
 
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
Failing Fastest: What an Effective HTE and ML Workflow Enables for Functional...
 
Accelerating materials design through natural language processing
Accelerating materials design through natural language processingAccelerating materials design through natural language processing
Accelerating materials design through natural language processing
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
 

Ähnlich wie Evaluating Machine Learning Algorithms for Materials Science using the Matbench Protocol

Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Anubhav Jain
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Anubhav Jain
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAnubhav Jain
 
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Anubhav Jain
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1Bill Liu
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructureAnubhav Jain
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
Презентация проекта ООО "Лаборатория Кинтех"
Презентация проекта ООО "Лаборатория Кинтех"Презентация проекта ООО "Лаборатория Кинтех"
Презентация проекта ООО "Лаборатория Кинтех"Ivan Zaev
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Joachim Schlosser
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
Scientific
Scientific Scientific
Scientific marpierc
 
Clustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining TechniquesClustering of Big Data Using Different Data-Mining Techniques
Clustering of Big Data Using Different Data-Mining TechniquesIRJET Journal
 
Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biologyNeil Swainston
 
Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nybergdiannepatricia
 

Ähnlich wie Evaluating Machine Learning Algorithms for Materials Science using the Matbench Protocol (20)

Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
 
Evaluating Machine Learning Algorithms for Materials Science using the Matbench Protocol

  • 1. Evaluating Machine Learning Algorithms for Materials Science using the Matbench Protocol. Anubhav Jain, Staff Scientist, Lawrence Berkeley National Laboratory; Deputy Director, Materials Project (materialsproject.org). Slides (already) uploaded to https://hackingmaterials.lbl.gov
  • 2. Outline of talk: (1) A quick introduction to the Materials Project; (2) Engaging the community: the MPContribs data platform; (3) Benchmarking machine learning algorithms using the Matbench protocol
  • 3. A quick introduction to the Materials Project
  • 4. The core of Materials Project is a free database of calculated materials properties and crystal structures. Free, public resource: www.materialsproject.org. Data on ~150,000 materials, including information on: electronic structure; phonon and thermal properties; elastic / mechanical properties; magnetic properties; ferroelectric properties; piezoelectric properties; dielectric properties. Powered by hundreds of millions of CPU-hours invested into high-quality calculations.
  • 5. The core data set keeps growing with time …
  • 6. Apps give insight into data: Materials Explorer; Phase Stability Diagrams; Pourbaix Diagrams (Aqueous Stability); Battery Explorer
  • 7. The code powering the Materials Project is available open source (BSD/MIT licenses): just-in-time error correction, fixing your calculations so you don’t have to; “recipes” for common materials science simulation tasks; making materials science web apps easy; workflow management software for high-throughput computing; materials science analysis code (make, transform and analyze crystals, phase diagrams and more); & more … MP team members also contribute to several other non-MP codes, e.g. matminer for machine learning featurization.
  • 8. Example: calculation workflows implemented by dozens of collaborators: Phonons; Elasticity; Defects; Magnetism; Band Structures; Stability; Grain Boundaries; Equations of State; X-ray Absorption Spectra; Piezoelectric; Dielectric; Surfaces; & more … Requirements: VASP license and a big computer. ABINIT planned in future w/ G.-M. Rignanese.
  • 9. Example 2: matminer allows researchers to generate diverse feature sets for machine learning. >60 featurizer classes can generate thousands of potential descriptors that are described in the literature. Usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data]). Featurizers: • are compatible with scikit-learn pipelining • automatically deploy multiprocessing to parallelize over data • include citations to methodology papers
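The construct-then-featurize pattern on this slide can be illustrated with a toy class. This is a minimal sketch in plain Python and is not matminer code: the class name `MeanAtomicNumber`, the small `ATOMIC_NUMBERS` table, and the dict-based composition input are all assumptions made for illustration. Only the method names `featurize`, `feature_labels`, and `citations` are chosen to mirror the style of matminer featurizers.

```python
# Toy featurizer sketch (hypothetical; not part of matminer).
ATOMIC_NUMBERS = {"H": 1, "O": 8, "Na": 11, "Cl": 17, "Fe": 26}


class MeanAtomicNumber:
    """Turn a composition into two numbers: mean and max atomic number."""

    def featurize(self, composition):
        # composition: dict of element symbol -> amount, e.g. {"Fe": 2, "O": 3}
        total = sum(composition.values())
        weighted = sum(ATOMIC_NUMBERS[el] * amt for el, amt in composition.items())
        max_z = max(ATOMIC_NUMBERS[el] for el in composition)
        return [weighted / total, float(max_z)]

    def feature_labels(self):
        # One label per value returned by featurize, in the same order.
        return ["mean atomic number", "max atomic number"]

    def citations(self):
        # A real featurizer would list its methodology papers here.
        return []


feat = MeanAtomicNumber()
fe2o3 = feat.featurize({"Fe": 2, "O": 3})
labels = feat.feature_labels()
```

Each featurizer returns a fixed-length list of numbers with matching labels, which is what lets many featurizers be concatenated into one feature table for a downstream ML model.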
  • 10. The Materials Project is used heavily by the research community: > 180,000 registered users; > 40,000 new users last year; ~100 new registrations/day; ~5,000-10,000 users log on every day; > 2M records downloaded through the API each day; 1.8 TB of data served per month.
  • 11. A large fraction of users are from industry. User breakdown: Student 44%, Academia 36%, Industry 10%, Government 5%, Other 5%, 3.5% (unlabeled). Schrödinger: “Many of our customers are active users of the Materials Project and use MP databases for their projects. Enabling direct access to MP databases from within Schrödinger software is a powerful addition that will be appreciated by our users.” Toyota: “Materials Project is a wonderful project. Please accept my appreciation to you to release it free and easy to access.” Hazen Research: “Amazing and well done data base. I still remember searching Landolt-Börnstein series during my PhD for similar things.”
  • 12. Engaging the community: the MPContribs data platform
  • 13. How can we use Materials Project to build a community of materials researchers? Materials Project now has high visibility (e.g., by search engines). How can we use this platform to help add value to the community of materials researchers?
  • 14. Beyond calculations: MPContribs allows the research community to contribute their own data. On one side: a “materials detail page,” containing all the information MP has calculated about a specific material. On the other: experimental data on a material (either specific phase, composition, or chemical system). “MPContribs” bridges the gap.
  • 15. From Google search to your data and your research, via MP: 1. Google links to Materials Project page 2. Materials Project links to your contribution 3. Your data set and paper are linked
  • 16. MPContribs is open for contributions. You can now apply to contribute your data set and we will work with you to disseminate it via MP. Designed for: • smaller data sets (e.g., MBs to GBs); for large data files see NOMAD or other repos • linking to MP compositions. Available via mpcontribs.org
  • 17. Benchmarking machine learning methods using the Matbench protocol
  • 18. MP is now involved in an effort to benchmark various machine learning algorithms
  • 19. Without standardized benchmarks, ML models can be difficult to compare. [Figure: Model 1 + Dataset 1 vs. Model 2 + Dataset 2 vs. Model 3 + Dataset 3. Reported scores use different metrics: RMSE (test set) = 0.05 eV; MAE (5-fold CV) = 0.021 eV; Val. loss = 0.005. The data sets also differ, with annotations such as “no structures”, “no AB2C3 compositions”, “4k samples”, “structures avail.”, “100k samples”, and “E above hull < 0.050 eV”. Which model is better? ???]
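One reason the numbers on this slide cannot be compared directly is that RMSE and MAE generally differ even for the same predictions. The following sketch shows this with made-up values; the data are purely illustrative and not taken from any model or data set in the talk.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up energies in eV (illustrative only): three small errors, one large.
y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [0.1, -0.1, 0.1, -0.5]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(f"MAE = {mae_value:.3f} eV, RMSE = {rmse_value:.3f} eV")
```

The same set of predictions yields two different scores, so a reported RMSE for one model says little about how it ranks against a reported MAE for another, even before differences in data sets and splits are considered.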
  • 20. What’s needed: an “ImageNet” for materials science. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
  • 21. Can we make the same advancements in materials as in computer vision? One of the reasons computer science / machine learning seems to advance so quickly is that they decouple data generation from algorithm development. This allows groups to focus on algorithm development without all the data generation, data cleaning, etc. that often is the majority of an end-to-end data science project. Clear comparisons also move the field forward and measure progress.
  • 22. The ingredients of the Matbench benchmark: ☐ Standard data sets ☐ Standard test splits according to nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results
  • 23. Matbench includes 13 different ML tasks. Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
  • 24. The tasks encompass a variety of problems. 13 ready-to-use ML tasks ranging in training size, target property, inputs, and task type: • Pre-cleaned datasets from literature and online repositories (such as Materials Project) • Wide range of practical solid state ML tasks • Experimental and computed properties • Standardized error evaluation (nested CV)
  • 25. Browse datasets and tasks with Materials Project MPContribsML https://ml.materialsproject.org
  • 26. The ingredients of the Matbench benchmark: ✓ Standard data sets ☐ Standard test splits according to nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results
  • 27. Most commonly used test split procedure: • Training/validation is used for model selection • Test / hold-out is used only for error estimation (the test set should not inform model selection, i.e., the “final answer”)
  • 28. Nested CV: like hold-out, but varies the hold-out set. Think of it as N different “universes”: we have a different training of the model in each universe and a different hold-out.
  • 29. Nested CV: like hold-out, but varies the hold-out set. Think of it as N different “universes”: we have a different training of the model in each universe and a different hold-out. “A nested CV procedure provides an almost unbiased estimate of the true error.” Varma and Simon, Bias in error estimation when using cross-validation for model selection (2006)
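The nested CV idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Matbench implementation: the 1-D nearest-neighbor regressor, the candidate values of k, the fold counts, and the synthetic data are all invented for the example. The outer loop varies the hold-out set; the inner loop selects the hyperparameter using only the remaining training/validation data.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points (1-D)."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mae(train, test, k):
    """Mean absolute error of a k-NN model fit on train, scored on test."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)


def select_k(train_val, candidates, inner_k, seed):
    """Inner loop: pick the k with the lowest mean validation MAE."""
    folds = k_fold_indices(len(train_val), inner_k, seed)

    def mean_val_error(k):
        errors = []
        for val_idx in folds:
            val_set = set(val_idx)
            val = [train_val[j] for j in val_idx]
            train = [pt for j, pt in enumerate(train_val) if j not in val_set]
            errors.append(mae(train, val, k))
        return sum(errors) / len(errors)

    return min(candidates, key=mean_val_error)


def nested_cv(data, candidates=(1, 3, 5), outer_k=5, inner_k=3):
    """Outer loop: each fold is a hold-out set in its own 'universe'."""
    scores = []
    for i, test_idx in enumerate(k_fold_indices(len(data), outer_k, seed=0)):
        held_out = set(test_idx)
        test = [data[j] for j in test_idx]
        train_val = [pt for j, pt in enumerate(data) if j not in held_out]
        best_k = select_k(train_val, candidates, inner_k, seed=i + 1)
        # The hold-out fold is used only for error estimation.
        scores.append(mae(train_val, test, best_k))
    return scores


# Synthetic 1-D data: y = 2x plus a small deterministic wiggle (illustrative).
data = [(x / 10, 2 * x / 10 + 0.05 * ((x * 7) % 3 - 1)) for x in range(60)]
fold_errors = nested_cv(data)
mean_error = sum(fold_errors) / len(fold_errors)
```

Because model selection never sees the hold-out fold, each entry of `fold_errors` is an estimate of the error of the whole procedure (training plus tuning), and their average is the number one would report.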
  • 30. The ingredients of the Matbench benchmark: ✓ Standard data sets ✓ Standard test splits according to nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results
  • 31. Matbench has an online leaderboard – matbench.materialsproject.org
  • 32. Complete and reproducible results on standardized ML tasks: • Sample-by-sample predictions of all algorithms on all tasks, plus notebooks and scripts for reproduction • Aggregate scores across nested CV folds • Complete model metadata, hyperparameters, required compute, academic references. File types: .json, .ipynb, .py
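Aggregating scores across folds amounts to summarizing one score per fold. The sketch below shows one plausible way to do this; the fold names, the MAE values, and the choice of summary statistics (including the population standard deviation) are assumptions for illustration and are not taken from the Matbench leaderboard.

```python
import statistics

# Hypothetical per-fold MAE values (eV) for one model on one task; not real results.
fold_mae = {
    "fold_0": 0.31,
    "fold_1": 0.29,
    "fold_2": 0.33,
    "fold_3": 0.30,
    "fold_4": 0.32,
}


def aggregate(scores):
    """Summarize per-fold scores with mean, spread, and extremes."""
    values = list(scores.values())
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),  # population std; an arbitrary choice here
        "min": min(values),
        "max": max(values),
    }


summary = aggregate(fold_mae)
```

Reporting the spread alongside the mean helps show whether a difference between two algorithms is larger than the fold-to-fold variation.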
  • 33. Algorithm comparison across individual tasks OR the complete benchmark. Example: matbench_dielectric. Compare both specialized and general-purpose algorithms across multiple error metrics.
  • 34. Evaluation of ML paradigms drives research and development. Traditional paradigms: • Traditional models (e.g., RF + MagPie [1] features) • AutoML inside “traditional ML” space (Automatminer). Advancements in deep neural networks: • Attention networks (e.g., CRABNet [2]) • Optimal descriptor networks (e.g., MODNet [3]) • Crystal graph networks (e.g., CGCNN, MEGNet [4]). References: 1. doi.org/10.1038/npjcompumats.2016.28 2. doi.org/10.1038/s41524-021-00545-1 3. doi.org/10.1038/s41524-021-00552-2 4. doi.org/10.1021/acs.chemmater.9b01294
  • 35. Matbench compares these ML model paradigms. Traditional paradigms: • Traditional models (e.g., RF + MagPie [1] features) • AutoML inside “traditional ML” space (Automatminer). Advancements in deep neural networks: • Attention networks (e.g., CRABNet [2]) • Optimal descriptor networks (e.g., MODNet [3]) • Crystal graph networks (e.g., CGCNN, MEGNet [4]). Status markers shown on the slide: ✓ in Matbench (three entries); ✓ CGCNN in Matbench; ✓ MEGNet in progress; ✓ PR in review. References: 1. doi.org/10.1038/npjcompumats.2016.28 2. doi.org/10.1038/s41524-021-00545-1 3. doi.org/10.1038/s41524-021-00552-2 4. doi.org/10.1021/acs.chemmater.9b01294
  • 36. Contribute your model to the body of knowledge. Matbench Python package: evaluate an entire benchmark with ~10 lines of code.

        $ pip install matbench

        from matbench.bench import MatbenchBenchmark

        # my_model is your own model object (see the requirements below)
        mb = MatbenchBenchmark(autoload=False)
        for task in mb.tasks:
            task.load()
            for fold in task.folds:
                # Train (and validate) on this fold's training data
                train_inputs, train_outputs = task.get_train_and_val_data(fold)
                my_model.train_and_validate(train_inputs, train_outputs)
                # Predict on the held-out test inputs and record the results
                test_inputs = task.get_test_data(fold, include_target=False)
                predictions = my_model.predict(test_inputs)
                task.record(fold, predictions)
        mb.to_file("my_models_benchmark.json.gz")

    Your model needs to have: • a function that trains it based on training data • a function that makes a prediction based on the trained model
  • 37. Contribute your model to the body of knowledge (same Matbench code as on slide 36). Submit the model file along with your desired model metadata via a GitHub PR.
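The two requirements named on slide 36 (a training function and a prediction function) can be met by a very small class. The sketch below is a hypothetical baseline and is not part of the matbench package: the class name `MeanBaselineModel` and the example inputs are invented, and the method names `train_and_validate` and `predict` are simply those that the slide code calls on `my_model`.

```python
class MeanBaselineModel:
    """Hypothetical baseline: always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None

    def train_and_validate(self, train_inputs, train_outputs):
        # A real model would fit to train_inputs and hold out part of the
        # data for validation; this baseline only needs the targets.
        outputs = list(train_outputs)
        self.mean_ = sum(outputs) / len(outputs)

    def predict(self, test_inputs):
        if self.mean_ is None:
            raise RuntimeError("call train_and_validate before predict")
        # One prediction per test input, in the same order.
        return [self.mean_ for _ in test_inputs]


# Stand-alone check with made-up inputs and targets (illustrative only).
my_model = MeanBaselineModel()
my_model.train_and_validate(["NaCl", "Fe2O3", "SiO2"], [1.0, 2.0, 3.0])
predictions = my_model.predict(["MgO", "TiO2"])
```

An object like this could stand in for `my_model` in the slide's loop on regression tasks; a mean predictor is not meaningful for classification tasks, where a different baseline would be needed.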
  • 38. The ingredients of the Matbench benchmark: ✓ Standard data sets ✓ Standard test splits according to nested cross-validation procedure ✓ An online leaderboard that encourages reproducible results
  • 39. Results so far: graph NN for large data sets, conventional ML for small. Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
  • 40. Overall and upcoming goals for Matbench: • We have introduced a method that allows researchers to evaluate their machine learning models on a standard benchmark, if they so choose • The “Matbench” resource also provides metadata and code examples that allow others to reproduce and use community ML models more easily, as well as discover new ML models • In the future, we hope to expand the types of tasks, perform meta-analyses on what kinds of algorithms work best for certain problems, and plot progress on these tasks over time
  • 41. Concluding thoughts: The Materials Project is a free resource providing data and tools to help perform research and development of new materials. Even more can be accomplished as a unified community to push forward data dissemination as well as the capabilities of machine learning. We encourage you to give Matbench a try, and look forward to seeing your algorithm on the leaderboard!
  • 42. Thank you! The team: Kristin Persson, MP Director (Intro); Patrick Huck, Staff Scientist (MPContribs); Alex Dunn, Grad Student (Matbench / matminer). Slides (already) uploaded to https://hackingmaterials.lbl.gov