As we acquire large quantities of science data from experiment and simulation, it becomes possible to apply machine learning (ML) to those data to build predictive models and to guide future simulations and experiments. Leadership Computing Facilities need to make it easy to assemble such data collections and to develop, deploy, and run associated ML models.
We describe and demonstrate here how we are realizing such capabilities at the Argonne Leadership Computing Facility. In our demonstration, we use large quantities of time-dependent density functional theory (TDDFT) data on proton stopping power in various materials maintained in the Materials Data Facility (MDF) to build machine learning models, ranging from simple linear models to complex artificial neural networks, that are then employed to manage computations, improving their accuracy and reducing their cost. We highlight the use of new services being prototyped at Argonne to organize and assemble large data collections (MDF in this case), associate ML models with data collections, discover available data and models, work with these data and models in an interactive Jupyter environment, and launch new computations on ALCF resources.
Going Smart and Deep on Materials at ALCF
1. Building an ALCF Data Service:
Interactive, scalable, reproducible data science
Ian Foster
Rick Wagner, Nick Saint, Eric Blau
Kyle Chard, Yadu Nand Babuji
Logan Ward, Ben Blaiszik
Mike Papka
with
André Schleife and Cheng-Wei Lee
Aiichiro Nakano (USC - ALCF INCITE 2017), Maria Chan (ANL - ALCF INCITE 2016), André Schleife (UIUC - ALCF INCITE 2016)
3. Overview
• Leadership simulations produce
data of great scientific value
• We demonstrate how to:
Make data more accessible and
useful by associating them with rich
data lifecycle and analysis services
Find Analyze Publish
Interactive ♦ Scalable ♦ Reproducible
4. Overview
• Leadership simulations produce
data of great scientific value
• We demonstrate how to:
Make data more accessible and
useful by associating them with rich
data lifecycle and analysis services
Leverage advanced data science and
machine learning (ML) methods to
reduce simulation costs and
increase data quality and value
Find Analyze Publish
Interactive ♦ Scalable ♦ Reproducible
Collect Process Represent Learn
5. Interactive, scalable, reproducible data science
PUBLISH
Automate capture, publication, and indexing of results from ALCF projects
Enable creation of workspaces and reusable data objects to accelerate data analysis and
promote replicability
ANALYZE
Combine ML approaches with ALCF HPC resources to extract more information from
existing datasets and to guide future simulation campaigns
FIND
Unify search, discovery, and consumption of datasets, workspaces, and analysis results
6. Interactive, scalable, reproducible data science
[Figure: ALCF Data Service architecture: Data Movement, Data Discovery, Data Publication, Automation, Machine Learning, HPC, Data Interactivity, and Data Access services surrounding ALCF, connected to other services]
7. Interactive, scalable, reproducible data science
[Figure: the same architecture diagram, now including Parsl]
8. Materials science as an initial testbed
• Advanced materials are critical to economic
security and competitiveness, national security,
and human welfare. (MGI 2011 interagency effort
DoD, DOE, NASA, NIST, and NSF)
• Finding and understanding new materials is
complex, expensive, and time consuming: often
taking > 20 years from research to application
• Materials scientists are key users of leadership
class computing (20-30% at ALCF)
• Community data tools and services to advance
materials science are emerging
Nicholas Brawand, University of Chicago; Larry Curtiss, Argonne National Laboratory
9. Modeling material stopping power
Stopping Power: a “drag” force
experienced by high speed protons,
electrons, or positrons in a material
Areas of Application
• Nuclear reactor safety
• Magnetic confinement / inertial
confinement for nuclear fusion
• Solar cell surface adsorption
• Medicine (e.g., proton therapy
cancer treatment)
• Critical to understanding
material radiation damage
André Schleife and Cheng-Wei Lee (UIUC)
2016 ALCF INCITE Project
“Electronic Response to Particle Radiation
in Condensed Matter”
André Schleife, Yosuke Kanai, Alfredo A. Correa, 2015 -- 10.1103/PhysRevB.91.014306
11. Computing stopping power with TD-DFT
Stopping power (SP) can be accurately
calculated by time-dependent density
functional theory (TD-DFT)
Excellent agreement with experiment
Can vary orientation, projectile, material
Highly parallelizable
But we need many results
Direction dependence
Effect of defects
Many more materials
TD-DFT alone may not be sufficient
André Schleife, Yosuke Kanai, and Alfredo A. Correa, 2015 -- 10.1103/PhysRevB.91.014306
[Figure: TD-DFT stopping power vs. experiment; a single point cost ~16k CPU-hours]
12. Computing stopping power with TD-DFT
Potential Solution:
Machine Learning!
13. What is machine learning?
What? Algorithms that generate computer programs
Why? Create software too complex to write manually
General Task: Given inputs, predict output y = f(x)
Common Algorithms:
• Linear regression: f(x) = mx + b
• Decision trees (e.g., if x < 4 then y = 2, else y = 6)
• Neural networks
Advantages:
Fast: 10^4-10^7 evaluations/CPU/sec
Adaptable: limited need to know underlying physics
Self-correcting: improves with more data
Parallelizable: can use large-scale resources
(Source: nature.com)
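The linear-regression entry above can be made concrete in a few lines of plain Python. This is an illustrative sketch with invented data, not code from the project.

```python
# Fit f(x) = m*x + b by ordinary least squares (closed form).
def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope m = cov(x, y) / var(x); intercept b from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    m = cov / var
    b = mean_y - m * mean_x
    return m, b

# Toy data drawn from y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_linear(xs, ys)  # recovers m = 2, b = 1
```

Once fit, evaluating such a model is a single multiply-add per prediction, which is where the 10^4-10^7 evaluations/CPU/sec figure comes from.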
14. Computing stopping power with TD-DFT+ML
We propose to use ML to create surrogate models for TD-DFT
How do we replace TD-DFT? First, consider what it does
Inputs:
Atomic-scale structure (atom types, position)
Electronic structure of system
Outputs:
Energy of entire system
Forces on each atom
Time-derivatives of electronic structure
If successful, we can use the ML model – not TD-DFT – to compute SP
Allow prediction of future state
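The last point deserves one concrete step: if a surrogate model predicts the force on the projectile at sampled points along a trajectory, the stopping power follows as the average force component opposing the motion. A minimal sketch, where `predict_force` stands in for any trained model (this is illustrative, not the project's actual code):

```python
# Stopping power as the mean force component opposing the projectile's
# motion, given predicted force vectors along a trajectory.
def stopping_power(positions, velocity, predict_force):
    vx, vy, vz = velocity
    speed = (vx * vx + vy * vy + vz * vz) ** 0.5
    # Unit vector along the direction of travel.
    ux, uy, uz = vx / speed, vy / speed, vz / speed
    projections = []
    for pos in positions:
        fx, fy, fz = predict_force(pos)
        # Force component along the motion; drag makes this negative.
        projections.append(fx * ux + fy * uy + fz * uz)
    return -sum(projections) / len(projections)

# Illustrative use: a constant drag force of 0.5 opposing +x motion
# yields a stopping power of 0.5.
sp = stopping_power([(0, 0, 0), (1, 0, 0)], (2.0, 0.0, 0.0),
                    lambda pos: (-0.5, 0.0, 0.0))
```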
16. Step 1: Data collection
Collect Process Represent Learn
[Figure: pipeline on Cooley, orchestrated with Parsl: raw records are processed into inputs X and outputs y (e.g., ΔHf values), represented as AGNI fingerprints, ion-ion force, and local charge density, then learned with linear models, ANNs, and RNNs]
17. Stopping Power prediction: Our data
We have simulation results for H in face-centered cubic Al on a random trajectory.
André Schleife, Yosuke Kanai, and Alfredo A. Correa, 2015 -- 10.1103/PhysRevB.91.014306
18. Stopping Power prediction: Our data
We have simulation results for H in face-centered cubic Al on a random trajectory.
For each of multiple velocities, we have:
1) A simulated SP: one red point
André Schleife, Yosuke Kanai, and Alfredo A. Correa, 2015 -- 10.1103/PhysRevB.91.014306
19. Stopping Power prediction: Our data
We have simulation results for H in face-centered cubic Al on a random trajectory.
For each of multiple velocities, we have:
1) A simulated SP: one red point
2) A trajectory for that point:
André Schleife, Yosuke Kanai, and Alfredo A. Correa, 2015 -- 10.1103/PhysRevB.91.014306
20. Stopping Power prediction: Our data
We have simulation results for H in face-centered cubic Al on a random trajectory.
For each of multiple velocities, we have:
1) A simulated SP: one red point
2) A trajectory for that point:
3) A ground-state calculation for that trajectory’s starting point
(About 6GB in total, mostly Qbox output files)
André Schleife, Yosuke Kanai, and Alfredo A. Correa, 2015 -- 10.1103/PhysRevB.91.014306
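Qbox output is XML, so extracting quantities for a training set can start with the standard library. A sketch: the tag names below (`<iteration>`, `<etotal>`) are assumptions about the format, demonstrated on a tiny inline sample rather than the real 6GB of files.

```python
import xml.etree.ElementTree as ET

# Sketch: pull per-iteration total energies out of a Qbox-style XML
# output. The tags and values here are invented for illustration.
sample = """<simulation>
  <iteration count="1"><etotal> -15.80 </etotal></iteration>
  <iteration count="2"><etotal> -15.82 </etotal></iteration>
</simulation>"""

root = ET.fromstring(sample)
energies = [float(it.findtext("etotal"))
            for it in root.iter("iteration")]
```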
21. Steps 2-3: Data processing / Representation
Collect Process Represent Learn
[Figure: the same Parsl-driven pipeline, highlighting the Process and Represent stages]
22. Designing a training set
Key Question: What are the inputs and outputs to our model?
Consider those for TD-DFT:
Inputs:
Atomic-scale structure (atom types, position)
Electronic structure of system (requires TD-DFT to compute)
Outputs:
Energy of entire system (reliant on entire history, hard to predict)
Time-derivatives of electronic structure (not needed to compute stopping power)
Forces on the projectile
Input: Atomic Structure; Output: Force on Particle
Collect Process Represent Learn
23. Selecting a representation
Key Questions: What determines force on projectile? How do we quantify it?
Types of Features
Ion-ion repulsion: Can be computed directly
Electronic interactions: We approximate with two feature types
Local charge density: Density of electrons at projectile position
AGNI fingerprints*: Describe the atom positions around projectile
Another need: History Dependence
Approach: Use charge density at fixed points ahead/behind projectile
Collect Process Represent Learn
* Botu, et al. J. Phys. Chem. C 121, 511–522 (2017).
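Putting the feature types together, one row of the training set might be assembled along these lines. This is an illustrative sketch: the function names, the toy density field, and the single-atom "lattice" are all invented, and the real AGNI fingerprints and TD-DFT charge densities are considerably more involved.

```python
# Sketch of building one feature vector for the projectile.
def ion_ion_repulsion(proj_pos, proj_z, atoms):
    """Coulomb-like repulsion magnitude from nearby nuclei."""
    total = 0.0
    for (x, y, z), charge in atoms:
        dx = proj_pos[0] - x; dy = proj_pos[1] - y; dz = proj_pos[2] - z
        r = (dx * dx + dy * dy + dz * dz) ** 0.5
        total += proj_z * charge / (r * r)
    return total

def features(proj_pos, proj_z, velocity, atoms, charge_density, offsets):
    """One row: the ion-ion term, plus charge density sampled at points
    ahead of / behind the projectile along its direction of travel
    (the history-dependence trick described above)."""
    speed = sum(v * v for v in velocity) ** 0.5
    unit = [v / speed for v in velocity]
    row = [ion_ion_repulsion(proj_pos, proj_z, atoms)]
    for t in offsets:  # e.g. a few steps behind, one ahead
        sample = [p + t * u for p, u in zip(proj_pos, unit)]
        row.append(charge_density(sample))
    return row

# Toy inputs: a proton near one Al nucleus, with a made-up density field.
row = features((0.0, 0.0, 0.0), 1.0, (1.0, 0.0, 0.0),
               [((2.0, 0.0, 0.0), 13.0)],
               lambda p: 1.0 / (1.0 + p[0] ** 2),
               [-1.0, 0.0, 1.0])
```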
24. Step 4: Machine learning
Collect Process Represent Learn
[Figure: the same Parsl-driven pipeline, highlighting the Learn stage (linear models, ANNs, RNNs)]
25. Selecting a machine learning algorithm
Key Criterion: Prediction accuracy
Beyond accuracy, the algorithm should…
be feasible to train with >10^4 entries
be quick to evaluate
produce a differentiable model
Standard Procedure:
1. Identify suitable algorithms
(linear models, neural networks)
2. Evaluate performance using cross-validation
3. Validate the model vs. unseen data
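Step 2 of this procedure can be sketched with the standard library alone; in practice one would use a library such as scikit-learn, and the "model" in the usage example below is a trivial stand-in.

```python
# Minimal k-fold cross-validation: hold out each fold in turn, fit on
# the rest, and average the held-out prediction error.
def k_fold_score(xs, ys, k, fit, predict, error):
    fold_errors = []
    for i in range(k):
        # Every k-th point (offset i) forms the held-out fold.
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k != i]
        test = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k == i]
        model = fit([x for x, _ in train], [y for _, y in train])
        errs = [error(predict(model, x), y) for x, y in test]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k  # mean held-out error across folds

# Example with a trivial "predict the training mean" model.
score = k_fold_score([1, 2, 3, 4], [1.0, 1.0, 1.0, 1.0], 2,
                     fit=lambda xs, ys: sum(ys) / len(ys),
                     predict=lambda m, x: m,
                     error=lambda yhat, y: abs(yhat - y))
```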
26. Live Demo
You won’t see the live demo here. But it was cool. We located Schleife
simulation data previously published to MDF; assembled a workspace
comprising Aluminum data plus four Jupyter notebooks comprising
data processing, ML training, and SP modeling methods; deployed the
workspace to ALCF; and ran the notebooks to process data, train a
model, and predict SP values for many directions.
27. Summary of analysis results
We compared a variety of ML algorithms
We computed SP for other trajectories
We evaluated data needed for training
[Figure: stopping power calculated for many trajectories]
28. Materials science and machine learning
Collect Process Represent Learn
[Figure: the complete Parsl-driven pipeline on Cooley, from raw records through AGNI fingerprints, ion-ion force, and local charge density to linear models, ANNs, and RNNs]
29. Materials Data Facility to discover data
116 data sources ♦ 3.4M records ♦ 300 TB
[Figure: distributed data storage endpoints (EP) feeding two services: a data publication service (Web UI or API: mint DOIs, associate metadata) and a deep-indexing data discovery service (Web UI, Forge, or REST API: query, browse, aggregate), drawing on databases, datasets, APIs, LIMS, etc.]
30. Data ingest flow
1. Data are created at ALCF
2. Data are staged, published,
and assigned a permanent
identifier (DOI)
3. Results are indexed for
easy discovery
4. Interactive analysis,
modeling, and interrogation
31. Data ingest flow
[Figure: step 2 highlighted: data publication and data storage]
32. Data ingest flow
[Figure: step 3 highlighted: indexing of published data]
33. Data ingest flow
[Figure: step 4 highlighted: query and fetch from the index for interactive analysis with Parsl]
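A record in such an index might carry metadata along these lines. The field names below are hypothetical, chosen for illustration, not the actual MDF or ALCF schema.

```python
import json

# Illustrative metadata record for an indexed, published dataset.
# Field names are invented; a real record follows the service's schema.
record = {
    "doi": "10.xxxx/example",  # minted at publication time (placeholder)
    "title": "TD-DFT proton stopping power in fcc Al",
    "source": "ALCF INCITE project",
    "files": [{"path": "data/run_001/qbox.out", "sha256": "..."}],
}
serialized = json.dumps(record, indent=2, sort_keys=True)
```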
34. Data collection
1. Find data through search index
2. Create BDBags for data
reusability, staging, and sharing
3. Stage data and launch
interactive environment
on ALCF computers
4. Analyze data!
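BDBag builds on the BagIt packaging format: payload files under `data/`, a checksum manifest, and a bag declaration. A minimal stdlib sketch of that layout follows; the real `bdbag` tool adds much more, including references to remote files.

```python
import hashlib
import os
import tempfile

# Sketch of a minimal BagIt-style bag: payload under data/, a SHA-256
# manifest, and a bagit.txt declaration.
def make_bag(bag_dir, payload):  # payload: {relative_path: bytes}
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    manifest_lines = []
    for rel_path, content in payload.items():
        with open(os.path.join(data_dir, rel_path), "wb") as f:
            f.write(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{rel_path}\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w") as f:
        f.writelines(manifest_lines)
    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

bag = tempfile.mkdtemp()
make_bag(bag, {"notes.txt": b"stopping power inputs"})
```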
36. Data staging
1. Find data through search index
2. Create BDBags for data
reusability, staging, and sharing
3. Stage data and launch
interactive environment
on ALCF computers
4. Analyze data!
37. Interactive, scalable, reproducible data analysis
Data science and learning applications require:
- Interactivity
- Scalability
- You can’t run this on a desktop
- Reproducibility
- Publish code and documentation
38. Interactive, scalable, reproducible data analysis
Our solution: JupyterHub + Parsl
Interactive computing environment
Notebooks for publication
Can run on dedicated hardware
PARSL
parsl-project.org ♦ jupyter.org
• Python-based parallel scripting library
• Tasks exposed as functions (Python or bash)
• Python code used to glue functions together
• Leverages Globus for auth and data movement
@App('python', dfk)
def compute_features(chunk):
    for f in featurizers:
        chunk = f.featurize_dataframe(chunk, 'atoms')
    return chunk

chunks = [compute_features(c)
          for c in np.array_split(data, n_chunks)]
39. Interactive, scalable, reproducible data science
[Figure: existing TD-DFT data flow through the ALCF Data Facility and machine learning to new capabilities such as direction-dependent stopping power]
40. Interactive, scalable, reproducible data science
Next Steps
1. Model multiple velocities
2. Model more materials
3. Model direction dependence
4. Transfer Learning
Results so far
• Indexed data from an ALCF INCITE project
• Interactively built surrogate model using
ALCF Data Service capabilities
• Extended results to model SP direction
dependence in aluminum
41. Thanks to our sponsors!
U.S. Department of Energy
ALCF DF ♦ Parsl ♦ Globus ♦ IMaD
42. Building an ALCF Data Service:
Interactive, scalable, reproducible data science
Ian Foster
Rick Wagner, Nick Saint, Eric Blau
Kyle Chard, Yadu Nand Babuji
Logan Ward, Ben Blaiszik
Mike Papka
with
André Schleife and Cheng-Wei Lee
Aiichiro Nakano (USC - ALCF INCITE 2017), Maria Chan (ANL - ALCF INCITE 2016), André Schleife (UIUC - ALCF INCITE 2016)
Editor's notes
An index for heterogeneous distributed data, coupled with APIs to facilitate data access, discovery, and addition, layered with capabilities to support simplified deep learning against these data.
Simplify interfaces for data publication regardless of data size, type, and location
Provide automation capabilities to capture data from pipelines
Deploy APIs to foster community development and integration
Encourage data re-use, incentivize data sharing, and support Open Science in materials research
On-channel Proton in Gold lattice
TD-DFT offers a great way for computing the stopping power of materials. As shown in the figure on the right, it can accurately reproduce experimentally-measured stopping powers.
Given that TD-DFT is parameter-free, we can easily model the effect of changing the direction of the projectile, the type of projectile, and the host material. Additionally, TD-DFT relies on advanced parallel codes that enable the use of leadership-class computing facilities, which is good because it is resource intensive.
Just one of the points on this plot required ~16k computing hours on Sierra at LLNL. That single point is only the stopping power for a single direction in the crystal, for a single projectile type, at a single velocity, for a single host material. In the future, we want to be able to easily access the stopping power for many different types of materials in all possible directions, and even be able to ascertain the effects of defects on the stopping power. TD-DFT, while quite powerful, might not be sufficient on its own to do this.
To compensate, we propose to use machine learning to extend the capability of TD-DFT.
Self-correcting: If the model’s wrong, add more data and it will automatically correct itself [theorists do this on a slower timescale]
Our first question is: How do we approach creating a surrogate for TD-DFT?
The first step in that process is recognizing what are the inputs and outputs to TD-DFT
Its inputs are the atomic-scale structure (position and types of atoms), and the current electronic structure. In essence, what is the current state of the material at the electronic level?
Its outputs are the energy of the system and quantities that allow you to predict its future state: the forces acting on each atom, and the rate of change of the electronic structure (i.e., the wavefunctions for each atom).
If we can successfully emulate the function that maps these inputs and outputs, we can use our ML surrogate to compute stopping power rather than use TD-DFT directly.
Ok, this outlines what we need to replace in simple language. Now, our next step is to build this model.
We break down building a machine learning model into 4 distinct steps:
First, we need to collect a resource of raw data for training the model.
Next, we process that raw data to define a training set: what are the inputs (in broad terms) and the desired outputs.
Then, we translate our materials data into a form compatible with machine learning: a list of fixed-length vectors, i.e., we select a representation.
Finally, we employ machine learning to find a function that maps the representation to the outputs: the classic machine learning problem.
At this point, we will break in to a live demo to show you how this process applies to modeling TD-DFT and how the ALCF Data Facility makes this work easier.
As I just mentioned, the first step in our process is gathering a set of training data.
For our application, the data is that supporting a previous publication of André Schleife; specifically, we have the data backing this figure.
What this figure describes is the stopping power as a function of velocity for a proton traveling through aluminum along a random trajectory.
For each red point in this figure, we have the TD-DFT data that was used to calculate it. What this actually means is we have about 6GB of Qbox output files that contain the structure and energy of the system as a function of time during the simulation.
We also have the starting point for these simulations: A ground state DFT calculation of the electronic structure of Al
[At this point, jump to showing off the ALCF portal and creating the environment. Then, go to the notebook and show what we have]
Now, back to the science.
Our second step in creating a model is processing the data from its raw form to create a training set with clear inputs and outputs.
To decide what those should be for our model, we go back to the inputs and outputs.
As inputs: TD-DFT takes the atomic scale structure and electronic structure.
When building a model, we cannot use the time-dependent electronic structure because this requires TD-DFT to compute.
On the other hand, we can know the atomic scale structure.
For outputs: The stopping power of a material is the average force acting on the projectile over time
We don’t need the electronic structure to compute the stopping power, so let’s eliminate that
We can get the average force on the particle from the energy. But, the energy at any timestep is dependent on all of the previous timesteps, which makes it difficult to predict
So, we should just predict the force acting on the particle
Our next step is to determine what 'atomic structure' actually means in terms of inputs to the model.
We choose to use two types of inputs.
First, we know part of the force deals with the ‘ion-ion’ repulsion between the projectile and the surrounding nuclei. This we can just compute directly
Secondly, we know the particle interacts with the electrons in the material. To approximate this effect, we use two kinds of features: (1) the electron density (taken from the starting condition for our simulation), and (2) the AGNI fingerprints, which capture the local arrangement of atoms and provide the basis for an ML model to capture electronic effects.
Another thing we need for these features is history dependence. The force acting on a particle is not just dependent on its current environment, but also on what just happened to it. As the particle travels at a constant velocity, we know its history and represent it by computing the charge density at positions several timesteps in the past and one in the future.
Now, let’s jump back to the notebooks to show what these processing and representation calculations look like.
Our last step of the model building process is training a machine learning algorithm.
The main question when selecting a machine learning algorithm is: which algorithm produces a model with the highest prediction accuracy?
However, there are also other important factors to consider with this application. The model should [explain the list]
To identify the best model we follow a simple and very common procedure:
1. We first identify suitable algorithms: for our case, linear models and neural networks are our top choices
2. Then, we test them using cross-validation
3. Finally, we validate the model using data outside of our original training set
We’ll show you this process in our notebooks