SlideShare ist ein Scribd-Unternehmen logo
1 von 17
The Materials Project
Validation, Provenance and Sandboxes
Goals
‱ Validation
– constantly guard against bugs in core data
and imported data
‱ Provenance
– know how data came to be
‱ Sandboxes
– Combine public and non-public data; "good
fences make good neighbors"
Validation
(Internal)
Database ID
External ID What we expected
What we got
Validation runs all the time
‱ Rules with "constraints" for every database (and sandbox)
‱ Test constraints against entire DB every night  email reports
‱ Validation engine, etc. all open-source software in pymatgen-db
Remote
server
Validation
engine
Rules
MP Databases
Reports
(email, web pages, ..)
Rules have a simple syntax
_aliases:
- snl_id = mps_id
- energy = analysis.e_above_hull
materials:
-
filter:
constraints:
- final_energy_per_atom <= 0
- initial_structure.lattice.volume > 0
- initial_structure.lattice.a > 0
- initial_structure.lattice.b > 0
- initial_structure.lattice.c > 0
- initial_structure.lattice.matrix size 3
- formation_energy_per_atom <= 5
- formation_energy_per_atom > -5
- cpu_time > 5
- e_above_hull > -0.000001
- final_energy < 0
- reduced_cell_formula size$
nelements
# Check num. ICSD sources for
selected compounds
-
filter:
- task_id = "mp-540081"
constraints:
- icsd_id size> 10
-
filter:
- task_id = "mp-20379"
constraints:
- icsd_id size 1
-
filter:
- task_id = "mp-13634"
constraints:
- icsd_id size> 0
-
filter:
- task_id = "mp-600022"
constraints:
- icsd_id size 0
# NiO2 phases should never become
stable
-
filter:
- e_above_hull = 0
constraints:
- pretty_formula != 'NiO2'
tasks:
-
filter:
- state = "successful"
constraints:
- output.final_energy_per_atom <= 0
Validation summary
Easy-to-use, integrated, efficient tools to
report errors
Next steps
– Record all check results in DB
– More sophisticated checks (Map/Reduce)
– Make it easier to add new checks internally
– Make it easier to add new check for anyone
‱ per-sandbox or even per-user ("MP Alerts")
Provenance: How do I know that
the data is correct?
Types of provenance in the system
1) Calculation workflows
– FireWorks records calculation inputs, .. results in great detail
2) External datasets
– Structure Notation Language standardizes the naming of data
sources and publications
3) Post-calculation data transformations
– New "builders" provides framework for tracking creation of final
database products
(1) (2)
(3)
Provenance is available
for every material
Provenance in DB
Structure Notation Language
"snl_final": {
"about": {
"created_at": {
"string": "2014-02-22
19:07:00.383869",
"@class": "datetime",
"@module": "datetime"
},
"_materialsproject": {
"submission_id":
52621,
"snl_id": 398676,
"spacegroup": {
"lattice_type":
"tetragonal",
"symbol":
"P4_2/mmc",
"number": 131,
"point_group":
"4/mmm",
"crystal_system":
"tetragonal",
"hall": "-P 4c 2"
}
},
"_cedergroup": {
"BURP_sids": [
409544,
409545,
409546
],
"icsd_ids": [
],
"e_above_hull":
0.075125350000000423734
},
"references": "",
"authors": [
{
"name": "Geoffroy
Hautier",
"email":
"geoffroy.hautier@uclouvain
.be"
},
{
"name": "Bo Xu",
"email":
"boxu14@mit.edu"
}
],
"remarks": [
"supplementary
compounds from MIT
matgen database"
],
"projects": [
"MIT matgen"
],
"history": [
{
"url": "http://www.fiz-
karlsruhe.de/icsd_home.htm
l",
"name": "Inorganic
Crystal Structure Database",
"description": {
"Collection code":
24692
}
},
{
"url": "",
"name": "",
"description": {
"source": null,
"orig_name": "Basic
substitution code.",
"formula": "O1 Pd1"
}
},
{
"url":
"http://ceder.mit.edu/",
"name": "MIT Ceder
group research database",
"description": {
"source": 105986,
"orig_name": "",
"formula": "FeO"
}
},
{
"url":
"http://www.materialsproject.
org",
"name": "Materials
Project structure
optimization",
"description": {
"fw_id": 820305,
"task_type": "GGA
optimize structure (2x)",
"task_id": "mp-
753682"
}
},
{
"url":
"http://www.materialsproject.
org",
"name": "Materials
Project structure
optimization",
"description": {
"fw_id": 820308,
"task_type":
"GGA+U optimize structure
(2x)",
"task_id": "mp-
776678"
}
}
]
},
Metadata
Crystal
DB sources
References History of
structure
optimizations
Future work: unified view of
provenance
VASP
result
ICSD
VASP
result
VASP
result
Post-
processing
Material
properties
Computation
Data import
processing
e.g., Defects
Sandbox example: Multivalent
JCESR
users
Non-
JCESR
users
Multivalent app
Sandboxes = Database + Apps
Core data Core data
+
multivalent
materials
Non-
JCESR
users
JCESR
users
Technical challenges
‱ Pre-process data for real-time search
‱ Interfaces for per-user access control
– https://materialsproject.org/materials/1234?san
dbox=jcesr
– Web UI elements
and
Future: dynamic sandbox creation
Current:
– Large & significant
additional data / apps
‱ e.g., JCESR
– Longer-term
connections to MP data
‱ e.g. porous materials
– Companies
‱ e.g. VW/Stanford
Future
small collab.
per-user?
CoD?
Summary
‱ Validation
– guard against bugs by checking all data daily
and at data import/creation time
‱ Provenance
– universal standard for annotating data
provenance
‱ Sandboxes
– unified view of distinct databases
– onramp for new collaborations and data

Weitere Àhnliche Inhalte

Was ist angesagt?

A Threat Hunter Himself
A Threat Hunter HimselfA Threat Hunter Himself
A Threat Hunter HimselfSergey Soldatov
 
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...Malachi Jones
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Damir Delija
 
H@dfex 2015 malware analysis
H@dfex 2015   malware analysisH@dfex 2015   malware analysis
H@dfex 2015 malware analysisCharles Lim
 
Windows Threat Hunting
Windows Threat HuntingWindows Threat Hunting
Windows Threat HuntingGIBIN JOHN
 
Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Xavier Ashe
 
Malware forensics
Malware forensicsMalware forensics
Malware forensicsSameera Amjad
 
Android Malware Analysis
Android Malware AnalysisAndroid Malware Analysis
Android Malware AnalysisJongWon Kim
 
Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Jared Atkinson
 
Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Rhydham Joshi
 
DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101dc612
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsRahul Mohandas
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...Priyanka Aash
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsDamir Delija
 
Lateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkLateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkEC-Council
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoConferencias FIST
 
Malware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowMalware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowSilvio Cesare
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesSandeep Kumar Seeram
 
Volatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseVolatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseTakahiro Haruyama
 

Was ist angesagt? (20)

A Threat Hunter Himself
A Threat Hunter HimselfA Threat Hunter Himself
A Threat Hunter Himself
 
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2
 
H@dfex 2015 malware analysis
H@dfex 2015   malware analysisH@dfex 2015   malware analysis
H@dfex 2015 malware analysis
 
Windows Threat Hunting
Windows Threat HuntingWindows Threat Hunting
Windows Threat Hunting
 
Tcpdump hunter
Tcpdump hunterTcpdump hunter
Tcpdump hunter
 
Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016
 
Malware forensics
Malware forensicsMalware forensics
Malware forensics
 
Android Malware Analysis
Android Malware AnalysisAndroid Malware Analysis
Android Malware Analysis
 
Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)
 
Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1
 
DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day Threats
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics tools
 
Lateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkLateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your Network
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario Malicioso
 
Malware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowMalware Classification Using Structured Control Flow
Malware Classification Using Structured Control Flow
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on Examples
 
Volatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseVolatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident Response
 

Ähnlich wie Materials Project Validation, Provenance, and Sandboxes by Dan Gunter

2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016DataStax
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudMongoDB
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management Systempsathishcs
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchSylvain Wallez
 
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEEMEMTECHSTUDENTPROJECTS
 
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...IEEEMEMTECHSTUDENTSPROJECTS
 
Scalability20140226
Scalability20140226Scalability20140226
Scalability20140226Nick Kypreos
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Public private hybrid - cmdb challenge
Public private hybrid - cmdb challengePublic private hybrid - cmdb challenge
Public private hybrid - cmdb challengeryszardsshare
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBMongoDB
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsMassimo Brignoli
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecJosh Patterson
 
MS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docMS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docbutest
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossAndrew Flatters
 

Ähnlich wie Materials Project Validation, Provenance, and Sandboxes by Dan Gunter (20)

2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it works
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management System
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
 
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
 
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
 
Scalability20140226
Scalability20140226Scalability20140226
Scalability20140226
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Public private hybrid - cmdb challenge
Public private hybrid - cmdb challengePublic private hybrid - cmdb challenge
Public private hybrid - cmdb challenge
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of Things
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
MS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docMS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.doc
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
 

KĂŒrzlich hochgeladen

Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...SĂ©rgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSĂ©rgio Sacani
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 

KĂŒrzlich hochgeladen (20)

Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 

Materials Project Validation, Provenance, and Sandboxes by Dan Gunter

  • 1. The Materials Project Validation, Provenance and Sandboxes
  • 2. Goals ‱ Validation – constantly guard against bugs in core data and imported data ‱ Provenance – know how data came to be ‱ Sandboxes – Combine public and non-public data; "good fences make good neighbors"
  • 3. Validation (Internal) Database ID External ID What we expected What we got
  • 4. Validation runs all the time ‱ Rules with "constraints" for every database (and sandbox) ‱ Test constraints against entire DB every night  email reports ‱ Validation engine, etc. all open-source software in pymatgen-db Remote server Validation engine Rules MP Databases Reports (email, web pages, ..)
  • 5. Rules have a simple syntax _aliases: - snl_id = mps_id - energy = analysis.e_above_hull materials: - filter: constraints: - final_energy_per_atom <= 0 - initial_structure.lattice.volume > 0 - initial_structure.lattice.a > 0 - initial_structure.lattice.b > 0 - initial_structure.lattice.c > 0 - initial_structure.lattice.matrix size 3 - formation_energy_per_atom <= 5 - formation_energy_per_atom > -5 - cpu_time > 5 - e_above_hull > -0.000001 - final_energy < 0 - reduced_cell_formula size$ nelements # Check num. ICSD sources for selected compounds - filter: - task_id = "mp-540081" constraints: - icsd_id size> 10 - filter: - task_id = "mp-20379" constraints: - icsd_id size 1 - filter: - task_id = "mp-13634" constraints: - icsd_id size> 0 - filter: - task_id = "mp-600022" constraints: - icsd_id size 0 # NiO2 phases should never become stable - filter: - e_above_hull = 0 constraints: - pretty_formula != 'NiO2' tasks: - filter: - state = "successful" constraints: - output.final_energy_per_atom <= 0
  • 6. Validation summary Easy-to-use, integrated, efficient tools to report errors Next steps – Record all check results in DB – More sophisticated checks (Map/Reduce) – Make it easier to add new checks internally – Make it easier to add new check for anyone ‱ per-sandbox or even per-user ("MP Alerts")
  • 7. Provenance: How do I know that the data is correct?
  • 8. Types of provenance in the system 1) Calculation workflows – FireWorks records calculation inputs, .. results in great detail 2) External datasets – Structure Notation Language standardizes the naming of data sources and publications 3) Post-calculation data transformations – New "builders" provides framework for tracking creation of final database products (1) (2) (3)
  • 10. Provenance in DB Structure Notation Language "snl_final": { "about": { "created_at": { "string": "2014-02-22 19:07:00.383869", "@class": "datetime", "@module": "datetime" }, "_materialsproject": { "submission_id": 52621, "snl_id": 398676, "spacegroup": { "lattice_type": "tetragonal", "symbol": "P4_2/mmc", "number": 131, "point_group": "4/mmm", "crystal_system": "tetragonal", "hall": "-P 4c 2" } }, "_cedergroup": { "BURP_sids": [ 409544, 409545, 409546 ], "icsd_ids": [ ], "e_above_hull": 0.075125350000000423734 }, "references": "", "authors": [ { "name": "Geoffroy Hautier", "email": "geoffroy.hautier@uclouvain .be" }, { "name": "Bo Xu", "email": "boxu14@mit.edu" } ], "remarks": [ "supplementary compounds from MIT matgen database" ], "projects": [ "MIT matgen" ], "history": [ { "url": "http://www.fiz- karlsruhe.de/icsd_home.htm l", "name": "Inorganic Crystal Structure Database", "description": { "Collection code": 24692 } }, { "url": "", "name": "", "description": { "source": null, "orig_name": "Basic substitution code.", "formula": "O1 Pd1" } }, { "url": "http://ceder.mit.edu/", "name": "MIT Ceder group research database", "description": { "source": 105986, "orig_name": "", "formula": "FeO" } }, { "url": "http://www.materialsproject. org", "name": "Materials Project structure optimization", "description": { "fw_id": 820305, "task_type": "GGA optimize structure (2x)", "task_id": "mp- 753682" } }, { "url": "http://www.materialsproject. org", "name": "Materials Project structure optimization", "description": { "fw_id": 820308, "task_type": "GGA+U optimize structure (2x)", "task_id": "mp- 776678" } } ] }, Metadata Crystal DB sources References History of structure optimizations
  • 11. Future work: unified view of provenance VASP result ICSD VASP result VASP result Post- processing Material properties Computation Data import processing e.g., Defects
  • 14. Sandboxes = Database + Apps Core data Core data + multivalent materials Non- JCESR users JCESR users
  • 15. Technical challenges ‱ Pre-process data for real-time search ‱ Interfaces for per-user access control – https://materialsproject.org/materials/1234?san dbox=jcesr – Web UI elements and
  • 16. Future: dynamic sandbox creation Current: – Large & significant additional data / apps ‱ e.g., JCESR – Longer-term connections to MP data ‱ e.g. porous materials – Companies ‱ e.g. VW/Stanford Future small collab. per-user? CoD?
  • 17. Summary ‱ Validation – guard against bugs by checking all data daily and at data import/creation time ‱ Provenance – universal standard for annotating data provenance ‱ Sandboxes – unified view of distinct databases – onramp for new collaborations and data

Hinweis der Redaktion

  1. Picture of 1915 Heinrich Campendonk painting, "Landscape with horses". Steve Martin paid $850K for a forged version of the painting, from a reputable art house in Paris, in 2004. He sold it at a loss of $250K before discovering it was a forgery. The forgery was performed by Wolfgang Beltracchi.
  2. Sandboxes are a way to share preliminary data in the context of MP data and tools.