Machine learning in computational materials science
overview and personal experience
Igor Mosyagin
June 12, 2017
A few disclaimers
This talk is about computational materials science
There might be scientific fields where everything’s different
Materials science ≈ statistical physics
This talk is based on my limited experience
I also express my own opinions, which may not coincide with those of my
current employer.
What is «more» in this context?
In chemistry and physics, the
Avogadro constant is the number
of constituent particles, usually
atoms or molecules, that are
contained in the amount of
substance given by one mole.
$N_A = 6.022 \times 10^{23}\,\mathrm{mol}^{-1}$
At «normal conditions», 1 mol of gas occupies ≈ 22.4 L.
Water bottles at PyParis coffee
breaks are 0.5 L.
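To put those numbers side by side, here is a rough back-of-the-envelope sketch of how many molecules fit in one such bottle. The molar mass and density of water are assumed textbook values that are not on the slide, and the function names are made up for illustration.

```python
# Rough particle counts for a 0.5 L bottle (illustrative sketch only).
AVOGADRO = 6.022e23        # particles per mole, value quoted on the slide
MOLAR_VOLUME_GAS = 22.4    # litres per mole of gas at «normal conditions»
WATER_MOLAR_MASS = 18.015  # g/mol, assumed textbook value
WATER_DENSITY = 1000.0     # g/L, assumed (liquid water near room temperature)


def molecules_in_water(volume_l):
    """Number of H2O molecules in the given volume of liquid water."""
    moles = volume_l * WATER_DENSITY / WATER_MOLAR_MASS
    return moles * AVOGADRO


def molecules_in_gas(volume_l):
    """Number of gas particles in the given volume at «normal conditions»."""
    return volume_l / MOLAR_VOLUME_GAS * AVOGADRO


bottle_l = 0.5
print(f"water: {molecules_in_water(bottle_l):.2e} molecules")
print(f"gas:   {molecules_in_gas(bottle_l):.2e} particles")
```

Either way the count is many orders of magnitude beyond the few hundred atoms that fit in a simulation cell.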
Example: Density Functional Theory
A. Mattsson et al.; doi:10.1088/0965-0393/13/1/r01
Periodic table of chemical elements
Periodic table of a theoretical physicist
Some projects even get featured in a national magazine
Felix A. Faber et al. doi:10.1103/PhysRevLett.117.135502
Computational costs?
Modern state-of-the-art computations: N atoms in a simulation
cell, where N is several hundred. The world record is a few thousand.
If one adds temperature (but
stays at the quantum level), it
becomes more complicated.
For simulations that include
temperature, N is typically several
hundred.
Typically scales as $N^3$
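A small sketch of what cubic scaling means in practice. The reference size and timing below are hypothetical numbers chosen for illustration, not measurements from the talk.

```python
# Illustrative estimate of how cost grows under N^3 scaling.

def relative_cost(n_atoms, n_reference, exponent=3):
    """Cost of an n_atoms run relative to an n_reference run."""
    return (n_atoms / n_reference) ** exponent


def estimated_hours(n_atoms, n_reference, reference_hours, exponent=3):
    """Extrapolate wall time from one known (hypothetical) reference run."""
    return reference_hours * relative_cost(n_atoms, n_reference, exponent)


# Hypothetical reference: a 200-atom cell that takes 2 hours.
for n in (200, 400, 2000):
    print(n, "atoms:", estimated_hours(n, 200, 2.0), "hours")
```

Doubling the cell multiplies the cost by 8, and going from a few hundred atoms to a few thousand multiplies it by about 1000, which is why the record sizes stay where they are.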
Time to solution? A month, at least
If everything goes well, a static (T = 0) calculation takes a few hours,
and temperature-related simulations take a few weeks.
Steps in temperature-related simulations
1. Preparation. Select parameters, build simulation cells, select
   starting positions, etc.
   Tools: bash/awk/sed, GUI tools, Perl, Fortran, MATLAB.
2. Running simulations in an HPC environment (a shared
   supercomputer with a queue, priorities, and quotas).
   Tools: Fortran. Sometimes an old version of Fortran.
3. Processing. Parsing the output of calculations, building models on
   top, visualization, etc.
   Tools: every other language and GUI tool. Also Fortran.
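As an illustration of the preparation step, here is a minimal standard-library sketch that builds a simulation cell and emits starting positions in XYZ format. The fcc structure, the copper symbol, the 3.6 Å lattice parameter and the function names are all assumptions made for the example; a real workflow would more likely rely on an existing package.

```python
from itertools import product

# Fractional coordinates of the four atoms in a conventional fcc cell.
FCC_BASIS = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]


def fcc_supercell(a, reps):
    """Cartesian positions for reps = (nx, ny, nz) copies of an fcc cell
    with lattice parameter a (same length unit as the result)."""
    nx, ny, nz = reps
    positions = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        for bx, by, bz in FCC_BASIS:
            positions.append(((i + bx) * a, (j + by) * a, (k + bz) * a))
    return positions


def to_xyz(positions, symbol, comment=""):
    """Serialize positions as an XYZ-format string for a single species."""
    lines = [str(len(positions)), comment]
    lines += [f"{symbol} {x:.6f} {y:.6f} {z:.6f}" for x, y, z in positions]
    return "\n".join(lines) + "\n"


# Hypothetical example: 3x3x3 supercell, a = 3.6 (illustrative value).
cell = fcc_supercell(3.6, (3, 3, 3))
print(len(cell), "atoms")
print(to_xyz(cell[:4], "Cu", comment="first conventional cell"))
```

A 3×3×3 repetition of the four-atom cell already gives 108 atoms, which is in the «several hundred» range once the cell grows a little further.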
Temperature-involved calculations are expensive
Everything that is not related to HPC calculations can be done in
a high-level language.
There are some packages that help with those steps. Sometimes
those packages even provide Python interfaces to Fortran codes
(python-ase, pymatgen). There is a separate journal (a few, in fact) for that
sort of program.
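A minimal sketch of the processing step in a high-level language, using only the Python standard library. The log format below (step, temperature, energy columns) is invented for the example and does not correspond to any particular simulation code.

```python
import statistics

# Hypothetical simulation log; the column layout is an assumption.
SAMPLE_LOG = """\
# step  temperature_K  energy_eV
1  298.5  -512.31
2  301.2  -512.27
3  300.3  -512.35
"""


def parse_log(text):
    """Return a list of (step, temperature, energy) tuples, skipping comments."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        step, temperature, energy = line.split()
        rows.append((int(step), float(temperature), float(energy)))
    return rows


def summarize(rows):
    """Average temperature and energy over all parsed steps."""
    return {
        "n_steps": len(rows),
        "mean_temperature": statistics.mean(r[1] for r in rows),
        "mean_energy": statistics.mean(r[2] for r in rows),
    }


print(summarize(parse_log(SAMPLE_LOG)))
```

The same few lines replace a typical awk/sed pipeline and are easier to extend with plotting or model fitting later.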
Human factors
The lack of software craftsmanship skills leads people
to believe that Fortran is the only option.
Lack of exposure? The «next» step after bash scripts and
Fortran in data processing is usually MATLAB.
(Young) researchers are no different from developers:
smart
arrogant
prone to NIH (not-invented-here) syndrome
lazy
do complex stuff
It might be hard to convince your supervisor to allow you
to spend resources on improving your «programming» skills
What can be done?
Need more exposure!
if you organize a meetup, put a note on the local university
board/Facebook page. PhD students tend to have a very similar set of interests
to developers.
if you have friends/acquaintances in academia, bring them to
a meetup or ask them if they suffer any computer-related pain.
You might help them save a few weeks of work, and maybe get
a free beer in return
there are always physics projects on GitHub that would love to have
somebody help them with code
Lead by example, if you can
Scientists believe that data science is all about classification, while «real»
science is all about regression
If you feel bold enough, organize a tutorial
A lot of people use MATLAB/RStudio only for the convenient layout,
and few know that tools like Jupyter/Spyder exist
Some authority to use with stubborn people
doi:10.1371/journal.pcbi.1003285 and doi:10.1371/journal.pone.0067111
A few databases with materials data
Machine Learning in computational materials science: an overview, a primer, and a rant, Igor Mosyagin
