The PubChemQC Project
A big data construction by first-
principles calculations of molecules
中田真秀 (NAKATA Maho)
ACCC RIKEN
2016/2/17 15:50-16:40
Kobe workshop for material design on
strongly correlated electrons in molecules
and materials
http://www.aics.riken.jp/labs/cms/workshop/201602/index.html
Background
• All matter is composed of atoms and molecules.
• A dream of theoretical chemists: do chemistry without
experiments!
• On computers
• Chemical space is really huge!
– The number of candidates for drugs: 10^60
(http://onlinelibrary.wiley.com/doi/10.1002/wcms.1104/abstract)
• Cf. Exa: 10^18
– Combinatorics problem
– Adding chemical reactions: 10^120
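To make the gap between 10^60 and 10^18 concrete, here is a minimal sketch of the arithmetic. It assumes, very optimistically, that one candidate could be examined per machine operation; that assumption and the function name are illustrative only and are not from the slides.

```python
# Order-of-magnitude figures quoted on the slide.
DRUG_CANDIDATES = 10**60   # estimated number of drug candidates
EXA_PER_SECOND = 10**18    # an exa-scale machine: 10^18 operations per second
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_enumerate(candidates: int, rate_per_second: int) -> float:
    """Years needed to visit every candidate once.

    Assumption (hypothetical): one candidate is handled per operation.
    """
    return candidates / rate_per_second / SECONDS_PER_YEAR


years = years_to_enumerate(DRUG_CANDIDATES, EXA_PER_SECOND)
print(f"{years:.1e} years")
```

Even under this unrealistic assumption the result is tens of orders of magnitude beyond any practical time scale, which is why brute-force enumeration is not an option.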
Why has 2-RDM theory been
suspended?
• Is there a shortcut for solving the Schrödinger equation?
– Density functional theory, reduced density matrix theory
• Using 2-particle reduced density matrices, we can
reduce the number of variables drastically.
– Journal of Chemical Physics, 114, 8282-8292 (2001).
Introduction of semidefinite programming
– Computational and Theoretical Chemistry Volume 1003, 1
January 2013, Pages 22-7 Application to 2D Hubbard model
– Journal of Chemical Physics 128, 16 164113 (2008). Various
molecules
• However it is not size-consistent, nor size-extensive.
– Phys. Chem. Chem. Phys., 2009,11, 5558-5560
– AIP Advances 2, 032125 (2012)
– Physical Review A 80, 042109 (2009)
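The reduction in the number of variables can be illustrated with a simple count. This is a sketch, not taken from the slides: it compares the number of N-electron determinants in r spin orbitals (the full wavefunction expansion) with the r^4 elements of a 2-particle reduced density matrix before any symmetry is used. The values of `r` and `n` are hypothetical.

```python
from math import comb


def fci_dimension(r: int, n: int) -> int:
    """Number of n-electron Slater determinants in r spin orbitals."""
    return comb(r, n)


def two_rdm_elements(r: int) -> int:
    """Number of elements of the 2-RDM in r spin orbitals, ignoring symmetry."""
    return r ** 4


# Hypothetical example: 20 electrons in 40 spin orbitals.
r, n = 40, 20
print("wavefunction coefficients:", fci_dimension(r, n))
print("2-RDM elements:           ", two_rdm_elements(r))
```

The first number grows combinatorially with system size while the second grows only polynomially, which is the attraction of the approach despite the N-representability problem.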
Fundamental question to solving SCE…
• Can this problem be solved efficiently?
– Very likely NO!
– Example: a spin-glass Hamiltonian is very hard to
solve: this is as hard as solving the Traveling
Salesperson Problem
– Algorithms without
assumption on 2-particle
interaction are never efficient.
Fundamental question to solving SCE…
Results from computational complexity theory
• N-representability problem is QMA-hard
– Liu, Y.-K., Christandl, M. & Verstraete, F. Quantum computational complexity of the n-representability problem: Qma complete. Phys. Rev.
Lett. 98, 110503 (2007).
• Solving 2-local Hamiltonian is also QMA-hard
– The Complexity of the Local Hamiltonian Problem
– SIAM J. Comput., 35(5), 1070–1097. http://epubs.siam.org/doi/abs/10.1137/S0097539704445226
• Finding the ground-state energy of the Hubbard model
in an external magnetic field is still QMA-hard
– http://www.nature.com/nphys/journal/v5/n10/abs/nphys1370.html
• Good review: Computational Complexity in Electronic
Structure
– http://arxiv.org/abs/1208.3334
Fundamental question to solving SCE…
• What I have learned
– No algorithm solves a general 2-particle Hamiltonian
efficiently.
– No algorithm solves the electronic Hamiltonian efficiently
(maybe)
– Introducing other conditions on the 2-particle interaction
is mandatory.
Heuristics are much more important than
thinking about subtle shortcuts.
Current status of computational
chemistry
• Relatively good agreements with experiments.
• Can explain chemical phenomena
– Many good quantum chemistry programs are
available!
– “DFT B3LYP 6-31G*” calculation is the gold
standard!
• We want to lead chemistry
– We usually explain what happened.
– We rarely predict something very exciting!
Difference between experiment and
calculation/theory
• Finding interesting phenomena or problem
– How do we convert CO2 to O2? N2 + H2 to NH3?
– How do we synthesize a compound from known ingredients?
• Design a key chemical reaction.
• Calculations
or
• Experiments
• Analysis of results
• Propose new experiments
Only One Difference
Difference between experiment and
calculation/theory
• No difference as science
• Most important thing is chemical intuition!
• Can we implement chemical intuition on
computers?
– Yes, but apparently there is a long way to go.
– Basic strategy: collect data, feed it to computers,
and process it.
Can we implement chemical intuition
on computers?
• Collect facts by computer calculations.
– Many good implementations are available.
– Huge computer resources are required but
– They are still growing exponentially
• Feed them to computers.
Can we implement chemical intuition
on computers?
• Feed them to computers.
• Machine Learning (ML)
– Very successful on
Image /sound recognition,
natural language processing.
Organic chemistry is somewhat similar to language…
Cadeddu, A., Wylie, E. K., Jurczak, J., Wampler-Doty, M. and Grzybowski, B. A. (2014), Organic Chemistry as a Language and the
Implications of Chemical Linguistics for Structural and Retrosynthetic Analyses. Angew. Chem. Int. Ed., 53: 8108–8112.
doi:10.1002/anie.201403708
Recently, some research papers using ML have been published
Big Data meets Quantum Chemistry Approximations: The Δ-Machine Learning
Approach Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von
Lilienfeld http://arxiv.org/abs/1503.04987 etc..
For better results with ML, we require a huge dataset
Can we implement chemical intuition
on computers?
The first step might be:
• Build a huge dataset by quantum chemistry
program packages!
– Results should agree with experiments.
– Improving the dataset is the task of QC researchers.
• Faster calculations for larger systems
• Better or sufficient treatment for electron correlations
• And build a search engine database using the
result.
Googling molecule
Gives you what you need
+
What are needed for Googling molecule?
1. Types, kinds, variety of molecules
– The number of molecules is infinite, but we cover the important ones
2. Required properties of molecules
– Molecular structure, energy, UV excitation energy, dipole
moment
3. Getting properties of molecules by calculation?
– Accuracy of calculation, and computer resources…
4. Coding or Encoding molecule
– IUPAC nomenclature is not suitable
– Do not think about graph theory
5. Fast calculation (with deep learning(?))
10^8 molecules/sec, as chemical space is huge.
Databases for lists of molecules
• PubChem: 50,000,000 molecules listed, made by NIH,
public domain, no curation (imported from catalogs,
etc.), available via FTP.
• ChemSpider: 28,000,000 entries, better curation, no
FTP. Redistribution and download are restricted
• Web-GDB13: 900,000,000 entries, just generated by
combinatorics. No
• ZINC, ChEMBL, DrugBank …
• CAS: 70,000,000 molecules, proprietary
• Nikkaji: 6,000,000, proprietary
We use PubChem as our source of molecules
Ex. A molecule listed in PubChem
Database for molecular properties by
experiments
• We must do some experiments for obtaining
molecular properties.
– No free comprehensive database is known so far.
– Pharmaceutical companies do O(1,000,000)
experiments for high throughput screening.
• Experiments are very costly!
– Time-consuming, need large facilities, expensive, hazardous
We do not do experiments!
Database for molecular properties by computer
calculation
• Gold-standard method: “Density functional
theory (B3LYP functional) + 6-31G(d) basis set”
– Accuracy is quite satisfactory (1-10 kcal/mol) for
biological systems and organic chemistry.
– Good implementations are available.
– Costs less (fast, needs only a supercomputer, nothing hazardous)
– Time for calculations keeps decreasing
• Intel Core i7 (esp. Sandy Bridge) is very fast.
• Still, we need huge resources.
We calculate by computer instead!
What is a molecule?
[Figure: representations of propionaldehyde (wavefunction, 3D coordinates, structure, IUPAC nomenclature, common name), ranging from "hard to understand but rigorous" to "easy to understand but many corner cases". No rigorous definition for a molecule. Image from Wikipedia.]
What is a molecule?
• No rigorous definition for “what is a molecule”
• Nomenclature
– 3D coordinates of the nuclei
– Structural formula
– IUPAC nomenclature
– Higher abstraction or less abstraction?
• Better molecular encoding method?
– Easy for humans to understand
– Easy for computers to understand as well
– Can describe most cases, with few corner cases.
– Compromise between dream and reality
Encoding molecule : SMILES
Encoding molecule
SMILES is a good encoding method for molecules
IUPAC nomenclature
tert-butyl N-[(2S,3S,5S)-5-[[4-[(1-benzyltetrazol-5-yl)methoxy]phenyl]methyl]-3-hydroxy-6-[[(1S,2R)-2-hydroxy-2,3-dihydro-1H-inden-1-yl]amino]-6-oxo-1-phenylhexan-2-yl]carbamate
We can encode a molecule
• SMILES
CN(C)CCOC12CCC(C3C1CCCC3)C4=CC=CC=C24
• InChI (made by IUPAC)
InChI=1S/C20H29NO/c1-21(2)13-14-22-20-12-11-15(16-7-3-5-9-18(16)20)17-8-4-6-10-19(17)20/h3,5,7,9,15,17,19H,4,6,8,10-14H2,1-2H3
…
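As a small illustration of why a machine-readable encoding is useful, the sketch below pulls the chemical-formula layer out of the InChI string shown above and estimates the molecular weight from it. The helper names and the rounded atomic masses are assumptions made for this example; the parser only handles a single-component formula such as C20H29NO, not formulas containing "." or leading multipliers.

```python
import re

# Approximate standard atomic masses in g/mol (rounded; assumption for this sketch).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

INCHI = (
    "InChI=1S/C20H29NO/c1-21(2)13-14-22-20-12-11-15(16-7-3-5-9-18(16)20)"
    "17-8-4-6-10-19(17)20/h3,5,7,9,15,17,19H,4,6,8,10-14H2,1-2H3"
)


def formula_from_inchi(inchi: str) -> str:
    """Return the formula layer: the field right after the version prefix."""
    return inchi.split("/")[1]


def atom_counts(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula like 'C20H29NO'."""
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(counts: dict[str, int]) -> float:
    """Sum atomic masses; raises KeyError for elements missing from ATOMIC_MASS."""
    return sum(ATOMIC_MASS[symbol] * k for symbol, k in counts.items())


counts = atom_counts(formula_from_inchi(INCHI))
print(counts, round(molecular_weight(counts), 2))
```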
What is SMILES?
• Simplified Molecular Input Line Entry System
– A linear representation of a molecule using ASCII.
– Stereochemistry can also be encoded
– Human readable, and also machine readable.
– Almost one-to-one mapping between a molecule and
SMILES via universal SMILES
• David Weininger at USEPA Mid-Continent Ecology Division Laboratory invented SMILES
• InChI by IUPAC
– International Chemical Identifier : open standard (non proprietary)
– NM O’Boyle invented “Universal SMILES” via InChI
Example by SMILES
http://en.wikipedia.org/wiki/SMILES
Molecule            Structure      SMILES
Nitrogen molecule   N≡N            N#N
Copper sulfate      Cu2+ SO4 2-    [Cu+2].[O-]S(=O)(=O)[O-]
Oenanthotoxin       (image)        CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO
Vitamin B1          (image)        OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2
Aflatoxin B1        (image)        O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
Some corner cases
Two different SMILES for Ferrocene
• C12C3C4C5C1[Fe]23451234C5C1C2C3C45
• [CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]
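The ferrocene corner case can be made tangible with a toy tokenizer. This is a simplified sketch and not a full SMILES parser (it is not what OpenBABEL does internally): it only splits a string into bracket atoms, organic-subset atoms, ring-closure labels and bond or branch symbols, then counts the atoms. Both ferrocene strings contain the same 11 heavy atoms even though the strings, and the bonding they describe, look entirely different.

```python
import re

# One alternative per token kind; two-letter symbols must come before one-letter ones.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"       # bracket atom, e.g. [CH-], [Fe+2]
    r"|Br|Cl"           # two-letter organic-subset atoms
    r"|[BCNOPSFI]"      # one-letter organic-subset atoms
    r"|[bcnops]"        # aromatic atoms
    r"|%\d{2}|\d"       # ring-closure labels
    r"|[-=#$:/\\.()]"   # bonds, dot separator, branch parentheses
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; reject characters the sketch does not know."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


def count_atoms(smiles: str) -> int:
    """Count atom tokens (bracket atoms and bare element symbols)."""
    return sum(1 for t in tokenize(smiles) if t.startswith("[") or t.isalpha())


FERROCENE_A = "C12C3C4C5C1[Fe]23451234C5C1C2C3C45"
FERROCENE_B = "[CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]"
print(count_atoms(FERROCENE_A), count_atoms(FERROCENE_B))
```

Deciding that the two strings denote the same compound needs chemistry-aware canonicalization, which is exactly where such corner cases become hard.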
Now it's my turn
Construction of ab initio chemical
database
• Molecular information is taken from PubChem
• Properties are calculated from first principles by
computer
– Many program packages are available
– DFT (B3LYP)
– 6-31G(d) basis set and geometry optimization
– Excited-state calculation by TD-DFT 6-31+G(d)
– Best for organic molecules or biomolecules
• Molecular encoding: SMILES / InChI
• Huge computer resources
• A dream come true
– Google-like search engine for chemistry
The PubChemQC Project
• http://pubchemqc.riken.jp/
• AIP Conf. Proc. 1702, 090058 (2015);
http://dx.doi.org/10.1063/1.4938866
• A public domain database for molecules
• Ab initio (first-principles) calculation of molecular
properties of PubChem compounds
• 2014/1/15: 13,000 molecules
• 2014/7/29 : 155,792 molecules
• 2014/10/30 : 906,798 molecules
• 2014/12/3 : 1,137,286 molecules
• 2015/3/25 : 1,673,532 molecules
• 2015/5/27: 2,122,146 molecules
• 2016/2/10: 3,046,948 (2,660,218 with excited states)
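The snapshot counts above imply an average throughput, which the following sketch computes. The dates and counts are copied from the list; the function name is made up for this example.

```python
from datetime import date

# (date, number of molecules) pairs as listed on the slide.
SNAPSHOTS = [
    (date(2014, 1, 15), 13_000),
    (date(2014, 7, 29), 155_792),
    (date(2014, 10, 30), 906_798),
    (date(2014, 12, 3), 1_137_286),
    (date(2015, 3, 25), 1_673_532),
    (date(2015, 5, 27), 2_122_146),
    (date(2016, 2, 10), 3_046_948),
]


def average_rate(first: tuple[date, int], last: tuple[date, int]) -> float:
    """Average number of molecules added per day between two snapshots."""
    (d0, n0), (d1, n1) = first, last
    return (n1 - n0) / (d1 - d0).days


# Rate for each interval, then over the whole period.
for earlier, later in zip(SNAPSHOTS, SNAPSHOTS[1:]):
    print(later[0], round(average_rate(earlier, later)), "molecules/day")

overall = average_rate(SNAPSHOTS[0], SNAPSHOTS[-1])
print("overall:", round(overall), "molecules/day")
```

Over the whole period the average comes out at roughly 4,000 molecules per day, with large variation between intervals.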
The PubChemQC project
http://pubchemqc.riken.jp/
WIP: no search engine, just data
Related works
• Related works
– Raghunathan Ramakrishnan, Pavlo Dral, Matthias Rupp, O.
Anatole von Lilienfeld: Quantum Chemistry Structures and
Properties of 134 kilo Molecules, Scientific Data, 1: 140022,
Nature Publishing Group, 2014.
– NIST Web Book
• http://webbook.nist.gov/chemistry/
• Small numbers of molecules. Comparing many methods
– Harvard Clean Energy Project
• http://cleanenergy.molecularspace.org/
• 25,000,000 (?) molecules for photo devices, generated by combinatorics
– Sugimoto et al.: 2013 CBI symposium poster
• Almost the same as our database; currently not open to the
public (now??)
Our contribution: 20 times larger
How do we do it?
• Generate the initial 3D conformation with OpenBABEL
– The SDF contains a 3D conformation, but we don’t use it.
– OpenBABEL -h (add hydrogens) --gen3d (generate 3D
coordinates)
• Ab initio calculation by GAMESS+Firefly
– Using Gaussian can lead to a political problem(?)
– PM3 optimization
– Hartree-Fock/STO-6G geometry optimization
– Firefly+GAMESS geometry optimization in B3LYP/6-31G*
– Ten excitation energies by TDDFT/6-31+G* (no geometry
optimization)
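The first step of this workflow could be scripted roughly as follows. This is only a sketch: it builds, but does not run, an `obabel` command line using the `-h` and `--gen3d` options named above. The choice of SDF as output format, the file name, and the function name are assumptions for illustration; the slides do not say which intermediate format the project uses.

```python
import shlex


def obabel_gen3d_command(smiles: str, out_path: str) -> list[str]:
    """Build an Open Babel command that turns a SMILES string into 3D coordinates."""
    return [
        "obabel",
        f"-:{smiles}",   # read the SMILES string directly from the command line
        "-osdf",         # output format (assumption: SDF)
        "-O", out_path,  # output file (hypothetical name)
        "-h",            # add hydrogens
        "--gen3d",       # generate 3D coordinates
    ]


# Calculation stages in the order given on the slide.
STAGES = [
    "PM3 geometry optimization",
    "Hartree-Fock/STO-6G geometry optimization",
    "B3LYP/6-31G* geometry optimization",
    "TDDFT/6-31+G* excitation energies (ten states, no geometry optimization)",
]

cmd = obabel_gen3d_command("CCO", "ethanol.sdf")
print(shlex.join(cmd))
for step, stage in enumerate(STAGES, start=1):
    print(step, stage)
```

To actually run the command one would pass `cmd` to `subprocess.run`, provided Open Babel is installed. Running cheap methods first gives the expensive B3LYP optimization a better starting geometry.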
How do we do it?
• Heavy use of OpenBABEL
• Extraction of molecular information
– Sort PubChem compounds by molecular weight
– OpenBABEL
• Encoded by SMILES
– Isomeric SMILES: stereochemistry retained
– OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
– CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO
– CC(=O)OCCC(/C)=CC[C@H](C(C)=C)CCC=C
How to convert a PubChem compound
into a quantum chemistry calculation
aflatoxin
O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
Ab initio calculation by
OpenBABEL
Final results will be
• Uploaded to http://pubchemqc.riken.jp/
• Currently we upload
– input file (ground / excited state)
– Output file (ground / excited state)
– Final geometry in Mol file
Scaling of computation
• Embarrassingly parallel for each molecule
• Very roughly speaking, required time for
calculation scales like N^4
– N : molecular weight
• Problems are very hard (complexity theory)
– Hartree-Fock calculation
– DFT (b3lyp) calculation
– geometry optimization
• Practically many molecules can be solved
efficiently
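The rough N^4 rule can be turned into a quick relative-cost estimate. The sketch below takes molecular weight 160 as the reference, since that is the value used for the throughput figures in this talk; the function name and the other weights are chosen for illustration.

```python
def relative_cost(mw: float, reference_mw: float = 160.0, exponent: float = 4.0) -> float:
    """Cost of one molecule relative to a reference molecule, assuming cost ~ MW^exponent."""
    return (mw / reference_mw) ** exponent


for mw in (160, 200, 300, 500):
    print(f"MW {mw}: about {relative_cost(mw):.1f}x the cost at MW 160")
```

Under this rule a molecule of weight 300 costs roughly 12 times as much as one of weight 160, which is the right order of magnitude for the drop in molecules per day reported for heavier molecules.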
Computer Resources
• RICC: Intel Xeon 5570 (Westmere, 2.93 GHz, 8
cores/node) x 1000
– 1000-10000 molecules/day (MW 160)
– Heavily depend on conditions of other users
– Time limit: 8 hours
• Quest : Intel Core2 duo (1.6GHz/node) x 700
– 3000-8000 molecules / day (MW 160)
– 100-1000 molecules / day (MW 200-300)
– Time limit: 20 hours
• Compounds whose calculations fail are ignored for
the time being.
Computer Resources
• Storage
– Approx. 500GB for 1,000,000 molecules (xz
compressed)
– Approx. 20 TB for 40,000,000 molecules (xz
compressed)
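The two storage figures are consistent with a constant footprint per molecule, as this short sketch shows. It uses decimal units (1 TB = 1000 GB); the function name is an assumption for the example.

```python
GB_PER_MILLION_MOLECULES = 500  # figure from the slide (xz compressed)


def storage_tb(n_molecules: int, gb_per_million: float = GB_PER_MILLION_MOLECULES) -> float:
    """Estimated compressed storage in TB, assuming constant size per molecule."""
    return n_molecules / 1_000_000 * gb_per_million / 1000


per_molecule_kb = GB_PER_MILLION_MOLECULES * 1_000_000 / 1_000_000  # GB per 10^6 molecules -> kB each
print(f"about {per_molecule_kb:.0f} kB per molecule")
print(f"40,000,000 molecules: about {storage_tb(40_000_000):.0f} TB")
```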
Molecular weight and Lipinski Rule
• Lipinski’s rule of five (Pfizer's rule of five): a rule of
thumb for drug discovery
• No more than 5 hydrogen bond donors
• Not more than 10 hydrogen bond acceptors
• A molecular mass less than 500 daltons
• An octanol-water partition coefficient log P not greater than 5
• A molecular weight smaller than 500 is also
very convenient for computational chemistry
– For routine calculations without experimental data
other than molecular formula
– If larger than 500, secondary or higher structure
becomes important. E.g., protein
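The four criteria listed above translate directly into a small check. The sketch assumes the descriptor values are already known (for example from a cheminformatics toolkit); the class name, field names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float  # daltons
    log_p: float           # octanol-water partition coefficient


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the criteria from the slide that the molecule violates."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if d.molecular_mass >= 500:
        violations.append("molecular mass not less than 500 daltons")
    if d.log_p > 5:
        violations.append("log P greater than 5")
    return violations


# Hypothetical descriptor values, for illustration only.
small = Descriptors(h_bond_donors=2, h_bond_acceptors=4, molecular_mass=299.5, log_p=3.1)
large = Descriptors(h_bond_donors=6, h_bond_acceptors=11, molecular_mass=650.0, log_p=5.5)
print(lipinski_violations(small))
print(lipinski_violations(large))
```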
Molecular Weight distribution at
PubChem
We are still here
Lipinski limit MW=500
30,000,000 molecules
(excluding mixtures)
How long will it take to finish?
• For drug design, we need to calculate all
molecules of MW < 500
• Total 30,000,000 molecules
– This number may increase in the future
• Current (2014/12/4): 1,100,000 molecules
– Only about 3.7%
• 10,000 molecules/day -> 8.2 years
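The estimate can be reproduced with a few lines. The totals and the rate come from the slide; the function name is invented for this sketch, and a constant rate is assumed.

```python
TOTAL_MOLECULES = 30_000_000   # PubChem molecules with MW < 500
DONE = 1_100_000               # calculated as of 2014/12/4
RATE_PER_DAY = 10_000          # assumed constant throughput


def years_to_finish(n_molecules: int, per_day: int) -> float:
    """Years needed to process n_molecules at a constant daily rate."""
    return n_molecules / per_day / 365


print(f"done so far: {DONE / TOTAL_MOLECULES:.1%}")
print(f"all molecules: {years_to_finish(TOTAL_MOLECULES, RATE_PER_DAY):.1f} years")
print(f"remaining only: {years_to_finish(TOTAL_MOLECULES - DONE, RATE_PER_DAY):.1f} years")
```

The 8.2-year figure corresponds to processing all 30,000,000 molecules; counting only the remaining ones shortens it slightly.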
How long will it take to finish?
• 10+ years? No, maybe far less.
• 25 years ago (1990) computers were so slow
– Even ab initio calculations are very difficult on
486DX@25MHz or
68000@10MHz
Outlook, prospect, hope…
• Far better in silico screening
– Fewer or no experiments are necessary
• Even faster calculation using machine learning
– 10,000 molecules / second ?
– Requires a huge data set to learn from.
– Bio or organic molecules are easy to calculate.
– Already available: Raghunathan Ramakrishnan
https://scholar.google.co.jp/citations?user=jSCGozoAAAAJ&hl=ja&oi=sra
• Database for chemical reaction
– Precise calculation is required
– GRRM method + machine learning (?)
• Geometry optimization for Protein (PDB)
– Only X ray crystal structures are available
http://pubchemqc.riken.jp/
Difficulties in this project
• Parameters needed for calculations vary from molecule to
molecule
• Properties can differ depending on the initial guess
• Computer resources
– Raspberry Pi? NVIDIA Jetson? BOINC?
• Molecular encoding never ends
– Neither SMILES nor InChI is complete
– Some corner cases may be chemically interesting.

Weitere ähnliche Inhalte

Was ist angesagt?

Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit Ganesan Narayanasamy
 
Computational Performance of Phase Field Calculations using a Matrix-Free (Su...
Computational Performance of Phase Field Calculations using a Matrix-Free (Su...Computational Performance of Phase Field Calculations using a Matrix-Free (Su...
Computational Performance of Phase Field Calculations using a Matrix-Free (Su...Stephen DeWitt
 
Quantum Computation for Predicting Electron and Phonon Properties of Solids
Quantum Computation for Predicting Electron and Phonon Properties of SolidsQuantum Computation for Predicting Electron and Phonon Properties of Solids
Quantum Computation for Predicting Electron and Phonon Properties of SolidsKAMAL CHOUDHARY
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Anubhav Jain
 
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...KAMAL CHOUDHARY
 
Database of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit SpillageDatabase of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit SpillageKAMAL CHOUDHARY
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Scienceaimsnist
 
Combinatorial Experimentation and Machine Learning for Materials Discovery
Combinatorial Experimentation and Machine Learning for Materials DiscoveryCombinatorial Experimentation and Machine Learning for Materials Discovery
Combinatorial Experimentation and Machine Learning for Materials Discoveryaimsnist
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkYu Liu
 
Predicting local atomic structures from X-ray absorption spectroscopy using t...
Predicting local atomic structures from X-ray absorption spectroscopy using t...Predicting local atomic structures from X-ray absorption spectroscopy using t...
Predicting local atomic structures from X-ray absorption spectroscopy using t...aimsnist
 
Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...
Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...
Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...Daniel Wheeler
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsAnubhav Jain
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer ChemistryPreferred Networks
 
2020 ICLR PC-DARTS and Atom NAS
2020 ICLR PC-DARTS and Atom NAS2020 ICLR PC-DARTS and Atom NAS
2020 ICLR PC-DARTS and Atom NASEunseop Shin
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)RCCSRENKEI
 
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUA comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUAlex Camargo
 
DOE Efficiency Enhancing Solar Downconverting Phosphor Layer
DOE Efficiency Enhancing Solar Downconverting Phosphor LayerDOE Efficiency Enhancing Solar Downconverting Phosphor Layer
DOE Efficiency Enhancing Solar Downconverting Phosphor Layerjeep82cj
 
Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASHGanesan Narayanasamy
 
News from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charmNews from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charmjuanrojochacon
 

Was ist angesagt? (20)

Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit
 
Computational Performance of Phase Field Calculations using a Matrix-Free (Su...
Computational Performance of Phase Field Calculations using a Matrix-Free (Su...Computational Performance of Phase Field Calculations using a Matrix-Free (Su...
Computational Performance of Phase Field Calculations using a Matrix-Free (Su...
 
Quantum Computation for Predicting Electron and Phonon Properties of Solids
Quantum Computation for Predicting Electron and Phonon Properties of SolidsQuantum Computation for Predicting Electron and Phonon Properties of Solids
Quantum Computation for Predicting Electron and Phonon Properties of Solids
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
 
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
 
Database of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit SpillageDatabase of Topological Materials and Spin-orbit Spillage
Database of Topological Materials and Spin-orbit Spillage
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Science
 
Combinatorial Experimentation and Machine Learning for Materials Discovery
Combinatorial Experimentation and Machine Learning for Materials DiscoveryCombinatorial Experimentation and Machine Learning for Materials Discovery
Combinatorial Experimentation and Machine Learning for Materials Discovery
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on Spark
 
Predicting local atomic structures from X-ray absorption spectroscopy using t...
Predicting local atomic structures from X-ray absorption spectroscopy using t...Predicting local atomic structures from X-ray absorption spectroscopy using t...
Predicting local atomic structures from X-ray absorption spectroscopy using t...
 
Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...
Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...
Using an Explicit Nucleation Model in PRISIMS-PF to Predict Precipate Microst...
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and Analytics
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer Chemistry
 
2020 ICLR PC-DARTS and Atom NAS
2020 ICLR PC-DARTS and Atom NAS2020 ICLR PC-DARTS and Atom NAS
2020 ICLR PC-DARTS and Atom NAS
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)
 
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUA comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
 
DOE Efficiency Enhancing Solar Downconverting Phosphor Layer
DOE Efficiency Enhancing Solar Downconverting Phosphor LayerDOE Efficiency Enhancing Solar Downconverting Phosphor Layer
DOE Efficiency Enhancing Solar Downconverting Phosphor Layer
 
Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASH
 
Tree building 2
Tree building 2Tree building 2
Tree building 2
 
News from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charmNews from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charm
 

Andere mochten auch

為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理Maho Nakata
 
HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分Maho Nakata
 
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)Maho Nakata
 
為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードする為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードするMaho Nakata
 
計算化学実習講座:第一回
計算化学実習講座:第一回計算化学実習講座:第一回
計算化学実習講座:第一回Maho Nakata
 
計算化学実習講座:第二回
 計算化学実習講座:第二回 計算化学実習講座:第二回
計算化学実習講座:第二回Maho Nakata
 
Fx自動売買システムの構築
Fx自動売買システムの構築Fx自動売買システムの構築
Fx自動売買システムの構築Zhiqiang Bian
 
Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)
Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)
Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)MITSUNARI Shigeo
 
HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用Maho Nakata
 
How effective is the combination of your main product and ur axcillery texts
How effective is the combination of your main product and ur axcillery textsHow effective is the combination of your main product and ur axcillery texts
How effective is the combination of your main product and ur axcillery textsKishan Ruda
 
Hacking Your Head - Managing Information Overload (45 mix)
Hacking Your Head  - Managing Information Overload (45 mix)Hacking Your Head  - Managing Information Overload (45 mix)
Hacking Your Head - Managing Information Overload (45 mix)Jo Hanna Pearce
 
Social Networking: The nuts and bolts of Facebook, Twitter and Google+
Social Networking:  The nuts and bolts of Facebook, Twitter and Google+Social Networking:  The nuts and bolts of Facebook, Twitter and Google+
Social Networking: The nuts and bolts of Facebook, Twitter and Google+Mary Thengvall
 
Презентация нового Фонда АРТ Активный
Презентация нового Фонда АРТ АктивныйПрезентация нового Фонда АРТ Активный
Презентация нового Фонда АРТ Активныйkya.artcapital
 
Universidad nacional de chimborazo drive trabajo colavorativo
Universidad nacional de chimborazo drive trabajo colavorativoUniversidad nacional de chimborazo drive trabajo colavorativo
Universidad nacional de chimborazo drive trabajo colavorativoUNACH
 
OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...
OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...
OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...Triumph Consultancy Services
 
目的別の検索
目的別の検索目的別の検索
目的別の検索stucon
 

Andere mochten auch (20)

為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理
 
HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分
 
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
 
為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードする為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードする
 
計算化学実習講座:第一回
計算化学実習講座:第一回計算化学実習講座:第一回
計算化学実習講座:第一回
 
計算化学実習講座:第二回
 計算化学実習講座:第二回 計算化学実習講座:第二回
計算化学実習講座:第二回
 
Fx自動売買システムの構築
Fx自動売買システムの構築Fx自動売買システムの構築
Fx自動売買システムの構築
 
Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)
Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)
Xeon PhiとN体計算コーディング x86/x64最適化勉強会6(@k_nitadoriさんの代理アップ)
 
HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用
 
How effective is the combination of your main product and ur axcillery texts
How effective is the combination of your main product and ur axcillery textsHow effective is the combination of your main product and ur axcillery texts
How effective is the combination of your main product and ur axcillery texts
 
Hacking Your Head - Managing Information Overload (45 mix)
Hacking Your Head  - Managing Information Overload (45 mix)Hacking Your Head  - Managing Information Overload (45 mix)
Hacking Your Head - Managing Information Overload (45 mix)
 
Social Networking: The nuts and bolts of Facebook, Twitter and Google+
Social Networking:  The nuts and bolts of Facebook, Twitter and Google+Social Networking:  The nuts and bolts of Facebook, Twitter and Google+
Social Networking: The nuts and bolts of Facebook, Twitter and Google+
 
Презентация нового Фонда АРТ Активный
Презентация нового Фонда АРТ АктивныйПрезентация нового Фонда АРТ Активный
Презентация нового Фонда АРТ Активный
 
На войне как на войне
На войне как на войнеНа войне как на войне
На войне как на войне
 
Activity 3 Image Deck
Activity 3 Image DeckActivity 3 Image Deck
Activity 3 Image Deck
 
Actividad6 algoritmos
Actividad6 algoritmosActividad6 algoritmos
Actividad6 algoritmos
 
Info sacu
Info sacuInfo sacu
Info sacu
 
Universidad nacional de chimborazo drive trabajo colavorativo
Universidad nacional de chimborazo drive trabajo colavorativoUniversidad nacional de chimborazo drive trabajo colavorativo
Universidad nacional de chimborazo drive trabajo colavorativo
 
OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...
OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...
OHSUG 2012 presentation - SmartHelp - Driving compliance with integrated qual...
 
目的別の検索
目的別の検索目的別の検索
目的別の検索
 

Kobeworkshop pubchemqc project

  • 1. The PubChemQC Project: A big data construction by first-principles calculations of molecules. 中田真秀 (NAKATA Maho), ACCC RIKEN. 2016/2/17 15:50-16:40, Kobe workshop for material design on strongly correlated electrons in molecules and materials. http://www.aics.riken.jp/labs/cms/workshop/201602/index.html
  • 2. Background • All matter is composed of atoms and molecules. • A dream of theoretical chemists: do chemistry without experiments, on computers! • Chemical space is really huge! – The number of drug candidates is about 10^60 (http://onlinelibrary.wiley.com/doi/10.1002/wcms.1104/abstract) • Cf. exa: 10^18 – A combinatorics problem – Including chemical reactions: 10^120
  • 3. Why has 2-RDM theory been suspended? • Is there a shortcut for solving the Schrödinger equation? – Density functional theory, reduced density matrix theory • Using 2-particle reduced density matrices, we can reduce the number of variables drastically. – Journal of Chemical Physics, 114, 8282-8292 (2001). Introduction of semidefinite programming – Computational and Theoretical Chemistry Volume 1003, 1 January 2013, Pages 22-7. Application to the 2D Hubbard model – Journal of Chemical Physics 128, 16 164113 (2008). Various molecules • However, it is neither size-consistent nor size-extensive. – Phys. Chem. Chem. Phys., 2009, 11, 5558-5560 – AIP Advances 2, 032125 (2012) – Physical Review A 80, 042109 (2009)
  • 4. Fundamental question to solving SCE… • Can this problem be solved efficiently? – Very likely NO! – Example: a spin-glass Hamiltonian is very hard to solve; it is as hard as the Traveling Salesperson Problem. – Algorithms that make no assumptions about the 2-particle interaction are never efficient.
  • 5. Fundamental question to solving SCE… Results from computational complexity theory • The N-representability problem is QMA-hard – Liu, Y.-K., Christandl, M. & Verstraete, F. Quantum computational complexity of the N-representability problem: QMA complete. Phys. Rev. Lett. 98, 110503 (2007). • Solving a 2-local Hamiltonian is also QMA-hard – The Complexity of the Local Hamiltonian Problem – SIAM J. Comput., 35(5), 1070–1097. http://epubs.siam.org/doi/abs/10.1137/S0097539704445226 • Finding the ground-state energy of the Hubbard model in an external magnetic field is still QMA-hard – http://www.nature.com/nphys/journal/v5/n10/abs/nphys1370.html • Good review: Computational Complexity in Electronic Structure – http://arxiv.org/abs/1208.3334
  • 6. Fundamental question to solving SCE… • What I have learned – No algorithm solves a general 2-particle Hamiltonian efficiently. – No algorithm solves the electronic Hamiltonian efficiently (maybe). – Introducing other conditions on the 2-particle interaction is mandatory. Heuristics are much more important than thinking about subtle shortcuts.
  • 7. Current status of computational chemistry • Relatively good agreement with experiments. • Can explain chemical phenomena – Many good quantum chemistry programs are available! – The “DFT B3LYP 6-31G*” calculation is the gold standard! • We want to lead chemistry – We usually explain what happened. – We rarely predict something very exciting!
  • 8. Difference between experiment and calculation/theory • Finding an interesting phenomenon or problem – How do we convert CO2 to O2? N2 + H2 to NH3? – How do we synthesize a compound from known ingredients? • Design a key chemical reaction. • Calculations or experiments (the only difference) • Analysis of results • Propose new experiments
  • 9. Difference between experiment and calculation/theory • No difference as science • The most important thing is chemical intuition! • Can we implement chemical intuition on computers? – Yes, but apparently there is a long way to go. – The basic strategy: collect data, feed it to computers, and process it.
  • 10. Can we implement chemical intuition on computers? • Collect facts by computer calculations. – Many good implementations are available. – Huge computer resources are required, but – they are still growing exponentially. • Feed them to computers.
  • 11. Can we implement chemical intuition on computers? • Feed them to computers. • Machine learning (ML) – Very successful in image and sound recognition and natural language processing. – Organic chemistry is somewhat similar to language: Cadeddu, A., Wylie, E. K., Jurczak, J., Wampler-Doty, M. and Grzybowski, B. A. (2014), Organic Chemistry as a Language and the Implications of Chemical Linguistics for Structural and Retrosynthetic Analyses. Angew. Chem. Int. Ed., 53: 8108–8112. doi:10.1002/anie.201403708 – Recently, some research papers using ML have been published: Big Data meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach, Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld, http://arxiv.org/abs/1503.04987, etc. • For better results from ML, we need a huge dataset.
  • 12. Can we implement chemical intuition on computers? The first step might be: • Build a huge dataset with quantum chemistry program packages! – Results should agree with experiments. – Improving the dataset is a task for QC researchers. • Faster calculations for larger systems • Better or sufficient treatment of electron correlation • And build a search engine database using the results.
  • 13. Googling a molecule gives you what you need
  • 14. What is needed for Googling a molecule? 1. Types, kinds, and variety of molecules – The number of molecules is infinite, but we cover the important ones. 2. Required properties of molecules – Molecular structure, energy, UV excitation energy, dipole moment 3. Getting properties of molecules by calculation? – Accuracy of calculation, and computer resources… 4. Coding or encoding a molecule – IUPAC nomenclature is not suitable – Do not think about graph theory 5. Fast calculation (with deep learning(?)): 10^8 molecules/sec, as chemical space is huge.
  • 15. Databases for lists of molecules • PubChem: 50,000,000 molecules listed, made by NIH, public domain, no curation (imported from catalogs, etc.), can be obtained via ftp. We use it as the source of molecules. • ChemSpider: 28,000,000 entries, better curation, no ftp. Redistribution and download are restricted. • Web-GDB13: 900,000,000 entries, just generated by combinatorics. No • ZINC, ChEMBL, DrugBank … • CAS: 70,000,000 molecules, proprietary • Nikkaji: 6,000,000, proprietary
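To put the 10^8 molecules/sec target in item 5 into perspective, the following is a back-of-the-envelope sketch in Python. It takes the 10^60 drug-candidate estimate from the Background slide and the 10^8 molecules/sec figure from this slide; the 10^18 rate is a hypothetical assumption (one molecule per operation on an exascale machine), not something the talk claims.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(n_molecules, molecules_per_second):
    """Wall-clock years needed to visit n_molecules at a constant rate."""
    return n_molecules / molecules_per_second / SECONDS_PER_YEAR


CHEMICAL_SPACE = 10**60  # drug-candidate estimate quoted on the Background slide
TARGET_RATE = 10**8      # molecules per second, the goal stated in item 5
EXA_RATE = 10**18        # hypothetical: one molecule per operation at exascale

for label, rate in [("10^8 molecules/s", TARGET_RATE),
                    ("10^18 molecules/s", EXA_RATE)]:
    years = years_to_enumerate(CHEMICAL_SPACE, rate)
    print(f"{label}: about {years:.1e} years")
```

Even under the optimistic exascale assumption, exhaustive enumeration is out of reach, which is consistent with the talk's emphasis on covering the important molecules rather than all of them.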
  • 17. Ex. A molecule listed in PubChem
  • 18. Database for molecular properties by experiments • We must do experiments to obtain molecular properties. – No free comprehensive database is known so far. – Pharmaceutical companies do O(1,000,000) experiments for high-throughput screening. • Experiments are hugely expensive! – Time consuming, large facilities, costly, hazardous • We do not do experiments!
  • 19. Database for molecular properties by computer calculation • Gold standard method: “Density functional theory (B3LYP functional) + 6-31G(d) basis set” – Accuracy is quite satisfactory (1-10 kcal/mol) for biological systems and organic chemistry. – Good implementations are available. – Costs less (fast, needs only a supercomputer, no hazards) – Calculation time keeps decreasing • Intel Core i7 (esp. Sandy Bridge) is very fast. • Still, we need huge resources. • We calculate by computer instead!
  • 20. What is a molecule? Example: propionaldehyde (figure from Wikipedia). Representations range from hard to understand but rigorous (wavefunction, 3D coordinates) to easy to understand but with many corner cases (structure, IUPAC nomenclature, common name). There is no rigorous definition of a molecule.
  • 21. What is a molecule? • No rigorous definition of “what is a molecule” • Nomenclature – 3D coordinates of the nuclei – Structural formula – IUPAC nomenclature – Higher abstraction or lower abstraction? • A better molecular encoding method? – Easy for humans to understand – Easy for computers to understand as well – Can describe most cases, with fewer corner cases. – A compromise between dream and reality
  • 22. Encoding molecules: SMILES is a good encoding method for molecules • IUPAC nomenclature: tert-butyl N-[(2S,3S,5S)-5-[[4-[(1-benzyltetrazol-5-yl)methoxy]phenyl]methyl]-3-hydroxy-6-[[(1S,2R)-2-hydroxy-2,3-dihydro-1H-inden-1-yl]amino]-6-oxo-1-phenylhexan-2-yl]carbamate • We can encode a molecule as: • SMILES: CN(C)CCOC12CCC(C3C1CCCC3)C4=CC=CC=C24 • InChI (made by IUPAC): InChI=1S/C20H29NO/c1-21(2)13-14-22-20-12-11-15(16-7-3-5-9-18(16)20)17-8-4-6-10-19(17)20/h3,5,7,9,15,17,19H,4,6,8,10-14H2,1-2H3 …
  • 23. What is SMILES? • Simplified Molecular Input Line Entry System – A linear representation of a molecule using ASCII. – Stereochemistry (configuration) can also be encoded. – Human readable and also machine readable. – Almost one-to-one mapping between a molecule and its SMILES via Universal SMILES • David Weininger at the USEPA Mid-Continent Ecology Division Laboratory invented SMILES • InChI by IUPAC – International Chemical Identifier: an open (non-proprietary) standard – N. M. O’Boyle invented “Universal SMILES” via InChI
  • 24. Examples of SMILES (http://en.wikipedia.org/wiki/SMILES)
      Molecule | Structure | SMILES
      Nitrogen molecule | N≡N | N#N
      Copper sulfate | Cu²⁺ SO₄²⁻ | [Cu+2].[O-]S(=O)(=O)[O-]
      Oenanthotoxin | - | CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO
      Vitamin B1 | - | OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2
      Aflatoxin B1 | - | O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
  • 25. Some corner cases: two different SMILES for ferrocene • C12C3C4C5C1[Fe]23451234C5C1C2C3C45 • [CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]
  • 26. Now it's my turn
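The ferrocene corner case can be made concrete with a small standard-library sketch. The helper `heavy_atom_counts` below is hypothetical and is not part of any cheminformatics toolkit: it only tokenizes atom symbols with a regular expression and ignores bonds, charges, and ring closures, so it shows that the two strings describe the same set of heavy atoms, not that they are chemically equivalent. A real workflow would canonicalize with a proper toolkit instead.

```python
import re
from collections import Counter

# Bracket atoms first, then the organic subset, then aromatic lowercase atoms.
_ATOM = re.compile(
    r"\[(?P<bracket>[^\]]+)\]|(?P<organic>Br|Cl|[BCNOPSFI])|(?P<aromatic>[bcnops])"
)
# Inside brackets: optional isotope number, then the element symbol.
_ELEMENT = re.compile(r"\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a SMILES string (rough sketch)."""
    counts = Counter()
    for match in _ATOM.finditer(smiles):
        if match.group("bracket"):
            symbol = _ELEMENT.match(match.group("bracket")).group(1)
        else:
            symbol = match.group("organic") or match.group("aromatic")
        symbol = symbol.capitalize()
        if symbol != "H":  # skip explicit [H] atoms
            counts[symbol] += 1
    return counts


covalent = "C12C3C4C5C1[Fe]23451234C5C1C2C3C45"
ionic = "[CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]"

print(heavy_atom_counts(covalent))
print(heavy_atom_counts(ionic))
print(heavy_atom_counts(covalent) == heavy_atom_counts(ionic))
```

The two strings differ character by character yet yield the same atom inventory, which is why a plain string comparison of SMILES is not a reliable identity test for such molecules.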
  • 27. Construction of ab initio chemical database • Molecular information is from PubChem • Properties are calculated from the first principle using computer – Many program packages are available – DFT (B3LYP) – 6-31G(d) basis set and geometry optimization – Excited states calculation by TD-DFT 6-31G+(d) – Best for organic molecules or bio molecules • Molecular encoding : SMILES / InChI • Huge computer resources • Dream come true – Google like search engine for chemistry
  • 28. The PubChemQC Project • http://pubchemqc.riken.jp/ • AIP Conf. Proc. 1702, 090058 (2015); http://dx.doi.org/10.1063/1.4938866 • A public domain database for molecules • Ab initio (first-principles) calculation of molecular properties of PubChem compounds • 2014/1/15: 13,000 molecules • 2014/7/29: 155,792 molecules • 2014/10/30: 906,798 molecules • 2014/12/3: 1,137,286 molecules • 2015/3/25: 1,673,532 molecules • 2015/5/27: 2,122,146 molecules • 2016/2/10: 3,046,948 molecules (2,660,218 with excited states)
  • 32. Related works – Raghunathan Ramakrishnan, Pavlo Dral, Matthias Rupp, O. Anatole von Lilienfeld: Quantum Chemistry Structures and Properties of 134 kilo Molecules, Scientific Data, 1: 140022, Nature Publishing Group, 2014. – NIST Web Book • http://webbook.nist.gov/chemistry/ • Small number of molecules, comparing many methods – Harvard Clean Energy Project • http://cleanenergy.molecularspace.org/ • 25,000,000 (?) molecules for photo devices, generated by combinatorics – Sugimoto et al.: 2013 CBI symposium poster • Almost the same as our database, currently not open to the public (now??) • Our contribution: 20 times larger
  • 33. How do we do it? • Generate the initial 3D conformation with OpenBABEL – The SDF contains a 3D conformation, but we don’t use it – OpenBABEL -h (add hydrogens) --gen3d (generate 3D coordinates) • Ab initio calculation by GAMESS + Firefly – Using Gaussian can lead to a political problem (?) – PM3 optimization – Hartree-Fock/STO-6G geometry optimization – Firefly + GAMESS geometry optimization at B3LYP/6-31G* – Ten excitation energies by TDDFT/6-31+G* (no geometry optimization)
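The Open Babel step on this slide can be sketched as a command line. This is only a minimal sketch, not the project's actual script: it assumes the `obabel` executable from Open Babel is installed, and the helper name `obabel_gen3d_command` and the output file name are hypothetical. Only the `-h` and `--gen3d` options come from the slide.

```python
import shlex
from typing import List


def obabel_gen3d_command(smiles: str, out_path: str) -> List[str]:
    """Build (but do not run) an obabel command that reads a SMILES string,
    adds hydrogens (-h) and generates 3D coordinates (--gen3d)."""
    return [
        "obabel",
        "-:" + smiles,   # "-:" passes the SMILES string directly as input
        "-O", out_path,  # output file; the format is inferred from the extension
        "-h",            # add hydrogens
        "--gen3d",       # generate 3D coordinates
    ]


# Hypothetical example: the nitrogen molecule from the SMILES examples slide.
cmd = obabel_gen3d_command("N#N", "n2.mol")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where Open Babel is available; the later PM3, Hartree-Fock and B3LYP steps need their own input files and are not covered here.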
  • 34. How do we do it? • Heavy use of OpenBABEL • Extraction of molecular information – Sort PubChem compounds by molecular weight – OpenBABEL • Encoded by SMILES – Isomeric SMILES: stereochemistry (3D configuration) retained – OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1 – CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO – CC(=O)OCCC(/C)=CC[C@H](C(C)=C)CCC=C
  • 35. How to convert a PubChem compound to a quantum chemistry calculation • aflatoxin → O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5 → (by OpenBABEL) → ab initio calculation
  • 36. Final results • Uploaded to http://pubchemqc.riken.jp/ • Currently we upload – Input file (ground / excited state) – Output file (ground / excited state) – Final geometry in Mol file
  • 37. Scaling of computation • Embarrassingly parallel: each molecule is independent • Very roughly speaking, the time required for a calculation scales like N^4 – N: molecular weight • The problems are very hard in the worst case (complexity theory) – Hartree-Fock calculation – DFT (B3LYP) calculation – Geometry optimization • In practice, many molecules can be solved efficiently
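The rough N^4 rule on this slide can be turned into a quick relative-cost estimate. This is a sketch under the slide's own very rough assumption; the function name `relative_cost` and the reference weight of 160 (taken from the MW 160 throughput figures quoted for the clusters) are illustrative choices, not measured values.

```python
def relative_cost(mw: float, ref_mw: float = 160.0, exponent: float = 4.0) -> float:
    """Estimated cost of one molecule relative to a reference molecule,
    assuming time scales like (molecular weight) ** exponent."""
    return (mw / ref_mw) ** exponent


# Doubling the molecular weight multiplies the estimated cost by 2**4.
for mw in (160.0, 240.0, 320.0, 480.0):
    print(f"MW {mw:5.0f}: about {relative_cost(mw):6.1f}x the MW-160 cost")
```

Because each molecule is independent, such an estimate could be used to sort jobs so that expensive molecules do not run into a queue's time limit.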
  • 38. Computer Resources • RICC: (Intel Xeon 5570 Westmere, 2.93 GHz, 8 cores/node) x 1000 – 1000-10000 molecules/day (MW 160) – Heavily depends on the load from other users – Time limit: 8 hours • Quest: (Intel Core2 Duo, 1.6 GHz/node) x 700 – 3000-8000 molecules/day (MW 160) – 100-1000 molecules/day (MW 200-300) – Time limit: 20 hours • Compounds whose calculations fail are skipped for now.
  • 39. Computer Resources • Storage – Approx. 500GB for 1,000,000 molecules (xz compressed) – Approx. 20 TB for 40,000,000 molecules (xz compressed)
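The two storage figures on this slide are consistent with a constant size per molecule, which a short calculation can confirm. This is a sketch assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); the variable names are illustrative.

```python
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12

# Figure from the slide: about 500 GB (xz compressed) for 1,000,000 molecules.
per_molecule_bytes = 500 * BYTES_PER_GB / 1_000_000

# Extrapolate linearly to 40,000,000 molecules.
projected_tb = per_molecule_bytes * 40_000_000 / BYTES_PER_TB

print(f"{per_molecule_bytes / 1e6:.1f} MB per molecule")
print(f"{projected_tb:.0f} TB for 40,000,000 molecules")
```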
  • 40. Molecular weight and Lipinski’s rule • Lipinski’s rule of five (Pfizer's rule of five): rule of thumb for drug discovery – No more than 5 hydrogen bond donors – No more than 10 hydrogen bond acceptors – A molecular mass less than 500 daltons – An octanol-water partition coefficient log P not greater than 5 • A molecular weight limit of 500 is very convenient for computational chemistry – For routine calculations without experimental data other than the molecular formula – If larger than 500, secondary or higher-order structure becomes important, e.g., proteins
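The four criteria on this slide translate directly into a small check. This is a minimal sketch: the `Descriptors` class, the function name and the example values are hypothetical, and computing the descriptors themselves (for example with a cheminformatics toolkit) is outside its scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the names of the rule-of-five criteria that the molecule violates."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("hydrogen bond acceptors")
    if d.molecular_weight >= 500:
        violations.append("molecular weight")
    if d.log_p > 5:
        violations.append("log P")
    return violations


# Made-up descriptor values, for illustration only.
small = Descriptors(h_bond_donors=2, h_bond_acceptors=4, molecular_weight=312.4, log_p=2.1)
large = Descriptors(h_bond_donors=6, h_bond_acceptors=12, molecular_weight=812.0, log_p=3.0)
print(lipinski_violations(small))
print(lipinski_violations(large))
```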
  • 41. Molecular weight distribution at PubChem • [Figure: histogram of molecular weight; annotations: “We are still here”, “Lipinski limit MW=500”] • 30,000,000 molecules (excluding mixtures)
  • 42. How long will it take to finish? • For drug design, we need to calculate all molecules with MW < 500 • Total: 30,000,000 molecules – This number may increase in the future • Current (2014/12/4): 1,100,000 molecules – Only about 3.7% • 10,000 molecules/day -> 8.2 years
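The estimate on this slide is simple arithmetic and can be reproduced as follows. This sketch assumes a constant rate of 10,000 molecules per day and 365.25 days per year; the names are illustrative.

```python
DAYS_PER_YEAR = 365.25


def years_needed(n_molecules: int, molecules_per_day: int) -> float:
    """Years required to process n_molecules at a constant daily rate."""
    return n_molecules / molecules_per_day / DAYS_PER_YEAR


total = 30_000_000   # molecules with MW < 500 (slide figure)
done = 1_100_000     # finished as of 2014/12/4 (slide figure)
rate = 10_000        # molecules per day (slide figure)

fraction_done = done / total
total_years = years_needed(total, rate)
remaining_years = years_needed(total - done, rate)

print(f"done: {100 * fraction_done:.1f}%")
print(f"all molecules: {total_years:.1f} years")
print(f"remaining molecules only: {remaining_years:.1f} years")
```

The slide's figure of 8.2 years corresponds to processing all 30,000,000 molecules; counting only the molecules not yet done gives a slightly shorter time.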
  • 43. How long will it take to finish? • 10+ years? No, maybe far less. • 25 years ago (1990), computers were very slow – Even ab initio calculations were very difficult on a 486DX@25MHz or 68000@10MHz
  • 44. Outlook, prospect, hope… • Far better in silico screening – Little or no experiment is necessary • Even faster calculation using machine learning – 10,000 molecules / second? – Requires a huge data set to learn from – Bio or organic molecules are easy to calculate – Already available: Raghunathan Ramakrishnan https://scholar.google.co.jp/citations?user=jSCGozoAAAAJ&hl=ja&oi=sra • Database for chemical reactions – Precise calculation is required – GRRM method + machine learning (?) • Geometry optimization for proteins (PDB) – Only X-ray crystal structures are available • http://pubchemqc.riken.jp/
  • 45. Difficulties in this project • Parameters needed for calculations vary from molecule to molecule • Properties can differ depending on the initial guess • Computer resources – Raspberry Pi? NVIDIA Jetson? BOINC? • Molecular encoding never ends – SMILES and InChI are not complete – Some corner cases may be chemically interesting.