RDKit: where did we come from and where are we going?
Greg Landrum (@dr_greg_landrum)
12th International Conference on Chemical Structures
12 June, 2022
The Trustees of the CSA Trust are pleased to announce that
Greg Landrum has been awarded the 2022 Mike Lynch
Award, in recognition of his work on the development of
RDKit and his fostering of the community around it, a
transformative software resource for cheminformatics and
machine learning. https://csa-trust.org/2022/05/13/mike-lynch-award-2022-greg-landrum/
The purpose of the Award is to recognise and encourage outstanding
accomplishments in education, research and development activities that are
related to the systems and methods used to store, process and retrieve
information about chemical structures, reactions and properties.
The Mike Lynch Award will be presented at a prestigious, relevant conference
to be identified prior to each presentation and the awardee will be asked to
give a presentation at the conference. https://csa-trust.org/awards-and-grants/awards/
3
The RDKit
4
Acknowledgements
● Everyone who has contributed code, questions,
answers, bug reports, etc
● The people who manage RDKit packaging
● The organizers and sponsors of the RDKit
UGMs
● People who have funded RDKit development
(directly or indirectly)
● The others in our community who've been
pushing the idea and adoption of open source
5
An open source toolkit for cheminformatics
● Business-friendly BSD license
● Core data structures and algorithms in
C++
● Python 3.x wrapper generated using
Boost.Python
● Java and C# wrappers generated with
SWIG
● JavaScript wrappers
● CFFI wrapper for usage from other
languages
● 2D and 3D molecular operations
● Descriptor generation for machine
learning
● Molecular database cartridge for
PostgreSQL
● Cheminformatics nodes for KNIME
(distributed from the KNIME
community site:
http://www.knime.org/rdkit)
6
Ecosystem
Exact same implementation regardless of where you are using it from
7
Releases, reproducibility, and citability
● 2 feature releases per year
● ~monthly patch releases with bug fixes
● Every release is assigned a DOI and archived on Zenodo
https://zenodo.org/record/6483170
8
Packaging
- conda-forge: conda install -c conda-forge rdkit
- pypi: pip install rdkit-pypi
- npm: npm i @rdkit/rdkit
- apt: apt install python3-rdkit postgresql-14-rdkit
9
Sustainability: the bus problem
https://commons.wikimedia.org/wiki/File:Postauto_susten.jpg
10
Sustainability: the bus problem
RDKit maintainers:
- Greg
- Brian Kelley (Relay Therapeutics)
- Ricardo Rodriguez (Schrödinger)
- Paolo Tosco (Novartis)
Regular code contributors:
- David Cosgrove
- Peter Gedeck
- Gareth Jones
- Eisuke Kawashima
- Dan Nealschneider
- Sereina Riniker
- Roger Sayle
- Riccardo Vianello
The RDKit community
How it started…
The RDKit community
How it’s going…
Where we came from, where we’re going
14
The early days
● 2000-2006: initial development work at Rational Discovery
● 2006: code open sourced and released on sourceforge.net
15
Aside: some motivations for open-sourcing scientific code
● Recognition
● Helping the scientific community
● Feedback and help from others
● You get to keep using the code when you move on
to your next position
16
Some history
● 2000-2006: initial development work at Rational Discovery
● 2006: code open sourced and released on sourceforge.net
● 2007: First NIBR contribution (chemical reaction handling); Noel discovers the RDKit
● 2008: first POC of Java wrapper; Mac support added; SLN and Mol2 parsers;
● 2009: Morgan fingerprints; switch to cmake; switch to VF2 for SSS
● 2010: PostgreSQL cartridge; First iteration of the KNIME nodes; $RDBASE/Contrib appears;
SaltRemover and FunctionalGroups code
● 2011: New Java wrappers; more functionality moved to C++; InChI support; AvalonTools
integration
● 2012: First UGM; Speed improvements; MCS implementation; IPython integration; “RDKit
Cookbook” appears
● 2013: Move to github; Pandas integration; MMFF and Open3DAlign support; PDB support;
rdkit blog started
17
Some history, continued
● 2014: python3 support; conda integration; experimental lucene integration; MCS implementation in
C++
● 2015: new drawing code; improved canonicalization algorithm; ETKDG; reduced memory usage
● 2016: Regular patch releases; easier builds; performance improvements; KNIME nodes move to
Github
● 2017: Modern C++; R-group decomposition, first GSoC participation, conda-forge packages
● 2018: CoordGen integration; molecular standardization
● 2019: Azure DevOps, substructure speedup, new molecule hashing code, Neo4J integration, new JS
wrappers
● 2020: new CIP implementation, scaffold network, abbreviations, tautomer-insensitive substructure
search
● 2021: rdkit-cffi, more drawing improvements, R-group decomposition improvements
● 2022: C++17, generics for searching, non-tetrahedral symmetry…
An aside…
19
Looking forward
20
Longer term RDKit objectives
● Improved support for other classes of molecules
■ Polymers
■ Organometallics
● Ensuring that the PostgreSQL cartridge is a plausible
candidate for use in a corporate “data warehouse” (or
whatever such things are called these days)
● Ensuring all the pieces are in place to make it easy to
write a compound registration system
21
Future directions: the cartridge
Ensuring that the PostgreSQL cartridge is a plausible candidate
for use in a corporate “data warehouse”
- Integration of tautomer insensitive search
- Integration of the MolStandardize code
- Improvements to the chemical reaction handling
- Integration of the generics for searching
Further ideas
- Adding some 3D search capabilities
22
Future directions: registration systems
First: what is a chemical registration system?
24
Aside: Goals of a compound registration system
We want to be able to answer these questions:
- Have we seen this compound before?
- Give me a key for this compound
- Give me the structure for this key
So what do we need to be able to do?
- Standardize molecules
- Generate hashes/keys for standardized molecules
- Store structures
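The three questions and the three capabilities above can be sketched as a tiny in-memory registry. This is only an illustration, not RDKit code: `CompoundRegistry` and its method names are hypothetical, and the default `standardize` step (trimming whitespace) is a stand-in for real chemical standardization.

```python
import hashlib


class CompoundRegistry:
    """Toy in-memory registry keyed by a hash of the standardized structure."""

    def __init__(self, standardize=str.strip):
        # `standardize` is a placeholder: a real system would normalise the
        # structure chemically (salts, charges, tautomers, ...), not trim text.
        self._standardize = standardize
        self._structures = {}  # key -> standardized structure

    def key_for(self, structure):
        """Answers: give me a key for this compound."""
        standardized = self._standardize(structure)
        return hashlib.sha1(standardized.encode("utf-8")).hexdigest()

    def has_seen(self, structure):
        """Answers: have we seen this compound before?"""
        return self.key_for(structure) in self._structures

    def register(self, structure):
        """Store the structure and return its key; re-registering changes nothing."""
        key = self.key_for(structure)
        self._structures.setdefault(key, self._standardize(structure))
        return key

    def structure_for(self, key):
        """Answers: give me the structure for this key."""
        return self._structures[key]


registry = CompoundRegistry()
ethanol_key = registry.register("CCO")
print(registry.has_seen(" CCO "))  # the placeholder standardizer strips the spaces
print(registry.has_seen("CCN"))
print(registry.structure_for(ethanol_key))
```

Keeping the standardizer as a constructor argument means the lookup logic stays the same when a chemistry-aware standardization function is swapped in.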
25
Using keys for registration
Idea: use a hash to combine:
- The molecular structure (via a fixed H InChI)
- A stereo code
- A stereo comment
https://github.com/rdkit/UGM_2015/blob/8f562e70add17bab35f43823af0f03673f8a1f2d/Presentations/KeyToRegistration.GregLandrum.pdf
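One way such a combined key could be built is sketched below. The function name `registration_key`, the choice of SHA-1 over a JSON list, and the example stereo codes are all assumptions made for illustration; the linked presentation may combine the fields differently.

```python
import hashlib
import json


def registration_key(fixed_h_inchi, stereo_code="", stereo_comment=""):
    """Combine the three pieces of information into one fixed-length key."""
    # Serialising as a JSON list keeps the field boundaries unambiguous, so
    # ("ab", "c") and ("a", "bc") cannot collide the way plain concatenation can.
    payload = json.dumps([fixed_h_inchi, stereo_code, stereo_comment])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Placeholder text stands in for a real fixed-H InChI string, and the stereo
# codes are hypothetical labels, not values from any registration system.
key_a = registration_key("<fixed-H InChI>", "RACEMIC", "")
key_b = registration_key("<fixed-H InChI>", "SINGLE_UNKNOWN", "peak 1")
print(key_a != key_b)  # same structure, different stereo information
```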
26
Future directions: registration systems
Ensuring all the pieces are in place to make it easy to write a compound registration system
- Improvements to MolStandardize code
- Improvements to the molecular hashing code
- Support for more classes of molecules
27
Let’s talk about molecular identity
This isn’t just a topic for standard compound registration systems.
28
Molecular identity and computational questions
● Which molecules were used to generate this
result?
● Have I already done a calculation using this
molecule?
● Was this molecule part of my training set?
All of these require us to be able to answer
the question
“are these two molecules the same?”
Here be dragons…
29
Some things making molecular identity nontrivial
● Counterions, solvents
● Resonance forms
● Charges
● Tautomers
● Stereochemistry
Sometimes we care about these differences, sometimes we don’t. It depends on the context
in which we ask the question “are these two molecules the same?”
This is not a comprehensive list
31
Identity hashes for molecules
Idea: convert the molecule into some form which allows us to test whether or not it’s
identical to other molecules via a simple string (or numerical) comparison.
What “identical” means will be determined by the identity hash used.
Familiar examples:
- Canonical SMILES
- InChI
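As a small illustration of the canonical SMILES case, the sketch below uses the RDKit Python API (`Chem.MolFromSmiles` and `Chem.MolToSmiles`) to reduce the identity test to a string comparison. The helper name `canonical_smiles` and the example molecules are chosen here for illustration only.

```python
from rdkit import Chem


def canonical_smiles(smiles):
    """Return RDKit's canonical SMILES for an input SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


# Three spellings of ethanol and one of dimethyl ether.
ethanol_a = canonical_smiles("CCO")
ethanol_b = canonical_smiles("OCC")
ethanol_c = canonical_smiles("C(O)C")
ether = canonical_smiles("COC")

print(ethanol_a == ethanol_b == ethanol_c)  # same molecule, same string
print(ethanol_a == ether)                   # different molecule, different string
```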
32
Contextual identity
Instead of having a single key/hash for a molecule, store a collection of layers with different
levels of detail/types of information. When searching, choose the layers which are relevant
for the current use case
● Store molecules using some relatively lossless format (e.g. v3000 SDF)
● Use molecular hashes capturing different levels of information to establish whether or
not duplicates exist
Note: it’s possible to do a limited version of this via careful manipulation of InChI strings
33
Some more identity hashes
https://www.nextmovesoftware.com/talks/OBoyle_MolHash_ACS_201908.pdf
Available in the RDKit since the 2019.09 release
34
Some of the basic identity hashes in rdMolHash
● Molecular formula
● Anonymous graph
● Element graph
● Murcko scaffold
● Tautomer
● Canonical smiles
There are many others
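A short sketch of how a few of these hashes behave, using `rdMolHash.MolHash` with members of `rdMolHash.HashFunction`. The helper `mol_hash` and the example molecules (2-hydroxypyridine/2-pyridone, ethanol/ethylamine) are chosen here for illustration and do not come from the talk.

```python
from rdkit import Chem
from rdkit.Chem import rdMolHash


def mol_hash(smiles, hash_function):
    """Compute one rdMolHash identity hash for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return rdMolHash.MolHash(mol, hash_function)


HF = rdMolHash.HashFunction

# 2-hydroxypyridine and 2-pyridone are tautomers of each other.
hydroxy, pyridone = "Oc1ccccn1", "O=c1cccc[nH]1"

same_smiles = mol_hash(hydroxy, HF.CanonicalSmiles) == mol_hash(pyridone, HF.CanonicalSmiles)
same_tautomer = mol_hash(hydroxy, HF.HetAtomTautomer) == mol_hash(pyridone, HF.HetAtomTautomer)
same_formula = mol_hash(hydroxy, HF.MolFormula) == mol_hash(pyridone, HF.MolFormula)

# Ethanol and ethylamine share a connectivity graph but not their elements.
same_anonymous = mol_hash("CCO", HF.AnonymousGraph) == mol_hash("CCN", HF.AnonymousGraph)
same_element = mol_hash("CCO", HF.ElementGraph) == mol_hash("CCN", HF.ElementGraph)

print(same_smiles, same_tautomer, same_formula)
print(same_anonymous, same_element)
```

Each hash function encodes a different answer to "what counts as the same": the tautomer pair differs under canonical SMILES but matches under the tautomer hash, and ethanol and ethylamine match only once element information is discarded.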
35
Hashes for registration
The team at Schrödinger (Chris Von Bargen, Hussein Faara, Dan Nealschneider, Ricardo
Rodriguez, Rachel Walker) has contributed a new RDKit module for calculating layered
hashes which are useful for compound identity testing and registration. This will be in the
2022.09 release.
Layers it currently supports:
- Formula
- Canonical SMILES: with and without stereo
- Tautomer hash: with and without stereo
- Sgroup data (for some help with polymers and things like atropisomers)
- “Escape layer” (free text allowing a structure to be different even if everything else says
it’s the same)
36
Registration hash example
{<HashLayer.CANONICAL_SMILES: 1>: 'COc1ccc2[nH]c([S@@](=O)Cc3ncc(C)c(OC)c3C)nc2c1',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C17H19N3O3S',
<HashLayer.NO_STEREO_SMILES: 4>: 'COc1ccc2[nH]c(S(=O)Cc3ncc(C)c(OC)c3C)nc2c1',
<HashLayer.NO_STEREO_TAUTOMER_HASH: 5>:
'CO[C]1[CH][CH][C]2[N][C]([S]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0',
<HashLayer.SGROUP_DATA: 6>: '[]',
<HashLayer.TAUTOMER_HASH: 7>:
'CO[C]1[CH][CH][C]2[N][C]([S@@]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0'}
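A dictionary like the one above can be produced with the `rdkit.Chem.RegistrationHash` module (RDKit 2022.09 or later). The sketch below assumes the `GetMolLayers`, `GetMolHash`, `HashLayer` and `HashScheme` names from that module; the helper `layers_for` and the second input, which simply flips the sulfur stereo tag, are illustrative additions.

```python
from rdkit import Chem
from rdkit.Chem import RegistrationHash
from rdkit.Chem.RegistrationHash import HashLayer, HashScheme


def layers_for(smiles):
    """Parse a SMILES string and return its dictionary of hash layers."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return RegistrationHash.GetMolLayers(mol)


# The molecule from the example above, and the opposite configuration at sulfur.
layers_a = layers_for("COc1ccc2[nH]c([S@@](=O)Cc3ncc(C)c(OC)c3C)nc2c1")
layers_b = layers_for("COc1ccc2[nH]c([S@](=O)Cc3ncc(C)c(OC)c3C)nc2c1")

for layer, value in sorted(layers_a.items(), key=lambda item: item[0].value):
    print(layer.name, repr(value))

# Combine all layers, or only the stereo-insensitive ones, into a single hash.
full_a = RegistrationHash.GetMolHash(layers_a)
full_b = RegistrationHash.GetMolHash(layers_b)
flat_a = RegistrationHash.GetMolHash(layers_a, hash_scheme=HashScheme.STEREO_INSENSITIVE_LAYERS)
flat_b = RegistrationHash.GetMolHash(layers_b, hash_scheme=HashScheme.STEREO_INSENSITIVE_LAYERS)

print(full_a == full_b)  # stereo differs, so the full hashes differ
print(flat_a == flat_b)  # ignoring stereo, the two are the same compound
```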
37
Handling tautomers
{<HashLayer.CANONICAL_SMILES: 1>:
'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S',
…
<HashLayer.TAUTOMER_HASH: 7>:
'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'}
{<HashLayer.CANONICAL_SMILES: 1>:
'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2cnc3[nH]cc(-c4ccc(Cl)cc4)cc2-3)c1F',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S',
…
<HashLayer.TAUTOMER_HASH: 7>:
'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'}
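The two dictionaries above show contextual identity in practice: which layers are compared decides whether the structures count as the same. The sketch below copies the layer values shown above into plain dictionaries (string keys stand in for the `HashLayer` enum members); the helper `same_in_context` is a hypothetical name, not part of RDKit.

```python
# Layer values copied from the two dictionaries above.
TAUTOMER_HASH = (
    "CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH]"
    "[C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0"
)
tautomer_a = {
    "CANONICAL_SMILES": "CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F",
    "FORMULA": "C23H18ClF2N3O3S",
    "TAUTOMER_HASH": TAUTOMER_HASH,
}
tautomer_b = {
    "CANONICAL_SMILES": "CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2cnc3[nH]cc(-c4ccc(Cl)cc4)cc2-3)c1F",
    "FORMULA": "C23H18ClF2N3O3S",
    "TAUTOMER_HASH": TAUTOMER_HASH,
}


def same_in_context(layers_a, layers_b, relevant_layers):
    """Two entries are 'the same' if they agree on every layer that matters here."""
    return all(layers_a[name] == layers_b[name] for name in relevant_layers)


exact_match = same_in_context(tautomer_a, tautomer_b, ["CANONICAL_SMILES"])
tautomer_match = same_in_context(tautomer_a, tautomer_b, ["FORMULA", "TAUTOMER_HASH"])

print(exact_match)     # the drawn structures differ
print(tautomer_match)  # but they are tautomers of one compound
```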
38
Handling atropisomers
Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
The bold and hashed bonds are just drawing features and don’t survive translation
to things like CXSMILES or mol files. But we can use S groups to indicate the
stereochemistry
40
Handling atropisomers
Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
{<HashLayer.CANONICAL_SMILES: 1>:
'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H21FN6O3',
…
<HashLayer.SGROUP_DATA: 6>:
'[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "M"}]',
…}
{<HashLayer.CANONICAL_SMILES: 1>:
'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H21FN6O3',
…
<HashLayer.SGROUP_DATA: 6>:
'[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "P"}]',
…}
41
Handling polymers
{<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1',
…,
<HashLayer.SGROUP_DATA: 6>:
'[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HT", "label": "n"}]',
…}
{<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1',
…,
<HashLayer.SGROUP_DATA: 6>:
'[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HH", "label": "n"}]',
…}
42
Handling enhanced stereochemistry
Ethambutol
These two describe the same racemic mixture
43
Handling enhanced stereochemistry
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO',
…,
<HashLayer.NO_STEREO_SMILES: 4>:
'CCC(CO)NCCNC(CC)CO',
…}
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO |&1:2,9|',
…,
<HashLayer.NO_STEREO_SMILES: 4>:
'CCC(CO)NCCNC(CC)CO',
…}
We get the same hash if the molecule is drawn with
wedged bonds.
44
Using the escape layer
Suppose I start with the racemic mixture, run it through a chiral column, and
collect the two fractions
I want to register the two fractions separately without determining the absolute
stereochemistry
45
Using the escape layer
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|',
<HashLayer.ESCAPE: 2>: 'first fraction',
…}
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|',
<HashLayer.ESCAPE: 2>: 'second fraction',
…}
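The effect of the escape layer can be sketched with `RegistrationHash.GetMolLayers`, assuming its `escape` keyword argument as found in RDKit 2022.09 and later. The input CXSMILES is the one shown above; the variable names are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import RegistrationHash
from rdkit.Chem.RegistrationHash import HashLayer

# The "or" stereo group marks a single, but unknown, enantiomer.
mol = Chem.MolFromSmiles("CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|")

# Same structure, different free-text escape value for each fraction.
first = RegistrationHash.GetMolLayers(mol, escape="first fraction")
second = RegistrationHash.GetMolLayers(mol, escape="second fraction")

same_structure = first[HashLayer.CANONICAL_SMILES] == second[HashLayer.CANONICAL_SMILES]
same_hash = RegistrationHash.GetMolHash(first) == RegistrationHash.GetMolHash(second)

print(same_structure)  # the structural layers agree
print(same_hash)       # the escape text keeps the two registrations apart
```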
46
Aside: using the escape layer for comp chem
Suppose I want to store multiple conformers/poses of the same molecule
{…
<HashLayer.ESCAPE: 2>: 'conformer 1',
…}
{…
<HashLayer.ESCAPE: 2>: 'conformer 2',
…}
47
Wrapping up: molecular identity
● For many computational tasks we want to be
able to figure out whether or not we have
seen/used a particular molecule
● The definition of “same” for molecules
depends on the context/question being asked
● Layered registration hashes make it easy (and
cheap) to store sets of molecules and answer
the context-dependent “are these the same?”
question
48
Thanks!

Weitere ähnliche Inhalte

Was ist angesagt?

合成経路探索 -論文まとめ- (PFN中郷孝祐)
合成経路探索 -論文まとめ-  (PFN中郷孝祐)合成経路探索 -論文まとめ-  (PFN中郷孝祐)
合成経路探索 -論文まとめ- (PFN中郷孝祐)Preferred Networks
 
NVIDIA Modulus: Physics ML 開発のためのフレームワーク
NVIDIA Modulus: Physics ML 開発のためのフレームワークNVIDIA Modulus: Physics ML 開発のためのフレームワーク
NVIDIA Modulus: Physics ML 開発のためのフレームワークNVIDIA Japan
 
RSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjpRSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjpsonickun
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat ModelsDeep Learning JP
 
Hopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことHopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことNVIDIA Japan
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」Preferred Networks
 
グラフニューラルネットワーク入門
グラフニューラルネットワーク入門グラフニューラルネットワーク入門
グラフニューラルネットワーク入門ryosuke-kojima
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 
暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -MITSUNARI Shigeo
 
乱数と擬似乱数の生成技術
乱数と擬似乱数の生成技術乱数と擬似乱数の生成技術
乱数と擬似乱数の生成技術SeiyaSakata
 
DockerコンテナでGitを使う
DockerコンテナでGitを使うDockerコンテナでGitを使う
DockerコンテナでGitを使うKazuhiro Suga
 
3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)
3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)
3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)Kuniyasu Suzaki
 
CTF for ビギナーズ ネットワーク講習資料
CTF for ビギナーズ ネットワーク講習資料CTF for ビギナーズ ネットワーク講習資料
CTF for ビギナーズ ネットワーク講習資料SECCON Beginners
 
不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)
不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)
不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)Preferred Networks
 
敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)cvpaper. challenge
 
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFAShohei Hido
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 

Was ist angesagt? (20)

合成経路探索 -論文まとめ- (PFN中郷孝祐)
合成経路探索 -論文まとめ-  (PFN中郷孝祐)合成経路探索 -論文まとめ-  (PFN中郷孝祐)
合成経路探索 -論文まとめ- (PFN中郷孝祐)
 
NVIDIA Modulus: Physics ML 開発のためのフレームワーク
NVIDIA Modulus: Physics ML 開発のためのフレームワークNVIDIA Modulus: Physics ML 開発のためのフレームワーク
NVIDIA Modulus: Physics ML 開発のためのフレームワーク
 
RSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjpRSA暗号運用でやってはいけない n のこと #ssmjp
RSA暗号運用でやってはいけない n のこと #ssmjp
 
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
【DL輪読会】Llama 2: Open Foundation and Fine-Tuned Chat Models
 
Hopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないことHopper アーキテクチャで、変わること、変わらないこと
Hopper アーキテクチャで、変わること、変わらないこと
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
グラフニューラルネットワーク入門
グラフニューラルネットワーク入門グラフニューラルネットワーク入門
グラフニューラルネットワーク入門
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 
Ml system in_python
Ml system in_pythonMl system in_python
Ml system in_python
 
暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -暗号文のままで計算しよう - 準同型暗号入門 -
暗号文のままで計算しよう - 準同型暗号入門 -
 
乱数と擬似乱数の生成技術
乱数と擬似乱数の生成技術乱数と擬似乱数の生成技術
乱数と擬似乱数の生成技術
 
DockerコンテナでGitを使う
DockerコンテナでGitを使うDockerコンテナでGitを使う
DockerコンテナでGitを使う
 
3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)
3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)
3種類のTEE比較(Intel SGX, ARM TrustZone, RISC-V Keystone)
 
CTF for ビギナーズ ネットワーク講習資料
CTF for ビギナーズ ネットワーク講習資料CTF for ビギナーズ ネットワーク講習資料
CTF for ビギナーズ ネットワーク講習資料
 
Glibc malloc internal
Glibc malloc internalGlibc malloc internal
Glibc malloc internal
 
暗認本読書会7
暗認本読書会7暗認本読書会7
暗認本読書会7
 
不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)
不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)
不老におけるOptunaを利用した分散ハイパーパラメータ最適化 - 今村秀明(名古屋大学 Optuna講習会)
 
敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)
 
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
機械学習モデルフォーマットの話:さようならPMML、こんにちはPFA
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 

Ähnlich wie RDKit toolkit fosters open source cheminformatics

ACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsGreg Landrum
 
Querying a Complex Web-Based KB for Cultural Heritage Preservation
Querying a Complex Web-Based KB  for Cultural Heritage PreservationQuerying a Complex Web-Based KB  for Cultural Heritage Preservation
Querying a Complex Web-Based KB for Cultural Heritage PreservationEster Giallonardo
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)Alexandre Gouaillard
 
Maintaining and Releasing Open Source Software
Maintaining and Releasing Open Source SoftwareMaintaining and Releasing Open Source Software
Maintaining and Releasing Open Source SoftwareJoel Nothman
 
The path to an hybrid open source paradigm
The path to an hybrid open source paradigmThe path to an hybrid open source paradigm
The path to an hybrid open source paradigmJonathan Challener
 
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewDelft University of Technology
 
Open Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisOpen Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisMarcus Hanwell
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...sparkfabrik
 
Docs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesDocs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesAnne Gentle
 
Continuous Security for GitOps
Continuous Security for GitOpsContinuous Security for GitOps
Continuous Security for GitOpsWeaveworks
 
OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023Shane Coughlan
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the sameEDB
 
PRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdfPRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdfAvinashDesireddy
 
Kubernetes Security Workshop
Kubernetes Security WorkshopKubernetes Security Workshop
Kubernetes Security WorkshopMirantis
 
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_InsightsJuni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_InsightsTriNimbus
 
ExSchema - ICSM'13
ExSchema - ICSM'13ExSchema - ICSM'13
ExSchema - ICSM'13jccastrejon
 
20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...
20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...
20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...Antonio de la Torre Fernández
 

Ähnlich wie RDKit toolkit fosters open source cheminformatics (20)

ACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformatics
 
Querying a Complex Web-Based KB for Cultural Heritage Preservation
Querying a Complex Web-Based KB  for Cultural Heritage PreservationQuerying a Complex Web-Based KB  for Cultural Heritage Preservation
Querying a Complex Web-Based KB for Cultural Heritage Preservation
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
 
Maintaining and Releasing Open Source Software
Maintaining and Releasing Open Source SoftwareMaintaining and Releasing Open Source Software
Maintaining and Releasing Open Source Software
 
The path to an hybrid open source paradigm
The path to an hybrid open source paradigmThe path to an hybrid open source paradigm
The path to an hybrid open source paradigm
 
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
 
Open Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisOpen Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & Analysis
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
 
Docs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesDocs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API Experiences
 
Continuous Security for GitOps
Continuous Security for GitOpsContinuous Security for GitOps
Continuous Security for GitOps
 
OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023
 
Service computation20.ppt
Service computation20.pptService computation20.ppt
Service computation20.ppt
 
BlockchainLAB Hackathon
BlockchainLAB HackathonBlockchainLAB Hackathon
BlockchainLAB Hackathon
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the same
 
PRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdfPRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdf
 
Kubernetes Security Workshop
Kubernetes Security WorkshopKubernetes Security Workshop
Kubernetes Security Workshop
 
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_InsightsJuni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
 
ExSchema - ICSM'13
ExSchema - ICSM'13ExSchema - ICSM'13
ExSchema - ICSM'13
 
20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...
20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...
20191116 DevFest 2019 The Legacy Code came to stay (El legacy vino para queda...
 

Mehr von Greg Landrum

Chemical registration
Chemical registrationChemical registration
Chemical registrationGreg Landrum
 
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...Greg Landrum
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Greg Landrum
 
Moving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine LearningMoving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine LearningGreg Landrum
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Greg Landrum
 
Let’s talk about reproducible data analysis
Let’s talk about reproducible data analysisLet’s talk about reproducible data analysis
Let’s talk about reproducible data analysisGreg Landrum
 
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? Greg Landrum
 
Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...Greg Landrum
 
Processing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialProcessing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialGreg Landrum
 
Big (chemical) data? No Problem!
Big (chemical) data? No Problem!Big (chemical) data? No Problem!
Big (chemical) data? No Problem!Greg Landrum
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchGreg Landrum
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontGreg Landrum
 
Large scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataLarge scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataGreg Landrum
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knimeGreg Landrum
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitGreg Landrum
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesGreg Landrum
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Greg Landrum
 

RDKit toolkit fosters open source cheminformatics

  • 1. RDKit: where did we come from and where are we going? Greg Landrum (@dr_greg_landrum) 12th International Conference on Chemical Structures 12 June, 2022
  • 2. The Trustees of the CSA Trust are pleased to announce that Greg Landrum has been awarded the 2022 Mike Lynch Award, in recognition of his work on the development of RDKit and his fostering of the community around it, a transformative software resource for cheminformatics and machine learning. https://csa-trust.org/2022/05/13/mike-lynch-award-2022-greg-landrum/ The purpose of the Award is to recognise and encourage outstanding accomplishments in education, research and development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and properties. The Mike Lynch Award will be presented at a prestigious, relevant conference to be identified prior to each presentation and the awardee will be asked to give a presentation at the conference. https://csa-trust.org/awards-and-grants/awards/
  • 4. Acknowledgements
    ● Everyone who has contributed code, questions, answers, bug reports, etc.
    ● The people who manage RDKit packaging
    ● The organizers and sponsors of the RDKit UGMs
    ● People who have funded RDKit development (directly or indirectly)
    ● The others in our community who've been pushing the idea and adoption of open source
  • 5. An open source toolkit for cheminformatics
    ● Business-friendly BSD license
    ● Core data structures and algorithms in C++
    ● Python 3.x wrapper generated using Boost.Python
    ● Java and C# wrappers generated with SWIG
    ● JavaScript wrappers
    ● CFFI wrapper for usage from other languages
    ● 2D and 3D molecular operations
    ● Descriptor generation for machine learning
    ● Molecular database cartridge for PostgreSQL
    ● Cheminformatics nodes for KNIME (distributed from the KNIME community site: http://www.knime.org/rdkit)
  • 6. Ecosystem: exact same implementation regardless of where you are using it from
  • 7. Releases, reproducibility, and citability
    ● 2 feature releases per year
    ● ~monthly patch releases with bug fixes
    ● Every release is assigned a DOI and archived on Zenodo: https://zenodo.org/record/6483170
  • 8. Packaging
    - conda-forge: conda install -c conda-forge rdkit
    - pypi: pip install rdkit-pypi
    - npm: npm i @rdkit/rdkit
    - apt: apt install python3-rdkit postgresql-14-rdkit
  • 9. Sustainability: the bus problem (image: https://commons.wikimedia.org/wiki/File:Postauto_susten.jpg)
  • 10. Sustainability: the bus problem
    RDKit maintainers:
    - Greg
    - Brian Kelley (Relay Therapeutics)
    - Ricardo Rodriguez (Schrödinger)
    - Paolo Tosco (Novartis)
    Regular code contributors:
    - David Cosgrove
    - Peter Gedeck
    - Gareth Jones
    - Eisuke Kawashima
    - Dan Nealschneider
    - Sereina Riniker
    - Roger Sayle
    - Riccardo Vianello
  • 11. The RDKit community How it started…
  • 12. The RDKit community How it’s going…
  • 13. Where we came from, where we’re going
  • 14. The early days
    ● 2000-2006: initial development work at Rational Discovery
    ● 2006: code open sourced and released on sourceforge.net
  • 15. Aside: some motivations for open-sourcing scientific code
    ● Recognition
    ● Helping the scientific community
    ● Feedback and help from others
    ● You get to keep using the code when you move on to your next position
  • 16. Some history
    ● 2000-2006: initial development work at Rational Discovery
    ● 2006: code open sourced and released on sourceforge.net
    ● 2007: First NIBR contribution (chemical reaction handling); Noel discovers the RDKit
    ● 2008: first POC of Java wrapper; Mac support added; SLN and Mol2 parsers
    ● 2009: Morgan fingerprints; switch to cmake; switch to VF2 for SSS
    ● 2010: PostgreSQL cartridge; first iteration of the KNIME nodes; $RDBASE/Contrib appears; SaltRemover and FunctionalGroups code
    ● 2011: New Java wrappers; more functionality moved to C++; InChI support; AvalonTools integration
    ● 2012: First UGM; speed improvements; MCS implementation; IPython integration; “RDKit Cookbook” appears
    ● 2013: Move to github; Pandas integration; MMFF and Open3DAlign support; PDB support; rdkit blog started
  • 17. Some history, cont'd
    ● 2014: python3 support; conda integration; experimental lucene integration; MCS implementation in C++
    ● 2015: new drawing code; improved canonicalization algorithm; ETKDG; reduced memory usage
    ● 2016: Regular patch releases; easier builds; performance improvements; KNIME nodes move to Github
    ● 2017: Modern C++; R-group decomposition; first GSoC participation; conda-forge packages
    ● 2018: CoordGen integration; molecular standardization
    ● 2019: Azure DevOps; substructure speedup; new molecule hashing code; Neo4J integration; new JS wrappers
    ● 2020: new CIP implementation; scaffold network; abbreviations; tautomer-insensitive substructure search
    ● 2021: rdkit-cffi; more drawing improvements; R-group decomposition improvements
    ● 2022: C++17; generics for searching; non-tetrahedral symmetry…
  • 20. Longer term RDKit objectives
    ● Improved support for other classes of molecules
      ■ Polymers
      ■ Organometallics
    ● Ensuring that the PostgreSQL cartridge is a plausible candidate for use in a corporate “data warehouse” (or whatever such things are called these days)
    ● Ensuring all the pieces are in place to make it easy to write a compound registration system
  • 21. Future directions: the cartridge
    Ensuring that the PostgreSQL cartridge is a plausible candidate for use in a corporate “data warehouse”
    - Integration of tautomer insensitive search
    - Integration of the MolStandardize code
    - Improvements to the chemical reaction handling
    - Integration of the generics for searching
    Further ideas
    - Adding some 3D search capabilities
  • 22. Future directions: registration systems
    First: what is a chemical registration system?
  • 23. Aside: Goals of a compound registration system
    We want to be able to answer these questions:
    - Have we seen this compound before?
    - Give me a key for this compound
    - Give me the structure for this key
  • 24. Aside: Goals of a compound registration system
    We want to be able to answer these questions:
    - Have we seen this compound before?
    - Give me a key for this compound
    - Give me the structure for this key
    So what do we need to be able to do?
    - Standardize molecules
    - Generate hashes/keys for standardized molecules
    - Store structures
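The three questions above map onto a small lookup structure. The sketch below is a toy in-memory registry, not RDKit code: the class name `CompoundRegistry`, the `CMPD-` key format, and the choice of SHA-256 are assumptions made for illustration, and the input string is assumed to be already standardized (for example, a canonical SMILES).

```python
import hashlib


class CompoundRegistry:
    """Toy registry keyed by a hash of an already-standardized structure string."""

    def __init__(self):
        self._key_by_hash = {}       # structure hash -> registry key
        self._structure_by_key = {}  # registry key -> stored structure

    @staticmethod
    def _hash(standardized):
        # Hypothetical choice of hash function; any stable digest would do.
        return hashlib.sha256(standardized.encode("utf-8")).hexdigest()

    def seen(self, standardized):
        """Have we seen this compound before?"""
        return self._hash(standardized) in self._key_by_hash

    def register(self, standardized):
        """Give me a key for this compound (a new key is created if needed)."""
        digest = self._hash(standardized)
        if digest not in self._key_by_hash:
            key = f"CMPD-{len(self._structure_by_key) + 1:06d}"
            self._key_by_hash[digest] = key
            self._structure_by_key[key] = standardized
        return self._key_by_hash[digest]

    def structure(self, key):
        """Give me the structure for this key."""
        return self._structure_by_key[key]


registry = CompoundRegistry()
first = registry.register("CCC(CO)NCCNC(CC)CO")
second = registry.register("CCC(CO)NCCNC(CC)CO")  # same structure, same key
```

Registering the same standardized string twice returns the same key, which is the behaviour the "have we seen this compound before?" question relies on. Everything chemically interesting (standardization and the choice of hash) happens before this lookup.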
  • 25. Using keys for registration
    Idea: use a hash to combine:
    - The molecular structure (via a fixed H InChI)
    - A stereo code
    - A stereo comment
    https://github.com/rdkit/UGM_2015/blob/8f562e70add17bab35f43823af0f03673f8a1f2d/Presentations/KeyToRegistration.GregLandrum.pdf
  • 26. Future directions: registration systems
    Ensuring all the pieces are in place to make it easy to write a compound registration system
    - Improvements to MolStandardize code
    - Improvements to the molecular hashing code
    - Support for more classes of molecules
  • 27. Let’s talk about molecular identity
    This isn’t just a topic for standard compound registration systems.
  • 28. Molecular identity and computational questions
    ● Which molecules were used to generate this result?
    ● Have I already done a calculation using this molecule?
    ● Was this molecule part of my training set?
    All of these require us to be able to answer the question “are these two molecules the same?” Here be dragons…
  • 29. Some things making molecular identity nontrivial
  • 30. Some things making molecular identity nontrivial
    ● Counterions, solvents
    ● Resonance forms
    ● Charges
    ● Tautomers
    ● Stereochemistry
    Sometimes we care about these differences, sometimes we don’t. It depends on the context in which we ask the question “are these two molecules the same?” (This is not a comprehensive list.)
  • 31. Identity hashes for molecules
    Idea: convert the molecule into some form which allows us to test whether or not it’s identical to other molecules via a simple string (or numerical) comparison. What “identical” means will be determined by the identity hash used.
    Familiar examples:
    - Canonical SMILES
    - InChI
  • 32. Contextual identity
    Instead of having a single key/hash for a molecule, store a collection of layers with different levels of detail/types of information. When searching, choose the layers which are relevant for the current use case.
    ● Store molecules using some relatively lossless format (e.g. v3000 SDF)
    ● Use molecular hashes capturing different levels of information to establish whether or not duplicates exist
    Note: it’s possible to do a limited version of this via careful manipulation of InChI strings
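To make "choose the layers which are relevant" concrete, here is a minimal pure-Python sketch. The two records reuse the ethambutol layer values shown later in this talk; the plain-string layer names and the helper `same_in_context` are assumptions for illustration rather than part of any RDKit API.

```python
# Layer values taken from the ethambutol slides; names are plain strings here.
record_a = {
    "CANONICAL_SMILES": "CC[C@@H](CO)NCCN[C@@H](CC)CO",
    "NO_STEREO_SMILES": "CCC(CO)NCCNC(CC)CO",
}
record_b = {
    "CANONICAL_SMILES": "CC[C@@H](CO)NCCN[C@@H](CC)CO |&1:2,9|",
    "NO_STEREO_SMILES": "CCC(CO)NCCNC(CC)CO",
}


def same_in_context(a, b, layers):
    """Return True if the two records agree on every layer chosen for this context."""
    return all(a[name] == b[name] for name in layers)


# Context 1: stereochemistry matters, so compare the stereo-aware layer.
stereo_match = same_in_context(record_a, record_b, ["CANONICAL_SMILES"])

# Context 2: only connectivity matters, so compare the stereo-free layer.
flat_match = same_in_context(record_a, record_b, ["NO_STEREO_SMILES"])
```

The same pair of stored records is "different" in the first context and "the same" in the second, without recomputing anything from the structures.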
  • 33. Some more identity hashes
    https://www.nextmovesoftware.com/talks/OBoyle_MolHash_ACS_201908.pdf
    Available in the RDKit since the 2019.09 release
  • 34. Some of the basic identity hashes in rdMolHash
    ● Molecular formula
    ● Anonymous graph
    ● Element graph
    ● Murcko scaffold
    ● Tautomer
    ● Canonical SMILES
    There are many others
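A short sketch of how a few of these hashes can be computed with `rdkit.Chem.rdMolHash`. It assumes an RDKit installation from 2019.09 or later; the benzene/pyridine pair and the helper name `basic_hashes` are illustrative choices, not taken from the talk.

```python
from rdkit import Chem
from rdkit.Chem import rdMolHash


def basic_hashes(mol):
    """Compute three of the basic identity hashes for one molecule."""
    return {
        "formula": rdMolHash.MolHash(mol, rdMolHash.HashFunction.MolFormula),
        "anonymous": rdMolHash.MolHash(mol, rdMolHash.HashFunction.AnonymousGraph),
        "element": rdMolHash.MolHash(mol, rdMolHash.HashFunction.ElementGraph),
    }


benzene = basic_hashes(Chem.MolFromSmiles("c1ccccc1"))
pyridine = basic_hashes(Chem.MolFromSmiles("c1ccncc1"))

# Same ring topology, so the anonymous graphs should agree,
# while the element-aware hashes should not.
same_topology = benzene["anonymous"] == pyridine["anonymous"]
same_elements = benzene["element"] == pyridine["element"]
```

Each hash answers a different "are these the same?" question, which is why a single molecule usually carries several of them.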
  • 35. Hashes for registration
    The team at Schrödinger (Chris Von Bargen, Hussein Faara, Dan Nealschneider, Ricardo Rodriguez, Rachel Walker) have contributed a new RDKit module for calculating layered hashes which are useful for compound identity testing and registration. This will be in the 2022.09 release.
    Layers it currently supports:
    - Formula
    - Canonical SMILES: with and without stereo
    - Tautomer hash: with and without stereo
    - Sgroup data (for some help with polymers and things like atropisomers)
    - “Escape layer” (free text allowing a structure to be different even if everything else says it’s the same)
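A hedged sketch of how the layered hashes might be used from Python. It assumes RDKit 2022.09 or later, where the module is available as `rdkit.Chem.RegistrationHash`; the second SMILES (the opposite sulfoxide configuration) is an illustrative input that does not appear in the talk.

```python
from rdkit import Chem
from rdkit.Chem import RegistrationHash
from rdkit.Chem.RegistrationHash import HashLayer, HashScheme

# First SMILES is the one used in the registration hash example;
# the second flips the sulfoxide stereocentre (illustrative assumption).
smi_a = "COc1ccc2[nH]c([S@@](=O)Cc3ncc(C)c(OC)c3C)nc2c1"
smi_b = "COc1ccc2[nH]c([S@](=O)Cc3ncc(C)c(OC)c3C)nc2c1"

layers_a = RegistrationHash.GetMolLayers(Chem.MolFromSmiles(smi_a))
layers_b = RegistrationHash.GetMolLayers(Chem.MolFromSmiles(smi_b))

# Default scheme: every layer contributes, so the enantiomers should differ.
full_a = RegistrationHash.GetMolHash(layers_a)
full_b = RegistrationHash.GetMolHash(layers_b)

# Stereo-insensitive scheme: only the stereo-free layers contribute.
flat_a = RegistrationHash.GetMolHash(
    layers_a, hash_scheme=HashScheme.STEREO_INSENSITIVE_LAYERS)
flat_b = RegistrationHash.GetMolHash(
    layers_b, hash_scheme=HashScheme.STEREO_INSENSITIVE_LAYERS)

formula = layers_a[HashLayer.FORMULA]
```

The layers are computed once per molecule and the hash scheme is chosen at comparison time, which keeps the stored data independent of any one definition of "same".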
  • 36. Registration hash example
    {<HashLayer.CANONICAL_SMILES: 1>: 'COc1ccc2[nH]c([S@@](=O)Cc3ncc(C)c(OC)c3C)nc2c1',
     <HashLayer.ESCAPE: 2>: '',
     <HashLayer.FORMULA: 3>: 'C17H19N3O3S',
     <HashLayer.NO_STEREO_SMILES: 4>: 'COc1ccc2[nH]c(S(=O)Cc3ncc(C)c(OC)c3C)nc2c1',
     <HashLayer.NO_STEREO_TAUTOMER_HASH: 5>: 'CO[C]1[CH][CH][C]2[N][C]([S]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0',
     <HashLayer.SGROUP_DATA: 6>: '[]',
     <HashLayer.TAUTOMER_HASH: 7>: 'CO[C]1[CH][CH][C]2[N][C]([S@@]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0'}
  • 37. Handling tautomers
    {<HashLayer.CANONICAL_SMILES: 1>: 'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F',
     <HashLayer.ESCAPE: 2>: '',
     <HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S',
     …
     <HashLayer.TAUTOMER_HASH: 7>: 'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'}
    {<HashLayer.CANONICAL_SMILES: 1>: 'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2cnc3[nH]cc(-c4ccc(Cl)cc4)cc2-3)c1F',
     <HashLayer.ESCAPE: 2>: '',
     <HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S',
     …
     <HashLayer.TAUTOMER_HASH: 7>: 'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'}
  • 38. Handling atropisomers
    Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
  • 39. Handling atropisomers
    Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
    The bold and hashed bonds are just drawing features and don’t survive translation to things like CXSMILES or mol files. But we can use S groups to indicate the stereochemistry.
  • 40. Handling atropisomers
    Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
    {<HashLayer.CANONICAL_SMILES: 1>: 'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C',
     <HashLayer.ESCAPE: 2>: '',
     <HashLayer.FORMULA: 3>: 'C23H21FN6O3',
     …
     <HashLayer.SGROUP_DATA: 6>: '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "M"}]',
     …}
    {<HashLayer.CANONICAL_SMILES: 1>: 'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C',
     <HashLayer.ESCAPE: 2>: '',
     <HashLayer.FORMULA: 3>: 'C23H21FN6O3',
     …
     <HashLayer.SGROUP_DATA: 6>: '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "P"}]',
     …}
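The SGROUP_DATA layer values above are JSON strings, so the difference between the two atropisomers can be inspected directly. The sketch below uses only the Python standard library and the two strings from the example; the helper name `sgroup_values` is hypothetical.

```python
import json

# SGROUP_DATA layer values copied from the two atropisomer records above.
sgroup_m = '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "M"}]'
sgroup_p = '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "P"}]'


def sgroup_values(layer_text):
    """Map each data S group's field name to its value."""
    return {entry["fieldName"]: entry["value"] for entry in json.loads(layer_text)}


values_m = sgroup_values(sgroup_m)
values_p = sgroup_values(sgroup_p)

# The canonical SMILES layers are identical, so this layer is what separates
# the two records.
layers_differ = sgroup_m != sgroup_p
```

Because the layer is compared as a string during hashing, any change in the stored S group data is enough to give the two atropisomers distinct registration hashes.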
  • 41. Handling polymers
    {<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1',
     …,
     <HashLayer.SGROUP_DATA: 6>: '[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HT", "label": "n"}]',
     …}
    {<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1',
     …,
     <HashLayer.SGROUP_DATA: 6>: '[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HH", "label": "n"}]',
     …}
  • 42. Handling enhanced stereochemistry
    Ethambutol: these two describe the same racemic mixture
  • 43. Handling enhanced stereochemistry
    {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO',
     …,
     <HashLayer.NO_STEREO_SMILES: 4>: 'CCC(CO)NCCNC(CC)CO',
     …}
    {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO |&1:2,9|',
     …,
     <HashLayer.NO_STEREO_SMILES: 4>: 'CCC(CO)NCCNC(CC)CO',
     …}
    We get the same hash if the molecule is drawn with wedged bonds.
  • 44. Using the escape layer
    Suppose I start with the racemic mixture, run it through a chiral column, and collect the two fractions. I want to register the two fractions separately without determining the absolute stereochemistry.
  • 45. Using the escape layer
    {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|',
     <HashLayer.ESCAPE: 2>: 'first fraction',
     …}
    {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|',
     <HashLayer.ESCAPE: 2>: 'second fraction',
     …}
  • 46. Aside: using the escape layer for comp chem
    Suppose I want to store multiple conformers/poses of the same molecule:
    {… <HashLayer.ESCAPE: 2>: 'conformer 1', …}
    {… <HashLayer.ESCAPE: 2>: 'conformer 2', …}
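The effect of the escape layer can be illustrated without RDKit. The sketch below hashes a dictionary of layers by serializing it deterministically and taking a SHA-1 digest; this is an assumed stand-in for illustration, not a description of how the RDKit module computes its hash, and `layered_hash` is a hypothetical helper.

```python
import hashlib
import json


def layered_hash(layers):
    """Hash a dict of layers; sorted keys make the serialization deterministic."""
    serialized = json.dumps(layers, sort_keys=True)
    return hashlib.sha1(serialized.encode("utf-8")).hexdigest()


# Structure layer copied from the two-fractions example above.
structure = {"CANONICAL_SMILES": "CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|"}

first_fraction = dict(structure, ESCAPE="first fraction")
second_fraction = dict(structure, ESCAPE="second fraction")

hash_first = layered_hash(first_fraction)
hash_second = layered_hash(second_fraction)
```

The two records share every structural layer, yet the free-text escape layer alone is enough to give them different hashes, so they can be registered as separate entries. The same pattern covers the conformer case, with 'conformer 1' and 'conformer 2' as escape values.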
  • 47. Wrapping up: molecular identity
    ● For many computational tasks we want to be able to figure out whether or not we have seen/used a particular molecule
    ● The definition of “same” for molecules depends on the context/question being asked
    ● Layered registration hashes make it easy (and cheap) to store sets of molecules and answer the context-dependent “are these the same?” question