SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Open PHACTS - Chemistry
Platform Update and learnings
Antony Williams and Valery Tkachenko
ORCID ID:0000-0002-2668-4821
@gray_alasdair Big Data Integration 2
OpenPHACTS and CRS Diagram
The Chemical Registration Service
Chemistry processing
•Validation
•Standardization
•Properties generation
•Properties retrieval
Export
•RDF
•SDF
API
•Domain-specific searches
•Chemical visualization
•Properties
•Conversions
Subsystems
• “CVSP” (frontend, backend, database)
• Compounds (frontend, database)
• OpenPHACTS API (frontend, database)
• Datasources registry (frontend, database)
• Processing farm (optional)
Structure-Based Database linking
• Open PHACTS, and many other projects
requiring the linking of structure databases,
depend on mappings
• Different databases use different processes
for standardization prior at deposition
• Examples: PubChem, EBI databases,
ChemSpider, etc.
DrugBank
• ~60 records can’t be dearomatized unambiguously
• ~40 records where InChIs did not match structure
• 2 records where SMILES, InChI and name did not
match the structure
• 7 records with 2 stereo bonds at chiral atoms
DB04283 DB04462
Standardizers
• EBI Standardizer:
https://wwwdev.ebi.ac.uk/chembl/extra/francis/sta
/
• PubChem Standardizer: https://
pubchem.ncbi.nlm.nih.gov/standardize/standardi
• NCGC Standardizer: https://tripod.nih.gov/?
p=61
• The CVSP Standardizer work in Open
PHACTS http://cvsp.chemspider.com/
Standardization Rules
• Available from: http://tinyurl.com/hwapem3
• Use the SRS as guidance for standardization
• Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
The CVSP System
http://cvsp.chemspider.com
Supports various file formats
Comptox Chemistry Dashboard
Prior to deposition check a deposition…
>3450 compounds in one SDF
98 Errors, 1571 Warnings
Review Errors
Validation Rule Set
Various Rules Sets Available
CVSP – My own custom rules
ChEMBL Validation Review
(of 1.3 million records)
• 11,020 records with 4 bonds and zero charge, e.g.
CHEMBL501101 or CHEMBL501973
• 271 records with hypervalent oxygen (e.g. ,
CHEMBL2219679), carbon (e.g. 1005895), boron,
chlorine, iodine or phosphine
• 6,177 records where direction of bond makes no
sense, e.g. CHEMBL12760 and CHEMBL34704
Chemical Validation first…
Standardization Second
• Chemical Validation detects errors –
Standardization FIXES them according to rules
• SMIRKS transformations are based on both
InChI Normalization and FDA SRS rules
Standardization SMIRKS
Examples of InChI normalization
[*;H+:1]>>[*;H:1]
[O,S,Se,Te:1]=[O+,S+,Se+,Te+:2][C-;v3:3]>>[O,S,Se,Te:1]=[O,S,Se,Te:2]=[C:3]
[N-,P-,As-,Sb-:1]=[C+;v3:2]>>[N,P,As,Sb:1]#[C:2]
Examples of FDA SRS rules
[n:1]=[O:2]>>[n+:1][O-:2]
[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]
[N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5]
Thiopurine [H:1][S:2][c:3]1[n:8][c:7]([H,*:13])[n:6][c:5]2[c:4]1[n:11][c:10]
([H,*:12])[n:9]2>>[H:1][N:8]1[C:7]([H,*:13])=[N:6][C:5]2=[C:4]([N:11]=[C:10]
([H,*:12])[N:9]2)[C:3]1=[S:2]
Examples of Standardization
Double bond with adjacent wiggly single bond
Collapser hydrogen atoms with no stereo bonds
Examples of Standardization
Remove symmetric stereocenters
Turn off chiral flag if no up or down bonds
Defining a Community Rule Set
• There are multiple standardizers, each with
their own rules set
• Can we decide on a default community rules
set, like Standard InChI, that could be used
by ALL Standardizers?
• A joint meeting between the Research Data
Alliance (RDA), IUPAC and ACS Division of
Chemical Information discussed the value
and possibilities of this approach (July 2016)
EPA is investigating CVSP
• EPA is investigating CVSP as a validation
and standardization platform
• Considering the API aspects of CVSP to
integrate to our registration system
• CVSP is a reference implementation and
“starting point” for a community rules set
CVSP code is now Open Source
• Open Source CVSP code now released
• Code is hosted on Open PHACTS Github
https://github.com/openphacts/ops-crs
• Valery Tkachenko will offer future support
• Hoping for additional community engagement
and support
• Some details of availability….
Virtual Machines
• OPS_FRONT (all websites and API)
• OPS_BACK (all heavy-lifting)
• OPS_DB (databases)
• VMs are VMware images
• Can be converted to other hypervisors
Thank you
Emails: tony27587@gmail.com and tkachenko.valery@gmail.com
SLIDES: www.slideshare.net/AntonyWilliams

Weitere ähnliche Inhalte

Andere mochten auch

Proposed Linked Data Migration Framework for Singapore Government Datasets
Proposed Linked Data Migration Framework for Singapore Government DatasetsProposed Linked Data Migration Framework for Singapore Government Datasets
Proposed Linked Data Migration Framework for Singapore Government DatasetsAravind Sesagiri Raamkumar
 
Experiences and adventures with no sql and its applications to cheminformatic...
Experiences and adventures with no sql and its applications to cheminformatic...Experiences and adventures with no sql and its applications to cheminformatic...
Experiences and adventures with no sql and its applications to cheminformatic...Valery Tkachenko
 
Verkko ja tyohyvinvointi, TTL 26052010
Verkko ja tyohyvinvointi, TTL 26052010Verkko ja tyohyvinvointi, TTL 26052010
Verkko ja tyohyvinvointi, TTL 26052010Karoliina Luoto
 
Wordpress: Make Your Site Impressively Beautiful
Wordpress: Make Your Site Impressively BeautifulWordpress: Make Your Site Impressively Beautiful
Wordpress: Make Your Site Impressively BeautifulMafel Gorne
 
Testing and validating your idea
Testing and validating your ideaTesting and validating your idea
Testing and validating your ideaJanna Bastow
 
Kevin's Portfolio, Graphics and Photos
Kevin's Portfolio, Graphics and Photos Kevin's Portfolio, Graphics and Photos
Kevin's Portfolio, Graphics and Photos Kevin Pendergraft
 
MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)
MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)
MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)aalvarez1410
 
Empleo y esclerosis múltiple.
Empleo y esclerosis múltiple.Empleo y esclerosis múltiple.
Empleo y esclerosis múltiple.José María
 
Processes should serve creativity - Which processes help creatives to work be...
Processes should serve creativity - Which processes help creatives to work be...Processes should serve creativity - Which processes help creatives to work be...
Processes should serve creativity - Which processes help creatives to work be...Jan Korsanke
 
Evolution of open chemical information
Evolution of open chemical informationEvolution of open chemical information
Evolution of open chemical informationValery Tkachenko
 
Nociones de derecho civil mapa conceptual
Nociones de derecho civil mapa conceptualNociones de derecho civil mapa conceptual
Nociones de derecho civil mapa conceptualylerin
 
Serverless Logging with AWS Lambda and the Elastic Stack
Serverless Logging with AWS Lambda and the Elastic StackServerless Logging with AWS Lambda and the Elastic Stack
Serverless Logging with AWS Lambda and the Elastic StackEdoardo Paolo Scalafiotti
 
Érzelmek hálójában – hálózat- és tartalomelemzés
Érzelmek hálójában – hálózat- és tartalomelemzésÉrzelmek hálójában – hálózat- és tartalomelemzés
Érzelmek hálójában – hálózat- és tartalomelemzésZoltan Varju
 
Nociones del derecho civil mapa conceptual
Nociones del derecho civil mapa conceptualNociones del derecho civil mapa conceptual
Nociones del derecho civil mapa conceptualMaría Herrera
 
How to Decide: When to Use What In Office 365 - SharePoint Fest DC
How to Decide: When to Use What In Office 365 - SharePoint Fest DCHow to Decide: When to Use What In Office 365 - SharePoint Fest DC
How to Decide: When to Use What In Office 365 - SharePoint Fest DCRichard Harbridge
 
Postcron: Automate and Plan Posts Ahead
Postcron: Automate and Plan Posts AheadPostcron: Automate and Plan Posts Ahead
Postcron: Automate and Plan Posts AheadMafel Gorne
 

Andere mochten auch (18)

Proposed Linked Data Migration Framework for Singapore Government Datasets
Proposed Linked Data Migration Framework for Singapore Government DatasetsProposed Linked Data Migration Framework for Singapore Government Datasets
Proposed Linked Data Migration Framework for Singapore Government Datasets
 
Experiences and adventures with no sql and its applications to cheminformatic...
Experiences and adventures with no sql and its applications to cheminformatic...Experiences and adventures with no sql and its applications to cheminformatic...
Experiences and adventures with no sql and its applications to cheminformatic...
 
Verkko ja tyohyvinvointi, TTL 26052010
Verkko ja tyohyvinvointi, TTL 26052010Verkko ja tyohyvinvointi, TTL 26052010
Verkko ja tyohyvinvointi, TTL 26052010
 
Wordpress: Make Your Site Impressively Beautiful
Wordpress: Make Your Site Impressively BeautifulWordpress: Make Your Site Impressively Beautiful
Wordpress: Make Your Site Impressively Beautiful
 
Dercho civil
Dercho civilDercho civil
Dercho civil
 
Testing and validating your idea
Testing and validating your ideaTesting and validating your idea
Testing and validating your idea
 
Kevin's Portfolio, Graphics and Photos
Kevin's Portfolio, Graphics and Photos Kevin's Portfolio, Graphics and Photos
Kevin's Portfolio, Graphics and Photos
 
MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)
MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)
MAPA CONCEPTUAL SOCIOLOGIA (unidades IV Y V)
 
2016 laura's resume
2016 laura's resume2016 laura's resume
2016 laura's resume
 
Empleo y esclerosis múltiple.
Empleo y esclerosis múltiple.Empleo y esclerosis múltiple.
Empleo y esclerosis múltiple.
 
Processes should serve creativity - Which processes help creatives to work be...
Processes should serve creativity - Which processes help creatives to work be...Processes should serve creativity - Which processes help creatives to work be...
Processes should serve creativity - Which processes help creatives to work be...
 
Evolution of open chemical information
Evolution of open chemical informationEvolution of open chemical information
Evolution of open chemical information
 
Nociones de derecho civil mapa conceptual
Nociones de derecho civil mapa conceptualNociones de derecho civil mapa conceptual
Nociones de derecho civil mapa conceptual
 
Serverless Logging with AWS Lambda and the Elastic Stack
Serverless Logging with AWS Lambda and the Elastic StackServerless Logging with AWS Lambda and the Elastic Stack
Serverless Logging with AWS Lambda and the Elastic Stack
 
Érzelmek hálójában – hálózat- és tartalomelemzés
Érzelmek hálójában – hálózat- és tartalomelemzésÉrzelmek hálójában – hálózat- és tartalomelemzés
Érzelmek hálójában – hálózat- és tartalomelemzés
 
Nociones del derecho civil mapa conceptual
Nociones del derecho civil mapa conceptualNociones del derecho civil mapa conceptual
Nociones del derecho civil mapa conceptual
 
How to Decide: When to Use What In Office 365 - SharePoint Fest DC
How to Decide: When to Use What In Office 365 - SharePoint Fest DCHow to Decide: When to Use What In Office 365 - SharePoint Fest DC
How to Decide: When to Use What In Office 365 - SharePoint Fest DC
 
Postcron: Automate and Plan Posts Ahead
Postcron: Automate and Plan Posts AheadPostcron: Automate and Plan Posts Ahead
Postcron: Automate and Plan Posts Ahead
 

Ähnlich wie OpenPHACTS - Chemistry Platform Update and Learnings

How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...Ken Karapetyan
 
The RSC chemical validation and standardization platform, a potential path to...
The RSC chemical validation and standardization platform, a potential path to...The RSC chemical validation and standardization platform, a potential path to...
The RSC chemical validation and standardization platform, a potential path to...Ken Karapetyan
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...Dr. Haxel Consult
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectKen Karapetyan
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineKen Karapetyan
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspKen Karapetyan
 
Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Kamel Mansouri
 

Ähnlich wie OpenPHACTS - Chemistry Platform Update and Learnings (20)

ChemValidator – an online service for validating and standardizing chemical s...
ChemValidator – an online service for validating and standardizing chemical s...ChemValidator – an online service for validating and standardizing chemical s...
ChemValidator – an online service for validating and standardizing chemical s...
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...How the InChI identifier is used to underpin our online chemistry databases a...
How the InChI identifier is used to underpin our online chemistry databases a...
 
The RSC chemical validation and standardization platform, a potential path to...
The RSC chemical validation and standardization platform, a potential path to...The RSC chemical validation and standardization platform, a potential path to...
The RSC chemical validation and standardization platform, a potential path to...
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
 
Hosting Public Domain Chemicals Data Online for the Community – the Challenge...
Hosting Public Domain Chemicals Data Online for the Community – the Challenge...Hosting Public Domain Chemicals Data Online for the Community – the Challenge...
Hosting Public Domain Chemicals Data Online for the Community – the Challenge...
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Dealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data onlineDealing with the complex challenge of managing diverse chemistry data online
Dealing with the complex challenge of managing diverse chemistry data online
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvsp
 
Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...
 
Experiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the CommunityExperiences in Hosting Big Chemistry Data Collections for the Community
Experiences in Hosting Big Chemistry Data Collections for the Community
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 
The RSC chemical validation and standardization platform, a potential path to...
The RSC chemical validation and standardization platform, a potential path to...The RSC chemical validation and standardization platform, a potential path to...
The RSC chemical validation and standardization platform, a potential path to...
 
Chemistry data: Distortion and dissemination in the Internet Era
Chemistry data: Distortion and dissemination in the Internet EraChemistry data: Distortion and dissemination in the Internet Era
Chemistry data: Distortion and dissemination in the Internet Era
 

Mehr von Valery Tkachenko

Evolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the futureEvolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the futureValery Tkachenko
 
In silico design of new functional materials
In silico design of new functional materialsIn silico design of new functional materials
In silico design of new functional materialsValery Tkachenko
 
Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...Valery Tkachenko
 
Abstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representationsAbstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representationsValery Tkachenko
 
Machine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpointsMachine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpointsValery Tkachenko
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionValery Tkachenko
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsValery Tkachenko
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsValery Tkachenko
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Valery Tkachenko
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Valery Tkachenko
 
Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...Valery Tkachenko
 
Living in a world of federated knowledge challenges, principles, tools and ...
Living in a world of federated knowledge   challenges, principles, tools and ...Living in a world of federated knowledge   challenges, principles, tools and ...
Living in a world of federated knowledge challenges, principles, tools and ...Valery Tkachenko
 
Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Valery Tkachenko
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataValery Tkachenko
 
Tools and approaches for data deposition into nanomaterial databases
Tools and approaches for data deposition into nanomaterial databasesTools and approaches for data deposition into nanomaterial databases
Tools and approaches for data deposition into nanomaterial databasesValery Tkachenko
 
Chemistry Validation and Standardization Platform v2.0
Chemistry Validation and Standardization Platform v2.0Chemistry Validation and Standardization Platform v2.0
Chemistry Validation and Standardization Platform v2.0Valery Tkachenko
 
Open Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials researchOpen Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials researchValery Tkachenko
 
OMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spacesOMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spacesValery Tkachenko
 
Not just another reaction database
Not just another reaction databaseNot just another reaction database
Not just another reaction databaseValery Tkachenko
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
 

Mehr von Valery Tkachenko (20)

Evolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the futureEvolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the future
 
In silico design of new functional materials
In silico design of new functional materialsIn silico design of new functional materials
In silico design of new functional materials
 
Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...
 
Abstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representationsAbstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representations
 
Machine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpointsMachine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpoints
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collection
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpoints
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...
 
Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...
 
Living in a world of federated knowledge challenges, principles, tools and ...
Living in a world of federated knowledge   challenges, principles, tools and ...Living in a world of federated knowledge   challenges, principles, tools and ...
Living in a world of federated knowledge challenges, principles, tools and ...
 
Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical data
 
Tools and approaches for data deposition into nanomaterial databases
Tools and approaches for data deposition into nanomaterial databasesTools and approaches for data deposition into nanomaterial databases
Tools and approaches for data deposition into nanomaterial databases
 
Chemistry Validation and Standardization Platform v2.0
Chemistry Validation and Standardization Platform v2.0Chemistry Validation and Standardization Platform v2.0
Chemistry Validation and Standardization Platform v2.0
 
Open Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials researchOpen Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials research
 
OMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spacesOMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spaces
 
Not just another reaction database
Not just another reaction databaseNot just another reaction database
Not just another reaction database
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 

Kürzlich hochgeladen

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjurptikerjasaptiker
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 

Kürzlich hochgeladen (20)

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 

OpenPHACTS - Chemistry Platform Update and Learnings

  • 1. Open PHACTS - Chemistry Platform Update and learnings Antony Williams and Valery Tkachenko ORCID ID:0000-0002-2668-4821
  • 2. @gray_alasdair Big Data Integration 2 OpenPHACTS and CRS Diagram
  • 3. The Chemical Registration Service Chemistry processing •Validation •Standardization •Properties generation •Properties retrieval Export •RDF •SDF API •Domain-specific searches •Chemical visualization •Properties •Conversions
  • 4.
  • 5. Subsystems • “CVSP” (frontend, backend, database) • Compounds (frontend, database) • OpenPHACTS API (frontend, database) • Datasources registry (frontend, database) • Processing farm (optional)
  • 6. Structure-Based Database linking • Open PHACTS, and many other projects requiring the linking of structure databases, depend on mappings • Different databases use different processes for standardization prior at deposition • Examples: PubChem, EBI databases, ChemSpider, etc.
  • 7. DrugBank • ~60 records can’t be dearomatized unambiguously • ~40 records where InChIs did not match structure • 2 records where SMILES, InChI and name did not match the structure • 7 records with 2 stereo bonds at chiral atoms DB04283 DB04462
  • 8. Standardizers • EBI Standardizer: https://wwwdev.ebi.ac.uk/chembl/extra/francis/sta / • PubChem Standardizer: https:// pubchem.ncbi.nlm.nih.gov/standardize/standardi • NCGC Standardizer: https://tripod.nih.gov/? p=61 • The CVSP Standardizer work in Open PHACTS http://cvsp.chemspider.com/
  • 9.
  • 10. Standardization Rules • Available from: http://tinyurl.com/hwapem3 • Use the SRS as guidance for standardization • Adjust as necessary to our needs
  • 12. Salt and Ionic Bonds
  • 15. Comptox Chemistry Dashboard Prior to deposition check a deposition…
  • 17. 98 Errors, 1571 Warnings
  • 20. Various Rules Sets Available
  • 21. CVSP – My own custom rules
  • 22. ChEMBL Validation Review (of 1.3 million records) • 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973 • 271 records with hypervalent oxygen (e.g. , CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine • 6,177 records where direction of bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
  • 23. Chemical Validation first… Standardization Second • Chemical Validation detects errors – Standardization FIXES them according to rules • SMIRKS transformations are based on both InChI Normalization and FDA SRS rules
  • 24. Standardization SMIRKS Examples of InChI normalization [*;H+:1]>>[*;H:1] [O,S,Se,Te:1]=[O+,S+,Se+,Te+:2][C-;v3:3]>>[O,S,Se,Te:1]=[O,S,Se,Te:2]=[C:3] [N-,P-,As-,Sb-:1]=[C+;v3:2]>>[N,P,As,Sb:1]#[C:2] Examples of FDA SRS rules [n:1]=[O:2]>>[n+:1][O-:2] [*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3] [N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5] Thiopurine [H:1][S:2][c:3]1[n:8][c:7]([H,*:13])[n:6][c:5]2[c:4]1[n:11][c:10] ([H,*:12])[n:9]2>>[H:1][N:8]1[C:7]([H,*:13])=[N:6][C:5]2=[C:4]([N:11]=[C:10] ([H,*:12])[N:9]2)[C:3]1=[S:2]
  • 25. Examples of Standardization Double bond with adjacent wiggly single bond Collapser hydrogen atoms with no stereo bonds
  • 26. Examples of Standardization Remove symmetric stereocenters Turn off chiral flag if no up or down bonds
  • 27. Defining a Community Rule Set • There are multiple standardizers, each with their own rules set • Can we decide on a default community rules set, like Standard InChI, that could be used by ALL Standardizers? • A joint meeting between the Research Data Alliance (RDA), IUPAC and ACS Division of Chemical Information discussed the value and possibilities of this approach (July 2016)
  • 28. EPA is investigating CVSP • EPA is investigating CVSP as a validation and standardization platform • Considering the API aspects of CVSP to integrate to our registration system • CVSP is a reference implementation and “starting point” for a community rules set
  • 29. CVSP code is now Open Source • Open Source CVSP code now released • Code is hosted on Open PHACTS Github https://github.com/openphacts/ops-crs • Valery Tkachenko will offer future support • Hoping for additional community engagement and support • Some details of availability….
  • 30. Virtual Machines • OPS_FRONT (all websites and API) • OPS_BACK (all heavy-lifting) • OPS_DB (databases) • VMs are VMware images • Can be converted to other hypervisors
  • 31. Thank you Emails: tony27587@gmail.com and tkachenko.valery@gmail.com SLIDES: www.slideshare.net/AntonyWilliams

Hinweis der Redaktion

  1. Open PHACTS was developed to support the key questions of drug discovery Business questions have been at the heart of Open PHACTS and have driven the development of the platform Mx/psa, how calculated who did it? Mash up. With your data too, - top layer join together but need them all commercial Data provided by many publishers Originally in many formats: relational, SD files and RDF Worked closely with publishers Data licensing was a major issue Over 5 billion triples – 14 datasets & growing Hosted on beefy hardware; data in memory (aim) Extensive memcaching Pose complex queries to extract data