Lehigh Univ., Apr. 26, 2019
URL: https://pubchem.ncbi.nlm.nih.gov
Introduction to PubChem
PubChem (https://pubchem.ncbi.nlm.nih.gov)
• A “public” repository of information on small molecules and their biological activities, developed and maintained by the U.S. National Institutes of Health (NIH).
• Launched in 2004 as part of the Molecular Libraries Roadmap initiative.
• A key resource of chemical information for researchers in cheminformatics, chemical biology, medicinal chemistry, and many other fields.
PubChem Usage Statistics
[Figure: Number of monthly unique users, in millions, by month (Jan 2015 - Mar 2019, interactive users only)]
3.5 million unique users per month at peak (October 2018)
Top 5 Chemistry Websites
1. acs.org
2. rsc.org
3. sigmaaldrich.com
4. pubchem.ncbi.nlm.nih.gov
5. cas.org
Source: https://www.alexa.com/topsites/category/Top/Science/Chemistry
PubChem is the only public website among them.
Dual role of PubChem
>600 data sources → PubChem → the public
• Data archive/repository: collects and maintains chemical information submitted by data contributors.
• Knowledgebase: provides (high-quality) data to the public.
Data organization in PubChem
• Data contributors make two kinds of deposition: substance deposition and assay deposition.
• Substance deposition → depositor-provided substance descriptions → unique chemical structure extraction through standardization → unique chemical structures.
• Assay deposition → depositor-provided bioactivity test results → activity of tested “substances” → activity of “compounds” derived from associated “substances”.
PubChem (https://pubchem.ncbi.nlm.nih.gov)
PubChem contains:
• >243.9 million substance descriptions,
• >97.6 million unique chemical structures,
• >264.8 million biological activity test results,
• >1.3 million biological assays, covering >10,000 unique protein sequence targets.
(Arguably) the largest corpus of publicly available chemical information, from 600+ data sources.
(as of April 24, 2016)
Exploring PubChem
using the web interfaces
Note
• (Almost) all tasks you can do using PubChem’s web interfaces can be automated using its programmatic interfaces.
PubChem Web Interfaces
• Text Search
• Structure Search
• ID List Upload
• Classification Browser
• PubChem Docs
[Screenshot callouts: Structure download; Refine the result]
PubChem for
Drug Discovery
PubChem’s chemical space
Lead compounds → (modification) → drug candidates
Lipinski’s rule of 5 (drug-likeness):
• Mol. Wt. < 500
• H-Bond Donor ≤ 5
• H-Bond Acceptor ≤ 10
• LogP ≤ 5
Congreve’s rule of 3 (lead-likeness):
• Mol. Wt. < 300
• H-Bond Donor ≤ 3
• H-Bond Acceptor ≤ 3
• LogP ≤ 3
• Rotatable Bond ≤ 3
• PSA ≤ 60
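The two rule sets above can be written as simple predicates. The following is a minimal sketch, assuming the property values have already been computed or retrieved elsewhere; the class and function names (`MolProps`, `is_drug_like`, `is_lead_like`) are hypothetical and are not part of PubChem, and the example molecule uses made-up property values.

```python
from dataclasses import dataclass


@dataclass
class MolProps:
    """Hypothetical container for precomputed molecular properties."""
    mol_wt: float      # molecular weight
    hbd: int           # hydrogen-bond donor count
    hba: int           # hydrogen-bond acceptor count
    logp: float        # octanol-water partition coefficient (log P)
    rot_bonds: int     # rotatable bond count
    psa: float         # polar surface area


def is_drug_like(p: MolProps) -> bool:
    """Lipinski's rule of 5, using the thresholds listed on the slide."""
    return p.mol_wt < 500 and p.hbd <= 5 and p.hba <= 10 and p.logp <= 5


def is_lead_like(p: MolProps) -> bool:
    """Congreve's rule of 3, using the thresholds listed on the slide."""
    return (
        p.mol_wt < 300
        and p.hbd <= 3
        and p.hba <= 3
        and p.logp <= 3
        and p.rot_bonds <= 3
        and p.psa <= 60
    )


# Made-up property values, for illustration only.
example = MolProps(mol_wt=450.0, hbd=4, hba=8, logp=4.0, rot_bonds=7, psa=110.0)
print(is_drug_like(example), is_lead_like(example))
```

Because every rule-of-3 threshold is at least as strict as the corresponding rule-of-5 threshold, any molecule that passes `is_lead_like` also passes `is_drug_like`.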
PubChem’s chemical space
• All compounds: 97.6 million (100%)
• Drug-like: 73.3 million (75%)
• Lead-like: 11.2 million (11%)
PubChem’s chemical space (compounds grouped by number of rule-of-5 violations)
• Ro5: 73.3 million (75%)
• Ro5-1: 15.0 million (15%)
• Ro5-2: 6.5 million (7%)
• Ro5-3: 1.3 million (1%)
• Ro5-4: 0.3 million (~0%)
Ro5 + Ro5-1 = 90%
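A breakdown like the one above can be produced by counting, for each compound, how many of the four rule-of-5 criteria it fails. The sketch below assumes that "Ro5-n" denotes compounds with n violations; the function names and the three input tuples are hypothetical and are not PubChem data.

```python
from collections import Counter


def ro5_violations(mol_wt, hbd, hba, logp):
    """Count how many of the four rule-of-5 criteria are violated."""
    return sum([mol_wt >= 500, hbd > 5, hba > 10, logp > 5])


def tally_by_violations(compounds):
    """Group (mol_wt, hbd, hba, logp) tuples into Ro5, Ro5-1, ..., Ro5-4 bins."""
    counts = Counter(ro5_violations(*c) for c in compounds)
    return {("Ro5" if n == 0 else f"Ro5-{n}"): counts.get(n, 0) for n in range(5)}


# Made-up property tuples, for illustration only.
compounds = [
    (180.2, 1, 4, 1.2),    # no violations
    (650.0, 2, 8, 3.0),    # one violation (molecular weight)
    (720.0, 7, 12, 6.5),   # four violations
]
tally = tally_by_violations(compounds)
print(tally)

# Share of compounds with at most one violation (the "Ro5 + Ro5-1" figure).
share = (tally["Ro5"] + tally["Ro5-1"]) / len(compounds)
print(f"{share:.0%}")
```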
Bioactivity Data in PubChem
• All compounds: 97.6 million (100.00%)
• Not tested: 94.2 million (96.51%)
• Tested: 3.4 million (3.50%)
o Active (AC ≤ 1 nM): 62 thousand (0.06%)
o Active (1 nM < AC ≤ 1 μM): 713 thousand (0.73%)
o Active (others): 465 thousand (0.47%)
o Inactive: 2.1 million (1.34%)
Bioactivity Data in PubChem
High-throughput screening data
• From the Molecular Libraries Program and other HTS projects
• Many inactives
• False hits (e.g., aggregators, autofluorescent compounds)
• Often measured at a single concentration
Literature-extracted data
• From manual curation or data mining
• No (or few) inactives
• Provided by various PubChem depositors, including ChEMBL, PDBbind, BindingDB, and Guide to Pharmacology
Literature-extracted bioactivity data
• ChEMBL: manual extraction from peer-reviewed papers in medicinal chemistry and natural products journals.
• PDBbind: experimental binding affinity data for biomolecular complexes in the PDB.
• BindingDB: binding affinities, focusing chiefly on the interactions of proteins considered to be drug targets with drug-like small molecules.
• Guide to Pharmacology: drug targets (G-protein-coupled receptors, ion channels, and nuclear hormone receptors) and their ligands.
Annotations available in PubChem
• DrugBank: comprehensive information on FDA-approved and investigational drugs
o drug indications,
o mechanism of action,
o target macromolecules,
o interactions with genes/proteins,
o ADMET, ……
• Hazardous Substance Data Bank (HSDB): toxicological information on chemicals of interest in environmental and human health
• Molecular Modeling Database (MMDB): protein-bound 3-D structures (found in PDB)
• Cambridge Structural Database (CSD): 3-D crystal structures
• NLM’s DailyMed: drug labeling information
• FDA: Orange Book, Unique Ingredient Identifiers, Pharmacologic Classes
• EPA: Substance Registry Services; chemical data collected under the:
o Toxic Substances Control Act
o Clean Air Act
Availability of compounds for subsequent experiments
• Virtual screening hits should be synthesizable or purchasable.
• PubChem contains “real” molecules (not “virtual” molecules).
• For each compound, at least one data contributor claims to have the compound and/or information about it.
• Some of these data contributors are chemical vendors (e.g., Sigma-Aldrich).
Two important aspects of PubChem records (in the context of “compound availability”)
• Non-live compounds:
o Not searchable, although they exist.
o Have no associated substances, due to: mistakenly submitted substances, incorrect information, or no intention to share.
• Legacy designation:
o The data contributor no longer keeps its records up to date (e.g., discontinued funding, low business priority, …).
Compound patentability for IP protection
• PubChem provides patent link information for compounds, provided by:
o IBM
o SureChEMBL (formerly SureChem)
o NextMove Software
o SCRIPDB
o BindingDB
• >336 million chemical-patent links
• >6 million patent documents
• >16 million unique chemical structures
• Covering patent documents published by:
o U.S. Patent and Trademark Office (USPTO)
o European Patent Office (EPO)
o World Intellectual Property Organization (WIPO)
• Compounds with patent links are annotated with WIPO International Patent Classification (IPC) information.
Programmatic access to PubChem for automation of a virtual screening (VS) pipeline
• Entrez Utilities (E-Utils)
• Power User Gateway (PUG)
• PUG-SOAP
• PUG-REST
• PUG-View
• PubChemRDF REST
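As an illustration of one automated step in such a pipeline, the sketch below retrieves a few computed properties for a list of CIDs through PUG-REST and indexes them by CID. It is a minimal sketch under stated assumptions: the property names and the `PropertyTable`/`Properties` response layout should be checked against the PUG-REST documentation, the helper names are hypothetical, and the sample payload is hand-written with illustrative values rather than copied from a real response.

```python
import json
import urllib.request

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# Property names assumed from the PUG-REST property table.
PROPERTIES = "MolecularWeight,XLogP,HBondDonorCount,HBondAcceptorCount"


def property_url(cids):
    """Build a PUG-REST URL asking for PROPERTIES of the given CIDs as JSON."""
    cid_list = ",".join(str(c) for c in cids)
    return f"{BASE}/compound/cid/{cid_list}/property/{PROPERTIES}/JSON"


def parse_property_table(json_text):
    """Turn a property-table JSON document into a {CID: properties} dict."""
    rows = json.loads(json_text)["PropertyTable"]["Properties"]
    return {row["CID"]: row for row in rows}


def fetch_properties(cids):
    """Perform the request (needs network access; not called in this sketch)."""
    with urllib.request.urlopen(property_url(cids)) as resp:
        return parse_property_table(resp.read().decode("utf-8"))


# Hand-written sample payload with illustrative values.
sample = (
    '{"PropertyTable": {"Properties": ['
    '{"CID": 2244, "MolecularWeight": "180.16", "XLogP": 1.2,'
    ' "HBondDonorCount": 1, "HBondAcceptorCount": 4}]}}'
)
print(property_url([2244, 1983]))
print(parse_property_table(sample)[2244]["HBondAcceptorCount"])
```

Keeping URL construction and response parsing separate from the network call makes both easy to check offline before the script is run against the PubChem servers.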
Conceptual framework of a PUG-REST request
• ① INPUT: identifiers (CIDs, SIDs, AIDs), sent from the user’s computer to the PubChem servers.
• ② OPERATION: carried out on the PubChem servers with the identifiers.
• ③ OUTPUT: results returned to the user’s computer in a desired format.
• All necessary information is encoded into a one-line URL.
URL construction for a PUG-REST request
http://pubchem.ncbi.nlm.nih.gov/rest/pug/<INPUT>/<OPERATION>/<OUTPUT>[?OPTIONS]
• Prolog (http://pubchem.ncbi.nlm.nih.gov/rest/pug): common to all PUG-REST requests.
• <INPUT>: specifies identifiers of interest (by identifiers, by chemical name, by chemical structure search, by cross reference, by listkey, ......).
• <OPERATION>: specifies what to do with the input (get full records, get molecular properties, get synonyms or images, get cross references, many other operations).
• <OUTPUT>: specifies the desired output format (XML, JSON, JSONP, ASNB, ASNT, PNG, SDF, CSV, TXT).
• [?OPTIONS]: options specific to some operations.
Example:
http://...... /compound/cid/2244,1983/record/XML?record_type=3d
• Retrieves, in XML, full records for CIDs 2244 and 1983, including the 3-D structure description.
• All necessary information is encoded into a one-line URL.
Request volume limitations
• PUG-REST is NOT designed for very large volumes of requests (e.g., millions of requests).
• A script or application should not make more than five requests per second, to avoid overloading the PubChem servers.
• If you have a large data set to process, please contact us for help on optimizing your task.
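One simple way to respect the five-requests-per-second limit is to space consecutive requests at least 0.2 seconds apart. The sketch below is one possible approach, not a PubChem-provided client; the `Throttle` class and `fetch_all` function are hypothetical names.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, throttle):
    """Fetch each URL in turn, pausing as needed (requires network access)."""
    results = []
    for url in urls:
        throttle.wait()
        with urllib.request.urlopen(url) as resp:
            results.append(resp.read())
    return results
```

The clock and sleep functions are injectable so the pacing logic can be checked without real delays; in normal use, `Throttle()` with the defaults is enough.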
PubChemRDF
• Encodes PubChem information using RDF (Resource Description Framework).
• Helps researchers work with PubChem data on local computing resources using semantic web technologies.
• Harnesses ontological frameworks to help facilitate PubChem data sharing, analysis, and integration with resources external to the National Center for Biotechnology Information (NCBI) and across scientific domains.
PubChemRDF for data exchange and integration
• PubChemRDF data downloaded from the FTP site can be loaded into:
o a triple store (Apache Jena TDB, OpenLink Virtuoso, ……) and queried through a SPARQL query interface;
o an RDF-aware graph database (Neo4j, …) and explored with graph traversal algorithms.
• Programmatic access to PubChemRDF data is also available through a RESTful interface.
(RDF: Resource Description Framework)
Summary
• PubChem is (arguably) the largest source of publicly available chemical information, collected from more than 600 data sources.
• In addition to bioactivity data generated through high-throughput screening, PubChem contains a substantial amount of bioactivity information extracted from scientific articles.
• Chemical vendor and patent information for compounds in PubChem helps prioritize hit compounds for further screening.
• PubChem supports programmatic access to its data, allowing users to build an automated virtual screening pipeline.
• PubChemRDF allows users to download PubChem data to a local computing facility and integrate them with their own data.
• PubChem data can be used for developing computational prediction models for the bioactivity or toxicity of molecules.
References
• Getting the most out of PubChem for virtual screening. Expert Opin. Drug Discov. 2016, 11(9), 843-855.
• PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019, 47(D1), D1102-D1109.
• PubChem Substance and Compound databases. Nucleic Acids Res. 2016, 44(D1), D1202-D1213.
• An update on PUG-REST: RESTful interface for programmatic access to PubChem. Nucleic Acids Res. 2018, 46(W1), W563-W570.
• PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem. Nucleic Acids Res. 2015, 43(W1), W605-W611.
Acknowledgements
• The PubChem Team: Evan Bolton, Jie Chen, Tiejun Chung, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Ben Shoemaker, Paul Thiessen, Bo Yu, Leonid Zaslavsky, Jian Zhang
• Funding from the National Library of Medicine
• PubChem data contributors and users
Thank you for your attention.
Questions?
Email: sunghwan.kim@nih.gov
kimsungh@ncbi.nlm.nih.gov

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 

Kürzlich hochgeladen (20)

STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 

PubChem and Its Applications for Drug Discovery

  • 2. Lehigh Univ., Apr. 26, 2019 URL: https://pubchem.ncbi.nlm.nih.gov
  • 3. Introduction to PubChem
  • 4. PubChem (https://pubchem.ncbi.nlm.nih.gov): A “public” repository of information on small molecules and their biological activities, developed and maintained by the U.S. National Institutes of Health (NIH). Launched in 2004 as part of the Molecular Libraries Roadmap initiative. A key resource of chemical information for researchers in cheminformatics, chemical biology, medicinal chemistry, and many other fields.
  • 5. PubChem Usage Statistics. [Figure: number of monthly unique users (millions) by month, Jan 2015 - Mar 2019, interactive users only.] 3.5 million unique users per month at peak (October 2018).
  • 6. Top 5 Chemistry Websites: (1) acs.org; (2) rsc.org; (3) sigmaaldrich.com; (4) pubchem.ncbi.nlm.nih.gov; (5) cas.org. Source: https://www.alexa.com/topsites/category/Top/Science/Chemistry. PubChem is the only public website among them.
  • 7. Dual role of PubChem (>600 data sources → PubChem → the public). Data archive/repository: collects and maintains chemical information submitted by data contributors. Knowledgebase: provides (high-quality) data to the public.
  • 8. Data organization in PubChem. Data contributors make substance depositions (depositor-provided substance descriptions) and assay depositions (depositor-provided bioactivity test results). Unique chemical structures are extracted from the substance descriptions through standardization. Assay depositions give the activity of tested “substances”; the activity of “compounds” is derived from the associated “substances”.
  • 9. PubChem (https://pubchem.ncbi.nlm.nih.gov) contains: >243.9 million substance descriptions; >97.6 million unique chemical structures; >264.8 million biological activity test results; >1.3 million biological assays, covering >10,000 unique protein sequence targets. (Arguably) the largest corpus of publicly available chemical information, from 600+ data sources (as of April 24, 2016).
  • 10. Exploring PubChem using the web interfaces
  • 11. Note: (Almost) all tasks you can do using PubChem’s web interfaces can be automated using its programmatic interfaces.
  • 12. PubChem Web Interfaces: Text Search; Structure Search; ID List Upload; Classification Browser; PubChem Docs
  • 13. URL: https://pubchem.ncbi.nlm.nih.gov
  • 32. Structure download; refine the result
  • 36. PubChem for Drug Discovery
  • 37. PubChem’s chemical space. Lead compounds are modified into drug candidates.
      • Congreve’s rule of 3 (lead-likeness): Mol. Wt. < 300; H-Bond Donor ≤ 3; H-Bond Acceptor ≤ 3; LogP ≤ 3; Rotatable Bond ≤ 3; PSA ≤ 60
      • Lipinski’s rule of 5 (drug-likeness): Mol. Wt. < 500; H-Bond Donor ≤ 5; H-Bond Acceptor ≤ 10; LogP ≤ 5
  • 38. PubChem’s chemical space: all compounds, 97.6 million (100%); drug-like, 73.3 million (75%); lead-like, 11.2 million (11%).
  • 39. PubChem’s chemical space: Ro5, 73.3 million (75%); Ro5-1, 15.0 million (15%); Ro5-2, 6.5 million (7%); Ro5-3, 1.3 million (1%); Ro5-4, 0.3 million (~0%). Ro5 + Ro5-1 = 90%.
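
The rule-of-5 and rule-of-3 thresholds listed on the chemical-space slides can be expressed as a short filter. The following is a minimal sketch, not PubChem's own implementation: the `Descriptors` class, the function names, and the example descriptor values are all hypothetical and chosen only for illustration. It assumes the descriptors have already been computed or downloaded elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_wt: float      # molecular weight
    hbd: int           # hydrogen-bond donor count
    hba: int           # hydrogen-bond acceptor count
    logp: float        # LogP
    rot_bonds: int     # rotatable bond count
    psa: float         # polar surface area


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski criteria are violated."""
    return sum([
        d.mol_wt >= 500,   # rule: Mol. Wt. < 500
        d.hbd > 5,         # rule: H-bond donors <= 5
        d.hba > 10,        # rule: H-bond acceptors <= 10
        d.logp > 5,        # rule: LogP <= 5
    ])


def is_drug_like(d: Descriptors) -> bool:
    """Drug-like here means zero rule-of-5 violations."""
    return ro5_violations(d) == 0


def is_lead_like(d: Descriptors) -> bool:
    """Lead-like here means all six rule-of-3 criteria are satisfied."""
    return (
        d.mol_wt < 300
        and d.hbd <= 3
        and d.hba <= 3
        and d.logp <= 3
        and d.rot_bonds <= 3
        and d.psa <= 60
    )


# Illustrative input values only (not retrieved from PubChem).
example = Descriptors(mol_wt=180.2, hbd=1, hba=4, logp=1.2, rot_bonds=3, psa=63.6)
print(ro5_violations(example), is_drug_like(example), is_lead_like(example))
```

Counting violations rather than returning a single pass/fail flag makes it possible to bin compounds the way slide 39 does (Ro5, Ro5-1, Ro5-2, and so on).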
  • 40. Bioactivity Data in PubChem: all compounds, 97.6 million (100.00%); not tested, 94.2 million (96.51%); tested, 3.4 million (3.50%). Among tested compounds: active (AC ≤ 1 nM), 62 thousand (0.06%); active (1 nM < AC ≤ 1 µM), 713 thousand (0.73%); active (others), 465 thousand (0.47%); inactive, 2.1 million (1.34%).
  • 43. Bioactivity Data in PubChem
      • High-throughput screening data: from the Molecular Libraries Program and other HTS projects; many inactives; false hits (e.g., aggregators, autofluorescent compounds); often measured at a single concentration.
      • Literature-extracted data: from manual curation or data mining; no (or few) inactives; provided by various PubChem depositors, including ChEMBL, PDBbind, BindingDB, and Guide to Pharmacology.
  • 44. Literature-extracted bioactivity data
      • ChEMBL: manual extraction from peer-reviewed papers in medicinal chemistry and natural product journals.
      • PDBbind: experimental binding affinity data for biomolecular complexes in the PDB.
      • BindingDB: binding affinities, focusing chiefly on the interactions of proteins considered to be drug targets with drug-like small molecules.
      • Guide to Pharmacology: drug targets (G-protein-coupled receptors, ion channels, and nuclear hormone receptors) and their ligands.
  • 45. Annotations available in PubChem. DrugBank: comprehensive information on FDA-approved and investigational drugs (drug indications, mechanism of action, target macromolecules, interactions with genes/proteins, ADMET, ...). Hazardous Substances Data Bank (HSDB): toxicological information on chemicals of interest in environmental and human health.
  • 46. Annotations available in PubChem. Molecular Modeling Database (MMDB): protein-bound 3-D structures (found in PDB). Cambridge Structural Database (CSD): 3-D crystal structures. NLM’s DailyMed: drug labeling information.
  • 47. Annotations available in PubChem. FDA: Orange Book; unique ingredient identifiers; pharmacologic classes. EPA: Substance Registry Services; chemical data collected under the Toxic Substances Control Act and the Clean Air Act.
  • 48. Availability of compounds for subsequent experiments. Virtual screening hits should be synthesizable or purchasable. PubChem contains “real” molecules (not “virtual” molecules): at least one data contributor claims to have the compound and/or information about it. Some of these data contributors are chemical vendors (e.g., Sigma-Aldrich).
  • 50. Two important aspects of PubChem records (in the context of “compound availability”)
      • Non-live compounds: not searchable although they exist. They have no associated substances because of mistakenly submitted substances, incorrect information, or no intention to share.
      • Legacy designation: the data contributor no longer keeps its records up to date (e.g., discontinued funding, low business priority).
  • 51. Compound patentability for IP protection. PubChem provides patent link information for compounds, provided by IBM, SureChEMBL (formerly SureChem), NextMove Software, SCRIPDB, and BindingDB.
  • 52. Compound patentability for IP protection: >336 million chemical-patent links; >6 million patent documents; >16 million unique chemical structures. Covers patent documents published by the U.S. Patent and Trademark Office (PTO), the European Patent Office (EPO), and the World Intellectual Property Organization (WIPO). Compounds with patent links are annotated with WIPO International Patent Classification (IPC) information.
  • 53. Programmatic access to PubChem for automation of a virtual screening (VS) pipeline: Entrez Utilities (E-Utils); Power User Gateway (PUG); PUG-SOAP; PUG-REST; PubChem RDF REST; PUG-View.
  • 54. Conceptual framework of a PUG-REST request (between the user’s computer and the PubChem servers): ① INPUT: identifiers (CIDs, SIDs, AIDs); ② OPERATION with the identifiers; ③ OUTPUT: results in a desired format. All necessary information is encoded into a one-line URL.
  • 55. URL construction for a PUG-REST request: http://pubchem.ncbi.nlm.nih.gov/rest/pug/<INPUT>/<OPERATION>/<OUTPUT>[?OPTIONS]
      • Prolog (common to all PUG-REST requests): http://pubchem.ncbi.nlm.nih.gov/rest/pug
      • <INPUT> specifies the identifiers of interest: by identifier, by chemical name, by chemical structure search, by cross-reference, by listkey, ...
      • <OPERATION> specifies what to do with the input: get full records, get molecular properties, get synonyms or images, get cross-references, and many other operations.
      • <OUTPUT> specifies the desired output format: XML, JSON, JSONP, ASNB, ASNT, PNG, SDF, CSV, TXT.
      • [?OPTIONS]: options specific to some operations.
      • Example: http://...... /compound/cid/2244,1983/record/XML?record_type=3d retrieves, in XML, full records for CIDs 2244 and 1983, including the 3-D structure description.
      • All necessary information is encoded into a one-line URL.
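
The URL pattern on slide 55 can be assembled programmatically. The sketch below only builds the request string and does not contact PubChem. The helper names `pug_rest_url` and `cid_input` are hypothetical, and the `https://` scheme is an assumption (the slide shows `http://`); the path segments and the `record_type=3d` option are taken from the slide's own example.

```python
from urllib.parse import urlencode

# Prolog common to all PUG-REST requests (https scheme assumed here).
PROLOG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_input(cids):
    """Build the <INPUT> part for a list of compound identifiers (CIDs)."""
    return "compound/cid/" + ",".join(str(c) for c in cids)


def pug_rest_url(input_spec, operation, output, options=None):
    """Join prolog, <INPUT>, <OPERATION>, <OUTPUT> and optional ?OPTIONS."""
    url = "/".join([PROLOG, input_spec.strip("/"), operation.strip("/"), output])
    if options:
        url += "?" + urlencode(options)
    return url


# The slide's example: full records for CIDs 2244 and 1983 in XML,
# including the 3-D structure description.
url = pug_rest_url(cid_input([2244, 1983]), "record", "XML", {"record_type": "3d"})
print(url)
```

Keeping the three URL parts as separate arguments mirrors the INPUT / OPERATION / OUTPUT framing of the slides and makes it easy to swap only the output format or the operation.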
  • 56. Request volume limitations. PUG-REST is NOT designed for very large volumes of requests (e.g., millions of requests). No script or application should make more than five requests per second, to avoid overloading the PubChem servers. If you have a large data set to process, please contact us for help on optimizing your task.
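
One way a script could stay under the five-requests-per-second limit is to space its calls at least 0.2 s apart. The following is a minimal sketch under that assumption. The `Throttle` class and `fetch_all` function are hypothetical names, not part of any PubChem client library, and even spacing is a deliberately conservative reading of the limit.

```python
import time
from urllib.request import urlopen


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, throttle):
    """Fetch each URL in turn, pausing as needed between requests."""
    results = []
    for url in urls:
        throttle.wait()
        with urlopen(url) as response:
            results.append(response.read())
    return results


# Usage (not executed here because it needs network access):
# bodies = fetch_all(list_of_pug_rest_urls, Throttle(max_per_second=5))
```

For data sets large enough that even a throttled loop would take days, the slide's advice to contact the PubChem team still applies.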
  • 57. PubChemRDF (RDF = Resource Description Framework). Encodes PubChem information using RDF. Helps researchers work with PubChem data on local computing resources using semantic web technologies. Harnesses ontological frameworks to facilitate PubChem data sharing, analysis, and integration with resources external to the National Center for Biotechnology Information (NCBI) and across scientific domains.
  • 58. PubChemRDF for data exchange and integration. PubChemRDF data from FTP can be loaded into a triple store (Apache Jena TDB, OpenLink Virtuoso, ...) and queried through a SPARQL query interface, or into an RDF-aware graph database (Neo4j, ...) and explored with graph traversal algorithms. Programmatic access to PubChemRDF data is also available through a RESTful interface.
  • 59. Summary
  • 60. Summary. PubChem is the largest source of publicly available chemical information, collected from more than 600 data sources. In addition to bioactivity data generated through high-throughput screening, PubChem contains a substantial amount of bioactivity information extracted from scientific articles. Chemical vendor and patent information for compounds in PubChem helps prioritize hit compounds for further screening.
  • 61. Summary. PubChem supports programmatic access to its data, which allows an automated virtual screening pipeline to be built. PubChemRDF allows users to download PubChem data to a local computing facility and integrate them with their own data. PubChem data can be used to develop computational prediction models for the bioactivity or toxicity of molecules.
  • 62. References
      • Getting the most out of PubChem for virtual screening. Expert Opin. Drug Discov. (2016) 11(9): 843-855.
      • PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. (2019) 47(D1): D1102-D1109.
      • PubChem Substance and Compound Databases. Nucleic Acids Res. (2016) 44(D1): D1202-D1213.
      • An update on PUG-REST: RESTful interface for programmatic access to PubChem. Nucleic Acids Res. (2017) 46(W1): W563-W570.
      • PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem. Nucleic Acids Res. (2015) 43(W1): W605-W611.
  • 63. Acknowledgements. The PubChem Team: Evan Bolton, Jia He, Paul Thiessen, Jie Chen, Siqian He, Bo Yu, Tiejun Chung, Qingliang Li, Leonid Zaslavsky, Asta Gindulyte, Ben Shoemaker, Jian Zhang. Funding from the National Library of Medicine. PubChem data contributors and users.
  • 64. Thank you for your attention. Questions? Email: sunghwan.kim@nih.gov, kimsungh@ncbi.nlm.nih.gov