Using Chemicalize.org with Other Open
Resources to Extract SAR from Patents and
     Explore Intersects in PubChem




                     Christopher Southan

             ChrisDS Consulting, Göteborg, Sweden,

    Prepared for the ChemAxon UGM, May 2012, version 2nd May




                                                               [1]
Key Relationships in Patents and Papers
                                                                            MAQALPWLLLWMGAGVLPAHGTQHGIRLPLRSGLGGA
                                                                            PLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQGY
                                                                            YVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQ
                                                                            RQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGP
                                                                            NVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDD
                                                                            SLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASV
                                                                            GGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQ
                                                                            DLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKA
                                                                            ASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLM
                                                                            GEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSS
                                                                            TGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRT
                                                                            AAVEGPFVTLDMEDCGYNIPQTDESTLMTIAYVMAAICAL
                                                                            FMLPLCLMVCQWRCLRCLRQQHDDFADDISLLK




     Document            Assay            Result             Compound             Target


   2011 http://www.ncbi.nlm.nih.gov/pubmed/21569515
   2010 http://www.citeulike.org/user/cdsouthan/article/8637426
   2012 http://www.slideshare.net/cdsouthan/southan-bio-it2012patents

Discerning and mapping these relationships from documents is crucial and demanding.
Chemicalize.org is a significant advance in open chemistry extraction.
                                                                                                             [2]
Practical Utilities

• Name-to-struc (n>s) for selected or batch conversions from
  patents, papers, abstracts, web pages and other sources
• Intersect different content at identity or similarity level
• Molecular properties and bulk download
• Extracted structures archived, searchable and sharable
• Similarity display of analogue series from a document
• Bulk upload to PubChem for intersects and triage
• Result display in JChem for Excel
• Can iterate with OPSIN for IUPAC fixes
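The identity-level intersect listed above can be illustrated with a minimal sketch. It assumes each source has already been reduced to a set of standard InChIKeys, for example taken from downloaded SDFs. The KEY1 to KEY4 strings are hypothetical placeholders, not real keys. Similarity-level intersects need fingerprints and are not covered here.

```python
from typing import Dict, Set


def intersect_sources(a: Set[str], b: Set[str]) -> Dict[str, Set[str]]:
    """Partition two sets of structure identifiers (e.g. InChIKeys) into
    shared, only-in-a and only-in-b: an identity-level intersect."""
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
    }


# Hypothetical identifiers standing in for InChIKeys from two sources
patent_keys = {"KEY1", "KEY2", "KEY3"}
paper_keys = {"KEY2", "KEY3", "KEY4"}

for label, keys in intersect_sources(patent_keys, paper_keys).items():
    print(label, sorted(keys))
```

Plain set algebra is enough at this level because identical structures map to identical keys. The hard part is the upstream name-to-structure conversion.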
                                                                [3]
Chemicalize.org Exploitation Challenges

• Specific retrieval of the relevant patents or other sources (e.g. target recall)
• Working across different sources (e.g. CiteXplore/espace/SciBite for retrieval,
  Google for cross-checks, WIPO for images and tables,
  FreePatentsOnline for deeper queries)
• Eyeballing original documents for relevant sections
• Locating exemplified drug-relevant/lead-like structures with data links
• For many patents, examples >> examples with activity data > potent structures
• Selecting the best sources/family members for optimal IUPAC extraction
  quality (e.g. US patents and FPO)
• Filtering novel structures from common chemistry
• Need to be PubChem-cognizant for effective triage
• For a variety of reasons, some documents have low extraction rates
• Tricks and workarounds enhance exploitation


                                                                             [4]
Target Recall: CiteXplore




•   Title only "DPPIV": Medline = 37, Patents = 31
•   Title + abstract "DPPIV": Medline = 402, Patents = 144
•   Title + abstract "dipeptidyl peptidase": Medline = 4,838, Patents = 1,520
•   Title + abstract "inhibitor": Medline = 772,053, Patents = 124,516
•   Title + abstract "diabetes": Medline = 431,299, Patents = 36,792
•   Title + abstract "DPPIV OR dipeptidyl peptidase AND inhibitor AND diabetes":
    Medline = 1,105, Patents = 604

CiteXplore is restricted to EBI patent abstracts, so higher recall can be obtained from
full-text sources such as SureChemOpen, EPO/espace, WIPO and FPO
(although these cannot search Medline in parallel)

                                                                                   [5]
Target Alerts: SciBite




                      US2012040982
                          DPPIV
                    Boehringer Ingelheim
                         Feb 2012
                                          [6]
Slicing and Dicing US2012040982 (I)




•   Chemicalize converted 1,390 structures from the FreePatentsOnline (FPO) URL
•   Of the 497 examples, 486 were converted
•   Lead-like structures have to be spotted by scanning the document and iterating with the scroll bar
                                                                                         [7]
Slicing and Dicing US2012040982 (II)




• OPSIN picks up some of what Chemicalize misses (e.g. 389 above), but not all
• OPSIN error reports may help fix a series for Chemicalize (e.g. 1 vs. L)
• This matters most in practice when the affected example has potent activity
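The "1 vs. L" problem is a digit 1 that was rendered as a letter in the source text. A minimal sketch of one way to patch such a series before resubmitting it is shown below. It is a hypothetical heuristic, not part of OPSIN or Chemicalize. It assumes the misread character is a lowercase "l" standing where a locant is expected, at the start of the name or between hyphens, commas or brackets. Legacy prefixes such as "l-" for levorotatory would also be rewritten, so the output still needs review.

```python
import re

# Hypothetical heuristic: a lowercase "l" at the start of the name or right after
# "-", ",", "[" or "(", and followed by "-", ",", "]", ")" or a prime, is
# treated as a misread locant digit "1".
_LOCANT_L = re.compile(r"(?:^|(?<=[-,\[(]))l(?=[-,\])'])")


def fix_one_vs_l(name: str) -> str:
    """Return the name with suspected l-for-1 locant misreads replaced by 1."""
    return _LOCANT_L.sub("1", name)


print(fix_one_vs_l("8-(3-amino-piperidin-l-yl)-xanthine"))
print(fix_one_vs_l("l,3-dimethyl-7H-purine"))
```

Both example names are toy inputs. Letters inside words, such as the "l" in "methyl", are left alone because they are preceded by another letter.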

                                                                            [8]
Slicing and Dicing US2012040982 (III)




•   Similarity display clearly picks out the lead-like analog series (top)
•   Select via FPO text > example list only > Word > PDF > chemicalize
    upload > SDF download of 486 structures (bottom)
•   However, judging from the partial descriptions, these may include prophetic examples
•   The 28 claimed examples were also downloaded via PDF                     [9]
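The slides do not document how Chemicalize's similarity display works internally. As a generic illustration of how an analog series can be picked out of a document's structures, here is a minimal leader-clustering sketch on Tanimoto similarity. It assumes fingerprints are already available as sets of on-bit indices; generating real fingerprints would need a cheminformatics toolkit, which is not shown. The bit sets below are toy values.

```python
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient on fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fps: List[FrozenSet[int]], threshold: float) -> List[List[int]]:
    """Greedy leader clustering: each fingerprint joins the first cluster whose
    leader it matches at >= threshold, otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for i, fp in enumerate(fps):
        for members in clusters:
            if tanimoto(fps[members[0]], fp) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy bit sets standing in for real structural fingerprints
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({1, 2, 3, 4, 6}),
]
print(leader_cluster(fps, threshold=0.6))  # [[0, 1, 3], [2]]
```

Leader clustering is order-dependent. It is used here only because it is short; other clustering schemes would serve the same purpose.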
Slicing and Dicing US2012040982 (IV)




•   An SAR table with 11 point IC50s can be located
•   But only 9 examples are below 100 nM; example 25 is 56 nM
•   The designation of series 1 and 2 obfuscates their example identity
                                                                          [10]
PubChem Triage of Chemicalize Output (I)




•   Example 25 SMILES > neither an exact match nor tautomer – thus novel
•   Repeat search at 95% Tanimoto > 289 neighbors > cluster
•   Closest PubChem analog > ChemSpider > SureChem > Novo Nordisk DPPIV
    patent from 2005                                                     [11]
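The exact-match and 95% Tanimoto searches above were run through the PubChem interface. The same two queries can be expressed as PubChem PUG-REST requests. Below is a minimal sketch that only builds the URLs, assuming the `fastidentity` and `fastsimilarity_2d` operations with the `Threshold` option. The SMILES is a toy value. For SMILES containing characters such as "/" or "#", PubChem also accepts the structure as a POST body, which may be more robust than encoding it into the path.

```python
from urllib.parse import quote

PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def identity_url(smiles: str) -> str:
    """PUG-REST URL for an identity (exact-match) search returning CIDs."""
    return f"{PUG}/compound/fastidentity/smiles/{quote(smiles, safe='')}/cids/JSON"


def similarity_url(smiles: str, threshold: int = 95) -> str:
    """PUG-REST URL for a 2D Tanimoto similarity search at `threshold` percent."""
    return (f"{PUG}/compound/fastsimilarity_2d/smiles/{quote(smiles, safe='')}"
            f"/cids/JSON?Threshold={threshold}")


smiles = "CC#CC"  # toy SMILES; "#" must be percent-encoded in a URL path
print(identity_url(smiles))
print(similarity_url(smiles))
```

The URLs can be fetched with `urllib.request.urlopen`. An empty identity result followed by a non-empty similarity result corresponds to the "novel, but with neighbours" outcome described for example 25.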
PubChem Triage of Chemicalize Output (II)




•   Total extraction from US2012040982 > 1,390 SDFs > 1387 uploaded > 7 “failed”
•   493 exact matches (= preexisting PubChem CIDs)
•   486 example-only SDFs > upload > 21 exact-match CIDs
•   34 claims-only give 9 exact-match CIDs, primary sources were:
•   5 from ChEMBL, from a 2007 Boehringer Ingelheim publication
•   7 from Thomson Pharma
•   2 from ChemSpider, with SureChem links to Boehringer Ingelheim patents
•   Thus 461 examples chemicalized from US2012040982 are “novel” structures
•   However, enantiomeric or tautomeric inexact matches cannot be checked from the
    PubChem interface (only for existing CIDs)
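One offline workaround for the stereo part of this gap is to compare InChIKey first blocks. This is a hedged suggestion, not a PubChem feature. The first 14-character block of a standard InChIKey hashes the connectivity layer, while stereo and isotopic layers go into the second block. As a result, stereoisomers share a first block. Tautomers are only partly covered, depending on how standard InChI treats their mobile hydrogens. The sketch assumes InChIKeys are available for both the extracted set and the reference set. The keys are synthetic placeholders built in code.

```python
from typing import Dict, Iterable, List


def skeleton(inchikey: str) -> str:
    """First 14-character block of an InChIKey (hash of the connectivity layer)."""
    return inchikey.split("-")[0]


def triage(query_keys: Iterable[str], reference_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Classify query InChIKeys as exact matches, skeleton-only matches
    (e.g. candidate stereoisomers) or novel relative to a reference set."""
    ref_full = set(reference_keys)
    ref_skel = {skeleton(k) for k in ref_full}
    out: Dict[str, List[str]] = {"exact": [], "skeleton_only": [], "novel": []}
    for key in query_keys:
        if key in ref_full:
            out["exact"].append(key)
        elif skeleton(key) in ref_skel:
            out["skeleton_only"].append(key)
        else:
            out["novel"].append(key)
    return out


# Synthetic placeholder keys in InChIKey layout (14 + 10 + 1 characters)
a1 = "A" * 14 + "-XXXXXXXXSA-N"
b1 = "B" * 14 + "-XXXXXXXXSA-N"
b2 = "B" * 14 + "-YYYYYYYYSA-N"  # same skeleton as b1, different second block
c1 = "C" * 14 + "-XXXXXXXXSA-N"
print(triage([a1, b2, c1], [a1, b1]))
```

Keys in the `skeleton_only` bin are the ones worth inspecting by hand before calling a structure novel.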
                                                                              [12]
PubChem Triage of Chemicalize Output (III)




•   Chemicalize examples-plus-claims from US2012040982 = 29 CIDs (search 36 above)
•   The Thomson Pharma/DiscoveryGate intersect approximates Derwent WPI (search 31)
•   This matched 20 from the 29 (search 36), presumably DWPI extractions
•   ChEMBL (7) matched 6 from 29 (i.e. extracted from papers)
•   SLING matched 8 from 29 (i.e. extracted from EPO patents)
•   It was thus possible to intersect the chemicalize extractions from this patent with
    four independent primary sources in PubChem from patents and publications
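Each source intersect is counted independently. Because sources overlap, the per-source counts need not add up to the total: 20 + 6 + 8 = 34 matches from only 29 CIDs. Below is a minimal sketch of the same bookkeeping, assuming the CID lists for the query and each depositing source have already been downloaded. All CIDs are hypothetical.

```python
from typing import Dict, Set


def source_overlap(query: Set[int], sources: Dict[str, Set[int]]) -> Dict[str, int]:
    """Count how many of the query CIDs each depositing source also holds."""
    return {name: len(query & cids) for name, cids in sources.items()}


def uncovered(query: Set[int], sources: Dict[str, Set[int]]) -> Set[int]:
    """Query CIDs that none of the listed sources hold."""
    return query - set().union(*sources.values())


# Hypothetical CIDs; real lists would come from PubChem source/depositor records
query_cids = {101, 102, 103, 104, 105, 106}
sources = {
    "Thomson Pharma": {101, 102, 103, 900},
    "ChEMBL": {103, 104},
    "SLING": {105, 901},
}
print(source_overlap(query_cids, sources))  # {'Thomson Pharma': 3, 'ChEMBL': 2, 'SLING': 1}
print(uncovered(query_cids, sources))       # {106}
```

CID 103 is counted for two sources here, which mirrors why the slide's per-source matches exceed 29.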

                                                                                          [13]
Patent "Walking" from Chemicalize similarity results (I)




•   The similarity results from one example gave 1,734 matches down to Tanimoto 0.5,
    extending "beyond" the example space of US2012040982.
•   Scrolling through these shows that hits at Tanimoto 0.6 (shared substructures in blue)
    connect to a different, older DPPIV patent from Eisai, US7772226
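The "walking" step can be sketched as follows: keep similarity hits above a cutoff and group them by the document they came from, so that any document other than the query's own becomes a candidate for the next hop. The sketch assumes an archive of (structure id, document id, fingerprint) records with fingerprints as sets of on-bits. The document ids DOC-A to DOC-C and all bit sets are hypothetical. Nothing here reflects how Chemicalize stores its archive.

```python
from typing import Dict, FrozenSet, List, Tuple

# (structure id, source document id, fingerprint as a set of on-bit indices)
Entry = Tuple[str, str, FrozenSet[int]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |a & b| / |a | b| on on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def walk(query: FrozenSet[int], home_doc: str,
         archive: List[Entry], cutoff: float) -> Dict[str, List[str]]:
    """Group archive hits at or above `cutoff` by source document,
    skipping the document the query structure came from."""
    reached: Dict[str, List[str]] = {}
    for sid, doc, fp in archive:
        if doc != home_doc and tanimoto(query, fp) >= cutoff:
            reached.setdefault(doc, []).append(sid)
    return reached


archive: List[Entry] = [
    ("s1", "DOC-A", frozenset({1, 2, 3, 4, 5})),
    ("s2", "DOC-B", frozenset({1, 2, 3, 4, 6})),
    ("s3", "DOC-B", frozenset({1, 2, 3, 7})),
    ("s4", "DOC-C", frozenset({1, 2, 8, 9})),
]
query = frozenset({1, 2, 3, 4, 5})
print(walk(query, "DOC-A", archive, cutoff=0.6))  # {'DOC-B': ['s2']}
print(walk(query, "DOC-A", archive, cutoff=0.5))  # {'DOC-B': ['s2', 's3']}
```

Lowering the cutoff widens the set of reachable documents. This is the trade-off between the 0.6 and 0.5 levels mentioned above.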

                                                                                 [14]
Patent "Walking" from Chemicalize similarity results (II)




•   US7772226 from FPO converted 1127 (i.e. more than the 992 from PatBase)
•   680 matched PubChem CIDs
•   Example 228 CC#CCN1C(=NC2=C1C(=O)NC(OC1=CC=C(C=C1)C(=O)OC(=O)C(F)(F)F)=N2)N1CCNCC1
    had a 12 nM IC50 for DPPIV
•   Can even "walk" to a third DPPIV patent, WO2007071738 from Novartis
                                                                                         [15]
Extracting from CiteXplore ChEMBL

                         • CiteXplore lists ChEMBL
                           IUPACs and IDs

                         • Can chemicalize all
                           ChEMBL structures from
                           one paper

                         • Difficult to ID these in
                           ChEMBL

                         • Upload 8 structures to
                           PubChem

                         • 7 match ChEMBL IDs

                         • Only one matches the 29
                           from US2012040982

                         • Thus the paper's compounds
                           probably span multiple patents
                                                      [16]
Mining PubMed Central Full-text Papers (I)




•    Only a few examples converted directly
•    So > wordpad > direct chemicalize (iterate) > web page (Google sites)
•    Download > Upload to JChem for Excel
•    Add in IC50 values from paper
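The slides do this join in JChem for Excel. As a tool-neutral alternative, which is an assumption and not the author's workflow, the same step can be sketched with Python's `csv` module. The sketch attaches IC50 values typed in from the paper to the downloaded structure table by example number and then lists examples below 100 nM. The column names and toy SMILES are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Hypothetical exports: structures keyed by example number, IC50s typed in from the paper
STRUCTURES_CSV = """example,smiles
1,CCO
2,CCN
3,CCC
"""
IC50_CSV = """example,ic50_nM
1,250
3,40
"""


def merge_ic50(structures_csv: str, ic50_csv: str) -> List[Dict[str, str]]:
    """Left-join IC50 values onto structure rows by example number."""
    ic50 = {r["example"]: r["ic50_nM"] for r in csv.DictReader(io.StringIO(ic50_csv))}
    rows = []
    for row in csv.DictReader(io.StringIO(structures_csv)):
        row["ic50_nM"] = ic50.get(row["example"], "")  # blank if no value reported
        rows.append(row)
    return rows


def potent(rows: List[Dict[str, str]], cutoff_nm: float = 100.0) -> List[str]:
    """Example numbers with a reported IC50 below cutoff_nm."""
    return [r["example"] for r in rows if r["ic50_nM"] and float(r["ic50_nM"]) < cutoff_nm]


rows = merge_ic50(STRUCTURES_CSV, IC50_CSV)
print(potent(rows))  # ['3']
```

Keying on example number only works when the paper's compound labels match the extracted names. The series 1 and 2 labels in US2012040982 show that this is not guaranteed.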

                                                                             [17]
Mining PubMed Central Full-text Papers (II)




                         •   Add the SAR data from
                             the paper into the
                             structure table

                         •   These had no exact
                             matches in PubChem




                                                     [18]
Chemicalizing the DrugBank Entry for DPPIV


                             41 conversions of
                             inhibitors, many of them PDB
                             ligands




                                                       [19]
Can Even Extract Catalogues that have no
          SMILES or InChIs....




                                   Tocris DPPIV
                                   inhibitor >
                                   chemicalize >

                                   PubChem > 6
                                   analogs




                                               [20]
Conclusions
•   Chemicalize.org is powerful, flexible and free, as in beer....
•   Significantly enables small-scale roll-your-own patent mining
•   Ditto for journal article/abstract mining (e.g. for papers not captured in ChEMBL)
•   You still need perspicacity to discern SAR details
•   Complementary to commercial patent databases populated by manual extraction
    (e.g. you can extract more structures)
•   Commercial automated patent extraction databases typically combine ChemAxon
    n>s with other algorithms (e.g. http://www.chemaxon.com/library/benchmarking-chemaxon%E2%80%99s-name-to-structure-batch-tool-on-patent-text/)
•   While these thus outperform chemicalize, it remains very useful for intersecting
    journal articles or other sources against any database
•   Significant novel content (w.r.t. public databases) is accumulating via "default
    crowdsourcing" in the chemicalize archive, which becomes an important cross-check
    source and can be "walked" between documents
•   Combined with OPSIN and OSRA, structures can be extracted from most sources
•   Synergies with sources such as PubChem, PubMed Central, ChEMBL and
    SureChemOpen will advance academic drug discovery and chemical biology


                                                                                    [21]
Questions Welcome

ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: cdsouthan – at - hotmail.com
Twitter:       http://twitter.com/#!/cdsouthan
Blog:          http://cdsouthan.blogspot.com/ (includes postings on patent themes)
LinkedIn:      http://www.linkedin.com/in/cdsouthan
Website:       http://www.cdsouthan.info/CDS_prof.htm
Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year
Citations:     http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en
Presentations: http://www.slideshare.net/cdsouthan




                                                                                     [22]

Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...narwatsonia7
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknownarwatsonia7
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalorenarwatsonia7
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Miss joya
 

Kürzlich hochgeladen (20)

Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
 
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service MumbaiVIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment BookingHousewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptx
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceCollege Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy GirlsCall Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
 
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
 
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCREscort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Servicesauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
sauth delhi call girls in Bhajanpura 🔝 9953056974 🔝 escort Service
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
 
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
 

Exploring SAR between Patents and PubChem

  • 1. Using Chemicalize.org with Other Open Resources to Extract SAR from Patents and Explore Intersects in PubChem. Christopher Southan, ChrisDS Consulting, Göteborg, Sweden. Prepared for the ChemAxon UGM, May 2012 (version of 2 May) [1]
  • 2. Key Relationships in Patents and Papers [Figure: a target protein sequence and the relationship chain Document > Assay > Result > Compound > Target] • Discerning and mapping these relationships from documents is crucial and demanding • Chemicalize.org is a significant advance in open chemistry extraction • Sources: 2010 http://www.citeulike.org/user/cdsouthan/article/8637426; 2011 http://www.ncbi.nlm.nih.gov/pubmed/21569515; 2012 http://www.slideshare.net/cdsouthan/southan-bio-it2012patents [2]
  • 3. Practical Utilities • Name-to-struc (n>s) for selected or batch conversions from patents, papers, abstracts, web pages and other sources • Intersect different content at identity or similarity level • Molecular properties and bulk download • Extracted structures archived, searchable and sharable • Similarity display of analogue series from a document • Bulk upload to PubChem for intersects and triage • Result display in JChem for Excel • Can iterate with OPSIN for IUPAC fixes [3]
  • 4. Chemicalize.org Exploitation Challenges • Specific retrieval of the patent or other source (e.g. target recall) • Working across different sources (e.g. CiteXplore/espace/SciBite for retrieval, Google for cross-checks, WIPO for images and tables, FreePatentsOnline for deeper queries) • Eyeballing original documents for relevant sections • Locating exemplified drug-relevant/lead-like structures with data links • For many patents, examples >> activity data links > potent structures • Selecting the best sources/family members for optimal IUPAC extraction quality (e.g. US patents and FPO) • Filtering novel structures from common chemistry • Need to be PubChem-cognisant for effective triage • For a variety of reasons, some documents have low extraction rates • Tricks and workarounds enhance exploitation [4]
  • 5. Target Recall: CiteXplore • Title only “DPPIV”: Medline = 37, Patents = 31 • Title + abstract “DPPIV”: Medline = 402, Patents = 144 • Title + abstract “dipeptidyl peptidase”: Medline = 4,838, Patents = 1,520 • Title + abstract “inhibitor”: Medline = 772,053, Patents = 124,516 • Title + abstract “diabetes”: Medline = 431,299, Patents = 36,792 • Title + abstract “DPPIV OR dipeptidyl peptidase AND inhibitor AND diabetes”: Medline = 1,105, Patents = 604 • CiteXplore is restricted to EBI patent abstracts, so higher recall is possible at full-text sources such as SureChemOpen, EPO/espace, WIPO and FPO (although these cannot search Medline in parallel) [5]
  • 6. Target Alerts: SciBite • US2012040982, DPPIV, Boehringer Ingelheim, Feb 2012 [6]
  • 7. Slicing and Dicing US2012040982 (I) • Chemicalize converted 1,390 structures from the FreePatentsOnline (FPO) URL • Of the 497 examples, 486 converted • Need to scan the document and iterate with the scroll bar to spot lead-like structures [7]
  • 8. Slicing and Dicing US2012040982 (II) • OPSIN picks up some of what Chemicalize misses (e.g. 389 above) but not all • OPSIN error reports may help fix a series for Chemicalize (e.g. 1 vs. L) • This matters most in practice if that example has potent activity [8]
  • 9. Slicing and Dicing US2012040982 (III) • Similarity display clearly picks out the lead-like analog series (top) • Select via FPO text > example list only > Word > PDF > Chemicalize upload > SDF download of 486 structures (bottom) • However, judging from the partial descriptions, these may include prophetic examples • Also download the 28 claimed examples via PDF [9]
  • 10. Slicing and Dicing US2012040982 (IV) • An SAR table with 11 point IC50 values can be located • But only 9 examples are below 100 nM; example 25 is 56 nM • Designating them as series 1 and 2 obscures their example identity [10]
  • 11. PubChem Triage of Chemicalize Output (I) • Example 25 SMILES > neither an exact match nor a tautomer match, thus novel • Repeat search at 95% Tanimoto > 289 neighbors > cluster • Closest PubChem analog > ChemSpider > SureChem > Novo Nordisk DPPIV patent from 2005 [11]
  • 12. PubChem Triage of Chemicalize Output (II) • Total extraction from US2012040982 > 1,390 SDFs > 1,387 uploaded > 7 “failed” • 493 exact matches (= preexisting PubChem CIDs) • 486 example-only SDFs > upload > 21 exact-match CIDs • 34 claims-only give 9 exact-match CIDs; primary sources were: • 5 from ChEMBL, from a 2007 Boehringer Ingelheim publication • 7 from Thomson Pharma • 2 from ChemSpider, with SureChem links to Boehringer Ingelheim patents • Thus 461 examples chemicalized from US2012040982 are “novel” structures • However, enantiomeric or tautomeric inexact matches cannot be checked from the PubChem interface (only for existing CIDs) [12]
  • 13. PubChem Triage of Chemicalize Output (III) • Chemicalize examples-plus-claims from US2012040982 = 29 CIDs (search 36 above) • The Thomson Pharma/DiscoveryGate intersect approximates Derwent WPI (search 31) • This matched 20 of the 29 (search 36), presumably DWPI extractions • ChEMBL (7) matched 6 of the 29 (i.e. extracted from papers) • SLING matched 8 of the 29 (i.e. extracted from EPO patents) • It was thus possible to intersect the Chemicalize extractions from this patent with four independent primary sources in PubChem, from both patents and publications [13]
  • 14. Patent “Walking” from Chemicalize Similarity Results (I) • The similarity results from one example gave 1,734 matches out to Tanimoto 0.5, extending “beyond” the example space of US2012040982 • Scrolling through these shows that, at Tanimoto 0.6 (shared substructures in blue), they connect to a different, older DPPIV patent, US7772226, from Eisai [14]
  • 15. Patent “Walking” from Chemicalize Similarity Results (II) • US7772226 from FPO converted 1,127 structures (more than the 992 from PatBase) • 680 matched PubChem CIDs • Example 228 CC#CCN1C(=NC2=C1C(=O)NC(OC1=CC=C(C=C1)C(=O)OC(=O)C(F)(F)F)=N2)N1CCNCC1 had a 12 nM IC50 for DPPIV • Can even “walk” on to a third DPPIV patent, WO2007071738, from Novartis [15]
  • 16. Extracting from CiteXplore ChEMBL • CiteXplore lists ChEMBL IUPAC names and IDs • Can chemicalize all ChEMBL structures from one paper • Difficult to identify these in ChEMBL • Upload 8 structures to PubChem • 7 match ChEMBL IDs • Only one matches the 29 from US2012040982 • Thus the paper probably draws on multiple patents [16]
  • 17. Mining PubMed Central Full-text Papers (I) • Only a few examples converted directly • So > WordPad > direct Chemicalize (iterate) > web page (Google Sites) • Download > upload to JChem for Excel • Add in IC50 values from the paper [17]
  • 18. Mining PubMed Central Full-text Papers (II) • Add the SAR data from the paper into the structure table • These had no exact matches in PubChem [18]
  • 19. Chemicalizing the DrugBank Entry for DPPIV • 41 conversions of inhibitors, many of which are PDB ligands [19]
  • 20. Can Even Extract from Catalogues that Have No SMILES or InChIs • Tocris DPPIV inhibitor > Chemicalize > PubChem > 6 analogs [20]
  • 21. Conclusions • Chemicalize.org is powerful, flexible and free, as in beer • Significantly enables small-scale roll-your-own patent mining • Ditto for journal article/abstract mining (e.g. for papers not captured in ChEMBL) • You still need perspicacity to discern SAR details • Complementary to commercial patent databases populated by manual extraction (e.g. you can extract more structures) • Commercial automated patent extraction databases typically combine ChemAxon n>s with other algorithms (e.g. http://www.chemaxon.com/library/benchmarking-chemaxon%E2%80%99s-name-to-structure-batch-tool-on-patent-text/) • While they thus outperform Chemicalize, it is still very useful for intersecting journal articles or other sources against any database • Significant novel content (w.r.t. public databases) is accumulating via “default crowdsourcing” in the Chemicalize archive, which is becoming an important cross-check source and can be “walked” between documents • Combined with OPSIN and OSRA, structures from most sources are extractable • Synergies with sources such as PubChem, PubMed Central, ChEMBL and SureChemOpen will advance academic drug discovery and chemical biology [21]
  • 22. Questions Welcome ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm Mobile: +46(0)702-530710 Skype: cdsouthan Email: cdsouthan – at - hotmail.com Twitter: http://twitter.com/#!/cdsouthan Blog: http://cdsouthan.blogspot.com/ (includes postings on patent themes) LinkedIn: http://www.linkedin.com/in/cdsouthan Website: http://www.cdsouthan.info/CDS_prof.htm Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en Presentations: http://www.slideshare.net/cdsouthan [22]
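Slide 8 mentions fixing a series for Chemicalize when OPSIN reports a "1 vs. L" problem. The sketch below is a hypothetical pre-processing step, not part of Chemicalize or OPSIN: it assumes the error is a lowercase "l" sitting where a locant digit "1" belongs (start of name, or after "-", ",", "(" or "[", and before "-", ",", "H", ")", "]" or "'"), and rewrites those positions before the names are resubmitted.

```python
import re

# Hypothetical heuristic: an "l" in a locant position is assumed to be a
# misread "1". Only lowercase "l" is touched, so stereodescriptors such as
# "L-" are left alone.
_LOCANT_L = re.compile(r"(^|[-,(\[])l(?=[-,H)\]'])")


def fix_l_locants(name: str) -> str:
    """Replace suspected 'l'-for-'1' locant misreadings in an IUPAC-style name."""
    # \g<1> re-emits the preceding delimiter (or nothing at the string start).
    return _LOCANT_L.sub(r"\g<1>1", name)


if __name__ == "__main__":
    for raw in ["lH-indol-l-yl", "2-chloro-l,l-dimethylpropane", "4-methylpyridine"]:
        print(raw, "->", fix_l_locants(raw))
```

The pattern deliberately ignores "l" inside words such as "chloro" or "methyl"; any corrected name would still need to be re-checked with a name-to-structure tool, since the heuristic can be wrong.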
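Slide 10 filters an SAR table by potency (examples below 100 nM, example 25 at 56 nM). A minimal sketch of that filter, plus the standard conversion pIC50 = -log10(IC50 in mol/L), is shown below; only the 56 nM value comes from the slide, and the other table entries are hypothetical.

```python
import math
from typing import List, Mapping


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def potent_examples(ic50_by_example: Mapping[str, float], cutoff_nm: float = 100.0) -> List[str]:
    """Example IDs with IC50 strictly below cutoff_nm, most potent first."""
    hits = [ex for ex, value in ic50_by_example.items() if value < cutoff_nm]
    return sorted(hits, key=lambda ex: ic50_by_example[ex])


if __name__ == "__main__":
    # Only example 25 (56 nM) is taken from the slide; "A" and "B" are hypothetical.
    table = {"25": 56.0, "A": 12.0, "B": 250.0}
    print(potent_examples(table))            # ['A', '25']
    print(round(pic50_from_nm(56.0), 2))     # 7.25
```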
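Slide 11 repeats the example 25 search at 95% Tanimoto. The sketch below shows the Tanimoto coefficient |A ∩ B| / |A ∪ B| on fingerprints represented, as an assumption, by sets of "on" bit positions, and a threshold search over a small library. PubChem's own similarity search uses its 881-bit substructure fingerprint; the toy bit sets and CID labels here are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical fingerprint representation: the set of "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bit-set fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similar_neighbours(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    threshold: float = 0.95,
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs at or above threshold, best first."""
    hits = []
    for cid, fp in library.items():
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((cid, score))
    # Descending score, then ID, for a reproducible order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    query = frozenset(range(20))
    library = {
        "CID_A": frozenset(range(20)),      # identical bits: 1.0
        "CID_B": frozenset(range(19)),      # 19/20 = 0.95
        "CID_C": frozenset(range(10, 30)),  # 10/30, about 0.33
    }
    print(similar_neighbours(query, library))
```

Note that a score of 1.0 only means identical fingerprints, not identical structures, which is why the slide checks exact and tautomer matches separately before calling a structure novel.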
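Slide 12 notes that enantiomeric or tautomeric inexact matches cannot be checked from the PubChem interface. One offline workaround, sketched here as an assumption rather than the author's method, is to compare standard InChIKeys: the first 14-character block hashes the connectivity layer only, so stereoisomers share it while differing in the second block. Standard InChI also normalises some mobile-hydrogen tautomers, but not all. The keys in the example are hypothetical placeholders, not real compounds.

```python
from typing import Dict, Iterable, Set


def connectivity_block(inchikey: str) -> str:
    """First 14-character block of a standard InChIKey (connectivity hash)."""
    return inchikey.split("-")[0]


def triage(extracted: Iterable[str], known: Set[str]) -> Dict[str, Set[str]]:
    """Partition extracted InChIKeys against a set of already-known keys."""
    known_blocks = {connectivity_block(k) for k in known}
    result: Dict[str, Set[str]] = {"exact": set(), "same_skeleton": set(), "novel": set()}
    for key in extracted:
        if key in known:
            result["exact"].add(key)          # full-key identity
        elif connectivity_block(key) in known_blocks:
            result["same_skeleton"].add(key)  # e.g. a stereo variant of a known entry
        else:
            result["novel"].add(key)
    return result


if __name__ == "__main__":
    # Hypothetical placeholder keys, chosen only to exercise each branch.
    known = {"AAAAAAAAAAAAAA-UHFFFAOYSA-N", "BBBBBBBBBBBBBB-UHFFFAOYSA-N"}
    extracted = [
        "AAAAAAAAAAAAAA-UHFFFAOYSA-N",  # exact match
        "BBBBBBBBBBBBBB-XXXXXXXXSA-N",  # same skeleton, different second block
        "CCCCCCCCCCCCCC-UHFFFAOYSA-N",  # not seen before
    ]
    print(triage(extracted, known))
```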
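Slide 13 intersects the 29 Chemicalize-derived CIDs with independent depositor sources (Thomson Pharma/DiscoveryGate, ChEMBL, SLING). The talk does this via PubChem search history; the sketch below shows the same set logic offline, reporting per-source overlap and the CIDs that no source covers. The CIDs and the source contents are hypothetical.

```python
from typing import Dict, Set, Tuple


def intersect_sources(
    extracted: Set[int], sources: Dict[str, Set[int]]
) -> Tuple[Dict[str, int], Set[int]]:
    """Count extracted CIDs per source and return those no source covers."""
    counts = {name: len(extracted & cids) for name, cids in sources.items()}
    covered = set().union(*sources.values())  # empty set if there are no sources
    return counts, extracted - covered


if __name__ == "__main__":
    # Hypothetical CIDs standing in for PubChem search results.
    extracted = {1, 2, 3, 4, 5}
    sources = {
        "ChEMBL": {1, 2, 9},
        "SLING": {2, 3},
        "Thomson Pharma": {1, 2, 3},
    }
    counts, uncovered = intersect_sources(extracted, sources)
    print(counts)     # per-source overlap
    print(uncovered)  # extracted CIDs seen in no independent source
```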
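Slides 17 and 18 add IC50 values from a paper into the structure table, which the talk does in JChem for Excel. As a plain-Python alternative (an assumption, not the author's workflow), the sketch below joins a downloaded structure CSV to a dictionary of IC50 values by example number. The column names "example" and "smiles", the placeholder SMILES and the IC50 value are all hypothetical.

```python
import csv
import io
from typing import Dict, List


def merge_ic50(structures_csv: str, ic50_nm: Dict[str, float]) -> List[Dict[str, object]]:
    """Attach IC50 values (nM) to structure rows keyed by a hypothetical 'example' column."""
    rows: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(structures_csv)):
        merged: Dict[str, object] = dict(row)
        merged["IC50_nM"] = ic50_nm.get(row["example"])  # None if the paper gives no value
        rows.append(merged)
    return rows


if __name__ == "__main__":
    # Placeholder table and value, for illustration only.
    table = "example,smiles\n1,CCO\n2,CCN\n"
    for r in merge_ic50(table, {"1": 12.0}):
        print(r)
```

Keying on the example number rather than row order avoids silent misalignment when some examples fail to convert, as happens on several of the slides above.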