Digging out Structures for Repurposing: Non-competitive Intelligence
PubChem Seminar April 2013
Christopher Southan, TW2Informatics, Göteborg, Sweden
3. Outline
• Trawling for repurposing-relevant data
• Code names statistics and name > structure triage
• The NCATS/MRC challenge
• Story of JNJ-39393406
• Scaling up code name hunting and cross-mapping
• Code names in clinical trials, MeSH, PubChem
• Story of PF-04457845
• Trials, MeSH and PubChem code name intersects
• Conclusions
4. Intelligence: trawling compound information
Competitive
• Directed towards commercially positioning and/or repurposing own portfolio
• Major big pharma activity
• Mixed commercial/public sources
• Internal specialists
• Typically a closed activity (i.e. little open “best practice”)
• Typically therapeutic area aligned
Non-competitive
• Directed towards repositioning any compound
• Collaborative approaches to IP holders (but new IP possible)
• Can utilise public resources alone
• Different domain expert entry points
• Predominantly an open activity (e.g. OSDD)
• Can be hypothesis-neutral
5. Structures: connecting to repurposing-relevant data
• Code names and synonyms
• Resolving these to structures
• Database entries
• BioAssay results
• Target/pathway links
• In vitro & in vivo research papers
• Clinical trial results and papers
• Patents for analogues and SAR
• Comparative in vivo data
• Mendelian and GWAS disease links
• Expression data for compounds
• In silico modeling (including rare or NTDs)
• Vendor similarity matches
6. Code names: 2-15 year information hole
[Figure: Pharmaprojects 2009-10 figures]
7. Drugs, code names, INN/USANs and structures: few congruent hard numbers
• Pharmaprojects (2013) drug profiles ~ 50,000
• Thomson Reuters Cortellis (2012) drug monographs = 41,889
• Pharmaprojects (via ProQuest, 2012) records ~ 35,000
• Thomson Reuters Partnering (2011 structures, PMID: 22024215) = 17,901
• Pharmaprojects (2003 structures) = 14,000
• ChEMBL USANs (2013) = 10,568
• PubChem (2013) “USAN [synonym] OR INN [synonym]” = 9,890
• Pharmaprojects (2010 in development, no structure count) = 9,737
• GVKBIO Clinical Candidate structures (2008, PMID:20298516) = 8,864
• Pharmaprojects (2010 review, no structures) Phase 1+2+3 = 3,828
8. Code names: major repurposing potential, but...
• ~95% of the 30K are/will become “parked” or “abandoned”
• Can be repurposed in silico at least
• Obvious hierarchy: leads > development > clinical trials > INN > approved
• Problems
– New code names < 50% - 70% blinded (i.e. no structures)
– Some older code names never un-blinded
– Code naming practices independent and completely ad hoc
– Publications, conference reports, clinical trials entries, press releases and portfolio listings linked to “blinded” code names (no structures)
– Even for public declarations (e.g. papers) data linked into “the system” (e.g. synonym mapping) is patchy
– Code originators do not provenance public database entries
– Data supporting non-progression decisions rarely disclosed
– http://chembl.blogspot.se/p/research-code-stems.html lists 100s of codes
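A first pass at digging code names out of free text can be a stem-plus-digits pattern. The Python sketch below assumes only the four stems that appear in this talk (JNJ, PF, GSK, AZD); the stem list, the 3-8 digit range and the function name `find_codes` are illustrative assumptions. Since code naming is ad hoc, a pattern like this will miss many real formats.

```python
import re

# Illustrative stems only (taken from codes mentioned in this talk);
# a real list would come from a curated stem table.
STEMS = ("JNJ", "PF", "GSK", "AZD")

# Stem, optional hyphen or space, then 3-8 digits,
# e.g. "JNJ-39393406" or "PF04457845". The digit range is an assumption.
CODE_RE = re.compile(r"\b(" + "|".join(STEMS) + r")[- ]?(\d{3,8})\b")


def find_codes(text):
    """Return code names found in text as STEM-digits, in order, without duplicates."""
    seen = []
    for stem, digits in CODE_RE.findall(text):
        code = f"{stem}-{digits}"
        if code not in seen:
            seen.append(code)
    return seen


print(find_codes("Story of JNJ-39393406 and PF-04457845; also PF04457845."))
```

Normalising to one spelling (hyphenated here) matters because the same code is often written with and without the hyphen in different sources.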
9. Code name-to-structure mapping triage
Dig out the code names
• PubChem Substance
• PubChem Compound
• PubMed/MeSH
• Google Scholar
• Google Images
• Google open (filtered)
Name/image > struc
• chemicalize.org, OPSIN, Chemical Identifier Resolver, sketchers, OSRA
• Cross-checks:
– SMILES/SDF/InChI strings in PubChem and ChemSpider
– InChIKey in Google
– SureChemOpen patent search
– ClinicalTrials.gov
– Synonym trawling
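The "is this code already a PubChem synonym?" check in the triage can also be scripted. The Python sketch below only builds a PubChem PUG REST lookup URL for a name and requests the InChIKey, which can then be pasted into the other cross-checks. It is one possible route, not necessarily the one used for this talk, and the helper name `pubchem_name_url` is hypothetical. No request is sent; fetching and error handling (a name with no match returns an error response) are left out.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_name_url(name, properties=("InChIKey",)):
    """Build a PUG REST URL that looks up a compound by name or synonym."""
    props = ",".join(properties)
    # Percent-encode everything unsafe, e.g. the brackets in "[18F]PF-9811".
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{props}/JSON"


print(pubchem_name_url("PF-04457845"))
print(pubchem_name_url("[18F]PF-9811"))
```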
23. Not all JNJ-s are blinded: JNJ-40418677
IUPAC name in abstract, but code still PubChem-negative (not a PubChem synonym)
IUPAC name converted at chemicalize.org for PubChem mapping
25. Phases & codes in ClinicalTrials.gov: thin on results
• Interventional studies = 115356, 7895 with results (7%)
• Results | Interventional Studies | Phase 1, 2, 3 | Industry = 4477
• Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 1004
• Results | Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 122 (12%)
• Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 1640
• Results | Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 185 (11%)
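The percentages above are the share of studies with posted results. The short Python check below uses only the counts quoted on this slide; the dictionary labels and the function name `percent_with_results` are illustrative.

```python
# (studies with results, all studies) as quoted on the slide.
counts = {
    "All interventional studies": (7895, 115356),
    "GSK*, Phase 1-3, industry": (122, 1004),
    "GSK* OR AZD* OR JNJ* OR PF0*, Phase 1-3, industry": (185, 1640),
}


def percent_with_results(with_results, total):
    """Share of studies that report results, as a percentage."""
    return 100.0 * with_results / total


for label, (with_results, total) in counts.items():
    print(f"{label}: {percent_with_results(with_results, total):.1f}%")
```

Rounded to whole numbers these give the 7%, 12% and 11% shown.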
26. alltrials.net: public pressure > more results > more repurposing opportunities
http://www.youtube.com/watch?v=lQ6YTU5kGXw&feature=youtu.be&t=28m39s
37. PF-04457845: (almost) a total system success
• Declared efficacy failure > possible repurposing candidate
• Selection of analogues and a probe [18F]PF-9811 (CID 70679467)
• The “system” did well because of good publishing practice (e.g. full text)
• Code, structure, target, papers, trials and patents all connected
• 5mg for $275
But:
• Serendipitous finding (no “efficacy failure” or “study stopped” tags)
• Lack of ClinicalTrials.gov <> PubMed links
• BindingDB using deprecated ChEBI ID
• PMID:21505060 not yet in ChEMBL
• No direct target or patent nos. in CID record because no DrugBank, SCRIPDB or IBM capture
• Inconsistent probe naming: [18F]PF-9811 (PubChem), [(18)F]PF-9811 (PubMed), PF-9811-18F (Books)
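The three spellings of the radiolabelled probe will not match each other in a plain string search. The Python sketch below reduces them to one (base code, isotope label) pair. It covers only the three patterns seen here, and the regexes and the function name `normalise_label` are illustrative assumptions, not a general nomenclature parser.

```python
import re

# Leading label such as "[18F]" or "[(18)F]", followed by the base code.
PREFIX_RE = re.compile(r"^\[\(?(\d+)\)?([A-Z][a-z]?)\]\s*(.+)$")
# Trailing label such as "-18F" after the base code.
SUFFIX_RE = re.compile(r"^(.+?)-(\d+)([A-Z][a-z]?)$")


def normalise_label(name):
    """Return (base code, isotope label), or (name, None) if no label is found."""
    name = name.strip()
    m = PREFIX_RE.match(name)
    if m:
        mass, element, base = m.groups()
        return base, f"{mass}{element}"
    m = SUFFIX_RE.match(name)
    if m:
        base, mass, element = m.groups()
        return base, f"{mass}{element}"
    return name, None


variants = ["[18F]PF-9811", "[(18)F]PF-9811", "PF-9811-18F"]
print({normalise_label(v) for v in variants})
```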
38. Looking at code name intersects in different parts of the system
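Once code names have been harvested per source, the intersects are plain set operations. The Python sketch below uses placeholder codes (XYZ-0001 and so on) that are not real compounds. The memberships are invented purely to show the mechanics and carry no data from the talk.

```python
from itertools import combinations

# Placeholder code names per source; real sets would be harvested from each resource.
sources = {
    "ClinicalTrials.gov": {"XYZ-0001", "XYZ-0002", "XYZ-0003"},
    "MeSH": {"XYZ-0002", "XYZ-0003", "XYZ-0004"},
    "PubChem": {"XYZ-0003", "XYZ-0004", "XYZ-0005"},
}


def pairwise_intersects(sources):
    """Map each pair of source names to the code names present in both."""
    return {
        (a, b): sources[a] & sources[b]
        for a, b in combinations(sorted(sources), 2)
    }


def in_all(sources):
    """Code names present in every source."""
    return set.intersection(*sources.values())


for pair, shared in pairwise_intersects(sources).items():
    print(pair, sorted(shared))
print("all sources:", sorted(in_all(sources)))
```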
45. Conclusions
• Stalled development candidates, designated by company codes,
constitute a large potential repurposing information estate
• Historical in vitro, pharmacological & clinical data linked to ~30K codes
• But only 40-50% have structures assignable from open sources
• An even smaller proportion have code names in PubChem
• Public name>struc>data capture is ad hoc and needs improving
• Repurposing-relevant relationships are not easy to dig out
• Some “non competitive intelligence” approaches are shown here
• The big push for transparency and open access should improve
disclosure, data capture, linkage and repurposing opportunities
Happy hunting!
TED Talk: Francis Collins: We need better drugs -- now
http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html
Editor's notes
• IUPAC name in abstract converted by MeSH but not transferred to PubChem
• Chemicalize.org used for conversion, matched patent sources
• Therefore structure is there but code synonym is not
• No one's responsibility to submit the code-to-struc