SureChem ACS 2012. Presented by Nicko Goncharoff on behalf of all three authors. The data is searchable at https://open.surechem.com/login. Related information includes recent posts at http://cdsouthan.blogspot.se/
Integrating Patents with Research Data
1. Integrating patent chemistry with public and private non-patent research resources
Nicko Goncharoff
Andrew Hinton, PhD
Christopher Southan, PhD
ACS Fall 2012, 19 August
4. SureChem Data Collection
Database of automatically mined structure data from text and images
• 20M annotated US, EP, WO full text records and Japan patent abstracts
• 12M unique chemical structures
• MEDLINE – 19M abstracts (coming Q4)
5.
Free resource for researchers: enables linking to public and proprietary content
Professional search needs: data export, alerts, patent family search, chemical relevance filters…
API or Data Feed access to chemistry & full text: integrate with internal databases & workflows
8. Current Patent Sources In PubChem
[Bar chart: numbers of SIDs per patent source. Sources: EPO (SLING), Chemicalize.org, IBM, Thomson Pharma. Bar labels: 10 K, 280 K, 2.3 M, 3.7 M.]
9. Patent & Literature Sources in PubChem
The Big Three
[Venn diagram of three source sets]
Thomson Pharma, patents and literature: 3,756,283 (41% lead-like)
ChEMBL + PubMed + Journals: 918,077 (45% lead-like)
IBM, pre-2000 patents: 2,369,481 (32% lead-like)
Region counts shown on the diagram: 3,291,940; 281,920; 515,745; 52,975; 129,448; 67,437; 2,113,169
10. SureChem to Deposit All Structures* into PubChem - 2012
• 1976 to present
• Deposition of structures only
• View related patents in SureChemOpen
*Some filtering of common chemistry likely
11. SureChem and IBM in PubChem (2 Example Patents)
US583593, Inhibitors of squalene synthetase and protein farnesyltransferase. Abbott
SureChem Total: 776   IBM Total: 527
SureChem only: 478   Shared: 298   IBM only: 229
WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones HIV protease inhibitors, Upjohn
SureChem Total: 832   IBM Total: 239
SureChem only: 686   Shared: 146   IBM only: 93
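The overlap figures for the two example patents can be sanity-checked with simple set arithmetic. The sketch below is illustrative only: the function name `overlap_summary` is hypothetical, and it assumes that the three numbers given per patent are the SureChem-only, shared, and IBM-only counts, a reading that is consistent with the stated totals.

```python
def overlap_summary(only_a, shared, only_b):
    """Derive totals and union size for two sources from their Venn regions."""
    return {
        "total_a": only_a + shared,          # everything source A holds
        "total_b": shared + only_b,          # everything source B holds
        "union": only_a + shared + only_b,   # distinct structures overall
    }

# Region counts as read from the slide (assumed order: SureChem only, shared, IBM only).
patent_1 = overlap_summary(478, 298, 229)  # Abbott patent
patent_2 = overlap_summary(686, 146, 93)   # Upjohn patent

print(patent_1["total_a"], patent_1["total_b"])  # 776 527
print(patent_2["total_a"], patent_2["total_b"])  # 832 239
```

Under this reading, the derived totals match the SureChem and IBM totals quoted for each patent.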
15. SureChem Unique Contribution
[Venn diagram, IC50-table structures: 79 in SureChem only; 96 also in PubChem (Thomson Pharma, Chemicalize)]

Stage                               No. of Structures
Available from SureChem (SC)        1848
Pre-Exist in PubChem                669
Pre-Exist – not from IC50 table     573
Pre-Exist – from IC50 table         96 (12 from TP + 84 via chemicalize.org)
Unique-SC with IC50                 79
Unique-SC – beyond IC50 table       1100
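The breakdown in the table is internally consistent, which a few lines of arithmetic can confirm. This is a minimal sketch using only the numbers from the table; the dictionary keys are hypothetical labels chosen for illustration.

```python
# Counts copied from the table; key names are illustrative, not from the source.
counts = {
    "available_sc": 1848,
    "pre_exist": 669,
    "pre_exist_not_ic50": 573,
    "pre_exist_ic50": 96,
    "pre_exist_ic50_tp": 12,
    "pre_exist_ic50_chemicalize": 84,
    "unique_ic50": 79,
    "unique_beyond_ic50": 1100,
}

# Structures SureChem has that were not already in PubChem.
unique_sc = counts["available_sc"] - counts["pre_exist"]

# Each parent row should equal the sum of its sub-rows.
assert counts["pre_exist"] == counts["pre_exist_not_ic50"] + counts["pre_exist_ic50"]
assert counts["pre_exist_ic50"] == (
    counts["pre_exist_ic50_tp"] + counts["pre_exist_ic50_chemicalize"]
)
assert unique_sc == counts["unique_ic50"] + counts["unique_beyond_ic50"]

print(unique_sc)  # 1179
```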
17. SureChem Chemical Relevance Filtering
• Frequency counts of chemicals within patents
• Additional molecular property filtering, e.g. Lipinski descriptors
• Natural Language Processing-based indexing of Exemplified Compounds
Automated indexing of Exemplified Compounds in text
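The first two filters can be sketched in a few lines. This is not SureChem's implementation, which the slides do not describe. It assumes that "frequency counts" means counting how many patents a structure occurs in and dropping very common chemistry, and it uses the standard Lipinski rule-of-five thresholds on precomputed descriptors. All function names, dictionary keys, identifiers, and descriptor values below are hypothetical.

```python
from collections import Counter


def passes_rule_of_five(desc, max_violations=1):
    """Lipinski rule of five on precomputed descriptors (hypothetical keys)."""
    violations = sum([
        desc["mol_weight"] > 500,   # molecular weight above 500 Da
        desc["logp"] > 5,           # logP above 5
        desc["h_donors"] > 5,       # more than 5 hydrogen-bond donors
        desc["h_acceptors"] > 10,   # more than 10 hydrogen-bond acceptors
    ])
    return violations <= max_violations


def filter_relevant(patent_structures, descriptors, max_patent_count):
    """Keep structures that are not too common and pass the property filter."""
    # Count the number of patents each structure appears in (once per patent).
    patent_counts = Counter(
        s for structures in patent_structures.values() for s in set(structures)
    )
    return sorted(
        s for s, n in patent_counts.items()
        if n <= max_patent_count and passes_rule_of_five(descriptors[s])
    )


# Toy data: identifiers and descriptor values are invented for illustration.
patent_structures = {
    "patent_A": ["ethanol", "cmpd_1", "cmpd_2"],
    "patent_B": ["ethanol", "cmpd_3"],
    "patent_C": ["ethanol", "cmpd_1"],
}
descriptors = {
    "ethanol": {"mol_weight": 46.1, "logp": -0.3, "h_donors": 1, "h_acceptors": 1},
    "cmpd_1": {"mol_weight": 412.5, "logp": 3.2, "h_donors": 2, "h_acceptors": 6},
    "cmpd_2": {"mol_weight": 688.9, "logp": 6.4, "h_donors": 4, "h_acceptors": 9},
    "cmpd_3": {"mol_weight": 305.3, "logp": 2.1, "h_donors": 1, "h_acceptors": 4},
}

kept = filter_relevant(patent_structures, descriptors, max_patent_count=2)
print(kept)  # ['cmpd_1', 'cmpd_3']
```

In this toy run, `ethanol` is dropped because it appears in all three patents, and `cmpd_2` is dropped because it violates two rule-of-five criteria. In practice the descriptors would come from a cheminformatics toolkit rather than a hand-written dictionary.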
18. Conclusion
SureChem deposition into PubChem will
– Significantly expand public patent chemistry scope
– Contribute unique and timely MedChem-relevant data
– Enable open drug discovery and chemical biology
– Advance progress toward a more open, federated chemical information network