SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
Patent Cheminformatics

Identification of key compounds in patents


3rd Strasbourg Summer School on Chemoinformatics
Strasbourg, France, 25-29 June 2012


Dr. Sorel Muresan

AstraZeneca R&D Mölndal
Outlook



Patents, brief into

Sources for accessing full text patents

Compound extraction from patents

Key compound prediction
What is a patent?

A patent   application is an agreement between
inventor and state, allowing an inventor a monopoly
over their invention for a limited time. In the EU,
applicants are required to disclose their inventions
in a manner sufficiently clear and complete for them to be
carried out by a person skilled in the art. In the United
States, inventors are additionally required to include the
‘best mode’ of making or practicing the invention.
Patents are very interesting documents

 Three reasons why life scientists should read patents more
   frequently

      1. Some information appears earlier in patents than in scientific
         journals

      2. Patents may contain sound data that never appear in the
         literature.

      3. Patents are a source of hard-to-get information from
         commercial suppliers.




Seeber, F. Nat. Protocols 2007
Patents as pharmaceutical data source
Complementary between journals and patents
 “In certain fields, new advances are disclosed in patents long before
 they are published in peer-reviewed journals.”         Grubb. W.P.

                 “Novel Cannabinoid-1 Receptor Inverse Agonist for the Treatment of Obesity”
                                                         modulates

                                                                     CNR1


  Patent application    Patent publication            Journal publication
      Nov 2002              Mar 2004                      Dec 2006


                  ~18 months                 2.5 years
  USPTO patents (PN: US20040058820)                                            (PMID: 17181138)
Unique chemistry from patents
   Data from AstraZeneca’s Chemistry Connect




               ~6% of compound structures exemplified in patents
               were also published in journal articles


Muresan, S. et al. DDT 2011
Southan, C. et al J.Cheminfo. 2009
Anatomy of a patent

   Front page - contains a wealth of information about
     the patent

   Detailed description of the invention - the heart of a
     patent application. It generally describes one or more
     preferred embodiments of the invention in enough detail
     to enable someone of ordinary skill in the art to make or
     use the invention without having to resort to undue
     experimentation

   Claims - the most important part of the patent application.
     Define the scope of patent protection afforded to the
     owner of a patent.

C.P. Miller, M.J. Evans, The Chemist's Companion Guide to Patent Law, 2011
Searching for patents

Official public portals




Open full-text




Structure -> Patents & Patent -> Structure
Manually extracted SAR data (commercial)

• GOSTAR (GVKBIO Online Structure Activity Relationship Database) is a
  comprehensive database that captures explicit relationships between the three
  entities of publications, compounds and sequences.

• It includes 2.6 million compounds linked to 3,500 sequences with 12.5M SAR
  points extracted from 43,000 patents and 67,000 articles from 125 journals
Annotate patents manually
Brat – rapid annotation tool (http://brat.nlplab.org/index.html)
Annotate patents manually
Brat – rapid annotation tool (http://brat.nlplab.org/index.html)
Name-2-structure
OPSIN – http://opsin.ch.cam.ac.uk/
Chemical Named Entity Recognition (CNER)



                      N-Ethyl-p-menthane-3-carboxamide

                                         Name-to-Structure
                                         software



                       C(C)NC(=O)C1CC(CCC1C(C)C)C
                                     O


                                          N
                                          H




                     WS3 cooling agent (CAS 39711-79-0)
Compound extraction via chemical text mining
Traditional text mining pipeline




• Determining the start and end of IUPAClike names in free text is a tricky
  business.
• Chemical names can contain whitespace, hyphens, commas,
  parenthesis, brackets, braces, apostrophes, superscripts, greek
  characters, digits and periods.
• This is made harder still by typos, OCR errors, hyphenation, linefeeds,
  XML tags, line and page numbers and similar noise.
OCR Errors: Compound Names


• lH-ben zimidazole → 1H-benzimidazole

• 4- (2-ADAMANTYLCARBAM0YL) -5-TERT-BUTYL-
  PYRAZOL-1-YL] BENZOIC ACID →
  4-(2-adamantylcarbamoyl)-5-tert-butyl-pyrazol-1-
  yl]benzoic acid

• triphenylposhine → triphenylphosphine
Text Mining Conversion by Name Class
                                           ChemAxon 5.5
  Class   Category             Names       converts 60%
                                         n2s_1 (%) n2s_2 (%) n2s_3 (%)   None (%)
   M      Molecule           7,262,798       81.4       64.8      77.1        7.8
   D      Dictionary           26,876        38.1       45.1       3.5       38.5
   R      Registry number     304,064           0         0         0        100
   C      CAS number           47,815           0         0         0        100
   E      Element                 836        92.8     58.5      78.7          3.2
                                           NCI/CADD Chemical Identifier Resolver
   P      Fragment           2,663,677       72.9     58.9         0         19.8
                                                     converts 48%
   A      Atom fragment            96        90.6     36.5         0          6.3
   Y      Polymer                 295           0       44.1      22.7       36.9
   G      Generic                1263         2.6        6.3       0.5       91.9
   N      Noise                   104        32.7        24       19.2       52.9
          Total             10,307,824       76.3       61.0      54.3       14.1



Muresan, S. ChemAxon UGM 2011
Extraction compounds from US20100221398
Google Patents + Chemicalize.org (ChemAxon)
Extraction compounds from US20100221398
Google Patents + Chemicalize.org (ChemAxon)
What is a key compound?


Drug candidate

Compound(s) with optimal physicochemical properties

Compouns(s) with the most suitable pharmacokinetic
 profile

The most biologically active tool or probe
Key compounds from patents

”Old school” techniques
      Look for clues in text: “most preferred” wording
      in claims; crystal form info; scale of synthesis

Structural information alone
      Frequency of group (FOG) analysis of
      exemplified compounds

Structures and SAR data
      Work out SAR using biological data and
      structures
Exercise 2b: Predicting Key Example Compounds
Key compound prediction from patents




 Figure 2. Graphical image of patent example compounds in chemical space. Each gray circle represents an example
 compound. The black circle, square, and triangle represent key compound candidates.



     Theory: Chemists carry out extensive SAR around key compounds.
     Cluster examples and look for centres of densely populated regions


 Kazunari Hattori; Hiroaki Wakabayashi; Kenta Tamaki; J. Chem. Inf. Model. 2008, 48, 135-142.
 DOI: 10.1021/ci7002686
 Copyright © 2008 American Chemical Society
Key compound prediction from patents
  From WO1996025405 the earliest patent which claims it, can you
  work out the structure of Bextra (Valdecoxib), the Pfizer NSAID?




74 exemplified cmpds
R-group decomposition (Free-Wilson like)
FOG ranking
Key compound prediction from patents
Frequency of group (FOG) analysis


1) Extract compounds from patents
   • manual curation (GVKBIO)
   • text mining (SureChemOpen, OSCAR)

2) Define core
   • manually (e.g. from patent Markush)
   • automated core perception

3) R-group decomposition (ChemAxon’s JChem)

4) Rank compounds based on FOG (Spotfire, EXCEL)
WO1996025405 - Bextra




 Source     #compounds   Bextra   Bextra
                         exists   ranked
 GVKBIO     74           Y        1 (broad core)
                                  1 (narrow core)
 SureChem   501          Y        1 (broad core)
                                  1 (narrow core)
EP268956 - Aciphex




  Source     #compounds   Aciphex   Aciphex
                          exists    ranked
  GVKBIO     27           Y         2 (core1)
                                    1 (core2)
  SureChem   168          Y         1 (core1)
                                    1 (core2)
EP268956 - Aciphex
EP296749 - Aricept




  Source     #compounds   Aricept   Aricept
                          exists    ranked
  GVKBIO     76           Y         1

  SureChem   108          Y         1
EP296749 - Arimidex




  Source     #compounds   Arimidex   Arimidex
                          exists     ranked
  GVKBIO     70           Y          1

  SureChem   267          Y          1
WO1989006649 - Raxar




X is CH, CF, CCI, or N.


  Source             #compounds   Raxar    Raxar
                                  exists   ranked
  GVKBIO             102          Y        5

  SureChem           0            N        n/a
US3912743 – PAXIL




 Source    #compounds   Paxil    Paxil
                        exists   ranked
 GVKBIO    60           Y        54
US3912743 – PAXIL
Key compound prediction
 48 drug patents, with more than 10 compounds, including the drug


Method           Key compound   Key compound
                 as the first   within the first 5
                 ranked         ranked
                 compound       compounds

Cluster seed     11 (23%)       26 (54%)
analysis

Molecular Idol   5 (10%)        22 (46%)


Frequency of     11 (23%)       17 (35%)
group analysis
JCIM paper
http://dx.doi.org/10.1021/ci3001293
Summary



• Unique chemistry from patents (8% out of 55M parent
  structures in AstraZeneca’s Chemistry Connect)

• Public sources for full text patent access and open
  source software for chemical text mining

• Frequency of group analysis is a simple process to
  visualize and rank patent compounds
Acknowledgements

•   Jon Winter
•   Christian Tyrchan
•   Jonas Boström
•   Mats Eriksson
•   Paul Xie
•   Chris Southan

• GVKBIO
• IBM
• SureChem

• Next Move Software
• ChemAxon
• Unilever Centre for Molecular
  Science Informatics

Weitere ähnliche Inhalte

Was ist angesagt?

Medicinal chemistry of Antifungal agents
Medicinal chemistry of Antifungal agentsMedicinal chemistry of Antifungal agents
Medicinal chemistry of Antifungal agentsGanesh Mote
 
Anti-TB & anti leprosy drugs
Anti-TB & anti leprosy drugs Anti-TB & anti leprosy drugs
Anti-TB & anti leprosy drugs Karun Kumar
 
fluoroquinolones
 fluoroquinolones fluoroquinolones
fluoroquinolonesUmair hanif
 
Antitubercular Agents
Antitubercular AgentsAntitubercular Agents
Antitubercular AgentsSom Dutt
 
Medicinal chemistry- antimalerial agents
Medicinal chemistry- antimalerial agentsMedicinal chemistry- antimalerial agents
Medicinal chemistry- antimalerial agentsDHARMENDRA BARIA
 
Sexually transmitted infections
Sexually transmitted infectionsSexually transmitted infections
Sexually transmitted infectionsPravin Prasad
 
Antibiotics requiring therapeutic drug monitoring(1)
Antibiotics requiring therapeutic drug monitoring(1)Antibiotics requiring therapeutic drug monitoring(1)
Antibiotics requiring therapeutic drug monitoring(1)Mahen Kothalawala
 
Basic Principles of Chemotherapy
Basic Principles of ChemotherapyBasic Principles of Chemotherapy
Basic Principles of ChemotherapyVijay Salvekar
 
Comparative Study of Omeprazole and Esomeprazole
Comparative Study of Omeprazole and EsomeprazoleComparative Study of Omeprazole and Esomeprazole
Comparative Study of Omeprazole and EsomeprazolePharmacy Universe
 
Pro-Drug Concept
Pro-Drug ConceptPro-Drug Concept
Pro-Drug ConceptAkhil Nagar
 
Presentation On Proton Pump Inhibitor
Presentation On Proton Pump InhibitorPresentation On Proton Pump Inhibitor
Presentation On Proton Pump Inhibitorsoumayan88
 
Antibiotic principles
Antibiotic principlesAntibiotic principles
Antibiotic principlesK.J Mokori
 

Was ist angesagt? (20)

Medicinal chemistry of Antifungal agents
Medicinal chemistry of Antifungal agentsMedicinal chemistry of Antifungal agents
Medicinal chemistry of Antifungal agents
 
Anti-TB & anti leprosy drugs
Anti-TB & anti leprosy drugs Anti-TB & anti leprosy drugs
Anti-TB & anti leprosy drugs
 
fluoroquinolones
 fluoroquinolones fluoroquinolones
fluoroquinolones
 
Chloramphenicol
ChloramphenicolChloramphenicol
Chloramphenicol
 
Macrolide antibiotics
Macrolide antibioticsMacrolide antibiotics
Macrolide antibiotics
 
Antitubercular Agents
Antitubercular AgentsAntitubercular Agents
Antitubercular Agents
 
Medicinal chemistry- antimalerial agents
Medicinal chemistry- antimalerial agentsMedicinal chemistry- antimalerial agents
Medicinal chemistry- antimalerial agents
 
Sexually transmitted infections
Sexually transmitted infectionsSexually transmitted infections
Sexually transmitted infections
 
Antibiotics requiring therapeutic drug monitoring(1)
Antibiotics requiring therapeutic drug monitoring(1)Antibiotics requiring therapeutic drug monitoring(1)
Antibiotics requiring therapeutic drug monitoring(1)
 
Basic Principles of Chemotherapy
Basic Principles of ChemotherapyBasic Principles of Chemotherapy
Basic Principles of Chemotherapy
 
5.other antibiotics
5.other antibiotics5.other antibiotics
5.other antibiotics
 
Comparative Study of Omeprazole and Esomeprazole
Comparative Study of Omeprazole and EsomeprazoleComparative Study of Omeprazole and Esomeprazole
Comparative Study of Omeprazole and Esomeprazole
 
Macrolide antibiotics
Macrolide antibioticsMacrolide antibiotics
Macrolide antibiotics
 
Lecture 05 Drugs in Pregnancy
Lecture 05 Drugs in PregnancyLecture 05 Drugs in Pregnancy
Lecture 05 Drugs in Pregnancy
 
Med.chem sulfonamides
Med.chem  sulfonamidesMed.chem  sulfonamides
Med.chem sulfonamides
 
Pro-Drug Concept
Pro-Drug ConceptPro-Drug Concept
Pro-Drug Concept
 
Presentation On Proton Pump Inhibitor
Presentation On Proton Pump InhibitorPresentation On Proton Pump Inhibitor
Presentation On Proton Pump Inhibitor
 
Antibiotic principles
Antibiotic principlesAntibiotic principles
Antibiotic principles
 
Quinolones
QuinolonesQuinolones
Quinolones
 
drug interactions
drug interactionsdrug interactions
drug interactions
 

Andere mochten auch

Alcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymerAlcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymerSorel Muresan
 
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINALAkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINALSorel Muresan
 
Micellization of Alcohol Ethoxylate Surfactants
Micellization of Alcohol Ethoxylate SurfactantsMicellization of Alcohol Ethoxylate Surfactants
Micellization of Alcohol Ethoxylate SurfactantsJenny Chen
 
Nano Fuel Additives - Patent Analysis Report
Nano Fuel Additives - Patent Analysis ReportNano Fuel Additives - Patent Analysis Report
Nano Fuel Additives - Patent Analysis ReportGridlogics
 
NANOTECHNOLOGY IN FUEL CELL
NANOTECHNOLOGY IN FUEL CELLNANOTECHNOLOGY IN FUEL CELL
NANOTECHNOLOGY IN FUEL CELLVishal Singh
 
presentation on NANO-TECH REGENERATIVE FUEL CELL
presentation on NANO-TECH REGENERATIVE FUEL CELLpresentation on NANO-TECH REGENERATIVE FUEL CELL
presentation on NANO-TECH REGENERATIVE FUEL CELLcutejuhi
 
Fuel cell seminar
Fuel cell seminarFuel cell seminar
Fuel cell seminarBipin Gupta
 
Nano Tech Fuel Presentation
Nano Tech Fuel PresentationNano Tech Fuel Presentation
Nano Tech Fuel PresentationCaptNano
 
Nano fuel cell
Nano fuel cellNano fuel cell
Nano fuel cellRaj Kumar
 
Electric drives
Electric drivesElectric drives
Electric drivesSamsu Deen
 
PatSeer Introduction
PatSeer IntroductionPatSeer Introduction
PatSeer IntroductionGridlogics
 
Fuel cells presentation
Fuel cells presentationFuel cells presentation
Fuel cells presentationVaibhav Chavan
 
Power electronic drives ppt
Power electronic drives pptPower electronic drives ppt
Power electronic drives pptSai Manoj
 
Electrical ac & dc drives ppt
Electrical ac & dc drives pptElectrical ac & dc drives ppt
Electrical ac & dc drives pptchakri218
 
POWER ELECTRONIC DEVICES
POWER ELECTRONIC DEVICESPOWER ELECTRONIC DEVICES
POWER ELECTRONIC DEVICESshazaliza
 
Electric drives
Electric drivesElectric drives
Electric drivesraj_e2004
 

Andere mochten auch (19)

Alcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymerAlcoguard® H5941 – The sustainable bio-polymer
Alcoguard® H5941 – The sustainable bio-polymer
 
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINALAkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
AkzoNobel NRE HH Conference Poland 2016-04-26 - FINAL
 
Micellization of Alcohol Ethoxylate Surfactants
Micellization of Alcohol Ethoxylate SurfactantsMicellization of Alcohol Ethoxylate Surfactants
Micellization of Alcohol Ethoxylate Surfactants
 
Nano Fuel Additives - Patent Analysis Report
Nano Fuel Additives - Patent Analysis ReportNano Fuel Additives - Patent Analysis Report
Nano Fuel Additives - Patent Analysis Report
 
NANOTECHNOLOGY IN FUEL CELL
NANOTECHNOLOGY IN FUEL CELLNANOTECHNOLOGY IN FUEL CELL
NANOTECHNOLOGY IN FUEL CELL
 
presentation on NANO-TECH REGENERATIVE FUEL CELL
presentation on NANO-TECH REGENERATIVE FUEL CELLpresentation on NANO-TECH REGENERATIVE FUEL CELL
presentation on NANO-TECH REGENERATIVE FUEL CELL
 
Fuel cell seminar
Fuel cell seminarFuel cell seminar
Fuel cell seminar
 
Nano Tech Fuel Presentation
Nano Tech Fuel PresentationNano Tech Fuel Presentation
Nano Tech Fuel Presentation
 
Nano fuel cell
Nano fuel cellNano fuel cell
Nano fuel cell
 
Polyfuse
PolyfusePolyfuse
Polyfuse
 
Fuel Cells
Fuel CellsFuel Cells
Fuel Cells
 
Electric drives
Electric drivesElectric drives
Electric drives
 
PatSeer Introduction
PatSeer IntroductionPatSeer Introduction
PatSeer Introduction
 
Fuel cells presentation
Fuel cells presentationFuel cells presentation
Fuel cells presentation
 
Power electronic drives ppt
Power electronic drives pptPower electronic drives ppt
Power electronic drives ppt
 
Fuel cells
Fuel cellsFuel cells
Fuel cells
 
Electrical ac & dc drives ppt
Electrical ac & dc drives pptElectrical ac & dc drives ppt
Electrical ac & dc drives ppt
 
POWER ELECTRONIC DEVICES
POWER ELECTRONIC DEVICESPOWER ELECTRONIC DEVICES
POWER ELECTRONIC DEVICES
 
Electric drives
Electric drivesElectric drives
Electric drives
 

Ähnlich wie Patent Cheminformatics: Identification of key compounds in patents

Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...Chris Southan
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSGeorge Papadatos
 
Integrating Patents with Research Data
Integrating Patents with Research DataIntegrating Patents with Research Data
Integrating Patents with Research DataChris Southan
 
Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...
Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...
Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...Hitesh Patel
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsNextMove Software
 
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...Dr. Haxel Consult
 
Introduction to Nanotechnology: Part 3
Introduction to Nanotechnology: Part 3Introduction to Nanotechnology: Part 3
Introduction to Nanotechnology: Part 3glennfish
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesGreg Landrum
 
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Kamel Mansouri
 
Fragment Based Drug Discovery
Fragment Based Drug DiscoveryFragment Based Drug Discovery
Fragment Based Drug DiscoveryAnthony Coyne
 
Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?Frank Oellien
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology Sean Ekins
 
Patent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTSPatent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTSopen_phacts
 
Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...Kamel Mansouri
 
Polymer Deformulation of a Medical Device case study
Polymer Deformulation of a Medical Device case studyPolymer Deformulation of a Medical Device case study
Polymer Deformulation of a Medical Device case studyJordi Labs
 

Ähnlich wie Patent Cheminformatics: Identification of key compounds in patents (20)

Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...Causes and consequences of automated extraction of patent-specified virtual d...
Causes and consequences of automated extraction of patent-specified virtual d...
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTS
 
Integrating Patents with Research Data
Integrating Patents with Research DataIntegrating Patents with Research Data
Integrating Patents with Research Data
 
Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...
Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...
Synthetically Accessible Virtual Inventory (SAVI) : Reaction generation and h...
 
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
Can a Free Access Structure-Centric Community for Chemists Benefit Drug Disco...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...II-PIC 2017: Why did I miss that Patent? How value added databases of STN  he...
II-PIC 2017: Why did I miss that Patent? How value added databases of STN he...
 
Introduction to Nanotechnology: Part 3
Introduction to Nanotechnology: Part 3Introduction to Nanotechnology: Part 3
Introduction to Nanotechnology: Part 3
 
foglar book.pdf
foglar book.pdffoglar book.pdf
foglar book.pdf
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databases
 
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
 
Fragment Based Drug Discovery
Fragment Based Drug DiscoveryFragment Based Drug Discovery
Fragment Based Drug Discovery
 
Cheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural ProductsCheminformatics and the Structure Elucidation of Natural Products
Cheminformatics and the Structure Elucidation of Natural Products
 
Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?Tautomers - Advanced Databases for in-silico Screening?
Tautomers - Advanced Databases for in-silico Screening?
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology
 
Patent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTSPatent annotations: From SureChEMBL to Open PHACTS
Patent annotations: From SureChEMBL to Open PHACTS
 
Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...
 
Chemical abstracts ii
Chemical abstracts   iiChemical abstracts   ii
Chemical abstracts ii
 
Polymer Deformulation of a Medical Device case study
Polymer Deformulation of a Medical Device case studyPolymer Deformulation of a Medical Device case study
Polymer Deformulation of a Medical Device case study
 
Checking, Curating And Qualifying Chemistry
Checking, Curating And Qualifying ChemistryChecking, Curating And Qualifying Chemistry
Checking, Curating And Qualifying Chemistry
 

Mehr von Sorel Muresan

Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...Sorel Muresan
 
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...Sorel Muresan
 
Berol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaningBerol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaningSorel Muresan
 
Multifunctional hydrotropes
Multifunctional hydrotropesMultifunctional hydrotropes
Multifunctional hydrotropesSorel Muresan
 
Thickening with cationic surfactants
Thickening with cationic surfactantsThickening with cationic surfactants
Thickening with cationic surfactants Sorel Muresan
 
Challenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobelChallenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobelSorel Muresan
 
Berol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delightBerol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delightSorel Muresan
 
Getting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsGetting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsSorel Muresan
 
Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...Sorel Muresan
 

Mehr von Sorel Muresan (10)

Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
Multifunctional Surfactants - Essential ingredients for efficient cleaning pr...
 
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
Narrow Range Ethoxylates - Highly targeted performance for more effective cle...
 
Berol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaningBerol® SurfBoost AD15 - a sustainable solution for cleaning
Berol® SurfBoost AD15 - a sustainable solution for cleaning
 
Multifunctional hydrotropes
Multifunctional hydrotropesMultifunctional hydrotropes
Multifunctional hydrotropes
 
Thickening with cationic surfactants
Thickening with cationic surfactantsThickening with cationic surfactants
Thickening with cationic surfactants
 
Challenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobelChallenges with CLP and solutions offered by AkzoNobel
Challenges with CLP and solutions offered by AkzoNobel
 
Berol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delightBerol® ENV226 Plus - a formulator’s delight
Berol® ENV226 Plus - a formulator’s delight
 
Chemistry Connect
Chemistry ConnectChemistry Connect
Chemistry Connect
 
Getting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsGetting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dots
 
Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...Automated spelling correction to improve recall rates of name-to-structure to...
Automated spelling correction to improve recall rates of name-to-structure to...
 

Kürzlich hochgeladen

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

Patent Cheminformatics: Identification of key compounds in patents

  • 1. Patent Cheminformatics Identification of key compounds in patents 3rd Strasbourg Summer School on Chemoinformatics Strasbourg, France, 25-29 June 2012 Dr. Sorel Muresan AstraZeneca R&D Mölndal
  • 2. Outlook Patents, brief into Sources for accessing full text patents Compound extraction from patents Key compound prediction
  • 3. What is a patent? A patent application is an agreement between inventor and state, allowing an inventor a monopoly over their invention for a limited time. In the EU, applicants are required to disclose their inventions in a manner sufficiently clear and complete for them to be carried out by a person skilled in the art. In the United States, inventors are additionally required to include the ‘best mode’ of making or practicing the invention.
  • 4. Patents are very interesting documents Three reasons why life scientists should read patents more frequently 1. Some information appears earlier in patents than in scientific journals 2. Patents may contain sound data that never appear in the literature. 3. Patents are a source of hard-to-get information from commercial suppliers. Seeber, F. Nat. Protocols 2007
  • 5. Patents as pharmaceutical data source Complementary between journals and patents “In certain fields, new advances are disclosed in patents long before they are published in peer-reviewed journals.” Grubb. W.P. “Novel Cannabinoid-1 Receptor Inverse Agonist for the Treatment of Obesity” modulates CNR1 Patent application Patent publication Journal publication Nov 2002 Mar 2004 Dec 2006 ~18 months 2.5 years USPTO patents (PN: US20040058820) (PMID: 17181138)
  • 6. Unique chemistry from patents Data from AstraZeneca’s Chemistry Connect ~6% of compound structures exemplified in patents were also published in journal articles Muresan, S. et al. DDT 2011 Southan, C. et al J.Cheminfo. 2009
  • 7. Anatomy of a patent Front page - contains a wealth of information about the patent Detailed description of the invention - the heart of a patent application. It generally describes one or more preferred embodiments of the invention in enough detail to enable someone of ordinary skill in the art to make or use the invention without having to resort to undue experimentation Claims - the most important part of the patent application. Define the scope of patent protection afforded to the owner of a patent. C.P. Miller, M.J. Evans, The Chemist's Companion Guide to Patent Law, 2011
  • 8.
  • 9.
  • 10.
  • 11. Searching for patents Official public portals Open full-text Structure -> Patents & Patent -> Structure
  • 12. Manually extracted SAR data (commercial) • GOSTAR (GVKBIO Online Structure Activity Relationship Database) is a comprehensive database that captures explicit relationships between the three entities of publications, compounds and sequences. • It includes 2.6 million compounds linked to 3,500 sequences with 12.5M SAR points extracted from 43,000 patents and 67,000 articles from 125 journals
  • 13. Annotate patents manually Brat – rapid annotation tool (http://brat.nlplab.org/index.html)
  • 14. Annotate patents manually Brat – rapid annotation tool (http://brat.nlplab.org/index.html)
  • 16. Chemical Named Entity Recognition (CNER) N-Ethyl-p-menthane-3-carboxamide Name-to-Structure software C(C)NC(=O)C1CC(CCC1C(C)C)C O N H WS3 cooling agent (CAS 39711-79-0)
  • 17. Compound extraction via chemical text mining Traditional text mining pipeline • Determining the start and end of IUPAClike names in free text is a tricky business. • Chemical names can contain whitespace, hyphens, commas, parenthesis, brackets, braces, apostrophes, superscripts, greek characters, digits and periods. • This is made harder still by typos, OCR errors, hyphenation, linefeeds, XML tags, line and page numbers and similar noise.
  • 18. OCR Errors: Compound Names • lH-ben zimidazole → 1H-benzimidazole • 4- (2-ADAMANTYLCARBAM0YL) -5-TERT-BUTYL- PYRAZOL-1-YL] BENZOIC ACID → 4-(2-adamantylcarbamoyl)-5-tert-butyl-pyrazol-1- yl]benzoic acid • triphenylposhine → triphenylphosphine
  • 19. Text Mining Conversion by Name Class ChemAxon 5.5 Class Category Names converts 60% n2s_1 (%) n2s_2 (%) n2s_3 (%) None (%) M Molecule 7,262,798 81.4 64.8 77.1 7.8 D Dictionary 26,876 38.1 45.1 3.5 38.5 R Registry number 304,064 0 0 0 100 C CAS number 47,815 0 0 0 100 E Element 836 92.8 58.5 78.7 3.2 NCI/CADD Chemical Identifier Resolver P Fragment 2,663,677 72.9 58.9 0 19.8 converts 48% A Atom fragment 96 90.6 36.5 0 6.3 Y Polymer 295 0 44.1 22.7 36.9 G Generic 1263 2.6 6.3 0.5 91.9 N Noise 104 32.7 24 19.2 52.9 Total 10,307,824 76.3 61.0 54.3 14.1 Muresan, S. ChemAxon UGM 2011
  • 20. Extraction compounds from US20100221398 Google Patents + Chemicalize.org (ChemAxon)
  • 21. Extraction compounds from US20100221398 Google Patents + Chemicalize.org (ChemAxon)
  • 22. What is a key compound? Drug candidate Compound(s) with optimal physicochemical properties Compouns(s) with the most suitable pharmacokinetic profile The most biologically active tool or probe
  • 23. Key compounds from patents ”Old school” techniques Look for clues in text: “most preferred” wording in claims; crystal form info; scale of synthesis Structural information alone Frequency of group (FOG) analysis of exemplified compounds Structures and SAR data Work out SAR using biological data and structures
  • 24. Exercise 2b: Predicting Key Example Compounds Key compound prediction from patents Figure 2. Graphical image of patent example compounds in chemical space. Each gray circle represents an example compound. The black circle, square, and triangle represent key compound candidates. Theory: Chemists carry out extensive SAR around key compounds. Cluster examples and look for centres of densely populated regions Kazunari Hattori; Hiroaki Wakabayashi; Kenta Tamaki; J. Chem. Inf. Model. 2008, 48, 135-142. DOI: 10.1021/ci7002686 Copyright © 2008 American Chemical Society
  • 25. Key compound prediction from patents From WO1996025405 the earliest patent which claims it, can you work out the structure of Bextra (Valdecoxib), the Pfizer NSAID? 74 exemplified cmpds
  • 28. Key compound prediction from patents Frequency of group (FOG) analysis 1) Extract compounds from patents • manual curation (GVKBIO) • text mining (SureChemOpen, OSCAR) 2) Define core • manually (e.g. from patent Markush) • automated core perception 3) R-group decomposition (ChemAxon’s JChem) 4) Rank compounds based on FOG (Spotfire, EXCEL)
  • 29. WO1996025405 - Bextra Source #compounds Bextra Bextra exists ranked GVKBIO 74 Y 1 (broad core) 1 (narrow core) SureChem 501 Y 1 (broad core) 1 (narrow core)
  • 30. EP268956 - Aciphex Source #compounds Aciphex Aciphex exists ranked GVKBIO 27 Y 2 (core1) 1 (core2) SureChem 168 Y 1 (core1) 1 (core2)
  • 32. EP296749 - Aricept Source #compounds Aricept Aricept exists ranked GVKBIO 76 Y 1 SureChem 108 Y 1
  • 33. EP296749 - Arimidex Source #compounds Arimidex Arimidex exists ranked GVKBIO 70 Y 1 SureChem 267 Y 1
  • 34. WO1989006649 - Raxar X is CH, CF, CCI, or N. Source #compounds Raxar Raxar exists ranked GVKBIO 102 Y 5 SureChem 0 N n/a
  • 35. US3912743 – PAXIL Source #compounds Paxil Paxil exists ranked GVKBIO 60 Y 54
  • 37. Key compound prediction 48 drug patents, with more than 10 compounds, including the drug Method Key compound Key compound as the first within the first 5 ranked ranked compound compounds Cluster seed 11 (23%) 26 (54%) analysis Molecular Idol 5 (10%) 22 (46%) Frequency of 11 (23%) 17 (35%) group analysis
  • 39. Summary • Unique chemistry from patents (8% out of 55M parent structures in AstraZeneca’s Chemistry Connect) • Public sources for full text patent access and open source software for chemical text mining • Frequency of group analysis is a simple process to visualize and rank patent compounds
  • 40. Acknowledgements • Jon Winter • Christian Tyrchan • Jonas Boström • Mats Eriksson • Paul Xie • Chris Southan • GVKBIO • IBM • SureChem • Next Move Software • ChemAxon • Unilever Centre for Molecular Science Informatics