Evaluating patent full text
documents with chemical ontologies
OntoChem IT Solutions GmbH
Blücherstr. 24
06120 Halle (Saale)
Germany
Tel. +49 345 4780472
Fax: +49 345 4780471
mail: info(at)ontochem.com
• spin-out from OntoChem GmbH
• started 1 July 2015
• 15 chemists, bioinformaticians, biologists, linguists, pharmacists
• extracting knowledge from documents, selling software & services
3
What are Ontologies?
Computer-readable, formal representations of knowledge that describe relationships between knowledge concepts:
aspirin (acetyl salicylic acids) --"is a"--> benzoic acid --"is a"--> carboxylic acid
They can be used to infer, extract, search, sort and analyse knowledge.
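A minimal sketch of how "is a" relations support inference, assuming a toy ontology stored as a plain dictionary. The names `PARENTS`, `ancestors` and `is_subclass_of` are illustrative only and are not taken from any OntoChem software.

```python
from collections import deque

# Hypothetical toy ontology: each concept maps to its direct "is a" parents.
PARENTS = {
    "aspirin": ["benzoic acid"],
    "benzoic acid": ["carboxylic acid"],
    "carboxylic acid": [],
}

def ancestors(concept, parents=PARENTS):
    """Return every concept reachable through one or more "is a" steps."""
    seen = set()
    queue = deque(parents.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen

def is_subclass_of(concept, candidate_class, parents=PARENTS):
    """True if candidate_class is inferred to be an ancestor of concept."""
    return candidate_class in ancestors(concept, parents)

if __name__ == "__main__":
    # The second "is a" step is inferred, not stated directly for aspirin.
    print(is_subclass_of("aspirin", "carboxylic acid"))
```

A search for "carboxylic acid" can then also return documents that only mention aspirin, because the relation is inferred by following the chain.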
4
Chemical Ontologies...
ChEBI (Chemical Entities of Biological Interest), https://www.ebi.ac.uk/chebi/, has about 40,000 manually classified compounds
MeSH (Medical Subject Headings) ... PubChem
5
Classifying Chemistry: SODIAC
SODIAC (Structure based Ontology Development and Individual Assignment Center): automated compound classification software
• ontology editor, OBO specification conformity
• definition of compound classes via SMARTS
• chemical structure editor
• sub-structure AND, OR and NOT logic for compound-to-class assignment
• chemistry error detection
• chemical hierarchy construction
6
Classifying Chemistry: SODIAC
AND/OR logic to assign Vitamin C derivatives:
• described in different tautomeric forms in databases
• logic needed for classifying correct stereochemistry in substituted compounds
[Figure: sub-structure queries for the concept "Vitamin C derivatives", combined with AND and OR]
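The AND/OR/NOT assignment logic can be sketched as follows. This is not SODIAC code: it assumes a sub-structure matcher has already reduced each compound to a set of fragment labels, and the labels and the rule itself are hypothetical placeholders rather than the real Vitamin C definition. In practice each label would stand for a SMARTS pattern.

```python
def matches(rule, fragments):
    """Evaluate a nested AND/OR/NOT rule against a set of fragment labels."""
    if isinstance(rule, str):
        # Leaf: a single sub-structure query, true if the fragment was found.
        return rule in fragments
    op, *args = rule
    if op == "AND":
        return all(matches(arg, fragments) for arg in args)
    if op == "OR":
        return any(matches(arg, fragments) for arg in args)
    if op == "NOT":
        (arg,) = args
        return not matches(arg, fragments)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical class rule: a ring fragment AND one of two tautomeric forms,
# but NOT a ring-opened form. All labels are made up for illustration.
VITAMIN_C_DERIVATIVE = (
    "AND",
    "furanone_ring",
    ("OR", "enediol_form", "keto_form"),
    ("NOT", "ring_opened"),
)

if __name__ == "__main__":
    print(matches(VITAMIN_C_DERIVATIVE, {"furanone_ring", "keto_form"}))
```

The OR branch is what lets one class definition cover compounds that databases store in different tautomeric forms.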
7
Classifying Chemistry: not straightforward...
Structural chemical ontologies are often not based on sub-structures!
• Progesterone, class: Gestagens
• 19-Norprogesterone (4-8× more active), class: Gestagens>Progestins
• Pregnane (female hormones), class: Gonans>Pregnans
• Androstane (male hormones), class: Gonans>Estrans
DrugBank & ChEBI: Progestin, a synthetic progestogen
ChEBI: corticosteroid hormone
[Figure labels: "parent & SSS", "not parent but SSS" (twice), "same family", "different family"]
8
Chemistry Ontologies
Organic chemistry
7,586 class concepts, 29,709 class terms
3,185 concepts linked to ChEBI concepts
2,465 concepts linked to MeSH concepts
68 million concepts linked to PubChem
Inorganic materials
52.4209 concepts, 56,332 terms
Groups-substituents-fragments
4,428 concepts, 12,754 terms
Substances
989 concepts, 3,522 terms
Polymers
2,361 concepts, 7,176 terms
9
Classifying Chemistry: Example
Acetylsalicylic acid
SODIAC v2.5.2
CHEBI:15365; MeSH:D001241
Direct Parents:
aromatic compounds, benzenes, carbon compounds, carboxylic acids,
ethanoic acid esters, methyl esters, monocyclic compounds, oxygen compounds,
salicylic acid derivatives
bioavailable molecules, hydrophilic molecules, lead like molecules, Lipinski molecules, small molecules
Ancestors:
6-membered carbocycles, 6-membered cyclic compounds, acetic acid derivatives, acids,
carbocycles, carbon group compounds, carbonyl compounds, carboxylic acid derivatives,
carboxylic acid esters, chalcogen compounds, cyclic compounds, esters, fatty acyls,
fatty esters, lipids, monocarboxylic acid derivatives, monocyclic carbocycles, organic acids,
organic compounds, organic esters, salicylic acid derivatives, short chain fatty acid esters
10
Basic Biology Ontologies
Genes, Proteins & Peptides
annotation version: 708,141 concepts, 2,627,612 terms
classification version: 832,902 concepts, 3,177,057 terms
with linkouts to GO, InterPro, HomoloGene, HUGO, KEGG, UniProt ...
Diseases
SNOMED-CT, MedDRA, ICD-9, ICD-10, HDO, UMLS, LOINC, MeSH
annotation version: 105,824 concepts, 360,077 terms
Species
based on NCBI, GRIN, IPNI, Cornucopia, World Economic Plants ...
annotation version: 1,012,634 concepts, 1,664,042 terms
Anatomy
different species and stage dependent ontologies available
general anatomy: 4,773 concepts, 19,450 terms
11
Other Biology Ontologies
Cell lines
5,566 concepts, 13,083 terms
Cosmetology
1,187 concepts, 2,017 terms
Effects
35,477 concepts, 111,012 terms
Nutrition
19,193 concepts, 115,699 terms
Physiology
533 concepts, 619 terms
Toxicology
1,019 concepts, 2,150 terms
12
Other Ontologies
Countries
annotation version: 245 concepts, 85,069 terms
Companies
annotation version: 26,388 concepts, 5,757 terms
Material properties
annotation version: 1,081 concepts, 2,428 terms
Methods
annotation version: 2,502 concepts, 10,053 terms
Regions & Geopolitics
annotation version: 3,774 concepts, 13,356 terms
Relations
annotation version: 603 concepts, 2,290 syntaxes
13
General Ontologies
Wikipedia
annotation version: 5,200,842 concepts, 11,490,831 terms
Magnitudes & Units
annotation version: 228 concepts, 510 terms
Persons
annotation version: >1,000,000 persons
Relations
annotation version: 603 concepts, 2,290 syntaxes
14
Understanding Patents with Ontologies
NLP for patents poses some unique challenges:
• multilingual
• poor OCR (optical character recognition)
• multi-disciplinary
• many: >90 million full text documents from >110 patent offices
• large: up to 500 pages, with sentences spanning >20 pages
• obscure: hand drawings, unclear language
15
Understanding Patents
Collaboration with infoapps GmbH (Munich)
Standard full text data
US, EP, DE, WO,
AT, CH, BE, CA, ES, FR, GB, MA.
Standard full text data
AR, BR, CN, DK, FI, ID, EI, EN,
JP, KR, MX, MY, NL, NO, RU, SE,
TH, TW, VN.
Original full text data
Machine/human translation (EN)
AR, AT, BE, BR, CA, CH, CN, DE,
DK, EP, ES, FI, FR, ID, JP, KR,
MX, NL, NO, RU, SE, TH, TW,
VN, WO.
16
OCMiner® UIMA Pipeline
• document input: picture PDF → OCR; Text PDF → PDF reader; XML doc → XML reader; Office doc → Office reader
• identify document type, document classifier
• XML detagger, language detector, normalize text, tokenize text
• acronym/abbrev detector, person annotator, document structure
• domain annotators 1…n
• chemistry annotator: dictionary, name-2-structure, formula & molpuzzler, class/group resolution, cleanup & rule combiner
• coordinated entity resolution, context handler, NE confidence
• domain annotators 1…n
• relationship extraction
• consumers: BRAT, index, XML
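The staged design of such a pipeline can be sketched in plain Python. This does not use the UIMA API and is not OCMiner code: the stage functions, the `DICTIONARY` contents and the document layout are assumptions made only to show how each stage adds to a shared document object.

```python
import re
import unicodedata

def normalize_text(doc):
    # NFKC folds compatibility characters; whitespace runs collapse to one space.
    text = unicodedata.normalize("NFKC", doc["text"])
    doc["text"] = re.sub(r"\s+", " ", text).strip()
    return doc

def tokenize_text(doc):
    # Keep (start, end, token) so later stages can point back into the text.
    doc["tokens"] = [(m.start(), m.end(), m.group())
                     for m in re.finditer(r"\S+", doc["text"])]
    return doc

# Hypothetical two-entry dictionary standing in for a domain ontology.
DICTIONARY = {"aspirin": "chemistry", "inflammation": "disease"}

def dictionary_annotator(doc):
    # Offsets cover the whole whitespace-delimited token, including any
    # attached punctuation; only the lookup key is stripped and lowercased.
    doc["annotations"] = [
        {"start": start, "end": end, "text": token,
         "type": DICTIONARY[token.strip(".,;:()").lower()]}
        for start, end, token in doc["tokens"]
        if token.strip(".,;:()").lower() in DICTIONARY
    ]
    return doc

PIPELINE = [normalize_text, tokenize_text, dictionary_annotator]

def run_pipeline(text, stages=PIPELINE):
    """Pass one document through every stage in order."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc

if __name__ == "__main__":
    for annotation in run_pipeline("Aspirin  reduces\n inflammation.")["annotations"]:
        print(annotation)
```

Because every stage takes and returns the same document object, further annotators or consumers can be appended to the list without changing the earlier stages.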
17
Regular Names in Patents
BRAT (Goran Topić) file example:
Akhondi SA, Klenner AG, Tyrchan C, Manchala AK, Boppana K, Lowe D, Zimmermann M, Jagarlapudi SA, Sayle R, Kors JA, Muresan S. Annotated chemical patent corpus: a gold standard for text mining. PLoS One. 2014 Sep 30;9(9):e107477. doi: 10.1371/journal.pone.0107477. eCollection 2014.
18
Other Name Types in Patents
Chemical compound: 5,7-bis(trifluoromethyl)-pyrazolo[1,5-a]pyrimidine-2-carbonitrile
Chemical class: pyrazolo[1,5-a]pyrimidines
Chemical substituent + class: 2-Bromo-, 2-fluoro-, and 2-chloro pyrazolo[1,5-a]pyrimidines
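The "substituent + class" name type needs the coordinated prefixes to be expanded into one full name each. A minimal sketch, assuming the shared head is the last whitespace-separated token and that prefixes are joined to it with a hyphen, as in the compound name on this slide. The function name `expand_coordination` is hypothetical and this is not the OCMiner implementation.

```python
import re

def expand_coordination(phrase):
    """Expand 'A-, B-, and C head' into ['a-head', 'b-head', 'c-head']."""
    # The shared head is taken to be the last whitespace-separated token.
    prefix_part, head = phrase.rsplit(" ", 1)
    # Split the prefix list on commas and on the whole words "and" / "or".
    pieces = re.split(r",|\band\b|\bor\b", prefix_part)
    prefixes = [p.strip().rstrip("-").lower() for p in pieces if p.strip()]
    return [f"{prefix}-{head}" for prefix in prefixes]

if __name__ == "__main__":
    names = expand_coordination(
        "2-Bromo-, 2-fluoro-, and 2-chloro pyrazolo[1,5-a]pyrimidines"
    )
    for name in names:
        print(name)
```

The word-boundary anchors keep the "or" inside "fluoro" and "chloro" from being treated as a conjunction.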
19
Named Entities in Patents
extracting named entities (NE) from infoapps patents
from 19 million patents with chemistry, selected
4.7 million patents from 2001-2010 (publication year)
Ontology | term annotation count | unique concepts per doc | unique concepts
Chemistry | 1,465,510,682 | 294,771,572 | ?
Proteins | 204,902,329 | 30,167,344 | 67,993
Anatomy non-plants | 126,856,048 | 21,192,154 | 2,378
Methods | 112,230,880 | 21,725,977 | 1,959
Species | 105,618,715 | 25,901,359 | 81,036
Diseases | 82,857,385 | 24,592,233 | 21,367
Physiology | 68,504,035 | 12,703,542 | 497
Nutrition | 59,367,731 | 12,839,777 | 3,861
Cosmetology | 23,465,151 | 4,883,741 | 920
Anatomy plants, fungi | 22,326,124 | 4,212,548 | 802
Cell lines | 9,857,621 | 2,325,743 | 2,079
Toxicity | 7,986,832 | 2,858,977 | 423
Species plants, fungi | 7,444,143 | 2,345,605 | 7,347
Regions | 6,974,421 | 2,781,913 | 1,040
Herbal drugs | 162,729 | 46,830 | 131
20
Understanding Patents with Ontologies
21
Ontologies – Why?
3 reasons:
• patent claims are "ontological"
• background knowledge helps to extract the meaning of named entities
• end users work with knowledge classifications, e.g.: which natural product compound class is useful to treat inflammation of the skin?
22
Chemical Ontologies – Why?
Patent claims are "ontological"
Patent classes & ad hoc classes:
• chemical, e.g. "compounds according to claim 1", "acyl-pyrrolopyridines", any Markush structure, patent classes etc.
• uses, e.g. "anti-infectives" (antibacterial, antiviral, antiparasitic ...)
23
Chemical Ontologies – Why?
Ontology based NLP to extract the meaning of named entities
• ontology based context sensitive Named Entity resolution
...glucose... ...glucose oxidase... ...glucose oxidase activity...
finally: ...inhibitor of glucose oxidase activity...
• ontology based anaphora & cataphora resolution
Tetrahydrofuran is a commonly used solvent in organic ...
This cyclic ether has a melting point of -108.4 °C
• ontology based fingerprints
classifying documents, e.g. into patent classes
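Ontology based anaphora resolution, as in the cyclic ether example, can be sketched like this: the class named by the anaphor is matched against the ontology classes of the entities mentioned before it. The mini-ontology `CLASSES` and the function `resolve_anaphor` are hypothetical and only illustrate the idea.

```python
# Hypothetical mini-ontology: entity name -> classes it belongs to.
CLASSES = {
    "tetrahydrofuran": {"cyclic ether", "ether", "solvent"},
    "ethanol": {"alcohol", "solvent"},
}

def resolve_anaphor(anaphor_class, preceding_mentions, classes=CLASSES):
    """Return the most recent earlier mention that belongs to anaphor_class."""
    for mention in reversed(preceding_mentions):
        if anaphor_class in classes.get(mention.lower(), set()):
            return mention
    return None  # no earlier mention fits the class

if __name__ == "__main__":
    # "This cyclic ether ..." skips ethanol and resolves to the earlier mention.
    print(resolve_anaphor("cyclic ether", ["Tetrahydrofuran", "Ethanol"]))
```

Without the class knowledge, a nearest-mention heuristic would wrongly pick the last entity named before the anaphor.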
24
Ontology Based Property Extraction
3 BRAT parts of one document:
25
Understanding Patent Claims Logic
high quality patent annotations need:
• annotated text corpus “Gold Set”
• background ontologies
Relations annotated between <chemistry> & <disease>: p = is_Active_Part_Of, i = is_Instance_Of.
Antje Schlaf, Claudia Bobach, Matthias Irmer: Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patents, LREC 2014.
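Gold set annotations of this kind are commonly exchanged in the brat standoff format, where "T" lines hold text-bound entities and "R" lines hold binary relations. A minimal reader is sketched below, assuming contiguous spans only. The example annotation, including the pairing of entity and relation types, is invented for illustration and is not taken from the corpus.

```python
def parse_brat(ann_text):
    """Parse text-bound (T) and relation (R) lines of brat standoff annotation."""
    entities, relations = {}, []
    for line in ann_text.splitlines():
        if line.startswith("T"):
            # Layout: ID <tab> TYPE START END <tab> covered text
            ann_id, type_and_span, surface = line.split("\t")
            ann_type, start, end = type_and_span.split(" ")
            entities[ann_id] = {"type": ann_type, "start": int(start),
                                "end": int(end), "text": surface}
        elif line.startswith("R"):
            # Layout: ID <tab> TYPE Arg1:Tx Arg2:Ty
            body = line.split("\t")[1]
            rel_type, arg1, arg2 = body.split(" ")
            relations.append((rel_type, arg1.split(":")[1], arg2.split(":")[1]))
    return entities, relations

# Invented annotation for the text "aspirin treats inflammation".
EXAMPLE = (
    "T1\tchemistry 0 7\taspirin\n"
    "T2\tdisease 15 27\tinflammation\n"
    "R1\tis_Active_Part_Of Arg1:T1 Arg2:T2\n"
)

if __name__ == "__main__":
    entities, relations = parse_brat(EXAMPLE)
    print(entities)
    print(relations)
```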
26
End User Application Examples
27
End User: Understanding Patents
Collaboration with infoapps GmbH (Munich): ChemAnalyser
28
End User: Understanding Patents
ChemAnalyser – causative relationship mining
31
End User: Patent Big Data Analytics
Hot Compounds, hot targets ?
L. Weber, T. Böhme, M. Irmer, Pharm. Pat. Analyst 2013, 2,
Ontology-based content analysis of US patent applications from 2001–2010
32
End User: Patent Big Data Analytics
enrichment factors for chemistry related diseases...
Chemistry Concept | cardiovascular system | disease of mental health | disease of metabolism | respiratory system | nervous system | musculo-skeletal system | reproductive system | gastro-intestinal system | immune system | endocrine system
prostaglandin F2β derivatives | 557 | 0 | 0 | 0 | 607 | 427 | 0 | 0 | 375 | 0
hallucinogens | 494 | 1922 | 332 | 449 | 538 | 364 | 3146 | 622 | 199 | 1901
cichoric acid | 821 | 1662 | 432 | 1625 | 509 | 652 | 11623 | 1480 | 604 | 7239
alpha 1-adrenoceptor agonist | 821 | 0 | 267 | 1736 | 501 | 611 | 8684 | 1014 | 543 | 5636
pregn-4,9(11)-enes | 398 | 256 | 231 | 450 | 491 | 386 | 0 | 467 | 317 | 1296
canrenoic acids | 771 | 1343 | 425 | 1180 | 473 | 534 | 8474 | 1260 | 459 | 4960
aconitane derivatives | 0 | 1785 | 205 | 0 | 458 | 257 | 0 | 0 | 0 | 0
pseudoalkaloid derivatives | 0 | 1778 | 204 | 0 | 456 | 256 | 0 | 0 | 0 | 0
diterpene alkaloid derivatives | 0 | 1778 | 204 | 0 | 456 | 256 | 0 | 0 | 0 | 0
13,14-dihydro-15-keto-prostaglandin D2 derivatives | 651 | 0 | 213 | 1831 | 447 | 482 | 0 | 1188 | 521 | 3956
ripisartan derivatives | 953 | 0 | 351 | 0 | 436 | 411 | 0 | 0 | 409 | 0
potassium-sparing diuretics | 896 | 1387 | 399 | 1156 | 425 | 496 | 6456 | 1218 | 501 | 3863
steroid acids | 692 | 1193 | 379 | 1046 | 423 | 485 | 7578 | 1132 | 412 | 4418
Milfasartan | 926 | 0 | 304 | 0 | 407 | 414 | 0 | 917 | 404 | 0
pyrrolizidine alkaloids | 453 | 1041 | 293 | 1264 | 407 | 464 | 0 | 1081 | 498 | 0
milfasartan derivatives | 930 | 0 | 303 | 0 | 406 | 416 | 0 | 913 | 402 | 0
Pratosartan | 695 | 929 | 450 | 523 | 394 | 240 | 2747 | 794 | 246 | 2800
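The slide does not state how its enrichment factors are computed or scaled. One common definition, used here purely as an assumption, compares how often a disease co-occurs with a chemistry concept against how often the disease occurs in the whole corpus. The function and parameter names are hypothetical.

```python
def enrichment_factor(n_both, n_concept, n_disease, n_total):
    """Observed co-occurrence divided by the count expected under independence.

    n_both    -- documents mentioning both the chemistry concept and the disease
    n_concept -- documents mentioning the chemistry concept
    n_disease -- documents mentioning the disease
    n_total   -- all documents in the corpus
    """
    if n_concept == 0 or n_disease == 0:
        raise ValueError("concept and disease must each occur at least once")
    # Equivalent to (n_both / n_concept) / (n_disease / n_total).
    return n_both * n_total / (n_concept * n_disease)

if __name__ == "__main__":
    # Made-up counts: the disease appears in 1% of all documents
    # but in 50% of the documents that mention the concept.
    print(enrichment_factor(n_both=50, n_concept=100, n_disease=1000, n_total=100000))
```

Under this definition a value of 0 means the pair never co-occurs, and values above 1 mean the pair co-occurs more often than chance would predict.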
33
End User: Online Database ChemAnalyser
ChemAnalyser – Structure
ChemAnalyser – Full text & ontology based semantic searching
ChemAnalyser – Organic chemistry & drug discovery
ChemAnalyser – Alloys & Inorganic Materials
ChemAnalyser – Cosmetics & Nutrition
ChemAnalyser – Polymers
ChemAnalyser – REACH Report Support
34
Thanks!
Please register at
www.chemanalyser.com
for more information and a free trial.
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663Call Girls Mumbai
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girlsstephieert
 

Kürzlich hochgeladen (20)

Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 

Evaluating Patent Full Text Documents with Chemical Ontologies

  • 1. Evaluating patent full text documents with chemical ontologies OntoChem IT Solutions GmbH Blücherstr. 24 06120 Halle (Saale) Germany Tel. +49 345 4780472 Fax: +49 345 4780471 mail: info(at)ontochem.com
  • 2. Evaluating patent full text documents with chemical ontologies • spin-out from OntoChem GmbH • started 1.7.2015 • 15 chemists, bioinformatics, biologists, linguists, pharmacists • extracting knowledge from documents, selling software & services OntoChem IT Solutions GmbH Blücherstr. 24 06120 Halle (Saale) Germany Tel. +49 345 4780472 Fax: +49 345 4780471 mail: info(at)ontochem.com
  • 3. What are ontologies? Computer-readable, formal representations of knowledge that describe relationships between knowledge concepts, e.g. aspirin (acetyl salicylic acids) "is a" benzoic acid "is a" carboxylic acid. They can be used to infer, extract, search, sort and analyse knowledge.
  • 4. Chemical ontologies: ChEBI (Chemical Entities of Biological Interest, https://www.ebi.ac.uk/chebi/) has about 40,000 compounds manually classified; MeSH (Medical Subject Headings); ... PubChem.
  • 5. Classifying chemistry: SODIAC (Structure based Ontology Development and Individual Assignment Center), automated compound classification software: ontology editor, OBO specification conformity; definition of compound classes via SMARTS; chemical structure editor; sub-structure AND, OR and NOT logic, compound to class assignment; chemistry error detection; chemical hierarchy construction.
  • 6. Classifying chemistry: SODIAC uses AND/OR logic to assign the concept "Vitamin C derivatives": these compounds are described in different tautomeric forms in databases, and logic is needed for classifying correct stereochemistry in substituted compounds.
  • 7. Classifying chemistry: not straightforward... Structural chemical ontologies are often not based on sub-structures! Progesterone; 19-Norprogesterone, 4-8* more active. class: Gestagens; class: Gestagens>Progestins. Pregnane (female hormones); Androstane (male hormones). class: Gonans>Pregnans; class: Gonans>Estrans. drugbank & ChEBI: Progestin, a synthetic progestogen. parent & SSS; not parent but SSS; not parent but SSS. ChEBI: corticosteroid hormone. same family; different family.
  • 8. Chemistry ontologies. Organic chemistry: 7,586 class concepts, 29,709 class terms; 3,185 concepts linked to ChEBI concepts; 2,465 concepts linked to MeSH concepts; 68 million concepts linked to PubChem. Inorganic materials: 52.4209 concepts, 56,332 terms. Groups-substituents-fragments: 4,428 concepts, 12,754 terms. Substances: 989 concepts, 3,522 terms. Polymers: 2,361 concepts, 7,176 terms.
  • 9. Classifying chemistry, example: Acetylsalicylic acid (SODIAC v2.5.2; CHEBI:15365; MeSH:D001241). Direct parents: aromatic compounds, benzenes, carbon compounds, carboxylic acids, ethanoic acid esters, methyl esters, monocyclic compounds, oxygen compounds, salicylic acid derivatives; bioavailable molecules, hydrophilic molecules, lead like molecules, lipinski molecules, small molecules. Ancestors: 6-membered carbocycles, 6-membered cyclic compounds, acetic acid derivatives, acids, carbocycles, carbon group compounds, carbonyl compounds, carboxylic acid derivatives, carboxylic acid esters, chalcogen compounds, cyclic compounds, esters, fatty acyls, fatty esters, lipids, monocarboxylic acid derivatives, monocyclic carbocycles, organic acids, organic compounds, organic esters, salicylic acid derivatives, short chain fatty acid esters.
  • 10. Basic biology ontologies. Genes, proteins & peptides: annotation version 708,141 concepts, 2,627,612 terms; classification version 832,902 concepts, 3,177,057 terms; with linkouts to GO, InterPro, HomoloGene, HUGO, KEGG, Uniprot ... Diseases (SNOMED-CT, MedDRA, ICD-9, ICD-10, HDO, UMLS, Loinc, MeSH): annotation version 105,824 concepts, 360,077 terms. Species (based on NCBI, GRIN, IPNI, Cornucopia, World Economic Plants ...): annotation version 1,012,634 concepts, 1,664,042 terms. Anatomy: different species and stage dependent ontologies available; general anatomy 4,773 concepts, 19,450 terms.
  • 11. Other biology ontologies. Cell lines: 5,566 concepts, 13,083 terms. Cosmetology: 1,187 concepts, 2,017 terms. Effects: 35,477 concepts, 111,012 terms. Nutrition: 19,193 concepts, 115,699 terms. Physiology: 533 concepts, 619 terms. Toxicology: 1,019 concepts, 2,150 terms.
  • 12. Other ontologies (annotation versions). Countries: 245 concepts, 85,069 terms. Companies: 26,388 concepts, 5,757 terms. Material properties: 1,081 concepts, 2,428 terms. Methods: 2,502 concepts, 10,053 terms. Regions & Geopolitics: 3,774 concepts, 13,356 terms. Relations: 603 concepts, 2,290 syntaxes.
  • 13. General ontologies (annotation versions). Wikipedia: 5,200,842 concepts, 11,490,831 terms. Magnitudes & Units: 228 concepts, 510 terms. Persons: >1,000,000 persons. Relations: 603 concepts, 2,290 syntaxes.
  • 14. Understanding patents with ontologies. NLP for patents poses some unique challenges: multilingual; poor OCR (optical character recognition); multi-disciplinary; many: >90 million full text documents from >110 patent offices; large: up to 500 pages, with sentences spanning >20 pages; obscure: hand drawings, unclear language.
  • 15. Understanding patents: collaboration with infoapps GmbH (Munich). Standard full text data: US, EP, DE, WO, AT, CH, BE, CA, ES, FR, GB, MA. Standard full text data: AR, BR, CN, DK, FI, ID, EI, EN, JP, KR, MX, MY, NL, NO, RU, SE, TH, TW, VN. Original full text data, machine/human translation (EN): AR, AT, BE, BR, CA, CH, CN, DE, DK, EP, ES, FI, FR, ID, JP, KR, MX, NL, NO, RU, SE, TH, TW, VN, WO.
  • 16. OCMiner® UIMA pipeline: identify document type (picture PDF: OCR; text PDF: PDF reader; XML doc: XML reader; Office doc: Office reader); document classifier; XML detagger; language detector; normalize text; tokenize text; acronym/abbreviation detector; person annotator; document structure; domain annotators 1…n, among them the chemistry annotator (dictionary, name-2-structure, formula & molpuzzler, class/group resolution, cleanup & rule combiner, coordinated entity resolution, context handler, NE confidence); relationship extraction; consumers: BRAT, index, XML.
  • 17. Regular names in patents. BRAT (Goran Topić) file example: Akhondi SA, Klenner AG, Tyrchan C, Manchala AK, Boppana K, Lowe D, Zimmermann M, Jagarlapudi SA, Sayle R, Kors JA, Muresan S. Annotated chemical patent corpus: a gold standard for text mining. PLoS One. 2014 Sep 30;9(9):e107477. doi: 10.1371/journal.pone.0107477. eCollection 2014.
  • 18. Other name types in patents. Chemical compound: 5,7-bis(trifluoromethyl)-pyrazolo[1,5-a]pyrimidine-2-carbonitrile. Chemical class: pyrazolo[1,5-a]pyrimidines. Chemical substituent + class: 2-Bromo-, 2-fluoro-, and 2-chloro pyrazolo[1,5-a]pyrimidines.
  • 19. Named entities in patents: extracting named entities (NE) from infoapps patents; from 19 million patents with chemistry, selected 4.7 million patents from 2001-2010 (publication year).
      Ontology | term annotation count | unique concepts per doc | unique concepts
      Chemistry | 1,465,510,682 | 294,771,572 | ?
      Proteins | 204,902,329 | 30,167,344 | 67,993
      Anatomy non-plants | 126,856,048 | 21,192,154 | 2,378
      Methods | 112,230,880 | 21,725,977 | 1,959
      Species | 105,618,715 | 25,901,359 | 81,036
      Diseases | 82,857,385 | 24,592,233 | 21,367
      Physiology | 68,504,035 | 12,703,542 | 497
      Nutrition | 59,367,731 | 12,839,777 | 3,861
      Cosmetology | 23,465,151 | 4,883,741 | 920
      Anatomy plants, fungi | 22,326,124 | 4,212,548 | 802
      Cell lines | 9,857,621 | 2,325,743 | 2,079
      Toxicity | 7,986,832 | 2,858,977 | 423
      Species plants, fungi | 7,444,143 | 2,345,605 | 7,347
      Regions | 6,974,421 | 2,781,913 | 1,040
      Herbal drugs | 162,729 | 46,830 | 131
  • 21. Ontologies: why? Three reasons: (1) patent claims are "ontological"; (2) background knowledge helps to extract the meaning of named entities; (3) end users use knowledge classifications, e.g. which natural product compound class is useful to treat inflammation of the skin?
  • 22. Chemical ontologies: why? Patent claims are "ontological": patent classes & ad hoc classes, e.g. chemical: "compounds according to claim 1", "acyl-pyrrolopyridines", any Markush structure, patent classes etc.; e.g. uses: "anti-infectives" (e.g. antibacterial, antiviral, antiparasitic ...).
  • 23. Chemical ontologies: why? Ontology-based NLP to extract the meaning of named entities: (a) ontology-based, context-sensitive named entity resolution: ...glucose..., ...glucose oxidase..., ...glucose oxidase activity..., finally: ...inhibitor of glucose oxidase activity...; (b) ontology-based anaphora & cataphora resolution: "Tetrahydrofuran is a commonly used solvent in organic ... This cyclic ether has a melting point of -108.4 °C"; (c) ontology-based fingerprints classifying documents, e.g. into patent classes.
  • 24. Ontology-based property extraction: 3 BRAT parts of one document.
  • 25. Understanding patent claims logic. High quality patent annotations need: an annotated text corpus ("Gold Set"); background ontologies. Annotated between <chemistry> & <disease>: p=is_Active_Part_Of, i=is_Instance_Of. Antje Schlaf, Claudia Bobach, Matthias Irmer: Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patents, LREC 2014.
  • 27. End user: understanding patents. Collaboration with infoapps GmbH (Munich): ChemAnalyser.
  • 28. End user: understanding patents. ChemAnalyser: causative relationship mining.
  • 29. End user: understanding patents. ChemAnalyser: causative relationship mining.
  • 30. End user: understanding patents. ChemAnalyser: causative relationship mining.
  • 31. End user: patent big data analytics. Hot compounds, hot targets? L. Weber, T. Böhme, M. Irmer, Ontology-based content analysis of US patent applications from 2001–2010, Pharm. Pat. Analyst 2013, 2.
  • 32. End user: patent big data analytics. Enrichment factors for chemistry related diseases...
      Chemistry concept | cardiovascular system | disease of mental health | disease of metabolism | respiratory system | nervous system | musculo-skeletal system | reproductive system | gastro-intestinal system | immune system | endocrine system
      prostaglandin F2β derivatives | 557 | 0 | 0 | 0 | 607 | 427 | 0 | 0 | 375 | 0
      hallucinogens | 494 | 1922 | 332 | 449 | 538 | 364 | 3146 | 622 | 199 | 1901
      cichoric acid | 821 | 1662 | 432 | 1625 | 509 | 652 | 11623 | 1480 | 604 | 7239
      alpha 1-adrenoceptor agonist | 821 | 0 | 267 | 1736 | 501 | 611 | 8684 | 1014 | 543 | 5636
      pregn-4,9(11)-enes | 398 | 256 | 231 | 450 | 491 | 386 | 0 | 467 | 317 | 1296
      canrenoic acids | 771 | 1343 | 425 | 1180 | 473 | 534 | 8474 | 1260 | 459 | 4960
      aconitane derivatives | 0 | 1785 | 205 | 0 | 458 | 257 | 0 | 0 | 0 | 0
      pseudoalkaloid derivatives | 0 | 1778 | 204 | 0 | 456 | 256 | 0 | 0 | 0 | 0
      diterpene alkaloid derivatives | 0 | 1778 | 204 | 0 | 456 | 256 | 0 | 0 | 0 | 0
      13,14-dihydro-15-keto-prostaglandin D2 derivatives | 651 | 0 | 213 | 1831 | 447 | 482 | 0 | 1188 | 521 | 3956
      ripisartan derivatives | 953 | 0 | 351 | 0 | 436 | 411 | 0 | 0 | 409 | 0
      potassium-sparing diuretics | 896 | 1387 | 399 | 1156 | 425 | 496 | 6456 | 1218 | 501 | 3863
      steroid acids | 692 | 1193 | 379 | 1046 | 423 | 485 | 7578 | 1132 | 412 | 4418
      Milfasartan | 926 | 0 | 304 | 0 | 407 | 414 | 0 | 917 | 404 | 0
      pyrrolizidine alkaloids | 453 | 1041 | 293 | 1264 | 407 | 464 | 0 | 1081 | 498 | 0
      milfasartan derivatives | 930 | 0 | 303 | 0 | 406 | 416 | 0 | 913 | 402 | 0
      Pratosartan | 695 | 929 | 450 | 523 | 394 | 240 | 2747 | 794 | 246 | 2800
  • 33. End user: online database ChemAnalyser. ChemAnalyser: Structure; Full text & ontology based semantic searching; Organic chemistry & drug discovery; Alloys & Inorganic Materials; Cosmetics & Nutrition; Polymers; Reach Report Support.
  • 34. Thanks! Please register at www.chemanalyser.com for more information and a free trial.
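
The "is a" relationships on slide 3 can be illustrated with a small sketch. It is not taken from the slides: the dictionary `IS_A`, the function names, and the reading of the chain as aspirin → benzoic acid → carboxylic acid are assumptions made only for illustration.

```python
# Hypothetical toy ontology: each concept maps to its direct "is a" parents.
# The chain follows one reading of slide 3 and is an assumption.
IS_A = {
    "aspirin": ["benzoic acid"],
    "benzoic acid": ["carboxylic acid"],
    "carboxylic acid": [],
}


def ancestors(concept, graph=IS_A):
    """Return every concept reachable via "is a" edges, nearest first."""
    found = []
    queue = list(graph.get(concept, []))
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(graph.get(parent, []))
    return found


def is_subclass_of(concept, cls, graph=IS_A):
    """True if `cls` can be inferred as an ancestor of `concept`."""
    return cls in ancestors(concept, graph)


if __name__ == "__main__":
    print(ancestors("aspirin"))
    print(is_subclass_of("aspirin", "carboxylic acid"))
```

A search for "carboxylic acid" could then also return documents that only mention aspirin, which is the kind of inference the slide refers to.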
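
The AND, OR and NOT logic for compound-to-class assignment (slides 5 and 6) might be sketched as follows. This is not SODIAC's implementation: real sub-structure matching via SMARTS needs a cheminformatics toolkit, so the sketch assumes the matches are already computed and passed in as a set of pattern names. All pattern names and the rule itself are hypothetical placeholders.

```python
# Each rule is a function from a set of matched pattern names to a bool.
def has(name):
    return lambda hits: name in hits


def all_of(*rules):  # AND
    return lambda hits: all(rule(hits) for rule in rules)


def any_of(*rules):  # OR
    return lambda hits: any(rule(hits) for rule in rules)


def none_of(*rules):  # NOT
    return lambda hits: not any(rule(hits) for rule in rules)


# Hypothetical class definition: a core ring, in either of two tautomeric
# forms, and without a pattern that marks the wrong stereochemistry.
VITAMIN_C_DERIVATIVE = all_of(
    has("lactone ring"),
    any_of(has("enediol form"), has("diketo form")),
    none_of(has("inverted stereocentre")),
)

if __name__ == "__main__":
    print(VITAMIN_C_DERIVATIVE({"lactone ring", "diketo form"}))
```

The OR branch covers a compound stored in different tautomeric forms, and the NOT branch excludes compounds with the wrong stereochemistry, mirroring the two needs named on slide 6.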
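
The glucose example on slide 23 can be approximated with a longest-match dictionary lookup, so that "glucose oxidase activity" is annotated as one entity and not as the chemical "glucose". This is a minimal sketch, not the OCMiner method: the dictionary, its domain labels, and whitespace tokenization are assumptions.

```python
# Hypothetical term dictionary: token tuples mapped to assumed domain labels.
DICTIONARY = {
    ("glucose",): "chemistry",
    ("glucose", "oxidase"): "protein",
    ("glucose", "oxidase", "activity"): "effect",
}


def annotate(text, dictionary=DICTIONARY):
    """Greedy left-to-right annotation, preferring the longest matching term."""
    tokens = text.lower().split()
    max_len = max(len(key) for key in dictionary)
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in dictionary:
                annotations.append((" ".join(key), dictionary[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return annotations


if __name__ == "__main__":
    print(annotate("inhibitor of glucose oxidase activity"))
```

Resolving the surrounding context ("inhibitor of ...") would need a further step on top of this lookup, which the sketch does not attempt.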