InfoChem GmbH © 2015 Dr. Josef Eiblmaier, ICIC 2015 Nice, October 18 – 21
1 / 37
Automatic chemical annotation of large full-text patent corpora. Pitfalls, challenges and user benefits
J. Eiblmaier, D. Geppert, L. Isenko, H. Saller
ICIC 2015 Nice, October 18 – 21
2 / 37
» Introduction: Chemical Named Entity
Recognition - One size fits all?
» Show Case Projects
- The Hard ‘n’ Heavy: Chemisches Zentralblatt
- The Tricky: Wiley Smart Article
- The Networked: Springer Chemistry Demonstrator
- The Elegant: PATENTSCOPE
- The Powerful: FIZ Karlsruhe Full-text Databases
» Conclusion
Outline
© cora / PIXELIO, www.pixelio.de
4 / 37
One Size Fits All?
5 / 37
Different Sources
Stefan Emilius / pixelio.de
[...] Wedged etched SiOz film 50 nm -} 2 nm with refractive index
measurement o~~~~~~~U-~ o » ~ ~ ~ ~ ~ ~ ~ ~ Psi (deg] oL-~~-L~~~~~
25 30 3S also be used for in situ thin film evaluation (4). Therefore the
ellipsometric heads are at- tached to the specific process chamber via opti -
cal windows. De- pending on growth/etch rates thicknesses can be
measured (end- point) with ac- curacies in the .••• - 81"",laUG" - " .. IU,.,..."t
0 1 1 0 1 1 2 0 Psi (deg[a) Theoretical Psi/Delta curves of an etched SiOz
film on silicon b) growth of a-Si on glass subs tate [...]
Different Aims!
XML
PDF
HTML
TXT
SGML
6 / 37
Discoverability
7 / 37
Interactivity
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
8 / 37
Semantics
Valium OR Diazepam
Valium OR Diazepam OR Ansiolisina OR Diazemuls OR Relanium OR Stesolid
OR Apaurin OR Faustan OR Seduxen OR Sibazon OR Methyldiazepinone OR
Calmocitene OR Neurolytril OR Bialzepam OR Ceregulart OR Condition OR
Diazetard OR Liberetas OR Relaminal OR Serenamin OR Tranquirit OR Ansiolin
OR Apozepam OR Atensine OR Bensedin OR Calmpose OR Diacepan OR
Diazepan OR Dipezona OR Domalium OR Kiatrium OR Paranten OR Quetinil
OR Quiatril OR Quievita OR Renborin OR Ruhsitus OR Seduksen OR Serenack
OR Serenzin OR Stesolin OR Tensopam OR Horizon OR Lembrol OR Morosan
OR Saromet OR Sedipam OR Setonil OR Anxionil OR Benzopin OR Calmaven OR
Chuansuan OR Desconet OR Desloneg OR Diaceplex OR Diazepin OR
Gewacalm OR Jinpanfan OR Mentalium OR Metamidol OR Nixtensyn OR
Novodipam OR Pacitran OR Paralium OR Prozepam OR Psychopax OR
Radizepam OR Simasedan OR Trankinon OR Trazepam OR Valaxona OR
Valiquid OR Valuzepam OR Vanconin OR Antenex OR Arzepam OR Betapam
OR Diapine OR Diaquel OR
7-Chloro-1,3-dihydro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR
NCGC00178168-01 OR WLN: T67 GNV JN IHJ CG G1 KR OR
2H-1,4-Benzodiazepin-2-one, 7-chloro-1,3-dihydro-1-methyl-5-phenyl- OR
CPD000058398 OR SAM001246536 OR SMR000058398 OR 439-14-5 OR
7-Chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2(1H)-one OR
7-Chloro-1-methyl-2-oxo-5-phenyl-3H-1,4-benzodiazepine OR
7-Chloro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR
C06948 OR D00293 OR 5-24-04-00300 OR D003975 OR A3662/0155188 OR I06-0194 OR
1-Methyl-5-phenyl-7-chloro-1,3-dihydro-2H-1,4-benzodiazepin-2-one OR
7-Chloro-1-methyl-5-3H-1,4-benzodiazepin-2(1H)-one OR
7-chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2-one OR
DZP OR Dap OR Pax OR 11100-37-1 OR 53320-84-6 OR
InChI=1/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H
...
(361 Depositor-Supplied Synonyms in PubChem)
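The synonym list above shows why keyword search alone is brittle: a query only finds the names the searcher thinks of. Once every recognized name is mapped to one identifier, a single lookup covers all of them. A minimal Python sketch of that mapping, assuming a small hypothetical synonym table keyed by the CAS Registry Number 439-14-5 from the list (a production system would load the full synonym set rather than six names):

```python
from typing import Optional

# Hypothetical, heavily truncated synonym table: canonical ID -> known names.
SYNONYMS = {
    "439-14-5": ["Diazepam", "Valium", "Stesolid", "Seduxen", "Relanium", "DZP"],
}

# Invert the table once so that each name (case-insensitive) resolves to its ID.
NAME_TO_ID = {name.lower(): cid for cid, names in SYNONYMS.items() for name in names}


def normalize(term: str) -> Optional[str]:
    """Return the canonical identifier for a name, or None if it is unknown."""
    return NAME_TO_ID.get(term.lower())


def expand_query(term: str) -> str:
    """Return a keyword OR-query covering every known synonym of `term`."""
    cid = normalize(term)
    if cid is None:
        return term  # unknown name: fall back to a literal search
    return " OR ".join(SYNONYMS[cid])


print(expand_query("valium"))  # Diazepam OR Valium OR Stesolid OR Seduxen OR Relanium OR DZP
print(normalize("Stesolid"))   # 439-14-5
```

With annotated text, the search engine only needs `normalize`; the long OR-query from `expand_query` is what a user would otherwise have to type by hand.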
9 / 37
Linkability
10 / 37
Structurability
11 / 37
Outline
© cora / PIXELIO, www.pixelio.de
» Introduction: Chemical Named Entity
Recognition - One size fits all?
» Show Case Projects
- The Hard ‘n’ Heavy: Chemisches Zentralblatt
- The Tricky: Wiley Smart Article
- The Networked: Springer Chemistry Demonstrator
- The Elegant: PATENTSCOPE
- The Powerful: FIZ Karlsruhe Full-text Databases
» Conclusion
12 / 37
The Hard ‘n’ Heavy: Chemisches Zentralblatt
© Joachim Reisig / pixelio.de
Named Entity Recognition project to create
» a web-based, language-independent structure database
» Chemisches Zentralblatt: the first and oldest abstracting journal in chemistry,
covering the chemical literature from 1830 to 1969
» Two million abstracts (900,000 pages, in German)
13 / 37
The Hard ‘n’ Heavy: Chemisches Zentralblatt
» Print-through from the back of the page
» Poor quality of the original source: blotted and stained pages
14 / 37
Challenges: OCR
[Sample pages from 1830, 1870, 1910, 1930 and 1969]
15 / 37
» Ambiguous old typefaces (h/b and c/e confusions; ligatures)
» Letter-spaced text
- Specific rules, large German dictionaries and extensive training are
applied to correct systematic errors of the standard OCR process
Challenges: OCR
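The dictionary-based correction step can be illustrated with a small Python sketch. It assumes only the two confusion pairs named on the slide (h/b and c/e) and a hypothetical four-word lexicon standing in for the large German dictionaries; the actual InfoChem rules and training are not described here. Candidates are generated by swapping confusable characters, and the known word needing the fewest swaps wins:

```python
from itertools import product

# Character pairs that old typefaces make ambiguous (from the slide: h=b, c=e).
CONFUSIONS = {"h": "b", "b": "h", "c": "e", "e": "c"}

# Hypothetical, tiny lexicon standing in for large German dictionaries.
LEXICON = {"schwefel", "natrium", "chlorür", "säure"}


def correct_token(token: str, lexicon: set, max_changes: int = 2) -> str:
    """Return the lexicon word reachable with the fewest confusion swaps, else `token`."""
    if token.lower() in lexicon:
        return token  # already a known word
    # For every character, list the original and (if any) its confusable twin.
    options = [(ch, CONFUSIONS[ch]) if ch in CONFUSIONS else (ch,) for ch in token]
    best = None  # (number_of_changes, candidate)
    for chars in product(*options):
        changes = sum(a != b for a, b in zip(chars, token))
        if changes == 0 or changes > max_changes:
            continue
        candidate = "".join(chars)
        if candidate.lower() in lexicon and (best is None or changes < best[0]):
            best = (changes, candidate)
    return best[1] if best else token


print(correct_token("Scbwefel", LEXICON))  # Schwefel
```

Enumerating every combination grows exponentially with the number of confusable characters, so this only suits short tokens; a larger system would more likely use weighted edit distance or a language model.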
16 / 37
Challenges: Annotation
» Obsolete German terminology
- Schwefelsaures Natrium, Aurantiin
- Chlorür, Bromür, Oxydul
» Historical names
- Pelopeum → Columbium → Niobium
» Different spellings of the same name:
- Dibrom… / Bibrom…
- Ätzkali / Aetzkali
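Spelling variants such as those above are usually handled by mapping every name to a normalized key before dictionary lookup. A minimal Python sketch, assuming hypothetical rules derived only from the slide's examples: umlaut transliteration, a historical "Bi-" prefix for "Di-" before halogen stems, and a small table of historical element names:

```python
import re

# Umlauts and their common transliterations (Ätzkali vs. Aetzkali).
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

# Hypothetical rule: historical "Bi-" for "Di-" before halogen stems (Bibrom… vs. Dibrom…).
BI_PREFIX = re.compile(r"^bi(?=(?:brom|chlor|jod))")

# Historical element names mapped to the current name (examples from the slide).
HISTORICAL = {"pelopeum": "niobium", "columbium": "niobium"}


def normalize_name(name: str) -> str:
    """Map a (possibly historical) German chemical name to a lookup key."""
    key = name.lower()
    for umlaut, replacement in UMLAUTS.items():
        key = key.replace(umlaut, replacement)
    key = BI_PREFIX.sub("di", key)
    return HISTORICAL.get(key, key)


print(normalize_name("Ätzkali"), normalize_name("Aetzkali"))  # aetzkali aetzkali
```

Both the dictionary and the incoming text pass through the same function, so variant spellings meet at one key.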
17 / 37
Results
» 2.4 million chemical names with
associated structures
- 1 million unique names
- 500,000 unique structures
- Results linked to the original source (PDF)
» Benchmark
- Recall: 53.8%
- Precision: 89.7%
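The slide reports recall and precision but not how a hit was counted. The sketch below assumes exact matching of (start, end, label) spans against a manually annotated gold standard, which is one common convention; the span tuples in the example are hypothetical. It also computes F1, the harmonic mean of the two figures:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(predicted: set, gold: set) -> tuple:
    """Exact-match precision, recall and F1 for sets of (start, end, label) spans."""
    tp = len(predicted & gold)  # same boundaries and same label
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical spans: one correct hit, one false hit, one missed name.
print(evaluate({(0, 8, "chem"), (10, 15, "chem")}, {(0, 8, "chem"), (20, 25, "chem")}))
```

Applied to the reported figures, `f1_score(0.897, 0.538)` comes to about 0.67: the annotator rarely produces false hits but misses almost half of the names in the gold standard.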
18 / 37
The Networked: Springer Chemistry Demonstrator
© Stephanie Hofschlaeger / pixelio.de
» Large scale automatic extraction of chemical
entities from SpringerLink documents
» Joint definition of output formats (inline and
standoff XML)
» Semantic enrichment of chemically relevant
SpringerLink documents (> 2,700 titles)
» Creation of a chemical registry covering all
Springer / InfoChem chemistry sources that
contain structural information
» Implementation of an online demonstrator that
interlinks different data repositories via the
chemical structure
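Standoff and inline XML carry the same annotations in different places: standoff output stores character offsets next to the untouched text, inline output wraps each entity in the text itself. A minimal Python sketch of converting one into the other, assuming non-overlapping (start, end, label) offsets and a hypothetical `<chem type="...">` element (the jointly defined Springer formats are not shown on the slide):

```python
from xml.sax.saxutils import escape, quoteattr


def standoff_to_inline(text: str, annotations: list, tag: str = "chem") -> str:
    """Wrap each (start, end, label) span of `text` in an inline XML element."""
    out = []
    pos = 0
    for start, end, label in sorted(annotations):
        if start < pos:
            raise ValueError("overlapping annotations are not supported in this sketch")
        out.append(escape(text[pos:start]))  # plain text before the entity
        out.append(f"<{tag} type={quoteattr(label)}>{escape(text[start:end])}</{tag}>")
        pos = end
    out.append(escape(text[pos:]))  # trailing plain text
    return "".join(out)


print(standoff_to_inline("Diazepam in water", [(0, 8, "chemical")]))
# <chem type="chemical">Diazepam</chem> in water
```

Keeping standoff as the master format and deriving inline markup on demand avoids re-running recognition when the target schema changes.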
19 / 37
[Diagram: structures from the Reactions Database and annotated structures from full-text sources (2,000 annotated documents) feed a Central Compound Registry]
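Interlinking repositories via the chemical structure amounts to a registry that maps each structure to the records that mention it. A minimal in-memory Python sketch, assuming InChIKeys as structure keys and hypothetical source IDs; the slide does not say which identifier the Central Compound Registry actually uses:

```python
from collections import defaultdict


class CompoundRegistry:
    """Minimal in-memory registry: structure key -> IDs of sources that mention it."""

    def __init__(self):
        self._sources = defaultdict(set)

    def register(self, structure_key: str, source_id: str) -> None:
        self._sources[structure_key].add(source_id)

    def sources_for(self, structure_key: str) -> set:
        # .get() avoids creating empty entries for unknown keys
        return set(self._sources.get(structure_key, ()))

    def linked_sources(self, source_id: str) -> set:
        """All other sources that share at least one structure with `source_id`."""
        linked = set()
        for sources in self._sources.values():
            if source_id in sources:
                linked |= sources
        linked.discard(source_id)
        return linked


registry = CompoundRegistry()
registry.register("AAOVKJBEBIDNHE-UHFFFAOYSA-N", "fulltext:doc-1")    # hypothetical IDs
registry.register("AAOVKJBEBIDNHE-UHFFFAOYSA-N", "reactions:rxn-42")
print(registry.linked_sources("fulltext:doc-1"))  # {'reactions:rxn-42'}
```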
20 / 37
21 / 37
22 / 37
The Tricky: Wiley Smart Article
» Real-time bimodal extraction of chemical
information
- ChemDraw files
- Full text (five journals, two reference works)
» Merged into the Wiley XML
» KNIME-based workflow developed at
InfoChem, installed at Wiley Hoboken
» Entity classes: Chemicals, Reagents/Catalysts, Drugs,
Reaction Types, Chemical Technology
» Challenge: merging information from different
domains and sources (overlapping, conflicting and
missing concepts)
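Merging output from two extractors (here, ChemDraw files and full text) means deciding what happens when spans overlap or disagree. The Python sketch below implements one simple, hypothetical policy: keep the leftmost, longest span, drop spans that overlap it, and report identical spans whose labels conflict. A real workflow would more likely rank sources by reliability than break label ties alphabetically, as this sketch does:

```python
def merge_annotations(*sources):
    """Merge (start, end, label) spans from several extractors.

    Returns (merged, conflicts): the kept spans, plus pairs of spans that cover
    exactly the same text but carry different labels.
    """
    # Leftmost first, then longest first, then label (deterministic tie-break).
    candidates = sorted(
        {span for source in sources for span in source},
        key=lambda s: (s[0], -(s[1] - s[0]), s[2]),
    )
    merged, conflicts = [], []
    for span in candidates:
        if merged and span[0] < merged[-1][1]:  # overlaps the last kept span
            kept = merged[-1]
            if span[:2] == kept[:2] and span[2] != kept[2]:
                conflicts.append((kept, span))
            continue
        merged.append(span)
    return merged, conflicts


chemdraw = [(0, 15, "chemical")]                                   # hypothetical spans
fulltext = [(0, 6, "chemical"), (0, 15, "reagent"), (20, 28, "drug")]
print(merge_annotations(chemdraw, fulltext))
```

Spans found by only one source survive the union, which covers the "missing concepts" case; the conflict list is what a curator or a priority rule would then resolve.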
23 / 37
Text annotation: Chemistry enrichment workflow*
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
24 / 37
[Workflow diagram: XML and SDfile → ICScheme Processor → XML+ and HTML]
25 / 37
(…) <enrichedObject relevance="primary" xml:id="asia266-eo-1234" associatedDataRef="#asia266-sch-0016">
<label>6</label><mediaResource mimeType="chemical/x-mol-file" href="enrich_out/asia266-eo-1234.sdf" alt="chemical compound"/>
</enrichedObject> (…)
(…) <infoAsset type="drugGenericName chemicalName" xml:id="asia573-info-0004">Himbacine</infoAsset> (<link href="#asia573-eo-0001"/>),<link href="#bib1">1</link> has shown (…)
*Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
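Downstream consumers of the enriched XML need to follow an enrichedObject to its structure file. A minimal Python sketch using the standard library's ElementTree, with element and attribute names taken from the snippet above; the full Wiley schema is not shown, so treat the field selection as an illustration. Note that `xml:id` lives in the predefined XML namespace, so ElementTree reports it under its expanded name:

```python
import xml.etree.ElementTree as ET

# Expanded name of the reserved "xml" prefix, predefined by the XML specification.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

FRAGMENT = """<enrichedObject relevance="primary" xml:id="asia266-eo-1234"
    associatedDataRef="#asia266-sch-0016">
  <label>6</label>
  <mediaResource mimeType="chemical/x-mol-file"
      href="enrich_out/asia266-eo-1234.sdf" alt="chemical compound"/>
</enrichedObject>"""


def read_enriched_object(xml_text: str) -> dict:
    """Extract the ID, label, relevance and linked structure file of one enrichedObject."""
    elem = ET.fromstring(xml_text)
    media = elem.find("mediaResource")
    return {
        "id": elem.get(XML_NS + "id"),
        "label": elem.findtext("label"),
        "relevance": elem.get("relevance"),
        "structure_file": media.get("href") if media is not None else None,
    }


print(read_enriched_object(FRAGMENT)["structure_file"])  # enrich_out/asia266-eo-1234.sdf
```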
26 / 37
27 / 37
The Elegant: Addition of chemical search capabilities to
the WIPO PATENTSCOPE search system
» Chemical annotation of PATENTSCOPE
full-text patent documents
» Replacement of detected compound names by
their corresponding InChIKey (IUPAC
International Chemical Identifier key)
» Recognition of graphical representations of
chemical compounds in PATENTSCOPE
documents
» Solr/Lucene-based exact structure search
» Workflow system KNIME
» Extension of the PATENTSCOPE GUI with
chemical structure search
28 / 37
(…) At the moment the surgical procedure starts, benzodiazepin, e.g.
diazepam, is administered in a dose of no more than 5 mg. (…)
(…) At the moment the surgical procedure starts, benzodiazepin, e.g.
@AAOVKJBEBIDNHE-UHFFFAOYSA-N@, is administered in a dose of
no more than 5 mg. (…)
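The replacement step can be sketched as a dictionary lookup followed by a regular-expression substitution. This Python sketch assumes a hypothetical one-entry lexicon (the diazepam InChIKey is the one from the example); it leaves out the hard part, recognizing names that are not in any dictionary, which a production annotator must also handle:

```python
import re

# Hypothetical lookup table (lowercase names) produced by the recognition step.
NAME_TO_INCHIKEY = {"diazepam": "AAOVKJBEBIDNHE-UHFFFAOYSA-N"}


def replace_with_inchikeys(text: str, lexicon: dict) -> str:
    """Replace recognized compound names with @InChIKey@ tokens (case-insensitive)."""
    if not lexicon:
        return text
    # Longest names first, so that e.g. "sodium chloride" wins over "sodium".
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: "@" + lexicon[m.group(1).lower()] + "@", text)


sentence = "At the moment the surgical procedure starts, benzodiazepin, e.g. diazepam, is administered"
print(replace_with_inchikeys(sentence, NAME_TO_INCHIKEY))
```

The word boundaries keep "benzodiazepin" untouched while "diazepam" becomes `@AAOVKJBEBIDNHE-UHFFFAOYSA-N@`.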
29 / 37
[Diagram: PATENTSCOPE documents → enriched PATENTSCOPE documents; in the example sentence, "diazepam" is replaced by its InChIKey, @AAOVKJBEBIDNHE-UHFFFAOYSA-N@]
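Once names are replaced by @InChIKey@ tokens, exact structure search reduces to an exact-term lookup, which is what a Solr/Lucene index provides. The Python sketch below uses a plain dictionary in place of Solr and hypothetical document IDs. It also shows a first-block query: the first 14 characters of an InChIKey encode the molecular skeleton (connectivity), so matching on them finds, for example, stereoisomers of the query structure:

```python
import re
from collections import defaultdict

# InChIKey layout: 14-char skeleton block, 10-char block, 1-char protonation flag.
TOKEN = re.compile(r"@([A-Z]{14}-[A-Z]{10}-[A-Z])@")


def build_index(documents: dict) -> dict:
    """Map each @InChIKey@ token found in the texts to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for key in TOKEN.findall(text):
            index[key].add(doc_id)
    return index


def exact_search(index: dict, inchikey: str) -> set:
    return set(index.get(inchikey, ()))


def skeleton_search(index: dict, inchikey: str) -> set:
    """Match on the first block only (same connectivity, any stereo/isotope layer)."""
    first_block = inchikey[:14]
    return {doc for key, docs in index.items() if key[:14] == first_block for doc in docs}


docs = {  # hypothetical document IDs
    "WO-1": "benzodiazepin, e.g. @AAOVKJBEBIDNHE-UHFFFAOYSA-N@, is administered",
    "WO-2": "no chemistry here",
}
index = build_index(docs)
print(exact_search(index, "AAOVKJBEBIDNHE-UHFFFAOYSA-N"))  # {'WO-1'}
```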
30 / 37
31 / 37
32 / 37
The Powerful: "Chemical Annotation for Online Systems"
» Joint research project FIZ Karlsruhe /
InfoChem
» Chemical annotation of full-text patent
databases (tens of millions of documents)
» Approx. 800,000 updated documents per week
» First show case: EP full-text patents
33 / 37
» First show case: EP full-text patents
- Bibliographic data and full text of patent applications / granted patents
published by the European Patent Office
- 1978 to present, weekly updates
- More than 4.05 million family records with more than 6.9 million
publications (as of August 2015)
» Main challenges of the annotation: performance, quality, integrability
» Integration of ICANNOTATOR into a Hadoop environment / FIZ
proprietary workflow system (first results: "86 ms/document with
Hadoop and 2.1 s without Hadoop")
» Joint definition of output formats based on the FIZ XML DTD
» Joint creation of annotation guidelines and a benchmark set of 100
manually annotated patent documents
» Envisaged user scenarios:
- searching for structural chemical information in full texts
- linking text and structural information
- supporting the evaluation of patent full texts
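The first timings can be put against the weekly update volume with a back-of-the-envelope calculation. The Python sketch assumes that the quoted times are average wall-clock times per document and that processing scales linearly; the slides state neither, and the 800,000 figure refers to the overall weekly update volume rather than the EP show case alone:

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_hours(documents_per_week: int, seconds_per_document: float) -> float:
    """Hours needed for one week's update, assuming a constant per-document time."""
    return documents_per_week * seconds_per_document / SECONDS_PER_HOUR


for label, seconds in [("without Hadoop", 2.1), ("with Hadoop", 0.086)]:
    hours = weekly_hours(800_000, seconds)
    print(f"{label}: {hours:.1f} h per weekly update ({hours / HOURS_PER_WEEK:.0%} of a week)")
```

Under these assumptions, 800,000 documents take roughly 467 hours at 2.1 s each, almost three times the 168 hours in a week, but only about 19 hours at 86 ms each.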
34 / 37
Outline
© cora / PIXELIO, www.pixelio.de
» Introduction: Chemical Named Entity
Recognition - One size fits all?
» Show Case Projects
- The Hard ‘n’ Heavy: Chemisches Zentralblatt
- The Tricky: Wiley Smart Article
- The Networked: Springer Chemistry Demonstrator
- The Elegant: PATENTSCOPE
- The Powerful: FIZ Karlsruhe Full-text Databases
» Conclusion
35 / 37
36 / 37
» Wiley
- Michael Forster
- Reinhard Neudert
» FIZ Karlsruhe
- Leni Helmes
- Michael Schwantner
» WIPO
- Christophe Mazenc
- Paul Halfpenny
» The InfoChem Team
Acknowledgements
© P. Storz / PIXELIO, www.pixelio.de
37 / 37
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterDr. Haxel Consult
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCDr. Haxel Consult
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
 

Mehr von Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Kürzlich hochgeladen

Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.krishnachandrapal52
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样ayvbos
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftAanSulistiyo
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdfMatthew Sinclair
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirtrahman018755
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolinonuriaiuzzolino1
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查ydyuyu
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfJOHNBEBONYAP1
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查ydyuyu
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptxAsmae Rabhi
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Roommeghakumariji156
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...kajalverma014
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制pxcywzqs
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查ydyuyu
 

Kürzlich hochgeladen (20)

Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 

Automatic Chemical Annotation of Large Full-Text Patent Corpora. Pitfalls, Challenges and User Benefits

  • 1. Automatic chemical annotation of large full-text patent corpora. Pitfalls, challenges and user benefits
    J. Eiblmaier, D. Geppert, L. Isenko, H. Saller
    ICIC 2015, Nice, October 18–21
    InfoChem GmbH © 2015 Dr. Josef Eiblmaier
  • 2. Outline
    » Introduction: Chemical Named Entity Recognition - One size fits all?
    » Show Case Projects
      - The Hard 'n' Heavy: Chemisches Zentralblatt
      - The Tricky: Wiley Smart Article
      - The Networked: Springer Chemistry Demonstrator
      - The Elegant: PATENTSCOPE
      - The Powerful: FIZ Karlsruhe Full-text Databases
    » Conclusion
    (Image © cora / PIXELIO, www.pixelio.de)
  • 3. Outline (same as slide 2; image © cora / PIXELIO, www.pixelio.de)
  • 4. One Size Fits All?
  • 5. Different Sources
    (Image: Stefan Emilius / pixelio.de)
    Example of noisy source text: "[...] Wedged etched SiOz film 50 nm -} 2 nm with refractive index measurement o~~~~~~~U-~ o » ~ ~ ~ ~ ~ ~ ~ ~ Psi (deg] oL-~~-L~~~~~ 25 30 3S also be used for in situ thin film evaluation (4). Therefore the ellipsometric heads are at- tached to the specific process chamber via opti - cal windows. De- pending on growth/etch rates thicknesses can be measured (end- point) with ac- curacies in the .••• - 81"",laUG" - " .. IU,.,..."t 0 1 1 0 1 1 2 0 Psi (deg[a) Theoretical Psi/Delta curves of an etched SiOz film on silicon b) growth of a-Si on glass subs tate [...]"
    Different Aims!
    Formats: XML, PDF, HTML, TXT, SGML
  • 6. Discoverability
  • 7. Interactivity
    (Source: Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin)
  • 8. Semantics
    Simple query: Valium OR Diazepam
    Query covering the known synonyms: Valium OR Diazepam OR Ansiolisina OR Diazemuls OR Relanium OR Stesolid OR Apaurin OR Faustan OR Seduxen OR Sibazon OR Methyldiazepinone OR Calmocitene OR Neurolytril OR Bialzepam OR Ceregulart OR Condition OR Diazetard OR Liberetas OR Relaminal OR Serenamin OR Tranquirit OR Ansiolin OR Apozepam OR Atensine OR Bensedin OR Calmpose OR Diacepan OR Diazepan OR Dipezona OR Domalium OR Kiatrium OR Paranten OR Quetinil OR Quiatril OR Quievita OR Renborin OR Ruhsitus OR Seduksen OR Serenack OR Serenzin OR Stesolin OR Tensopam OR Horizon OR Lembrol OR Morosan OR Saromet OR Sedipam OR Setonil OR Anxionil OR Benzopin OR Calmaven OR Chuansuan OR Desconet OR Desloneg OR Diaceplex OR Diazepin OR Gewacalm OR Jinpanfan OR Mentalium OR Metamidol OR Nixtensyn OR Novodipam OR Pacitran OR Paralium OR Prozepam OR Psychopax OR Radizepam OR Simasedan OR Trankinon OR Trazepam OR Valaxona OR Valiquid OR Valuzepam OR Vanconin OR Antenex OR Arzepam OR Betapam OR Diapine OR Diaquel OR 7-Chloro-1,3-dihydro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR NCGC00178168-01 OR WLN: T67 GNV JN IHJ CG G1 KR OR 2H-1,4-Benzodiazepin-2-one, 7-chloro-1,3-dihydro-1-methyl-5-phenyl- OR CPD000058398 OR SAM001246536 OR SMR000058398 OR 439-14-5 OR 7-Chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2(1H)-one OR 7-Chloro-1-methyl-2-oxo-5-phenyl-3H-1,4-benzodiazepine OR 7-Chloro-1-methyl-5-phenyl-2H-1,4-benzodiazepin-2-one OR C06948 OR D00293 OR 5-24-04-00300 OR D003975 OR A3662/0155188 OR I06-0194 OR 1-Methyl-5-phenyl-7-chloro-1,3-dihydro-2H-1,4-benzodiazepin-2-one OR 7-Chloro-1-methyl-5-3H-1,4-benzodiazepin-2(1H)-one OR 7-chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2-one OR DZP OR Dap OR Pax OR 11100-37-1 OR 53320-84-6 OR InChI=1/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H ... (361 Depositor-Supplied Synonyms in PubChem)
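The slide's point is that a plain text query only finds the names it spells out. A minimal Python sketch of the naive workaround, OR-ing every known alias into one Boolean query string, follows; the `SYNONYMS` table is a hypothetical four-entry excerpt (not the PubChem list), and quoting punctuated names is an assumption about the target query syntax.

```python
# Hypothetical excerpt of a synonym table: lowercase name -> known aliases.
SYNONYMS: dict[str, list[str]] = {
    "diazepam": ["Valium", "Diazepam", "Stesolid", "439-14-5"],
}


def quote_term(term: str) -> str:
    """Quote terms containing spaces or punctuation so that a Boolean query
    parser treats them as a single token (assumed query syntax)."""
    if term.isalnum():
        return term
    return '"' + term.replace('"', '\\"') + '"'


def expand_query(name: str, synonyms: dict[str, list[str]]) -> str:
    """Return 'A OR B OR ...' over all aliases of `name`; fall back to the
    name itself if the table has no entry. Duplicates are dropped in order."""
    aliases = synonyms.get(name.lower(), [name])
    unique = list(dict.fromkeys(aliases))
    return " OR ".join(quote_term(alias) for alias in unique)


print(expand_query("Diazepam", SYNONYMS))
# Valium OR Diazepam OR Stesolid OR "439-14-5"
```

With hundreds of synonyms per compound this approach quickly becomes unwieldy, which is the motivation for linking names to a structure instead.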
  • 9. Linkability
  • 10. Structurability
  • 11. Outline (same as slide 2; image © cora / PIXELIO, www.pixelio.de)
  • 12. The Hard 'n' Heavy: Chemisches Zentralblatt
    (Image © Joachim Reisig / pixelio.de)
    » Named Entity Recognition project to create a web-based, language-independent structure database
    » Chemisches Zentralblatt: the first and oldest abstracts journal in chemistry, covering the chemical literature from 1830 to 1969
    » Two million abstracts (900,000 pages, in German)
  • 13. The Hard 'n' Heavy: Chemisches Zentralblatt
    » Print showing through from the back of the page
    » Poor quality of the original source: blotted and stained pages
  • 14. Challenges: OCR
    [Sample pages from 1830, 1870, 1910, 1930 and 1969]
  • 15. Challenges: OCR
    » Ambiguous old fonts (h = b; c = e; ligatures)
    » Letter-spaced text
    → Specific rules, large German dictionaries and extensive training are applied to correct systematic errors of the standard OCR process
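The slide names the confusable glyph pairs (h/b, c/e) and says dictionaries are used to correct systematic OCR errors. A minimal sketch of that idea, under an assumed rule: generate every spelling reachable by swapping confusable characters and return the first one found in the dictionary. The confusion pairs come from the slide; `GERMAN_WORDS` and the function names are hypothetical, and the project's actual rules are not described in the source.

```python
from collections.abc import Iterator
from itertools import product

# Confusable glyph pairs named on the slide (old fonts: h read as b, c as e).
CONFUSIONS: dict[str, str] = {"h": "b", "b": "h", "c": "e", "e": "c"}

# Hypothetical stand-in for a large German dictionary.
GERMAN_WORDS = {"schwefel", "säure", "durch", "lösung", "bei"}


def variants(token: str) -> Iterator[str]:
    """Yield all spellings reachable by swapping confusable characters,
    starting with the unchanged token."""
    options = [(ch, CONFUSIONS[ch]) if ch in CONFUSIONS else (ch,) for ch in token]
    for combo in product(*options):
        yield "".join(combo)


def correct(token: str, dictionary: set[str]) -> str:
    """Return the first (lowercased) dictionary word among the variants,
    or the token unchanged if no variant is a known word."""
    for candidate in variants(token.lower()):
        if candidate in dictionary:
            return candidate
    return token


print(correct("durcb", GERMAN_WORDS))  # durch
```

The number of variants doubles with every confusable character, so a production system would prune candidates (for example with character-level confusion probabilities) rather than enumerate them all.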
  • 16. Challenges: Annotation
    » Obsolete German language
      - Schwefelsaures Natrium (sodium sulfate), Aurantiin
      - Chlorür, Bromür, Oxydul (old suffixes for the lower-valent chloride, bromide and oxide)
    » Historical names
      - Pelopeum → Columbium → Niobium
    » Different spellings of the same name
      - Dibrom… / Bibrom…
      - Ätzkali / Aetzkali (caustic potash)
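One common way to handle such variants is to map every spelling to a normalized key before the dictionary lookup. A minimal sketch, assuming two hypothetical rule tables: umlauts are folded to their two-letter transliteration (as in Ätzkali/Aetzkali), and a short list rewrites historical prefixes and names (Bibrom… to Dibrom…, Columbium to Niobium). The tables are illustrative only, not the project's rules.

```python
import re

# Umlaut transliteration behind German spelling variants (Ätzkali = Aetzkali).
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

# Hypothetical rewrite rules for historical prefixes and names.
HISTORICAL = [
    (re.compile(r"^bibrom"), "dibrom"),      # Bibrom... -> Dibrom...
    (re.compile(r"^columbium"), "niobium"),  # old name of the element
]


def normalize(name: str) -> str:
    """Return a lookup key: lowercase, umlauts transliterated,
    historical prefixes rewritten (only at the start of the name)."""
    key = name.lower()
    for umlaut, replacement in UMLAUTS.items():
        key = key.replace(umlaut, replacement)
    for pattern, replacement in HISTORICAL:
        key = pattern.sub(replacement, key)
    return key


print(normalize("Ätzkali") == normalize("Aetzkali"))  # True
```

Dictionary entries are normalized the same way, so every spelling variant lands on one key.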
  • 17. Results
    » 2.4 million chemical names with associated structures
      - 1 million unique names
      - 500,000 unique structures
      - Results linked to the original source (PDF)
    » Benchmark
      - Recall: 53.8%
      - Precision: 89.7%
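The benchmark reports recall and precision separately. The short sketch below spells out how both are defined from true positives (TP), false positives (FP) and false negatives (FN), and adds their harmonic mean F1, which the slide does not report; plugging in the slide's values gives an F1 of roughly 0.67. The counting unit (names or structures) is not stated in the source.

```python
def precision(tp: int, fp: int) -> float:
    """Share of extracted entities that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of gold-standard entities that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Values reported on the slide: precision 89.7 %, recall 53.8 %.
print(round(f1(0.897, 0.538), 3))  # 0.673
```

The high precision and moderate recall suggest that most extracted names are right but many names in the noisy historical text are missed.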
  • 18. The Networked: Springer Chemistry Demonstrator
    (Image © Stephanie Hofschlaeger / pixelio.de)
    » Large-scale automatic extraction of chemical entities from SpringerLink documents
    » Joint definition of output formats (inline and standoff XML)
    » Semantic enrichment of chemically relevant SpringerLink documents (> 2,700 titles)
    » Creation of a chemical registry including all Springer / InfoChem chemistry sources that carry structural information
    » Implementation of an online demonstrator that interlinks different data repositories via the chemical structure
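The core idea of the demonstrator, interlinking repositories via the chemical structure, can be sketched as a registry keyed by a structure identifier. In this minimal sketch the key is assumed to be an InChIKey (the slides do not say which identifier the Springer registry used), and each entry collects (source, record ID) pairs; the source names and record IDs are hypothetical. The example key is the diazepam InChIKey that also appears in the PATENTSCOPE example.

```python
from collections import defaultdict


class CompoundRegistry:
    """Maps a structure key (assumed: InChIKey) to every (source, record_id)
    pair in which that structure occurs."""

    def __init__(self) -> None:
        self._entries: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def register(self, inchikey: str, source: str, record_id: str) -> None:
        self._entries[inchikey].add((source, record_id))

    def links(self, inchikey: str) -> list[tuple[str, str]]:
        """All records sharing this structure, sorted for stable output."""
        return sorted(self._entries.get(inchikey, set()))


registry = CompoundRegistry()
# Hypothetical records: one structure found in a reaction database
# and in two annotated full-text documents.
registry.register("AAOVKJBEBIDNHE-UHFFFAOYSA-N", "reactions-db", "R-1001")
registry.register("AAOVKJBEBIDNHE-UHFFFAOYSA-N", "fulltext", "doc-17")
registry.register("AAOVKJBEBIDNHE-UHFFFAOYSA-N", "fulltext", "doc-42")
print(registry.links("AAOVKJBEBIDNHE-UHFFFAOYSA-N"))
```

Any record can then be linked to every other record that mentions the same structure, regardless of the name used in each source.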
  • 19. [Diagram: structures from structure and reaction databases and annotated structures from full text (2,000 annotated documents) linked through a Central Compound Registry]
  • 22. The Tricky: Wiley Smart Article
    » Real-time bimodal extraction of chemical information from
      - ChemDraw files
      - Full text (five journals, two reference works)
    » Results merged into the Wiley XML
    » KNIME workflow developed at InfoChem, installed at Wiley in Hoboken
    » Annotated entity classes: Chemicals, Reagents/Catalysts, Drugs, Reaction Types, Chemical Technology
    » Challenge: merging information from different domains and sources (overlapping, conflicting and missing concepts)
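The merging challenge can be illustrated with overlapping annotations from different extractors. A minimal sketch of one possible policy (an assumption, not the Wiley/InfoChem rule set): annotations with identical offsets are combined and their entity types unioned, similar to the `type="drugGenericName chemicalName"` attribute in the XML example of slide 25, and among partially overlapping spans the longest wins. The `Annotation` type is hypothetical; genuinely conflicting concepts (for example two different structures for one span) are not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int               # character offset, inclusive
    end: int                 # character offset, exclusive
    labels: frozenset[str]   # entity types, e.g. {"drugGenericName"}


def merge(annotations: list[Annotation]) -> list[Annotation]:
    """1) Identical spans are combined and their labels unioned.
    2) Among overlapping spans the longest wins (ties: leftmost start).
    Returns the kept annotations in text order."""
    by_span: dict[tuple[int, int], set[str]] = {}
    for ann in annotations:
        by_span.setdefault((ann.start, ann.end), set()).update(ann.labels)

    # Sort by negative length, then start, so longer spans are accepted first.
    ranked = sorted(by_span.items(), key=lambda item: (item[0][0] - item[0][1], item[0][0]))
    kept: list[Annotation] = []
    for (start, end), labels in ranked:
        if all(end <= k.start or start >= k.end for k in kept):
            kept.append(Annotation(start, end, frozenset(labels)))
    return sorted(kept, key=lambda a: a.start)


# "Himbacine and sodium chloride": two extractors agree on "Himbacine",
# one also tags "sodium" inside the longer "sodium chloride".
anns = [
    Annotation(0, 9, frozenset({"drugGenericName"})),
    Annotation(0, 9, frozenset({"chemicalName"})),
    Annotation(14, 20, frozenset({"chemicalName"})),
    Annotation(14, 29, frozenset({"chemicalName"})),
]
for a in merge(anns):
    print(a.start, a.end, sorted(a.labels))
```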
  • 23. Text annotation: Chemistry enrichment workflow*
    *Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
  • 24. [Diagram: XML and SDfile → ICScheme Processor → XML+ and HTML]
  • 25. (…) <enrichedObject relevance="primary" xml:id="asia266-eo-1234" associatedDataRef="#asia266-sch-0016"> <label>6</label><mediaResource mimeType="chemical/x-mol-file" href="enrich_out/asia266-eo-1234.sdf" alt="chemical compound"/> </enrichedObject> (…)
    (…) <infoAsset type="drugGenericName chemicalName" xml:id="asia573-info-0004">Himbacine</infoAsset> (<link href="#asia573-eo-0001"/>),<link href="#bib1">1</link> has shown (…)
    *Reinhard Neudert: Enhancing the User Experience for Wiley Chemistry Content, ICIC 2012, 14–17 October, Berlin
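Downstream consumers need to read these enrichment elements back. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the two fragments above are wrapped in a hypothetical `<article>` root and that no namespaces other than the built-in `xml:` prefix are involved (the real Wiley schema may declare its own). Note that `xml:id` is exposed under the XML namespace URI, not as a plain `id` attribute.

```python
import xml.etree.ElementTree as ET

# Attributes with the predefined xml: prefix live in this namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# The two fragments from the slide, wrapped in a hypothetical <article> root.
SAMPLE = """<article>
<enrichedObject relevance="primary" xml:id="asia266-eo-1234"
    associatedDataRef="#asia266-sch-0016">
  <label>6</label>
  <mediaResource mimeType="chemical/x-mol-file"
      href="enrich_out/asia266-eo-1234.sdf" alt="chemical compound"/>
</enrichedObject>
<infoAsset type="drugGenericName chemicalName"
    xml:id="asia573-info-0004">Himbacine</infoAsset>
(<link href="#asia573-eo-0001"/>),<link href="#bib1">1</link> has shown
</article>"""


def compound_files(root: ET.Element) -> list[tuple]:
    """(xml:id, compound label, SD file path) for every enrichedObject."""
    rows = []
    for obj in root.iter("enrichedObject"):
        media = obj.find("mediaResource")
        href = media.get("href") if media is not None else None
        rows.append((obj.get(XML_NS + "id"), obj.findtext("label"), href))
    return rows


def annotated_names(root: ET.Element) -> list[tuple]:
    """(text, entity types) for every infoAsset; types are space-separated."""
    return [(a.text, a.get("type", "").split()) for a in root.iter("infoAsset")]


root = ET.fromstring(SAMPLE)
print(compound_files(root))   # [('asia266-eo-1234', '6', 'enrich_out/asia266-eo-1234.sdf')]
print(annotated_names(root))  # [('Himbacine', ['drugGenericName', 'chemicalName'])]
```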
  • 27. The Elegant: Addition of chemical search capabilities to the WIPO PATENTSCOPE search system
    » Chemical annotation of PATENTSCOPE full-text patent documents
    » Replacement of detected compounds by their corresponding InChIKey (hashed key of the IUPAC International Chemical Identifier, InChI)
    » Recognition of graphical representations of chemical compounds in PATENTSCOPE documents
    » Solr/Lucene-based exact structure search
    » KNIME workflow system
    » Extension of the PATENTSCOPE GUI with chemical structure search
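A standard InChIKey has a fixed 27-character layout: a 14-letter hash of the connectivity layer, a hyphen, an 8-letter hash of the remaining layers (stereochemistry, isotopes) followed by a standard flag (`S`) and a version letter, a hyphen, and a single protonation letter. That layout can be checked and split with a regular expression. A minimal Python sketch; `split_inchikey` and its dictionary keys are illustrative names, not part of any PATENTSCOPE or InChI API.

```python
import re

# 14-letter connectivity hash - 8-letter hash of remaining layers + flag + version
# - protonation letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split a standard InChIKey into its components (illustrative helper)."""
    m = INCHIKEY_RE.match(key)
    if m is None:
        raise ValueError(f"not an InChIKey: {key!r}")
    skeleton, layers, flag, version, protonation = m.groups()
    return {
        "skeleton": skeleton,        # connectivity (molecular skeleton) hash
        "stereo_isotope": layers,    # hash of stereo/isotope layers
        "standard": flag == "S",     # 'S' marks a standard InChIKey
        "version": version,          # 'A' = InChI version 1
        "protonation": protonation,  # 'N' = neutral
    }


print(split_inchikey("AAOVKJBEBIDNHE-UHFFFAOYSA-N"))  # diazepam
```

Matching the whole key corresponds to an exact structure match; matching only the first block would also retrieve stereoisomers and isotopologues with the same skeleton. The slides only state that PATENTSCOPE offers exact structure search.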
  • 28. PATENTSCOPE document: (…) At the moment the surgical procedure starts, benzodiazepin, e.g. diazepam, is administered in a dose of no more than 5 mg. (…)
    Enriched PATENTSCOPE document: (…) At the moment the surgical procedure starts, benzodiazepin, e.g. @AAOVKJBEBIDNHE-UHFFFAOYSA-N@, is administered in a dose of no more than 5 mg. (…)
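The replacement on this slide can be sketched as a dictionary-driven substitution that wraps each InChIKey in `@…@`, following the slide's example. This is a deliberately simplified illustration: a real chemical named-entity recognizer detects systematic, trivial and trade names rather than looking them up in a fixed table. The one-entry lexicon and the function name are hypothetical.

```python
import re

# Hypothetical name -> InChIKey lexicon (a real annotator recognizes names,
# it does not rely on a fixed table like this).
NAME_TO_INCHIKEY = {
    "diazepam": "AAOVKJBEBIDNHE-UHFFFAOYSA-N",
}


def replace_with_inchikeys(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive lexicon hits by @InChIKey@."""
    if not lexicon:
        return text
    # Try longer names first so that multi-word names win over their prefixes.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE
    )
    by_lower = {name.lower(): key for name, key in lexicon.items()}
    return pattern.sub(lambda m: f"@{by_lower[m.group(1).lower()]}@", text)


sentence = ("At the moment the surgical procedure starts, benzodiazepin, "
            "e.g. diazepam, is administered in a dose of no more than 5 mg.")
print(replace_with_inchikeys(sentence, NAME_TO_INCHIKEY))
```

The word boundaries keep the class name "benzodiazepin" untouched, and the delimiters make the keys easy to extract again for indexing.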
  • 29. PATENTSCOPE Documents → Enriched PATENTSCOPE Documents: in the passage from slide 28, "diazepam" is replaced by its InChIKey AAOVKJBEBIDNHE-UHFFFAOYSA-N.
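Once every annotated compound is a single `@InChIKey@` token, exact structure search reduces to a term lookup, which is what a Lucene-based engine such as Solr handles efficiently. The toy in-memory inverted index below only illustrates that idea; it is not the Solr schema used for PATENTSCOPE, and the document ids are hypothetical. A query structure would first be converted to its InChIKey with a cheminformatics toolkit (not shown).

```python
import re
from collections import defaultdict

# Assumption: annotated compounds are delimited by "@", as in the slide's example.
MARKED_KEY = re.compile(r"@([A-Z]{14}-[A-Z]{10}-[A-Z])@")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Toy inverted index: InChIKey -> ids of documents that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for key in MARKED_KEY.findall(text):
            index[key].add(doc_id)
    return index


def exact_structure_search(index: dict[str, set[str]], inchikey: str) -> set[str]:
    """Return the documents containing exactly this structure."""
    return set(index.get(inchikey, set()))


docs = {  # hypothetical document ids
    "WO-A": "benzodiazepin, e.g. @AAOVKJBEBIDNHE-UHFFFAOYSA-N@, is administered",
    "WO-B": "no annotated compounds here",
}
index = build_index(docs)
print(exact_structure_search(index, "AAOVKJBEBIDNHE-UHFFFAOYSA-N"))
```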
  • 32. The Powerful: "Chemical Annotation for Online Systems"
    » Joint research project of FIZ Karlsruhe and InfoChem
    » Chemical annotation of full-text patent databases (tens of millions of documents)
    » Approx. 800,000 updated documents per week
    » First showcase: EP full-text patents
  • 33. » First showcase: EP full-text patents
      - Bibliographic data and full text of patent applications and granted patents published by the European Patent Office
      - 1978 to present, weekly updates
      - More than 4.05 million family records with more than 6.9 million publications (as of August 2015)
    » Main challenges of the annotation: performance, quality, integrability
    » Integration of ICANNOTATOR into a Hadoop environment / FIZ proprietary workflow system (first results: "86 ms/document with Hadoop and 2.1 s without Hadoop")
    » Joint definition of output formats based on the FIZ XML DTD
    » Joint creation of annotation guidelines and a benchmark set of 100 manually annotated patent documents
    » Envisaged user scenarios:
      - searching full texts for structural chemical information
      - linking text and structural information
      - supporting the evaluation of patent full texts
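Simple arithmetic puts the two challenges "performance" and "quality" in perspective. For performance, the sketch combines the weekly volume of about 800,000 documents with the reported 86 ms/document (Hadoop) and 2.1 s/document (without Hadoop), assuming these are average wall-clock times that scale linearly; the slides do not give cluster sizes. For quality, annotation against a gold-standard benchmark such as the 100 manually annotated documents is commonly scored with precision, recall and F1; the slides do not say which metric or matching criterion is used, so the exact-match scorer and its tuple format are assumptions.

```python
# Figures from the slides: ~800,000 updated documents/week,
# 86 ms/document with Hadoop and 2.1 s/document without.
# Assumption: average per-document wall-clock times that scale linearly.
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 s


def weekly_load(docs_per_week: int, seconds_per_doc: float) -> tuple[float, float]:
    """Return (processing hours, fraction of one week) for a weekly update."""
    total_seconds = docs_per_week * seconds_per_doc
    return total_seconds / 3600, total_seconds / SECONDS_PER_WEEK


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Exact-match scores of predicted annotations against a gold standard.

    Items are hypothetical hashable annotations, e.g. (doc_id, start, end, inchikey).
    """
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


for label, secs in [("with Hadoop", 0.086), ("without Hadoop", 2.1)]:
    hours, weeks = weekly_load(800_000, secs)
    print(f"{label}: {hours:.1f} h = {weeks:.2f} weeks")
# with Hadoop: 19.1 h = 0.11 weeks
# without Hadoop: 466.7 h = 2.78 weeks
```

Under these assumptions a single non-Hadoop pipeline would need almost three weeks to process one weekly update, while the Hadoop figure fits in under a day, which illustrates why performance is listed first among the challenges.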
  • 36. Acknowledgements
    » Wiley: Michael Forster, Reinhard Neudert
    » FIZ Karlsruhe: Leni Helmes, Michael Schwantner
    » WIPO: Christophe Mazenc, Paul Halfpenny
    » The InfoChem Team
    © P. Storz / PIXELIO, www.pixelio.de