Dr. Matthias Negri
Scientific Information Center
Boehringer Ingelheim Pharma GmbH & Co. KG
Chemistry-Enriched Patent Curation
semi-automatic analysis and elaboration of patents
II-SDV Nice, 21 April 2015
Árpád Figyelmesi
ChemAxon
Content
1. Chemistry in patents
2. Why do we need a patent curation workflow?
3. Semi-automatic Patent Curation Workflow - Overview
4. Linked tools/technologies
5. ChemCurator (ChemCC)
6. Semi-automatic Patent Curation Workflow – Step by Step
7. Lessons learned, weak-points, limitations
8. Outlook
Negri Matthias, II-SDV 2015 2
Chemistry in patents
Chemistry appears in diverse forms in patents:
1. TEXT - IUPAC names, common names, etc
2. IMAGES - embedded within or attached to the document
3. ATTACHMENTS (MOL/CDX)
4. TABLES
– as ONE-image file (tables with chemistry and bioactivity data)
– as chemistry-only image files embedded within table tags
5. Markush Structures/Formulas with R-groups
---------------------------------------------------------------------------------------
 Currently NO commercial solution covers all these cases
 Most of the cases are considered in the patent curation workflow
(Markush/R-group Formulas recognized and stored separately)
Why do we need a patent curation workflow?
Motivations:
1. Linked chemistry-retrieval from patents (+ chemistry as images)
2. IUPAC-enriched XML patent files as a NEW source for text-mining
3. extraction of bioactivity data/targets/diseases/… in relation to chemistry
4. Similarity/Substructure frequency in compound sets of patents
5. …
Semi-automatic Patent Curation Workflow
Overview – current state
2 parallel branches:
 KNIME branch: SLOWER and memory-intensive, BUT higher quality, more control and IUPAC-enriched XML
 ChemCC branch: FASTER, but LESS informative/flexible; ChemCC as the (near-)future perspective
I2E API via KNIME – batch indexing, text-mining and (relational) data retrieval
Linked tools/technologies
1. KNIME/XPATH
2. ChemAxon ChemCurator (ChemCC)
3. Other ChemAxon tools in KNIME nodes (document2structure/d2s, Naming,
Molconverter, Structure checker, Standardizer, …)
4. Text/data-mining – Linguamatics I2E (+I2E Chemistry)
5. Optical Structure Recognition – Keymodule CLiDE Batch
ChemCurator (ChemCC)
Computer-aided chemical data extraction
 English, Chinese and Japanese N2S
 Markush Editor
 Structure Checker
 Hit visualization
 Third party OSR technologies
Árpád Figyelmesi, II-SDV 2015 8
ChemCurator (ChemCC)
Name to Structure
 Support for many nomenclatures (common, drug names, …)
 IUPAC names
 Custom dictionaries
 English (2008)
 Chinese (2013)
 Japanese (2014)
ChemCurator (ChemCC)
Compound Extraction View
[Screenshot panels: Compound list, Project explorer, Annotated document, Selected structures]
ChemCurator (ChemCC)
Markush Extraction View
[Screenshot panels: Markush editor, Example structures, Annotated document, Project explorer, Selected structures, Structure checker]
ChemCurator (ChemCC)
General Document Curation
Extract Markush structures from patents
Extract specific structures
 Journal articles
 Company reports
 Patent examples
Structure extraction wizards
 Exclude fragments, chemical elements, etc.
ChemCurator (ChemCC)
Integration & Information Sharing
Other ChemAxon products:
 Direct IJC schema connection
 Project sharing function
 Accessible from Plexus, IJC, etc.
Third party tools:
 Standard file formats
 Export functions
 Easily processable projects
Semi-automatic Patent Curation Workflow
a) input sources and b) bibliographic data
a) Input sources
 files with patent-IDs list
 XML collection
 …
b) Retrieval of bibliographic information and attachment data
 family ID, patent references, expiration date, etc
 Attachment files MOL/CDX (US-patents only), TIF files
 ….
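A minimal Python sketch of how a patent-ID list could be split by origin, since MOL/CDX attachments are stated to exist for US patents only. This is an illustration, not the KNIME workflow itself: the ID layout `CC-NUMBER-KIND`, the sample IDs and the helper names `parse_patent_id` and `expects_mol_cdx` are assumptions made for the example.

```python
from typing import NamedTuple


class PatentId(NamedTuple):
    country: str
    number: str
    kind: str


def parse_patent_id(raw: str) -> PatentId:
    """Split an ID such as 'US-1234567-B2' (assumed layout) into its parts."""
    parts = raw.strip().upper().split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected patent ID layout: {raw!r}")
    return PatentId(*parts)


def expects_mol_cdx(pid: PatentId) -> bool:
    """Per the slide, MOL/CDX attachments are available for US patents only."""
    return pid.country == "US"


# Hypothetical input, e.g. the lines of a file with a patent-ID list.
ids = ["US-1234567-B2", "wo-2014000001-a1", "EP-1000000-A1"]
parsed = [parse_patent_id(i) for i in ids]
with_attachments = [p for p in parsed if expects_mol_cdx(p)]
```

Patents outside `with_attachments` would have to rely on TIF images and text only.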
Semi-automatic Patent Curation Workflow
c) chemistry retrieval/extraction/filtering
1. ChemCurator branch
 data retrieval (XML, attachments) from IFI Claims Direct BI-server
 ChemCurator project creation/sharing/annotation → html output
 Chemistry extraction name2structure/document2structure → sdf output
 Generation of pre-annotated patent set stored as ChemCC projects
 Faster, but lower quality within the chemistry extraction process
2. KNIME branch
- OCR-errors CLEAN-UP in KNIME → improved chemistry recognition
- MOL/CDX/TIF - standardizer, structure checker → filter formulas, solvents, R-groups
 Higher quality and more control in chemistry extraction process
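The OCR clean-up step of the KNIME branch could look roughly like the following Python sketch. It is not the KNIME node configuration used in the workflow: the correction dictionary `OCR_FIXES` and the function name `clean_ocr_name` are hypothetical, and the listed fragments are only examples of typical OCR confusions (`rn` for `m`, `1` for `l`).

```python
import re

# Hypothetical correction dictionary: OCR-damaged fragment -> intended fragment.
OCR_FIXES = {
    "rnethyl": "methyl",
    "pheny1": "phenyl",
    "ch1oro": "chloro",
    "arnino": "amino",
}

# One alternation pattern; longer keys first so they win over shorter overlaps.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(OCR_FIXES, key=len, reverse=True)),
    flags=re.IGNORECASE,
)


def clean_ocr_name(name: str) -> str:
    """Apply the dictionary fixes, then remove blanks around hyphens and commas."""
    fixed = _PATTERN.sub(lambda m: OCR_FIXES[m.group(0).lower()], name)
    # OCR often inserts blanks around locant separators, e.g. '2 -chloro'.
    fixed = re.sub(r"\s*([-,])\s*", r"\1", fixed)
    return fixed.strip()


cleaned = clean_ocr_name("2 -ch1oro-4- rnethylpheny1")
```

The function is meant for isolated chemical names; applied to running text it would also strip the spaces after ordinary commas.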
2. KNIME branch
 MOL → IUPAC
 CDX → IUPAC
 TIFF (via CLiDE) → IUPAC
Merging and comparison of the converted chemistry
output of MOL/CDX/TIF – 2 “quality” checks
 IUPAC
 string length (different output order of chemicals in multi-molecule image/multiMOL files)
 OCR-correction (“dictionary” based)
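The two “quality” checks can be sketched in Python as follows. This is one possible reading of the slide, not the actual KNIME implementation: the function name `compare_names`, the result labels and the sample names are assumptions for illustration.

```python
def compare_names(names_by_source: dict[str, str]) -> str:
    """Classify agreement between names derived from MOL, CDX and TIF.

    Returns 'identical', 'same_length', 'conflict' or 'single_source'.
    """
    values = [v.strip() for v in names_by_source.values() if v and v.strip()]
    if len(values) < 2:
        return "single_source"      # nothing to compare against
    if len(set(values)) == 1:
        return "identical"          # check 1: the IUPAC strings agree
    if len({len(v) for v in values}) == 1:
        return "same_length"        # check 2: possibly the same names in another order
    return "conflict"


# Hypothetical multi-molecule case: same chemicals, different output order.
status = compare_names({
    "MOL": "ethanol; benzene",
    "CDX": "benzene; ethanol",
    "TIF": "ethanol; benzene",
})
```

Equal string length is only a hint that the same chemicals were emitted in a different order; two unrelated names of equal length would pass it as well, so such cases still merit a closer look.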
2. KNIME - Chemistry “Normalization”
 (within KNIME) set up a relation between each TIFF/attachment file
1. to (one or more) IUPAC name(s)
2. to a position/section in the text/document
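The relation described here could be held in a small record per file, as in this Python sketch. The class name `ChemistryLink`, its field names and the sample values are assumptions; the slides do not specify how KNIME stores this relation.

```python
from dataclasses import dataclass, field


@dataclass
class ChemistryLink:
    """Links one TIFF/attachment file to its names and its place in the document."""
    file_name: str
    iupac_names: list[str] = field(default_factory=list)  # one or more names
    section: str = ""    # assumed labels such as 'claims' or 'description'
    position: int = -1   # assumed: running index of the image in the document


# Hypothetical entries for two images of one patent.
links = [
    ChemistryLink("C00001.TIF", ["2-chloropyridine"], "description", 12),
    ChemistryLink("C00002.TIF", ["ethanol", "benzene"], "claims", 3),
]
by_file = {link.file_name: link for link in links}
```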
Semi-automatic Patent Curation Workflow
d) TIF/attachment replacement with IUPAC names
Merge IUPAC / Clean-up IUPAC
If NO IUPAC → IMG-name is set
“Normalize” IUPAC names
Chemistry present as text is recognized and extracted either via
- Text-mining (I2E Chemistry, with d2s working in the background) or
- Within KNIME/ChemCC using annotate/molconvert
Replacement:
<chemistry> vs IUPAC
IUPAC-enriched XML
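The replacement step can be illustrated with Python's standard `xml.etree.ElementTree`. This is a sketch under assumptions, not the workflow's own code: it assumes each `<chemistry>` element wraps an `<img>` child whose `file` attribute names the image, and the function name `replace_chemistry` and the sample XML are made up for the example. Where no IUPAC name is known, the image name is kept as text.

```python
import xml.etree.ElementTree as ET


def replace_chemistry(root: ET.Element, names: dict[str, str]) -> None:
    """Replace every <chemistry> element in place by its name or image file name."""
    for parent in list(root.iter()):   # snapshot, because the tree is modified below
        for child in list(parent):
            if child.tag != "chemistry":
                continue
            img = child.find("img")
            file_name = img.get("file", "") if img is not None else ""
            # Fallback: if NO IUPAC name is known, the image name is set instead.
            text = (names.get(file_name) or file_name) + (child.tail or "")
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + text
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + text
            parent.remove(child)


# Hypothetical paragraph with two embedded structure images.
xml = (
    '<p>Compound <chemistry id="CHEM-1"><img file="C00001.TIF"/></chemistry>'
    ' and <chemistry id="CHEM-2"><img file="C00002.TIF"/></chemistry>'
    " were prepared.</p>"
)
root = ET.fromstring(xml)
replace_chemistry(root, {"C00001.TIF": "2-chloropyridine"})
result = ET.tostring(root, encoding="unicode")
```

Writing the name into the surrounding text, rather than into a new tag, keeps the sentence readable for a downstream text-mining index; keeping a tag with the name as an attribute would be an equally valid design.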
OCR-errors in chemical names
TIF
CDX
MOL
Replacement with the derived IUPAC name
Semi-automatic Patent Curation Workflow
e) Bioactivity/tabular data extraction with KNIME/XPATH
XPATH/XML parsing and extraction of:
 Tables
 Rows - XML tags & strings
 Entries - XML tags & strings
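The tables, rows and entries extraction can be sketched outside KNIME with the XPath subset of Python's `xml.etree.ElementTree`. The element names `table`, `row` and `entry` follow the CALS-style table markup common in patent XML, which is assumed here; the function name `extract_tables` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET


def extract_tables(root: ET.Element) -> list[list[list[str]]]:
    """Return tables -> rows -> entry strings."""
    tables = []
    for table in root.findall(".//table"):
        rows = []
        for row in table.findall(".//row"):
            # itertext() also collects text nested inside inline tags of an entry.
            cells = ["".join(e.itertext()).strip() for e in row.findall("entry")]
            rows.append(cells)
        tables.append(rows)
    return tables


# Hypothetical bioactivity table.
doc = ET.fromstring(
    "<doc><table><tgroup cols='2'><tbody>"
    "<row><entry>Example</entry><entry>IC50 (nM)</entry></row>"
    "<row><entry>1</entry><entry><b>12</b></entry></row>"
    "</tbody></tgroup></table></doc>"
)
tables = extract_tables(doc)
```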
Semi-automatic Patent Curation Workflow
f) Text-/data-mining with Linguamatics I2E via KNIME
IUPAC-enriched XML as source for I2E API/text-mining
 indexing
 pre-defined queries
 results retrieval
 saved as SDF files (KNIME)
Information retrieved by text-mining (chemistry-related)
 Example Nr.
 Bioactivity data from tables
 Claims, regions where chemistry appears in patents
 Genes, diseases
Semi-automatic Patent Curation Workflow
f) Bioactivity data using I2E multi-queries – 2 steps
Source: (IUPAC-enriched) XML
1. Example Nr. – IUPAC
2. Example Nr. – Bioactivity data
[Figure: table and image views of the XML; for comparison, the same chemistry in the PDF]
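The two query results share the Example Nr., so they can be combined by a simple join on that key. The Python sketch below shows the idea only; it does not call the I2E API, and the function name `join_on_example` and all sample values are hypothetical.

```python
def join_on_example(names, rows):
    """Attach the IUPAC name to each bioactivity row that shares its Example Nr."""
    joined, unmatched = [], []
    for example, measure, value in rows:
        name = names.get(example)
        if name is None:
            unmatched.append(example)   # bioactivity without a resolved name
        else:
            joined.append(
                {"example": example, "iupac": name, "measure": measure, "value": value}
            )
    return joined, unmatched


# Step 1 result (Example Nr. -> IUPAC) and step 2 result (Example Nr. -> bioactivity).
iupac_by_example = {"1": "2-chloropyridine", "2": "benzoic acid"}
bioactivity_rows = [("1", "IC50 (nM)", 12.0), ("3", "IC50 (nM)", 5.0)]
joined, unmatched = join_on_example(iupac_by_example, bioactivity_rows)
```

Keeping the unmatched example numbers makes it visible where name recognition or table extraction failed for a given patent.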
Semi-automatic Patent Curation Workflow
g) Visualize data-/text-mining results in ChemCC
 SDF file loaded into ChemCC project + automatic mapping to existing chemistry
[Screenshot columns: IUPAC, Bioactivity, Example Nr.]
Lessons learned, weak-points, limitations
1. Advantages of KNIME full mode (MOL/CDX/TIF) over the ChemCC branch
 chemistry check/normalization – 3 input sources → improved quality
 improved chemistry recall - ALL images (incl. tables and drawings)
 More filtering options in the KNIME workflow than in ChemCurator alone
 IUPAC-enriched XML as new source for I2E
 …
2. No full automation of the workflow due to lack of homogeneity in patent data (US
vs WO, EP, etc.)
 Missing attachment files
 No tables present in XML
 Error rate in chemistry recognition (OPSIN vs n2s/d2s)
 …
 NEEDS: different workflows/branches, patent-files clean-up (OCR)
3. Time- and compute-intensive process
Outlook
1. KNIME Workflow
 Add new data fields to Chemicals: BI-internal codes, genes, targets, etc.
 Usage of ChemCC html output as source for text-mining
 Ontology mapping
 Expand workflow by including other sources (internal PDF, literature full-text)
 Use KNIME to interconnect with BI-internal workflows, DBs, etc.
 chemistry-linked information in a patent DB → improved (semantic) search
2. ChemCurator
 Improved n2s
 New command-line functions
 Complex-phrase requests from IFI server
 Improved SDF import
 Preprocessing wizards
Thank you!
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 

Kürzlich hochgeladen (20)

VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls KolkataLow Rate Call Girls Kolkata Avani 🤌  8250192130 🚀 Vip Call Girls Kolkata
Low Rate Call Girls Kolkata Avani 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 

II-SDV 2015, 20 - 21 April, in Nice

  • 1. Chemistry-Enriched Patent Curation: semi-automatic analysis and elaboration of patents. Dr. Matthias Negri, Scientific Information Center, Boehringer Ingelheim Pharma GmbH & Co. KG; Árpád Figyelmesi, ChemAxon. II-SDV Nice, 21 April 2015.
  • 2. Content: 1. Chemistry in patents; 2. Why do we need a patent curation workflow?; 3. Semi-automatic Patent Curation Workflow: Overview; 4. Linked tools/technologies; 5. ChemCurator (ChemCC); 6. Semi-automatic Patent Curation Workflow: Step by Step; 7. Lessons learned, weak points, limitations; 8. Outlook. (Negri Matthias, II-SDV 2015)
  • 3. Chemistry in patents. Chemistry appears in patents in diverse forms: 1. TEXT: IUPAC names, common names, etc.; 2. IMAGES: embedded within or attached to the document; 3. ATTACHMENTS (MOL/CDX); 4. TABLES: either as ONE image file (tables with chemistry and bioactivity data) or as chemistry-only image files embedded within table tags; 5. Markush structures/formulas with R-groups. Currently NO commercial solution covers all these cases. Most of the cases are considered in the patent curation workflow (Markush/R-group formulas are recognized and stored separately).
  • 4. Why do we need a patent curation workflow? Motivations: 1. Linked chemistry retrieval from patents (including chemistry as images); 2. IUPAC-enriched XML patent files → a NEW source for text mining; 3. Extraction of bioactivity data/targets/diseases/… in relation to chemistry; 4. Similarity/substructure frequency in compound sets of patents; 5. …
  • 5. Semi-automatic Patent Curation Workflow: overview, current state. 2 parallel branches starting from the same INPUT. KNIME branch: SLOWER and memory intensive, BUT higher quality, more control and IUPAC-enriched XML. ChemCC branch: FASTER, BUT less informative/flexible; ChemCC as the (near) future perspective. I2E API in KNIME: batch indexing, text mining and (relational) data retrieval.
  • 6. Linked tools/technologies: 1. KNIME/XPATH; 2. ChemAxon ChemCurator (ChemCC); 3. Other ChemAxon tools in KNIME nodes (document2structure/d2s, Naming, Molconverter, Structure Checker, Standardizer, …); 4. Text/data mining: Linguamatics I2E (+ I2E Chemistry); 5. Optical structure recognition: Keymodule CLiDE Batch.
  • 7. Content: 1. Chemistry in patents; 2. Why do we need a patent curation workflow?; 3. Semi-automatic Patent Curation Workflow: Overview; 4. Linked tools/technologies; 5. ChemCurator (ChemCC); 6. Semi-automatic Patent Curation Workflow: Step by Step; 7. Lessons learned, weak points, limitations; 8. Outlook.
  • 8. ChemCurator (ChemCC): computer-aided chemical data extraction; English, Chinese and Japanese N2S (name to structure); Markush Editor; Structure Checker; hit visualization; third-party OSR technologies. (Árpád Figyelmesi, II-SDV 2015)
  • 9. ChemCurator (ChemCC), Name to Structure: support for many nomenclatures (common names, drug names, …); IUPAC names; custom dictionaries; English (2008); Chinese (2013); Japanese (2014).
  • 10. ChemCurator (ChemCC), Compound Extraction View. [Screenshot; labeled panels: Compound list, Project explorer, Annotated document, Selected structures]
  • 11. ChemCurator (ChemCC), Markush Extraction View. [Screenshot; labeled panels: Markush editor, Example structures, Annotated document, Project explorer, Selected structures, Structure checker]
  • 12. ChemCurator (ChemCC), General Document Curation: extract Markush structures from patents; extract specific structures from journal articles, company reports and patent examples; structure extraction wizards (exclude fragments, chemical elements, etc.).
  • 13. ChemCurator (ChemCC), Integration & Information Sharing. Other ChemAxon products: direct IJC schema connection; project sharing function; accessible from Plexus, IJC, etc. Third-party tools: standard file formats; export functions; easily processable projects.
  • 14. Content: 1. Chemistry in patents; 2. Why do we need a patent curation workflow?; 3. Semi-automatic Patent Curation Workflow: Overview; 4. Linked tools/technologies; 5. ChemCurator (ChemCC); 6. Semi-automatic Patent Curation Workflow: Step by Step; 7. Lessons learned, weak points, limitations; 8. Outlook.
  • 15. Semi-automatic Patent Curation Workflow: a) input sources and b) bibliographic data. a) Input sources: files with lists of patent IDs; XML collection; … b) Retrieval of bibliographic information and attachment data: family ID, patent references, expiration date, etc.; attachment files MOL/CDX (US patents only), TIF files; …
  • 16. Semi-automatic Patent Curation Workflow: c) chemistry retrieval/extraction/filtering. 1. ChemCurator branch: data retrieval (XML, attachments) from IFI Claims Direct BI-server; ChemCurator project creation/sharing/annotation → HTML output; chemistry extraction via name2structure/document2structure → SDF output; generation of a pre-annotated patent set stored as ChemCC projects. Faster, but lower quality within the chemistry extraction process.
  • 17. Semi-automatic Patent Curation Workflow: c) chemistry retrieval/extraction/filtering. 2. KNIME branch: clean-up of OCR errors in KNIME → improved chemistry recognition; MOL/CDX/TIF: Standardizer, Structure Checker; filter formulas, solvents, R-groups. Higher quality and more control in the chemistry extraction process.
  • 18. Semi-automatic Patent Curation Workflow: c) chemistry retrieval/extraction/filtering. 2. KNIME branch: MOL → IUPAC; CDX → IUPAC; TIFF (via CLiDE) → IUPAC.
  • 19. Semi-automatic Patent Curation Workflow: c) chemistry retrieval/extraction/filtering. 2. KNIME: chemistry “normalization”. Merging and comparison of the converted chemistry output of MOL/CDX/TIF with 2 “quality” checks: IUPAC name; string length (the output order of chemicals can differ in multi-molecule image/multi-MOL files). OCR correction (“dictionary”-based). Within KNIME, a relation is set up between each TIFF/attachment file and 1. (one or more) IUPAC name(s), 2. a position/section in the text/document. Workflow steps: merge IUPAC → clean up IUPAC → if NO IUPAC name is available, the image name is set → “normalize” IUPAC names.
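The merge-and-compare step on slide 19 could be sketched roughly as follows. This is only an illustration in plain Python, not the KNIME workflow itself: the function name `merge_names`, the status labels and the preference order MOL > CDX > TIF are assumptions made for the example. It applies the two checks named on the slide (identical name, identical string length) and falls back to the image name when no conversion produced a name.

```python
# Hypothetical order of trust between the three sources (an assumption,
# not stated on the slides).
PREFERENCE = ("MOL", "CDX", "TIF")


def comparison_key(name):
    """Collapse whitespace so trivially different strings compare equal."""
    return " ".join(name.split())


def merge_names(image_name, candidates):
    """Merge the names derived from MOL, CDX and TIF for one attachment.

    `candidates` maps a source label to a name, or to None if that
    conversion produced nothing.
    """
    names = {src: comparison_key(n) for src, n in candidates.items() if n}
    if not names:
        # No conversion produced a name: the image name is kept instead.
        return {"name": image_name, "status": "no_iupac", "sources": []}

    distinct = set(names.values())
    if len(distinct) == 1:
        status = "agree"
    elif len({len(n) for n in distinct}) == 1:
        # Different text but equal length: possibly the same molecules
        # listed in a different order. This is a hint, not a proof.
        status = "same_length"
    else:
        status = "conflict"

    chosen = next((s for s in PREFERENCE if s in names), sorted(names)[0])
    return {"name": names[chosen], "status": status, "sources": sorted(names)}


agree = merge_names("img1.tif", {"MOL": "ethanol", "CDX": "ethanol", "TIF": None})
reordered = merge_names("img2.tif", {"MOL": "ethanol; methanol", "TIF": "methanol; ethanol"})
missing = merge_names("img3.tif", {"MOL": None, "CDX": None, "TIF": None})
conflict = merge_names("img4.tif", {"CDX": "propan-2-ol", "TIF": "propanol"})
```

Records flagged `same_length` or `conflict` would be the natural candidates for manual review, which fits the semi-automatic character of the workflow.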
  • 20. Semi-automatic Patent Curation Workflow: d) TIF/attachment replacement with IUPAC names. Chemistry present as text is recognized and extracted either via text mining (I2E Chemistry, with d2s working behind the scenes) or within KNIME/ChemCC using annotate/molconvert. Replacement: <chemistry> tags are replaced with IUPAC names → IUPAC-enriched XML.
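The replacement step on slide 20 can be illustrated with a minimal Python sketch using only the standard library. The XML layout (a `<chemistry>` element wrapping an `<img file="…"/>` reference), the sample file name and the lookup table `IUPAC_BY_FILE` are assumptions for the example; the actual workflow performs this step inside KNIME. Where no name is known, the image file name is written instead, following the fallback described on slide 19.

```python
import xml.etree.ElementTree as ET

# Hypothetical patent fragment; element and attribute names are assumed.
SAMPLE_XML = (
    '<p>The compound <chemistry id="CHEM-1">'
    '<img file="US0001-C00001.TIF"/></chemistry>'
    ' was obtained as a white solid.</p>'
)

# Hypothetical lookup produced by earlier steps: image file -> IUPAC name.
IUPAC_BY_FILE = {"US0001-C00001.TIF": "2-acetyloxybenzoic acid"}


def enrich(xml_text, names):
    """Replace every <chemistry> element with plain text (the IUPAC name)."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        last_kept = None  # most recent sibling that stays in the tree
        for child in list(parent):
            if child.tag != "chemistry":
                last_kept = child
                continue
            img = child.find("img")
            file_name = img.get("file") if img is not None else ""
            # Fall back to the image file name when no IUPAC name is known.
            text = names.get(file_name, file_name) + (child.tail or "")
            if last_kept is None:
                parent.text = (parent.text or "") + text
            else:
                last_kept.tail = (last_kept.tail or "") + text
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


enriched = enrich(SAMPLE_XML, IUPAC_BY_FILE)
fallback = enrich(SAMPLE_XML, {})
```

Because the name is written as ordinary text in the position of the removed element, a downstream text-mining index sees the compound in its original sentence context.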
  • 21. Semi-automatic Patent Curation Workflow: d) TIF/attachment replacement with IUPAC names. OCR errors in chemical names: TIF, CDX and MOL attachments are replaced with the derived IUPAC name.
  • 22. Semi-automatic Patent Curation Workflow: e) bioactivity/tabular data extraction with KNIME/XPATH. XPATH/XML parsing and extraction of: tables; rows (XML tags and strings); entries (XML tags and strings).
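The table/row/entry extraction on slide 22 might look like the following outside KNIME. This is a sketch with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset; the tag names `table`, `row` and `entry`, the sample data and the function `extract_tables` are assumptions for illustration, and real patent XML typically nests these elements more deeply.

```python
import xml.etree.ElementTree as ET

# Hypothetical patent fragment; tag names and values are assumed.
SAMPLE_XML = """
<description>
  <table id="TABLE-1">
    <row><entry>Example Nr.</entry><entry>IC50 (nM)</entry></row>
    <row><entry>1</entry><entry>12</entry></row>
    <row><entry>2</entry><entry><b>340</b></entry></row>
  </table>
</description>
"""


def extract_tables(xml_text):
    """Return {table id: list of rows}, each row a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.findall(".//table"):
        rows = []
        for row in table.findall(".//row"):
            # itertext() also collects text inside nested formatting tags.
            cells = ["".join(entry.itertext()).strip()
                     for entry in row.findall("./entry")]
            rows.append(cells)
        tables[table.get("id")] = rows
    return tables


tables = extract_tables(SAMPLE_XML)
```

Keeping the table identifier alongside the rows makes it possible to relate each extracted value back to its position in the document, as the workflow does for chemistry.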
  • 23. Semi-automatic Patent Curation Workflow: f) text/data mining with Linguamatics I2E via KNIME. IUPAC-enriched XML as source for I2E API/text mining: indexing → pre-defined queries → results retrieval → saved as SDF files (KNIME). Information retrieved by text mining (chemistry-related): example number; bioactivity data from tables; claims and regions where chemistry appears in patents; genes, diseases.
  • 24. Semi-automatic Patent Curation Workflow: f) bioactivity data using I2E multi-queries in 2 steps. Source: (IUPAC-enriched) XML. 1. Example Nr. → IUPAC; 2. Example Nr. → bioactivity data. [Figure: table and image, with the chemistry in the PDF shown for comparison; result columns: Example Nr., IUPAC, Bioactivity]
  • 25. Semi-automatic Patent Curation Workflow: g) visualize data/text-mining results in ChemCC. SDF file loaded into a ChemCC project, with automatic mapping to existing chemistry.
  • 26. Lessons learned, weak points, limitations. 1. Advantages of the KNIME full mode (MOL/CDX/TIF) vs the ChemCC branch: chemistry check/normalization with 3 input sources → improved quality; improved chemistry recall: ALL images (incl. tables and drawings); more filtering options in the KNIME workflow vs ChemCurator only; IUPAC-enriched XML as a new source for I2E; …
  • 27. Lessons learned, weak points, limitations. 2. No full automation of the workflow due to the lack of homogeneity in patent data (US vs WO, EP, etc.): missing attachment files; no tables present in XML; error rate in chemistry recognition (OPSIN vs n2s/d2s); … → NEEDS: different workflows/branches, patent-file clean-up (OCR). 3. Time- and computational-resource-consuming process.
  • 28. Outlook. 1. KNIME workflow: add new data fields to chemicals (BI-internal codes, genes, targets, etc.); use the ChemCC HTML output as a source for text mining; ontology mapping; expand the workflow by including other sources (internal PDFs, literature full text); use KNIME to interconnect with BI-internal workflows, databases, etc. → chemistry-linked information in a patent database → improved (semantic) search.
  • 29. Outlook. 2. ChemCurator: improved n2s; new command-line functions; complex-phrase requests from the IFI server; improved SDF import; preprocessing wizards.
  • 30. Thank you!