Searching for Markush structures has long been difficult, not least because searchers had to work with several different retrieval systems. With the new implementation of the DWPI Markush database (DWPIM) from Thomson Reuters on STN, a single structure query can now be run against all structure databases. The structure and bibliographic databases are integrated within one content domain, which allows fast and easy projections between them. This presentation shows that the DWPI Markush concept of superatoms can be integrated into the STN query system, so that users can exploit the full potential of the DWPI Markush data. Complete, high-precision searches required the development of a new Markush search engine. Hit structure display, highlighting, and assembled structures improve the evaluation of Markush structures. This implementation also provides the basis for further innovative features in the future.
5. Content of the Derwent Markush Resource
1.9 M Markush structures referencing 768 K DWPI documents
Records in DWPIM are structure based
– Markush Compound Number (YYWW-AAANN) links to DWPI
– Single Markush structure including all variations
Nodes in the structure may be a specific atom, a shortcut, a superatom representing generically described groups in the structure, or a variable group
Only generic structures are indexed in DWPIM
– Specific structures can be simultaneously searched for in DCR
– In some rare cases specific structures are contained only in DWPIM
Substance classes: organics, organometallics, inorganics, polymers, fullerenes, peptides
Indexing from 33 patent issuing authorities worldwide
DWPIM on STN – ICIC, Nizza, 2015
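The Markush Compound Number format YYWW-AAANN named on this slide can be split mechanically. The following is a minimal Python sketch, not Derwent or STN code: the function name `parse_mcn`, the field names, and the pattern (four digits, a hyphen, five alphanumeric characters, and an optional one-letter suffix as in the example 1154-11901-K shown elsewhere in the deck) are assumptions made for illustration. The meaning of the individual parts is not stated in the slides and is not interpreted here.

```python
import re
from typing import NamedTuple, Optional


class CompoundNumber(NamedTuple):
    prefix: str            # the "YYWW" part, kept as text (meaning not interpreted)
    serial: str            # the "AAANN" part, letters and/or digits
    suffix: Optional[str]  # optional trailing letter, e.g. "K" (assumed optional)


# Assumed pattern: 4 digits, hyphen, 5 alphanumerics, optional "-X" suffix.
_MCN_PATTERN = re.compile(r"(\d{4})-([A-Z0-9]{5})(?:-([A-Z]))?")


def parse_mcn(text: str) -> CompoundNumber:
    """Split a compound number string into its parts or raise ValueError."""
    match = _MCN_PATTERN.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised compound number: {text!r}")
    return CompoundNumber(match.group(1), match.group(2), match.group(3))
```

For example, `parse_mcn("9912-JKW07")` yields the prefix `"9912"` and the serial `"JKW07"`, while a string without the hyphenated layout raises `ValueError`.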
6. Chemistry-related Databases of Thomson Reuters
Derwent World Patents Index (DWPI): 28.8 m inventions
– Example record: AN 2012-N47214 [201312] DWPI; TI Preparation of active material …
– Example indexing line: M3 *01* A111 A940 B115 B702 B713
Derwent Chemistry Resource (DCR): 2.5 m chemical structures
– Example: 0334-20904
Derwent Markush Resource (DWPIM): 1.9 m Markush structures
– Example: 1154-11901-K
7. Challenges
Integrate different concepts of STN and Derwent
– Different set of homology nodes
– Different bonding conventions
Retain indexing concept of Derwent within the STN environment
Develop a new state-of-the-art Markush search engine
– Focus on full recall
– Provide full set of search functionalities for high precision
Improve current possibilities for the evaluation of Markush search results
Provide high performance search environment
9. DWPIM Concept is based on 22 Superatoms
Acyclic / Cyclic (ATOM, CLASS, ANY)
– CHK (Alkyl, Alkylene)
– CHE (Alkenyl, Alkenylene)
– CHY (Alkynyl, Alkynylene)
– ARY (Aryl)
– CYC (Cycloaliphatic)
– HEA (Monocyclic heteroaryl)
– HET (Monocyclic nonaromatic)
– HEF (Fused heterocyclic)
Elements (ATOM, CLASS, ANY)
– MX (Any metal)
– A35 (Group III A - V A metal)
– ACT (Actinide)
– AMX (Alkali/alkaline earth metal)
– LAN (Lanthanide)
– TRM (Transition metal)
– HAL (Halogen)
Others (only match on themselves)
– ACY (Acyl)
– DYE (Chromophore)
– PEG (Polymer end group)
– POL (Polymer)
– PRT (Protecting group)
– UNK (Any atom or group including H)
– XX (Any atom or group excluding H)
10. Derwent Indexing Hierarchy of Superatoms
XX
– ARY, CYC, HEF, HEA, HET, CHK, CHE, CHY, HAL
– MX
  – A35, ACT, AMX, LAN, TRM
11. STN Query Hierarchy of Nodes
Figure: hierarchy of the STN generic query nodes R, Cy, Cb, Hy, Ak, X, and M.
12. Integrated Query Hierarchy of Generic Nodes
R
– Cy
  – Cb
    – ARY (specific: benzene)
    – CYC (specific: cyclohexane)
  – Hy
    – HEF (specific: quinoline)
    – HEA (specific: pyridine)
    – HET (specific: piperidine)
– Ak
  – CHK (specific: CH3 …)
  – CHE (specific: CH2 = CH3)
  – CHY (specific: ethyne)
13. Match Levels Control the Search Process
Match levels control the degree of structure query matching between the query structure and the structure in the search file.
Match levels are assigned to each atom and generic group/superatom.
All nodes of a ring system have the same match level
– Note: Special cases require different match levels for the atoms of a ring system
Default Match Levels on STN
– ATOM for ring nodes
– CLASS for chain nodes
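To make the idea of match levels concrete, here is a toy Python model. It is only a sketch under stated assumptions and not STN's implementation: the names `MatchLevel`, `node_matches`, and `default_level` are hypothetical, and the rules used for CLASS (a specific node may also match the superatom of its own class) and ANY (a specific node may match any of the listed cyclic or acyclic superatoms) are an assumed reading, since the figures showing the real behaviour are not reproduced in this transcript. The specific-to-superatom pairs come from the integrated hierarchy slide, and the defaults are the ones stated on this slide.

```python
from enum import Enum


class MatchLevel(Enum):
    ATOM = "ATOM"
    CLASS = "CLASS"
    ANY = "ANY"


# Specific example -> Derwent superatom, as paired on the integrated hierarchy slide.
SUPERATOM_OF = {
    "benzene": "ARY",
    "cyclohexane": "CYC",
    "quinoline": "HEF",
    "pyridine": "HEA",
    "piperidine": "HET",
    "ethyne": "CHY",
}

# Cyclic and acyclic superatoms from the superatom list.
SUPERATOMS = {"ARY", "CYC", "HEF", "HEA", "HET", "CHK", "CHE", "CHY"}


def default_level(is_ring_node: bool) -> MatchLevel:
    """Default match levels on STN: ATOM for ring nodes, CLASS for chain nodes."""
    return MatchLevel.ATOM if is_ring_node else MatchLevel.CLASS


def node_matches(query: str, indexed: str, level: MatchLevel) -> bool:
    """Toy rule set (assumed): decide whether a specific query node matches an indexed node."""
    if indexed == query:
        return True                      # identical specific node always matches
    if level is MatchLevel.ATOM:
        return False                     # ATOM: specific nodes only
    if level is MatchLevel.CLASS:
        return indexed == SUPERATOM_OF.get(query)  # CLASS: own superatom (assumption)
    return indexed in SUPERATOMS         # ANY: any listed superatom (assumption)
```

Under these assumed rules, a pyridine query node matches an indexed HEA node at CLASS or ANY but not at ATOM, which mirrors the slide titles stating that ATOM results in specific nodes and CLASS results in generic nodes.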
15. Effect of Match Level in DWPIM: ATOM
Search: pyridine
MLE: ATOM
Figure: integrated query hierarchy of generic nodes (as on slide 12), shown for this match level.
16. Effect of Match Level in DWPIM: CLASS
Search: pyridine
MLE: CLASS
Figure: integrated query hierarchy of generic nodes (as on slide 12), shown for this match level.
17. Effect of Match Level in DWPIM: ANY
Search: pyridine
MLE: ANY
Figure: integrated query hierarchy of generic nodes (as on slide 12), shown for this match level.
18. Match Level ATOM Results in Specific Nodes
Figure panels: Query (MLE ATOM); MCN 9912-JKW07
19. Match Level CLASS Results in Generic Nodes
Figure panels: Query (MLE CLASS); Answer #5: 9917-IVD01
20. Different Types of Bond Conventions
STN convention: bond values in the structure editor
• Note that "exact" from the structure editor is translated to single exact or double exact
Indexed bonds: bond value in the indexed structures, e.g.
• A single exact bond can only match a single bond
• A single/normalized bond can match either a single or a normalized bond in the indexed structure
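The two matching rules on this slide can be written as a small lookup. This is a hedged Python sketch, not the STN search engine: the names `ALLOWED_INDEXED_BONDS`, `bond_matches`, and `translate_exact` are hypothetical, the "double exact" entry is assumed by analogy with "single exact", and the idea that the editor's "exact" is translated according to the drawn bond order is also an assumption.

```python
# Query bond value -> indexed bond types it may match.
ALLOWED_INDEXED_BONDS = {
    "single exact": frozenset({"single"}),                      # from the slide
    "single/normalized": frozenset({"single", "normalized"}),   # from the slide
    "double exact": frozenset({"double"}),                      # assumed by analogy
}


def translate_exact(drawn_order: int) -> str:
    """Translate the editor's "exact" into single exact or double exact (assumed: by drawn order)."""
    if drawn_order == 1:
        return "single exact"
    if drawn_order == 2:
        return "double exact"
    raise ValueError(f"unsupported bond order for 'exact': {drawn_order}")


def bond_matches(query_value: str, indexed_bond: str) -> bool:
    """Return True if a query bond value may match the given indexed bond type."""
    return indexed_bond in ALLOWED_INDEXED_BONDS.get(query_value, frozenset())
```

With this table, a query bond drawn as single exact does not retrieve a structure whose corresponding bond is indexed as normalized, whereas single/normalized does, which is why the choice of bond value affects recall.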
21. Why are bond values important?
Figure: query structure with bond values exact (= single exact), normalized, and exact/normalized.
22. Bond Normalization
Rings: Normalized bonds are used in ring systems with an even number of atoms containing alternating double and single bonds.
• Benzene, pyridine, naphthalene
• Exceptions: cyclopentadienyl anion, cycloheptatrienyl cation
Tautomers: (figure: tautomer scheme with atoms X, Y, and Z)
Where Z can be: B, C, Si, N, P, As, S, Se, Te, F, Cl, Br, I
X and Y can be: O, S, Se, Te, N
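The ring rule on this slide (an even number of atoms with alternating double and single bonds) can be checked for a single ring as follows. This is a simplified Python sketch under stated assumptions: the function name `is_alternating_even_ring` is hypothetical, the ring is given as the list of bond orders met when walking once around it, fused systems such as naphthalene are out of scope, and the listed exceptions (cyclopentadienyl anion, cycloheptatrienyl cation) and the tautomer rule are not handled.

```python
from typing import Sequence


def is_alternating_even_ring(bond_orders: Sequence[int]) -> bool:
    """Check one ring: even size and strictly alternating single (1) / double (2) bonds.

    bond_orders lists the bond orders in ring order; in a single ring the
    number of bonds equals the number of atoms.
    """
    size = len(bond_orders)
    if size < 3 or size % 2 != 0:
        return False                      # not a ring, or odd-membered
    if any(order not in (1, 2) for order in bond_orders):
        return False                      # only single and double bonds qualify
    # Every bond must differ from its neighbour, including the ring-closing pair.
    return all(bond_orders[i] != bond_orders[(i + 1) % size] for i in range(size))
```

A Kekulé drawing of benzene or pyridine (`[2, 1, 2, 1, 2, 1]`) passes the check, while cyclohexane (`[1, 1, 1, 1, 1, 1]`) and the five-membered cyclopentadiene ring (`[2, 1, 2, 1, 1]`) do not.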
23. Search Example 1
Query: (figure: query structure)
24. Search Example 2
Query: (figure: query structure)
25. With DWPIM new STN will deliver the first-ever Unified Markush Search Solution
Consistent search of MARPAT from CAS and DWPIM from Thomson Reuters on a single platform
Offers same Markush attributes as classic STN
Retrieves results as Markush structures, not references
Highly efficient retrieval and evaluation of Markush results
DWPIM hit structures: both the Assembled view and Full view will be available.
26. Full integration in Thomson Reuters and CAS content
Simultaneous search for generic and specific chemical structures
Diagram: Structure Search at the center, linked to:
– DWPI℠ (Derwent World Patents Index®)
– DCR (Derwent Chemistry Resource): ~2.5 million
– DWPIM (Derwent Markush Resource): ~1.9 million
– CAplus℠ (Chemical Abstracts)
– MARPAT® (CAS Markush Database): ~1.1 million
– CAS Registry℠: ~102 million
– REAXYSFILE: ~25 million
DWPIM is integrated with DWPI for efficient searching together with CAS databases
27. Agenda
• Introduction
• Derwent and STN Generic Node Concepts
• Application of Match Levels
• Derwent and STN Bonding Conventions
• Some Examples
• STN Unified Markush Solution
• Derwent Markush Resource – Data Content
28. DERWENT MARKUSH RESOURCE ON STN
Contains full DWPI and INPI Markush data backfile
1.9 million Markush structures
33 patent issuing authorities covered
Indexing for >777,000 patent families in DWPI
US, EP and WO coverage from 1978 onwards
DWPI data from 1987 to date
INPI sourced data from 1961 to 1998
Full coverage of organics, organometallics, inorganic salts and metal oxides
Partial coverage of alloys, intermetallics and polymers
30. DERWENT MARKUSH RESOURCE ON STN
CONTENT
• Integration of INPI content
– 213,000 structure files
– New DWPI format compound numbers created and added into 120,000 DWPI records
• Format 82nn-nnnnn and 83nn-nnnnn
• INPI indexing was added to all relevant DWPI records
– including those already containing DWPI-sourced Markush indexing for the period 1987 – 1998 (i.e. double indexing)
• Re-indexed DWPI content
– 45,0000 structures were re-indexed, with new compound numbers added into DWPI
• Format 84nn-nnnnn and 85nn-nnnnn