This document outlines 7 issues that can occur when analyzing patent citation data from the PATSTAT database and provides solutions to each problem. The issues include citations disappearing when investigating EP/PCT citations, missing priority information, incomplete address data, ownership changes not being tracked, unequal coverage across patent offices, duplicate citation records, and duplicate inventor/applicant records. Solutions proposed include integrating citation information from equivalent documents, adding missing priority records, supplementing address data from other sources, using ownership transfer codes to track changes, and removing duplicate records.
PATSTAT 7 deadly sins and how to solve them
1. PATSTAT 7 DEADLY SINS
(and how to solve them)
Gianluca Tarasconi - Crios, Università Bocconi
Blog: http://rawpatentdata.blogspot.com
2. • When investigating EP / PCT citations, some of them
disappear, as if "eaten up"
• See for instance: EP1103560, equivalent to WO0006594

APPLN_ID  PUBLN_AUTH + NR  PUBLN_ID  NPL_CITN_SEQ_NR  NPL_PUBLN_ID  NPL_BIBLIO
347305    EP1103560        511640    1                950236893     No further relevant documents disclosed
347305    EP1103560        511640    3                950236894     See also references of WO 0006594A1

• From the citations table we would conclude it has only 2 NPL citations
(and one of them is "SEE ALSO CITATIONS OF WO0006594")
3. • But looking at the patent's search report:
http://worldwide.espacenet.com/publicationDetails/originalDocument?CC=EP&NR=1103560A1&KC=A1&FT=D&ND=3&date=20010530&DB=EPODOC&locale=en_EP
we find 1 NPL and 3 patents that are actually listed as backward citations
4. • As a matter of fact, looking up the corresponding WO in Espacenet we find:
http://worldwide.espacenet.com/publicationDetails/citedDocuments?CC=WO&NR=0006594A1&KC=A1&FT=D&ND=4&date=20000210&DB=EPODOC&locale=en_EP
5. • An issue inherited from the REFI dataset:
when a PCT / EP patent has an equivalent we may
have, instead of the list of citations, an NPL entry
that reads something like "SEE ALSO CITATIONS OF..."
• (REFI is the original EPO dataset for citations)

      publn_id affected  publn_id tot  %
'EP'  334140             4505456       7.42%
'WO'  334135             2682654       12.46%
6. • A possible solution is to integrate the citations of the WO/EP
equivalent patent wherever an NPL record contains the
string "SEE ALSO"
• This means adding patent and NPL citations from the
corresponding INTERNAT_APPLN_ID in TLS201. In
our example we'd find all of them (and something more)

APPLN_ID  PUBLN_AUTH + NR  PUBLN_ID  PAT_CITN_SEQ_NR  CITED_PAT_PUBLN_ID  NPL_CITN_SEQ_NR  NPL_PUBLN_ID  NPL_BIBLIO / CITED PUBNR
30241523  WO0006594        38723126  1                45098451            0                0             JPH09124691
30241523  WO0006594        38723126  2                46832598            0                0             JPH01143897
30241523  WO0006594        38723126  3                34575918            0                0             JPS56139455
30241523  WO0006594        38723126  0                0                   1                2925184       GILBERT M. RISHTON ET AL: 'A beta-turn Mimic...Study of Cyclic Peptide RGD and RCD Celladhension Inhibitors' LETTERS IN PEPTIDE SCIENCE vol. 3
30241523  WO0006594        38723126  0                0                   2                952900902     See also references of EP 1103560A1
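
The integration step above can be sketched in Python. This is only an illustration under stated assumptions: the dictionaries `citations` and `equivalent` are hypothetical in-memory stand-ins for the PATSTAT citation tables and for the EP/WO link given by INTERNAT_APPLN_ID in TLS201, filled with the values of the example.

```python
# Hypothetical stand-in for the citation tables: publication -> list of
# (kind, value), where kind is "PAT" (cited publication) or "NPL" (biblio text).
citations = {
    "EP1103560": [
        ("NPL", "No further relevant documents disclosed"),
        ("NPL", "See also references of WO 0006594A1"),
    ],
    "WO0006594": [
        ("PAT", "JPH09124691"),
        ("PAT", "JPH01143897"),
        ("PAT", "JPS56139455"),
        ("NPL", "GILBERT M. RISHTON ET AL: 'A beta-turn Mimic...Study of Cyclic "
                "Peptide RGD and RCD Celladhension Inhibitors' LETTERS IN PEPTIDE "
                "SCIENCE vol. 3"),
        ("NPL", "See also references of EP 1103560A1"),
    ],
}

# Assumed EP <-> WO link, as it would be derived from INTERNAT_APPLN_ID (TLS201).
equivalent = {"EP1103560": "WO0006594", "WO0006594": "EP1103560"}


def is_pointer(kind, value):
    """True for the NPL placeholder that only points to the equivalent."""
    return kind == "NPL" and value.upper().startswith("SEE ALSO")


def integrated_citations(publication):
    """Replace a 'SEE ALSO' placeholder with the equivalent's own citations."""
    own = citations.get(publication, [])
    if not any(is_pointer(kind, value) for kind, value in own):
        return list(own)
    merged = [c for c in own if not is_pointer(*c)]
    for c in citations.get(equivalent.get(publication), []):
        if not is_pointer(*c) and c not in merged:
            merged.append(c)
    return merged


result = integrated_citations("EP1103560")
```

For EP1103560 the result keeps its own remaining NPL record and gains the three JP patents and the journal article listed for WO0006594.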
7. • Some applications in the DB turn out to have no
priority although they should (greedy EPO kept them for itself...)
• They are easy to identify: EP applications with a
corresponding PCT application and no priority

APPLN_ID  APPLN_AUTH  APPLN_NR    APPLN_KIND  APPLN_FILING_DATE  INTERNAT_APPLN_ID
20701     'EP'        '92917913'  'A'         '1992-08-10'       11643479

• Looking up both APPLN_ID and INTERNAT_APPLN_ID in
TLS204 we find they have no priorities (no records
returned);
• Surprisingly, they are in the same INPADOC family
(TLS219)
8. • If we look in Espacenet we see that 2 more priorities are lost:
• 48,945 cases (1% of PCT applications) are involved, but the effect
may be bigger if counted by missing priorities...
9. • Partial solution:
• Add APPLN_ID and INTERNAT_APPLN_ID from
TLS201 to a customized priorities list, respectively
as application and priority.
• This issue also affects the calculation of the priority year...
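
A minimal Python sketch of the partial solution above. It assumes simplified rows: `(APPLN_ID, INTERNAT_APPLN_ID)` pairs from TLS201 and `(APPLN_ID, PRIOR_APPLN_ID)` pairs from TLS204, with 0 meaning "no international application". The rows with ids 500 and 400 are invented for illustration; the first row is the example shown earlier.

```python
# Simplified TLS201 rows: (APPLN_ID, INTERNAT_APPLN_ID); 0 is assumed to mean "none".
tls201 = [(20701, 11643479), (11643479, 0), (500, 0)]
# Simplified TLS204 rows: (APPLN_ID, PRIOR_APPLN_ID). Hypothetical example row.
tls204 = [(500, 400)]


def customized_priorities(tls201_rows, tls204_rows):
    """Return TLS204 pairs plus (EP application, PCT application) pairs
    for applications where neither side has any recorded priority."""
    priorities = set(tls204_rows)
    with_priority = {appln_id for appln_id, _ in tls204_rows}
    for appln_id, internat_appln_id in tls201_rows:
        if not internat_appln_id:
            continue  # no corresponding PCT application
        if appln_id in with_priority or internat_appln_id in with_priority:
            continue  # a priority is already known, nothing to patch
        # Application and its PCT equivalent, as application and priority.
        priorities.add((appln_id, internat_appln_id))
    return sorted(priorities)


patched = customized_priorities(tls201, tls204)
```

The patched list keeps every original priority and adds one synthetic record for application 20701.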
10. • As the table alongside shows,
address coverage for
some application
authorities is very poor,
enraging many
users...
(top 15 authorities by distinct person id;
Oct 2012 data)
11. • Two safe ways to recover address data:
• 1) for PCT applications: rescue data from regional
phase persons: about 7-8% of data
• 2) rescue data from homonyms within the same
application (i.e. applicant = inventor): 3% of data
(especially effective for USPTO)
• One more risky method:
• Rescue data from homonyms in patent priorities
(10-15% but less safe, see example below)

DOCDB_FAMILY_ID  APPLN_ID  APPLN_ID1  person_name   PERSON_CTRY_CODE  PERSON_CTRY_CODE1
22857305         24074575  6621661    'A. AGRAWAL'  'US'              'IN'
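
The second safe method (homonyms within the same application) could look like the Python sketch below. The row layout and the person names are hypothetical; the only rule taken from the slide is that a missing country is copied from a person with the same name in the same application. As an extra precaution that is an assumption of this sketch, nothing is copied when the homonyms disagree, which is exactly the situation of the 'A. AGRAWAL' example ('US' vs 'IN').

```python
# Hypothetical rows: (APPLN_ID, person_name, PERSON_CTRY_CODE, role).
persons = [
    (1, "J. SMITH", "", "applicant"),
    (1, "J. SMITH", "US", "inventor"),
    (1, "A. JONES", "", "inventor"),
]


def rescue_country_from_homonyms(rows):
    """Fill an empty country code from a same-name person in the same application."""
    known = {}
    for appln_id, name, ctry, _ in rows:
        if ctry:
            known.setdefault((appln_id, name), set()).add(ctry)
    rescued = []
    for appln_id, name, ctry, role in rows:
        candidates = known.get((appln_id, name), set())
        # Copy only when exactly one country is known for this name.
        if not ctry and len(candidates) == 1:
            ctry = next(iter(candidates))
        rescued.append((appln_id, name, ctry, role))
    return rescued


rescued = rescue_country_from_homonyms(persons)
```

Keying on DOCDB_FAMILY_ID instead of APPLN_ID would give the riskier, priority-based variant.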
12. • TLS221 offers a way, through PRS CODE RAP1, to track
ownership changes.
• For example: for EP application id 15706726 we can
track a double ownership change

APPLN_ID  DATE        PRS CODE  NAME                         EXPLANATION
15706726  18/10/2006  RAP1      NEUMANN ELEKTROTECHNIK GMBH  TRANSFER OF RIGHTS OF AN EP APPLICATION
15706726  24/11/2010  RAP1      SENTEX CHEMNITZ GMBH         TRANSFER OF RIGHTS OF AN EP APPLICATION

• If we look at the PATSTAT TLS206/207 tables we will find the
current owner (SENTEX CHEMNITZ) but we cannot get
the first owner (removed by the envious new owner...)
13. • The only solution is to look in Espacenet and find, on the
bibliographic data web page, the name of the first owner
(in this case Univ Dresden)
• Note: the Oct 2013 PATSTAT includes major changes to persons
management; maybe this issue has new solutions...
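
Ordering the RAP1 events of an application gives the chain of ownership transfers. The Python sketch below uses the two rows of the example; the tuple layout `(APPLN_ID, date, PRS code, name)` is an assumption made for illustration and is not the actual TLS221 column list.

```python
from datetime import date

# Rows from the example: (APPLN_ID, event date, PRS code, name).
events = [
    (15706726, date(2010, 11, 24), "RAP1", "SENTEX CHEMNITZ GMBH"),
    (15706726, date(2006, 10, 18), "RAP1", "NEUMANN ELEKTROTECHNIK GMBH"),
]


def ownership_changes(rows, appln_id):
    """Names of the new owners of one application, in chronological order."""
    transfers = [
        (event_date, name)
        for row_appln_id, event_date, code, name in rows
        if row_appln_id == appln_id and code == "RAP1"
    ]
    return [name for _, name in sorted(transfers)]


chain = ownership_changes(events, 15706726)
```

The chain lists only the receiving owners, so the very first owner still has to be found elsewhere, as noted above.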
14. • PATSTAT covers about 100 patent authorities, but with
unequal coverage and publication lags.
• Good coverage and short lags for EU countries; less
complete and less regular for national patent authorities
outside the EU (except big players such as US, JP...)
• On the next slide, an example using GB as a baseline for BR
and IN (application counts):
15. • Two possible errors:
• Different transmission timeframe (the decay of the patent count in BR starts before GB's);
• Partial data transmission: counts differ from the official data of the patent office (see IN on the
next slide)

Year  BR     GB     IN
1990  10851  30055  2209
1991  10122  29991  2002
1992  9103   30089  1958
1993  10272  29901  2032
1994  10992  29560  2529
1995  13557  29909  2554
1996  15580  30448  1679
1997  18589  31219  1383
1998  19032  32828  1026
1999  21019  35222  750
2000  20725  36996  690
2001  20626  36884  705
2002  19265  36318  757
2003  20909  35452  1049
2004  22816  33794  1113
2005  23973  31066  1691
2006  23472  30495  1973
2007  16078  30848  2215
2008  10088  28816  2541
2009  8843   27103  2507
2010  5028   25363  2988
2011  539    24010  872
2012  7      7955   28
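
One way to make the "decay starts before GB" observation measurable is to divide the BR counts by the GB baseline year by year and look for the first sharp drop. The Python sketch below uses the 2005-2012 figures from the series above. The 50% threshold and the function names are arbitrary assumptions of this example, not part of the original analysis.

```python
# Application counts copied from the BR and GB series (2005-2012).
counts = {
    "BR": {2005: 23973, 2006: 23472, 2007: 16078, 2008: 10088,
           2009: 8843, 2010: 5028, 2011: 539, 2012: 7},
    "GB": {2005: 31066, 2006: 30495, 2007: 30848, 2008: 28816,
           2009: 27103, 2010: 25363, 2011: 24010, 2012: 7955},
}


def ratio_to_baseline(series, baseline):
    """Yearly count of an authority divided by the baseline authority's count."""
    return {year: series[year] / baseline[year]
            for year in sorted(series) if baseline.get(year)}


def first_year_below(ratios, reference_year, threshold=0.5):
    """First later year whose ratio is below threshold * ratio of reference_year."""
    cutoff = ratios[reference_year] * threshold
    for year in sorted(ratios):
        if year > reference_year and ratios[year] < cutoff:
            return year
    return None


br_vs_gb = ratio_to_baseline(counts["BR"], counts["GB"])
drop_year = first_year_below(br_vs_gb, 2005)
```

With these figures the BR/GB ratio falls below half of its 2005 level in 2008, while GB itself only collapses in 2012.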
16. • Official data for India applications:

    2007  2008  2009  2010  2011  2012
IN  2215  2541  2507  2988  872   28

• On the EPO website, at
https://data.epo.org/data/data.html, coverage by
authority is listed (available in absolute numbers, not
in %)
17. • Consider all records contained in TLS212 for
PAT_PUBLN_ID = 3:

PAT_PUBLN_ID  CITN_ID  CITED_PAT_PUBLN_ID  PAT_CITN_SEQ_NR  CITN_ORIGIN
3             1        20433311            1                '0 '
3             2        20473739            2                '0 '
3             3        15421766            3                '0 '
3             4        20433311            4                '1 '
3             5        20473739            5                '1 '
3             6        15421766            6                '1 '

• We note that all records are duplicated, since the origin of the
citation is double (both 0 and 1: applicant and examiner)
• This may lead to an overestimation of citations received.
• About 750K records out of 100M (0.75%) suffer from this issue,
but...
18. • The distribution of the error is very concentrated:
• EPO: 218,000 (about 2.5%)
• WIPO: 318,000 (about 3%)
• This also makes it unclear what the origin of the
citation is...
• Solution: count distinct citations; move citation origin
data to a separate table
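
The proposed solution can be sketched in Python on the six TLS212 records of the PAT_PUBLN_ID = 3 example. The tuple layout follows the columns of that table; splitting the result into a list of distinct pairs and a dictionary of origins is just one possible reading of "move citation origin data to a separate table". The sketch assumes patent citations only (non-zero CITED_PAT_PUBLN_ID).

```python
# (PAT_PUBLN_ID, CITN_ID, CITED_PAT_PUBLN_ID, PAT_CITN_SEQ_NR, CITN_ORIGIN)
tls212 = [
    (3, 1, 20433311, 1, "0 "),
    (3, 2, 20473739, 2, "0 "),
    (3, 3, 15421766, 3, "0 "),
    (3, 4, 20433311, 4, "1 "),
    (3, 5, 20473739, 5, "1 "),
    (3, 6, 15421766, 6, "1 "),
]


def split_citations(rows):
    """Return distinct (citing, cited) pairs and, separately, their origins."""
    origins = {}
    for citing, _, cited, _, origin in rows:
        # CITN_ORIGIN carries a trailing blank in the example, so strip it.
        origins.setdefault((citing, cited), set()).add(origin.strip())
    distinct = sorted(origins)
    return distinct, origins


distinct_citations, citation_origins = split_citations(tls212)
```

Counting `distinct_citations` gives 3 backward citations instead of 6, and `citation_origins` keeps both origin codes for each pair.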
19. • Some applications in TLS207 have the same person_id
repeated twice. For instance:

APPLN_ID  PERSON_ID  APPLT_SEQ_NR  INVT_SEQ_NR
2055      15868134   0             2
2055      15868134   0             5
2055      27024905   0             1
2055      27024905   0             4
2055      31219618   2             0
2055      31219618   1             0
2055      40555313   0             3
2055      40555313   0             6

• This "reproductive act" affects only 0.1% of TLS207,
but it probably originates
from a legal event such as a change of ownership or a data
correction, since all of these records list one of those events
(3.4% of such applications suffer from duplication).
• Suggestion: data from TLS207 should be treated using a
DISTINCT clause.
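
A small runnable illustration of the DISTINCT suggestion, using Python's built-in sqlite3 module and the eight rows of application 2055. The table name `tls207` and the lower-case column names are simplifications assumed for this sketch. Note that DISTINCT has to be applied to (appln_id, person_id) only: the sequence numbers differ between the duplicates, so a DISTINCT over all columns removes nothing.

```python
import sqlite3

rows = [
    (2055, 15868134, 0, 2), (2055, 15868134, 0, 5),
    (2055, 27024905, 0, 1), (2055, 27024905, 0, 4),
    (2055, 31219618, 2, 0), (2055, 31219618, 1, 0),
    (2055, 40555313, 0, 3), (2055, 40555313, 0, 6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls207 (appln_id INTEGER, person_id INTEGER, "
    "applt_seq_nr INTEGER, invt_seq_nr INTEGER)"
)
conn.executemany("INSERT INTO tls207 VALUES (?, ?, ?, ?)", rows)

# One row per person and application, ignoring the sequence numbers.
distinct_persons = conn.execute(
    "SELECT DISTINCT appln_id, person_id FROM tls207 ORDER BY person_id"
).fetchall()

# For comparison: DISTINCT over every column keeps all eight rows.
distinct_all = conn.execute("SELECT DISTINCT * FROM tls207").fetchall()
```

Here `distinct_persons` holds four rows, one per person, while `distinct_all` still holds eight.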