PATSTAT users 7 deadly sins
Gianluca Tarasconi, ICRIOS DBA
rawpatentdata.blogspot.com
Leuven 20/9/2017
In short
- This presentation aims to show 7 common errors users may make when they use PATSTAT;
- It is ideally the continuation of ‘PATSTAT 7 deadly sins’ from 2013;
- Nevertheless, there is only one sin users must avoid when using patent data:
… SLOTH …
Inventors / applicants are not always listed (I)
Some applications are missing inventor and/or applicant data
SELECT
Sum(If(b.APPLN_ID IS NULL, 1, 0)) AS noperson,
Count(c.APPLN_ID) AS n_APPLN_ID
FROM
patstat.tls207_pers_appln b
RIGHT JOIN patstat.tls201_appln c ON b.APPLN_ID = c.APPLN_ID
WHERE
c.APPLN_KIND <> 'D2'
n appln_id no person %
221.595.818 18.202.821 9%
Autumn 2016 data
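Because the RIGHT JOIN returns one row per person/application pair, Count(c.APPLN_ID) counts an application once for every person linked to it, while an application without persons appears only once. A minimal sketch, using Python's built-in sqlite3 on a hypothetical toy subset of tls201/tls207 (lower-cased columns, not the full PATSTAT schema), that counts distinct applications instead; SQLite has no IF(), so CASE is used:

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Hypothetical, tiny stand-in for tls201_appln and tls207_pers_appln."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_kind TEXT);
        CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER);
        INSERT INTO tls201_appln VALUES (1, 'A'), (2, 'A'), (3, 'A'), (4, 'D2');
        -- application 1 has two persons, 2 has one, 3 and 4 have none
        INSERT INTO tls207_pers_appln VALUES (10, 1), (11, 1), (12, 2);
        """
    )
    return conn


def count_missing_persons(conn: sqlite3.Connection) -> tuple:
    """Return (applications without persons, distinct applications, joined rows)."""
    return conn.execute(
        """
        SELECT
          COUNT(DISTINCT CASE WHEN p.appln_id IS NULL THEN a.appln_id END),
          COUNT(DISTINCT a.appln_id),
          COUNT(a.appln_id)  -- one row per person/application pair
        FROM tls201_appln a
        LEFT JOIN tls207_pers_appln p ON p.appln_id = a.appln_id
        WHERE a.appln_kind <> 'D2'
        """
    ).fetchone()


print(count_missing_persons(build_toy_db()))  # (1, 3, 4)
```

The share of applications without persons is then the first value divided by the second, not by the number of joined rows.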
Inventors / applicants are not always listed (II)
- Limited to A and W applications from offices with > 10.000 applications
appln_auth appln_kind noperson n_APPLN_ID perc
LU A 57057 88522 64%
BE A 453348 784265 58%
NL A 382777 681266 56%
SE A 552912 1345982 41%
AT A 154041 751803 20%
CH A 341522 1839496 19%
FR A 793741 4501015 18%
DD A 99003 651159 15%
EA A 17272 118772 15%
GT A 2423 17932 14%
CA A 928350 7300631 13%
CS A 46955 381949 12%
GB A 579052 4924739 12%
DK A 68952 589986 12%
Person_id is not an entity id (I)
- In PATSTAT, a person_id does not identify an entity but a distinct name/address/country combination:
- Same entity → more than one person_id
- Same person_id → more than one entity
Person_id is not an entity id: top inventors
SELECT
a.PERSON_NAME, a.PERSON_ADDRESS, a.PERSON_CTRY_CODE,
Count(c.APPLN_ID) AS Count_APPLN_ID,
Min(c.EARLIEST_FILING_YEAR) AS Min_EARLIEST_FILING_YEAR,
Max(c.EARLIEST_FILING_YEAR) AS Max_EARLIEST_FILING_YEAR
FROM
patstat.tls207_pers_appln b
INNER JOIN patstat.tls206_person a ON a.PERSON_ID = b.person_id
INNER JOIN patstat.tls201_appln c ON b.APPLN_ID = c.APPLN_ID
WHERE b.invt_seq_nr > 0 and c.EARLIEST_FILING_YEAR < 9999
GROUP BY a.PERSON_NAME, a.PERSON_ADDRESS,
a.PERSON_CTRY_CODE
ORDER BY Count_APPLN_ID DESC
Person_id is not an entity id: top inventors (II)
person_name ctry_code person_id n_app minyear maxyear
THE INVENTOR HAS WAIVED THE RIGHT TO BE MENTIONED 19584860 38067 2002 2015
KVASENKOV OLEG IVANOVICH RU 34298480 29682 2003 2015
WANG WEI 15786453 23156 1985 2015
ZHANG WEI 14837632 21771 1985 2015
NAME NOT GIVEN 13592151 17722 1964 2002
LI WEI 13615436 17298 1985 2015
VERZICHT DES ERFINDERS AUF NENNUNG 21108740 17260 1964 1993
WANG JUN 18500497 15755 1985 2015
LIU WEI 18697297 15319 1985 2015
LI JUN 18510590 14854 1985 2015
WANG LEI 18754169 14710 1986 2015
ZHANG LEI 18557049 14244 1987 2015
ZHANG JUN 18719351 12815 1985 2015
WANG JIAN 13113349 11936 1986 2015
WANG YONG 12656416 11844 1985 2016
ZHANG JIAN 14914085 11837 1985 2015
CHEN WEI 14837625 11706 1985 2015
WANG HUI 18663499 11452 1987 2015
LIU YANG 13930482 11126 1985 2015
LIU JUN 18710534 10927 1985 2015
LI LI 13632985 9958 1985 2015
AKTIENGESELLSCHAFT I. G. FARBENINDUSTRIE DE 17443080 9958 1897 1942
WANG TAO 18331978 9856 1985 2015
ZHANG YONG 18712075 9795 1985 2015
ZHANG LI 18704857 9716 1985 2015
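The top of this list mixes placeholder strings (no inventor at all) with homonyms such as WANG WEI that pool many people. A minimal sketch, assuming rows shaped like the table above (name, person_id, n_app, minyear, maxyear): it drops the three placeholder strings visible here (surely not an exhaustive list) and adds applications per active year as a rough plausibility check. The check is a hypothetical heuristic, not a PATSTAT feature, and it also flags genuinely prolific filers, which then need a manual look.

```python
# Placeholder strings visible in the top-inventor list (not exhaustive).
PLACEHOLDER_NAMES = {
    "THE INVENTOR HAS WAIVED THE RIGHT TO BE MENTIONED",
    "NAME NOT GIVEN",
    "VERZICHT DES ERFINDERS AUF NENNUNG",
}


def rank_inventors(rows, top_n=5):
    """rows: (person_name, person_id, n_app, minyear, maxyear) tuples.

    Drops placeholder names, ranks by n_app and adds applications per
    active year (hypothetical heuristic): hundreds or thousands per year
    rarely describe a single person.
    """
    ranked = []
    for name, person_id, n_app, minyear, maxyear in rows:
        if name.strip().upper() in PLACEHOLDER_NAMES:
            continue
        per_year = round(n_app / (maxyear - minyear + 1))
        ranked.append((name, person_id, n_app, per_year))
    ranked.sort(key=lambda r: r[2], reverse=True)
    return ranked[:top_n]


ROWS = [
    ("THE INVENTOR HAS WAIVED THE RIGHT TO BE MENTIONED", 19584860, 38067, 2002, 2015),
    ("KVASENKOV OLEG IVANOVICH", 34298480, 29682, 2003, 2015),
    ("WANG WEI", 15786453, 23156, 1985, 2015),
    ("NAME NOT GIVEN", 13592151, 17722, 1964, 2002),
    ("AKTIENGESELLSCHAFT I. G. FARBENINDUSTRIE", 17443080, 9958, 1897, 1942),
]
for row in rank_inventors(ROWS):
    print(row)
```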
Person_id is not an entity id: network analysis
SELECT
a.person_id, Count(DISTINCT b.person_id) AS n_coinv,
t6.PERSON_NAME, t6.PERSON_ADDRESS, t6.PERSON_CTRY_CODE
FROM
patstat.tls207_pers_appln a
INNER JOIN patstat.tls207_pers_appln b ON a.APPLN_ID = b.APPLN_ID
INNER JOIN patstat.tls206_person t6 ON t6.PERSON_ID = a.person_id
WHERE a.invt_seq_nr > 0 AND b.invt_seq_nr > 0
GROUP BY a.person_id, t6.PERSON_NAME, t6.PERSON_ADDRESS,
t6.PERSON_CTRY_CODE
ORDER BY n_coinv DESC
Person_id is not an entity id: network analysis
person_id n_coinv name address
15786453 32384 WANG WEI
14837632 27602 ZHANG WEI
13615436 25550 LI WEI
18697297 21915 LIU WEI
18754169 21237 WANG LEI
18557049 20629 ZHANG LEI
18500497 20562 WANG JUN
18510590 19789 LI JUN
13113349 17270 WANG JIAN
13930482 16618 LIU YANG
18719351 16576 ZHANG JUN
14914085 16464 ZHANG JIAN
12656416 16208 WANG YONG
18663499 15686 WANG HUI
18704857 15224 ZHANG LI
14837625 15027 CHEN WEI
13632985 14882 LI LI
18331978 14780 WANG TAO
12656569 14656 LI YAN
18712075 14616 ZHANG YONG
Wang Wei and Zhang Wei have 120 SIPO patents in common; each of the top 3 has a network of about 900K inventors within 3 degrees of distance
person_id name 3 DoD
15786453 WANG WEI 943.562
14837632 ZHANG WEI 925.099
13615436 LI WEI 916.268
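Degrees-of-distance figures like these come from a breadth-first search over the co-inventor graph. A minimal sketch on hypothetical (appln_id, person_id) pairs (in PATSTAT these would come from tls207 with invt_seq_nr > 0), which also shows how a single homonym id that merges two people inflates the reachable network:

```python
from collections import defaultdict, deque


def coinventor_graph(pairs):
    """pairs: iterable of (appln_id, person_id). Links persons sharing an application."""
    by_appln = defaultdict(set)
    for appln_id, person_id in pairs:
        by_appln[appln_id].add(person_id)
    graph = defaultdict(set)
    for persons in by_appln.values():
        for p in persons:
            graph[p] |= persons - {p}
    return graph


def reach_within(graph, start, max_depth=3):
    """Number of persons reachable from start in at most max_depth hops (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen) - 1


# Hypothetical chain A-B-C-D-E built from four co-invented applications.
PAIRS = [(1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "C"), (3, "D"), (4, "D"), (4, "E")]
print(reach_within(coinventor_graph(PAIRS), "A", max_depth=1))  # 1
# If "E" is a homonym of "A" and gets the same id, A's 1-hop network doubles.
MERGED = [(a, "A" if p == "E" else p) for a, p in PAIRS]
print(reach_within(coinventor_graph(MERGED), "A", max_depth=1))  # 2
```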
Person_id is not an entity id: possible solution
- At analysis level, the (person_id, appln_id) pair surely identifies one entity;
- Starting from this level of disaggregation, entities should be further disambiguated by other means
(e.g. applications 1 & 2 from the same applicant)
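The idea can be sketched as clustering (person_id, appln_id) pairs rather than person_ids. A minimal sketch, assuming one hypothetical rule taken from the example above: two pairs with the same normalised inventor name are merged when their applications share an applicant. Real disambiguation uses more evidence (addresses, co-inventors, technology classes) and would block by name instead of comparing all pairs; person_id 99 and the applicant names are invented.

```python
def disambiguate(inventor_rows, applicants):
    """Cluster (person_id, appln_id) pairs into candidate entities.

    inventor_rows: list of (person_id, appln_id, name).
    applicants: dict appln_id -> set of applicant names.
    Hypothetical rule: same normalised name + at least one shared applicant.
    """
    parent = list(range(len(inventor_rows)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (_, appln_i, name_i) in enumerate(inventor_rows):
        for j in range(i + 1, len(inventor_rows)):
            _, appln_j, name_j = inventor_rows[j]
            same_name = name_i.strip().upper() == name_j.strip().upper()
            shared = applicants.get(appln_i, set()) & applicants.get(appln_j, set())
            if same_name and shared:
                parent[find(i)] = find(j)

    clusters = {}
    for i, (person_id, appln_id, _) in enumerate(inventor_rows):
        clusters.setdefault(find(i), []).append((person_id, appln_id))
    return sorted(clusters.values())


ROWS = [(15786453, 1, "WANG WEI"), (15786453, 2, "WANG WEI"),
        (15786453, 3, "WANG WEI"), (99, 4, "Wang Wei ")]
APPLICANTS = {1: {"ACME"}, 2: {"ACME"}, 3: {"OTHER"}, 4: {"OTHER"}}
print(disambiguate(ROWS, APPLICANTS))
```

In this toy case one person_id is split into two entities, and two different person_ids end up in the same entity.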
CPC codes coverage is incomplete (I)
- The Cooperative Patent Classification (CPC) was initiated as a joint partnership between the USPTO and the EPO;
- It covers some technologies more completely (e.g. green energy, nanotech);
- It started in 2011, it does not apply to all types of patents (i.e. utility models), and its back data have to be rebuilt.
CPC codes coverage is incomplete (II)
- CPC coverage across PATSTAT is far from good and much lower than IPC coverage
appln kind n app n with cpc cpc rate ipc rate
'A' 66.750.533 39.505.860 0.5918 0.8413
'U' 13.503.902 1.140.172 0.0844 0.9115
'W' 3.012.030 2.990.252 0.9928 0.9900
CPC coverage (type A)
APPLN_KIND APPLN_AUTH Count_APPLN_ID count_app_with_cpc ratio after_Y2K
A AR 143884 103372 72% 80%
A AT 587486 174977 30% 71%
A AU 1374657 1114774 81% 80%
A BE 646320 551552 85% 94%
A BR 547104 374724 68% 79%
A CA 3209303 1269659 40% 25%
A CH 1048915 571085 54% 68%
A CN 6343484 2155452 34% 32%
A DE 4617268 3861583 84% 93%
A DK 319177 119062 37% 6%
A EP 3227647 3113078 96% 95%
A ES 423071 202677 48% 88%
A FI 251054 112028 45% 22%
A FR 3098874 2387891 77% 98%
A GB 3384892 2116655 63% 43%
A GR 69272 24607 36% 80%
A HK 133738 119890 90% 86%
A HU 131491 73025 56% 51%
A IE 91782 43044 47% 32%
A IL 216193 122462 57% 77%
A IN 106610 46024 43% 43%
A IT 605707 326251 54% 58%
A JP 13944907 4355789 31% 32%
A KR 2831385 1425304 50% 54%
A LU 68712 59814 87% 93%
A MX 262534 236276 90% 94%
A MY 50974 40612 80% 74%
A NL 595393 528493 89% 90%
A NO 222376 171392 77% 83%
A NZ 141064 110223 78% 81%
A PL 246209 79640 32% 32%
A RU 658280 199365 30% 32%
A SE 858651 330375 38% 17%
A SG 102679 90508 88% 90%
A SU 1363419 100573 7% 53%
A TW 737206 497644 68% 66%
A UA 55255 18206 33% 31%
A US 12700957 11612249 91% 98%
A ZA 293611 191492 65% 67%
SELECT a.APPLN_KIND, a.APPLN_AUTH,
Count(distinct a.APPLN_ID) AS Count_APPLN_ID, count(distinct
b.appln_id) count_app_with_cpc, count(distinct
b.appln_id)/Count(distinct a.APPLN_ID) as ratio
FROM
patstat.tls201_appln a LEFT JOIN patstat.tls224_appln_cpc b
ON a.APPLN_ID = b.appln_id
WHERE a.APPLN_KIND in ('A','W', 'U')
GROUP BY a.APPLN_KIND, a.APPLN_AUTH
The situation is not homogeneous.
After Y2K, things improve a bit.
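The query above gives the overall ratio; the after-Y2K column needs the same ratio restricted to recent filings. A minimal sketch using conditional aggregation in Python's sqlite3, on a hypothetical toy schema (offices XX/YY, placeholder CPC symbols, and APPLN_FILING_YEAR assumed as the year column); note the 1.0 factor to force non-integer division, which MySQL does not need:

```python
import sqlite3

TOY_DATA = """
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT,
                           appln_kind TEXT, appln_filing_year INTEGER);
CREATE TABLE tls224_appln_cpc (appln_id INTEGER, cpc_class_symbol TEXT);
INSERT INTO tls201_appln VALUES
  (1, 'XX', 'A', 1995), (2, 'XX', 'A', 2005), (3, 'XX', 'A', 2010),
  (4, 'XX', 'A', 2012), (5, 'YY', 'A', 1990), (6, 'YY', 'A', 1998),
  (7, 'XX', 'U', 2011);
-- application 2 has two CPC rows; it must still count once
INSERT INTO tls224_appln_cpc VALUES (2, 'c1'), (2, 'c2'), (4, 'c3'), (5, 'c4'), (7, 'c5');
"""

COVERAGE_SQL = """
SELECT a.appln_auth,
       COUNT(DISTINCT a.appln_id) AS n_appln,
       COUNT(DISTINCT c.appln_id) AS n_with_cpc,
       1.0 * COUNT(DISTINCT c.appln_id) / COUNT(DISTINCT a.appln_id) AS ratio,
       -- same ratio restricted to filings from 2000 on; NULL when there are none
       1.0 * COUNT(DISTINCT CASE WHEN a.appln_filing_year >= 2000 THEN c.appln_id END)
           / COUNT(DISTINCT CASE WHEN a.appln_filing_year >= 2000 THEN a.appln_id END)
           AS ratio_after_y2k
FROM tls201_appln a
LEFT JOIN tls224_appln_cpc c ON c.appln_id = a.appln_id
WHERE a.appln_kind = ?
GROUP BY a.appln_auth
ORDER BY a.appln_auth
"""


def cpc_coverage(kind: str = "A") -> list:
    """Per-office CPC coverage for one kind code, on the toy data above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_DATA)
    return conn.execute(COVERAGE_SQL, (kind,)).fetchall()


for row in cpc_coverage("A"):
    print(row)
```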
CPC coverage (types U, W)
APPLN_KIND APPLN_AUTH Count_APPLN_ID count_app_with_cpc ratio after_Y2K
U BR 103233 5179 5% 10%
U CN 5894022 251879 4% 4%
U DE 1406011 618249 44% 43%
U ES 327087 32007 10% 15%
U IT 139608 12912 9% 14%
U JP 4289887 113890 3% 7%
U KR 506761 44226 9% 16%
U RU 166613 5567 3% 4%
U TW 407155 30996 8% 7%
U UA 103880 2037 2% 2%
W CN 160005 155010 97% 97%
W DE 65673 65433 100% 100%
W EP 462944 461118 100% 100%
W FR 82356 81494 99% 98%
W GB 114614 114257 100% 100%
W IB 134635 133070 99% 99%
W JP 503441 497961 99% 99%
W KR 119158 118141 99% 99%
W SE 53444 53264 100% 100%
W US 1002525 1000291 100% 100%
Counts for offices with > 50K patents
PCT data coverage is almost complete.
Utility models: CPC is not really usable.
Missing data for PCT equivalent
- EP data originating from the regional phase of a PCT application can be partial;
- At least abstracts and citations may be missing and have to be extracted from the PCT equivalent (column INTERNAT_APPLN_ID in tls201)
APPLN_ID APPLN_AUTH APPLN_NR APPLN_KIND IPR_TYPE INTERNAT_APPLN_ID int_phase reg_phase nat_phase GRANTED
347305 EP 99931561 A PI 30241523 Y Y N 1
Missing abstracts
APPLN_KIND Count_APPLN_ID Abstracts ratio
A (EP) 3227647 1849737 57%
W (PCT) 3012030 2992978 99%
select
a.APPLN_KIND,
Count(a.APPLN_ID) AS Count_APPLN_ID,
Count(b.APPLN_ID) AS Abstracts,
Count(b.APPLN_ID) / Count(a.APPLN_ID) AS ratio
FROM
patstat.tls201_appln a
LEFT JOIN patstat.tls203_appln_abstr b
ON a.APPLN_ID = b.APPLN_ID
WHERE
(a.APPLN_AUTH = 'EP' AND a.appln_kind = 'A') or
a.APPLN_KIND = 'W'
GROUP BY a.APPLN_KIND
About 40% of EP abstracts should be extracted from the PCT equivalent
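The fallback to the PCT equivalent can be written as a COALESCE over two joins on tls203: one on the EP application itself, one on its INTERNAT_APPLN_ID. A minimal sketch in Python's sqlite3 on a hypothetical, reduced schema; ids 347305 and 30241523 come from the slide above, all other ids and the abstract texts are invented:

```python
import sqlite3

TOY_DATA = """
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT,
                           appln_kind TEXT, internat_appln_id INTEGER);
CREATE TABLE tls203_appln_abstr (appln_id INTEGER PRIMARY KEY, appln_abstract TEXT);
INSERT INTO tls201_appln VALUES
  (100, 'EP', 'A', 0),            -- direct EP filing with its own abstract
  (200, 'EP', 'A', 0),            -- no abstract anywhere
  (347305, 'EP', 'A', 30241523),  -- Euro-PCT: EP record lacks the abstract
  (30241523, 'WO', 'W', 0);       -- its PCT equivalent
INSERT INTO tls203_appln_abstr VALUES
  (100, 'EP abstract'), (30241523, 'abstract of the WO publication');
"""

ABSTRACT_SQL = """
SELECT a.appln_id,
       COALESCE(ep.appln_abstract, pct.appln_abstract) AS abstract,
       CASE WHEN ep.appln_abstract IS NOT NULL THEN 'EP'
            WHEN pct.appln_abstract IS NOT NULL THEN 'PCT' END AS source
FROM tls201_appln a
LEFT JOIN tls203_appln_abstr ep ON ep.appln_id = a.appln_id
LEFT JOIN tls203_appln_abstr pct ON pct.appln_id = a.internat_appln_id
                                AND a.internat_appln_id > 0
WHERE a.appln_auth = 'EP' AND a.appln_kind = 'A'
ORDER BY a.appln_id
"""


def ep_abstracts() -> list:
    """EP A abstracts, falling back to the PCT equivalent when missing."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_DATA)
    return conn.execute(ABSTRACT_SQL).fetchall()


for row in ep_abstracts():
    print(row)
```

Keeping the source column makes it easy to report how many abstracts were borrowed from the PCT equivalent.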
Missing citations
- Euro-PCT applications:
- Citations of the WO publication are not repeated in the later EP publication. Instead, an NPL citation with the text “See also references of WO xxxxxxx” is included.
- A Euro-PCT therefore has more citations than is apparent.
- In 2016, NPL citations with the value “none” or “see also references...” were removed from the data, but the related citations were not added back…
- Example: EP1103560 is the equivalent of WO0006594
- From the citations table we would conclude it has only 2 NPL citations (and one of them is "SEE ALSO REFERENCES OF WO0006594")
Missing PCT citations (II)
APPLN_ID PUBLN_AUTH+NR PUBLN_ID NPL_CITN_SEQ_NR NPL_PUBLN_ID NPL_BIBLIO
347305 EP1103560 511640 1 950236893 No further relevant documents disclosed
347305 EP1103560 511640 3 950236894 See also references of WO 0006594A1
Missing PCT citations (III)
- In fact, searching Espacenet for the corresponding WO publication, we find:
http://worldwide.espacenet.com/publicationDetails/citedDocuments?CC=WO&NR=0006594A1&KC=A1&FT=D&ND=4&date=20000210&DB=EPODOC&locale=en_EP
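Recovering these citations means collecting the patent citations of the publications of both the EP application and its INTERNAT_APPLN_ID. A minimal sketch in Python's sqlite3 on a hypothetical, reduced schema: appln_id 347305, its PCT equivalent 30241523 and EP publication 511640 come from the slides, while the WO publication id, the cited ids and application 100 are invented. NPL-only rows (cited_pat_publn_id = 0) are skipped.

```python
import sqlite3

TOY_DATA = """
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, internat_appln_id INTEGER);
CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER);
CREATE TABLE tls212_citation (pat_publn_id INTEGER, cited_pat_publn_id INTEGER);
INSERT INTO tls201_appln VALUES (347305, 30241523), (30241523, 0), (100, 0);
INSERT INTO tls211_pat_publn VALUES (511640, 347305), (900001, 30241523), (500, 100);
INSERT INTO tls212_citation VALUES
  (511640, 0),                         -- NPL row on the EP publication
  (900001, 700001), (900001, 700002),  -- patent citations on the WO publication
  (500, 700003);
"""

CITED_SQL = """
SELECT DISTINCT c.cited_pat_publn_id
FROM tls201_appln a
JOIN tls211_pat_publn p
  ON p.appln_id = a.appln_id
  OR (a.internat_appln_id > 0 AND p.appln_id = a.internat_appln_id)
JOIN tls212_citation c ON c.pat_publn_id = p.pat_publn_id
WHERE a.appln_id = ? AND c.cited_pat_publn_id > 0
ORDER BY c.cited_pat_publn_id
"""


def cited_publications(appln_id: int) -> list:
    """Cited publications of an application, including those of its PCT equivalent."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_DATA)
    return [row[0] for row in conn.execute(CITED_SQL, (appln_id,))]


print(cited_publications(347305))  # [700001, 700002]
print(cited_publications(100))     # [700003]
```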
Data transmission gaps from national offices to EPO (I)
- PATSTAT covers about 100 patent authorities, but with uneven coverage and publication lags.
- Coverage is good and lags are short for EU countries; coverage is poorer and less regular for national patent authorities outside the EU (except big players, e.g. US, JP…)
Data transmission gaps from national offices to EPO (II)
- Data coverage for DOCDB is available at:
https://www.epo.org/searching-for-patents/helpful-resources/data/tables/weekly.html
- However, the file is difficult to use
EDATE CC KC YEAR NB_DOC MIN_PN MAX_PN FIRST_DATE LAST_DATE LAST_ADDED LAST_EXCH GAPS
02/09/2017 AM A2 2001 1 949 949 10/06/2001 10/06/2001 08/10/2015 15/10/2015
02/09/2017 AM A2 2004 1 1402 1402 17/03/2004 17/03/2004 15/02/2011 24/02/2011 1011
02/09/2017 AM A2 2006 1 1813 1813 15/09/2006 15/09/2006 05/01/2017 13/04/2017 912
02/09/2017 AM U 2009 1 170 170 26/10/2009 26/10/2009 01/08/2012 09/08/2012 0
02/09/2017 AM U 2010 1 194 194 26/04/2010 26/04/2010 17/08/2017 24/08/2017 182
Add a GAPS column (days between the LAST_DATE of one row and the FIRST_DATE of the next) for the same office and type of publication
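The GAPS values above are consistent with the number of days between the previous row's LAST_DATE and the current row's FIRST_DATE, within the same office and kind code. A minimal sketch of that computation, assuming rows reduced to (CC, KC, YEAR, FIRST_DATE, LAST_DATE) in the file's dd/mm/yyyy format:

```python
from datetime import date, datetime


def _parse(day: str) -> date:
    """Parse the dd/mm/yyyy dates used in the coverage file."""
    return datetime.strptime(day, "%d/%m/%Y").date()


def add_gaps(rows):
    """rows: (cc, kc, year, first_date, last_date) tuples.

    Returns (cc, kc, year, gap), where gap is the number of days between the
    previous row's LAST_DATE and this row's FIRST_DATE for the same office and
    kind code; the first row of each office/kind group gets 0.
    """
    out, prev_key, prev_last = [], None, None
    for cc, kc, year, first, last in sorted(rows):
        gap = (_parse(first) - prev_last).days if (cc, kc) == prev_key else 0
        out.append((cc, kc, year, gap))
        prev_key, prev_last = (cc, kc), _parse(last)
    return out


AM_ROWS = [
    ("AM", "A2", 2001, "10/06/2001", "10/06/2001"),
    ("AM", "A2", 2004, "17/03/2004", "17/03/2004"),
    ("AM", "A2", 2006, "15/09/2006", "15/09/2006"),
    ("AM", "U", 2009, "26/10/2009", "26/10/2009"),
    ("AM", "U", 2010, "26/04/2010", "26/04/2010"),
]
print([gap for *_, gap in add_gaps(AM_ROWS)])  # [0, 1011, 912, 0, 182]
```

Sorting the gaps in descending order then points to the office/kind combinations worth checking first.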
Data transmission gaps from national offices to EPO (III)
A B U A B U
AT 496 2596 IS 912 1653
AU 2256 IT 4344
BA 803 JO 6193
BG 758 429 JP 376 0 451
BR 3183 0 3241 KR 476 874
BY 2678 KZ 2826 6488
CA 0 2394 LT 640 764
CH 66 0 LV 3087 867
CL 1533 MC 3085
CN 189 6370 6370 MD 860 3472
CR 1668 3062 MX 1114 1343
CY 614 MY 2575 0
DD 217 NL 3135 353
DE 0 196 NZ 1013
DK 1032 962 OA 2224
DO 3348 2057 PH 543
EC 1049 2513 PT 613 842 2083
EE 1360 RO 0 1009
EG 1291 RS 322 500 500
ES 336 RU 424
FI 63 402 SE 3100 4025
FR 0 SG 1335
GB 267 267 SI 1314 1287
GC 3084 SV 2487
GE 1554 1672 2344 TH 5597
GR 1136 586 6225 TJ 1470 2101 2030
GT 590 5556 TR 48 0 1225
HK 5061 TW 224 975 1642
HN 421 5510 UA 1449 1816
HU 1130 4996 839 US 70 336
ID 177 1127 UY 251 566
IL 1146 UZ 1641 1673
IN 1035 2153 YU 1229 1465 715
ZA 471
Gaps show up for some countries and some types of patents;
orange/red: very problematic cases; however, a single application can interrupt a gap, giving misleading results…
Data transmission gaps from national offices to EPO (IV)
02/09/2017 AU A A 2005 3 1475702 3432402 17/03/2005 25/08/2005 17/09/2005 02/03/2017 420
02/09/2017 AU A A 2010 1 6326480 6326480 29/04/2010 29/04/2010 12/05/2010 20/05/2010 1708
02/09/2017 IN B 2010 10 237550 264673 01/01/2010 17/12/2010 01/04/2016 17/08/2017 77
02/09/2017 IN B 2011 6 239400 247731 07/01/2011 13/05/2011 16/02/2016 08/12/2016 21
02/09/2017 IN B 2012 1 253973 253973 14/09/2012 14/09/2012 31/03/2016 07/04/2016 490
Australia: we have a problem
India: we have a problem bigger than expected
Authorities should be examined case by case, also using counts by year benchmarked against previous years
- Two possible errors:
- different transmission timeframes (the decay of patent counts in BR starts before GB);
- partial data transmission: counts differ from the official data of the patent office
Data transmission gaps from national offices to EPO (V)
YEAR BR GB IN
1990 10851 30055 2209
1991 10122 29991 2002
1992 9103 30089 1958
1993 10272 29901 2032
1994 10992 29560 2529
1995 13557 29909 2554
1996 15580 30448 1679
1997 18589 31219 1383
1998 19032 32828 1026
1999 21019 35222 750
2000 20725 36996 690
2001 20626 36884 705
2002 19265 36318 757
2003 20909 35452 1049
2004 22816 33794 1113
2005 23973 31066 1691
2006 23472 30495 1973
2007 16078 30848 2215
2008 10088 28816 2541
2009 8843 27103 2507
2010 5028 25363 2988
2011 539 24010 872
2012 7 7955 28
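One simple way to benchmark counts by year against previous years is to flag the first year that falls below a fraction of the best earlier year. A sketch on the BR/GB/IN counts above; the 50% threshold is an arbitrary assumption, and a flagged year (such as the late-1990s dip for IN) still has to be checked against official office statistics:

```python
YEARS = range(1990, 2013)
COUNTS = {
    "BR": dict(zip(YEARS, [10851, 10122, 9103, 10272, 10992, 13557, 15580, 18589,
                           19032, 21019, 20725, 20626, 19265, 20909, 22816, 23973,
                           23472, 16078, 10088, 8843, 5028, 539, 7])),
    "GB": dict(zip(YEARS, [30055, 29991, 30089, 29901, 29560, 29909, 30448, 31219,
                           32828, 35222, 36996, 36884, 36318, 35452, 33794, 31066,
                           30495, 30848, 28816, 27103, 25363, 24010, 7955])),
    "IN": dict(zip(YEARS, [2209, 2002, 1958, 2032, 2529, 2554, 1679, 1383, 1026, 750,
                           690, 705, 757, 1049, 1113, 1691, 1973, 2215, 2541, 2507,
                           2988, 872, 28])),
}


def first_drop_year(counts, ratio=0.5):
    """counts: dict year -> number of applications for one office.

    Returns the first year whose count is below `ratio` times the highest
    count of all earlier years, or None if there is no such year.
    """
    peak = None
    for year in sorted(counts):
        n = counts[year]
        if peak is not None and n < ratio * peak:
            return year
        peak = n if peak is None else max(peak, n)
    return None


print({cc: first_drop_year(c) for cc, c in COUNTS.items()})
```

With these assumptions the decay is flagged for BR in 2008 and for GB only in 2012, in line with the different transmission timeframes noted above.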
Citations double counts (I)
- Citations in PATSTAT are stored publication to publication, by origin.
- Simple citation counts on TLS212 can lead to misleading results.
- Appln_id-to-appln_id citations help to clarify
SELECT SUM(Count_CITED_PAT_PUBLN_ID) AS n_pub_cited, SUM(count_distinct_appln_cited) AS n_distinct_appln_cited
FROM (SELECT
t11.APPLN_ID, Count(t12.CITED_PAT_PUBLN_ID) AS Count_CITED_PAT_PUBLN_ID,
Count(DISTINCT t11b.APPLN_ID) AS count_distinct_appln_cited
FROM
patstat.tls212_citation t12
INNER JOIN patstat.tls211_pat_publn t11 ON t11.PAT_PUBLN_ID = t12.PAT_PUBLN_ID
INNER JOIN patstat.tls211_pat_publn t11b ON t12.CITED_PAT_PUBLN_ID = t11b.PAT_PUBLN_ID
WHERE t12.CITED_PAT_PUBLN_ID > 0
GROUP BY t11.APPLN_ID) AS per_appln -- a derived table needs an alias in MySQL
APPLN_ID PAT_PUBLN_ID CITED_PAT_PUBLN_ID CITED_APPLN_ID
1 293253293 306927614 16980819
1 293253293 301830017 17000979
1 293253293 298485954 13388690
1 387522680 306927614 16980819
1 387522680 301830017 17000979
1 387522680 298485954 13388690
Citations double counts (II)
Case 1: appln_id 1 has 2 publications showing exactly the same citations
3 291964096 300128315 49123163
3 291964096 295303503 13538355
3 387535649 300128315 49123163
3 387535649 296195755 53888801
3 387535649 306928379 52488529
Case 2: appln_id 3 has 2 publications showing 1 common and 2 different citations
Citations double counts (III)
Case 3: the same citation appears with different origins
Case 4: the same citation appears 4 times, same citing publication, same origin [data error; could be systematic with multiple priorities from some offices]
APPLN_ID PAT_PUBLN_ID CITED_PAT_PUBLN_ID CITED_APPLN_ID CITN_ORIGIN
23 289129312 305684503 16736817
23 289129312 293787435 15702748
23 289129312 308462347 48996652 APP
23 289129312 293787435 15702748
23 289129312 305684503 16736817
23 289129312 308462347 48996652 SEA
23 289129312 327902045 50318244
23 289129312 296433878 20546518
23 289129312 297350607 24023115
23 289129312 296449840 47357637
APPLN_ID PAT_PUBLN_ID CITED_PAT_PUBLN_ID CITED_APPLN_ID CITN_ORIGIN
705 306929092 309035661 22852464 SEA
705 306929092 309035659 22771241 SEA
705 306929092 307833757 50586872 SEA
705 306929092 385933558 9632978 SEA
705 316022028 337326981 16587695 APP
705 316022028 310119518 16447723 APP
705 316022028 310119519 48241555 APP
705 316022028 314809416 50308201 APP
705 316022028 314809413 9718355 APP
705 316022028 314809416 50308201 APP
705 316022028 314809416 50308201 APP
705 316022028 314809413 9718355 APP
705 316022028 314809416 50308201 APP
Citations double counts (IV)
publn_auth n_pub_cited n_app_cited ratio
GB 6174382 4038130 0,65
US 250838242 1,86E+08 0,74
DE 9428710 7927305 0,84
AT 331095 287720 0,86
JP 29539409 26474356 0,89
n pub cited n app cited ratio
225.858.262 189.409.321 0,83862
How does it perform across all of PATSTAT?
Focus on offices with ratio < 0.9:
Citations double counts (V)
- 47K cases of self-citations…
APPLN_ID PAT_PUBLN_ID CITED_APPLN_ID CITED_PAT_PUBLN_ID
53803383 55765553 53803383 278711582
US7478445 US2008052829
But it is the same patent…
- Solution: count distinct citations by citing and cited appln_id;
- Move citation origin data to a separate table;
- Also use the number of citing DOCDB families (provided in TLS201).
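The first suggestion above can be sketched as a set of distinct (citing appln_id, cited appln_id) pairs with self-citations removed. The rows below are the Case 1, Case 2 and self-citation examples from the previous slides, reduced to (appln_id, pat_publn_id, cited_pat_publn_id, cited_appln_id); this is an illustrative sketch, not a complete counting method (origin and family-level counts are left aside):

```python
def citation_counts(rows):
    """rows: (appln_id, pat_publn_id, cited_pat_publn_id, cited_appln_id).

    Returns (publication-level citation rows, distinct citing->cited
    application pairs excluding self-citations, number of self-citation pairs).
    """
    pairs = {(citing, cited) for citing, _pub, _cited_pub, cited in rows}
    self_cites = {pair for pair in pairs if pair[0] == pair[1]}
    return len(rows), len(pairs - self_cites), len(self_cites)


ROWS = [
    # Case 1: two publications of appln 1 repeat the same three citations
    (1, 293253293, 306927614, 16980819), (1, 293253293, 301830017, 17000979),
    (1, 293253293, 298485954, 13388690), (1, 387522680, 306927614, 16980819),
    (1, 387522680, 301830017, 17000979), (1, 387522680, 298485954, 13388690),
    # Case 2: appln 3, one citation shared by both publications
    (3, 291964096, 300128315, 49123163), (3, 291964096, 295303503, 13538355),
    (3, 387535649, 300128315, 49123163), (3, 387535649, 296195755, 53888801),
    (3, 387535649, 306928379, 52488529),
    # Self-citation: the application cites another of its own publications
    (53803383, 55765553, 278711582, 53803383),
]
print(citation_counts(ROWS))  # (12, 7, 1)
```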
Citations double counts (VI)
Counting the number of claims correctly
- The number of claims is often used as an indicator of value
- US data: relates to granted patents only (A documents until 2000, B1 or B2 documents afterwards) which were published on or after 1975-01-01
- EP data: relates to both published applications (kind code "A") from 1978 and granted patents (kind code "B") from 1980.
- The number of claims will be "0" for all EP A documents originating from a PCT published in English, French or German (so-called "Euro-PCTs").
Counting the number of claims correctly (II)
- The number of claims changes over time: select the publication phase most relevant to your research question; language may also change the number of claims (but PATSTAT keeps the higher number)
PAT_PUBLN_ID PUBLN_AUTH PUBLN_NR PUBLN_KIND PUBLN_DATE PUBLN_CLAIMS change
311822768 'EP' '1878578' 'A1' '2008-01-16' '11'
311822783 'EP' '1878578' 'B1' '2009-09-30' '22' 100% more…
311822763 'EP' '0000034' 'A1' '1978-12-20' '14'
311822766 'EP' '0000034' 'B1' '1984-05-23' '7' 50% less
Counting the number of claims correctly (IV)
- Average change in the number of claims
SELECT PUBLN_AUTH,
sum(Min_PUBLN_CLAIMS) as min_claims,
sum(Max_PUBLN_CLAIMS) as
max_claims
from
(SELECT
b.PUBLN_AUTH,
b.appln_id,
Max(cast(b.PUBLN_CLAIMS as
unsigned)) AS Max_PUBLN_CLAIMS,
Min(cast(b.PUBLN_CLAIMS as
unsigned)) AS Min_PUBLN_CLAIMS
FROM
patstat.tls211_pat_publn b
WHERE
(b.PUBLN_AUTH = 'EP' OR
b.PUBLN_AUTH = 'US') AND
cast(b.PUBLN_CLAIMS as unsigned) > 0
GROUP BY
b.PUBLN_AUTH, b.appln_id) b
GROUP BY PUBLN_AUTH
PUBLN_AUTH min_claims max_claims ratio
EP 29.005.016 31.407.694 0,9235
US 87.474.259 87.474.617 0,999996
Average number of
claims can be a good
proxy
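Both options, a per-application min/max as in the query above or the claims of one chosen publication phase, can be sketched in a few lines. The claim counts 11/22 and 14/7 are the EP1878578 and EP0000034 examples above, placed under hypothetical appln_ids 1 and 2; application 3 is an invented Euro-PCT case whose A1 shows the "0" placeholder, which is skipped rather than treated as zero claims:

```python
from collections import defaultdict


def claims_summary(pubs):
    """pubs: (appln_id, publn_kind, claims as text, as stored in PUBLN_CLAIMS).

    Returns (sum of per-application minimum, sum of per-application maximum),
    ignoring '0' values, which for Euro-PCT A documents mean 'not available'.
    """
    per_appln = defaultdict(list)
    for appln_id, _kind, claims in pubs:
        n = int(claims)
        if n > 0:
            per_appln[appln_id].append(n)
    mins = sum(min(v) for v in per_appln.values())
    maxs = sum(max(v) for v in per_appln.values())
    return mins, maxs


def claims_at_phase(pubs, kinds=("B1", "B2")):
    """Claims of the first publication whose kind is in `kinds` (in that order)."""
    chosen = {}
    for kind in kinds:
        for appln_id, publn_kind, claims in pubs:
            if publn_kind == kind and appln_id not in chosen and int(claims) > 0:
                chosen[appln_id] = int(claims)
    return chosen


PUBS = [(1, "A1", "11"), (1, "B1", "22"),
        (2, "A1", "14"), (2, "B1", "7"),
        (3, "A1", "0"), (3, "B1", "9")]
print(claims_summary(PUBS))   # (27, 45)
print(claims_at_phase(PUBS))  # {1: 22, 2: 7, 3: 9}
```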
Conclusions
- PATSTAT is a great source of data but cannot be taken ‘as is’.
- Data collection is examiner-centred, so all ‘accessory’ data need validation.
- Seeking data gaps ex ante can save a lot of work ex post
Conclusions (II)
- A saint has a past; a sinner has a future (Lord Illingworth).
- Both sentences mean a lot of work to do when using patent data! (myself).
- A saint is a sinner who never gave up (Yogananda)

Weitere ähnliche Inhalte

Ähnlich wie PATSTAT users 7 sins

Signal alert system using rf transmitter&reciever
Signal alert system using rf transmitter&recieverSignal alert system using rf transmitter&reciever
Signal alert system using rf transmitter&reciever
Prince Joseph
 
TLP3120 PSpice Model (Free SPICE Model)
TLP3120 PSpice Model  (Free SPICE Model)TLP3120 PSpice Model  (Free SPICE Model)
TLP3120 PSpice Model (Free SPICE Model)
Tsuyoshi Horigome
 
DISA Energy Presentation Draft
DISA Energy Presentation DraftDISA Energy Presentation Draft
DISA Energy Presentation Draft
Victor Morocho
 

Ähnlich wie PATSTAT users 7 sins (20)

ipad 4 full Schematic Diagram
ipad 4 full Schematic Diagramipad 4 full Schematic Diagram
ipad 4 full Schematic Diagram
 
I pad 4 full Schematic Diagram
I pad 4 full Schematic DiagramI pad 4 full Schematic Diagram
I pad 4 full Schematic Diagram
 
iPad 4 schematic
iPad 4 schematiciPad 4 schematic
iPad 4 schematic
 
I phone 5 full Schematic Diagram 820 3141-b
I phone 5 full Schematic Diagram 820 3141-bI phone 5 full Schematic Diagram 820 3141-b
I phone 5 full Schematic Diagram 820 3141-b
 
Signal alert system using rf transmitter&reciever
Signal alert system using rf transmitter&recieverSignal alert system using rf transmitter&reciever
Signal alert system using rf transmitter&reciever
 
Bart Van Looy a Quantitative approach to IP Management Research
Bart Van Looy a Quantitative approach to IP Management ResearchBart Van Looy a Quantitative approach to IP Management Research
Bart Van Looy a Quantitative approach to IP Management Research
 
TLP3120 PSpice Model (Free SPICE Model)
TLP3120 PSpice Model  (Free SPICE Model)TLP3120 PSpice Model  (Free SPICE Model)
TLP3120 PSpice Model (Free SPICE Model)
 
Econometrics Project
Econometrics ProjectEconometrics Project
Econometrics Project
 
Mandatory crime reduction program presentation
Mandatory crime reduction program presentationMandatory crime reduction program presentation
Mandatory crime reduction program presentation
 
1999 infiniti qx4 service repair manual
1999 infiniti qx4 service repair manual1999 infiniti qx4 service repair manual
1999 infiniti qx4 service repair manual
 
GMM result-1.docx
GMM result-1.docxGMM result-1.docx
GMM result-1.docx
 
anexos.pdf
anexos.pdfanexos.pdf
anexos.pdf
 
State of IPv6 in 2018: IETF 101
State of IPv6 in 2018: IETF 101State of IPv6 in 2018: IETF 101
State of IPv6 in 2018: IETF 101
 
Cytoscape Tutorial Session 2 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
Cytoscape Tutorial Session 2 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)Cytoscape Tutorial Session 2 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
Cytoscape Tutorial Session 2 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
 
DISA Energy Presentation Draft
DISA Energy Presentation DraftDISA Energy Presentation Draft
DISA Energy Presentation Draft
 
Diesel Production: Cost Estimation
Diesel Production: Cost EstimationDiesel Production: Cost Estimation
Diesel Production: Cost Estimation
 
Edu ciaa-nxp pinout-a4_v4r3_es
Edu ciaa-nxp pinout-a4_v4r3_esEdu ciaa-nxp pinout-a4_v4r3_es
Edu ciaa-nxp pinout-a4_v4r3_es
 
Pilot Customer PPT_Updated.pptx
Pilot Customer PPT_Updated.pptxPilot Customer PPT_Updated.pptx
Pilot Customer PPT_Updated.pptx
 
8238_lryd.pdf
8238_lryd.pdf8238_lryd.pdf
8238_lryd.pdf
 
Diagnostics Chart with Trouble Code - Subaru Legacy
Diagnostics Chart with Trouble Code - Subaru LegacyDiagnostics Chart with Trouble Code - Subaru Legacy
Diagnostics Chart with Trouble Code - Subaru Legacy
 

Mehr von Gianluca Tarasconi

Patent databases for business intelligence
Patent databases for business intelligencePatent databases for business intelligence
Patent databases for business intelligence
Gianluca Tarasconi
 

Mehr von Gianluca Tarasconi (15)

Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
 
PATSTAT & Patentsview: complements or substitutes?
PATSTAT & Patentsview: complements or substitutes?PATSTAT & Patentsview: complements or substitutes?
PATSTAT & Patentsview: complements or substitutes?
 
Patents applicants: how to create the full time series
Patents applicants: how to create the full time seriesPatents applicants: how to create the full time series
Patents applicants: how to create the full time series
 
Patstat indicators step by step
Patstat indicators step by stepPatstat indicators step by step
Patstat indicators step by step
 
Matching PATSTAT to Crunchbase
Matching PATSTAT to CrunchbaseMatching PATSTAT to Crunchbase
Matching PATSTAT to Crunchbase
 
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
 
Ep register for patent data analisys
Ep register for patent data analisysEp register for patent data analisys
Ep register for patent data analisys
 
Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures
 
Industria italiana dal 78
Industria italiana dal 78Industria italiana dal 78
Industria italiana dal 78
 
Patenting in the south
Patenting in the southPatenting in the south
Patenting in the south
 
PRS inpadoc legal data reclassification: db structure and some insights
 PRS inpadoc legal data reclassification: db structure and some insights PRS inpadoc legal data reclassification: db structure and some insights
PRS inpadoc legal data reclassification: db structure and some insights
 
Trackin patent applicant changes with a temporal database
Trackin patent applicant changes with a temporal databaseTrackin patent applicant changes with a temporal database
Trackin patent applicant changes with a temporal database
 
Sharing names and address cleaning patterns for Patstat
Sharing names and address cleaning patterns for PatstatSharing names and address cleaning patterns for Patstat
Sharing names and address cleaning patterns for Patstat
 
Patstat and patstat related resources for patent data analisys
Patstat and patstat related resources for patent data analisysPatstat and patstat related resources for patent data analisys
Patstat and patstat related resources for patent data analisys
 
Patent databases for business intelligence
Patent databases for business intelligencePatent databases for business intelligence
Patent databases for business intelligence
 

Kürzlich hochgeladen

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

PATSTAT users 7 sins

  • 1. PATSTAT users 7 deadly sins Gianluca Tarasconi, ICRIOS DBA rawpatentdata.blogspot.com Leuven 20/9/2017
  • 2. In short  This presentation aims to show out 7 common errors user may incur in when they use PATSTAT;  This is in ideal the continuation of ‘PATSTAT 7 deadly sins’ from 2013  Nevertheless there is only one sin user have to avoid when using patent data:  … SLOTH …. 10
  • 3. Inventors / applicants are not always listed (I) A part of applications miss Inventors and/or applicants data SELECT Sum(If(b.APPLN_ID IS NULL, 1, 0)) AS noperson, Count(c.APPLN_ID) AS n_APPLN_ID FROM patstat.tls207_pers_appln b RIGHT JOIN patstat.tls201_appln c ON b.APPLN_ID = c.APPLN_ID WHERE Appln_kind <>”D2” 10 n appln_id no person % 221.595.818 18.202.821 9% Autumn 2016 data
  • 4. Inventors / applicants are not always listed (II) 10  Limit to A,W applications, offices with > 10.000 applications appln_auth appln_kind noperson n_APPLN_ID perc LU A 57057 88522 64% BE A 453348 784265 58% NL A 382777 681266 56% SE A 552912 1345982 41% AT A 154041 751803 20% CH A 341522 1839496 19% FR A 793741 4501015 18% DD A 99003 651159 15% EA A 17272 118772 15% GT A 2423 17932 14% CA A 928350 7300631 13% CS A 46955 381949 12% GB A 579052 4924739 12% DK A 68952 589986 12%
  • 5. Person_id is not an entity id (I)  Person_id in patstat do not identifies an entity but a distinct name – address- country  Same entity  more person_ids  Same person_id  more entity 10
  • 6. Person_id is not an entity id: top inventors SELECT a.PERSON_NAME, a.PERSON_ADDRESS, a.PERSON_CTRY_CODE, Count(c.APPLN_ID) AS Count_APPLN_ID, Min(c.EARLIEST_FILING_YEAR) AS Min_EARLIEST_FILING_YEAR, Max(c.EARLIEST_FILING_YEAR) AS Max_EARLIEST_FILING_YEAR FROM patstat.tls207_pers_appln b INNER JOIN patstat.tls206_person a ON a.PERSON_ID = b.person_id INNER JOIN patstat.tls201_appln c ON b.APPLN_ID = c.APPLN_ID WHERE b.invt_seq_nr > 0 and c.EARLIEST_FILING_YEAR < 9999 GROUP BY a.PERSON_NAME, a.PERSON_ADDRESS, a.PERSON_CTRY_CODE ORDER BY Count_APPLN_ID DESC 10
  • 7. Person_id is not an entity id: top inventors (II)
    person_name | ctry_code | person_id | n_app | minyear | maxyear
    THE INVENTOR HAS WAIVED THE RIGHT TO BE MENTIONED | | 19584860 | 38067 | 2002 | 2015
    KVASENKOV OLEG IVANOVICH | RU | 34298480 | 29682 | 2003 | 2015
    WANG WEI | | 15786453 | 23156 | 1985 | 2015
    ZHANG WEI | | 14837632 | 21771 | 1985 | 2015
    NAME NOT GIVEN | | 13592151 | 17722 | 1964 | 2002
    LI WEI | | 13615436 | 17298 | 1985 | 2015
    VERZICHT DES ERFINDERS AUF NENNUNG (German: inventor waives being named) | | 21108740 | 17260 | 1964 | 1993
    WANG JUN | | 18500497 | 15755 | 1985 | 2015
    LIU WEI | | 18697297 | 15319 | 1985 | 2015
    LI JUN | | 18510590 | 14854 | 1985 | 2015
    WANG LEI | | 18754169 | 14710 | 1986 | 2015
    ZHANG LEI | | 18557049 | 14244 | 1987 | 2015
    ZHANG JUN | | 18719351 | 12815 | 1985 | 2015
    WANG JIAN | | 13113349 | 11936 | 1986 | 2015
    WANG YONG | | 12656416 | 11844 | 1985 | 2016
    ZHANG JIAN | | 14914085 | 11837 | 1985 | 2015
    CHEN WEI | | 14837625 | 11706 | 1985 | 2015
    WANG HUI | | 18663499 | 11452 | 1987 | 2015
    LIU YANG | | 13930482 | 11126 | 1985 | 2015
    LIU JUN | | 18710534 | 10927 | 1985 | 2015
    LI LI | | 13632985 | 9958 | 1985 | 2015
    AKTIENGESELLSCHAFT I. G. FARBENINDUSTRIE | DE | 17443080 | 9958 | 1897 | 1942
    WANG TAO | | 18331978 | 9856 | 1985 | 2015
    ZHANG YONG | | 18712075 | 9795 | 1985 | 2015
    ZHANG LI | | 18704857 | 9716 | 1985 | 2015
  • 8. Person_id is not an entity id: network analysis
    SELECT a.PERSON_ID,
      Count(DISTINCT b.PERSON_ID) AS n_coinv,  -- includes a.PERSON_ID itself
      t6.PERSON_NAME, t6.PERSON_ADDRESS, t6.PERSON_CTRY_CODE
    FROM patstat.tls207_pers_appln a
    INNER JOIN patstat.tls207_pers_appln b ON a.APPLN_ID = b.APPLN_ID
    INNER JOIN patstat.tls206_person t6 ON t6.PERSON_ID = a.PERSON_ID
    WHERE a.INVT_SEQ_NR > 0 AND b.INVT_SEQ_NR > 0   -- inventors on both sides
    GROUP BY a.PERSON_ID, t6.PERSON_NAME, t6.PERSON_ADDRESS, t6.PERSON_CTRY_CODE
    ORDER BY n_coinv DESC
  • 9. Person_id is not an entity id: network analysis
    person_id | n_coinv | name | address
    15786453 | 32384 | WANG WEI |
    14837632 | 27602 | ZHANG WEI |
    13615436 | 25550 | LI WEI |
    18697297 | 21915 | LIU WEI |
    18754169 | 21237 | WANG LEI |
    18557049 | 20629 | ZHANG LEI |
    18500497 | 20562 | WANG JUN |
    18510590 | 19789 | LI JUN |
    13113349 | 17270 | WANG JIAN |
    13930482 | 16618 | LIU YANG |
    18719351 | 16576 | ZHANG JUN |
    14914085 | 16464 | ZHANG JIAN |
    12656416 | 16208 | WANG YONG |
    18663499 | 15686 | WANG HUI |
    18704857 | 15224 | ZHANG LI |
    14837625 | 15027 | CHEN WEI |
    13632985 | 14882 | LI LI |
    18331978 | 14780 | WANG TAO |
    12656569 | 14656 | LI YAN |
    18712075 | 14616 | ZHANG YONG |
    WANG WEI and ZHANG WEI have 120 SIPO patents in common; each of the top 3 reaches a network of about 900K inventors within 3 degrees of distance:
    person_id | name | inventors within 3 DoD
    15786453 | WANG WEI | 943,562
    14837632 | ZHANG WEI | 925,099
    13615436 | LI WEI | 916,268
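In the network query, Count(DISTINCT b.PERSON_ID) also counts a.PERSON_ID itself, because every inventor joins to its own row on the same application; adding `AND b.PERSON_ID <> a.PERSON_ID` would exclude it. The plain-Python sketch that follows mirrors the self-join on invented (appln_id, person_id) inventor rows, with a hypothetical `include_self` switch to show both variants.

```python
from collections import defaultdict

# Invented inventor rows: (appln_id, person_id), i.e. tls207 rows with invt_seq_nr > 0.
inventor_rows = [
    (100, 1), (100, 2), (100, 3),
    (200, 1), (200, 4),
    (300, 5),
]

def count_coinventors(rows, include_self=False):
    """Distinct co-inventors per person_id, mirroring the self-join on appln_id."""
    by_appln = defaultdict(set)
    for appln_id, person_id in rows:
        by_appln[appln_id].add(person_id)
    coinv = defaultdict(set)
    for persons in by_appln.values():
        for p in persons:
            for q in persons:
                if include_self or p != q:
                    coinv[p].add(q)
    # Every person gets an entry, even one with no co-inventors
    return {p: len(coinv[p]) for _, p in rows}

print(count_coinventors(inventor_rows))                     # self excluded
print(count_coinventors(inventor_rows, include_self=True))  # like the SQL as written
```

With `include_self=True` every count is exactly one higher, which is the offset to keep in mind when reading the n_coinv column.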
  • 10. Person_id is not an entity id: possible solution
    At analysis level, the pair person_id - appln_id certainly identifies one entity.
    Starting from this level of disaggregation, entities should be further disambiguated by other means (for instance, applications 1 and 2 filed by the same applicant).
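One possible way to disambiguate further, sketched under explicit assumptions: start from (person_id, appln_id) pairs and merge two pairs when they carry the same name and their applications share an applicant, as in the "same applicant" example. The records, the exact-name rule and the union-find structure are illustrative choices, not the author's method; real pipelines would add blocking and fuzzier name matching.

```python
from itertools import combinations

# Hypothetical records: one per (person_id, appln_id) inventor pair, with the
# inventor name and the applicant names of that application.
records = {
    ("p1", "a1"): {"name": "WANG WEI", "applicants": {"ACME CORP"}},
    ("p1", "a2"): {"name": "WANG WEI", "applicants": {"BETA LTD"}},
    ("p2", "a3"): {"name": "WANG WEI", "applicants": {"ACME CORP"}},
    ("p3", "a4"): {"name": "LI LI", "applicants": {"ACME CORP"}},
}

def disambiguate(records):
    """Cluster pairs with a union-find; same name + shared applicant => same entity."""
    parent = {key: key for key in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # O(n^2) comparison is fine for a sketch; block by name on real data
    for k1, k2 in combinations(records, 2):
        r1, r2 = records[k1], records[k2]
        if r1["name"] == r2["name"] and r1["applicants"] & r2["applicants"]:
            union(k1, k2)

    clusters = {}
    for key in records:
        clusters.setdefault(find(key), []).append(key)
    return sorted(sorted(c) for c in clusters.values())

for cluster in disambiguate(records):
    print(cluster)
```

Note that the two pairs sharing person_id "p1" end up in different clusters here, while pairs with different person_ids are merged: the pair, not the person_id, is the unit being clustered.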
  • 11. CPC codes coverage is incomplete (I)
    The Cooperative Patent Classification (CPC) was initiated as a joint partnership between the USPTO and the EPO.
    It covers a more complete set of technologies (e.g. green energy, nanotechnology).
    It started in 2011; it does not apply to all types of patents (e.g. utility models), and its backward data have to be rebuilt.
  • 12. CPC codes coverage is incomplete (II)
    CPC coverage across PATSTAT is far from good and much lower than IPC coverage:
    appln_kind | n_app | n_with_cpc | cpc_rate | ipc_rate
    A | 66,750,533 | 39,505,860 | 0.5918 | 0.8413
    U | 13,503,902 | 1,140,172 | 0.0844 | 0.9115
    W | 3,012,030 | 2,990,252 | 0.9928 | 0.9900
  • 13. CPC coverage (type A)
    APPLN_KIND | APPLN_AUTH | Count_APPLN_ID | count_app_with_cpc | ratio | after Y2K
    A | AR | 143884 | 103372 | 72% | 80%
    A | AT | 587486 | 174977 | 30% | 71%
    A | AU | 1374657 | 1114774 | 81% | 80%
    A | BE | 646320 | 551552 | 85% | 94%
    A | BR | 547104 | 374724 | 68% | 79%
    A | CA | 3209303 | 1269659 | 40% | 25%
    A | CH | 1048915 | 571085 | 54% | 68%
    A | CN | 6343484 | 2155452 | 34% | 32%
    A | DE | 4617268 | 3861583 | 84% | 93%
    A | DK | 319177 | 119062 | 37% | 6%
    A | EP | 3227647 | 3113078 | 96% | 95%
    A | ES | 423071 | 202677 | 48% | 88%
    A | FI | 251054 | 112028 | 45% | 22%
    A | FR | 3098874 | 2387891 | 77% | 98%
    A | GB | 3384892 | 2116655 | 63% | 43%
    A | GR | 69272 | 24607 | 36% | 80%
    A | HK | 133738 | 119890 | 90% | 86%
    A | HU | 131491 | 73025 | 56% | 51%
    A | IE | 91782 | 43044 | 47% | 32%
    A | IL | 216193 | 122462 | 57% | 77%
    A | IN | 106610 | 46024 | 43% | 43%
    A | IT | 605707 | 326251 | 54% | 58%
    A | JP | 13944907 | 4355789 | 31% | 32%
    A | KR | 2831385 | 1425304 | 50% | 54%
    A | LU | 68712 | 59814 | 87% | 93%
    A | MX | 262534 | 236276 | 90% | 94%
    A | MY | 50974 | 40612 | 80% | 74%
    A | NL | 595393 | 528493 | 89% | 90%
    A | NO | 222376 | 171392 | 77% | 83%
    A | NZ | 141064 | 110223 | 78% | 81%
    A | PL | 246209 | 79640 | 32% | 32%
    A | RU | 658280 | 199365 | 30% | 32%
    A | SE | 858651 | 330375 | 38% | 17%
    A | SG | 102679 | 90508 | 88% | 90%
    A | SU | 1363419 | 100573 | 7% | 53%
    A | TW | 737206 | 497644 | 68% | 66%
    A | UA | 55255 | 18206 | 33% | 31%
    A | US | 12700957 | 11612249 | 91% | 98%
    A | ZA | 293611 | 191492 | 65% | 67%
    SELECT a.APPLN_KIND, a.APPLN_AUTH,
      Count(DISTINCT a.APPLN_ID) AS Count_APPLN_ID,
      Count(DISTINCT b.APPLN_ID) AS count_app_with_cpc,
      Count(DISTINCT b.APPLN_ID) / Count(DISTINCT a.APPLN_ID) AS ratio
    FROM patstat.tls201_appln a
    LEFT JOIN patstat.tls224_appln_cpc b ON a.APPLN_ID = b.APPLN_ID
    WHERE a.APPLN_KIND IN ('A', 'W', 'U')
    GROUP BY a.APPLN_KIND, a.APPLN_AUTH
    The situation is not homogeneous. After Y2K things improve a bit.
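The coverage query can be parametrised on the filing year to produce both the overall ratio and an "after Y2K" ratio. A minimal sketch on an invented SQLite toy database, assuming that "after Y2K" means APPLN_FILING_YEAR >= 2000 and that the 9999 "unknown year" placeholder should be excluded (both are assumptions; the slide does not state the date used). SQLite truncates integer division, hence the `1.0 *`.

```python
import sqlite3

# Toy stand-ins for tls201_appln and tls224_appln_cpc (invented rows).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT,
                           appln_kind TEXT, appln_filing_year INTEGER);
CREATE TABLE tls224_appln_cpc (appln_id INTEGER, cpc_class_symbol TEXT);
INSERT INTO tls201_appln VALUES
  (1, 'DK', 'A', 1995), (2, 'DK', 'A', 1998), (3, 'DK', 'A', 2005),
  (4, 'DK', 'A', 2010), (5, 'EP', 'A', 1999), (6, 'EP', 'A', 2008);
INSERT INTO tls224_appln_cpc VALUES
  (1, 'A01B1/00'), (2, 'A01B1/00'), (2, 'A01B3/00'), (5, 'A01B1/00'), (6, 'A01B1/00');
""")

COVERAGE_SQL = """
SELECT a.appln_kind, a.appln_auth,
       COUNT(DISTINCT a.appln_id) AS n_appln,
       COUNT(DISTINCT b.appln_id) AS n_with_cpc,
       -- 1.0 * forces a real division (SQLite truncates integer division)
       1.0 * COUNT(DISTINCT b.appln_id) / COUNT(DISTINCT a.appln_id) AS ratio
FROM tls201_appln a
LEFT JOIN tls224_appln_cpc b ON a.appln_id = b.appln_id
WHERE a.appln_kind IN ('A', 'W', 'U')
  AND a.appln_filing_year >= ?      -- lower bound on the filing year
  AND a.appln_filing_year < 9999    -- drop the 'unknown year' placeholder
GROUP BY a.appln_kind, a.appln_auth
"""

def coverage(con, min_year=0):
    """Map (appln_kind, appln_auth) -> share of applications with a CPC code."""
    return {(kind, auth): round(ratio, 2)
            for kind, auth, _n, _n_cpc, ratio in con.execute(COVERAGE_SQL, (min_year,))}

print(coverage(con))        # all filing years
print(coverage(con, 2000))  # assumed "after Y2K" cut-off
```

COUNT(DISTINCT b.appln_id) matters here because an application usually has several CPC rows (application 2 in the toy data has two).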
  • 14. CPC coverage (types U, W)
    APPLN_KIND | APPLN_AUTH | Count_APPLN_ID | count_app_with_cpc | ratio | after Y2K
    U | BR | 103233 | 5179 | 5% | 10%
    U | CN | 5894022 | 251879 | 4% | 4%
    U | DE | 1406011 | 618249 | 44% | 43%
    U | ES | 327087 | 32007 | 10% | 15%
    U | IT | 139608 | 12912 | 9% | 14%
    U | JP | 4289887 | 113890 | 3% | 7%
    U | KR | 506761 | 44226 | 9% | 16%
    U | RU | 166613 | 5567 | 3% | 4%
    U | TW | 407155 | 30996 | 8% | 7%
    U | UA | 103880 | 2037 | 2% | 2%
    W | CN | 160005 | 155010 | 97% | 97%
    W | DE | 65673 | 65433 | 100% | 100%
    W | EP | 462944 | 461118 | 100% | 100%
    W | FR | 82356 | 81494 | 99% | 98%
    W | GB | 114614 | 114257 | 100% | 100%
    W | IB | 134635 | 133070 | 99% | 99%
    W | JP | 503441 | 497961 | 99% | 99%
    W | KR | 119158 | 118141 | 99% | 99%
    W | SE | 53444 | 53264 | 100% | 100%
    W | US | 1002525 | 1000291 | 100% | 100%
    Counts for offices with > 50K patents. PCT data coverage is almost complete; for utility models, CPC is not really usable.
  • 15. Missing data for PCT equivalent
    EP data originating from the regional phase of a PCT application can be partial.
    At least abstracts and citations can be missing and have to be extracted from the PCT equivalent (column INTERNAT_APPLN_ID in tls201).
    APPLN_ID | APPLN_AUTH | APPLN_NR | APPLN_KIND | IPR_TYPE | INTERNAT_APPLN_ID | int_phase | reg_phase | nat_phase | GRANTED
    347305 | EP | 99931561 | A | PI | 30241523 | Y | Y | N | 1
  • 16. Missing abstracts
    APPLN_KIND | Count_APPLN_ID | Abstracts | ratio
    A (EP) | 3227647 | 1849737 | 57%
    W (PCT) | 3012030 | 2992978 | 99%
    SELECT a.APPLN_KIND,
      Count(a.APPLN_ID) AS Count_APPLN_ID,
      Count(b.APPLN_ID) AS Abstracts,
      Count(b.APPLN_ID) / Count(a.APPLN_ID) AS ratio
    FROM patstat.tls201_appln a
    LEFT JOIN patstat.tls203_appln_abstr b ON a.APPLN_ID = b.APPLN_ID
    WHERE (a.APPLN_AUTH = 'EP' AND a.APPLN_KIND = 'A') OR a.APPLN_KIND = 'W'
    GROUP BY a.APPLN_KIND
    About 40% of EP abstracts should be extracted from the PCT equivalent.
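A minimal sketch of taking a missing EP abstract from the PCT equivalent through INTERNAT_APPLN_ID, on an invented SQLite toy database. The application ids 347305 and 30241523 come from the example above; the abstract texts, the direct EP filing 500, the value 0 for "no international application", the APPLN_ABSTRACT column name and the `source` flag are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT,
                           appln_kind TEXT, internat_appln_id INTEGER);
CREATE TABLE tls203_appln_abstr (appln_id INTEGER PRIMARY KEY, appln_abstract TEXT);
INSERT INTO tls201_appln VALUES
  (347305, 'EP', 'A', 30241523),   -- Euro-PCT application
  (30241523, 'WO', 'W', 0),        -- its PCT equivalent
  (500, 'EP', 'A', 0);             -- hypothetical direct EP filing
INSERT INTO tls203_appln_abstr VALUES
  (30241523, 'Abstract text of the PCT application'),
  (500, 'Abstract text of the EP application');
""")

FILL_SQL = """
SELECT ep.appln_id,
       COALESCE(own.appln_abstract, pct.appln_abstract) AS abstract,
       CASE WHEN own.appln_id IS NOT NULL THEN 'own'
            WHEN pct.appln_id IS NOT NULL THEN 'pct' END AS source
FROM tls201_appln ep
LEFT JOIN tls203_appln_abstr own ON own.appln_id = ep.appln_id
LEFT JOIN tls203_appln_abstr pct ON pct.appln_id = ep.internat_appln_id
WHERE ep.appln_auth = 'EP' AND ep.appln_kind = 'A'
ORDER BY ep.appln_id
"""
rows = con.execute(FILL_SQL).fetchall()
for row in rows:
    print(row)
```

Keeping a `source` column records which abstracts were borrowed, so the PCT-derived ones can be excluded later if a study needs the EP text only.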
  • 17. Missing citations
    Euro-PCT applications:
    Citations of the WO publications are not repeated in the later EP publication. Instead, an NPL citation with the text "See also references of WO xxxxxxx" is included.
    So a Euro-PCT has more citations than is obvious.
    In 2016, NPL citations with the value "none" or "see also references..." were removed from the data, but the related citations have not been replenished…
  • 18. Missing PCT citations (II)
    Example: EP1103560, equivalent to WO0006594.
    From the citations table we would conclude that it has only 2 NPL citations (and one of them is "See also references of WO 0006594A1"):
    APPLN_ID | PUBLN_AUTH + NR | PAT_PUBLN_ID | NPL_CITN_SEQ_NR | NPL_PUBLN_ID | NPL_BIBLIO
    347305 | EP1103560 | 511640 | 1 | 950236893 | No further relevant documents disclosed
    347305 | EP1103560 | 511640 | 3 | 950236894 | See also references of WO 0006594A1
  • 19. Missing PCT citations (III)
    As a matter of fact, looking up the corresponding WO publication in Espacenet, we find its cited documents:
    http://worldwide.espacenet.com/publicationDetails/citedDocuments?CC=WO&NR=0006594A1&KC=A1&FT=D&ND=4&date=20000210&DB=EPODOC&locale=en_EP
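A minimal sketch of recovering Euro-PCT citations by also reading the citations of the PCT equivalent (INTERNAT_APPLN_ID) and de-duplicating at application level. All ids and rows are invented, the NPL row is represented with cited ids 0 (an assumption), and `cited_applns` is a hypothetical helper, not a PATSTAT function.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, internat_appln_id INTEGER);
CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER);
CREATE TABLE tls212_citation (pat_publn_id INTEGER, cited_pat_publn_id INTEGER,
                              cited_appln_id INTEGER);
INSERT INTO tls201_appln VALUES (1, 2), (2, 0);        -- EP appln 1 entered via PCT appln 2
INSERT INTO tls211_pat_publn VALUES (10, 1), (20, 2);  -- EP publication 10, WO publication 20
INSERT INTO tls212_citation VALUES
  (10, 0, 0),        -- NPL row such as 'See also references of WO ...'
  (10, 101, 1001),
  (20, 101, 1001),   -- same document also cited on the WO publication
  (20, 102, 1002);
""")

def cited_applns(con, appln_id, include_pct=True):
    """Distinct cited appln_ids of an application, optionally adding its PCT equivalent."""
    sql = """
    SELECT DISTINCT c.cited_appln_id
    FROM tls201_appln a
    JOIN tls211_pat_publn p
      ON p.appln_id = a.appln_id
      OR (? AND a.internat_appln_id > 0 AND p.appln_id = a.internat_appln_id)
    JOIN tls212_citation c ON c.pat_publn_id = p.pat_publn_id
    WHERE a.appln_id = ? AND c.cited_appln_id > 0
    ORDER BY c.cited_appln_id
    """
    return [r[0] for r in con.execute(sql, (int(include_pct), appln_id))]

print(cited_applns(con, 1, include_pct=False))  # EP publications only
print(cited_applns(con, 1))                     # EP + PCT equivalent
```

The DISTINCT on cited_appln_id keeps a document cited on both the WO and the EP publication from being counted twice.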
  • 20. Data transmission gaps from national offices to EPO (I)
    PATSTAT covers about 100 patent authorities, but with unequal coverage and publication lags.
    Coverage is good and lags are short for EU countries; coverage is less good and less regular for national patent authorities outside the EU (except the big players, e.g. US, JP…).
  • 21. Data transmission gaps from national offices to EPO (II)
    Data coverage for DOCDB is available at:
    https://www.epo.org/searching-for-patents/helpful-resources/data/tables/weekly.html
    Nevertheless, the file is difficult to use:
    EDATE | CC | KC | YEAR | NB_DOC | MIN_PN | MAX_PN | FIRST_DATE | LAST_DATE | LAST_ADDED | LAST_EXCH | GAPS
    02/09/2017 | AM | A2 | 2001 | 1 | 949 | 949 | 10/06/2001 | 10/06/2001 | 08/10/2015 | 15/10/2015 |
    02/09/2017 | AM | A2 | 2004 | 1 | 1402 | 1402 | 17/03/2004 | 17/03/2004 | 15/02/2011 | 24/02/2011 | 1011
    02/09/2017 | AM | A2 | 2006 | 1 | 1813 | 1813 | 15/09/2006 | 15/09/2006 | 05/01/2017 | 13/04/2017 | 912
    02/09/2017 | AM | U | 2009 | 1 | 170 | 170 | 26/10/2009 | 26/10/2009 | 01/08/2012 | 09/08/2012 | 0
    02/09/2017 | AM | U | 2010 | 1 | 194 | 194 | 26/04/2010 | 26/04/2010 | 17/08/2017 | 24/08/2017 | 182
    Add a GAPS column for the same office and type of publication: the number of days between a row's FIRST_DATE and the previous row's LAST_DATE (0 when a new kind code starts).
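The GAPS column can be computed directly from the coverage rows. A minimal sketch using the five AM rows shown above, assuming the rows are already sorted by office, kind code and date, and that the first row of each office/kind-code group gets 0 (the slide leaves the very first cell empty).

```python
from datetime import datetime

# Rows from the slide: (CC, KC, FIRST_DATE, LAST_DATE), sorted by office, kind code, date.
rows = [
    ("AM", "A2", "10/06/2001", "10/06/2001"),
    ("AM", "A2", "17/03/2004", "17/03/2004"),
    ("AM", "A2", "15/09/2006", "15/09/2006"),
    ("AM", "U",  "26/10/2009", "26/10/2009"),
    ("AM", "U",  "26/04/2010", "26/04/2010"),
]

def parse(d):
    """Parse a dd/mm/yyyy date as used in the coverage file."""
    return datetime.strptime(d, "%d/%m/%Y").date()

def add_gaps(rows):
    """Append GAPS: days from the previous row's LAST_DATE to this row's FIRST_DATE."""
    out, prev = [], None
    for cc, kc, first, last in rows:
        if prev is not None and prev[0] == (cc, kc):
            gap = (parse(first) - prev[1]).days
        else:
            gap = 0  # first row of an office/kind-code group
        out.append((cc, kc, first, last, gap))
        prev = ((cc, kc), parse(last))
    return out

for row in add_gaps(rows):
    print(row)
```

Sorting the whole file by (CC, KC, FIRST_DATE) before calling `add_gaps` is required; the function only compares each row with the one immediately before it.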
  • 22. Data transmission gaps from national offices to EPO (III)
    GAPS by office for kind codes A, B, U (where fewer than three values are listed, the empty cells were lost and the kind code of each value cannot be recovered):
    AT 496 2596; AU 2256
    BA 803; BG 758 429; BR 3183 0 3241; BY 2678
    CA 0 2394; CH 66 0; CL 1533; CN 189 6370 6370; CR 1668 3062; CY 614
    DD 217; DE 0 196; DK 1032 962; DO 3348 2057
    EC 1049 2513; EE 1360; EG 1291; ES 336
    FI 63 402; FR 0
    GB 267 267; GC 3084; GE 1554 1672 2344; GR 1136 586 6225; GT 590 5556
    HK 5061; HN 421 5510; HU 1130 4996 839
    ID 177 1127; IL 1146; IN 1035 2153; IS 912 1653; IT 4344
    JO 6193; JP 376 0 451
    KR 476 874; KZ 2826 6488
    LT 640 764; LV 3087 867
    MC 3085; MD 860 3472; MX 1114 1343; MY 2575 0
    NL 3135 353; NZ 1013
    OA 2224
    PH 543; PT 613 842 2083
    RO 0 1009; RS 322 500 500; RU 424
    SE 3100 4025; SG 1335; SI 1314 1287; SV 2487
    TH 5597; TJ 1470 2101 2030; TR 48 0 1225; TW 224 975 1642
    UA 1449 1816; US 70 336; UY 251 566; UZ 1641 1673
    YU 1229 1465 715
    ZA 471
    Some countries show gaps for some types of patents. Orange/red: very problematic cases. In any case, a single application can interrupt a gap, giving misleading results…
  • 23. Data transmission gaps from national offices to EPO (IV)
    02/09/2017 AU A A 2005 3 1475702 3432402 17/03/2005 25/08/2005 17/09/2005 02/03/2017 420
    02/09/2017 AU A A 2010 1 6326480 6326480 29/04/2010 29/04/2010 12/05/2010 20/05/2010 1708
    Australia: we have a problem.
    02/09/2017 IN B 2010 10 237550 264673 01/01/2010 17/12/2010 01/04/2016 17/08/2017 77
    02/09/2017 IN B 2011 6 239400 247731 07/01/2011 13/05/2011 16/02/2016 08/12/2016 21
    02/09/2017 IN B 2012 1 253973 253973 14/09/2012 14/09/2012 31/03/2016 07/04/2016 490
    India: we have a problem, bigger than expected.
    Authorities should be examined case by case, also using counts by year benchmarked against previous years.
  • 24. Data transmission gaps from national offices to EPO (V)
    Two possible errors:
    different transmission timeframes (the decay of the patent count in BR starts before GB);
    partial data transmission: counts differ from the official data of the patent office.
    year | BR | GB | IN
    1990 | 10851 | 30055 | 2209
    1991 | 10122 | 29991 | 2002
    1992 | 9103 | 30089 | 1958
    1993 | 10272 | 29901 | 2032
    1994 | 10992 | 29560 | 2529
    1995 | 13557 | 29909 | 2554
    1996 | 15580 | 30448 | 1679
    1997 | 18589 | 31219 | 1383
    1998 | 19032 | 32828 | 1026
    1999 | 21019 | 35222 | 750
    2000 | 20725 | 36996 | 690
    2001 | 20626 | 36884 | 705
    2002 | 19265 | 36318 | 757
    2003 | 20909 | 35452 | 1049
    2004 | 22816 | 33794 | 1113
    2005 | 23973 | 31066 | 1691
    2006 | 23472 | 30495 | 1973
    2007 | 16078 | 30848 | 2215
    2008 | 10088 | 28816 | 2541
    2009 | 8843 | 27103 | 2507
    2010 | 5028 | 25363 | 2988
    2011 | 539 | 24010 | 872
    2012 | 7 | 7955 | 28
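A simple way to locate where the decay starts is to compare each year's count with the mean of the preceding years. A minimal sketch on the 2000-2012 counts from the table above; the 3-year window and the 0.5 threshold are arbitrary assumptions, and `first_drop_year` is a hypothetical helper.

```python
YEARS = list(range(2000, 2013))
# Counts copied from the table above (2000-2012).
COUNTS = {
    "BR": [20725, 20626, 19265, 20909, 22816, 23973, 23472, 16078, 10088, 8843, 5028, 539, 7],
    "GB": [36996, 36884, 36318, 35452, 33794, 31066, 30495, 30848, 28816, 27103, 25363, 24010, 7955],
    "IN": [690, 705, 757, 1049, 1113, 1691, 1973, 2215, 2541, 2507, 2988, 872, 28],
}

def first_drop_year(years, counts, window=3, threshold=0.5):
    """Return the first year whose count falls below `threshold` times the mean
    of the preceding `window` years, or None if no such year exists."""
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if baseline > 0 and counts[i] < threshold * baseline:
            return years[i]
    return None

for office, counts in COUNTS.items():
    print(office, first_drop_year(YEARS, counts))
```

With these settings BR is flagged in 2008, IN in 2011 and GB only in 2012, consistent with the decay in BR starting before GB. A benchmark of this kind only detects sudden drops; partial transmission with a stable level still needs a comparison with the offices' official statistics.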
  • 25. Citations double counts (I)
    Citations in PATSTAT are stored publication to publication, by origin.
    Simple citation counts on TLS212 can lead to misleading results.
    Appln_id to appln_id citations help to clarify:
    SELECT SUM(Count_CITED_PAT_PUBLN_ID) AS n_pub_cited,
      SUM(count_distinct_appln_cited) AS n_distinct_appln_cited
    FROM (
      SELECT t11.APPLN_ID,
        Count(t12.CITED_PAT_PUBLN_ID) AS Count_CITED_PAT_PUBLN_ID,
        Count(DISTINCT t11b.APPLN_ID) AS count_distinct_appln_cited
      FROM patstat.tls212_citation t12
      INNER JOIN patstat.tls211_pat_publn t11 ON t11.PAT_PUBLN_ID = t12.PAT_PUBLN_ID
      INNER JOIN patstat.tls211_pat_publn t11b ON t12.CITED_PAT_PUBLN_ID = t11b.PAT_PUBLN_ID
      WHERE t12.CITED_PAT_PUBLN_ID > 0   -- patent citations only
      GROUP BY t11.APPLN_ID
    ) AS per_appln
  • 26. Citations double counts (II)
    Case 1: appln_id 1 has 2 publications showing exactly the same citations.
    APPLN_ID | PAT_PUBLN_ID | CITED_PAT_PUBLN_ID | CITED_APPLN_ID
    1 | 293253293 | 306927614 | 16980819
    1 | 293253293 | 301830017 | 17000979
    1 | 293253293 | 298485954 | 13388690
    1 | 387522680 | 306927614 | 16980819
    1 | 387522680 | 301830017 | 17000979
    1 | 387522680 | 298485954 | 13388690
    Case 2: appln_id 3 has 2 publications that share 1 citation, while their other citations differ.
    APPLN_ID | PAT_PUBLN_ID | CITED_PAT_PUBLN_ID | CITED_APPLN_ID
    3 | 291964096 | 300128315 | 49123163
    3 | 291964096 | 295303503 | 13538355
    3 | 387535649 | 300128315 | 49123163
    3 | 387535649 | 296195755 | 53888801
    3 | 387535649 | 306928379 | 52488529
  • 27. Citations double counts (III)
    Case 3: the same citation shows up with different origins.
    APPLN_ID | PAT_PUBLN_ID | CITED_PAT_PUBLN_ID | CITED_APPLN_ID | origin
    23 | 289129312 | 305684503 | 16736817 |
    23 | 289129312 | 293787435 | 15702748 |
    23 | 289129312 | 308462347 | 48996652 | APP
    23 | 289129312 | 293787435 | 15702748 |
    23 | 289129312 | 305684503 | 16736817 |
    23 | 289129312 | 308462347 | 48996652 | SEA
    23 | 289129312 | 327902045 | 50318244 |
    23 | 289129312 | 296433878 | 20546518 |
    23 | 289129312 | 297350607 | 24023115 |
    23 | 289129312 | 296449840 | 47357637 |
    Case 4: the same citation appears 4 times, with the same citing publication and the same origin [data error; it could be systematic with multiple priorities from some offices].
    APPLN_ID | PAT_PUBLN_ID | CITED_PAT_PUBLN_ID | CITED_APPLN_ID | origin
    705 | 306929092 | 309035661 | 22852464 | SEA
    705 | 306929092 | 309035659 | 22771241 | SEA
    705 | 306929092 | 307833757 | 50586872 | SEA
    705 | 306929092 | 385933558 | 9632978 | SEA
    705 | 316022028 | 337326981 | 16587695 | APP
    705 | 316022028 | 310119518 | 16447723 | APP
    705 | 316022028 | 310119519 | 48241555 | APP
    705 | 316022028 | 314809416 | 50308201 | APP
    705 | 316022028 | 314809413 | 9718355 | APP
    705 | 316022028 | 314809416 | 50308201 | APP
    705 | 316022028 | 314809416 | 50308201 | APP
    705 | 316022028 | 314809413 | 9718355 | APP
    705 | 316022028 | 314809416 | 50308201 | APP
  • 28. Citations double counts (IV)
    How does it perform across all of PATSTAT?
    n_pub_cited | n_app_cited | ratio
    225,858,262 | 189,409,321 | 0.83862
    Focus on offices with ratio < 0.9:
    publn_auth | n_pub_cited | n_app_cited | ratio
    GB | 6174382 | 4038130 | 0.65
    US | 250838242 | 1.86E+08 | 0.74
    DE | 9428710 | 7927305 | 0.84
    AT | 331095 | 287720 | 0.86
    JP | 29539409 | 26474356 | 0.89
  • 29. Citations double counts (V)
    47K cases of self-citations…
    APPLN_ID | PAT_PUBLN_ID | CITED_APPLN_ID | CITED_PAT_PUBLN_ID
    53803383 | 55765553 | 53803383 | 278711582
    US7478445 and US2008052829: but it is the same patent…
  • 30. Citations double counts (VI)
    Solution: count distinct citations by citing and cited appln_id;
    move citation origin data to a separate table;
    also use the number of citing DOCDB families (provided in TLS201).
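The first point of the solution can be sketched in a few lines: collapse publication-level rows into distinct (citing appln_id, cited appln_id) pairs and drop self-citations. The rows below are Case 1 and the self-citation example from the previous slides; ignoring the origin column and treating cited_appln_id 0 as unresolved are assumptions of this sketch.

```python
from collections import Counter

# (APPLN_ID, PAT_PUBLN_ID, CITED_PAT_PUBLN_ID, CITED_APPLN_ID) from the slides
rows = [
    (1, 293253293, 306927614, 16980819),
    (1, 293253293, 301830017, 17000979),
    (1, 293253293, 298485954, 13388690),
    (1, 387522680, 306927614, 16980819),
    (1, 387522680, 301830017, 17000979),
    (1, 387522680, 298485954, 13388690),
    (53803383, 55765553, 278711582, 53803383),  # self-citation
]

def citation_counts(rows):
    """Return (raw rows, distinct appln-to-appln pairs, distinct cited applns per
    citing appln), skipping unresolved cited ids (0) and self-citations."""
    pairs = {(citing, cited) for citing, _pub, _cited_pub, cited in rows
             if cited > 0 and citing != cited}
    per_citing = Counter(citing for citing, _ in pairs)
    return len(rows), len(pairs), dict(per_citing)

print(citation_counts(rows))
```

Seven publication-level rows reduce to three application-level citations, all made by application 1.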
  • 31. Counting the number of claims correctly
    The number of claims is often used as an indicator of value.
    US data: relates to granted patents only (A documents until 2000, B1 or B2 documents afterwards) published on or after 1975-01-01.
    EP data: relates to both published applications (kind code "A") from 1978 and granted patents (kind code "B") from 1980.
    The number of claims will be "0" for all EP A documents originating from a PCT application published in English, French or German (so-called "Euro-PCTs").
  • 32. Counting the number of claims correctly (II)
    The number of claims changes over time: select the publication phase most relevant to your research question. Language may also change the number of claims (but PATSTAT keeps the higher number).
    PAT_PUBLN_ID | PUBLN_AUTH | PUBLN_NR | PUBLN_KIND | PUBLN_DATE | PUBLN_CLAIMS | change
    311822768 | EP | 1878578 | A1 | 2008-01-16 | 11 |
    311822783 | EP | 1878578 | B1 | 2009-09-30 | 22 | 100% more…
    311822763 | EP | 0000034 | A1 | 1978-12-20 | 14 |
    311822766 | EP | 0000034 | B1 | 1984-05-23 | 7 | 50% less
  • 33. Counting the number of claims correctly (III)
    Average change in number of claims:
    SELECT PUBLN_AUTH,
      sum(Min_PUBLN_CLAIMS) AS min_claims,
      sum(Max_PUBLN_CLAIMS) AS max_claims
    FROM (
      SELECT b.PUBLN_AUTH, b.APPLN_ID,
        Max(cast(b.PUBLN_CLAIMS AS unsigned)) AS Max_PUBLN_CLAIMS,
        Min(cast(b.PUBLN_CLAIMS AS unsigned)) AS Min_PUBLN_CLAIMS
      FROM patstat.tls211_pat_publn b
      WHERE (b.PUBLN_AUTH = 'EP' OR b.PUBLN_AUTH = 'US')
        AND cast(b.PUBLN_CLAIMS AS unsigned) > 0   -- skips the 0 claims of Euro-PCT A documents
      GROUP BY b.PUBLN_AUTH, b.APPLN_ID
    ) b
    GROUP BY PUBLN_AUTH
    PUBLN_AUTH | min_claims | max_claims | ratio
    EP | 29,005,016 | 31,407,694 | 0.9235
    US | 87,474,259 | 87,474,617 | 0.999996
    The average number of claims can be a good proxy.
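The same min/max aggregation can be tried on a toy SQLite table. The first four publications are the EP examples from the previous slide; the appln_ids 1 to 3 and the Euro-PCT application 3 (A1 with 0 claims, B1 with 10) are invented to show why zero claims are filtered out. SQLite uses CAST(... AS INTEGER) where MySQL uses CAST(... AS UNSIGNED).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER, publn_auth TEXT,
               appln_id INTEGER, publn_kind TEXT, publn_claims TEXT)""")
con.executemany("INSERT INTO tls211_pat_publn VALUES (?, ?, ?, ?, ?)", [
    (311822768, "EP", 1, "A1", "11"),   # EP1878578 A1 (appln_id invented)
    (311822783, "EP", 1, "B1", "22"),   # EP1878578 B1
    (311822763, "EP", 2, "A1", "14"),   # EP0000034 A1 (appln_id invented)
    (311822766, "EP", 2, "B1", "7"),    # EP0000034 B1
    (900001, "EP", 3, "A1", "0"),       # hypothetical Euro-PCT A document
    (900002, "EP", 3, "B1", "10"),      # hypothetical grant of the same application
])

CLAIMS_SQL = """
SELECT publn_auth, SUM(min_claims), SUM(max_claims)
FROM (SELECT publn_auth, appln_id,
             MIN(CAST(publn_claims AS INTEGER)) AS min_claims,
             MAX(CAST(publn_claims AS INTEGER)) AS max_claims
      FROM tls211_pat_publn
      WHERE publn_auth IN ('EP', 'US')
        AND CAST(publn_claims AS INTEGER) > 0   -- ignore the 0 of Euro-PCT A documents
      GROUP BY publn_auth, appln_id) AS per_appln
GROUP BY publn_auth
"""
results = con.execute(CLAIMS_SQL).fetchall()
for auth, min_claims, max_claims in results:
    print(auth, min_claims, max_claims, round(min_claims / max_claims, 4))
```

Without the `> 0` filter, application 3 would contribute a minimum of 0 claims and push the ratio down for a reason unrelated to claim amendments.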
  • 34. Conclusions
    PATSTAT is a great source of data but cannot be taken 'as is'.
    Data collection is examiner-centred, so all 'accessory' data need validation.
    Looking for data gaps ex ante can save a lot of work ex post.
  • 35. Conclusions (II)
    A saint has a past; a sinner has a future (Lord Illingworth).
    A saint is a sinner who never gave up (Yogananda).
    Both sentences mean a lot of work to do when using patent data! (myself)