SlideShare ist ein Scribd-Unternehmen logo
1 von 29
PATSTAT: from data to
indicators
Leuven, 14/9/18
Gianluca Tarasconi
Icrios Dba - Univ. Bocconi
Introduction: PATSTAT and SQL
• Several papers exist providing SQL code for
didactic purposes
• [Squicciarini, M., Dernis, H., & Criscuolo, C. (2013).
Measuring patent quality. / De Rassenfosse, G., Dernis,
H., & Boedt, G. (2014). An introduction to the Patstat
database with example queries. Australian Economic
Review, 47(3), 395-408 / De Rassenfosse et al.. (2017).
Getting Started with PATSTAT Register. Australian
Economic Review, 50(1), 110-120 … many others ]
• Still some steps are often missing when creating
subsets and indicators, especially on selection criteria
(what patents do I include???), counting methodology,
data boundaries (years, geography etc…)
In short
In short (II)
Hands on data
• Data are extracted from a PATSTAT spring 2018
mini-dump, selecting EP office applications for
year 2007 (filing date) and related patents;
• A PATSTAT version is available in Oracle for those
who want to reply this exercise step by step.
• SQL and dump load
• https://www.dropbox.com/s/fo1wpqwo1j3buom/sqlleuven
2018.zip?dl=0
• https://bit.ly/2xmb29r
STEP 1: selection and output
• The population we will analyze are: year
2005 EP filings (PCT excluded)
• The output requested is Originality by
country and IPC35
• Two questions should arise now…
S1: Selection and output –
Q&A
• Q1,2: what year and what country?
• A1: Year stands for priority year (vs filing year)
• A2: Country stands for inventor’s country (vs
applicant’s country or invention vs ownership
country)
• A third question could be useful: what kind of patents?
• Maybe a last question might be ‘what is IPC35
reclassification’
Schmoch, U. (2008). Concept of a technology classification for
country comparisons. Final Report to the World Intellectual
Property Organization (WIPO), Fraunhofer Institute for
Systems and Innovation Research, Karlsruhe.
S1: Selection and output – Q&A 2
• Field appln_kind in TLS201 may contain
values:
• A patent
• U utility model
• W PCT application (in the international phase)
• T used by some offices for applications which are
"translations" of granted PCT or EP applications
• P provisional application (US only)
• F design patent
• V plant patent
• D2, D3 artificial applications
A kind for EP is the value to select
S1: Selection code commented
USE minidump;
-- table applications
drop table if exists applications;
-- EP - 2005
Create table applications
Select
T01.APPLN_ID,
T01.APPLN_AUTH,
T01.APPLN_NR,
T01.APPLN_KIND,
EARLIEST_FILING_YEAR as PRIORITY_YEAR,
From
patstat.tls201_appln T01
where
t01.appln_auth = "EP" and
earliest_filing_year =2005 ;
ALTER TABLE applications ADD INDEX `I1`(`APPLN_ID`);
31.166 applications
S1: Selection code exclusions
drop table if exists applicationsdropped;
-- deletes unpublished
-- *******************
create table applicationsdropped
Select a.* , "NOPUBBL" as reason
From applications a Left Join
patstat.tls211_pat_publn b On a.APPLN_ID = b.APPLN_ID
where b.appln_id is null;
-- deletes 9999
-- *******************
insert into applicationsdropped
Select a.* , "9999" as reason
From applications a
where a.priority_year = 9999;
-- deletes non A kind
-- *******************
insert into applicationsdropped
Select a.* , "NON_A" as reason
From applications a
where a.appln_kind <>"A";
DELETE A.* from APPLICATIONS A inner join
Applicationsdropped B on A.appln_id=B.appln_id;
Unpublished priorities
Priority 9999
(useless in this case)
EP specific: Keep only kind A
(patent of invention)
Table to store
excluded
6.408 distinct
applications
excluded: 24758
remain
STEP 2: weighting dimensions
• Indicator has to be calculated over 2
dimensions: invention country and IPC35
• Each application may have multiple
ipc35 and multiple country of invention
• Several weighting schemes exist: full count,
fractional…
S2a: fractional weight
IPC 35
3 7 7 7
0,25 0,25 0,25 0,25 1
INV
IT 0,3333 0,0833 0,0833 0,0833 0,0833
DE 0,3333 0,0833 0,0833 0,0833 0,0833
DE 0,3333 0,0833 0,0833 0,0833 0,0833
0,9999 0.9996
“Fractional” counting, meaning that if a patent has multiple
inventors, it will be counted proportionally per inventor: if a patent
is allocated to x regions (countries), each involved region (country)
receives a count of 1/x patents.
The benefit of such a counting scheme is its transparency across
different levels of aggregations: for each breakdown, the total
patent count adds up to the sum of the counts by region or by
country
IPC 35
3 7
0.5 0.75 1
INV
IT 0.3333 0.1667 0.2500
DE 0.6666 0.3333 0.5000
0,9999 1
S2b: files preparation
• This step aims to create one table for each dimension with relevant
data
• - appln_id
• - dimension considered
• - fractional weight
• Two dimension involved: inventor’s country and IPC35
• Table names: IPC35 and INVGEO
S2c: what to do with missing???
• Always consider some data are missing so if you want to
have consistent data you should also append data that exist
in applications table but have no IPC or inventor…
• This is a left join like
insert into ipc35 select a.appln_id, -1,1
from applications a
left join ipc35 b on a.appln_id=b.appln_id
where b.appln_id is null;
S2d: Weight code commented
use minidump;
drop table if exists ipc35;
create table ipc35
SELECT
b.appln_id,
b.techn_field_nr AS ipcclass,
b.weight
FROM
tls230_appln_techn_field b
INNER JOIN applications a ON a.APPLN_ID = b.appln_id;
alter table ipc35 add index i1(appln_id);
insert into ipc35 select a.appln_id, -1,1
from applications a left join ipc35 b on a.appln_id=b.appln_id where b.appln_id is null;
This one is easy:
weight already
provided in
PATSTAT
S2d: Weight code commented
drop table if exists invgeo0;
drop table if exists invgeo;
create table invgeo0
Select
a.APPLN_ID,
b.PERSON_CTRY_CODE CTRY_CODE,
Count(Distinct b.PERSON_ID) As Count_PERSON_ID
From applications x
Inner Join
tls207_pers_appln a on a.appln_id=x.appln_id
Inner Join tls206_person b On a.PERSON_ID = b.PERSON_ID
Where
a.INVT_SEQ_NR > 0
Group By a.APPLN_ID, b.PERSON_CTRY_CODE;
alter table invgeo0 add index i1(appln_id);
create table invgeo
Select a.APPLN_ID, a.CTRY_CODE, a.Count_PERSON_ID/b.tot_person_id as inv_share
From
invgeo0 a Inner Join
(select appln_id, sum(Count_PERSON_ID)tot_person_id from invgeo0 group by appln_id) b
On a.APPLN_ID = b.APPLN_ID;
alter table invgeo add index i1(appln_id);
insert into invgeo select a.appln_id, '',1
from applications a left join invgeo b on a.appln_id=b.appln_id where b.appln_id is null;
Weight calc in 2
steps
The indicator
Choosing citations
This graph
compares
trends for
backward
citations
counted by
distinct
number of
applications,
publications
or docdb
family cited
(by the
family) for
EP 2007
applications.
Is app and
pub are very
similar;
family are
much
different;
TOP CITED:
996 for
apps, pubs;
3066 for
family
Weight calc in 2
steps
Citations outlier
Note: EP patent cites only 8 publications !!!
Step 3a – appln_id to appln_id
create table c1
SELECT distinct a.APPLN_ID, t12.CITED_PAT_PUBLN_ID
FROM
tls212_citation t12
INNER JOIN tls211_pat_publn t11 ON t11.PAT_PUBLN_ID = t12.PAT_PUBLN_ID
INNER JOIN applications a ON a.APPLN_ID = t11.APPLN_ID
WHERE
t12.CITED_PAT_PUBLN_ID > 0;
Step 3b – appln_id to appln_id
create table c2
SELECT DISTINCT
v.APPLN_ID,
t11.APPLN_ID AS cited_APPLN_ID
FROM c1 AS v
INNER JOIN patstat.tls211_pat_publn t11
ON t11.PAT_PUBLN_ID = v.CITED_PAT_PUBLN_ID;
Step 3c – appln_id - originality
create table apporig
SELECT
v.APPLN_ID,
pow(t30.weight , 2) AS Originality
FROM
c2 AS v
LEFT JOIN patstat.tls230_appln_techn_field t30 ON v.cited_APPLN_ID =
t30.appln_id
GROUP BY
v.APPLN_ID;
update apporig set Originality = 1-floor(Originality*10000)/10000;
insert into apporig select a.appln_id, null
from applications a left join apporig b on a.appln_id=b.appln_id where b.appln_id is
null;  missing data: keep null but could be 0 or…
Step 3c – appln_id - originality
create table apporig
SELECT
v.APPLN_ID,
pow(t30.weight , 2) AS Originality
FROM
c2 AS v
LEFT JOIN patstat.tls230_appln_techn_field t30 ON v.cited_APPLN_ID =
t30.appln_id
GROUP BY
v.APPLN_ID;
update apporig set Originality = 1-floor(Originality*10000)/10000;
insert into apporig select a.appln_id, null
from applications a left join apporig b on a.appln_id=b.appln_id where b.appln_id is
null;  missing data: keep null but could be 0 or…
Step 4 – consistency checks:
Now I should check my intermediate tables are consistent with
starting data:
0) discarded applications have a reason?
1)Are all applications I expect included? [crosscheck with
population]
2)Are there values out of range (>1 or > 0)?
3)Are there applications where the sum of weights is > 1?
4)Is the sum of weight + (ifnull=1) = number of applications?
The final indicator
Originality by country and year…
but how to count it ?
Average? Weighted? … ?
Results:
CTRY
CODE
Avg
Originality
BU 0,9796
VG 0,9764
DZ 0,96
KW 0,7778
DO 0,74655
JM 0,74655
GI 0,72225
PR 0,67385
HN 0,64
MG 0,64
KP 0,63078
MD 0,5991
CU 0,5744
CY 0,533693
SY 0,48
JO 0,449155
SM 0,44445
LA 0,4375
FO 0,4352
EG 0,406137
MY 0,39512
PK 0,38885
LT 0,378275
LV 0,37714
PT 0,368177
ID 0,36801
TH 0,367867
0,365484
JP 0,364953
CA 0,364003
IL 0,358494
US 0,358421 Some normalizations
needed? Suggestions?
The final indicator: thresh 100
SELECT
b.CTRY_CODE, a.priority_year,
Avg(o.Originality) AS Avg_Originality,
Count(DISTINCT o.APPLN_ID) AS Count_APPLN_ID
FROM
minidump.invgeo b
INNER JOIN minidump.applications a ON b.APPLN_ID = a.APPLN_ID
INNER JOIN minidump.apporig o ON o.APPLN_ID = b.APPLN_ID
WHERE
o.Originality IS NOT NULL
GROUP BY
b.CTRY_CODE,
a.priority_year
having Count_APPLN_ID > 100
ORDER BY
Avg_Originality DESC
Positive Consequence:
I have left several tables to
build other indicators with
(invgeo, ipc35 …)
Thank You
Questions?

Weitere ähnliche Inhalte

Ähnlich wie Patstat indicators step by step

The Role Of Software And Hardware As A Common Part Of The...
The Role Of Software And Hardware As A Common Part Of The...The Role Of Software And Hardware As A Common Part Of The...
The Role Of Software And Hardware As A Common Part Of The...Sheena Crouch
 
Fy secondsemester2016
Fy secondsemester2016Fy secondsemester2016
Fy secondsemester2016Ankit Dubey
 
Fy secondsemester2016
Fy secondsemester2016Fy secondsemester2016
Fy secondsemester2016Ankit Dubey
 
Fy secondsemester2016
Fy secondsemester2016Fy secondsemester2016
Fy secondsemester2016Ankit Dubey
 
Data structures using C
Data structures using CData structures using C
Data structures using CPdr Patnaik
 
Ds12 140715025807-phpapp02
Ds12 140715025807-phpapp02Ds12 140715025807-phpapp02
Ds12 140715025807-phpapp02Salman Qamar
 
Ugif 10 2012 ppt0000002
Ugif 10 2012 ppt0000002Ugif 10 2012 ppt0000002
Ugif 10 2012 ppt0000002UGIF
 
Project Management System
Project Management SystemProject Management System
Project Management SystemDivyen Patel
 
Basic programming of 8085
Basic programming of 8085 Basic programming of 8085
Basic programming of 8085 vijaydeepakg
 
PageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop SummitPageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop SummitOfer Mendelevitch
 
PageRank for Anomaly Detection
PageRank for Anomaly DetectionPageRank for Anomaly Detection
PageRank for Anomaly DetectionDataWorks Summit
 
Virtual Instrumentation & LabVIEW-lini.ppt
Virtual Instrumentation & LabVIEW-lini.pptVirtual Instrumentation & LabVIEW-lini.ppt
Virtual Instrumentation & LabVIEW-lini.pptAvinashJain66
 
Design and Implementation of Test Vector Generation using Random Forest Techn...
Design and Implementation of Test Vector Generation using Random Forest Techn...Design and Implementation of Test Vector Generation using Random Forest Techn...
Design and Implementation of Test Vector Generation using Random Forest Techn...IRJET Journal
 

Ähnlich wie Patstat indicators step by step (20)

The Role Of Software And Hardware As A Common Part Of The...
The Role Of Software And Hardware As A Common Part Of The...The Role Of Software And Hardware As A Common Part Of The...
The Role Of Software And Hardware As A Common Part Of The...
 
Fy secondsemester2016
Fy secondsemester2016Fy secondsemester2016
Fy secondsemester2016
 
Fy secondsemester2016
Fy secondsemester2016Fy secondsemester2016
Fy secondsemester2016
 
Fy secondsemester2016
Fy secondsemester2016Fy secondsemester2016
Fy secondsemester2016
 
Data structures using C
Data structures using CData structures using C
Data structures using C
 
Ds12 140715025807-phpapp02
Ds12 140715025807-phpapp02Ds12 140715025807-phpapp02
Ds12 140715025807-phpapp02
 
Anup2
Anup2Anup2
Anup2
 
Ugif 10 2012 ppt0000002
Ugif 10 2012 ppt0000002Ugif 10 2012 ppt0000002
Ugif 10 2012 ppt0000002
 
Project Management System
Project Management SystemProject Management System
Project Management System
 
Real Time Embedded System
Real Time Embedded SystemReal Time Embedded System
Real Time Embedded System
 
Basic programming of 8085
Basic programming of 8085 Basic programming of 8085
Basic programming of 8085
 
Switch Control and Time Delay - Keypad
Switch Control and Time Delay - KeypadSwitch Control and Time Delay - Keypad
Switch Control and Time Delay - Keypad
 
PageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop SummitPageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop Summit
 
PageRank for Anomaly Detection
PageRank for Anomaly DetectionPageRank for Anomaly Detection
PageRank for Anomaly Detection
 
DSD
DSDDSD
DSD
 
LOW COST SCADA SYSTEM FOR EDUCATION
LOW COST SCADA SYSTEM FOR EDUCATIONLOW COST SCADA SYSTEM FOR EDUCATION
LOW COST SCADA SYSTEM FOR EDUCATION
 
EE2356 Microprocessor and Microcontroller Lab Manuel
EE2356 Microprocessor and Microcontroller Lab ManuelEE2356 Microprocessor and Microcontroller Lab Manuel
EE2356 Microprocessor and Microcontroller Lab Manuel
 
Virtual Instrumentation & LabVIEW-lini.ppt
Virtual Instrumentation & LabVIEW-lini.pptVirtual Instrumentation & LabVIEW-lini.ppt
Virtual Instrumentation & LabVIEW-lini.ppt
 
Design and Implementation of Test Vector Generation using Random Forest Techn...
Design and Implementation of Test Vector Generation using Random Forest Techn...Design and Implementation of Test Vector Generation using Random Forest Techn...
Design and Implementation of Test Vector Generation using Random Forest Techn...
 
Cadancesimulation
CadancesimulationCadancesimulation
Cadancesimulation
 

Mehr von Gianluca Tarasconi

Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)Gianluca Tarasconi
 
PATSTAT & Patentsview: complements or substitutes?
PATSTAT & Patentsview: complements or substitutes?PATSTAT & Patentsview: complements or substitutes?
PATSTAT & Patentsview: complements or substitutes?Gianluca Tarasconi
 
Matching PATSTAT to Crunchbase
Matching PATSTAT to CrunchbaseMatching PATSTAT to Crunchbase
Matching PATSTAT to CrunchbaseGianluca Tarasconi
 
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16Gianluca Tarasconi
 
Ep register for patent data analisys
Ep register for patent data analisysEp register for patent data analisys
Ep register for patent data analisysGianluca Tarasconi
 
Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures Gianluca Tarasconi
 
Patstat 7 deadly sin and how to solve them
Patstat 7 deadly sin and how to solve themPatstat 7 deadly sin and how to solve them
Patstat 7 deadly sin and how to solve themGianluca Tarasconi
 
PRS inpadoc legal data reclassification: db structure and some insights
 PRS inpadoc legal data reclassification: db structure and some insights PRS inpadoc legal data reclassification: db structure and some insights
PRS inpadoc legal data reclassification: db structure and some insightsGianluca Tarasconi
 
Trackin patent applicant changes with a temporal database
Trackin patent applicant changes with a temporal databaseTrackin patent applicant changes with a temporal database
Trackin patent applicant changes with a temporal databaseGianluca Tarasconi
 
Sharing names and address cleaning patterns for Patstat
Sharing names and address cleaning patterns for PatstatSharing names and address cleaning patterns for Patstat
Sharing names and address cleaning patterns for PatstatGianluca Tarasconi
 
Patstat and patstat related resources for patent data analisys
Patstat and patstat related resources for patent data analisysPatstat and patstat related resources for patent data analisys
Patstat and patstat related resources for patent data analisysGianluca Tarasconi
 
Patent databases for business intelligence
Patent databases for business intelligencePatent databases for business intelligence
Patent databases for business intelligenceGianluca Tarasconi
 

Mehr von Gianluca Tarasconi (15)

Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
Of Unicorns, Yetis, and Error-Free Datasets (or what is data quality?)
 
PATSTAT & Patentsview: complements or substitutes?
PATSTAT & Patentsview: complements or substitutes?PATSTAT & Patentsview: complements or substitutes?
PATSTAT & Patentsview: complements or substitutes?
 
Matching PATSTAT to Crunchbase
Matching PATSTAT to CrunchbaseMatching PATSTAT to Crunchbase
Matching PATSTAT to Crunchbase
 
PATSTAT users 7 sins
PATSTAT users 7 sinsPATSTAT users 7 sins
PATSTAT users 7 sins
 
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
QUELLO CHE I BREVETTI NON DICONO Aidb 2/12/16
 
Ep register for patent data analisys
Ep register for patent data analisysEp register for patent data analisys
Ep register for patent data analisys
 
Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures Using patstat in universities evaluation procedures
Using patstat in universities evaluation procedures
 
Industria italiana dal 78
Industria italiana dal 78Industria italiana dal 78
Industria italiana dal 78
 
Patenting in the south
Patenting in the southPatenting in the south
Patenting in the south
 
Patstat 7 deadly sin and how to solve them
Patstat 7 deadly sin and how to solve themPatstat 7 deadly sin and how to solve them
Patstat 7 deadly sin and how to solve them
 
PRS inpadoc legal data reclassification: db structure and some insights
 PRS inpadoc legal data reclassification: db structure and some insights PRS inpadoc legal data reclassification: db structure and some insights
PRS inpadoc legal data reclassification: db structure and some insights
 
Trackin patent applicant changes with a temporal database
Trackin patent applicant changes with a temporal databaseTrackin patent applicant changes with a temporal database
Trackin patent applicant changes with a temporal database
 
Sharing names and address cleaning patterns for Patstat
Sharing names and address cleaning patterns for PatstatSharing names and address cleaning patterns for Patstat
Sharing names and address cleaning patterns for Patstat
 
Patstat and patstat related resources for patent data analisys
Patstat and patstat related resources for patent data analisysPatstat and patstat related resources for patent data analisys
Patstat and patstat related resources for patent data analisys
 
Patent databases for business intelligence
Patent databases for business intelligencePatent databases for business intelligence
Patent databases for business intelligence
 

Kürzlich hochgeladen

Indore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfIndore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfSaviRakhecha1
 
20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdfAdnet Communications
 
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsHigh Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdfFinTech Belgium
 
The Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfThe Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfGale Pooley
 
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...Call Girls in Nagpur High Profile
 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptxFinTech Belgium
 
Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...
Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...
Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...ssifa0344
 
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service NashikHigh Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...dipikadinghjn ( Why You Choose Us? ) Escorts
 
The Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfThe Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfGale Pooley
 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfGale Pooley
 
Stock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdfStock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdfMichael Silva
 
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...dipikadinghjn ( Why You Choose Us? ) Escorts
 
Gurley shaw Theory of Monetary Economics.
Gurley shaw Theory of Monetary Economics.Gurley shaw Theory of Monetary Economics.
Gurley shaw Theory of Monetary Economics.Vinodha Devi
 
The Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfThe Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfGale Pooley
 

Kürzlich hochgeladen (20)

Indore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfIndore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdf
 
20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf
 
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsHigh Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
High Class Call Girls Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
 
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
06_Joeri Van Speybroek_Dell_MeetupDora&Cybersecurity.pdf
 
The Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdfThe Economic History of the U.S. Lecture 22.pdf
The Economic History of the U.S. Lecture 22.pdf
 
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx
 
Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...
Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...
Solution Manual for Financial Accounting, 11th Edition by Robert Libby, Patri...
 
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service NashikHigh Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
High Class Call Girls Nashik Maya 7001305949 Independent Escort Service Nashik
 
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
 
The Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfThe Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdf
 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdf
 
Stock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdfStock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdf
 
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
 
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
 
Gurley shaw Theory of Monetary Economics.
Gurley shaw Theory of Monetary Economics.Gurley shaw Theory of Monetary Economics.
Gurley shaw Theory of Monetary Economics.
 
The Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfThe Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdf
 
Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024
 

Patstat indicators step by step

  • 1. PATSTAT: from data to indicators Leuven, 14/9/18 Gianluca Tarasconi Icrios Dba - Univ. Bocconi
  • 2. Introduction: PATSTAT and SQL • Several papers exist providing SQL code for didactic purposes • [Squicciarini, M., Dernis, H., & Criscuolo, C. (2013). Measuring patent quality. / De Rassenfosse, G., Dernis, H., & Boedt, G. (2014). An introduction to the Patstat database with example queries. Australian Economic Review, 47(3), 395-408 / De Rassenfosse et al.. (2017). Getting Started with PATSTAT Register. Australian Economic Review, 50(1), 110-120 … many others ] • Still some steps are often missing when creating subsets and indicators, especially on selection criteria (what patents do I include???), counting methodology, data boundaries (years, geography etc…)
  • 5. Hands on data • Data are extracted from a PATSTAT spring 2018 mini-dump, selecting EP office applications for year 2007 (filing date) and related patents; • A PATSTAT version is available in Oracle for those who want to reply this exercise step by step. • SQL and dump load • https://www.dropbox.com/s/fo1wpqwo1j3buom/sqlleuven 2018.zip?dl=0 • https://bit.ly/2xmb29r
  • 6. STEP 1: selection and output • The population we will analyze are: year 2005 EP filings (PCT excluded) • The output requested is Originality by country and IPC35 • Two questions should arise now…
  • 7. S1: Selection and output – Q&A • Q1,2: what year and what country? • A1: Year stands for priority year (vs filing year) • A2: Country stands for inventor’s country (vs applicant’s country or invention vs ownership country) • A third question could be useful: what kind of patents? • Maybe a last question might be ‘what is IPC35 reclassification’ Schmoch, U. (2008). Concept of a technology classification for country comparisons. Final Report to the World Intellectual Property Organization (WIPO), Fraunhofer Institute for Systems and Innovation Research, Karlsruhe.
  • 8. S1: Selection and output – Q&A 2 • Field appln_kind in TLS201 may contain values: • A patent • U utility model • W PCT application (in the international phase) • T used by some offices for applications which are "translations" of granted PCT or EP applications • P provisional application (US only) • F design patent • V plant patent • D2, D3 artificial applications A kind for EP is the value to select
  • 9. S1: Selection code commented USE minidump; -- table applications drop table if exists applications; -- EP - 2005 Create table applications Select T01.APPLN_ID, T01.APPLN_AUTH, T01.APPLN_NR, T01.APPLN_KIND, EARLIEST_FILING_YEAR as PRIORITY_YEAR, From patstat.tls201_appln T01 where t01.appln_auth = "EP" and earliest_filing_year =2005 ; ALTER TABLE applications ADD INDEX `I1`(`APPLN_ID`); 31.166 applications
  • 10. S1: Selection code exclusions drop table if exists applicationsdropped; -- deletes unpublished -- ******************* create table applicationsdropped Select a.* , "NOPUBBL" as reason From applications a Left Join patstat.tls211_pat_publn b On a.APPLN_ID = b.APPLN_ID where b.appln_id is null; -- deletes 9999 -- ******************* insert into applicationsdropped Select a.* , "9999" as reason From applications a where a.priority_year = 9999; -- deletes non A kind -- ******************* insert into applicationsdropped Select a.* , "NON_A" as reason From applications a where a.appln_kind <>"A"; DELETE A.* from APPLICATIONS A inner join Applicationsdropped B on A.appln_id=B.appln_id; Unpublished priorities Priority 9999 (useless in this case) EP specific: Keep only kind A (patent of invention) Table to store excluded 6.408 distinct applications excluded: 24758 remain
  • 11. STEP 2: weighting dimensions • Indicator has to be calculated over 2 dimensions: invention country and IPC35 • Each application may have multiple ipc35 and multiple country of invention • Several weighting schemes exist: full count, fractional…
  • 12. S2a: fractional weight IPC 35 3 7 7 7 0,25 0,25 0,25 0,25 1 INV IT 0,3333 0,0833 0,0833 0,0833 0,0833 DE 0,3333 0,0833 0,0833 0,0833 0,0833 DE 0,3333 0,0833 0,0833 0,0833 0,0833 0,9999 0.9996 “Fractional” counting, meaning that if a patent has multiple inventors, it will be counted proportionally per inventor: if a patent is allocated to x regions (countries), each involved region (country) receives a count of 1/x patents. The benefit of such a counting scheme is its transparency across different levels of aggregations: for each breakdown, the total patent count adds up to the sum of the counts by region or by country IPC 35 3 7 0.5 0.75 1 INV IT 0.3333 0.1667 0.2500 DE 0.6666 0.3333 0.5000 0,9999 1
  • 13. S2b: files preparation • This step aims to create one table for each dimension with relevant data • - appln_id • - dimension considered • - fractional weight • Two dimension involved: inventor’s country and IPC35 • Table names: IPC35 and INVGEO
  • 14. S2c: what to do with missing??? • Always consider some data are missing so if you want to have consistent data you should also append data that exist in applications table but have no IPC or inventor… • This is a left join like insert into ipc35 select a.appln_id, -1,1 from applications a left join ipc35 b on a.appln_id=b.appln_id where b.appln_id is null;
  • 15. S2d: Weight code commented use minidump; drop table if exists ipc35; create table ipc35 SELECT b.appln_id, b.techn_field_nr AS ipcclass, b.weight FROM tls230_appln_techn_field b INNER JOIN applications a ON a.APPLN_ID = b.appln_id; alter table ipc35 add index i1(appln_id); insert into ipc35 select a.appln_id, -1,1 from applications a left join ipc35 b on a.appln_id=b.appln_id where b.appln_id is null; This one is easy: weight already provided in PATSTAT
  • 16. S2d: Weight code commented drop table if exists invgeo0; drop table if exists invgeo; create table invgeo0 Select a.APPLN_ID, b.PERSON_CTRY_CODE CTRY_CODE, Count(Distinct b.PERSON_ID) As Count_PERSON_ID From applications x Inner Join tls207_pers_appln a on a.appln_id=x.appln_id Inner Join tls206_person b On a.PERSON_ID = b.PERSON_ID Where a.INVT_SEQ_NR > 0 Group By a.APPLN_ID, b.PERSON_CTRY_CODE; alter table invgeo0 add index i1(appln_id); create table invgeo Select a.APPLN_ID, a.CTRY_CODE, a.Count_PERSON_ID/b.tot_person_id as inv_share From invgeo0 a Inner Join (select appln_id, sum(Count_PERSON_ID)tot_person_id from invgeo0 group by appln_id) b On a.APPLN_ID = b.APPLN_ID; alter table invgeo add index i1(appln_id); insert into invgeo select a.appln_id, '',1 from applications a left join invgeo b on a.appln_id=b.appln_id where b.appln_id is null; Weight calc in 2 steps
  • 18. Choosing citations This graph compares trends for backward citations counted by distinct number of applications, publications or docdb family cited (by the family) for EP 2007 applications. Is app and pub are very similar; family are much different; TOP CITED: 996 for apps, pubs; 3066 for family Weight calc in 2 steps
  • 19. Citations outlier Note: EP patent cites only 8 publications !!!
  • 20. Step 3a – appln_id to appln_id create table c1 SELECT distinct a.APPLN_ID, t12.CITED_PAT_PUBLN_ID FROM tls212_citation t12 INNER JOIN tls211_pat_publn t11 ON t11.PAT_PUBLN_ID = t12.PAT_PUBLN_ID INNER JOIN applications a ON a.APPLN_ID = t11.APPLN_ID WHERE t12.CITED_PAT_PUBLN_ID > 0;
  • 21. Step 3b – appln_id to appln_id create table c2 SELECT DISTINCT v.APPLN_ID, t11.APPLN_ID AS cited_APPLN_ID FROM c1 AS v INNER JOIN patstat.tls211_pat_publn t11 ON t11.PAT_PUBLN_ID = v.CITED_PAT_PUBLN_ID;
  • 22. Step 3c – appln_id - originality create table apporig SELECT v.APPLN_ID, pow(t30.weight , 2) AS Originality FROM c2 AS v LEFT JOIN patstat.tls230_appln_techn_field t30 ON v.cited_APPLN_ID = t30.appln_id GROUP BY v.APPLN_ID; update apporig set Originality = 1-floor(Originality*10000)/10000; insert into apporig select a.appln_id, null from applications a left join apporig b on a.appln_id=b.appln_id where b.appln_id is null;  missing data: keep null but could be 0 or…
  • 23. Step 3c – appln_id - originality create table apporig SELECT v.APPLN_ID, pow(t30.weight , 2) AS Originality FROM c2 AS v LEFT JOIN patstat.tls230_appln_techn_field t30 ON v.cited_APPLN_ID = t30.appln_id GROUP BY v.APPLN_ID; update apporig set Originality = 1-floor(Originality*10000)/10000; insert into apporig select a.appln_id, null from applications a left join apporig b on a.appln_id=b.appln_id where b.appln_id is null;  missing data: keep null but could be 0 or…
  • 24. Step 4 – consistency checks: Now I should check my intermediate tables are consistent with starting data: 0) discarded applications have a reason? 1)Are all applications I expect included? [crosscheck with population] 2)Are there values out of range (>1 or > 0)? 3)Are there applications where the sum of weights is > 1? 4)Is the sum of weight + (ifnull=1) = number of applications?
  • 25. The final indicator Originality by country and year… but how to count it ? Average? Weighted? … ?
  • 26. Results: CTRY CODE Avg Originality BU 0,9796 VG 0,9764 DZ 0,96 KW 0,7778 DO 0,74655 JM 0,74655 GI 0,72225 PR 0,67385 HN 0,64 MG 0,64 KP 0,63078 MD 0,5991 CU 0,5744 CY 0,533693 SY 0,48 JO 0,449155 SM 0,44445 LA 0,4375 FO 0,4352 EG 0,406137 MY 0,39512 PK 0,38885 LT 0,378275 LV 0,37714 PT 0,368177 ID 0,36801 TH 0,367867 0,365484 JP 0,364953 CA 0,364003 IL 0,358494 US 0,358421 Some normalizations needed? Suggestions?
  • 27. The final indicator: thresh 100 SELECT b.CTRY_CODE, a.priority_year, Avg(o.Originality) AS Avg_Originality, Count(DISTINCT o.APPLN_ID) AS Count_APPLN_ID FROM minidump.invgeo b INNER JOIN minidump.applications a ON b.APPLN_ID = a.APPLN_ID INNER JOIN minidump.apporig o ON o.APPLN_ID = b.APPLN_ID WHERE o.Originality IS NOT NULL GROUP BY b.CTRY_CODE, a.priority_year having Count_APPLN_ID > 100 ORDER BY Avg_Originality DESC
  • 28. Positive Consequence: I have left several tables to build other indicators with (invgeo, ipc35 …)