The EPO has recently launched a new product: Linked open EP data. This open and free data set contains bibliographic information on EP publications and the Cooperative Patent Classification (CPC) hierarchy. Linked data, also known as the Semantic Web, makes it easier to combine a particular data set with other linked data sets in any domain, including patents. Given its URI, data about a resource, e.g. a patent publication, can be retrieved over the web in a variety of formats. For occasional use there are a simple data browser, an API and a SPARQL query interface. For heavier use, bulk data is available for download.
In this presentation we will introduce this new EPO product and illustrate the different ways this data can be inspected and retrieved. We will explore the content and point out potential use scenarios.
3. European Patent Office
EPO Vienna: Patent publication
Munich (Headquarters)
The Hague
Berlin
Vienna
Brussels (Liaison office with the EU)
4. European Patent Office
EPO's PI quality criteria mantra
▪ Completeness
▪ Accuracy
▪ Timeliness
▪ Usability by as many people as possible
5. European Patent Office
The past of EPO’s PI distribution
▪ Microfiche
▪ Punch cards
▪ Floppy disks
▪ Laser disk
▪ Tapes
▪ Tape cartridges
▪ CDs & DVDs
6. European Patent Office
The present of EPO’s PI distribution
Human access
European Patent Register
European Publication Server
Espacenet
EP full-text search
EP Bulletin search
Global Patent Index
PATSTAT Online
Global Dossier
Common Citation Document
Computer access
Web services
Open Patent Services
European Publication Server
Data products
EP (EBD, XML, PDF/A)
worldwide (DOCDB, INPADOC)
PATSTAT data
7. European Patent Office
The future of EPO’s PI distribution?
Source: http://5stardata.info/
Tim Berners-Lee 5 star Open Data plan
8. European Patent Office
From Web of Documents
to Web of Data (Semantic Web, Linked Data)
If HTML and the Web made all the online
documents look like one huge book,
RDF, schema and inference languages will
make all the data in the world look like one
huge database.
Tim Berners-Lee, Weaving the Web, Orion Publishing Group, UK, 1999
9. European Patent Office
HTTP names as unique identifiers
All business objects (called resources) will get an HTTP name (URI)
as a globally unique identifier.
In any Internet browser, each HTTP name will return some useful
data in a standard format about that resource. It can also return
relationships to other resources using their HTTP names.
Application identifier
http://data.epo.org/linked-data/id/application/EP/98925243
Publication identifier
http://data.epo.org/linked-data/data/publication/EP/1010425/A2
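The two identifiers above follow a visible pattern. The following is a minimal sketch, assuming that this pattern holds for other EP numbers and that the server honours the HTTP `Accept` header when choosing a return format. The helper names (`application_uri`, `publication_uri`, `prepare_request`) are illustrative and not part of any EPO interface. The request is only prepared, never sent.

```python
from urllib.request import Request

# Base taken from the identifiers shown on this slide.
BASE = "http://data.epo.org/linked-data"


def application_uri(authority: str, number: str) -> str:
    """Build an application identifier following the slide's example pattern."""
    return f"{BASE}/id/application/{authority}/{number}"


def publication_uri(authority: str, number: str, kind: str) -> str:
    """Build a publication identifier following the slide's example pattern."""
    return f"{BASE}/data/publication/{authority}/{number}/{kind}"


def prepare_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for one format.

    Assumption: the server selects the format from the Accept header.
    """
    return Request(uri, headers={"Accept": media_type})


app = application_uri("EP", "98925243")
pub = publication_uri("EP", "1010425", "A2")
req = prepare_request(pub)

print(app)
print(pub)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual retrieval; that step is left out here so the sketch runs without network access.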
10. European Patent Office
Linked data: Just a (huge) collection of very simple facts
Our patent world
inventor
name: W. Kosman
living in: NL
nr: 1000000
office: EPO
Linked Data model
http://data.epo.org/.../EP/1000000/A1 is a publication.
http://data.epo.org/.../EP/1000000/A1 publicationNumber "1000000".
http://data.epo.org/.../EP/1000000/A1 publicationAuthority "EP".
http://data.../vc/C9B6819....6B is a person.
http://data.../vc/C9B6819....6B fn "Kosman, W.".
http://data.../vc/C9B6819....6B countryCode "NL".
http://data.epo.org/.../EP/1000000/A1 has inventor http://data.../vc/C9B6819....6B
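To make the "collection of very simple facts" idea concrete, the statements above can be held as plain (subject, predicate, object) tuples and queried by pattern. This is only a sketch: the identifiers are copied verbatim from the slide, elisions included, and act as opaque labels; the function names `match` and `inventor_names` are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Copied verbatim from the slide, including the "..." elisions.
# They are used as opaque labels here, not as resolvable URIs.
PUB = "http://data.epo.org/.../EP/1000000/A1"
PERSON = "http://data.../vc/C9B6819....6B"

FACTS: List[Triple] = [
    (PUB, "is a", "publication"),
    (PUB, "publicationNumber", "1000000"),
    (PUB, "publicationAuthority", "EP"),
    (PERSON, "is a", "person"),
    (PERSON, "fn", "Kosman, W."),
    (PERSON, "countryCode", "NL"),
    (PUB, "has inventor", PERSON),
]


def match(facts: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for triple in facts:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def inventor_names(facts: List[Triple], publication: str) -> List[str]:
    """Follow 'has inventor' links, then read each inventor's 'fn' value."""
    names: List[str] = []
    for _, _, person in match(facts, s=publication, p="has inventor"):
        names.extend(name for _, _, name in match(facts, s=person, p="fn"))
    return names


names = inventor_names(FACTS, PUB)
print(names)
```

Answering "who invented this publication?" takes two pattern matches joined on the person identifier, which is the same join a SPARQL engine performs over the full data set.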
11. European Patent Office
All EP patents and all other related applications / publications
▪ “related” includes
− international applications
− priorities
− applications in same DOCDB family
− cited documents
▪ bibliographic data of EPs;
basic data for non-EPs
▪ references to full text in EPO’s official Publication Server
▪ weekly update
Content: EP patents
[Diagram labels: International application, Priority application, Application x, EP application, Citation, Cited patent, Simple family]
12. European Patent Office
Content: CPC Cooperative Patent Classification
CPC hierarchy
▪ Most interesting data elements
▪ Linked to EP data set
▪ Recent CPC version
[Diagram labels: EP application, CPC symbol, broader CPC, narrower CPC (several)]
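The broader/narrower links of the CPC hierarchy can be walked like any parent/child structure. The sketch below assumes a small hand-written excerpt of the hierarchy held in a dictionary; in the actual data set these links are triples between CPC resources, and the function names here are hypothetical.

```python
from typing import Dict, List

# Illustrative excerpt only: child symbol -> its broader (parent) symbol.
BROADER: Dict[str, str] = {
    "H04L9/32": "H04L9/00",
    "H04L9/00": "H04L",
    "H04L": "H04",
    "H04": "H",
}


def ancestors(symbol: str, broader: Dict[str, str]) -> List[str]:
    """Return all broader symbols, nearest first, up to the top of the hierarchy."""
    chain: List[str] = []
    seen = {symbol}
    while symbol in broader:
        symbol = broader[symbol]
        if symbol in seen:  # guard against accidental cycles in the input
            raise ValueError(f"cycle detected at {symbol}")
        seen.add(symbol)
        chain.append(symbol)
    return chain


def narrower(symbol: str, broader: Dict[str, str]) -> List[str]:
    """Return the direct narrower symbols of a given symbol."""
    return sorted(child for child, parent in broader.items() if parent == symbol)


path = ancestors("H04L9/32", BROADER)
children = narrower("H04L9/00", BROADER)
print(path)
print(children)
```

Because every EP application is linked to its CPC symbols, such a walk lets a query widen from one symbol to the whole branch above or below it.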
13. European Patent Office
Data model: High level overview
EP application
EP publication
Simple Family
Citation
CPC
Applicant
Inventor
Agent
Non-EP applications
As published
IPC
KR patent data
As updated/corrected
External data
14. European Patent Office
Open data license CC BY 4.0
▪ Standard license, not handcrafted
▪ No costs, no registration
▪ May be shared, copied, redistributed in any medium or format
▪ May be adapted, remixed, transformed
▪ For any purpose, even commercially
▪ Attribution required
15. European Patent Office
Access: https://data.epo.org/linked-data
Fair use policy
▪ GUI gives access to a reference data service
▪ Data browser and SPARQL endpoint are for occasional use:
exploration, trying out new ideas, ...
▪ For production use: data must be downloaded
16. European Patent Office
API – Interactive features
Simple browser for
data exploration
▪ Nice presentation of
resources
▪ Click to change
focus
17. European Patent Office
API – Parameterized URIs
Linked data API (LDA)
▪ Retrieve one resource or a list of
resources
▪ Filter
▪ Sort
▪ Define return format
▪ Custom views
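A parameterized URI of this kind is an ordinary URL with a query string. The sketch below shows how filter, sort, format and view choices could be encoded. The endpoint path and the property name `publicationDate` as a filter are hypothetical, and the reserved parameter names (`_sort`, `_pageSize`, `_view`, `_format`) are taken from the open Linked Data API specification; whether the EPO service accepts each of them is an assumption.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical list endpoint, used only to illustrate the URL shape.
LIST_ENDPOINT = "https://data.epo.org/linked-data/doc/publication"


def lda_url(endpoint: str,
            filters: Optional[Dict[str, str]] = None,
            sort: Optional[str] = None,
            page_size: Optional[int] = None,
            view: Optional[str] = None,
            fmt: Optional[str] = None) -> str:
    """Assemble a parameterized URI from filter, sort, view and format choices."""
    params: Dict[str, str] = dict(filters or {})
    if sort:
        params["_sort"] = sort            # a leading "-" requests descending order
    if page_size is not None:
        params["_pageSize"] = str(page_size)
    if view:
        params["_view"] = view
    if fmt:
        params["_format"] = fmt
    return f"{endpoint}?{urlencode(params)}"


url = lda_url(
    LIST_ENDPOINT,
    filters={"publicationDate": "2008-11-26"},  # hypothetical filter property
    sort="-publicationDate",
    page_size=10,
    fmt="json",
)
print(url)

# Round-trip check: the query string decodes back to the chosen parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Keeping the whole request in the URL means a list view can be bookmarked, shared or fetched with any HTTP client.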
18. European Patent Office
SPARQL queries
Powerful query language
▪ for RDF graphs
▪ for heterogeneous data
sets
▪ to explore data
▪ to explore structure
(meta-data)
▪ Solr text index
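As an illustration of exploring the data with SPARQL, the sketch below prepares a query for the ten most recent publications and wraps it in an HTTP GET as defined by the SPARQL 1.1 Protocol (the query travels in the `query` parameter). Two things are assumptions and not confirmed by these slides: the endpoint address and the namespace behind the `patent:` prefix. The request is prepared but not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumption: endpoint location. The slides only give https://data.epo.org/linked-data.
SPARQL_ENDPOINT = "https://data.epo.org/linked-data/query"

# Assumption: the namespace IRI bound to the "patent:" prefix.
QUERY = """\
PREFIX patent: <http://data.epo.org/linked-data/def/patent/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?publication ?label ?date
WHERE {
  ?publication patent:publicationDate ?date ;
               rdfs:label ?label .
}
ORDER BY DESC(?date)
LIMIT 10
"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol query via HTTP GET."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_request(SPARQL_ENDPOINT, QUERY)
sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(request.get_method())
print(sent_query)
```

The `LIMIT` clause keeps the query within the spirit of occasional, exploratory use; larger analyses are better run against a local copy of the bulk download.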
19. European Patent Office
Nevertheless: it is pure data
<http://data.epo.org/linked-data/data/publication/EP/1676702/B1/->
rdfs:label "EP 1676702 B1" ;
patent:application <http://data.epo.org/linked-data/id/application/EP/05027699> ;
patent:publicationAuthority <http://data.epo.org/linked-data/id/st3/EP> ;
patent:publicationDate "2008-11-26"^^xsd:date ;
patent:publicationKind patent:publicationKind_B1 .
patent:publicationKind_B1 rdfs:label "B1"@en .
<http://data.epo.org/linked-data/id/application/EP/01945281>
patent:applicationNumber "01945281" .
patent:publicationKind_A1 rdfs:label "A1"@en .
20. European Patent Office
Download
▪ about 650 million triples
▪ about 60 GB (N-Triples format)
▪ Updated weekly
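A download of this size is best processed as a stream, one line at a time, which the line-based N-Triples format allows. The sketch below counts how often each predicate occurs. It uses a simplified line pattern, not a full N-Triples parser, and the sample lines are adapted from the publication data shown earlier; the expanded `patent:` namespace in them is an assumption.

```python
import io
import re
from collections import Counter
from typing import Iterable

# Simplified pattern: IRI or blank-node subject, IRI predicate, and everything
# up to the terminating " ." as the object. Not a complete N-Triples grammar.
LINE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def predicate_counts(lines: Iterable[str]) -> Counter:
    """Count predicate usage while reading the input one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"not a recognised N-Triples line: {line!r}")
        counts[m.group(2)] += 1
    return counts


# Sample lines; the patent namespace IRI is an assumption.
SAMPLE = """\
<http://data.epo.org/linked-data/data/publication/EP/1676702/B1/-> <http://www.w3.org/2000/01/rdf-schema#label> "EP 1676702 B1" .
<http://data.epo.org/linked-data/data/publication/EP/1676702/B1/-> <http://data.epo.org/linked-data/def/patent/publicationDate> "2008-11-26"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://data.epo.org/linked-data/id/application/EP/01945281> <http://data.epo.org/linked-data/def/patent/applicationNumber> "01945281" .
"""

counts = predicate_counts(io.StringIO(SAMPLE))
for predicate, n in counts.most_common():
    print(n, predicate)
```

For the real download, the same function can be fed an open file object in place of `io.StringIO`, so memory use stays flat regardless of file size.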
22. European Patent Office
Benefits of linked data for data consumers
▪ Very simple data format: “triples”
▪ Re-use of established ontologies (classes, properties)
▪ Infrastructure and standards already exist:
The Web and various W3C recommendations
Less "data friction" when combining different data sets
Target group: Data scientists, web developers, ...
23. European Patent Office
Linked Open Data cloud
[Figure: snapshots of the cloud diagram from 2008, 2009, 2010, 2011, 2014 and 2017]
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
24. European Patent Office
Patent information can contribute to other domains
Academic journals
Image collections
Geographical records
Telephone directories
Court decisions
Standards
Trade mark data
Company registers
Library of Congress
National patent data
Dictionaries and encyclopaedias
Economic data
National statistics
University libraries
Annual reports
Technical magazines
Classification data
Government subsidies
Patent data
25. European Patent Office
Thank you for your attention!
Questions?
Martin Kracker
European Patent Office
Directorate Publication
mkracker@epo.org