SlideShare ist ein Scribd-Unternehmen logo
1 von 13
Downloaden Sie, um offline zu lesen
A document-inspired way for
tracking changes of RDF data
The case of the OpenCitations Corpus
Silvio Peroni, David Shotton, Fabio Vitali
1st Drift-a-LOD Workshop: Detection, Representation and
Management of Concept Drift in Linked Open Data

Bologna, Italy, November 20, 2016
Paper: https://w3id.org/oc/paper/occ-driftalod2016.html
The Venice analogy
• Island = 

scholarly publication
• Bridge = citation
• Current situation:
– local travel to
the next island
is permitted
– unrestricted
travel over the
entire network
of bridges
requires an
expensive
season ticket
– general
populace is
excluded
https://w3id.org/oc/paper/the-venice-analogy.html
Opening the bridges
• What – Citation data are one of the main tools used by
researchers to gain knowledge about particular topics, and
they also serve institutional goals, for example in research
assessment
• Problem – The most authoritative databases of citation data,
Scopus and Web of Science, can only be accessed by paying
significant annual access fees
– The University of Bologna pays about 6,000,000 euros per year for
accessing to digital bibliographic resources
• Solution – To create a citation database that freely and legally
makes available citation data in an open repository to assist
scholars with their academic studies and serve knowledge to
the wider public
OpenCitations
• The OpenCitations Project aims at creating an open repository of scholarly
citation data – the OpenCitations Corpus (OCC) – made available under a
Creative Commons public domain dedication to provide in RDF accurate citation
information (bibliographic references) harvested from the scholarly literature
– All scripts are released with Open Source ISC Licence and available on GitHub at
http://github.com/essepuntato/opencitations
• Currently processing papers available in the PubMedCentral Open Access subset
• As of November 20, 2016 the OCC contains 2,076,645 citation links
• Six distinct kinds of bibliographic entities
– bibliographic resources (citing/cited articles, journals, books, proceedings, etc.)
– resource embodiments (format information about bibliographic resources)
– bibliographic entries (literal textual entries occurring in the reference lists)
– responsible agents (agents having certain roles with respect to the bibliographic
resources)
– agent roles (author, editor, publisher);
– identifiers (DOI, ORCID, PubMedID, URL, etc.)
http://opencitations.net
Ingestion workflow
BEE
EuropeanPubMedCentralProcessor
Parsing the
XML source of
PubMed Central
Open Access
articles.
1
SPACIN
Producing
JSON with DOI
and bib entries.
{

"doi": "10.1590/1414-431x20154655", 

"localid": "MED-26577845", 

"curator": "BEE EuropeanPubMedCentralProcessor", 

"source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4678653/
fullTextXML",
"source_provider": "Europe PubMed Central", 

"pmid": "26577845", 

"pmcid": “PMC4678653",

"references": [

{

"bibentry": "Wenger, NK. Coronary heart disease: an older woman's major
health risk, BMJ, 1997, 315, 1085, 1090, DOI: 10.1136/bmj.315.7115.1085, PMID:
9366743", 

"pmid": "9366743", 

"doi": "10.1136/bmj.315.7115.1085", 

"pmcid": "PMC2127693", 

"process_entry": "True"

} …
]
}
2
For each citing/cited resource,
if an ID (DOI, PMID, PMCID) is
specied check if the resource
exists already. If it does go to 5.
store
ResourceFinder
3
GraphSet
ProvSet
DatasetHandler

Storer
Load all the statements onthe triplestore and storethem in the le system for
easy recovering.
OCC
6
If the resource doesn’t exist,
extract possible IDs from the entry
and query CrossRef and ORCID.
CrossRefProcessor

ORCIDProcessor
4
GraphEntity
New metadata resources are created.
If CrossRef/ORCID returned something, all
the related metadata will be used,
otherwise only basic metadata (IDs and
entries) will be added.
5
Issues with data
• Automatic workflow built upon external services
– Efficient, but no human check of the data extracted
– Some errors could be propagated
• Data do change in time
– Information can be incomplete (e.g. citations added
in another iteration of the ingestion workflow)
– Information can be wrong (e.g. circular citations –
paper A cites paper A)
– Information can be ambiguous (e.g. author
disambiguation)
Document-inspired data drift
• Inspiration from the Document Engineering domain
– Well-known structure for keeping track of changes in word-
processor documents, e.g. OpenOffice Writer
✦ New content added directly within the existing text and marked in
some way
✦ Removed context moved out from the actual content of the document
and placed in an auxiliary space for easy retrieving and restoration
– Two basic operations (add & remove) are enough for keeping
track of document changes
• Solution for RDF data: using PROV-O + SPARQL
UPDATE (INSERT DATA and DELETE DATA only) for
keeping track of the way entities change in time
The approach
:sp a foaf:Person ;
foaf:name "Silvio Peroni" .
:sp a prov:Entity .
:sp-snapshot-1 a prov:Entity ;
prov:specializationOf :sp .
Data Provenance
:sp a foaf:Person ;
foaf:givenName "Silvio" ;
foaf:familyName "Peroni" .
:sp-snapshot-2 a prov:Entity ;
prov:specializationOf :sp ;
prov:wasDerivedFrom :sp-snapshot-1 ;
new:hasUpdateQuery
"
" .
INSERT DATA {
:sp foaf:givenName 'Silvio' ;
foaf:familyName 'Peroni' } ;
DELETE DATA {
:sp foaf:name 'Silvio Peroni' }
Time
T1
T2
A snapshot records the composition
of the entity it specialises (i.e. the set
of statements using such entity as
subject) at a fixed point in time
Advantages
• Easy to retrieve the current statements of an entity, since they
are those currently available in the dataset
• It is possible to restore the entity to a certain snapshot si by
applying the inverse operations (i.e. deletions instead of
insertions and vice versa) of all the update queries from the most
recent snapshot sn to si+1
– For instance, to get back to the status recorded by the first snapshot of
the previous example, we have to run all the inverse operations of the
update query specified in the second snapshot:



INSERT DATA { :sp foaf:name 'Silvio Peroni' } ;

DELETE DATA { 

:sp 

foaf:givenName 'Silvio' ; 

foaf:familyName 'Peroni' }
Implementation in the OCC
• We use:
– PROV-O
– PROV-DC, an extension of PROV-O mapping it with DC
– OpenCitations Ontology (OCO), which defines oco:hasUpdateQuery
• Each entity in the OCC tracks provenance information about:
– snapshot of entity metadata (prov:Entity), a particular snapshot recording the
metadata associated with an individual entity at a particular time
– curatorial activity (prov:Activity), a curatorial activity relating to that entity
✦ creation (prov:Create), the activity of creating a new entity with statements
✦ modification (prov:Modify), the activity of adding/removing statements of an entity
✦ merging (prov:Replace), the activity of unifying the statements relating to two entites
– provenance agent (prov:Agent), a person, organisation or process, that is
involved in some way in the creation of an entity (e.g. Crossref)
– curatorial role (prov:Association), a particular role held by a provenance
agent with respect to a curatorial activity (e.g. OCC curator, metadata source)
An example
br:525205
a fabio:Expression , fabio:JournalArticle ;
dcterms:title "The Electronic Patient ..." ;
datacite:hasIdentifier
id:816997 , id:816998 , ... ;
fabio:hasPublicationYear "2016"^^xsd:gYear ;
pro:isDocumentContextFor
ar:1591190 , ar:1591191 , ... ;
frbr:embodiment re:217773 ;
frbr:part be:727446 , be:727447 , ... .
se:1 a prov:Entity ;
prov:generatedAtTime "2016-08-08T22:25:48"^^xsd:dateTime ;
prov:hadPrimarySource
<http://api.crossref.org/works/10.2196/mhealth.5331> ;
prov:specializationOf br:525205 ;
prov:wasGeneratedBy ca:1 .
ca:1 a prov:Activity, prov:Create ;
dcterms:description
"The entity 'https://w3id.org/oc/corpus/br/525205'
has been created." ;
prov:qualifiedAssociation cr:1 , cr:2 .
cr:1 a prov:Association ;
prov:agent pa:1 ;
prov:hadRole oco:occ-curator .
pa:1 a prov:Agent ;
foaf:name "SPACIN CrossrefProcessor" . …
Data Provenance
br:525205
cites:cites br:1095420 , br:1095421 , ... ;
frbr:part be:727446 , be:727447 , ... .
se:1
prov:invalidatedAtTime "2016-08-29T22:42:06"^^xsd:dateTime ;
prov:wasInvalidatedBy ca:2 .
se:2 a prov:Entity ;
prov:generatedAtTime "2016-08-29T22:42:06"^^xsd:dateTime ;
prov:hadPrimarySource <http://www.ebi.ac.uk/europepmc/
webservices/rest/PMC4911509/fullTextXML> ;
prov:specializationOf br:525205 ;
prov:wasDerivedFrom se:1 ;
prov:wasGeneratedBy ca:2 ;
oco:hasUpdateQuery "INSERT DATA {
GRAPH <https://w3id.org/oc/corpus/br/> {
br:525205
cito:cites br:1095459 , br:525205 , ... ;
frbr:part be:727491 , be:727452 , ... } }" . …
Time
T1
T2
Conclusions
• Approach for keeping track of changes in RDF data inspired
by existing implementations in the Document Engineering
domain
– PROV-O
– SPARQL UPDATE (INSERT/DELETE DATA)
• Implementation: provenance data in the OpenCitations Corpus
– PROV-DC
– OpenCitations Ontology
• Future works
– automatic tools for snapshot restoration
– handling non-automatic modification to the OpenCitations Corpus
(e.g. curation by humans)
Silvio Peroni
silvio.peroni@unibo.it - @essepuntato
Digital And Semantic Publishing Laboratory
Department of Computer Science and Engineering
Alma Mater Studiorum – Università di Bologna
Bologna, Italy
dasplab.cs.unibo.it – www.unibo.it

Weitere ähnliche Inhalte

Was ist angesagt?

鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107皓仁 柯
 
Multilingual presentation ifla 2013 08-19
Multilingual presentation ifla 2013 08-19Multilingual presentation ifla 2013 08-19
Multilingual presentation ifla 2013 08-19Janifer Gatenby
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemHerbert Van de Sompel
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Herbert Van de Sompel
 
How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)Charleston Conference
 
Open Annotation Model
Open Annotation ModelOpen Annotation Model
Open Annotation ModelPaolo Ciccarese
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueHerbert Van de Sompel
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiativeHerbert Van de Sompel
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTHerbert Van de Sompel
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
 
A Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordA Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordHerbert Van de Sompel
 
Best Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web ArchivingBest Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web ArchivingOCLC
 
Interoperability for web based scholarship
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarshipHerbert Van de Sompel
 
Web-scale Discovery Implementation with the End User in Mind (SLA 2012)
Web-scale Discovery Implementation with the End User in Mind (SLA 2012)Web-scale Discovery Implementation with the End User in Mind (SLA 2012)
Web-scale Discovery Implementation with the End User in Mind (SLA 2012)Rafal Kasprowski
 
Open Annotation Collaboration Introduction
Open Annotation Collaboration IntroductionOpen Annotation Collaboration Introduction
Open Annotation Collaboration IntroductionTimothy Cole
 
EDS Web-scale Panel (Preprint), 2012 Charleston Conference
EDS Web-scale Panel (Preprint), 2012 Charleston ConferenceEDS Web-scale Panel (Preprint), 2012 Charleston Conference
EDS Web-scale Panel (Preprint), 2012 Charleston ConferenceRafal Kasprowski
 

Was ist angesagt? (20)

鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107鏈結資料在圖書館的應用20131107
鏈結資料在圖書館的應用20131107
 
Multilingual presentation ifla 2013 08-19
Multilingual presentation ifla 2013 08-19Multilingual presentation ifla 2013 08-19
Multilingual presentation ifla 2013 08-19
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication System
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 
How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)How Libraries Use Publisher Metadata Redux (Steven Shadle)
How Libraries Use Publisher Metadata Redux (Steven Shadle)
 
Signposting Overview
Signposting OverviewSignposting Overview
Signposting Overview
 
Open Annotation Model
Open Annotation ModelOpen Annotation Model
Open Annotation Model
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning Issue
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiative
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
A Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordA Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly Record
 
Best Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web ArchivingBest Practices for Descriptive Metadata for Web Archiving
Best Practices for Descriptive Metadata for Web Archiving
 
Interoperability for web based scholarship
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarship
 
Web-scale Discovery Implementation with the End User in Mind (SLA 2012)
Web-scale Discovery Implementation with the End User in Mind (SLA 2012)Web-scale Discovery Implementation with the End User in Mind (SLA 2012)
Web-scale Discovery Implementation with the End User in Mind (SLA 2012)
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Open Annotation Collaboration Introduction
Open Annotation Collaboration IntroductionOpen Annotation Collaboration Introduction
Open Annotation Collaboration Introduction
 
EDS Web-scale Panel (Preprint), 2012 Charleston Conference
EDS Web-scale Panel (Preprint), 2012 Charleston ConferenceEDS Web-scale Panel (Preprint), 2012 Charleston Conference
EDS Web-scale Panel (Preprint), 2012 Charleston Conference
 

Andere mochten auch

MediatizaciĂłn e interracciones sociales
MediatizaciĂłn e interracciones socialesMediatizaciĂłn e interracciones sociales
MediatizaciĂłn e interracciones sociales7121988
 
Tipo de datos de oracle base datos
Tipo de datos de oracle base datosTipo de datos de oracle base datos
Tipo de datos de oracle base datosPaul Vega
 
Clase tatiana adriana
Clase tatiana adrianaClase tatiana adriana
Clase tatiana adrianatatiana-vasquez
 
Vaidas Morkevičius - Open Data for Enhancing the Quality of Research
Vaidas Morkevičius - Open Data for Enhancing the Quality of ResearchVaidas Morkevičius - Open Data for Enhancing the Quality of Research
Vaidas Morkevičius - Open Data for Enhancing the Quality of ResearchAidis Stukas
 
Saulius GraĹžulis The Crystalography Open Database
Saulius GraĹžulis  The Crystalography Open DatabaseSaulius GraĹžulis  The Crystalography Open Database
Saulius GraĹžulis The Crystalography Open DatabaseAidis Stukas
 
Manual empresas eco turisticas
Manual empresas eco turisticasManual empresas eco turisticas
Manual empresas eco turisticasS. Marlene Ayala
 
Comunicacion interactiva christopher
Comunicacion interactiva christopherComunicacion interactiva christopher
Comunicacion interactiva christopherChristopher M. D' Hoy
 
Docker Mentor Week 2016 - Medan
Docker Mentor Week 2016 - MedanDocker Mentor Week 2016 - Medan
Docker Mentor Week 2016 - MedanAlbert Suwandhi
 
Presentazione di Gruppo Vectorys
Presentazione di Gruppo VectorysPresentazione di Gruppo Vectorys
Presentazione di Gruppo VectorysBechir Added
 
Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt
Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt
Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt Aidis Stukas
 
MuĂąoz ximena camstudio
MuĂąoz ximena camstudioMuĂąoz ximena camstudio
MuĂąoz ximena camstudioXimena MuĂąoz
 
SesiĂłn informativa a
SesiĂłn informativa aSesiĂłn informativa a
SesiĂłn informativa ajacke04
 
PresentaciĂłn de escuela indigena
PresentaciĂłn de escuela indigenaPresentaciĂłn de escuela indigena
PresentaciĂłn de escuela indigenaJazmin Juarez
 
Clase tatiana adriana
Clase tatiana adrianaClase tatiana adriana
Clase tatiana adrianatatiana-vasquez
 
Clase tatiana adriana
Clase tatiana adrianaClase tatiana adriana
Clase tatiana adrianatatiana-vasquez
 

Andere mochten auch (20)

MediatizaciĂłn e interracciones sociales
MediatizaciĂłn e interracciones socialesMediatizaciĂłn e interracciones sociales
MediatizaciĂłn e interracciones sociales
 
Tipo de datos de oracle base datos
Tipo de datos de oracle base datosTipo de datos de oracle base datos
Tipo de datos de oracle base datos
 
Mensaje para las madres
Mensaje para las madresMensaje para las madres
Mensaje para las madres
 
Musica y baile en los afrocolombianos
Musica y baile en los afrocolombianosMusica y baile en los afrocolombianos
Musica y baile en los afrocolombianos
 
Clase tatiana adriana
Clase tatiana adrianaClase tatiana adriana
Clase tatiana adriana
 
Vaidas Morkevičius - Open Data for Enhancing the Quality of Research
Vaidas Morkevičius - Open Data for Enhancing the Quality of ResearchVaidas Morkevičius - Open Data for Enhancing the Quality of Research
Vaidas Morkevičius - Open Data for Enhancing the Quality of Research
 
Saulius GraĹžulis The Crystalography Open Database
Saulius GraĹžulis  The Crystalography Open DatabaseSaulius GraĹžulis  The Crystalography Open Database
Saulius GraĹžulis The Crystalography Open Database
 
ClasificaciĂłn de invertebrados
ClasificaciĂłn de invertebradosClasificaciĂłn de invertebrados
ClasificaciĂłn de invertebrados
 
Manual empresas eco turisticas
Manual empresas eco turisticasManual empresas eco turisticas
Manual empresas eco turisticas
 
Comunicacion interactiva christopher
Comunicacion interactiva christopherComunicacion interactiva christopher
Comunicacion interactiva christopher
 
Docker Mentor Week 2016 - Medan
Docker Mentor Week 2016 - MedanDocker Mentor Week 2016 - Medan
Docker Mentor Week 2016 - Medan
 
Presentazione di Gruppo Vectorys
Presentazione di Gruppo VectorysPresentazione di Gruppo Vectorys
Presentazione di Gruppo Vectorys
 
Group 1 fruits
Group 1 fruitsGroup 1 fruits
Group 1 fruits
 
Msc. hugo ortiz
Msc. hugo ortizMsc. hugo ortiz
Msc. hugo ortiz
 
Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt
Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt
Mantas Zimnickas - How Open is Lithuanian Government data? atviriduomenys.lt
 
MuĂąoz ximena camstudio
MuĂąoz ximena camstudioMuĂąoz ximena camstudio
MuĂąoz ximena camstudio
 
SesiĂłn informativa a
SesiĂłn informativa aSesiĂłn informativa a
SesiĂłn informativa a
 
PresentaciĂłn de escuela indigena
PresentaciĂłn de escuela indigenaPresentaciĂłn de escuela indigena
PresentaciĂłn de escuela indigena
 
Clase tatiana adriana
Clase tatiana adrianaClase tatiana adriana
Clase tatiana adriana
 
Clase tatiana adriana
Clase tatiana adrianaClase tatiana adriana
Clase tatiana adriana
 

Ähnlich wie A document-inspired way for tracking changes of RDF data - The case of the OpenCitations Corpus

FDO as building block for digitization technology stacks
FDO as building block for digitization technology stacksFDO as building block for digitization technology stacks
FDO as building block for digitization technology stacksRaul Palma
 
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, OxfordIntroduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, Oxfordmatthewgamble
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseUniversity of Bologna
 
Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Benoit Pauwels
 
Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...ULB - Bibliothèques
 
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...University of Bologna
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogsandrea huang
 
DSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformDSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformAndrea Bollini
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout Carole Goble
 
The nature.com ontologies portal: nature.com/ontologies
The nature.com ontologies portal: nature.com/ontologiesThe nature.com ontologies portal: nature.com/ontologies
The nature.com ontologies portal: nature.com/ontologiesTony Hammond
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersUniversity of Bologna
 
Next Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 CalhounNext Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 CalhounKaren S Calhoun
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)Stian Soiland-Reyes
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)Stian Soiland-Reyes
 
“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKANChengjen Lee
 
RELIANCE ROHub hackathon
RELIANCE ROHub hackathonRELIANCE ROHub hackathon
RELIANCE ROHub hackathonRaul Palma
 
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKANandrea huang
 
Author's workflow and the role of open access
Author's workflow and the role of open accessAuthor's workflow and the role of open access
Author's workflow and the role of open accessPaola Gargiulo
 
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...OpenAIRE
 
ROHub-Argos integration
ROHub-Argos integrationROHub-Argos integration
ROHub-Argos integrationRaul Palma
 

Ähnlich wie A document-inspired way for tracking changes of RDF data - The case of the OpenCitations Corpus (20)

FDO as building block for digitization technology stacks
FDO as building block for digitization technology stacksFDO as building block for digitization technology stacks
FDO as building block for digitization technology stacks
 
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, OxfordIntroduction to Research Objects - Collaboartions Workshop 2015, Oxford
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations arise
 
Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...Exchange of usage metadata in a network of institutional repositories: the ...
Exchange of usage metadata in a network of institutional repositories: the ...
 
Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...Exchange of usage metadata in a network of institutional repositories: the ca...
Exchange of usage metadata in a network of institutional repositories: the ca...
 
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
 
20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs20160818 Semantics and Linkage of Archived Catalogs
20160818 Semantics and Linkage of Archived Catalogs
 
DSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformDSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platform
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
The nature.com ontologies portal: nature.com/ontologies
The nature.com ontologies portal: nature.com/ontologiesThe nature.com ontologies portal: nature.com/ontologies
The nature.com ontologies portal: nature.com/ontologies
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointers
 
Next Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 CalhounNext Generation Technical Services May 2009 Calhoun
Next Generation Technical Services May 2009 Calhoun
 
2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)2013 06-24 Wf4Ever: Annotating research objects (PDF)
2013 06-24 Wf4Ever: Annotating research objects (PDF)
 
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)2013 06-24 Wf4Ever: Annotating research objects (PPTX)
2013 06-24 Wf4Ever: Annotating research objects (PPTX)
 
“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN“Open Data Web” – A Linked Open Data Repository Built with CKAN
“Open Data Web” – A Linked Open Data Repository Built with CKAN
 
RELIANCE ROHub hackathon
RELIANCE ROHub hackathonRELIANCE ROHub hackathon
RELIANCE ROHub hackathon
 
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
20161004 “Open Data Web” – A Linked Open Data Repository Built with CKAN
 
Author's workflow and the role of open access
Author's workflow and the role of open accessAuthor's workflow and the role of open access
Author's workflow and the role of open access
 
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
 
ROHub-Argos integration
ROHub-Argos integrationROHub-Argos integration
ROHub-Argos integration
 

Mehr von University of Bologna

A Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology DevelopmentA Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology DevelopmentUniversity of Bologna
 
A pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflowsA pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflowsUniversity of Bologna
 
Semantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherSemantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherUniversity of Bologna
 
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...University of Bologna
 
Characterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentCharacterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentUniversity of Bologna
 
Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsUniversity of Bologna
 
The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...University of Bologna
 
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...University of Bologna
 
Embedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachEmbedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachUniversity of Bologna
 
Handling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWLHandling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWLUniversity of Bologna
 

Mehr von University of Bologna (12)

A Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology DevelopmentA Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology Development
 
FOOD: FOod in Open Data
FOOD: FOod in Open DataFOOD: FOod in Open Data
FOOD: FOod in Open Data
 
A pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflowsA pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflows
 
Semantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherSemantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing together
 
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
 
Characterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentCharacterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experiment
 
Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citations
 
The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...
 
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
 
Embedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachEmbedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approach
 
Dealing with Markup Semantics
Dealing with Markup SemanticsDealing with Markup Semantics
Dealing with Markup Semantics
 
Handling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWLHandling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWL
 

KĂźrzlich hochgeladen

Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086anil_gaur
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf203318pmpc
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfRagavanV2
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 

KĂźrzlich hochgeladen (20)

Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 

A document-inspired way for tracking changes of RDF data - The case of the OpenCitations Corpus

  • 1. A document-inspired way for tracking changes of RDF data The case of the OpenCitations Corpus Silvio Peroni, David Shotton, Fabio Vitali 1st Drift-a-LOD Workshop: Detection, Representation and Management of Concept Drift in Linked Open Data
 Bologna, Italy, November 20, 2016 Paper: https://w3id.org/oc/paper/occ-driftalod2016.html
  • 2. The Venice analogy • Island = 
 scholarly publication • Bridge = citation • Current situation: – local travel to the next island is permitted – unrestricted travel over the entire network of bridges requires an expensive season ticket – general populace is excluded https://w3id.org/oc/paper/the-venice-analogy.html
  • 3. Opening the bridges • What – Citation data are one of the main tools used by researchers to gain knowledge about particular topics, and they also serve institutional goals, for example in research assessment • Problem – The most authoritative databases of citation data, Scopus and Web of Science, can only be accessed by paying significant annual access fees – The University of Bologna pays about 6,000,000 euros per year for accessing to digital bibliographic resources • Solution – To create a citation database that freely and legally makes available citation data in an open repository to assist scholars with their academic studies and serve knowledge to the wider public
  • 4. OpenCitations • The OpenCitations Project aims at creating an open repository of scholarly citation data – the OpenCitations Corpus (OCC) – made available under a Creative Commons public domain dedication to provide in RDF accurate citation information (bibliographic references) harvested from the scholarly literature – All scripts are released with Open Source ISC Licence and available on GitHub at http://github.com/essepuntato/opencitations • Currently processing papers available in the PubMedCentral Open Access subset • As of November 20, 2016 the OCC contains 2,076,645 citation links • Six distinct kinds of bibliographic entities – bibliographic resources (citing/cited articles, journals, books, proceedings, etc.) – resource embodiments (format information about bibliographic resources) – bibliographic entries (literal textual entries occurring in the reference lists) – responsible agents (agents having certain roles with respect to the bibliographic resources) – agent roles (author, editor, publisher); – identifiers (DOI, ORCID, PubMedID, URL, etc.) http://opencitations.net
  • 5. Ingestion workflow BEE EuropeanPubMedCentralProcessor Parsing the XML source of PubMed Central Open Access articles. 1 SPACIN Producing JSON with DOI and bib entries. {
 "doi": "10.1590/1414-431x20154655", 
 "localid": "MED-26577845", 
 "curator": "BEE EuropeanPubMedCentralProcessor", 
 "source": "http://www.ebi.ac.uk/europepmc/webservices/rest/PMC4678653/ fullTextXML", "source_provider": "Europe PubMed Central", 
 "pmid": "26577845", 
 "pmcid": “PMC4678653",
 "references": [
 {
 "bibentry": "Wenger, NK. Coronary heart disease: an older woman's major health risk, BMJ, 1997, 315, 1085, 1090, DOI: 10.1136/bmj.315.7115.1085, PMID: 9366743", 
 "pmid": "9366743", 
 "doi": "10.1136/bmj.315.7115.1085", 
 "pmcid": "PMC2127693", 
 "process_entry": "True"
 } … ] } 2 For each citing/cited resource, if an ID (DOI, PMID, PMCID) is specied check if the resource exists already. If it does go to 5. store ResourceFinder 3 GraphSet ProvSet DatasetHandler
 Storer Load all the statements onthe triplestore and storethem in the le system for easy recovering. OCC 6 If the resource doesn’t exist, extract possible IDs from the entry and query CrossRef and ORCID. CrossRefProcessor
 ORCIDProcessor 4 GraphEntity New metadata resources are created. If CrossRef/ORCID returned something, all the related metadata will be used, otherwise only basic metadata (IDs and entries) will be added. 5
  • 6. Issues with data • Automatic workflow built upon external services – Efficient, but no human check of the data extracted – Some errors could be propagated • Data do change in time – Information can be incomplete (e.g. citations added in another iteration of the ingestion workflow) – Information can be wrong (e.g. circular citations – paper A cites paper A) – Information can be ambiguous (e.g. author disambiguation)
  • 7. Document-inspired data drift • Inspiration from the Document Engineering domain – Well-known structure for keeping track of changes in word- processor documents, e.g. OpenOffice Writer ✦ New content added directly within the existing text and marked in some way ✦ Removed context moved out from the actual content of the document and placed in an auxiliary space for easy retrieving and restoration – Two basic operations (add & remove) are enough for keeping track of document changes • Solution for RDF data: using PROV-O + SPARQL UPDATE (INSERT DATA and DELETE DATA only) for keeping track of the way entities change in time
  • 8. The approach :sp a foaf:Person ; foaf:name "Silvio Peroni" . :sp a prov:Entity . :sp-snapshot-1 a prov:Entity ; prov:specializationOf :sp . Data Provenance :sp a foaf:Person ; foaf:givenName "Silvio" ; foaf:familyName "Peroni" . :sp-snapshot-2 a prov:Entity ; prov:specializationOf :sp ; prov:wasDerivedFrom :sp-snapshot-1 ; new:hasUpdateQuery " " . INSERT DATA { :sp foaf:givenName 'Silvio' ; foaf:familyName 'Peroni' } ; DELETE DATA { :sp foaf:name 'Silvio Peroni' } Time T1 T2 A snapshot records the composition of the entity it specialises (i.e. the set of statements using such entity as subject) at a fixed point in time
  • 9. Advantages • Easy to retrieve the current statements of an entity, since they are those currently available in the dataset • It is possible to restore the entity to a certain snapshot si by applying the inverse operations (i.e. deletions instead of insertions and vice versa) of all the update queries from the most recent snapshot sn to si+1 – For instance, to get back to the status recorded by the first snapshot of the previous example, we have to run all the inverse operations of the update query specified in the second snapshot:
 
 INSERT DATA { :sp foaf:name 'Silvio Peroni' } ;
 DELETE DATA { 
 :sp 
 foaf:givenName 'Silvio' ; 
 foaf:familyName 'Peroni' }
  • 10. Implementation in the OCC • We use: – PROV-O – PROV-DC, an extension of PROV-O mapping it with DC – OpenCitations Ontology (OCO), which defines oco:hasUpdateQuery • Each entity in the OCC tracks provenance information about: – snapshot of entity metadata (prov:Entity), a particular snapshot recording the metadata associated with an individual entity at a particular time – curatorial activity (prov:Activity), a curatorial activity relating to that entity ✦ creation (prov:Create), the activity of creating a new entity with statements ✦ modification (prov:Modify), the activity of adding/removing statements of an entity ✦ merging (prov:Replace), the activity of unifying the statements relating to two entites – provenance agent (prov:Agent), a person, organisation or process, that is involved in some way in the creation of an entity (e.g. Crossref) – curatorial role (prov:Association), a particular role held by a provenance agent with respect to a curatorial activity (e.g. OCC curator, metadata source)
  • 11. An example br:525205 a fabio:Expression , fabio:JournalArticle ; dcterms:title "The Electronic Patient ..." ; datacite:hasIdentifier id:816997 , id:816998 , ... ; fabio:hasPublicationYear "2016"^^xsd:gYear ; pro:isDocumentContextFor ar:1591190 , ar:1591191 , ... ; frbr:embodiment re:217773 ; frbr:part be:727446 , be:727447 , ... . se:1 a prov:Entity ; prov:generatedAtTime "2016-08-08T22:25:48"^^xsd:dateTime ; prov:hadPrimarySource <http://api.crossref.org/works/10.2196/mhealth.5331> ; prov:specializationOf br:525205 ; prov:wasGeneratedBy ca:1 . ca:1 a prov:Activity, prov:Create ; dcterms:description "The entity 'https://w3id.org/oc/corpus/br/525205' has been created." ; prov:qualifiedAssociation cr:1 , cr:2 . cr:1 a prov:Association ; prov:agent pa:1 ; prov:hadRole oco:occ-curator . pa:1 a prov:Agent ; foaf:name "SPACIN CrossrefProcessor" . … Data Provenance br:525205 cites:cites br:1095420 , br:1095421 , ... ; frbr:part be:727446 , be:727447 , ... . se:1 prov:invalidatedAtTime "2016-08-29T22:42:06"^^xsd:dateTime ; prov:wasInvalidatedBy ca:2 . se:2 a prov:Entity ; prov:generatedAtTime "2016-08-29T22:42:06"^^xsd:dateTime ; prov:hadPrimarySource <http://www.ebi.ac.uk/europepmc/ webservices/rest/PMC4911509/fullTextXML> ; prov:specializationOf br:525205 ; prov:wasDerivedFrom se:1 ; prov:wasGeneratedBy ca:2 ; oco:hasUpdateQuery "INSERT DATA { GRAPH <https://w3id.org/oc/corpus/br/> { br:525205 cito:cites br:1095459 , br:525205 , ... ; frbr:part be:727491 , be:727452 , ... } }" . … Time T1 T2
  • 12. Conclusions • Approach for keeping track of changes in RDF data inspired by existing implementations in the Document Engineering domain – PROV-O – SPARQL UPDATE (INSERT/DELETE DATA) • Implementation: provenance data in the OpenCitations Corpus – PROV-DC – OpenCitations Ontology • Future works – automatic tools for snapshot restoration – handling non-automatic modification to the OpenCitations Corpus (e.g. curation by humans)
  • 13. Silvio Peroni silvio.peroni@unibo.it - @essepuntato Digital And Semantic Publishing Laboratory Department of Computer Science and Engineering Alma Mater Studiorum – UniversitĂ  di Bologna Bologna, Italy dasplab.cs.unibo.it – www.unibo.it