SlideShare a Scribd company logo
1 of 36
Download to read offline
IR repository data in
          AuthorClaim
              Wolfram Horstmann1
              Thomas Krichel2,3,4
1
    Bielefeld University 2 Long Island University 3 Novosibirsk
            State University 4 Open Library Society

      Repository Fringe 2011–08–03
thanks

to the organizers
to the BASE contributors
 Bernd Fehling, Marek Imialek, Mathias
 Loesch, Renata Mitrenga, Dirk Pieper,
 Jochen Schirrwagen, Friedrich Summann,
 Sebastian Wolf
structure

background
3lib
Author claiming and AuthorClaim
BASE and
AuthorClaim and BASE
background

Wolfram is chief information officer
(scholarly Information) at Bielefeld
University.
They run the BASE search engine
since 2004
They are doing long-run work.
background


Thomas the founder of RePEc.
Thomas started this in the early 90s.
Thomas is doing long-run work.
motivation

Make (economics) papers freely
available.
Make information about the papers
freely available.
Have a self-sustaining infrastructure of
this, don’t rely on external sources.
RePEc

RePEc is misunderstood as a
repository.
In fact it is a collection of 1300+
institutional (subject) repositories.
 pre-date OAI
 reduced business model
 more tightly interoperable
RePEc sources of success

There are a lot of sources of success.
The reason can be classified
 business case
 technical matter
both are linked
RePEc business case

RePEc tries to decentralize as much as
we can.
RePEc run essentially on volunteer
power.
RePEc encourage reuse of RePEc
data.
RePEc technical case

RePEc registers authors with the
RePEc Author Service (RAS).
RePEc registers institutions.
RePEc provides evaluative data for
authors and institutions.
RePEc and IRs

RePEc is not a repository.
RePEc is a bibliographic layer over
repositories.
IRs can/will benefit from a similar
bibliographic layer.
requirement for such a layer


Not dependent on external funding.
Freely reusable instantaneously.
Must be there for the long-run.
a RePEc for all disciplines


RePEc bibliographic data → 3lib
RePEc Author Service → AuthorClaim
EDIRC → ARIW
3lib
 3lib is an initial attempt at building an
 aggregate of freely available
 bibliographic data.
 It’s a project by OLS sponsored by
 OKFN.
 About 35 million records from the
 usual suspects: PubMed, OpenLibrary,
 DBLP, RePEc.
3lib elements

The data elements in 3lib are very
simple
 title
 author name expressions
 link to item page on provider site
 identifier
3lib is meant to serve AuthorClaim.
AuthorClaim

AuthorClaim is an authorship claiming
service for 3lib data.
It lives at http://authorclaim.org.
It uses the same software as the
RePEc Author Service, called ACIS.
It is running since early 2008.
author claiming history I

Thomas started the first author
claiming system, the RePEc author
service in 1999.
The system was written by Markus
J.R. Klink.
author claiming history II
ISI created researcherID in 2006 (?)
arXiv have an author claiming system
since 2009.
NIH and Google Scholar are working
on it.
The ORCID initiative is looking into
author identification since 2009.
claiming vs identification
Author claiming records are NOT
author identification records.
The difference is called “Klink’s
problem”.
An person can claim to be an author
of a paper. If there are several author,
we don’t know what author (s)he is.
Klink’s problem example


Jane and John Smith write a paper.
Author list say “J. Smith and J.
Smith”
AuthorClaim data


ftp://ftp.authorclaim.org
CC0
more than 100 profiles, growing slowly.
an example
id: pbi1
name variations: Geoffrey Bilder — G.
Bilder — Bilder, G.
isauthorof: info:lib/elis:856
hasnoconnectionto:
info:lib/pubmed:11127885 —
info:lib/pubmed:7482633
more on the example
The refused papers are there for
services to build learning models for
author names. Actually learning is an
integral part of the way AuthorClaim
works.
Actually records also contain the 3lib
data for papers.
and they have ARIW-base affiliation
data.
IRs and author identification

IRs are generally too large to author
identification by IR staff.
Only registration of contributors is
usually required.
IRs and author claiming


IRs are too small to make it
meaningful for authors to claim papers
in them directly.
benefits of author claiming to
IR

All papers by an author can be put
together.
The task can be completely
automated once an AuthorClaim
record claims a paper in the IR.
to get it done

From the four elements in a 3lib
record only the link to a web page
describing the item is problematic.
But is cumbersome to customize to
close to 2k IRs.
partnership with BASE

We need a centralized collection.
BASE is already doing this job.
BASE can deliver all data to
AuthorClaim regularly.
BASE aggregation services

constant monitoring of OAI-PMH
world
configuration of harvesting for each
new and erroneous repository
metadata stores (raw and normalized)
BASE normalization

highly heterogeneous use of OAI-DC
requires cleaning and enrichment, e.g.
 dc:type
 dc:date
 dc:language
also enrichment with (missing) subject
classifications
BASE search services

builds index (solr)
end user interfaces (vufind)
iPhone-App
API for index usage by third parties
(http or SOAP)
BASE data services

repository profile service (REST)
raw metadata store access (http or
SOAP)
rsync for AuthorClaim
BASE data in AuthorClaim
selection of records that have required
data
 author
 title
 link
 identifier
incremental updates
repository exclusion
From BASE profile AuthorClaim
discards some IRs that contain
 student work
 digitized old material
 link collections
 primary research data
There are some minor manual
exclusions.
results so far
 1930 repositories, 12740116 records.
 534 records claimed.
 The documentation at
 http://wotan.liu.edu/base/ needs
 some debugging.
 The collection is not yet announced
 because it is being read.
the end

Contact
 whorstma@uni-bielefeld.de
 krichel@openlib.org
for more information.

More Related Content

What's hot

LAB4 Executing HIVEQL in Beeline and HUE.docx
LAB4 Executing HIVEQL in Beeline and HUE.docxLAB4 Executing HIVEQL in Beeline and HUE.docx
LAB4 Executing HIVEQL in Beeline and HUE.docxKarim Fathallah
 
FAIR Projector Builder
FAIR Projector BuilderFAIR Projector Builder
FAIR Projector BuilderMark Wilkinson
 
Using Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesUsing Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's Tables祺傑 林
 
SPARQL Query Forms
SPARQL Query FormsSPARQL Query Forms
SPARQL Query FormsLeigh Dodds
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Gilles Fedak
 
Connect Your Resources, Save Time, Save Money:: Connecting library electron...
Connect Your Resources, Save Time, Save Money:: Connecting library  electron...Connect Your Resources, Save Time, Save Money:: Connecting library  electron...
Connect Your Resources, Save Time, Save Money:: Connecting library electron...Richard Bernier
 
Fitting MarcEdit into the library software ecosystem
Fitting MarcEdit into the library software ecosystemFitting MarcEdit into the library software ecosystem
Fitting MarcEdit into the library software ecosystemTerry Reese
 
Webtracks at JISC Managing Research Data Meeting
Webtracks at JISC Managing Research Data MeetingWebtracks at JISC Managing Research Data Meeting
Webtracks at JISC Managing Research Data MeetingCameron Neylon
 
Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015Martin Magdinier
 
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...Edward Blurock
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileAlasdair Gray
 
Semantic web application architecture
Semantic web   application architectureSemantic web   application architecture
Semantic web application architectureDon Willems
 
Open kent uploading data v1.0
Open kent   uploading data v1.0Open kent   uploading data v1.0
Open kent uploading data v1.0Noel Hatch
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...
What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...
What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...Edureka!
 
Tlad 2015 presentation amin+charles-final
Tlad 2015 presentation   amin+charles-finalTlad 2015 presentation   amin+charles-final
Tlad 2015 presentation amin+charles-finalAmin Chowdhury
 
Rapid Digitization of Latin American Ephemera with Hydra
Rapid Digitization of Latin American Ephemera with HydraRapid Digitization of Latin American Ephemera with Hydra
Rapid Digitization of Latin American Ephemera with HydraJon Stroop
 

What's hot (20)

LAB4 Executing HIVEQL in Beeline and HUE.docx
LAB4 Executing HIVEQL in Beeline and HUE.docxLAB4 Executing HIVEQL in Beeline and HUE.docx
LAB4 Executing HIVEQL in Beeline and HUE.docx
 
FAIR Projector Builder
FAIR Projector BuilderFAIR Projector Builder
FAIR Projector Builder
 
Using Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesUsing Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's Tables
 
SPARQL Query Forms
SPARQL Query FormsSPARQL Query Forms
SPARQL Query Forms
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
 
Connect Your Resources, Save Time, Save Money:: Connecting library electron...
Connect Your Resources, Save Time, Save Money:: Connecting library  electron...Connect Your Resources, Save Time, Save Money:: Connecting library  electron...
Connect Your Resources, Save Time, Save Money:: Connecting library electron...
 
Fitting MarcEdit into the library software ecosystem
Fitting MarcEdit into the library software ecosystemFitting MarcEdit into the library software ecosystem
Fitting MarcEdit into the library software ecosystem
 
Webtracks at JISC Managing Research Data Meeting
Webtracks at JISC Managing Research Data MeetingWebtracks at JISC Managing Research Data Meeting
Webtracks at JISC Managing Research Data Meeting
 
Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015Toronto OpenRefine MeetUp Nov 2015
Toronto OpenRefine MeetUp Nov 2015
 
Constructing SQL queries for AtoM
Constructing SQL queries for AtoMConstructing SQL queries for AtoM
Constructing SQL queries for AtoM
 
linklisr
linklisrlinklisr
linklisr
 
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
 
Semantic web application architecture
Semantic web   application architectureSemantic web   application architecture
Semantic web application architecture
 
Open kent uploading data v1.0
Open kent   uploading data v1.0Open kent   uploading data v1.0
Open kent uploading data v1.0
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...
What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...
What are Data structures in Python? | List, Dictionary, Tuple Explained | Edu...
 
Database Research
Database ResearchDatabase Research
Database Research
 
Tlad 2015 presentation amin+charles-final
Tlad 2015 presentation   amin+charles-finalTlad 2015 presentation   amin+charles-final
Tlad 2015 presentation amin+charles-final
 
Rapid Digitization of Latin American Ephemera with Hydra
Rapid Digitization of Latin American Ephemera with HydraRapid Digitization of Latin American Ephemera with Hydra
Rapid Digitization of Latin American Ephemera with Hydra
 

Viewers also liked

Jodie Double (Univ. Leeds) – RePosit
Jodie Double (Univ. Leeds) – RePositJodie Double (Univ. Leeds) – RePosit
Jodie Double (Univ. Leeds) – RePositRepository Fringe
 
Niamh Brennan (Trinity College Dublin) – CERIFy
Niamh Brennan (Trinity College Dublin) – CERIFyNiamh Brennan (Trinity College Dublin) – CERIFy
Niamh Brennan (Trinity College Dublin) – CERIFyRepository Fringe
 
Chris Awre (Univ of Hull) – implement the Hydrangea software
Chris Awre (Univ of Hull) – implement the Hydrangea softwareChris Awre (Univ of Hull) – implement the Hydrangea software
Chris Awre (Univ of Hull) – implement the Hydrangea softwareRepository Fringe
 
Closing Keynote: Prof. Gary Hall (Coventry University)
Closing Keynote: Prof. Gary Hall (Coventry University) Closing Keynote: Prof. Gary Hall (Coventry University)
Closing Keynote: Prof. Gary Hall (Coventry University) Repository Fringe
 
Dan Needham & Phil Cross (mimas) – Names Project
Dan Needham & Phil Cross (mimas) – Names ProjectDan Needham & Phil Cross (mimas) – Names Project
Dan Needham & Phil Cross (mimas) – Names ProjectRepository Fringe
 
Ben Ryan (University of Leeds) – Timescapes Project
Ben Ryan (University of Leeds) – Timescapes ProjectBen Ryan (University of Leeds) – Timescapes Project
Ben Ryan (University of Leeds) – Timescapes ProjectRepository Fringe
 
Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...
Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...
Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...Repository Fringe
 
Stephanie Taylor (UKOLN) – Metadata Forum
Stephanie Taylor (UKOLN) – Metadata ForumStephanie Taylor (UKOLN) – Metadata Forum
Stephanie Taylor (UKOLN) – Metadata ForumRepository Fringe
 
Nicola Osborne (EDINA) – Social media and repositories
Nicola Osborne (EDINA) – Social media and repositoriesNicola Osborne (EDINA) – Social media and repositories
Nicola Osborne (EDINA) – Social media and repositoriesRepository Fringe
 
Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...
Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...
Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...Repository Fringe
 

Viewers also liked (10)

Jodie Double (Univ. Leeds) – RePosit
Jodie Double (Univ. Leeds) – RePositJodie Double (Univ. Leeds) – RePosit
Jodie Double (Univ. Leeds) – RePosit
 
Niamh Brennan (Trinity College Dublin) – CERIFy
Niamh Brennan (Trinity College Dublin) – CERIFyNiamh Brennan (Trinity College Dublin) – CERIFy
Niamh Brennan (Trinity College Dublin) – CERIFy
 
Chris Awre (Univ of Hull) – implement the Hydrangea software
Chris Awre (Univ of Hull) – implement the Hydrangea softwareChris Awre (Univ of Hull) – implement the Hydrangea software
Chris Awre (Univ of Hull) – implement the Hydrangea software
 
Closing Keynote: Prof. Gary Hall (Coventry University)
Closing Keynote: Prof. Gary Hall (Coventry University) Closing Keynote: Prof. Gary Hall (Coventry University)
Closing Keynote: Prof. Gary Hall (Coventry University)
 
Dan Needham & Phil Cross (mimas) – Names Project
Dan Needham & Phil Cross (mimas) – Names ProjectDan Needham & Phil Cross (mimas) – Names Project
Dan Needham & Phil Cross (mimas) – Names Project
 
Ben Ryan (University of Leeds) – Timescapes Project
Ben Ryan (University of Leeds) – Timescapes ProjectBen Ryan (University of Leeds) – Timescapes Project
Ben Ryan (University of Leeds) – Timescapes Project
 
Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...
Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...
Alan Cope (De Montfort University) – EXPLORER (create workflows and processes...
 
Stephanie Taylor (UKOLN) – Metadata Forum
Stephanie Taylor (UKOLN) – Metadata ForumStephanie Taylor (UKOLN) – Metadata Forum
Stephanie Taylor (UKOLN) – Metadata Forum
 
Nicola Osborne (EDINA) – Social media and repositories
Nicola Osborne (EDINA) – Social media and repositoriesNicola Osborne (EDINA) – Social media and repositories
Nicola Osborne (EDINA) – Social media and repositories
 
Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...
Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...
Xiaohong Gao (Middlesex University) – MIRAGE 2011 (developing an embedding vi...
 

Similar to Thomas Krichel (Long Island University) – AuthorClaim

A Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF ProcessingA Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF Processinglucianb
 
Open Bibliographic Data and Author Claiming
Open Bibliographic Data and Author ClaimingOpen Bibliographic Data and Author Claiming
Open Bibliographic Data and Author ClaimingJames Griffin
 
Open Bibliography, Citations and Scholarship
Open Bibliography, Citations and ScholarshipOpen Bibliography, Citations and Scholarship
Open Bibliography, Citations and Scholarshipbenosteen
 
Encoding Patron Information in RDF
Encoding Patron Information in RDFEncoding Patron Information in RDF
Encoding Patron Information in RDFJakob .
 
Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashupsgiurca
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community CallOpenAIRE
 
Linked data MLA 2015
Linked data MLA 2015Linked data MLA 2015
Linked data MLA 2015Cason Snow
 
Linked Data MLA 2015
Linked Data MLA 2015Linked Data MLA 2015
Linked Data MLA 2015Cason Snow
 
Adding browse to Koha using Solr
Adding browse to Koha using SolrAdding browse to Koha using Solr
Adding browse to Koha using SolrStefano Bargioni
 
Fluidinfo: Publishing in an Openly Writeable World
Fluidinfo: Publishing in an Openly Writeable WorldFluidinfo: Publishing in an Openly Writeable World
Fluidinfo: Publishing in an Openly Writeable WorldFluidinfo
 
Data Designed for Discovery
Data Designed for DiscoveryData Designed for Discovery
Data Designed for DiscoveryOCLC
 
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible LibraryBeyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible LibraryKsenija Mincic Obradovic
 
Cornell20080516
Cornell20080516Cornell20080516
Cornell20080516charper
 
W3C Linked Data Platform Overview
W3C Linked Data Platform OverviewW3C Linked Data Platform Overview
W3C Linked Data Platform OverviewSteve Speicher
 
Seminar report(rohitsahu cs 17 vth sem)
Seminar report(rohitsahu cs 17 vth sem)Seminar report(rohitsahu cs 17 vth sem)
Seminar report(rohitsahu cs 17 vth sem)ROHIT SAHU
 
Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020François Belleau
 

Similar to Thomas Krichel (Long Island University) – AuthorClaim (20)

A Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF ProcessingA Comparison Between Python APIs For RDF Processing
A Comparison Between Python APIs For RDF Processing
 
Open Bibliographic Data and Author Claiming
Open Bibliographic Data and Author ClaimingOpen Bibliographic Data and Author Claiming
Open Bibliographic Data and Author Claiming
 
Open Bibliography, Citations and Scholarship
Open Bibliography, Citations and ScholarshipOpen Bibliography, Citations and Scholarship
Open Bibliography, Citations and Scholarship
 
Encoding Patron Information in RDF
Encoding Patron Information in RDFEncoding Patron Information in RDF
Encoding Patron Information in RDF
 
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early AdoptersApril 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
 
Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashups
 
Library Linked Data and the Future of Bibliographic Control
Library Linked Data and the Future of Bibliographic ControlLibrary Linked Data and the Future of Bibliographic Control
Library Linked Data and the Future of Bibliographic Control
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community Call
 
Linked data MLA 2015
Linked data MLA 2015Linked data MLA 2015
Linked data MLA 2015
 
Linked Data MLA 2015
Linked Data MLA 2015Linked Data MLA 2015
Linked Data MLA 2015
 
Adding browse to Koha using Solr
Adding browse to Koha using SolrAdding browse to Koha using Solr
Adding browse to Koha using Solr
 
Digitisation and institutional repositories 3
Digitisation and institutional repositories 3Digitisation and institutional repositories 3
Digitisation and institutional repositories 3
 
Fluidinfo: Publishing in an Openly Writeable World
Fluidinfo: Publishing in an Openly Writeable WorldFluidinfo: Publishing in an Openly Writeable World
Fluidinfo: Publishing in an Openly Writeable World
 
Data Designed for Discovery
Data Designed for DiscoveryData Designed for Discovery
Data Designed for Discovery
 
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible LibraryBeyond the catalogue : BibFrame, Linked Data and Ending the 	Invisible Library
Beyond the catalogue : BibFrame, Linked Data and Ending the Invisible Library
 
Cornell20080516
Cornell20080516Cornell20080516
Cornell20080516
 
W3C Linked Data Platform Overview
W3C Linked Data Platform OverviewW3C Linked Data Platform Overview
W3C Linked Data Platform Overview
 
Seminar report(rohitsahu cs 17 vth sem)
Seminar report(rohitsahu cs 17 vth sem)Seminar report(rohitsahu cs 17 vth sem)
Seminar report(rohitsahu cs 17 vth sem)
 
Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020
 
Linked Data
Linked DataLinked Data
Linked Data
 

More from Repository Fringe

Unlocking Thesis Data - Stephen Grace, University of East London
Unlocking Thesis Data - Stephen Grace, University of East LondonUnlocking Thesis Data - Stephen Grace, University of East London
Unlocking Thesis Data - Stephen Grace, University of East LondonRepository Fringe
 
Integration - the heart of researcher centric research data management system...
Integration - the heart of researcher centric research data management system...Integration - the heart of researcher centric research data management system...
Integration - the heart of researcher centric research data management system...Repository Fringe
 
Open Access workshop at Repository Fringe 2015 - Valerie McCutcheon
Open Access workshop at Repository Fringe 2015 - Valerie McCutcheonOpen Access workshop at Repository Fringe 2015 - Valerie McCutcheon
Open Access workshop at Repository Fringe 2015 - Valerie McCutcheonRepository Fringe
 
Repositories for OA, RDM and Beyond - Rory McNicholl
Repositories for OA, RDM and Beyond - Rory McNichollRepositories for OA, RDM and Beyond - Rory McNicholl
Repositories for OA, RDM and Beyond - Rory McNichollRepository Fringe
 
RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015Repository Fringe
 
Repository Fringe 2015 - Jisc RDM Session, Linda Naughton, Jisc
Repository Fringe 2015 - Jisc RDM Session, Linda Naughton, JiscRepository Fringe 2015 - Jisc RDM Session, Linda Naughton, Jisc
Repository Fringe 2015 - Jisc RDM Session, Linda Naughton, JiscRepository Fringe
 
Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...Repository Fringe
 
Jisc on repositories unleashing data - Daniela Duca
Jisc on repositories unleashing data - Daniela DucaJisc on repositories unleashing data - Daniela Duca
Jisc on repositories unleashing data - Daniela DucaRepository Fringe
 
IRUS-UK at Repository Fringe 2015 - Jo Alcock
IRUS-UK at Repository Fringe 2015 - Jo AlcockIRUS-UK at Repository Fringe 2015 - Jo Alcock
IRUS-UK at Repository Fringe 2015 - Jo AlcockRepository Fringe
 
Impact and EPrints - Rosie-Marie Barbeau and Mick Eadie
Impact and EPrints - Rosie-Marie Barbeau and Mick EadieImpact and EPrints - Rosie-Marie Barbeau and Mick Eadie
Impact and EPrints - Rosie-Marie Barbeau and Mick EadieRepository Fringe
 
Open Data and Sharing Science - Graham Steel, Contentmine
Open Data and Sharing Science - Graham Steel, ContentmineOpen Data and Sharing Science - Graham Steel, Contentmine
Open Data and Sharing Science - Graham Steel, ContentmineRepository Fringe
 
SHERPA Services breakout session - Bill Hubbard
SHERPA Services breakout session - Bill HubbardSHERPA Services breakout session - Bill Hubbard
SHERPA Services breakout session - Bill HubbardRepository Fringe
 
REF compliance - what Jisc is doing
REF compliance - what Jisc is doingREF compliance - what Jisc is doing
REF compliance - what Jisc is doingRepository Fringe
 
Linking Software: citations, roles, references and more
Linking Software: citations, roles, references and moreLinking Software: citations, roles, references and more
Linking Software: citations, roles, references and moreRepository Fringe
 
Linking Research Outputs - Rachel Kotarski
Linking Research Outputs - Rachel KotarskiLinking Research Outputs - Rachel Kotarski
Linking Research Outputs - Rachel KotarskiRepository Fringe
 
HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...
HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...
HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...Repository Fringe
 
Latest developments in Hydra-land - Chris Awre, University of Hull
Latest developments in Hydra-land - Chris Awre, University of HullLatest developments in Hydra-land - Chris Awre, University of Hull
Latest developments in Hydra-land - Chris Awre, University of HullRepository Fringe
 
ArchivesSpace - Scott Renton, University of Edinburgh
ArchivesSpace - Scott Renton, University of EdinburghArchivesSpace - Scott Renton, University of Edinburgh
ArchivesSpace - Scott Renton, University of EdinburghRepository Fringe
 

More from Repository Fringe (20)

Unlocking Thesis Data - Stephen Grace, University of East London
Unlocking Thesis Data - Stephen Grace, University of East LondonUnlocking Thesis Data - Stephen Grace, University of East London
Unlocking Thesis Data - Stephen Grace, University of East London
 
Integration - the heart of researcher centric research data management system...
Integration - the heart of researcher centric research data management system...Integration - the heart of researcher centric research data management system...
Integration - the heart of researcher centric research data management system...
 
Open Access workshop at Repository Fringe 2015 - Valerie McCutcheon
Open Access workshop at Repository Fringe 2015 - Valerie McCutcheonOpen Access workshop at Repository Fringe 2015 - Valerie McCutcheon
Open Access workshop at Repository Fringe 2015 - Valerie McCutcheon
 
Repositories for OA, RDM and Beyond - Rory McNicholl
Repositories for OA, RDM and Beyond - Rory McNichollRepositories for OA, RDM and Beyond - Rory McNicholl
Repositories for OA, RDM and Beyond - Rory McNicholl
 
RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015
 
Repository Fringe 2015 - Jisc RDM Session, Linda Naughton, Jisc
Repository Fringe 2015 - Jisc RDM Session, Linda Naughton, JiscRepository Fringe 2015 - Jisc RDM Session, Linda Naughton, Jisc
Repository Fringe 2015 - Jisc RDM Session, Linda Naughton, Jisc
 
Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...
 
Jisc on repositories unleashing data - Daniela Duca
Jisc on repositories unleashing data - Daniela DucaJisc on repositories unleashing data - Daniela Duca
Jisc on repositories unleashing data - Daniela Duca
 
IRUS-UK at Repository Fringe 2015 - Jo Alcock
IRUS-UK at Repository Fringe 2015 - Jo AlcockIRUS-UK at Repository Fringe 2015 - Jo Alcock
IRUS-UK at Repository Fringe 2015 - Jo Alcock
 
Impact and EPrints - Rosie-Marie Barbeau and Mick Eadie
Impact and EPrints - Rosie-Marie Barbeau and Mick EadieImpact and EPrints - Rosie-Marie Barbeau and Mick Eadie
Impact and EPrints - Rosie-Marie Barbeau and Mick Eadie
 
Open Data and Sharing Science - Graham Steel, Contentmine
Open Data and Sharing Science - Graham Steel, ContentmineOpen Data and Sharing Science - Graham Steel, Contentmine
Open Data and Sharing Science - Graham Steel, Contentmine
 
SHERPA Services breakout session - Bill Hubbard
SHERPA Services breakout session - Bill HubbardSHERPA Services breakout session - Bill Hubbard
SHERPA Services breakout session - Bill Hubbard
 
REF compliance - what Jisc is doing
REF compliance - what Jisc is doingREF compliance - what Jisc is doing
REF compliance - what Jisc is doing
 
RCUK - what Jisc is doing
RCUK - what Jisc is doingRCUK - what Jisc is doing
RCUK - what Jisc is doing
 
Linking Software: citations, roles, references and more
Linking Software: citations, roles, references and moreLinking Software: citations, roles, references and more
Linking Software: citations, roles, references and more
 
Jisc Publications Router
Jisc Publications RouterJisc Publications Router
Jisc Publications Router
 
Linking Research Outputs - Rachel Kotarski
Linking Research Outputs - Rachel KotarskiLinking Research Outputs - Rachel Kotarski
Linking Research Outputs - Rachel Kotarski
 
HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...
HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...
HHuLO Access – Hull, Huddersfield and Lincoln explore open access good practi...
 
Latest developments in Hydra-land - Chris Awre, University of Hull
Latest developments in Hydra-land - Chris Awre, University of HullLatest developments in Hydra-land - Chris Awre, University of Hull
Latest developments in Hydra-land - Chris Awre, University of Hull
 
ArchivesSpace - Scott Renton, University of Edinburgh
ArchivesSpace - Scott Renton, University of EdinburghArchivesSpace - Scott Renton, University of Edinburgh
ArchivesSpace - Scott Renton, University of Edinburgh
 

Thomas Krichel (Long Island University) – AuthorClaim

  • 1. IR repository data in AuthorClaim Wolfram Horstmann1 Thomas Krichel2,3,4 1 Bielefeld University 2 Long Island University 3 Novosibirsk State University 4 Open Library Society Repository Fringe 2011–08–03
  • 2. thanks to the organizers to the BASE contributors Bernd Fehling, Marek Imialek, Mathias Loesch, Renata Mitrenga, Dirk Pieper, Jochen Schirrwagen, Friedrich Summann, Sebastian Wolf
  • 3. structure background 3lib Author claiming and AuthorClaim BASE and AuthorClaim and BASE
  • 4. background Wolfram is chief information officer (scholarly Information) at Bielefeld University. They run the BASE search engine since 2004 They are doing long-run work.
  • 5. background Thomas the founder of RePEc. Thomas started this in the early 90s. Thomas is doing long-run work.
  • 6. motivation Make (economics) papers freely available. Make information about the papers freely available. Have a self-sustaining infrastructure of this, don’t rely on external sources.
  • 7. RePEc RePEc is misunderstood as a repository. In fact it is a collection of 1300+ institutional (subject) repositories. pre-date OAI reduced business model more tightly interoperable
  • 8. RePEc sources of success There are a lot of sources of success. The reason can be classified business case technical matter both are linked
  • 9. RePEc business case RePEc tries to decentralize as much as we can. RePEc run essentially on volunteer power. RePEc encourage reuse of RePEc data.
  • 10. RePEc technical case RePEc registers authors with the RePEc Author Service (RAS). RePEc registers institutions. RePEc provides evaluative data for authors and institutions.
  • 11. RePEc and IRs RePEc is not a repository. RePEc is a bibliographic layer over repositories. IRs can/will benefit from a similar bibliographic layer.
  • 12. requirement for such a layer Not dependent on external funding. Freely reusable instantaneously. Must be there for the long-run.
  • 13. a RePEc for all disciplines RePEc bibliographic data → 3lib RePEc Author Service → AuthorClaim EDIRC → ARIW
  • 14. 3lib 3lib is an initial attempt at building an aggregate of freely available bibliographic data. It’s a project by OLS sponsored by OKFN. About 35 million records from the usual suspects: PubMed, OpenLibrary, DBLP, RePEc.
  • 15. 3lib elements The data elements in 3lib are very simple title author name expressions link to item page on provider site identifier 3lib is meant to serve AuthorClaim.
  • 16. AuthorClaim AuthorClaim is an authorship claiming service for 3lib data. It lives at http://authorclaim.org. It uses the same software as the RePEc Author Service, called ACIS. It is running since early 2008.
  • 17. author claiming history I Thomas started the first author claiming system, the RePEc author service in 1999. The system was written by Markus J.R. Klink.
  • 18. author claiming history II ISI created researcherID in 2006 (?) arXiv have an author claiming system since 2009. NIH and Google Scholar are working on it. The ORCID initiative is looking into author identification since 2009.
  • 19. claiming vs identification Author claiming records are NOT author identification records. The difference is called “Klink’s problem”. An person can claim to be an author of a paper. If there are several author, we don’t know what author (s)he is.
  • 20. Klink’s problem example Jane and John Smith write a paper. Author list say “J. Smith and J. Smith”
  • 22. an example id: pbi1 name variations: Geoffrey Bilder — G. Bilder — Bilder, G. isauthorof: info:lib/elis:856 hasnoconnectionto: info:lib/pubmed:11127885 — info:lib/pubmed:7482633
  • 23. more on the example The refused papers are there for services to build learning models for author names. Actually learning is an integral part of the way AuthorClaim works. Actually records also contain the 3lib data for papers. and they have ARIW-base affiliation data.
  • 24. IRs and author identification IRs are generally too large to author identification by IR staff. Only registration of contributors is usually required.
  • 25. IRs and author claiming IRs are too small to make it meaningful for authors to claim papers in them directly.
  • 26. benefits of author claiming to IR All papers by an author can be put together. The task can be completely automated once an AuthorClaim record claims a paper in the IR.
  • 27. to get it done From the four elements in a 3lib record only the link to a web page describing the item is problematic. But is cumbersome to customize to close to 2k IRs.
  • 28. partnership with BASE We need a centralized collection. BASE is already doing this job. BASE can deliver all data to AuthorClaim regularly.
  • 29. BASE aggregation services constant monitoring of OAI-PMH world configuration of harvesting for each new and erroneous repository metadata stores (raw and normalized)
  • 30. BASE normalization highly heterogeneous use of OAI-DC requires cleaning and enrichment, e.g. dc:type dc:date dc:language also enrichment with (missing) subject classifications
  • 31. BASE search services builds index (solr) end user interfaces (vufind) iPhone-App API for index usage by third parties (http or SOAP)
  • 32. BASE data services repository profile service (REST) raw metadata store access (http or SOAP) rsync for AuthorClaim
  • 33. BASE data in AuthorClaim selection of records that have required data author title link identifier incremental updates
  • 34. repository exclusion From BASE profile AuthorClaim discards some IRs that contain student work digitized old material link collections primary research data There are some minor manual exclusions.
  • 35. results so far 1930 repositories, 12740116 records. 534 records claimed. The documentation at http://wotan.liu.edu/base/ needs some debugging. The collection is not yet announced because it is being read.
  • 36. the end Contact whorstma@uni-bielefeld.de krichel@openlib.org for more information.