Open Bibliography,
And why it shouldn't have to exist.
Ben O'Steen
“Mashspa” Mashed Libraries, Bath 29/10/2010
CC-By
Morning,
(don't worry, I'll be quick...)
Urgh, “Open”: what does that mean?
Publishing bibliographic information under a permissive license to encourage indexing, re-use, and re-purposing.
But... why?
In essence, an open
bibliography is all about
Advertising
Bibliographic info allows you to:
● Identify and find an item you know you want,
● Discover related items, or items you believe you want,
● Serendipitously discover items you would like without knowing they might exist,
● And so on.
Requires
Increasing
Investment!
Advertising 'proverb':
You never spend money on advertising; you invest with an expectation of a return on investment.
To maximise returns, you
maximise the audience.
Should the advertising target
'b2b' or 'consumers'?
I am not saying that either one is necessary...
But by not making bibliographic data open, you limit the audience.
(You also limit the data quality, but more on that
later.)
“Can't I just scrape sites and reuse the data anyway? It's just facts, after all...”
“Directive 96/9/EC of the European Parliament and
of the Council of 11 March 1996 on the legal
protection of databases”
http://is.gd/gqkqb
Databases have in the past been defended using copyright law.
This directive codifies a new form of protection based on “sui generis”* rights: rights earned by the “sweat of the brow”.
* http://en.wikipedia.org/wiki/Sui_generis
So far, no one seems to have any evidence that this encouraged database-based economies.
There is evidence that it 'awarded'
unending monopolies on existing
datasets.
Due to its fluffy wording, it is a time bomb.
It is a right, like copyright, that
doesn't need to be defended
and can be assumed for almost
any aggregation.
When we asked UK PubMed Central if we could reproduce the bibliographic data they share through their OAI-PMH service, they said “Generally, No”*.
(*My paraphrase: they had non-transferable licenses and contracts, yada yada. Their 'OA subset' of 1876 journals is available, however, mainly BMC.)
From OAI-PMH specification:
* Data Providers administer systems that support
the OAI-PMH as a means of exposing metadata; and
* Service Providers use metadata harvested via the
OAI-PMH as a basis for building value-added
services.
http://www.openarchives.org/OAI/openarchivesprotocol.html
“… Service Providers use metadata
harvested via the OAI-PMH as a basis
for building value-added
services.”
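As a rough illustration of the Service Provider side, the sketch below builds an OAI-PMH `ListRecords` request for unqualified Dublin Core (`oai_dc`) and pulls identifiers, titles and creators out of a response. The base URL and the sample response are hypothetical, and a real harvester would also need to follow `resumptionToken`s, respect rate limits and handle protocol errors. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and the oai_dc / Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the first-page ListRecords request (no resumptionToken handling)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_dc_records(xml_text: str) -> list:
    """Extract identifier, titles and creators from each oai_dc record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "identifier": identifier,
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
        })
    return records


# Hypothetical response containing one live record and one deleted record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:creator>Smith, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Whether a harvester may then republish what it parsed is exactly the licensing question above; the protocol itself says nothing about it.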
And the survey said...
X
Open Bibliographic principles
http://openbiblio.net/2010/10/15/principles-for-open-bibliographic-data/
1 - When publishing data, make an explicit and robust license statement.
2 - Use a recognized waiver or license that is appropriate for metadata.
3 - If you want your data to be effectively used and added to by others, it should be open … in particular, non-commercial and other restrictive clauses should not be used.
4 - We strongly recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.
5 - We strongly urge creators of bibliographic metadata to explicitly either dedicate it to the public domain or use an open licence.
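One way to act on principles 1 to 4 above is to attach an explicit license URI to every published dataset and check it before release. The sketch below assumes a simple dict-based dataset description; the `license` field name is a hypothetical choice, and the non-commercial test is a crude substring heuristic, not a license parser. The two URIs are the commonly published ones for PDDL 1.0 and CC0 1.0.

```python
# Waivers recommended in principle 4, keyed by their published URIs.
RECOMMENDED_WAIVERS = {
    "http://opendatacommons.org/licenses/pddl/1.0/": "PDDL",
    "http://creativecommons.org/publicdomain/zero/1.0/": "CC0",
}


def check_license(dataset: dict) -> list:
    """Return a list of problems with the dataset's license statement (empty if none)."""
    lic = dataset.get("license")  # hypothetical field name
    if not lic:
        return ["no explicit license statement (principle 1)"]
    problems = []
    lowered = lic.lower()
    # Crude heuristic: CC "by-nc" style URIs or the word "noncommercial".
    if "-nc" in lowered or "noncommercial" in lowered:
        problems.append("non-commercial clause (principle 3)")
    if lic not in RECOMMENDED_WAIVERS:
        problems.append("not placed in the public domain via PDDL or CC0 (principle 4)")
    return problems
```

For example, `check_license({"license": "http://creativecommons.org/publicdomain/zero/1.0/"})` reports no problems, while a missing `license` key is flagged under principle 1.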
Identify: Title, Date, any identifiers, Publisher, Container (e.g. Journal), Author names, etc.
Discover: Keywords, Abstract, Author identifiers, etc.
Serendipity: Citations, citing text, usage data, supplemental data, etc.
Bibliographic Sliding Scale
Identify → Discover → Serendipity
Increasing investment, BUT increased chance of usage.
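The Identify / Discover / Serendipity tiers can be treated as groups of fields, with a record "reaching" a tier when it carries at least one non-empty field from that group. The field names below are hypothetical simplifications of the lists above, not taken from any particular metadata standard.

```python
# Hypothetical field names for each tier of the sliding scale.
TIERS = {
    "identify": {"title", "date", "identifier", "publisher", "container", "authors"},
    "discover": {"keywords", "abstract", "author_identifiers"},
    "serendipity": {"citations", "citing_text", "usage", "supplemental"},
}


def tiers_covered(record: dict) -> list:
    """Return the tiers (in scale order) for which the record has a non-empty field."""
    present = {field for field, value in record.items() if value}
    return [tier for tier, fields in TIERS.items() if present & fields]
```

A record with a title and a citation list but an empty abstract would report `["identify", "serendipity"]`: cheap to identify, with some of the costlier serendipity data already invested in.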
“So, we just pick a standard and
publish and we'll reap all the
benefits, right?”
Erm, no.
For three main reasons.
#1 “Where there is human
input, there is
interpretation”
Meanings of words and
usage of fields have changed
over time.
#1 (cont.) Interchange
standards don't make the
information any more
understandable.
Someone interprets them.
#2 Data has been entered
and curated without large-
scale sharing as a focus.
Lots of implicit, contextual
info left out.
#3 Data quality is typically
poor with formally closed
datasets.
For #1 - Collisions caused by
interpretation can really only be
solved by sharing data and seeing
how bad things are.
Standards and interoperability:
“The first follower transforms a
lone nut into a leader” -
Derek Sivers' TED Talk
http://www.ted.com/talks/lang/eng/derek_sivers_how_to_start_a_movement.html
Video:
http://www.youtube.com/watch?v=GA8z7f7a2Pk
The man dancing is joined by one or two, but he is
still doing his own thing.
Eventually a group decides to join him, and the
group grows.
The quality of the dance isn't important, but the
community dancing along with it is.
And so it is with standards.
For #2 (implicit info), provenance and the source of data give us crucial clues.
Due to #1, I remain unconvinced
that this information can ever be
totally machine-readable.
And for #3, misleading or incorrect
data...
… um.
No easy answers – we just don't have
the info.
The data clean-up process is going to be probabilistic.
(We cannot be sure, by definition, that we are 'accurate' when we de-duplicate or disambiguate.)
Typical methods then:
Natural Language Processing,
Machine learning techniques
and
String Metrics and old skool record deduplication
I <3 String Metrics and old
skool record deduplication
(out of the 3)
http://staffwww.dcs.shef.ac.uk/people/S.Chapman/stringmetrics.html
http://is.gd/gqOjQ
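The page linked above catalogues many string metrics; Levenshtein (edit) distance is one of the simplest. The sketch below computes it with a rolling row and turns it into a 0 to 1 similarity. The normalisation (case-folding, collapsing whitespace, dividing by the longer length) is an assumption made for illustration, not something the talk prescribes.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1] after case-folding and whitespace collapsing."""
    a = " ".join(a.casefold().split())
    b = " ".join(b.casefold().split())
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

For example, `similarity("O'Steen, B.", "OSteen, B.")` is about 0.91 (one deletion over eleven characters), which is the kind of near-miss an author-name comparison needs to catch.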
Old skool record linkage:
“Fellegi-Sunter” probabilistic record linkage (PRL).
It's not a great model, but it's achievable.
Machine learning requires a reasonably large 'golden' (gold-standard) set.
(http://en.wikipedia.org/wiki/Record_linkage)
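In the Fellegi-Sunter model each compared field contributes a log-likelihood weight: log2(m/u) when the field agrees and log2((1 - m)/(1 - u)) when it disagrees, where m is the probability that the field agrees for a true match and u the probability that it agrees for a non-match. The summed weight is compared against an upper and a lower threshold. The m/u values, fields and thresholds below are made-up illustrations; in practice they are estimated from the data (commonly with EM), and exact equality would usually be replaced by a string metric.

```python
import math

# Hypothetical per-field (m, u) probabilities.
FIELD_PARAMS = {
    "title": (0.95, 0.01),
    "year": (0.90, 0.10),
    "first_author": (0.85, 0.05),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum the field weights; exact equality stands in for a real comparator."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        value = rec_a.get(field)
        agrees = value is not None and value == rec_b.get(field)
        total += field_weight(agrees, m, u)
    return total


def classify(score: float, upper: float = 6.0, lower: float = 0.0) -> str:
    """Two thresholds split pairs into three bands (threshold values are hypothetical)."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"  # the band left for clerical review, i.e. compared by eye
```

With these numbers, two records agreeing only on title score roughly 6.57 - 3.17 - 2.66 ≈ 0.74, landing in the "possible match" band rather than being silently merged or discarded.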
PRL is not great in itself, BUT it does lend itself to Map-Reduce-style operations, and it's a great way to filter down to those records that really do need to be compared by eye.
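Blocking is one common way to give record linkage a Map-Reduce shape: a map step emits a cheap blocking key per record, the shuffle groups records by key, and a reduce step compares only the pairs inside each block. The sketch below simulates the three phases in-process rather than on a real Map-Reduce framework, and the blocking key (first four letters of the normalised title plus the year) is a hypothetical choice.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Hypothetical key: first 4 alphanumeric chars of the title, plus the year."""
    title = "".join(ch for ch in record["title"].casefold() if ch.isalnum())
    return f"{title[:4]}|{record.get('year', '')}"


def map_phase(records):
    """Map: emit (key, record) for every input record."""
    for rec in records:
        yield blocking_key(rec), rec


def shuffle(pairs):
    """Shuffle: group records that share a blocking key."""
    groups = defaultdict(list)
    for key, rec in pairs:
        groups[key].append(rec)
    return groups


def reduce_phase(groups):
    """Reduce: emit candidate pairs only within a block; cross-block pairs are skipped."""
    for recs in groups.values():
        for a, b in combinations(recs, 2):
            yield a["id"], b["id"]


def candidate_pairs(records):
    return sorted(reduce_phase(shuffle(map_phase(records))))


RECORDS = [
    {"id": 1, "title": "Open Bibliography", "year": 2010},
    {"id": 2, "title": "Open bibliography.", "year": 2010},
    {"id": 3, "title": "Mashed Libraries", "year": 2010},
    {"id": 4, "title": "Open Bibliography", "year": 2011},
]
```

Here four records would give six pairs if compared naively, but only the pair (1, 2) shares a block. The trade-off is that a badly chosen key (record 4's differing year, say) hides true matches, so several keys are often run in parallel.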
http://datamining.anu.edu.au/projects/linkage.html
“Record or data linkage techniques are used to link
together records which relate to the same entity (e.g.
patient, customer, household) in one or more data
sets where a unique identifier for each entity is not
available in all or any of the data sets to be linked.”
ANU's Febrl python code
So far, much effort has been directed at the Works; we need to put much more effort into their Networks.
Bibliographic directions
Networks?
● A cites B
● Works by a given (identified) Author
● Works cited by a given Author
● Works citing articles that have since been disproved, retracted or withdrawn.
● Co-authors
● And many more connections we've not even considered yet ('betweenness', 'centrality', etc.)
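A minimal in-memory sketch of a few of these connections, using plain dictionaries rather than any particular graph store. The works, authors, citation edges and the retracted set are all hypothetical, and author names stand in for the author identifiers the talk calls for. Times-cited (in-degree) is used as the simplest stand-in for the centrality measures mentioned; it is not betweenness.

```python
from collections import Counter

# Hypothetical data: work id -> author names, and (citing, cited) edges.
AUTHORS = {
    "A": ["Smith", "Jones"],
    "B": ["Jones"],
    "C": ["Brown", "Smith"],
    "D": ["Brown"],
}
CITES = [("A", "B"), ("C", "B"), ("C", "A"), ("D", "C")]  # "A cites B", etc.
RETRACTED = {"B"}


def works_by(author):
    """Works by a given author."""
    return sorted(w for w, names in AUTHORS.items() if author in names)


def works_cited_by(author):
    """Works cited by any work of the given author."""
    mine = set(works_by(author))
    return sorted({cited for citing, cited in CITES if citing in mine})


def citing_retracted():
    """Works citing something in the retracted set."""
    return sorted({citing for citing, cited in CITES if cited in RETRACTED})


def co_authors(author):
    """Everyone who shares at least one work with the author."""
    names = {n for group in AUTHORS.values() if author in group for n in group}
    return sorted(names - {author})


def times_cited():
    """In-degree of each work in the citation graph."""
    return Counter(cited for _, cited in CITES)
```

For instance, `citing_retracted()` returns `["A", "C"]`: exactly the kind of list that only exists once citations are shared as data.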
In Summary,
● Accessible Bibliography as Advertising.
● Bibliography authors choose how they wish to invest to gain usage
and real impact.
● Closed data has a much slimmer chance of increasing in quality
● Open data makes it easier to find problems and to improve the data
● Benefits will come from developing networks of information
● Don't get hung up on standards! A lone nut with followers doing
something copyable is enough!

Exploring Multimodal Embeddings with Milvus
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 

Mashspa

  • 1. Open Bibliography, And why it shouldn't have to exist. Ben O'Steen “Mashspa” Mashed Libraries, Bath 29/10/2010 CC-By
  • 3. Urgh, “Open” - what does that mean?
  • 4. Publishing bibliographic information under a permissive license to encourage indexing, re-use, and re-purposing.
  • 6. In essence, an open bibliography is all about Advertising
  • 7. Bibliographic info allows you to ● Identify and find an item you know you want
  • 8. Bibliographic info allows you to ● Identify and find an item you know you want, ● Discover related items or items you believe you want
  • 9. Bibliographic info allows you to ● Identify and find an item you know you want, ● Discover related items or items you believe you want ● Serendipitously discover items you would like without knowing they might exist ● And so on.
  • 10. Bibliographic info allows you to ● Identify and find an item you know you want, ● Discover related items or items you believe you want ● Serendipitously discover items you would like without knowing they might exist ● And so on. Requires Increasing Investment!
  • 11. Advertising 'proverb' You never spend money on advertising; you invest with an expectation of return on investment
  • 12. To maximise returns, you maximise the audience.
  • 13. Should the advertising target 'b2b' or 'consumers'?
  • 14. One thing I am not saying must be necessary...
  • 16. But, by not making bibliographic data open, you limit the audience. (You also limit the data quality, but more on that later.)
  • 17. “Can't I just scrape sites and reuse it anyway? It's just facts after all...”
  • 18. “Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases” http://is.gd/gqkqb
  • 19. Databases have in the past been defended using Copyright laws. This new law codifies a new protection based on “sui generis”* rights, rights earned by the “sweat of the brow” * http://en.wikipedia.org/wiki/Sui_generis
  • 20. So far, no one seems to have any evidence that this encouraged database-based economies. There is evidence that it 'awarded' unending monopolies on existing datasets.
  • 21. Due to fluffy wording, it is a timebomb. It is a right, like copyright, that doesn't need to be defended and can be assumed for almost any aggregation.
  • 22. When we asked UK PubMed Central whether we could reproduce the bibliographic data they share through their OAI-PMH service, they said “Generally, no.”* (*My paraphrase: they had non-transferable licenses and contracts, yada yada. Their 'OA subset' of 1876 journals is available, however, mainly BMC.)
  • 23. From the OAI-PMH specification: * Data Providers administer systems that support the OAI-PMH as a means of exposing metadata; and * Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services. http://www.openarchives.org/OAI/openarchivesprotocol.html
  • 24. “… Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services.” And the survey said...
  • 25. X
  • 27. 1 - When publishing data, make an explicit and robust license statement.
  • 28. 2 -Use a recognized waiver or license that is appropriate for metadata.
  • 29. 3 - If you want your data to be effectively used and added to by others it should be open … – in particular non-commercial and other restrictive clauses should not be used.
  • 30. 4 - We strongly recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.
  • 31. 5 – We strongly urge creators of bibliographic metadata to either explicitly dedicate it to the public domain or use an open licence.
  • 32. Bibliographic Sliding Scale
      Identify: Title, Date, any identifiers, Publisher, Container (e.g. Journal), Author names, etc.
      Discover: Keywords, Abstract, Author identifiers, etc.
      Serendipity: Citations, citing text, usage data, supplemental data, etc.
  • 34. “So, we just pick a standard and publish and we'll reap all the benefits, right?”
  • 35. Erm, no. For three main reasons.
  • 36. #1 “Where there is human input, there is interpretation” Meanings of words and usage of fields have changed over time.
  • 37. #1 (cont.) Interchange standards don't make the information any more understandable. Someone interprets them.
  • 38. #2 Data has been entered and curated without large-scale sharing as a focus. Lots of implicit, contextual info left out.
  • 39. #3 Data quality is typically poor with formerly closed datasets.
  • 41. For #1 - Collisions caused by interpretation can really only be solved by sharing data and seeing how bad things are.
  • 42. Standards and interoperability: “The first follower transforms a lone nut into a leader” - Derek Sivers' TED Talk http://www.ted.com/talks/lang/eng/derek_sivers_how_to_start_a_movement.html
  • 43. Video: http://www.youtube.com/watch?v=GA8z7f7a2Pk The man dancing is joined by one or two, but he is still doing his own thing. Eventually a group decides to join him, and the group grows. The quality of the dance isn't important, but the community dancing along with it is. And so it is with standards.
  • 44. For #2 (implicit info), provenance and the source of data gives us crucial clues. Due to #1, I remain unconvinced that this information can ever be totally machine-readable.
  • 45. And for #3, misleading or incorrect data... … um. No easy answers – we just don't have the info.
  • 46. The data clean-up process is going to be probabilistic. (We cannot be sure, by definition, that we are 'accurate' when we de-duplicate or disambiguate.)
  • 47. Typical methods, then: Natural Language Processing; machine-learning techniques; and string metrics with old skool record deduplication.
  • 48. I <3 String Metrics and old skool record deduplication (out of the 3)
  • 51. Old skool record linkage: “Fellegi-Sunter” - probabilistic record linkage (PRL). It's not a great model, but it's achievable. Machine learning requires a reasonably large golden set. (http://en.wikipedia.org/wiki/Record_linkage)
  • 52. PRL is not great in itself, BUT It does lend itself to Map-Reduce style operations And It's a great way to filter down to those records that really do need to be compared by eye.
  • 53. http://datamining.anu.edu.au/projects/linkage.html “Record or data linkage techniques are used to link together records which relate to the same entity (e.g. patient, customer, household) in one or more data sets where a unique identifier for each entity is not available in all or any of the data sets to be linked.” ANU's Febrl Python code
  • 54. Bibliographic directions: So far, much effort has been directed at the Works; we need to put much more effort into their Networks.
  • 57. Networks? ● A cites B ● Works by a given (identified) Author ● Works cited by a given Author ● Works citing articles that have since been disproved, retracted or withdrawn ● Co-authors ● And many more connections we've not even considered yet ('betweenness', 'centrality', etc.)
  • 58. In Summary, ● Accessible Bibliography as Advertising. ● Bibliography authors choose how they wish to invest to gain usage and real impact. ● Closed data has a much slimmer chance of increasing in quality ● Open data makes it easier to find problems and to improve the data ● Benefits will come from developing networks of information ● Don't get hung up on standards! A lone nut with followers doing something copyable is enough!
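Slides 23 and 24 rest on the OAI-PMH model of Data Providers exposing metadata and Service Providers harvesting it. As a rough sketch of the harvesting side, the Python below builds a `ListRecords` request for unqualified Dublin Core (`metadataPrefix=oai_dc`) and pulls identifiers, Dublin Core fields and the `resumptionToken` out of one response page. The verb, argument names and namespaces follow the OAI-PMH 2.0 specification; the function names and the output shape are assumptions for illustration, and a real harvester would also need error handling and rate limiting.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces from the OAI-PMH 2.0, oai_dc and Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    resumptionToken is an exclusive argument in OAI-PMH: when present,
    only the verb accompanies it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc_fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry a header but no metadata
            for el in dc:
                name = el.tag.split("}", 1)[-1]  # "{ns}title" -> "title"
                dc_fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"oai_identifier": oai_id, "dc": dc_fields})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None  # an empty token marks the last page
```

Fetching each page (for example with `urllib.request.urlopen`) and looping until the returned token is `None` gives a complete harvest. Whether the harvested records may then be reused is exactly the licensing question slides 22 to 24 raise.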
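The Bibliographic Sliding Scale on slide 32 can be read as a record type whose fields fall into three tiers. The sketch below is one possible Python modelling of that idea. The field names, the tier mapping and the `tiers_present` helper are all hypothetical and not taken from any standard, and "supplemental data" is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class BibRecord:
    # Identify tier (hypothetical field names)
    title: str = ""
    date: str = ""
    identifiers: list = field(default_factory=list)
    publisher: str = ""
    container: str = ""  # e.g. the journal an article appears in
    author_names: list = field(default_factory=list)
    # Discover tier
    keywords: list = field(default_factory=list)
    abstract: str = ""
    author_ids: list = field(default_factory=list)
    # Serendipity tier
    citations: list = field(default_factory=list)
    citing_text: list = field(default_factory=list)
    usage_count: int = 0


# Which fields belong to which tier of the sliding scale.
TIERS = {
    "identify": ("title", "date", "identifiers", "publisher", "container", "author_names"),
    "discover": ("keywords", "abstract", "author_ids"),
    "serendipity": ("citations", "citing_text", "usage_count"),
}


def tiers_present(record: BibRecord) -> list:
    """Return the tiers for which the record holds at least one non-empty field."""
    return [tier for tier, names in TIERS.items()
            if any(getattr(record, name) for name in names)]
```

A record carrying only a title and date reaches the "identify" tier; each further tier needs more investment, which is the slide's point.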
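Slide 48 favours string metrics without naming one. Levenshtein edit distance is a common choice, so the sketch below uses it (an assumption; Jaro-Winkler or token-based measures are equally plausible) to score two titles after a crude, hypothetical normalisation step that lower-cases text and strips punctuation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def normalise(title: str) -> str:
    """Hypothetical normalisation: lower-case, punctuation to spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title.lower())
    return " ".join(kept.split())


def title_similarity(t1: str, t2: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer normalised title."""
    a, b = normalise(t1), normalise(t2)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Normalising first means trivial differences such as capitalisation and trailing punctuation cost nothing, so the distance reflects only real textual variation.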
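The Fellegi-Sunter model on slide 51 scores a candidate pair by summing, over the compared fields, log-likelihood ratios built from an m-probability (the field agrees given a true match) and a u-probability (the field agrees given a non-match), then applying two thresholds. The Python below is a minimal sketch of that decision rule. It assumes field agreement has already been decided (for instance with a string-metric cut-off) and that fields are conditionally independent. The m/u values and thresholds are invented for illustration; in practice they are estimated from the data or from a labelled sample.

```python
import math

# Hypothetical (m, u) probabilities per field, for illustration only:
# m = P(field agrees | records match), u = P(field agrees | records do not match).
FIELD_PROBS = {
    "title": (0.95, 0.01),
    "year": (0.90, 0.10),
    "surname": (0.92, 0.05),
}


def match_weight(agreements: dict) -> float:
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for name, agrees in agreements.items():
        m, u = FIELD_PROBS[name]
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 8.0) -> str:
    """Two-threshold decision rule; the default thresholds are arbitrary."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"
```

The middle band ("clerical review") corresponds to the records slide 52 says really do need to be compared by eye.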
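Slide 52 says probabilistic record linkage lends itself to Map-Reduce style operations without saying how. One common approach, assumed here, is blocking: the map step emits a cheap key per record, the shuffle groups records sharing a key, and the reduce step only generates candidate pairs inside each block, so the expensive comparison never sees most of the n(n-1)/2 possible pairs. The key function (first four letters of the title plus the year) is a hypothetical choice, and both phases run in a single Python process rather than on a real Map-Reduce cluster.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Hypothetical key: first four alphanumeric characters of the title plus the year."""
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    return f"{title[:4]}|{record.get('year', '')}"


def map_phase(records):
    """Map: emit (key, record id) for every record."""
    for rec in records:
        yield blocking_key(rec), rec["id"]


def reduce_phase(pairs):
    """Shuffle + reduce: group ids by key, then emit candidate pairs within each block."""
    blocks = defaultdict(list)
    for key, rec_id in pairs:
        blocks[key].append(rec_id)
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)
```

A poorly chosen key can split true duplicates into different blocks (a typo in the first few letters of a title, say), so blocking trades some recall for speed; the surviving pairs are then scored with a model such as Fellegi-Sunter.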
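Several of the network questions on slide 57 reduce to simple graph operations once citations and authorship are held as edges. The sketch below uses a hypothetical toy dataset (works A to E, invented author names, and B marked as withdrawn) to answer three of them in plain Python: which works cite a withdrawn article, which author pairs have co-authored, and how often each work is cited (in-degree, the simplest centrality measure).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (citing work, cited work) edges, author lists, withdrawn works.
CITES = [("A", "B"), ("C", "B"), ("C", "D"), ("E", "A")]
AUTHORS = {"A": ["Smith"], "B": ["Jones", "Lee"], "C": ["Smith", "Lee"],
           "D": ["Khan"], "E": ["Lee"]}
WITHDRAWN = {"B"}


def cited_by(edges):
    """Invert the edge list: cited work -> set of works citing it."""
    index = defaultdict(set)
    for citing, cited in edges:
        index[cited].add(citing)
    return index


def works_citing_withdrawn(edges, withdrawn):
    """Works that directly cite any withdrawn work."""
    index = cited_by(edges)
    return set().union(*(index[w] for w in withdrawn))


def coauthor_pairs(authors):
    """Every unordered pair of authors who share at least one work."""
    pairs = set()
    for names in authors.values():
        pairs.update(combinations(sorted(set(names)), 2))
    return pairs


def citation_counts(edges):
    """In-degree of each cited work."""
    return {work: len(citers) for work, citers in cited_by(edges).items()}
```

Measures such as betweenness centrality need more machinery; graph libraries such as NetworkX provide them (for example `networkx.betweenness_centrality`).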