+




    Text to data
    MashCat 2012
    Ed Chamberlain
+
    Me
       Librarian (systems)

       Data ‘munger’

       Data consumer?
+
    The way it used to be …




                            Control over record
                             consumption
                            Control over record
                             environment
                            Control over technology
+
+
    Competition …


    No longer the single authority for content and description


    Commercial, social and academic discovery mechanisms


    Explosion of digital content


    Illusion of ‘all on the web’
+
    Fit for purpose?


       Studies into Google Generation /
        ‘Generation Y’ 1



       Cambridge Arcadia IRIS report 2009 2


           Preference for search engine over
            catalogue

           Online over in-building

           Trust tutors and peers over Librarian

           Still respect the library ‘brand’

    1) “The Google generation: the information behaviour of the researcher
       of the future”, Aslib Proceedings, V60, issue 4, 10.1108/00012530810887953

    2) Arcadia IRIS Project report -
       http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf
+
    Improve catalogues
       Keyword based discovery
        services



       New ways to exploit old data


           Relevancy ranking

           Rich faceting

           Greater linking

           Search is the new browse

           Repositories and archives


       Is the OPAC dead?
+
    Different but the same?

    Catalogue data is now:


        Consumed as keywords
         (not left anchored access
         points)
        Faceted (not browsed)
        Supplemented
        Transformed
        Merged
        Amalgamated
+
    Prepare for the future …

       ‘Use case you’ve not yet thought of’

       ‘Consumer as producer’

       ‘Pro-Am’

       ‘Free from silo’



       Developers as well as readers

       Preference for data over text
+
    [Diagram: ‘Library data’ and the groups around it]

       Our local catalogues
       Research group website
       Wikipedia
       Web start-ups
       National / international aggregations
       Joe Public
       Search engines
       Booksellers
       Other libraries
       Teenage software developer / hacker
+
    Libraries have a lot to offer

       Bibliographic data linked to
        many aspects of successful
        teaching and research
           Citation lists – measure output

           Shared bibliography – core of
            research group work

           Reading lists – backbone of
            undergraduate teaching

           High quality data needed for re-use

       Not all possible whilst data
        resides in the library ‘silo’
+

        'Open metadata creates the opportunity for
        enhancing impact through the release of
        descriptive data about library, archival and
        museum resources. It allows such data to be
        made freely available and innovatively
        reused to serve researchers, teachers,
        students, service providers and the wider
        community in the UK and internationally.'




                                          http://discovery.ac.uk
+
    Open data releases …
+
    But …

       Is Marc21 the right format for developers (or libraries)?



       Is it easy to convert into something more palatable?
+
    What can we do with an ISBN?

       Build Union catalogues

       Find existing or alternative records (copy catalogue)

       Find related works (xISBN, thingISBN)



       Match and mash with resources on the web:
           Images
           Reviews
           Citations and references
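
Matching records by ISBN across catalogues usually means comparing like with like first. The sketch below is one possible way to normalise a 10-character ISBN to its 13-digit form before matching. The function name `isbn10_to_isbn13` is hypothetical and not from the talk, and the input is assumed to be already separated from any qualifier text.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 10-character ISBN to its 13-digit form.
# Returns nothing (undef in scalar context) if the input is not shaped like an ISBN-10.
sub isbn10_to_isbn13 {
    my ($isbn10) = @_;
    (my $clean = uc $isbn10) =~ s/[-\s]//g;        # drop hyphens and spaces
    return unless $clean =~ /^[0-9]{9}[0-9X]$/;

    # Prefix 978, keep the first nine digits, then recompute the check digit.
    my @digits = split //, '978' . substr($clean, 0, 9);
    my $sum = 0;
    for my $i (0 .. $#digits) {
        $sum += $digits[$i] * ($i % 2 ? 3 : 1);    # weights alternate 1, 3
    }
    my $check = (10 - $sum % 10) % 10;
    return join('', @digits) . $check;
}

print isbn10_to_isbn13('0-306-40615-2'), "\n";     # prints 9780306406157
```

Two records that carry the same work as an ISBN-10 in one catalogue and an ISBN-13 in another can then be compared on the 13-digit string.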
+
    020 - ISBN

    What cataloguer record users want:    What data consumers want:

       Accuracy                          – Accuracy
       Contextualization                 – Contextualization
       Access point                      – Access point
       Something legible to read         – Reusability
                                          – Granularity
+
    So …

       Take ISBN from an 020$a
           my $isbn = $record->field('020')->as_string("a");
           0123456789(pbk)

       (pbk) ?

       Is it the same as (.pbk) I noticed earlier?

       I’m a developer – I can solve this …

       Regex /^([0-9]+)/ - just captures the leading digits …

       Oh hang on, don’t some ISBNs end in X?

       And all that information on hardback /paperback is lost …
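
One way round the problems above is to split the 020$a string into the number and its qualifier rather than throwing the qualifier away. This is a minimal sketch, not the approach used in the talk: the function name `parse_020a` and the `%BINDING` lookup table are assumptions for illustration, and real catalogue data will contain many more qualifier variants.

```perl
use strict;
use warnings;

# Assumed mapping from a few common qualifiers to a 'rel' value.
my %BINDING = (
    pbk       => 'paperback',
    paperback => 'paperback',
    hbk       => 'hardback',
    hardback  => 'hardback',
);

# Hypothetical helper: split an 020$a string into ISBN and binding.
sub parse_020a {
    my ($raw) = @_;
    # Leading ISBN of 10 or 13 characters (the last may be X),
    # optionally followed by a qualifier in parentheses.
    return unless $raw =~ /^\s*([0-9]{9}(?:[0-9]{3})?[0-9Xx])\b\s*(?:\(([^)]*)\))?/;
    my ($isbn, $qualifier) = (uc $1, $2);

    my %id = (id => $isbn, type => 'isbn');
    if (defined $qualifier) {
        (my $key = lc $qualifier) =~ s/[^a-z]//g;   # "(.pbk)" and "(pbk.)" both become "pbk"
        $id{rel} = $BINDING{$key} // $qualifier;    # keep the raw text if unrecognised
    }
    return \%id;
}

my $parsed = parse_020a('0123456789(pbk)');
print "$parsed->{id} $parsed->{rel}\n";             # prints 0123456789 paperback
```

The result has the same shape as the JSON identifier on the next slide, so the hardback / paperback information survives the conversion.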
+
    Non Marc …

       <identifier type="isbn" relation="hardback">0123456789x</identifier>



       "identifier": {"id": "0123456789", "type": "isbn", "rel": "hardback"}



       <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_100045>
        <http://purl.org/dc/terms/identifier> "urn:isbn:2853990060" .
       <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://purl.org/ontology/bibo/Book> .
+
    Advantages

       Self describing (if you read English)

       Granular

       Data NOT text for display (although this can be easily
        generated)
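
To illustrate the last point, display text can be rebuilt from the structured form with very little code. This is a sketch only: `identifier_display` is a hypothetical name, and the keys `id`, `type` and `rel` are taken from the JSON identifier on the previous slide.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a structured identifier back into display text.
sub identifier_display {
    my ($identifier) = @_;
    my $text = uc($identifier->{type}) . ' ' . $identifier->{id};
    $text .= " ($identifier->{rel})" if defined $identifier->{rel};
    return $text;
}

print identifier_display({ id => '0123456789', type => 'isbn', rel => 'hardback' }), "\n";
# prints: ISBN 0123456789 (hardback)
```

Going the other way, from display text back to data, is the hard direction, as the 020 example showed.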
+
    $100 …

        • 1001_ |a Greenwood, James, |d 1832-1929.
        • Greenwood, James, 1832-1929.


    "author" : [
            {
                "birthDate" : "1832",
                "firstname" : "James",
                "deathDate" : "1929",
                "name" : "Greenwood, James",
                "lastname" : "Greenwood"
            }
    ]
+
             # Assumes $record is a MARC::Record and %exportRecord is declared earlier.
             # trim() is a helper defined elsewhere in the script (not shown on the slide);
             # judging by the output above, it strips surrounding whitespace and trailing punctuation.
             my @exportAuthors = ();

             foreach my $eachAuthor ($record->field('100')) {
                   my %exportAuthor = ();
                   my $authorFull = trim($eachAuthor->subfield('a'));
                   $exportAuthor{'name'} = $authorFull;

                   # "Lastname, Firstname" is split on the comma
                   my @parsed_author = split(/,/, $authorFull);
                   $exportAuthor{'lastname'}  = trim($parsed_author[0]);
                   $exportAuthor{'firstname'} = trim($parsed_author[1]) if defined $parsed_author[1];

                   my $dates = $eachAuthor->subfield('d');

                   # The glorious 100$d disassembled ...
                   if ($dates) {
                         # First of all, skip ca. and fl., which aren't real birth or death dates
                         if ($dates =~ /\b(?:fl|ca)\./) {
                               # do nothing
                         }
                         # Otherwise, if the date contains a hyphen, assume a range;
                         # this also works for unterminated dates such as "1950-"
                         elsif ($dates =~ /-/) {
                               my @range = split(/-/, $dates);
                               $exportAuthor{'birthDate'} = trim($range[0]);
                               if ($range[1]) {
                                     $exportAuthor{'deathDate'} = trim($range[1]);
                               }
                         }
                         # No hyphen: assume a single date. Look for a definitive birth event marked 'b.' ...
                         elsif ($dates =~ /\bb\.\s*(.+)/) {
                               $exportAuthor{'birthDate'} = trim($1);
                         }
                         # ... or a definitive death event marked 'd.'
                         elsif ($dates =~ /\bd\.\s*(.+)/) {
                               $exportAuthor{'deathDate'} = trim($1);
                         }
                         # Final assumption: a single date with no hyphen and no marker. Assume it's a birth date?
                         else {
                               $exportAuthor{'birthDate'} = trim($dates);
                         }
                   }

                   # Assemble author object (push a reference so the hash is not flattened into the list)
                   push(@exportAuthors, \%exportAuthor);
             # End author loop
             }

             # Add list of authors to export object (as an array reference, not a count)
             $exportRecord{'author'} = \@exportAuthors if @exportAuthors;
+
    How is this being solved?

       Fix it at the source:
           RDA
           Marc transition initiative
           Other initiatives – BL, OCLC linked data releases
           ONIX
           MODS
+
    Pragmatism: the end of big
    standards
       Adoption of one new standard (or several) for its own sake is
        pointless

       Fit in around changing needs of libraries and systems

       Data needs to be flexible and re-purposable

       No standard to ‘rule them all’ in the post Marc21 world
+
    If we do nothing?


Editor's notes

  1. I’m trying to frame the next 40 minutes or so as a narrative.
  2. When attempting to guess where we are going, it helps to take a step back. To simplify things (a little), librarians and cataloguers used to have full control of their data and the way it was used (consumed):
     - We created it (or paid others to do so for us).
     - Our readers consumed it in our libraries, served via ledgers, card indexes and OPACs.
     - We had, and still have, policies and standards (AACR2, Marc21), procedures (LOC authority control), organisations (RLUK, OCLC) and technology (Z39.50, OPACs).
  3. The library is still in its bubble. Alternative discovery mechanisms and academic data and content sources suddenly existed alongside our sealed environment: all very heavily branded, very slick, constantly evolving. Some we pay for, some we contribute to, some we view as inferior competition, but they exist. All are legitimate means to discover bibliographic material of interest to the researcher or the scholar, and they act as a direct alternative to our traditional model. All have their own data environments, standards, procedures and protocols, not necessarily ours. In light of this I argue that we could no longer maintain the closed ecosystem. To argue otherwise has become a fallacy, even in the mighty libraries of Oxford and Cambridge with their world-class special collections.
  4. We slowly lost our place as the single prime authority for data, as commercial, social and academic discovery mechanisms gave our users other sources of information to turn to, and eventually for content as well. We also had to cope with a growth in digital content as publishing shifted to digital. (This took a while: journals came first, and they were only a small part of our business, as analytical cataloguing was not standard practice.) This is resulting in massive changes in metadata and discovery usage …
  5. In the new environment come new users, termed Generation Y. Generation Y, it is argued, have grown up and worked outside of our bubble all along, and are used to a very different mode of consumption for data and resources. They were born between 1984 and 1990, but I would argue the concept can be stretched much further back, probably to anyone who has studied science since the mid to late 1990s … The Cambridge Arcadia report (2009) found:
     - Preference for search engine over catalogue
     - Online over in-building
     - Trust peers over librarian
     - Still respect the library ‘brand’
     All of this has led to a direct and open questioning of the purpose of the academic library, never mind the public one.
  6. Keyword-based discovery services, rich faceting, greater linking, new ways to exploit old data. Is the OPAC dead? It is in your case, and I’m quite jealous … All of this is possible due to the richness of our data: our authority-controlled catalogue records generally work quite well in faceted environments, and we gain a competitive edge over folk whose data is not in such good shape. Catalogues are easier to pick up, easier to teach and provide a more cohesive experience, even if they don’t always work in the way we as librarians would like. Our data is still in use; it is valuable and relevant, partly as a result of these changes in interface. And I know this because when you launched Solo a couple of years ago, some of your undergrads became our postgrads and told us what they thought of our interfaces.
  7. Catalogue data now goes through several processes. The record you create is not always the record readers will see. The way it is searched and accessed has changed, yet we still build it with the same rules and container formats as we did 20 years ago.
  8. That gets us so far, but we need to move forward. One way to prepare is to open up: we need to share our raw data and make it easier for others to re-use. I would argue that each of these groups has as much right to our raw data as we do, and each would have different use cases for it. By and large, in the field of online services, I’m talking about software developers, but in many areas … Allow others to innovate on our data on our behalf, think of those use cases and explore them.
  9. And there is demand. This slide is based on the ideas of a certain Cambridge academic. Bibliographic data is linked to many aspects of teaching and research:
     - Citation lists: measure output
     - Shared bibliography: core of research group work
     - Reading lists: backbone of undergraduate teaching
     On quality of data: in terms of consistency, accuracy and form, our data is much easier to handle than that of museums and archives. All of this exists already, but not in an open, linked capacity that can be tied quickly and easily into other institutional and external services.
  10. This is recognised nationally by the JISC, who earlier this year launched the Discovery initiative. The Oxford Text Archive contributed a project, we did one with catalogue data, and they are funding some very exciting work …