The Future of Metadata Management & Making Library Collections Discoverable on the Web

  1. 1. The Future of Metadata Management & Making Library Collections Discoverable on the Web. Ted Fons, OCLC. The National Library Descriptors Conference Proposed Changes in the Development of Library Collections in the Era of the Semantic Web. Warsaw, 21-22 April 2015
  2. 2. Cataloging Harvesting Datamining The Future
  3. 3. Our Goal
  4. 4. The Goal: Connect the reader to content. S.R. Ranganathan: 1. Books are for use. 2. Every reader his [or her] book. 3. Every book its reader. 4. Save the time of the reader. 5. The library is a growing organism. Image credit: http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2009/3/23/1237806064989/Young-man-
  5. 5. Cataloging
  6. 6. How We Work Today Local Group Global We catalog: • Books • Music • Journal titles • Authorities
  7. 7. What is in a Global Discovery System Readers want: • eBooks • Articles • Unique content We catalog: • Books • Music • Journal titles • Authorities
  8. 8. Catalogue 2.0 by Karen Calhoun • “Metadata has changed as collections have changed. It remains important, but it comes in many forms and from many sources. The centrality of bibliographic control has been disrupted.” (p. 15) • “There is less need and place for traditional bibliographic control as a set of methods for providing [metadata] for discovery, access and management of the content of mainstream books and serials.” (p. 24)
  9. 9. Duplicating records for local purposes: “Ken Chad examines the distinction between redundant cataloging (re-editing records to suit local practices) and redundant catalogs [in the UK]. He enumerates the benefits of moving from … 160 standalone catalogues to a single shared catalogue at the network level for all of these libraries” Karen Calhoun in Catalogue 2.0
  10. 10. The world’s libraries. Connected. What is in a Global Discovery System Readers want: • eBooks • Articles • Unique content We catalog: • Books • Music • Journal titles • Authorities
  11. 11. The value of authorities. FRAD Tasks: • Find • Identify • Clarify • Contextualize http://www.ifla.org/publications/functional-requirements-for-authority-data
  12. 12. What is in a Global Discovery System So, what should we do? 1. Catalog unique materials 2. Create authorities 3. Use harvesting and data mining for everything else
  13. 13. Where? Local Group Global Catalog • Unique Materials • Create Authorities
  14. 14. Data Harvesting
  15. 15. What is in a Global Discovery System Readers want: • eBooks • Articles • Unique content We catalog: • Books • Music • Journal titles • Authorities
  16. 16. Local Group Global
  17. 17. Data Mining & The Web
  18. 18. Local Group Global
  19. 19. The Web of …: Documents, Active Documents, Discovery, Data, Knowledge. Libraries can connect to the web of knowledge
  20. 20. The Knowledge Graph: Documents, Entities. Libraries can connect to the web of knowledge. Libraries can create a knowledge graph
  21. 21. Establishing Semantic Identity for Accurate Representation on the Web. 12/09/2014. Kenning Arlitsch, Dean of the Library; Patrick OBrien, Semantic Web Research Director
  22. 22. The Point Libraries are poorly defined and represented on the Semantic Web… …but we know how to fix that problem… …mostly
  23. 23. Google’s Perception of MSU Lib - 2012
  24. 24. MSU Library - 2014
  25. 25. DBpedia entry - 2012
  26. 26. DBpedia entry - 2014
  27. 27. DBpedia entry - 2014; DBpedia entry - 2012
  28. 28. Summary • Define library organization in Wikipedia – Beware of *pedia culture and process • Engage with other trusted data sources – Freebase – Google Places/Google My Business – Google+ • Mark up metadata with Schema.org
  29. 29. The Knowledge Graph: Documents, Entities. Libraries can connect to the web of knowledge. Libraries can create a knowledge graph
  30. 30. The library knowledge graph: person, place, object, concept, organization, work; author, subject, item, availability. The solution starts here.
  31. 31. The library knowledge graph: person, place, object, concept, organization, work. FRBR Entities: • Work • Expression • Manifestation • Item http://www.ifla.org/publications/functional-requirements-for-bibliographic-records
  32. 32. Example of benefits: Discovery. The Name of the Rose. Summary: The year is 1327. Franciscans in a wealthy Italian abbey are suspected of heresy, and Brother William of Baskerville arrives to investigate. His delicate mission is suddenly overshadowed by seven bizarre deaths that take place in seven days and nights of apocalyptic terror. Subjects: Monastic libraries -- Italy -- Fiction | Semiotics -- Fiction. Borrowing Options: eBooks | Printed Books | Audio Books. Other Languages.
  33. 33. Example of Benefits: Web Exposure. data.BnF.fr. [Chart: Visits to WorldCat, number of visits per month from January to October, for 2012, 2013 and 2014]
  34. 34. What has OCLC done? How does data mining work? Photo credit: http://media02.hongkiat.com/freebies-for-web-designers-2011/progress-bar.jpg
  35. 35. Datamining. The Data Strategy: WorldCat Entities. Work and Person Creation Process Flow. There are three components to the pipeline for creating Work and Person entities. The harvest component extracts the data from the different sources. The map component identifies the objects and combines the triples through name recognition and authority linkages. The reduce component pulls together the entity descriptions and writes them out to HBase. [Diagram labels: sources (Enhanced WC Records, VIAF, LCNAF, DBpedia); 1. Harvest (Extractors, Harvested Triples); 2. Map (ObjectMapper, PersonCombine, WorkCombine, Refined Triples); 3. Reduce (CreateWorkReducer, CreatePersonReduc)]
  36. 36. The Work Entity • 197+ million Work descriptions and URIs • Schema.org + BiblioGraph.net • RDF Data formats: RDF/XML, Turtle, Triples, JSON-LD • Links to WorldCat manifestations • Links to Dewey, LCSH, LCNAF, VIAF, FAST • Open Data license via Linked Data Explorer • 2015: Discovery API, Metadata API • Released April 2014 http://www.oclc.org/data
  37. 37. The Person Entity • 98+ million Person descriptions and URIs • Person entities with authority: 20.2 million • Person entities without authority: 78.3 million • Schema.org + BiblioGraph.net • Harvested from WorldCat data and enriched from other hubs • RDF Data formats: RDF/XML, Turtle, Triples, JSON-LD • Links to WorldCat Works. Added links from WC Works. • Open Data license via Linked Data Explorer • 2015: Linked Data Explorer, Discovery API http://www.oclc.org/data
  38. 38. The library knowledge graph: person, place, object, concept, organization, work
  39. 39. Local Group Global Datamining Harvesting Cataloging
  40. 40. The Future
  41. 41. So, what should we do? 1. Catalog unique materials 2. Create authorities 3. Use harvesting and data mining for everything else
  42. 42. Cataloging Harvesting Datamining The Future
  43. 43. Discussion Ted Fons Executive Director, Data Services & WorldCat Quality fonst@oclc.org
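
The "Mark up metadata with Schema.org" step from the summary on slide 28 could look roughly like the sketch below. It is only an illustration, not the markup the presenters used: the library name, address, phone number and links are placeholder values, and the helper name library_jsonld is hypothetical. The Schema.org terms used (Library, PostalAddress, name, url, telephone, address, sameAs) are existing vocabulary terms.

```python
import json


def library_jsonld(name, url, telephone, street, city, region, postal_code, same_as):
    """Return a Schema.org description of a library as a JSON-LD string.

    All argument values are supplied by the caller; nothing here is taken
    from the slides except the idea of describing the library with Schema.org.
    """
    description = {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        # Links to other trusted descriptions of the same organization,
        # for example its Wikipedia article.
        "sameAs": list(same_as),
    }
    return json.dumps(description, indent=2, ensure_ascii=False)


# Placeholder values for illustration only.
example = library_jsonld(
    name="Example University Library",
    url="https://library.example.edu/",
    telephone="+1-555-0100",
    street="1 Campus Drive",
    city="Exampleville",
    region="MT",
    postal_code="00000",
    same_as=["https://en.wikipedia.org/wiki/Example_University_Library"],
)

# JSON-LD is usually embedded in the page inside a script element.
print('<script type="application/ld+json">')
print(example)
print("</script>")
```

Keeping the address, phone number and sameAs links in one machine-readable block gives search engines a single consistent statement of the library's identity, which is the problem the 2012 and 2014 screenshots illustrate.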

Editor's notes

  • The Web has evolved and continues to evolve:
    Linked documents - documents built on the fly from databases - search engines analyzing the links to create discovery - sites starting to publish the [linked] data behind the documents.
    How have libraries engaged with the web?
    Enthusiastic and leading for documents - actively disengaged from the search engines (technology issues and commercial concerns) - partial engagement with the web of data.
    A Web of knowledge is forming as the search engines analyze the relationships in the data. How will libraries participate?

  • Google's Knowledge Graph: navigating between entity descriptions…
  • 0 - Search for MSU in 2012
    1 - On the left, traditional organic search results
    2 - Notice the poor description of the MSU Library
    3 - Google’s Knowledge Card of the “Thing” they believe to be Montana State University Library
    4 - However, it has the wrong phone number, wrong city, wrong address and wrong map

  • 0 - After doing research in this area, we have concluded that a library must establish and maintain its semantic identity on the Web. This is the same search for Montana State University Library in 2014. Let's walk through the changes on this slide and then talk about how they came about in the rest of the presentation
    1 - Improved description of the library
    2 - Correct address, phone number and a Google map link
    3 - More links to key areas of our web site, as determined by Google’s algorithms
    4 - Link to more results from Montana.edu
    5 - Link to our G+ page with a picture of our building and the number of followers
    6 - The correct MSU Library logo
    7 - Link to a robust Wikipedia description of the MSU Library

  • At OCLC, we apply our large-scale data ingest and processing techniques (Big Data technology):
    Matching incoming data with what we have
    Identifying the entities and associating their role attributes
    Works: so far not very visible in libraries, but important on the web
    Building a graph of relationships
  • Data to underpin innovation! - A person knowledge card in a prototype WorldCat Discovery interface
  • Refined from 320M harvested entities.

    If there is a 100 or 700 field for a Person entity, then there will be a BY relationship (creator, contributor, author, illustrator, etc.) in the WC Work description that includes a WC Person URI.

    If there is a 600 field for a Person entity, then there will be an ABOUT relationship (subject, etc) in the WC Work description that includes a WC Person URI.

    Other sources:

    After creating the set of Person entities, we started the process of enriching the entities with data harvested from other sources: images and other information from DBpedia, preferred names from LC, “see also” links from VIAF, and profile information (subjects, genres, and roles most known for) from WC Identities.
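
The field-to-relationship rule in the notes above (100 or 700 gives BY, 600 gives ABOUT) can be sketched in a few lines. This is a simplified illustration, not OCLC's pipeline code: the function names, the (tag, person URI) input shape and the example.org URIs are all assumptions made for the example, and real records would first need name recognition and authority matching to arrive at a Person URI.

```python
from collections import defaultdict

# Rule stated in the notes: 100/700 fields yield a BY relationship,
# 600 fields yield an ABOUT relationship.
TAG_TO_RELATION = {"100": "BY", "700": "BY", "600": "ABOUT"}


def person_relations(work_uri, fields):
    """Map step: turn (tag, person_uri) pairs into (work, relation, person) triples.

    Fields whose tag is not covered by the rule are skipped.
    """
    triples = []
    for tag, person_uri in fields:
        relation = TAG_TO_RELATION.get(tag)
        if relation is not None:
            triples.append((work_uri, relation, person_uri))
    return triples


def group_by_person(triples):
    """Reduce step: collect, per person, the works they are linked to and how."""
    grouped = defaultdict(list)
    for work_uri, relation, person_uri in triples:
        grouped[person_uri].append((relation, work_uri))
    return dict(grouped)


# Hypothetical URIs for illustration only.
work = "http://example.org/work/1"
fields = [
    ("100", "http://example.org/person/author"),
    ("600", "http://example.org/person/subject"),
    ("245", "not a person field"),
]
triples = person_relations(work, fields)
people = group_by_person(triples)
print(people)
```

Grouping the triples by Person URI mirrors the reduce component described on slide 35, where the entity descriptions are pulled together before being written out.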
