Twenty Years of Metadata:
Lessons from the First Two Decades of the Web
   Stuart Weibel
   University of Tsukuba Visiting Scholar
   May 13, 2011

Outline
   The Context
   Dublin Core in the Metadata Matrix
   What we did right
   The major impediments
   A few words about models
   What about the future?
   Image: Carved figures (Morikawa Toen), Tokyo National Museum




The Context
When I started working at OCLC in 1985:
   I was 4 years away from my first email address
   A PC hard drive wasn’t large enough to store a single high resolution digital image (which was ok, because…)
   Cameras still used film
   Cell phones were suitcase-sized
   MARC Cataloging stood alone as the discovery tool for intellectual assets of libraries
   No end-user access to the global library catalogs
   Image: me… circa 1994

And now?
   A cell phone has more computing power than the Space Shuttle
   An iPod will hold WorldCat
   Bandwidth is more important than computing power
   The library is still mostly mired in MARC
   There are many metadata standards (mostly struggling for traction)
   People (mostly) find things with Google
   but…




Metadata is more than just search
   Metadata-dependent actions
      Describe
      Access
      Encode/Render
      Preserve
      Rights Management
      Administer
      “Bind” digital pages in digital books

50 years of Metadata
   (timeline; events listed in left-to-right order along the axis)
   MARC standards (library metadata)
   OCLC founded (shared library cataloging)
   ARPANET Operational - forerunner of the Internet
   Networking diffuses throughout academia
   my first email address
   The Web begins... FRBR work begins
   First Dublin Core Workshop
   DCMI established
   Google is founded
   First Dublin Core Conference (Tokyo)
   WorldCat introduced
   RDA introduced
   Axis: 1960s      1970s      1980s      1990s      2000s

The confusion: How bad is it?
   “This visual map of the metadata landscape is intended to assist planners with the selection and implementation of metadata standards.”
   http://www.dlib.indiana.edu/~jenlrile/metadatamap/

Jenn Riley’s Metadata Map
   105 standards
   30 most common across the top (3 predate the Web)
   some share common models… most do not
   much overlap
   many work together
   Who among us can choose rationally from the array of standards, platforms, technologies?
   Will the results have any reasonable expectation of interoperability?




The real world is not standards-centric

   Metadata-dependent actions               Standard
   Describe                                 MARC, DC, MODS, RDA, LCSH, MeSH….
   Access                                   HTTP, FTP….
   Encode/render                            RDF, media-type dependent (many)
   Preserve                                 PREMIS
   Rights Management                        CC licenses, eCommerce systems
   Administer                               METS, MARC….
   “Bind” digital pages in digital books    METS, eBook standards

   Information Entities (ex.)
      Agents (persons, corporate entities, devices)
      Events
      Time intervals or eras
      Concepts
      Collections
      Media-types
      Structured data type

The map is much more complicated
   “This visual map of the metadata landscape is intended to assist planners with the selection and implementation of metadata standards.”
   “selection and implementation of metadata standards requires a clear understanding of the information entities, the standards, and the functional requirements of the system under design”
   Image: Kyoto horizon from above the Tenryu-ji Temple




Dublin Core in the metadata matrix
   The first metadata standard for the Web
   General and cross-disciplinary
   Simple starting place, but extensible
   International and multilingual
   Consensus-driven (bottom-up, rather than top-down)
   Image: Jomon Pottery, Tokyo National Museum

Things we did right
   We didn’t call it ‘cataloging’ (Web, not libraries)
   A hybrid of technical engineering and social engineering
   International - Major events on 5 continents, element definitions in 20+ languages (maintained in Tsukuba)
   Separated syntax and semantics
   Built a community of practice
   About the right level of complexity for a core element set
   Image: Harajuku train station platform, Tokyo

Impediments that tripped us up
   Too many syntaxes to support (HTML, XML, RDF/XML)
   No common data model, but we tried hard: data model group, architecture group, abstract model, Singapore Framework...
   Without a data model, the story we told was not consistent: confusion resulted
   Without a data model, details of implementation become arbitrary (and less interoperable)
   Image: Netsuke, Tokyo National Museum

Data Modeling: what is it?
   An entity-relationship model defines the important concepts or things (entities), and the relationships among them
   A model is a model, not reality
   Designed to solve a problem, not to emulate the real world
   The complexity of the model should be mapped to the problem, not to reality
   Identifying the right level of abstraction is an art
   Image: Edo Museum




Data Modeling: why is it necessary?
   Without a shared understanding of the important entities, and the relationships among them, systems will not interoperate easily
   Cross-walks become necessary: clumsy, inaccurate, inefficient
   Image: Changing rail car ‘bogies’ on the China/Mongolia border

An example of modeling mismatch
   Citation information
      Date
      Title
      Author
      Affiliation
      Email address
   - Which of the attributes are Dublin Core?
   - Is “email address” an attribute of the resource, or the person?
   - Should there be a distinction between Title and Subtitle?
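
The "email address" question can be made concrete with a small sketch. It is an illustration only, not something from the slides: the names `flat_citation`, `Person`, `Resource`, and `flatten` are hypothetical, and the sample values are taken from this deck's own title slide and contact slide. Option A hangs every attribute off the resource; Option B treats the author as a separate entity that owns the affiliation and email address.

```python
from dataclasses import dataclass, field
from typing import List

# Option A: a flat record. "email" reads as a property of the resource itself.
flat_citation = {
    "title": "Twenty Years of Metadata",
    "date": "2011-05-13",
    "author": "Stuart Weibel",
    "affiliation": "University of Tsukuba",
    "email": "stuart.weibel@gmail.com",
}


# Option B: two entities. Affiliation and email belong to the person.
@dataclass
class Person:
    name: str
    affiliation: str
    email: str


@dataclass
class Resource:
    title: str
    date: str
    creators: List[Person] = field(default_factory=list)


def flatten(resource: Resource) -> dict:
    """Collapse the entity model into the flat shape.

    The flat shape has room for one author only, so this sketch arbitrarily
    keeps the first creator. It assumes at least one creator is present.
    """
    first = resource.creators[0]
    return {
        "title": resource.title,
        "date": resource.date,
        "author": first.name,
        "affiliation": first.affiliation,
        "email": first.email,
    }


talk = Resource(
    title="Twenty Years of Metadata",
    date="2011-05-13",
    creators=[
        Person("Stuart Weibel", "University of Tsukuba", "stuart.weibel@gmail.com")
    ],
)
print(flatten(talk) == flat_citation)
```

With a single author the two shapes carry the same information. With two authors, `flatten` has to drop one of them, which is the kind of arbitrary implementation decision that appears when systems do not share a model.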




Is Dublin Core well-matched to the problem of bibliographic description?
   It is too simple to capture the precision of detailed bibliographic description
   BUT… It is good enough for many purposes, including the description of most simple internet resources
   The trade-off between perfect matching of model and problem, and simplicity of use is always a compromise
   DC was intended for general resource description, not to replace MARC

The problem with models
   Matching the complexity of models to a diverse and evolving problem is challenging, and full of compromises
      too much complexity leads to failure (creeping elegance)
      too little complexity leads to failure (insufficient richness to solve the problem)
   HOW DO YOU KNOW WHEN IT IS RIGHT?
   Image: figures from a model in the Kyushu National Museum

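
The "too simple to capture the precision" point, and the earlier Title/Subtitle question, can be sketched as a toy crosswalk. This is an assumption-laden illustration, not an official mapping: the record is a simplified, hypothetical MARC-like dictionary keyed by (tag, subfield), and `CROSSWALK` and `to_simple_dc` are names invented here. It relies only on the facts that MARC 245 $a holds the title proper, 245 $b the remainder of the title, 100 $a a personal name, and 260 $c a publication date.

```python
# Hypothetical, simplified MARC-like record keyed by (tag, subfield).
marc_like = {
    ("245", "a"): "Twenty Years of Metadata",
    ("245", "b"): "Lessons from the First Two Decades of the Web",
    ("100", "a"): "Weibel, Stuart",
    ("260", "c"): "2011",
}

# Illustrative mapping onto simple Dublin Core element names.
# Both title subfields land on the single "title" element.
CROSSWALK = {
    ("245", "a"): "title",
    ("245", "b"): "title",
    ("100", "a"): "creator",
    ("260", "c"): "date",
}


def to_simple_dc(record: dict) -> dict:
    """Map a MARC-like record to simple Dublin Core, dropping unmapped fields."""
    dc = {}
    for key, value in record.items():
        element = CROSSWALK.get(key)
        if element is None:
            continue  # no target element: the information is lost
        dc.setdefault(element, []).append(value)
    # Simple DC has no subtitle element, so the title parts are joined.
    if "title" in dc:
        dc["title"] = [": ".join(dc["title"])]
    return dc


dc_record = to_simple_dc(marc_like)
print(dc_record["title"][0])
```

The joined title is perfectly usable for discovery, which is the "good enough for many purposes" half of the trade-off. The other half is that the title/subtitle boundary cannot be recovered from the result, so the crosswalk only works well in one direction.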
Conceptual Models in the Library World

   FRBR and FRAD                 The dominant models for bibliographic and authority data
   OAIS                          Reference model for an Open Archival Information System
   CIDOC CRM                     Conceptual Reference Model for cultural heritage documentation
   Dublin Core Abstract Model    Largely unintelligible data model for Dublin Core instance data
   Singapore Framework           A vague framework describing levels of metadata interoperability

The Next Chapters in the Web Metadata story...
   ...are being written in the W3C Incubator Group on Library Linked Data (http://www.w3.org/2005/Incubator/lld/)
   Many questions:
      Will the data be open?
      Who will maintain it?
      Is semantic web infrastructure stable?
      Can existing metadata be integrated seamlessly into the web?
      Can a model be agreed upon?
      Will we ever have interoperability across domain silos?
   Image: Stone Monk in the Nezu Museum Garden




stuart.weibel@gmail.com


http://weibel-lines.typepad.com


@stuartweibel on twitter


stuartweibel on Facebook

         all photographs by the author
                                                Image: Lantern overlooking the Irises in the Nezu Museum Garden

