This presentation, invited for a workshop on Open Access and Scholarly Books (sponsored by the Berkman Center and Knowledge Unlatched), provides a very brief overview of metadata design principles, approaches to evaluation metrics, and some relevant standards and exemplars in scholarly publishing. It is intended to provoke discussion on approaches to evaluation of the use, characteristics, and value of OA publications.
1. Prepared for
Open Access and Scholarly Books
Berkman Center/Knowledge Unlatched
June 2013
Metadata and Metrics to Support Open
Access Monographs
Dr. Micah Altman
<escience@mit.edu>
Director of Research, MIT Libraries
2. DISCLAIMER
These opinions are my own; they are not the opinions
of MIT, Brookings, any of the project funders, or (with
the exception of co-authored, previously published
work) my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about the
future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill,
Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi,
Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle,
George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White,
etc.
3. Related Work
• Altman (2012), “Mitigating Threats to Data Quality Throughout the
Curation Lifecycle,” pp. 1-119, in Curating for Quality.
• CODATA-ICSTI Task Group on Data Citation Standards and Practices,
(Forthcoming 2013), Citation of Data: The Current State of Practice, Policy,
and Technology, CODATA.
• National Digital Stewardship Alliance, (Forthcoming 2013), National
Agenda for Digital Stewardship.
• Uhlir (ed.) (2012), Developing Data Attribution and Citation Practices
and Standards: Report from an International Workshop. National
Academies Press.
Most reprints available from:
informatics.mit.edu
4. The Next 10 Minutes
• Level setting
• Starting the discussion questions
5. Preview: Some Discussion Questions
• Successful examples/exemplars:
– existing metadata and effective uses of it with books?
– graceful degradation, increasing returns, etc.?
• Emerging requirements:
– Explicit metadata (or identifier, integration, etc.) requirements from
stakeholders?
– In what ways do these explicitly support use, evaluation, and integration?
– Clear implicit requirements? … Licensing (CC-BY, CC0)? Identifier
schemes (ISBN, DOI)? Indexing integration requirements?
– What evidence could you envision showing your stakeholders to
demonstrate success?
• Opportunities
– ‘Easy pickings’ – metadata already produced in production,
dissemination, use, but not retained?
– “‘Looks-easy’ pickings” – opportunities for automated extraction; crowd-
sourced entry and refinement?
– Leverage points – e.g. where can effort applied to prime the pump,
coordinate practice, or build infrastructure yield network effects, lower
barriers to entry, create norms/nudges, or coordinating equilibria that
generate incentives to continue production?
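Among the identifier schemes mentioned above, ISBN-13 carries a built-in integrity check: its digits, weighted alternately by 1 and 3, must sum to a multiple of 10. A minimal validation sketch (illustrative, not tied to any particular toolchain):

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Validate an ISBN-13 check digit.

    Digits are weighted 1, 3, 1, 3, ...; the weighted sum of all 13
    digits (including the check digit) must be divisible by 10.
    """
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    return sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits)) % 10 == 0

print(is_valid_isbn13("978-0-306-40615-7"))  # True (a well-known valid example)
print(is_valid_isbn13("978-0-306-40615-8"))  # False (check digit corrupted)
```

The same weighted-checksum idea is what lets downstream systems (aggregators, indexers) catch transcription errors in book identifiers before they pollute usage data.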
6. What is metadata anyway?
(a) “data about data”
(b) something the NSA
wants a lot of
(c) magic pixie dust
(d) digital breadcrumbs
(e) all of the above
Source:
http://www.guardian.co.uk/technology/interactive/2013/jun/12/what-is-metadata-nsa-surveillance#meta=0000000
7. What good is it?
• Support decision & workflow for production
• Add value to product
– Support discovery -- descriptive information
– Support use – re-presentation, navigation
– Support reuse/integration – descriptive, structural,
provenance
• Grow the evidence base regarding OA books
– characteristics of production, products, and use
– E.g., costs, content features, authors, quality
• Support evaluation
8. Selected Characteristics
• Purpose
– Descriptive
– Structural
– Administrative
• Identification
• Rights
• Provenance
• Fixity
• Preservation
– Linkages/relationships
– Annotation
• Granularity
• Association Model
– Embedded
– Associated
– Third party
• Schema
– Mandatory elements
– Structure
• Ontology
– Semantics
– Relationships among
elements and concepts
9. Design Heuristics
• Dublin Core Design
Principles
[Duval, et al. 2002]
– Modularity
– Extensibility
– Capacity for refinement
– Multilingual
• Early capture
• Automated extraction
• Approaching richness
– Progressive enhancement
– Graceful degradation
– Increasing returns to
investment
– requirement -> barrier
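The Dublin Core principles above (modularity, extensibility, capacity for refinement, graceful degradation) can be illustrated with a toy record. The element names follow the real DC/DCTerms vocabularies, but every value below is an invented placeholder:

```python
# A hypothetical Dublin Core record for an OA monograph, expressed as a
# simple element -> value mapping. All values are invented placeholders.
record = {
    "dc:title": "An Open Access Monograph",
    "dc:creator": "Author, Example",
    "dc:publisher": "Example University Press",
    "dc:date": "2013",
    "dc:identifier": "urn:isbn:978-0-000-00000-0",  # placeholder ISBN
    "dc:rights": "http://creativecommons.org/licenses/by-sa/3.0/us/",
    "dc:language": "en",
}

# Extensibility / graceful degradation: a refinement from the dcterms
# namespace can be added without breaking a consumer that only knows the
# fifteen core elements -- unknown elements are simply ignored.
record["dcterms:dateAccepted"] = "2013-06-01"

core_view = {k: v for k, v in record.items() if k.startswith("dc:")}
```

The design point is that richer metadata is additive: a minimal consumer still gets a usable record, while a richer one can exploit the refinements.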
10. Evaluation
• Measurement characteristics
– Scope: Local measures vs. Ego-centric vs.
Global
– Duration: Point in time vs. period vs. trend
– Measurement Scale: Absolute vs. proportion
vs. rank vs. pairwise comparisons vs. purely
descriptive (e.g. usage stories)
• Inputs
– Content
– Associated meta-information
– External behaviors, actions (awards),
reputation
• Use characteristics:
– … understandability (cognitive burden) of
metrics
– … dissemination and adoption strategy
– … incentives to behave strategically to affect measures
• Some emerging
approaches:
– Proxies for interest
(citation counts)
– Proxies for use
(downloads, reading
patterns, annotation
patterns, data citations)
– Proxies for (predictive)
value
(journal impact metrics,
h(g,i)-indices, PageRank,
Google rank, models of
network evolution)
[See Borner, et al. 2004; Kurtz & Bollen 2010;
Bollen et. al 2009; Uhlir 2012; CODATA-ICSTI Task
Group on Data Citation Standards and
Practices… 2013]
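Of the value proxies listed above, the h-index is simple enough to sketch directly: h is the largest n such that n works have at least n citations each. A toy implementation, not tied to any particular data source:

```python
def h_index(citations: list[int]) -> int:
    """h = largest n such that n works have at least n citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank  # at least `rank` works with >= `rank` citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
```

Note how coarse the proxy is: an author with citation counts [25, 8, 5, 3, 3] and one with [4, 4, 4, 4] both summarize to similar single numbers, which is one reason the slide pairs such metrics with usage stories and other descriptive measures.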
11. Ecosystem Integration
• Usage
– SUSHI / COUNTER
http://www.niso.org/workrooms/sushi/
• (NISO Standardized Usage Statistics Harvesting Initiative)
• Protocol for transmission of usage statistics / practices & schema for
formatting and collecting usage statistics
• Digital work identifiers / locators
– Exemplars: DOIs / OpenURL
– Use of identifier internal to monograph adds value for
later use and evaluation
– Use of identifier / standard locator to refer to work
provides potential leverage point for usage metrics
collection
• Other identifiers
– FundRef – funding identifiers
– ORCID/ISNI – contributor identifiers
– Data Citations – citations to data and other non-
traditional scholarly publication
– Embedding in monograph adds value to evidence base
– Useful for evaluations – esp. those that are likely to
align incentives among funders & contributors
• De facto discovery, use & evaluation
– e.g. Google, Amazon
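At its core, a COUNTER-style usage report is an aggregation of request events per title per reporting period. The sketch below shows only that core idea; it is not the actual SUSHI protocol or COUNTER schema, and the identifiers and dates are invented placeholders:

```python
from collections import Counter

# Toy usage log: one (title_identifier, YYYY-MM) tuple per download event.
events = [
    ("urn:isbn:978-0-000-00000-1", "2013-05"),
    ("urn:isbn:978-0-000-00000-1", "2013-05"),
    ("urn:isbn:978-0-000-00000-2", "2013-05"),
    ("urn:isbn:978-0-000-00000-1", "2013-06"),
]

# COUNTER-style aggregation: total requests per (title, month).
report = Counter(events)
for (title, month), n in sorted(report.items()):
    print(month, title, n)
```

This is exactly the kind of “easy pickings” metadata flagged in the discussion questions: the events are already generated by any dissemination platform, and retaining them in a standard aggregated form is what SUSHI/COUNTER makes interoperable.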
12. Examples: Current State of the Practice
• Institutional Repository Metrics
– Harvard DASH User Stories
https://osc.hul.harvard.edu/dash/stories
– MIT Global Impact
http://dspace.mit.edu/handle/1721.1/49433
– SSRN Author and Paper Metrics:
http://hq.ssrn.com/rankings/Ranking_display.cfm?TRN_gID=10&requesttimeout=900
• Aggregators
– Project Muse
http://muse.jhu.edu/about/stats.html
– Highwire
http://sushi.highwire.org/
– HathiTrust Research Center
http://www.hathitrust.org/htrc
13. Some Discussion Questions
• Successful examples/exemplars:
– existing metadata and effective uses of it with books?
– graceful degradation, increasing returns, etc.?
• Emerging requirements:
– Explicit metadata (or identifier, integration, etc.) requirements from
stakeholders?
– In what ways do these explicitly support use, evaluation, and integration?
– Clear implicit requirements? … Licensing (CC-BY, CC0)? Identifier
schemes (ISBN, DOI)? Indexing integration requirements?
– What evidence could you envision showing your stakeholders to
demonstrate success?
• Opportunities
– ‘Easy pickings’ – metadata already produced in production,
dissemination, use, but not retained?
– “‘Looks-easy’ pickings” – opportunities for automated extraction; crowd-
sourced entry and refinement?
– Leverage points – e.g. where can effort applied to prime the pump,
coordinate practice, or build infrastructure yield network effects, lower
barriers to entry, create norms/nudges, or coordinating equilibria that
generate incentives to continue production?
This work. by Micah Altman (http://micahaltman.com) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Metadata can be defined variously as "data about data", digital 'breadcrumbs', magic pixie dust, and "something that everyone now knows the NSA wants a lot of". It's all of the above. Metadata is used to support decisions and workflows, to add value to objects (by enhancing discovery, use, reuse, and integration), and to support evaluation and analysis. It's not the whole story for any of these things, but it can be a big part.