Summary of data citation synthesis activity & Review
1. Prepared for
Data Citation Synthesis Group
Open Workshop
Sept 2013
Summary of data citation synthesis activity &
Next steps for review
<bit.ly/dsynthrev>
Dr. Micah Altman
<escience@mit.edu>
MIT Libraries
Joan Starr
Joan.starr@ucop.edu
California Digital Library
3. Refining Approaches to Data Citation
Summary of synthesis activity & Next steps for
review
2000-2004
– Example systems: NESSTAR, Virtual Data Center
– Core recommendations: Cite research data in publications; use persistent identifiers; facilitate direct access to data through URIs
– Key references: [Ryssevik & Musgrave 2001], [Altman, et al. 2001]
2005-2009
– Example systems: Dataverse Network System, TIB Data DOI Registration
– Core recommendations: Include versioning, fixity, and granularity for verification; use permanent institutions; facilitate attribution
– Key references: [Buneman 2006], [Altman & King 2007], [OECD 2009]
2010-
– Example systems: DataCite; Thomson-Reuters Data Citation Index; FigShare; Data Dryad
– Core recommendations: Include data citations in standard locations; index data citations in catalogs; facilitate machine understanding
– Key references: [NAS 2012], [DCC 2012], [Force11 2013], [CODATA 2013]
5. Synthesis Group Activity
• Hosted by Force 11
– Charter here: http://www.force11.org/node/4432
• Formed early summer
• Meeting weekly
• Reviewed current key recommendations
& engaged lead authors:
– Force 11/Amsterdam Manifesto [Force11 2013]
– CODATA/“Out of Cite” Recommendations [CODATA 2013]
– DCC Guide [DCC 2012]
– DataCite/Metadata Core [DataCite 2012]
– Research Data Alliance
• Identified core principles that are consistent across recommendation groups
• Formulated a draft synthesis of principles
• Agreed to use key documents above for definitions of terms, detailed
explanation of issues
• Out of scope: specific detailed standards, protocols, infrastructure, tools
6. Yesterday
• Open Workshop
• Line-by-line review of draft
• Open editing of document
– In shared document
– Using revision control
• Convergence on principles
– 8 principles revised and approved by consensus
– 1 recommendation struck
– 1 recommendation tabled for discussion today
• Summary
– Substantial core of agreement: need for citation; use of persistent identifiers;
support for human and machine access; facilitation of verification and attribution.
– Maintain conceptual boundaries among data citation, publication, and
evaluation.
– Recognize that terminology cannot always be aligned with colloquial or
disciplinary usage.
7. The principles
1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Versioning and granularity
8. Interoperability and flexibility
8. Open Question:
Data Repository Recommendations
6. Persistence
Metadata describing the data, and unique identifiers
should persist, even beyond the lifespan of the data they
describe.
Data citations should be resolvable to data stored in
repositories with a commitment and demonstrated
capability to maintain long term access. Data stored in
such repositories may not always be publicly
accessible. Although such repositories should be
committed to long term maintenance and preservation
of data, the nature of digital data is such that they may
not persist indefinitely.
9. Review Process
• Synthesis group will supplement today’s consensus
principles with background:
– Illustrative examples for each recommendation
– References with each principle to detailed discussion of
embedded issues in prior reports.
– Glossary.
• Public release of draft for open online commentary
• Integration of commentary and release of final draft
10. Questions for Review & Decisions
• Nomination of additional members to synthesis group for preparation of
summary material (glossary, references, example, preamble)?
– Decision: anyone in attendance who can substantively (if not officially) represent a group
– Decision: Identify additional key organizations for commentary.
• Public release of draft – when, to whom?
– Decision: Available for open public commentary mid November
– Decision: Will specifically request comments from key organizations, including:
• Organizations listed earlier (Force11, DCC, CODATA, ESIP, RDA, Dataverse, Data-PASS, DataCite)
• Additional suggested organizations: NLM, ARL
• Additional organizations identified by synthesis group
• Open commentary via mailing list & force11 website. Period for commentary?
– Decision: 6-8 weeks for public commentary
• Integration of commentary by synthesis group and release of updated draft.
Number of drafts necessary? When to declare “done”?
– Decision: Single round of revisions by synthesis group. Will then seek endorsements.
12. Synthesis Group Contacts
About the synthesis group:
http://www.force11.org/node/4432
Questions for the synthesis group:
datacitationworkgroup@force11.org
Consensus document, with revision history:
https://docs.google.com/document/d/1KosNqBPgE8ziWDuJgBIrk20KxcOXoZdAt_TdJV3xoz8/edit?usp=drive_web
13. Key Recommendations
• [Force11 2013]
M. Crosas, T. Carpenter, C. Borgman, D. Shotton. 2013. The Amsterdam Manifesto on Data Citation
Principles. Force11.
• [CODATA 2013]
CODATA-ICSTI Task Group on Data Citation. 2013. Out of Cite, Out of Mind: The Current State of Practice,
Policy, and Technology for the Citation of Data. Data Science Journal.
• [DCC 2012]
A. Ball, M. Duke. 2012. Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation
Centre.
14. Additional References
• [Ryssevik & Musgrave 2001]
J. Ryssevik, S. Musgrave. 2001. The Social Science Dream Machine. Social Science Computer Review.
• [Altman, et al. 2001]
M. Altman, et al. 2001. A Digital Library for the Dissemination and Replication of Quantitative Social
Science Research: The Virtual Data Center. Social Science Computer Review.
• [Buneman 2006]
P. Buneman. 2006. How to Cite Curated Databases and Make Them Citable. SSDBM ’06.
• [Altman & King 2007]
M. Altman, G. King. 2007. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine.
• [OECD 2009]
T. Green. 2009. We Need Publishing Standards for Datasets and Data Tables. OECD.
• [NAS 2012]
P. Uhlir (ed.). 2012. For Attribution -- Developing Data Attribution and Citation Practices and Standards.
National Academies of Sciences.
• [DataCite 2012]
DataCite Metadata Schema, v 3.0. http://schema.datacite.org/
16. The principles
1. Importance
Data should be considered legitimate, citable
products of research. Data citations should be
accorded the same importance in the scholarly
record as citations of other research objects, such
as publications.
17. The principles
2. Credit and attribution
Data citations should facilitate giving scholarly
credit and normative and legal attribution to all
contributors to the data, recognizing that a single
style or mechanism of attribution may not be
applicable to all data.
3. Evidence
Where a specific claim rests upon data, the
corresponding data citation should be provided.
18. The principles
4. Unique identification
A data citation should include a persistent method
for identification that is machine actionable,
globally unique, and widely used by a community.
5. Access
Data citations should facilitate access to the data
themselves and to such associated metadata,
documentation, and other materials, as are
necessary for both humans and machines to make
informed use of the referenced data.
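Illustrative sketch (not part of the consensus draft): one way an identifier can be made "machine actionable" is to turn it into a resolvable URL. The example below assumes DOIs as the identifier scheme, which the principle does not mandate, and uses the public https://doi.org/ resolver. The function name `doi_to_url`, the validation pattern, and the sample DOI are hypothetical choices made for this sketch.

```python
import re
from urllib.parse import quote

# Assumption: DOIs are the persistent identifier scheme in use. The
# principle itself does not require any particular scheme.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ URL for a DOI string."""
    cleaned = doi.strip()
    # Accept common prefixes such as "doi:" or an existing resolver URL.
    cleaned = re.sub(
        r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", "", cleaned, flags=re.IGNORECASE
    )
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(cleaned, safe="/")


# Hypothetical DOI, used only to show the transformation.
print(doi_to_url("doi:10.1234/EXAMPLE.DATA"))
```

A citation that carries the URL form can be followed by a person in a browser and dereferenced by software, which is the dual human and machine access that principle 5 asks for.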
19. The principles
6. Persistence
Metadata describing the data, and unique
identifiers should persist, even beyond the
lifespan of the data they describe.
[more to be decided upon]
20. The principles
7. Versioning and granularity
Data citations should facilitate identification and
access to different versions and/or subsets of data.
Citations should include sufficient detail to verifiably
link the citing work to the portion and version of data
cited.
8. Interoperability and flexibility
Data citation methods should be sufficiently flexible to
accommodate the variant practices among
communities but should not differ so much that they
compromise interoperability of data citation practices
across communities.
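Illustrative sketch (not part of the consensus draft): a citation record that names the version and subset of the data used, plus a checksum of the exact bytes cited, is one possible way to "verifiably link" a citing work to its data. The field names, the rendering format, and the use of SHA-256 are assumptions made for this example; the draft names no checksum algorithm or citation format, and all sample values are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Checksum of the cited bytes (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DataCitation:
    authors: str
    year: int
    title: str
    identifier: str  # persistent identifier, e.g. a DOI URL
    version: str     # which version of the data was used
    subset: str      # which portion was used ("" means the whole dataset)
    fixity: str      # checksum of the exact bytes that were cited

    def render(self) -> str:
        """Format the record as a single human-readable citation line."""
        parts = [
            f"{self.authors} ({self.year}). {self.title}.",
            f"Version {self.version}.",
        ]
        if self.subset:
            parts.append(f"Subset: {self.subset}.")
        parts.append(f"{self.identifier}.")
        parts.append(f"SHA-256: {self.fixity}")
        return " ".join(parts)


def verify(citation: DataCitation, data: bytes) -> bool:
    """Check that retrieved bytes match what the citation recorded."""
    return sha256_hex(data) == citation.fixity


# Hypothetical dataset contents and citation values.
data = b"id,value\n1,3.5\n2,4.0\n"
citation = DataCitation(
    authors="A. Researcher",
    year=2013,
    title="Example Survey Data",
    identifier="https://doi.org/10.1234/example.data",
    version="2.1",
    subset="rows 1-2",
    fixity=sha256_hex(data),
)
print(citation.render())
print(verify(citation, data))  # True only if the bytes are unchanged
```

Keeping version, subset, and checksum as separate fields leaves communities free to render the citation in their own style while sharing the same underlying elements, in the spirit of principle 8.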
Editor's note
This work, by Micah Altman (http://micahaltman.com), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.