1. The Research Data Alliance
Creating the culture and technology for an international data infrastructure
Mark A. Parsons
Secretary General
Australian National Data Service
Melbourne, Australia
24 October 2014
Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License
2. All of society’s grand challenges require diverse
(often large) data to be shared and integrated
across cultures, scales, and technologies.
3. Research Data Alliance
Vision
Researchers and innovators openly share data across
technologies, disciplines, and countries to address the
grand challenges of society.
Mission
RDA builds the social and technical bridges that enable
open sharing of data.
9. Dynamics of Infrastructure
Edwards, et al. 2007 Understanding Infrastructure: Dynamics,
Tensions, and Design.
• Infrastructures become “ubiquitous, accessible, reliable, and
transparent” as they mature.
• Systems → Networks → Inter-networks
• “system-building, characterized by the deliberate and successful
design of technology-based services.”
• “technology transfer across domains and locations results in
variations on the original design, as well as the emergence of
competing systems.”
• Finally, “a process of consolidation characterized by gateways
that allow dissimilar systems to be linked into networks.”
12. Bridges and
Gateways
Gateways are often wrongly
understood as “technologies,”
i.e. hardware or software
alone. A more accurate
approach conceives them as
combining a technical solution
with a social choice, i.e. a
standard, both of which must
be integrated into existing
users’ communities of
practice. Because of this,
gateways rarely perform
perfectly.
— Edwards et al. 2007
23. Themes from A. Tsing on Collaboration
Friction—An ethnography of global connection
• “Actual existing universalisms are
hybrid, transient, and involved in
constant reformulation through
dialogue.” They work out through
friction.
• “There is no reason to think
collaborators have common goals.”
• Unity and diversity cover each
other up. Need to remember the
local.
24. Where Good Ideas Come From
• The Adjacent Possible—the
importance of local
• Often not “Eureka!” but rather a
slow hunch fading into view over
time.
• Hunches need to collide with other
hunches, so create that
environment. Don’t protect IP;
share it. Connecting vs. protecting.
• Sharing of failures as well.
• Create spaces for that to happen—
virtual and real coffee shops
• “Chance favors the connected mind.”
25. It’s all about Relationships
(I’m an introvert)
• The central challenge is diversity.
• We address it through variety and myriad interfaces and
connections.
• Fostering relationships is central to community and data science.
• they build social capital—success through giving
• they uncover tacit knowledge
• they inform methods
26. Data Science and Collaborative Methods
• User-driven design is not just end user. Engage providers and funders too.
• Case studies not just use cases.
• Ethnography—study relationships because data are often at the center of that
interaction—a boundary object.
• Agile is not just for software (courtesy Bruce Caron).
• Individuals and interactions over processes and tools
• Working volunteers over comprehensive documentation
• Member collaboration over contract negotiation
• Responding to change over following a plan.
27. But what does this all have to do with
RDA?
1. RDA focusses on developing “gateways”
2. RDA doesn’t do “architecture,” but it does provide a level of unity.
28. Deliverables that make data work
“Create - Adopt - Use”
• Adopted code, policy, specifications, standards, or practices that
enable data sharing
• “Harvestable” efforts for which 12-18 months of work can eliminate
a roadblock
• Efforts that have substantive applicability to groups within the
data community but may not apply to all
• Efforts that can start today
RDA Principles: Openness, Consensus, Balance, Harmonization,
Community Driven, Non-profit
30. RDA Working Groups
1. Brokering Governance*
2. Data Citation WG
3. Data Description Registry
Interoperability
4. Data Foundation and Terminology
WG
5. Data Type Registries WG
6. Metadata Standards Directory
Working Group
7. PID Information Types WG
8. Practical Policy WG
9. RDA/CODATA Summer Schools in
Data Science and Cloud Computing
in the Developing World*
10. RDA/WDS Publishing Data
Bibliometrics WG
11. RDA/WDS Publishing Data Services
WG
12. RDA/WDS Publishing Data
Workflows WG
13. Repository Audit and Certification
DSA–WDS Partnership WG
14. Standardisation of Data Categories
and Codes WG
15. The BioSharing Registry:
connecting data policies, standards
& databases in life sciences*
16. Urban Quality of Life Indicators*
17. Wheat Data Interoperability WG
* in review
31. But what does this all have to do with
RDA?
1. RDA focusses on developing “gateways”
2. RDA doesn’t do “architecture,” but it does provide a level of unity.
3. RDA plays both globally and locally—Think “glocal”.
32. Distribution of 2,353 Individual RDA Members in 96 Countries
12 September 2014
• By sector: Academia 63%, Government 18%, Private 13%, Other 6%
• By region: Europe 50%, North America 36%, Asia 5%, Austral-pacific 5%,
Africa 3%, South America 1%
Map courtesy traveltip.org
33. Regional RDAs
• Australian National Data Service, RDA/United States, RDA/Europe,
• Implement RDA deliverables locally and enhance adoption.
• Ensure regional or national issues are addressed globally.
• Support plenaries and support attendance at plenaries.
34. But what does this all have to do with
RDA?
1. RDA focusses on developing “gateways”
2. RDA doesn’t do “architecture,” but it does provide a level of unity.
3. RDA plays both globally and locally—Think glocal.
4. RDA fosters relationships, interfaces, and connections.
5. RDA provides a “neutral place” to identify and work through friction.
36. RDA Interest Groups
1. Agricultural Data Interoperability IG
2. Big Data Analytics IG
3. Biodiversity Data Integration IG
4. Brokering IG
5. Community Capability Model IG
6. Data Fabric IG
7. Data for Development
8. Data in Context IG
9. Defining Urban Data Exchange for Science IG*
10. Development of cloud computing capacity and
education in developing world research
11. Digital Practices in History and Ethnography IG
12. Domain Repositories Interest Group
13. Education and Training on handling of research
data
14. ELIXIR Bridging Force IG*
15. Engagement IG
16. Federated Identity Management
17. Geospatial IG*
18. Libraries for Research Data*
19. Long tail of research data IG
20. Marine Data Harmonization IG
21. Metabolomics
22. Metadata IG
23. PID Interest Group
24. Preservation e-Infrastructure IG
25. RDA/CODATA Legal Interoperability IG
26. RDA/CODATA Materials Data, Infrastructure &
Interoperability IG
27. RDA/WDS Certification of Digital Repositories IG
28. RDA/WDS Publishing Data Cost Recovery for
Data Centres
29. RDA/WDS Publishing Data IG
30. Reproducibility IG*
31. Research data needs of the Photon and Neutron
Science community
32. Research Data Provenance
33. Service Management IG
34. Structural Biology IG
35. Toxicogenomics Interoperability IG
* in review
39. RDA Leadership
• Council
§ Fran Berman (US), co-Chair
§ Patrick Cocquet (France)
§ Tony Hey (US)
§ Kaye Raseroka (Botswana)
§ Satoshi Sekiguchi (Japan)
§ Doris Wedlich (Germany)
§ Ross Wilkinson (Australia)
§ John Wood (UK), co-Chair
• Secretariat
§ Timea Biro
§ Hilary Hanahoe
§ Fotis Karayannis
§ Stefanie Kethers
§ Kathy Fontaine
§ Yolanda Meleco
§ Mark Parsons, Sec Gen
§ Herman Stehouwer
• Organisational Assembly
§ Juan Bicarregui, co-Chair
§ Walter Stewart, co-Chair
• Technical Advisory Board
§ Bridget Almas
§ Simon Cox
§ Liu Chuang
§ Peter Fox
§ Francoise Genova
§ Carole Palmer
§ Beth Plale, Chair
§ Susanna-Assunta Sansone
§ Jamie Shiers
§ Rainer Stotzka
§ Andrew Treloar, Chair
§ Peter Wittenburg
40. Organisational
Partners—key linkages
• Organisations play an essential
role as adopters!
• Organisational Assembly =
Organisational Members and
Affiliates.
• Organisational Advisory Board will
represent Organisational
Assembly to Council
• Organisational Members pay
(modest) dues and have a special
voice within RDA helping ensure
RDA products stay relevant
Image courtesy anybots.com
41. Organisational Members and Affiliates
• Organisational Members:
§ Alliance for Permanent Access
§ American University Library
§ Australian National Data Service
§ Barcelona Supercomputing Center - Centro
Nacional de Supercomputación
§ Columbia University Library
§ CNRI
§ CSC
§ Digital Curation Center
§ EIROForum IT Working Group
§ eResearch Services and Scholarly
Application Development Division of
Information Services, Griffith University
§ European Data Infrastructure (EUDAT)
§ National Institute of Advanced Industrial
Science and Technology (AIST), Japan
§ International Association of STM Publishers
§ Internet2
§ Microsoft Research
§ NZ eScience Infrastructure
§ Purdue University Libraries
§ Research Data Canada
§ Scholarly Publishing and Academic
Resources Coalition (SPARC)
§ Washington University in St. Louis Libraries
§ Science and Technology Facilities Council
• Affiliates:
§ CODATA
§ ICSU World Data System
§ ORCID
§ DataCite
§ CASRAI
§ Global Alliance for Genomics and Health
43. RDA Colloquium—RDAC
• The group of government and non-profit science funding
organisations that support the data and science communities to
participate in RDA activities:
• Australian Government
• US Government (NSF and NIST)
• European Commission
• Allows agencies the opportunity to share funding program plans
that support data exchange, interoperability, and data
infrastructures across the globe, and thereby amplify their impact.
• Related to but distinct from RDA. A parallel organisation.
44. Initial Products—adopt one today!
• A basic vocabulary of foundational terminology and query tool to make sure we know
what we’re talking about.
• A data type model and registry (“MIME-types” for data) to help tools interpret, display,
and process data.
• A persistent identifier type registry to help search engines understand what they are
pointing to and retrieving.
• Coming soon:
• A basic set of machine-actionable rules to enhance trust
• A metadata standards directory so we can describe similar things consistently
• A dynamic data citation methodology so we can reference precise subsets of
changing data.
• Semantically linked terms describing wheat data so we can share harvest and
related information around the world
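The “MIME-types for data” idea above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `DataTypeRegistry` class, the `DataType` fields, and the identifier `example/station-temperature-csv` are hypothetical and do not reflect the actual RDA data type registry specification or any real API. It only shows how a tool might look up a type identifier and use the registered information to interpret raw bytes.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DataType:
    """One hypothetical registry entry: an identifier, a description, a parser."""
    type_id: str
    description: str
    parser: Callable[[bytes], object]


class DataTypeRegistry:
    """Toy in-memory registry mapping type identifiers to DataType entries."""

    def __init__(self) -> None:
        self._types: Dict[str, DataType] = {}

    def register(self, data_type: DataType) -> None:
        # Later registrations under the same identifier replace earlier ones.
        self._types[data_type.type_id] = data_type

    def resolve(self, type_id: str) -> DataType:
        # Raises KeyError when the identifier has not been registered.
        return self._types[type_id]

    def interpret(self, type_id: str, payload: bytes) -> object:
        # Look up the type, then let its parser turn raw bytes into a structure.
        return self.resolve(type_id).parser(payload)


def parse_csv_rows(payload: bytes) -> list:
    """Parse UTF-8 CSV bytes into a list of row dictionaries keyed by header."""
    text = payload.decode("utf-8")
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    registry = DataTypeRegistry()
    registry.register(
        DataType(
            type_id="example/station-temperature-csv",  # made-up identifier
            description="CSV of station names and temperatures in Celsius",
            parser=parse_csv_rows,
        )
    )
    raw = b"station,temp_c\nA,21.5\n"
    print(registry.interpret("example/station-temperature-csv", raw))
```

The point of the design is that the tool never hard-codes how to read the payload; it only needs the type identifier that travels with the data, much as a browser uses a MIME type to pick a renderer.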
45. Get involved!
• Join RDA as an individual member supporting our principles at
http://rd-alliance.org
• Join as an Organisational Member (nominal fee) or an
Organisational Affiliate (jointly sponsored efforts).
• Initiate or join an Interest Group
• Propose or join a Working Group
• Attend the RDA Plenaries
Coming together is a beginning;
keeping together is progress;
working together is success.
—Henry Ford
46. Summary
• Infrastructure is created in phases with the final consolidation phase relying on
gateways and bridges.
• Diversity is a central problem, but only diversity absorbs diversity.
• Networking and interconnection are the way to solve complex problems.
• We are in a more global and democratic world, but also a more local world.
Coalition politics with new kinds of coalitions because there are new kinds of
identity.
• Data science needs to focus on relationships, connections, interfaces.
• You must participate “glocally” to succeed.
• RDA provides mechanisms to address all of the above!