1. The Research Data Alliance in Europe, an update…
EXTREME SCALE SCIENTIFIC COMPUTING WORKSHOP
Moscow – 30 June & 1 July 2014
Fabrizio Gagliardi
BSC, Spain - ACM Europe Chair
2. Introduction
Fabrizio Gagliardi, reborn at BSC, Spain
After 30 years at CERN in Geneva
Many EU projects
And the last 8 years at Microsoft and Microsoft Research
Long history of projects in Russia on Grid computing, Big Data, HPC and computer vision
@ MSU and MSR HPC summer schools 2009-2012
3. Big data, hype and HPC
“Big data” means different things to different people
(consider Satoshi’s previous talk)
• corporate data are not so big and demanding when
compared to scientific data
• social data are large but access is easy and trivially
parallel
• scientific data in new research domains like genetics are a
bigger challenge
• this is not true of all scientific data: CERN will produce 100
PB/year starting next year, but with easy access and simple
processing models; still a very expensive game…
4. Horizon 2020: Research and Innovation
Horizon 2020 is the biggest EU Research and Innovation
programme ever with nearly €80 billion of funding available
over 7 years (2014 to 2020), in addition to the private investment
that this money will attract. It promises more breakthroughs,
discoveries and world-firsts by taking great ideas from the lab to
the market.
5. Research and Innovation
Research AND Innovation, not Research OR
Innovation
Research activities with innovation in mind
Innovation should have job creation in mind
But how to take great ideas from the lab to the
market?
What can a research funder do?
Which instruments do we have?
6. Job creation is important
Following slides adapted from Joe McKendrick/Forbes, September 2012
http://www.smartplanet.com/blog/bulletin/7-new-types-of-jobs-created-by-big-data/682
7 new types of jobs created by Big Data
In today’s unforgiving global economy, those organizations
that compete on analytics stand the best chance of outsmarting
the competition. The only catch is, they need skilled
professionals who know how to manage, mine and draw
actionable insights from all the “Big Data” now streaming
across enterprises.
7. Job creation is important
1. Data scientists: this emerging role is taking the lead in
processing raw data and determining what types of analysis would
deliver the best results
2. Data architects: organizations managing Big Data need
professionals who will be able to build a data model, and plan out a
roadmap of how and when various data sources and analytical tools
will come online, and how they will all fit together
3. Data visualizers: organizations need professionals who can
“harness the data and put it in context, in layman’s language,
exploring what the data means and how it will impact the company”
8. Job creation is important
4. Data change agents: driving “changes in internal operations
and processes based on data analytics.” They need to be good
communicators who know how to apply statistics to improve quality
on a continuous basis
5. Data engineers/operators: people who make the Big Data
infrastructure hum on a day-to-day basis. “They develop the
architecture that helps analyse and supply data in the way the
business needs, and make sure systems are performing smoothly”
6. Data stewards: ensure that data sources are properly
accounted for
7. Data virtualization/cloud specialists: ability to build and
maintain a virtualized data service layer; organizations need
professionals that can also build and support these virtualized layers or
clouds
10. Issues to be addressed (e-infrastructure)
The EC in coordination with EU Member States is looking
after research data as an infrastructure
As a valuable and strategic resource, research data
raises at least three key issues to be addressed (*):
How data can be networked
How to envision and set up data governance on a
global scale
How the EU can play a leading role in helping start and
steer this global trend
(*) Fred Friend, Jean-Claude Guédon, Herbert Van de Sompel,
“Beyond Sharing and Re-using: Toward Global Data
Networking”
11. Policy context
A Reinforced European Research Area Partnership for
Excellence and Growth, COM(2012) 392 – July 2012
Towards better access to scientific information: boosting the
benefits of public investments in research, COM(2012) 401 final –
July 2012
Commission Recommendation on access to and preservation of
scientific information, C(2012) 4890 final – July 2012
Horizon 2020
- Open Access to Scientific Publications
- Pilot on research data
Data Management Plan
Open Science
12. Research Infrastructures Work Programme 2014-2015 (e-infrastructure highlighted)
Calls in 2014; deadlines Sept 2014 and Jan 2015; initiatives starting in 2015 until 2018
Call 1: Developing new world-class infrastructures
• Design studies
• Support to preparatory phase of ESFRI projects
• Support to the individual implementation and operation of ESFRI projects
• Support to the implementation of cross-cutting infrastructure services and solutions for cluster of ESFRI and other relevant research infrastructure initiatives in a given thematic area
Call 2: Integrating and opening research infrastructures of pan-European interest
• Integrating and opening existing national and regional research infrastructures of pan-European interest
Call 3: E-infrastructures
• Managing, preserving and computing with big research data
• E-infrastructures for open access
• Towards global data e-infrastructures: Research Data Alliance
• Pan-European High Performance Computing infrastructure and services
• Centres of Excellence for computing applications
• Network of HPC Competence Centres for SMEs
• Provision of core services across e-infrastructures
• Research and education networking – GEANT
• E-infrastructures for virtual research environments (VRE)
Call 4: Support to innovation, human resources, policy and international cooperation for research infrastructures
• Innovation support measures
• Innovative procurement pilot action in the field of scientific instrumentation
• Strengthening the human capital of research infrastructures
• New professions and skills for e-infrastructures
• Policy measures for research infrastructures
• International cooperation for research infrastructures
• E-infrastructure policy development and international cooperation
• Network of National Contact Points
13. Fran Berman
Research Data Driving Solutions to Complex Scientific and Societal Challenges
• Who is most at risk to contract asthma?
• How can we increase wheat yields?
• How accurate is the Standard Model of Physics?
• How can we best address energy needs and sustain the environment?
Images: Lucas Taylor; Ceinturion, Wikipedia
15. Fran Berman
World-wide Efforts Focusing on Infrastructure to Support Research Data Sharing, Access, Use
• Science, Humanities, Arts Communities
• E-Infrastructure professionals, data analysts, data center staff, …
• Data Scientists
• Libraries, Archives, Repositories, Museums
16. Fran Berman
Many Infrastructure Building Blocks Needed to Accelerate Progress
Building blocks:
• Institutional Data Sharing Practice
• Data Access and Distribution Policy
• Data Discovery Tools
• Common Metadata Standards
• Digital Object Identifiers
• Data Citation Standards
• Data Analytics Algorithms
• Data Preservation Practice
• Data Scientists and Expert Support
• Sustainable Economic Models
• Curation Practice and Policy
• Auditing, Certification and Reporting Practice
To accelerate progress in:
• Data Use and Re-use
• Data Discovery and Data Sharing
• Research Dissemination and Reproducibility
• Data Access (now) and Preservation (later)
17. Fran Berman
Research Data Alliance Created to Accelerate Development of Research Data Sharing Infrastructure Worldwide
RDA community efforts focus on building social, organizational and technical infrastructure to
• reduce barriers to data sharing and exchange
• accelerate the development of coordinated global data infrastructure
RDA and RDA/US are supported in part by the National Science Foundation.
18. Fran Berman
RDA Approach:
CREATE ADOPT USE
RDA Members come together as
• Working Groups – 12-18 month efforts
to build, adopt, and use specific pieces
of infrastructure
• Interest Groups – longer-lived discussion forums that spawn Working Groups
as specific pieces of needed infrastructure are identified.
Working Group efforts focus on the development and use of data
sharing infrastructure
• Code, policy, infrastructure, standards, or best practices that are adopted
and used by communities to enable data sharing
• “Harvestable” efforts for which 12-18 months of work can eliminate a
roadblock
• Efforts that have substantive applicability to groups within the data
community, but may not apply to everyone
• Efforts for which working scientists and researchers can start today
19. Fran Berman
Precipitous Growth
• August 2012: First RDA organizational telecon
• October 2012: Global Data Planning Meeting
• March 2013: RDA Launch / First Plenary, Gothenburg, Sweden. First Working Groups and Interest Groups; 240 participants
• September 2013: RDA Second Plenary, Washington, DC. First “neutral space” community meeting (Data Citation Summit); first Org. Partner Meet-up; first BOFs; 380 participants from 22 countries
• March 2014: RDA Third Plenary, Dublin, Ireland. First Organizational Assembly; 6 co-located events; 14 BOF, 12 Working Groups, 22 Interest Groups; 497 participants
• September 2014: RDA Fourth Plenary, Amsterdam. First Working Group exchange meeting
21. Fran Berman
RDA Interest Groups (IG) and Working Groups (WG) by Focus (as of 6/14)
Domain Science - focused
• Toxicogenomics Interoperability IG
• Structural Biology IG
• Biodiversity Data Integration IG
• Agricultural Data Interoperability IG
• Wheat Data Interoperability WG
• Digital Practices in History and Ethnography IG
• Defining Urban Data Exchange for Science IG
• Geospatial IG
• Marine Data Harmonization IG
• RDA/CODATA Materials Data Infrastructure and Interoperability IG
• Research Data Needs of the Photon and Neutron Science Community IG
Data Stewardship - focused
• Research Data Provenance IG
• RDA/WDS Certification of Digital Repositories IG
• Preservation e-infrastructure IG
• Long-tail of Research Data IG
• RDA/WDS Publishing Data IG
• RDA/WDS Repository Audit and Certification Working Group
• Domain Repositories Interest Group
Reference and Sharing - focused
• Data Citation WG
• Standardization of Data Categories and Codes WG
• RDA/CODATA Legal Interoperability IG
• Data Description Registry Interoperability Working Group
Community Needs - focused
• Community Capability Model IG
• Engagement IG
• Development of Cloud Computing Capacity and Education in Developing World Research IG
• Ethics and Social Aspects of Data IG
Base Infrastructure - focused
• Data Foundation and Terminology WG
• Metadata Standards Directory WG
• Practical Policy WG
• PID Information Types WG
• Data Type Registries WG
• Data in Context IG
• Big Data Analytics IG
• Data Brokering IG
• Federated Identity Management IG
• Metadata IG
• PID Interest Group
• Service Management IG
22. Fran Berman
RDA/US: Collaborate Globally, Contribute Locally
RDA/US Goals:
• Contribute to RDA “international” efforts and leadership
• Bring US efforts to broader RDA community
• Build the RDA community within the US
• Leverage and implement RDA deliverables in the US to amplify impact
• Collaborate closely with other RDA “regions” on key programs and initiatives
NSF-supported RDA/US initiatives:
• Outreach (RDA RDA/US)
• RDA Deliverables Amplification
• Student / Early Career Engagement
RDA/US Steering Committee
• Fran Berman, RPI
• Larry Lannom, CNRI
• Mark Parsons, RPI
• Beth Plale, IU
Map: RDA US membership (yellow states)
23. The European plug-in to RDA …
• RDA Europe Forum – strategic advice
• RDA Europe Science Workshops – interaction & feedback from target audience
• RDA Europe national & pan-European outreach – to engage new members & disseminate outputs
• RDA Europe policy report – to support European policy-makers & funders
RDA Europe, the European plug-in to the global RDA, supports
RDA global and brings a European voice to the table
24. Europe as a Global Partner
Societal challenges of our time transcend borders
Data and computing intensive science is made of
global collaborations
Research data are global
Research Data Alliance: enable data exchange at
global scale
25. International
Domain initiatives are very important
• Marine data sharing – Southern Ocean Observing System
• Genetic data sharing – Human Genome Project
• Astronomy – SKA
• CERN LHC
But domain initiatives will not necessarily enable
bridges to be constructed across disciplines, time, and
industry
So the EC, the USA, and Australia committed resources
to forming the Research Data Alliance
26. Relation to HPC
RDA has so far not gained enough traction with the HPC,
big data and computer science communities
This needs to be addressed urgently, since the HPC
community dealing with Big Data will need close
interaction with application user communities, support
from policy makers at national and international
level and, of course, adequate financial support from the
relevant funding agencies
It is therefore important to work together…
And to link with other relevant initiatives such as NDS in
the US (presented by Ed Seidel yesterday) and
EUDAT in the EU
27. Why a Research Data Alliance?
“We are taking our work beyond Europe's borders, to
reach global scale. To make the scientific resources of
the world work together, interoperating and open to
discovery. For example we are working with partners
like the US and Australia in the Research Data Alliance
to make scientific progress broader, deeper and more
workable.”
Neelie Kroes, Vice-President of the European Commission
responsible for the Digital Agenda, “Open Access to science and data
= cash and economic bonanza”, 19 November 2013
… So much to gain from collaboration …
30. Acknowledgments
Input to this presentation kindly provided by Fran
Berman, Hilary Hanahoe and public presentations by
EC officials
The opinions expressed in this talk, as well as any
mistakes or omissions, are entirely my own responsibility
Thanks for your attention!
32. First RDA Infrastructure Deliverables Coming this Fall
Data Type Registries WG
• Deliverables: System of data type registries, formal model for describing types, working model of a registry.
• Initial Adopters and Users: CNRI, International DOI Foundation, Deep Carbon Observatory
Practical Code Policies
• Deliverables: Survey of policies in production use, testbed of machine actionable policies, deployment of 5 policy sets, policy starter kits
• Initial Adopters and Users: RENCI, DataNet Federation Consortium, CESNET, Odum Institute, EUDAT
Persistent Identifier Information Types
• Deliverables: Minimal set of PID types, API
• Initial Adopters and Users: Data Conservancy, DKRZ
Language Codes
• Deliverables: Operationalization of ISO language categories for repositories.
• Initial Adopters and Users: Language Archive, Paradisec
Data Foundations and Terminology
• Deliverables: Common vocabulary for data terms, formal definitions and open registry for data terms
• Initial Adopters and Users: EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS
Metadata Standards
• Deliverables: Use cases and prototype directory of current metadata standards starting from DCC directory
• Initial Adopters and Users: JISC, DataOne
33. Fran Berman
Next Steps for the RDA
• Continuing pipeline of infrastructure deliverables adopted and used to accelerate data sharing
• Increasing coordination of infrastructure
• Increasing cross-boundary collaborations between domains, sectors, organizations
• International and regional programs focusing on workforce, outreach, expansion of infrastructure impact
• New partners in the Organizational Assembly
• Focused strategy to support development of industry infrastructure for data sharing
More Infrastructure
Focus on Industry
Synergistic Programs
Effective Community
RDA/US is supported in part by the National Science Foundation.