Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery
Research Data Access & Preservation
Denver, Colorado, March 31 - April 1, 2011
Steve Hughes, Dan Crichton, Chris Mattmann, Sean Kelly
Topics
E-Science Trends
Software Architectures
Open Source
Object-Oriented Data Technology
Use Case
Data Driven
Leveraging Open Source Technologies to Enable Scientific Discovery
“eScience” Trends
Highly distributed, multi-organizational systems
  Systems are moving towards loosely coupled systems or federations in order to solve science problems which span center and institutional environments
Sharing of data and services which allow for the discovery, access, and transformation of data
  Systems are moving towards publishing of services and data in order to address data- and computationally-intensive problems
  Infrastructures which are being built to handle future demand
  Use of commodity services to address elasticity
Address complex modeling, inter-disciplinary science and decision support needs
  Need a dynamic environment where data and services can be used quickly as the building blocks for constructing predictive models and answering critical science questions
  Need to ensure the information architecture supports the varying science needs
Changing the way in which data analysis is performed
  Moving towards analysis of distributed data to increase the study power
  Enabling greater collaboration across centers
  Systematizing, where possible
Highly Distributed Science Environments
Highly distributed/federated
Collaborative
Information-centric
Discipline-specific
Growing/evolving
Heterogeneous (Implementations)
Why Software Architecture?
Software Architecture: The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. (ANSI/IEEE Std. 1471-2000)
Architecture is about strategy to address key architectural concerns:
  How can we exploit common patterns to improve reuse?
  Can we develop software product lines?
  Can we improve interoperability?
  Can we reduce dependencies?
What are the architectural principles? Loosely coupled, data-driven, highly distributed, commodity services, service-oriented, collaborative/multi-institutional
Notional Service Architectures Concept
[Figure labels: Client A, Client B, C, Service Interface, Service]
Loosely coupled
Elasticity (e.g. Commodity-based)
Multi-organizational
etc.
At an enterprise scale, architectures don’t need to prescribe what’s inside services, just their interfaces, function, behavior, etc.
Services might include:
Data discovery
Data access
Security
Transformation
C2 Architectural Style
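The point that an architecture prescribes a service's interface rather than its internals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataDiscovery` interface, the `InMemoryCatalog` and `FederatedCatalog` classes, and the sample dataset titles are all hypothetical and do not come from the slides or from any particular framework.

```python
from typing import Protocol


class DataDiscovery(Protocol):
    """Hypothetical interface: the only thing the architecture prescribes."""

    def search(self, keyword: str) -> list[str]:
        ...


class InMemoryCatalog:
    """One possible implementation; clients never depend on its internals."""

    def __init__(self, titles: list[str]) -> None:
        self._titles = titles

    def search(self, keyword: str) -> list[str]:
        # Case-insensitive substring match over the stored titles.
        return [t for t in self._titles if keyword.lower() in t.lower()]


class FederatedCatalog:
    """Another implementation that fans a query out to member services."""

    def __init__(self, members: list[DataDiscovery]) -> None:
        self._members = members

    def search(self, keyword: str) -> list[str]:
        results: list[str] = []
        for member in self._members:
            results.extend(member.search(keyword))
        return results


def client_query(service: DataDiscovery, keyword: str) -> list[str]:
    """A client written only against the interface, not an implementation."""
    return sorted(service.search(keyword))


# Illustrative data for two imagined centers (assumption, not from the talk).
center_a = InMemoryCatalog(["Mars orbiter images", "Lunar gravity maps"])
center_b = InMemoryCatalog(["Mars rover spectra"])
federation = FederatedCatalog([center_a, center_b])
mars_hits = client_query(federation, "mars")
```

Because `client_query` depends only on the interface, the same client code works against a single center or a federation of centers, which is the loose coupling the slide describes.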
What does this have to do with open source?
Core software product lines and tools that can be reused are excellent opportunities to create open source projects
  Across a federation of organizations, systems and users, what can be developed and shared?
  How can software components be developed in generic ways, but allow for extensions?
Open source itself is a strategy
  Can improve collaborations
  Can drive a robust set of reusable software components and tools
  Can push standards development
  Can encourage use of common architectural patterns
Open Source Models
Software sharing with an open source license (e.g., BSD-style license)
Software distribution through open source organizations (e.g., SourceForge)
Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
Ad hoc open source project communities with their own governance
Open Source Models: Our Opinion
Software sharing with an open source license (e.g., BSD-style license)
  It’s a great start
  Limited community involvement
Software distribution through open source organizations (e.g., SourceForge)
  Provides good software distribution support
Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
  This moves from just distribution support to collaboration and governance over the development
Ad hoc open source project communities with their own governance
  This can make a lot of sense for larger federations
The Apache Software Foundation
Largest open source software development entity in the world
2300+ committers
3500+ contributors
84 Top Level Projects
36 Incubating
30 Lab Projects
8 retired projects in the “Attic”
Over 1.2 million revisions
HTTPD web server used on 100+ million web sites (52+% of the market)
distribute environments

RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...
 
RDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
RDAP 16 Lightning: Personas as a Policy Development Tool for Research DataRDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
RDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
 
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide CollaborationRDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
 
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
 

Hughes RDAP11 Data Publication Repositories

  • 1. Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery. Research Data Access & Preservation, Denver, Colorado, March 31 - April 1, 2011. Steve Hughes, Dan Crichton, Chris Mattmann, Sean Kelly
  • 2. Topics: E-Science Trends; Software Architectures; Open Source; Object-Oriented Data Technology; Use Case; Data Driven. Leveraging Open Source Technologies to Enable Scientific Discovery
  • 3. “eScience” Trends
      • Highly distributed, multi-organizational systems
      • Systems are moving towards loosely coupled systems or federations in order to solve science problems that span center and institutional environments
      • Sharing of data and services that allow for the discovery, access, and transformation of data
      • Systems are moving towards publishing services and data in order to address data- and computationally-intensive problems
      • Infrastructures are being built to handle future demand
      • Use of commodity services to address elasticity
      • Address complex modeling, inter-disciplinary science and decision support needs
      • Need a dynamic environment where data and services can be used quickly as the building blocks for constructing predictive models and answering critical science questions
      • Need to ensure the information architecture supports the varying science needs
      • Changing the way in which data analysis is performed
      • Moving towards analysis of distributed data to increase the study power
      • Enabling greater collaboration across centers
      • Systematizing, where possible
  • 4. Highly Distributed Science Environments: highly distributed/federated; collaborative; information-centric; discipline-specific; growing/evolving; heterogeneous (implementations)
  • 5. Why Software Architecture?
      • Software Architecture: The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. (ANSI/IEEE Std. 1471-2000)
      • Architecture is about strategy to address key architectural concerns:
          • How can we exploit common patterns to improve reuse?
          • Can we develop software product lines?
          • Can we improve interoperability?
          • Can we reduce dependencies?
      • What are the architectural principles? Loosely coupled, data-driven, highly distributed, commodity services, service-oriented, collaborative/multi-institutional
  • 11. At enterprise scale, architectures don’t need to prescribe what’s inside services, just their interfaces, function, behavior, etc.
  • 17. What does this have to do with open source?
      • The identification of core software product lines and tools that can be reused is an excellent opportunity to create open source projects
          • Across a federation of organizations, systems and users, what can be developed and shared?
          • How can software components be developed in generic ways, but allow for extensions?
      • Open source itself is a strategy
          • Can improve collaborations
          • Can drive a robust set of reusable software components and tools
          • Can push standards development
          • Can encourage use of common architectural patterns
  • 18. Open Source Models
      • Software sharing with an open source license (e.g., BSD-style license)
      • Software distribution through open source organizations (e.g., SourceForge)
      • Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
      • Ad hoc open source project communities with their own governance
  • 19. Open Source Models: Our Opinion
      • Software sharing with an open source license (e.g., BSD-style license)
          • It’s a great start
          • Limited community involvement
      • Software distribution through open source organizations (e.g., SourceForge)
          • Provides good software distribution support
      • Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
          • This moves from just distribution support to collaboration and governance over the development
      • Ad hoc open source project communities with their own governance
          • This can make a lot of sense for larger federations
  • 25. access to science data by the community
  • 26. A set of building blocks/services to exploit common system patterns for reuse
  • 27. 04-FEB-2011 - Apache OODT v0.2 Released
  • 28. Used for a number of science data system activities. http://oodt.apache.org/
  • 30. New centers plugging in (i.e. data nodes)
  • 31. Multi-center data system infrastructure
  • 35. Heterogeneous sites with common interfaces allowing access to distributed portals; integrated based on common data standards; secure (e.g. encryption, authentication, authorization)
  • 37. Used OODT Catalog and Archive Service software
  • 38. Focus is on “process management”
  • 40. Execution of “processors” based on a set of rules
  • 41. Explicit separation of workflow management from management of computational resources
  • 46. Orbiting Carbon Observatory (OCO), OCO-2…
  • 48. SMAP; SeaWinds on ADEOS II (launched Dec 2002). Credit: D. Freeborn, C. Mattmann, D. Woollard
  • 49. Conceptual Capabilities
      • OODT Apache Suite (oodt.apache.org)
          • File Management
          • Workflow Management (for jobs/processing)
          • Data Transformation
          • Data Access
          • Metadata Query
      • Registry (future addition to OODT)
          • Metadata management based on the ebXML registry specification
          • Used to manage different types of “extrinsic” objects (metadata descriptions of data, services, etc.): “targets”, “science data products”, “documents”, “services”, etc.
          • Product identification, versioning, tracking, and subscription/notification
          • Indexing, classification, and associations
  • 50. Information Architecture
      • OODT + Registry contains two different types of “models”: a core infrastructure model and a discipline model
      • The core infrastructure model is intrinsic (integrated with the software)
          • It is built in and used by the software; this never changes and you don’t need to worry about it
          • Services are part of the core infrastructure (“intrinsic”) but all other metadata objects are “extrinsic”
      • The discipline model is extrinsic (defined outside the software)
          • It is dynamically configured
          • For example, the registry can be configured to use whatever “extrinsic” metadata objects are important to manage
          • This allows the registry to be used for tracking artifacts, managing services, etc.
          • This is what projects need to define
  • 52. PDS4 High Level Concept Map
  • 53. Defining Extrinsic Objects and their Context (Ontology)
  • 54. External Data Standards
      • Open Archival Information System (OAIS) Reference Model: defines the “Information Object”, a key component of the model.
      • ISO/IEC 11179-3, Registry Metamodel and Basic Attributes: provides the schema for the data dictionary. Defines the concepts of registration authority and steward for governance.
      • Object-Oriented Data Modeling: used as a standard modeling methodology.
      • XML/XML Schema: provides the label syntax and validation mechanism.
      • OASIS/ebXML Registry Information Model: provides attributes for object registration within a federated registry/repository.
      • ISO 15836:2009, The Dublin Core Metadata Element Set: provides standard web resource identification attributes.
      • Semantics (RDF, RDFS, OWL): provides W3C standards for knowledge representation.
  • 55. A perspective to leave you with: agency science federations, based on an open source/collaborative model, are very attractive for the following reasons:
      • Science benefits: can drive a growing enterprise of shared science services and software infrastructure support
      • Technology benefits: can drive innovation through its peer review and collaboration process
      • Infusion benefits: creates a defined process for contributing new ideas and capabilities
      • Architecture benefits: helps you build towards a common architectural vision and drive community standards
      • Cost benefits: can enable better leveraging and reuse of skills and capabilities across institutions
      • Tech transfer benefits: may benefit other science (and non-science) disciplines
  • 56. Questions? Thank You!!!
      • Steve Hughes, Steve.Hughes@jpl.nasa.gov
      • Chris Mattmann, Chris.Mattmann@jpl.nasa.gov
      • Dan Crichton, Dan.Crichton@jpl.nasa.gov
      • Sean Kelly, Sean.Kelly@jpl.nasa.gov
      • Note: we have several papers and book chapters on data intensive systems, etc. that we’d be happy to share! A few key ones:
          • D. Crichton, C. Mattmann, J. S. Hughes, S. Kelly, and A. Hart. “A Multi-Disciplinary, Model-Driven, Distributed Science Data System Architecture.” Guide to e-Science: Next Generation Scientific Research and Discovery. X. Yang, L. L. Wang, W. Jie, eds. Springer Verlag, 2010, to appear.
          • D. Crichton, S. Kelly, C. Mattmann, Q. Xiao, J. S. Hughes, J. Oh, M. Thornquist, D. Johnsey, S. Srivastava, L. Esserman, and B. Bigbee. “A Distributed Information Services Architecture to Support Biomarker Discovery in Early Detection of Cancer.” Accepted for publication at the 2nd IEEE International Conference on e-Science and Grid Computing, Amsterdam, the Netherlands, December 4th-6th, 2006.
          • C. Mattmann, D. Crichton, N. Medvidovic and S. Hughes. “A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications.” In Proceedings of the 28th International Conference on Software Engineering (ICSE06), pp. 721-730, Shanghai, China, May 20th-28th, 2006.
  • 57. Backup
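
The split described on slides 49 and 50, between an intrinsic core model built into the software and an extrinsic discipline model configured by each project, can be illustrated with a minimal sketch. This is not the OODT or ebXML registry API: the names `Registry`, `ExtrinsicObject`, `register`, `latest` and `find`, and the idea of describing the discipline model as a mapping from object type to required attribute names, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExtrinsicObject:
    """One registered metadata description (hypothetical structure)."""
    object_type: str   # e.g. "target", "document", "service"
    identifier: str    # logical identifier chosen by the project
    version: int       # assigned by the registry, starting at 1
    attributes: dict = field(default_factory=dict)


class Registry:
    """Intrinsic behavior: registration, versioning and lookup."""

    def __init__(self, discipline_model):
        # The discipline model comes from outside the software:
        # object type -> names of the attributes that type requires.
        self._model = {t: frozenset(req) for t, req in discipline_model.items()}
        # (object_type, identifier) -> list of registered versions
        self._store = {}

    def register(self, object_type, identifier, **attributes):
        """Validate against the discipline model and store a new version."""
        if object_type not in self._model:
            raise ValueError(f"unknown object type: {object_type}")
        missing = self._model[object_type] - set(attributes)
        if missing:
            raise ValueError(f"missing attributes: {sorted(missing)}")
        versions = self._store.setdefault((object_type, identifier), [])
        obj = ExtrinsicObject(object_type, identifier, len(versions) + 1, attributes)
        versions.append(obj)
        return obj

    def latest(self, object_type, identifier):
        """Return the most recently registered version of one object."""
        return self._store[(object_type, identifier)][-1]

    def find(self, object_type):
        """Return the latest version of every object of the given type."""
        return [v[-1] for (t, _), v in self._store.items() if t == object_type]


# A project-defined discipline model (illustrative values only).
registry = Registry({"target": {"name"}, "document": {"title"}})
registry.register("target", "mars", name="Mars")
second = registry.register("target", "mars", name="Mars", kind="planet")
```

In this sketch the registry code never names a specific object type; a project changes what is managed by passing a different discipline model, which is the sense in which the discipline model is dynamically configured.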