Hughes RDAP11 Data Publication Repositories

Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html

  1. Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery
     Research Data Access & Preservation
     Denver, Colorado
     March 31 - April 1, 2011
     Steve Hughes, Dan Crichton, Chris Mattmann, Sean Kelly
  2. Topics
     - E-Science Trends
     - Software Architectures
     - Open Source
     - Object-Oriented Data Technology
     - Use Case
     - Data Driven
     (Slide footer: Leveraging Open Source Technologies to Enable Scientific Discovery)
  3. “eScience” Trends
     - Highly distributed, multi-organizational systems
       - Systems are moving towards loosely coupled systems or federations in order to solve science problems which span center and institutional environments
     - Sharing of data and services which allow for the discovery, access, and transformation of data
       - Systems are moving towards publishing of services and data in order to address data- and computationally-intensive problems
     - Infrastructures which are being built to handle future demand
       - Use of commodity services to address elasticity
     - Address complex modeling, inter-disciplinary science and decision support needs
       - Need a dynamic environment where data and services can be used quickly as the building blocks for constructing predictive models and answering critical science questions
       - Need to ensure the information architecture supports the varying science needs
     - Changing the way in which data analysis is performed
       - Moving towards analysis of distributed data to increase the study power
       - Enabling greater collaboration across centers
       - Systematizing, where possible
  4. Highly Distributed Science Environments
     - Highly distributed/federated
     - Collaborative
     - Information-centric
     - Discipline-specific
     - Growing/evolving
     - Heterogeneous (Implementations)
  5. Why Software Architecture?
     - Software Architecture: The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. (ANSI/IEEE Std. 1471-2000)
     - Architecture is about strategy to address key architectural concerns:
       - How can we exploit common patterns to improve reuse?
       - Can we develop software product lines?
       - Can we improve interoperability?
       - Can we reduce dependencies?
     - What are the architectural principles? Loosely coupled, data-driven, highly distributed, commodity services, service-oriented, collaborative/multi-institutional
  6. Notional Service Architectures Concept
     [Diagram: Client A, Client B, C, Service Interface, Service; C2 Architectural Style]
     - The service architecture concept exploits many of the architectural concepts discussed
       - Loosely coupled
       - Elasticity (e.g. commodity-based)
       - Multi-organizational
       - etc.
     - At an enterprise scale, architectures don’t need to prescribe what’s inside services, just their interfaces, function, behavior, etc.
     - Services might include:
       - Data discovery
       - Data access
       - Security
       - Transformation
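
The point that an architecture prescribes a service's interface rather than its internals can be illustrated with a minimal Python sketch. Everything named here (`DataDiscoveryService`, `InMemoryDiscovery`, `FederatedDiscovery`, `client_search`, the product identifiers) is hypothetical and chosen for illustration only; none of it comes from the slides or from any OODT API.

```python
from typing import Protocol


class DataDiscoveryService(Protocol):
    """Hypothetical interface: all a client is allowed to know about the service."""

    def find(self, keyword: str) -> list[str]:
        """Return identifiers of data products matching the keyword."""
        ...


class InMemoryDiscovery:
    """One possible implementation; its internals are invisible to clients."""

    def __init__(self, catalog: dict[str, str]) -> None:
        self._catalog = catalog  # product id -> free-text description

    def find(self, keyword: str) -> list[str]:
        return sorted(
            pid for pid, text in self._catalog.items()
            if keyword.lower() in text.lower()
        )


class FederatedDiscovery:
    """Another implementation: fans the query out to several member services."""

    def __init__(self, members: list[DataDiscoveryService]) -> None:
        self._members = members

    def find(self, keyword: str) -> list[str]:
        hits: set[str] = set()
        for member in self._members:
            hits.update(member.find(keyword))
        return sorted(hits)


def client_search(service: DataDiscoveryService, keyword: str) -> list[str]:
    """A client written only against the interface, never a concrete class."""
    return service.find(keyword)


# Illustrative data only.
node_a = InMemoryDiscovery({"a-1": "Mars surface images", "a-2": "Saturn ring spectra"})
node_b = InMemoryDiscovery({"b-1": "Mars atmosphere profiles"})
federation = FederatedDiscovery([node_a, node_b])

print(client_search(node_a, "mars"))
print(client_search(federation, "mars"))
```

The same client function works against a single node or a federation of nodes, which is one way to read "loosely coupled" and "multi-organizational" in the list above.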
  7. What does this have to do with open source?
     - Identifying core software product lines and tools that can be reused creates excellent opportunities for open source projects
       - Across a federation of organizations, systems and users, what can be developed and shared?
       - How can software components be developed in generic ways, but allow for extensions?
     - Open source itself is a strategy
       - Can improve collaborations
       - Can drive a robust set of reusable software components and tools
       - Can push standards development
       - Can encourage use of common architectural patterns
  8. Open Source Models
     - Software sharing with an open source license (e.g., BSD-style license)
     - Software distribution through open source organizations (e.g., SourceForge)
     - Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
     - Ad hoc open source project communities with their own governance
  9. Open Source Models: Our Opinion
     - Software sharing with an open source license (e.g., BSD-style license)
       - It’s a great start
       - Limited community involvement
     - Software distribution through open source organizations (e.g., SourceForge)
       - Provides good software distribution support
     - Software projects under the governance of an open source community/foundation (e.g., Apache Software Foundation)
       - This moves from just distribution support to collaboration and governance over the development
     - Ad hoc open source project communities with their own governance
       - This can make a lot of sense for larger federations
  10. The Apache Software Foundation
      - Largest open source software development entity in the world
        - 2300+ committers
        - 3500+ contributors
      - 84 Top Level Projects
      - 36 Incubating
      - 30 Lab Projects
      - 8 retired projects in the “Attic”
      - Over 1.2 million revisions
      - Over 10M successful requests served a day across the world
      - HTTPD web server used on 100+ million web sites (52+% of the market)
  11. OODT: An Open Source Framework for Building Distributed Science Data Mgmt Environments
      - Focus on
        - distributed environments
        - science data generation
        - data capture, end-to-end
        - access to science data by the community
      - A set of building blocks/services to exploit common system patterns for reuse
      - 04-FEB-2011 - Apache OODT v0.2 Released
      - Used for a number of science data system activities
      http://oodt.apache.org/
  12. e-Science Examples and OODT
      Planetary Science Data System
      - Highly diverse (40 years of science data from NASA and Int’l missions)
      - Geographically distributed; moving int’l
      - New centers plugging in (i.e. data nodes)
      - Multi-center data system infrastructure
      - Heterogeneous nodes with common interfaces
      - Integrated based on enterprise-wide data standards
      - Sits on top of COTS-based middleware
      EDRN Cancer Research
      - Highly diverse (30+ centers performing parallel studies using different instruments)
      - Geographically distributed
      - New centers plugging in (i.e. data nodes)
      - Multi-center data system infrastructure
      - Heterogeneous sites with common interfaces allowing access to distributed portals
      - Integrated based on common data standards
      - Secure (e.g. encryption, authentication, authorization)
  13. Mission Pipelines – Data Generation and Archive
      - Leveraged OODT software framework for constructing ground data systems for earth science missions
      - Used OODT Catalog and Archive Service software
        - Focus is on “process management”
        - Constructed “workflows”
        - Execution of “processors” based on a set of rules
        - Explicit separation of workflow management from management of computational resources
        - Provided “lights out” operations
      - Multiple Missions
        - SeaWinds
        - QuikSCAT
        - Orbiting Carbon Observatory (OCO), OCO-2…
        - NP Sounder PEATE
        - SMAP
      [Image: SeaWinds on ADEOS II (Launched Dec 2002)]
      Credit: D. Freeborn, C. Mattmann, D. Woollard
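
The ideas of rule-triggered "processors" and of keeping workflow management apart from resource management can be sketched in a few lines of Python. This is an illustration under assumptions, not the OODT Catalog and Archive Service API: `Processor`, `ResourceManager`, `WorkflowManager`, the processing levels, and the node names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Metadata = dict[str, str]


@dataclass
class Processor:
    """A named processing step plus the rule that decides when it runs."""
    name: str
    rule: Callable[[Metadata], bool]
    run: Callable[[Metadata], Metadata]


@dataclass
class ResourceManager:
    """Decides where a job runs; knows nothing about workflow order."""
    nodes: list[str]
    assignments: list[tuple[str, str]] = field(default_factory=list)

    def submit(self, job_name: str, job: Callable[[], Metadata]) -> Metadata:
        # Round-robin node choice, recorded for inspection.
        node = self.nodes[len(self.assignments) % len(self.nodes)]
        self.assignments.append((job_name, node))
        # Executed in-process here; a real system would dispatch to the node.
        return job()


@dataclass
class WorkflowManager:
    """Decides what runs and in which order; delegates execution."""
    processors: list[Processor]
    resources: ResourceManager

    def ingest(self, metadata: Metadata) -> Metadata:
        for proc in self.processors:
            if proc.rule(metadata):
                metadata = self.resources.submit(
                    proc.name, lambda p=proc, m=metadata: p.run(m)
                )
        return metadata


# Hypothetical two-step pipeline: L0 -> L1 -> L2.
calibrate = Processor(
    name="calibrate",
    rule=lambda m: m["level"] == "L0",
    run=lambda m: {**m, "level": "L1"},
)
grid = Processor(
    name="grid",
    rule=lambda m: m["level"] == "L1",
    run=lambda m: {**m, "level": "L2"},
)

resources = ResourceManager(nodes=["node-1", "node-2"])
workflow = WorkflowManager(processors=[calibrate, grid], resources=resources)
result = workflow.ingest({"file": "granule_001.dat", "level": "L0"})
print(result["level"], resources.assignments)
```

Because `WorkflowManager` only calls `submit`, the scheduling policy can be swapped without touching the rules, which is the separation the slide describes.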
  14. Conceptual Capabilities
      OODT Apache Suite (oodt.apache.org)
      - File Management
      - Workflow Management (for jobs/processing)
      - Data Transformation
      - Data Access
      - Metadata Query
      Registry (future addition to OODT)
      - Metadata Management based on ebXML registry specification
      - Used to manage different types of “extrinsic” objects (metadata descriptions of data, services, etc.)
        - “targets”, “science data products”, “documents”, “services”, etc.
      - Product identification, versioning, tracking, and subscription/notification
      - Indexing, Classification, and Associations
  15. Information Architecture
      OODT + Registry contains two different types of “models”
      - Core Infrastructure model
      - Discipline model
      Core infrastructure model is intrinsic (integrated with the software)
      - It is built in and used by the software; this never changes and you don’t need to worry about it
      - Services are part of the core infrastructure (“intrinsic”) but all other metadata objects are “extrinsic”
      Discipline model is extrinsic (defined outside the software)
      - It is dynamically configured
      - For example, the registry can be configured to use whatever “extrinsic” metadata objects are important to manage
      - This allows for the registry to be used for tracking artifacts, managing services, etc.
      - This is what projects need to define
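
One way to picture the split between a fixed core model and a configurable discipline model is a registry whose behavior is coded once while its set of allowed object types is supplied as configuration. The sketch below is a simplified assumption, not the ebXML registry information model or any OODT interface; `ExtrinsicObject`, `Registry`, and the `urn:example:` identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExtrinsicObject:
    """Metadata description of something defined outside the software."""
    object_type: str
    logical_id: str
    version: int
    attributes: dict[str, str]


@dataclass
class Registry:
    """Registration and versioning are fixed; allowed types are configuration."""
    allowed_types: set[str]
    _objects: dict[tuple[str, int], ExtrinsicObject] = field(default_factory=dict)

    def register(
        self, object_type: str, logical_id: str, attributes: dict[str, str]
    ) -> ExtrinsicObject:
        if object_type not in self.allowed_types:
            raise ValueError(f"type {object_type!r} is not configured for this registry")
        # Each re-registration of the same logical id creates the next version.
        version = 1 + len(self.versions(logical_id))
        obj = ExtrinsicObject(object_type, logical_id, version, dict(attributes))
        self._objects[(logical_id, version)] = obj
        return obj

    def versions(self, logical_id: str) -> list[ExtrinsicObject]:
        return [o for (lid, _), o in sorted(self._objects.items()) if lid == logical_id]

    def latest(self, logical_id: str) -> ExtrinsicObject:
        return self.versions(logical_id)[-1]


# A hypothetical discipline configuration, using type names from the slides.
planetary = Registry(allowed_types={"target", "science data product", "document"})
planetary.register("target", "urn:example:target:mars", {"name": "Mars"})
planetary.register("target", "urn:example:target:mars", {"name": "Mars", "type": "planet"})
print(planetary.latest("urn:example:target:mars").version)
```

A different project would reuse the same `Registry` class and pass a different `allowed_types` set, which mirrors the statement that the discipline model "is what projects need to define".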
  16. Observational Product – Concept Map
  17. PDS4 High Level Concept Map
  18. Defining Extrinsic Objects and their Context (Ontology)
  19. External Data Standards
      - Open Archival Information System (OAIS) Reference Model: Defines the “Information Object”, a key component of the model.
      - ISO/IEC 11179-3: Registry Metamodel and Basic Attributes: Provides the schema for the data dictionary. Defines the concepts of registration authority and steward for governance.
      - Object-Oriented Data Modeling: Used as a standard modeling methodology.
      - XML/XML Schema: Provides the label syntax and validation mechanism.
      - OASIS/ebXML Registry Information Model: Provides attributes for object registration within a federated registry/repository.
      - ISO 15836:2009 The Dublin Core Metadata Element Set: Provides standard web resource identification attributes.
      - Semantics (RDF, RDFS, OWL): Provides W3C standards for knowledge representation.
  20. A perspective to leave you with…
      Agency science federations, based on an open source/collaborative model, are very attractive for the following reasons:
      - Science benefits: can drive a growing enterprise of shared science services and software infrastructure support
      - Technology benefits: can drive innovation through its peer review and collaboration process
      - Infusion benefits: creates a defined process for contributing new ideas and capabilities
      - Architecture benefits: helps you build towards a common architectural vision and drive community standards
      - Cost benefits: can enable better leveraging and reuse of skills and capabilities across institutions
      - Tech transfer benefits: may benefit other science (and non-science) disciplines
  21. Questions?
      Thank You!!!
      Steve Hughes, Steve.Hughes@jpl.nasa.gov
      Chris Mattmann, Chris.Mattmann@jpl.nasa.gov
      Dan Crichton, Dan.Crichton@jpl.nasa.gov
      Sean Kelly, Sean.Kelly@jpl.nasa.gov
      Note: we have several papers and book chapters on data intensive systems, etc. that we’d be happy to share! A few key ones:
      - D. Crichton, C. Mattmann, J. S. Hughes, S. Kelly, and A. Hart. “A Multi-Disciplinary, Model-Driven, Distributed Science Data System Architecture.” Guide to e-Science: Next Generation Scientific Research and Discovery. X. Yang, L. L. Wang, W. Jie, eds. Springer Verlag, 2010, To appear.
      - D. Crichton, S. Kelly, C. Mattmann, Q. Xiao, J. S. Hughes, J. Oh, M. Thornquist, D. Johnsey, S. Srivastava, L. Esserman, and B. Bigbee. “A Distributed Information Services Architecture to Support Biomarker Discovery in Early Detection of Cancer.” Accepted for publication at the 2nd IEEE International Conference on e-Science and Grid Computing, Amsterdam, the Netherlands, December 4th-6th, 2006.
      - C. Mattmann, D. Crichton, N. Medvidovic and S. Hughes. “A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications.” In Proceedings of the 28th International Conference on Software Engineering (ICSE06), pp. 721-730, Shanghai, China, May 20th-28th, 2006.
  22. Backup
