Hot Topics Web Seminar Series: Research
Data in Repositories
The UC San Diego Experience
Second Webinar: Metadata and Repository Services
for Research Data Curation
General Series Intro
• First webinar: Intro and Framing: UC San Diego decisions and planning
• Second Webinar: Deep dive into technology and metadata
• Third Webinar: The perspective from researchers, next steps
Your esteemed presenters …
First webinar:
David Minor – Program Director, Research Data Curation
Declan Fleming – Chief Technology Strategist

Second webinar:
Declan Fleming – Chief Technology Strategist
Arwen Hutt – Metadata Librarian
Matt Critchlow – Manager of Development and Web Services

Third webinar:
Dick Norris – Professor, Scripps Institution of Oceanography
Rick Wagner – Data Scientist at San Diego Supercomputer Center
Today we will …
• Discuss real-world researcher interaction
• Document how metadata and files combine to make
digital objects
• Describe the DAMS data model and how it supports
complex research objects
• Detail the technology driving the DAMS
• Point to the future
Working with Researchers: Pilots
• The Brain Observatory
• NSF OpenTopography Facility
• Levantine Archaeology Laboratory
• Scripps Institution of Oceanography Geological Collections
• The Laboratory for Computational Astrophysics
Working with Researchers: Process
• Introductory meeting
• Metadata point person
• Ongoing discussions
• One-on-one work

Iterative, collaborative, customized, experimental…pilot!
Working with Researchers: Data management
• Collocation
• Clean up
• Identifiers
• Metadata
Working with Researchers: What is an object?

• What are the boundaries on a discrete set or
subset of data? What is required to make the
data intelligible, usable and reusable?
• What needs to be preserved?
• What do they want to display and/or share?
• What do they want to be able to refer to or
cite?
Working with Researchers: What is an object?
[Figure: candidate object boundaries. Brain or Slice? Artifact or Site? Etc…]
Working with Researchers: Take Aways

They are the subject experts

There are a lot of broad level similarities
But no such thing as one size fits all
We want a new data model…
• One that is flexible and accommodates disparate
metadata from a variety of sources
• While promoting consistency within the data store
• One that supports relationships within and between
objects
• One that is more community engaged, both sharing
vocabularies and technology, and utilizing others'
shared vocabularies and technologies
• One that supports improved management of objects
and metadata
DAMS Data Model Development Process
• Five people, in a room, 16 hours a week for 4
months
• Worked through existing data, use case scenarios,
known data requirements, investigated known
ontologies, etc.
• Lots and lots and lots of discussion
• Utilizes MADS (Metadata Authority Description
Schema)
• Results = a data dictionary and an OWL ontology
• Living document
DAMS Data Model: Flexibility

• The data model provides enough flexibility
that we can accommodate a wide variety of
data within the schema
– Vocabularies
– Use of “types” or “display labels” to distinguish
specific subtypes of a data field
– Flexible structures and relationships
– Extensible
DAMS Data Model: Consistency

• But enough consistency that searching and
display rules do not need to be customized for
each individual collection of material
– Rules can be applied at the level of the broader
concept

• As well as establishing the organizational
structure necessary for maintaining
consistency over time
– Evaluation and approval of modifications
DAMS Data Model: Relationships

• It allows us to create a number
of different relationships
– Collections and sub-collections
– Collections and objects
– Objects and components
(complex hierarchical objects)
– Other related resources internal
or external to the DAMS
[Figure: complex object example]
DAMS Data Model: Vocabularies

• Allow management of local & community
vocabularies
– Vocabulary terms as entities
– Ability to encode authority data (vocabulary
source, value uri, etc.) as well as sameAs
relationships between the same term expressed in
multiple sources
– Ability to update authority records as community
vocabularies become more formalized.
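As a rough illustration of "vocabulary terms as entities", the sketch below models a term as its own record carrying authority data, with sameAs links between a local term and its community equivalent. The class name, field names, and example.org URIs are all assumptions made for this example; they are not taken from the DAMS ontology.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyTerm:
    """A vocabulary term modeled as its own entity (field names are illustrative)."""
    label: str
    source: str      # name of the vocabulary the term comes from
    value_uri: str   # URI of the term within that vocabulary
    same_as: set = field(default_factory=set)  # URIs of equivalent terms elsewhere


def link_same_as(a: VocabularyTerm, b: VocabularyTerm) -> None:
    """Record a symmetric sameAs relationship between two terms."""
    a.same_as.add(b.value_uri)
    b.same_as.add(a.value_uri)


# Hypothetical terms: one minted locally, one from a community vocabulary.
local_term = VocabularyTerm(
    label="Foraminifera",
    source="local",
    value_uri="http://example.org/local/term/1",
)
community_term = VocabularyTerm(
    label="Foraminifera",
    source="community",
    value_uri="http://example.org/community/term/42",
)
link_same_as(local_term, community_term)
```

Because each term is a separate record, its authority data can be updated in one place when a community vocabulary becomes more formalized, without touching the objects that point to it.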
DAMS Data Model: Management

• One that supports improved management of
objects and metadata
– Authority management of vocabulary terms
– Event metadata!
DAMS Architecture
Preservation: Chronopolis
Current DAMS Process
1. Create BagIt bags for all objects
2. Host via HTTP(S)
3. Bags are retrieved and ingested into Chronopolis

DAMS4 Process
1. Create BagIt bags for changed (Δ) objects only, identified using Event metadata
2. Host via HTTP(S) or enqueue on messaging queue for
ingestion
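The DAMS4 process above can be sketched in two steps: pick the objects whose event records are newer than the last preservation run, then write a bag for each. This is a minimal sketch using only the Python standard library. The event record shape (`object_id`, `date`), the function names, and the choice of BagIt version 0.97 with an MD5 manifest are assumptions for illustration, not the DAMS implementation.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def changed_object_ids(events, since):
    """Return ids of objects with at least one event newer than `since`."""
    return sorted({e["object_id"] for e in events if e["date"] > since})


def write_bag(bag_dir, payload):
    """Write a minimal BagIt bag: bagit.txt, data/ payload, manifest-md5.txt.

    `payload` maps a file name to its bytes.
    """
    bag_dir = Path(bag_dir)
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    # Bag declaration; the version string is an assumption for this sketch.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


# Hypothetical event metadata: only obj2 changed after the last run.
last_run = datetime(2013, 10, 1)
events = [
    {"object_id": "obj1", "date": datetime(2013, 9, 15)},
    {"object_id": "obj2", "date": datetime(2013, 10, 5)},
    {"object_id": "obj2", "date": datetime(2013, 10, 6)},
]
to_bag = changed_object_ids(events, last_run)
```

Calling `write_bag("bags/obj2", {"metadata.xml": b"..."})` for each id in `to_bag` would then produce one bag per changed object, ready to host or enqueue.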
Storage
Storage: EMC Isilon 72NL
Storage For Library Collections
1 cluster of 5 Nodes
1 Node = 36 x 2TB Drives
Total Current Usable Storage of 320TB
OneFS 7.0.2.1
Storage: OpenStack
Storage For Research Data Collections
Testing:
• Performance versus Local Storage
• Large Files (up to 1TB)
– Segmenting files > 5GB
– Lexical order bug fix: zero-pad segment names so they sort in numeric order (1, 10, 2 -> 0001, 0002, … 0010)
• Rackspace CloudFiles API vs. OpenStack REST API
Testing Notes:
https://libraries.ucsd.edu/blogs/dams/openstack-testing-notes/
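The segment ordering issue noted above is easy to reproduce: unpadded numeric names sort as strings, so "10" lands before "2". The sketch below shows zero-padded names alongside a simple plan for cutting a large file into segments of at most 5GB. The function names and the four-digit width are assumptions for illustration, and no OpenStack API is called.

```python
SEGMENT_LIMIT = 5 * 1024**3  # 5GB segment size, as on the slide


def segment_name(index, width=4):
    """Zero-pad a segment index so string order matches numeric order."""
    return f"{index:0{width}d}"


def plan_segments(total_size, segment_size=SEGMENT_LIMIT):
    """Return (name, offset, length) for each segment of a file of `total_size` bytes."""
    segments = []
    offset = 0
    index = 1
    while offset < total_size:
        length = min(segment_size, total_size - offset)
        segments.append((segment_name(index), offset, length))
        offset += length
        index += 1
    return segments


# Unpadded names sort lexically, which scrambles the segment order.
unpadded = sorted(str(i) for i in (1, 2, 10))
# Padded names sort the way the segments must be reassembled.
padded = sorted(segment_name(i) for i in (10, 2, 1))

# A 12GB file needs three segments: 5GB, 5GB and 2GB.
plan = plan_segments(12 * 1024**3)
```

A fixed width caps the number of segments (9,999 at width 4), so the width has to be chosen with the largest expected file in mind.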
DAMS Repository
Core Repository Application: Create, Read, Update, Delete (CRUD)
Uses:
Jena, ActiveMQ, JHOVE, Apache Tika, FFMPEG, ImageMagick
Manages:
• Metadata Triplestore
• Storage
• Solr
DAMS Repository: Metadata Triplestore
Triplestore was: AllegroGraph
Triplestore is: PostgreSQL DB + Jena
• Schema: (ID), Parent, Subject, Predicate, Object
Jena Usage:
• Core/RDF API – Parsing, loading, updating, serializing RDF
• ARQ API – SPARQL queries
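To make the relational layout concrete, here is a minimal sketch of a single table with the columns listed above, queried by subject and predicate. It uses an in-memory SQLite database as a stand-in for PostgreSQL and does not use Jena. The table name, column types, sample URIs, and the reading of Parent as the enclosing subject of a nested node are all assumptions for this example.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE triples (
        id        INTEGER PRIMARY KEY,
        parent    TEXT,            -- assumed: enclosing subject for nested nodes
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    )
    """
)

# Hypothetical triples describing one object with a nested title node.
rows = [
    (None, "http://example.org/object/1",
     "http://purl.org/dc/terms/title", "Sample object"),
    ("http://example.org/object/1", "_:title1",
     "http://example.org/ns#language", "en"),
]
conn.executemany(
    "INSERT INTO triples (parent, subject, predicate, object) VALUES (?, ?, ?, ?)",
    rows,
)


def objects_of(conn, subject, predicate):
    """Return all object values for a subject/predicate pair, in insert order."""
    cur = conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ? ORDER BY id",
        (subject, predicate),
    )
    return [row[0] for row in cur]


def children_of(conn, parent):
    """Return the subjects of nodes nested under `parent`."""
    cur = conn.execute(
        "SELECT DISTINCT subject FROM triples WHERE parent = ?", (parent,)
    )
    return [row[0] for row in cur]
```

In the architecture described on the slide, SPARQL queries would go through Jena's ARQ API rather than hand-written SQL; the SQL here only shows how triples can sit in an ordinary table.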
DAMS Repository: REST API
Hydra Framework

Source: https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
DAMS Repository: Fedora API-ish
Fedora API – Next PID
DAMS Manager
Java application using Spring MVC framework
• Collection Management
– Metadata Ingest and Export
– File Ingest
– Derivative Generation
– Solr indexing by Collection
• Administrative Reporting and Statistics
DAMS Hydra Head
DAMS Hydra Head: Blacklight
RDF in Hydra
RDF in Hydra: (Read) Nested Attributes
RDF in Hydra: (Create) Nested Attributes
DAMS Hydra Head: Complex Objects
Next Steps
Beta Release: Late October
Production Release: January
Future:
• Sufia/Curate Integration for administrative functionality
• Additional Linked Data Integration and Crosswalks
– Schema.org, OpenURL, Dublin Core, ResourceSync

• Fedora4
More Information
DAMS Overview
https://github.com/ucsdlib/dams/wiki/DAMS-Manual
DAMS Hydra Head
https://github.com/ucsdlib/damspas
DAMS Ontology
https://github.com/ucsdlib/dams/tree/master/ontology
DAMS REST API
https://github.com/ucsdlib/dams/wiki/REST-API
Hot Topics Series 3: Get a Head on the Repository with Hydra
http://duraspace.org/hot-topics
Hydra Technical Overview
https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts
OneFS Technical Overview
http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf
Isilon Overview
http://www.emc.com/collateral/software/data-sheet/h10541-ds-isilon-platform.pdf
Coming Up Next
Final Webinar (October 31)
The researcher perspective from two of our pilot
participants
Dick Norris – Professor, Scripps Institution of
Oceanography
Rick Wagner – Data Scientist at San Diego
Supercomputer Center
Questions?
Thanks!
Declan Fleming
@declan | dfleming@ucsd.edu
Arwen Hutt
@arwenh | ahutt@ucsd.edu
Matt Critchlow
@mattcritchlow | mcritchlow@ucsd.edu

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Duraspace Hot Topics Series 6: Metadata and Repository Services

  • 1. Hot Topics Web Seminar Series: Research Data in Repositories The UC San Diego Experience Second Webinar: Metadata and Repository Services for Research Data Curation
  • 2. General Series Intro • First webinar: Intro and Framing: UC San Diego decisions and planning • Second Webinar: Deep dive into technology and metadata • Third Webinar: The perspective from researchers, next steps
  • 3. Your esteemed presenters … First webinar: David Minor – Program Director, Research Data Curation Declan Fleming - Chief Technology Strategist Second webinar: Declan Fleming - Chief Technology Strategist Arwen Hutt - Metadata Librarian Matt Critchlow - Manager of Development and Web Services Third webinar: Dick Norris – Professor, Scripps Institution of Oceanography Rick Wagner – Data Scientist at San Diego Supercomputer Center
  • 4. Today we will … • Discuss real-world researcher interaction • Document how metadata and files combine to make digital objects • Describe the DAMS data model and how it supports complex research objects • Detail the technology driving the DAMS • Point to the future
  • 5. Working with Researchers: Pilots • The Brain Observatory • NSF OpenTopography Facility • Levantine Archaeology Laboratory • Scripps Institution of Oceanography Geological Collections • The Laboratory for Computational Astrophysics
  • 6. Working with Researchers: Process • Introductory meeting • Metadata point person • Ongoing discussions • One-on-one work Iterative, collaborative, customized, experimental…pilot!
  • 7. Working with Researchers: Data management • Collocation • Clean up • Identifiers • Metadata
  • 8. Working with Researchers: What is an object? • What are the boundaries on a discrete set or subset of data? What is required to make the data intelligible, usable and reusable? • What needs to be preserved? • What do they want to display and/or share? • What do they want to be able to refer to or cite?
  • 9. Working with Researchers: What is an object? Brain or Slice? Site or Artifact? Etc…
  • 10. Working with Researchers: Takeaways • They are the subject experts • There are many broad-level similarities • But there is no such thing as one size fits all
  • 11. We want a new data model… • One that is flexible and accommodates disparate metadata from a variety of sources • While promoting consistency within the data store • One that supports relationships within and between objects • One that is more community engaged, both sharing vocabularies and technology, and utilizing others' shared vocabularies and technologies • One that supports improved management of objects and metadata
  • 12. DAMS Data Model Development Process • Five people, in a room, 16 hours a week for 4 months • Worked through existing data, use case scenarios, known data requirements, investigated known ontologies, etc. • Lots and lots and lots of discussion • Utilizes MADS (Metadata Authority Description Schema) • Results = a data dictionary and an OWL ontology • Living document
  • 13. DAMS Data Model: Flexibility • The data model provides enough flexibility that we can accommodate a wide variety of data within the schema – Vocabularies – Use of “types” or “display labels” to distinguish specific subtypes of a data field – Flexible structures and relationships – Extensible
  • 14. DAMS Data Model: Consistency • But enough consistency that searching and display rules do not need to be customized for each individual collection of material – Rules can be applied at the level of the broader concept • As well as establishing the organizational structure necessary for maintaining consistency over time – Evaluation and approval of modifications
  • 15. DAMS Data Model: Relationships • It allows us to create a number of different relationships – Collections and sub-collections – Collections and objects – Objects and components (complex hierarchical objects) – Other related resources internal or external to the DAMS complex object example
  • 16. DAMS Data Model: Vocabularies • Allow management of local & community vocabularies – Vocabulary terms as entities – Ability to encode authority data (vocabulary source, value URI, etc.) as well as sameAs relationships between the same term expressed in multiple sources – Ability to update authority records as community vocabularies become more formalized
  • 17. DAMS Data Model: Management • One that supports improved management of objects and metadata – Authority management of vocabulary terms – Event metadata!
  • 19. Preservation: Chronopolis • Current DAMS process: 1. Create BagIt bags for all objects 2. Host via HTTP(S) 3. Bags are retrieved and ingested into Chronopolis • DAMS4 process: 1. Create BagIt bags for Δ (changed) objects using Event metadata 2. Host via HTTP(S) or enqueue on a messaging queue for ingestion
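A minimal sketch of how event metadata could drive the Δ selection on the Chronopolis slide: only objects with an event newer than the last preservation run are picked for bagging. The event record shape (`object_id`, `type`, `timestamp`), the function name, and the sample values are all hypothetical and are not taken from the DAMS code.

```python
from datetime import datetime, timezone


def changed_object_ids(events, since):
    """Return the sorted IDs of objects that have any event after `since`."""
    return sorted({e["object_id"] for e in events if e["timestamp"] > since})


# Hypothetical event records; the real DAMS event model may look different.
events = [
    {"object_id": "obj-a", "type": "object created",
     "timestamp": datetime(2013, 9, 1, tzinfo=timezone.utc)},
    {"object_id": "obj-b", "type": "object modified",
     "timestamp": datetime(2013, 10, 5, tzinfo=timezone.utc)},
    {"object_id": "obj-b", "type": "file added",
     "timestamp": datetime(2013, 10, 6, tzinfo=timezone.utc)},
]

# Assumed time of the previous preservation run.
last_run = datetime(2013, 10, 1, tzinfo=timezone.utc)

# Only these objects would need a new bag.
to_bag = changed_object_ids(events, last_run)
```

Each ID in `to_bag` would then be bagged and either hosted or enqueued, as the slide describes; the bagging step itself is left out here.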
  • 21. Storage: EMC Isilon 72NL • Storage for Library Collections • 1 cluster of 5 nodes • 1 node = 36 x 2TB drives • Total current usable storage of 320TB • OneFS 7.0.2.1
  • 22. Storage: OpenStack Storage For Research Data Collections Testing: • Performance versus Local Storage • Large Files (up to 1TB) – Segmenting files > 5GB – Lexical order bug fix: 1,10,2 -> 0001,0002,…0010 • Rackspace CloudFiles API VS OpenStack REST API Testing Notes: https://libraries.ucsd.edu/blogs/dams/openstack-testing-notes/
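The segmenting and lexical-order points on the OpenStack slide can be sketched as follows: a large file is split into fixed-size segments, and each segment gets a zero-padded suffix so that sorting names as strings matches numeric order. The function name, the 5 GiB default, and the four-digit width are assumptions for illustration, not the DAMS implementation.

```python
# Assumed segment size: 5 GiB, matching the "> 5GB" threshold on the slide.
SEGMENT_SIZE = 5 * 1024**3


def plan_segments(file_size, segment_size=SEGMENT_SIZE, width=4):
    """Return (suffix, offset, length) tuples covering a file of file_size bytes."""
    segments = []
    offset = 0
    index = 1
    while offset < file_size:
        length = min(segment_size, file_size - offset)
        # Zero-pad the index so "0002" sorts before "0010".
        segments.append((f"{index:0{width}d}", offset, length))
        offset += length
        index += 1
    return segments


# The ordering problem: plain numbers sort lexically as 1, 10, 2.
unpadded = sorted(str(i) for i in (1, 2, 10))
# With zero-padding, string order equals numeric order.
padded = sorted(f"{i:04d}" for i in (1, 2, 10))
```

Padding matters when segments are reassembled in name order, since a store that lists names as strings would otherwise place segment 10 before segment 2.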
  • 24. DAMS Repository Core Repository Application: Create, Read, Update, Delete (CRUD) Uses: Jena, ActiveMQ, JHOVE, Apache Tika, FFMPEG, ImageMagick Manages: • Metadata Triplestore • Storage • Solr
  • 26. DAMS Repository: Metadata Triplestore Triplestore was: AllegroGraph Triplestore is: PostgreSQL DB + Jena • Schema: (ID), Parent, Subject, Predicate, Object Jena Usage: • Core/RDF API – Parsing, loading, updating, serializing RDF • ARQ API – SPARQL queries
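A rough sketch of the (ID), Parent, Subject, Predicate, Object schema from the triplestore slide, stored as one relational table. It uses Python's built-in `sqlite3` instead of PostgreSQL so that it runs standalone. The table name, the sample URIs, and the reading of `parent` as the top-level object that owns a statement are assumptions; the slide does not define them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per RDF statement; column names follow the slide's schema.
conn.execute(
    """
    CREATE TABLE triples (
        id        INTEGER PRIMARY KEY,
        parent    TEXT,
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        "object"  TEXT NOT NULL
    )
    """
)

# Hypothetical sample data: an object with one component.
rows = [
    ("ex:obj1", "ex:obj1", "dc:title", "Sample object"),
    ("ex:obj1", "ex:obj1", "ex:hasComponent", "ex:obj1/1"),
    ("ex:obj1", "ex:obj1/1", "dc:title", "Component 1"),
]
conn.executemany(
    'INSERT INTO triples (parent, subject, predicate, "object") VALUES (?, ?, ?, ?)',
    rows,
)


def triples_for(connection, subject):
    """Return (predicate, object) pairs for one subject, in insertion order."""
    cursor = connection.execute(
        'SELECT predicate, "object" FROM triples WHERE subject = ? ORDER BY id',
        (subject,),
    )
    return cursor.fetchall()


# Under the assumed meaning of parent, this fetches a whole complex object.
whole_object = conn.execute(
    "SELECT COUNT(*) FROM triples WHERE parent = ?", ("ex:obj1",)
).fetchone()[0]
```

Under that assumption, a parent column would let every statement of a complex object, components included, be fetched with a single indexed lookup rather than by walking the graph.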
  • 30. Fedora API – Next PID
  • 31. Fedora API – Next PID
  • 33. DAMS Manager Java application using Spring MVC framework • Collection Management – Metadata Ingest and Export – File Ingest – Derivative Generation – Solr indexing by Collection • Administrative Reporting and Statistics
  • 36. DAMS Hydra Head: Blacklight
  • 38. RDF in Hydra: (Read) Nested Attributes
  • 39. RDF in Hydra: (Create) Nested Attributes
  • 40. DAMS Hydra Head: Complex Objects
  • 41. Next Steps Beta Release: Late October Production Release: January Future: • Sufia/Curate Integration for administrative functionality • Additional Linked Data Integration and Crosswalks – Schema.org, OpenURL, Dublin Core, ResourceSync • Fedora4
  • 42. More Information DAMS Overview https://github.com/ucsdlib/dams/wiki/DAMS-Manual DAMS Hydra Head https://github.com/ucsdlib/damspas DAMS Ontology https://github.com/ucsdlib/dams/tree/master/ontology DAMS REST API https://github.com/ucsdlib/dams/wiki/REST-API Hot Topics Series 3: Get a Head on the Repository with Hydra http://duraspace.org/hot-topics Hydra Technical Overview https://wiki.duraspace.org/display/hydra/Technical+Framework+and+its+Parts OneFS Technical Overview http://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf Isilon Overview http://www.emc.com/collateral/software/data-sheet/h10541-ds-isilon-platform.pdf
  • 43. Coming Up Next Final Webinar (October 31) The researcher perspective from two of our pilot participants Dick Norris – Professor, Scripps Institution of Oceanography Rick Wagner – Data Scientist at San Diego Supercomputer Center
  • 44. Questions? Thanks! Declan Fleming @declan | dfleming@ucsd.edu Arwen Hutt @arwenh | ahutt@ucsd.edu Matt Critchlow @mattcritchlow | mcritchlow@ucsd.edu