8. Research Object, Components of
• Identity : unique ID
• Entities : core data or software objects themselves
• Properties : Aggregation : “belongs to” relationship, used to aggregate
within Research Object
• Properties : Relationships : “related to” relationship
• Properties: Descriptive/Annotative : metadata
• Properties: Provenance : “derived from”, “versioned from” relationship as
well as others
• Properties: Agents : data creator (author list), curator, data scientist
• State : external to the RO
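The components above can be pictured as a small data structure. This is only a sketch: the class name `ResearchObject`, its field names, and the helper `derive` are hypothetical and are not taken from SEAD VA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
import uuid


@dataclass
class ResearchObject:
    """Hypothetical in-memory model of the RO components on the slide."""
    # Identity: unique ID (a random UUID is an assumption made for this sketch)
    identity: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Entities: the core data or software objects (paths or URIs)
    entities: List[str] = field(default_factory=list)
    # Aggregation: IDs of members that "belong to" this RO
    aggregates: List[str] = field(default_factory=list)
    # Relationships: IDs of ROs this one is "related to"
    related_to: List[str] = field(default_factory=list)
    # Descriptive/annotative metadata
    metadata: Dict[str, str] = field(default_factory=dict)
    # Provenance: (relation, target ID) pairs such as ("derived from", id)
    provenance: List[Tuple[str, str]] = field(default_factory=list)
    # Agents: role -> names, e.g. {"curator": ["..."]}
    agents: Dict[str, List[str]] = field(default_factory=dict)
    # State is deliberately not a field: the slide places it outside the RO.


def derive(parent: ResearchObject, **metadata: str) -> ResearchObject:
    """Create a new RO that records a "derived from" link to its parent."""
    child = ResearchObject(metadata=dict(metadata))
    child.provenance.append(("derived from", parent.identity))
    return child
```

Keeping state out of the class mirrors the last bullet: whatever tracks the RO's state would live in the surrounding system, not in the object.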
26. Packaging and Mapping (BagIT / ORE)
• BagIt format
• standardized “envelopes” (bags)
• no requirements for “knowing” internal semantics
• 3 elements: a bag declaration (bagit.txt), a manifest file
(manifest-<algorithm>.txt), and a folder with content (data)
• Tools available for bagging
• SEAD BagIt service
• LOC Bagger tool (http://sourceforge.net/projects/loc-xferutils/files/loc-bagger/2.1.2/)
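As a rough illustration of the three elements of a bag, the sketch below writes a minimal bag using only the Python standard library and then re-checks its checksums. In the BagIt specification the declaration file is named bagit.txt. The function names `make_bag` and `verify_bag`, the choice of SHA-256, and the declared version 1.0 are assumptions made for this example; this is not the SEAD BagIt service or the LOC Bagger tool.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, files: dict) -> None:
    """Write a minimal bag: bagit.txt, manifest-sha256.txt and a data/ folder.

    `files` maps a relative payload path to its content as bytes.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for rel_path, content in sorted(files.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        checksum = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: "<checksum>  data/<path>"
        manifest_lines.append(f"{checksum}  data/{rel_path}")

    # Bag declaration (version number is an assumption for this sketch)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Return True if every payload file still matches its manifest checksum."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != checksum:
            return False
    return True
```

Nothing in the code inspects what the payload means, which is the point of the "no requirements for knowing internal semantics" bullet: the bag only guarantees that the bytes arrived intact.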
27. Resource Maps
• OAI/ORE standard
• Exposes rich content
• Captures the semantics of relationships among RO items
• Identifies aggregations
• SEAD VA OAI/ORE relationship classes:
• Aggregation
• Description
• Authorship
• Copyright / rights
• Modification
• Derivation
• Citation
• Processing (calculation, computation, etc.)
29. OAI/ORE Map Example
<rdf:RDF
…
<rdf:Description rdf:about="URI"> <!-- data item -->
<ore:isAggregatedBy>ID</ore:isAggregatedBy>
<dcterms:identifier rdf:datatype="URI">ID</dcterms:identifier>
<dcterms:title rdf:datatype="URI">Vortex_Mining.xlsx</dcterms:title>
<dcterms:source rdf:datatype="URI">test_bag/data/Vortex_Mining.xlsx</dcterms:source>
<!-- A related resource from which the described resource is derived. -->
</rdf:Description>
…..
</rdf:RDF>
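A description like the one above could be produced and read back with Python's standard XML library, roughly as sketched here. The namespace URIs are the published ones for RDF, ORE terms, and Dublin Core terms. The helper names and the `urn:example:` identifiers are hypothetical, and the sketch follows the slide in writing the aggregation ID as element text.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"

# Make the serializer emit the familiar rdf:/ore:/dcterms: prefixes.
for prefix, uri in (("rdf", RDF), ("ore", ORE), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)


def describe_item(root, item_uri, aggregation_id, title, source):
    """Append one rdf:Description for a data item to the resource map."""
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": item_uri}
    )
    ET.SubElement(desc, f"{{{ORE}}}isAggregatedBy").text = aggregation_id
    ET.SubElement(desc, f"{{{DCTERMS}}}title").text = title
    ET.SubElement(desc, f"{{{DCTERMS}}}source").text = source
    return desc


def aggregated_titles(root, aggregation_id):
    """List the titles of all items aggregated by the given aggregation."""
    titles = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        if desc.findtext(f"{{{ORE}}}isAggregatedBy") == aggregation_id:
            titles.append(desc.findtext(f"{{{DCTERMS}}}title"))
    return titles


# Hypothetical identifiers, used only for illustration.
root = ET.Element(f"{{{RDF}}}RDF")
describe_item(root, "urn:example:item-1", "urn:example:agg-1",
              "Vortex_Mining.xlsx", "test_bag/data/Vortex_Mining.xlsx")
xml_text = ET.tostring(root, encoding="unicode")
```

Calling `aggregated_titles(root, "urn:example:agg-1")` walks the map and recovers the members of one aggregation, which is the "identifies aggregations" role of the resource map.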
45. Service Level Agreement: Requirements and Privileges (summary)
• RO properties – Requirements
• Data contributor Institutional Affiliation
• Scientific Domain
• Data Organization (e.g.: BagIt or SWORD)
• Size
• Versioning
• Minimal Metadata
• Licensing (e.g.: open, embargoed)
• Repository privileges
• Repository is free to re‐distribute the RO received from SEAD VA, except in case of
embargo.
• Repository can migrate ROs into other formats and re-distribute the migrated ROs.
• Repository curators can annotate data collections to comply with standards or
upgrades in our policies.
47. Excerpt from the SLA for IUScholarWorks
• Institutional Affiliation
• At least one author, at the time of deposit, belongs to the same institution as our
repository.
• RO Size
• 150 MB for items uploaded directly to IUScholarWorks, 10 GB total
• 5 TB for items hosted on the SDA
• Versioning
• Only the final PO is accepted; subsequent versions will replace the version of record.
• Scientific Domain – Curator review might be needed
• ROs are associated with research in the domains of ANY (identify specific domains or
put “sustainability science” for a broader match)
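The requirements in this excerpt lend themselves to an automated pre-deposit check. The sketch below is one possible shape for it. The `Deposit` class, the `sla_violations` function, the institution strings, and the use of binary units (1 MB = 1024² bytes) are all assumptions for illustration; only the 150 MB and 10 GB limits come from the excerpt.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 ** 2  # assumption: binary megabytes
GB = 1024 ** 3  # assumption: binary gigabytes


@dataclass
class Deposit:
    """Hypothetical summary of an RO offered to a repository."""
    author_institutions: List[str]
    item_sizes: List[int]  # size of each item in bytes
    is_final_version: bool


def sla_violations(deposit, institution, max_item=150 * MB, max_total=10 * GB):
    """Return a list of human-readable SLA problems (empty if none)."""
    problems = []
    # Institutional affiliation: at least one author from the institution
    if institution not in deposit.author_institutions:
        problems.append("no author affiliated with the repository's institution")
    # RO size: per-item and total limits for direct uploads
    if any(size > max_item for size in deposit.item_sizes):
        problems.append("an item exceeds the per-item size limit")
    if sum(deposit.item_sizes) > max_total:
        problems.append("total size exceeds the limit")
    # Versioning: only the final version is accepted
    if not deposit.is_final_version:
        problems.append("only the final version is accepted")
    return problems
```

For example, `sla_violations(Deposit(["Indiana University"], [100 * MB], True), "Indiana University")` returns an empty list. The scientific-domain requirement is left out because the excerpt says it may need curator review.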
61. Overview: The Data Scientist
A Data Scientist takes research objects created by someone else, uses
them for his/her own purposes, and creates new research objects by
modifying the existing ones.
Super Simple Example: Putting images in given RO 3 into a single
presentation and creating a new RO
Data scientist can:
• Search
• Download (bags)
• Modify
• Re‐upload
65. Provenance Capture in SEAD VA
• Uses Komadu provenance system
• Captures activity in real time, assembles new activity into internal
representation as provenance graphs
• W3C PROV spec compliant
• Terminology
• Activity : Some Processing Event in SEAD VA
• Entity : A Research Object (in CO or PO state)
• Agent : Data Creator, Curator, Data Scientist
72. Curation Time Provenance Capture
• Curation Activities
• Curation‐Edit‐Event
• Publish‐Event
• Provenance relationships captured in Komadu
• Agent‐Activity : When some Agent triggers one of the above Activities
• Activity‐Entity : When an Activity Generates (Updates) a Research Object
• Example Scenario
• Curator X edits metadata on research object Y
• Agent‐Activity relationship (association) between X and Curation‐Edit‐Event
• Activity‐Entity relationship (generation) between Curation‐Edit‐Event and Y
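The example scenario can be modelled as two edges in a small provenance graph. The sketch below does not use Komadu's API. `ProvGraph` and its methods are hypothetical, and the edge labels `wasAssociatedWith` and `wasGeneratedBy` are borrowed from the W3C PROV vocabulary to stand for the association and generation relationships.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProvGraph:
    """Hypothetical in-memory store of (subject, relation, object) edges."""
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, agent: str, activity: str, entity: str) -> None:
        """Record one curation-time event as two provenance edges."""
        # Agent-Activity relationship (association)
        self.edges.append((activity, "wasAssociatedWith", agent))
        # Activity-Entity relationship (generation)
        self.edges.append((entity, "wasGeneratedBy", activity))

    def agents_for(self, entity: str) -> List[str]:
        """Find the agents behind every activity that generated the entity."""
        activities = {
            obj for subj, rel, obj in self.edges
            if subj == entity and rel == "wasGeneratedBy"
        }
        return sorted(
            obj for subj, rel, obj in self.edges
            if rel == "wasAssociatedWith" and subj in activities
        )


graph = ProvGraph()
graph.record("Curator X", "Curation-Edit-Event", "Research Object Y")
```

Following the two edges backwards from Y, as `agents_for` does, answers the question "who edited this research object?".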