Screenshots prepared by Ben Blaiszik and Kyle Chard, used in our Globus publication demo at GlobusWorld 2014. See https://www.globus.org/data-publication for more information and the notes on the slides for details.
Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud. Published data is stored on campus, institutional, or group resources that are managed and operated by external administrators. To associate storage with a collection, administrators must configure Globus Connect Server with sharing on their resources and then associate the endpoint with the collection through Globus.
Published datasets are organized by “communities” and their member “collections”. For example, the Argonne National Laboratory community has several member collections (APS, CNM, CELS). Collections often map to a department or group within an institution, but they don’t have to. Globus users can create and manage their own communities and collections through the service.
A Collection enables the submission of datasets with policies regarding access. A Dataset is data and metadata.
Policies can be set on communities or collections. They cover metadata (schema, requirements), access control (user and group based), curation workflow, submission and distribution license, and storage.
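The community, collection, and policy relationship described above can be sketched as a small data model. This is an illustration only, not the Globus service's actual schema: every class and field name is hypothetical, and the rule that a collection-level setting overrides a community-level one is an assumption (the notes only say that policies can be set at either level).

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Policy:
    # Hypothetical fields, one per policy area listed in the notes.
    metadata_schema: Optional[List[str]] = None  # required metadata field names
    allowed_groups: Optional[List[str]] = None   # access control (group based)
    curation_required: Optional[bool] = None     # curation workflow
    license_text: Optional[str] = None           # submission/distribution license
    storage_endpoint: Optional[str] = None       # where published data is stored


@dataclass
class Community:
    name: str
    policy: Policy = field(default_factory=Policy)


@dataclass
class Collection:
    name: str
    community: Community
    policy: Policy = field(default_factory=Policy)

    def effective_policy(self) -> Policy:
        """Merge policies; ASSUMPTION: collection settings win over community ones."""
        merged = {}
        for f in fields(Policy):
            own = getattr(self.policy, f.name)
            inherited = getattr(self.community.policy, f.name)
            merged[f.name] = own if own is not None else inherited
        return Policy(**merged)


# Illustrative values only; the endpoint name below is made up.
argonne = Community("Argonne National Laboratory", Policy(curation_required=True))
cnm = Collection("CNM", argonne, Policy(storage_endpoint="example#cnm-storage"))
effective = cnm.effective_policy()
print(effective.curation_required, effective.storage_endpoint)
```

Keeping unset fields as None makes it possible to tell "not configured at this level" apart from an explicit setting such as curation_required=False.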
Demo scenario:
A scientist, referred to throughout as “The Scientist” and associated with the user Blaiszik, has just published a paper on his nanoscale materials research. He now wants to publish the data associated with that paper.
Using the Globus publication system, he selects the Argonne community and the Center for Nanoscale Materials (CNM) collection, and chooses to publish his data.
He describes the submission with both publication (Dublin Core) and scientific metadata.
The CNM collection has been preconfigured with its own storage provided at Argonne.
As part of this submission, a unique endpoint is created for “The Scientist”, configured so that only “The Scientist” can write to it.
“The Scientist” assembles his dataset on this endpoint by transferring files from one or more locations. He can assemble the dataset over a long period of time and return to the submission workflow when he is happy with it.
The CNM collection has also been preconfigured with a workflow requiring that an Argonne curator approve the submission.
A curator, referred to throughout as “The Curator” and associated with the user Chard, is able to view and edit the metadata and files of the dataset.
Once approved, the submission is published in the CNM collection with a DOI.
Other users (with permission to view the collection) can then discover published datasets by their DOI, or use the Globus discovery interface to find datasets by their metadata.
These users can browse published datasets and download them to other resources (including local resources).
"The Scientist" will log in using his Globus identity, here as user Blaiszik.
Users can log in using any of their linked Globus identities, e.g., campus credentials (via InCommon), a Google account, or an XSEDE account.
The publish dashboard shows all current submissions at any stage of the submission workflow. Here users can view accepted submissions, see a list of all submissions currently in the curation process, view/edit their unfinished submissions, and start a new submission. "The Scientist" will now start a new submission.
The first step of submission is to select a collection. In this case, "The Scientist" selects the “Center for Nanoscale Materials”, as this is the department through which he conducted his research. Note: "The Scientist" can only see collections he is allowed to publish to.
"The Scientist" must first describe the dataset he is publishing. There are two types of metadata required for submission to the CNM collection: 1) Dublin Core and 2) scientific metadata. These metadata requirements are defined by the collection and can be configured depending on the domain. Additional pages can also be defined. Here, "The Scientist" enters information about the authors, their ORCIDs (unique researcher identifiers), the submission title, the date of publication, the accompanying publication to which this dataset is related, and the DOI for that publication. Note: "The Scientist" has missed an ORCID for one of his co-authors.
The second type of metadata required by the CNM relates to the materials science research at the Advanced Photon Source. Here, "The Scientist" enters information such as keywords describing the dataset, information about the sponsors who funded this research, a description of the dataset, the experiment name, the materials analyzed in this dataset, the energy density of the materials (this is important for research into battery development) and the Argonne General User Proposal (GUP) number. The GUP number is a unique identifier for all beam time allocations at the APS and is used by administrators to associate researchers, experiments, and allocations. All of this entered information can be subsequently used by other researchers with appropriate access to discover this dataset.
Having described the dataset, "The Scientist" must now assemble the dataset. To do so, he first chooses to select the files to be published.
Using the familiar Globus interface, "The Scientist" is able to select files from multiple sources and transfer them to his unique submission endpoint (publish#submission_11). This submission endpoint is created on shared Argonne storage resources, but is initially accessible only to "The Scientist". The dataset may be assembled over any period of time. "The Scientist" can create new files and folders on the endpoint and arrange them in any hierarchy. At the completion of the submission, the permissions on the endpoint are changed so that the dataset is immutable: "The Scientist" is given read access to the dataset, and collection curators are also given read access so that they can view the contents.
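The permission change described above can be sketched as two access control lists, one while the dataset is being assembled and one after submission. This is a toy model under stated assumptions, not the Globus sharing API: the function names and the dictionary layout are hypothetical, and the user names simply reuse the two demo users.

```python
def initial_acl(submitter):
    """While the dataset is being assembled, only the submitter has access."""
    return {submitter: {"read", "write"}}


def finalized_acl(submitter, curators):
    """After submission nobody holds write access, so the dataset is immutable."""
    acl = {submitter: {"read"}}
    for curator in curators:
        acl[curator] = {"read"}
    return acl


def can(acl, user, permission):
    """Return True if the user holds the given permission."""
    return permission in acl.get(user, set())


assembling = initial_acl("blaiszik")
submitted = finalized_acl("blaiszik", ["chard"])
print(can(assembling, "blaiszik", "write"))  # submitter may write while assembling
print(can(submitted, "blaiszik", "write"))   # no writes once submitted
print(can(submitted, "chard", "read"))       # curators may view the contents
```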
When "The Scientist" is happy with his assembled dataset, he can return to the publication workflow. Here, he sees a summary of the dataset and may confirm the correct file sizes and names are associated. The system attempts to determine the file types for each of the dataset’s files. "The Scientist" can choose to edit, remove or add files if necessary.
Having verified the submission, "The Scientist" must grant the submission license. This license is again configured by the collection (i.e. each collection can customize their individual licenses), and allows the submitting user to grant rights to the collection (CNM) and the Globus system to manage and disseminate the dataset based on the agreed upon policies.
Once submitted, the dataset enters a predetermined curation workflow. "The Scientist" can check the progress of the submission through his dashboard, which also shows whether any further attention is required.
The Argonne CNM collection has defined a workflow that requires a curator to view and approve all submissions. The curation workflow enables the curator to view the submitted files and to edit the submitted metadata.
The curator also logs in to Globus. In this case, "The Curator" authenticates with his UChicago campus identity, as user Chard.
"The Curator" also has a dashboard, which shows the tasks that he is able to perform. In this case, the submission from "The Scientist" (Blaiszik) is awaiting curation.
Having selected to take a task, “The Curator” can first preview the dataset to decide if he wants to accept the curation task. In this view, he can see a summary of some metadata fields as well as the files in the dataset.
"The Curator" has the option to approve the submission for publication, reject it back to “The Scientist”, or edit the metadata to fix errors. In the case of rejection, the dataset is returned to “The Scientist” with a message describing the reason for rejection. “The Scientist” can edit the submission and resubmit the dataset for approval. Here, "The Curator" will take a closer look at the metadata, by clicking “Edit Metadata”.
"The Curator" can update missing metadata fields. Here, for example, “The Scientist” missed an ORCID for one of his coauthors. "The Curator" can also update other metadata fields to correct metadata values.
"The Curator" can verify the files in the submitted dataset.
Assuming “The Curator” is happy with the dataset, he may approve it for publication in the collection.
At this point, the dataset is now published in the collection with a unique DOI (handle in this case) for other researchers to reference this published dataset. Access to the dataset (both metadata and files) is changed to reflect the policies of the collection. Access may be restricted to particular users, or groups of users, or it may be made public for any user to access.
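The curation workflow walked through above (submit, then approve, reject, or edit metadata) can be summarized as a small state machine. This is a minimal sketch, not the service's implementation: the state names, action names, and the Submission class are hypothetical, and requiring a reason on rejection is an assumption drawn from the note that rejected datasets are returned with a message.

```python
from enum import Enum


class State(Enum):
    IN_PROGRESS = "in progress"
    AWAITING_CURATION = "awaiting curation"
    PUBLISHED = "published"


# (current state, action) -> next state; anything else is disallowed.
TRANSITIONS = {
    (State.IN_PROGRESS, "submit"): State.AWAITING_CURATION,
    (State.AWAITING_CURATION, "edit_metadata"): State.AWAITING_CURATION,
    (State.AWAITING_CURATION, "reject"): State.IN_PROGRESS,
    (State.AWAITING_CURATION, "approve"): State.PUBLISHED,
}


class Submission:
    def __init__(self, title):
        self.title = title
        self.state = State.IN_PROGRESS
        self.rejection_reason = None

    def apply(self, action, reason=None):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} is not allowed while {self.state.value}")
        if action == "reject":
            if not reason:
                raise ValueError("a rejection must explain the reason")
            self.rejection_reason = reason
        self.state = TRANSITIONS[key]
        return self.state


submission = Submission("Example dataset")
submission.apply("submit")                          # the Scientist submits
submission.apply("reject", reason="Missing ORCID")  # the Curator sends it back
submission.apply("submit")                          # the Scientist resubmits
submission.apply("approve")                         # the Curator approves
print(submission.state)
```

Because no transition leaves the published state, a published submission cannot be modified through this workflow, which mirrors the immutability of published datasets.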
Now that the dataset is published, another user, “The Researcher” (with access to the collection), may want to discover published datasets.
The first step of discovery is to define the context in which the user wants to search. In this case the user chooses to search the CNM collection. They may use free text search terms, key-value terms, or even range queries.
“The Researcher” chooses to search for all published data in the CNM collection. The results show a brief summary of each published dataset including information about the publication time, collection, summary of number of files, name, authors, description and a set of keyword tags as well as key-value tags. Each of these fields can be used to search for a particular dataset.
Knowing that other collections may well have datasets of interest, “The Researcher” may broaden the search context to all accessible collections and search for datasets related to “Li-ion” and “autonomic”. Here, the results show datasets from two collections: the CNM and the Chemical Sciences and Engineering collection (red boxes). Results are ranked according to their relevance to the search.
Going further, “The Researcher” can use other query types, such as key-value and range queries. In this case, “The Researcher” searches for energy density > 1500 and microcapsules, and finds the dataset published earlier in this demo, whose key-value pair energy-density:2000 satisfies the range query.
Having found the desired published dataset, “The Researcher” can navigate to the summary page.
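The three query types mentioned in the discovery steps (free text, key-value, and range) can be illustrated with a small in-memory filter. This is only a sketch of the idea and not the Globus discovery interface: the matches function, the record layout, and both sample records are made up for illustration, apart from the energy-density value of 2000 and the threshold of 1500 taken from the demo. Relevance ranking is not modeled.

```python
import operator

# Comparison operators supported by the range queries in this sketch.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def matches(dataset, text_terms=(), key_values=None, ranges=()):
    """Return True if the dataset satisfies every supplied query part."""
    haystack = " ".join(
        [dataset["title"], dataset["description"], *dataset["keywords"]]
    ).lower()
    if not all(term.lower() in haystack for term in text_terms):
        return False
    tags = dataset["tags"]
    for key, value in (key_values or {}).items():
        if tags.get(key) != value:
            return False
    for key, op, bound in ranges:
        if key not in tags or not OPS[op](tags[key], bound):
            return False
    return True


# Illustrative records only; they are not real published datasets.
datasets = [
    {
        "title": "Microcapsule study (illustrative)",
        "description": "Example record for the range query",
        "keywords": ["microcapsules", "Li-ion", "autonomic"],
        "tags": {"energy-density": 2000},
    },
    {
        "title": "Low-density sample (illustrative)",
        "description": "Example record that fails the range query",
        "keywords": ["microcapsules"],
        "tags": {"energy-density": 900},
    },
]

hits = [
    d["title"]
    for d in datasets
    if matches(d, text_terms=["microcapsules"], ranges=[("energy-density", ">", 1500)])
]
print(hits)
```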
The summary page shows a summary of the dataset and the list of files. “The Researcher” can choose to download individual files, browse the dataset using Globus, or download the entire dataset. Ability to view the dataset and download files is governed by the access control on the collection and permissions associated with “The Researcher”.
Having chosen to download the dataset, “The Researcher” can choose the destination. In this case, we select the desktop of “The Researcher”, which is associated with a Globus Connect Personal endpoint.
A Globus transfer is started from the published endpoint. “The Researcher” can monitor the transfer in Globus. The transfer uses Globus to ensure that bandwidth is maximized, file integrity is checked, and files are transferred securely if required.
Finally, “The Researcher” can view the downloaded dataset on their desktop PC.