SlideShare a Scribd company logo
1 of 52
Download to read offline
Guidance and Survey Results from
the Trustworthy Data Working Group
Andrew Adams, Kay Avila, Jim Basney, Laura Christopherson, Melissa
Cragin, Jeannette Dopheide, Terry Fleury, Calvin Frye, Florence Hudson,
Manisha Kanodia, Jenna Kim, W. John MacMullen, Mats Rynge, Scott
Sakai, Sandra Thompson, Karan Vahi, John Zage
https://trustedci.org/trustworthy-data
SGCI Webinar
October 7, 2020
About the TDWG
The Trustworthy Data Working Group aims to provide guidance on data security
for open science, to improve scientific productivity and trust in scientific results.
Open science relies on data integrity, collaboration, high performance computing,
and scalable tools to achieve results, but currently lacks effective cybersecurity
programs that address the trustworthiness of scientific data.
2
TDWG Goals
● Survey science projects
● Understand the spectrum of data security concerns
● Understand current data security practices
● Produce broadly-applicable security guidance for the data science community
3
Trustworthy Data and Science Gateways
Science Gateways support:
● Computing on datasets
● Data discovery
● Data movement
● Data repositories
● Data sharing
● Data visualization
● etc.
… in a "trustworthy" environment.
4
What attributes of trustworthiness do
science gateway users expect?
What capabilities for managing
trustworthiness do science gateways
provide?
5
All 524 gateways in
the catalog care about
"data"!
Webinar Outline
6
1. The Survey
2. The Guidance
3. Discussion
The Survey
7
Survey Development Process
● Brainstorm goals and questions
● Understand prior guidance and ongoing discussions of trustworthiness in
published literature
● Interesting findings
○ No single definition of “trustworthiness” - but integrity is most common
■ NIST 1800-25 and NIST 1800-26 focus on data integrity
○ Failures in trustworthiness
■ ‘Willoughby–Hoye’ nuclear magnetic resonance scripts bug
■ Errors detected by the Pegasus Scientific Workflow Integrity tool for jobs run
on the Open Science Grid
8
Survey Goals
● Identify the concerns of researchers and operators with respect to the
trustworthiness of scientific data.
● Understand how individuals define trustworthiness and how their
roles impact this.
● Categorize tools and processes used to maintain trustworthiness, and
discern their shortcomings.
● Identify opportunities for guidance.
9
Survey Methodology
● Sections
○ Respondent demographics
○ Information about respondent data and views on trustworthiness
○ Use of tools and technologies
○ Wrap-up questions
● 16 survey questions plus 4 wrap-up
questions
○ 7 multiple choice
○ 5 Likert scale
○ 4 free form responses
● Gathered 111 responses from April 21 to
May 31 of 2020
10
What is your primary job title?
11
Keywords in job titles Responses
research 35
director 19
scientist 14
professor 14
manager 13
senior 9
data 7
cyberinfrastructure 4
security / cybersecurity 4
consultant 3
systems administrator 2
chief information officer 2 N=108
Please select all roles that describe your
work, even if they do not correspond to your job title.
12N=110
Science
Gateway
Operators
In the above roles, what field(s) of science
do you primarily work in or support?
13N=110
What sector(s) do you work with?
14N=111
Which attributes do you believe scientific
data must have in order to be trustworthy?
15N=86
Which attributes do you believe scientific
data must have in order to be trustworthy?
● Accuracy - The data is free from error.
● Integrity - The data has not been altered.
● Methodology - The processes and inputs used to create the data are well-established and
accepted by the community.
● Provenance - The data's origin and lineage can be readily established.
● Reproducibility - The data can be re-created, or the associated scientific results are
replicable.
● Reputation - The data was generated by a credible or trusted source.
● Responsible stewardship - The ownership of the data is well managed and can be
transferred.
● Significance - The data enables future research directions (with associated funding/support).
16
Do you agree with the following statement?
I think that protecting the trustworthiness of scientific
data is important.
17N=110
Do you agree with the following statement?
My job responsibilities include establishing and/or
maintaining the trustworthiness of scientific data.
18N=111
Do you agree with the following statement?
I am confident in the trustworthiness of the scientific
data I use/produce/curate.
19N=111
Do you perform/support research using
sensitive data? Please select any/all that apply.
20N=111
Do any of the following regulations apply
to the research data that you use/produce/curate?
21N=111
Which of the following tools and technologies
(if any) help to secure the research data that you
use/produce/curate?
22N=111
Do you agree with the following statement?
The tools and technologies that protect the research
data that I use/produce/curate provide sufficient
assurance against unauthorized changes and/or
reputational attacks.
23N=111
If you were provided with additional guidance,
resources, or support, would you apply additional
tools or technologies to help maintain the
trustworthiness of the research data that you
use/produce/curate?
24N=111
Free Text Questions
Please explain why protecting the trustworthiness of scientific data is or is not important to
you. Does its importance change during different phases of the research cycle (e.g., data
collection, calibration, processing, analysis, sharing, and publication)?
In your work, what are potential consequences (if any) to using/producing/curating scientific
data that is not trustworthy?
If you answered Yes or Maybe (that you would apply additional tools or technologies), please
explain. For example, are there specific tools or technologies you would like to use or
specific needs you would like guidance addressing?
Is there anything else related to trustworthy research data that you would like us to know?
25
Keyword
Analysis
26
Keywords and Themes
Total
Respondents
Impact on scientific results - "bad conclusions," "wrong conclusions" 44
Reputational risk - reputational harm to scientist or institution 27
Integrity of scientific process - the scientific method 18
Trust in science - combating "misinformation," "politics," "loss of public
trust" 15
Provenance of data 15
Loss of funding 15
Concerns about storage, data integrity management, and/or data
quality mgmt. 14
Seeking guidance, training, and/or audits 13
Reusability 11
Reproducibility 10
Retraction of publication 6
Encryption 4
Compliance - managing sensitive data, controlled unclassified
information (CUI) 4
Difficult to work with restrictions - too cumbersome to follow 3
Wasted funds and/or resources 3
Threats from insiders - theft and/or surveillance 1
Keywords
and
Themes
identified in
free text
responses
Keyword
Analysis
27
Keywords and Themes
Total
Respondents
Impact on scientific results - "bad conclusions," "wrong conclusions" 44
Reputational risk - reputational harm to scientist or institution 27
Integrity of scientific process - the scientific method 18
Trust in science - combating "misinformation," "politics," "loss of public
trust" 15
Provenance of data 15
Loss of funding 15
Concerns about storage, data integrity management, and/or data
quality mgmt. 14
Seeking guidance, training, and/or audits 13
Reusability 11
Reproducibility 10
Retraction of publication 6
Encryption 4
Compliance - managing sensitive data, controlled unclassified
information (CUI) 4
Difficult to work with restrictions - too cumbersome to follow 3
Wasted funds and/or resources 3
Threats from insiders - theft and/or surveillance 1
Keywords
and
Themes
identified in
free text
responses
Keyword
Analysis
28
Keywords and Themes
Total
Respondents
Impact on scientific results - "bad conclusions," "wrong conclusions" 44
Reputational risk - reputational harm to scientist or institution 27
Integrity of scientific process - the scientific method 18
Trust in science - combating "misinformation," "politics," "loss of public
trust" 15
Provenance of data 15
Loss of funding 15
Concerns about storage, data integrity management, and/or data
quality mgmt. 14
Seeking guidance, training, and/or audits 13
Reusability 11
Reproducibility 10
Retraction of publication 6
Encryption 4
Compliance - managing sensitive data, controlled unclassified
information (CUI) 4
Difficult to work with restrictions - too cumbersome to follow 3
Wasted funds and/or resources 3
Threats from insiders - theft and/or surveillance 1
Keywords
and
Themes
identified in
free text
responses
Keyword
Analysis
29
Keywords and Themes
Total
Respondents
Impact on scientific results - "bad conclusions," "wrong conclusions" 44
Reputational risk - reputational harm to scientist or institution 27
Integrity of scientific process - the scientific method 18
Trust in science - combating "misinformation," "politics," "loss of public
trust" 15
Provenance of data 15
Loss of funding 15
Concerns about storage, data integrity management, and/or data
quality mgmt. 14
Seeking guidance, training, and/or audits 13
Reusability 11
Reproducibility 10
Retraction of publication 6
Encryption 4
Compliance - managing sensitive data, controlled unclassified
information (CUI) 4
Difficult to work with restrictions - too cumbersome to follow 3
Wasted funds and/or resources 3
Threats from insiders - theft and/or surveillance 1
Keywords
and
Themes
identified in
free text
responses
The Guidance
30
Barriers to Trustworthiness
Two potential barriers were identified in the survey responses:
● Lack of a common definition for trustworthiness
● Lack of consensus on which role(s) should own responsibility for
trustworthiness
Four additional barriers emerged in our literature review:
● Conflicts between cybersecurity and reproducibility
● Accidental failures through systems or human error
● Lack of funding and incentives for making supporting data and provenance
information available
● Perceived data quality issues
31
Stakeholders
32
● Data User works with data for use in research.
● Data Provider has data; can speak authoritatively
about its integrity, authenticity, and means of creation;
and has the authority to make it available to others.
● Secure Infrastructure Provider supplies one or more
services around the data's use.
● Facilitation and Compliance Professional supports
and consults with others on the use of technology with
data and ethical treatment of data.
Distilled from the survey:
Attributes of Trustworthy Data (Revised)
33
● Integrity - The data has not been damaged by system faults or malicious actors.
● Reproducibility - The data can be re-created, or the associated results are replicable.
● Accepted Techniques of Creation - The processes and inputs used to create the data are
well-established and accepted by the community. (Methodology)
● Authenticity - The data has a clear provenance (origin and lineage) which can be verified.
Metadata describing the data and versioning are included. (Provenance)
● Authorization - Appropriate access controls are enforced.
● Availability - The data is accessible to authorized users, served from a reliable system, during
the agreed-to timeline, and according to usage policies.
● Confidentiality - The data repository protects personally identified information (PII) or other
sensitive information from those not granted access.
● Credible Source and Stewardship - The ownership of the data is well managed and can be
transferred securely. (Reputation and Responsible Stewardship)
Attributes of Trustworthy Data for each Stakeholder
34
Data User Data Provider
Secure Infrastructure
Provider
Facilitation and
Compliance Professional
I want to ensure that the
data I download is available
to me when I need it, has
integrity, is authentic, was
created in a proven manner,
does not disclose
information I have no rights
to, has been credibly and
trustworthily tended, and is
reproducible.
I want to ensure my data is
protected and made
accessible to only those I
have had the opportunity to
vet and grant approval to
after they've agreed to any
stipulations I might have on
the data's use.
I want to guarantee that the
data we serve in the
infrastructure we provide
abides by the Data
Provider’s data policies and
satisfies the Data User’s
needs for verification of its
trustworthiness.
I want to ensure all parties
concerned have the support
they need to ensure the data
is trustworthy and the
research is sound. I want to
ensure all parties are
honoring any agreements or
policies on data sharing and
use.
Secure Infrastructure Providers
● Availability - My infrastructure enables the secure and timely access and use
of the data as defined in any of the Data Provider's usage policies.
● Integrity - My infrastructure is designed and maintained to guarantee the
data's integrity. It protects the data from alteration, corruption, loss of
completeness, or loss of accuracy.
● Authenticity - The data housed/served from my infrastructure is authentic. It is
the same data that was originally given to us by the Data Provider and is as
promised to the user. Its provenance can be verified.
● Accepted Techniques of Creation - My infrastructure allows data owners to
document the means of creation when storing data.
35
Secure Infrastructure Providers
● Authorization - My infrastructure was built with access controls that enable
only those users the Data Provider has deemed acceptable access to data.
Users must provide their credentials to access the data. Their identity must be
verified and communications between the Data User, Data Provider, and the
infrastructure are secure.
● Confidentiality - My infrastructure supports tools and technologies for
maintaining data confidentiality.
● Credibility - My infrastructure protects the data from unwanted, unapproved
exfiltration, transmission, or use (if use occurs within the infrastructure).
● Reproducibility - My infrastructure protects the data from unwanted
transformations that would result in different scientific results.
36
Tools and Technologies: Analysis
37
Tools and Technologies: Analysis
● Strong Coverage: Availability, Integrity, and Authorization
● Accepted Techniques of Creation and Reproducibility
○ Containers, Notebooks
○ Transparency: publishing source code, metadata
● Confidentiality:
○ Encryption, Access Controls
○ De-identification, Differential Privacy
● Authenticity and Credible Source and Stewardship
○ Cryptographic hash or signature on data
38
Communicating Trustworthiness
● Provenance - In general, perceived trustworthiness of data is improved by
communication about the data per se, describing its source(s), the methods of
generation, processing, transformations, and corrections.
● Transparency - Data service organizations should provide clear policies and
procedures about their data stewardship for the resources they provide. This
is essential to enabling stakeholders, either users or content distributors, to
verify and independently assess the data provided.
● Certification - Resource providers may also wish to formalize community
recognition through the use of third-party certification processes such as
CoreTrustSeal.
39
Related Work
● TRUST Principles for digital repositories:
https://doi.org/10.1038/s41597-020-0486-7
● Risk Assessment for Scientific Data:
https://doi.org/10.5334/dsj-2020-010
● FAIR Principles:
https://www.go-fair.org/fair-principles/
● CoreTrustSeal Data Repository certification:
https://www.coretrustseal.org/
● Integrity Protection for Scientific Workflow Data:
https://doi.org/10.1145/3332186.3332222
40
Discussion
41
Trustworthy Data and Science Gateways
42
What attributes of trustworthiness do science gateway users expect?
What capabilities for managing trustworthiness do science gateways provide?
How to science gateways communicate trustworthiness of the data they manage?
Which attributes of trustworthiness are handled well by today's science gateways
(i.e., examples of "best practices" to share)?
Which attributes of trustworthiness are difficult to achieve in science gateways
(i.e., examples where future work is needed)?
Next steps
Visit https://trustedci.org/trustworthy-data to:
● read our reports and send us your comments
● join our email list and help us develop community guidance
You can also send questions/comments later to jbasney@illinois.edu.
The working group will conclude in December 2020.
43
Thanks!
Thanks to all the working group members and the people who responded to our survey.
Trusted CI is supported by the National Science Foundation under Grant 1920430. The Cyberinfrastructure Center
of Excellence (CI CoE) Pilot project is supported by the National Science Foundation under Grant 1842042. The
Engagement and Performance Operations Center (EPOC) is supported by the National Science Foundation under
Grant 1826994. The Open Storage Network is supported by Schmidt Futures and the National Science
Foundation under Grants 1747483, 1747490, 1747493, 1747507, and 1747552. The NSF Big Data Innovation
Hubs are supported in part by the National Science Foundation under awards 1916613, 1916585, 1916589, and
1916573. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the National Science Foundation.
44
Extra Slides
45
Attributes of Trustworthiness
46
TDWG Survey TDWG Guidance Report TRUST FAIR
Accuracy Availability Transparency Findable
Integrity Integrity Responsibility Accessible
Methodology Authenticity User Focus Interoperable
Provenance Accepted Techniques of Creation Sustainability Reusable
Reproducibility Reproducibility Technology
Reputation Confidentiality
Responsible Stewardship Credible Source and Stewardship
Significance Authorization
Attributes of Trustworthy Data for each Stakeholder
47
Data User Data Provider Secure Infrastructure
Provider
Facilitation and
Compliance
Professional
Integrity The data is complete,
unaltered from its
promised form,
undamaged (not corrupt),
and accurate (free from
errors). I can rely on it to
be in the condition the
Data Provider or
Infrastructure Provider
promises.
The integrity of the data
will be maintained no
matter where the data
lives and how the data is
served or transferred
before the user comes
into its possession.
My infrastructure is
designed and maintained
to guarantee the data's
integrity. It protects the
data from alteration,
corruption, loss of
completeness, or loss of
accuracy.
Facilitation: We make
data users and providers
aware of and help them
employ the methods,
services, tools, and other
resources that are
available to assess and
ensure the integrity and
accuracy of the data.
Tools and Technologies
● Access Controls - A process by which use of system resources is regulated
according to a security policy and is permitted only by authorized entities
(users, programs, processes, or other systems) according to that policy.
● Archival Storage - A system designed for storing data for long periods of time.
An archival storage system may or may not permit instantaneous modification
or read-access to data (e.g., tape), however it must be designed with the
intent to protect the integrity and availability of the data stored in it for the
lifetime of the archive.
● Network Controls - Access controls applied to a computer network.
Enforcement of policy is generally accomplished through the use of a
combination of dedicated network firewall devices, host-based firewall
software, and judicious network service configuration. 48
Tools and Technologies
● RAID Filesystem - A filesystem built upon a redundant array of independent
disks, configured in a manner that preserves the availability of data stored in
the filesystem during a disk failure and while the failed disk's replacement is
undergoing integration to the array.
● Logging - An infrastructure for generating, collecting, and analyzing security
events -- occurrences in a system that are relevant to the security of the
system.
● Multi-Factor Authentication - An authentication process requiring proof of
more than one of: something you know, something you have, something you
are. A common example of multifactor authentication is the use of a password
in conjunction with a proof of possession of a registered mobile phone by
interacting with an application on the phone. 49
Tools and Technologies
● External Backups - The creation of a reserve copy of data and storing that
copy such that it is not accessible from (i.e., is external to) the computer
system it was made from. Note that backup copies, unlike archives, are
meant as a contingency measure, and should be expected to have a short,
finite lifespan.
● Physical Security Protections - Tangible means of preventing unauthorized
physical access to a system. Examples: Fences, cages, walls, and other
barriers; locks, safes, and vaults; dogs and armed guards; sensors and alarm
bells.
50
Tools and Technologies
● Intrusion Detection / Prevention - A process or subsystem, implemented in
software or hardware, that automates the tasks of (a) monitoring events that
occur in a computer network and (b) analyzing them for signs of security
problems. An intrusion detection system may be augmented with an
automated-response capability to interrupt detected security problems,
creating an intrusion prevention system.
● File/Host Integrity Check - The application of a (cryptographic) hash function
to determine if the contents of files (data and/or program and/or configuration)
have unexpectedly changed. A hash function is a (mathematical) function
which maps values from a large (possibly very large) domain into a smaller
range. Common examples of hash functions include sha256 and md5.
51
Tools and Technologies
● Policy for Network / Cloud Storage - Guidance and/or requirements that
specify how data is to be stored and transported outside an organization's
trust boundary.
● 3rd Party Data Repository - A system for storing and retrieving data, operated
by an entity external to the Data User and Data Provider. We assume that
3rd-party data repositories are capable of ensuring the security of the data
stored in them, subject to the limitations in their service agreement or use
policy. We also assume that the Data Provider finds these limitations
acceptable.
● Workflow Integrity Checking - The integration of cryptographic hash functions
within a scientific workflow to ensure that the data being operated on is the
same as the data when it was originally collected, generated, or published. 52

More Related Content

What's hot

SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017
SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017
SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017Sandra Gesing
 
Digital webinar master deck final
Digital webinar master deck finalDigital webinar master deck final
Digital webinar master deck finalPistoia Alliance
 
Writing successful data management plans
Writing successful data management plansWriting successful data management plans
Writing successful data management plansIzzyChad
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management Wendy Mears
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management IzzyChad
 
20160523 23 Research Data Things
20160523 23 Research Data Things20160523 23 Research Data Things
20160523 23 Research Data ThingsKatina Toufexis
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...Christopher Hart
 
20160719 23 Research Data Things
20160719 23 Research Data Things20160719 23 Research Data Things
20160719 23 Research Data ThingsKatina Toufexis
 

What's hot (11)

SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017
SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017
SGCI Science Gateways: Harnessing Big Data and Open Data 03-19-2017
 
Fair by design
Fair by designFair by design
Fair by design
 
Digital webinar master deck final
Digital webinar master deck finalDigital webinar master deck final
Digital webinar master deck final
 
Writing successful data management plans
Writing successful data management plansWriting successful data management plans
Writing successful data management plans
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management
 
Machine Learning For Stock Broking
Machine Learning For Stock BrokingMachine Learning For Stock Broking
Machine Learning For Stock Broking
 
Machine Learning in ICU mortality prediction
Machine Learning in ICU mortality predictionMachine Learning in ICU mortality prediction
Machine Learning in ICU mortality prediction
 
20160523 23 Research Data Things
20160523 23 Research Data Things20160523 23 Research Data Things
20160523 23 Research Data Things
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
20160719 23 Research Data Things
20160719 23 Research Data Things20160719 23 Research Data Things
20160719 23 Research Data Things
 

Similar to Guidance and Survey Results from the Trustworthy Data Working Group

Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...
Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...
Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...Glenn Villanueva
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxARDC
 
Towards Tangible Trusted Learning Analytics
Towards Tangible Trusted Learning AnalyticsTowards Tangible Trusted Learning Analytics
Towards Tangible Trusted Learning AnalyticsHendrik Drachsler
 
Ethics and Privacy in the Application of Learning Analytics (#EP4LA)
Ethics and Privacy in the Application of Learning Analytics (#EP4LA)Ethics and Privacy in the Application of Learning Analytics (#EP4LA)
Ethics and Privacy in the Application of Learning Analytics (#EP4LA)Hendrik Drachsler
 
Data Management, Research Integrity and Ethics
Data Management, Research Integrity and EthicsData Management, Research Integrity and Ethics
Data Management, Research Integrity and EthicsARDC
 
Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...
Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...
Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...Steven Miller
 
Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...ARDC
 
Learning Analytics in action: ethics and privacy issues in the classroom
Learning Analytics in action: ethics and privacy issues in the classroomLearning Analytics in action: ethics and privacy issues in the classroom
Learning Analytics in action: ethics and privacy issues in the classroomMaría Jesús Rodríguez Triana
 
Sharing Confidential Data in ICPSR
Sharing Confidential Data in ICPSRSharing Confidential Data in ICPSR
Sharing Confidential Data in ICPSRARDC
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageTom Plasterer
 
Data Science at Intersection of Security and Privacy
Data Science at Intersection of Security and PrivacyData Science at Intersection of Security and Privacy
Data Science at Intersection of Security and PrivacyTarun Chopra
 
Open Educational Resources for Big Data Science
Open Educational Resources for Big Data ScienceOpen Educational Resources for Big Data Science
Open Educational Resources for Big Data ScienceWilliam Hersh, MD
 
Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data managementdri_ireland
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Katina Toufexis
 
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...Academy of Science of South Africa (ASSAf)
 
Winter school in research data science research data management - final
Winter school in research data science research data management - finalWinter school in research data science research data management - final
Winter school in research data science research data management - finalARDC
 
Research data management during and after your research ; an introduction / L...
Research data management during and after your research ; an introduction / L...Research data management during and after your research ; an introduction / L...
Research data management during and after your research ; an introduction / L...Leon Osinski
 
Managing data responsibly to enable research interity
Managing data responsibly to enable research interityManaging data responsibly to enable research interity
Managing data responsibly to enable research interityIUPUI
 

Similar to Guidance and Survey Results from the Trustworthy Data Working Group (20)

Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...
Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...
Research Ethics and Integrity | Ethical Standards | Data Mining | Mixed Metho...
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptx
 
Towards Tangible Trusted Learning Analytics
Towards Tangible Trusted Learning AnalyticsTowards Tangible Trusted Learning Analytics
Towards Tangible Trusted Learning Analytics
 
Ethics and Privacy in the Application of Learning Analytics (#EP4LA)
Ethics and Privacy in the Application of Learning Analytics (#EP4LA)Ethics and Privacy in the Application of Learning Analytics (#EP4LA)
Ethics and Privacy in the Application of Learning Analytics (#EP4LA)
 
Data Management, Research Integrity and Ethics
Data Management, Research Integrity and EthicsData Management, Research Integrity and Ethics
Data Management, Research Integrity and Ethics
 
Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...
Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...
Driving Data and Cognitive Sciences Curriculum at the Nexus of Society, Polic...
 
Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...
 
Learning Analytics in action: ethics and privacy issues in the classroom
Learning Analytics in action: ethics and privacy issues in the classroomLearning Analytics in action: ethics and privacy issues in the classroom
Learning Analytics in action: ethics and privacy issues in the classroom
 
Sharing Confidential Data in ICPSR
Sharing Confidential Data in ICPSRSharing Confidential Data in ICPSR
Sharing Confidential Data in ICPSR
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
 
Data Science at Intersection of Security and Privacy
Data Science at Intersection of Security and PrivacyData Science at Intersection of Security and Privacy
Data Science at Intersection of Security and Privacy
 
Open Educational Resources for Big Data Science
Open Educational Resources for Big Data ScienceOpen Educational Resources for Big Data Science
Open Educational Resources for Big Data Science
 
Introduction to research data management
Introduction to research data managementIntroduction to research data management
Introduction to research data management
 
Research Data Management and your PhD
Research Data Management and your PhDResearch Data Management and your PhD
Research Data Management and your PhD
 
Simon hodson
Simon hodsonSimon hodson
Simon hodson
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)
 
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
 
Winter school in research data science research data management - final
Winter school in research data science research data management - finalWinter school in research data science research data management - final
Winter school in research data science research data management - final
 
Research data management during and after your research ; an introduction / L...
Research data management during and after your research ; an introduction / L...Research data management during and after your research ; an introduction / L...
Research data management during and after your research ; an introduction / L...
 
Managing data responsibly to enable research interity
Managing data responsibly to enable research interityManaging data responsibly to enable research interity
Managing data responsibly to enable research interity
 

More from jbasney

Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)
Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)
Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)jbasney
 
CILogon & SciTokens: OIDC/OAuth Federation
CILogon & SciTokens: OIDC/OAuth FederationCILogon & SciTokens: OIDC/OAuth Federation
CILogon & SciTokens: OIDC/OAuth Federationjbasney
 
CILogon 2.0 - IAM Online Webinar Series
CILogon 2.0 - IAM Online Webinar SeriesCILogon 2.0 - IAM Online Webinar Series
CILogon 2.0 - IAM Online Webinar Seriesjbasney
 
Lightweight Cybersecurity Risk Assessment Tools for Cyberinfrastructure
Lightweight Cybersecurity Risk Assessment Tools for CyberinfrastructureLightweight Cybersecurity Risk Assessment Tools for Cyberinfrastructure
Lightweight Cybersecurity Risk Assessment Tools for Cyberinfrastructurejbasney
 
CILogon 2.0 at Oct 2017 CICI PI meeting
CILogon 2.0 at Oct 2017 CICI PI meetingCILogon 2.0 at Oct 2017 CICI PI meeting
CILogon 2.0 at Oct 2017 CICI PI meetingjbasney
 
11th FIM4R Workshop: US Projects Update
11th FIM4R Workshop: US Projects Update11th FIM4R Workshop: US Projects Update
11th FIM4R Workshop: US Projects Updatejbasney
 
CILogon PEARC17
CILogon PEARC17CILogon PEARC17
CILogon PEARC17jbasney
 
CILogon 2.0 at 2017 Internet2 Global Summit
CILogon 2.0 at 2017 Internet2 Global SummitCILogon 2.0 at 2017 Internet2 Global Summit
CILogon 2.0 at 2017 Internet2 Global Summitjbasney
 
CTSC+SWAMP: cybersecurity resources for your campus
CTSC+SWAMP: cybersecurity resources for your campusCTSC+SWAMP: cybersecurity resources for your campus
CTSC+SWAMP: cybersecurity resources for your campusjbasney
 
CILogon: An Integrated Identity and Access Management Platform for Science
CILogon: An Integrated Identity and Access Management Platform for ScienceCILogon: An Integrated Identity and Access Management Platform for Science
CILogon: An Integrated Identity and Access Management Platform for Sciencejbasney
 
CILogon 2.0 MAGIC SC16
CILogon 2.0 MAGIC SC16CILogon 2.0 MAGIC SC16
CILogon 2.0 MAGIC SC16jbasney
 
CILogon 2.0 Update at TechEx 2016
CILogon 2.0 Update at TechEx 2016CILogon 2.0 Update at TechEx 2016
CILogon 2.0 Update at TechEx 2016jbasney
 
Trusting External Identity Providers for Global Research Collaborations
Trusting External Identity Providers for Global Research CollaborationsTrusting External Identity Providers for Global Research Collaborations
Trusting External Identity Providers for Global Research Collaborationsjbasney
 
Cybersecurity for Conservation
Cybersecurity for ConservationCybersecurity for Conservation
Cybersecurity for Conservationjbasney
 
CTSC at TNC16
CTSC at TNC16CTSC at TNC16
CTSC at TNC16jbasney
 
CILogon 2.0 at 2016 Internet2 Global Summit
CILogon 2.0 at 2016 Internet2 Global SummitCILogon 2.0 at 2016 Internet2 Global Summit
CILogon 2.0 at 2016 Internet2 Global Summitjbasney
 
SAML Security Contacts
SAML Security ContactsSAML Security Contacts
SAML Security Contactsjbasney
 
FeduShare TechEx15
FeduShare TechEx15FeduShare TechEx15
FeduShare TechEx15jbasney
 
CILogon 2.0 at REFEDS 30
CILogon 2.0 at REFEDS 30CILogon 2.0 at REFEDS 30
CILogon 2.0 at REFEDS 30jbasney
 
CILogon and InCommon: Technical Update
CILogon and InCommon: Technical UpdateCILogon and InCommon: Technical Update
CILogon and InCommon: Technical Updatejbasney
 

More from jbasney (20)

Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)
Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)
Federated Identity Needs for the Large Synoptic Survey Telescope (LSST)
 
CILogon & SciTokens: OIDC/OAuth Federation
CILogon & SciTokens: OIDC/OAuth FederationCILogon & SciTokens: OIDC/OAuth Federation
CILogon & SciTokens: OIDC/OAuth Federation
 
CILogon 2.0 - IAM Online Webinar Series
CILogon 2.0 - IAM Online Webinar SeriesCILogon 2.0 - IAM Online Webinar Series
CILogon 2.0 - IAM Online Webinar Series
 
Lightweight Cybersecurity Risk Assessment Tools for Cyberinfrastructure
Lightweight Cybersecurity Risk Assessment Tools for CyberinfrastructureLightweight Cybersecurity Risk Assessment Tools for Cyberinfrastructure
Lightweight Cybersecurity Risk Assessment Tools for Cyberinfrastructure
 
CILogon 2.0 at Oct 2017 CICI PI meeting
CILogon 2.0 at Oct 2017 CICI PI meetingCILogon 2.0 at Oct 2017 CICI PI meeting
CILogon 2.0 at Oct 2017 CICI PI meeting
 
11th FIM4R Workshop: US Projects Update
11th FIM4R Workshop: US Projects Update11th FIM4R Workshop: US Projects Update
11th FIM4R Workshop: US Projects Update
 
CILogon PEARC17
CILogon PEARC17CILogon PEARC17
CILogon PEARC17
 
CILogon 2.0 at 2017 Internet2 Global Summit
CILogon 2.0 at 2017 Internet2 Global SummitCILogon 2.0 at 2017 Internet2 Global Summit
CILogon 2.0 at 2017 Internet2 Global Summit
 
CTSC+SWAMP: cybersecurity resources for your campus
CTSC+SWAMP: cybersecurity resources for your campusCTSC+SWAMP: cybersecurity resources for your campus
CTSC+SWAMP: cybersecurity resources for your campus
 
CILogon: An Integrated Identity and Access Management Platform for Science
CILogon: An Integrated Identity and Access Management Platform for ScienceCILogon: An Integrated Identity and Access Management Platform for Science
CILogon: An Integrated Identity and Access Management Platform for Science
 
CILogon 2.0 MAGIC SC16
CILogon 2.0 MAGIC SC16CILogon 2.0 MAGIC SC16
CILogon 2.0 MAGIC SC16
 
CILogon 2.0 Update at TechEx 2016
CILogon 2.0 Update at TechEx 2016CILogon 2.0 Update at TechEx 2016
CILogon 2.0 Update at TechEx 2016
 
Trusting External Identity Providers for Global Research Collaborations
Trusting External Identity Providers for Global Research CollaborationsTrusting External Identity Providers for Global Research Collaborations
Trusting External Identity Providers for Global Research Collaborations
 
Cybersecurity for Conservation
Cybersecurity for ConservationCybersecurity for Conservation
Cybersecurity for Conservation
 
CTSC at TNC16
CTSC at TNC16CTSC at TNC16
CTSC at TNC16
 
CILogon 2.0 at 2016 Internet2 Global Summit
CILogon 2.0 at 2016 Internet2 Global SummitCILogon 2.0 at 2016 Internet2 Global Summit
CILogon 2.0 at 2016 Internet2 Global Summit
 
SAML Security Contacts
SAML Security ContactsSAML Security Contacts
SAML Security Contacts
 
FeduShare TechEx15
FeduShare TechEx15FeduShare TechEx15
FeduShare TechEx15
 
CILogon 2.0 at REFEDS 30
CILogon 2.0 at REFEDS 30CILogon 2.0 at REFEDS 30
CILogon 2.0 at REFEDS 30
 
CILogon and InCommon: Technical Update
CILogon and InCommon: Technical UpdateCILogon and InCommon: Technical Update
CILogon and InCommon: Technical Update
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Guidance and Survey Results from the Trustworthy Data Working Group

  • 1. Guidance and Survey Results from the Trustworthy Data Working Group Andrew Adams, Kay Avila, Jim Basney, Laura Christopherson, Melissa Cragin, Jeannette Dopheide, Terry Fleury, Calvin Frye, Florence Hudson, Manisha Kanodia, Jenna Kim, W. John MacMullen, Mats Rynge, Scott Sakai, Sandra Thompson, Karan Vahi, John Zage https://trustedci.org/trustworthy-data SGCI Webinar October 7, 2020
  • 2. About the TDWG The Trustworthy Data Working Group aims to provide guidance on data security for open science, to improve scientific productivity and trust in scientific results. Open science relies on data integrity, collaboration, high performance computing, and scalable tools to achieve results, but currently lacks effective cybersecurity programs that address the trustworthiness of scientific data. 2
  • 3. TDWG Goals ● Survey science projects ● Understand the spectrum of data security concerns ● Understand current data security practices ● Produce broadly-applicable security guidance for the data science community 3
  • 4. Trustworthy Data and Science Gateways Science Gateways support: ● Computing on datasets ● Data discovery ● Data movement ● Data repositories ● Data sharing ● Data visualization ● etc. … in a "trustworthy" environment. 4 What attributes of trustworthiness do science gateway users expect? What capabilities for managing trustworthiness do science gateways provide?
  • 5. 5 All 524 gateways in the catalog care about "data"!
  • 6. Webinar Outline 6 1. The Survey 2. The Guidance 3. Discussion
  • 8. Survey Development Process ● Brainstorm goals and questions ● Understand prior guidance and ongoing discussions of trustworthiness in published literature ● Interesting findings ○ No single definition of “trustworthiness” - but integrity is most common ■ NIST 1800-25 and NIST 1800-26 focus on data integrity ○ Failures in trustworthiness ■ ‘Willoughby–Hoye’ nuclear magnetic resonance scripts bug ■ Errors detected by the Pegasus Scientific Workflow Integrity tool for jobs run on the Open Science Grid 8
  • 9. Survey Goals ● Identify the concerns of researchers and operators with respect to the trustworthiness of scientific data. ● Understand how individuals define trustworthiness and how their roles impact this. ● Categorize tools and processes used to maintain trustworthiness, and discern their shortcomings. ● Identify opportunities for guidance. 9
  • 10. Survey Methodology ● Sections ○ Respondent demographics ○ Information about respondent data and views on trustworthiness ○ Use of tools and technologies ○ Wrap-up questions ● 16 survey questions plus 4 wrap-up questions ○ 7 multiple choice ○ 5 Likert scale ○ 4 free form responses ● Gathered 111 responses from April 21 to May 31 of 2020 10
  • 11. What is your primary job title? 11 Keywords in job titles Responses research 35 director 19 scientist 14 professor 14 manager 13 senior 9 data 7 cyberinfrastructure 4 security / cybersecurity 4 consultant 3 systems administrator 2 chief information officer 2 N=108
  • 12. Please select all roles that describe your work, even if they do not correspond to your job title. 12N=110 Science Gateway Operators
  • 13. In the above roles, what field(s) of science do you primarily work in or support? 13N=110
  • 14. What sector(s) do you work with? 14N=111
  • 15. Which attributes do you believe scientific data must have in order to be trustworthy? 15N=86
  • 16. Which attributes do you believe scientific data must have in order to be trustworthy? ● Accuracy - The data is free from error. ● Integrity - The data has not been altered. ● Methodology - The processes and inputs used to create the data are well-established and accepted by the community. ● Provenance - The data's origin and lineage can be readily established. ● Reproducibility - The data can be re-created, or the associated scientific results are replicable. ● Reputation - The data was generated by a credible or trusted source. ● Responsible stewardship - The ownership of the data is well managed and can be transferred. ● Significance - The data enables future research directions (with associated funding/support). 16
  • 17. Do you agree with the following statement? I think that protecting the trustworthiness of scientific data is important. 17N=110
  • 18. Do you agree with the following statement? My job responsibilities include establishing and/or maintaining the trustworthiness of scientific data. 18N=111
  • 19. Do you agree with the following statement? I am confident in the trustworthiness of the scientific data I use/produce/curate. 19N=111
  • 20. Do you perform/support research using sensitive data? Please select any/all that apply. 20N=111
  • 21. Do any of the following regulations apply to the research data that you use/produce/curate? 21N=111
  • 22. Which of the following tools and technologies (if any) help to secure the research data that you use/produce/curate? 22N=111
  • 23. Do you agree with the following statement? The tools and technologies that protect the research data that I use/produce/curate provide sufficient assurance against unauthorized changes and/or reputational attacks. 23N=111
  • 24. If you were provided with additional guidance, resources, or support, would you apply additional tools or technologies to help maintain the trustworthiness of the research data that you use/produce/curate? 24N=111
  • 25. Free Text Questions Please explain why protecting the trustworthiness of scientific data is or is not important to you. Does its importance change during different phases of the research cycle (e.g., data collection, calibration, processing, analysis, sharing, and publication)? In your work, what are potential consequences (if any) to using/producing/curating scientific data that is not trustworthy? If you answered Yes or Maybe (that you would apply additional tools or technologies), please explain. For example, are there specific tools or technologies you would like to use or specific needs you would like guidance addressing? Is there anything else related to trustworthy research data that you would like us to know? 25
  • 26. Keyword Analysis 26 Keywords and Themes Total Respondents Impact on scientific results - "bad conclusions," "wrong conclusions" 44 Reputational risk - reputational harm to scientist or institution 27 Integrity of scientific process - the scientific method 18 Trust in science - combating "misinformation," "politics," "loss of public trust" 15 Provenance of data 15 Loss of funding 15 Concerns about storage, data integrity management, and/or data quality mgmt. 14 Seeking guidance, training, and/or audits 13 Reusability 11 Reproducibility 10 Retraction of publication 6 Encryption 4 Compliance - managing sensitive data, controlled unclassified information (CUI) 4 Difficult to work with restrictions - too cumbersome to follow 3 Wasted funds and/or resources 3 Threats from insiders - theft and/or surveillance 1 Keywords and Themes identified in free text responses
  • 27. Keyword Analysis 27 Keywords and Themes Total Respondents Impact on scientific results - "bad conclusions," "wrong conclusions" 44 Reputational risk - reputational harm to scientist or institution 27 Integrity of scientific process - the scientific method 18 Trust in science - combating "misinformation," "politics," "loss of public trust" 15 Provenance of data 15 Loss of funding 15 Concerns about storage, data integrity management, and/or data quality mgmt. 14 Seeking guidance, training, and/or audits 13 Reusability 11 Reproducibility 10 Retraction of publication 6 Encryption 4 Compliance - managing sensitive data, controlled unclassified information (CUI) 4 Difficult to work with restrictions - too cumbersome to follow 3 Wasted funds and/or resources 3 Threats from insiders - theft and/or surveillance 1 Keywords and Themes identified in free text responses
  • 28. Keyword Analysis 28 Keywords and Themes Total Respondents Impact on scientific results - "bad conclusions," "wrong conclusions" 44 Reputational risk - reputational harm to scientist or institution 27 Integrity of scientific process - the scientific method 18 Trust in science - combating "misinformation," "politics," "loss of public trust" 15 Provenance of data 15 Loss of funding 15 Concerns about storage, data integrity management, and/or data quality mgmt. 14 Seeking guidance, training, and/or audits 13 Reusability 11 Reproducibility 10 Retraction of publication 6 Encryption 4 Compliance - managing sensitive data, controlled unclassified information (CUI) 4 Difficult to work with restrictions - too cumbersome to follow 3 Wasted funds and/or resources 3 Threats from insiders - theft and/or surveillance 1 Keywords and Themes identified in free text responses
  • 29. Keyword Analysis 29 Keywords and Themes Total Respondents Impact on scientific results - "bad conclusions," "wrong conclusions" 44 Reputational risk - reputational harm to scientist or institution 27 Integrity of scientific process - the scientific method 18 Trust in science - combating "misinformation," "politics," "loss of public trust" 15 Provenance of data 15 Loss of funding 15 Concerns about storage, data integrity management, and/or data quality mgmt. 14 Seeking guidance, training, and/or audits 13 Reusability 11 Reproducibility 10 Retraction of publication 6 Encryption 4 Compliance - managing sensitive data, controlled unclassified information (CUI) 4 Difficult to work with restrictions - too cumbersome to follow 3 Wasted funds and/or resources 3 Threats from insiders - theft and/or surveillance 1 Keywords and Themes identified in free text responses
  • 31. Barriers to Trustworthiness Two potential barriers were identified in the survey responses: ● Lack of a common definition for trustworthiness ● Lack of consensus on which role(s) should own responsibility for trustworthiness Four additional barriers emerged in our literature review: ● Conflicts between cybersecurity and reproducibility ● Accidental failures through systems or human error ● Lack of funding and incentives for making supporting data and provenance information available ● Perceived data quality issues 31
  • 32. Stakeholders 32 ● Data User works with data for use in research. ● Data Provider has data; can speak authoritatively about its integrity, authenticity, and means of creation; and has the authority to make it available to others. ● Secure Infrastructure Provider supplies one or more services around the data's use. ● Facilitation and Compliance Professional supports and consults with others on the use of technology with data and ethical treatment of data. Distilled from the survey:
  • 33. Attributes of Trustworthy Data (Revised) 33 ● Integrity - The data has not been damaged by system faults or malicious actors. ● Reproducibility - The data can be re-created, or the associated results are replicable. ● Accepted Techniques of Creation - The processes and inputs used to create the data are well-established and accepted by the community. (Methodology) ● Authenticity - The data has a clear provenance (origin and lineage) which can be verified. Metadata describing the data and versioning are included. (Provenance) ● Authorization - Appropriate access controls are enforced. ● Availability - The data is accessible to authorized users, served from a reliable system, during the agreed-to timeline, and according to usage policies. ● Confidentiality - The data repository protects personally identified information (PII) or other sensitive information from those not granted access. ● Credible Source and Stewardship - The ownership of the data is well managed and can be transferred securely. (Reputation and Responsible Stewardship)
  • 34. Attributes of Trustworthy Data for each Stakeholder 34 Data User Data Provider Secure Infrastructure Provider Facilitation and Compliance Professional I want to ensure that the data I download is available to me when I need it, has integrity, is authentic, was created in a proven manner, does not disclose information I have no rights to, has been credibly and trustworthily tended, and is reproducible. I want to ensure my data is protected and made accessible to only those I have had the opportunity to vet and grant approval to after they've agreed to any stipulations I might have on the data's use. I want to guarantee that the data we serve in the infrastructure we provide abides by the Data Provider’s data policies and satisfies the Data User’s needs for verification of its trustworthiness. I want to ensure all parties concerned have the support they need to ensure the data is trustworthy and the research is sound. I want to ensure all parties are honoring any agreements or policies on data sharing and use.
  • 35. Secure Infrastructure Providers ● Availability - My infrastructure enables the secure and timely access and use of the data as defined in any of the Data Provider's usage policies. ● Integrity - My infrastructure is designed and maintained to guarantee the data's integrity. It protects the data from alteration, corruption, loss of completeness, or loss of accuracy. ● Authenticity - The data housed/served from my infrastructure is authentic. It is the same data that was originally given to us by the Data Provider and is as promised to the user. Its provenance can be verified. ● Accepted Techniques of Creation - My infrastructure allows data owners to document the means of creation when storing data. 35
  • 36. Secure Infrastructure Providers ● Authorization - My infrastructure was built with access controls that enable only those users the Data Provider has deemed acceptable access to data. Users must provide their credentials to access the data. Their identity must be verified and communications between the Data User, Data Provider, and the infrastructure are secure. ● Confidentiality - My infrastructure supports tools and technologies for maintaining data confidentiality. ● Credibility - My infrastructure protects the data from unwanted, unapproved exfiltration, transmission, or use (if use occurs within the infrastructure). ● Reproducibility - My infrastructure protects the data from unwanted transformations that would result in different scientific results. 36
  • 37. Tools and Technologies: Analysis 37
  • 38. Tools and Technologies: Analysis ● Strong Coverage: Availability, Integrity, and Authorization ● Accepted Techniques of Creation and Reproducibility ○ Containers, Notebooks ○ Transparency: publishing source code, metadata ● Confidentiality: ○ Encryption, Access Controls ○ De-identification, Differential Privacy ● Authenticity and Credible Source and Stewardship ○ Cryptographic hash or signature on data 38
  • 39. Communicating Trustworthiness ● Provenance - In general, perceived trustworthiness of data is improved by communication about the data per se, describing its source(s), the methods of generation, processing, transformations, and corrections. ● Transparency - Data service organizations should provide clear policies and procedures about their data stewardship for the resources they provide. This is essential to enabling stakeholders, either users or content distributors, to verify and independently assess the data provided. ● Certification - Resource providers may also wish to formalize community recognition through the use of third-party certification processes such as CoreTrustSeal. 39
  • 40. Related Work ● TRUST Principles for digital repositories: https://doi.org/10.1038/s41597-020-0486-7 ● Risk Assessment for Scientific Data: https://doi.org/10.5334/dsj-2020-010 ● FAIR Principles: https://www.go-fair.org/fair-principles/ ● CoreTrustSeal Data Repository certification: https://www.coretrustseal.org/ ● Integrity Protection for Scientific Workflow Data: https://doi.org/10.1145/3332186.3332222 40
  • 42. Trustworthy Data and Science Gateways 42 What attributes of trustworthiness do science gateway users expect? What capabilities for managing trustworthiness do science gateways provide? How to science gateways communicate trustworthiness of the data they manage? Which attributes of trustworthiness are handled well by today's science gateways (i.e., examples of "best practices" to share)? Which attributes of trustworthiness are difficult to achieve in science gateways (i.e., examples where future work is needed)?
  • 43. Next steps Visit https://trustedci.org/trustworthy-data to: ● read our reports and send us your comments ● join our email list and help us develop community guidance You can also send questions/comments later to jbasney@illinois.edu. The working group will conclude in December 2020. 43
  • 44. Thanks! Thanks to all the working group members and the people who responded to our survey. Trusted CI is supported by the National Science Foundation under Grant 1920430. The Cyberinfrastructure Center of Excellence (CI CoE) Pilot project is supported by the National Science Foundation under Grant 1842042. The Engagement and Performance Operations Center (EPOC) is supported by the National Science Foundation under Grant 1826994. The Open Storage Network is supported by Schmidt Futures and the National Science Foundation under Grants 1747483, 1747490, 1747493, 1747507, and 1747552. The NSF Big Data Innovation Hubs are supported in part by the National Science Foundation under awards 1916613, 1916585, 1916589, and 1916573. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. 44
  • 46. Attributes of Trustworthiness 46 TDWG Survey TDWG Guidance Report TRUST FAIR Accuracy Availability Transparency Findable Integrity Integrity Responsibility Accessible Methodology Authenticity User Focus Interoperable Provenance Accepted Techniques of Creation Sustainability Reusable Reproducibility Reproducibility Technology Reputation Confidentiality Responsible Stewardship Credible Source and Stewardship Significance Authorization
  • 47. Attributes of Trustworthy Data for each Stakeholder 47 Data User Data Provider Secure Infrastructure Provider Facilitation and Compliance Professional Integrity The data is complete, unaltered from its promised form, undamaged (not corrupt), and accurate (free from errors). I can rely on it to be in the condition the Data Provider or Infrastructure Provider promises. The integrity of the data will be maintained no matter where the data lives and how the data is served or transferred before the user comes into its possession. My infrastructure is designed and maintained to guarantee the data's integrity. It protects the data from alteration, corruption, loss of completeness, or loss of accuracy. Facilitation: We make data users and providers aware of and help them employ the methods, services, tools, and other resources that are available to assess and ensure the integrity and accuracy of the data.
  • 48. Tools and Technologies ● Access Controls - A process by which use of system resources is regulated according to a security policy and is permitted only by authorized entities (users, programs, processes, or other systems) according to that policy. ● Archival Storage - A system designed for storing data for long periods of time. An archival storage system may or may not permit instantaneous modification or read-access to data (e.g., tape), however it must be designed with the intent to protect the integrity and availability of the data stored in it for the lifetime of the archive. ● Network Controls - Access controls applied to a computer network. Enforcement of policy is generally accomplished through the use of a combination of dedicated network firewall devices, host-based firewall software, and judicious network service configuration. 48
  • 49. Tools and Technologies ● RAID Filesystem - A filesystem built upon a redundant array of independent disks, configured in a manner that preserves the availability of data stored in the filesystem during a disk failure and while the failed disk's replacement is undergoing integration to the array. ● Logging - An infrastructure for generating, collecting, and analyzing security events -- occurrences in a system that are relevant to the security of the system. ● Multi-Factor Authentication - An authentication process requiring proof of more than one of: something you know, something you have, something you are. A common example of multifactor authentication is the use of a password in conjunction with a proof of possession of a registered mobile phone by interacting with an application on the phone. 49
  • 50. Tools and Technologies ● External Backups - The creation of a reserve copy of data and storing that copy such that it is not accessible from (i.e., is external to) the computer system it was made from. Note that backup copies, unlike archives, are meant as a contingency measure, and should be expected to have a short, finite lifespan. ● Physical Security Protections - Tangible means of preventing unauthorized physical access to a system. Examples: Fences, cages, walls, and other barriers; locks, safes, and vaults; dogs and armed guards; sensors and alarm bells. 50
  • 51. Tools and Technologies ● Intrusion Detection / Prevention - A process or subsystem, implemented in software or hardware, that automates the tasks of (a) monitoring events that occur in a computer network and (b) analyzing them for signs of security problems. An intrusion detection system may be augmented with an automated-response capability to interrupt detected security problems, creating an intrusion prevention system. ● File/Host Integrity Check - The application of a (cryptographic) hash function to determine if the contents of files (data and/or program and/or configuration) have unexpectedly changed. A hash function is a (mathematical) function which maps values from a large (possibly very large) domain into a smaller range. Common examples of hash functions include sha256 and md5. 51
  • 52. Tools and Technologies ● Policy for Network / Cloud Storage - Guidance and/or requirements that specify how data is to be stored and transported outside an organization's trust boundary. ● 3rd Party Data Repository - A system for storing and retrieving data, operated by an entity external to the Data User and Data Provider. We assume that 3rd-party data repositories are capable of ensuring the security of the data stored in them, subject to the limitations in their service agreement or use policy. We also assume that the Data Provider finds these limitations acceptable. ● Workflow Integrity Checking - The integration of cryptographic hash functions within a scientific workflow to ensure that the data being operated on is the same as the data when it was originally collected, generated, or published. 52