SlideShare ist ein Scribd-Unternehmen logo
1 von 163
Managing Confidential Data
Micah Altman
Director of Research
MIT Libraries
DISCLAIMER
These opinions are my own, they are not the
opinions of MIT, Brookings, any of the project
funders, nor (with the exception of co-authored
previously published work) my collaborators
Secondary disclaimer:

“It‟s tough to make predictions, especially about
the future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston
Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert
Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan
Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel,
Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.

2

Managing Confidential Data

v. 24 (January IAP
Session 1)
Collaborators & Co-Conspirators
Privacy Tools for Sharing Research Data Team
(Salil Vadhan, P.I.)
http://privacytools.seas.harvard.edu/people
Research Support





Supported in part by NSF grant CNS-1237235

3

Managing Confidential Data

v. 24 (January IAP
Session 1)
Related Work

Main Project:
 Privacy Tools for Sharing Research Data
http://privacytools.seas.harvard.edu/
Related publications:
 Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier,
D., Laevart, C., et al. (2011). Communicating Science and Engineering
Data in the Information Age. Computer Science and
Telecommunications. National Academies Press
 Vadhan, S. , et al. 2010. “Re: Advance Notice of Proposed Rulemaking:
Human Subjects Research Protections”. Available from:
http://dataprivacylab.org/projects/irb/Vadhan.pdf

Slides and reprints available from:
informatics.mit.edu
4

Managing Confidential Data

v. 24 (January IAP
Session 1)
Confidential data matters in research
Researchers often
collect or use
information from
people – much of
which is not clearly
public
Professional and
ethical responsibility
to manage subject
privacy and
confidentiality
appropriately
Possibility of huge
administrative
penalties







5

2010

Managing Confidential Data

v. 24 (January IAP
Session 1)
Why protect confidential information?


It‟s your money –





It‟s your job –











Data management/sharing plans strengthen grant
proposals
Data management/sharing plans can clarify research
design
Data management/sharing plans can strengthen research
program
Confidentiality can improve subject recruitment, responses

It‟s the law -






Institutional reputation
Loss of external funding

It‟s your research -



Protection is individual responsibility
Violations can lead to sanctions

It‟s your institution -



8.3 Million identify theft victims/year
Lose a week of your life and $$

Civil penalties
Criminal penalties
Administrative penalties

It‟s the right thing to do -


Professional behavior
Obligations to research subjects
6

Managing Confidential Data

v. 24 (January IAP
Session 1)
How Identifiable Are You?

Try this:
http://aboutmyinfo.org

7

Managing Confidential Data

v. 24 (January IAP
Session 1)
Personally identifiable private
information is surprisingly common
Includes information from a variety of
sources, such as…







Research data, even if you aren‟t the
original collector
Student “records” such as e-mail, grades
Logs from web-servers, other systems

Lots of things are potentially identifying:



Under some federal laws: IP addresses,
dates, zipcodes, …
 Birth date + zipcode + gender uniquely
identify ~87% of people in the U.S.
[Sweeney 2002]
Try it:
http://aboutmyinfo.org/index.html
 With date and place of birth, can guess
first five digits of social security number
(SSN) > 60% of the time. (Can guess the
whole thing in under 10 tries, for a
significant minority of people.) [Aquisti &
Gross 2009]
 Analysis of writing style or eclectic tastes
has been used to identify individuals


Brownstein, et al., 2006 , NEJM 355(16),

Tables, graphs and maps can also
reveal identifiable information



8

Managing Confidential Data

v. 24 (January IAP
Session 1)
Traditional approaches are failing
Modal traditional approach:








removing subjects‟ names
storing descriptive information in a locked filing cabinet
publishing summary tables
(sometimes) release a public use version that suppressed
and recoded descriptive information

Problems







9

law is changing – requirements are becoming more
complex
research computing is moving towards the cloud, other
distributed storage
researchers are using new forms of data that create new
privacy issues
advances in the formal analysis of disclosure risk imply
the impracticality of “de-identification” as required by law
Managing Confidential Data

v. 24 (January IAP
Session 1)
The MIT libraries provide support for all researchers at MIT:







Research consulting, including:
bibliographic information management; literature searches; subject-specific
consultation
Data management, including:
data management plan consulting; data archiving; metadata creation
Data acquisition and analysis, including:
database licensing; statistical software training; GIS consulting, analysis & data
collection
Scholarly publishing:
open access publication & licensing

10

libraries.mit.edu
Managing Confidential Data

v. 24 (January IAP
Session 1)
Goals for course






Overview of key areas
Identify key concepts & issues
Summarize MIT policies, procedures, resources
Establish framework for action
Provide connection to resources, literature

11

Managing Confidential Data

v. 24 (January IAP
Session 1)
Outline


[Preliminaries]



Law, policy, ethics
Research methods,
design, management
Information Security
(Storage, Transmission,
Use)
Disclosure Limitation







[Additional Resources &
Summary of
Recommendations]
12

Managing Confidential Data

v. 24 (January IAP
Session 1)
Steps to Manage Confidential Research Data


Identify potentially sensitive information in planning









Reduce sensitivity of collected data in design
Separate sensitive information in collection
Encrypt sensitive information in transit
Desensitize information in processing





Removing names and other direct identifiers
Suppressing, aggregating, or perturbing indirect identifiers

Protect sensitive information in systems





Identify legal requirements, institutional requirements, data use agreements
Consider obtaining a certificate of confidentiality
Plan for IRB review

Use systems that are controlled, securely configured, and audited
Ensure people are authenticated, authorized, licensed

Review sensitive information before dissemination








13

Review disclosure risk
Apply non-statistical disclosure limitation
Apply statistical disclosure limitation
Review past releases and publically available data
Check for changes in the law
Require a use agreement

Managing Confidential Data

v. 24 (January IAP
Session 1)
Handling Confidential Information @ MIT


1. Identify confidential information



2. You are responsible for confidential information you store, access, or
share


Obtaining appropriate approval to access information



Approval from PI for access to research data





Approval from IRB to use or collect private identified information
Approval from to sign external data use agreements on behalf of university

3. Safely dispose of confidential information





When you change computers – contact IT for cleanup
Shred the rest

4. If unsure, seek help or approval


Data-management planning: libraries.mit.edu/guides/subjects/datamanagement/



Research approvals:

web.mit.edu/committees/couhes/



General policies:

web.mit.edu/policies/

14

Managing Confidential Data

v. 24 (January IAP
Session 1)
Data Management Consulting:
Libraries @MIT


The libraries can help:





Assist with data management plans
Individual consultation/collaboration with researchers
General workshops & guides
Dissemination of public data through



DSpace@MIT
Referrals to subject-based repositories

For more information:
libraries.mit.edu/guides/subjects/data-management/
<data-management@mit.edu>

15

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, Policy & Ethics

Law, policy, ethics
Research design …
Information security






16

Ethical Obligations
Laws
Fun and games 
[Summary]

Managing Confidential Data

Disclosure
limitation

v. 24 (January IAP
Session 1)
Confidentiality & Research Ethics

Law, policy, ethics
Research design …
Information security



Belmont Principles


Respect for Persons







individuals should be treated as autonomous agents
persons with diminished autonomy are entitled to protection
implies “informed consent”
implies respect for confidentiality and privacy

Beneficence



17

Disclosure
limitation

research must have individual and/or societal benefit to justify
risks
implies minimizing risk/benefit ratio

Managing Confidential Data

v. 24 (January IAP
Session 1)
Scientific & Societal Benefits
of Data Sharing

Law, policy, ethics
Research design …
Information security



Increases replicability of research




Increases scientific impact of research







Funder policies may apply

Public interest in data that supports public policy




Follow up studies
Extensions
Citations

Public interest in data produced by public funder




Journal publication policies may apply

Disclosure
limitation

FOIA and state FOI laws may apply

Open data facilitates…








Transparent government
Scientific collaboration
Scientific verification
New forms of science
Participation in science
Hands-on education
Continuity of research

Sources: Fienberg et. al 1985; ICSU 2004; Nature 2009
18

Managing Confidential Data

v. 24 (January IAP
Session 1)
Sources of Confidentiality Restrictions for
University Research Data

Law, policy, ethics
Research design …
Information security






Overlapping laws
Different laws apply
to different cases
All affiliates subject to
university policy

Disclosure
limitation

(Not included: EU
directive, foreign
laws, classified data,
…)

19

Managing Confidential Data

v. 24 (January IAP
Session 1)
Legal Constraints
Intellectual
Property

Contract
Trade
Secret

Contract

Intellectua
l
Attribution
Moral Rights

Patent

Click-Wrap
TOU

License

Database Rights
Journal
Replication
Requirements

FOIA
State
FOI
Laws

Access
Rights

Funder
Open
Access

Fair Use

Copyrigh
t

HIPA
A
FERP
A

Trademark

DMCA
Common
Rule
45 CFR 26
EU Privacy
Directive

CIPSE
A
State
Privacy
Laws
Classifie
d
Sensitive
but
Unclassified

Rights of
Publicity
Privacy
Torts
(Invasion,
Defamation)

Potentially
Harmful

(Archeologica
l Sites,
Endangered
Export
Species,
Restriction
Animal
s
Testing, …)

EA
R

ITA

Confidentiality
45 CFR 46 [Overview]

Law, policy, ethics
Research design …
Information security

“The Common Rule”
 Governs human subject research








Disclosure
limitation

With federal funds/ at federal institution

Establishes rules for conduct of research
Establishes confidentiality and consent requirement
for for identified private data
However, some information may be required to be
disclosed under state and federal laws (e.g. in cases
of child abuse)
Delegates procedural decisions to Institutional
Review Boards (IRB‟s)
21

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

HIPAA [Overview]

Research design …
Information security

Health Insurance Portability and Accountability Act Disclosure
limitation
 Protects personal health care information for „covered
entities‟
 Detailed technical protection requirements
 Provides clearest legal standards for dissemination
 Provides a „safe harbor‟
 Has become an accepted practice for dissemination in
other areas where laws are less clear
 HITECH Act of 2009 extends HIPAA




Extends coverage to associated entities of covered entities
Additional technical safeguards
Adds breach reporting requirement

HIPAA provides three dissemination options …
22

Managing Confidential Data

v. 24 (January IAP
Session 1)
Dissemination under HIPAA
[option 1]


“safe harbor” -- remove 18 identifiers





Information security
Disclosure
limitation

Names
Social Security #‟s; Personal Account #‟s; Certificate/License #‟s; full face
photos (and comparable images); biometric id‟s; medical
Any other unique identifying number, characteristic, or code

[Asset identifiers]






Research design …

[Personal identifiers]




Law, policy, ethics

fax #‟s; phone #‟s; vehicle #‟s;
personal URL‟s; IP addresses; e-mail addresses
Device ID‟s and serial numbers

[Quasi identifiers]



dates smaller than a year (and ages > 89 collapsed into one category)
geographic subdivisions smaller than a state (except for 3 digits of zipcode, if
unit > 20,000 people)

And


23

Entity does not have actual knowledge [direct and clear awareness] that it
would be possible to use the remaining information alone or in
combination with other information to identify the subject

Managing Confidential Data

v. 24 (January IAP
Session 1)
Dissemination under HIPAA
[Option 2]


Law, policy, ethics
Research design …
Information security

“limited dataset” – leave some quasi-id‟s





Disclosure
limitation

Remove personal and asset identifiers
Permitted dates: dates of birth, death, service, years
Permitted geographic subdivisions: town, city, state, zip
code

And


24

Require access control and data use agreement.

Managing Confidential Data

v. 24 (January IAP
Session 1)
Dissemination under HIPAA
[Option 3]




Law, policy, ethics
Research design …
Information security

“qualified statistician” – statistical determination Disclosure
limitation
Have qualified statistician determine, using generally
accepted statistical and scientific principles and methods,
that the risk is very small that the information could be
used, alone or in combination with other reasonably
available information, by the anticipated recipient to
identify the subject of the information.
Important caveats






25

Methods and results of the analysis must be documented
No bright line for “qualified”, text of rule is:
“a person with appropriate knowledge of and experience with
generally accepted statistical and scientific principles and
methods for rendering information not individually identifiable.”
[Section 164.514(b)(1)]
No clear definitions for “generally accepted” or “very small” or
“reasonably available information”…
however, there are references in the federal register to
statistical publications to be used as “starting points”
Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

FERPA

Research design …
Information security

Family Educational Rights and Privacy Act




Applies schools that receive federal (D.O.E.) funding
Restricts use of student (not employee) information
Establishes







Right to privacy of educational records
Right to inspect and correct records (with appeal to Federal government)
Definition of public “directory” information
Right to block access to public “directory” information, and to other records

Educational records include:





Identified information about student
Maintained by institution
Not …






Disclosure
limitation

Employee records
Some medical and law-enforcement records
Records solely in the possession and for use by the creator (e.g. unpublished instructor notes)

Personally identifiable information includes:





26

Direct identifiers
Indirect (quasi) identifiers
Indirectly linkable identifiers
“Information requested by a person who the educational agency or institution reasonably
believes knows the identity of the student to whom the education record relates.”
Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

MA 201 CMR 17

Research design …
Information security

Disclosure
Standards for the Protection of Personal Information
limitation
 Strongest U.S. general privacy protection law
 Has been delayed/modified repeatedly
 Requires reporting of breaches






If data is not encrypted
Or encryption key is released in conjunction with data

Requires specific technical protections:






27

Firewalls
Encryption of data transmitted in public
Anti-virus software
Software updates
Managing Confidential Data

v. 24 (January IAP
Session 1)
Inconsistencies in Requirements
and Definitions
Inconsistent definitions of “personally identifiable”
Inconsistent definitions of sensitive information
Requirements for to de-identify jibes with statistical realities

Law, policy, ethics
Research design …
Information security
Disclosure
limitation

FERPA

HIPAA

Common
Rule

Coverage

Students in
Educational
Institutions

Medical Information
in “Covered
Entities”

Living persons in Mass. Residents
research by
funded
institutions

Identification
Criteria

-Direct
-Indirect
-Linked
-Bad intent (!)

-Direct
-Indirect
-Linked

-Direct
-Indirect
-Linked

-Direct

Sensitivity
Criteria

Any non-directory
information

Any medical
information

Private
information –
based on harm

Financial, State,
Federal
Identifiers

Management - Directory opt-out - Consent
- Consent
Requirement - [Implied] good
- Specific technical
- [Implied] risk
s 28
practice
safeguards
minimization
Managing Confidential Data
- Breach

MA 201 CMR
17

- Specific
technical
safeguards
v. 24 (January IAP
- Breach 1)
Session
Third Party Requirements

Law, policy, ethics
Research design …
Information security





Licensing requirements
Intellectual property requirements
Federal/state law and/or policy requirements






State protection of personal information laws
Freedom of information laws (FOIA & State FOI)
State mandatory abuse/neglect notification laws

And … think ahead to publisher requirements





Disclosure
limitation

Replication requirements
IP requirements

Examples





29

NSF requires data from funded research be shared
NIH requires a data sharing plan for large projects
Wellcome Trust requires a data sharing plan
Many leading journals require data sharing
Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

US (some other parts of the) World

Research design …
Information security
Disclosure
limitation








Sectoral approach
Designed to limit
government access
Limited core principles:
“reasonable”
expectation of privacy;
“unreasonable” search
and seizure; intrusion
onto seclusion; right of
publicity
Varies by state
Enforcement uneven
30








Omnibus approach
Designed to limit
commercial access
Broad core principles:
privacy is a fundamental
human right; not all
rights waivable, even by
contract
Varies by country
Enforcement uneven

Managing Confidential Data

v. 24 (January IAP
Session 1)
(Some) More Laws & Standards








California Laws

Lots of rules


Applies any data about California residents

Privacy policy

Disclosure

Reporting policy
EU Directive 95/46/EC

Data protection directive

Provides for notice, limits on purpose of use,
consent, security, disclosure, access, accountability

Forbids transfer of data to entities in countries

compliant with directive

Extended by EU Cookie Directive EU 2009/136,
requiring notice on use of web cookies

U.S. is not compliant but …
 Organizations can certify compliance with FTC
 No auditing/enforcement !
 Substantial criticism of this arrangement
Payment Card Industry (PCI) Security Standards

Governs treatment of credit card numbers

Requires reports, audits, fines

Detailed technical measures

Not a law, but helps define good practice


Nevada law mandates PCI standards



Law, policy, ethics
Research design …

Information security
Detailed technical controls over information
Disclosure
systems
limitation

Sarbanes-Oxley (aka, SOX, aka SARBOX)

Corporate and Auditing Accountability and
Responsibility Act of 2002

Applies to U.S. public company boards,
management and public accounting firms

Rarely applies to research in universities

Section 404 requires annual assessment of
organizational internal controls – but does not
specify details of controls
Classified Data

Separate and complex rules and requirements

The University does not accept classified data

But, may have “Controlled But Unclassified”
 Vaguely defined area
 Mostly government produced area
 Penalties unclear

And… export controlled information, under ITAR
and EAR
 Export control include technologies, software,
documentation/design documents may be
included
 Large penalties
… and over 1100 International Human Subjects
laws…


FISMA

Federal Information Security Management Act
(FISMA), Public Law (P.L.) 107-347.

31 Is starting to be applied to NIH sponsored Confidential Data
Managing
research

v. 24 (January IAP
Session 1)
Possible Legal/Regulatory Changes
for 2013-16

Law, policy, ethics
Research design …
Information security



2013






Thanks to the NSA. Everyone knows what “metadata” means
California – right to be forgotten, from social media, for teenagers
OECD Updated Principles for transborder privacy

Likely






Disclosure
limitation

New information privacy laws in selected states
Increased open data requirements from federal funders
Adoption of data availability requirements by increasing numbers of
journals

Possible




Anti “revenge-porn” laws; anti mug-shot blackmail laws
Right to be forgotten progress in EU
Adoption of data availability requirements by more journals, subject to
least restrictive terms




Changes to common rule to deal explicitly with informational risks


32

See: http://openeconomics.net/principles/
See: http://www.nap.edu/catalog.php?record_id=18614
Managing Confidential Data

v. 24 (January IAP
Session 1)
Researcher Responsibilities

Law, policy, ethics
Research design …
Information security









… for knowing the rules
Disclosure
limitation
… for identifying potentially confidential information in all forms
(digital/analogue; on-line/off-line)
… for notifying recipients of their responsibility to protect confidentiality
… for obtaining IRB approval for any human subjects research
… for obtaining proper training
… for following an IRB approved plan
… for obtaining approval of restricted data use agreements with providers, even if no
money involved

… and for proper





Storage
Access
Transmission
Disposal
Confidentiality is not an “IT problem”

33

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Key Concepts & Issues Review

Research design …
Information security



Privacy




Confidentiality




Information that would cause harm if disclosed and linked to an individual

Personally/individually identifiable information





Treatment of private, sensitive information

Sensitive information




Disclosure
limitation

Control over extent and circumstances of sharing

Private information
Directly or indirectly linked to an identifiable individual

Human subjects
A living person …

who is interacted with to obtain research data

who‟s private identifiable information is included in research data



Research





“Common Rule”




Law governing funded human subjects research

HIPAA




Systematic investigation
Designed to develop or contribute to generalizable knowledge

Law governing use of personal health information in covered and associated entities

MA 201 CMR 17


34

Law governing use of certain personal identifiers for Massachusetts residents

Managing Confidential Data

v. 24 (January IAP
Session 1)
Checklist: Identify Requirements

Law, policy, ethics
Research design …
Information security

Check if research includes …
 Interaction with humans  Common Rule applies

Disclosure
limitation

Check if data used includes identified …
 Student records  FERPA applies
 State, federal, financial id‟s  state law applies
 Medical/health information  HIPAA (likely) applies
 Human subjects & private info
 Common Rule applies
Understand what comprises a breach, typically:
 Release of unencrypted information
 To unauthorized entity
 Containing prohibited identifiers OR
sufficient information to link a record to an identifiable person
Check for other requirements/restriction on data dissemination:
 Data provider restrictions and University approvals thereof
 Open data requirements and norms
 University information policy
35

Managing Confidential Data

v. 24 (January IAP
Session 1)
Preliminary Recommendation:
Choose the Lesser of Three Evils






(1) Use only information that has already been made public, is
entirely innocuous, or has been declared legally deidentified;
or
(2) Obtain informed consent from research subjects, at the
time of data collection, that includes acceptance of the
potential risks of disclosure of personally identifiable
information; or
(3) Pay close attention to the technical requirements imposed
by law:



Apply appropriate IT security controls
Apply disclosure analysis limitation for publicly released data






36

Consider using suppression and recoding to achieve k-anonymity with ldiversity on data before releasing it or generating detailed figures, maps, or
summary tables.
For interactive disclosure systems, consider differentially private algorithms

Supplement data sharing with data-use agreements.
Consider providing for access to full data through a virtual data
enclave/restricted environment
Managing Confidential Data
v. 24 (January IAP
Session 1)
Law, policy, ethics

Resources




Research design …

E.A. Bankert & R.J. Andur, 2006, Institutional Review Board: Management and Function,
Jones and Bartlett Publishers

Information security

Disclosure
limitation
Ohm, Paul. "Broken promises of privacy: Responding to the surprising failure of anonymization."
UCLA Law Review 57 (2010): 1701.



D. J. Mazur, 2007. Evaluating the Science and ethics of Research on Humans, Johns Hopkins
University Press



IRB: Ethics & Human Research [Journal], Hastings Press
www.thehastingscenter.org/Publications/IRB/



Journal of Empirical Research on Human Research Ethics, University of California Press
ucpressjournals.com/journal.asp?j=jer



201 CMR 17 text
www.mass.gov/Eoca/docs/idtheft/201CMR17amended.pdf



FERPA Website
www.ed.gov/policy/gen/guid/fpco/ferpa/index.html



HIPAA Website
www.hhs.gov/ocr/privacy/



Common Rule Website
www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm



State laws
www.ncsl.org/Default.aspx?TabId=13489



MIT Resources



Research approvals:

cuhs@fas.harvard.edu



37

Data-management planning: libraries.mit.edu/guides/subjects/data-management/

General policies:

http://web.mit.edu/policies/

Managing Confidential Data

v. 24 (January IAP
Session 1)
Research design, methods,
management

Law, policy, ethics
Research design
…
Information security




Information Lifecycle
Reducing risk








38

Disclosure
limitation

Sensitivity of information
Partitioning

Decreasing identification
Managing confidentiality and
dissemination
[Summary]

Managing Confidential Data

v. 24 (January IAP
Session 1)
What’s wrong with this picture?

Law, policy, ethics
Research design …
Information security
Disclosure
limitation

Name

SSN

Birthdate

Zipcode

Gender

Favorite
Ice Cream

# of crimes
committed

A. Jones

12341

01011961

02145

M

Raspberr
y

0

B. Jones

12342

02021961

02138

M

Pistachio

0

C. Jones

12343

11111972

94043

M

Chocolat
e

0

D. Jones

12344

12121972

94043

M

Hazelnut

0

E. Jones

12345

03251972

94041

F

Lemon

0

F. Jones

12346

03251972

02127

F

Lemon

1

G. Jones

12347

08081989

02138

F

Peach

1

H. Smith

12348

01011973

63200

F

Lime

2

I. Smith

12349

02021973

63300

M

Mango

4

J. Smith

12350

02021973

63400

M

Coconut

16

K. Smith

12351

03031974

64500

M

Frog

32

L. Smith

12352

04041974

64600

M

Vanilla

64

M. Smith

12353

04041974

64700

F

Pumpkin

128

N.
39

12354

04041974

64800
F
Allergic
256
Managing Confidential Data

Smi

v. 24 (January IAP
Session 1)
What’s wrong with this picture?

Law, policy, ethics
Research design …
Information security

Identifier

Sensitive
Private
Identifier

Private
Identifier

Sensitive

Identifier

Name

SSN

Birthdate

Zipcode

Gender

Favorite
Ice Cream

# of crimes
committed

A. Jones

12341

01011961

02145

M

Raspberr
y

0

B. Jones

12342

02021961

02138

M

Pistachio

0

C. Jones

12343

11111972

94043

M

Chocolat
e

0

D. Jones

12344

12121972

94043

M

Hazelnut

0

E. Jones

12345

03251972

94041

F

Lemon

0

F. Jones

12346

03251972

02127

F

Lemon

1

G. Jones

12347

08081989

02138

F

Peach

1

H. Smith

12348

01011973

63200

F

Lime

2

I. Smith

12349

02021973

63300

M

Mango

4

J. Smith

12350

02021973

63400

M

Coconut

16

K. Smith

12351

03031974

64500

M

Frog

32

L. Smith

12352

04041974

64600

M

Vanilla

64

M. Smith 12353
40
N. Smith 12354

04041974

64700

04041974

64800

Disclosure
limitation

F
Pumpkin
128
Managing Confidential Data
F
Allergic
256

Mass resident

Californian

Twins, separated at birth?

FERPA too?

Unexpected Response?

v. 24 (January IAP
Session 1)
Name/Description

Examples

Comparison case: Official Statistics

•
•

U.S. Census dissemination
European statistical agencies

•

Data Sharing Systems for Open Access
Journals
American Political Science Association
Data Access and Research Transparency
[DART] Policy Initiative

Well-resourced data collector summarizes
tables/relational data in the form of summary
statistics and contingency tables

Example Use Cases

Privacy-Aware Replicable Research
Scholarly journals adopting policies for deposit
and disposition of data for verification and
replication. How to balance privacy and
replicability without intensive review?

Long-term Longitudinal data Collection
Data collections tracking individual subjects (and
possibly friends and relations) over decades

Computational Social Science

•
•
•

National Longitudinal Study of Adolescent
Health (Add Health)
Framingham Heart Study
Panel Study of Income Dynamics

Analysis of …

• Netflix
• Facebook
• Hubway
• GPS
• Blogs
Privacy TOOLS FOR SHARING
RESEARCH DATA -- Policy Update

“Big” data. New forms and sources of data.
Cutting-edge analytical methods and algorithms.

41

•
Exemplar: Social Media
Attribute Type

Examples

Data: Structure

-

network

Data: Attribute Types

-

Continuous/Discrete/
Scale: ratio/interval/ordinal/nominal

Data: Performance
Characteristics

-

10M-1B observations
Sample from stream of continuously
updated corpus
Dozens of dimensions/measures

Measurement: Unit of
Observation

-

Individuals; Interactions

Measurement:
Measurement type

-

Observational

Measurement:
Performance characteristic

-

High volume
Complex network structure
Sparsity
Systematic and sparse metadata

Management Constraints

-

License; Replication

Analysis methods

-

Bespoke algorithms (clustering);
nonlinear optimization; Bayesian
methods

Desired Outputs

-

Summary scalars (model
coefficients)
Summary table
Static /interactive visualization

-

More Information
•

42

Grimmer, Justin, and Gary King. "General purpose computerassisted clustering and conceptualization." Proceedings of the
National Academy of Sciences 108.7 (2011): 2643-2650.
•
King, Gary, Jennifer Pan, and Molly Roberts. "How
censorship in China allows government criticism but silences
collective expression." APSA 2012 Annual Meeting Paper.
2012.
Privacy TOOLS FOR SHARING et al. "Life in the network: the coming age of
•
Lazer, David,
computational
RESEARCH DATA -- Policy Update social science." Science (New York, NY)
Research Information Life Cycle Model
Long-term
access

Creation/Coll
ection

Re-use

Storage
/Ingest

External
dissemination/public
ation

Processing

Analysis

43

Internal
Sharing

Managing Confidential Data

v. 24 (January IAP
Session 1)
Research Design Decisions Relevant to
Confidential Information


Collection:






Consent/licensing terms
Methods
Measures

Re-use

Storage





Longterm
access

Systems information
security
Data structures and
partitioning

Dissemination




Vetting
Disclosure limitation
Data use agreements
44

External
dissemination/pu
blication
Analysi
s

Managing Confidential Data

Creation/C
ollection

Researc
h
methods
Statistical /
Computational
Frameworks
Data
Management
Systems
Legal / Policy
Frameworks
∂∂

Storag
e/Inge
st

Processing

Internal
Sharing

v. 24 (January IAP
Session 1)
Law, policy, ethics

Trade-offs


Anonymity vs. research utility






Research design
…

Immediate utility
Replicability and reuse

Information security
Disclosure
limitation

Sensitivity vs. research utility
(Anonymity * Sensitivity) vs. research costs/efforts

45

Managing Confidential Data

v. 24 (January IAP
Session 1)
Types of Sensitive Information



Law, policy, ethics
Research design
…
Information security

Information is sensitive, if, once disclosed there
Disclosure
limitation
is a “significant” likelihood of harm
IRB literature suggests possible categories of
harm:









46

loss of insurability
loss of employability
criminal liability
psychological harm
social harm to a vulnerable group
loss of reputational harm
emotional harm
dignitary harm
physical harm: risk of disease, injury, or death
Managing Confidential Data

v. 24 (January IAP
Session 1)
Levels of sensitivity




No widely accepted scale
Publicly available data not sensitive under “common rule”
Common rule anchors scale at “minimal risk”:

Law, policy, ethics
Research design
…
Information security
Disclosure
limitation

“if disclosed, the probability and magnitude of harm or discomfort anticipated are not greater in and of
themselves than those ordinarily encountered in daily life or during the performance of routine
physical or psychological examinations or tests”


Example: Harvard Research Data Security Policy


Level 5- Extremely sensitive information about individually identifiable people.
Information that if exposed poses significant risk of serious harm.
Includes information posing serious risk of criminal liability, serious psychological harm or other
significant injury, loss of insurability or employability, or significant social harm to an individual or
group.



Level 4 - Very sensitive information about individually identifiable people
Information that if exposed poses a non-minimal risk of moderate harm.
Includes civil liability, moderate psychological harm, or material social harm to individuals or
groups, medical records not classified as Level 5, sensitive-but-unclassified national security
information, and financial identifiers (as per HRCI standards).



Level 3- Sensitive information about individually identifiable people
Information that if disclosed poses a significant risk of minor harm.
Includes information that would reasonably be expected to damage reputation or cause
embarrassment; and FERPA records.



Level 2 – Benign information about individually identifiable people
Information that would not be considered harmful, but that as to which a subject has been
promised confidentiality.



Level 1 – De-identified information about people, and information not about people

47

Managing Confidential Data

v. 24 (January IAP
Session 1)
Evaluating sensitivity


If “sensitive information” was disclosed
and linked to a person or entity






Law, policy, ethics
Research design
…
Information security
Disclosure
limitation

Whom would it harm?
How much would the harm be?
What would the likelihood of harm
manifesting?
An the individual at risk protect against or
mitigate harm?

48

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

IRB Review Scope


Research design
…

IRB approval needed for all:

Information security

federally-funded research;
Disclosure
limitation
 or any research at (almost all) institutions receiving federal funding
that involve “human subjects”.
( Any organization operating under a general “federal-wide
assurance”)
 All human subjects research at MIT
Research: "a systematic investigation, including research development,
testing and evaluation, designed to develop or contribute to generalizable
knowledge.”






Human subject:
“a living individual about whom an investigator (whether professional
or student) conducting research obtains:





Data through intervention or interaction with the individual, or
Identifiable private information.”

See

www.hhs.gov/ohrp/
49

Managing Confidential Data

v. 24 (January IAP
Session 1)
Research not requiring IRB approval




Non-research:
not generalizable knowledge & systematic inquiry
Non-funded:
institution receives no federal funds for research
Not human subject:





Law, policy, ethics
Research design
…
Information security
Disclosure
limitation

No living people described
Observation only AND no private identifiable information is obtained

Human Subjects, but “exempt” under 45 CFR 46







use of existing, publicly-available data
use of existing non-public data, if data is individuals cannot be directly or
indirectly identified
research conducted in educational settings, involving normal educational
practices
taste & food quality evaluation
federal program evaluation approved by agency head
observational, survey, test & interview of public officials and candidates
(in their formal capacity, or not identified)



Caution not all “exempt” is exempt…

Some research on prisoners, children, not exemptable

Some universities require review of “exempt” research



MIT requires review of all human subject research



See:
www.hhs.gov/ohrp/humansubjects/guidance/
decisioncharts.htm

50

Managing Confidential Data

v. 24 (January IAP
Session 1)
IRB’s and Confidential Information

Law, policy, ethics
Research design
…
Information security




Disclosure
limitation

IRB‟s review consent procedures and documentation
IRB‟s may review data management plans








May require procedures to minimize risk of disclosure
May require procedures to minimize harm resulting from
disclosure

IRB‟s make determination of sensitivity of information
-- potential harm resulting from disclosure
IRB‟s make determination regarding whether data is
de-identified for “public use”
[see NHRPAC,
“Recommendations on Public Use Data Files”]
51

Managing Confidential Data

v. 24 (January IAP
Session 1)
Human Subjects @MIT:
COUHES

Law, policy, ethics
Research design
…
Information security




The Institutional Review Board (IRB) must approve all human subjects research at MIT Disclosure
prior to
limitation
data collection or use
Research comprises




Research involves human subjects if:










Surveys
Behavioral experiments
Educational tests and evaluations
Analysis of identified private data collected from people
(your e-mail inbox, logs of web-browsing activity, facebook activity, ebay bids … )

The IRB will:






There is any interaction or intervention with living humans; or
If identifiable private data about living humans is used

Some examples of human subject research involving information




"a systematic investigation, including research development, testing and evaluation, designed to
develop or contribute to generalizable knowledge.”

Assess research protocol
Identify whether research is exempt from further review and management
Assess sensitivity of information

Does your research require review, a quick checklist:


52

http://web.mit.edu/committees/couhes/quickguide.shtml

Managing Confidential Data

v. 24 (January IAP
Session 1)
Recommended Planning
Practices

Law, policy, ethics
Research design
…
Information security
Disclosure
limitation

Identify potential type and magnitude of harm;
assess risk of harm, if disclosure were to occur
 Identify risk mitigation opportunities across
information lifecycle
 Plan for publication, data dissemination, research
replication, and reuse


53

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Reducing Risk in Data Collection

Research design
…
Information security



Disclosure

Avoid collecting sensitive information, unless it is limitation
required by research design, method, or hypothesis





Collect sensitive information in private settings





Unnecessary sensitive information  not minimal risk
Reducing sensitivity  higher participation, greater honesty
Reduces risk of disclosure
Increases participation

Reduce sensitivity through indirect measures


Less sensitive proxies







54

E.g. Implicit association test [Greenwald, et al. 1998]

Unfolding brackets
Group response collection
Random response technique [Warner 1965]
Item count/unmatched count/list experiment technique

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Reducing Risk in Data Collection

Research design
…
Information security



Disclosure

Avoid collecting sensitive information, unless it is limitation
required by research design, method, or hypothesis





Collect sensitive information in private settings





Unnecessary sensitive information  not minimal risk
Reducing sensitivity  higher participation, greater honesty
Reduces risk of disclosure
Increases participation

Reduce sensitivity through indirect measures


Less sensitive proxies







55

E.g. Implicit association test [Greenwald, et al. 1998]

Unfolding brackets
Group response collection
Random response technique [Warner 1965]
Item count/unmatched count/list experiment technique

Managing Confidential Data

v. 24 (January IAP
Session 1)
Randomized Response
Technique

Law, policy, ethics
Research design
…
Information security
Disclosure
limitation

Sensitiv
e
questio
n
Subjec
t rolls
a die >
2

Record
Answer

Say
“YES”

56

Managing Confidential Data

Variations:
- Ask two different
questions
- Item counts with
sensitive and nonsensitive items – eliminate
subject-randomization
- Regression analysis
methods

v. 24 (January IAP
Session 1)
Law, policy, ethics

Our Table – Less (?) Sensitive

Research design
…

Less (?)
Sensitive

Information security

Name

SSN

Birthdate

Zipcode

Gender

Favorite
Ice Cream

Treat?

#
acts
*

A. Jones

12341

01011961

02145

M

Raspberry

0

0

B. Jones

12342

02021961

02138

M

Pistachio

1

20

C. Jones

12343

11111972

94043

M

Chocolate

0

0

D. Jones

12344

12121972

94043

M

Hazelnut

1

12

E. Jones

12345

03251972

94041

F

Lemon

0

0

F. Jones

12346

03251972

02127

F

Lemon

1

7

G. Jones

12347

08081989

02138

F

Peach

0

1

H. Smith

12348

01011973

63200

F

Lime

1

17

I. Smith

12349

02021973

63300

M

Mango

0

4

J. Smith

12350

02021973

63400

M

Coconut

1

18

K. Smith

12351

03031974

64500

M

Frog

0

32

L. Smith

12352

04041974

64600

M

Vanilla

1

65

M. Smith

12353

04041974

64700

F

Pumpkin

0

128

N. Smith

12354

04041974

64800

F

Allergic

1

256

Disclosure
limitation

* Acts = crimes if treatment = 0; crimes + acts of generosity if treatment =1
57

Managing Confidential Data

v. 24 (January IAP
Session 1)
Randomized Response –
Pros and Cons







Can substantially reduce direct legal risks of disclosure
Can increase response rate
Can decrease mis-reporting

Disclosure
limitation

Warning!






Research design
…
Information security

Pros


Law, policy, ethics

None of the randomized models uses a formal measure of disclosure limitation
Some would clearly violate measures (such as differential privacy) -- we‟ll see in
section 4
Do not use as a replacement for disclosure limitation

Other Issues





Loss of statistical efficiency
(if compliance would otherwise be the same)
Complicates data analysis, especially model-based analysis
Leaving randomization up to subject can be unreliable
May provide less confidentiality protection if:






58

Randomization is incomplete
Records of randomization assignment are kept
Lists of responses overlap across questions
Sensitive question response is large enough to dominate overall response
Non-sensitive question responses are extremely predictable, or publicly observable

Managing Confidential Data

v. 24 (January IAP
Session 1)
Partitioning Information



Reduces risk in information management
Partition data information based on sensitivity













Research design
…
Information security
Disclosure
limitation

Identifying information
Descriptive information
Sensitive information
Other information

Segregate




Law, policy, ethics

Storage of information
Access regimes
Data collections channels
Data transmission channels

Plan to segregate as early as feasible in data collection
and processing
Link segregated information with artificial keys …
59

Managing Confidential Data

v. 24 (January IAP
Session 1)
Partitioned table
Not Identified
Name

SSN

Birthdate

Zipcode

Gender

LINK

LINK

Favorite
Ice Cream

Treat

#
act
s

A.
Jones

12341

0101196
1

02145

M

1401

1401

Raspberry

0

0

283

Pistachio

1

20

B.
Jones

12342

0202196
1

02138

M

283

8979

Chocolate

0

0

C.
Jones

12343

11111972

94043

M

8979

7023

Hazelnut

1

12

1498

Lemon

0

0

D.
Jones

12344

1036

Lemon

1

7

3864

Peach

0

1

E.
Jones

2124

Lime

1

17

F.

4339

Mango

0

4

6629

Coconut

1

18

9091

Frog

0

32

9918

Vanilla

1

65

4749

Pumpkin

0

12
8

8197

Allergic

1

25
6

1212197
2

94043

12345

0325197
2

94041

F

1498

12346

0325197
2

02127

F

1036

Jon
es
G.

12347
Jon
es

H. Smith
I. Smith
60

12348
12349

0808198
9

02138

0101197
3

63200

0202197
3

63300

M

F

F

7023

3864

2124

M
4339
Managing Confidential Data

v. 24 (January IAP
Session 1)
Managing Sensitive Data
Collection: Data Organization

Law, policy, ethics
Research design
…
Information security





Disclosure
limitation

Separate:
sensitive measures, (quasi)-identifiers, other
measures
If possible avoid storing identifiers with measures:





For sensitive data:





Collect identifying information beforehand
Assign opaque subject identifiers
Collect on-line directly (with appropriate protections); or
Encrypt collection devices/media (laptops, usb keys, etc)

For very/extremely sensitive data:



61

Collect with oversight directly; then
Store on encrypted device and;
Transfer to secure server as soon as feasible
Managing Confidential Data

v. 24 (January IAP
Session 1)
Choosing Linking Keys






Most resistant to relink
Mapping from original id to random keys is highly sensitive
Must keep and be able to access mapping to add new identified data
Most computer-generated random numbers are not sufficient by themselves















More troublesome to compute
Same id‟s + same key + same “salt” produces same values  facilitates merging
ID‟s can be recovered if key is exposed, cracked, or algorithm weak

Security is well understood
Tools available to compute
Same id‟s produce same hashes  easier to merge new identified data
ID‟s cannot be recovered from hash because hash loses information
ID‟s can be confirmed if identifying information is known or guessable

Cryptographic Hash + secret key









Most are PSEUDO random – predictable sequences
Use a cryptographic secure PRNG: Blum Blum Shub, AES (or other block cypher) in counter mode OR
Use real random numbers (e.g. from physical sources ) OR
Use a PRNG with a real random seed to random the order of the table; then another to generate the ID‟s for this randomly
ordered table

Cryptographic Hash
e.g. SHA-256




Disclosure
limitation

Encryption




Research design
…
Information security

Entirely randomized


Law, policy, ethics

Security is well understood
Tools available to compute
Same id‟s produce same hashes  easier to merge new identified data
ID‟s cannot be recovered from hash because hash loses information
ID‟s cannot be confirmed unless key is also known

Do not choose arbitrary mathematical functions of other identifiers!

62

Managing Confidential Data

v. 24 (January IAP
Session 1)
When is information anonymous?


Remove all HIPAA ID‟s







Research design
…
Information security
Disclosure
limitation

OR anonymous observation
only






Anonymous under HIPAA
Anonymous under Mass Law
Probably anonymous under
common rule, not guaranteed
But new/innovative forms of data
collection may pose anonymity
challenges

Law, policy, ethics

No interaction with subject
No collection of identifying data
Not “human subjects” under
common rule

OR determined by a statistical
expert to be anonymous
63

Managing Confidential Data

v. 24 (January IAP
Session 1)
Anonymous Data Collection:
Pros & Cons








Research design
…
Information security

Pros


Law, policy, ethics

Presumption that data is not identifiable
May increase participation
May increase honesty

Disclosure
limitation

Cons





64

Barrier to follow-up, longitudinal studies
Can conflict with quality control, validation
Data still may be indirectly identifiable if respondent
descriptive information is collected
Linking data to other sources of information may have
large research benefits

Managing Confidential Data

v. 24 (January IAP
Session 1)
Anonymous Data Collection Methods





Law, policy, ethics
Research design
…
Information security

Trusted third party intermediates
Disclosure
limitation
Respondent initiates re-contacts
No identifying information recorded
Use id‟s randomized to subjects, destroy mapping

65

Managing Confidential Data

v. 24 (January IAP
Session 1)
Field Data Collection Challenges






Whole-disk-encrypted laptop
Plus, Encrypted cloud backup solutions: CrashPlan, BackBlaze,
SpiderOak

Small data, short term




Encrypted network file transfer (e.g. SFTP, part of ssh )
Encrypted/tunneled network file system (e.g. Expandrive)

Where network connection is less reliable, high bandwidth




Research design
…

Information security
Where network connection is readily available,
Disclosure
easy to transfer as collected, or enter on remote hardened
limitation
system





Law, policy, ethics

Encrypted USB keys (e.g. w/IronKey, TrueCrypt, PGP)

Foreign Travel






66

Be aware of U.S. EAR export restrictions, use commercial or
widely-available open encryption software only. Do not use
bespoke software.
Be aware of country import restrictions (as of 2008): Burma,
Belarus, China, Hungary, Iran, Israel, Morocco, Russia, Saudi
Arabia, Tunisia, Ukraine
Encrypt data if possible, but don‟t break foreign laws. Check with
v. 24 (January IAP
department of state. Managing Confidential Data
Session 1)
Online/Electronic data collection
challenges












Vendor could retain IP addresses, identifying cookies, etc., even if not intended by researcher

Recommendation






Data collected from subjects from other states / countries could subject you to laws in that
jurisdiction
Jurisdiction may depend on residency of subject, availability of data collection instrument in
jurisdiction, or explicit data collections efforts within jurisdiction
For international data collection, be aware of EU Privacy Directive AND EU “Cookie” Directive
(EU 2009/136)

Vendor




Cookies provide a way to link data more easily
May or may not explicitly identify subject

Jurisdiction




Disclosure
IP addresses can be logged automatically by host, even if not intended by researcher
limitation
IP addresses can trivially be observed as data is collected
Partial ID numbers can be used for probabilistic geographical identification at sub-zipcode
levels

Cookies may be identifiers




Research design
…
Information security

IP addresses are identifiers


Law, policy, ethics

Use only vendors that certify compliance with your confidentiality policy
Do not retain IP numbers if data is being collected anonymously
Use SSL/TLS encryption unless data is non-sensitive and anonymous

Some tools for anonymizing IP addresses and system/network logs
www.caida.org/tools/taxonomy/anonymization.xml
67

Managing Confidential Data

v. 24 (January IAP
Session 1)
Cloud computing risks




Cloud computing decouples physical and computing
infrastructure
Increasingly used for core-IT, research computing, data
collection, storage, and analysis
Confidentiality issues









Law, policy, ethics
Research design …
Information security
Disclosure limitation

Auditing and compliance
Access and commingling of data
Location of data and services and legal jurisdiction
Cooperation with incident response
Cooperation with your legal team
Diligence in resisting government legal requests

Mitigation strategies


Careful review of TOS






Certification






FISMA
ISO
Audits

Contracting





Security standards
Breach procedures
Support/remedies

Contract riders
Master service level agreements

Encryption






68

Strong (256 bit AES)
Client-side
Verified
Keys never stored in cloud, never transmitted in clear
Keys escrowed within organization or w/organizational
representatives

Managing Confidential Data

v. 24 (January IAP
Session 1)
Agency Certificates, …


DHHS Certificate of Confidentiality








Research design
…
Information security
Disclosure
limitation

For NIH, CDC, FDA, …
Optional, but recommended if available
Protects against many types of forced legal disclosure of
confidential information
May not protect against all state disclosure law
Does not protect against voluntary disclosures by
researcher/research institutions

National Institute of Justice Privacy Certificate






Law, policy, ethics

Required to conduct research
Certifies PI‟s compliance to NIJ
NIJ Research is protected against some additional forms of
forced disclosure

Sworn Census Employee -- Title 13



69

Required for access to restricted Census Bureau
Extends Title-13 penalties and protections
Managing Confidential Data

v. 24 (January IAP
Session 1)
Confidentiality & Consent
Best practice is to describe in consent form...







Research design
…
Information security
Disclosure
limitation

Practices in place to protect confidentiality
Plans for making the data available, to whom, and under what
circumstances, rationale.
Limitations on confidentiality (e.g. limits to a certificate of
confidentiality under state law, planned voluntary disclosure)
Description of data reuses and benefits therof

Consent form should be consistent with your:






Law, policy, ethics

Data management plan
Data sharing plans and requirements
Terms of service for systems used in data storage and processing

Not generally best practice to promise





70

Unlimited confidentiality
Destruction of data
Expiration of data
Restriction of all data to original researchers

Managing Confidential Data

v. 24 (January IAP
Session 1)
Data Management Plan












Disclosure
limitation

Documentation
Backup and recovery
Review

Treatment of confidential information








Any NIH request over $500K
All NSF proposals after 12/31/2010
NIJ
Wellcome Trust
Any proposal where collected data will be a resource beyond the project

Safeguarding data during collection




Research design
…
Information security

When is it required?


Law, policy, ethics

Overview: http://www.icpsr.org/DATAPASS/pdf/confidentiality.pdf
Separation of identifying and sensitive information
Obtain certificate of confidentiality, other legal safeguards
De-identification and public use files

Dissemination






Archiving commitment (include letter of support)
Archiving timeline
Access procedures
Documentation
User vetting, tracking, and support

71

One size does not fit all projects.
Managing Confidential Data

v. 24 (January IAP
Session 1)
Data Management Plan
Outline

Law, policy, ethics

Research design
…
Information security











nature of data {generated, observed,
experimental information; amples; publications;
physical collections; software; models}
scale of data



Plans for depositing in an existing public
database



Access procedures





Naming conventions



Types of IP rights in data



Protections provided



Dispute resolution process

Quality Assurance [if not described in main
proposal]


Procedures for ensuring data quality in
collections, and expected measurement error

Timeframe for access



Cleaning and editing procedures

Technical access methods



Validation methods

Restrictions on access



Requirements for data destruction, if applicable



Procedures for long term preservation

Methods



Procedures

Institution responsible for long-term costs of
data preservation

Frequency





Replication

Succession plans for data should archiving
entity go out of existence



Existing Data [ If applicable ]

Version management
Recovery guarantees

description of existing data relevant to the
project
plans for integration with data collection





added value of collection, need to collect/create
new data

Security

Metadata and documentation


Metadata to be provided



Metadata standards used



Confidentiality concerns
Access control rules





Technical Controls



Storage format and archival justification

Procedural controls



Generation and dissemination formats and
procedural justification




Formats



Archiving and Preservation



Reasons not to share or reuse



Institutional requirements and plans to meet
them









Provider requirements and plans to meet them



Facilities









Potential scope or scale of use



Legal Requirements



Potential secondary users



Storage, backup, replication, and
versioning





Audience


Intellectual Property Rights
Entities who hold property rights







Cost of permanent archiving









File organization

Access charges



Data Organization [if complex]

Disclosure
limitation
Cost of preparing data and documentation

Budget



Embargo periods





Quality assurance procedures for metadata
and documentation





Access and Sharing


Planned documentation and supporting
materials



Data description



Ethics and privacy


Informed consent



Protection of privacy



Other ethical issues

Restrictions on use

Responsibility



Adherence


When will adherence to data management plan
be checked or demonstrated



Who is responsible for managing data in the
project



Who is responsible for checking adherence to
data management plan

Treatment of field notes, and collection records

72



Individual or project team role responsible for
data management

Managing Confidential Data

v. 24 (January IAP
Session 1)
Data Management Consulting:
Libraries @MIT


The libraries can help:





Assist with data management plans
Individual consultation/collaboration with researchers
General workshops & guides
Dissemination of public data through



DSpace@MIT
Referrals to subject-based repositories

For more information:
libraries.mit.edu/guides/subjects/data-management/
<data-management@mit.edu>

73

Managing Confidential Data

v. 24 (January IAP
Session 1)
Data Management Plans Examples
(Summaries)














Law, policy, ethics
Research design
…

Information security
Example 1
The proposed research will involve a small sample (less than 20 subjects) recruited from clinical facilities in the
Disclosure
New York City area with Williams syndrome. This rare craniofacial disorder is associated with distinguishing facial
limitation
features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if
not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical
data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting
subjects. Therefore, we are not planning to share the data.
Example 2
The proposed research will include data from approximately 500 subjects being screened for three bacterial
sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported
demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens
provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information.
Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains
the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and
associated documentation available to users only under a data-sharing agreement that provides for: (1) a
commitment to using the data only for research purposes and not to identify any individual participant; (2) a
commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or
returning the data after analyses are completed.
Example 3
This application requests support to collect public-use data from a survey of more than 22,000 Americans over the
age of 50 every 2 years. Data products from this study will be made available without cost to researchers and
analysts. https://ssl.isr.umich.edu/hrs/
User registration is required in order to access or download files. As part of the registration process, users must
agree to the conditions of use governing access to the public release data, including restrictions against attempting
to identify study participants, destruction of the data after analyses are completed, reporting responsibilities,
restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource.
Registered users will receive user support, as well as information related to errors in the data, future releases,
workshops, and publication lists. The information provided to users will not be used for commercial purposes, and
will not be redistributed to third parties.
FROM NIH, [grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex]

74

Managing Confidential Data

v. 24 (January IAP
Session 1)
External Data Usage
Agreements


Between provider and
individual






Between provider and
University





Careful – you‟re liable
University will not help if you
sign
University review and OSP
signature strongly
recommended

University Liable
Requires University Approved
Signer : OSP

Avoid nonstandard protections
whenever possible


75



Sample Standardized DUA
Response;
Controls on Confidential Information
Harvard University has developed extensive technical and administrative
procedures to be used with all identified personal information and other
confidential information. The University classifies this form of information internally
as "Harvard Confidential Information" -- or HCI.
Any use of HCI at Harvard includes the following safeguards:
- Systems security. Any system used to store HCI is subject to a checklist of
technical and procedural security measures including: operating system and
applications must be patched to current security levels, a host based firewall is
enabled, anti virus software is enabled and the definitions file is current
- Server security. Any server used to distribute HCI to other systems (e.g. through
providing remote file system), or otherwise offering login access, must employ
additional security measures including: connection through a private network only;
limitation on length of idle sessions; limitations on incorrect password attempts;
and additional logging and monitoring.
- Access restriction: an individual is allowed to access HCI only if there is a
specific need for access. All access to HCI is over physically controlled and/or
encrypted channels.
- Disposal processes: including secure file erasure and document destruction.
- Encryption: HCI must be strongly encrypted whenever it is transmitted across a
public network, stored on a laptop, or stored on a portable device such as a flash
drive or on portable media.
This is only a brief summary. The full University security policy can be found here:
http://security.harvard.edu/heisp
And a more detailed checklist used to verify systems compliance is found here:

http://security.harvard.edu/files/resources/forms/
DUA‟s can impose very specific
and detailed requirements
These safeguards are applied consistently throughout the university, we believe
that these requirements offer stringent protection for the requested data. And
Compatible in spirit does not
these requirements will be applied in addition to any others required by specific
apply compatibility in legal
data use agreement.
practice
Managing Confidential Data
v. 24 (January IAP
Session 1)
Use higher level controls,
Checklist: Research Design …

Law, policy, ethics
Research design
…
Information security










Disclosure
Does research involve human subjects?
limitation
What are possible harms that could occur if identified information was disclosed?
Is information collected benign, sensitive, very sensitive, or extremely sensitive? (IRB
makes final determination)
Can the sensitivity of the information be reduced?
Can research be carried out with anonymity?
Can research data be de-identified during collection?
How can identifying information, descriptive information and sensitive information be
segregated?
Have you:




Have you written/verified the following to be consistent with final plans for analysis and
dissemination:






Completed appropriate training

data management plan?
consent documents?
application for certificate of confidentiality?

If you are using third party/cloud storage or services on confidential information, have
you:





Reviewed the Terms of Service
Verified appropriate security certifications
Used strong client-side encryption
Planned to manage encryption-keys
76
Managing Confidential Data

v. 24 (January IAP
Session 1)
Preliminary Recommendations
Planning and methods


Review research design for sensitive identified information






Design research methods to reduce sensitivity











Separate sensitive/identifying information at collection, if feasible
Link separate files using cryptographic hash of identifiers plus secret key; or cryptographic-strength
random number

Incorporate extra protections for on-line data collection







Recognize benefits of data sharing
Ask for consent to share data appropriately
Apply for a certificate of confidentiality where data is very sensitive

Separate sensitive information




Eliminate sensitive/identifying information not needed for research questions
Consider randomized response, list experiment design

Design human subjects plan with information management in mind




Information which would cause harm if disclosed
HIPAA identifiers
Other indirectly identifying characteristics

Use vendor agreements that specify anonymity and confidentiality protections
Do not collect IP addresses if possible, regularly anonymize and purge otherwise
Restrict display of very sensitive information in user interfaces
Limit on-line collection of very sensitive information

Using cloud services


77

Use strong client-side encryption for confidential data

Managing Confidential Data

v. 24 (January IAP
Session 1)
Resources

Law, policy, ethics
Research design
…
Information security













Disclosure

E.A. Bankert & R.J. Andur, 2006, Institutional Review Board:
limitation
Management and Function,
Jones and Bartlett Publishers
R. Groves, et al., 2004, Survey Methodology, John Wiley &
sons.
J.A. Fox, P.E. Tracy, 1986, Randomized Response, Sage
Publications.
R.M. Lee, 1993, Doing Research on Sensitive Topics, Sage
Publications.
D. Corstange, 2009, "Sensitive Questions, Truthful Answers?
Modeling the List Experiment with LISTIT", Political Analysis
17:45–63
ICPSR Data Enclave
[www.icpsr.umich.edu/icpsrweb/ICPSR/access/restricted/encla
ve]
Murray Research Archive

[www.murray.harvard.edu]


NIH Certificate of Confidentiality Kiosk
Managing
[grants.nih.gov/grants/policy/coc] Confidential Data

78

v. 24 (January IAP
Session 1)
Information Security

Law, policy, ethics
Research design …
Information security
Disclosure limitation







79

Security principles
FISMA
Categories of technical
controls
A simplified approach
[Summary]

Managing Confidential Data

v. 24 (January IAP
Session 1)
Core Information Security Concepts

Law, policy, ethics
Research design …
Information security



Security properties









Disclosure limitation

Confidentiality
Integrity
Availability
[Authenticity]
[Nonrepudiation]

Security practices





80

Defense in depth
Threat modeling
Risk assessment
Vulnerability assessment

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Risk Assessment

Research design …
Information security



[NIST 800-100, simplification of NIST 800-30]Disclosure limitation
Threat
Modeling
Analysis
- likelihood

System
Analysis

- impact
- mitigating controls

Vulnerability
Identification

81

Institute
Selected
Controls

Testing and
Auditing

Information Security Control Selection Process

Managing Confidential Data

v. 24 (January IAP
Session 1)
Risk Management Details

Law, policy, ethics
Research design …
Information security











System Characterization
Threat Identification
Control analysis
Likelihood determination
Impact Analysis
Risk Determination
Control recommendation
Results documentation

82

Managing Confidential Data

Disclosure limitation

v. 24 (January IAP
Session 1)
Classes of threats and vulnerabilities

Law, policy, ethics
Research design …
Information security



Sources of threat






Disclosure limitation

Natural
Unintentional Human
Intentional

Areas of vulnerability


Logical






Physical






Computer systems
Network
Backups, disposal, media

Social





83

Data at rest in system
Data in motion across networks
Data being processed in applications

Social engineering
Mistakes
Insider threats

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Simple Control Model

Information security

Access Control

Request/Resp
onse

Disclosure limitation

Resource
Auditing

Authorization

Credential
s

Authentication

Client

Research design …

Log

Resource Control Model
External Auditor

84

Managing Confidential Data

v. 24 (January IAP
Session 1)
Operational and Technical
Controls [NIST 800-53]

Law, policy, ethics
Research design …
Information security



Operational














Disclosure limitation

Personnel security
Physical and environmental protection
Contingency planning
Configuration management
Maintenance
System and information integrity
Media protection
Incident Response
Awareness and training

Technical Controls





85

Identification and authentication
Access control
Audit and accountability
System and communication protection

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Key Information Security Standards


Information security
Disclosure limitation

Comprehensive Information Security Standards






Research design …

FISMA – framework for non-classified information security
in federal government.
ISO/IEC 27002 – framework of similar scope to FISMA, used
internationally
PCI – Payment card industry security standards. Used by major
payment card companies, processors, etc.

Related Certifications


FIPS-compliance and certification





SAEE 16/SAS 70 Audits – Type 2





Independent audit of controls and control objectives
Does not establish sufficiency of control objectives

CISSP -- Certified Information Systems Security Professional


86

Establishes standards for cryptographic methods and modules
Be aware that FIPS-certification often limited to algorithm used,
and not entire system

Widely recognized certification for information security
professionals
Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

FISMA Overview

Research design …
Information security

Disclosure limitation
Federal Information Security Management Act of 2002
 All federal agencies required to develop agency-wide
information security plan
 NIST published extensive list of recommendations
 Federal sponsors seem to be trending to FISMA as
best practice for managing confidential data
produced by award
 Identifies risk and impact level; monitoring;
technical and procedural controls
 MIT typical controls:
less than FISMA “low”

87

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Security Control Baselines

Research design …
Information security
Disclosure limitation

Access Control
Low (impact)

Medium-High (impact), adds…

Policies; Account management *; Access
Enforcement; Unsuccessful Login Attempts;
System Use Notification; Restrict Anonymous
Access*; Restrict Remote Access*; Restrict
Wireless Access*; Restrict Mobile Devices*;
Restrict use of External Information Systems*;
Restrict Publicly Accessible Content

Information flow enforcement; Separation of
Duties; Least Privilege; Session Lock

Security Awareness and Training
Policies; Awareness; Training; Training Records

Audit and Accountability
Policies; Auditable Events *; Content of Audit
Records *; Storage Capacity; Audit Review,
Analysis and Reporting *; Time Stamps *;
Protection of Audit Information; Audit Record
Retention; Audit Generation

Audit Reduction; Non-Repudiation

Security Assessment and Authorization
Policies; Assessments* ; System Connections;
Planning; Authorization; Continuous Monitoring
88
Managing Confidential Data

v. 24 (January IAP
Session 1)
Security Control Baselines

Law, policy, ethics
Research design …
Information security
Disclosure limitation

Configuration Management
Low (impact)

Medium-High (impact), adds…

Policies; Baseline*; Impact Analysis; Settings*;
Least Functionality; Component Inventory*

Change Control; Access Restrictions for Change;
Configuration Management Plan

Contingency Planning
Policies; Plan * ; Training *; Plan Testing*;
System backup*; Recovery & Reconstitution *

Alternate storage site; Alternate processing site;
Telecomm

Identification and Authentication
Policies; Organizational Users*; Identifier
Management; Authenticator Management *;
Authenticator Feedback; Cryptographic Module
Authentication; Non-Organizational Users

Device identification and authentication

Incident Response
Policies; Training; Handling *; Monitoring;
Reporting*; Response Assistance; Response
Plan

Testing

Maintenance
Policies; Control*; Non-Local Maintenance
Tools; Maintenance scheduling/timeliness
Restrictions*; Personnel Restrictions*
89
Managing Confidential Data
v. 24 (January IAP
Session 1)
Security Control Baselines

Law, policy, ethics
Research design …
Information security
Disclosure limitation

Media Protection
Low (impact)

Medium-High (impact), adds…

Policies; Access restrictions*; Sanitization

Marking; Storage; Transport

Physical and Environmental Protection
Policies; Access Authorizations; Access Control*;
Monitoring*; Visitor Control *; Records*;
Emergency Lighting; Fire protection*;
Temperature, Humidity, water damage*; Delivery
and removal

Network access control; Output device Access
control; Power equipment access, shutoff,
backup; Alternate work site; Location of
information system components; information
leakage

Planning
Policies, Plan, Rules of Behavior; Privacy Impact
Assessment

Activity planning

Personnel Security
Policies; Position categorization; Screening;
Termination; Transfer; Access Agreements; ThirdParties; Sanctions

Risk Assessment
Policies; Categorization Assessment; Vulnerability
Scanning*
90
Managing Confidential Data

v. 24 (January IAP
Session 1)
Security Control Baselines

Law, policy, ethics
Research design …
Information security
Disclosure limitation

System and Services Acquisition
Low (impact)

Medium-High (impact), adds…

Policies; Resource Allocation; Life Cycle Support;
Acquisition*; Documentation; Software usage
restrictions; User installed software restrictions;
External information System Services restrictions

Security Engineering; Developer configuration
management; Developer security testing; supply
chain protection; Trustworthiness

System and Communications Protection
Policies; Denial of Service Protection; Boundary
protection*; Cryptographic key Management;
Encryption; Public Access Protection;
Collaborative computing devices restriction;
Secure Name resolution*

Application Partitioning; Restrictions on Shared
Resources; Transmission integrity &
confidentiality; Network Disconnection Procedure;
Public Key Infrastructure Certificates; Mobile
Code management; VOIP management; Session
authenticity; Fail in known state; Protection of
information at rest; Information system partitioning

System and Information Integrity
Policies, Flaw remediation*; Malicious code
protection*; Security Advisory monitoring*;
Information output handling

Information system monitoring; Software and
information integrity; Spam protection; Information
input restrictions & validation; Error handling

Program Management
Plan; Security Officer Role; Resources; Inventory; Performance Measures; Enterprise architecture;
Risk management strategy; Authorization process; Mission definition
91
Managing Confidential Data
v. 24 (January IAP
Session 1)
HIPAA Requirements


Administrative controls









Physical controls








Access authorization, establishment, modification, and termination.
Training program
Vendor compliance
Disaster recovery
Internal audits
Breach procedures
Disposal
Access to equipment
Access to physical environment
Workstation environment

Technical controls







92

Intrusion protection
Network encryption
Integrity checking
Authentication of communication
Configuration management
Risk analysis

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Delegating Systems Security

Research design …
Information security
Disclosure limitation







What are goals for confidentiality, integrity, availability?
What threats are envisioned?
What controls are in place?
Is there a checklist?
Who is responsible for technical controls?




Who is responsible for procedural controls?




Have they received appropriate training?

How is security monitored, audited, and tested?




Do they have appropriate training, experience and/or
certification?

E.g. SSAE 16/SAS Type -2 Audits; FISMA Compliance; ISO
Certification

What security standards are referenced?

93

E.g. FISMA, ISO, HEISP/HDRSP/PCI
Managing Confidential Data

v. 24 (January IAP
Session 1)
What most security plans do not do

Law, policy, ethics
Research design …
Information security







Disclosure limitation
Protect against all insider threats
Protect against all unintentional threats (human
error, voluntary disclosure)
Protect against the NSA, CIA, TEMPEST, evil maids,
and other well-resourced, sophisticated adversaries
Protect against prolonged physical threats to
computer equipment, or data owner

94

Managing Confidential Data

v. 24 (January IAP
Session 1)
Information Security is Systemic

Law, policy, ethics
Research design …
Information security

Disclosure limitation
Not just control implementation but…
 Policy creation, maintenance, auditing
 Implementation review, auditing, logging, monitoring
 Regular vulnerability & threat assessment

95

Managing Confidential Data

v. 24 (January IAP
Session 1)
Simplified Approach for Sensitive Data

Law, policy, ethics
Research design …
Information security
Disclosure limitation









Systematically identify confidential information
Use whole-disk/media encryption to protect data at
rest
Use end-to-end encryption to protect data in motion
Use core information hygiene to protect systems
Scan for HRCI regularly
Be thorough in disposal of information

Very sensitive/extremely sensitive data requires
more protection.

96

Managing Confidential Data

v. 24 (January IAP
Session 1)
Recommendation:
Law, policy, ethics
Identifying Confidential Information Research design …
Information security



Does it contain social security numbers, financial Disclosure limitation
id‟s,
credit card #‟s, driver‟s license #‟s, other federal id‟s?




Is information completely anonymous?




 Not confidential

Does it contain private information from research on
people?




 Treat as confidential

 Treat as confidential

Does it contain private information about
named/identified students that does not appear in the
directory?

 Treat as confidential
 Would its release put individuals or organizations at significant
risk of criminal or civil liability, or be damaging to financial
standing, employability, or reputation?
  Treat as confidential
97
Managing Confidential Data
v. 24 (January IAP


Session 1)
Recommended:
Core information hygiene


Computer setup








Disclosure limitation

And keep it updated

Install all operating system and application security updates

Server Setup







Information security

Use a host-based firewall
Strong credentials”
Use a locking screen-saver
Lock default/open accounts
Regularly scan for sensitive information
Update your software regularly




Research design …

Use a virus checker




Law, policy, ethics

Password guessing restrictions
Idle session locking (or used on all client)
No password retrieval
Keep access logs

Behavior


Don‟t share accounts or passwords



Don‟t use administrative accounts all the time
Don‟t run programs from untrusted sources
Don‟t give out your password to anyone
Have a process for revoking user access when no longer
needed/authorized (e.g. if user leaves university)
Documented breach reporting procedure
Users should have appropriate confidentiality training








98

Managing Confidential Data

v. 24 (January IAP
Session 1)
Recommendation: Use
Strong Credentials









These accounts should use other brute-force limiting
techniques
Much longer for protecting encrypted files/filesystems
(16 characters)

Not be common words, names, or anything easily
guessed, No dates, no long sequences (>4) of
numbers
Not shared across accounts or people
Never stored unencrypted

Better



Lots of entropy, Hard for a person to guess, Easy to
remember
One method for choosing good passwords…




Use complete sentences phrases with punctuation
AND abbreviate
AND substitute characters – but not schematically

Disclosure limitation

Store passwords in a manner that can‟t be
retrieved
Store hashes
Force password change of automatically
generated passwords
Protect against password guessing
Brute-force guessing of a file of 8 charact
alphanumeric passwords takes less than $
dollars of compute time!
(12 character complex passwords >
$1,000,000) [Campbell 2009]
Protect encrypted password files
Monitor failed login attempts

Even better


Use multifactor identification where possible




AND careful thought to recovery mechanism, should
additional factor be lost

Use PASSWORD manager software





Information security

At least 8 characters long for login/account passwords Servers and Applications




Research design …

Strong Passwords, at minimum




Law, policy, ethics

Allows for long random passwords, escrow keys etc
Be very careful with the master

Key length



99

Private key encryption >= 256 bits
Public key >= 2048 bits

Managing Confidential Data

v. 23 (7/18/2013)
Recommended: Safe
storage

Law, policy, ethics
Research design …
Information security





Extremely sensitive
information/ must be
stored only in carefully
hardened designated
systems
Sensitiveinformation may
be stored on:




Managed File Services OR
Other approved server OR
On desktop/laptop/USB
if…



100

Disclosure limitation

Encrypted
Good information hygiene
Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Recommended: Safe use information Research design …
Information security
Disclosure limitation



Who can use Confidential Information?


Education records




MIT business related sensitive informationI








Sharing must be approved by appropriate head of lab,
etc.

Research confidential information




Can be shared with other staff & faculty who have a
specific need

Sharing must be approved by PI of research project.
PI must have data sharing policy approved by IRB.

It is your responsibility to notify the recipient of
their responsibility to protect confidentiality.

What to do?




101

Practice good information hygiene
Always Encrypt
Scan for vulnerabilities and sensitive information
Use Strong Passwords
Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Use encryption

Research design …
Information security





Encrypt HCI files if they are not stored
on an approved server
Encrypt whole disks for all laptops
and portable storage”
Encrypt network transmissions:







Disclosure limitation

Presenting your password
Transmitting sensitive files
Remotely accessing confidential
information
Do not send unencrypted files through
mail – even if they are “passworded”

Preferred tools


Remote access generally:





VPN
ssh

File transfer




102

Winzip (must enable encryption)
Secure FTP (Secure FX, Expandrive, etc.)
PGP Encrypted mail
Managing Confidential Data

v. 24 (January IAP
Session 1)
Safe disposal

Law, policy, ethics
Research design …
Information security



Shred Paper, tapes, cd‟s:





Use a cross-cut shredder
Place in Harvard-designated, locked, shredder
bin

Erase confidential files when they are no
longer needed









Disclosure limitation

Information can hide in files, even if not visible
Unless encrypted filesystem was used,
unerased information is vulnerable
Even where encrypted filesystem used,
vulnerabilities heightened if key length is short
Files can remain on disk, even if not visible
Use an approved secure file eraser such as
PGP SHRED

Ask IT support when transferring, donating, or
disposing of computers or portable storage


Information can hide in obscure places on an a
computer.



Files on SSD may not be erased until later…
Information on standard disks may be preserved in
“bad blocks”, etc.

IT Support has special equipment to securely
and complete erase disks.

103 Computers can then be safely reused.
Managing Confidential Data


v. 24 (January IAP
Session 1)
Safe Sharing
One size doesn’t fit all…

Law, policy, ethics
Research design …
Information security
Disclosure limitation

External dissemination
Approaches:
 De-identify + publish
 Partial de-identify + DUA
 Deposit in “virtual enclave”
Trade-offs
 Decreasing breadth of
dissemination
 Decreasing risk, increasing utility
 Increasing continuing
104 maintenance costs
Managing Confidential Data

v. 24 (January IAP
Session 1)
Safe Sharing
Collaboration
 Institutionally Hosted Trusted Remote Computing
Environment








Information security
Disclosure limitation

Example: RCE
Limitations: Multi-institutional Collaboration
Example: AWS for Gov – FISMA Container
Limitations: IT deployment and ongoing maintenance costs;
cloud services due diligence for certification, contract, breach
response, etc

Non-encrypted transmission + local encryption layer







Research design …

DIY Trusted Remote Environment




Law, policy, ethics

Encrypted Files
{Winzip, PGP file, XXX} + {DropBox, Box, Google-Drive, E-mail}
Encrypted Filesystem:
{Box, DropBox, XXX} + {FileVault,TrueCrypt}
Encrypted P2P : Bittorrent Sync
Limitations: Key distribution; access revocation; key sharing,
simultaneous updating (except for P2P)

Zero-Knowledge Magic


105

Spider-Oak Hive; BoxCryptor; Sync.Com
Limitations: enterprise key management features; lack of24 (January IAP
Managing Confidential Data
v.
transparency/auditing/certification
Session 1)
Law, policy, ethics

Plan Outline – Very Sensitive Data

Research design …
Information security



Protect very sensitive data on “target systems”


Extra physical, logical, administrative access control








Private network, not directly connected to public network

Regular scans





Record keeping
Limitations
Lockouts

Extra monitoring, auditing
Extra procedural controls – specific, renewed approvals
Limits on network connectivity




Disclosure limitation

Vulnerability scans
Scans for PII

Extremely sensitive



106

Increased access control, procedural limitations
Not physically/logically connected (even via wireless) to public
network, directly or indirectly
Managing Confidential Data

v. 24 (January IAP
Session 1)
Key Concepts Review

Law, policy, ethics
Research design …
Information security
Disclosure limitation













Confidentiality
Integrity
Availability
Threat modeling
Vulnerability assessment
Risk assessment
Defense in depth
Logical Controls
Physical Controls
Administrative Controls

107

Managing Confidential Data

v. 24 (January IAP
Session 1)
Checklist: Identify Requirements

Law, policy, ethics
Research design …
Information security



Documented information security plan?














Extra logical, administrative, physical controls for very sensitive data?
Monitoring and vulnerability scanning for very sensitive data?
Check requirements for remote and foreign data collection

Refer to security standards







Use whole-disk/media encryption to protect data at rest
Use end-to-end encryption to protect data in motion
Use basic information hygiene to protect systems
Be thorough in disposal of information
Use strong credentials

Additional protections for sensitive data




What are goals for confidentiality, integrity, availability?
What threats are envisioned?
What are the broad types of controls in place?

Key protections




Disclosure limitation

FIPS encryption
FISMA / ISO practices
SAEE -16/SAS-70 Auditing
CISSP certification of key staff

Delegate implementation to information security professionals

108

Managing Confidential Data

v. 24 (January IAP
Session 1)
Preliminary Recommendations
Information Security



Use FISMA as a reference/dictionary of possible baseline controls
Document:







Delegate implementation to IT professionals
Refer to standards




Protection goals
Threat models
Types of controls

Gold standards: FISMA / ISO practices,

SAS-70 Auditing, CISSP certification of key staff

Strongly recommended controls




Use whole-disk/media encryption to protect data at rest
Use end-to-end encryption to protect data in motion
Use core information hygiene to protect systems



Use a virus checker, and keep it updated
Use a host-based firewall
Update your software regularly
Install all operating system and application security updates



Don‟t share accounts or passwords



Don‟t use administrative accounts all the time
Don‟t run programs from untrusted sources
Don‟t give out your password to anyone











Scan for sensitive informaton regularly
Be thorough in disposal of information



109

Use secure file erase tools when disposing of files
Use secure disk erase tools when disposing/repurposing disks

Managing Confidential Data

v. 24 (January IAP
Session 1)
Preliminary Recommendations
Very Sensitive/Extremely Sensitive Information security


Protect very sensitive data on “target systems”


Extra physical, logical, administrative access control








Extra monitoring, auditing
Extra procedural controls – specific, renewed approvals
Limits on network connectivity






Private network, not directly connected to public network

Consider multi-factor identification

Regular scans





Record keeping
Limitations
Lockouts

Vulnerability scans
Scans for PII

Extremely sensitive



110

Increased access control, procedural limitations
Not physically/logically connected (even via wireless) to public
network, directly or indirectly
Managing Confidential Data

v. 24 (January IAP
Session 1)
Resources

Law, policy, ethics
Research design …
Information security
Disclosure limitation

S. Garfinkel, et al. 2003, Practical Unix and Internet
Security, 3rd ed. , O‟Reilly Media
 Shon Harris, 2001, CISSP All-in-One Exam Guide,
Osborne
 NIST, 2009, DRAFT Guide to Protecting the
Confidentiality of Personally Identifiable Information,
Nist Publication 800-122.
 NIST, 2009, Recommended Security Controls for
Federal Information Systems and Organizations v. 3,
NIST 800-53.
(Also see related NIST 800-53A, and other NIST
Computer Security Division Special Publications)
[csrc.nist.gov/publications/PubsSPs.html]
 NIST, 2006, Information Security Handbook: A Guide
111
Managing Confidential 800-100.
v. 24 (January IAP
for Managers, NIST PublicationData
Session 1)

Recommended Software

Law, policy, ethics
Research design …
Information security















Vulnerability scanner/assessment tool: www.nessus.org/nessus
Commercial version scans for (limited) PII: www.nessus.org/nessus
PII Scanning tool (open source), Cornell Spider: www2.cit.cornell.edu/security/tools
PII Scanning tool (commercial), Identity Finder: www.identityfinder.com
File integrity/intrusion detection engine, Samhain: la-samhna.de/samhain
Network intrusion detection, Snort: www.snort.org

Encrypt transmission over network






Open Source: truecrypt.org
Commercial: pgp.com

Scanning




Disclosure limitation

Whole Disk Encryption

Open SSL: http://openssl.org
Open SSH: http://openssh.org
VTUN: http://vtun.sourceforge.net

Cloud backup services with encryption




112

Crashplan: http://crashplan.com
Spider oak: http://spideroak.com
Backblaze: http://backblaze.com

Managing Confidential Data

v. 24 (January IAP
Session 1)
Disclosure Limitation

Law, policy, ethics
Research design …
Information security










113

Disclosure
limitation

Threat models
Disclosure limitation methods
Statistical disclosure limitation
methods
Types of disclosure
Factors affecting disclosure protection
SDL Caveats
SDL Observations

Managing Confidential Data

v. 24 (January IAP
Session 1)
Law, policy, ethics

Threat Models

Research design …
Information security







Nosy neighbor (nosy employer)
Muck-raking Journalist (zero-tolerance)
Business rival contributing to same survey
Absent-minded professor
…

114

Managing Confidential Data

Disclosure
limitation

v. 24 (January IAP
Session 1)
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data
Managing confidential data

Weitere ähnliche Inhalte

Was ist angesagt?

cyber security presentation.pptx
cyber security presentation.pptxcyber security presentation.pptx
cyber security presentation.pptxkishore golla
 
Introduction to Data Protection and Information Security
Introduction to Data Protection and Information SecurityIntroduction to Data Protection and Information Security
Introduction to Data Protection and Information SecurityJisc Scotland
 
“Privacy Today” Slide Presentation
“Privacy Today” Slide Presentation “Privacy Today” Slide Presentation
“Privacy Today” Slide Presentation tomasztopa
 
what is data security full ppt
what is data security full pptwhat is data security full ppt
what is data security full pptShahbaz Khan
 
Data Protection (Download for slideshow)
Data Protection (Download for slideshow)Data Protection (Download for slideshow)
Data Protection (Download for slideshow)Andrew Sharpe
 
Privacy & Data Protection
Privacy & Data ProtectionPrivacy & Data Protection
Privacy & Data Protectionsp_krishna
 
Privacy and personal information
Privacy and personal informationPrivacy and personal information
Privacy and personal informationUc Man
 
The CIA triad.pptx
The CIA triad.pptxThe CIA triad.pptx
The CIA triad.pptxGulnurAzat
 
Information Management Life Cycle
Information Management Life CycleInformation Management Life Cycle
Information Management Life CycleCollabor8now Ltd
 
Employee Security Awareness Training
Employee Security Awareness TrainingEmployee Security Awareness Training
Employee Security Awareness TrainingDenis kisina
 
GDPR Basics - General Data Protection Regulation
GDPR Basics - General Data Protection RegulationGDPR Basics - General Data Protection Regulation
GDPR Basics - General Data Protection RegulationVicky Dallas
 
02 Legal, Ethical, and Professional Issues in Information Security
02 Legal, Ethical, and Professional Issues in Information Security02 Legal, Ethical, and Professional Issues in Information Security
02 Legal, Ethical, and Professional Issues in Information Securitysappingtonkr
 

Was ist angesagt? (20)

cyber security presentation.pptx
cyber security presentation.pptxcyber security presentation.pptx
cyber security presentation.pptx
 
Introduction to Data Protection and Information Security
Introduction to Data Protection and Information SecurityIntroduction to Data Protection and Information Security
Introduction to Data Protection and Information Security
 
“Privacy Today” Slide Presentation
“Privacy Today” Slide Presentation “Privacy Today” Slide Presentation
“Privacy Today” Slide Presentation
 
what is data security full ppt
what is data security full pptwhat is data security full ppt
what is data security full ppt
 
Data Protection (Download for slideshow)
Data Protection (Download for slideshow)Data Protection (Download for slideshow)
Data Protection (Download for slideshow)
 
Introduction to security
Introduction to securityIntroduction to security
Introduction to security
 
Data Protection Presentation
Data Protection PresentationData Protection Presentation
Data Protection Presentation
 
Privacy & Data Protection
Privacy & Data ProtectionPrivacy & Data Protection
Privacy & Data Protection
 
Privacy and personal information
Privacy and personal informationPrivacy and personal information
Privacy and personal information
 
Overview on data privacy
Overview on data privacy Overview on data privacy
Overview on data privacy
 
The CIA triad.pptx
The CIA triad.pptxThe CIA triad.pptx
The CIA triad.pptx
 
Information Management Life Cycle
Information Management Life CycleInformation Management Life Cycle
Information Management Life Cycle
 
Web Security
Web SecurityWeb Security
Web Security
 
Insider threat
Insider threatInsider threat
Insider threat
 
Employee Security Awareness Training
Employee Security Awareness TrainingEmployee Security Awareness Training
Employee Security Awareness Training
 
GDPR Basics - General Data Protection Regulation
GDPR Basics - General Data Protection RegulationGDPR Basics - General Data Protection Regulation
GDPR Basics - General Data Protection Regulation
 
02 Legal, Ethical, and Professional Issues in Information Security
02 Legal, Ethical, and Professional Issues in Information Security02 Legal, Ethical, and Professional Issues in Information Security
02 Legal, Ethical, and Professional Issues in Information Security
 
Cybersecurity
CybersecurityCybersecurity
Cybersecurity
 
Gdpr for dummies
Gdpr for dummiesGdpr for dummies
Gdpr for dummies
 
Information Governance
Information GovernanceInformation Governance
Information Governance
 

Andere mochten auch

July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...Micah Altman
 
Top 10 personnel specialist interview questions and answers
Top 10 personnel specialist interview questions and answersTop 10 personnel specialist interview questions and answers
Top 10 personnel specialist interview questions and answersjomdare
 
Project management roles in context - PRINCE2 introduction
Project management roles in context - PRINCE2 introductionProject management roles in context - PRINCE2 introduction
Project management roles in context - PRINCE2 introductionNorthumbria University
 
How to find the Right Mobile and Cloud Application Development Company for Yo...
How to find the Right Mobile and Cloud Application Development Company for Yo...How to find the Right Mobile and Cloud Application Development Company for Yo...
How to find the Right Mobile and Cloud Application Development Company for Yo...MageCloud
 
Legal Issues In Human Resources
Legal Issues In Human ResourcesLegal Issues In Human Resources
Legal Issues In Human ResourcesHeather Reynolds
 
Human Resource Management : Constitutional and Legal Framework
Human Resource Management : Constitutional and Legal FrameworkHuman Resource Management : Constitutional and Legal Framework
Human Resource Management : Constitutional and Legal FrameworkJohn Edward Estayo
 
ITIL foundations - Complete introduction to ITIL phases, lifecycle and processes
ITIL foundations - Complete introduction to ITIL phases, lifecycle and processesITIL foundations - Complete introduction to ITIL phases, lifecycle and processes
ITIL foundations - Complete introduction to ITIL phases, lifecycle and processesRichard Grieman
 
Agile and ITIL Continuous Delivery
Agile and ITIL Continuous DeliveryAgile and ITIL Continuous Delivery
Agile and ITIL Continuous DeliveryMartin Jackson
 
PRINCE2 Foundation Training Manual by Frank Turley
PRINCE2 Foundation Training Manual by Frank TurleyPRINCE2 Foundation Training Manual by Frank Turley
PRINCE2 Foundation Training Manual by Frank TurleyFrank Turley
 
Business Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesBusiness Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesAlan McSweeney
 
Disaster Recovery Presentation
Disaster Recovery PresentationDisaster Recovery Presentation
Disaster Recovery PresentationTimSchaefer
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IThhuihhui
 
Data security in cloud computing
Data security in cloud computingData security in cloud computing
Data security in cloud computingPrince Chandu
 
Change Management PPT Slides
Change Management PPT SlidesChange Management PPT Slides
Change Management PPT SlidesYodhia Antariksa
 

Andere mochten auch (17)

July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
 
Top 10 personnel specialist interview questions and answers
Top 10 personnel specialist interview questions and answersTop 10 personnel specialist interview questions and answers
Top 10 personnel specialist interview questions and answers
 
Project management roles in context - PRINCE2 introduction
Project management roles in context - PRINCE2 introductionProject management roles in context - PRINCE2 introduction
Project management roles in context - PRINCE2 introduction
 
How to find the Right Mobile and Cloud Application Development Company for Yo...
How to find the Right Mobile and Cloud Application Development Company for Yo...How to find the Right Mobile and Cloud Application Development Company for Yo...
How to find the Right Mobile and Cloud Application Development Company for Yo...
 
Legal Issues In Human Resources
Legal Issues In Human ResourcesLegal Issues In Human Resources
Legal Issues In Human Resources
 
AXELOS - PRINCE2® Foundation
AXELOS - PRINCE2® FoundationAXELOS - PRINCE2® Foundation
AXELOS - PRINCE2® Foundation
 
Human Resource Management : Constitutional and Legal Framework
Human Resource Management : Constitutional and Legal FrameworkHuman Resource Management : Constitutional and Legal Framework
Human Resource Management : Constitutional and Legal Framework
 
ITIL foundations - Complete introduction to ITIL phases, lifecycle and processes
ITIL foundations - Complete introduction to ITIL phases, lifecycle and processesITIL foundations - Complete introduction to ITIL phases, lifecycle and processes
ITIL foundations - Complete introduction to ITIL phases, lifecycle and processes
 
Agile and ITIL Continuous Delivery
Agile and ITIL Continuous DeliveryAgile and ITIL Continuous Delivery
Agile and ITIL Continuous Delivery
 
PRINCE2 Foundation Training Manual by Frank Turley
PRINCE2 Foundation Training Manual by Frank TurleyPRINCE2 Foundation Training Manual by Frank Turley
PRINCE2 Foundation Training Manual by Frank Turley
 
Business Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesBusiness Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery Notes
 
Cloud Computing Security Issues
Cloud Computing Security Issues Cloud Computing Security Issues
Cloud Computing Security Issues
 
Disaster Recovery Presentation
Disaster Recovery PresentationDisaster Recovery Presentation
Disaster Recovery Presentation
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IT
 
Prince2 Methodology
Prince2 MethodologyPrince2 Methodology
Prince2 Methodology
 
Data security in cloud computing
Data security in cloud computingData security in cloud computing
Data security in cloud computing
 
Change Management PPT Slides
Change Management PPT SlidesChange Management PPT Slides
Change Management PPT Slides
 

Ähnlich wie Managing confidential data

June2014 brownbag privacy
June2014 brownbag privacyJune2014 brownbag privacy
June2014 brownbag privacyMicah Altman
 
Managing Confidential Information in Research
Managing Confidential Information in ResearchManaging Confidential Information in Research
Managing Confidential Information in ResearchMicah Altman
 
Managing Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and ApproachesManaging Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and ApproachesMicah Altman
 
UN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingUN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingMicah Altman
 
Privacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use CasesPrivacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use CasesMicah Altman
 
Best Practices for Sharing Economics Data
Best Practices for Sharing Economics DataBest Practices for Sharing Economics Data
Best Practices for Sharing Economics DataMicah Altman
 
Borgman orcid dryadsymposiumoxford20130523
Borgman orcid dryadsymposiumoxford20130523Borgman orcid dryadsymposiumoxford20130523
Borgman orcid dryadsymposiumoxford20130523ORCID, Inc
 
You down with dmp yeah you know me!
You down with dmp  yeah you know me!You down with dmp  yeah you know me!
You down with dmp yeah you know me!Renaine Julian
 
LIS 653, Session 11: Data Management & Curation
LIS 653, Session 11: Data Management & CurationLIS 653, Session 11: Data Management & Curation
LIS 653, Session 11: Data Management & CurationDr. Starr Hoffman
 
Frances Ryan DARTS5 presentation
Frances Ryan DARTS5 presentationFrances Ryan DARTS5 presentation
Frances Ryan DARTS5 presentationARLGSW
 
Personal online reputations: Managing what you can’t control
Personal online reputations: Managing what you can’t controlPersonal online reputations: Managing what you can’t control
Personal online reputations: Managing what you can’t controlFrances Ryan
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESMicah Altman
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsfBrad Houston
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsfBrad Houston
 
Building and providing data management services a framework for everyone!
Building and providing data management services  a framework for everyone!Building and providing data management services  a framework for everyone!
Building and providing data management services a framework for everyone!Renaine Julian
 
Data management plans
Data management plansData management plans
Data management plansBrad Houston
 
Data Sharing & Data Citation
Data Sharing & Data CitationData Sharing & Data Citation
Data Sharing & Data CitationMicah Altman
 
Overcoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjectsOvercoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjectsRobin Rice
 

Ähnlich wie Managing confidential data (20)

June2014 brownbag privacy
June2014 brownbag privacyJune2014 brownbag privacy
June2014 brownbag privacy
 
Managing Confidential Information in Research
Managing Confidential Information in ResearchManaging Confidential Information in Research
Managing Confidential Information in Research
 
Managing Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and ApproachesManaging Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and Approaches
 
UN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingUN Global Pulse Privacy Framing
UN Global Pulse Privacy Framing
 
Privacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use CasesPrivacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use Cases
 
Best Practices for Sharing Economics Data
Best Practices for Sharing Economics DataBest Practices for Sharing Economics Data
Best Practices for Sharing Economics Data
 
Niso library law
Niso library lawNiso library law
Niso library law
 
Borgman orcid dryadsymposiumoxford20130523
Borgman orcid dryadsymposiumoxford20130523Borgman orcid dryadsymposiumoxford20130523
Borgman orcid dryadsymposiumoxford20130523
 
You down with dmp yeah you know me!
You down with dmp  yeah you know me!You down with dmp  yeah you know me!
You down with dmp yeah you know me!
 
Levine - Data Curation; Ethics and Legal Considerations
Levine - Data Curation; Ethics and Legal ConsiderationsLevine - Data Curation; Ethics and Legal Considerations
Levine - Data Curation; Ethics and Legal Considerations
 
LIS 653, Session 11: Data Management & Curation
LIS 653, Session 11: Data Management & CurationLIS 653, Session 11: Data Management & Curation
LIS 653, Session 11: Data Management & Curation
 
Frances Ryan DARTS5 presentation
Frances Ryan DARTS5 presentationFrances Ryan DARTS5 presentation
Frances Ryan DARTS5 presentation
 
Personal online reputations: Managing what you can’t control
Personal online reputations: Managing what you can’t controlPersonal online reputations: Managing what you can’t control
Personal online reputations: Managing what you can’t control
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
 
Building and providing data management services a framework for everyone!
Building and providing data management services  a framework for everyone!Building and providing data management services  a framework for everyone!
Building and providing data management services a framework for everyone!
 
Data management plans
Data management plansData management plans
Data management plans
 
Data Sharing & Data Citation
Data Sharing & Data CitationData Sharing & Data Citation
Data Sharing & Data Citation
 
Overcoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjectsOvercoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjects
 

Mehr von Micah Altman

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesMicah Altman
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset ConversationMicah Altman
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Micah Altman
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Micah Altman
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset ConversationMicah Altman
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer ReviewMicah Altman
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer ReviewMicah Altman
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An OverviewMicah Altman
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral DistrictingMicah Altman
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk Micah Altman
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Micah Altman
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Micah Altman
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsMicah Altman
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...Micah Altman
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenaryMicah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...Micah Altman
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 

Mehr von Micah Altman (20)

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategies
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset Conversation
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset Conversation
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer Review
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer Review
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An Overview
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral Districting
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenary
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 

Kürzlich hochgeladen

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 

Kürzlich hochgeladen (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 

Managing confidential data

  • 1. Managing Confidential Data Micah Altman Director of Research MIT Libraries
  • 2. DISCLAIMER These opinions are my own, they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators Secondary disclaimer: “It‟s tough to make predictions, especially about the future!” -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc. 2 Managing Confidential Data v. 24 (January IAP Session 1)
  • 3. Collaborators & Co-Conspirators Privacy Tools for Sharing Research Data Team (Salil Vadhan, P.I.) http://privacytools.seas.harvard.edu/people Research Support   Supported in part by NSF grant CNS-1237235 3 Managing Confidential Data v. 24 (January IAP Session 1)
  • 4. Related Work Main Project:  Privacy Tools for Sharing Research Data http://privacytools.seas.harvard.edu/ Related publications:  Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D., Laevart, C., et al. (2011). Communicating Science and Engineering Data in the Information Age. Computer Science and Telecommunications. National Academies Press  Vadhan, S. , et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections”. Available from: http://dataprivacylab.org/projects/irb/Vadhan.pdf Slides and reprints available from: informatics.mit.edu 4 Managing Confidential Data v. 24 (January IAP Session 1)
  • 5. Confidential data matters in research Researchers often collect or use information from people – much of which is not clearly public Professional and ethical responsibility to manage subject privacy and confidentiality appropriately Possibility of huge administrative penalties    5 2010 Managing Confidential Data v. 24 (January IAP Session 1)
  • 6. Why protect confidential information?  It‟s your money –    It‟s your job –        Data management/sharing plans strengthen grant proposals Data management/sharing plans can clarify research design Data management/sharing plans can strengthen research program Confidentiality can improve subject recruitment, responses It‟s the law -    Institutional reputation Loss of external funding It‟s your research -  Protection is individual responsibility Violations can lead to sanctions It‟s your institution -  8.3 Million identify theft victims/year Lose a week of your life and $$ Civil penalties Criminal penalties Administrative penalties It‟s the right thing to do -  Professional behavior Obligations to research subjects 6 Managing Confidential Data v. 24 (January IAP Session 1)
  • 7. How Identifiable Are You? Try this: http://aboutmyinfo.org 7 Managing Confidential Data v. 24 (January IAP Session 1)
  • 8. Personally identifiable private information is surprisingly common Includes information from a variety of sources, such as…     Research data, even if you aren‟t the original collector Student “records” such as e-mail, grades Logs from web-servers, other systems Lots of things are potentially identifying:  Under some federal laws: IP addresses, dates, zipcodes, …  Birth date + zipcode + gender uniquely identify ~87% of people in the U.S. [Sweeney 2002] Try it: http://aboutmyinfo.org/index.html  With date and place of birth, can guess first five digits of social security number (SSN) > 60% of the time. (Can guess the whole thing in under 10 tries, for a significant minority of people.) [Aquisti & Gross 2009]  Analysis of writing style or eclectic tastes has been used to identify individuals  Brownstein, et al., 2006 , NEJM 355(16), Tables, graphs and maps can also reveal identifiable information  8 Managing Confidential Data v. 24 (January IAP Session 1)
  • 9. Traditional approaches are failing Modal traditional approach:      removing subjects‟ names storing descriptive information in a locked filing cabinet publishing summary tables (sometimes) release a public use version that suppressed and recoded descriptive information Problems      9 law is changing – requirements are becoming more complex research computing is moving towards the cloud, other distributed storage researchers are using new forms of data that create new privacy issues advances in the formal analysis of disclosure risk imply the impracticality of “de-identification” as required by law Managing Confidential Data v. 24 (January IAP Session 1)
  • 10. The MIT libraries provide support for all researchers at MIT:     Research consulting, including: bibliographic information management; literature searches; subject-specific consultation Data management, including: data management plan consulting; data archiving; metadata creation Data acquisition and analysis, including: database licensing; statistical software training; GIS consulting, analysis & data collection Scholarly publishing: open access publication & licensing 10 libraries.mit.edu Managing Confidential Data v. 24 (January IAP Session 1)
  • 11. Goals for course      Overview of key areas Identify key concepts & issues Summarize MIT policies, procedures, resources Establish framework for action Provide connection to resources, literature 11 Managing Confidential Data v. 24 (January IAP Session 1)
  • 12. Outline  [Preliminaries]  Law, policy, ethics Research methods, design, management Information Security (Storage, Transmission, Use) Disclosure Limitation     [Additional Resources & Summary of Recommendations] 12 Managing Confidential Data v. 24 (January IAP Session 1)
  • 13. Steps to Manage Confidential Research Data  Identify potentially sensitive information in planning        Reduce sensitivity of collected data in design Separate sensitive information in collection Encrypt sensitive information in transit Desensitize information in processing    Removing names and other direct identifiers Suppressing, aggregating, or perturbing indirect identifiers Protect sensitive information in systems    Identify legal requirements, institutional requirements, data use agreements Consider obtaining a certificate of confidentiality Plan for IRB review Use systems that are controlled, securely configured, and audited Ensure people are authenticated, authorized, licensed Review sensitive information before dissemination       13 Review disclosure risk Apply non-statistical disclosure limitation Apply statistical disclosure limitation Review past releases and publically available data Check for changes in the law Require a use agreement Managing Confidential Data v. 24 (January IAP Session 1)
  • 14. Handling Confidential Information @ MIT  1. Identify confidential information  2. You are responsible for confidential information you store, access, or share  Obtaining appropriate approval to access information   Approval from PI for access to research data   Approval from IRB to use or collect private identified information Approval from to sign external data use agreements on behalf of university 3. Safely dispose of confidential information    When you change computers – contact IT for cleanup Shred the rest 4. If unsure, seek help or approval  Data-management planning: libraries.mit.edu/guides/subjects/datamanagement/  Research approvals: web.mit.edu/committees/couhes/  General policies: web.mit.edu/policies/ 14 Managing Confidential Data v. 24 (January IAP Session 1)
  • 15. Data Management Consulting: Libraries @MIT  The libraries can help:     Assist with data management plans Individual consultation/collaboration with researchers General workshops & guides Dissemination of public data through   DSpace@MIT Referrals to subject-based repositories For more information: libraries.mit.edu/guides/subjects/data-management/ <data-management@mit.edu> 15 Managing Confidential Data v. 24 (January IAP Session 1)
  • 16. Law, Policy & Ethics Law, policy, ethics Research design … Information security     16 Ethical Obligations Laws Fun and games  [Summary] Managing Confidential Data Disclosure limitation v. 24 (January IAP Session 1)
  • 17. Confidentiality & Research Ethics Law, policy, ethics Research design … Information security  Belmont Principles  Respect for Persons      individuals should be treated as autonomous agents persons with diminished autonomy are entitled to protection implies “informed consent” implies respect for confidentiality and privacy Beneficence   17 Disclosure limitation research must have individual and/or societal benefit to justify risks implies minimizing risk/benefit ratio Managing Confidential Data v. 24 (January IAP Session 1)
  • 18. Scientific & Societal Benefits of Data Sharing Law, policy, ethics Research design … Information security  Increases replicability of research   Increases scientific impact of research     Funder policies may apply Public interest in data that supports public policy   Follow up studies Extensions Citations Public interest in data produced by public funder   Journal publication policies may apply Disclosure limitation FOIA and state FOI laws may apply Open data facilitates…        Transparent government Scientific collaboration Scientific verification New forms of science Participation in science Hands-on education Continuity of research Sources: Fienberg et. al 1985; ICSU 2004; Nature 2009 18 Managing Confidential Data v. 24 (January IAP Session 1)
  • 19. Sources of Confidentiality Restrictions for University Research Data Law, policy, ethics Research design … Information security    Overlapping laws Different laws apply to different cases All affiliates subject to university policy Disclosure limitation (Not included: EU directive, foreign laws, classified data, …) 19 Managing Confidential Data v. 24 (January IAP Session 1)
  • 20. Legal Constraints Intellectual Property Contract Trade Secret Contract Intellectua l Attribution Moral Rights Patent Click-Wrap TOU License Database Rights Journal Replication Requirements FOIA State FOI Laws Access Rights Funder Open Access Fair Use Copyrigh t HIPA A FERP A Trademark DMCA Common Rule 45 CFR 26 EU Privacy Directive CIPSE A State Privacy Laws Classifie d Sensitive but Unclassified Rights of Publicity Privacy Torts (Invasion, Defamation) Potentially Harmful (Archeologica l Sites, Endangered Export Species, Restriction Animal s Testing, …) EA R ITA Confidentiality
  • 21. 45 CFR 46 [Overview] Law, policy, ethics Research design … Information security “The Common Rule”  Governs human subject research      Disclosure limitation With federal funds/ at federal institution Establishes rules for conduct of research Establishes confidentiality and consent requirement for for identified private data However, some information may be required to be disclosed under state and federal laws (e.g. in cases of child abuse) Delegates procedural decisions to Institutional Review Boards (IRB‟s) 21 Managing Confidential Data v. 24 (January IAP Session 1)
  • 22. Law, policy, ethics HIPAA [Overview] Research design … Information security Health Insurance Portability and Accountability Act Disclosure limitation  Protects personal health care information for „covered entities‟  Detailed technical protection requirements  Provides clearest legal standards for dissemination  Provides a „safe harbor‟  Has become an accepted practice for dissemination in other areas where laws are less clear  HITECH Act of 2009 extends HIPAA    Extends coverage to associated entities of covered entities Additional technical safeguards Adds breach reporting requirement HIPAA provides three dissemination options … 22 Managing Confidential Data v. 24 (January IAP Session 1)
  • 23. Dissemination under HIPAA [option 1]  “safe harbor” -- remove 18 identifiers    Information security Disclosure limitation Names Social Security #‟s; Personal Account #‟s; Certificate/License #‟s; full face photos (and comparable images); biometric id‟s; medical Any other unique identifying number, characteristic, or code [Asset identifiers]     Research design … [Personal identifiers]   Law, policy, ethics fax #‟s; phone #‟s; vehicle #‟s; personal URL‟s; IP addresses; e-mail addresses Device ID‟s and serial numbers [Quasi identifiers]   dates smaller than a year (and ages > 89 collapsed into one category) geographic subdivisions smaller than a state (except for 3 digits of zipcode, if unit > 20,000 people) And  23 Entity does not have actual knowledge [direct and clear awareness] that it would be possible to use the remaining information alone or in combination with other information to identify the subject Managing Confidential Data v. 24 (January IAP Session 1)
  • 24. Dissemination under HIPAA [Option 2]  Law, policy, ethics Research design … Information security “limited dataset” – leave some quasi-id‟s    Disclosure limitation Remove personal and asset identifiers Permitted dates: dates of birth, death, service, years Permitted geographic subdivisions: town, city, state, zip code And  24 Require access control and data use agreement. Managing Confidential Data v. 24 (January IAP Session 1)
  • 25. Dissemination under HIPAA [Option 3]   Law, policy, ethics Research design … Information security “qualified statistician” – statistical determination Disclosure limitation Have qualified statistician determine, using generally accepted statistical and scientific principles and methods, that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by the anticipated recipient to identify the subject of the information. Important caveats    25 Methods and results of the analysis must be documented No bright line for “qualified”, text of rule is: “a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable.” [Section 164.514(b)(1)] No clear definitions for “generally accepted” or “very small” or “reasonably available information”… however, there are references in the federal register to statistical publications to be used as “starting points” Managing Confidential Data v. 24 (January IAP Session 1)
  • 26. Law, policy, ethics FERPA Research design … Information security Family Educational Rights and Privacy Act    Applies schools that receive federal (D.O.E.) funding Restricts use of student (not employee) information Establishes      Right to privacy of educational records Right to inspect and correct records (with appeal to Federal government) Definition of public “directory” information Right to block access to public “directory” information, and to other records Educational records include:    Identified information about student Maintained by institution Not …     Disclosure limitation Employee records Some medical and law-enforcement records Records solely in the possession and for use by the creator (e.g. unpublished instructor notes) Personally identifiable information includes:     26 Direct identifiers Indirect (quasi) identifiers Indirectly linkable identifiers “Information requested by a person who the educational agency or institution reasonably believes knows the identity of the student to whom the education record relates.” Managing Confidential Data v. 24 (January IAP Session 1)
  • 27. Law, policy, ethics MA 201 CMR 17 Research design … Information security Disclosure Standards for the Protection of Personal Information limitation  Strongest U.S. general privacy protection law  Has been delayed/modified repeatedly  Requires reporting of breaches    If data is not encrypted Or encryption key is released in conjunction with data Requires specific technical protections:     27 Firewalls Encryption of data transmitted in public Anti-virus software Software updates Managing Confidential Data v. 24 (January IAP Session 1)
  • 28. Inconsistencies in Requirements and Definitions Inconsistent definitions of “personally identifiable” Inconsistent definitions of sensitive information Requirements for to de-identify jibes with statistical realities Law, policy, ethics Research design … Information security Disclosure limitation FERPA HIPAA Common Rule Coverage Students in Educational Institutions Medical Information in “Covered Entities” Living persons in Mass. Residents research by funded institutions Identification Criteria -Direct -Indirect -Linked -Bad intent (!) -Direct -Indirect -Linked -Direct -Indirect -Linked -Direct Sensitivity Criteria Any non-directory information Any medical information Private information – based on harm Financial, State, Federal Identifiers Management - Directory opt-out - Consent - Consent Requirement - [Implied] good - Specific technical - [Implied] risk s 28 practice safeguards minimization Managing Confidential Data - Breach MA 201 CMR 17 - Specific technical safeguards v. 24 (January IAP - Breach 1) Session
  • 29. Third Party Requirements Law, policy, ethics Research design … Information security    Licensing requirements Intellectual property requirements Federal/state law and/or policy requirements     State protection of personal information laws Freedom of information laws (FOIA & State FOI) State mandatory abuse/neglect notification laws And … think ahead to publisher requirements    Disclosure limitation Replication requirements IP requirements Examples     29 NSF requires data from funded research be shared NIH requires a data sharing plan for large projects Wellcome Trust requires a data sharing plan Many leading journals require data sharing Managing Confidential Data v. 24 (January IAP Session 1)
  • 30. Law, policy, ethics US (some other parts of the) World Research design … Information security Disclosure limitation      Sectoral approach Designed to limit government access Limited core principles: “reasonable” expectation of privacy; “unreasonable” search and seizure; intrusion onto seclusion; right of publicity Varies by state Enforcement uneven 30      Omnibus approach Designed to limit commercial access Broad core principles: privacy is a fundamental human right; not all rights waivable, even by contract Varies by country Enforcement uneven Managing Confidential Data v. 24 (January IAP Session 1)
  • 31. (Some) More Laws & Standards     California Laws  Lots of rules   Applies any data about California residents  Privacy policy  Disclosure  Reporting policy EU Directive 95/46/EC  Data protection directive  Provides for notice, limits on purpose of use, consent, security, disclosure, access, accountability  Forbids transfer of data to entities in countries  compliant with directive  Extended by EU Cookie Directive EU 2009/136, requiring notice on use of web cookies  U.S. is not compliant but …  Organizations can certify compliance with FTC  No auditing/enforcement !  Substantial criticism of this arrangement Payment Card Industry (PCI) Security Standards  Governs treatment of credit card numbers  Requires reports, audits, fines  Detailed technical measures  Not a law, but helps define good practice   Nevada law mandates PCI standards  Law, policy, ethics Research design … Information security Detailed technical controls over information Disclosure systems limitation Sarbanes-Oxley (aka, SOX, aka SARBOX) Corporate and Auditing Accountability and Responsibility Act of 2002  Applies to U.S. public company boards, management and public accounting firms  Rarely applies to research in universities  Section 404 requires annual assessment of organizational internal controls – but does not specify details of controls Classified Data  Separate and complex rules and requirements  The University does not accept classified data  But, may have “Controlled But Unclassified”  Vaguely defined area  Mostly government produced area  Penalties unclear  And… export controlled information, under ITAR and EAR  Export control include technologies, software, documentation/design documents may be included  Large penalties … and over 1100 International Human Subjects laws…  FISMA  Federal Information Security Management Act (FISMA), Public Law (P.L.) 107-347.  31 Is starting to be applied to NIH sponsored Confidential Data Managing research v. 24 (January IAP Session 1)
  • 32. Possible Legal/Regulatory Changes for 2013-16 Law, policy, ethics Research design … Information security  2013     Thanks to the NSA. Everyone knows what “metadata” means California – right to be forgotten, from social media, for teenagers OECD Updated Principles for transborder privacy Likely     Disclosure limitation New information privacy laws in selected states Increased open data requirements from federal funders Adoption of data availability requirements by increasing numbers of journals Possible    Anti “revenge-porn” laws; anti mug-shot blackmail laws Right to be forgotten progress in EU Adoption of data availability requirements by more journals, subject to least restrictive terms   Changes to common rule to deal explicitly with informational risks  32 See: http://openeconomics.net/principles/ See: http://www.nap.edu/catalog.php?record_id=18614 Managing Confidential Data v. 24 (January IAP Session 1)
  • 33. Researcher Responsibilities Law, policy, ethics Research design … Information security        … for knowing the rules Disclosure limitation … for identifying potentially confidential information in all forms (digital/analogue; on-line/off-line) … for notifying recipients of their responsibility to protect confidentiality … for obtaining IRB approval for any human subjects research … for obtaining proper training … for following an IRB approved plan … for obtaining approval of restricted data use agreements with providers, even if no money involved … and for proper     Storage Access Transmission Disposal Confidentiality is not an “IT problem” 33 Managing Confidential Data v. 24 (January IAP Session 1)
  • 34. Law, policy, ethics Key Concepts & Issues Review Research design … Information security  Privacy   Confidentiality   Information that would cause harm if disclosed and linked to an individual Personally/individually identifiable information    Treatment of private, sensitive information Sensitive information   Disclosure limitation Control over extent and circumstances of sharing Private information Directly or indirectly linked to an identifiable individual Human subjects A living person …  who is interacted with to obtain research data  who‟s private identifiable information is included in research data  Research    “Common Rule”   Law governing funded human subjects research HIPAA   Systematic investigation Designed to develop or contribute to generalizable knowledge Law governing use of personal health information in covered and associated entities MA 201 CMR 17  34 Law governing use of certain personal identifiers for Massachusetts residents Managing Confidential Data v. 24 (January IAP Session 1)
  • 35. Checklist: Identify Requirements Law, policy, ethics Research design … Information security Check if research includes …  Interaction with humans  Common Rule applies Disclosure limitation Check if data used includes identified …  Student records  FERPA applies  State, federal, financial id‟s  state law applies  Medical/health information  HIPAA (likely) applies  Human subjects & private info  Common Rule applies Understand what comprises a breach, typically:  Release of unencrypted information  To unauthorized entity  Containing prohibited identifiers OR sufficient information to link a record to an identifiable person Check for other requirements/restriction on data dissemination:  Data provider restrictions and University approvals thereof  Open data requirements and norms  University information policy 35 Managing Confidential Data v. 24 (January IAP Session 1)
  • 36. Preliminary Recommendation: Choose the Lesser of Three Evils    (1) Use only information that has already been made public, is entirely innocuous, or has been declared legally deidentified; or (2) Obtain informed consent from research subjects, at the time of data collection, that includes acceptance of the potential risks of disclosure of personally identifiable information; or (3) Pay close attention to the technical requirements imposed by law:   Apply appropriate IT security controls Apply disclosure analysis limitation for publicly released data     36 Consider using suppression and recoding to achieve k-anonymity with ldiversity on data before releasing it or generating detailed figures, maps, or summary tables. For interactive disclosure systems, consider differentially private algorithms Supplement data sharing with data-use agreements. Consider providing for access to full data through a virtual data enclave/restricted environment Managing Confidential Data v. 24 (January IAP Session 1)
  • 37. Law, policy, ethics Resources   Research design … E.A. Bankert & R.J. Andur, 2006, Institutional Review Board: Management and Function, Jones and Bartlett Publishers Information security Disclosure limitation Ohm, Paul. "Broken promises of privacy: Responding to the surprising failure of anonymization." UCLA Law Review 57 (2010): 1701.  D. J. Mazur, 2007. Evaluating the Science and ethics of Research on Humans, Johns Hopkins University Press  IRB: Ethics & Human Research [Journal], Hastings Press www.thehastingscenter.org/Publications/IRB/  Journal of Empirical Research on Human Research Ethics, University of California Press ucpressjournals.com/journal.asp?j=jer  201 CMR 17 text www.mass.gov/Eoca/docs/idtheft/201CMR17amended.pdf  FERPA Website www.ed.gov/policy/gen/guid/fpco/ferpa/index.html  HIPAA Website www.hhs.gov/ocr/privacy/  Common Rule Website www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm  State laws www.ncsl.org/Default.aspx?TabId=13489  MIT Resources   Research approvals: cuhs@fas.harvard.edu  37 Data-management planning: libraries.mit.edu/guides/subjects/data-management/ General policies: http://web.mit.edu/policies/ Managing Confidential Data v. 24 (January IAP Session 1)
  • 38. Research design, methods, management Law, policy, ethics Research design … Information security   Information Lifecycle Reducing risk      38 Disclosure limitation Sensitivity of information Partitioning Decreasing identification Managing confidentiality and dissemination [Summary] Managing Confidential Data v. 24 (January IAP Session 1)
  • 39. What’s wrong with this picture? Law, policy, ethics Research design … Information security Disclosure limitation Name SSN Birthdate Zipcode Gender Favorite Ice Cream # of crimes committed A. Jones 12341 01011961 02145 M Raspberr y 0 B. Jones 12342 02021961 02138 M Pistachio 0 C. Jones 12343 11111972 94043 M Chocolat e 0 D. Jones 12344 12121972 94043 M Hazelnut 0 E. Jones 12345 03251972 94041 F Lemon 0 F. Jones 12346 03251972 02127 F Lemon 1 G. Jones 12347 08081989 02138 F Peach 1 H. Smith 12348 01011973 63200 F Lime 2 I. Smith 12349 02021973 63300 M Mango 4 J. Smith 12350 02021973 63400 M Coconut 16 K. Smith 12351 03031974 64500 M Frog 32 L. Smith 12352 04041974 64600 M Vanilla 64 M. Smith 12353 04041974 64700 F Pumpkin 128 N. 39 12354 04041974 64800 F Allergic 256 Managing Confidential Data Smi v. 24 (January IAP Session 1)
  • 40. What’s wrong with this picture? Law, policy, ethics Research design … Information security Identifier Sensitive Private Identifier Private Identifier Sensitive Identifier Name SSN Birthdate Zipcode Gender Favorite Ice Cream # of crimes committed A. Jones 12341 01011961 02145 M Raspberr y 0 B. Jones 12342 02021961 02138 M Pistachio 0 C. Jones 12343 11111972 94043 M Chocolat e 0 D. Jones 12344 12121972 94043 M Hazelnut 0 E. Jones 12345 03251972 94041 F Lemon 0 F. Jones 12346 03251972 02127 F Lemon 1 G. Jones 12347 08081989 02138 F Peach 1 H. Smith 12348 01011973 63200 F Lime 2 I. Smith 12349 02021973 63300 M Mango 4 J. Smith 12350 02021973 63400 M Coconut 16 K. Smith 12351 03031974 64500 M Frog 32 L. Smith 12352 04041974 64600 M Vanilla 64 M. Smith 12353 40 N. Smith 12354 04041974 64700 04041974 64800 Disclosure limitation F Pumpkin 128 Managing Confidential Data F Allergic 256 Mass resident Californian Twins, separated at birth? FERPA too? Unexpected Response? v. 24 (January IAP Session 1)
  • 41. Name/Description Examples Comparison case: Official Statistics • • U.S. Census dissemination European statistical agencies • Data Sharing Systems for Open Access Journals American Political Science Association Data Access and Research Transparency [DART] Policy Initiative Well-resourced data collector summarizes tables/relational data in the form of summary statistics and contingency tables Example Use Cases Privacy-Aware Replicable Research Scholarly journals adopting policies for deposit and disposition of data for verification and replication. How to balance privacy and replicability without intensive review? Long-term Longitudinal data Collection Data collections tracking individual subjects (and possibly friends and relations) over decades Computational Social Science • • • National Longitudinal Study of Adolescent Health (Add Health) Framingham Heart Study Panel Study of Income Dynamics Analysis of … • Netflix • Facebook • Hubway • GPS • Blogs Privacy TOOLS FOR SHARING RESEARCH DATA -- Policy Update “Big” data. New forms and sources of data. Cutting-edge analytical methods and algorithms. 41 •
  • 42. Exemplar: Social Media Attribute Type Examples Data: Structure - network Data: Attribute Types - Continuous/Discrete/ Scale: ratio/interval/ordinal/nominal Data: Performance Characteristics - 10M-1B observations Sample from stream of continuously updated corpus Dozens of dimensions/measures Measurement: Unit of Observation - Individuals; Interactions Measurement: Measurement type - Observational Measurement: Performance characteristic - High volume Complex network structure Sparsity Systematic and sparse metadata Management Constraints - License; Replication Analysis methods - Bespoke algorithms (clustering); nonlinear optimization; Bayesian methods Desired Outputs - Summary scalars (model coefficients) Summary table Static /interactive visualization - More Information • 42 Grimmer, Justin, and Gary King. "General purpose computerassisted clustering and conceptualization." Proceedings of the National Academy of Sciences 108.7 (2011): 2643-2650. • King, Gary, Jennifer Pan, and Molly Roberts. "How censorship in China allows government criticism but silences collective expression." APSA 2012 Annual Meeting Paper. 2012. Privacy TOOLS FOR SHARING et al. "Life in the network: the coming age of • Lazer, David, computational RESEARCH DATA -- Policy Update social science." Science (New York, NY)
  • 43. Research Information Life Cycle Model Long-term access Creation/Coll ection Re-use Storage /Ingest External dissemination/public ation Processing Analysis 43 Internal Sharing Managing Confidential Data v. 24 (January IAP Session 1)
  • 44. Research Design Decisions Relevant to Confidential Information  Collection:     Consent/licensing terms Methods Measures Re-use Storage    Longterm access Systems information security Data structures and partitioning Dissemination    Vetting Disclosure limitation Data use agreements 44 External dissemination/pu blication Analysi s Managing Confidential Data Creation/C ollection Researc h methods Statistical / Computational Frameworks Data Management Systems Legal / Policy Frameworks ∂∂ Storag e/Inge st Processing Internal Sharing v. 24 (January IAP Session 1)
  • 45. Law, policy, ethics Trade-offs  Anonymity vs. research utility     Research design … Immediate utility Replicability and reuse Information security Disclosure limitation Sensitivity vs. research utility (Anonymity * Sensitivity) vs. research costs/efforts 45 Managing Confidential Data v. 24 (January IAP Session 1)
  • 46. Types of Sensitive Information   Law, policy, ethics Research design … Information security Information is sensitive, if, once disclosed there Disclosure limitation is a “significant” likelihood of harm IRB literature suggests possible categories of harm:          46 loss of insurability loss of employability criminal liability psychological harm social harm to a vulnerable group loss of reputational harm emotional harm dignitary harm physical harm: risk of disease, injury, or death Managing Confidential Data v. 24 (January IAP Session 1)
  • 47. Levels of sensitivity    No widely accepted scale Publicly available data not sensitive under “common rule” Common rule anchors scale at “minimal risk”: Law, policy, ethics Research design … Information security Disclosure limitation “if disclosed, the probability and magnitude of harm or discomfort anticipated are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests”  Example: Harvard Research Data Security Policy  Level 5- Extremely sensitive information about individually identifiable people. Information that if exposed poses significant risk of serious harm. Includes information posing serious risk of criminal liability, serious psychological harm or other significant injury, loss of insurability or employability, or significant social harm to an individual or group.  Level 4 - Very sensitive information about individually identifiable people Information that if exposed poses a non-minimal risk of moderate harm. Includes civil liability, moderate psychological harm, or material social harm to individuals or groups, medical records not classified as Level 5, sensitive-but-unclassified national security information, and financial identifiers (as per HRCI standards).  Level 3- Sensitive information about individually identifiable people Information that if disclosed poses a significant risk of minor harm. Includes information that would reasonably be expected to damage reputation or cause embarrassment; and FERPA records.  Level 2 – Benign information about individually identifiable people Information that would not be considered harmful, but that as to which a subject has been promised confidentiality.  Level 1 – De-identified information about people, and information not about people 47 Managing Confidential Data v. 24 (January IAP Session 1)
  • 48. Evaluating sensitivity  If “sensitive information” was disclosed and linked to a person or entity     Law, policy, ethics Research design … Information security Disclosure limitation Whom would it harm? How much would the harm be? What would the likelihood of harm manifesting? An the individual at risk protect against or mitigate harm? 48 Managing Confidential Data v. 24 (January IAP Session 1)
  • 49. Law, policy, ethics IRB Review Scope  Research design … IRB approval needed for all: Information security federally-funded research; Disclosure limitation  or any research at (almost all) institutions receiving federal funding that involve “human subjects”. ( Any organization operating under a general “federal-wide assurance”)  All human subjects research at MIT Research: "a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.”    Human subject: “a living individual about whom an investigator (whether professional or student) conducting research obtains:    Data through intervention or interaction with the individual, or Identifiable private information.” See www.hhs.gov/ohrp/ 49 Managing Confidential Data v. 24 (January IAP Session 1)
  • 50. Research not requiring IRB approval    Non-research: not generalizable knowledge & systematic inquiry Non-funded: institution receives no federal funds for research Not human subject:    Law, policy, ethics Research design … Information security Disclosure limitation No living people described Observation only AND no private identifiable information is obtained Human Subjects, but “exempt” under 45 CFR 46       use of existing, publicly-available data use of existing non-public data, if data is individuals cannot be directly or indirectly identified research conducted in educational settings, involving normal educational practices taste & food quality evaluation federal program evaluation approved by agency head observational, survey, test & interview of public officials and candidates (in their formal capacity, or not identified)  Caution not all “exempt” is exempt…  Some research on prisoners, children, not exemptable  Some universities require review of “exempt” research  MIT requires review of all human subject research  See: www.hhs.gov/ohrp/humansubjects/guidance/ decisioncharts.htm 50 Managing Confidential Data v. 24 (January IAP Session 1)
  • 51. IRB’s and Confidential Information Law, policy, ethics Research design … Information security   Disclosure limitation IRB‟s review consent procedures and documentation IRB‟s may review data management plans     May require procedures to minimize risk of disclosure May require procedures to minimize harm resulting from disclosure IRB‟s make determination of sensitivity of information -- potential harm resulting from disclosure IRB‟s make determination regarding whether data is de-identified for “public use” [see NHRPAC, “Recommendations on Public Use Data Files”] 51 Managing Confidential Data v. 24 (January IAP Session 1)
  • 52. Human Subjects @MIT: COUHES Law, policy, ethics Research design … Information security   The Institutional Review Board (IRB) must approve all human subjects research at MIT Disclosure prior to limitation data collection or use Research comprises   Research involves human subjects if:       Surveys Behavioral experiments Educational tests and evaluations Analysis of identified private data collected from people (your e-mail inbox, logs of web-browsing activity, facebook activity, ebay bids … ) The IRB will:     There is any interaction or intervention with living humans; or If identifiable private data about living humans is used Some examples of human subject research involving information   "a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.” Assess research protocol Identify whether research is exempt from further review and management Assess sensitivity of information Does your research require review, a quick checklist:  52 http://web.mit.edu/committees/couhes/quickguide.shtml Managing Confidential Data v. 24 (January IAP Session 1)
  • 53. Recommended Planning Practices Law, policy, ethics Research design … Information security Disclosure limitation Identify potential type and magnitude of harm; assess risk of harm, if disclosure were to occur  Identify risk mitigation opportunities across information lifecycle  Plan for publication, data dissemination, research replication, and reuse  53 Managing Confidential Data v. 24 (January IAP Session 1)
  • 54. Law, policy, ethics Reducing Risk in Data Collection Research design … Information security  Disclosure Avoid collecting sensitive information, unless it is limitation required by research design, method, or hypothesis    Collect sensitive information in private settings    Unnecessary sensitive information  not minimal risk Reducing sensitivity  higher participation, greater honesty Reduces risk of disclosure Increases participation Reduce sensitivity through indirect measures  Less sensitive proxies      54 E.g. Implicit association test [Greenwald, et al. 1998] Unfolding brackets Group response collection Random response technique [Warner 1965] Item count/unmatched count/list experiment technique Managing Confidential Data v. 24 (January IAP Session 1)
  • 55. Law, policy, ethics Reducing Risk in Data Collection Research design … Information security  Disclosure Avoid collecting sensitive information, unless it is limitation required by research design, method, or hypothesis    Collect sensitive information in private settings    Unnecessary sensitive information  not minimal risk Reducing sensitivity  higher participation, greater honesty Reduces risk of disclosure Increases participation Reduce sensitivity through indirect measures  Less sensitive proxies      55 E.g. Implicit association test [Greenwald, et al. 1998] Unfolding brackets Group response collection Random response technique [Warner 1965] Item count/unmatched count/list experiment technique Managing Confidential Data v. 24 (January IAP Session 1)
  • 56. Randomized Response Technique Law, policy, ethics Research design … Information security Disclosure limitation Sensitiv e questio n Subjec t rolls a die > 2 Record Answer Say “YES” 56 Managing Confidential Data Variations: - Ask two different questions - Item counts with sensitive and nonsensitive items – eliminate subject-randomization - Regression analysis methods v. 24 (January IAP Session 1)
  • 57. Law, policy, ethics Our Table – Less (?) Sensitive Research design … Less (?) Sensitive Information security Name SSN Birthdate Zipcode Gender Favorite Ice Cream Treat? # acts * A. Jones 12341 01011961 02145 M Raspberry 0 0 B. Jones 12342 02021961 02138 M Pistachio 1 20 C. Jones 12343 11111972 94043 M Chocolate 0 0 D. Jones 12344 12121972 94043 M Hazelnut 1 12 E. Jones 12345 03251972 94041 F Lemon 0 0 F. Jones 12346 03251972 02127 F Lemon 1 7 G. Jones 12347 08081989 02138 F Peach 0 1 H. Smith 12348 01011973 63200 F Lime 1 17 I. Smith 12349 02021973 63300 M Mango 0 4 J. Smith 12350 02021973 63400 M Coconut 1 18 K. Smith 12351 03031974 64500 M Frog 0 32 L. Smith 12352 04041974 64600 M Vanilla 1 65 M. Smith 12353 04041974 64700 F Pumpkin 0 128 N. Smith 12354 04041974 64800 F Allergic 1 256 Disclosure limitation * Acts = crimes if treatment = 0; crimes + acts of generosity if treatment =1 57 Managing Confidential Data v. 24 (January IAP Session 1)
  • 58. Randomized Response – Pros and Cons     Can substantially reduce direct legal risks of disclosure Can increase response rate Can decrease mis-reporting Disclosure limitation Warning!     Research design … Information security Pros  Law, policy, ethics None of the randomized models uses a formal measure of disclosure limitation Some would clearly violate measures (such as differential privacy) -- we‟ll see in section 4 Do not use as a replacement for disclosure limitation Other Issues     Loss of statistical efficiency (if compliance would otherwise be the same) Complicates data analysis, especially model-based analysis Leaving randomization up to subject can be unreliable May provide less confidentiality protection if:      58 Randomization is incomplete Records of randomization assignment are kept Lists of responses overlap across questions Sensitive question response is large enough to dominate overall response Non-sensitive question responses are extremely predictable, or publicly observable Managing Confidential Data v. 24 (January IAP Session 1)
  • 59. Partitioning Information   Reduces risk in information management Partition data information based on sensitivity          Research design … Information security Disclosure limitation Identifying information Descriptive information Sensitive information Other information Segregate   Law, policy, ethics Storage of information Access regimes Data collections channels Data transmission channels Plan to segregate as early as feasible in data collection and processing Link segregated information with artificial keys … 59 Managing Confidential Data v. 24 (January IAP Session 1)
  • 60. Partitioned table Not Identified Name SSN Birthdate Zipcode Gender LINK LINK Favorite Ice Cream Treat # act s A. Jones 12341 0101196 1 02145 M 1401 1401 Raspberry 0 0 283 Pistachio 1 20 B. Jones 12342 0202196 1 02138 M 283 8979 Chocolate 0 0 C. Jones 12343 11111972 94043 M 8979 7023 Hazelnut 1 12 1498 Lemon 0 0 D. Jones 12344 1036 Lemon 1 7 3864 Peach 0 1 E. Jones 2124 Lime 1 17 F. 4339 Mango 0 4 6629 Coconut 1 18 9091 Frog 0 32 9918 Vanilla 1 65 4749 Pumpkin 0 12 8 8197 Allergic 1 25 6 1212197 2 94043 12345 0325197 2 94041 F 1498 12346 0325197 2 02127 F 1036 Jon es G. 12347 Jon es H. Smith I. Smith 60 12348 12349 0808198 9 02138 0101197 3 63200 0202197 3 63300 M F F 7023 3864 2124 M 4339 Managing Confidential Data v. 24 (January IAP Session 1)
  • 61. Managing Sensitive Data Collection: Data Organization Law, policy, ethics Research design … Information security   Disclosure limitation Separate: sensitive measures, (quasi)-identifiers, other measures If possible avoid storing identifiers with measures:    For sensitive data:    Collect identifying information beforehand Assign opaque subject identifiers Collect on-line directly (with appropriate protections); or Encrypt collection devices/media (laptops, usb keys, etc) For very/extremely sensitive data:    61 Collect with oversight directly; then Store on encrypted device and; Transfer to secure server as soon as feasible Managing Confidential Data v. 24 (January IAP Session 1)
  • 62. Choosing Linking Keys     Most resistant to relink Mapping from original id to random keys is highly sensitive Must keep and be able to access mapping to add new identified data Most computer-generated random numbers are not sufficient by themselves            More troublesome to compute Same id‟s + same key + same “salt” produces same values  facilitates merging ID‟s can be recovered if key is exposed, cracked, or algorithm weak Security is well understood Tools available to compute Same id‟s produce same hashes  easier to merge new identified data ID‟s cannot be recovered from hash because hash loses information ID‟s can be confirmed if identifying information is known or guessable Cryptographic Hash + secret key       Most are PSEUDO random – predictable sequences Use a cryptographic secure PRNG: Blum Blum Shub, AES (or other block cypher) in counter mode OR Use real random numbers (e.g. from physical sources ) OR Use a PRNG with a real random seed to random the order of the table; then another to generate the ID‟s for this randomly ordered table Cryptographic Hash e.g. SHA-256   Disclosure limitation Encryption   Research design … Information security Entirely randomized  Law, policy, ethics Security is well understood Tools available to compute Same id‟s produce same hashes  easier to merge new identified data ID‟s cannot be recovered from hash because hash loses information ID‟s cannot be confirmed unless key is also known Do not choose arbitrary mathematical functions of other identifiers! 62 Managing Confidential Data v. 24 (January IAP Session 1)
  • 63. When is information anonymous?  Remove all HIPAA ID‟s      Research design … Information security Disclosure limitation OR anonymous observation only     Anonymous under HIPAA Anonymous under Mass Law Probably anonymous under common rule, not guaranteed But new/innovative forms of data collection may pose anonymity challenges Law, policy, ethics No interaction with subject No collection of identifying data Not “human subjects” under common rule OR determined by a statistical expert to be anonymous 63 Managing Confidential Data v. 24 (January IAP Session 1)
  • 64. Anonymous Data Collection: Pros & Cons     Research design … Information security Pros  Law, policy, ethics Presumption that data is not identifiable May increase participation May increase honesty Disclosure limitation Cons     64 Barrier to follow-up, longitudinal studies Can conflict with quality control, validation Data still may be indirectly identifiable if respondent descriptive information is collected Linking data to other sources of information may have large research benefits Managing Confidential Data v. 24 (January IAP Session 1)
  • 65. Anonymous Data Collection Methods     Law, policy, ethics Research design … Information security Trusted third party intermediates Disclosure limitation Respondent initiates re-contacts No identifying information recorded Use id‟s randomized to subjects, destroy mapping 65 Managing Confidential Data v. 24 (January IAP Session 1)
  • 66. Field Data Collection Challenges    Whole-disk-encrypted laptop Plus, Encrypted cloud backup solutions: CrashPlan, BackBlaze, SpiderOak Small data, short term   Encrypted network file transfer (e.g. SFTP, part of ssh ) Encrypted/tunneled network file system (e.g. Expandrive) Where network connection is less reliable, high bandwidth   Research design … Information security Where network connection is readily available, Disclosure easy to transfer as collected, or enter on remote hardened limitation system   Law, policy, ethics Encrypted USB keys (e.g. w/IronKey, TrueCrypt, PGP) Foreign Travel    66 Be aware of U.S. EAR export restrictions, use commercial or widely-available open encryption software only. Do not use bespoke software. Be aware of country import restrictions (as of 2008): Burma, Belarus, China, Hungary, Iran, Israel, Morocco, Russia, Saudi Arabia, Tunisia, Ukraine Encrypt data if possible, but don‟t break foreign laws. Check with v. 24 (January IAP department of state. Managing Confidential Data Session 1)
  • 67. Online/Electronic data collection challenges        Vendor could retain IP addresses, identifying cookies, etc., even if not intended by researcher Recommendation     Data collected from subjects from other states / countries could subject you to laws in that jurisdiction Jurisdiction may depend on residency of subject, availability of data collection instrument in jurisdiction, or explicit data collections efforts within jurisdiction For international data collection, be aware of EU Privacy Directive AND EU “Cookie” Directive (EU 2009/136) Vendor   Cookies provide a way to link data more easily May or may not explicitly identify subject Jurisdiction   Disclosure IP addresses can be logged automatically by host, even if not intended by researcher limitation IP addresses can trivially be observed as data is collected Partial ID numbers can be used for probabilistic geographical identification at sub-zipcode levels Cookies may be identifiers   Research design … Information security IP addresses are identifiers  Law, policy, ethics Use only vendors that certify compliance with your confidentiality policy Do not retain IP numbers if data is being collected anonymously Use SSL/TLS encryption unless data is non-sensitive and anonymous Some tools for anonymizing IP addresses and system/network logs www.caida.org/tools/taxonomy/anonymization.xml 67 Managing Confidential Data v. 24 (January IAP Session 1)
  • 68. Cloud computing risks    Cloud computing decouples physical and computing infrastructure Increasingly used for core-IT, research computing, data collection, storage, and analysis Confidentiality issues        Law, policy, ethics Research design … Information security Disclosure limitation Auditing and compliance Access and commingling of data Location of data and services and legal jurisdiction Cooperation with incident response Cooperation with your legal team Diligence in resisting government legal requests Mitigation strategies  Careful review of TOS     Certification     FISMA ISO Audits Contracting    Security standards Breach procedures Support/remedies Contract riders Master service level agreements Encryption      68 Strong (256 bit AES) Client-side Verified Keys never stored in cloud, never transmitted in clear Keys escrowed within organization or w/organizational representatives Managing Confidential Data v. 24 (January IAP Session 1)
  • 69. Agency Certificates, …  DHHS Certificate of Confidentiality       Research design … Information security Disclosure limitation For NIH, CDC, FDA, … Optional, but recommended if available Protects against many types of forced legal disclosure of confidential information May not protect against all state disclosure law Does not protect against voluntary disclosures by researcher/research institutions National Institute of Justice Privacy Certificate     Law, policy, ethics Required to conduct research Certifies PI‟s compliance to NIJ NIJ Research is protected against some additional forms of forced disclosure Sworn Census Employee -- Title 13   69 Required for access to restricted Census Bureau Extends Title-13 penalties and protections Managing Confidential Data v. 24 (January IAP Session 1)
  • 70. Confidentiality & Consent Best practice is to describe in consent form...      Research design … Information security Disclosure limitation Practices in place to protect confidentiality Plans for making the data available, to whom, and under what circumstances, rationale. Limitations on confidentiality (e.g. limits to a certificate of confidentiality under state law, planned voluntary disclosure) Description of data reuses and benefits therof Consent form should be consistent with your:     Law, policy, ethics Data management plan Data sharing plans and requirements Terms of service for systems used in data storage and processing Not generally best practice to promise     70 Unlimited confidentiality Destruction of data Expiration of data Restriction of all data to original researchers Managing Confidential Data v. 24 (January IAP Session 1)
  • 71. Data Management Plan         Disclosure limitation Documentation Backup and recovery Review Treatment of confidential information      Any NIH request over $500K All NSF proposals after 12/31/2010 NIJ Wellcome Trust Any proposal where collected data will be a resource beyond the project Safeguarding data during collection   Research design … Information security When is it required?  Law, policy, ethics Overview: http://www.icpsr.org/DATAPASS/pdf/confidentiality.pdf Separation of identifying and sensitive information Obtain certificate of confidentiality, other legal safeguards De-identification and public use files Dissemination      Archiving commitment (include letter of support) Archiving timeline Access procedures Documentation User vetting, tracking, and support 71 One size does not fit all projects. Managing Confidential Data v. 24 (January IAP Session 1)
  • 72. Data Management Plan Outline Law, policy, ethics Research design … Information security      nature of data {generated, observed, experimental information; amples; publications; physical collections; software; models} scale of data  Plans for depositing in an existing public database  Access procedures   Naming conventions  Types of IP rights in data  Protections provided  Dispute resolution process Quality Assurance [if not described in main proposal]  Procedures for ensuring data quality in collections, and expected measurement error Timeframe for access  Cleaning and editing procedures Technical access methods  Validation methods Restrictions on access  Requirements for data destruction, if applicable  Procedures for long term preservation Methods  Procedures Institution responsible for long-term costs of data preservation Frequency   Replication Succession plans for data should archiving entity go out of existence  Existing Data [ If applicable ] Version management Recovery guarantees description of existing data relevant to the project plans for integration with data collection   added value of collection, need to collect/create new data Security Metadata and documentation  Metadata to be provided  Metadata standards used  Confidentiality concerns Access control rules   Technical Controls  Storage format and archival justification Procedural controls  Generation and dissemination formats and procedural justification   Formats  Archiving and Preservation  Reasons not to share or reuse  Institutional requirements and plans to meet them     Provider requirements and plans to meet them  Facilities     Potential scope or scale of use  Legal Requirements  Potential secondary users  Storage, backup, replication, and versioning   Audience  Intellectual Property Rights Entities who hold property rights    Cost of permanent archiving     File organization Access charges  Data Organization [if complex] Disclosure limitation Cost of preparing data and documentation Budget  Embargo periods   Quality assurance procedures for metadata and documentation   Access and Sharing  Planned documentation and supporting materials  Data description  Ethics and privacy  Informed consent  Protection of privacy  Other ethical issues Restrictions on use Responsibility  Adherence  When will adherence to data management plan be checked or demonstrated  Who is responsible for managing data in the project  Who is responsible for checking adherence to data management plan Treatment of field notes, and collection records 72  Individual or project team role responsible for data management Managing Confidential Data v. 24 (January IAP Session 1)
  • 73. Data Management Consulting: Libraries @MIT  The libraries can help:     Assist with data management plans Individual consultation/collaboration with researchers General workshops & guides Dissemination of public data through   DSpace@MIT Referrals to subject-based repositories For more information: libraries.mit.edu/guides/subjects/data-management/ <data-management@mit.edu> 73 Managing Confidential Data v. 24 (January IAP Session 1)
  • 74. Data Management Plans Examples (Summaries)         Law, policy, ethics Research design … Information security Example 1 The proposed research will involve a small sample (less than 20 subjects) recruited from clinical facilities in the Disclosure New York City area with Williams syndrome. This rare craniofacial disorder is associated with distinguishing facial limitation features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data. Example 2 The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed. Example 3 This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts. https://ssl.isr.umich.edu/hrs/ User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction of the data after analyses are completed, reporting responsibilities, restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource. Registered users will receive user support, as well as information related to errors in the data, future releases, workshops, and publication lists. The information provided to users will not be used for commercial purposes, and will not be redistributed to third parties. FROM NIH, [grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex] 74 Managing Confidential Data v. 24 (January IAP Session 1)
  • 75. External Data Usage Agreements  Between provider and individual     Between provider and University    Careful – you‟re liable University will not help if you sign University review and OSP signature strongly recommended University Liable Requires University Approved Signer : OSP Avoid nonstandard protections whenever possible   75  Sample Standardized DUA Response; Controls on Confidential Information Harvard University has developed extensive technical and administrative procedures to be used with all identified personal information and other confidential information. The University classifies this form of information internally as "Harvard Confidential Information" -- or HCI. Any use of HCI at Harvard includes the following safeguards: - Systems security. Any system used to store HCI is subject to a checklist of technical and procedural security measures including: operating system and applications must be patched to current security levels, a host based firewall is enabled, anti virus software is enabled and the definitions file is current - Server security. Any server used to distribute HCI to other systems (e.g. through providing remote file system), or otherwise offering login access, must employ additional security measures including: connection through a private network only; limitation on length of idle sessions; limitations on incorrect password attempts; and additional logging and monitoring. - Access restriction: an individual is allowed to access HCI only if there is a specific need for access. All access to HCI is over physically controlled and/or encrypted channels. - Disposal processes: including secure file erasure and document destruction. - Encryption: HCI must be strongly encrypted whenever it is transmitted across a public network, stored on a laptop, or stored on a portable device such as a flash drive or on portable media. This is only a brief summary. The full University security policy can be found here: http://security.harvard.edu/heisp And a more detailed checklist used to verify systems compliance is found here: http://security.harvard.edu/files/resources/forms/ DUA‟s can impose very specific and detailed requirements These safeguards are applied consistently throughout the university, we believe that these requirements offer stringent protection for the requested data. And Compatible in spirit does not these requirements will be applied in addition to any others required by specific apply compatibility in legal data use agreement. practice Managing Confidential Data v. 24 (January IAP Session 1) Use higher level controls,
  • 76. Checklist: Research Design … Law, policy, ethics Research design … Information security         Disclosure Does research involve human subjects? limitation What are possible harms that could occur if identified information was disclosed? Is information collected benign, sensitive, very sensitive, or extremely sensitive? (IRB makes final determination) Can the sensitivity of the information be reduced? Can research be carried out with anonymity? Can research data be de-identified during collection? How can identifying information, descriptive information and sensitive information be segregated? Have you:   Have you written/verified the following to be consistent with final plans for analysis and dissemination:     Completed appropriate training data management plan? consent documents? application for certificate of confidentiality? If you are using third party/cloud storage or services on confidential information, have you:     Reviewed the Terms of Service Verified appropriate security certifications Used strong client-side encryption Planned to manage encryption-keys 76 Managing Confidential Data v. 24 (January IAP Session 1)
  • 77. Preliminary Recommendations Planning and methods  Review research design for sensitive identified information     Design research methods to reduce sensitivity       Separate sensitive/identifying information at collection, if feasible Link separate files using cryptographic hash of identifiers plus secret key; or cryptographic-strength random number Incorporate extra protections for on-line data collection      Recognize benefits of data sharing Ask for consent to share data appropriately Apply for a certificate of confidentiality where data is very sensitive Separate sensitive information   Eliminate sensitive/identifying information not needed for research questions Consider randomized response, list experiment design Design human subjects plan with information management in mind   Information which would cause harm if disclosed HIPAA identifiers Other indirectly identifying characteristics Use vendor agreements that specify anonymity and confidentiality protections Do not collect IP addresses if possible, regularly anonymize and purge otherwise Restrict display of very sensitive information in user interfaces Limit on-line collection of very sensitive information Using cloud services  77 Use strong client-side encryption for confidential data Managing Confidential Data v. 24 (January IAP Session 1)
  • 78. Resources Law, policy, ethics Research design … Information security        Disclosure E.A. Bankert & R.J. Andur, 2006, Institutional Review Board: limitation Management and Function, Jones and Bartlett Publishers R. Groves, et al., 2004, Survey Methodology, John Wiley & sons. J.A. Fox, P.E. Tracy, 1986, Randomized Response, Sage Publications. R.M. Lee, 1993, Doing Research on Sensitive Topics, Sage Publications. D. Corstange, 2009, "Sensitive Questions, Truthful Answers? Modeling the List Experiment with LISTIT", Political Analysis 17:45–63 ICPSR Data Enclave [www.icpsr.umich.edu/icpsrweb/ICPSR/access/restricted/encla ve] Murray Research Archive [www.murray.harvard.edu]  NIH Certificate of Confidentiality Kiosk Managing [grants.nih.gov/grants/policy/coc] Confidential Data 78 v. 24 (January IAP Session 1)
  • 79. Information Security Law, policy, ethics Research design … Information security Disclosure limitation      79 Security principles FISMA Categories of technical controls A simplified approach [Summary] Managing Confidential Data v. 24 (January IAP Session 1)
  • 80. Core Information Security Concepts Law, policy, ethics Research design … Information security  Security properties       Disclosure limitation Confidentiality Integrity Availability [Authenticity] [Nonrepudiation] Security practices     80 Defense in depth Threat modeling Risk assessment Vulnerability assessment Managing Confidential Data v. 24 (January IAP Session 1)
  • 81. Law, policy, ethics Risk Assessment Research design … Information security  [NIST 800-100, simplification of NIST 800-30]Disclosure limitation Threat Modeling Analysis - likelihood System Analysis - impact - mitigating controls Vulnerability Identification 81 Institute Selected Controls Testing and Auditing Information Security Control Selection Process Managing Confidential Data v. 24 (January IAP Session 1)
  • 82. Risk Management Details Law, policy, ethics Research design … Information security         System Characterization Threat Identification Control analysis Likelihood determination Impact Analysis Risk Determination Control recommendation Results documentation 82 Managing Confidential Data Disclosure limitation v. 24 (January IAP Session 1)
  • 83. Classes of threats and vulnerabilities Law, policy, ethics Research design … Information security  Sources of threat     Disclosure limitation Natural Unintentional Human Intentional Areas of vulnerability  Logical     Physical     Computer systems Network Backups, disposal, media Social    83 Data at rest in system Data in motion across networks Data being processed in applications Social engineering Mistakes Insider threats Managing Confidential Data v. 24 (January IAP Session 1)
  • 84. Law, policy, ethics Simple Control Model Information security Access Control Request/Resp onse Disclosure limitation Resource Auditing Authorization Credential s Authentication Client Research design … Log Resource Control Model External Auditor 84 Managing Confidential Data v. 24 (January IAP Session 1)
  • 85. Operational and Technical Controls [NIST 800-53] Law, policy, ethics Research design … Information security  Operational           Disclosure limitation Personnel security Physical and environmental protection Contingency planning Configuration management Maintenance System and information integrity Media protection Incident Response Awareness and training Technical Controls     85 Identification and authentication Access control Audit and accountability System and communication protection Managing Confidential Data v. 24 (January IAP Session 1)
  • 86. Law, policy, ethics Key Information Security Standards  Information security Disclosure limitation Comprehensive Information Security Standards     Research design … FISMA – framework for non-classified information security in federal government. ISO/IEC 27002 – framework of similar scope to FISMA, used internationally PCI – Payment card industry security standards. Used by major payment card companies, processors, etc. Related Certifications  FIPS-compliance and certification    SAEE 16/SAS 70 Audits – Type 2    Independent audit of controls and control objectives Does not establish sufficiency of control objectives CISSP -- Certified Information Systems Security Professional  86 Establishes standards for cryptographic methods and modules Be aware that FIPS-certification often limited to algorithm used, and not entire system Widely recognized certification for information security professionals Managing Confidential Data v. 24 (January IAP Session 1)
  • 87. Law, policy, ethics FISMA Overview Research design … Information security Disclosure limitation Federal Information Security Management Act of 2002  All federal agencies required to develop agency-wide information security plan  NIST published extensive list of recommendations  Federal sponsors seem to be trending to FISMA as best practice for managing confidential data produced by award  Identifies risk and impact level; monitoring; technical and procedural controls  MIT typical controls: less than FISMA “low” 87 Managing Confidential Data v. 24 (January IAP Session 1)
  • 88. Law, policy, ethics Security Control Baselines Research design … Information security Disclosure limitation Access Control Low (impact) Medium-High (impact), adds… Policies; Account management *; Access Enforcement; Unsuccessful Login Attempts; System Use Notification; Restrict Anonymous Access*; Restrict Remote Access*; Restrict Wireless Access*; Restrict Mobile Devices*; Restrict use of External Information Systems*; Restrict Publicly Accessible Content Information flow enforcement; Separation of Duties; Least Privilege; Session Lock Security Awareness and Training Policies; Awareness; Training; Training Records Audit and Accountability Policies; Auditable Events *; Content of Audit Records *; Storage Capacity; Audit Review, Analysis and Reporting *; Time Stamps *; Protection of Audit Information; Audit Record Retention; Audit Generation Audit Reduction; Non-Repudiation Security Assessment and Authorization Policies; Assessments* ; System Connections; Planning; Authorization; Continuous Monitoring 88 Managing Confidential Data v. 24 (January IAP Session 1)
  • 89. Security Control Baselines Law, policy, ethics Research design … Information security Disclosure limitation Configuration Management Low (impact) Medium-High (impact), adds… Policies; Baseline*; Impact Analysis; Settings*; Least Functionality; Component Inventory* Change Control; Access Restrictions for Change; Configuration Management Plan Contingency Planning Policies; Plan * ; Training *; Plan Testing*; System backup*; Recovery & Reconstitution * Alternate storage site; Alternate processing site; Telecomm Identification and Authentication Policies; Organizational Users*; Identifier Management; Authenticator Management *; Authenticator Feedback; Cryptographic Module Authentication; Non-Organizational Users Device identification and authentication Incident Response Policies; Training; Handling *; Monitoring; Reporting*; Response Assistance; Response Plan Testing Maintenance Policies; Control*; Non-Local Maintenance Tools; Maintenance scheduling/timeliness Restrictions*; Personnel Restrictions* 89 Managing Confidential Data v. 24 (January IAP Session 1)
  • 90. Security Control Baselines Law, policy, ethics Research design … Information security Disclosure limitation Media Protection Low (impact) Medium-High (impact), adds… Policies; Access restrictions*; Sanitization Marking; Storage; Transport Physical and Environmental Protection Policies; Access Authorizations; Access Control*; Monitoring*; Visitor Control *; Records*; Emergency Lighting; Fire protection*; Temperature, Humidity, water damage*; Delivery and removal Network access control; Output device Access control; Power equipment access, shutoff, backup; Alternate work site; Location of information system components; information leakage Planning Policies, Plan, Rules of Behavior; Privacy Impact Assessment Activity planning Personnel Security Policies; Position categorization; Screening; Termination; Transfer; Access Agreements; ThirdParties; Sanctions Risk Assessment Policies; Categorization Assessment; Vulnerability Scanning* 90 Managing Confidential Data v. 24 (January IAP Session 1)
  • 91. Security Control Baselines Law, policy, ethics Research design … Information security Disclosure limitation System and Services Acquisition Low (impact) Medium-High (impact), adds… Policies; Resource Allocation; Life Cycle Support; Acquisition*; Documentation; Software usage restrictions; User installed software restrictions; External information System Services restrictions Security Engineering; Developer configuration management; Developer security testing; supply chain protection; Trustworthiness System and Communications Protection Policies; Denial of Service Protection; Boundary protection*; Cryptographic key Management; Encryption; Public Access Protection; Collaborative computing devices restriction; Secure Name resolution* Application Partitioning; Restrictions on Shared Resources; Transmission integrity & confidentiality; Network Disconnection Procedure; Public Key Infrastructure Certificates; Mobile Code management; VOIP management; Session authenticity; Fail in known state; Protection of information at rest; Information system partitioning System and Information Integrity Policies, Flaw remediation*; Malicious code protection*; Security Advisory monitoring*; Information output handling Information system monitoring; Software and information integrity; Spam protection; Information input restrictions & validation; Error handling Program Management Plan; Security Officer Role; Resources; Inventory; Performance Measures; Enterprise architecture; Risk management strategy; Authorization process; Mission definition 91 Managing Confidential Data v. 24 (January IAP Session 1)
  • 92. HIPAA Requirements  Administrative controls        Physical controls      Access authorization, establishment, modification, and termination. Training program Vendor compliance Disaster recovery Internal audits Breach procedures Disposal Access to equipment Access to physical environment Workstation environment Technical controls       92 Intrusion protection Network encryption Integrity checking Authentication of communication Configuration management Risk analysis Managing Confidential Data v. 24 (January IAP Session 1)
  • 93. Law, policy, ethics Delegating Systems Security Research design … Information security Disclosure limitation      What are goals for confidentiality, integrity, availability? What threats are envisioned? What controls are in place? Is there a checklist? Who is responsible for technical controls?   Who is responsible for procedural controls?   Have they received appropriate training? How is security monitored, audited, and tested?   Do they have appropriate training, experience and/or certification? E.g. SSAE 16/SAS Type -2 Audits; FISMA Compliance; ISO Certification What security standards are referenced?  93 E.g. FISMA, ISO, HEISP/HDRSP/PCI Managing Confidential Data v. 24 (January IAP Session 1)
  • 94. What most security plans do not do Law, policy, ethics Research design … Information security     Disclosure limitation Protect against all insider threats Protect against all unintentional threats (human error, voluntary disclosure) Protect against the NSA, CIA, TEMPEST, evil maids, and other well-resourced, sophisticated adversaries Protect against prolonged physical threats to computer equipment, or data owner 94 Managing Confidential Data v. 24 (January IAP Session 1)
  • 95. Information Security is Systemic Law, policy, ethics Research design … Information security Disclosure limitation Not just control implementation but…  Policy creation, maintenance, auditing  Implementation review, auditing, logging, monitoring  Regular vulnerability & threat assessment 95 Managing Confidential Data v. 24 (January IAP Session 1)
  • 96. Simplified Approach for Sensitive Data Law, policy, ethics Research design … Information security Disclosure limitation       Systematically identify confidential information Use whole-disk/media encryption to protect data at rest Use end-to-end encryption to protect data in motion Use core information hygiene to protect systems Scan for HRCI regularly Be thorough in disposal of information Very sensitive/extremely sensitive data requires more protection. 96 Managing Confidential Data v. 24 (January IAP Session 1)
  • 97. Recommendation: Law, policy, ethics Identifying Confidential Information Research design … Information security  Does it contain social security numbers, financial Disclosure limitation id‟s, credit card #‟s, driver‟s license #‟s, other federal id‟s?   Is information completely anonymous?    Not confidential Does it contain private information from research on people?    Treat as confidential  Treat as confidential Does it contain private information about named/identified students that does not appear in the directory?  Treat as confidential  Would its release put individuals or organizations at significant risk of criminal or civil liability, or be damaging to financial standing, employability, or reputation?   Treat as confidential 97 Managing Confidential Data v. 24 (January IAP  Session 1)
  • 98. Recommended: Core information hygiene  Computer setup       Disclosure limitation And keep it updated Install all operating system and application security updates Server Setup      Information security Use a host-based firewall Strong credentials” Use a locking screen-saver Lock default/open accounts Regularly scan for sensitive information Update your software regularly   Research design … Use a virus checker   Law, policy, ethics Password guessing restrictions Idle session locking (or used on all client) No password retrieval Keep access logs Behavior  Don‟t share accounts or passwords  Don‟t use administrative accounts all the time Don‟t run programs from untrusted sources Don‟t give out your password to anyone Have a process for revoking user access when no longer needed/authorized (e.g. if user leaves university) Documented breach reporting procedure Users should have appropriate confidentiality training      98 Managing Confidential Data v. 24 (January IAP Session 1)
  • 99. Recommendation: Use Strong Credentials      These accounts should use other brute-force limiting techniques Much longer for protecting encrypted files/filesystems (16 characters) Not be common words, names, or anything easily guessed, No dates, no long sequences (>4) of numbers Not shared across accounts or people Never stored unencrypted Better   Lots of entropy, Hard for a person to guess, Easy to remember One method for choosing good passwords…    Use complete sentences phrases with punctuation AND abbreviate AND substitute characters – but not schematically Disclosure limitation Store passwords in a manner that can‟t be retrieved Store hashes Force password change of automatically generated passwords Protect against password guessing Brute-force guessing of a file of 8 charact alphanumeric passwords takes less than $ dollars of compute time! (12 character complex passwords > $1,000,000) [Campbell 2009] Protect encrypted password files Monitor failed login attempts Even better  Use multifactor identification where possible   AND careful thought to recovery mechanism, should additional factor be lost Use PASSWORD manager software    Information security At least 8 characters long for login/account passwords Servers and Applications   Research design … Strong Passwords, at minimum   Law, policy, ethics Allows for long random passwords, escrow keys etc Be very careful with the master Key length   99 Private key encryption >= 256 bits Public key >= 2048 bits Managing Confidential Data v. 23 (7/18/2013)
  • 100. Recommended: Safe storage Law, policy, ethics Research design … Information security   Extremely sensitive information/ must be stored only in carefully hardened designated systems Sensitiveinformation may be stored on:    Managed File Services OR Other approved server OR On desktop/laptop/USB if…   100 Disclosure limitation Encrypted Good information hygiene Managing Confidential Data v. 24 (January IAP Session 1)
  • 101. Law, policy, ethics Recommended: Safe use information Research design … Information security Disclosure limitation  Who can use Confidential Information?  Education records   MIT business related sensitive informationI     Sharing must be approved by appropriate head of lab, etc. Research confidential information   Can be shared with other staff & faculty who have a specific need Sharing must be approved by PI of research project. PI must have data sharing policy approved by IRB. It is your responsibility to notify the recipient of their responsibility to protect confidentiality. What to do?     101 Practice good information hygiene Always Encrypt Scan for vulnerabilities and sensitive information Use Strong Passwords Managing Confidential Data v. 24 (January IAP Session 1)
  • 102. Law, policy, ethics Use encryption Research design … Information security    Encrypt HCI files if they are not stored on an approved server Encrypt whole disks for all laptops and portable storage” Encrypt network transmissions:      Disclosure limitation Presenting your password Transmitting sensitive files Remotely accessing confidential information Do not send unencrypted files through mail – even if they are “passworded” Preferred tools  Remote access generally:    VPN ssh File transfer    102 Winzip (must enable encryption) Secure FTP (Secure FX, Expandrive, etc.) PGP Encrypted mail Managing Confidential Data v. 24 (January IAP Session 1)
  • 103. Safe disposal Law, policy, ethics Research design … Information security  Shred Paper, tapes, cd‟s:    Use a cross-cut shredder Place in Harvard-designated, locked, shredder bin Erase confidential files when they are no longer needed       Disclosure limitation Information can hide in files, even if not visible Unless encrypted filesystem was used, unerased information is vulnerable Even where encrypted filesystem used, vulnerabilities heightened if key length is short Files can remain on disk, even if not visible Use an approved secure file eraser such as PGP SHRED Ask IT support when transferring, donating, or disposing of computers or portable storage  Information can hide in obscure places on an a computer.   Files on SSD may not be erased until later… Information on standard disks may be preserved in “bad blocks”, etc. IT Support has special equipment to securely and complete erase disks.  103 Computers can then be safely reused. Managing Confidential Data  v. 24 (January IAP Session 1)
  • 104. Safe Sharing One size doesn’t fit all… Law, policy, ethics Research design … Information security Disclosure limitation External dissemination Approaches:  De-identify + publish  Partial de-identify + DUA  Deposit in “virtual enclave” Trade-offs  Decreasing breadth of dissemination  Decreasing risk, increasing utility  Increasing continuing 104 maintenance costs Managing Confidential Data v. 24 (January IAP Session 1)
  • 105. Safe Sharing Collaboration  Institutionally Hosted Trusted Remote Computing Environment     Information security Disclosure limitation Example: RCE Limitations: Multi-institutional Collaboration Example: AWS for Gov – FISMA Container Limitations: IT deployment and ongoing maintenance costs; cloud services due diligence for certification, contract, breach response, etc Non-encrypted transmission + local encryption layer      Research design … DIY Trusted Remote Environment   Law, policy, ethics Encrypted Files {Winzip, PGP file, XXX} + {DropBox, Box, Google-Drive, E-mail} Encrypted Filesystem: {Box, DropBox, XXX} + {FileVault,TrueCrypt} Encrypted P2P : Bittorrent Sync Limitations: Key distribution; access revocation; key sharing, simultaneous updating (except for P2P) Zero-Knowledge Magic   105 Spider-Oak Hive; BoxCryptor; Sync.Com Limitations: enterprise key management features; lack of24 (January IAP Managing Confidential Data v. transparency/auditing/certification Session 1)
  • 106. Law, policy, ethics Plan Outline – Very Sensitive Data Research design … Information security  Protect very sensitive data on “target systems”  Extra physical, logical, administrative access control       Private network, not directly connected to public network Regular scans    Record keeping Limitations Lockouts Extra monitoring, auditing Extra procedural controls – specific, renewed approvals Limits on network connectivity   Disclosure limitation Vulnerability scans Scans for PII Extremely sensitive   106 Increased access control, procedural limitations Not physically/logically connected (even via wireless) to public network, directly or indirectly Managing Confidential Data v. 24 (January IAP Session 1)
  • 107. Key Concepts Review Law, policy, ethics Research design … Information security Disclosure limitation           Confidentiality Integrity Availability Threat modeling Vulnerability assessment Risk assessment Defense in depth Logical Controls Physical Controls Administrative Controls 107 Managing Confidential Data v. 24 (January IAP Session 1)
  • 108. Checklist: Identify Requirements Law, policy, ethics Research design … Information security  Documented information security plan?           Extra logical, administrative, physical controls for very sensitive data? Monitoring and vulnerability scanning for very sensitive data? Check requirements for remote and foreign data collection Refer to security standards      Use whole-disk/media encryption to protect data at rest Use end-to-end encryption to protect data in motion Use basic information hygiene to protect systems Be thorough in disposal of information Use strong credentials Additional protections for sensitive data   What are goals for confidentiality, integrity, availability? What threats are envisioned? What are the broad types of controls in place? Key protections   Disclosure limitation FIPS encryption FISMA / ISO practices SAEE -16/SAS-70 Auditing CISSP certification of key staff Delegate implementation to information security professionals 108 Managing Confidential Data v. 24 (January IAP Session 1)
  • 109. Preliminary Recommendations Information Security   Use FISMA as a reference/dictionary of possible baseline controls Document:      Delegate implementation to IT professionals Refer to standards   Protection goals Threat models Types of controls Gold standards: FISMA / ISO practices, SAS-70 Auditing, CISSP certification of key staff Strongly recommended controls    Use whole-disk/media encryption to protect data at rest Use end-to-end encryption to protect data in motion Use core information hygiene to protect systems  Use a virus checker, and keep it updated Use a host-based firewall Update your software regularly Install all operating system and application security updates  Don‟t share accounts or passwords  Don‟t use administrative accounts all the time Don‟t run programs from untrusted sources Don‟t give out your password to anyone        Scan for sensitive informaton regularly Be thorough in disposal of information   109 Use secure file erase tools when disposing of files Use secure disk erase tools when disposing/repurposing disks Managing Confidential Data v. 24 (January IAP Session 1)
  • 110. Preliminary Recommendations Very Sensitive/Extremely Sensitive Information security  Protect very sensitive data on “target systems”  Extra physical, logical, administrative access control       Extra monitoring, auditing Extra procedural controls – specific, renewed approvals Limits on network connectivity    Private network, not directly connected to public network Consider multi-factor identification Regular scans    Record keeping Limitations Lockouts Vulnerability scans Scans for PII Extremely sensitive   110 Increased access control, procedural limitations Not physically/logically connected (even via wireless) to public network, directly or indirectly Managing Confidential Data v. 24 (January IAP Session 1)
  • 111. Resources Law, policy, ethics Research design … Information security Disclosure limitation S. Garfinkel, et al. 2003, Practical Unix and Internet Security, 3rd ed. , O‟Reilly Media  Shon Harris, 2001, CISSP All-in-One Exam Guide, Osborne  NIST, 2009, DRAFT Guide to Protecting the Confidentiality of Personally Identifiable Information, Nist Publication 800-122.  NIST, 2009, Recommended Security Controls for Federal Information Systems and Organizations v. 3, NIST 800-53. (Also see related NIST 800-53A, and other NIST Computer Security Division Special Publications) [csrc.nist.gov/publications/PubsSPs.html]  NIST, 2006, Information Security Handbook: A Guide 111 Managing Confidential 800-100. v. 24 (January IAP for Managers, NIST PublicationData Session 1) 
  • 112. Recommended Software Law, policy, ethics Research design … Information security          Vulnerability scanner/assessment tool: www.nessus.org/nessus Commercial version scans for (limited) PII: www.nessus.org/nessus PII Scanning tool (open source), Cornell Spider: www2.cit.cornell.edu/security/tools PII Scanning tool (commercial), Identity Finder: www.identityfinder.com File integrity/intrusion detection engine, Samhain: la-samhna.de/samhain Network intrusion detection, Snort: www.snort.org Encrypt transmission over network     Open Source: truecrypt.org Commercial: pgp.com Scanning   Disclosure limitation Whole Disk Encryption Open SSL: http://openssl.org Open SSH: http://openssh.org VTUN: http://vtun.sourceforge.net Cloud backup services with encryption    112 Crashplan: http://crashplan.com Spider oak: http://spideroak.com Backblaze: http://backblaze.com Managing Confidential Data v. 24 (January IAP Session 1)
  • 113. Disclosure Limitation Law, policy, ethics Research design … Information security        113 Disclosure limitation Threat models Disclosure limitation methods Statistical disclosure limitation methods Types of disclosure Factors affecting disclosure protection SDL Caveats SDL Observations Managing Confidential Data v. 24 (January IAP Session 1)
  • 114. Law, policy, ethics Threat Models Research design … Information security      Nosy neighbor (nosy employer) Muck-raking Journalist (zero-tolerance) Business rival contributing to same survey Absent-minded professor … 114 Managing Confidential Data Disclosure limitation v. 24 (January IAP Session 1)

Hinweis der Redaktion

  1. This work. Managing Confidential information in research, by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This talk discusses findings from this survey, common gaps, and trends in this area.(I also have a little fun highlighting the hidden assumptions underlying Amazon Glacier&apos;s reliability claims. For more on that see this earlier post: http://drmaltman.wordpress.com/2012/11/15/amazons-creeping-glacier-and-digital-preservation )
  3. - Institutions may give “limited assurance” of common rule compliance – just for funded research. Most give “general assurance”