Anonymizing Health Data

Slide deck from the O'Reilly webcast on the "Anonymizing Health Data" book

  1. Anonymizing Health Data Webcast: Case Studies and Methods to Get You Started. Khaled El Emam & Luk Arbuckle
  2. Part 1 of Webcast: Intro and Methodology. Part 2 of Webcast: A Look at Our Case Studies. Part 3 of Webcast: Questions and Answers.
  3. Part 1 of Webcast: Intro and Methodology
  4. To Anonymize or Not to Anonymize: Consent needs to be informed. Not all health care providers are willing to share their patients’ PHI. Anonymization allows for the sharing of health information. There is a compelling financial case: a breach costs ~$200 per patient. Patients engage in privacy-protective behaviors.
  11. Masking Standards: Applied to fields such as first name, last name, and SSN. Distortion of data: masked fields cannot be used for analytics. Techniques: creating pseudonyms; removing a whole field; replacing actual values with random ones.
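To make the three masking techniques concrete, here is a minimal Python sketch. It assumes a record stored as a dict with hypothetical field names (`first_name`, `last_name`, `ssn`, `age`) and uses a keyed hash (HMAC-SHA256 from the standard library) as one possible way to create pseudonyms; the deck does not say which pseudonymization method is used.

```python
import hashlib
import hmac
import random

# Hypothetical secret key; in practice it would be generated randomly and
# kept by the data custodian, never released with the data.
SECRET_KEY = b"example-secret-key"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Creating pseudonyms: the same input always yields the same pseudonym,
    but the original value cannot be recomputed without the key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability

def mask_record(record: dict, rng: random.Random) -> dict:
    """Apply one masking technique to each direct identifier."""
    masked = dict(record)
    masked["ssn"] = pseudonymize(record["ssn"])  # creating a pseudonym
    masked.pop("last_name", None)                # removing a whole field
    # Replacing an actual value with a random one.
    masked["first_name"] = rng.choice(["Alex", "Jordan", "Robin", "Sam"])
    return masked

patient = {"first_name": "Jane", "last_name": "Doe", "ssn": "123-45-6789", "age": 55}
print(mask_record(patient, random.Random(42)))
```

A keyed hash is used rather than a plain hash because identifiers such as SSNs come from a small, structured space: an unkeyed hash could be reversed by hashing every possible value. The `age` field is left untouched, since masking targets direct identifiers only.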
  17. De-identification Standards: Applied to fields such as age, sex, race, address, and income. Minimal distortion of data, so it can still be used for analytics. Example: Safe Harbor in the HIPAA Privacy Rule.
  21. Privacy Rule Safe Harbor: What’s “Actual Knowledge”? Information, alone or in combination, that could identify an individual. It has to be specific to the data set, not theoretical. Example: an occupation of “Mayor of Gotham”.
  25. De-identification Standards (continued): Safe Harbor in the HIPAA Privacy Rule relies on heuristics, or rules of thumb. The HIPAA Privacy Rule also provides a statistical method.
  27. De-identification Myths: Myth: It’s possible to re-identify most, if not all, data. Reality: using robust methods, evidence suggests the risk can be very small. Myth: Genomic sequences are not identifiable, or are easy to re-identify. Reality: in some cases they can be re-identified, and they are difficult to de-identify using our methods.
  32. A Risk-based De-identification Methodology: The risk of re-identification can be quantified. The Goldilocks principle: balancing privacy with data utility. The re-identification risk needs to be very small. De-identification involves a mix of technical, contractual, and other measures.
  38. Steps in the De-identification Methodology: Step 1: Select Direct and Indirect Identifiers. Step 2: Setting the Threshold. Step 3: Examining Plausible Attacks. Step 4: De-identifying the Data. Step 5: Documenting the Process.
  39. Step 1: Select Direct and Indirect Identifiers. Direct identifiers: name, telephone number, health insurance card number, medical record number. Indirect identifiers, or quasi-identifiers: sex, date of birth, ethnicity, locations, event dates, medical codes.
  42. Step 2: Setting the Threshold. The threshold is the maximum acceptable risk for sharing data, and it needs to be quantitative and defensible. Questions to ask: Is the data going to be in the public domain? What is the extent of the invasion of privacy when the data is shared?
  47. Step 3: Examining Plausible Attacks. The recipient deliberately attempts to re-identify the data. The recipient inadvertently re-identifies the data (“Holy Smokes, I know her!”). A data breach occurs at the recipient’s site (“data gone wild”). An adversary launches a demonstration attack on the data.
  52. Step 4: De-identifying the Data. Generalization: reducing the precision of a field, e.g., dates converted to month/year, or year. Suppression: replacing a cell with NULL, e.g., for a unique 55-year-old female in a birth registry. Sub-sampling: releasing a simple random sample, e.g., 50% of the data set instead of all the data.
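The three Step 4 operations can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: records are dicts with hypothetical fields, suppression blanks every quasi-identifier of a record whose combination appears fewer than `k` times (a cruder rule than the cell-level choices a real tool would make), and `k` is an illustrative parameter, not a threshold from the deck.

```python
import random
from collections import Counter
from datetime import date

def generalize_date(d: date, level: str) -> str:
    """Generalization: reduce a date to month/year or to year."""
    if level == "month":
        return f"{d.month:02d}/{d.year}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level: {level}")

def suppress_rare(records: list, qi_fields: list, k: int) -> list:
    """Suppression: replace the QI cells with None (NULL) in any record
    whose QI combination occurs fewer than k times."""
    counts = Counter(tuple(r[f] for f in qi_fields) for r in records)
    released = []
    for r in records:
        r = dict(r)
        if counts[tuple(r[f] for f in qi_fields)] < k:
            for f in qi_fields:
                r[f] = None
        released.append(r)
    return released

def subsample(records: list, fraction: float, seed: int = 0) -> list:
    """Sub-sampling: release a simple random sample of the records."""
    return random.Random(seed).sample(records, round(len(records) * fraction))

records = [
    {"age": 30, "sex": "F", "visit": date(2010, 4, 12)},
    {"age": 30, "sex": "F", "visit": date(2010, 9, 2)},
    {"age": 55, "sex": "F", "visit": date(2010, 1, 20)},  # unique 55-year-old female
]
for r in records:
    r["visit"] = generalize_date(r["visit"], "month")
released = suppress_rare(records, ["age", "sex"], k=2)
print(released[2])  # {'age': None, 'sex': None, 'visit': '01/2010'}
print(len(subsample(released, 0.5)))
```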
  56. Step 5: Documenting the Process. Process documentation: a methodology text. Results documentation: data set, risk thresholds, assumptions, evidence of low risk.
  59. Measuring Risk Under Plausible Attacks: T1: Deliberate attempt: Pr(re-id, attempt) = Pr(attempt) × Pr(re-id | attempt). T2: Inadvertent attempt (“Holy Smokes, I know her!”): Pr(re-id, acquaintance) = Pr(acquaintance) × Pr(re-id | acquaintance). T3: Data breach (“data gone wild”): Pr(re-id, breach) = Pr(breach) × Pr(re-id | breach). T4: Public data (demonstration attack): Pr(re-id), based on the data set only.
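The attack formulas share one shape: the probability that the attack happens, times the probability of re-identification given that it happens. A minimal sketch follows. The context probabilities and the data-level risk in the usage lines are hypothetical placeholders, not values from the deck, and for T4 we set Pr(T) to 1 because only the data set itself is considered.

```python
def joint_risk(pr_attack: float, pr_reid_given_attack: float) -> float:
    """Pr(re-id, T) = Pr(T) x Pr(re-id | T)."""
    return pr_attack * pr_reid_given_attack

def risks_by_attack(attack_probs: dict, pr_reid_given_attack: float) -> dict:
    """Joint risk for each plausible attack, given the data-level risk."""
    return {name: joint_risk(p, pr_reid_given_attack)
            for name, p in attack_probs.items()}

# Hypothetical context probabilities for a non-public release (T1-T3).
attacks = {"T1 deliberate": 0.3, "T2 acquaintance": 0.5, "T3 breach": 0.2}
risks = risks_by_attack(attacks, pr_reid_given_attack=0.15)
threshold = 0.1
print(risks)
print(all(r <= threshold for r in risks.values()))  # True

# T4, public data: no context to lower the risk, so Pr(T) is taken as 1.
print(joint_risk(1.0, 0.15) <= threshold)  # False
```

The same data-level risk can pass for a controlled release and fail for a public one, which is why the release context matters when choosing how much to de-identify.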
  64. Choosing Thresholds: Many precedents going back multiple decades. Recommended by regulators. All based on maximum risk, though.
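Because thresholds are stated in terms of maximum or average risk, it may help to see how both can be computed from equivalence classes (groups of records sharing the same quasi-identifier values). The sketch uses the common formulation in which a record's re-identification probability is 1 divided by the size of its class, assuming the adversary knows the target is in the data; that formulation, the field names, and the toy records are our assumptions.

```python
from collections import Counter

def class_sizes(records: list, qi_fields: list) -> Counter:
    """Count the records in each equivalence class (same QI values)."""
    return Counter(tuple(r[f] for f in qi_fields) for r in records)

def max_risk(records: list, qi_fields: list) -> float:
    """Maximum risk: 1 / size of the smallest equivalence class."""
    return 1 / min(class_sizes(records, qi_fields).values())

def average_risk(records: list, qi_fields: list) -> float:
    """Average risk: the mean of 1 / (class size) over all records,
    which simplifies to (number of classes) / (number of records)."""
    return len(class_sizes(records, qi_fields)) / len(records)

records = (
    [{"birth_decade": "1970s", "sex": "F"}] * 5
    + [{"birth_decade": "1970s", "sex": "M"}] * 3
    + [{"birth_decade": "1980s", "sex": "F"}] * 2
)
qis = ["birth_decade", "sex"]
print(max_risk(records, qis))      # 0.5
print(average_risk(records, qis))  # 0.3
```

Average risk can never exceed maximum risk, so a threshold applied to maximum risk is the stricter of the two for the same numeric value.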
  69. Part 2 of Webcast: A Look at Our Case Studies
  70. Cross-Sectional Data: Research Registries. Better Outcomes Registry & Network (BORN) of Ontario: 140,000 births per year. Cross-sectional: mothers are not traced over time. The case study follows the process of getting de-identified data from a research registry.
  76. Researcher Ronnie wants data! 919,710 records from 2005-2011.
  79. Choosing Thresholds: An average risk of 0.1 for Researcher Ronnie (and the data he specifically requested). The threshold would be 0.05 if there were highly sensitive variables (congenital anomalies, mental health problems).
  82. Measuring Risk Under Plausible Attacks: T1: Deliberate attempt: low motives and capacity; low mitigating controls; Pr(attempt) = 0.4. T2: Inadvertent attempt (“Holy Smokes, I know her!”): 119,785 births out of 4,478,500 women (= 0.027), so Pr(acquaintance) = 1 - (1 - 0.027)^(150/2) = 0.87. T3: Data breach (“data gone wild”): based on historical data, Pr(breach) = 0.27. T4: Public data (demonstration attack): listed, but not part of the overall risk for this release. Overall risk: Pr(re-id, T) = Pr(T) × Pr(re-id | T) ≤ 0.1. For the inadvertent attempt: Pr(re-id, acquaintance) = 0.87 × Pr(re-id | acquaintance) ≤ 0.1.
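The BORN numbers can be reproduced with a few lines of Python. We read the exponent 150/2 as about 150 acquaintances, roughly half of them women, and we size the data-level requirement on the largest of the three context probabilities, which matches the slide's focus on the acquaintance attack; both are our assumptions rather than statements from the deck.

```python
def pr_acquaintance(prevalence: float, acquaintances: float) -> float:
    """Chance that a recipient knows at least one person in the data:
    1 - (1 - p)^m, treating each acquaintance as an independent draw."""
    return 1 - (1 - prevalence) ** acquaintances

p = 119_785 / 4_478_500             # births in the data / women in the population
pr_acq = pr_acquaintance(p, 150 / 2)
pr_attempt, pr_breach = 0.4, 0.27   # values from the BORN slides
threshold = 0.1                     # average-risk threshold for Researcher Ronnie

# The largest context probability decides how small Pr(re-id | T) must be.
worst = max(pr_attempt, pr_acq, pr_breach)
max_data_risk = threshold / worst
print(f"p = {p:.3f}, Pr(acquaintance) = {pr_acq:.2f}")  # p = 0.027, Pr(acquaintance) = 0.87
print(f"Pr(re-id | T) must be at most {max_data_risk:.3f}")
```

Under these assumptions, the data set itself must be de-identified until its average risk is a little under 0.12, since 0.1 / 0.87 is about 0.115.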
  93. De-identifying the Data Set. Meeting Thresholds: k-anonymity (figure). Generalization schemes shown: MDOB in 1-year intervals, BDOB in week/year, MPC of 1 character; MDOB in 10-year intervals, BDOB in quarter/year, MPC of 3 characters; MDOB in 10-year intervals, BDOB in month/year, MPC of 3 characters.
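The second BORN scheme can be sketched as three small generalization functions. We read MDOB, BDOB, and MPC as the mother's date of birth, the baby's date of birth, and the mother's postal code; that reading, the alignment of the 10-year bands on decades, and the sample record are assumptions made for illustration.

```python
from datetime import date

def year_band(d: date, width: int) -> str:
    """Generalize a date to a band of `width` years, e.g. 1970-1979."""
    start = d.year - d.year % width
    return f"{start}-{start + width - 1}"

def quarter_year(d: date) -> str:
    """Generalize a date to quarter/year, e.g. Q2/2008."""
    return f"Q{(d.month - 1) // 3 + 1}/{d.year}"

def postal_prefix(code: str, chars: int) -> str:
    """Keep only the first `chars` characters of a postal code."""
    return code.replace(" ", "")[:chars]

def scheme_10yy_qtr_3char(record: dict) -> dict:
    """MDOB in 10-year bands, BDOB in quarter/year, MPC of 3 characters."""
    return {
        "MDOB": year_band(record["MDOB"], 10),
        "BDOB": quarter_year(record["BDOB"]),
        "MPC": postal_prefix(record["MPC"], 3),
    }

# Hypothetical record; the field names follow the slide's abbreviations.
print(scheme_10yy_qtr_3char(
    {"MDOB": date(1978, 6, 3), "BDOB": date(2008, 5, 17), "MPC": "A1B 2C3"}
))  # {'MDOB': '1970-1979', 'BDOB': 'Q2/2008', 'MPC': 'A1B'}
```

The generalized records would then be grouped on these three fields to check whether the scheme meets the 0.1 average-risk threshold.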
  99. Year on Year: Reusing Risk Analyses. In 2006, Researcher Ronnie asks for the 2005 data (deleted). In 2007, he asks for the 2006 data (deleted). In 2008, he asks for the 2007 data (deleted). In 2009, he asks for the 2008 data (deleted). In 2010, he asks for the 2009 data. Can we use the same de-identification scheme every year?
  108. Year on Year: Reusing Risk Analyses (continued). BORN data pertains to very stable populations. There were no dramatic changes in the number or characteristics of births from 2005-2010. Revisit the de-identification scheme every 18 to 24 months. Revisit it if any new quasi-identifiers are added or changed.
  112. Longitudinal Discharge Abstract Data: State Inpatient Databases. Because a patient’s records are linked over time, the data needs to be de-identified differently.
  115. Meeting Thresholds: k-anonymity? (figure)
  118. De-identifying Under Complete Knowledge (figures)
  122. State Inpatient Database (SID) of California: Researcher Ronnie wants public data!
  126. Measuring Risk Under Plausible Attacks: T1: Deliberate attempt; T2: Inadvertent attempt (“Holy Smokes, I know her!”); T3: Data breach (“data gone wild”); T4: Public data (demonstration attack). For this public release: Pr(re-id) ≤ 0.09 (maximum risk).
  128. De-identifying the Data Set: BirthYear in 5-year intervals (cut at 1910 and earlier); AdmissionYear unchanged; DaysSinceLastService in 28-day intervals (cut at 7 and below, and at 182 and above); LengthOfStay generalized the same way as DaysSinceLastService.
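The SID scheme combines fixed-width bands with caps at the ends of the range. A minimal sketch, under our own assumptions: we read "cut at 1910-" as one bottom group for 1910 and earlier, and "cut at 7-, 182+" as one group for 7 days or less and one for 182 days or more; where the bands in between start (just above the bottom cut) is also our choice, and the function name is hypothetical.

```python
from typing import Optional

def band_with_caps(value: int, width: int, low: int,
                   high: Optional[int] = None) -> str:
    """Generalize an integer into bands of `width`, collapsing everything at
    or below `low` into a bottom group and, if `high` is given, everything
    at or above `high` into a top group."""
    if value <= low:
        return f"<={low}"
    if high is not None and value >= high:
        return f"{high}+"
    start = low + 1 + (value - low - 1) // width * width
    end = start + width - 1
    if high is not None:
        end = min(end, high - 1)  # the last band stops just below the top cut
    return f"{start}-{end}"

# BirthYear: 5-year bands, cut at 1910 and earlier.
print(band_with_caps(1905, 5, low=1910))  # <=1910
print(band_with_caps(1943, 5, low=1910))  # 1941-1945
# DaysSinceLastService (and LengthOfStay): 28-day bands, cut at 7 and 182.
for days in (3, 10, 180, 400):
    print(band_with_caps(days, 28, low=7, high=182))
```

Applying the same function, with the same parameters, to both DaysSinceLastService and LengthOfStay mirrors the slide's "same as" rule; AdmissionYear would simply be passed through unchanged.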
  131. Connected Variables: QI to QI: similar QIs? Apply the same generalization and suppression to both. QI to non-QI: the non-QI is revealing? Apply the same suppression so that both are removed.
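The two connected-variable rules can be expressed as small helpers. The field connections below are hypothetical: the LengthOfStay/DaysSinceLastService pair mirrors the SID scheme, while a postal code and a distance-to-hospital field stand in for a QI and a revealing non-QI. The structure is our sketch, not a description of any particular tool.

```python
from typing import Callable, Dict, List

# Hypothetical connections between fields.
QI_TO_QI: Dict[str, List[str]] = {"DaysSinceLastService": ["LengthOfStay"]}
QI_TO_NON_QI: Dict[str, List[str]] = {"PostalCode": ["DistanceToHospitalKm"]}

def generalize_connected(record: dict, qi: str, fn: Callable) -> dict:
    """QI to QI: apply the same generalization to a QI and its similar QIs."""
    out = dict(record)
    for field in [qi] + QI_TO_QI.get(qi, []):
        if out.get(field) is not None:
            out[field] = fn(out[field])
    return out

def suppress_connected(record: dict, qi: str) -> dict:
    """Suppress a QI along with its similar QIs and any revealing non-QIs,
    so the suppressed value cannot be inferred from a connected field."""
    out = dict(record)
    for field in [qi] + QI_TO_QI.get(qi, []) + QI_TO_NON_QI.get(qi, []):
        out[field] = None
    return out

def band_start(days: int) -> int:
    """Lower bound of a 28-day band."""
    return days // 28 * 28

record = {"DaysSinceLastService": 40, "LengthOfStay": 3,
          "PostalCode": "A1B", "DistanceToHospitalKm": 12.5}
print(generalize_connected(record, "DaysSinceLastService", band_start))
print(suppress_connected(record, "PostalCode"))
```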
  136. Other Issues Regarding Longitudinal Data: Date shifting: maintaining the order of records. Long tails: truncation of records. Adversary power: assumptions about the adversary’s knowledge.
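One simple way to shift dates while maintaining record order is to move all of a patient's dates by the same random offset. This is a sketch under our own assumptions: the per-patient offset, its ±365-day range, and the patient IDs are illustrative. A constant shift also preserves the exact gaps between visits, which may themselves be identifying, so the approach discussed in the book may differ.

```python
import random
from datetime import date, timedelta
from typing import Dict, List

def shift_dates(visits: Dict[str, List[date]], max_days: int = 365,
                seed: int = 0) -> Dict[str, List[date]]:
    """Shift every date of a patient by one random per-patient offset.

    Since all of a patient's dates move together, the order of that
    patient's records is maintained."""
    rng = random.Random(seed)
    shifted = {}
    for patient_id, dates in visits.items():
        offset = timedelta(days=rng.randint(-max_days, max_days))
        shifted[patient_id] = [d + offset for d in dates]
    return shifted

visits = {
    "patient-1": [date(2010, 1, 5), date(2010, 3, 1), date(2011, 7, 9)],
    "patient-2": [date(2009, 12, 31)],
}
print(shift_dates(visits))
```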
  140. Other Concerns to Think About: Free-form text: anonymization. Geospatial information: aggregation and geoproxy risk. Medical codes: generalization, suppression, and shuffling (yes, as in cards). Secure linking: linking data through encryption before anonymization.
  145. Part 3 of Webcast: Questions and Answers
  146. More Comments or Questions: Contact us! Khaled El Emam: kelemam@privacyanalytics.ca; Luk Arbuckle: larbuckle@privacyanalytics.ca
