Real world data engineering practices for GDPR
© 2019 Trend Micro Inc.
⚠️ Disclaimer
• Please treat this talk as a reference only
– Implementation details vary with business requirements
– It may not be suitable for every company
– You MUST reach a consensus with your legal department before implementing your data pipeline
What is GDPR?
General Data Protection Regulation
In effect since 2018/5/25
Protects the personal data of individuals in the EU
Strengthens the privacy rights of individuals in the EU
Key Changes
Increased Territorial Scope
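• Applies to all businesses collecting personal data on individuals in the EU
• Regardless of the company’s location
Breach Notification
• Report a breach to the supervisory authority within 72 hours of becoming aware of it
Penalties
• Up to €20M or 4% of global annual turnover, whichever is higher
• Google was fined €50M by the French regulator CNIL on 2019/1/21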
Highlighted Individual’s Rights
• Right to Access
• Right to Erasure
• Data Portability
• Privacy by Design
Topics
• Legal & Compliance
– Data Collection Declaration
– Data Categorization
• Security
– Anonymization
– Permission Control
– Data Encryption
• User's Rights
– Right to Access and Erasure
• Role & Responsibility
– Data Abuse Prevention
Data Collection Declaration
• Clearly declare the purposes in the Terms of Use
– What data will be sent?
• List all the categories
– Why is the data collected?
• Is it essential for the service?
– Obtain clear consent
• Check box for opt-in or opt-out
Data Categorization
• Definition of personal data
– Personally Identifiable Information (PII)
– Three categories: Non-PII, PII, and Sensitive-PII
• PII: name, account ID, email address, date of birth, gender, etc.
• Sensitive-PII: health data, sexual orientation, race, etc.
– Collecting Sensitive-PII is prohibited unless a specific legal exception applies
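The three categories become easier to enforce when every collected field carries an explicit tag. The sketch below is only an illustration: the field names, the `FIELD_CATEGORIES` table, and `validate_record` are hypothetical, and the real classification has to come from Legal.

```python
from enum import Enum


class Category(Enum):
    NON_PII = "non-pii"
    PII = "pii"
    SENSITIVE_PII = "sensitive-pii"


# Hypothetical classification table; the real one must be defined by Legal.
FIELD_CATEGORIES = {
    "detection_type": Category.NON_PII,
    "email": Category.PII,
    "date_of_birth": Category.PII,
    "health_data": Category.SENSITIVE_PII,
}


def validate_record(record: dict) -> dict:
    """Group field names by category; reject unclassified and Sensitive-PII fields."""
    grouped = {category: [] for category in Category}
    for field in record:
        if field not in FIELD_CATEGORIES:
            raise ValueError(f"unclassified field: {field}")
        category = FIELD_CATEGORIES[field]
        if category is Category.SENSITIVE_PII:
            raise ValueError(f"Sensitive-PII must not be collected: {field}")
        grouped[category].append(field)
    return grouped
```

Rejecting unclassified fields by default means a new field cannot reach the pipeline before someone has categorized it.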
It’s All About Compliance
• The definitions MUST be established by the Legal Department
• Build a review process into the development cycle
– A clear description of the data being collected
• Provided by the product team
– Legal reviews, approves, and archives it
– The clearer the document, the better the communication
Separated Databases
• De-identify analytical data
– Keep a clear separation between user data and analytical data
• No one should have access to both
– User data (user behavior and personal information)
• Purchase records, login records, etc.
– Analytical data (neutral logs)
• Detection logs, activity data, etc.
13. © 2019 Trend Micro Inc.13
Anonymization
• GDPR encourages pseudonymization; one way to apply it is a unified anonymous ID across all systems
– Stop using e-mail addresses or other personal information as the unique ID
– Avoid storing personal information in each service/application
• Reference it through a foreign key or a similar mechanism
14. © 2019 Trend Micro Inc.14
Anonymization (cont’d)
• How to de-identify an identifiable field?
– Use an irreversible encoding
– Simplest way: a one-way hash
• With or without salt?
• Refresh the salt or not?
– A refreshed salt gives the same user a new ID; ways to avoid double counting (e.g., in DAU and MAU)
• Synchronize the salt between client and server
• Use an unsalted one-way hash (or a fixed salt)
• Change the definition of “active”
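The salt trade-off can be sketched in a few lines. This is a minimal illustration, assuming HMAC-SHA-256 as the salted one-way hash (one possible choice, not something the slides prescribe); the function name `anonymous_id` and all salt values are hypothetical.

```python
import hashlib
import hmac


def anonymous_id(identifier: str, salt: bytes) -> str:
    """Return an irreversible, keyed SHA-256 digest of an identifiable field."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(salt, normalized, hashlib.sha256).hexdigest()


# Hypothetical salts: a fixed salt keeps IDs stable, a refreshed salt does not.
FIXED_SALT = b"example-fixed-salt"
JANUARY_SALT = b"example-salt-2019-01"
FEBRUARY_SALT = b"example-salt-2019-02"

# Same user, same salt: the IDs match, so DAU/MAU counts the user once.
stable_a = anonymous_id("Alice@example.com", FIXED_SALT)
stable_b = anonymous_id("alice@example.com ", FIXED_SALT)

# Same user, refreshed salt: the IDs differ, so the user is counted again
# on the other side of the refresh boundary.
rotated_a = anonymous_id("alice@example.com", JANUARY_SALT)
rotated_b = anonymous_id("alice@example.com", FEBRUARY_SALT)
```

A fixed salt preserves counting but also makes IDs linkable over time; a refreshed salt limits linkability at the cost of double counting, which is why the definition of “active” may need to follow the refresh period.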
Anonymization (cont’d)
• Where to de-identify a field?
– Ideally on the client side (before the data is sent)
– At the latest, in the very first step of the server-side ETL process
• The mapping table from anonymous IDs to identifiable data is treated as user data
• The de-identification operation MUST be isolated
Permission Control
• Set ACLs on each bucket
– Only a few users/service accounts can read
– Even fewer service accounts can write
• Human users must not have write permission
– These are the principles of permission control for analytical data
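The rule that human users never write to analytical buckets can be checked mechanically. The sketch below assumes ACL entries have already been exported into a simple in-memory form; `AclEntry` and `acl_violations` are hypothetical and do not correspond to any cloud provider's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AclEntry:
    principal: str
    kind: str        # "user" or "service"
    permission: str  # "read" or "write"


def acl_violations(entries: List[AclEntry]) -> List[AclEntry]:
    """Return the entries that give a human user write access."""
    return [e for e in entries if e.kind == "user" and e.permission == "write"]
```

Running such a check on a schedule turns the principle into something that can be audited.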
Limited Data Retention
• Data shouldn’t be kept “just in case”
• Periodically remove outdated data
– The retention period is set according to…
• Business value (application’s need)
• Data volume (cost)
• Other legal requirements
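Periodic removal is simple to automate when data is partitioned by date. The sketch below only selects what to delete; the function name and the assumption of date-based partitions are illustrative, and the retention period itself must come from the business and legal criteria listed above.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expired_partitions(
    partition_dates: Iterable[date], retention_days: int, today: date
) -> List[date]:
    """Return the partition dates that are older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)
```

Passing `today` explicitly keeps the function deterministic, which makes the purge job easy to test before it deletes anything.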
Data Encryption
• All data should be encrypted at rest and in transit
– Bucket-level encryption
– SSL/TLS connections
– Audit logs
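For the in-transit part, a client can refuse weak or unverified connections. This is a minimal sketch using Python's standard `ssl` module; the helper name `strict_tls_context` and the TLS 1.2 floor are assumptions, not requirements stated in the slides.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.2."""
    context = ssl.create_default_context()  # enables certificate and hostname checks
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to any standard-library client that accepts an `SSLContext`.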
Rights to Access and Erasure
• If the user and analytical databases are separated
– Just dump/delete the related records in the user database
• Otherwise
– It’s a big project…
The Design of User Database
• Dumping or deleting records from the user database is challenging
– Try not to put historical data in the user database (if you can)
– Try to concentrate personal data in a few tables
– Use a foreign key or a similar concept for storing “key information”
• Erasure then only has to mark the record in the main table as “removed”
– Consider the data export and deletion processes in the design phase
• Minimize the number of actions to take
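The design points above can be sketched with an in-memory SQLite database. The schema, table names, and the two helper functions are hypothetical; the point is only that personal data sits in one main table and everything else refers to it by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT,
        name    TEXT,
        removed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE licenses (
        license_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(user_id),
        product    TEXT NOT NULL
    );
    INSERT INTO users (user_id, email, name) VALUES (1, 'alice@example.com', 'Alice');
    INSERT INTO licenses (user_id, product) VALUES (1, 'edition-2019');
    """
)


def export_user(user_id: int) -> dict:
    """Right to access: personal data lives in one table, so one query dumps it."""
    row = conn.execute(
        "SELECT email, name FROM users WHERE user_id = ? AND removed = 0", (user_id,)
    ).fetchone()
    return {} if row is None else {"email": row[0], "name": row[1]}


def erase_user(user_id: int) -> None:
    """Right to erasure: blank the personal columns and mark the row as removed."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE users SET email = NULL, name = NULL, removed = 1 WHERE user_id = ?",
            (user_id,),
        )
```

Because `licenses` holds only the key, erasure is a single update and the dependent rows stay valid without carrying any personal data.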
Data Abuse Prevention
• Fulfill marketing’s requirements
– When you have to associate user data and analytical data
• To send promotional e-mail to inactive users
• To offer active users a discount when they purchase a new edition
– Do the association in the last step
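Deferring the association can be sketched as two separate functions, one per side of the data boundary. Both function names, the inactivity threshold, and the in-memory dictionaries are hypothetical stand-ins for the analytical store and the user-data mapping table.

```python
from typing import Dict, List, Set


def inactive_anonymous_ids(days_since_last_seen: Dict[str, int], threshold_days: int) -> Set[str]:
    """Analytical side: select inactive users by anonymous ID only."""
    return {
        anon_id
        for anon_id, days in days_since_last_seen.items()
        if days >= threshold_days
    }


def resolve_recipients(anon_ids: Set[str], id_to_email: Dict[str, str]) -> List[str]:
    """Last step, user-data side: map the selected anonymous IDs to e-mail addresses."""
    return sorted(id_to_email[a] for a in anon_ids if a in id_to_email)
```

All selection logic runs on anonymous IDs; only the final, minimal list of IDs crosses into the user-data side.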
Role & Responsibility
• There should be a Data Protection Officer (DPO) in each company (GDPR makes this mandatory for certain organizations)
– Organize a taskforce to review incoming inquiries
– Audit data usage
• Audit log parser for monitoring data access
– Monitor for data breaches
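An audit log parser for monitoring data access can start very small. The sketch below assumes a hypothetical log format of four whitespace-separated fields (timestamp, principal, action, resource) and hypothetical store names; real audit logs will need their own parsing.

```python
from collections import defaultdict
from typing import Iterable, List


def principals_touching_both(
    log_lines: Iterable[str],
    user_store: str = "user-db",
    analytics_store: str = "analytics-db",
) -> List[str]:
    """Return principals whose audit entries touch both the user and analytical stores."""
    seen = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        _timestamp, principal, _action, resource = parts
        seen[principal].add(resource)
    return sorted(p for p, r in seen.items() if {user_store, analytics_store} <= r)
```

Any principal this returns has crossed the separation between user data and analytical data and is worth reviewing.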
Summary
• Recommended practices for engineers
– Good communication with Legal
• Documentation
– Separate user data and analytical data
• De-identify all analytical data
• Permission control
• Data retention period
Automated hybrid cloud workload protection via calls to Trend Micro APIs. Created with real data by Trend Micro threat researcher and artist Jindrich Karasek.