Automating Data Pipeline Security
Carta’s Data Team is Hiring 🎉
Automating Data Pipeline Security
Privacy
3 Big Ideas
1. Privacy has a strange history.
2. Privacy-first systems are designed by people with a professional ethic.
3. Privacy can be automated away.
Automating security in your data pipeline
privacy
1. Strange History of Privacy

“The actio iniuriarum was, in Roman law,
a delict which served to protect the
non-patrimonial aspects of a person's
existence – who a person is rather than
what a person has.”
©1979 "The Invention of the Right to Privacy" by Dorothy J. Glancy
2. Privacy-first Ethic
Software is eating
the world.
“Audit defensibility is too low a
bar when it comes to our
customers’ privacy.”
Privacy Regulation
GDPR
EU General Data Protection Regulation
● Right of access
● Pseudonymisation
● Right to erasure
● Records of processing activities
● Privacy by design
CCPA
California Consumer Privacy Act
● Know what personal information is being collected
● Right to delete
● Know whether their personal information is being shared, and if so, with whom
● Opt out of the sale of their personal information
3. Automate Privacy
“The security posture of your
weakest vendor is the security
posture of your entire
organization.”
Apache Airflow
Workflow manager from Airbnb
● Airflow DAGs to move data into S3 and Redshift
● DAG: Directed Acyclic Graph
● Operator/Task: A node in the graph
● Airflow runs dbt
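The DAG vocabulary above can be illustrated without Airflow itself. This is a minimal sketch in plain Python using the standard library's `graphlib`, not Airflow's API: each task is a node, each dependency is an edge, and a topological sort yields an order in which a scheduler could run the tasks. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task (a node), each value is the set
# of tasks it depends on (its incoming edges).
pipeline = {
    "extract_postgres": set(),
    "load_s3": {"extract_postgres"},
    "copy_to_redshift": {"load_s3"},
    "dbt_run": {"copy_to_redshift"},
    "dbt_test": {"dbt_run"},
}


def run_order(graph):
    """Return one valid execution order.

    graphlib raises CycleError if the graph is not acyclic, which is
    exactly the property the "A" in DAG guarantees.
    """
    return list(TopologicalSorter(graph).static_order())


print(run_order(pipeline))
```

In Airflow the same edges are declared between operators inside a DAG (for example `extract >> load`), and the scheduler derives the run order.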
Dockerized Airflow
● Open source boilerplate for running Apache Airflow in Docker
● Used at Carta
Stale Blacklist
Automating the blacklist updates
How do we keep up with the sensitive columns being added in source data?
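One way to automate blacklist updates, sketched under assumptions: compare the columns currently present in the source tables against the blacklist and flag any column whose name matches a sensitive-looking pattern but is not yet listed. The patterns, table names and the `find_unlisted_sensitive_columns` helper are hypothetical. Name matching only catches columns with telling names, so it complements human review rather than replacing it.

```python
import re

# Hypothetical name patterns suggesting a column holds personal data.
SENSITIVE_PATTERNS = [
    re.compile(p)
    for p in (r"ssn", r"birth", r"email", r"phone", r"address", r"tax_id", r"salary")
]


def find_unlisted_sensitive_columns(source_columns, blacklist):
    """Return (table, column) pairs that look sensitive but are not blacklisted.

    source_columns: dict mapping table name -> list of column names
    blacklist: set of "table.column" strings already excluded from the pipeline
    """
    flagged = []
    for table, columns in sorted(source_columns.items()):
        for column in columns:
            if f"{table}.{column}" in blacklist:
                continue  # already handled
            if any(p.search(column.lower()) for p in SENSITIVE_PATTERNS):
                flagged.append((table, column))
    return flagged


columns = {
    "users": ["id", "email", "birth_date", "created_at"],
    "payroll": ["id", "salary", "employee_id"],
}
blacklist = {"users.email"}
print(find_unlisted_sensitive_columns(columns, blacklist))
# [('payroll', 'salary'), ('users', 'birth_date')]
```

In practice the column list could come from querying `information_schema.columns` in the source database, and a non-empty result could fail a scheduled pipeline task so the blacklist never silently goes stale.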
dbt test
Automated data tests
● dbt tests fail when the result set is not empty.
● The records returned by dbt test are the offending records.
Automating Access
Tools for requesting and granting access
We have a custom access management system called Gatekeeper.
Terraform Modules
Automate Data Lake access
This example uses our custom IAM Service Account Terraform module to create a new Revenue Service account user with access to a single S3 data lake bucket.
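The module itself is not shown, but the access it grants can be sketched as a least-privilege IAM policy document scoped to one bucket. The helper and the bucket name `example-data-lake-revenue` are hypothetical, and read-only access is an assumption since the talk does not say which actions the real module allows; the document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM policy grammar.

```python
import json


def single_bucket_read_policy(bucket):
    """Build an IAM policy granting read-only access to exactly one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is authorised against the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading is authorised against the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


print(json.dumps(single_bucket_read_policy("example-data-lake-revenue"), indent=2))
```

In Terraform, a document like this could be passed to an `aws_iam_policy` resource and attached to the service account's user, which is the kind of repetition a reusable module hides.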
Data Warehouse Migrations
● sql-migrate: an excellent CLI and migrations library written in Go.
● Extended to support Jinja templating.
We can rebuild the warehouse from code.
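The Jinja extension is Carta's own and its details are not shown. As a rough illustration of the idea only, the sketch below renders template variables into a migration file and then splits it into the `-- +migrate Up` and `-- +migrate Down` sections that sql-migrate reads. Python's `string.Template` stands in for Jinja here, and the `schema` variable, the table and the `render_and_split` helper are hypothetical.

```python
from string import Template

# A sql-migrate style migration with a hypothetical ${schema} placeholder.
MIGRATION = """\
-- +migrate Up
CREATE TABLE ${schema}.revenue (id BIGINT, amount NUMERIC(18, 2));

-- +migrate Down
DROP TABLE ${schema}.revenue;
"""


def render_and_split(text, **variables):
    """Substitute template variables, then split into Up/Down SQL sections."""
    rendered = Template(text).substitute(**variables)
    sections = {"Up": [], "Down": []}
    current = None
    for line in rendered.splitlines():
        stripped = line.strip()
        if stripped.startswith("-- +migrate "):
            directive = stripped[len("-- +migrate "):].split()[0]
            # Only Up/Down switch sections; other directives keep the current one.
            current = directive if directive in sections else current
            continue
        if current and stripped:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


print(render_and_split(MIGRATION, schema="analytics"))
```

Keeping every schema change in templated migrations like this is what makes "rebuild the warehouse from code" possible: replaying all Up sections in order recreates the schema.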
Pseudonymity
Disguised identity or “false name”
©2019 Alex Ewerlöf "GDPR pseudonymization techniques"
Pseudonymity: Obfuscation
Scrambling or mixing up data
👍 Easy to do in any language.
👍 No impact to downstream systems.
👎 Can be unscrambled.
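One reading of "scrambling or mixing up data" is shuffling a column's values between rows. The minimal sketch below assumes a seeded shuffle (the helpers are hypothetical) and shows both sides of the trade-off: the column keeps its type and values, so downstream systems are unaffected, but anyone who knows the seed can restore the original order.

```python
import random


def scramble(values, seed):
    """Shuffle a column's values using a permutation derived from the seed."""
    order = list(range(len(values)))
    random.Random(seed).shuffle(order)
    return [values[i] for i in order]


def unscramble(scrambled, seed):
    """Invert scramble(): the same seed reproduces the same permutation."""
    order = list(range(len(scrambled)))
    random.Random(seed).shuffle(order)
    original = [None] * len(scrambled)
    for position, source_index in enumerate(order):
        original[source_index] = scrambled[position]
    return original


names = ["Ada", "Grace", "Linus", "Barbara"]
mixed = scramble(names, seed=42)
print(mixed)
print(unscramble(mixed, seed=42))  # ['Ada', 'Grace', 'Linus', 'Barbara']
```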
Pseudonymity: Masking
Obscure part of the data
👍 Simple.
👍 Owner can verify the last 4 digits.
👎 Some pieces of the real data are stored.
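A masking sketch that keeps only the last four characters, the pattern behind "owner can verify the last 4 digits". The `mask` helper and its choice to preserve separators such as `-` are illustrative assumptions. The four visible digits are precisely the "pieces of the real data" that remain stored.

```python
def mask(value, visible=4, mask_char="*"):
    """Replace every alphanumeric character except the last `visible` ones.

    Separators are kept so the masked value still looks like the original format.
    """
    total = sum(ch.isalnum() for ch in value)
    to_hide = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_hide > 0:
            out.append(mask_char)
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask("4111-1111-1111-1234"))  # ****-****-****-1234
```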
Pseudonymity: Tokenization
Replace real data with fake data
👍 Popular libraries like Faker.
👍 All original data is replaced.
👎 No way to recover the original data.
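A stdlib-only sketch of replacing real values with fake ones. The small name pools are hypothetical stand-ins for a library such as Faker (where `Faker().name()` would generate a name). Because no real-to-fake mapping is kept, the originals cannot be recovered from the output.

```python
import secrets

# Hypothetical pools standing in for a fake-data library such as Faker.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Casey", "Riley"]
LAST_NAMES = ["Smith", "Garcia", "Chen", "Okafor", "Novak", "Silva"]


def fake_name():
    """Return a random fake full name."""
    return f"{secrets.choice(FIRST_NAMES)} {secrets.choice(LAST_NAMES)}"


def tokenize_names(rows, column):
    """Return copies of rows with `column` replaced by a random fake name.

    The input rows are not modified, and no mapping is stored.
    """
    return [{**row, column: fake_name()} for row in rows]


customers = [{"id": 1, "name": "Jane Doe"}, {"id": 2, "name": "John Roe"}]
print(tokenize_names(customers, "name"))
```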
Pseudonymity: Blurring
Blur a subset of the data
👍 95% of this image is left unblurred.
👎 Possible to reverse blurring.
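For image data, blurring a subset can be sketched as a box blur applied to one rectangle while every other pixel is copied unchanged. The `blur_region` function and the list-of-rows grayscale format are hypothetical. A box blur is a known linear filter, so it can sometimes be partially undone, which is the risk the slide flags.

```python
def blur_region(image, top, left, height, width, radius=1):
    """Box-blur a rectangular region of a grayscale image (list of rows of ints).

    Each pixel in the region becomes the integer mean of its neighbourhood in
    the ORIGINAL image; pixels outside the region are left sharp.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(top, min(top + height, rows)):
        for c in range(left, min(left + width, cols)):
            total, count = 0, 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out[r][c] = total // count
    return out


# A 5x5 image with one bright pixel in the centre; blur only the middle 3x3.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 90
for row in blur_region(image, top=1, left=1, height=3, width=3):
    print(row)
```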
Pseudonymity: Encryption
Two-way transformation of the data
👍 The original data can be recovered.
👍 Manage fewer permissions downstream.
👎 Asymmetric vs. symmetric trade-offs.
AWS Key Management Service
Automate key creation and rotation
● Generate a new data key for encrypting and decrypting data protected by a master key.
● Or manually rotate the master key and re-encrypt the data.
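Both bullets map onto boto3 KMS client calls: `generate_data_key` (a new data key under a master key), `re_encrypt` (re-wrapping an encrypted data key under a different master key) and `enable_key_rotation` (letting KMS rotate a master key automatically). This is a sketch, not the talk's implementation: the helper names are hypothetical, key IDs are placeholders, and the client is passed in so the functions can be exercised without AWS.

```python
def new_data_key(kms, master_key_id):
    """Ask KMS for a fresh 256-bit data key protected by the given master key.

    Returns (plaintext_key, encrypted_key). Encrypt data with the plaintext key,
    store only the encrypted key alongside the data, then discard the plaintext.
    """
    response = kms.generate_data_key(KeyId=master_key_id, KeySpec="AES_256")
    return response["Plaintext"], response["CiphertextBlob"]


def rewrap_data_key(kms, encrypted_key, new_master_key_id):
    """Re-encrypt a stored data key under a different master key.

    KMS decrypts and re-encrypts server side; the plaintext key is never returned.
    """
    response = kms.re_encrypt(
        CiphertextBlob=encrypted_key, DestinationKeyId=new_master_key_id
    )
    return response["CiphertextBlob"]


def enable_automatic_rotation(kms, master_key_id):
    """Turn on KMS-managed automatic rotation for a customer-managed master key."""
    kms.enable_key_rotation(KeyId=master_key_id)


# Usage against AWS (requires boto3 and credentials):
#   import boto3
#   kms = boto3.client("kms")
#   plaintext, encrypted = new_data_key(kms, "alias/example-data-lake")
```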
Encrypted Columns
Postgres pgcrypto
● pgcrypto allows us to encrypt sensitive columns before the data lands in our S3 data lake.
● This example encrypts the birth_date column in Postgres.
“Last Mile” Decryption
Decrypt sensitive data at query time
● Access to encrypted columns is limited to analysts with the encryption key.
● This example decrypts the birth_date column in Redshift.
Encrypted Column Problems
Some things to consider...
1. Symmetric or Asymmetric encryption scheme?
2. Should we manually rotate our master key?
3. How many keys should we use and how should they be organized?
4. Should our analysts and data scientists have to think about keys?
5. When and how do we re-encrypt data? For example, when an employee with access to keys leaves the company?
3 Big Ideas
1. Privacy has a strange history.
2. Privacy-first systems are designed by people with a professional ethic.
3. Privacy can be automated away.
Automating security in your data pipeline
privacy
carta.com/jobs
@troyharvey
troy.harvey@carta.com
OSDC 2019 | Automating Security in Your Data Pipeline by Troy Harvey