SlideShare a Scribd company logo
1 of 19
How-to implement a GDPR
solution in a Cloudera
architecture
Cloudera + StreamSets Data Collector
Introduction
Since the implementation of GDPR regulation, all data processors across the world have
been struggling to be GDPR compliant and also deal with the new reality in Big Data, that
data is constantly drifting and mutating.
In this presentation, the approach will be:
● Cloudera architecture
● No additional financial cost
● Masking & Encrypting
Architecture
This architecture enables the following:
● Discover Data - ETL
● Data Governance - Policies
● Protect - Anonymization
Pre-Assumptions
1. Cluster hostname: cm1.localdomain
2. StreamSets Data Collector Linux user: sdc Kerberos Principal: sdc/cm1.localdomain/@DOMAIN.COM
3. Cluster Cloudera Manager: Cloudera Manager 5.12.2
4. StreamSets Data Collector: Installed
5. Cluster Authentication Pre-Installed: Kerberos
a. Kerberos Realm DOMAIN.COM
6. StreamSets Data Collector version: 3.5.0
7. StreamSets Data Collector Authentication: Kerberos
Base of Knowledge
With GDPR regulation it’s difficult to find the right balance
between protecting data and utilizing it, but with the solution
on this presentation you will not only do the ETL on your data
lake as you will also protect all the sensitive data.
Knowing this, you must know your data, the purpose of it, as
also the architecture of your System Environment and
therefore choose one or more ways to handle it.
JDBC Connections
The JDBC connections on StreamSets Origins require additional drivers.
For Hive and Impala JDBC Origin connection you need the Cloudera drivers (HiveJDBC41.jar, ImpalaJDBC41.jar), as for a Oracle connection the
Ojdbc8.jar.
Base of Knowledge
JDBC Conn Security URL
Hive
Metadata
None jdbc:hive2://server:10000/db_name
Hive
Metadata
Kerberos
(static)
jdbc:hive2://server:10000/db_name;principal=hive/server@REALM.COM
Hive
(Origin)
Kerberos
(user)
jdbc:impala://server:10000;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=hive
Impala
(Origin)
None jdbc:impala://server:21050;authMech=0
Impala
(Origin)
Kerberos jdbc:impala://server:21050;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=impala
Oracle
(Origin)
Static jdbc:oracle:thin:@server:listerner_port:db
Use Case
Ingest and Mask Data from Local File System to Hive Metadata.
Data Description
Sample Data
Synopsis
● Dataset title.basics.tsv from IMDB dataset https://www.imdb.com/interfaces/
● Read Tabular Data File from Local File System,
● Mask the field primaryTitle with a regular expression
● Create a Hive External Table and Write to HDFS
Read data from File System
Mask Field
Hive Metadata
Hadoop File System
Hive Metastore
Pipeline Finisher
Data Result
Sample Data
Synopsis
● Regular expression: (.*)(.{4})
● Irreversible
Use Case
Ingest and Encrypt Data from Local File System to Hive Metadata.
Encrypt Field
Data Result
Sample Data
Synopsis
● Regular expression: (.*)(.{4})
● Reversible
Thanks
Big Data Engineer
Tiago Simões

More Related Content

What's hot

Cloudera cluster setup and configuration
Cloudera cluster setup and configurationCloudera cluster setup and configuration
Cloudera cluster setup and configurationSudheer Kondla
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingCloudera, Inc.
 
IT Infrastructure Through The Public Network Challenges And Solutions
IT Infrastructure Through The Public Network   Challenges And SolutionsIT Infrastructure Through The Public Network   Challenges And Solutions
IT Infrastructure Through The Public Network Challenges And SolutionsMartin Jackson
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)YoungHeon (Roy) Kim
 
Mysql Fun
Mysql FunMysql Fun
Mysql FunSHC
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
MySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKMySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKYoungHeon (Roy) Kim
 
Connect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux InstanceConnect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux InstanceVCP Muthukrishna
 
How Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, WorksHow Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, WorksMatthew Farina
 
Squid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress GuideSquid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress GuideAbhishek Kumar
 
Multinode kubernetes-cluster
Multinode kubernetes-clusterMultinode kubernetes-cluster
Multinode kubernetes-clusterRam Nath
 
Guide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormationGuide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormationRob Linton
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2benjaminwootton
 
Clouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production DeploymentsClouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production DeploymentsAhmed Mekawy
 
How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7VCP Muthukrishna
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackSean Dague
 
Zookeeper In Action
Zookeeper In ActionZookeeper In Action
Zookeeper In Actionjuvenxu
 
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...DataStax
 

What's hot (20)

Cloudera cluster setup and configuration
Cloudera cluster setup and configurationCloudera cluster setup and configuration
Cloudera cluster setup and configuration
 
Why Your Apache Spark Job is Failing
Why Your Apache Spark Job is FailingWhy Your Apache Spark Job is Failing
Why Your Apache Spark Job is Failing
 
IT Infrastructure Through The Public Network Challenges And Solutions
IT Infrastructure Through The Public Network   Challenges And SolutionsIT Infrastructure Through The Public Network   Challenges And Solutions
IT Infrastructure Through The Public Network Challenges And Solutions
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)
 
Build Automation 101
Build Automation 101Build Automation 101
Build Automation 101
 
Mysql Fun
Mysql FunMysql Fun
Mysql Fun
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
MySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELKMySQL Audit using Percona audit plugin and ELK
MySQL Audit using Percona audit plugin and ELK
 
Connect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux InstanceConnect Amazon EC2 Linux Instance
Connect Amazon EC2 Linux Instance
 
How Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, WorksHow Helm, The Package Manager For Kubernetes, Works
How Helm, The Package Manager For Kubernetes, Works
 
Squid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress GuideSquid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
Squid for Load-Balancing & Cache-Proxy ~ A techXpress Guide
 
Multinode kubernetes-cluster
Multinode kubernetes-clusterMultinode kubernetes-cluster
Multinode kubernetes-cluster
 
Guide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormationGuide - Migrating from Heroku to AWS using CloudFormation
Guide - Migrating from Heroku to AWS using CloudFormation
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Clouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production DeploymentsClouldera Implementation Guide for Production Deployments
Clouldera Implementation Guide for Production Deployments
 
How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7How To Install and Configure AWS CLI on RHEL 7
How To Install and Configure AWS CLI on RHEL 7
 
OpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with DevstackOpenStack in 10 minutes with Devstack
OpenStack in 10 minutes with Devstack
 
Zookeeper In Action
Zookeeper In ActionZookeeper In Action
Zookeeper In Action
 
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
DataStax | Advanced DSE Analytics Client Configuration (Jacek Lewandowski) | ...
 

Similar to How to implement a gdpr solution in a cloudera architecture

Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Big Data Spain
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
Data Access Technologies
Data Access TechnologiesData Access Technologies
Data Access TechnologiesDimara Hakim
 
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...NomanKhalid56
 
Oracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureOracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureSinanPetrusToma
 
Introduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard BrokerIntroduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard BrokerZohar Elkayam
 
Oracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast SlidesOracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast SlidesLudovico Caldara
 
State of The Dolphin - May 2021
State of The Dolphin - May 2021State of The Dolphin - May 2021
State of The Dolphin - May 2021Frederic Descamps
 
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfseo18
 
Database Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David IzahkDatabase Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David Izahksqlserver.co.il
 
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...Michael Noel
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Introduction to firebidSQL 3.x
Introduction to firebidSQL 3.xIntroduction to firebidSQL 3.x
Introduction to firebidSQL 3.xFabio Codebue
 
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax
 
From Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security EnhancementsFrom Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security EnhancementsAna-Maria Mihalceanu
 
ADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and SynchingADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and SynchingSteven Davelaar
 
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...MongoDB
 

Similar to How to implement a gdpr solution in a cloudera architecture (20)

Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
Data Access Technologies
Data Access TechnologiesData Access Technologies
Data Access Technologies
 
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
5675212318661411677_TRN4034_How_to_Migrate_to_Oracle_Autonomous_Database_Clou...
 
Oracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud InfrastructureOracle Database Migration to Oracle Cloud Infrastructure
Oracle Database Migration to Oracle Cloud Infrastructure
 
JDBC.ppt
JDBC.pptJDBC.ppt
JDBC.ppt
 
Introduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard BrokerIntroduction to Oracle Data Guard Broker
Introduction to Oracle Data Guard Broker
 
Oracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast SlidesOracle Fleet Patching and Provisioning Deep Dive Webcast Slides
Oracle Fleet Patching and Provisioning Deep Dive Webcast Slides
 
State of The Dolphin - May 2021
State of The Dolphin - May 2021State of The Dolphin - May 2021
State of The Dolphin - May 2021
 
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
 
Database Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David IzahkDatabase Mirror for the exceptional DBA – David Izahk
Database Mirror for the exceptional DBA – David Izahk
 
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
TechEd Africa 2011 - OFC308: SharePoint Security in an Insecure World: Unders...
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Introduction to firebidSQL 3.x
Introduction to firebidSQL 3.xIntroduction to firebidSQL 3.x
Introduction to firebidSQL 3.x
 
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
 
From Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security EnhancementsFrom Java 17 to 21- A Showcase of JDK Security Enhancements
From Java 17 to 21- A Showcase of JDK Security Enhancements
 
Oracle Database 12c : Multitenant
Oracle Database 12c : MultitenantOracle Database 12c : Multitenant
Oracle Database 12c : Multitenant
 
ADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and SynchingADF Mobile: Implementing Data Caching and Synching
ADF Mobile: Implementing Data Caching and Synching
 
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
Securing MongoDB to Serve an AWS-Based, Multi-Tenant, Security-Fanatic SaaS A...
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

How to implement a gdpr solution in a cloudera architecture

  • 1. How-to implement a GDPR solution in a Cloudera architecture Cloudera + StreamSets Data Collector
  • 2. Introduction Since the implementation of GDPR regulation, all data processors across the world have been struggling to be GDPR compliant and also deal with the new reality in Big Data, that data is constantly drifting and mutating. In this presentation, the approach will be: ● Cloudera architecture ● No additional financial cost ● Masking & Encrypting
  • 3. Architecture This architecture enables the following: ● Discover Data - ETL ● Data Governance - Policies ● Protect - Anonymization
  • 4. Pre-Assumptions 1. Cluster hostname: cm1.localdomain 2. StreamSets Data Collector Linux user: sdc Kerberos Principal: sdc/cm1.localdomain/@DOMAIN.COM 3. Cluster Cloudera Manager: Cloudera Manager 5.12.2 4. StreamSets Data Collector: Installed 5. Cluster Authentication Pre-Installed: Kerberos a. Kerberos Realm DOMAIN.COM 6. StreamSets Data Collector version: 3.5.0 7. StreamSets Data Collector Authentication: Kerberos
  • 5. Base of Knowledge With GDPR regulation it’s difficult to find the right balance between protecting data and utilizing it, but with the solution on this presentation you will not only do the ETL on your data lake as you will also protect all the sensitive data. Knowing this, you must know your data, the purpose of it, as also the architecture of your System Environment and therefore choose one or more ways to handle it.
  • 6. JDBC Connections The JDBC connections on StreamSets Origins require additional drivers. For Hive and Impala JDBC Origin connection you need the Cloudera drivers (HiveJDBC41.jar, ImpalaJDBC41.jar), as for a Oracle connection the Ojdbc8.jar. Base of Knowledge JDBC Conn Security URL Hive Metadata None jdbc:hive2://server:10000/db_name Hive Metadata Kerberos (static) jdbc:hive2://server:10000/db_name;principal=hive/server@REALM.COM Hive (Origin) Kerberos (user) jdbc:impala://server:10000;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=hive Impala (Origin) None jdbc:impala://server:21050;authMech=0 Impala (Origin) Kerberos jdbc:impala://server:21050;AuthMech=1;KrbRealm=REALM.COM;KrbHostFQDN=server.example.com;KrbServiceName=impala Oracle (Origin) Static jdbc:oracle:thin:@server:listerner_port:db
  • 7. Use Case Ingest and Mask Data from Local File System to Hive Metadata.
  • 8. Data Description Sample Data Synopsis ● Dataset title.basics.tsv from IMDB dataset https://www.imdb.com/interfaces/ ● Read Tabular Data File from Local File System, ● Mask the field primaryTitle with a regular expression ● Create a Hive External Table and Write to HDFS
  • 9. Read data from File System
  • 15. Data Result Sample Data Synopsis ● Regular expression: (.*)(.{4}) ● Irreversible
  • 16. Use Case Ingest and Encrypt Data from Local File System to Hive Metadata.
  • 18. Data Result Sample Data Synopsis ● Regular expression: (.*)(.{4}) ● Reversible