How to Maximize Data Governance in
Snowflake Test Environment
Introduction
With the growing need to comply with regulations and
standards such as GDPR, HIPAA, and PCI DSS, protecting
sensitive data has become crucial for organizations.
In software development, data governance is a particular
challenge in non-production environments. Testing, QA,
and staging environments are essential to development,
but their data is accessed by many stakeholders, which
creates security and compliance risk.
What is Data Masking?
Data masking is the process of replacing
sensitive data with fictitious data or
scrambling values that preserve the
characteristics of the original data while
protecting its confidentiality.
The goal of data masking is to prevent
unauthorized access to sensitive data in
non-production environments.
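As a rough illustration of masking that "preserves the characteristics of the original data", here is a minimal Python sketch. It is not Jade's implementation; the function name `mask_value` and the salt are assumptions made for this example. It swaps every letter and digit for a pseudo-random one while keeping length, letter case, and separators.

```python
import hashlib
import random


def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Replace letters and digits with pseudo-random ones.

    Length, letter case, and punctuation are kept so the masked value
    still looks like the original kind of data. Hypothetical helper.
    """
    # Seed from a hash of the input so the same value always masks the same way.
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice("0123456789"))
        elif ch.isalpha():
            repl = rng.choice("abcdefghijklmnopqrstuvwxyz")
            out.append(repl.upper() if ch.isupper() else repl)
        else:
            out.append(ch)  # keep separators such as '-', '@', '.'
    return "".join(out)


if __name__ == "__main__":
    print(mask_value("415-555-0132"))
    print(mask_value("Jane.Doe@example.com"))
```

Because the replacement is seeded from the input, the same source value masks to the same output on every run, which keeps joins between masked tables consistent. A real deployment would need a secret salt and a review of whether this level of scrambling meets the applicable regulation.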
Why is Data Masking Important in
Non-Production Environments?
There are several reasons why data masking is necessary in
non-production environments.
Compliance
Organizations are required to comply with various regulations and
standards that mandate the protection of sensitive data. Data masking
supports compliance by preventing unauthorized access to sensitive
data in non-production environments.
Security
Non-production environments are more vulnerable to security breaches
and attacks, which can compromise sensitive data. Data masking helps
protect sensitive data by replacing it with fictitious data that cannot be
used for malicious purposes.
Data Masking Solution for Snowflake Test
Environments Built by Jade
Jade has developed a Snowflake data masking solution that
enables businesses to enhance their data governance in test
environments within Snowflake while ensuring compliance
with various regulations and standards that require the
safeguarding of sensitive data.
Jade's Snowflake data masking solution offers an automated
process that takes data from the source system, conducts PII
discovery, performs a metadata lookup, deploys masking policies,
and loads the data. By automating the data masking process,
businesses can improve security, comply with regulations, save
costs, and focus on real development needs.
Jade's Data Masking Solution - The Architecture
The architecture of Jade's Snowflake data masking solution is
based on Python and Snowflake. It uses a metadata table to
store information for dynamic data masking. The solution
covers three use cases:
Use Case 1: Loading masked data to non-production
environments
Use Case 2: Protecting PII from unauthorized users
Use Case 3: Protecting PII data while providing it to a
third party with a Snowflake reader account
The architecture functions in the following way
The first step brings in data from the source and puts it
in the production Snowflake staging area.
A PII discovery phase can be conducted to identify probable PII
fields, but it is not mandatory.
Once the PII fields are identified and the metadata table is updated, a
Python script takes three inputs: application name, table name,
and masking type (i.e., use case 1, 2, or 3).
Using those three inputs, the script looks up the metadata
table, finds the masking policy to apply to each PII field of the
given table, and applies those policies.
The data is then loaded to the masked environment or the production
environment, depending on the use case.
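The lookup-and-apply step described on this slide can be sketched as follows. This is a minimal illustration, not Jade's script: the metadata layout, the sample rows, and the names `MetadataRow`, `lookup_policies`, and `build_statements` are assumptions. The generated statements use Snowflake's `ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY` form.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MetadataRow:
    application: str
    table: str
    column: str
    masking_type: int  # 1, 2 or 3, matching the three use cases
    policy: str        # name of the masking policy to attach


# Hypothetical in-memory stand-in for the metadata table.
METADATA = [
    MetadataRow("CRM", "CUSTOMERS", "FULL_NAME", 2, "MASK_NAME"),
    MetadataRow("CRM", "CUSTOMERS", "DATE_OF_BIRTH", 2, "MASK_DOB"),
    MetadataRow("CRM", "ORDERS", "SHIP_ADDRESS", 2, "MASK_ADDRESS"),
]


def lookup_policies(application: str, table: str, masking_type: int,
                    metadata: Sequence[MetadataRow] = METADATA) -> List[MetadataRow]:
    """Return the metadata rows matching the three script inputs."""
    key = (application, table, masking_type)
    return [r for r in metadata
            if (r.application, r.table, r.masking_type) == key]


def build_statements(application: str, table: str, masking_type: int,
                     metadata: Sequence[MetadataRow] = METADATA) -> List[str]:
    """Build one ALTER TABLE statement per PII column found in the metadata.

    Identifiers are interpolated directly for brevity; a real script
    should validate or quote them before execution.
    """
    return [
        f"ALTER TABLE {r.table} MODIFY COLUMN {r.column} "
        f"SET MASKING POLICY {r.policy};"
        for r in lookup_policies(application, table, masking_type, metadata)
    ]


if __name__ == "__main__":
    for stmt in build_statements("CRM", "CUSTOMERS", 2):
        print(stmt)
```

Keeping the column-to-policy mapping in data rather than in code is what lets new tables be covered by adding metadata rows only.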
Scalable Data Masking Solution for Snowflake:
Individual Tables and Batch Mode
Jade's data masking solution works on individual tables and in batch mode, so it can cover your
entire database schema.
A prebuilt wrapper program takes the full list of tables as input and performs the metadata-table
lookup and masking for each one. Because the tables are processed in batches, a full run may take
some time.
The process can be scheduled daily, weekly, or monthly using your existing scheduler or as a
Windows service. It can also be tied to your refresh policies so that whenever a non-production
environment is refreshed from production, on schedule or manually, the scripts run and mask the
data automatically.
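A batch wrapper of the kind described here might look like the sketch below. The names `batched` and `mask_all_tables`, the batch size, and the choice to collect failures instead of stopping are all assumptions for illustration; `mask_table` stands in for whatever routine masks a single table.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def mask_all_tables(tables: Sequence[str],
                    mask_table: Callable[[str], object],
                    batch_size: int = 2) -> List[str]:
    """Run `mask_table` for every table, batch by batch.

    Returns the tables that failed so one bad table does not stop the
    rest of the schema from being masked (an assumed design choice).
    """
    failed: List[str] = []
    for batch in batched(tables, batch_size):
        for table in batch:
            try:
                mask_table(table)
            except Exception:  # broad on purpose: record and continue
                failed.append(table)
    return failed


if __name__ == "__main__":
    failures = mask_all_tables(["CUSTOMERS", "ORDERS", "PAYMENTS"], print)
    print("failed:", failures)
```

A scheduler or a post-refresh hook would then only need to call the wrapper with the list of tables for the refreshed environment.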
Technical Overview: Data Masking Process for
Protecting PII in Three Use Cases
The first step in the data masking process is the PII exploration
phase. This involves running a predictive algorithm on the table
data to determine the probability of each field being PII and, if so,
what category it falls under. The results are stored in a table, with
a JSON data file generated for each application table.
To view the data in a tabular format, a query can be run that
displays the probability of each field being PII. Fields with a
probability of 100% are considered PII, while those with a value of
0% are non-PII. Fields with a probability greater than 0% are
assigned a semantic category, such as address or date of birth.
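To make the tabular view concrete, the sketch below flattens a per-table JSON result into rows. The JSON shape, the sample values, and the `REVIEW` label for probabilities between 0% and 100% are assumptions; the slides do not specify the file format or how intermediate probabilities are handled.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical discovery output for one application table.
SAMPLE_RESULT = """
{
  "FULL_NAME":     {"probability": 1.0, "semantic_category": "NAME"},
  "DATE_OF_BIRTH": {"probability": 0.8, "semantic_category": "DATE_OF_BIRTH"},
  "ORDER_TOTAL":   {"probability": 0.0, "semantic_category": null}
}
"""

Row = Tuple[str, float, Optional[str], str]


def to_rows(result_json: str) -> List[Row]:
    """Turn the JSON result into (column, probability, category, status) rows."""
    rows: List[Row] = []
    for column, info in json.loads(result_json).items():
        probability = float(info["probability"])
        if probability >= 1.0:
            status = "PII"        # 100%: treated as PII
        elif probability > 0.0:
            status = "REVIEW"     # assumed label for in-between values
        else:
            status = "NON-PII"    # 0%: treated as non-PII
        rows.append((column, probability, info.get("semantic_category"), status))
    # Most likely PII fields first.
    return sorted(rows, key=lambda r: -r[1])


if __name__ == "__main__":
    for row in to_rows(SAMPLE_RESULT):
        print(row)
```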
Use Case 1: Loading masked data to
non-production environments
A Python program takes the table name, application name,
and masking type as input
It looks up the metadata table, applies the masking rule on the fly, and
loads the result into the final table in the masked environment
The masking policy is selected automatically
Masked data is loaded into the masked environment
The masked table contains masked data, except for the ID
field, which is left unmasked for comparison purposes
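One way to picture the load into the masked environment is a generated `CREATE TABLE ... AS SELECT` statement that masks PII columns and passes the ID through untouched. The sketch below is illustrative only: the function name, the table names, and the masking expressions are assumptions, not the rules Jade's solution applies. `SHA2` is a standard Snowflake hashing function.

```python
from typing import Dict, Sequence


def build_masked_load(source_table: str, target_table: str, id_column: str,
                      pii_expressions: Dict[str, str],
                      all_columns: Sequence[str]) -> str:
    """Build a CTAS statement that masks PII columns and keeps the ID as-is.

    `pii_expressions` maps a column name to a SQL expression template in
    which `{col}` is replaced by the column name.
    """
    select_list = []
    for col in all_columns:
        if col == id_column or col not in pii_expressions:
            select_list.append(col)  # pass through unchanged
        else:
            masked = pii_expressions[col].format(col=col)
            select_list.append(f"{masked} AS {col}")
    return (f"CREATE OR REPLACE TABLE {target_table} AS "
            f"SELECT {', '.join(select_list)} FROM {source_table};")


if __name__ == "__main__":
    print(build_masked_load(
        "PROD.CRM.CUSTOMERS", "MASKED.CRM.CUSTOMERS", "ID",
        {"FULL_NAME": "SHA2({col})", "EMAIL": "'***MASKED***'"},
        ["ID", "FULL_NAME", "EMAIL", "CITY"],
    ))
```

Leaving the ID unmasked lets testers compare a masked row with its production counterpart, as the slide notes.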
Use Case 2: Protecting PII from unauthorized
access
Only one instance of the production table data is kept in storage; no additional table is maintained
for masked data. Authorized users see the actual PII values. Unauthorized users see masked values in
the PII fields.
Python code with specific parameters deploys masking policies on PII fields in the table
Masking policies are deployed on fields like name, date of birth, etc.,
to protect sensitive information
Customization is available to hide certain portions of data for specific users
Data protection is achieved for test users accessing the testing table
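The behavior described for this use case (one stored copy, different views per user) matches how a Snowflake dynamic masking policy works. The sketch below generates such a policy and simulates its effect locally. The role names, the `***MASKED***` literal, and both function names are assumptions; the DDL follows Snowflake's `CREATE MASKING POLICY ... AS (val STRING) RETURNS STRING -> ...` form.

```python
from typing import Collection


def build_masking_policy(policy_name: str, authorized_roles: Collection[str],
                         masked_literal: str = "***MASKED***") -> str:
    """Build DDL for a policy that shows real values to authorized roles only."""
    roles = ", ".join(f"'{r.upper()}'" for r in sorted(authorized_roles))
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy_name} "
        f"AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val\n"
        f"       ELSE '{masked_literal}' END;"
    )


def simulate_policy(value: str, current_role: str,
                    authorized_roles: Collection[str],
                    partial_roles: Collection[str] = (),
                    keep_last: int = 4) -> str:
    """Local illustration of what each kind of user would see."""
    if current_role in authorized_roles:
        return value  # full access
    if current_role in partial_roles:
        # Partial masking: hide everything except the last `keep_last` characters.
        visible = value[max(len(value) - keep_last, 0):]
        return "*" * (len(value) - len(visible)) + visible
    return "***MASKED***"


if __name__ == "__main__":
    print(build_masking_policy("MASK_SSN", {"PII_ADMIN"}))
    for role in ("PII_ADMIN", "ANALYST", "TESTER"):
        print(role, simulate_policy("123-45-6789", role, {"PII_ADMIN"}, {"ANALYST"}))
```

The partial-masking branch illustrates the "hide certain portions of data for specific users" customization; in Snowflake that logic would live inside the policy body rather than in Python.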
Use Case 3: Providing data to a third party with
a Snowflake reader account
A reader account is created with permission to read the production table but no right to modify data
Applies when data must be provided to a third party, for example a credit bureau such as TransUnion or Equifax
Python code is used with specific parameters to deploy masking policies on PII fields
in the production table
Data protection is achieved for the reader account without exposing sensitive PII
No code changes are needed to update masking policies
The metadata table is used to update masking policy information and automate the process
The implemented solution can be easily scaled to cover additional tables and applications
Endnote
To sum up, Jade's automated data masking solution helps
businesses maximize Snowflake data governance and stay
compliant with various regulations and standards that mandate
the protection of sensitive data. The solution offers several
benefits, including improved security, compliance with
regulations, cost-effectiveness, and realistic testing.
The architecture of Jade's data masking solution is based on
Python and Snowflake and covers three use cases. Jade's data
masking solution is scalable and works on both individual tables
and in batch mode.
Headquarters
1731 Technology Drive, Suite 350
San Jose, CA 95110, USA
Phone
+1-408-899-7200
Email
info@jadeglobal.com
Website
www.jadeglobal.com
USA | CANADA | UK | AUSTRIA | INDIA
THANK YOU
To Know More About our Services, Visit:
https://www.jadeglobal.com/snowflake
Read Blog
https://www.jadeglobal.com/blog/how-maximize-data-governance-snowflake-test-environment
