Jade has developed a Snowflake data masking solution to help businesses enhance data governance in test environments. The solution automates the process of taking data from source systems, running PII discovery, deploying masking policies, and loading masked data into test environments. It uses Python and a metadata table stored in Snowflake. The solution masks data for three use cases: loading masked data to test environments, protecting personally identifiable information from unauthorized users, and providing masked data to third parties. It applies masking at both the individual table and batch levels for entire databases.
2. Introduction
With the increasing need to comply with regulations
and standards such as GDPR, HIPAA, and PCI DSS, it has
become crucial for organizations to protect sensitive
data. In software development, data governance
becomes a challenge in non-production environments.
These non-production environments, such as testing,
QA, and staging, are crucial for software development,
but the data in them is accessed by many stakeholders,
which creates security and compliance risks.
3. What is Data Masking?
Data masking is the process of replacing
sensitive data with fictitious or scrambled
values that preserve the characteristics of
the original data while protecting its
confidentiality.
The goal of data masking is to prevent
unauthorized access to sensitive data in
non-production environments.
4. Why is Data Masking Important in Non-Production Environments?
There are several reasons why data masking is necessary in
non-production environments.
Compliance
Organizations are required to comply with various regulations and
standards that mandate the protection of sensitive data. Data masking
helps organizations comply with these regulations and standards by
preventing unauthorized access to sensitive data in non-production
environments.
Security
Non-production environments are more vulnerable to security breaches
and attacks, which can compromise sensitive data. Data masking helps
protect sensitive data by replacing it with fictitious data that cannot be
used for malicious purposes.
5. Data Masking Solution for Snowflake Test Environments Built by Jade
Jade has developed a Snowflake data masking solution that
enables businesses to enhance their data governance in test
environments within Snowflake while ensuring compliance
with various regulations and standards that require the
safeguarding of sensitive data.
Jade's Snowflake data masking solution offers an automated
process that takes data from the source system, conducts PII
discovery, looks up the metadata table, deploys masking policies,
and loads the data. By automating the data masking process,
businesses can improve security, comply with regulations, save
costs, and focus on real development needs.
6. Jade's Data Masking Solution - The Architecture
The architecture of Jade’s Snowflake data masking solution is
based on Python and Snowflake. It uses a metadata table to
store information for dynamic data masking. The solution
covers three use cases:
Use-case 1: Loading masked data to non-production
environments
Use-case 2: Protecting PII from unauthorized users
Use-case 3: Protecting PII data while providing it to a
third party through a Snowflake reader account
7. The architecture functions in the following way
The first step involves bringing in data from the source and putting it
in the production Snowflake staging area.
A PII discovery phase can be conducted to identify probable PII
fields, but it's not mandatory.
Once the PII fields are identified and the metadata table is updated, a
Python script takes inputs such as application name, table name,
and masking type (i.e., use case 1, 2, or 3).
Using those three inputs, the script looks up the metadata
table, finds the masking policy to be applied to each PII field of the
input table, and applies those policies.
The data is then loaded to the masked environment or the production
environment, depending on the use case.
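The lookup step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the metadata column names, the sample rows, the policy names, and the functions `lookup_policies` and `plan_masking` are all hypothetical, and an in-memory list stands in for the Snowflake metadata table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRow:
    """One row of a hypothetical metadata table (column names are assumptions)."""
    application: str
    table: str
    column: str
    masking_type: int  # 1, 2, or 3, matching the three use cases
    policy: str        # name of the masking policy to apply


# Illustrative rows only; in the solution this information lives in Snowflake.
METADATA = [
    MetadataRow("CRM", "CUSTOMER", "FULL_NAME", 1, "MASK_NAME"),
    MetadataRow("CRM", "CUSTOMER", "DATE_OF_BIRTH", 1, "MASK_DATE"),
    MetadataRow("CRM", "CUSTOMER", "FULL_NAME", 2, "MASK_NAME_BY_ROLE"),
    MetadataRow("BILLING", "INVOICE", "CARD_NUMBER", 1, "MASK_CARD"),
]


def lookup_policies(metadata, application, table, masking_type):
    """Return {pii_column: policy_name} for one table and one use case."""
    key = (application, table, masking_type)
    return {
        row.column: row.policy
        for row in metadata
        if (row.application, row.table, row.masking_type) == key
    }


def plan_masking(metadata, application, table, masking_type):
    """List the (column, policy) pairs to apply, sorted by column name."""
    policies = lookup_policies(metadata, application, table, masking_type)
    if not policies:
        raise LookupError(
            f"no PII metadata for {application}.{table}, type {masking_type}"
        )
    return sorted(policies.items())


# The three script inputs: application name, table name, masking type.
plan = plan_masking(METADATA, "CRM", "CUSTOMER", 1)
print(plan)
```

Keeping the lookup separate from the code that applies the policies means the same three inputs can drive any of the use cases.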
8. Scalable Data Masking Solution for Snowflake: Individual Tables and Batch Mode
Jade's data masking solution works on individual tables and in batch mode, so it can cover your
entire database schema.
A wrapper program is already built that takes all tables as input and performs the lookup
against the metadata table to mask every table. Because the tables are processed in batches,
a full run may take some time.
Additionally, you can schedule it to run daily, weekly, or monthly using your
existing scheduler or as a Windows service. It can also be tied to your refresh policies so that
whenever you have a scheduled or manual refresh of your non-production environments from
production, these scripts are run to mask the data automatically.
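A batch wrapper of the kind described above might look roughly like the following sketch. The names `batched` and `mask_all_tables`, the batch size, and the choice to record failures and continue are assumptions made for illustration, not details of Jade's wrapper program.

```python
def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def mask_all_tables(tables, mask_one, batch_size=2):
    """Run `mask_one(application, table)` for every table, batch by batch.

    A failure on one table is recorded and the run continues, so a single
    bad table does not stop the whole schema from being masked.
    """
    results = {}
    for batch in batched(tables, batch_size):
        for application, table in batch:
            try:
                mask_one(application, table)
                results[(application, table)] = "ok"
            except Exception as exc:
                results[(application, table)] = f"failed: {exc}"
    return results


def demo_mask(application, table):
    """Stand-in for the real single-table masking routine."""
    if table == "BROKEN":
        raise ValueError("no metadata found")


summary = mask_all_tables(
    [("CRM", "CUSTOMER"), ("CRM", "BROKEN"), ("BILLING", "INVOICE")],
    demo_mask,
)
print(summary)
```

A scheduler or refresh job would then only need to call the wrapper once per run.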
9. Technical Overview: Data Masking Process for Protecting PII in Three Use Cases
The first step in the data masking process is the PII exploration
phase. This involves running a predictive algorithm on the table
data to determine the probability of each field being PII and, if so,
what category it falls under. The results are stored in a table, with
a JSON data file generated for each application table.
To view the data in a tabular format, a query can be run that
displays the probability of each field being PII. Fields with a
probability of 100% are considered PII, while those with a value of
0% are non-PII. Fields with a probability greater than 0% are
assigned a semantic category, such as address or date of birth.
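To make the tabular view concrete, here is a small sketch that flattens one per-table JSON result into rows. The JSON layout, the field names `probability` and `semantic_category`, and the function `tabulate` are assumptions for illustration; the actual file format produced by the discovery phase is not shown in this deck.

```python
import json

# Hypothetical discovery output for one application table.
SAMPLE = """
{
  "FULL_NAME":   {"probability": 1.0, "semantic_category": "NAME"},
  "CITY":        {"probability": 0.6, "semantic_category": "ADDRESS"},
  "ORDER_TOTAL": {"probability": 0.0}
}
"""


def tabulate(json_text):
    """Return (column, probability, category) rows, most likely PII first.

    A category is reported only when the probability is above zero,
    mirroring the rule described in the text.
    """
    fields = json.loads(json_text)
    rows = []
    for column, info in fields.items():
        probability = float(info.get("probability", 0.0))
        category = info.get("semantic_category") if probability > 0 else None
        rows.append((column, probability, category))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


for column, probability, category in tabulate(SAMPLE):
    print(f"{column:<12} {probability:>5.0%}  {category or '-'}")
```

Fields at 100% can go straight into the metadata table; how to treat the intermediate values is a review decision that this sketch leaves open.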
10. Use-case 1: Loading masked data to non-production environments
A Python program takes the table name, application name,
and masking type as input
It looks up the metadata table, applies the on-the-fly masking rule, and
loads the final table into the masked environment
The masking policy is selected automatically
Masked data is loaded into a masked environment
The masked data table reflects masked data, except for the ID
field (kept unmasked for comparison purposes)
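One way to picture the on-the-fly masking load is as a generated `CREATE OR REPLACE TABLE ... AS SELECT` statement in which each PII column is replaced by a masking expression. The sketch below only builds the SQL text. The rule names, the masking expressions, the database and table names, and `build_masked_load` are assumptions; the redaction and hash rules are meant for string columns, and identifiers are interpolated without quoting on the assumption that they come from trusted metadata.

```python
# Hypothetical masking rules: rule name -> SQL expression template.
RULES = {
    "REDACT": "'*****'",
    "HASH": "SHA2({col})",
    "NULLIFY": "NULL",
}


def build_masked_load(source, target, columns, pii_rules, keep=("ID",)):
    """Build a CTAS statement that masks PII columns while copying a table.

    Columns listed in `keep` are always copied unmasked, which matches the
    ID field being left as-is for comparison.
    """
    select_items = []
    for col in columns:
        rule = None if col in keep else pii_rules.get(col)
        if rule is None:
            select_items.append(col)
        else:
            expression = RULES[rule].format(col=col)
            select_items.append(f"{expression} AS {col}")
    return (
        f"CREATE OR REPLACE TABLE {target} AS "
        f"SELECT {', '.join(select_items)} FROM {source}"
    )


statement = build_masked_load(
    source="PROD_DB.CRM.CUSTOMER",
    target="TEST_DB.CRM.CUSTOMER",
    columns=["ID", "FULL_NAME", "EMAIL", "COUNTRY"],
    pii_rules={"FULL_NAME": "REDACT", "EMAIL": "HASH"},
)
print(statement)
```

Because the statement is derived from the metadata, adding a new PII column would require a metadata change rather than a code change.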
11. Use Case 2: Protecting PII from unauthorized access
Only one instance of the production table data remains in storage; no additional table is maintained for masked
data. Authorized users see the actual PII values, while unauthorized users see masked values in the PII fields.
Python code with specific parameters deploys masking policies on PII fields in the table
Masking policies are deployed on fields like name, date of birth, etc.,
to protect sensitive information
Customization is available to hide certain portions of data for specific users
Data protection is achieved for test users accessing the testing table
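For this use case, the Python code would plausibly emit Snowflake `CREATE MASKING POLICY` and `ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY` statements. The following sketch generates such statements as text only. The role names, policy names, masked placeholder value, and the helpers `policy_sql` and `attach_sql` are assumptions; the deck does not show the actual policy definitions.

```python
def policy_sql(policy_name, authorized_roles, masked_expr="'***MASKED***'"):
    """Build a role-based masking policy for a STRING column.

    Authorized roles see the real value; every other role sees `masked_expr`.
    """
    roles = ", ".join(f"'{role}'" for role in authorized_roles)
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy_name} "
        f"AS (val STRING) RETURNS STRING -> "
        f"CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val "
        f"ELSE {masked_expr} END"
    )


def attach_sql(table, column, policy_name):
    """Build the statement that attaches a policy to one column."""
    return (
        f"ALTER TABLE {table} MODIFY COLUMN {column} "
        f"SET MASKING POLICY {policy_name}"
    )


# Full masking for everyone outside the authorized roles.
full = policy_sql("MASK_NAME", ["PII_ADMIN", "AUDITOR"])

# Partial masking: hide all but the last four characters.
partial = policy_sql(
    "MASK_PHONE", ["PII_ADMIN"], masked_expr="CONCAT('***', RIGHT(val, 4))"
)

attach = attach_sql("PROD_DB.CRM.CUSTOMER", "FULL_NAME", "MASK_NAME")
print(full, partial, attach, sep="\n")
```

Since the policy is evaluated at query time, the single stored copy of the table serves both authorized and unauthorized users.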
12. Use Case 3: Providing data to a third party with a Snowflake reader account
A reader account is created with permission to read the production table but no right to modify data
This applies when test data must be provided to a credit bureau such as TransUnion or Equifax
Python code is used with specific parameters to deploy masking policies on PII fields
in the production table
Data protection is achieved for the reader account without exposing sensitive PII
No code changes are needed to update masking policies
The metadata table is used to update masking policy information and automate the process
The implemented solution can be easily scaled to cover additional tables and applications
13. Endnote
To sum up, Jade's automated data masking solution helps
businesses maximize Snowflake data governance and stay
compliant with various regulations and standards that mandate
the protection of sensitive data. The solution offers several
benefits, including improved security, compliance with
regulations, cost-effectiveness, and realistic testing.
The architecture of Jade's data masking solution is based on
Python and Snowflake and covers three use cases. The solution
is scalable and works on both individual tables and in batch
mode.
14. Headquarters
1731 Technology Drive, Suite 350
San Jose, CA 95110, USA
Phone
+1-408-899-7200
Email
info@jadeglobal.com
Website
www.jadeglobal.com
USA | CANADA | UK | AUSTRIA | INDIA
THANK YOU
To Know More About our Services, Visit:
https://www.jadeglobal.com/snowflake
Read Blog
https://www.jadeglobal.com/blog/how-maximize-data-governance-snowflake-test-environment