AWS Data Governance
Demystified
[Network, Security, Privacy &
Data Access Management]
AWS Big Data Demystified #4
Omid Vahdaty, Big Data Ninja
Agenda
● Disclaimer
○ I am not a network and security expert
○ This is not a security and network lecture
● Agenda
○ All the possible considerations for my data
(access, regulation, availability, etc.) == Data Governance
○ What are the architecture implications?
● My Advice:
○ Stick to the high-level overview
○ Remember the topics, not the details.
○ ASK me questions during the lecture!
TODAY’S BIG DATA
APPLICATION STACK
PaaS and DC...
Big Data Generic Architecture | Summary
Data Collection
S3
Data Transformation
Data Modeling
Data Visualization
Before I forget: for new AWS accounts...
● Disable unused regions via IAM
● Set the limit for the instance types you are using, e.g. 50 instances
● Set the limit for the instance types you are not using to 0!
● Remove the access key and secret key from the root account
● Don't use the root account
● Use MFA
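One way to read "disable unused regions via IAM" is a deny policy keyed on the `aws:RequestedRegion` condition key. The sketch below only builds the policy document; the function name and the list of exempted global services are assumptions for illustration, so adjust them to your account.

```python
import json


def deny_unused_regions_policy(allowed_regions):
    """Build an IAM policy document that denies requests outside allowed_regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnusedRegions",
                "Effect": "Deny",
                # Global services are exempted here (assumed list, not exhaustive).
                "NotAction": ["iam:*", "sts:*", "cloudfront:*", "route53:*", "support:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": sorted(allowed_regions)}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_unused_regions_policy(["eu-west-1", "us-east-1"]), indent=2))
```

Attach the resulting JSON to the groups or roles that should be limited to those regions.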
Data Governance…
Data Security Level
Data governance is the capability that
enables an organization to ensure that
high data quality exists throughout the
complete lifecycle of the data. The key
focus areas of data governance include
availability, usability, consistency,
data integrity and data security and
includes establishing processes to
ensure effective data management
throughout the enterprise such as
accountability for the adverse effects of
poor data quality and ensuring that the
data which an enterprise has can be
used by the entire organization.
Big Data Security: Nobody likes it… But…
AWS Regulation and more...
● https://aws.amazon.com/compliance/programs/
● PII
● GDPR
AWS Security options in general [sorry, not covering everything...]
● IAM: identity and access management
○ Identity management
■ Users, policies, roles, groups, least privilege, MFA
○ Key Management Service (KMS)
■ Server side
■ Client side
○ Disable unused data centers (regions) and unused instance families
○ Limit resources
○ Account segregation
○ Identity-based policies!
● CloudTrail: enables governance, compliance, operational auditing, and risk auditing
● Resource management (e.g. S3)
○ Write only, read only, no delete
○ Versioning, encryption, lifecycle policy
AWS Network options in general
● VPC
○ Create a non-default VPC
○ Private network + bastion host + ALB
○ Public subnet vs. private subnet
○ VPC endpoints
○ NACL (network ACL)
○ SG (security group)
○ IGW (internet gateway)
○ VPC peering
○ Site-to-site VPN: it is per VPC, so choose carefully
● Direct Connect vs. HTTPS
● CloudFront with geolocation protection
● https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-enterprise-grade-networking-security-what-are-your-options-to-protect-your-bigdata/
AWS Direct Connect
VPC peering
S3 Endpoint Example
VPN Example
How to Define VPC with private and public subnet
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html
VPC private subnet + Virtual Private Gateway
● https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-enterprise-grade-networking-security-what-are-your-options-to-protect-your-bigdata/
● http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html
● https://docs.openvpn.net/how-to-tutorialsguides/administration/extending-vpn-connectivity-to-amazon-aws-vpc-using-aws-
Data Governance…
Storage Level
S3 coolest options on AWS
● All of your data is replicated across multiple AZs!
● Lifecycle policy
● Versioning
● Cross-region replication
● Undeletable bucket: no-delete policy
● MFA on delete action - with time interval of 1 min or per action.
● Deny unencrypted data writes.
● Analytics
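The "no delete" and "deny unencrypted write" options above can be expressed as a resource-based bucket policy. This is a minimal sketch that only builds the policy document; the bucket name and function name are placeholders. It assumes uploads are expected to carry the `x-amz-server-side-encryption` header.

```python
import json


def hardened_bucket_policy(bucket):
    """Bucket policy: deny uploads without an SSE header and deny all deletes."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedPut",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{arn}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
            {
                "Sid": "DenyDelete",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion", "s3:DeleteBucket"],
                "Resource": [arn, f"{arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(hardened_bucket_policy("my-raw-data-bucket"), indent=2))
```

Note that a deny on delete also blocks admins, so lifecycle expiration is usually how old objects leave such a bucket.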
General Security Concepts | Good to know!
● Protecting data while
○ in transit (as it travels to and from Amazon S3), 2 ways:
■ SSL
■ client-side encryption
○ at rest (while it is stored on disks in Amazon S3 data centers), 2 ways:
■ server-side encryption (SSE)
■ client-side encryption
○ in use:
■ Hashing (dictionary attack?)
■ Hashing with a key
■ Any encryption
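Why "hashing with a key" rather than plain hashing? A plain hash of a guessable value (an email, a phone number) can be reversed by hashing a dictionary of candidates. A rough sketch using only the standard library; the key and the sample values are made up for illustration.

```python
import hashlib
import hmac


def plain_hash(value: str) -> str:
    """Unkeyed SHA-256: anyone can recompute it for a guessed value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256: recomputing it requires the secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target_hash: str, candidates):
    """Return the candidate whose plain hash matches target_hash, else None."""
    for candidate in candidates:
        if plain_hash(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    guesses = ["bob@example.com", "alice@example.com"]
    secret = b"keep-this-in-KMS-or-a-secrets-store"  # hypothetical key
    print(dictionary_attack(plain_hash("alice@example.com"), guesses))
    print(dictionary_attack(keyed_hash("alice@example.com", secret), guesses))
```

The keyed hash stays stable for joins across tables as long as the same key is used, which is why the key itself needs protecting.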
Blog: S3 security options in detail
https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-s3-security-introduction-and-access-management/
● Detailed Encryption options in AWS
● Resource based policy VS Identity based policy
Server Side Encryption (SSE) summary
● Server-Side Encryption with Customer-Provided Keys (SSE-C)
■ You manage the encryption keys; Amazon S3 manages encryption as it writes to disk, and decryption when you access your objects
● S3-Managed Keys (SSE-S3)
● AWS KMS-Managed Keys (SSE-KMS)
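In practice the three SSE flavors differ mainly in which request headers accompany an upload. The helper below is an illustrative sketch (the function name and its arguments are assumptions); it builds the S3 REST headers for each mode without sending anything.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the S3 request headers that select a server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # omit to use the account's default KMS key for S3
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            # S3 uses the MD5 of the key as an integrity check on the key itself.
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii"),
        }
    raise ValueError(f"unknown SSE mode: {mode}")
```

With SSE-C the same key must be sent again on every read, so losing the key means losing the object.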
Additional AWS S3 safeguards
1. VPN (site to site)
2. IP ACL
3. Identity-based policy (who? Me? S3 read only?)
4. Resource-based policy: e.g. deny delete requests / unencrypted objects
5. https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-s3-security-introduction-and-access-management/
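Safeguards 2 and 3 can be sketched as two small policy documents: a write-only identity-based policy for the uploading user, and a bucket policy that rejects requests not coming from the data center's IP ranges. Bucket names, CIDR ranges and function names below are placeholders.

```python
import ipaddress


def write_only_identity_policy(bucket):
    """Identity-based policy: the attached user may only upload objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteOnly",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def datacenter_only_bucket_policy(bucket, dc_cidrs):
    """Resource-based policy: deny S3 requests whose source IP is outside dc_cidrs."""
    # ip_network() validates each CIDR and raises ValueError on malformed input.
    cidrs = [str(ipaddress.ip_network(cidr)) for cidr in dc_cidrs]
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideDatacenter",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [arn, f"{arn}/*"],
                # aws:SourceIp applies to requests arriving over public IPs,
                # not to traffic that enters through a VPC endpoint.
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            }
        ],
    }
```

The two policies combine: the identity policy says what the user may do, the bucket policy says from where anyone may do it.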
Basic S3 Security Diagram
[Diagram: Source datacenter (secured) → Destination AWS S3]
● Data in transit: client-side encryption, HTTPS / SSL, VPN
● Data at rest: server-side / client-side encryption
● Identity-based policy: only myUser, access only to S3, write only
● Resource-based policy: deny unencrypted, deny delete, deny policy change, accept only from DC IP
Data Governance…
Data Availability Level
Cross Region Replication
● https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html
● Versioning must be enabled
● SSL encryption in transit
Cross account bucket policy (resource based)
http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html
Cross Region & Cross account replication
https://docs.aws.amazon.com/AmazonS3/latest/dev/crr-walkthrough-2.html
Account A Account B
Use case used by Walla for DR purposes
● Cross-account & cross-region replication
○ Copy all backups and mission-critical data
○ Copy all CloudFormation templates
○ Copy all code
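A cross-account replication rule for such a DR setup could look roughly like the dictionary below. It is shaped like the `ReplicationConfiguration` argument that boto3's `put_bucket_replication` accepts, but nothing is called here. The role ARN, bucket name, account ID and rule ID are placeholders, and both buckets are assumed to have versioning enabled already.

```python
def cross_account_replication_config(role_arn, dest_bucket, dest_account_id, prefix=""):
    """Build a replication configuration that copies objects to a DR account."""
    return {
        "Role": role_arn,  # role S3 assumes to replicate on your behalf
        "Rules": [
            {
                "ID": "dr-replication",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # "" replicates the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "Account": dest_account_id,
                    # Hand object ownership to the DR account so the source
                    # account cannot tamper with the replicas.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    }
```

The destination bucket additionally needs a bucket policy that lets the source account's replication role write into it.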
Data Governance…
Big Data Security
Architecture Level
(At rest, In motion, In use)
Architecture Security Considerations
● Basically, for each component
○ At rest, in motion, in use
● Understand the different encryption options per component
○ Prefer SSE
○ Rotate your KMS key every X periods
● Restrict access via
○ Identity-based policy: role-based security per cluster/technology
○ Resource-based policy
○ Least privilege; avoid admin users as much as possible.
○ Manage your EC2 keys wisely
● Understand the differences between
○ Direct Connect vs. public internet vs. VPN, and their impact on your application
○ Security groups / NACL / IP-based protection
○ Private subnet / public subnet for your app
● For web, consider CloudFront for geolocation protection
EMR Security
● EMR specific:
○ Security configurations
○ Kerberos
● All other web UIs (Zeppelin, Hue, Oozie)
○ SSL
○ User/password: prefer LDAP when possible
○ Shiro: use role-based access.
● EMR role → restrict access to S3 buckets
● Identity level: least privilege
● Application level (e.g. Hive: user level, table level, row level, column level, Kerberos, etc.)
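An EMR security configuration is a JSON document. The sketch below covers only its encryption part (at rest and in transit). Field names follow the EMR security configuration format as I understand it, so verify them against the EMR documentation before use. The KMS key ARN and the certificate location are placeholders.

```python
import json


def emr_security_configuration(kms_key_arn, tls_certs_s3_uri):
    """Build the encryption section of an EMR security configuration."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Data written to S3 through EMRFS is encrypted with SSE-S3.
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                # Local disks on cluster nodes are encrypted with a KMS key.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_certs_s3_uri,  # zip file holding the PEM certs
                }
            },
        }
    }


if __name__ == "__main__":
    config = emr_security_configuration(
        "arn:aws:kms:eu-west-1:111111111111:key/EXAMPLE",
        "s3://my-config-bucket/certs/my-certs.zip",
    )
    print(json.dumps(config, indent=2))
```

Kerberos settings would go into a separate authentication section of the same document, which is left out here.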
Data governance…
Privacy in a nutshell: PII
Personally Identifiable Information
PII Privacy challenges in a nutshell
● Problem
○ Assume the attacker has admin permissions
○ According to lawyers, there is no such thing as an anonymous DB →
■ Your anonymous DB
■ A public 3rd-party personally identifiable information DB
■ Joined together... implies your DB is not anonymous.
● Solution
○ Obfuscate & aggregate your data wherever possible, to avoid a scenario where an anonymous user can be reverse engineered from the data (location history, habits + timestamps, etc.)
○ Keep only the RAW data you need.
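"Obfuscate & aggregate" can be as simple as coarsening location and time, then publishing only cells that contain enough distinct users. A toy sketch: the field names (`user_id`, `lat`, `lon`, `timestamp`) and the threshold `k` are assumptions, and real anonymization needs more care than this.

```python
from collections import defaultdict
from datetime import datetime


def generalize(event):
    """Coarsen one raw event: 1-decimal coordinates (~11 km) and hour precision."""
    ts = datetime.fromisoformat(event["timestamp"])
    return (round(event["lat"], 1), round(event["lon"], 1), ts.strftime("%Y-%m-%d %H:00"))


def aggregate(events, k=3):
    """Count distinct users per coarse cell; drop cells with fewer than k users."""
    users_per_cell = defaultdict(set)
    for event in events:
        users_per_cell[generalize(event)].add(event["user_id"])
    return {cell: len(users) for cell, users in users_per_cell.items() if len(users) >= k}


if __name__ == "__main__":
    events = [
        {"user_id": "a", "lat": 32.0853, "lon": 34.7818, "timestamp": "2018-06-27T10:05:00"},
        {"user_id": "b", "lat": 32.0901, "lon": 34.7702, "timestamp": "2018-06-27T10:40:00"},
        {"user_id": "c", "lat": 32.1102, "lon": 34.8311, "timestamp": "2018-06-27T10:59:59"},
        {"user_id": "d", "lat": 31.7683, "lon": 35.2137, "timestamp": "2018-06-27T10:15:00"},
    ]
    print(aggregate(events, k=3))
```

User "d" is alone in its cell and so never appears in the output, which is the point: a lone user in a cell is exactly what a join with a public dataset could re-identify.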
Data governance…
Usability & Fine-Grained Access Control
Example AWS Architecture in one account (nothing special)
[Diagram: regions EU-WEST-1 and US-EAST-1; VPC Prod1 + VPC stg, VPC Prod2 + VPC stg; VPC peering; S3 endpoint; S3]
Account Segregations Motivation
● Fine-grained access control.
● Fine-grained billing control.
● Limit your blast radius:
○ What happens if your account is hacked?
○ What happens if one account out of 10 of your accounts is hacked?
Simple access control in one VPC
● Data engineering: EMR cluster (transformation), RW access to all buckets except modeling
● Data scientist: EMR cluster (modeling), no access to RAW data, read-only access to transformation, and RW to modeling
● Buckets: S3 raw buckets, S3 transformation buckets, S3 modeling buckets
● Data source: write only
[Diagram arrow labels: Read Only, Full Control, write only]
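The access rules on this slide fit in a small matrix, from which one identity-based policy per role can be generated. A sketch under assumptions: the role names, tier names and the exact S3 actions chosen for "read" and "write" are illustrative.

```python
READ = ["s3:GetObject", "s3:ListBucket"]
WRITE = ["s3:PutObject"]

# role -> bucket tier -> allowed actions (anything absent is not granted)
ACCESS = {
    "data_source": {"raw": WRITE},
    "data_engineering": {"raw": READ + WRITE, "transformation": READ + WRITE},
    "data_scientist": {"transformation": READ, "modeling": READ + WRITE},
}


def is_allowed(role, tier, action):
    """Check the matrix: may this role perform this action on this bucket tier?"""
    return action in ACCESS[role].get(tier, [])


def policy_for(role, bucket_names):
    """Build an identity-based policy for a role; bucket_names maps tier -> bucket."""
    statements = []
    for tier, actions in ACCESS[role].items():
        arn = f"arn:aws:s3:::{bucket_names[tier]}"
        statements.append(
            {"Effect": "Allow", "Action": actions, "Resource": [arn, f"{arn}/*"]}
        )
    return {"Version": "2012-10-17", "Statement": statements}
```

For example, `policy_for("data_scientist", {"transformation": "acme-transformed", "modeling": "acme-models"})` yields a policy with no statement for the raw bucket at all, so access to raw data is denied by default.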
Simple Account segregation for Business units example [common use case]
[Diagram: users (Admin, BI team, PreSale) log in to the Users account (account1, login only) and assume a role into: Business unit 1 (account1), Business unit 2 (account3), DR (account4)]
Pre-sale and BI may have access to both business units, or just to one business unit.
Admin has access to everything, as usual.
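The "login only + assume role" pattern needs two documents: a trust policy on the role in the target account, and a permission in the users account that allows calling `sts:AssumeRole` on that role. A minimal sketch; the account IDs, role names and function names are placeholders.

```python
def trust_policy(users_account_id, require_mfa=True):
    """Trust policy for a role in a business-unit account: who may assume it."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{users_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        # Only sessions that authenticated with MFA may assume the role.
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def assume_role_permission(target_account_id, role_name):
    """Identity-based policy in the users account: which role a group may assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": f"arn:aws:iam::{target_account_id}:role/{role_name}",
            }
        ],
    }
```

Giving the BI team access to one business unit or both then comes down to which `assume_role_permission` documents are attached to their group.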
Account segregations per GEOlocation [data must not leave country use case]
[Diagram: users (Admin, London, NewYork) log in to the Users account (login only) and assume a role into: us-east-1 (N. Virginia) account, eu-west-2 (London) account, DR account]
Fine grained access control via Account segregations [my use case]
[Diagram: users (DevOps, Data Science, Data Engineer) log in to the Users account (login only) and assume a role into:]
● Inbound data transformation account, per data source (RAW data, cleansing, encryption)
● Data modeling and obfuscation on transformed data (big data)
● Outbound data transformation account, per data source (aggregated data, obfuscated data, re-encryption)
● Prod
● DR account
Fine Grained Data Governance Flow via Account segregations
[Diagram: Raw data1 (account1) and Raw data2 (account2) → Inbound data transformation (per account, per data source if needed) → Modeling & obfuscation account → Outbound data transformation account → Visualize / Prod account]
Data labels along the flow:
● READ / WRITE encrypted PII data + user
● Hashed data (hashed user ID, hashed data, non-PII data)
● Obfuscated data, data modeling, hashed users, non-PII
Big Data Generic Architecture | one account
Data Collection
S3
Data Transformation
Data Modeling
Data Visualization
Big Data Generic Architecture | acc. seg.
Data ingestion & RAW Data
Data Transformation
Data Modeling & obfuscation
Data Visualization
Summary and take-home messages
● Design your network from the ground up; always keep your big data in mind
○ Data access, latency, data encryption in motion, performance impact
○ Avoid public IPs when possible. If possible, use Direct Connect / VPN
● Design your data access layers carefully, per your organization's needs.
○ RAW data + encryption at rest & in motion
○ Keep DR at the data level in mind. [separate accounts + separate region!]
○ Transformation + hashing level [inbound/outbound]
○ Modeling + obfuscation level
● Architecture impact
○ Security [at rest, in motion, in use] for each component
○ Identity-based policy
○ Resource-based policy
○ Account segregation, to limit the blast radius
Stay in touch...
● Omid Vahdaty
● +972-54-2384178
● https://amazon-aws-big-data-demystified.ninja/
● Join our meetup, FB group and youtube channel
○ https://www.meetup.com/AWS-Big-Data-Demystified/
○ https://www.facebook.com/groups/amazon.aws.big.data.demystified/
○ https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber

Weitere ähnliche Inhalte

Was ist angesagt?

Signal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsSignal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsDataStax Academy
 
Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraDataStax
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time CassandraAcunu
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesScyllaDB
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDBMongoDB
 
Cassandra as event sourced journal for big data analytics
Cassandra as event sourced journal for big data analyticsCassandra as event sourced journal for big data analytics
Cassandra as event sourced journal for big data analyticsAnirvan Chakraborty
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidCharles Allen
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsScyllaDB
 
Improving Organizational Knowledge with Natural Language Processing Enriched ...
Improving Organizational Knowledge with Natural Language Processing Enriched ...Improving Organizational Knowledge with Natural Language Processing Enriched ...
Improving Organizational Knowledge with Natural Language Processing Enriched ...DataWorks Summit
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...MongoDB
 
HBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgentHBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgentHBaseCon
 
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand UsersDisney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand UsersScyllaDB
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsDataStax
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerMichael Spector
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupJelena Zanko
 
What Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and GoWhat Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and GoScyllaDB
 

Was ist angesagt? (20)

Signal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide RowsSignal Digital: The Skinny on Wide Rows
Signal Digital: The Skinny on Wide Rows
 
Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to Cassandra
 
Real-time Cassandra
Real-time CassandraReal-time Cassandra
Real-time Cassandra
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary Differences
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
Cassandra as event sourced journal for big data analytics
Cassandra as event sourced journal for big data analyticsCassandra as event sourced journal for big data analytics
Cassandra as event sourced journal for big data analytics
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & Druid
 
How to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your NeedsHow to Build a Scylla Database Cluster that Fits Your Needs
How to Build a Scylla Database Cluster that Fits Your Needs
 
Google Cloud Spanner Preview
Google Cloud Spanner PreviewGoogle Cloud Spanner Preview
Google Cloud Spanner Preview
 
Improving Organizational Knowledge with Natural Language Processing Enriched ...
Improving Organizational Knowledge with Natural Language Processing Enriched ...Improving Organizational Knowledge with Natural Language Processing Enriched ...
Improving Organizational Knowledge with Natural Language Processing Enriched ...
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
HBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgentHBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgent
 
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand UsersDisney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Druid
DruidDruid
Druid
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid Meetup
 
What Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and GoWhat Kiwi.com Has Learned Running ScyllaDB and Go
What Kiwi.com Has Learned Running ScyllaDB and Go
 

Ähnlich wie AWS Big Data Demystified #4 data governance demystified [security, network and data access management]

Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...
Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...
Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...David Timothy Strauss
 
004 - Logging in the Cloud -- hide01.ir.pptx
004 - Logging in the Cloud  --  hide01.ir.pptx004 - Logging in the Cloud  --  hide01.ir.pptx
004 - Logging in the Cloud -- hide01.ir.pptxnitinscribd
 
Security and privacy of cloud data: what you need to know (Interop)
Security and privacy of cloud data: what you need to know (Interop)Security and privacy of cloud data: what you need to know (Interop)
Security and privacy of cloud data: what you need to know (Interop)Druva
 
Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016
Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016
Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016Amazon Web Services
 
Core strategies to develop defense in depth in AWS
Core strategies to develop defense in depth in AWSCore strategies to develop defense in depth in AWS
Core strategies to develop defense in depth in AWSShane Peden
 
How to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel AvivHow to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel AvivAmazon Web Services
 
Securing your database servers from external attacks
Securing your database servers from external attacksSecuring your database servers from external attacks
Securing your database servers from external attacksAlkin Tezuysal
 
Resposta a Incidentes | Mind The Sec 2022 com Rodrigo Montoro
Resposta a Incidentes | Mind The Sec 2022 com Rodrigo MontoroResposta a Incidentes | Mind The Sec 2022 com Rodrigo Montoro
Resposta a Incidentes | Mind The Sec 2022 com Rodrigo MontoroClavis Segurança da Informação
 
Automating AWS security and compliance
Automating AWS security and compliance Automating AWS security and compliance
Automating AWS security and compliance John Varghese
 
Enterprise Cloud Security
Enterprise Cloud SecurityEnterprise Cloud Security
Enterprise Cloud SecurityMongoDB
 
Big data security in AWS.pptx
Big data security in AWS.pptxBig data security in AWS.pptx
Big data security in AWS.pptxAshish210583
 
Data Privacy By Design with AWS
Data Privacy By Design with AWSData Privacy By Design with AWS
Data Privacy By Design with AWSKrzysztof Kąkol
 
Designing for Privacy in AWS cloud
Designing for Privacy in AWS cloudDesigning for Privacy in AWS cloud
Designing for Privacy in AWS cloudKrzysztof Kąkol
 
Cloud Security and some preferred practices
Cloud Security and some preferred practicesCloud Security and some preferred practices
Cloud Security and some preferred practicesMichael Pearce
 
Cloud Security.pptx
Cloud Security.pptxCloud Security.pptx
Cloud Security.pptxReena Harnal
 
How to protect your IoT data on AWS
How to protect your IoT data on AWSHow to protect your IoT data on AWS
How to protect your IoT data on AWSLahav Savir
 
Agile enterprise analytics on aws
Agile enterprise analytics on awsAgile enterprise analytics on aws
Agile enterprise analytics on awsDon Gillis
 
Cloud Security for Regulated Firms - Securing my cloud and proving it
Cloud Security for Regulated Firms - Securing my cloud and proving itCloud Security for Regulated Firms - Securing my cloud and proving it
Cloud Security for Regulated Firms - Securing my cloud and proving itHentsū
 
In Cloud We Encrypt #GHC15
In Cloud We Encrypt #GHC15In Cloud We Encrypt #GHC15
In Cloud We Encrypt #GHC15Intuit Inc.
 

Ähnlich wie AWS Big Data Demystified #4 data governance demystified [security, network and data access management] (20)

Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...
Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...
Don't Build "Death Star" Security - O'Reilly Software Architecture Conference...
 
004 - Logging in the Cloud -- hide01.ir.pptx
004 - Logging in the Cloud  --  hide01.ir.pptx004 - Logging in the Cloud  --  hide01.ir.pptx
004 - Logging in the Cloud -- hide01.ir.pptx
 
The Perimeter Is Dead
The Perimeter Is DeadThe Perimeter Is Dead
The Perimeter Is Dead
 
Security and privacy of cloud data: what you need to know (Interop)
Security and privacy of cloud data: what you need to know (Interop)Security and privacy of cloud data: what you need to know (Interop)
Security and privacy of cloud data: what you need to know (Interop)
 
Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016
Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016
Next-Generation Security Operations with AWS | AWS Public Sector Summit 2016
 
Core strategies to develop defense in depth in AWS
Core strategies to develop defense in depth in AWSCore strategies to develop defense in depth in AWS
Core strategies to develop defense in depth in AWS
 
How to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel AvivHow to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
 
Securing your database servers from external attacks
Securing your database servers from external attacksSecuring your database servers from external attacks
Securing your database servers from external attacks
 
Resposta a Incidentes | Mind The Sec 2022 com Rodrigo Montoro
Resposta a Incidentes | Mind The Sec 2022 com Rodrigo MontoroResposta a Incidentes | Mind The Sec 2022 com Rodrigo Montoro
Resposta a Incidentes | Mind The Sec 2022 com Rodrigo Montoro
 
Automating AWS security and compliance
Automating AWS security and compliance Automating AWS security and compliance
Automating AWS security and compliance
 
Enterprise Cloud Security
Enterprise Cloud SecurityEnterprise Cloud Security
Enterprise Cloud Security
 
Big data security in AWS.pptx
Big data security in AWS.pptxBig data security in AWS.pptx
Big data security in AWS.pptx
 
Data Privacy By Design with AWS
Data Privacy By Design with AWSData Privacy By Design with AWS
Data Privacy By Design with AWS
 
Designing for Privacy in AWS cloud
Designing for Privacy in AWS cloudDesigning for Privacy in AWS cloud
Designing for Privacy in AWS cloud
 
Cloud Security and some preferred practices
Cloud Security and some preferred practicesCloud Security and some preferred practices
Cloud Security and some preferred practices
 
Cloud Security.pptx
Cloud Security.pptxCloud Security.pptx
Cloud Security.pptx
 
How to protect your IoT data on AWS
How to protect your IoT data on AWSHow to protect your IoT data on AWS
How to protect your IoT data on AWS
 
Agile enterprise analytics on aws
Agile enterprise analytics on awsAgile enterprise analytics on aws
Agile enterprise analytics on aws
 
Cloud Security for Regulated Firms - Securing my cloud and proving it
Cloud Security for Regulated Firms - Securing my cloud and proving itCloud Security for Regulated Firms - Securing my cloud and proving it
Cloud Security for Regulated Firms - Securing my cloud and proving it
 
In Cloud We Encrypt #GHC15
In Cloud We Encrypt #GHC15In Cloud We Encrypt #GHC15
In Cloud We Encrypt #GHC15
 

Mehr von Omid Vahdaty

Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup Omid Vahdaty
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedOmid Vahdaty
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedOmid Vahdaty
 
Machine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedMachine Learning Essentials Demystified part1 | Big Data Demystified
Machine Learning Essentials Demystified part1 | Big Data DemystifiedOmid Vahdaty
 
AWS Big Data Demystified #4 data governance demystified [security, network and data access management]

  • 1. AWS Data Governance Demystified [Network, Security, Privacy & Data Access Management] AWS Big Data Demystified #4 Omid Vahdaty, Big Data Ninja
  • 2. Agenda ● Disclaimer ○ I am not a network and security expert ○ This is not a security and network lecture ● Agenda ○ What are all the possible considerations for my data? (access, regulation, availability, etc.) == Data Governance ○ What are the architecture implications? ● My advice: ○ Stick to a high-level overview ○ Remember the topic, not the details ○ ASK me questions during the lecture!
  • 3. TODAY’S BIG DATA APPLICATION STACK PaaS and DC...
  • 4. Big Data Generic Architecture | Summary Data Collection S3 Data Transformation Data Modeling Data Visualization
  • 5. Before I forget: for new AWS accounts… ● Disable unused regions via IAM ● Set the limit for the instances you are using, e.g. 50 instances ● Set the limit for the instances you are not using to 0! ● Remove the access key / secret key for the root account ● Don't use the root account ● Use MFA
  • 6. Data Governance… Data Security Level Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data which an enterprise has can be used by the entire organization.
  • 7. Big Data Security: Nobody likes it… But…
  • 8. AWS Regulation and more... ● https://aws.amazon.com/compliance/programs/ ● PII ● GDPR
  • 9. AWS Security options in general [ sorry, not covering everything… ] ● IAM: Identity management ○ User, policy, roles, group, least privileges, MFA ○ Key Management Service ■ Server side ■ Client side ○ Disable data centers, unused instance families ○ Limit resources ○ Account segregation ○ Identity based policies! ● CloudTrail: enables governance, compliance, auditing, and risk auditing ● S3: Resource management (e.g. S3) ○ Write only, read only, no delete ○ Versioning, encryption, life cycle policy
  • 10. AWS Network options in general ● VPC ○ Create a non-default VPC ○ Private network + Bastion host + ALB ○ Public subnet vs. private subnet ○ VPC Endpoints ○ NACL ○ SG ○ IG ○ VPC peering ○ Site 2 Site: it is per VPC, so choose carefully ● Direct Connect vs. HTTPS ● CloudFront with geolocation protection ● https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-enterprise-grade-networking-security-what-are-your-options-to-protect-your-bigdata/
  • 15. How to define a VPC with private and public subnets http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html
  • 16. VPC private subnet + Virtual Private Gateway ● https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-enterprise-grade-networking-security-what-are-your-options-to-protect-your-bigdata/ ● http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html ● https://docs.openvpn.net/how-to-tutorialsguides/administration/extending-vpn-connectivity-to-amazon-aws-vpc-using-aws-
  • 17. Data Governance… Storage Level Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data which an enterprise has can be used by the entire organization.
  • 18. Coolest S3 options on AWS ● All of your data is replicated across multiple AZs! ● Life cycle policy ● Versioning ● Cross region replication ● Undeletable bucket: no-delete policy ● MFA on delete action, with a time interval of 1 min or per action ● Deny unencrypted data writes ● Analytics
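The "deny unencrypted data write" option above is usually expressed as a bucket policy. A minimal sketch, assuming a hypothetical bucket name `example-raw-bucket`; it only builds and prints the policy document and does not call AWS. The `Null` condition on `s3:x-amz-server-side-encryption` rejects any `PutObject` request that arrives without a server-side encryption header.

```python
import json


def deny_unencrypted_put_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects uploads lacking an SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                # Object-level action, so the resource is every key in the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }


# "example-raw-bucket" is a made-up name used for illustration only.
policy = deny_unencrypted_put_policy("example-raw-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON could then be attached to the bucket through the console or an SDK; that step is left out here on purpose.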
  • 19. General Security Concepts | Good to know! ● Protecting data while ○ in transit (as it travels to and from Amazon S3): ■ by using SSL ○ at rest (while it is stored on disks in Amazon S3 data centers), 2 ways: ■ Server-side encryption (SSE) ■ Client-side encryption ○ in use: ■ Hashing… (dictionary attack?) ■ Hashing with a key ■ Any encryption
  • 20. Blog: S3 security options in detail https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-s3-security-introduction-and-access-management/ ● Detailed encryption options in AWS ● Resource based policy vs. identity based policy
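To illustrate why the slide distinguishes plain hashing from hashing with a key, here is a minimal sketch using only the Python standard library. The e-mail addresses and the keys are made-up values; in practice the key would live in a secrets store rather than in code. A plain hash of a low-entropy value can be reversed by hashing a list of candidates (a dictionary attack), while a keyed hash (HMAC) cannot be reproduced without the key.

```python
import hashlib
import hmac


def plain_hash(value: str) -> str:
    """Unkeyed SHA-256: anyone can recompute it for a guessed value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256: recomputing it requires knowing the secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Dictionary attack against the plain hash: hash every candidate and compare.
candidates = ["alice@example.com", "bob@example.com"]
leaked_digest = plain_hash("bob@example.com")
recovered = next((c for c in candidates if plain_hash(c) == leaked_digest), None)
print("recovered from plain hash:", recovered)

# The same value hashed under two different (hypothetical) keys gives
# unrelated digests, so the candidate list alone no longer helps an attacker.
digest_a = keyed_hash("bob@example.com", b"hypothetical-key-a")
digest_b = keyed_hash("bob@example.com", b"hypothetical-key-b")
print("keyed digests differ:", digest_a != digest_b)
```

The keyed hash is still deterministic for a given key, so it can be used as a join key across datasets processed with the same key.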
  • 21. Server Side Encryption (SSE) summary ● Server-Side Encryption with Customer-Provided Keys (SSE-C) ■ You manage the encryption keys and Amazon S3 manages the encryption, as it writes to disks, and decryption, when you access your objects ● S3-Managed Keys (SSE-S3) ● AWS KMS-Managed Keys (SSE-KMS)
  • 22. Additional AWS S3 safeguards 1. VPN (site to site) 2. IP ACL 3. Identity based policy (who? Me? S3 read only?) 4. Resource based policy: e.g. deny delete requests / unencrypted objects 5. https://amazon-aws-big-data-demystified.ninja/2018/06/27/aws-s3-security-introduction-and-access-management/
  • 23. Basic S3 Security Diagram Destination AWS S3 Source Datacenter (secured) Data in Transit Client side encryption HTTPS / SSL VPN Data at Rest (Server/Client side encryption) Identity based policy:only myUser, access only to s3,write only Resource based policy: denyUnencrypted,Deny Delete,Deny Policy Change Accept only from DC IP
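The resource based policy in the diagram (deny delete, deny policy change, accept only from the datacenter IP) could look roughly like the following. This is a sketch under assumptions: the bucket name and the CIDR `203.0.113.0/24` (a documentation-only address range) are placeholders, and the code only builds the JSON document. Note that a blanket deny on policy changes may lock administrators out as well, so it should be tested on a throwaway bucket first.

```python
import json


def hardened_bucket_policy(bucket_name: str, datacenter_cidr: str) -> dict:
    """Deny deletes, deny policy changes, and deny traffic from outside the DC."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    object_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDelete",
                "Effect": "Deny",
                "Principal": "*",
                "Action": [
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                    "s3:DeleteBucket",
                ],
                "Resource": [bucket_arn, object_arn],
            },
            {
                "Sid": "DenyPolicyChange",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:PutBucketPolicy", "s3:DeleteBucketPolicy"],
                "Resource": bucket_arn,
            },
            {
                "Sid": "DenyOutsideDatacenter",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, object_arn],
                # Matches every request whose source IP is NOT in the DC range.
                "Condition": {
                    "NotIpAddress": {"aws:SourceIp": datacenter_cidr}
                },
            },
        ],
    }


# Placeholder bucket name and documentation-only CIDR range.
dc_policy = hardened_bucket_policy("example-backup-bucket", "203.0.113.0/24")
print(json.dumps(dc_policy, indent=2))
```

The "deny unencrypted" part of the diagram is a fourth statement of the same shape, conditioned on the server-side encryption header.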
  • 24. Data Governance… Data Availability Level Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data which an enterprise has can be used by the entire organization.
  • 25. Cross Region Replication ● https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html ● Versioning must be enabled ● SSL encryption in transit
  • 26. Cross account bucket policy (resource based) http://docs.aws.amazon.com/AmazonS3/latest/dev/example-walkthroughs-managing-access-example2.html
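A rough sketch of what a replication rule looks like, and of the versioning precondition from the slide. The dictionary mirrors, to the best of my understanding, the shape of the S3 replication configuration document; the role ARN, bucket ARN, and rule ID are hypothetical, and nothing here talks to AWS.

```python
import json


def can_replicate(source_versioning: str, destination_versioning: str) -> bool:
    """Replication needs versioning enabled on both source and destination."""
    return source_versioning == "Enabled" and destination_versioning == "Enabled"


def build_replication_config(role_arn: str, destination_bucket_arn: str) -> dict:
    """Build a single-rule configuration that replicates every object."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects (hypothetical ARN)
        "Rules": [
            {
                "ID": "replicate-everything",  # illustrative rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


replication = build_replication_config(
    "arn:aws:iam::111111111111:role/example-replication-role",
    "arn:aws:s3:::example-dr-bucket-eu-west-1",
)
print(json.dumps(replication, indent=2))
print("ok to replicate:", can_replicate("Enabled", "Enabled"))
```

For the cross-account variant, the destination bucket additionally needs a bucket policy that allows the source account's replication role to write to it.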
  • 27. Cross Region & Cross account replication https://docs.aws.amazon.com/AmazonS3/latest/dev/crr-walkthrough-2.html Account A Account B
  • 28. Use case used by Walla for DR purposes ● Cross account & cross region replication ○ Copy all backups and mission-critical data ○ Copy all CloudFormation templates ○ Copy all code
  • 29. Data Governance… Big Data Security Architecture Level (At rest, In motion, In use) Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data which an enterprise has can be used by the entire organization.
  • 30. Architecture Security Considerations ● Basically, for each component ○ At rest, in motion, in use ● Understand the different encryption options per component ○ Prefer SSE ○ Rotate your KMS key every X periods ● Restrict access via: ○ Identity based policy: role based security per cluster/technology ○ Resource based policy ○ Least privileges; avoid admin users as much as possible ○ Manage your EC2 keys wisely ● Understand the differences between ○ Direct Connect vs. public internet vs. VPN and their impact on your application ○ Security groups / NACL / IP based protection ○ Private subnet / public subnet for your app ● For web, consider CloudFront for geolocation protection
  • 31. EMR Security ● EMR specific: ○ Security configurations ○ Kerberos ● All other web UIs (Zeppelin, Hue, Oozie) ○ SSL ○ User/password; prefer LDAP when possible ○ Shiro: use role based access ● EMR role → restrict access to S3 buckets ● Identity level: least privileges ● Application level (e.g. Hive: user level, table level, row level, column level, Kerberos, etc.)
  • 32. Data governance… Privacy in a nutshell: PII (Personally Identifiable Information) Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data which an enterprise has can be used by the entire organization.
  • 33. PII privacy challenges in a nutshell ● Problem ○ Assume the attacker has admin permissions ○ According to lawyers, there is no such thing as an anonymous DB → ■ Your anonymous DB ■ A public 3rd party personally identifiable information DB ■ Joined together… implies your DB is not anonymous ● Solution ○ Obfuscate & aggregate your data wherever possible, to avoid a scenario where an anonymous user can be reverse engineered from the data (location history, habits + timestamp, etc.) ○ Keep only the RAW data you need
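One way to picture "obfuscate & aggregate" for location history: coarsen the coordinates and timestamps, drop the user identifier, and publish only buckets that contain more than one person. A minimal sketch with made-up events and a hypothetical threshold `min_group_size`; it is an illustration of the idea, not a guarantee of anonymity.

```python
from collections import Counter
from datetime import datetime


def coarsen(event: dict) -> tuple:
    """Reduce precision: ~0.1 degree grid cell and the hour of the event."""
    ts = datetime.fromisoformat(event["timestamp"])
    return (
        round(event["lat"], 1),
        round(event["lon"], 1),
        ts.strftime("%Y-%m-%d %H:00"),
    )


def aggregate(events: list, min_group_size: int = 2) -> dict:
    """Count events per coarse bucket and drop buckets that are too small."""
    counts = Counter(coarsen(e) for e in events)
    return {bucket: n for bucket, n in counts.items() if n >= min_group_size}


# Made-up sample events; the user_id never reaches the aggregated output.
events = [
    {"user_id": "u1", "lat": 32.0853, "lon": 34.7818, "timestamp": "2018-06-27T08:15:00"},
    {"user_id": "u2", "lat": 32.0901, "lon": 34.7799, "timestamp": "2018-06-27T08:47:00"},
    {"user_id": "u3", "lat": 31.7683, "lon": 35.2137, "timestamp": "2018-06-27T09:05:00"},
]

summary = aggregate(events)
print(summary)
```

The third event forms a group of one and is dropped, since a single person in a bucket is exactly the case that can be joined back against a public dataset.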
  • 34. Data governance… Usability, & Fine Grained Access Control Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data which an enterprise has can be used by the entire organization.
  • 35. s3 Example AWS Architecture in one account (nothing special) Region EU-WEST-1 Region US-EAST-1 VPC Prod1 VPC stg VPC Prod2 VPC stg peering S3 endpoint
  • 36. Account Segregations Motivation ● Fine grained access control. ● Fine grained billing control. ● Limit your Blast radius - ○ what happens if your account is hacked? ○ What happens if one account out of 10 of your accounts is hacked?
  • 37. Simple access control in one VPC Data engineering EMR Cluster (transformation) , RW access to all buckets except modeling Data scientist EMR Cluster (modeling) , no access to RAW data, Read only access to transformation, and RW to modeling S3 raw buckets S3 transformation buckets S3 Modeling buckets Data source - write only Read Only Full Control write only
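The access matrix on this slide can be written down as two identity based policies, one per EMR role. A minimal sketch, assuming hypothetical bucket names (`example-raw`, `example-transformation`, `example-modeling`); it only generates the policy documents. Listing both the bucket ARN and the object ARN under the same statement is a common shortcut: each action simply applies to the resource type it is valid for.

```python
import json

READ_ACTIONS = ["s3:GetObject", "s3:ListBucket"]
WRITE_ACTIONS = ["s3:PutObject"]


def statement(bucket: str, actions: list) -> dict:
    """Allow the given actions on one bucket and the objects inside it."""
    return {
        "Effect": "Allow",
        "Action": actions,
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
    }


def policy(statements: list) -> dict:
    return {"Version": "2012-10-17", "Statement": statements}


# Data engineering cluster: read/write on everything except modeling.
data_engineering_policy = policy([
    statement("example-raw", READ_ACTIONS + WRITE_ACTIONS),
    statement("example-transformation", READ_ACTIONS + WRITE_ACTIONS),
])

# Data science cluster: no RAW access, read-only transformation, RW modeling.
data_science_policy = policy([
    statement("example-transformation", READ_ACTIONS),
    statement("example-modeling", READ_ACTIONS + WRITE_ACTIONS),
])

print(json.dumps(data_science_policy, indent=2))
```

Because IAM denies by default, the data science role gets no access to the raw bucket simply by not mentioning it.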
  • 38. Simple Account segregation for Business units example [common use case] Users account1 Login only Business unit 1 account1 Business unit 2 account3 DR account4 Assume role Admin Bi team PreSale Pre sale and BI may have access to both business units , or just to one business unit, Admin has access to everything as usual
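The "login only" users account plus "assume role" into the business unit accounts relies on a trust policy attached to each role in the target account. A minimal sketch, assuming a hypothetical users account ID `111111111111`; the MFA condition is an optional extra in line with the "Use MFA" advice earlier in the deck.

```python
import json


def trust_policy(users_account_id: str, require_mfa: bool = True) -> dict:
    """Trust policy letting principals of the users account assume this role."""
    stmt = {
        "Effect": "Allow",
        # Trusting the account root delegates the decision to that account's IAM.
        "Principal": {"AWS": f"arn:aws:iam::{users_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        stmt["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [stmt]}


# 111111111111 is a placeholder account ID for the login-only users account.
bi_role_trust = trust_policy("111111111111")
print(json.dumps(bi_role_trust, indent=2))
```

Which users may actually assume the Admin, BI, or PreSale role is then controlled by identity based policies inside the users account.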
  • 39. Account segregation per geolocation [data must not leave country use case] Users account Login only us-east-1 - N. Virginia eu-west-2 - London DR account Assume role Admin London NewYork
  • 40. Fine grained access control via account segregation [my use case] Users account Login only Inbound Data Transformation Account per data source (RAW data, cleansing, encryption) Data Modeling and obfuscation on transformed data (big data) DR account Assume role DevOps Data Science Data Engineer Prod Outbound Data Transformation Account per data source (aggregated data, obfuscated data, re-encryption)
  • 41. Fine Grained Data Governance Flow via account segregation Visualize / Prod account Inbound Data transformation account per data source if needed. Modeling & obfuscation account Outbound Data transformation account Raw data1 account1 READ / WRITE Encrypted PII DATA + user Hashed data DATA (hashed user id, hashed data, non-PII data) Obfuscated data, Data modeling, hashed users, non-PII Raw data2 account2
  • 42. Big Data Generic Architecture | one account Data Collection S3 Data Transformation Data Modeling Data Visualization
  • 43. Big Data Generic Architecture | acc. seg. Data ingestion & RAW Data Data Transformation Data Modeling & obfuscation Data Visualization
  • 44. Summary and Takeaways ● Design your network from the ground up; always keep your big data in mind ○ Data access, latency, data encryption in motion, performance impact ○ Avoid public IPs when possible. If possible, use Direct Connect / VPN ● Design your data access layers carefully per your organization's needs ○ RAW data + encryption @rest & @motion ○ Keep DR on the data level in mind [separate accounts + separate region!] ○ Transformation + hashing level [inbound/outbound] ○ Modeling + obfuscation level ● Architecture impact ○ Security [at rest, in motion, in use] for each component ○ Identity based policy ○ Resource based policy ○ Account segregation, to limit the blast radius
  • 45. Stay in touch... ● Omid Vahdaty ● +972-54-2384178 ● https://amazon-aws-big-data-demystified.ninja/ ● Join our meetup, FB group and youtube channel ○ https://www.meetup.com/AWS-Big-Data-Demystified/ ○ https://www.facebook.com/groups/amazon.aws.big.data.demystified/ ○ https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber