© Copyright 2014 Booz Allen Hamilton
Lessons Learned Securing Data at Scale
Drew Farris
Peter Guerra
Hadoop Summit 2014
Photo: CC BY 2.0: https://www.flickr.com/photos/atoach/5015711744
Photo CC BY 2.0: https://www.flickr.com/photos/dutchamsterdam/
Who we are
+  Founded and run DC Hadoop Users Group Meetup – http://www.meetup.com/Hadoop-DC
+  Technical talks at multiple conferences
–  Strata, Data Science Summit, IDGA Gov Cloud Conference, Cloudera Hadoop Summit, Yahoo! Hadoop Summit, IEEE Cloud Conference, CSA Congress, Black Hat
+  Multiple client engagements over the last 7 years
–  Defense
–  Civil and Commercial Health
–  Civil and Commercial Financial Services
–  Commercial and International
+  Booz Allen Big Data and Data Science Points-of-View
+  http://www.boozallen.com/cloud
+  http://www.boozallen.com/datascience
+  Advancing the Art of Analytics & Big Data
+  http://www.boozallen.com/insights/expertvoices/big-data
+  http://www.federalnewsradio.com/?nid=154&sid=2080808
+  Tackling Large Scale Data in Government
+  http://www.cloudera.com/blog/2010/11/tackling-large-scale-data-in-government/
+  IT Architectures for Complex Search and Information Retrieval
+  http://www.slideshare.net/cloudera/fuzzy-table-final
+  http://www.slideshare.net/ydn/3-biometric-hadoopsummit2010
Agenda
+  Securing Data in Hadoop
+  Architectural Case Study
+  What we did
+  How we did it
+  What tools we used
+  Smart Data
+  Emerging Security Capabilities
Securing Data in Hadoop
+  Data is growing exponentially and our
ability to securely store and process it is
falling behind
+  Security policies haven’t kept up with the
technology
+  Most security policies and tools were not
written for Big Data systems, so mapping
can be difficult
+  Clients are often not prepared for the
security challenges when integrating
multiple data sources
What are the security challenges with these architectures?
Our approach to data security has made adoption more difficult
+  For the last 20 years we have built systems in silos,
isolated data containers (databases, applications, and
so forth)
+  Most organizations secure each silo individually and
protect access by database
+  Most certification and accreditation programs (FISMA),
PCI, HIPAA, and SANS top 20 controls define security
controls around each data silo
+  Most security controls implemented are to protect the
servers, user, or network access to data
Example: SANS 20 – Control 15; Controlled Access on Need to Know
Deploy data protection such as IDS,
firewalls, anti-virus, HIPS, DLP, GRC…
Wrap those around a number of Big Data
technologies, most of which are based on
Apache Hadoop or integrate with it:
+  Hortonworks / Cloudera Stack
+  NoSQL MongoDB / CouchDB /
Cassandra
+  BigTable (Apache Accumulo / Apache
HBase)
Distributed Systems by nature have
different security challenges because of
their architecture
SANS Control 15:
… the data classification system and permission
baseline is the blueprint for how authentication and
access of data is controlled…
+  Step 1: An appropriate data classification system and
permissions baseline applied to production data
systems
+  Step 2: Access appropriately logged to a log
management system
+  Step 3: Proper access control applied to portable
media/USB drives
+  Step 4: Active scanner validates, checks access, and
checks data classification
+  Step 5: Host-based encryption and data-loss
prevention validates and checks all access requests.
Overview of Security Architecture Components
+  Infrastructure & Network
+  Encryption (at Rest & in Transit)
+  Authentication (User Principal and Device)
+  Authorization (Privileged Access Management)
+  Access Controls (Data Visibility)
+  Auditing & Monitoring of Data Access
+  Policy & Compliance
Driving Principles
+  Start with People, Process and
Culture
+  Understand the Data and the
Threat
+  Start small and build
+  Never finished
Apache Hadoop Security Challenges
Scale
+  The large number of tasks presents problems with direct authentication
HDFS / File System
+  NameNodes have ACLs, while DataNodes don’t
Job Execution
+  Propagation of credentials to executing nodes
Job Data
+  Task Parameters / Intermediate output accessible via HTTP
Multi Tenancy
+  Access to Intermediate Output & Local Block Storage
Trust Of Auxiliary Services (Oozie, Hadoop clients, Hadoop Pipes/Streaming)
First Hadoop release with Kerberos in 2008
A better solution was available, not always
implemented:
+  Tokens: Delegation Token, Block Access Token, Job
Token
+  Symmetric Encryption == Shared Keys
+  Large Cluster = Thousands of Copies of Shared
Keys
+  Performance Goals (Less than 3% impact) lead to
weak SASL QoP
+  Pluggable Authentication left to end-user
+  HDFS proxies for bulk transfer expose data
Often not implemented in favor of putting Hadoop into
an enclave, but still doesn’t fully regulate access to data
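The shared-key concern in the token bullets above can be illustrated with a minimal sketch. It assumes a token whose password is an HMAC over the token identifier, computed with a master key that every verifying node must hold. The identifier format, the function names, and the choice of SHA-256 are illustrative assumptions, not Hadoop's actual implementation.

```python
import hashlib
import hmac
import os


def issue_token_password(master_key: bytes, token_identifier: bytes) -> bytes:
    """Derive a token password as an HMAC over the identifier (illustrative)."""
    return hmac.new(master_key, token_identifier, hashlib.sha256).digest()


def verify_token(master_key: bytes, token_identifier: bytes, presented: bytes) -> bool:
    """Any node holding the same master key can recompute and check the password."""
    expected = issue_token_password(master_key, token_identifier)
    return hmac.compare_digest(expected, presented)


# Hypothetical master key and identifier, for illustration only.
master_key = os.urandom(32)
identifier = b"owner=alice;renewer=yarn;issued=1400000000"
password = issue_token_password(master_key, identifier)

print(verify_token(master_key, identifier, password))
```

Because verification needs the same key that issuance used, every verifying node carries a copy of it, which is the "thousands of copies of shared keys" exposure on a large cluster.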
Alternatives?
+  Tahoe-LAFS. Cool,
but significant
Performance Impact
Apache Hadoop 2.x Security
Hadoop RPC
+  Clients, MapReduce Jobs, Hadoop Daemons
+  SASL with varying levels of protection (QoP):
-  Authentication, Integrity Protection and Confidentiality
Direct TCP/IP
+  HDFS Data Transfer between Clients, DN
+  Tunnel existing protocol over SASL HDFS-3637
HTTP
+  Web-UI, FSImage Operations between NN / SNN
+  HTTPS, Reloadable Java Keystore, Others
+  MAPREDUCE-4417, HADOOP-8581
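As a hedged illustration of the RPC protection levels above, the sketch below generates a core-site.xml fragment that turns on Kerberos authentication and selects a SASL quality of protection. The property names `hadoop.security.authentication` and `hadoop.rpc.protection` are standard Hadoop settings; the helper functions are hypothetical, and a real deployment needs far more configuration than this.

```python
import xml.etree.ElementTree as ET

# Hadoop's hadoop.rpc.protection values and the SASL QoP each one selects.
RPC_PROTECTION_TO_SASL_QOP = {
    "authentication": "auth",   # authentication only
    "integrity": "auth-int",    # authentication plus integrity protection
    "privacy": "auth-conf",     # authentication, integrity, and confidentiality
}


def secure_rpc_properties(level: str) -> dict:
    """Return core-site properties for Kerberos plus the chosen RPC protection."""
    if level not in RPC_PROTECTION_TO_SASL_QOP:
        raise ValueError(f"unknown protection level: {level}")
    return {
        "hadoop.security.authentication": "kerberos",
        "hadoop.rpc.protection": level,
    }


def build_site_xml(properties: dict) -> str:
    """Serialize properties in the <configuration><property> layout Hadoop uses."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


core_site = build_site_xml(secure_rpc_properties("privacy"))
print(core_site)
```

Choosing "privacy" encrypts RPC traffic at some performance cost, which is the trade-off behind the weak QoP choices mentioned earlier in the deck.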
Architectural Case Study
Commercial Client
+  Client is a multi-national Fortune 500 company with over 100,000
employees
+  Client had multiple data sources for each business unit – R&D,
Manufacturing, Sales and Marketing, Corporate
+  Client wanted to combine data, but faced many sensitive issues around new
product development and access to data by third-party contractors and others
within its network boundaries
+  Previous efforts to integrate data had failed because of political and
technical issues
+  Could not get CISO to sign off on combining data!
Challenges
Securing the Enterprise Ecosystem
Design Goals
+  Build a fully realized “Data Lake” combining information from many
different sources
+  Protect from unauthorized release or modification of information
+  Focus primarily on full-text retrieval but enable a variety of analytic
functions.
+  Enable the use of a variety of components from Hadoop Ecosystem
+  Implement in a series of phases based on client requirements
Our Common Reference Architecture for Big Data
[Figure: layered reference architecture, top to bottom]
+  Human Insights and Actions: enabled by customizable interfaces and visualizations of the data
+  Analytics and Services: your tools for analysis, modeling, testing, and simulations
+  Data Management: the single, secure repository for all of your valuable data
+  Infrastructure: the technology platform for storing and managing your data
[Figure labels: Visualization, Reporting, Dashboards, and Query Interface; Services (SOA); Analytics and Discovery; Views and Indexes; Streaming Analytics; Streaming Indexes; Data Lake; Metadata Tagging; Data Sources (Structured, Unstructured, Streaming); Infrastructure/Management (Provisioning, Deployment, Monitoring, Workflow); Machine Learning; Free-Computation; Alerting; Geographic; Language Translation; Entity Relationship; Event Grab; Dense/Sparse]
Data Lake Platform Components & Search App. Architecture
[Architecture diagram; labels recovered from the figure, grouped approximately]
+  Data sources: static relational databases, non-relational stores, static data, streaming data, user uploaded data sets, relational database triggers
+  Extract / ingest: Sqoop, custom ingest logic, Kafka (periodic updates, low-latency updates)
+  Distributed storage: Hadoop HDFS, Accumulo, encrypted data volumes
+  Distributed analytics & indexing: Hadoop Map/Reduce, Hive, Storm+Lucene processing layer, index files, index persistence & meta-data management (depending on use case), information model / Hive meta-store, ZooKeeper
+  Applications & services layer: Jetty app server, search & BI logic (interactive search, batch reporting), Kerberos SSO connector, Knox Gateway & audit logging
+  Presentation layer: view / UI model, browser app, front-end client (on-network users), Spotfire & other BI tools
+  Users: analytic app & BI users (on-network), privileged users / data scientists (direct access)
+  Enterprise security, monitoring, and governance controls: data governance & stewardship, security groups (FW), network ACLs, standard AWS machine images, antivirus & system monitoring
+  Networks: AWS Virtual Private Cloud (EC2) and on-premise network, linked by AWS Direct Connect; on-premise firewall; directory services; remote access certificate (2-way SSL); DNS, DHCP, NTP, SMTP, proxy (package updates) services
Legend: open source components are shown in green
tl;dr
+  Data Loading via Sqoop / Custom Transport
+  Ingest / Index via MapReduce
+  Distributed Query via Storm+Lucene
+  Batch / Reporting Via MR / Hive
+  Authentication via Kerberos
+  Access Via Web Application & Knox
+  Currently 100TB / 50% used, 150TB by EOY
Infrastructure and Network Security
+  Amazon Web Services Provided
+  Virtual Private Cloud / Security Groups
+  Time to Deployment in Early Phases
+  Physical access to data centers, network isolation, etc.
+  Future transition to on-premise infrastructure
+  Concerned with procurement time
+  Other clients we’ve worked with have seen a 3-6 month turnaround for infrastructure
prep
+  Instance Level Malware Detection tuned to co-exist with cluster workloads
Encryption
At Rest:
+  LUKS (Linux Unified Key Setup) for Ephemeral Storage Volumes
+  “Lock it up and throw away the key”
In Transit:
+  SSL to Web App Endpoints and Knox Gateway
+  Internal Network Isolation – VPC Controls prevent traffic interception &
MITM attacks
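One possible reading of the at-rest item above ("lock it up and throw away the key") is to format each ephemeral volume with LUKS using a random key and then destroy the key file once the volume is open. The sketch below only builds the command lines for such a sequence and does not run them. The device path, mapper name, and key location are hypothetical, and the exact procedure used on the project is not stated in the slides.

```python
import shlex


def ephemeral_luks_commands(device: str, mapper_name: str, key_path: str) -> list:
    """Build (but do not execute) commands for a throwaway-key LUKS volume."""
    return [
        # Generate a random 64-byte key; key_path is assumed to be on tmpfs.
        ["dd", "if=/dev/urandom", f"of={key_path}", "bs=64", "count=1"],
        # Initialize the LUKS header on the device using that key file.
        ["cryptsetup", "luksFormat", "--batch-mode", "--key-file", key_path, device],
        # Open the volume as /dev/mapper/<mapper_name>.
        ["cryptsetup", "luksOpen", "--key-file", key_path, device, mapper_name],
        # Destroy the key file; the volume stays usable until it is closed.
        ["shred", "-u", key_path],
    ]


commands = ephemeral_luks_commands("/dev/xvdb", "ephemeral0", "/dev/shm/ephemeral0.key")
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once the instance stops or the mapping is closed, nothing remains that can unlock the volume, which suits ephemeral storage whose contents are not meant to outlive the instance.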
Authentication and Authorization
+  Authentication via Kerberos
+  Authorization via LDAP
+  Future transition to enterprise authentication services: Oracle IAM.
+  Multi-factor Authentication for both Users and Devices via PKI
+  Authorization performed at both the User and Device Level
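Device authentication via PKI is commonly realized as two-way (mutual) SSL/TLS, where the server refuses any client that does not present a certificate signed by a trusted CA. The sketch below shows that idea with Python's standard `ssl` module. The function name and file parameters are hypothetical, and this is not the project's actual web tier configuration.

```python
import ssl


def make_mutual_tls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Create a server-side TLS context that requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Reject handshakes from clients that present no valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA bundle used to validate client (device) certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


# No files are loaded here; real paths would be supplied in a deployment.
context = make_mutual_tls_server_context()
print(context.verify_mode)
```

The certificate establishes which device is connecting; user identity and authorization would still be checked separately, as the bullets above describe.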
Operating system user accounts and groups for users, projects, and teams are
reflected in HDFS permissions
Privileged access goes through a Knox Gateway extension that provides SSH access
plus auditing, monitoring, and control of administrative connections into the
cluster (KNOX-250)
Privileged Access Management
[Diagram labels: Identity Provider; External Sources; Knox Gateway; Hadoop Cluster (Master, Oozie, Hive2 Server); connections: REST/SSL, SSH, HTTP, SPNEGO]
Putting it All Together
+  Search UI is a web application accessed via SSL
+  Knox is the primary cluster access mechanism for users who need access
to the cluster. Knox provides access to the following services:
+  WebHDFS, WebHCat, Hive, Oozie
+  Knox for administrative access, via custom SSH plugin
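To make the Knox access path concrete, the sketch below builds a WebHDFS URL routed through a Knox gateway, following the gateway's usual `https://host:port/gateway/<topology>/webhdfs/v1/<path>` layout. The host name, topology name, and HDFS path are hypothetical placeholders, and no request is sent.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op, port=8443, **params):
    """Build a WebHDFS REST URL that goes through a Knox gateway topology."""
    query = urlencode({"op": op, **params})
    # Keep "/" separators in the path while escaping other special characters.
    path = quote(hdfs_path.lstrip("/"))
    return f"https://{gateway_host}:{port}/gateway/{topology}/webhdfs/v1/{path}?{query}"


# Hypothetical gateway host, topology, and directory.
url = knox_webhdfs_url("knox.example.com", "default", "/data/lake", "LISTSTATUS")
print(url)
```

Clients only ever see the gateway address, so the cluster's internal host names and ports stay behind the perimeter that Knox enforces.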
Future Directions
+  Role-Based Access Control is an emerging client need. This will require:
+  Integration with enterprise role management
+  Passing roles through Web App & Knox to backend
+  Role-based access in Accumulo, Lucene indexes
+  Smart Data Tagging Strategy …
Smart Data
Smart Data
+  How many organizations have data security requirements?
+  A structured, verifiable representation of security tags bound to the data is
required for the enterprise to become inherently "smarter" about
the information flowing in and around it: this is Smart Data
+  Overview of design principles:
+  PKI
+  Implement ABAC controls in IdAM
+  Define trusted data format based on data security
+  Tag all your data
+  Deploy Hadoop platform that leverages tags to track access
+  Log, monitor, and audit everything
Overview of Smart Data
[Diagram labels: User; IDAM (Authentication, Authorization); Attributes (red, orange, blue); Data Element; Visibility Tags (red | blue | green); Apache Accumulo; shown against the layers of the common reference architecture]
Allow access to resource MedicalJournal with attribute patientID=x
if Subject match DesignatedDoctorOfPatient
and action is read
with obligation
on Permit: doLog_Inform(patientID,Subject,time)
on Deny : doLog_UnauthorizedLogin(patientID,Subject,time)
Smart Data Security Controls
+  Trusted Client – Authorization and Authentication using PKI
+  Trusted Data Format – Data visibility is controlled using Boolean expressions
+  Ex. “((red|blue|green) & (white|yellow))”
+  Clients present Authorizations (red, blue, green, yellow) to Apache Accumulo
+  Corresponding tags are bound to data stored in Apache Accumulo
+  Trusted Log – All data interactions are logged and audited
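The visibility check in the bullets above can be sketched as a small Boolean-expression evaluator: a data element is visible when its expression is satisfied by the set of authorizations the client presents. This is an illustrative re-implementation, not Accumulo's own evaluator. Function names are hypothetical, and unlike Accumulo, which requires parentheses when mixing `&` and `|`, this sketch simply lets `&` bind tighter.

```python
import re

TOKEN = re.compile(r"\s*([()&|]|[A-Za-z0-9_]+)")


def tokenize(expr):
    """Split a visibility expression into labels, operators, and parentheses."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, authorizations):
    """Return True if the presented authorizations satisfy the expression."""
    tokens = tokenize(expr)
    auths = set(authorizations)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if tok in {")", "&", "|"}:
            raise ValueError(f"unexpected token {tok!r}")
        # A plain label is satisfied only if the client holds it.
        return tok in auths

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


expression = "((red|blue|green) & (white|yellow))"
print(is_visible(expression, ["red", "blue", "green", "yellow"]))
print(is_visible(expression, ["red"]))
```

With the slide's example, a client holding red, blue, green, and yellow satisfies both groups, while a client holding only red fails the second group.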
Identity and Access Management
+  Attribute Based Access Control – Users are all assigned a series of attributes
+  Attributes and Authorization Bound by XACML, SAML
+  Policy Decision Point (PDP)
+  Policy Enforcement Point (PEP)
+  Policy Retrieval Point (PRP)
+  Policy Information Point (PIP)
+  Policy Administration Point (PAP)
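The MedicalJournal policy and the decision/enforcement split listed above can be sketched as follows. The decision function plays the role of the PDP, the enforcement function plays the PEP and carries out the obligations, and a small dictionary stands in for the PIP. All names and data here are hypothetical, and a real deployment would express the policy in XACML rather than code.

```python
# Stand-in for a Policy Information Point: patient -> designated doctor (hypothetical data).
DESIGNATED_DOCTOR = {"patient-42": "dr-lee"}

# Stand-in for the trusted log that receives obligation output.
audit_log = []


def do_log_inform(patient_id, subject, time):
    """Obligation on Permit: record that the journal was read."""
    audit_log.append(("inform", patient_id, subject, time))


def do_log_unauthorized(patient_id, subject, time):
    """Obligation on Deny: record the refused attempt."""
    audit_log.append(("unauthorized", patient_id, subject, time))


def decide(subject, action, resource, patient_id):
    """PDP: permit reads of MedicalJournal by the patient's designated doctor."""
    if (
        resource == "MedicalJournal"
        and action == "read"
        and DESIGNATED_DOCTOR.get(patient_id) == subject
    ):
        return "Permit"
    return "Deny"


def enforce(subject, action, resource, patient_id, time):
    """PEP: ask the PDP, fulfil the matching obligation, return whether to proceed."""
    decision = decide(subject, action, resource, patient_id)
    if decision == "Permit":
        do_log_inform(patient_id, subject, time)
    else:
        do_log_unauthorized(patient_id, subject, time)
    return decision == "Permit"


print(enforce("dr-lee", "read", "MedicalJournal", "patient-42", "2014-06-03T10:00"))
print(enforce("dr-kim", "read", "MedicalJournal", "patient-42", "2014-06-03T10:05"))
```

Keeping the decision separate from enforcement means the policy can change without touching the code that guards the data, and every outcome, permitted or denied, leaves a log entry.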
Tagging Smart Data
Formulate the tags used to control data from multiple perspectives
+  Data Origin
+  Level of Access Required
+  Information Governance Policy
+  Data Owners
+  Intended Recipients
Use fine grained tags, assign users many roles
+  Tag at the field level so that existence can be verified without revealing the
full data record
In Accumulo:
+  Capitalize on the richness of boolean expressions in visibility tags
+  Differential Compression eliminates the impact of repartition of data
+  Visibility Tags are bound to the data, changing visibilities is not trivial: it
means a delete and a re-add.
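A toy model helps show why changing a visibility means a delete and a re-add: when the visibility is part of the key, as it is in Accumulo, an entry with a new visibility is a different entry. The in-memory store below is a hypothetical illustration, not the Accumulo client API, and it only handles `&`-joined labels to keep the field-level example short.

```python
# Toy store keyed by (row, column, visibility); a stand-in for Accumulo's sorted keys.
store = {}


def put(row, column, visibility, value):
    store[(row, column, visibility)] = value


def delete(row, column, visibility):
    store.pop((row, column, visibility), None)


def change_visibility(row, column, old_visibility, new_visibility):
    """The visibility is part of the key, so a change is a delete plus a re-add."""
    value = store[(row, column, old_visibility)]
    delete(row, column, old_visibility)
    put(row, column, new_visibility, value)


def visible_fields(row, authorizations):
    """Return only the fields whose '&'-joined labels are all held by the reader."""
    auths = set(authorizations)
    return {
        column: value
        for (r, column, visibility), value in store.items()
        if r == row and set(visibility.split("&")) <= auths
    }


# Field-level tags: each field of the same record carries its own visibility.
put("patient-42", "name", "PII", "Jane Doe")
put("patient-42", "visit_count", "RESEARCH", "3")

# A researcher can confirm the record exists without seeing the name.
researcher_view = visible_fields("patient-42", ["RESEARCH"])
print(researcher_view)

change_visibility("patient-42", "visit_count", "RESEARCH", "RESEARCH&ACCOUNTING")
print(visible_fields("patient-42", ["RESEARCH"]))
```

At scale, that rewrite touches every affected entry, which is why the tagging strategy deserves attention before data is loaded.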
Representational versus Referential Tags
Representational tags encode the specific visibilities they represent, including
all alternate controls for a specific document
User has roles of ACCOUNTING, RESEARCH and PII
+  If data has tag PII&RESEARCH, user can access data
+  If data has tag HIPAA&ACCOUNTING, user can’t access data
Referential tags are codes that rely on an external translation between assigned
access controls and visibility markings:
Data has marking of 03DECAF00D
+  User has roles of ACCOUNTING, RESEARCH and PII
+  At lookup, translation of user roles into possible referential tags
The choice depends on security posture: what are the consequences of getting it
wrong versus the ease of shifting policy or data?
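The two tagging styles can be contrasted in a short sketch. With representational tags the required roles are readable from the tag itself. With referential tags the marking is an opaque code, and an external table translates the user's roles into the codes they may read. The translation table below, including what `03DECAF00D` stands for, is a made-up assumption for illustration.

```python
user_roles = {"ACCOUNTING", "RESEARCH", "PII"}


def can_read_representational(tag, roles):
    """The tag itself names the required roles, joined with '&'."""
    return set(tag.split("&")) <= set(roles)


# Hypothetical external translation table: opaque code -> roles it requires.
REFERENTIAL_TABLE = {
    "03DECAF00D": {"PII", "RESEARCH"},
    "0BADC0FFEE": {"HIPAA", "ACCOUNTING"},
}


def readable_codes(roles, table):
    """At lookup time, translate the user's roles into the codes they may read."""
    held = set(roles)
    return {code for code, required in table.items() if required <= held}


print(can_read_representational("PII&RESEARCH", user_roles))
print(can_read_representational("HIPAA&ACCOUNTING", user_roles))
print(readable_codes(user_roles, REFERENTIAL_TABLE))
```

With referential tags a policy change can be an edit to the table, with no rewrite of the stored data, but a wrong table entry silently exposes or hides everything carrying that code.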
Emerging Security Capabilities
The ecosystem of security capabilities for Hadoop is growing rapidly
Cloudera (with Intel Rhino)
+  Sentry (ACLs for Hive / Impala)
+  Gazzang (Filesystem Encryption)
+  Intel Rhino
+  Encryption Codec Support HADOOP-9331
+  Key Distribution & Management MAPREDUCE-5025
+  Token Based Authentication HADOOP-9392
+  Unified Authorization Framework HADOOP-9466
+  Transparent Encryption for HBase/ZooKeeper
+  Others, see https://github.com/intel-hadoop/project-rhino/
Hortonworks
+  Production Ready Apache Knox
+  XA Secure
+  Central Administration
+  Authorization for HDFS / Hive / HBase
+  Compliance Controls
Lots of talks at this Hadoop Summit on data security:
The Future of Hadoop Security – Joey Echeverria
Hadoop REST API Security with the Apache Knox Gateway – Kevin Minder, Larry McCay
Securing Big Data: Lock it Down, or Liberate? – Jeff Graham, Mark Tomallo
Improvements in Hadoop Security – Sanjay Radia, Chris Nauroth
Summary
+  Security for Hadoop has come a long way and is changing rapidly, but is still
maturing
+  Securing the data in Hadoop means thinking differently about the architecture
when combining multiple data sources
+  Your Hadoop Architecture should provide consistent security mechanisms across
all of the data
+  A more complete way to secure data is to implement Smart Data (ABAC and Fine
Grained Access Controls) but this hasn’t been embraced consistently across the
Hadoop ecosystem yet
+  The next 6 months will be interesting …
Just Released!
The Field Guide to Data Science
120 page e-book of data science geekery
Download for free:
http://www.boozallen.com/datascience
Thanks!
Drew (@drewfarris)
Peter (@petrguerra)

Weitere ähnliche Inhalte

Was ist angesagt?

Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks
 
Sukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak
 
Study notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security PractitionerStudy notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security PractitionerDavid Sweigert
 
Cloud Computing for Lawyers: Practical and Ethical Uses of the Cloud
Cloud Computing for Lawyers: Practical and Ethical Uses of the CloudCloud Computing for Lawyers: Practical and Ethical Uses of the Cloud
Cloud Computing for Lawyers: Practical and Ethical Uses of the CloudRobert Ambrogi
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifyHortonworks
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Hortonworks
 
6 Ways to Get More From Your Azure
6 Ways to Get More From Your Azure6 Ways to Get More From Your Azure
6 Ways to Get More From Your AzureHolly Plude
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformHortonworks
 
J ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pm
J ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pmJ ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pm
J ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pmJathin Ullal
 
Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 
Bringing Trus and Visibility to Apache Hadoop
Bringing Trus and Visibility to Apache HadoopBringing Trus and Visibility to Apache Hadoop
Bringing Trus and Visibility to Apache HadoopDataWorks Summit
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopHortonworks
 
Leveraging The Power Of The Cloud For Your Business
Leveraging The Power Of The Cloud For Your BusinessLeveraging The Power Of The Cloud For Your Business
Leveraging The Power Of The Cloud For Your BusinessJoel Katz
 
AIIM/ARMA Cloud Collaboration Presentation
AIIM/ARMA Cloud Collaboration PresentationAIIM/ARMA Cloud Collaboration Presentation
AIIM/ARMA Cloud Collaboration PresentationPorter-Roth Associates
 
Cómo AWS lo ayuda a cumplir con requisitos regulatorios
Cómo AWS lo ayuda a cumplir con requisitos regulatoriosCómo AWS lo ayuda a cumplir con requisitos regulatorios
Cómo AWS lo ayuda a cumplir con requisitos regulatoriosAmazon Web Services LATAM
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Cloud Computing: What it Means for Libraries, Library Staff, Training and Skills
Cloud Computing: What it Means for Libraries, Library Staff, Training and SkillsCloud Computing: What it Means for Libraries, Library Staff, Training and Skills
Cloud Computing: What it Means for Libraries, Library Staff, Training and Skillssherif user group
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...DataWorks Summit/Hadoop Summit
 

Was ist angesagt? (20)

Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group
 
Sukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud Management
 
Presd1 10
Presd1 10Presd1 10
Presd1 10
 
Study notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security PractitionerStudy notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security Practitioner
 
Cloud Computing for Lawyers: Practical and Ethical Uses of the Cloud
Cloud Computing for Lawyers: Practical and Ethical Uses of the CloudCloud Computing for Lawyers: Practical and Ethical Uses of the Cloud
Cloud Computing for Lawyers: Practical and Ethical Uses of the Cloud
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
 
6 Ways to Get More From Your Azure
6 Ways to Get More From Your Azure6 Ways to Get More From Your Azure
6 Ways to Get More From Your Azure
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data Platform
 
J ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pm
J ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pmJ ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pm
J ullal hphybrid-cloud-interop14lv-theatresession-apr1tue4pm
 
Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Bringing Trus and Visibility to Apache Hadoop
Bringing Trus and Visibility to Apache HadoopBringing Trus and Visibility to Apache Hadoop
Bringing Trus and Visibility to Apache Hadoop
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache Hadoop
 
Leveraging The Power Of The Cloud For Your Business
Leveraging The Power Of The Cloud For Your BusinessLeveraging The Power Of The Cloud For Your Business
Leveraging The Power Of The Cloud For Your Business
 
AIIM/ARMA Cloud Collaboration Presentation
AIIM/ARMA Cloud Collaboration PresentationAIIM/ARMA Cloud Collaboration Presentation
AIIM/ARMA Cloud Collaboration Presentation
 
Cómo AWS lo ayuda a cumplir con requisitos regulatorios
Cómo AWS lo ayuda a cumplir con requisitos regulatoriosCómo AWS lo ayuda a cumplir con requisitos regulatorios
Cómo AWS lo ayuda a cumplir con requisitos regulatorios
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Cloud Computing: What it Means for Libraries, Library Staff, Training and Skills
Cloud Computing: What it Means for Libraries, Library Staff, Training and SkillsCloud Computing: What it Means for Libraries, Library Staff, Training and Skills
Cloud Computing: What it Means for Libraries, Library Staff, Training and Skills
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 

Andere mochten auch

Metodos ITIL, COBIT, BS15000
Metodos  ITIL, COBIT, BS15000Metodos  ITIL, COBIT, BS15000
Metodos ITIL, COBIT, BS15000Christian Cruz
 
Six Lessons I Have Learnt from Steve Jobs
Six Lessons I Have Learnt from Steve JobsSix Lessons I Have Learnt from Steve Jobs
Six Lessons I Have Learnt from Steve JobsGyan Lab
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
The Cloud's Hidden Lock-in: Network Latency
The Cloud's Hidden Lock-in: Network LatencyThe Cloud's Hidden Lock-in: Network Latency
The Cloud's Hidden Lock-in: Network LatencyTom Croucher
 
Cloud Computing. Gestión de configuraciones
Cloud Computing. Gestión de configuracionesCloud Computing. Gestión de configuraciones
Cloud Computing. Gestión de configuracionespacvslideshare
 
Diseño del software
Diseño del softwareDiseño del software
Diseño del softwareduberlisg
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Mejorando la Gestión de la gerencia de TI
Mejorando la Gestión de la gerencia de TIMejorando la Gestión de la gerencia de TI
Mejorando la Gestión de la gerencia de TIGeneXus
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDBMongoDB
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
Los SLAs y el uso de ITIL® en un contexto de outsourcing, por Sergio Hrabinski
Los SLAs y el uso de ITIL® en un contexto de outsourcing, por Sergio HrabinskiLos SLAs y el uso de ITIL® en un contexto de outsourcing, por Sergio Hrabinski
Los SLAs y el uso de ITIL® en un contexto de outsourcing, por Sergio HrabinskiForo Global Crossing
 
Real-World Data Governance: Managing Data & Information as an Asset - Governa...
Real-World Data Governance: Managing Data & Information as an Asset - Governa...Real-World Data Governance: Managing Data & Information as an Asset - Governa...
Real-World Data Governance: Managing Data & Information as an Asset - Governa...DATAVERSITY
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 
V mware v realize orchestrator 6.0 knowledge transfer kit
V mware v realize orchestrator 6.0 knowledge transfer kitV mware v realize orchestrator 6.0 knowledge transfer kit
V mware v realize orchestrator 6.0 knowledge transfer kitsolarisyougood
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in PracticeC4Media
 
umeng analytical arch
umeng analytical archumeng analytical arch
umeng analytical archYan Zhang
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 

Andere mochten auch (20)

Itil v2.5
Itil v2.5Itil v2.5
Itil v2.5
 
Metodos ITIL, COBIT, BS15000
Metodos  ITIL, COBIT, BS15000Metodos  ITIL, COBIT, BS15000
Metodos ITIL, COBIT, BS15000
 
Six Lessons I Have Learnt from Steve Jobs
Six Lessons I Have Learnt from Steve JobsSix Lessons I Have Learnt from Steve Jobs
Six Lessons I Have Learnt from Steve Jobs
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
The Cloud's Hidden Lock-in: Network Latency

Lessons Learned on How to Secure Petabytes of Data

  • 1. Lessons Learned Securing Data at Scale. Drew Farris and Peter Guerra, Hadoop Summit 2014. © Copyright 2014 Booz Allen Hamilton
  • 2. © Copyright 2014 Booz Allen Hamilton
  • 3. Photo: CC BY 2.0: https://www.flickr.com/photos/atoach/5015711744
  • 4. Photo: CC BY 2.0: https://www.flickr.com/photos/dutchamsterdam/
  • 5. Who we are
      + Founded and run DC Hadoop Users Group Meetup – http://www.meetup.com/Hadoop-DC
      + Technical talks at multiple conferences
          – Strata, Data Science Summit, IDGA Gov Cloud Conference, Cloudera Hadoop Summit, Yahoo! Hadoop Summit, IEEE Cloud Conference, CSA Congress, Black Hat
      + Multiple client engagements over the last 7 years
          – Defense
          – Civil and Commercial Health
          – Civil and Commercial Financial Services
          – Commercial and International
      + Booz Allen Big Data and Data Science Points-of-View
          + http://www.boozallen.com/cloud
          + http://www.boozallen.com/datascience
      + Advancing the Art of Analytics & Big Data
          + http://www.boozallen.com/insights/expertvoices/big-data
          + http://www.federalnewsradio.com/?nid=154&sid=2080808
      + Tackling Large Scale Data in Government
          + http://www.cloudera.com/blog/2010/11/tackling-large-scale-data-in-government/
      + IT Architectures for Complex Search and Information Retrieval
          + http://www.slideshare.net/cloudera/fuzzy-table-final
          + http://www.slideshare.net/ydn/3-biometric-hadoopsummit2010
  • 6. Agenda
      + Securing Data in Hadoop
      + Architectural Case Study
      + What we did
      + How we did it
      + What tools we used
      + Smart Data
      + Emerging Security Capabilities
  • 7. Securing Data in Hadoop
  • 8. What are the security challenges with these architectures?
      + Data is growing exponentially and our ability to securely store and process it is falling behind
      + Security policies haven’t kept up with the technology
      + Most security policies and tools were not written for Big Data systems, so mapping can be difficult
      + Clients are often not prepared for the security challenges when integrating multiple data sources
  • 9. © Copyright 2014 Booz Allen Hamilton Our approach to data security has made adoption more difficult +  For the last 20 years we have built systems in silos, isolated data containers (databases, applications, and so forth) +  Most organizations secure each silo individually and protect access by database +  Most certification and accreditation programs (FISMA), PCI, HIPAA, and SANS top 20 controls define security controls around each data silo +  Most security controls implemented are to protect the servers, user, or network access to data
  • 10. © Copyright 2014 Booz Allen Hamilton Example: SANS 20 – Control 15: Controlled Access on Need to Know Deploy data protection such as IDS, firewalls, anti-virus, HIPS, DLP, GRC… Wrap those around a number of Big Data technologies, most of which are based on Apache Hadoop or integrate with it: +  Hortonworks / Cloudera Stack +  NoSQL MongoDB / CouchDB / Cassandra +  BigTable (Apache Accumulo / Apache HBase) Distributed systems by nature have different security challenges because of their architecture SANS Control 15: … the data classification system and permission baseline is the blueprint for how authentication and access of data is controlled… +  Step 1: An appropriate data classification system and permissions baseline applied to production data systems +  Step 2: Access appropriately logged to a log management system +  Step 3: Proper access control applied to portable media/USB drives +  Step 4: Active scanner validates, checks access, and checks data classification +  Step 5: Host-based encryption and data-loss prevention validates and checks all access requests.
  • 11. © Copyright 2014 Booz Allen Hamilton Overview of Security Architecture Components +  Infrastructure & Network +  Encryption (at Rest & in Transit) +  Authentication (User Principal and Device) +  Authorization (Privileged Access Management) +  Access Controls (Data Visibility) +  Auditing & Monitoring of Data Access +  Policy & Compliance Driving Principles +  Start with People, Process and Culture +  Understand the Data and the Threat +  Start small and build +  Never finished
  • 12. © Copyright 2014 Booz Allen Hamilton Apache Hadoop Security Challenges Scale +  The large number of tasks presents problems with direct authentication HDFS / File System +  NameNodes have ACLs, while DataNodes don’t Job Execution +  Propagation of credentials to executing nodes Job Data +  Task Parameters / Intermediate output accessible via HTTP Multi Tenancy +  Access to Intermediate Output & Local Block Storage Trust Of Auxiliary Services (Oozie, Hadoop clients, Hadoop Pipes/Streaming)
  • 13. © Copyright 2014 Booz Allen Hamilton First Hadoop release with Kerberos in 2008 A better solution was available, but it was not always implemented: +  Tokens: Delegation Token, Block Access Token, Job Token +  Symmetric Encryption == Shared Keys +  Large Cluster = Thousands of Copies of Shared Keys +  Performance Goals (less than 3% impact) led to weak SASL QoP +  Pluggable Authentication left to the end user +  HDFS proxies for bulk transfer expose data Kerberos is often skipped in favor of putting Hadoop into an enclave, but an enclave still doesn’t fully regulate access to data Alternatives? +  Tahoe-LAFS. Cool, but with a significant performance impact
  • 14. © Copyright 2014 Booz Allen Hamilton Apache Hadoop 2.x Security Hadoop RPC +  Clients, MapReduce Jobs, Hadoop Daemons +  SASL with varying levels of protection (QoP): -  Authentication, Integrity Protection and Confidentiality Direct TCP/IP +  HDFS Data Transfer between Clients and DataNodes (DN) +  Tunnel existing protocol over SASL HDFS-3637 HTTP +  Web-UI, FSImage Operations between NN / SNN +  HTTPS, Reloadable Java Keystore, Others +  MAPREDUCE-4417, HADOOP-8581
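The three SASL protection levels on the Hadoop RPC slide can be made concrete with a small sketch. Hadoop's `hadoop.rpc.protection` setting takes the values `authentication`, `integrity`, and `privacy`, which correspond to the SASL QoP tokens `auth`, `auth-int`, and `auth-conf`. The helper names below are hypothetical and are not part of Hadoop; this is only an illustration of how a configured value maps onto what is actually protected on the wire.

```python
# Mapping from hadoop.rpc.protection values to SASL QoP tokens.
RPC_PROTECTION_TO_QOP = {
    "authentication": "auth",    # identity only, traffic in the clear
    "integrity": "auth-int",     # adds tamper detection
    "privacy": "auth-conf",      # adds encryption of the payload
}


def sasl_qop_list(rpc_protection):
    """Translate a (possibly comma-separated) protection value into QoP tokens.

    Hypothetical helper for illustration; raises ValueError on unknown values.
    """
    values = [v.strip().lower() for v in rpc_protection.split(",") if v.strip()]
    unknown = [v for v in values if v not in RPC_PROTECTION_TO_QOP]
    if unknown:
        raise ValueError(f"unknown protection level(s): {unknown}")
    return [RPC_PROTECTION_TO_QOP[v] for v in values]


def protects_confidentiality(rpc_protection):
    """True only if every negotiable level encrypts the RPC payload."""
    qops = sasl_qop_list(rpc_protection)
    return bool(qops) and all(q == "auth-conf" for q in qops)


if __name__ == "__main__":
    for setting in ("authentication", "privacy", "privacy,authentication"):
        print(setting, sasl_qop_list(setting), protects_confidentiality(setting))
```

The point of the second helper is the trade-off named on the previous slide: a cluster tuned for minimal performance impact typically stays at `auth`, which authenticates peers but leaves RPC payloads readable on the network.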
  • 15. © Copyright 2014 Booz Allen Hamilton Architectural Case Study Commercial Client
  • 16. © Copyright 2014 Booz Allen Hamilton Challenges +  Client is a multi-national Fortune 500 company with over 100,000 employees +  Client had multiple data sources for each business unit – R&D, Manufacturing, Sales and Marketing, Corporate +  Client wanted to combine data, but there were many sensitive issues around new product development and access to data by third-party contractors and others within its network boundaries +  Previous efforts to integrate data had failed because of political and technical issues +  Could not get CISO to sign off on combining data!
  • 17. © Copyright 2014 Booz Allen Hamilton Securing the Enterprise Ecosystem Design Goals +  Build a fully realized “Data Lake” combining information from many different sources +  Protect from unauthorized release or modification of information +  Focus primarily on full-text retrieval but enable a variety of analytic functions. +  Enable the use of a variety of components from Hadoop Ecosystem +  Implement in a series of phases based on client requirements
  • 18. © Copyright 2014 Booz Allen Hamilton Our Common Reference Architecture for Big Data [Diagram] Layers: Human Insights and Actions (enabled by customizable interfaces and visualizations of the data); Analytics and Services (your tools for analysis, modeling, testing, and simulations); Data Management (the single, secure repository for all of your valuable data); Infrastructure (the technology platform for storing and managing your data). Diagram labels: Visualization, Reporting, Dashboards, and Query Interface; Services (SOA); Analytics and Discovery; Views and Indexes; Data Lake; Metadata Tagging; Data Sources; Infrastructure/Management; Machine Learning; Free-Computation; Alerting; Geographic; Language Translation; Entity Relationship; Event Grab; Dense/Sparse; Structured; Unstructured; Streaming; Provisioning; Deployment; Monitoring; Workflow; Streaming Analytics; Streaming indexes
  • 19. © Copyright 2014 Booz Allen Hamilton Data Lake Platform Components & Search App. Architecture [Diagram labels] Distributed Storage; Extract; Distributed Analytics & Indexing; Presentation Layer; periodic updates; Non-Relational Stores; Static Relational Databases; Static Data; Custom Ingest Logic; Sqoop; Hadoop HDFS; Storm+Lucene Processing Layer; Index Files; Index Persistence & Meta-data Management; depending on use case; Jetty App Server; Applications & Services Layer; interactive search; batch reporting; View / UI Model; Browser App; Front-end Client (On-Network Users); Enterprise Security, Monitoring, and Governance Controls; Hadoop Map/Reduce; Search & BI Logic; Kerberos SSO Connector; Directory Services; On-Premise Firewall; Hive; DNS, DHCP, NTP, SMTP, Proxy (package updates) Services; ZooKeeper; Information Model / Hive meta-store; Security Groups (FW); Network ACLs; Standard AWS Machine Images; Encrypted Data Volumes; Antivirus & System Monitoring; Knox Gateway & Audit Logging; AWS Direct Connect; AWS Virtual Private Cloud (EC2); On-Premise Network; Remote Access Certificate (2-way SSL); Accumulo; Data Governance & Stewardship; Analytic App & BI Users (On-Network); Spotfire & Other BI Tools; Privileged Users / Data Scientists (Direct Access); Streaming Data; User Uploaded Data Sets; Relational Database Triggers; Kafka; low-latency updates; Legend: Open Source Components (Green)
  • 20. © Copyright 2014 Booz Allen Hamilton tl;dr +  Data Loading via Sqoop / Custom Transport +  Ingest / Index via MapReduce +  Distributed Query via Storm+Lucene +  Batch / Reporting via MR / Hive +  Authentication via Kerberos +  Access via Web Application & Knox +  Currently 100 TB / 50% used, 150 TB by EOY
  • 21. © Copyright 2014 Booz Allen Hamilton Infrastructure and Network Security +  Amazon Web Services Provided +  Virtual Private Cloud / Security Groups +  Time to Deployment in Early Phases +  Physical access to data centers, network isolation, etc. +  Future Transition to On-Premise Infrastructure +  Concerned with procurement time +  Other clients we’ve worked with have seen a 3-6 month turnaround for infrastructure prep +  Instance-Level Malware Detection tuned to co-exist with cluster workloads
  • 22. © Copyright 2014 Booz Allen Hamilton Encryption At Rest: +  LUKS (Linux Unified Key Setup) for Ephemeral Storage Volumes +  “Lock it up and throw away the key” In Transit: +  SSL to Web App Endpoints and Knox Gateway +  Internal Network Isolation – VPC Controls prevent traffic interception & MITM attacks
  • 23. © Copyright 2014 Booz Allen Hamilton Authentication and Authorization +  Authentication via Kerberos +  Authorization via LDAP +  Future transition to enterprise authentication services: Oracle IAM. +  Multi-factor Authentication for both Users and Devices via PKI +  Authorization performed at both the User and Device Level
  • 24. © Copyright 2014 Booz Allen Hamilton Privileged Access Management Operating system user accounts and groups for users, projects and teams are reflected in HDFS permissions. Privileged access goes through a Knox Gateway extension that provides SSH access plus auditing, monitoring and control of administrative connections into the cluster (KNOX-250). [Diagram labels] Identity Provider; Knox Gateway; Hadoop Cluster (Master) (Oozie) (Hive2 Server); External Sources; REST/SSL; SSH; HTTP SPNEGO
  • 25. © Copyright 2014 Booz Allen Hamilton Putting it All Together +  Search UI is a web application accessed via SSL +  Knox is the primary access mechanism for users who need access to the cluster. Knox provides access to the following services: +  WebHDFS, WebHCat, Hive, Oozie +  Knox for administrative access, via custom SSH plugin
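To illustrate what access through Knox looks like from a client, here is a minimal sketch that builds the gateway URL for a WebHDFS call. Knox exposes cluster services under `https://<gateway-host>:<port>/<gateway-path>/<topology>/...`, with `8443` and `gateway` as the usual defaults. The host name, topology name, and HDFS path below are made-up placeholders, and the function is a hypothetical helper rather than a Knox client API. No request is sent; the sketch only shows the URL shape.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op,
                     port=8443, gateway_path="gateway", **params):
    """Build the Knox-proxied WebHDFS URL for one operation.

    hdfs_path must be absolute (start with "/"). Extra keyword arguments
    become additional query parameters.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    query = urlencode({"op": op, **params})
    return (f"https://{gateway_host}:{port}/{gateway_path}/{topology}"
            f"/webhdfs/v1{quote(hdfs_path)}?{query}")


if __name__ == "__main__":
    # Placeholder host, topology, and path; substitute real values.
    print(knox_webhdfs_url("knox.example.com", "sandbox",
                           "/data/lake", "LISTSTATUS"))
```

Because every request funnels through one HTTPS endpoint, the gateway is also the natural place to authenticate users and write audit logs, which is the role the slides assign to Knox.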
  • 26. © Copyright 2014 Booz Allen Hamilton Future Directions +  Role-Based Access Control is an emerging client need. This will require: +  Integration with enterprise role management +  Passing roles through Web App & Knox to backend +  Role-based access in Accumulo, Lucene Indexes +  Smart Data Tagging Strategy …
  • 27. © Copyright 2014 Booz Allen Hamilton Smart Data
  • 28. © Copyright 2014 Booz Allen Hamilton Smart Data +  How many organizations have data security requirements? +  A structured, verifiable representation of security tags bound to the data is required in order for the enterprise to become inherently "smarter" about the information flowing in and around it – Smart Data +  Overview of design principles: +  PKI +  Implement ABAC controls in IdAM +  Define trusted data format based on data security +  Tag all your data +  Deploy Hadoop platform that leverages tags to track access +  Log, monitor, and audit everything
  • 29. © Copyright 2014 Booz Allen Hamilton Overview of Smart Data [Diagram labels] Data Element; Visibility Tags (red | blue | green); Authorization; Authentication; Attributes (red, orange, blue); IDAM; User; Apache Accumulo; Machine Learning; Free-Computation; Alerting; Geographic; Language Translation; Entity Relationship; Event Grab; Dense/Sparse; Structured; Unstructured; Streaming; Provisioning; Deployment; Monitoring; Workflow; Streaming Analytics; Streaming indexes
  • 30. © Copyright 2014 Booz Allen Hamilton Smart Data Security Controls +  Trusted Client – Authorization and Authentication using PKI +  Trusted Data Format – Data visibility is controlled using Boolean expressions +  Ex. “((red|blue|green) & (white|yellow))” +  Clients present Authorizations (red, blue, green, yellow) to Apache Accumulo +  Corresponding tags are bound to data stored in Apache Accumulo +  Trusted Log – All data interactions are logged and audited Identity and Access Management +  Attribute Based Access Control – Users are all assigned a series of attributes +  Attributes and Authorization Bound by XACML, SAML +  Policy Decision Point (PDP) +  Policy Enforcement Point (PEP) +  Policy Retrieval Point (PRP) +  Policy Information Point (PIP) +  Policy Administration Point (PAP) Example policy: Allow access to resource MedicalJournal with attribute patientID=x if Subject matches DesignatedDoctorOfPatient and action is read, with obligation on Permit: doLog_Inform(patientID, Subject, time); on Deny: doLog_UnauthorizedLogin(patientID, Subject, time)
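The visibility check on this slide can be sketched in a few lines. The evaluator below is a simplified stand-in for what Apache Accumulo does server-side, not Accumulo's own code: it parses a Boolean expression of tags joined by `&` and `|`, and decides whether a given set of client authorizations satisfies it. Like Accumulo, it rejects mixing `&` and `|` at the same level without parentheses and treats an empty expression as visible to everyone. The function names are assumptions made for illustration.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def _tokenize(expr):
    """Split a visibility expression into tags, operators, and parentheses."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, authorizations):
    """Return True if the authorizations satisfy the visibility expression."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # empty visibility: readable by any client
    auths = set(authorizations)
    pos = 0

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in auths  # a bare tag is true if the client holds it

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


if __name__ == "__main__":
    tag = "((red|blue|green) & (white|yellow))"
    print(is_visible(tag, {"red", "blue", "green", "yellow"}))
    print(is_visible(tag, {"red"}))
```

With the slide's example, a client holding red, blue, green, and yellow satisfies both halves of the expression, while a client holding only red fails the second half and never sees the data element.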
  • 31. © Copyright 2014 Booz Allen Hamilton Tagging Smart Data Formulate the tags used to control data from multiple perspectives +  Data Origin +  Level of Access Required +  Information Governance Policy +  Data Owners +  Intended Recipients Use fine-grained tags, assign users many roles +  Tag at the field level so that existence can be verified without revealing the full data record In Accumulo: +  Capitalize on the richness of Boolean expressions in visibility tags +  Differential Compression eliminates the impact of repartition of data +  Visibility Tags are bound to the data, so changing visibilities is not trivial: it means a delete and a re-add.
  • 32. © Copyright 2014 Booz Allen Hamilton Representational versus Referential Tags Representational tags encode the specific visibilities they represent, including all alternate controls for a specific document. User has roles of ACCOUNTING, RESEARCH and PII +  If data has tag PII&RESEARCH, user can access data +  If data has tag HIPAA&ACCOUNTING, user can’t access data Referential tags are codes that rely on external translation between assigned access controls and visibility markings: Data has marking of 03DECAF00D +  User has roles of ACCOUNTING, RESEARCH and PII +  At lookup, user roles are translated into the possible referential tags The choice depends on security posture: what are the consequences of getting it wrong versus the ease of shifting policy or data?
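The difference between the two tagging styles can be sketched as follows. With representational tags the policy travels with the data and is checked directly; with referential tags the data carries only an opaque code and an external table decides which codes a user's roles unlock. The translation table below is entirely hypothetical: the slides do not say what `03DECAF00D` stands for, so the mapping to `PII` and `RESEARCH`, the second code, and both function names are assumptions for illustration only.

```python
# Hypothetical external translation table: referential code -> list of
# alternative role sets, any one of which grants access.
POLICY_TABLE = {
    "03DECAF00D": [{"PII", "RESEARCH"}],        # assumed meaning
    "0BADC0FFEE": [{"HIPAA", "ACCOUNTING"}],    # made-up second code
}


def representational_allows(tag, user_roles):
    """Check a simple conjunctive tag such as 'PII&RESEARCH' directly."""
    required = {part.strip() for part in tag.split("&")}
    return required <= set(user_roles)


def referential_codes_for(user_roles, table=POLICY_TABLE):
    """Translate a user's roles into the set of codes they may read."""
    roles = set(user_roles)
    return {
        code
        for code, alternatives in table.items()
        if any(required <= roles for required in alternatives)
    }


if __name__ == "__main__":
    roles = {"ACCOUNTING", "RESEARCH", "PII"}
    print(representational_allows("PII&RESEARCH", roles))
    print(representational_allows("HIPAA&ACCOUNTING", roles))
    print(referential_codes_for(roles))
```

In the referential style a policy change only edits the table, whereas in the representational style it means rewriting the tags on the stored data. The cost is that a mistake or compromise in the table silently changes who can read everything carrying that code.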
  • 33. © Copyright 2014 Booz Allen Hamilton Emerging Security Capabilities
  • 34. © Copyright 2014 Booz Allen Hamilton The ecosystem of security capabilities for Hadoop is growing rapidly Cloudera (with Intel Rhino) +  Sentry (ACLs for Hive / Impala) +  Gazzang (Filesystem Encryption) +  Intel Rhino +  Encryption Codec Support HADOOP-9331 +  Key Distribution & Management MAPREDUCE-5025 +  Token Based Authentication HADOOP-9392 +  Unified Authorization Framework HADOOP-9466 +  Transparent Encryption for HBase/ZooKeeper +  Others, see https://github.com/intel-hadoop/project-rhino/ Hortonworks +  Production Ready Apache Knox +  XA Secure +  Central Administration +  Authorization for HDFS / Hive / HBase +  Compliance Controls Lots of talks at this Hadoop Summit on data security: The Future of Hadoop Security – Joey Echeverria; Hadoop REST API Security with the Apache Knox Gateway – Kevin Minder, Larry McCay; Securing Big Data: Lock it Down, or Liberate? – Jeff Graham, Mark Tomallo; Improvements in Hadoop Security – Sanjay Radia, Chris Nauroth
  • 35. © Copyright 2014 Booz Allen Hamilton Summary +  Security for Hadoop has come a long way and is changing rapidly, but is still maturing +  Securing the data in Hadoop means thinking differently about the architecture when combining multiple data sources +  Your Hadoop Architecture should provide consistent security mechanisms across all of the data +  A more complete way to secure data is to implement Smart Data (ABAC and Fine Grained Access Controls) but this hasn’t been embraced consistently across the Hadoop ecosystem yet +  The next 6 months will be interesting …
  • 36. © Copyright 2014 Booz Allen Hamilton Just Released! The Field Guide to Data Science 120 page e-book of data science geekery Download for free: http://www.boozallen.com/datascience Thanks! Drew (@drewfarris) Peter (@petrguerra)