1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Vinay Shukla, Director, Product Management (@neomythos)
Srikanth Venkat, Senior Director, Product Management (@srikvenk)
Don’t Let a Spark Burn Your House:
Perspectives on Securing Spark
About us…
Vinay Shukla
Director of Product Management, Data Science
Spark & Zeppelin
Srikanth Venkat
Senior Director of Product Management, Security & Governance
Apache Ranger, Apache Atlas, Apache Knox, HDP Platform Security
Securing Spark in the Hadoop Castle…
Secure In-Cluster Access: Wire Encryption
Data At Rest Protection: HDFS Encryption
Authorization & Audit: HDFS ACLs, YARN ACLs, Apache Ranger
Perimeter Security: Network Segmentation, Firewalls
Authentication: LDAP/AD, Kerberos, Apache Knox
Secure Gateway: Apache Knox
Challenges in Securing Enterprise Deployments of Spark
• How to deploy Spark securely?
– AAA: Authentication, Authorization & Audits
– Network and Perimeter Security
– Protect data both in motion & at rest
• Make security easy to deploy, administer, manage and govern
Guiding Principles
• Secure the network access
– Firewalls
– Use secure gateways and trusted proxies (Apache Knox)
• Provide access only to authorized users
– LDAP/AD
– Kerberos
– Service-level authorization (Apache Knox)
• Secure data sources with coarse- and fine-grained authorization
– Hive (databases, tables, columns, ...)
– HDFS (files, folders)
– Apache Ranger for audits and ABAC authorization
• Data protection at rest and in motion
– HDFS TDE (data encryption at rest)
– Wire encryption, SSL (data in motion)
Many ways to interact with Spark
[Figure: entry points to Spark on YARN, shown behind a firewall: Zeppelin (through the Spark interpreter, or through the Livy interpreter and the Livy REST Server), Spark-Shell, Spark Thrift Server, Livy REST Server, custom web app, BI tool. Each entry point is drawn with its own Spark driver.]
Context: Spark Deployment Modes
• Spark on YARN
– Spark driver (SparkContext) in the YARN AM (yarn-cluster)
– Spark driver (SparkContext) on the local client (yarn-client)
• Spark Shell & Spark Thrift Server run in yarn-client mode only
[Figure: two panels, YARN-Client and YARN-Cluster. Components shown in each panel: Client, Executor, App Master, Spark Driver.]
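The two modes can be sketched as spark-submit argument lists. This is a minimal illustration, not something taken from the deck: the helper names are hypothetical, and the mapping relies on the fact that Spark 2.x expresses the legacy `yarn-client` and `yarn-cluster` master URLs as `--master yarn` plus `--deploy-mode`.

```python
from typing import List

# Legacy master URLs named on the slide, mapped to the equivalent
# (--master, --deploy-mode) pair accepted by Spark 2.x spark-submit.
LEGACY_MASTERS = {
    "yarn-client": ("yarn", "client"),
    "yarn-cluster": ("yarn", "cluster"),
}


def spark_submit_args(legacy_master: str, main_class: str, app_jar: str) -> List[str]:
    """Build a spark-submit argument list for one of the two YARN modes."""
    if legacy_master not in LEGACY_MASTERS:
        raise ValueError(f"unknown YARN mode: {legacy_master}")
    master, deploy_mode = LEGACY_MASTERS[legacy_master]
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
    ]


def driver_location(legacy_master: str) -> str:
    """Where the Spark driver (SparkContext) runs in each mode."""
    _, deploy_mode = LEGACY_MASTERS[legacy_master]
    return "YARN Application Master" if deploy_mode == "cluster" else "client machine"
```

The driver location matters for security because in yarn-client mode the driver, and therefore the user's credentials, live on the machine that launched the job.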
How Spark on YARN works
[Figure: Jane Doe runs Spark Submit against a Hadoop cluster. Components shown: YARN RM, Node Manager, Spark AM, Executor, HDFS. The arrows are numbered 1 to 4.]
Authenticate users with AD/LDAP
[Figure: Kerberos flow between Jane Doe, AD/LDAP, the KDC and the Hadoop cluster (Spark AM, NN). The diagram numbers its arrows 1 to 7. Step labels:]
– kinit
– Get service ticket for Spark
– Use Spark ST, submit Spark job
– Spark gets NameNode (NN) service ticket
– YARN launches Spark executors using Jane Doe's identity
– Executor reads from HDFS using Jane Doe's delegation token
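From the user's side, the flow above comes down to two commands. The sketch below only builds the argument lists; the function names, principal and paths are hypothetical. `kinit -k -t` obtains a ticket from a keytab, and `--principal` / `--keytab` are the spark-submit options that let a long-running YARN application renew its tickets and delegation tokens.

```python
from typing import List


def kinit_args(principal: str, keytab: str) -> List[str]:
    """Obtain a ticket-granting ticket from the KDC using a keytab."""
    return ["kinit", "-k", "-t", keytab, principal]


def kerberized_submit_args(
    principal: str, keytab: str, main_class: str, app_jar: str
) -> List[str]:
    """Submit to YARN and hand Spark the principal and keytab so that a
    long-running application can re-obtain tickets and delegation tokens."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--principal", principal,
        "--keytab", keytab,
        "--class", main_class,
        app_jar,
    ]


# Hypothetical principal and keytab path, for illustration only.
PRINCIPAL = "jane@EXAMPLE.COM"
KEYTAB = "/etc/security/keytabs/jane.keytab"
```

For short interactive jobs, running `kinit` first and submitting with the resulting ticket cache is usually enough; passing the keytab is mainly useful when the job outlives the ticket lifetime.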
Authorization: Secure user access to data sources and queues
[Figure: Jane Doe, outside the firewall, submits a Spark job to a YARN cluster (A, B, C) with the KDC, HDFS and Ranger alongside. Step labels:]
– Client gets service ticket for Spark
– Use Spark ST, submit Spark job
– Get NameNode (NN) service ticket
– Executors read from HDFS
[Ranger checks shown in the figure:]
– Can Jane launch jobs in this queue?
– Can Jane read this file?
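The two questions Ranger answers in this picture can be modelled as a deny-by-default policy lookup. This is a toy sketch: the `Policy` class, the action names and the resource strings are all hypothetical and do not reflect Ranger's actual policy model or API.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List


@dataclass(frozen=True)
class Policy:
    """One allow rule: a user may perform an action on matching resources."""
    user: str
    action: str    # e.g. "submit-app" for a YARN queue, "read" for an HDFS path
    resource: str  # glob pattern, e.g. "queue:A" or "hdfs:/data/sales/*"


def is_allowed(policies: List[Policy], user: str, action: str, resource: str) -> bool:
    """Deny by default; allow only if a policy matches user, action and resource."""
    return any(
        p.user == user and p.action == action and fnmatch(resource, p.resource)
        for p in policies
    )


# Hypothetical policies for the user on the slide.
POLICIES = [
    Policy("jane", "submit-app", "queue:A"),
    Policy("jane", "read", "hdfs:/data/sales/*"),
]

# "Can Jane launch jobs in this queue?"
can_submit = is_allowed(POLICIES, "jane", "submit-app", "queue:A")
# "Can Jane read this file?"
can_read = is_allowed(POLICIES, "jane", "read", "hdfs:/data/hr/salaries.csv")
```

With these example policies Jane may submit to queue A but may not read the HR file, because no rule covers that path.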
Livy RESTful Access to Spark
• Livy supports only Kerberos/SPNEGO-based authentication; there is no LDAP support
• Livy listens on port 8999 by default and runs in yarn-cluster mode by default
See https://hortonworks.com/blog/livy-a-rest-interface-for-apache-spark/
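As a rough illustration of RESTful access, the sketch below builds (but does not send) a batch submission for Livy's `POST /batches` endpoint. The host name, jar path and user are hypothetical; `file`, `className` and `proxyUser` are fields of Livy's batch request body. On a Kerberized cluster the request would additionally need SPNEGO negotiation, which plain `urllib` does not perform.

```python
import json
import urllib.request

# Hypothetical host; 8999 is Livy's default port.
LIVY_URL = "http://livy.example.com:8999"


def livy_batch_request(app_jar: str, main_class: str, proxy_user: str) -> urllib.request.Request:
    """Build a POST /batches request that asks Livy to run a Spark application."""
    payload = {
        "file": app_jar,          # application jar, typically on HDFS
        "className": main_class,  # main class of the application
        "proxyUser": proxy_user,  # user Livy should impersonate for the job
    }
    return urllib.request.Request(
        url=f"{LIVY_URL}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = livy_batch_request("hdfs:///apps/example.jar", "your.application.class", "jane")
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, or using an HTTP client with SPNEGO support.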
Spark Thrift Server doAs
1. End user > Spark Thrift Server > Spark job runs as the end user
2. Provides coarse-grained (table/file level) access control
3. Fixed only for Spark 1.6; available in HDP 2.6 and 2.5.x
4. Use SparkSQL + LLAP (Ranger integration) for fine-grained access control (row/column) and masking (works with both Spark 1.6 and Spark 2.1)
See https://community.hortonworks.com/articles/101418/user-impersonation-in-apache-spark-16-thrift-serve.html
More ways to interact with Spark
• With Kerberos
• Over SSL
• https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/using-spark-streaming.html#spark-streaming-kerb-job
• https://community.hortonworks.com/content/kbentry/55154/kafka-ssl-kerberos-cheat-sheet-settingsconsole-com.html
Yet more ways to interact with Spark
https://github.com/hortonworks-spark/shc
• With Kerberos
# Obtain a Kerberos ticket from the keytab
kinit -k -t /tmp/hrt_qa.headless.keytab hrt_qa
# Submit in yarn-client mode
/usr/hdp/current/spark-client/bin/spark-submit --class your.application.class --master yarn-client \
  --files /etc/hbase/conf/hbase-site.xml --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ /To/your/application/jar
# Submit in yarn-cluster mode
/usr/hdp/current/spark-client/bin/spark-submit --class your.application.class --master yarn-cluster \
  --files /etc/hbase/conf/hbase-site.xml --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ /To/your/application/jar
Fine-Grained Security:
SparkSQL/Hive LLAP with Ranger
SparkSQL Security: Row Filtering and Column Masking
• Spark SQL + Hive use cases enable users to explore data lakes and democratize data access without sacrificing security
• Spark provides strong authentication via Kerberos and wire encryption via SSL but, as a general-purpose compute engine, has no built-in authorization subsystem (yet)
• Spark also does not currently have any way to define a pluggable module that contains policies for fine-grained authorization
• Use cases:
– Commingled data in the same table may belong to two different groups, each with its own regulatory requirements.
– Data may have regional restrictions, time-based availability restrictions, departmental restrictions, etc.
Hive LLAP – Open Interfaces
[Figure: Hive LLAP architecture. SQL queries arrive over ODBC/JDBC at HiveServer2 (query endpoint). Query coordinators and LLAP daemons (query executors) run in a YARN cluster, with an in-memory cache shared across all users. Deep storage: HDFS and compatible stores (S3, WASB, Isilon). Spark is shown as a further client.]
Key Features: Spark Column Security with LLAP
• Fine-grained column-level access control for SparkSQL
• Fully dynamic per-user policies, without a proliferation of views and the resulting view management overhead
• Uses standard Ranger infrastructure to control resources and apply row filtering and masking policies
Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies such as row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering and masking are guaranteed by the LLAP server.
[Figure: the Spark Client interacts with HiveServer2 (authorization), the Hive Metastore (data locations, view definitions), the Ranger Server (dynamic policies) and LLAP (data read, filter pushdown). The arrows are numbered 1 to 4.]
Dynamic Row Filtering & Column Masking: SparkSQL via Hive LLAP

Source data (ww_customers):
Country   National ID   CC No              DOB         MRN          Name            Policy ID
US        232323233     4539067047629850   9/12/1969   8233054331   John Doe        nj23j424
US        333287465     5391304868205600   8/13/1979   3736885376   Jane Doe        cadsd984
Germany   T22000129     4532786256545550   3/5/1963    876452830A   Ernie Schwarz   KK-2345909

Ranger Policy Enforcement: the query is rewritten based on dynamic Ranger policies, which filter rows by region and apply the relevant column masking.

User 1: Joe (Location: US, Group: Analyst)
Original query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
Result:
Country   National ID   CC No                 MRN    Name
US        xxxxx3233     4539 xxxx xxxx xxxx   null   John Doe
US        xxxxx7465     5391 xxxx xxxx xxxx   null   Jane Doe
Users from the US Analyst group see data for US persons only, with CC and National ID (SSN) masked and MRN nullified.

User 2: Ivanna (Location: EU, Group: HR)
Original query: SELECT country, nationalid, name, mrn FROM ww_customers
Result:
Country   National ID   Name            MRN
Germany   T22000129     Ernie Schwarz   876452830A
EU HR policy admins see unmasked values but are restricted by row filtering policies to data for EU persons only.
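The two results on this slide can be reproduced with a small in-memory model of row filtering and column masking. This is a toy sketch only: in the integration described here the policies are defined in Ranger and enforced by HiveServer2 and LLAP, not in client code, and the policy table, function names and the "Germany counts as EU" filter below are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# The three sample rows from the slide (only the columns the two queries use).
WW_CUSTOMERS: List[Row] = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]


def show_last4(value: str) -> str:
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first4_card(value: str) -> str:
    """Keep the first four digits of a card number and mask the rest."""
    return value[:4] + " xxxx xxxx xxxx"


def nullify(_value: str) -> None:
    """Replace the value with null."""
    return None


# Hypothetical policy table keyed by (location, group).
POLICIES = {
    ("US", "Analyst"): {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last4, "ccnumber": show_first4_card, "mrn": nullify},
    },
    ("EU", "HR"): {
        "row_filter": lambda row: row["country"] == "Germany",  # stand-in for "EU persons"
        "masks": {},
    },
}


def run_query(columns: List[str], location: str, group: str) -> List[Row]:
    """Apply the caller's row filter first, then mask each selected column."""
    policy = POLICIES[(location, group)]
    result = []
    for row in WW_CUSTOMERS:
        if not policy["row_filter"](row):
            continue
        masked = {}
        for column in columns:
            mask = policy["masks"].get(column)
            masked[column] = mask(row[column]) if mask else row[column]
        result.append(masked)
    return result


joe = run_query(["country", "nationalid", "ccnumber", "mrn", "name"], "US", "Analyst")
ivanna = run_query(["country", "nationalid", "name", "mrn"], "EU", "HR")
```

Both users run ordinary SELECT statements against the same table; only the policy attached to their location and group changes what comes back.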
Key Benefits of SparkSQL + Ranger Integration
• Shared access control policy between SparkSQL and Hive
• Audit: all access via SparkSQL is audited and searchable through Ranger
• Resource management: each user can use a unique queue while accessing the securely shared data
• Minimum transition cost: since this feature offers row/column-level security in SQL, existing Spark 2.1 apps and scripts and all Spark shells (spark-shell, pyspark, sparkR, spark-sql) are supported without any modifications
• https://hortonworks.com/blog/row-column-level-control-apache-spark/
• https://community.hortonworks.com/articles/101181/rowcolumn-level-security-in-sql-for-apache-spark-2.html
Demo of SparkSQL via Hive LLAP with
Ranger Integration
The Road Ahead for Spark Security
• Spark & Atlas integration
• Livy & Knox integration
• Zeppelin SSO integration
• Zeppelin Ranger integration
• Password integration with Hadoop Credentials
Thank You!!
Vinay Shukla
@neomythos
Srikanth Venkat
@srikvenk

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Don't Let the Spark Burn Your House: Perspectives on Securing Spark

  • 1. 1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Vinay Shukla Srikanth Venkat Director, Product Management Senior Director, Product Management @neomythos @srikvenk Don’t Let a Spark Burn Your House: Perspectives on Securing Spark
  • 2. About us
      • Vinay Shukla: Director of Product Management, Data Science (Spark & Zeppelin)
      • Srikanth Venkat: Senior Director of Product Management, Security & Governance (Apache Ranger, Apache Atlas, Apache Knox, HDP Platform Security)
  • 3. Securing Spark in the Hadoop Castle
      • Secure In-Cluster Access: Wire Encryption
      • Data At Rest Protection: HDFS Encryption
      • Authorization & Audit: HDFS ACLs, YARN ACLs, Apache Ranger
      • Perimeter Security: Network Segmentation, Firewalls
      • Authentication: LDAP/AD, Kerberos, Apache Knox
      • Secure Gateway: Apache Knox
  • 4. Challenges in Securing Enterprise Deployments of Spark
      • How to deploy Spark securely?
          • AAA: Authentication, Authorization & Audits
          • Network and Perimeter Security
          • Protect data both in motion & at rest
      • Make security easy to deploy, administer, manage and govern
  • 5. Guiding Principles
      • Secure the network access
          • Firewalls
          • Use secure gateways and trusted proxies (Apache Knox)
      • Provide access only to authorized users
          • LDAP/AD
          • Kerberos
          • Service-level authorization (Apache Knox)
      • Secure data sources with coarse- and fine-grained authorization
          • Hive (databases, tables, columns, ...)
          • HDFS (files, folders)
          • Apache Ranger for audits and ABAC authorization
      • Data protection at rest and in motion
          • HDFS TDE (data encryption at rest)
          • Wire encryption, SSL (data in motion)
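The "data in motion" principle on slide 5 can be illustrated with a small sketch that assembles a few Spark properties and renders them in `spark-defaults.conf` style. This is only a sketch under assumptions: the helper names are hypothetical, the property names come from Spark's own security documentation rather than from the slides, and which of them apply depends on the Spark version in use. Enabling `spark.ssl.enabled` additionally requires keystore settings that are not shown here.

```python
# Hypothetical helpers for illustration; they are not part of Spark or HDP.

def wire_encryption_conf(enable_ssl: bool = False) -> dict:
    """Return a minimal set of Spark properties related to data in motion."""
    conf = {
        # Internal authentication between Spark processes.
        "spark.authenticate": "true",
        # Encryption of RPC traffic (availability depends on the Spark version).
        "spark.network.crypto.enabled": "true",
    }
    if enable_ssl:
        # SSL for Spark's web endpoints; needs keystore properties as well.
        conf["spark.ssl.enabled"] = "true"
    return conf


def render_spark_defaults(conf: dict) -> str:
    """Render properties as 'key value' lines, sorted for stable output."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))
```

Calling `render_spark_defaults(wire_encryption_conf())` yields two lines that could be reviewed before being merged into a cluster's existing configuration.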
  • 6. Many ways to interact with Spark
      • Diagram labels: Zeppelin (with Spark Interpreter or with Livy Interpreter), Spark-Shell, Spark Thrift Server, Livy REST Server, Custom Web App, BI Tool, Firewall, Spark Driver, Spark on YARN executors
  • 7. Context: Spark Deployment Modes
      • Spark on YARN
          • Spark driver (SparkContext) runs in the YARN AM (yarn-cluster)
          • Spark driver (SparkContext) runs locally in the client (yarn-client)
              • Spark Shell & Spark Thrift Server run in yarn-client only
      • Diagram labels: YARN-Client (Client with Spark Driver, App Master, Executor); YARN-Cluster (Client, App Master with Spark Driver, Executor)
  • 8. How Spark on YARN works
      • Diagram labels: Jane Doe, Spark Submit, YARN RM, Node Manager, Spark AM, Executor, HDFS, Hadoop Cluster (steps numbered 1 to 4)
  • 9. Authenticate users with AD/LDAP
      • kinit: Jane Doe authenticates against the KDC (AD/LDAP)
      • Get service ticket for Spark
      • Use Spark ST, submit Spark job
      • Spark gets NameNode (NN) service ticket
      • YARN launches Spark executors using Jane Doe’s identity
      • Executor reads from HDFS using Jane Doe’s delegation token
      • Diagram labels: Jane Doe, KDC, AD/LDAP, Spark AM, NN, Executor, Hadoop Cluster (steps numbered 1 to 7)
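The Kerberos flow on this slide starts with an interactive `kinit`. For jobs that must authenticate without a user at the keyboard, `spark-submit` on YARN also accepts `--principal` and `--keytab`. The sketch below only builds such a command line and does not run it. The function name, principal, and paths are hypothetical, and it uses the `--master yarn --deploy-mode ...` form rather than the older `yarn-client` / `yarn-cluster` master names that appear elsewhere in the deck.

```python
import shlex

# Hypothetical helper for illustration; principal, keytab and jar are placeholders.

def kerberized_submit_argv(principal, keytab, app_class, app_jar,
                           deploy_mode="cluster"):
    """Build (but do not execute) a spark-submit argv for a Kerberized YARN cluster."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--principal", principal,  # Kerberos principal Spark logs in as
        "--keytab", keytab,        # keytab file for that principal
        "--class", app_class,
        app_jar,
    ]


def to_shell(argv):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Keeping the command as a list until the last moment avoids quoting mistakes when a path or principal contains unusual characters.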
  • 10. HDFS Authorization: Secure user access to data sources and queues
      • Client gets service ticket for Spark
      • Use Spark ST, submit Spark job
      • Get NameNode (NN) service ticket
      • Executors read from HDFS
      • Ranger checks: Can Jane launch jobs in this queue? Can Jane read this file?
      • Diagram labels: Jane Doe, Firewall, KDC, Ranger, YARN Cluster (queues A, B, C)
  • 11. Livy RESTful Access to Spark
      • Livy supports only Kerberos/SPNEGO-based authentication; there is no LDAP support
      • Livy's default port is 8999, and by default it runs in yarn-cluster mode
      • See https://hortonworks.com/blog/livy-a-rest-interface-for-apache-spark/
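To make the Livy slide concrete, here is a minimal sketch that prepares a batch submission against Livy's `POST /batches` endpoint on the default port 8999. It only constructs the request object and never sends it. The host name, jar path, and class name are placeholders, and the function name is hypothetical. Because Livy authenticates via Kerberos/SPNEGO, a real client would also need to attach a Negotiate token through a Kerberos-capable HTTP library, which is outside the scope of this sketch.

```python
import json
import urllib.request

# Hypothetical helper for illustration; host, jar path and class name are placeholders.

def build_livy_batch_request(livy_url, jar_path, class_name, proxy_user=None):
    """Prepare (but do not send) a POST /batches request for a Livy server."""
    body = {"file": jar_path, "className": class_name}
    if proxy_user is not None:
        # Ask Livy to run the job on behalf of this end user.
        body["proxyUser"] = proxy_user
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would inspect `request.full_url` and `request.data` while debugging, then hand the request to an opener that performs the SPNEGO handshake.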
  • 12. Spark Thrift Server doAs
      1. End user > Spark Thrift Server > Spark job runs as the end user
      2. Provides coarse-grained (table/file level) access control
      3. Fixed only for Spark 1.6; available in HDP 2.6 & 2.5.x
      4. Use SparkSQL + LLAP (Ranger integration) for fine-grained access control (row/column) & masking (works with both Spark 1.6 & Spark 2.1)
      See https://community.hortonworks.com/articles/101418/user-impersonation-in-apache-spark-16-thrift-serve.html
  • 13. More ways to interact with Spark
      • With Kerberos
      • Over SSL
      • https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/using-spark-streaming.html#spark-streaming-kerb-job
      • https://community.hortonworks.com/content/kbentry/55154/kafka-ssl-kerberos-cheat-sheet-settingsconsole-com.html
  • 14. Yet more ways to interact with Spark
      • https://github.com/hortonworks-spark/shc
      • With Kerberos:
          kinit -k -t /tmp/hrt_qa.headless.keytab hrt_qa
          /usr/hdp/current/spark-client/bin/spark-submit --class your.application.class --master yarn-client --files /etc/hbase/conf/hbase-site.xml --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ /To/your/application/jar
          /usr/hdp/current/spark-client/bin/spark-submit --class your.application.class --master yarn-cluster --files /etc/hbase/conf/hbase-site.xml --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ /To/your/application/jar
  • 15. Fine-Grained Security: SparkSQL/Hive LLAP with Ranger
  • 16. SparkSQL Security: Row Filtering and Column Masking
      • Spark SQL + Hive use cases enable users to explore data lakes and democratize data access without sacrificing security
      • Spark provides strong authentication via Kerberos and wire encryption via SSL, but as a general-purpose compute engine it has no built-in authorization sub-system (yet)
      • Spark also does not currently have any way to define a pluggable module that contains policies for fine-grained authorization
      • Use cases:
          • Co-mingled data in the same table may belong to two different groups, each with their own regulatory requirements
          • Data may have regional restrictions, time-based availability restrictions, departmental restrictions, etc.
  • 17. Hive LLAP – Open Interfaces
      • Diagram labels: SQL Queries over ODBC / JDBC; HiveServer2 (Query Endpoint); Query Coordinators; LLAP Daemons with Query Executors; In-Memory Cache (Shared Across All Users); YARN Cluster; Deep Storage (HDFS and Compatible, S3, WASB, Isilon); Spark
  • 18. Key Features: Spark Column Security with LLAP
      • Fine-grained column-level access control for SparkSQL
      • Fully dynamic policies per user without proliferation of views and the resulting view management overhead
      • Use standard Ranger infrastructure to control resources and apply row filtering and masking policies
      • Flow:
          1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
          2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
          3. Spark gets a modified query plan based on the dynamic security policy.
          4. Spark reads data from LLAP. Filtering / masking is guaranteed by the LLAP server.
      • Diagram labels: Spark Client; HiveServer2 (Authorization); Hive Metastore (Data Locations, View Definitions); LLAP (Data Read, Filter Pushdown); Ranger Server (Dynamic Policies)
  • 19. Dynamic Row Filtering & Column Masking: SparkSQL via Hive LLAP
      • Source table ww_customers:
          Country | National ID | CC No | DOB | MRN | Name | Policy ID
          US | 232323233 | 4539067047629850 | 9/12/1969 | 8233054331 | John Doe | nj23j424
          US | 333287465 | 5391304868205600 | 8/13/1979 | 3736885376 | Jane Doe | cadsd984
          Germany | T22000129 | 4532786256545550 | 3/5/1963 | 876452830A | Ernie Schwarz | KK-2345909
      • Ranger policy enforcement: the query is rewritten based on dynamic Ranger policies, filtering rows by region and applying the relevant column masking
      • User 1: Joe (Location: US, Group: Analyst)
          • Original query: SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
          • Users from the US Analyst group see data for US persons only, with CC and National ID (SSN) as masked values and MRN nullified:
              Country | National ID | CC No | MRN | Name
              US | xxxxx3233 | 4539 xxxx xxxx xxxx | null | John Doe
              US | xxxxx7465 | 5391 xxxx xxxx xxxx | null | Jane Doe
      • User 2: Ivanna (Location: EU, Group: HR)
          • Original query: SELECT country, nationalid, name, mrn FROM ww_customers
          • EU HR policy admins see unmasked values but are restricted by row-filtering policies to data for EU persons only:
              Country | National ID | Name | MRN
              Germany | T22000129 | Ernie Schwarz | 876452830A
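The behaviour on slide 19 can be mimicked with a toy, in-memory model of row filtering and column masking. This is a sketch for intuition only: in the deck, enforcement happens inside HiveServer2 and LLAP through Ranger policies, not in client code. The policy table, the helper names, and the assumption that "EU" covers just Germany in this three-row sample are all hypothetical.

```python
# Toy model only; Ranger policies are not defined or evaluated like this.

ROWS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # assumption: the only EU country in the sample data


def show_last4(value):
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first4_cc(value):
    """Keep the first four card digits and mask the remaining three groups."""
    return value[:4] + " xxxx xxxx xxxx"


def nullify(_value):
    """Replace the value with null (None)."""
    return None


# Hypothetical policies keyed by (location, group), mirroring the two users on the slide.
POLICIES = {
    ("US", "Analyst"): {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last4, "ccnumber": show_first4_cc, "mrn": nullify},
    },
    ("EU", "HR"): {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},
    },
}


def run_query(user_key, columns, rows=ROWS):
    """Apply the user's row filter, then mask each selected column."""
    policy = POLICIES[user_key]
    result = []
    for row in rows:
        if not policy["row_filter"](row):
            continue
        masked = {}
        for column in columns:
            mask = policy["masks"].get(column)
            masked[column] = mask(row[column]) if mask else row[column]
        result.append(masked)
    return result
```

With this model, `run_query(("US", "Analyst"), [...])` and `run_query(("EU", "HR"), [...])` run the same kind of SELECT over the same table and return different rows and values, which is the point the slide makes about per-user dynamic policies.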
  • 20. Key Benefits of SparkSQL + Ranger Integration
      • Shared access control policy between SparkSQL and Hive
      • Audit: all access via SparkSQL is audited and searchable through Ranger
      • Resource management: each user can use a unique queue while accessing the securely shared data
      • Minimum transition cost: since this feature offers row/column-level security in SQL, existing Spark 2.1 apps and scripts and all Spark shells (spark-shell, pyspark, sparkR, spark-sql) are supported without any modifications
      • https://hortonworks.com/blog/row-column-level-control-apache-spark/
      • https://community.hortonworks.com/articles/101181/rowcolumn-level-security-in-sql-for-apache-spark-2.html
  • 21. Demo of SparkSQL via Hive LLAP with Ranger Integration
  • 22. The Road Ahead for Spark Security
      • Spark & Atlas integration
      • Livy & Knox integration
      • Zeppelin SSO integration
      • Zeppelin Ranger integration
      • Password integration with Hadoop Credentials
  • 23. Thank You! Vinay Shukla (@neomythos), Srikanth Venkat (@srikvenk)