SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Downloaden Sie, um offline zu lesen
Kerberizing Spark
February 2017
Spark Summit East
Jorge López-Malla Matute
INDEX
jlopezm@stratio.com
@jorgelopezmalla
Abel Rincón Matarranz
arincon@stratio.com
@ganger85
Kerberos
● Introduction
● Key concepts
● Workflow
● Impersonation
1
3
Use Case
● Definition
● Workflow
● Crossdata in production
2
4Stratio Solution
● Prerequirement
● Driver side
● Executor Side
● Final result
Demo time
● Demo
● Q&A
Presentation
Presentation
JORGE LÓPEZ-MALLA
After working with traditional
processing methods, I started to
do some R&S Big Data projects
and I fell in love with the Big Data
world. Currently i’m doing some
awesome Big Data projects
and tools at Stratio.
SKILLS
Presentation
Presentation
ABEL RINCÓN MATARRANZ
SKILLS
Our company
Presentation
Stratio accompanies businesses on their journey through complete Digital Transformation. Through hard
work and creativity, our ambition is to reinvent companies around their data so they can compete in a
changed world.
Our solution is a transformational product that uses third generation Big Data technologies to execute the
most comprehensive form of Digital Transformation, ensuring scalability, maximum flexibility and adaptation
to new markets.
If you have any questions or would like a demonstration of our product, get in touch: contact@stratio.com
Our product
Presentation
Kerberos1
Kerberos
Kerberos
• What is Kerberos
○ Authentication protocol / standard / service
■ Safe
■ Single-sign-on
■ Trust based
■ Mutual authentication
Kerberos key concepts
Kerberos
• Client/Server → Do you need an explanation???
• Principal → Identify a unique client or service
• Realm → Identify a environment, company, domain …
○ DEMO.EAST.SUMMIT.SPARK.ORG
• KDC → Actor who manages the tickets
• TGT → Ticket which has the client session
• TGS → Ticket which has the client-service session
Kerberos Workflow
Kerberos
1. Client retrieve principal and secret
2. Client performs a TGT request
3. KDC returns TGT
4. Client request a TGS with the TGT
5. KDC returns the TGS
6. Client request a service session using the
TGS
7. Service establish a secure connection
directly with the client
Kerberos workflow 2
Kerberos
Client
Service
Backend
AS / KDC
Principal → User1
TGT (user1)
TGS → user1-service1
(tgsUS)
Principal → Service1
TGT (Service1)
TGS → service1-backend1 (tgsSB)
Principal → backend1
TGT (backend1)
tgsUS
tgsSB
user1
service1
Kerberos workflow - Impersonation
Kerberos
Client
Service
Backend
AS / KDC
Principal → User1
TGT (user1)
TGS → user1-service1
(tgsUS)
Principal → Service1
TGT (Service1)
TGS → service1-backend1 (tgsSB)
Principal → backend1
TGT (backend1)
tgsUS
tgsSB
user1
service1user1
Use Case2
Use Case
• Stratio Crossdata is a distributed framework and a fast and general-purpose
computing system powered by Apache Spark
• Can be used both as library and as a server.
• Crossdata Server: Provides a multi-user environment to SparkSQL, giving a
reliable architecture with high-availability and scalability out of the box
• To do so it use both native queries and Spark
• Crossdata Server had a unique long time SparkContext to execute all its Sparks
queries
• Crossdata can use YARN, Mesos and Standalone as a resource manager
Use Case
Crossdata as Server
sql> select * from table1
Crossdata shell
Master
Worker-1 Worker-2
Executor-0 Executor-1
Task-0 Task-1
HDFS
Crossdata server
(Spark Driver)
Crossdata server
sql> select * from table1
-------------------------
|id | name |
-------------------------
|1 | John Doe |
Use Case
Kerbe
ros
• Projects in production needs runtime impersonation to be compliance the
AAA(Authorization, Authentication and Audit) at the storage.
• Crossdata allows several users per execution
• Neither of the Sparks resource managers allows us to impersonate in runtime.
• Evenmore Standalone as resource manager does not provide any Kerberos
feature.
Crossdata in production
Use Case
Stratio Solution3
Prerequirement
Stratio solution
• Keytab have to be accessible in all the cluster machines
• Keytab must provide proxy grants
• Hadoop client configuration located in the cluster
• Each user, both proxy and real, must have a home in HDFS
Introduction
Stratio solution
• Spark access to the storage system both in the Driver and the Executors.
• In the Driver side both Spark Core and SparkSQL will access to the storage
system.
• Executors will always access via Task.
• As Streaming use the same classes than SparkCore or SparkSQL the same
solution will be usable by Streaming jobs
KerberosUser (Utils)
object KerberosUser extends Logging with UserCache {
def setProxyUser(user: String): Unit = proxyUser = Option(user)
def getUserByName(name: Option[String]): Option[UserGroupInformation] = {
if (getConfiguration.isDefined) {
userFromKeyTab(name)
}
else None
}
private def userFromKeyTab(proxyUser: Option[String]): Option[UserGroupInformation] = {
if (realUser.isDefined) realUser.get.checkTGTAndReloginFromKeytab()
(realUser, proxyUser) match {
case (Some(_), Some(proxy)) => users.get(proxy).orElse(loginProxyUser(proxy))
case (Some(_), None) => realUser
case (None, None) => None
}
}
private lazy val getConfiguration: Option[(String, String)] = {
val principal = env.conf.getOption("spark.executor.kerberos.principal")
val keytab = env.conf.getOption("spark.executor.kerberos.keytab")
(principal, keytab) match {
case (Some(p), Some(k)) => Option(p, k)
case _ => None
}
}
Configuration
setting proxy user
(Global)
Choose between real or
proxy user
Stratio solution
Public Method retrieve
user
Wrappers (Utils)
def executeSecure[U, T](proxyUser: Option[String],
funct: (U => T),
inputParameters: U): T = {
KerberosUser.getUserByName(proxyUser) match {
case Some(user) => {
user.doAs(new PrivilegedExceptionAction[T]() {
@throws(classOf[Exception])
def run: T = {
funct(inputParameters)
}
})
}
case None => {
funct(inputParameters)
}
}
}
def executeSecure[T](exe: ExecutionWrp[T]): T = {
KerberosUser.getUser match {
case Some(user) => {
user.doAs(new PrivilegedExceptionAction[T]() {
@throws(classOf[Exception])
def run: T = {
exe.value
}
})
}
case None => exe.value
}
}
class ExecutionWrp[T](wrp: => T) {
lazy val value: T = wrp
}
Stratio solution
Driver Side
Stratio Solution
abstract class RDD[T: ClassTag](
@transient private var _sc: SparkContext,
@transient private var deps: Seq[Dependency[_]]
) extends Serializable with Logging {
...
/**
* Get the array of partitions of this RDD, taking into account whether the
* RDD is checkpointed or not.
*/
final def partitions: Array[Partition] = {
checkpointRDD.map(_.partitions).getOrElse {
if (partitions_ == null) {
partitions_ = KerberosFunction.executeSecure(new ExecutionWrp(getPartitions))
partitions_.zipWithIndex.foreach { case (partition, index) =>
require(partition.index == index,
s"partitions($index).partition == ${partition.index}, but it should equal $index")
}
}
partitions_
}
}
Wrapping parameterless
method
class PairRDDFunctions[K, V](self: RDD[(K, V)])
(implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null)
...
def saveAsHadoopDataset(conf: JobConf): Unit = self.withScope {
// Rename this as hadoopConf internally to avoid shadowing (see SPARK-2038).
val internalSave: (JobConf => Unit) = (conf: JobConf) => {
val hadoopConf = conf
val outputFormatInstance = hadoopConf.getOutputFormat
val keyClass = hadoopConf.getOutputKeyClass
val valueClass = hadoopConf.getOutputValueClass
...
val writeToFile = (context: TaskContext, iter: Iterator[(K, V)]) => {
….
}
self.context.runJob(self, writeToFile)
writer.commitJob()
}
KerberosFunction.executeSecure(internalSave, conf)
}
Driver Side
Stratio Solution
Hadoop Datastore
RDD save function
Inside wrapper function
that will run in the cluster
Kerberos authentified
save function
Driver Side
class InMemoryCatalog(
conf: SparkConf = new SparkConf,
hadoopConfig: Configuration = new Configuration)
override def createDatabase(
dbDefinition: CatalogDatabase,
ignoreIfExists: Boolean): Unit = synchronized {
def inner: Unit = {
...
val location = new Path(dbDefinition.locationUri)
val fs = location.getFileSystem(hadoopConfig)
fs.mkdirs(location)
} catch {
case e: IOException =>
throw new SparkException(s"Unable to create database ${dbDefinition.name} as failed " +
s"to create its directory ${dbDefinition.locationUri}", e)
}
catalog.put(dbDefinition.name, new DatabaseDesc(dbDefinition))
}
}
KerberosFunction.executeSecure(KerberosUser.principal, new ExecutionWrp(inner))
}
Stratio Solution
Spark create a
directory in HDFS
* Interface used to load a [[Dataset]] from external storage systems (e.g. file systems,
...
class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
...
def load(): DataFrame = {
load(Seq.empty: _*) // force invocation of `load(...varargs...)`
}
...
def load(paths: String*): DataFrame = {
val proxyuser = extraOptions.get("user")
if (proxyuser.isDefined) KerberosUser.setProxyUser(proxyuser.get)
val dataSource = KerberosFunction.executeSecure(proxyuser,
DataSource.apply,
sparkSession,
source,
paths, userSpecifiedSchema, Seq.empty, None, extraOptions.toMap)
val baseRelation = KerberosFunction.executeSecure(proxyuser, dataSource.resolveRelation, false)
KerberosFunction.executeSecure(proxyuser, sparkSession.baseRelationToDataFrame, baseRelation)
}
Driver Side
Stratio Solution
get user from dataset
options
Method for load data from
sources without path
obtaining baseRelation
from datasource
* Interface used to write a [[Dataset]] from external storage systems (e.g. file systems,
...
class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
...
/**
* Saves the content of the [[DataFrame]] as the specified table.
...
def save(): Unit = {
assertNotBucketed("save")
...
val maybeUser = extraOptions.get("user")
def innerWrite(modeData: (SaveMode, DataFrame)): Unit = {
val (mode, data) = modeData
dataSource.write(mode, data)
}
if (maybeUser.isDefined) KerberosUser.setProxyUser(maybeUser.get)
KerberosFunction.executeSecure(maybeUser, innerWrite, (mode, df))
}
Driver Side
Stratio Solution
get user from dataset
options
Method for save data in
external sources
Wrapping save
execution
class DAGScheduler(...){
…
…
KerberosUser.getMaybeUser match {
case Some(user) => properties.setProperty("user", user)
case _ =>
}
...
val tasks: Seq[Task[_]] = try {
stage match {
case stage: ShuffleMapStage =>
partitionsToCompute.map { id =>
...
new ShuffleMapTask(stage.id, stage.latestInfo.attemptId,
taskBinary, part, locs, stage.latestInfo.taskMetrics, properties)
...
new ResultTask(stage.id, stage.latestInfo.attemptId,
taskBinary, part, locs, id, properties, stage.latestInfo.taskMetrics)
}
...
Driver Side
Stratio Solution
private[spark] abstract class Task[T](
val stageId: Int,
val stageAttemptId: Int,
val partitionId: Int,
// The default value is only used in tests.
val metrics: TaskMetrics = TaskMetrics.registered,
@transient var localProperties: Properties = new Properties) extends Serializable {
...
final def run(
...
try {
val proxyUser =
Option(Executor.taskDeserializationProps.get().getProperty("user"))
KerberosFunction.executeSecure(proxyUser, runTask, context)
} catch {
…
def runTask(context: TaskContext): T
Executor Side
Stratio Solution
properties load in
Driver side
get proxy user and
wrapped execution
method implemented by
Task subclasses
Demo time
Demo time
• Merge this code in Apache Spark (SPARK-16788)
• Pluggable authorization
• Pluggable secret management (why always use Hadoop delegation tokens?)
• Distributed cache.
• ...
Next Steps
Next Steps
Q & A
Q & A
people@stratio.com
WE ARE HIRING
@StratioBD
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla

Weitere ähnliche Inhalte

Was ist angesagt?

Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Spark Summit
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceMongoDB
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Databricks
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandFrançois Garillot
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsBen Laird
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Sparkrhatr
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Spark Summit
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 
Frustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFramesFrustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFramesIlya Ganelin
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaDatabricks
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonDatabricks
 
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...DataStax Academy
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Databricks
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Spark Summit
 

Was ist angesagt? (20)

Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Sum...
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
Frustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFramesFrustration-Reduced PySpark: Data engineering with DataFrames
Frustration-Reduced PySpark: Data engineering with DataFrames
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
 

Andere mochten auch

What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...Spark Summit
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Spark Summit
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Spark Summit
 
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Spark Summit
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Spark Summit
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Spark Summit
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark Summit
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Spark Summit
 
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...Spark Summit
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteSpark Summit
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Spark Summit
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Spark Summit
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Spark Summit
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...Spark Summit
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Spark Summit
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...Spark Summit
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Summit
 
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas PatilCustom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas PatilSpark Summit
 

Andere mochten auch (20)

What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
Analysis Andromeda Galaxy Data Using Spark: Spark Summit East Talk by Jose Na...
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
 
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
 
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming: Spar...
 
Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...Realtime Analytical Query Processing and Predictive Model Building on High Di...
Realtime Analytical Query Processing and Predictive Model Building on High Di...
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas PatilCustom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
Custom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil
 

Ähnlich wie Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla

Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastJorge Lopez-Malla
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark DownscalingDatabricks
 
Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overviewjessesanford
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
 
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...Codemotion
 
Event Sourcing - what could possibly go wrong?
Event Sourcing - what could possibly go wrong?Event Sourcing - what could possibly go wrong?
Event Sourcing - what could possibly go wrong?Andrzej Ludwikowski
 
How to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinSigma Software
 
Fun Teaching MongoDB New Tricks
Fun Teaching MongoDB New TricksFun Teaching MongoDB New Tricks
Fun Teaching MongoDB New TricksMongoDB
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Codemotion
 
Docker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline ExecutionDocker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline ExecutionBrennan Saeta
 
PuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into OperationsPuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into Operationsgrim_radical
 
Microservices and modularity with java
Microservices and modularity with javaMicroservices and modularity with java
Microservices and modularity with javaDPC Consulting Ltd
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Аліна Шепшелей
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...Inhacking
 
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)lennartkats
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle ManagementDoKC
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle ManagementDoKC
 
Building a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkBuilding a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkLuciano Mammino
 

Ähnlich wie Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla (20)

Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit east
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overview
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
Andrzej Ludwikowski - Event Sourcing - what could possibly go wrong? - Codemo...
 
Event Sourcing - what could possibly go wrong?
Event Sourcing - what could possibly go wrong?Event Sourcing - what could possibly go wrong?
Event Sourcing - what could possibly go wrong?
 
How to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita GalkinHow to make a high-quality Node.js app, Nikita Galkin
How to make a high-quality Node.js app, Nikita Galkin
 
Fun Teaching MongoDB New Tricks
Fun Teaching MongoDB New TricksFun Teaching MongoDB New Tricks
Fun Teaching MongoDB New Tricks
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
Jörg Schad - Hybrid Cloud (Kubernetes, Spark, HDFS, …)-as-a-Service - Codemot...
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
Docker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline ExecutionDocker & ECS: Secure Nearline Execution
Docker & ECS: Secure Nearline Execution
 
PuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into OperationsPuppetDB: Sneaking Clojure into Operations
PuppetDB: Sneaking Clojure into Operations
 
Microservices and modularity with java
Microservices and modularity with javaMicroservices and modularity with java
Microservices and modularity with java
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
 
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle Management
 
Operator Lifecycle Management
Operator Lifecycle ManagementOperator Lifecycle Management
Operator Lifecycle Management
 
Building a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless frameworkBuilding a serverless company on AWS lambda and Serverless framework
Building a serverless company on AWS lambda and Serverless framework
 

Mehr von Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Mehr von Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Kürzlich hochgeladen

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 

Kürzlich hochgeladen (20)

Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla

  • 2. Jorge López-Malla Matute INDEX jlopezm@stratio.com @jorgelopezmalla Abel Rincón Matarranz arincon@stratio.com @ganger85 Kerberos ● Introduction ● Key concepts ● Workflow ● Impersonation 1 3 Use Case ● Definition ● Workflow ● Crossdata in production 2 4Stratio Solution ● Prerequirement ● Driver side ● Executor Side ● Final result Demo time ● Demo ● Q&A
  • 3. Presentation Presentation JORGE LÓPEZ-MALLA After working with traditional processing methods, I started to do some R&S Big Data projects and I fell in love with the Big Data world. Currently i’m doing some awesome Big Data projects and tools at Stratio. SKILLS
  • 5. Our company Presentation Stratio accompanies businesses on their journey through complete Digital Transformation. Through hard work and creativity, our ambition is to reinvent companies around their data so they can compete in a changed world. Our solution is a transformational product that uses third generation Big Data technologies to execute the most comprehensive form of Digital Transformation, ensuring scalability, maximum flexibility and adaptation to new markets. If you have any questions or would like a demonstration of our product, get in touch: contact@stratio.com
  • 8. Kerberos Kerberos • What is Kerberos ○ Authentication protocol / standard / service ■ Safe ■ Single-sign-on ■ Trust based ■ Mutual authentication
  • 9. Kerberos key concepts Kerberos • Client/Server → Do you need an explanation??? • Principal → Identify a unique client or service • Realm → Identify a environment, company, domain … ○ DEMO.EAST.SUMMIT.SPARK.ORG • KDC → Actor who manages the tickets • TGT → Ticket which has the client session • TGS → Ticket which has the client-service session
  • 10. Kerberos Workflow Kerberos 1. Client retrieve principal and secret 2. Client performs a TGT request 3. KDC returns TGT 4. Client request a TGS with the TGT 5. KDC returns the TGS 6. Client request a service session using the TGS 7. Service establish a secure connection directly with the client
  • 11. Kerberos workflow 2 Kerberos Client Service Backend AS / KDC Principal → User1 TGT (user1) TGS → user1-service1 (tgsUS) Principal → Service1 TGT (Service1) TGS → service1-backend1 (tgsSB) Principal → backend1 TGT (backend1) tgsUS tgsSB user1 service1
  • 12. Kerberos workflow - Impersonation Kerberos Client Service Backend AS / KDC Principal → User1 TGT (user1) TGS → user1-service1 (tgsUS) Principal → Service1 TGT (Service1) TGS → service1-backend1 (tgsSB) Principal → backend1 TGT (backend1) tgsUS tgsSB user1 service1user1
  • 14. Use Case • Stratio Crossdata is a distributed framework and a fast and general-purpose computing system powered by Apache Spark • Can be used both as library and as a server. • Crossdata Server: Provides a multi-user environment to SparkSQL, giving a reliable architecture with high-availability and scalability out of the box • To do so it use both native queries and Spark • Crossdata Server had a unique long time SparkContext to execute all its Sparks queries • Crossdata can use YARN, Mesos and Standalone as a resource manager Use Case
  • 15. Crossdata as Server sql> select * from table1 Crossdata shell Master Worker-1 Worker-2 Executor-0 Executor-1 Task-0 Task-1 HDFS Crossdata server (Spark Driver) Crossdata server sql> select * from table1 ------------------------- |id | name | ------------------------- |1 | John Doe | Use Case Kerbe ros
  • 16. • Projects in production needs runtime impersonation to be compliance the AAA(Authorization, Authentication and Audit) at the storage. • Crossdata allows several users per execution • Neither of the Sparks resource managers allows us to impersonate in runtime. • Evenmore Standalone as resource manager does not provide any Kerberos feature. Crossdata in production Use Case
  • 18. Prerequirement Stratio solution • Keytab have to be accessible in all the cluster machines • Keytab must provide proxy grants • Hadoop client configuration located in the cluster • Each user, both proxy and real, must have a home in HDFS
  • 19. Introduction Stratio solution • Spark access to the storage system both in the Driver and the Executors. • In the Driver side both Spark Core and SparkSQL will access to the storage system. • Executors will always access via Task. • As Streaming use the same classes than SparkCore or SparkSQL the same solution will be usable by Streaming jobs
  • 20. KerberosUser (Utils) object KerberosUser extends Logging with UserCache { def setProxyUser(user: String): Unit = proxyUser = Option(user) def getUserByName(name: Option[String]): Option[UserGroupInformation] = { if (getConfiguration.isDefined) { userFromKeyTab(name) } else None } private def userFromKeyTab(proxyUser: Option[String]): Option[UserGroupInformation] = { if (realUser.isDefined) realUser.get.checkTGTAndReloginFromKeytab() (realUser, proxyUser) match { case (Some(_), Some(proxy)) => users.get(proxy).orElse(loginProxyUser(proxy)) case (Some(_), None) => realUser case (None, None) => None } } private lazy val getConfiguration: Option[(String, String)] = { val principal = env.conf.getOption("spark.executor.kerberos.principal") val keytab = env.conf.getOption("spark.executor.kerberos.keytab") (principal, keytab) match { case (Some(p), Some(k)) => Option(p, k) case _ => None } } Configuration setting proxy user (Global) Choose between real or proxy user Stratio solution Public Method retrieve user
  • 21. Wrappers (Utils) def executeSecure[U, T](proxyUser: Option[String], funct: (U => T), inputParameters: U): T = { KerberosUser.getUserByName(proxyUser) match { case Some(user) => { user.doAs(new PrivilegedExceptionAction[T]() { @throws(classOf[Exception]) def run: T = { funct(inputParameters) } }) } case None => { funct(inputParameters) } } } def executeSecure[T](exe: ExecutionWrp[T]): T = { KerberosUser.getUser match { case Some(user) => { user.doAs(new PrivilegedExceptionAction[T]() { @throws(classOf[Exception]) def run: T = { exe.value } }) } case None => exe.value } } class ExecutionWrp[T](wrp: => T) { lazy val value: T = wrp } Stratio solution
  • 22. Driver Side Stratio Solution abstract class RDD[T: ClassTag]( @transient private var _sc: SparkContext, @transient private var deps: Seq[Dependency[_]] ) extends Serializable with Logging { ... /** * Get the array of partitions of this RDD, taking into account whether the * RDD is checkpointed or not. */ final def partitions: Array[Partition] = { checkpointRDD.map(_.partitions).getOrElse { if (partitions_ == null) { partitions_ = KerberosFunction.executeSecure(new ExecutionWrp(getPartitions)) partitions_.zipWithIndex.foreach { case (partition, index) => require(partition.index == index, s"partitions($index).partition == ${partition.index}, but it should equal $index") } } partitions_ } } Wrapping parameterless method
  • 23. class PairRDDFunctions[K, V](self: RDD[(K, V)]) (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null) ... def saveAsHadoopDataset(conf: JobConf): Unit = self.withScope { // Rename this as hadoopConf internally to avoid shadowing (see SPARK-2038). val internalSave: (JobConf => Unit) = (conf: JobConf) => { val hadoopConf = conf val outputFormatInstance = hadoopConf.getOutputFormat val keyClass = hadoopConf.getOutputKeyClass val valueClass = hadoopConf.getOutputValueClass ... val writeToFile = (context: TaskContext, iter: Iterator[(K, V)]) => { …. } self.context.runJob(self, writeToFile) writer.commitJob() } KerberosFunction.executeSecure(internalSave, conf) } Driver Side Stratio Solution Hadoop Datastore RDD save function Inside wrapper function that will run in the cluster Kerberos authentified save function
  • 24. Driver Side class InMemoryCatalog( conf: SparkConf = new SparkConf, hadoopConfig: Configuration = new Configuration) override def createDatabase( dbDefinition: CatalogDatabase, ignoreIfExists: Boolean): Unit = synchronized { def inner: Unit = { ... val location = new Path(dbDefinition.locationUri) val fs = location.getFileSystem(hadoopConfig) fs.mkdirs(location) } catch { case e: IOException => throw new SparkException(s"Unable to create database ${dbDefinition.name} as failed " + s"to create its directory ${dbDefinition.locationUri}", e) } catalog.put(dbDefinition.name, new DatabaseDesc(dbDefinition)) } } KerberosFunction.executeSecure(KerberosUser.principal, new ExecutionWrp(inner)) } Stratio Solution Spark create a directory in HDFS
  • 25. * Interface used to load a [[Dataset]] from external storage systems (e.g. file systems, ... class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { ... def load(): DataFrame = { load(Seq.empty: _*) // force invocation of `load(...varargs...)` } ... def load(paths: String*): DataFrame = { val proxyuser = extraOptions.get("user") if (proxyuser.isDefined) KerberosUser.setProxyUser(proxyuser.get) val dataSource = KerberosFunction.executeSecure(proxyuser, DataSource.apply, sparkSession, source, paths, userSpecifiedSchema, Seq.empty, None, extraOptions.toMap) val baseRelation = KerberosFunction.executeSecure(proxyuser, dataSource.resolveRelation, false) KerberosFunction.executeSecure(proxyuser, sparkSession.baseRelationToDataFrame, baseRelation) } Driver Side Stratio Solution get user from dataset options Method for load data from sources without path obtaining baseRelation from datasource
  • 26. * Interface used to write a [[Dataset]] from external storage systems (e.g. file systems, ... class DataFrameWriter[T] private[sql](ds: Dataset[T]) { ... /** * Saves the content of the [[DataFrame]] as the specified table. ... def save(): Unit = { assertNotBucketed("save") ... val maybeUser = extraOptions.get("user") def innerWrite(modeData: (SaveMode, DataFrame)): Unit = { val (mode, data) = modeData dataSource.write(mode, data) } if (maybeUser.isDefined) KerberosUser.setProxyUser(maybeUser.get) KerberosFunction.executeSecure(maybeUser, innerWrite, (mode, df)) } Driver Side Stratio Solution get user from dataset options Method for save data in external sources Wrapping save execution
  • 27. class DAGScheduler(...){ … … KerberosUser.getMaybeUser match { case Some(user) => properties.setProperty("user", user) case _ => } ... val tasks: Seq[Task[_]] = try { stage match { case stage: ShuffleMapStage => partitionsToCompute.map { id => ... new ShuffleMapTask(stage.id, stage.latestInfo.attemptId, taskBinary, part, locs, stage.latestInfo.taskMetrics, properties) ... new ResultTask(stage.id, stage.latestInfo.attemptId, taskBinary, part, locs, id, properties, stage.latestInfo.taskMetrics) } ... Driver Side Stratio Solution
  • 28. private[spark] abstract class Task[T]( val stageId: Int, val stageAttemptId: Int, val partitionId: Int, // The default value is only used in tests. val metrics: TaskMetrics = TaskMetrics.registered, @transient var localProperties: Properties = new Properties) extends Serializable { ... final def run( ... try { val proxyUser = Option(Executor.taskDeserializationProps.get().getProperty("user")) KerberosFunction.executeSecure(proxyUser, runTask, context) } catch { … def runTask(context: TaskContext): T Executor Side Stratio Solution properties load in Driver side get proxy user and wrapped execution method implemented by Task subclasses
  • 30. • Merge this code in Apache Spark (SPARK-16788) • Pluggable authorization • Pluggable secret management (why always use Hadoop delegation tokens?) • Distributed cache. • ... Next Steps Next Steps
  • 31. Q & A Q & A