Performance in Spark 2.0, PDX Spark Meetup 8/18/16

Copyright © 2016, Oracle and/or its affiliates. All rights reserved.
Performance in Spark 2.0
…and onward to the next level
Brad Carlile
Sr. Director, Strategic Applications Engineering (SAE)
Oracle Systems Group
August 18, 2016

Additional info on system performance results:
http://blogs.oracle.com/bestperf

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.

About me
Sr. Director Strategic Applications Engineering, Oracle Systems Group
•  Oracle Hardware Systems Performance & benchmarks: x86 & SPARC products
•  30 years parallel programming, performance optimization & benchmarks: attached processors, Hypercube, MPPs, SMPs, NUMA…
•  Floating Point Systems, Cray, Sun, Oracle
•  Skier
•  Traveler
•  Big Wall Climber
•  I Drive an Art Car

Spark is Exciting – But You Already Know That!
Spark’s Whole Ecosystem totally appeals to the performance expert in me!
•  Great Ecosystem for Analytics
– Spark forms a continuum with other technologies (Kafka, Solr, …)
•  Spark is fast & scalable because of clean design
– Spark’s in-memory focus
•  Spark speaks SQL & DataFrames
– Perfect marriage for Data Scientists & Hardware Acceleration
– Oracle’s Software in Silicon can be used on Apache Spark
•  Lots of Oracle customers using Apache Spark

Apache Spark
Spark API: Scala, Python, R, Java
•  Spark SQL (SQL & DSL: Scala/Java/Python…), DataFrames / Datasets
•  Spark Streaming (streaming analytics, micro-batches)
•  Spark MLlib (machine learning & statistics routines)
•  GraphX (graph)
Spark Core (implemented in Scala on the JVM, operates on RDDs)
Data Sources: JSON, CSV, Hadoop, Cassandra, Hive, HBase, Postgres, MySQL, Elasticsearch, ...

Apache Spark 2.0.0 Great New Features
Best new features - Starting with Spark SQL
•  Spark SQL
– SQL 2003 and Unified DataFrames/Datasets API.
– Tungsten Phase 2: Whole-Stage CodeGen
•  Spark MLlib & GraphX – Large Scale Machine Learning on Apache Spark
•  Structured Streaming

SQL is a Powerful Feature of Apache Spark
•  SQL is a powerful language for a wide range of Data Scientists
– Expresses set operations on data of any size: sort, filter, manipulate, etc.
•  SQL concisely expresses data manipulation at scale in a readable way
– ETL (Extract, Transform, and Load)
– Feature selection for ML
– Feature creation/generation for ML
– Report generation
•  Many well-known techniques to efficiently optimize SQL
– Apache Spark “Catalyst optimizer” reorganizes the query for fastest execution
– Extensible: you can contribute your own optimizations

Spark SQL & DataFrames: Perfect for Data Scientists
Amazing work which also efficiently puts Data In Memory for fast access
•  SQL is the easiest way to write code to filter, merge, scan, …
–  scala> sparkSession.sql("""SELECT firstName, age, gender
FROM cust
WHERE age > 20
ORDER BY age DESC""").show()
•  DataFrames are columnar and typed // schema: firstName:String, lastName:String, gender:String, age:Int
– DataFrames are better for analytics than generic RDDs
RDD: row-based, original Apache Spark data structure
Jay,Lock,M,81
Rosa,Ruiz,F,14
Clair,Bride,F,23
DataFrame: columnar
Jay Lock M 81
Rosa Ruiz F 14
Clair Bride F 23

Columns: what is in them? Why not just row-store?
Different characteristics of a data point are stored in different columns
•  Each customer data point, event, etc. has many kinds of possible data
•  A Spark schema defines the columns and their types;
there can be tens to hundreds of columns (very wide tables)
Name  Addr    St  City  Zip    Age  Id’ed Gender  Edu  Sal  Marital  Loyalty plan  Cust Year  Buy Freq  Fav Devices  Com mode  …
Duma  6 nw 7  Or  PDX   97223  25   F             MA   60k  M        Lev2b         5.7        .2        iPh          txt
import org.apache.spark.sql.types._
val loSchema = StructType(
StructField("lo_orderkey", IntegerType, true) ::
StructField("lo_linenumber", IntegerType, true) ::
StructField("lo_custkey", IntegerType, true) ::
StructField("lo_partkey", IntegerType, true) ::
StructField("lo_suppkey", IntegerType, true) ::
StructField("lo_orderdate", StringType, true) ::
StructField("lo_orderpriority", StringType, true) ::
StructField("lo_shippriority", StringType, true) ::
StructField("lo_quantity", IntegerType, true) ::
StructField("lo_extprice", IntegerType, true) ::
StructField("lo_ordtotalprice", IntegerType, true) ::
StructField("lo_discount", IntegerType, true) ::
StructField("lo_revenue", IntegerType, true) ::
StructField("lo_supplycost", IntegerType, true) ::
StructField("lo_tax", IntegerType, true) ::
StructField("lo_commitdate", IntegerType, true) ::
StructField("lo_shipmode", StringType, true) ::
StructField("lo_ordermode", StringType, true) ::
StructField("lo_webtime", FloatType, true) ::
StructField("lo_shopcarttime", FloatType, true) ::
. . .
StructField("lo_loyaltyPercnt", FloatType, true) :: Nil)
An analysis may only need to explore a small subset of these columns.
It is much faster to access only the needed columns, and the server benefits in many other ways too!
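To make the row-versus-column point concrete, here is a minimal sketch in plain Scala (no Spark required). It reuses the three sample customers from the earlier slide; the object and method names are hypothetical, and real Spark storage is far more elaborate than an array per column.

```scala
object ColumnLayoutSketch {
  // Row layout: one record per customer, all fields stored together.
  final case class Cust(firstName: String, lastName: String, gender: String, age: Int)

  val rows: Vector[Cust] = Vector(
    Cust("Jay", "Lock", "M", 81),
    Cust("Rosa", "Ruiz", "F", 14),
    Cust("Clair", "Bride", "F", 23))

  // Column layout: one contiguous array per field (only `age` shown here).
  val ages: Array[Int] = rows.map(_.age).toArray

  // Row scan: walks every record, touching all of its fields, to read one value.
  def countOlderThanRows(minAge: Int): Int = rows.count(_.age > minAge)

  // Column scan: reads only the `age` array, so far less data moves through memory.
  def countOlderThanColumn(minAge: Int): Int = ages.count(_ > minAge)
}
```

Both methods return the same answer; the difference is how many bytes must be moved to get it, which is the theme of the rest of the talk.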

OLD (Spark 1.x)
import org.apache.spark.{SparkConf, SparkContext}

case class Customer(c_custkey: Int, c_name: String,
  c_address: String, c_city: String, c_phone: String,
  c_mktsegment: String)
...
object RCDBTestProgram {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf()
      .setAppName("RCDB")
    val sc = new SparkContext(sparkConf)
    val sqlContext =
      new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._ // enables .toDF() on an RDD of case classes
    ...
    val dataDir = "file:/Users/bc/datasets/R_SF1000/"
    // Parse each ';'-separated line into the six Customer fields
    val df1 = sc.textFile(dataDir + "customer.csv")
      .map(_.split(";"))
      .map(p => Customer(p(0).trim.toInt, p(1).trim,
        p(2).trim, p(3).trim, p(4).trim, p(5).trim))
      .toDF()
    df1.registerTempTable("customer")
    ...
    val query = """SELECT COUNT(*) FROM customer"""
    val count = sqlContext.sql(query).take(1)
New (Spark 2.0)
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.apache.spark.storage.StorageLevel

object RCDBTestProgram {
  def main(args: Array[String]) {
    val sparkSession = SparkSession.builder
      .appName("RCDB").getOrCreate()
    ...
    val dataDir = "file:///Users/bc/datasets/R_SF1000/"
    val custSchema = StructType(
      StructField("c_custkey", IntegerType, true) ::
      StructField("c_name", StringType, true) ::
      StructField("c_address", StringType, true) ::
      StructField("c_city", StringType, true) ::
      StructField("c_phone", StringType, true) ::
      StructField("c_mktsegment", StringType, true) :: Nil)
    // The CSV reader applies the schema directly; no case class or manual parsing
    val df3 = sparkSession.read.option("sep", ";")
      .option("header", "false").schema(custSchema)
      .csv(dataDir + "customer.csv").toDF().repartition(256)
    df3.createOrReplaceTempView("customer")
    df3.persist(StorageLevel.OFF_HEAP)
    ...
    val query = """SELECT COUNT(*) FROM customer"""
    val count = sparkSession.sql(query).take(1)
    ...
    sparkSession.sql("SHOW TABLES").show()
    sparkSession.catalog.listTables.show()
Apache Spark 2.0.0
New SparkSession replacing the SQLContext
https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html

SQL and DSL Both Optimized by Catalyst Optimizer
Ex: “Select all books by authors born after 1980 named ‘Paulo’ from books & authors”

SQL
SELECT *
FROM author a
JOIN book b ON a.id = b.author_id
WHERE a.year_of_birth > 1980
AND a.first_name = 'Paulo'
ORDER BY b.title
Query DSL (Domain Specific Language)
val joinDF =
author.as("a")
.join(book.as("b"), $"a.id" === $"b.author_id")
.filter($"a.year_of_birth" > 1980)
.filter($"a.first_name" === "Paulo")
.orderBy($"b.title")
Catalyst Optimizer
Optimized Plan
(can I reorder various operations?)

Apache Spark SQL: Catalyst Optimizes SQL/DSL Execution
Qbet: SELECT count(*)
FROM lineorder
WHERE lo_quantity BETWEEN 10 and 20
You can look at the optimized plan with an explain plan, created by using:
sparkSession.sql(query).explain()
*HashAggregate(keys=[], functions=[count(1)])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_count(1)])
      +- *Project
         +- *Filter ((isnotnull(lo_quantity#89) && (lo_quantity#89 >= 10)) && (lo_quantity#89 <= 20))
            +- InMemoryTableScan [lo_quantity#89], [isnotnull(lo_quantity#89), (lo_quantity#89 >= 10), (lo_quantity#89 <= 20)]
               :  +- InMemoryRelation [lo_orderkey#81, lo_linenumber#82, lo_custkey#83, lo_partkey#84, lo_suppkey#85, lo_orderdate#86,
                     lo_orderpriority#87, lo_shippriority#88, lo_quantity#89, lo_extendedprice#90, lo_ordtotalprice#91, lo_discount#92,
                     lo_revenue#93, lo_supplycost#94, lo_tax#95, lo_commitdate#96, lo_shipmode#97],
                     false, 16384, StorageLevel(disk, memory, offheap, 1 replicas)
               :     :  +- Exchange RoundRobinPartitioning(16)
               :     :     +- *Scan csv [lo_orderkey#81,lo_linenumber#82,lo_custkey#83,lo_partkey#84,lo_suppkey#85,lo_orderdate#86,
                              lo_orderpriority#87,lo_shippriority#88,lo_quantity#89,lo_extendedprice#90,lo_ordtotalprice#91,lo_discount#92,
                              lo_revenue#93, lo_supplycost#94,lo_tax#95,lo_commitdate#96,lo_shipmode#97] Format: CSV,
                              InputPaths: file:/Users/bradcarlile/datasets/RCDB_SF1/lineorder-new.csv,
                              PushedFilters: [],
                              ReadSchema: struct<lo_orderkey:int,lo_linenumber:int,lo_custkey:int,lo_partkey:int,lo_suppkey:int,lo_orderdat...

SQL and DSL Both Optimized by Catalyst Optimizer
The same SQL and Query DSL example passes through the Catalyst Optimizer to an Optimized Plan, and then through Whole-Stage CodeGen (new in Spark 2.0)

Spark 2.0: A Big Performance Increase with Tungsten Phase 2
SELECT count(*) FROM lineorder WHERE lo_quantity BETWEEN 10 and 20
1.  Volcano Iterator model: SQL plan interpretation
–  Open;
–  Next data element;
–  perform predicate;
–  Close
–  …Iterate
2.  “College Freshman” Java code: Tungsten Whole-Stage CodeGen pipelined code
–  for (lo_quantity in lineorder) {
–  if (lo_quantity >= 10 and lo_quantity <= 20)
–  { count += 1 }
–  }
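The contrast between the two execution models can be sketched in a few lines of plain Scala. This is an illustration only, not Spark's actual operator interfaces: `Operator`, `Scan`, `Filter`, and both count functions are hypothetical names, and the column is a simple `Array[Int]`.

```scala
object ScanModels {
  // Volcano-style operators: each one pulls rows from its child, one call per row.
  trait Operator { def next(): Option[Int] }

  final class Scan(data: Array[Int]) extends Operator {
    private var i = 0
    def next(): Option[Int] =
      if (i < data.length) { val v = data(i); i += 1; Some(v) } else None
  }

  final class Filter(child: Operator, pred: Int => Boolean) extends Operator {
    def next(): Option[Int] = {
      var row = child.next()
      // Keep pulling until a row passes the predicate or the child is exhausted.
      while (row.exists(v => !pred(v))) row = child.next()
      row
    }
  }

  // Interpreted plan: Filter over Scan, counted one next() call at a time.
  def countVolcano(data: Array[Int], lo: Int, hi: Int): Long = {
    val plan = new Filter(new Scan(data), v => v >= lo && v <= hi)
    var count = 0L
    while (plan.next().isDefined) count += 1
    count
  }

  // "College freshman" version: the whole pipeline fused into one tight loop,
  // which is the shape of code that whole-stage code generation aims for.
  def countFused(data: Array[Int], lo: Int, hi: Int): Long = {
    var count = 0L
    var i = 0
    while (i < data.length) {
      val v = data(i)
      if (v >= lo && v <= hi) count += 1
      i += 1
    }
    count
  }
}
```

Both functions compute the same BETWEEN count; the fused loop avoids the per-row virtual calls and boxing of the iterator version.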

Sample Whole-Stage CodeGen
val query = "SELECT count(*) FROM lineorder WHERE lo_quantity BETWEEN 10 and 20"
val codeDf = sparkSession.sql("explain codegen " + query)
codeDf.show(false)
Subtree 1/2 Generated code:
private void agg_doAggregateWithoutKey() throws java.io.IOException {
  // initialize aggregation buffer
  agg_bufIsNull = false;
  agg_bufValue = 0L;

  while (inputadapter_input.hasNext()) {
    InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    long inputadapter_value = inputadapter_row.getLong(0);

    // do aggregate
    // common sub-expressions
    // evaluate aggregate function
    boolean agg_isNull3 = false;

    long agg_value3 = -1L;
    agg_value3 = agg_bufValue + inputadapter_value;
    // update aggregation buffer
    agg_bufValue = agg_value3;
    if (shouldStop()) return;
  }
}
protected void processNext() throws java.io.IOException {
  while (!agg_initAgg) {
    agg_initAgg = true;
    long agg_beforeAgg = System.nanoTime();
    agg_doAggregateWithoutKey();
    agg_aggTime.add((System.nanoTime() - agg_beforeAgg) / 1000000);

    // output the result
    agg_numOutputRows.add(1);
    agg_rowWriter.zeroOutNullBytes();

    if (agg_bufIsNull) {
      agg_rowWriter.setNullAt(0);
    } else {
      agg_rowWriter.write(0, agg_bufValue);
    }
    append(agg_result); } } }
Subtree 2/2 Generated code:
private void agg_doAggregateWithoutKey() throws java.io.IOException {
  // initialize aggregation buffer
  agg_bufValue = 0L;

  while (inputadapter_input.hasNext()) {
    InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    // do aggregate
    // common sub-expressions
    // evaluate aggregate function
    boolean agg_isNull1 = false;

    long agg_value1 = -1L;
    agg_value1 = agg_bufValue + 1L;
    // update aggregation buffer
    agg_bufValue = agg_value1;
    if (shouldStop()) return;
  }
}
protected void processNext() throws java.io.IOException {
  while (!agg_initAgg) {
    agg_initAgg = true;
    long agg_beforeAgg = System.nanoTime();
    agg_doAggregateWithoutKey();
    agg_aggTime.add((System.nanoTime() - agg_beforeAgg) / 1000000);

    // output the result
    agg_numOutputRows.add(1);
    agg_rowWriter.zeroOutNullBytes();

    if (agg_bufIsNull) {
      agg_rowWriter.setNullAt(0);
    } else {
      agg_rowWriter.write(0, agg_bufValue);
    }
    append(agg_result); } } }

The Basic Analytics Flow
A lot of time spent in Data Munging

•  Data can come from many sources
– Databases, NoSQL, csv, feeds…
•  We need to prepare it
– Data Munging of all sorts!
•  Analyze the data
– Find the “right way” to analyze it
• ML, Graph, SQL…
Diagram: Databases; NoSQL, Search, …; Streaming (Kafka, Storm, …) -> Data Munging -> Analytics -> RESULTS

Continuous Analytics Cycle
Results Often Enrich Transactions
Analytics is more than one pipeline stream
•  Continuous iterations around the data analytics wheel
– Save, catalog, and re-use all things: data, SQL, code, and analytics
•  In-memory advantages at each stage
– SPARC’s DAX & leading bandwidth is key
•  Many sources of data
– Internal proprietary, public data, external streaming, archives
All modern apps (Uber, Netflix, Amazon, FB, …) are enhancing transactions with real-time analytics
Diagram labels (analytics wheel): Streaming (Kafka, Storm, …); Databases; NoSQL; ETL (SQL); Feature Extract, Generate & Transform (SQL); In-Memory ML & Graph (FP); Result Delivery (SQL); Reports; Ad Hoc; Enhance Transactions

Selecting Features & Generating Features
Kaggle Titanic Challenge: SQL & Scala

SQL
sqlContext.udf.register("newTitle", (name: String) => titles.getOrElse(name, "Other"))
sqlContext.udf.register("toString", (n: Int) => n.toString)
val avgAge = dataDFRaw.select("Age").agg(avg("Age")).first().getDouble(0)
val avgFare = dataDFRaw.select("Fare").agg(avg("Fare")).first().getDouble(0)

val query = s"""SELECT
PassengerId,
toString(Survived) AS SurvivedString,
Pclass,
Name,
newTitle(regexp_extract(Name, ".*, (.*?)..*",1)) AS Title,
Sex,
NVL(Age,$avgAge) AS Age,
IF(SibSp + Parch > 3, 1, 0) AS WithFamily,
NVL(Fare,$avgFare) AS Fare,
NVL(Embarked,'S') AS Embarked
FROM dataDFRaw"""
…Same with Scala
def preprocess(data: DataFrame, sqlContext: SQLContext, train: Boolean): DataFrame = {
  var dataTrain = data
  val avgAge = dataTrain.select(mean("Age")).first()(0).asInstanceOf[Double]
  println("avgAge =" + avgAge)

  val avgFare = dataTrain.select(mean("Fare")).first()(0).asInstanceOf[Double]

  // 1.0 when the passenger travels with more than three relatives
  val withFamily = sqlContext.udf.register("withFamily", (sib: Int, par: Int) => {
    if (sib + par > 3) 1.0 else 0.0 })

  // Ages that were filled with the average are replaced by 14.1 for "Master." and "Miss."
  val fillAge = sqlContext.udf.register("fillAge", (age: Double, title: String) => {
    if (age == avgAge && (title == "Master." || title == "Miss.")) 14.1 else age
  })
  val addChild = sqlContext.udf.register("addChild", (sex: String, age: Double) => {
    if (age < 15)
      if (sex == "male") "mChild" else "fChild"
    else sex })
  val toDouble = sqlContext.udf.register("toDouble", (n: Int) => n.toDouble)

  dataTrain = dataTrain.withColumn("Title", findTitle(dataTrain("Name")))
  dataTrain = dataTrain.na.fill(avgAge, Seq("Age"))
  dataTrain = dataTrain.withColumn("fixAge", fillAge(dataTrain("Age"), dataTrain("Title")))
  dataTrain = dataTrain.withColumn("Pclass", toDouble(dataTrain("Pclass")))
  dataTrain = dataTrain.na.fill(avgFare, Seq("Fare"))
  dataTrain = dataTrain
    .withColumn("withFamily", withFamily(dataTrain("SibSp"), dataTrain("Parch")))
  dataTrain.withColumn("sexMod", addChild(dataTrain("Sex"), dataTrain("Age"))) }

ETL (Extract, Transform, Load) & Feature Creation
val query = """SELECT x,
year (x) AS Year,
month (x) AS Month,
dayofmonth(x) AS DoM,
dayofyear(x) AS DoY,
date_format(x,"EEEE") AS longDoW,
date_format(x,"EE") AS shortDoW,
date_format(x,"u") AS DoW,
IF (date_format(x,"u") < 6 , 0, 1) AS weekendFlag,
IF (date_format(x,"u") < 6, datediff(next_day(x, "Sat"),x), 0) AS daystoWeekend,
floor(months_between(current_date(),x)/12) AS curAge,
IF (dayofyear(CONCAT(year(x),"-12-31")) > 365, 1, 0) AS leapYearFlag,
IF (month(x) = 12 AND dayofmonth(x) > 25, months_between(CONCAT(year(x)+1,"-12-25"),x),
months_between(CONCAT(year(x),"-12-25"),x)) AS monthstoXmas,
IF (month(x) = 12 AND dayofmonth(x) > 25, datediff(CONCAT(year(x)+1,"-12-25"),x),
datediff(CONCAT(year(x),"-12-25"),x)) AS daystoXmas,
IF ((datediff(CONCAT(year(x),"-12-25"), x) <= 14) AND (datediff(CONCAT(year(x),"-12-25"), x) >= 0),1, 0) AS x14dayBefore,
quarter(add_months(x,-2)) AS Season,
quarter(x) AS Qtr
FROM inputDate""".trim()
You can use SQL to create new features from data; for example, generating time features

Apache Spark 2.0.0 Great New Features
Best new features - MLlib & Structured Streaming
•  Spark SQL
– SQL 2003 and Unified DataFrames/Datasets API.
– Tungsten Phase 2: Whole-Stage CodeGen
•  Spark MLlib & GraphX – Large Scale Machine Learning on Apache Spark
– DataFrame API is the primary API for MLlib (RDD mode in maintenance)
– ML persistence: can create a model, then save and redeploy it
•  https://databricks.com/blog/2016/05/31
•  Structured Streaming
– Integration of DataFrames/Datasets & Streaming
– Power of SQL

ML - Machine Learning
•  Automatically sifting through large amounts of data
– to find previously hidden patterns,
– to discover valuable new insights and make predictions
•  Examples:
• Identify most important factors (Attribute Importance)
• Predict customer behaviors (Classification)
• Predict or estimate a value (Regression)
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection)
• Determine co-occurring items in “baskets” (Associations)
• Find profiles of targeted people or items (Decision Trees)

Machine Learning (ML):
Scoring/Prediction versus Training/Learning Characteristics
Prediction/Scoring operates on huge amounts of data with low compute intensity
                                    ML Prediction          ML Train
% of activity                       Most Data              *periodic
Computation                         O(n)                   O(n^3) Matrix-matrix
Data                                O(n)                   O(n^2)
Compute Intensity (Compute/Data)    Low constant           O(n)
Memory Bandwidth Requirement        3x to 6x per core      Up to 1.3x per core
*periodically updates to models: quarterly, monthly, weekly, nightly
Diagram: Training Set -> Training/Learning/Spark’s Fit -> Model; Data to Evaluate + Model -> Prediction/Scoring/Spark’s Transform -> Results
Can train/fit on one server, then move the model to a prediction server (ex: StubHub)

Saving the ML model in Apache Spark
Model can be moved between languages:
•  Train/Fit a Random Forest Classifier in Python, save it
# Training (Python)
trainingData = sqlContext.read... # data: features, label
rf = RandomForestClassifier(numTrees=20)
model = rf.fit(trainingData)
model.save("myModelPath")
•  Can load it back in Python
sameModel = RandomForestClassificationModel.load("myModelPath")
•  Can load it into Scala to Predict/Transform
// Load the model in Scala
val sameModel = RandomForestClassificationModel.load("myModelPath")
val predictions = sameModel.transform(mybigdata)
MLlib also allows users to save/load entire ML pipelines
https://databricks.com/blog/2016/05/31/apache-spark-2-0-preview-machine-learning-model-persistence.html

How can we make Spark 2.x.x even Faster?
It can be made a LOT FASTER: 8x to 20x!

“You can only compute as fast as you can move data” – Brad Carlile

Let’s Explore First Principles Thinking
What are the true ways to get at efficiency – A quick Analogy
•  Where to Locate a Factory? Example: Elon Musk’s Tesla GigaFactory
                      Tesla GigaFactory      Apache Spark Performance
Conventional Wisdom   1) Tax Incentives      1) Many cheap cores & Cloud
http://fortune.com/2015/12/11/nevada-energy-tech-hub/

First Principle Thinking
Closely look at each component. What are all of the issues?
Tesla GigaFactory:
1)  Nevada’s Energy costs
    1)  Best Geothermal near
    2)  Sun (2100 kWh/kW-yr)
    3)  Wind (~8-10 m/s)
2)  Nevada’s Raw Materials
    1)  Only Lithium Mine in US
    2)  Lithium salts nearby
3)  Few hours from BIG MARKETS
4)  Tax incentives (don’t hurt)
Apache Spark Performance:
1)  Use all of Delivered Bandwidth
2)  JVM performance optimizations
3)  Innovations for scanning operations

Whole-stage CodeGen Performance
2-chip x86 E5 v3 (Haswell), total 36 cores, 72 threads/VCPU
•  Let’s evaluate performance on a Full Table Scan of 600M rows
– Time to Scan = 0.16 sec. Impressive: 104.7 Million Rows/sec per core
•  Going back to 1st principles: Scanning is about data movement
– What bandwidth does this in-memory Scan deliver? 15 GB/s (Spark 2.0.0)
– What is the memory bandwidth of the system? 114 GB/s (Stream Triad)
– 7.6x more bandwidth available!
•  Other Queries: “SELECT count(*) FROM lineorder WHERE lo_quantity BETWEEN 10 and 20”
– 19x more bandwidth available! (only delivering 6 GB/s on the system)
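The "7.6x" and "19x" headroom figures are simply the ratio of the system's measured memory bandwidth to the bandwidth the query actually delivers. A minimal sketch of that arithmetic, using the numbers quoted on the slide (the object and method names are hypothetical):

```scala
object BandwidthHeadroom {
  // Ratio of the system's measured memory bandwidth to what the scan delivers.
  def headroom(systemGBs: Double, deliveredGBs: Double): Double =
    systemGBs / deliveredGBs

  // Figures from the slide: 114 GB/s Stream Triad on the 2-chip x86 system.
  val fullTableScan: Double = headroom(114.0, 15.0) // full scan delivers 15 GB/s
  val betweenQuery: Double  = headroom(114.0, 6.0)  // BETWEEN query delivers 6 GB/s
}
```

The same ratio can be recomputed for any system once a Stream-style bandwidth measurement and the query's delivered bandwidth are known.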

After Spark 2.0: Big Leap in Performance Possible
Select count(*) from store_sales where ss_item_sk > 100 and ss_item_sk < 1000
1.  Volcano Iterator model: plan interpretation
–  Open; Next data element; perform predicate; Close
2.  “College Freshman” Java code: Whole-Stage CodeGen pipelined code
–  for (ss_item_sk in store_sales) { if (ss_item_sk > 100 and ss_item_sk < 1000) { count += 1 } }
3.  Tuned library: True Vectorization, highly tuned code operates on whole column
–  vectorRangeFilter(n, VECTOR_OP_GT, 100, VECTOR_OP_LT, 1000, store_sales, result, result_cnt)
4.  Hardware Acceleration further accelerates scanning
–  vectorRangeFilter(n, VECTOR_OP_GT, 100, VECTOR_OP_LT, 1000, store_sales, result, result_cnt)
•  x86 AVX2 (vector instructions) – Can achieve 70 GB/s per chip
•  Oracle’s SPARC M7/S7 processors – even faster
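Step 3 describes an operator that takes a whole column and returns the positions that pass a range test. The sketch below shows that shape in plain Scala; it is not the libdax API or the slide's `vectorRangeFilter` signature. `rangeFilter` is a hypothetical stand-in, and a tuned library would use SIMD or hardware offload inside the loop.

```scala
object ColumnRangeFilter {
  // Column-at-a-time range filter: returns the positions of all values v
  // with lo < v < hi. One call processes the entire column.
  def rangeFilter(column: Array[Int], lo: Int, hi: Int): Array[Int] = {
    val result = new Array[Int](column.length) // worst case: every row qualifies
    var n = 0
    var i = 0
    while (i < column.length) {
      val v = column(i)
      if (v > lo && v < hi) { result(n) = i; n += 1 }
      i += 1
    }
    java.util.Arrays.copyOf(result, n) // trim to the number of matches
  }
}
```

For the slide's query, `rangeFilter(ssItemSk, 100, 1000).length` would give the count, assuming `ssItemSk` holds the `ss_item_sk` column as an `Array[Int]`.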

libdax Open API for Free & Open Source Software (FOSS)
Libdax is designed for key scan & dictionary operations to accelerate a variety of software
•  Designed to accelerate a wide variety of Oracle & FOSS software
– Examples:
• SQL acceleration for Oracle Database in-memory
• Apache Spark SQL
– DataFrames (Python, Scala, Java, R)
– Parquet experiments in progress
• Published sample codes
•  Open API for libdax – sign up for free systems to actually develop/try code
– https://swisdev.oracle.com
– x86 & SPARC versions (x86 & generic out soon!)

What else after one uses all of the bandwidth?
It can be made a LOT FASTER: 8x to 20x!

In-Memory Performance Optimizations
•  Columnar Format
•  Vector Processing
– Directly on Dictionary-encoded Columns
•  Operation pushdown
•  Join Processing accelerations
– Bloom Filters: bit vector set membership testing
•  In-Memory Storage Index
•  Predicate Optimization
– Using Dictionary values, Min, Max, …
T. Lahiri et al., Oracle Database In-Memory: A Dual Format In-Memory Database. Proceedings of ICDE 2015.

http://www.oracle.com/technetwork/database/in-memory/overview/twp-oracle-database-in-memory-2245633.html

In-Memory Columnar Format: Faster for Analytics
Row format requires skipping over data, which is slower for Analytics
SELECT COL4 FROM MYTABLE
Diagram: In-Memory Column Format; IM Column Store -> RESULT
With columnar we only scan the data required by the query
Spark does predicate push-down for Parquet files
… We need a columnar in-memory format for Spark’s internal format

Lower Cardinality Data is Usually Most Interesting Data
Analytics analyzing features of lower cardinality data
Analytics often distills data by grouping according to combinations of features
In ML, we often “Bucketize” to reduce cardinality
Most interesting data has fewer dictionary bits
Figure: example features placed along a cardinality axis running from 2-bit to 19-bit and beyond (cardinality is 2^n for n dictionary bits, “Entropy”). Features shown: Gender, Season, 5-point Scale, Marital Status, Top 10 ranking, Month, Hour, State, Weeks, Minutes, Age, Test Score, Country, US City >100k, Days, Top 500 Job Classification, Area code, Nasdaq/NYSE, Top 5,000, School districts, zipcode, DOB last 150 years, Temperature, Rainfall, Wind direction, Region, Price, Delivery status, Make, Model, and Unique or Random Data
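The relationship between cardinality and dictionary bits can be checked with a few lines of Scala: the number of bits is the smallest n with 2^n at least the number of distinct values. The helper name `bitsFor` is hypothetical.

```scala
object DictionaryBits {
  // Smallest n such that 2^n >= cardinality (a single-valued column still gets 1 bit).
  def bitsFor(cardinality: Long): Int = {
    require(cardinality >= 1, "cardinality must be positive")
    var bits = 1
    while ((1L << bits) < cardinality) bits += 1
    bits
  }
}
```

For example, 12 months fit in 4 bits and the 50 US states fit in 6 bits, which is why low-cardinality features compress so well.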

In-Memory Columnar Compression – Persist in Memory
•  Efficient In-Memory Columnar Table Scan
–  Contiguous storage per column
•  Dictionary encoding gives huge compression
– 50 US states only need 6 bits (<1 byte)
– Spelling out the state name is much longer
•  “South Dakota” needs 12 characters or 24 unicode bytes (192 bits vs. 6 bits)
•  Innovations:
– Directly scan dictionary encoded data!
–  Save Min & Max for scan elimination
–  Additional compression on column also possible (Zip + RLE)
–  Can use dictionary for “featurization” of data for ML
Column value list    Dictionary-encoded column
South Dakota         0
Tennessee            1
Utah                 3
Texas                2
South Dakota         0
Texas                2
Utah                 3
Column Min: South Dakota, Max: Utah
Dictionary
VALUE          ID
South Dakota   0
Tennessee      1
Texas          2
Utah           3
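The state-name example can be reproduced with a small dictionary encoder. This is a sketch of the general technique under simple assumptions (a sorted dictionary, integer codes in a plain array); the names `Encoded`, `encode`, and `countEquals` are hypothetical and this is not how Oracle Database In-Memory or Spark implement it.

```scala
object DictionaryEncoding {
  // A dictionary-encoded column: sorted distinct values plus one small integer per row.
  final case class Encoded(dictionary: Vector[String], codes: Array[Int]) {
    def min: String = dictionary.head // dictionary is sorted, so min/max are free
    def max: String = dictionary.last
  }

  def encode(column: Seq[String]): Encoded = {
    val dictionary = column.distinct.sorted.toVector
    val idOf = dictionary.zipWithIndex.toMap
    Encoded(dictionary, column.map(idOf).toArray)
  }

  // Scan directly on the encoded data: look the value up once in the dictionary,
  // then compare integer codes instead of strings.
  def countEquals(e: Encoded, value: String): Int = {
    val id = e.dictionary.indexOf(value)
    if (id < 0) 0 else e.codes.count(_ == id)
  }
}
```

Encoding the seven-row column from the slide yields the codes 0, 1, 3, 2, 0, 2, 3, with South Dakota as the minimum and Utah as the maximum.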


Apache Spark: Objects & Tungsten In-memory Columnar
Oracle incorporating libdax & fixing misalignments in Apache Spark
•  Typesafe operations in Scala access the JVM object-formatted representation
•  SQL & DataFrame operations need to access the in-memory column format
–  Tungsten: SQL operates on internal format
–  Need to persist in column format
–  Currently does not use dictionary encoding to speed SQL/DataFrame execution
Apache Spark currently doesn’t store columnar data for reuse; it regenerates the columns on the fly each time they are used
Diagram labels: SALES table in JVM object format (Scala, Python, R, …) in the JVM; SALES table in a temporary internal column format in off-heap memory (DataFrames, SQL execution); Encode/Decode between the two; Future? SPARK-15687

Developers: Areas that need contribution
Shout out to all Developers
•  Contribute Innovative New Algorithms throughout Apache Spark!
•  Contribute third-party packages that integrate with Apache Spark
–  https://sparkhub.databricks.com/ “Free App-store for Apache Spark”
•  Improve single-node performance & scalability
–  Improve performance of large-core systems
•  x86 processor contains 22 cores per chip, SPARC processor contains 32 cores per chip
•  This trend keeps increasing – HUGE BANDWIDTH on CHIP
–  Network bandwidth not keeping up
•  Fix misalignments which hurt performance
–  Example: 4-byte signed ints prepended to strings caused misalignment; convert them to longs
•  https://issues.apache.org/jira/browse/SPARK-16962

Apache Spark is Exciting – More Great Work to Come!
Spark’s Whole Ecosystem totally appeals to the performance expert in me!
•  Great Ecosystem for Analytics
– Spark forms a continuum with other technologies (Kafka, Solr, …)
•  Spark is fast & scalable because of clean design
– Spark’s in-memory focus
•  Spark can be much faster
– Poised for many innovations that take advantage of hardware systems
– Oracle’s Software in Silicon can be used on Apache Spark
•  Keep Using, Contributing, and Sharing!

Oracle SPARC M7 & S7: Innovations for Cloud & Analytics
Deep innovations differentiate SPARC from generic computing
•  32 Cores @ 4.13 GHz, 512GB memory per chip
– >160 GB/s delivered memory bandwidth per chip
•  Java/JVM, Database, Applications, etc.: SPARC core 1.6x to 2.0x faster vs. x86
– Little growth in x86 per core performance
•  Software in Silicon Features
– In-Memory SQL Acceleration & Decompression
– Hardware accelerated Encryption
– Silicon Secured Memory
https://blogs.oracle.com/bestperf
Chart: Core Performance vs. x86 E5 v2, 2012 to 2016, y-axis 0.8x to 2.0x; series: Java/JVM, OLTP, Mem GB/s (x86: dashed)

First Principle Thinking: Bandwidth for Apache Spark
In-memory Scan ultimately determined by Memory Bandwidth
•  “…no matter how high performance my engine is, if I need to scan a Terabyte of data to answer my query it’s going to be slow even if you are reading from memory”
–  Patrick Wendell, Databricks (June 4, 2015 on O’Reilly Data Show Podcast with Ben Lorica)
•  Let’s say I have 1 TB In-Memory that I want to scan in 1 second
– We need 31 C4.8xlarge (62 chips) to scan in 1 second (1024 GB / 33.6 GB/s)
– SPARC M7-8 (8-chips) server has 1.2 TB/s delivered Bandwidth in 10 RU
Figure: processor interconnect diagrams for IBM Power8 E880, SPARC M7-8, x86 E7 v3 Haswell, and 4x E5 v3 Haswell (10GbE connected). Circles show processors; inter-chip bandwidths are to scale. Topology labels: Fully connected, 2-hop, 2-hop
http://browser.primatelabs.com/geekbench3/5105516
http://browser.primatelabs.com/geekbench3/1694602
SPARC DAX Software in Silicon for Analytics (Oracle DB & Spark)
Radical Innovation: Integrated Offload offers 10x faster performance!
•  Integrated Offload
– Data Analytics Acceleration (DAX)
– OPEN! Add to your own Applications
•  https://swisdev.oracle.com
•  It’s more important how you use transistors than Moore’s Law (# transistors you make)
•  Oracle Database Analytic Queries
•  SPARC M7 10.8x faster per chip than x86 E5 v3
•  Same techniques apply to Apache Spark
Diagram labels: x86 E5 v3: Memory, Half BW, x86 100% Utilized, NO OFFLOAD!, NO OPEN Cores! SPARC M7: Memory, Bandwidth, DAX OFFLOAD (two units)
https://blogs.oracle.com/bestperf/entry/20151025_imdb_t7_1

Questions?

Backup Slides
Oracle Database 12c In-memory Database Innovations
…all innovations that can apply to Apache Spark

Oracle DB
Quick Digression (which explains how we can make Spark faster)
Learning From Oracle Database In-Memory?
•  OLTP uses proven row format
•  Analytics & reporting use new in-memory Column format
•  Analytics Compression means huge amounts of database can now fit in-memory
•  The Oracle Database stores BOTH row and column formats for the same table
•  Simultaneously active and transactionally consistent
Diagram: SALES table held in memory in Row Format and in Column Format

Operation Pushdown: Reduce Rows Processed by Plan
•  When possible, push operations down to the In-Memory scan
–  Greatly reduces # rows flowing up through the plan
•  For example:
–  Predicate Evaluation (for qualifying predicates – equality, range, etc.):
•  Inline predicate evaluation within the scan
•  Each IMCU scan only returns qualifying rows instead of all rows
•  Another example:
–  Aggregation (for qualifying aggregates, e.g. sum(), min/max(), etc.):
•  IMCU is aggregated during the scan
•  Each IMCU scan returns only the aggregate (e.g. the sum)
•  Upper plan nodes aggregate the aggregates (e.g. sum of sums)
Diagram: IM scan of Products with predicate Sales > 1000 pushed down; IM scan of Stores with predicate State = CA pushed down
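The "sum of sums" idea can be sketched in plain Scala, with each inner array standing in for one in-memory chunk (IMCU). The names are hypothetical and the chunks are ordinary arrays, so this illustrates only the shape of the computation: the predicate and the aggregate run inside the scan, and only one number per chunk flows up the plan.

```scala
object AggregationPushdown {
  // Scan-level work: filter and aggregate inside each chunk,
  // returning a single partial sum per chunk instead of the qualifying rows.
  def partialSums(chunks: Seq[Array[Long]], pred: Long => Boolean): Seq[Long] =
    chunks.map(chunk => chunk.filter(pred).sum)

  // Upper plan node: aggregate the aggregates (sum of sums).
  def total(chunks: Seq[Array[Long]], pred: Long => Boolean): Long =
    partialSums(chunks, pred).sum
}
```

With two chunks and the predicate "sales > 1000", the upper node receives two values rather than every qualifying row.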

Operation Pushdown: Bloom Filter
•  Bloom Filter:
•  Compact bit vector for set membership testing
•  10g optimizer feature
•  Bloom filter pushdown:
•  Filtering pushed down to IMCU scan
•  Returns only rows that are likely to be join candidates
•  Joins tables 10x faster
Example: Find total sales in outlet stores
Diagram: Stores (Store ID, Type) filtered by Type=‘Outlet’ yields StoreID in 15, 38, 64; a Bloom Filter built from those IDs is applied to the Sales scan (Store ID, Amount) before the Sum
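A Bloom filter pushdown for the outlet-store example might look like the sketch below. Everything here is an illustrative assumption: the `Bloom` class, its sizes, and the deliberately simple hash mix are not Oracle's or Spark's implementation. The filter can return false positives but never false negatives, so an exact join check still follows it.

```scala
object BloomSketch {
  // Compact bit vector for approximate set membership.
  final class Bloom(numBits: Int, numHashes: Int) {
    private val bits = new java.util.BitSet(numBits)

    // Simple multiplicative hash family; a real filter would use stronger hashes.
    private def positions(key: Int): Seq[Int] =
      (1 to numHashes).map(seed => Math.floorMod(key * 31 + seed * 0x5bd1e995, numBits))

    def add(key: Int): Unit = positions(key).foreach(p => bits.set(p))
    def mightContain(key: Int): Boolean = positions(key).forall(p => bits.get(p))
  }

  // Sales rows are (storeId, amount). The Bloom test stands in for the filter
  // pushed down into the scan; the exact check stands in for the join itself.
  def outletSales(sales: Seq[(Int, Long)], outletStoreIds: Set[Int]): Long = {
    val bloom = new Bloom(numBits = 1024, numHashes = 3)
    outletStoreIds.foreach(bloom.add)
    sales.iterator
      .filter { case (storeId, _) => bloom.mightContain(storeId) }
      .filter { case (storeId, _) => outletStoreIds.contains(storeId) }
      .map(_._2)
      .sum
  }
}
```

The benefit comes from the first filter discarding most non-matching rows with a few bit tests before the more expensive join work sees them.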

In-Memory Storage Index: Eliminate IMCUs from Scan
•  Min-Max Pruning
– Min/Max values serve as storage index
– Check predicate against min/max values
– Skip entire IMCU if predicate not satisfied
– Eliminates processing unnecessary IMCUs
–  Can prune for predicates including equality, range, inlist, …
•  Dictionary pruning
– Dictionaries also serve as storage index
– Check predicate against dictionary values
•  “Find sales from stores in Nevada”
– Skip entire IMCU if predicate not satisfied
Example: Find stores with sales greater than $10,000
IMCU: Min $4000, Max $7000
IMCU: Min $8000, Max $12000
IMCU: Min $13000, Max $15000


Predicate Optimization: Reduce Predicate Evaluations
•  Avoid evaluating predicates against every column value
– Check range predicate against min/max values
•  As before, skip IMCUs where min/max disqualifies the predicate
– If min/max indicates all rows will qualify, no need to evaluate predicates on column values
Example: Find stores with sales between $8000 and $14000
Min $4000, Max $7000: NO ROWS, skip IMCU
Min $8000, Max $13000: ALL ROWS, skip evaluation
Min $13000, Max $15000: SOME ROWS, needs evaluation
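The three outcomes (no rows, all rows, some rows) fall out of comparing the predicate's range with each chunk's stored min and max. A minimal sketch, with hypothetical names and an inclusive range predicate:

```scala
object MinMaxPruning {
  sealed trait Decision
  case object SkipChunk extends Decision      // no row can qualify: skip the IMCU
  case object AllRowsQualify extends Decision // every row qualifies: skip evaluation
  case object EvaluateRows extends Decision   // some rows may qualify: evaluate them

  final case class ChunkStats(min: Long, max: Long)

  // Classify a chunk for the predicate lo <= value <= hi using only its min/max.
  def classify(stats: ChunkStats, lo: Long, hi: Long): Decision =
    if (stats.max < lo || stats.min > hi) SkipChunk
    else if (stats.min >= lo && stats.max <= hi) AllRowsQualify
    else EvaluateRows
}
```

Applied to the slide's example (sales between $8000 and $14000), the first chunk is skipped, the second needs no per-row evaluation, and only the third is evaluated row by row.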

Predicate Optimization: Reduce Predicate Evaluations
•  If min/max cannot eliminate predicate
– Evaluate predicate once per dictionary value
– Create list of qualifying dictionary values
•  Use vector instructions to find qualifying values in column
•  Greatly reduces predicate evaluations
– Once per distinct value vs. once per value
•  Also for more complex predicates …
– LIKE predicates: ex: Find sales of product names containing “mustard”
Example: Find stores with sales between $8000 and $14000
Dictionary
VALUE     ID
$13,000   0
$13,500   1
$13,800   2
$13,900   3
$14,500   4
$15,000   5
Column CU (dictionary IDs): 5 1 4 3 3 4 4 5 5 3 0 1
Vector Compare against qualifying IDs {0,1,2,3}
Dictionary for the LIKE example
VALUE             ID
Apple             0
English Mustard   1
Mustard Greens    2
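The example on this slide can be replayed in plain Scala: the predicate runs once per dictionary entry, and the column scan then only tests whether each integer ID is in the qualifying set. The function names are hypothetical, and the set lookup stands in for the vector compare instructions mentioned above.

```scala
object DictionaryPredicate {
  // Evaluate the predicate once per distinct value; return the qualifying dictionary IDs.
  def qualifyingCodes(dictionary: IndexedSeq[Long], pred: Long => Boolean): Set[Int] =
    dictionary.indices.filter(i => pred(dictionary(i))).toSet

  // Scan the encoded column, comparing IDs only (no value is decoded).
  def matchingRows(codes: Array[Int], qualifying: Set[Int]): Seq[Int] =
    codes.indices.filter(i => qualifying(codes(i)))
}
```

With the six-entry dictionary and twelve-row column shown above, the predicate is evaluated six times instead of twelve, and the saving grows with the ratio of rows to distinct values.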

Backup Slides
Oracle’s SPARC Processor

SPARC T7 & M7 Systems - All Shipping Now
“-#” indicates how many chips in server
                  T7-1     T7-2     T7-4      M7-8                M7-16
Processors        1        2        2 or 4    Up to 8 (1)         Up to 16 (2)
Max Cores         32       64       128       256                 512
Max Threads       256      512      1,024     2,048               4,096
Max Memory (3)    .5 TB    1 TB     2 TB      4 TB                8 TB
Form Factor       2U       3U       5U        Rack / 10U          Rack
Domaining         LDOMs    LDOMs    LDOMs     LDOMs, PDOMs (1)    LDOMs, PDOMs (2)
(1) Factory configured with one (up to 8 processors) or two (up to 4 processors each) static physical domains
(2) 1, 2, 3 or 4 reconfigurable physical domains
(3) Maximum memory capacity is based on 32 GB DIMMs, capacity can double in future with 64 GB DIMMs
Confidential: Oracle Restricted

SPARC S7 Servers
                        SPARC S7-2 Server    SPARC S7-2L Server
Processors              1 or 2               2
Max Cores/Threads       16 / 128             16 / 128
Max Memory (1)          1 TB                 1 TB
Form Factor             1U                   2U
Max Disk Drives         8                    26
PCIe Slots Available    3                    6
Integrated Ethernet     4x 10GBase-T         4x 10GBase-T
(1) Maximum memory capacity is based on 64 GB DIMMs.
