SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Building a SIMD supported
vectorized native engine for
Spark SQL
Chendi Xue(chendi.xue@intel.com), Software Engineer
Yuan Zhou(yuan.zhou@intel.com), Software Engineer
Intel Corp
Agenda
▪ Native SQL Engine Introduction
▪ Native SQL Engine Design
▪ Columnar Data Source
▪ Columnar Shuffle
▪ Columnar Compute
▪ Memory Management
▪ Summary
Motivations for Native SQL Engine
• Issue of current Spark SQL Engine:
▪ Internal row based, difficult to use SIMD optimizations
▪ High GC Overhead under low memory
▪ JIT code quality relies on JVM, hard to tune
▪ High overhead of integration with other native library
Proposed Solution
Issue of current Spark SQL Engine:
▪ Internal row based, not possible to use SIMD
→ Columnar-based Arrow Format
▪ High GC Overhead under low memory
→ native codes for core compute instead of java
▪ JIT code quality relies on JVM, hard to tune
→ cpp / llvm / assembly code generation
▪ High overhead of integration with other native
library
→ Lightweighted JNI based call framework
Source: https://www.slideshare.net/dremio/apache-arrow-an-overview
Native SQL Engine Layers
Physical Plan Execution
Columnar PluginJVM SQL Engine
Row Column
Conversion
Operator
strategy
(Cost /
Support )
Spark Application
SQL Python APP Java/Scala APP R APP
Native Arrow Data Source
ColumnarCompute Memory
Management
UDF Cache Scheduler
DAOS / HDFS /
S3 / KUDU /
LOCALFS / …
Parquet /
ORC / CSV /
JASON / …
Query Plan Optimization
ColumnarRules ColumnarCollapseRules ColumnarAQERules
WSCG join/sort/aggr/proj/…
PUSHDOWN
Native Compute
JNI WRAPPER
Native Shuffle
LLVM
Gandiva
CPP Code
Generator
pre-
compiled
kernels
Spark
Compatible
Partition
Streamer /
Compressed
Serialization
Optimal
Batch
Memory
Manage /
Register
• A standard columnar data format as basic data format
• Data keeps on off-heap, data operations offload to highly optimized native library
Data Format
Row RDD Column RDD Optimal Column RDD
iter iter
next()
iter
• Auto split and coalesce
• Tunable batch size
next()
next()
next()
Columnar Spark Plan Rules
Parser Analyzer Optimizer Planner Query Execution
SQL
Dataset
DataFrame
Unresolved
Logical Plan
Logical
Plan
Optimized
Logical Plan
Physical
Plan
Selected
Physical Plans
RDD
Cost
Model
Metadata
Catalog
Cache
Manager
Spark
Native Engine
SQL
Dataset
DataFrame
Unresolved
Logical Plan
Logical
Plan
Optimized
Logical Plan
Physical
Plan
Selected
Physical Plans
Cost
Model
Metadata
Catalog
Cache
Manager
ColumnarAQE
Native SQL Engine
ColumnarPlan
ColumnarWSCG
LLVM/SIMD
Kernel
Columnar Whole Stage Codegen
JVM Native
Stage1 Evaluate
Parquet Read
Filter
Columnar
Shuffle
Columnar Exchange
operator
Columnar
Shuffle
Stage2 Evaluate
Parquet Read
Hash Join
Aggregate
Columnar Exchange
operator
Columnar
Aggregate
Aggregate
JVM Native
Stage1 Evaluate Parquet Read
Gandiva Filter
Columnar
Shuffle
Columnar Exchange
operator
HashJoin
Aggregate
Columnar
Shuffle
Stage2 Evaluate Parquet Read
Columnar Exchange
operator
Columnar
Aggregate
Aggregate
Native SQL Engine Design
▪ Columnar Data Source
▪ Columnar Shuffle
▪ Columnar Compute
▪ Memory Management
Spark Internal Format
UnsafeRow
Columnar Data Source
Columnized Format
(parquet, orc)
Column to Row?
Spark used a virtual
way to treat column
data as row, while
memory is not
adjacent
Row-based Data Source Arrow based Data Source
Arrow Format
MetaData
Parquet orc csv
kudu cassandra HBase
RowBased Format
(csv, …)
Unify and fastJSON
V.S.
Columnar Data Source
11
Spark Arrow DataSource
(pyspark, thriftserver, sparksql,…)
Arrow Java Datasets API
(Zero data copy, memory
reference only)
Arrow C++ Datasets API
(HDFS, localFS, S3A)
(Parquet, ORC, CSV, ..)
▪ Features
▪ Fast / Parallel / Auth supported Native Libraries for HDFS / S3A / Local FS
▪ PushDown supported Pre-executed statistics/metadata filters
▪ Partitioned File and DPP enabled
Columnar Data Source
0 <= ID < 5000
5000 <= ID < 10000
unread ID = 7500
ID = 7500
Truncated
Files
Columnar Shuffle
• Hash-based partitioning(split) with
LLVM optimized execution
• Ser/de-ser based on arrow record
batch
• Efficient data compression for
different data format
• Coalesce batches during shuffle read
• Supports Adaptive Query
Execution(AQE)
ColumnarExchange
Mapper
ColumnarExchange
Reducer
compressed
file
compressed
file
compressed
file
compressed
file
CoalesceBatches
Hash based partition
Supported SQL Operators Overview
Operators
WindowExec
UnionExec
ExpandExec
SortExec
ScalarSubquery
ProjectExec
ShuffledHashJoin
BroadcastJoinExec
FilterExec
ShuffleExchangeExec
BroadcastExchangeExec
datasources.v2.BatchScanExec
datasources.v1.FileScanExec
HashAggregateExec
…..
Expression
NormalizeNaNAndZero
Subtract
Substring
ShiftRight
Round
PromotePrecision
Multiply
Literal
LessThanOrEqual
LessThan
KnownFloatingPointNormalized
IsNull
And
Add
….
Expression
IsNotNull
GreaterThanOrEqual
GreaterThan
EqualTo
ExtractYear
Divide
Concat
Coalesce
CheckOverflow
Cast
CaseWhen
BitwiseAnd
AttributeReference
Alias
….
Automatically fallback to row-based execution if there are unsupported operators/expressions
ColumnarCondProjection
Columnar Projector & Filter
▪ LLVM IR based execution w/ AVX optimized
code
▪ Based on Arrow Gandiva, extended with
more functions
▪ Combined filter & projection into
CondProjector
▪ ColumnarBatch based execution
Scan B
Filter
ColumnarExchange
Projection
Example:
If (Field_A + Field_B + Field_C) as Field_new > Field_D
output [Field_new, Field_A, Field_B, Field_C, Field_D]
LLVM
IR
One call
Native Hashmap
▪ Faster Hashmap building and
lookup w/ AVX optimizations
▪ Compatible with Spark’s
murmurhash if configured
▪ Performance benefits for
HashAggregation and HashJoins
Scan A
Scan B
Filter
BoradcastHashJoin
HashAggregate
BoradcastExchange
ShuffleExchange
HashAggregate
Stage
ColumnarExchange
Stage
Columnar
BroadcastExchange
Stage
ColumnarShuffleReader
ColumnarBroadcastHashJoin
Broadcast data consists of
1. HashMap (key -> indices)
2. Arrow RecordBatch
using ColumnarBroadcastExchange,
data size is reduced by 80%
Stage
ColumnarBroadcastJoin
ColumnarExchange
… …
ColumnarShuffleReader
hash payload
0x8198 <0, 0>, key1
0x7723 <0, 1>,key2
… …
0x6388
<10240, 1076>
Key1076
0x9944
<10240, 1077>
Key1077
0x8761
<10240, 1078>
key1078
+
Native Sort
▪ Faster sort implementation w/
AVX optimizations
▪ Most powerful algorithms used for
different data structures
▪ Performance benefits for sort and
sort merge joins
Scan A Scan B
CondProjection
SortMergeJoin
Exchange
CondProjection
Exchange
Sort Sort
Exchange
Sort
Stage
ColumnarSortMergeJoin
ColumnarExchange
… …
Stage
ColumnarSort
Stage
Columnarsort
Stage
ColumnarShuffleReader
ColumnarSort and ColumnarSortMergeJoin
3 implementation of sort
1. One column data sort(used by semi or anti)
-> inplace radix sort (AVX optimizable)
2. One column of key with multiple columns of payload
-> radix sort to indices (AVX optimizable)
-> lazy materialization
3. Multiple columns of keys with payloads
-> codegened quick sort to indices(AVX optimizable)
-> lazy materialization
AVX optimized project {
chain key_0 in left table #1
apply project
compare key_0 and key_1 in right table #2
apply condition
materialize one line to arrow
}
Stage
ColumnarExchange
Stage
ColumnarBroadcastExchange
Stage
ColumnarBroadcastExchange
Stage
ColumnarExchange
Stage
ColumnarShuffleReader
Stage
ColumnarShuffleReader
ColumnarShuffledHashJoin
ColumnarProject
ColumnarBroadcastJoin
ColumnarFilter
ColumnarBroadcastJoin
ColumnarExchange
Columnar WholeStageCodeGen
WSCG
Code generated Cpp codes
1. Build HashRelation #0
2. Build HashRelation #1
3. Build HashRelation #2
Loop keys_arrays {
probe key_0 in HashRelation #0
apply Project
probe key_1 in HashRelation #1
apply condition
probe key_2 in HashRelation #2
materialize one line to arrow
}
AVX optimized {
probe key_0 in HashRelation #0
apply project
probe key_1 in HashRelation #1
apply condition
probe key_2 in HashRelation #2
materialize one line to arrow
}
g++ compilation
Native SQL engine call flow
Spark Context
Executor
Oap-native-sql
executeColumnar().mapPartitions
JniWrapper
Expression
Tree
RecordBatch
Gandiva
Native SQL Engine
CPPCodeGenerator
RecordBatch
Expression Input Output
Precompiled kernels
Memory Management
SparkTaskMemoryManager ArrowAllocationManager
Native MemoryDirect MemoryJVM Memory
[java] Spark.TaskMemoryManager [Java] arrow.AllocationManager [cpp] arrow::MemoryPool
register
grant
spill
Data
Source
Column
Project
Column
Sort
Column
Shuffle
Column
Reader
Native
Memory
Native
Memory
Direct
Memory
Direct
Memory
Direct
Memory
Column
To Row
JVM
Memory
Row
TakeOrdered
AndProject
JVM
Memory
Column
BHJ
Native
Memory
release release release release release release
Data
retain hashMap
Example run of TPCH-Q4
Summary
▪ AVX instructions can greatly improve performance on SQL workloads
▪ Native SQL is open sourced. For more details please visit:
https://github.com/Intel-bigdata/OAP
▪ Native SQL engine is under heavy development, works for TPC-
H/TPC-DS now
Q&A
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.
Legal Information: Benchmark and Performance
Disclaimers
▪ Performance results are based on testing as of Feb. 2019 & Aug 2020 and
may not reflect all publicly available security updates. See configuration
disclosure for details. No product can be absolutely secure.
▪ Software and workloads used in performance tests may have been
optimized for performance only on Intel microprocessors. Performance
tests, such as SYSmark and MobileMark, are measured using specific
computer systems, components, software, operations and functions. Any
change to any of those factors may cause the results to vary. You should
consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of
that product when combined with other products. For more information,
see Performance Benchmark Test Disclosure.
▪ Configurations: see performance benchmark test configurations.
Notices and Disclaimers
▪ No license (express or implied, by estoppel or otherwise) to any intellectual property rights is
granted by this document.
▪ Intel disclaims all express and implied warranties, including without limitation, the implied
warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as
any warranty arising from course of performance, course of dealing, or usage in trade.
▪ This document contains information on products, services and/or processes in development. All
information provided here is subject to change without notice. Contact your Intel representative
to obtain the latest forecast, schedule, specifications and roadmaps.
▪ The products and services described may contain defects or errors known as errata which may
cause deviations from published specifications. Current characterized errata are available on
request.
▪ Intel, the Intel logo, Xeon, Optane, Optane Persistent Memory are trademarks of Intel Corporation
in the U.S. and/or other countries.
▪ *Other names and brands may be claimed as the property of others
▪ © Intel Corporation.

Weitere ähnliche Inhalte

Was ist angesagt?

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Databricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookDatabricks
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Databricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 

Was ist angesagt? (20)

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at Facebook
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 

Ähnlich wie Building a SIMD Supported Vectorized Native Engine for Spark SQL

Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationYi Pan
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APIshareddatamsft
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanJimin Hsieh
 
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.aiIntro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.aiSri Ambati
 
Jvm Performance Tunning
Jvm Performance TunningJvm Performance Tunning
Jvm Performance Tunningguest1f2740
 
Jvm Performance Tunning
Jvm Performance TunningJvm Performance Tunning
Jvm Performance TunningTerry Cho
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Databricks
 
Byte code engineering 21st May Saturday 2016
Byte code engineering   21st May Saturday 2016Byte code engineering   21st May Saturday 2016
Byte code engineering 21st May Saturday 2016Sarath Soman
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarSpark Summit
 
Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Felix Geisendörfer
 
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, UkraineVladimir Ivanov
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteElectronic Arts / DICE
 
An Introduction to Java Compiler and Runtime
An Introduction to Java Compiler and RuntimeAn Introduction to Java Compiler and Runtime
An Introduction to Java Compiler and RuntimeOmar Bashir
 
Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015Sri Ambati
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsDataWorks Summit
 

Ähnlich wie Building a SIMD Supported Vectorized Native Engine for Spark SQL (20)

Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
 
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.aiIntro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
 
Jvm Performance Tunning
Jvm Performance TunningJvm Performance Tunning
Jvm Performance Tunning
 
Jvm Performance Tunning
Jvm Performance TunningJvm Performance Tunning
Jvm Performance Tunning
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
 
Byte code engineering 21st May Saturday 2016
Byte code engineering   21st May Saturday 2016Byte code engineering   21st May Saturday 2016
Byte code engineering 21st May Saturday 2016
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
 
Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?Nodejs - Should Ruby Developers Care?
Nodejs - Should Ruby Developers Care?
 
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
"JIT compiler overview" @ JEEConf 2013, Kiev, Ukraine
 
FrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in FrostbiteFrameGraph: Extensible Rendering Architecture in Frostbite
FrameGraph: Extensible Rendering Architecture in Frostbite
 
An Introduction to Java Compiler and Runtime
An Introduction to Java Compiler and RuntimeAn Introduction to Java Compiler and Runtime
An Introduction to Java Compiler and Runtime
 
Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
Grails 101
Grails 101Grails 101
Grails 101
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Kürzlich hochgeladen

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 

Kürzlich hochgeladen (20)

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 

Building a SIMD Supported Vectorized Native Engine for Spark SQL

  • 1. Building a SIMD supported vectorized native engine for Spark SQL Chendi Xue(chendi.xue@intel.com), Software Engineer Yuan Zhou(yuan.zhou@intel.com), Software Engineer Intel Corp
  • 2. Agenda ▪ Native SQL Engine Introduction ▪ Native SQL Engine Design ▪ Columnar Data Source ▪ Columnar Shuffle ▪ Columnar Compute ▪ Memory Management ▪ Summary
  • 3. Motivations for Native SQL Engine • Issue of current Spark SQL Engine: ▪ Internal row based, difficult to use SIMD optimizations ▪ High GC Overhead under low memory ▪ JIT code quality relies on JVM, hard to tune ▪ High overhead of integration with other native library
  • 4. Proposed Solution Issue of current Spark SQL Engine: ▪ Internal row based, not possible to use SIMD → Columnar-based Arrow Format ▪ High GC Overhead under low memory → native codes for core compute instead of java ▪ JIT code quality relies on JVM, hard to tune → cpp / llvm / assembly code generation ▪ High overhead of integration with other native library → Lightweighted JNI based call framework Source: https://www.slideshare.net/dremio/apache-arrow-an-overview
  • 5. Native SQL Engine Layers Physical Plan Execution Columnar PluginJVM SQL Engine Row Column Conversion Operator strategy (Cost / Support ) Spark Application SQL Python APP Java/Scala APP R APP Native Arrow Data Source ColumnarCompute Memory Management UDF Cache Scheduler DAOS / HDFS / S3 / KUDU / LOCALFS / … Parquet / ORC / CSV / JASON / … Query Plan Optimization ColumnarRules ColumnarCollapseRules ColumnarAQERules WSCG join/sort/aggr/proj/… PUSHDOWN Native Compute JNI WRAPPER Native Shuffle LLVM Gandiva CPP Code Generator pre- compiled kernels Spark Compatible Partition Streamer / Compressed Serialization Optimal Batch Memory Manage / Register • A standard columnar data format as basic data format • Data keeps on off-heap, data operations offload to highly optimized native library
  • 6. Data Format Row RDD Column RDD Optimal Column RDD iter iter next() iter • Auto split and coalesce • Tunable batch size next() next() next()
  • 7. Columnar Spark Plan Rules Parser Analyzer Optimizer Planner Query Execution SQL Dataset DataFrame Unresolved Logical Plan Logical Plan Optimized Logical Plan Physical Plan Selected Physical Plans RDD Cost Model Metadata Catalog Cache Manager Spark Native Engine SQL Dataset DataFrame Unresolved Logical Plan Logical Plan Optimized Logical Plan Physical Plan Selected Physical Plans Cost Model Metadata Catalog Cache Manager ColumnarAQE Native SQL Engine ColumnarPlan ColumnarWSCG LLVM/SIMD Kernel
  • 8. Columnar Whole Stage Codegen JVM Native Stage1 Evaluate Parquet Read Filter Columnar Shuffle Columnar Exchange operator Columnar Shuffle Stage2 Evaluate Parquet Read Hash Join Aggregate Columnar Exchange operator Columnar Aggregate Aggregate JVM Native Stage1 Evaluate Parquet Read Gandiva Filter Columnar Shuffle Columnar Exchange operator HashJoin Aggregate Columnar Shuffle Stage2 Evaluate Parquet Read Columnar Exchange operator Columnar Aggregate Aggregate
  • 9. Native SQL Engine Design ▪ Columnar Data Source ▪ Columnar Shuffle ▪ Columnar Compute ▪ Memory Management
  • 10. Spark Internal Format UnsafeRow Columnar Data Source Columnized Format (parquet, orc) Column to Row? Spark used a virtual way to treat column data as row, while memory is not adjacent Row-based Data Source Arrow based Data Source Arrow Format MetaData Parquet orc csv kudu cassandra HBase RowBased Format (csv, …) Unify and fastJSON V.S.
  • 11. Columnar Data Source 11 Spark Arrow DataSource (pyspark, thriftserver, sparksql,…) Arrow Java Datasets API (Zero data copy, memory reference only) Arrow C++ Datasets API (HDFS, localFS, S3A) (Parquet, ORC, CSV, ..)
  • 12. ▪ Features ▪ Fast / Parallel / Auth supported Native Libraries for HDFS / S3A / Local FS ▪ PushDown supported Pre-executed statistics/metadata filters ▪ Partitioned File and DPP enabled Columnar Data Source 0 <= ID < 5000 5000 <= ID < 10000 unread ID = 7500 ID = 7500 Truncated Files
  • 13. Columnar Shuffle • Hash-based partitioning(split) with LLVM optimized execution • Ser/de-ser based on arrow record batch • Efficient data compression for different data format • Coalesce batches during shuffle read • Supports Adaptive Query Execution(AQE) ColumnarExchange Mapper ColumnarExchange Reducer compressed file compressed file compressed file compressed file CoalesceBatches Hash based partition
  • 14. Supported SQL Operators Overview Operators WindowExec UnionExec ExpandExec SortExec ScalarSubquery ProjectExec ShuffledHashJoin BroadcastJoinExec FilterExec ShuffleExchangeExec BroadcastExchangeExec datasources.v2.BatchScanExec datasources.v1.FileScanExec HashAggregateExec ….. Expression NormalizeNaNAndZero Subtract Substring ShiftRight Round PromotePrecision Multiply Literal LessThanOrEqual LessThan KnownFloatingPointNormalized IsNull And Add …. Expression IsNotNull GreaterThanOrEqual GreaterThan EqualTo ExtractYear Divide Concat Coalesce CheckOverflow Cast CaseWhen BitwiseAnd AttributeReference Alias …. Automatically fallback to row-based execution if there are unsupported operators/expressions
  • 15. ColumnarCondProjection Columnar Projector & Filter ▪ LLVM IR based execution w/ AVX optimized code ▪ Based on Arrow Gandiva, extended with more functions ▪ Combined filter & projection into CondProjector ▪ ColumnarBatch based execution Scan B Filter ColumnarExchange Projection Example: If (Field_A + Field_B + Field_C) as Field_new > Field_D output [Field_new, Field_A, Field_B, Field_C, Field_D] LLVM IR One call
  • 16. Native Hashmap ▪ Faster Hashmap building and lookup w/ AVX optimizations ▪ Compatible with Spark’s murmurhash if configured ▪ Performance benefits for HashAggregation and HashJoins Scan A Scan B Filter BoradcastHashJoin HashAggregate BoradcastExchange ShuffleExchange HashAggregate
  • 17. Stage ColumnarExchange Stage Columnar BroadcastExchange Stage ColumnarShuffleReader ColumnarBroadcastHashJoin Broadcast data consists of 1. HashMap (key -> indices) 2. Arrow RecordBatch using ColumnarBroadcastExchange, data size is reduced by 80% Stage ColumnarBroadcastJoin ColumnarExchange … … ColumnarShuffleReader hash payload 0x8198 <0, 0>, key1 0x7723 <0, 1>,key2 … … 0x6388 <10240, 1076> Key1076 0x9944 <10240, 1077> Key1077 0x8761 <10240, 1078> key1078 +
  • 18. Native Sort ▪ Faster sort implementation w/ AVX optimizations ▪ Most powerful algorithms used for different data structures ▪ Performance benefits for sort and sort merge joins Scan A Scan B CondProjection SortMergeJoin Exchange CondProjection Exchange Sort Sort Exchange Sort
  • 19. Stage ColumnarSortMergeJoin ColumnarExchange … … Stage ColumnarSort Stage Columnarsort Stage ColumnarShuffleReader ColumnarSort and ColumnarSortMergeJoin 3 implementation of sort 1. One column data sort(used by semi or anti) -> inplace radix sort (AVX optimizable) 2. One column of key with multiple columns of payload -> radix sort to indices (AVX optimizable) -> lazy materialization 3. Multiple columns of keys with payloads -> codegened quick sort to indices(AVX optimizable) -> lazy materialization AVX optimized project { chain key_0 in left table #1 apply project compare key_0 and key_1 in right table #2 apply condition materialize one line to arrow }
  • 20. Stage ColumnarExchange Stage ColumnarBroadcastExchange Stage ColumnarBroadcastExchange Stage ColumnarExchange Stage ColumnarShuffleReader Stage ColumnarShuffleReader ColumnarShuffledHashJoin ColumnarProject ColumnarBroadcastJoin ColumnarFilter ColumnarBroadcastJoin ColumnarExchange Columnar WholeStageCodeGen WSCG Code generated Cpp codes 1. Build HashRelation #0 2. Build HashRelation #1 3. Build HashRelation #2 Loop keys_arrays { probe key_0 in HashRelation #0 apply Project probe key_1 in HashRelation #1 apply condition probe key_2 in HashRelation #2 materialize one line to arrow } AVX optimized { probe key_0 in HashRelation #0 apply project probe key_1 in HashRelation #1 apply condition probe key_2 in HashRelation #2 materialize one line to arrow } g++ compilation
  • 21. Native SQL engine call flow Spark Context Executor Oap-native-sql executeColumnar().mapPartitions JniWrapper Expression Tree RecordBatch Gandiva Native SQL Engine CPPCodeGenerator RecordBatch Expression Input Output Precompiled kernels
  • 22. Memory Management SparkTaskMemoryManager ArrowAllocationManager Native MemoryDirect MemoryJVM Memory [java] Spark.TaskMemoryManager [Java] arrow.AllocationManager [cpp] arrow::MemoryPool register grant spill Data Source Column Project Column Sort Column Shuffle Column Reader Native Memory Native Memory Direct Memory Direct Memory Direct Memory Column To Row JVM Memory Row TakeOrdered AndProject JVM Memory Column BHJ Native Memory release release release release release release Data retain hashMap
  • 23. Example run of TPCH-Q4
  • 24. Summary ▪ AVX instructions can greatly improve performance on SQL workloads ▪ Native SQL is open sourced. For more details please visit: https://github.com/Intel-bigdata/OAP ▪ Native SQL engine is under heavy development, works for TPC- H/TPC-DS now
  • 25. Q&A
  • 26. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
  • 27. Legal Information: Benchmark and Performance Disclaimers ▪ Performance results are based on testing as of Feb. 2019 & Aug 2020 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. ▪ Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure. ▪ Configurations: see performance benchmark test configurations.
  • 28. Notices and Disclaimers ▪ No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. ▪ Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. ▪ This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. ▪ The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. ▪ Intel, the Intel logo, Xeon, Optane, Optane Persistent Memory are trademarks of Intel Corporation in the U.S. and/or other countries. ▪ *Other names and brands may be claimed as the property of others ▪ © Intel Corporation.