The RAPIDS Accelerator for Apache Spark is a plugin that enables the power of GPUs to be leveraged in Spark DataFrame and SQL queries, improving the performance of ETL pipelines. User-defined functions (UDFs) in the query appear as opaque transforms and can prevent the RAPIDS Accelerator from processing some query operations on the GPU.
This presentation discusses how users can leverage the RAPIDS Accelerator UDF Compiler to automatically translate some simple UDFs to equivalent Catalyst operations that are processed on the GPU. The presentation also covers how users can provide a GPU version of Scala, Java, or Hive UDFs for maximum control and performance. Sample UDFs for each case will be shown along with how the query plans are impacted when the UDFs are processed on the GPU.
4. No Code Changes
§ Scala
§ Java
§ PySpark
§ Spark SQL
§ SparkR
§ Koalas
§ Requires Spark 3.x
Accelerates SQL and DataFrame with GPUs
import time

start = time.time()
spark.sql("""
select o_orderpriority, count(*) as order_count
from orders
where
  o_orderdate >= date '1993-07-01'
  and o_orderdate < date '1993-07-01' + interval '3' month
  and exists (
    select * from lineitem
    where
      l_orderkey = o_orderkey
      and l_commitdate < l_receiptdate
  )
group by o_orderpriority
order by o_orderpriority""").show()
time.time() - start
5. NDS Benchmark Dataset
• Approximately 3 TB of raw data
• 1 TB of compressed Parquet
• Partitioned
• Double values for decimals
• Stored in HDFS
6. Benchmark Hardware
EGX / NVIDIA Certified OEM Servers
Nodes: 8
CPU: 2 x AMD EPYC 7452 (64 cores/128 threads)
GPU: 2 x NVIDIA Ampere A100, PCIe, 250W, 40GB
RAM: 0.5 TB
Storage: 4 x 7.68 TB Gen4 U.2 NVMe
Networking: 1 x Mellanox CX-6 Single Port HDR100 QSFP56
Software: HDFS (Hadoop 3.2.1), Spark 3.0.2 (stand alone)
9. How It Works
Python stack: Dask, cuDF, Pandas → Python → Cython → cuDF C++ → CUDA Libraries → CUDA
Spark stack: Spark DataFrame, Scala, PySpark → Java → JNI bindings → cuDF C++ → CUDA Libraries → CUDA
10. How It Works
DISTRIBUTED SCALE-OUT SPARK APPLICATIONS
APACHE SPARK CORE: Spark SQL, DataFrame, Spark Shuffle
RAPIDS Accelerator for Apache Spark
Spark SQL and DataFrame path:
if gpu_enabled(op, data_type)
    call-out to RAPIDS
else
    execute standard Spark op
JNI bindings (Mapping From Java/Scala to C++) → RAPIDS C++ Libraries
Spark Shuffle path:
● Custom Spark Shuffle
● Optimized for RDMA and GPU-to-GPU transfer
JNI bindings (Mapping From Java/Scala to C++) → UCX Libraries
CUDA
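The per-operator check in the pseudocode above can be sketched in plain Python. This is only an illustration of the decision, not the plugin's implementation: the support table, the `plan_operator` helper, and the tuple it returns are all hypothetical.

```python
# Hypothetical support table: (operation, data type) pairs that are assumed
# to have a GPU implementation. The real plugin keeps its own, much larger list.
GPU_SUPPORTED = {
    ("sum", "long"),
    ("sum", "double"),
    ("max", "long"),
}


def gpu_enabled(op, data_type):
    """Return True when this sketch considers the operation GPU-capable."""
    return (op, data_type) in GPU_SUPPORTED


def plan_operator(op, data_type):
    """Pick where a single operator runs, mirroring the if/else on the slide."""
    if gpu_enabled(op, data_type):
        return ("gpu", op)  # call-out to RAPIDS
    return ("cpu", op)      # execute standard Spark op


print(plan_operator("sum", "long"))
print(plan_operator("sum", "decimal"))
```

The decision is made per operator and per data type, so a single query can end up with a mix of GPU and CPU operators.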
11. How It Works
DataFrame
bar.groupBy(
    col("product_id"),
    col("ds"))
  .agg(
    (max(col("price")) -
     min(col("price"))).alias("range"))
QUERY
SELECT product_id, ds,
  max(price) - min(price) AS range
FROM bar
GROUP BY product_id, ds
Logical Plan
Physical Plan → RDD[InternalRow]
GPU PHYSICAL PLAN
Physical Plan → RDD[ColumnarBatch]
12. Translating a Simple Aggregation Query
[Diagram: CPU PHYSICAL PLAN and GPU PHYSICAL PLAN shown side by side. Both contain Read Parquet File → First Stage Aggregate → Shuffle Exchange → Second Stage Aggregate → Write Parquet File. The diagram also labels a Combine Shuffle Data step and two Convert to Row Format steps.]
14. Opaque User-Defined Functions
• Need to translate logic to GPU operations
• UDFs hide custom logic behind a generic interface
• Custom logic may be supported but difficult to discern
• UDFs can force computation to the CPU
15. Columnar and Row Conversions
• CPU executes row-by-row
• GPU executes in columnar batches
• Data format conversion overhead
• Optimized, but never zero cost
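To make the conversion overhead concrete, here is a minimal sketch in plain Python of pivoting between the two layouts. The helper names and the dict-of-lists batch are assumptions for illustration; Spark and cuDF use their own internal row and columnar formats.

```python
def rows_to_columns(rows, names):
    """Pivot a list of row tuples into a dict mapping column name to values."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def columns_to_rows(columns, names):
    """Pivot a dict of equal-length columns back into a list of row tuples."""
    return list(zip(*(columns[name] for name in names)))


names = ["c", "s"]
rows = [(1, "a"), (2, "b"), (3, "c")]

batch = rows_to_columns(rows, names)        # row format -> columnar batch
round_trip = columns_to_rows(batch, names)  # columnar batch -> row format
```

Each direction touches every value once, so a plan that bounces between CPU and GPU operators pays this cost at every transition.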
17. Automatic Scala UDF Handling
• Optional plugin with the RAPIDS Accelerator
• Uses JVM reflection to analyze UDF bytecode
• Attempts to translate UDF logic to Catalyst operations
  • Common math operations
  • Type casts
  • Conditionals (if, case)
  • Common string operations
  • Date and time parsing via LocalDateTime
18. Scala UDF Example Translation
val myudf = (x: Long, y: String) =>
  s"$y := ${2*x}"
spark.udf.register("myudf", myudf)
sql("SELECT myudf(c, s) as udfcol from data")
Catalyst Expression Tree
Scala UDF → Concat
              ├─ s
              ├─ " := "
              └─ Cast
                   └─ Multiply
                        ├─ 2
                        └─ c
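The expression tree above can be mimicked in plain Python to show why the translation helps: once the UDF is a tree of known operations, it can be evaluated over whole columns instead of row by row. The classes below are hypothetical stand-ins, not Catalyst's API, and null handling is left out.

```python
from dataclasses import dataclass


@dataclass
class Col:
    name: str

    def eval(self, batch):
        return batch[self.name]


@dataclass
class Lit:
    value: object

    def eval(self, batch):
        # Broadcast the literal to the length of the batch.
        n = len(next(iter(batch.values())))
        return [self.value] * n


@dataclass
class Multiply:
    left: object
    right: object

    def eval(self, batch):
        return [a * b for a, b in zip(self.left.eval(batch), self.right.eval(batch))]


@dataclass
class CastToString:
    child: object

    def eval(self, batch):
        return [str(v) for v in self.child.eval(batch)]


@dataclass
class Concat:
    children: list

    def eval(self, batch):
        cols = [c.eval(batch) for c in self.children]
        return ["".join(parts) for parts in zip(*cols)]


# Opaque row-by-row function, analogous to the Scala UDF on the slide.
myudf = lambda x, y: f"{y} := {2 * x}"

# Equivalent tree: Concat(s, " := ", Cast(Multiply(2, c)))
tree = Concat([Col("s"), Lit(" := "), CastToString(Multiply(Lit(2), Col("c")))])

batch = {"c": [1, 2, 3], "s": ["a", "b", "c"]}
row_results = [myudf(x, y) for x, y in zip(batch["c"], batch["s"])]
columnar_results = tree.eval(batch)
```

Both paths produce the same values; the difference is that every node in the tree is an operation the engine can recognize and replace with a GPU version.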
19. Keeping Data on the GPU
Project [if (isnull(c#5L)) null else myudf(knownnotnull(c#5L), s#2) AS udfcol#228]
GpuProject [gpuconcat(, s#2, := , cast((2 * c#5L) as string)) AS udfcol#230]
20. Scala UDF Compiler Limitations
• No looping constructs
• No higher-order functions
• Corner-case semantic differences (e.g., divide-by-zero)
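The divide-by-zero corner case can be illustrated with a small Python sketch. It assumes integer division: JVM code throws on a zero divisor, while Spark SQL in its default (non-ANSI) mode returns NULL. The two function names are hypothetical, and only the zero-divisor behavior is modeled here.

```python
def udf_divide(a, b):
    """Stand-in for the original UDF: like JVM integer division,
    Python raises an exception when the divisor is zero."""
    return a // b


def sql_divide(a, b):
    """Stand-in for the translated expression: modeled on Spark SQL's
    default (non-ANSI) behavior of returning NULL for a zero divisor."""
    if a is None or b is None or b == 0:
        return None
    return a // b


print(sql_divide(6, 3))
print(sql_divide(6, 0))
```

A query that used to fail on bad input could therefore silently produce nulls after translation, which is why the compiler is optional.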
22. Alternate UDF Implementation for GPU
• UDF provides implementation for CPU and GPU
• CPU executes row-by-row
• GPU executes in RAPIDS cuDF columnar batches
• Enables GPU-specific algorithms and optimizations
24. RAPIDS UDF Interface
import ai.rapids.cudf.ColumnVector;
/**
* Evaluate a user-defined function with RAPIDS cuDF columnar inputs
* producing a cuDF column as output
*/
public interface RapidsUDF {
ColumnVector evaluateColumnar(ColumnVector... args);
}
25. Case Study: URLDecode
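import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import org.apache.spark.sql.api.java.UDF1;

public class URLDecode implements UDF1<String, String> {
  /** Row-by-row implementation that executes on the CPU */
  @Override
  public String call(String s) throws UnsupportedEncodingException {
    String result = null;
    if (s != null) {
      // Decodes %XX escapes and converts '+' to a space
      result = URLDecoder.decode(s, "utf-8");
    }
    return result;
  }
}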
26. Case Study: URLDecode
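import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Scalar;
import com.nvidia.spark.RapidsUDF;
import org.apache.spark.sql.api.java.UDF1;

public class URLDecode implements UDF1<String, String>, RapidsUDF {
  […]
  /** Columnar implementation that runs on the GPU */
  @Override
  public ColumnVector evaluateColumnar(ColumnVector... args) {
    ColumnVector input = args[0];
    // Replace '+' with ' ' first, since urlDecode() only handles %XX escapes
    try (Scalar plusScalar = Scalar.fromString("+");
         Scalar spaceScalar = Scalar.fromString(" ");
         ColumnVector replaced = input.stringReplace(plusScalar, spaceScalar)) {
      return replaced.urlDecode();
    }
  }
}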
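The two-step columnar approach (replace `+` with a space, then decode percent escapes) can be checked against a row-by-row decoder using only the Python standard library. This is an analogy to the Java code above rather than the cuDF API: `url_decode_row` and `url_decode_columnar` are hypothetical names, and Python lists stand in for GPU columns.

```python
from urllib.parse import unquote, unquote_plus


def url_decode_row(s):
    """Row-by-row decode, analogous to the CPU call() method."""
    if s is None:
        return None
    return unquote_plus(s, encoding="utf-8")


def url_decode_columnar(column):
    """Whole-column decode in two passes, analogous to evaluateColumnar()."""
    # Pass 1: replace every '+' with a space across the column.
    replaced = [None if s is None else s.replace("+", " ") for s in column]
    # Pass 2: decode %XX escapes across the column.
    return [None if s is None else unquote(s, encoding="utf-8") for s in replaced]


column = ["a+b%2Bc", "100%25", None]
row_results = [url_decode_row(s) for s in column]
columnar_results = url_decode_columnar(column)
```

The order of the passes matters: replacing `+` before decoding keeps an encoded plus sign (`%2B`) from being turned into a space.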
28. Custom Native GPU Code Supported
• Existing cudf Java bindings not required
• UDF can use other CUDA libraries
• Examples in the RAPIDS Accelerator repository
  • Cosine similarity operating on float arrays
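For reference, the computation behind the cosine similarity example can be sketched on the CPU in plain Python. This is not the native GPU code from the repository, only the math it would need to reproduce; the function names and the null-handling choice are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length float arrays.
    Raises ZeroDivisionError if either array has zero magnitude."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_similarity_batch(left, right):
    """Apply the function to two columns of arrays, passing nulls through."""
    return [
        None if a is None or b is None else cosine_similarity(a, b)
        for a, b in zip(left, right)
    ]


left = [[1.0, 0.0], [1.0, 1.0], None]
right = [[0.0, 1.0], [1.0, 0.0], [1.0, 2.0]]
results = cosine_similarity_batch(left, right)
```

A reference like this is also useful as the CPU fallback and as a test oracle for a native GPU version.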
30. Future Work
• Expand support to other user-defined function types
  • UDAF
  • Hive UDTF
• Improved Pandas UDF data transfer
31. Improved Pandas Data Transfer
[Diagram: Pandas UDF data flow between the JVM and Python. CPU path: Row → Arrow → Run Pandas UDF → Arrow → Row. GPU path: Arrow → Arrow → Run Pandas UDF → Arrow → Arrow.]
32. For More Information
• Check out other RAPIDS Accelerator talks
  • SAIS 2020: Deep Dive into GPU Support in Apache Spark 3.x
  • GTC 2021: S31846 Running Large-Scale ETL Benchmarks with GPU-Accelerated Apache Spark
  • GTC 2021: S31822 Accelerating Apache Spark Shuffle with UCX
• The RAPIDS Accelerator is open source
• https://github.com/NVIDIA/spark-rapids