A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases

The Glass Half Full
Using Programmable Hardware Accelerators in
Analytical Databases
Zsolt István
IMDEA Software Institute 1

IMDEA Software
Institute
• 16 Faculty in the areas of:
• Program Analysis and Verification
• Languages and Compilers
• Security and Privacy
• Theoretical Computer Science
• Distributed Systems and Databases
• ~10 Post-docs, ~25 PhD Students,
~10 Interns
• Located in UPM Montegancedo Campus,
Madrid
• We are hiring! https://software.imdea.org/

▪ OLAP – Online Analytical Processing
▪ Large datasets – up to TBs
▪ Ad-hoc querying to extract insight, recurring
reporting – Possibly complex operations
▪ Read-mostly workloads, updates in batches
▪ OLTP – Online Transaction Processing
▪ Smaller datasets
▪ Queries known, relate to business actions
▪ Makes heavy use of indexes
▪ Reads and updates intermixed
3
Context: Analytical Databases

4
Databases were a 25 Billion $ market in 2018…
Could we specialize machines to them?
https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/

▪ Fully custom machine for databases
▪ Processors – special ISA microprocessors
▪ Memory – magnetic bubbles and CCDs
▪ Semiconductor technology and
general purpose CPUs took over
5
Database Computer – ’70s
“The first goal is to design it with the
capability of handling a very large on-line
database of 10^10 bytes or beyond since
special-purpose machines are not likely to
be cost-effective for small databases.”
Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases.
IEEE Trans. Computers 28(6): 414-429 (1979)

▪ Based on VAX multi-
processor system
▪ By the time the software
and hardware were
developed, CPUs have
become much faster
▪ Couldn’t keep up with
Moore’s law
6
Gamma Machine – ’80s
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High
Performance Dataflow Database Machine. VLDB 1986: 228-237

7
Data/Compute Gap
CPU Scaling Commodity in Cloud
Specialized
Hardware
Revival

8
Renewed interest in Specialized Hardware
ASICsFPGAsCPUs

Field Programmable Gate Array (FPGA)
▪ Free choice of architecture
▪ Fine-grained pipelining,
communication, distributed memory
▪ Tradeoff: all “code” occupies chip
space
▪ Evolving platform: larger chips, more
heterogeneity
9
Re-programmable Specialized Hardware
Op 1
Op 2
Op 3

10
Integration Options
Accel.
1) On the side 2) In data-path 3) Co-processor
Data
Data Data
Accel.
Accel.

▪ Accelerator
▪ Amazon F1
▪ In data path
▪ Microsoft Catapult
▪ Co-processor
▪ Intel Xeon+FPGA
11
In the Cloud Today
Socket1 Socket2
CPU FPGA
Socket1
CPU FPGA
CPU
FPGA
Intel Xeon+FPGA Gen.1 Intel Xeon+FPGA Gen.2

▪ 1) On the side acceleration introduces overhead
▪ Many related work offers no real speedup if we factor in data
movement, transformation, software overhead… 13
The Glass Half Empty…
0
20
40
60
80
100
120
Software With Acceleration
Query execution time
Compute Data Movement
2xAccel.
Data

▪ 2) “All or nothing” behavior makes query planning difficult
▪ Example: fixed capacity hash table on FPGA
▪ Constant time access for reads and writes
▪ What happens if data doesn’t fit?
▪ Can’t always know the number of keys aprioi
14
#

▪ 3) Analytical databases becoming more optimized / not much
compute in core SQL
▪ X100 [CIDR05] showed that <10% of compute time spent on SQL
operators +,-,*,SUM,AVG in analytical queries
▪ Columnar stores often memory bound (10s of GB/s)
15

▪ On the side acceleration introduces overhead
▪ “All or nothing” behavior makes query planning difficult
▪ Analytical databases becoming more optimized / not much
compute in core SQL
16

▪ On the side acceleration introduces overhead
✓ Reduce data movement bottlenecks
17
The Glass Half Full…

▪ IBEX: Database storage engine with processing offload
▪ Filter and pre-aggregate for analytic workloads
18
Processing in data path: Smart Flash
Database Server
IBEX
SSD
IBEX – An Intelligent Storage Engine with Support for Advanced SQL Off-loading. L. Woods, Z. Istvan
and G. Alonso, VLDB’14
→ Larger bandwidth, more IOPS
(Samsung YourSQL, MIT BlueDBM)
▪ Opportunity to extend SSDs/Flash
with complex offload
Samsung “smart” SSD

19
Processing in data path: Distributed Processing
Workers(Compute)
Storage
+ Provisioning
+ Scalability
Caribou: Distributed
storage with processing
• Specialized HW nodes
• 10Gbps access
• 25W power cons.
Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.

20
Smart Storage in Databases: Filter push-down
Intel Hyperscan library (Xeon E5-2680 v2)
2.8x
SELECT … FROM customer
WHERE age<35 AND purchases>2
AND address LIKE “%PO. Box 123%”
▪ Challenge: guarantee that filtering never slows down retrieval
▪ Algorithms can be re-imagined to become bandwidth-bound
instead of compute-bound
▪ Extend the state of the art: parameterization without re-programming [FCCM16]
▪ Many options: Regular expressions, comparisons, decompression, …
[FCCM16] Runtime Parameterizable Regular Expression Operators for Databases. Zs. Istvan, D. Sidler, G. Alonso. FCCM’16

▪ “All or nothing” behavior makes query planning difficult
✓ Hybrid processing
21

▪ Group-by: Compute aggregate function over categories
▪ select avg(salary) from employees group by department
22
IBEX’s Hybrid Group-by
CPUIbex with SW-only Group-By
Projection Selection Group-by
Final
Group
s
Input table
Filtered
data

23
CPUIbex with HW-only Group-By
Final
Group
s
Input table
Filtered
data

CPUIbex with HW-only Group-By
Final
Group
s
Input table
Filtered
data
▪ If number of groups does not fit on FPGA?
▪ Send partial aggregates – finalize in SW
▪ Worst case: same as no acceleration
▪ Best- case: All in HW!
24
CPUIbex with Hybrid Group-by
Input table Projection Selection Group-by Group-by
Final
Group
s
Filtered
data
Partial
Group
s
Challenge: How to split across accelerator and software?

✓ Hybrid Processing
▪ Analytical databases becoming more optimized / not much
compute in core SQL
✓ Emerging compute-intensive workloads
25
The Glass Half Full

▪ Databases adopting new ways of analyzing the data
▪ SAP Hana, Oracle, SQL Server, etc.
▪ Specialized hardware can help both with model building [Kara18],
inference [Owaida18]
▪ Benefits for “classical” algorithms as well
[Kara18] Kara et al: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018)
[Owaida18] Owaida et al: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-300
26
The Rise of Machine Learning

CPUFPGA Co-processor
27
doppioDB: a hybrid database engine
Database
Engine
(MonetDB)
Hardware
Operator
Software
operator
Software
operator
Hardware
Operator
Hardware
Operator
▪ Goal: extend the capabilities of analytical databases
▪ FPGA works on the same data as software (cache-coherent access)
▪ Can combine SW and HW operators inside the same query
▪ Challenge: ensure high utilization of FPGA, use in many queries
DRAM (DB Tables)
No data copy,
transformation,
partitioning, etc.
Hardware
Operator

K-means – Algorithm
◼ Goal: partition unlabeled data into several
clusters, where the number of clusters is
the “k” in the k-means.
◼ Two steps in each iteration:
◼ Assignment: assign data points to
closet centroid according to distance
metric
◼ Centroid update: the centroids are re-
calculated by averaging all the data
points within each cluster
◼ Long process if the data set and number of
iterations are large
28

DRAM
(DB Tables)
Design – Execution Walk-Through
Receives K-Means parameters1
Fetch the initial centroids and
the data
2
3 Calculates the distance between
a data point and all the centroids
and assign it to closest centroid
4 Accumulates data points per cluster and
counts how many data points are assigned to
each cluster
Collect partial results from each pipeline5
Division for updating new centroid6
Writes back the final results7
1
2
3 4
56
7
Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 2018
29

30
Uses of Parallelism
K is known /
Centroids
known
Need to determine K
(Elbow method)
▪ K-Means algorithm
▪ FPGA outperforms several cores of the CPU
▪ Can use parallelism in two ways – cover more queries

▪ Text: Regular expression matching, Edit distance, …
▪ Database ops.: Skyline queries, Group-by aggregations, …
▪ Statistics: Histograms, Count-min sketch, Bloom filters, …
▪ Machine learning: Clustering (K-means), Stochastic Gradient
Descent, Decision Trees, …
▪ Data management: Hash tables, hash functions, …
▪ [Your algorithm here]
31
Wide range of algorithms can benefit from hardware

✓ Hybrid Processing
✓ Emerging compute-intensive workloads
32
Future Challenges…
▪ Managing Programmable Hardware accelerators
▪ Is this the job of the OS or does the DB has to take control?
▪ How to share programmable hardware across tenants
▪ Compilation/synthesis of hardware accelerators
▪ Can we derive accelerators from user queries?
▪ Intermediary DSL or building blocks we could use?
For more details, see: The Glass Half Full: Using Programmable Hardware Accelerators in Analytics. Z. István. IEEE Data Engineering
Bulletin, March 2019.

A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases

Ähnlich wie A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases (20)

Mehr von Facultad de Informática UCM

Mehr von Facultad de Informática UCM (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Databases