GPU databases - How to use them and what the future holds

@arnon86 @sqreamtech
GPU DATABASES:
HOW TO USE THEM
AND WHAT THE FUTURE HOLDS
or
GD: HTUT AWTFH
for short

Before we start…
• We offer a free consultation and assessment to anyone here
• We can help you understand the benefits of using a GPU database

Who I am
• From Israel
• 4 years at SQream
• Originally part of the dev team
• Tweet about animals a lot: @arnon86

Who I am
• A big aviation nerd

“Moore’s law is ending”

“The consensus was that if we could keep
doing that, if we could go to chips with
1,000 cores, everything would be fine,”

“It turns out that’s really hard”
Dr. Doug Burger, an expert in chip design at Microsoft.

So we just go parallel, right?

Let’s talk BIG data
• 2008: up to 1-4TB
• 2010: up to 10TB
• 2016: hundreds of TB (sometimes even petabytes of data), coming in at a rate of multiple terabytes per day
Data is STILL growing exponentially

We’re in the petabyte age
(Chart: CERN 530 PB, NSA 12,000 PB, Google 15,000 PB)
• Petabyte datasets are now the norm
• Even small companies have dozens of terabytes of data for analysis
• Some outliers have more:
– CERN processes 1 petabyte per day and stores 530 PB in total
– In 2012, Facebook analyzed 5 petabytes per day; it stores an estimated few exabytes
– The NSA might hold 12 exabytes

Are we only analyzing the tip of the iceberg?

What we’ll talk about
• Why GPUs?
• What are GPU databases?
• When are GPU databases a good fit?
• The future

What is a GPU?
• A processor specialized for display functions
• The GPU renders images, animations and video for the computer's screen.

What is a GPGPU?
• A general-purpose GPU (GPGPU) is a GPU that performs non-specialized calculations that would typically be conducted by the CPU.
• Put simply, it’s about taking the GPU and generalizing it for non-graphics workloads.
• AMD and NVIDIA each have their own platform for GPGPU programming: ROCm and CUDA, respectively.
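To make the GPGPU idea concrete, here is a minimal Python sketch that emulates how a CUDA kernel launch divides work: every thread runs the same function and uses its block and thread index to pick one element. The helper names (`launch_grid`, `make_scale_kernel`) are hypothetical, and the "threads" run one after another here; on a real GPU they would run concurrently across thousands of cores.

```python
import math
from typing import Callable, List

# Hypothetical helper: emulates a 1-D CUDA grid launch on the CPU.
def launch_grid(kernel: Callable[[int, int, int], None], n: int, block_dim: int = 256) -> None:
    """Call `kernel` once per (block, thread) pair, like a <<<grid, block>>> launch in CUDA.

    On a GPU these calls run concurrently; here they run sequentially.
    """
    grid_dim = math.ceil(n / block_dim)  # enough blocks to cover all n elements
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx)

def make_scale_kernel(src: List[float], dst: List[float], factor: float) -> Callable[[int, int, int], None]:
    def scale_kernel(block_idx: int, block_dim: int, thread_idx: int) -> None:
        # Global index: the same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA
        i = block_idx * block_dim + thread_idx
        if i < len(src):  # guard: the last block may contain spare threads
            dst[i] = src[i] * factor
    return scale_kernel

data = [float(x) for x in range(1000)]
out = [0.0] * len(data)
launch_grid(make_scale_kernel(data, out, 2.0), len(data))
print(out[:3], out[-1])  # [0.0, 2.0, 4.0] 1998.0
```

The bounds check matters because the grid is rounded up to whole blocks: 1000 elements with 256 threads per block launch 1024 threads.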

Let’s talk core count

Tesla P100 – 3,584 CUDA cores

It’s not a strange piece of hardware

GPUs all around
• Pretty much all cloud providers now offer GPU instances
• Most hardware vendors offer specially tuned GPU servers

How GPU acceleration works

What are GPU Databases?
• A GPU database is a database, relational or non-relational, that uses a GPU to perform some database operations
• Most GPU databases focus on analytics, targeting a market that was oversold on Hadoop for big data analytics
• And they’re typically pretty fast
And they’re not only disrupting the in-memory crowd:
• GPU databases can be more flexible, processing many different types of data or much larger amounts of data

Why GPUs in big data?
• A high core count allows offloading ‘heavy’ operations such as JOIN, ORDER BY and GROUP BY from the CPU to the GPU
• Compression and decompression reduce PCIe and disk I/O, and on the GPU they are basically free
• The GPU can also run other computationally intensive workloads, such as deep learning and cryptography
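GROUP BY offloads well because aggregation splits into independent pieces. Below is a minimal Python sketch of that general pattern, not SQream DB’s actual implementation: each worker (standing in for a group of GPU threads) computes partial sums over its own slice, and a final step merges the small partial results. The function names are hypothetical, and the slices are processed sequentially here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (group key, value), e.g. (country, revenue)

def partial_aggregate(rows: Iterable[Row]) -> Dict[str, float]:
    """Step 1: each worker sums its own slice of rows, independently of the others."""
    sums: Dict[str, float] = defaultdict(float)
    for key, value in rows:
        sums[key] += value
    return dict(sums)

def merge(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Step 2: combine the small per-worker results into the final answer."""
    total: Dict[str, float] = defaultdict(float)
    for part in partials:
        for key, value in part.items():
            total[key] += value
    return dict(total)

def group_by_sum(rows: List[Row], workers: int) -> Dict[str, float]:
    """SELECT key, SUM(value) ... GROUP BY key, split across `workers` slices."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1 row per slice
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    # On a GPU the partial aggregates would run in parallel; here they run in a loop
    return merge([partial_aggregate(s) for s in slices])

rows = [("IL", 10.0), ("US", 5.0), ("IL", 2.5), ("DE", 1.0), ("US", 4.0)]
print(group_by_sum(rows, workers=2))  # {'IL': 12.5, 'US': 9.0, 'DE': 1.0}
```

SUM is used because partial sums merge by simple addition; the result does not depend on how many workers the rows are split across.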

Today’s data market - databases
• A lot of new databases are in-memory, because “memory is cheap”
• In-memory databases can’t handle more than ~2TB without very expensive hardware
• Scaling out with in-memory gets very expensive, very fast:
8 SAP HANA machines handling 40TB have a TCO of $22,000,000 over 4 years
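Breaking the SAP HANA figure down makes the scaling cost easier to compare with other setups. This is plain arithmetic on the numbers quoted on the slide ($22,000,000, 8 machines, 40TB, 4 years), assuming the TCO is spread evenly across years, machines and terabytes.

```python
# Figures quoted on the slide
TCO_USD = 22_000_000
YEARS = 4
MACHINES = 8
DATA_TB = 40

# Assumes cost is spread evenly across years, machines and terabytes
per_year = TCO_USD / YEARS              # USD per year
per_machine = TCO_USD / MACHINES        # USD per machine over the 4 years
per_tb_per_year = per_year / DATA_TB    # USD per TB of data per year

print(f"per year: ${per_year:,.0f}")                  # per year: $5,500,000
print(f"per machine (4 years): ${per_machine:,.0f}")  # per machine (4 years): $2,750,000
print(f"per TB per year: ${per_tb_per_year:,.0f}")    # per TB per year: $137,500
```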

There’s more than one type of GPU database
In-memory GPU databases
• Typically for small datasets
• Stores data in-memory
• Very fast performance (milliseconds)
• For relatively simple queries
• Limited due to memory constraints
Big Data GPU databases
• Typically for giant datasets
• Stores data on-disk
• Fast performance (seconds-minutes)
• For complex queries
• Theoretically unlimited dataset sizes
• A good fit for today’s evolving needs
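The “theoretically unlimited” dataset size of on-disk GPU databases rests on processing data in chunks: only one chunk needs to be in (GPU) memory at a time, and a small running result is carried between chunks. A minimal Python sketch of that streaming pattern; `read_chunks` is a hypothetical stand-in for reading compressed column chunks from disk, and the column values are fake.

```python
from typing import Iterator, List, Optional

def read_chunks(total_rows: int, chunk_rows: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for reading one column from disk in fixed-size chunks."""
    for start in range(0, total_rows, chunk_rows):
        stop = min(start + chunk_rows, total_rows)
        yield list(range(start, stop))  # fake column values 0, 1, 2, ...

def streaming_stats(chunks: Iterator[List[int]]) -> dict:
    """COUNT, SUM, MAX and AVG computed one chunk at a time."""
    count, total = 0, 0
    peak: Optional[int] = None
    for chunk in chunks:  # only the current chunk is held in memory
        if not chunk:
            continue
        count += len(chunk)
        total += sum(chunk)
        chunk_max = max(chunk)
        peak = chunk_max if peak is None else max(peak, chunk_max)
    return {"count": count, "sum": total, "max": peak,
            "avg": total / count if count else None}

print(streaming_stats(read_chunks(total_rows=1_000_000, chunk_rows=100_000)))
# {'count': 1000000, 'sum': 499999500000, 'max': 999999, 'avg': 499999.5}
```

Memory use depends on the chunk size rather than the table size, which is why the approach trades some latency (seconds to minutes) for scale.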

Don’t BUY hardware, BUY the results
• Your boss (probably) does not care about the chips in the servers
• GPU is a cool buzzword, but buzzwords alone won’t get the job done
• Achieve incredible speeds without betting the (server) farm
• Evaluate databases based on functionality and what they can do for you

Understanding 40M telecom customers with SQream DB
Tracking customer behaviour at a large national mobile telecom operator with Tableau and SQream DB to improve offerings and increase revenue

Understanding 40 million customers with SQream DB
• 80 nodes – 5 full racks, 7,600 CPU cores: cost of ownership $10,000,000
• SQream DB v1.9.6 on an HP server with NVIDIA Tesla, 96 GB RAM + 6 TB storage: cost of ownership $200,000
• Ingest and reporting times shown: 120 m, 300 m, 20 m, 10 m

The cost of performance
ACV calculation on 24 TB of data (300B rows, 8 different tables) with complex, nested joins

Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
• Average query time: 33.70 seconds
• Processing units: 56 S-Blades
• Compression ratio: 4.0
• Cost of ownership: $12,000,000

SQream DB v1.9.6: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20TB)
• Average query time: 31.70 seconds
• Processing units: 4 GPUs
• Compression ratio: 4.7
• Cost of ownership: $500,000
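The Netezza vs. SQream DB comparison boils down to a few ratios. Plain arithmetic on the slide’s figures: average query times are similar, so most of the difference lies in hardware footprint and cost.

```python
# Figures from the slide
netezza = {"query_s": 33.70, "units": 56, "compression": 4.0, "tco_usd": 12_000_000}
sqream = {"query_s": 31.70, "units": 4, "compression": 4.7, "tco_usd": 500_000}

cost_ratio = netezza["tco_usd"] / sqream["tco_usd"]   # how many times more Netezza costs
unit_ratio = netezza["units"] / sqream["units"]       # S-Blades vs. GPUs
speedup = netezza["query_s"] / sqream["query_s"]      # > 1 means SQream DB answered faster on average

print(f"{cost_ratio:.0f}x cost, {unit_ratio:.0f}x processing units, {speedup:.2f}x query speed")
# 24x cost, 14x processing units, 1.06x query speed
```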

Major ad-tech increased revenues by improving bids
A major ad-tech company deployed an 8-GPU SQream DB instance to unlock more insights from its Hadoop cluster
Why they chose SQream DB
• TRILLIONS of monthly ad impressions equate to 360TB of raw data, which was too slow to analyze with Hadoop / Phoenix
• Live analytics was unavailable due to Hadoop limitations
• Constructing bidding histograms for dynamic CPM campaigns was extremely time-consuming in the existing system, with query times of around 5 hours!
8x NVIDIA Tesla GPUs
Qumulo NAS – 360TB
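A bidding histogram like the one in this case study is essentially a GROUP BY over price buckets: each bid’s CPM is mapped to a fixed-width bucket and the bids per bucket are counted. A minimal Python sketch; the bucket width and sample bids are hypothetical, and the real queries ran in SQL on SQream DB.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def bid_histogram(cpm_bids: Iterable[float], bucket_width: float) -> Dict[float, int]:
    """Count bids per fixed-width CPM bucket, keyed by the bucket's lower edge.

    Same shape as a SQL GROUP BY on FLOOR(cpm / bucket_width).
    """
    if bucket_width <= 0:
        raise ValueError("bucket_width must be positive")
    counts = Counter(math.floor(cpm / bucket_width) for cpm in cpm_bids)
    # Map bucket indices back to price edges, sorted by price; empty buckets are omitted
    return {round(idx * bucket_width, 2): n for idx, n in sorted(counts.items())}

bids = [0.42, 0.55, 1.10, 1.25, 1.30, 2.05]  # hypothetical CPM bids in USD
print(bid_histogram(bids, bucket_width=0.5))  # {0.0: 1, 0.5: 1, 1.0: 3, 2.0: 1}
```

Because each bid maps to its bucket independently, the counting parallelizes the same way as any other GROUP BY.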

Let’s see it in action

Genome Research - Speed & Scale
SQream and Sheba Medical Center cut cancer cure research time from years to weeks
• 200 GB: average size of a single sequenced human genome
• 2 months: time it takes a genome researcher to compare a handful of sequences
• 1 PB: the amount of storage needed by a genome research institute
• 2 hours: time it takes a researcher to compare up to hundreds of sequences with SQream DB
• x100: factor of improvement over existing methods

Chanel says racks are fashionable. Our customers
think otherwise

BE EFFICIENT with your hardware
This configuration can analyze ~40TB of data
SQream DB with Tesla cards

Environmentally friendly
Certified servers enabled with GPUs, and certified storage

Let’s talk about the future

Don’t be afraid of the future
• We know new databases are scary
• It’s a risk, but the reward is big
• Innovate all aspects of your data pipeline
(Diagram: a range from “Incremental” to “Cold Fusion”, with a region labeled “The scary zone”)

How we see the future of GPU databases
• The future is not just GPU databases. Different databases for different needs.
The relational model is still king for most of us
• More data = more processing power needed.
Scalable database solutions that can handle growing data become more relevant
• GPUs will be used for compute-intensive work such as graph processing, machine learning and AI
• Rising GPU offerings in the public cloud will allow adoption by more companies

How we see the future – hardware/stack
• Improved programming extensions and better compilers in new CUDA/ROCm releases will make it easier to write good GPU code
• Faster HBM2 memory and PCIe 5.0 will reduce the overhead of GPU processing
• More tightly knit hardware integration, like Intel’s H-series processors with integrated GPUs
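The PCIe point is easy to quantify. A back-of-the-envelope Python sketch, assuming theoretical per-direction bandwidths for an x16 link (about 15.75 GB/s for PCIe 3.0 and about 63 GB/s for PCIe 5.0) and ignoring protocol overhead and latency. It also shows why GPU-side decompression helps: compressed data crosses the bus proportionally faster. The 100 GB size and 4:1 ratio are hypothetical inputs.

```python
# Assumed theoretical per-direction rates; real throughput is lower due to protocol overhead.
def pcie_x16_gb_per_s(gt_per_s: float) -> float:
    """One-direction bandwidth of an x16 link using 128b/130b encoding (PCIe 3.0 and later)."""
    lanes = 16
    return gt_per_s * lanes * (128 / 130) / 8  # gigatransfers/s -> gigabytes/s

def transfer_seconds(data_gb: float, compression_ratio: float, gt_per_s: float) -> float:
    """Time to move data over PCIe when it is sent compressed and decompressed on the GPU."""
    return (data_gb / compression_ratio) / pcie_x16_gb_per_s(gt_per_s)

# 100 GB of raw column data; the 4:1 compression ratio is a hypothetical example
for name, rate in [("PCIe 3.0", 8.0), ("PCIe 5.0", 32.0)]:
    raw = transfer_seconds(100, 1.0, rate)
    compressed = transfer_seconds(100, 4.0, rate)
    print(f"{name}: {raw:.2f} s uncompressed, {compressed:.2f} s at 4:1 compression")
```

Under these assumptions, a 4:1 ratio on PCIe 3.0 moves data about as fast as uncompressed transfers on PCIe 5.0.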

Reminder
• We offer a free consultation and assessment to anyone here
• We can help you understand the benefits of using a GPU database
