GPU databases are the hottest new thing, with about 7 different companies producing their own variant. In this session, we will discuss why they were created, how they are already disrupting the database world, and what the future of computing holds for them.
This presentation demonstrates how the power of NVIDIA GPUs can be leveraged to both accelerate speed to insight and to scale the amount of hot and warm data analyzed to meet the increasing demands of data scientists and business intelligence professionals alike, as well as to find tactical and strategic insights with greater speed on exponentially growing datasets.
Organizations commonly believe that they are advancing in analytical capabilities due to the rise in the data science profession and the myriad of technologies available for analytics, business intelligence, artificial intelligence and machine learning. However, if you do the math, they are actually falling behind as the increases in the rates of data collection volume far outpace the rate of increases in hot and warm data used for analytics. This is causing organizations to rely on an ever-decreasing percentage of their information assets for decision making.
We talk about why GPU databases were created and share what sets SQream apart from other GPU databases, MPP solutions, in memory and Hadoop based analytic alternatives.
We will also outline how an organization can use GPU databases to thrive in the information revolution by using a significantly greater percentage of its data for analytical purposes, obtaining insights that are desired today, and will remain cost-effective into the next few years when data lakes are expected to balloon from petabytes to exabytes.
2. @arnon86@sqreamtech
Before we start…
•We offer a free consultation and assessment
to anyone here
•We can help you understand the benefits of
using a GPU database
9. @arnon86@sqreamtech
Let’s talk BIG data
Hundreds of TB
(Sometimes even petabytes of data)
coming in at a rate of multiple terabytes per day
Up to 1-4TB
2010 20162008
Up to 10TB
Data is STILL growing exponentially
10. @arnon86@sqreamtech
530 PB
12000
PB
15000
PB
CERN NSA Google
We’re in the petabyte age
• Petabyte datasets are now the norm
• Even small companies have dozens of terabytes of data for analysis
• Some outliers have more:
– CERN processes 1 petabyte per day,
stores 530 PB total
– In 2012, Facebook analyzed 5 petabytes per day,
stores estimated a few exabytes
– The NSA might hold 12 exabytes
14. @arnon86@sqreamtech
What is a GPU?
• A processor specialized for display functions
• The GPU renders images, animations and video for the computer's screen.
15. @arnon86@sqreamtech
What is a GPGPU?
• A general-purpose GPU (GPGPU) is a GPU that performs non-specialized calculations that
would typically be conducted by the CPU.
• Put simply, it’s about taking the GPU and generalizing it for non-graphics.
• AMD and NVIDIA have their own APIs for doing GPGPU programming – rockM and CUDA
respectively.
19. @arnon86@sqreamtech
Gpus all around
• Pretty much all cloud providers now offer GPU instances
• Most hardware vendors offer specially tuned GPU servers
GPUCLOUD
21. @arnon86@sqreamtech
What are GPU Databases?
• A GPU database is a database, relational or non-relational, that uses a GPU to perform
some database operations
• Most of the GPU databases tend to focus on analytics, and they’re offering it to a market
that was oversold on Hadoop for Big Data analytics
• And they’re typically pretty fast
And they’re not only disrupting the in-memory crowd
• GPU databases are more flexible in processing many different types of data, or much
larger amounts of data
22. @arnon86@sqreamtech
Why gpus in big data?
• High core count allows offloading of ‘heavy’ stuff like JOINs, ORDER BY, GROUP BY from the
CPU to the GPU
• Compression and Decompression processes reduce PCI and disk I/O. These are basically
free on the GPU
• Can also use GPU to do computationally intensive operations like deep learning,
cryptography.
23. @arnon86@sqreamtech
Today’s data market - databases
• A lot of new databases are in-memory, because “memory is cheap”
• In-memory can’t handle more than ~2TB without very expensive hardware
• Scaling out with in-memory gets very expensive, very fast:
8 SAP HANA machines for handling 40TB has a TCO of $22,000,000 for 4 years
24. @arnon86@sqreamtech
There’s more than one type of gpu database
In-memory GPU databases
• Typically for small datasets
• Stores data in-memory
• Very fast performance (milliseconds)
• For relatively simple queries
• Limited due to memory constraints
Big Data GPU databases
• Typically for giant datasets
• Stores data on-disk
• Fast performance (seconds-minutes)
• For complex queries
• Theoretically unlimited data-sets
• A good fit for today’s evolving needs
25. @arnon86@sqreamtech
Don’t BUY hardware, BUY the results
• Your boss (probably) does not care about the chips in the servers
• GPU is a cool buzzword, but buzzwords alone won’t get the job done
• Achieve incredible speeds without betting the (server) farm
• Evaluate databases based on functionality and what they can do for you
27. @arnon86@sqreamtech
Understanding 40m telecom customers with sqream db
Tracking customer behaviour at a large national mobile telecom operator with Tableau and
SQream DB to improve offering and increase revenue
28. @arnon86@sqreamtech
Understanding 40m telecom customers with sqream db
Understanding 40 million customers with SQream DB
80 nodes – 5 full racks
7600 CPU cores
SQream DB v1.9.6
HP Server with NVIDIA Tesla
96 GB RAM + 6 TB storage
Ingest time
Reporting time
Cost of Ownership $$$10,000,000
120 m
300 m 20 m
10 m
$200,000
29. @arnon86@sqreamtech
33.70
4.0
56
12,000,000
The cost of performance
ACV calculation on 24 TB of data, 300B rows, 8 different tables - with complex, nested joins
31.70
4.7
4
500,000
Netezza
8 full 42U racks, 56 S-Blades
7 TB RAM
SQream DB v1.9.6
Dell C4130 with 4x NVIDIA Tesla K80
512 GB RAM + iSCSI JBOD (20TB)
Average query time
(seconds)
Processing Units
(S-Blade / GPUs)
Compression ratio
Cost of Ownership $$
30. Major ad-tech increased revenues by improving bids
A major ad-tech deployed an 8 GPU SQream DB instances to unlock more insights from their Hadoop
cluster
Why they chose SQream DB
• TRILLIONS of ad impressions monthly equate to 360TB (raw).
This was too slow with Hadoop / Phoenix.
• Live analytics was unavailable due to Hadoop limitations
• The need to construct bidding histograms for dynamic CPM campaigns was extremely time-consuming
in the current system – query time around 5 hours!
8x NVIDIA Tesla GPUs
Qumulo NAS – 360TB
32. @arnon86@sqreamtech
Genome Research - Speed & Scale
SQream and Sheba medical center cut cancer cure research time from years to weeks
200 GB
Average size of a single human
genome sequencing
2 Months
Time it takes a genome researcher to
compare a handful of sequences
1 PB
The amount of storage needed by a
genome research institute
2 Hours
Time it takes a researcher to
compare up to hundreds of
sequences with SQream DB
x100
Factor of
improvement over
existing methods
37. @arnon86@sqreamtech
Don’t be afraid of the future
• We know new databases are scary
• It’s a risk, but the reward is big
• Innovate all aspects of your data pipeline
Incremental Cold Fusion
The
scary
zone
38. @arnon86@sqreamtech
How we see the future of GPU databases
• The future is not just GPU databases. Different databases for different needs.
The relational model is still king for most of us
• More data = more processing power needed.
Scalable database solutions that can handle growing data become more relevant
• GPUs used for compute intensive stuff, e.g. graph processing, machine learning, AI
• Rising GPU offerings in the public cloud will allow adoption by more companies
GPUCLOUD
39. @arnon86@sqreamtech
How we see the future – hardware/Stack
• Improved programming extensions and better compilers in new CUDA/rockM will make it
easier to write good GPU code
• Faster HBM2 memory and PCIe v5.0 to reduce overhead of GPU processing
• More tightly-knit hardware integration, like the Intel H-series integrated GPU processor
41. @arnon86@sqreamtech
Don’t BUY hardware, BUY the results
• Your boss (probably) does not care about the chips in the servers
• GPU is a cool buzzword, but buzzwords alone won’t get the job done
• Achieve incredible speeds without betting the (server) farm
• Evaluate databases based on functionality and what they can do for you