The earliest relational databases were monolithic on-premise systems that were powerful and full-featured. Fast forward to the Internet and NoSQL: BigTable, DynamoDB and Cassandra. These distributed systems were built to scale out for ballooning user bases and operations. As more and more companies vied to be the next Google, Amazon, or Facebook, they too "required" horizontal scalability.
But in a real way, NoSQL and even NewSQL have forgotten single node performance where scaling out isn't an option. And single node performance is important because it allows you to do more with much less. With a smaller footprint and simpler stack, overhead decreases and your application can still scale.
In this talk, we describe TimescaleDB's methods for single node performance. The nature of time-series workloads and how data is partitioned allows users to elastically scale up even on single machines, which provides operational ease and architectural simplicity, especially in cloud environments.
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
1. Databases Have Forgotten
About Single Node Performance,
A Wrongheaded Trade Off
Matvey Arye, PhD
Core Database Engineer
mat@timescale.com · github.com/timescale
8. But…
MapReduce
specializes in
long jobs that
touch lots of data
• Grep: 150 Seconds
• Sort: 890 Seconds
• Average job completion time of
Google production workloads: 634
Seconds
(Source: Google MapReduce Paper)
13. The high-latency of distributed databases is nothing new
Core
s
Twitter_rv uk_2007_5
Spark 128 1784s 8000s+
Giraph 128 200s 8000s+
GraphLab 128 242s 714s
Graphx 128 251s 800s
Laptop 1 153s 417s
Laptop* 1 15s 30s
Graph Connected Component Analysis
Source: Scalability! But at what COST? - McSherry et al.
Core
s
Twitter_r
v
uk_2007_5
Spark 128 857s 1759s
Giraph 128 596s 1235s
GraphLab 128 249s 833s
Graphx 128 419s 462s
Laptop 1 300s 651s
Page Rank (20 iterations)
14. But that was 10 years ago
The world has changed
160 GB Spinning Rust
2 Core CPU
4GB RAM
60 TB SSDs
64 Core CPU
TBs of RAM
Edge computing
GPUs, TPUs
Then Now
33. Single node: Scaling up via adding disks
• Faster inserts
• Parallelized queries
How Benefit
Chunks spread across many disks (elastically!)
either RAIDed or via distinct tablespaces
36. Avoid querying chunks via constraint exclusion (more)
SELECT time, device_id, temp FROM data
WHERE time > now() - interval ’24 hours’ Won’t exclude
chunks in plain
PostgreSQL
37. Example Query Optimization
CREATE INDEX ON readings(time);
SELECT date_trunc(‘minute’, time) as bucket,
avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
Will this usethe index?
48. Lessons for DB designers
1. Consider (insights!):
A. Partitioning even on single-node
• Even across disks
B. Performance analysis:
• High-level: Query optimization
• Low-level: Profiling
C. High-availability is possible on single-node
2. Concentrate on scale-up
49. Lessons for DB users
• Absolute performance as important as scaling numbers.
• Don’t go horizontally distributed unless you have to.
• HA not same as horizontal scalability.
• Replication cheaper than distribution.
• SQL and ACID are extremely useful.
50. Open Source (Apache 2.0)
• github.com/timescale/timescaledb
Join the Community
• slack.timescale.com