The rise of observability practices and Kubernetes adoption places ever more demanding requirements on monitoring systems. Volumes of time series data grow exponentially, and older solutions simply can’t keep up. This talk covers how and why we created a new open source time series database from scratch: which architectural decisions and trade-offs we had to make in order to meet the new expectations and handle 100 million metrics per second with VictoriaMetrics. The talk will be interesting to software engineers and DevOps practitioners familiar with observability and modern monitoring systems, and to anyone interested in building scalable, high-performance time series databases.
3. Let’s meet
● I’m Aliaksandr Valialkin - core developer @ VictoriaMetrics
● I like writing programs in Go
● I like simple and clear code
doSimpleThing1()
doSimpleThing2()
● I hate over-engineered code, useless abstractions and bloated dependencies
abstractSingletonFabricProducerVisitorOperatorPrototype
● I like performance optimizations (fasthttp, fastjson, quicktemplate, fastcache)
● https://github.com/valyala/
9. What is VictoriaMetrics?
● Open source monitoring solution and time series database
● Supports popular data ingestion protocols - Prometheus, InfluxDB, Graphite, DataDog, OpenTSDB, CSV, JSON
● Can discover and scrape Prometheus targets (Kubernetes too)
● Easy to set up and operate
● Low resource usage
● High performance
18. VictoriaMetrics single-node: scaling data ingestion
● Read incoming data in blocks
● Process blocks in parallel on multiple CPU cores
[Diagram: client data is read as blocks, and the blocks are processed in parallel on CPU_1…CPU_N inside VictoriaMetrics]
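The two steps above can be sketched as a worker-per-CPU pipeline. This is a hypothetical illustration, not VictoriaMetrics' actual code: incoming data arrives as blocks on a channel, and one goroutine per CPU core drains it, so ingestion throughput scales with the number of cores.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// block is a stand-in for a batch of incoming rows.
type block struct{ rows []int }

// processBlocks drains the channel with one worker per CPU core and
// returns the total number of rows processed.
func processBlocks(blocks <-chan block) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for i := 0; i < runtime.NumCPU(); i++ { // one worker per CPU core
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range blocks {
				n := len(b.rows) // stand-in for real parsing work
				mu.Lock()
				total += n
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	blocks := make(chan block)
	go func() {
		for i := 0; i < 100; i++ {
			blocks <- block{rows: make([]int, 10)}
		}
		close(blocks)
	}()
	fmt.Println(processBlocks(blocks)) // 100 blocks * 10 rows = 1000
}
```

The key property is that blocks are independent units of work, so no coordination is needed between workers while parsing.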
19. VictoriaMetrics single-node: scaling data ingestion (tech details)
● Put the parsed data into independent buffers
● Periodically store buffers to disk as independent LSM parts
[Diagram: CPU_1…CPU_N parse blocks into independent in-memory buffers Buffer_1…Buffer_M, which are compressed and stored on disk as LSM parts Part_1…Part_P]
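The buffering scheme above can be sketched as follows. This is a hypothetical simplification, not VictoriaMetrics' actual code: each worker owns an independent in-memory buffer (avoiding cross-CPU contention), and a flush step turns a full buffer into an immutable "part", as in an LSM tree.

```go
package main

import (
	"fmt"
	"sync"
)

// buffer is an independent in-memory buffer owned by one ingestion worker.
type buffer struct {
	mu   sync.Mutex
	rows []int
}

func (b *buffer) add(v int) {
	b.mu.Lock()
	b.rows = append(b.rows, v)
	b.mu.Unlock()
}

// flush converts the buffer's contents into an immutable part and resets it,
// so ingestion can continue into a fresh buffer while the part is persisted.
func (b *buffer) flush() []int {
	b.mu.Lock()
	part := b.rows
	b.rows = nil // the old slice becomes a read-only LSM part
	b.mu.Unlock()
	return part
}

func main() {
	bufs := []*buffer{{}, {}} // one buffer per worker
	bufs[0].add(1)
	bufs[1].add(2)
	var parts [][]int
	for _, b := range bufs { // periodic flush: buffers become LSM parts
		parts = append(parts, b.flush())
	}
	fmt.Println(len(parts)) // 2
}
```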
22. VictoriaMetrics single-node: scaling the query path
● VictoriaMetrics stores data in compressed blocks
● Selected blocks are unpacked in parallel on available CPUs
● Selected time series are processed in parallel on available CPUs
[Diagram: blocks block_1…block_N of series_1…series_M are unpacked on CPU_1…CPU_P, then the selected series are processed on CPU_1…CPU_P]
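The parallel unpacking step can be sketched like this. It is a hypothetical illustration, not VictoriaMetrics' actual code: block indices are fed to one worker per CPU, and `decompress` is a stand-in for the real block decoding.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// decompress is a stand-in for real block decoding; here it just
// reports the decoded size.
func decompress(b []byte) int { return len(b) }

// unpackAll decodes all selected blocks in parallel on available CPUs.
func unpackAll(blocks [][]byte) []int {
	out := make([]int, len(blocks))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = decompress(blocks[i]) // each index written by exactly one worker
			}
		}()
	}
	for i := range blocks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	fmt.Println(unpackAll([][]byte{{1}, {1, 2}, {1, 2, 3}})) // [1 2 3]
}
```

Because every output slot is written by exactly one worker, no locking is needed on the result slice.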
25. VictoriaMetrics single-node: scalability limits
● The performance is limited by a single host (CPU, RAM, disk)
● Benchmark numbers:
○ Data ingestion: 300k samples/sec per CPU
○ Active time series: 1 million per GB of RAM
○ Query path: 50 million samples/sec per CPU
● Production numbers:
○ Data ingestion: 2 million samples/sec
○ Active time series: 100 million
○ Query path: 1 billion samples/sec
○ Total samples: 15 trillion
27. Scaling VictoriaMetrics cluster
● VictoriaMetrics cluster consists of three components:
○ vminsert - accepts incoming data
○ vmselect - processes incoming queries
○ vmstorage - stores the data
● Each component can run on the most suitable hardware
● Each component can scale independently to any number of instances
[Diagram: incoming data goes through an HTTP load balancer to vminsert_1…vminsert_M, which send data to vmstorage_1…vmstorage_N; incoming queries go through an HTTP load balancer to vmselect_1…vmselect_P, which query the vmstorage nodes]
32. VictoriaMetrics cluster: scaling data ingestion
● An HTTP load balancer spreads incoming data among vminsert nodes
● Data ingestion performance scales with the number of vminsert nodes
[Diagram: incoming data goes through an HTTP load balancer to vminsert_1…vminsert_N]
33. VictoriaMetrics cluster: scaling data ingestion
● vminsert automatically shards incoming data among available vmstorage nodes
via consistent hashing
● Each vmstorage node has its own subset of time series (ideally)
● Data ingestion performance scales with the number of vmstorage nodes
[Diagram: vminsert shards incoming data among vmstorage_1…vmstorage_M]
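The sharding step can be sketched with Google's "jump consistent hash" - one concrete consistent-hashing scheme; the exact algorithm VictoriaMetrics uses may differ, and `storageNodeFor` is a name invented for this illustration. Each unique series maps to a stable vmstorage node, and adding a node remaps only about 1/N of the series.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// jumpHash implements Lamping & Veach's jump consistent hash: it maps a
// 64-bit key to a bucket in [0, numBuckets) with minimal remapping when
// numBuckets changes.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// storageNodeFor picks the vmstorage node for a series label set.
func storageNodeFor(series string, numNodes int) int {
	h := fnv.New64a()
	h.Write([]byte(series))
	return jumpHash(h.Sum64(), numNodes)
}

func main() {
	moved := 0
	for i := 0; i < 1000; i++ {
		s := fmt.Sprintf(`node_cpu_seconds_total{instance="host-%d"}`, i)
		if storageNodeFor(s, 10) != storageNodeFor(s, 11) {
			moved++ // series remapped after adding an 11th node
		}
	}
	fmt.Println(moved < 200) // only ~1/11 of series move; prints true
}
```

The "ideally" in the bullet above refers to hash collisions and hot series: sharding balances series counts, not per-series write rates.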
34. VictoriaMetrics cluster: scaling the query path
● An HTTP load balancer spreads incoming queries among vmselect nodes
● QPS scales with the number of vmselect nodes
[Diagram: incoming queries go through an HTTP load balancer to vmselect_1…vmselect_P]
35. VictoriaMetrics cluster: scaling the query path
● vmselect fetches the needed data from every vmstorage node in parallel
● Querying performance scales with the number of vmstorage nodes
● vmselect unpacks the fetched data in parallel on available CPUs
● Querying performance scales with the number of vCPUs at a single vmselect node
[Diagram: vmselect fetches compressed data from vmstorage_1…vmstorage_N]
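The fan-out read path above can be sketched as follows. This is a hypothetical simplification, not VictoriaMetrics' actual code: the query goes to every vmstorage node concurrently, and the partial results are merged once all nodes respond. `fetchFromNode` is an invented stand-in for the real RPC.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchFromNode pretends each vmstorage node owns some subset of series.
func fetchFromNode(node int) []string {
	return []string{fmt.Sprintf("series_from_node_%d", node)}
}

// queryAllNodes queries every vmstorage node in parallel and merges the results.
func queryAllNodes(numNodes int) []string {
	results := make([][]string, numNodes)
	var wg sync.WaitGroup
	for i := 0; i < numNodes; i++ {
		wg.Add(1)
		go func(node int) { // one concurrent request per vmstorage node
			defer wg.Done()
			results[node] = fetchFromNode(node)
		}(i)
	}
	wg.Wait() // wait for every node before merging
	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	sort.Strings(merged)
	return merged
}

func main() {
	fmt.Println(len(queryAllNodes(4))) // 4
}
```

Because every vmstorage node holds a disjoint subset of series (ideally), the merge is a simple concatenation rather than deduplication.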
40. VictoriaMetrics cluster: scalability limits
● CPU? No - data ingestion and querying performance scales with CPUs
● RAM? No - cluster capacity scales with RAM
● Disk? No - cluster capacity scales with disk space and I/O
● Network? Yes!
45. 100M benchmark
● Can a VictoriaMetrics cluster accept 100 million samples per second in production?
● Can a VictoriaMetrics cluster handle a billion active time series?
● How many resources does it need?
53. Prometheus-benchmark
● Helm chart for testing Prometheus-like systems
● Uses production-like workload for data ingestion and querying
● Pushes the real node-exporter metrics to the tested systems
● Allows using the real alerting rules for node-exporter metrics
● https://github.com/VictoriaMetrics/prometheus-benchmark
[Diagram: the load generator's vmagent scrapes node_exporter and pushes the metrics via remote_write to the Prometheus-like system under test; vmalert evaluates alerting rules with read queries against it]
63. 100M benchmark: prometheus-benchmark configs
● 16 load generator pods (8 vCPU, 25GB RAM each)
● Scrape targets (node_exporter v1.4.0): 16 × 51,250 = 820,000
● Each scrape target exposes around 1,220 metrics
● Total number of metrics (aka active series): 820K × 1,220 ≈ 1 billion
● Scrape interval: 10 seconds
● Scrape rate: 1 billion / 10 seconds = 100M samples/sec
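The benchmark arithmetic above checks out exactly:

```go
package main

import "fmt"

// Verifying the load-generator math: 16 pods * 51,250 targets each,
// ~1,220 metrics per target, scraped every 10 seconds.
func main() {
	targets := 16 * 51250              // 820,000 scrape targets
	activeSeries := targets * 1220     // 1,000,400,000 ≈ 1 billion active series
	samplesPerSec := activeSeries / 10 // scrape interval = 10s
	fmt.Println(targets, activeSeries, samplesPerSec) // 820000 1000400000 100040000
}
```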
67. 100M benchmark: VictoriaMetrics cluster configs
● Runs in Google Kubernetes Engine via the official VictoriaMetrics helm charts
● vmstorage: 46 x (16 vCPU, 55GB RAM, 2200 GB HDD-based disk)
● vminsert: 18 x (16 vCPU, 55GB RAM)
● vmselect: none (wait for the next talk)
75. 100M benchmark: used resources
● vminsert: 206 vCPU, 26GB RAM
● vmstorage: 510 vCPU, 600GB RAM, 7.5TB of the 101.2TB provisioned disk
● Total: 716 vCPU (70% of provisioned), 626GB RAM (18%), 7.5TB disk (7.5%)
● Network: 140Gbit/s (can be reduced to 20Gbit/s at the cost of 10% CPU)
● Disk IO: 3GB/s write, 450MB/s read
80. 100M benchmark: results
● Stable data ingestion at 100M samples/sec during 24 hours
● Active time series: 1 billion
● Total samples ingested: 8.77 trillion
● Average sample size on disk: 0.85 bytes
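The sub-byte sample size follows from the disk figures above; time-series compression (delta encoding of timestamps and values) is what makes it possible:

```go
package main

import "fmt"

// Checking the storage arithmetic: 8.77 trillion samples occupying roughly
// 7.5 TB on disk works out to under one byte per sample.
func main() {
	samples := 8.77e12
	diskBytes := 7.5e12 // ~7.5 TB used
	fmt.Printf("%.2f bytes/sample\n", diskBytes/samples) // 0.86 bytes/sample
}
```

The small difference from the slide's 0.85 comes from rounding the disk usage to 7.5TB.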
85. 100M benchmark: key takeaways
● VictoriaMetrics cluster performance and capacity scale linearly to 100 nodes and more
● A single VictoriaMetrics cluster can collect metrics from a million hosts
[Diagram: vmagent scrapes host_1…host_1,000,000 with scrape_interval=10s and forwards the metrics via remote_write to a VictoriaMetrics cluster]
● Cluster stability improves with the number of nodes
● HDD-based disks are enough - there is no need for SSD-based disks (HDD: $40/TB/month vs SSD: $170/TB/month)
● VictoriaMetrics handles large workloads with default configs
90. Reproduce the 100M benchmark yourself!
● https://github.com/VictoriaMetrics/prometheus-benchmark/tree/bm-100
● Benchmark configs
● VictoriaMetrics cluster configs
94. What’s next?
● Benchmark querying performance (50M samples/sec per vCPU processing speed)?
● A billion samples/sec benchmark?
● 10 billion active time series?
● Kubernetes-like time series churn rate?
● A month-long benchmark (needs $$$)?
● Share your results!