Slides for a talk.
Talk abstract:
In the dark of the night, if you listen carefully enough, you can hear databases cry. But why? As developers, we rarely consider what happens under the hood of widely used abstractions such as databases. As a consequence, we rarely think about database performance. This is especially true of the less widespread, but often very useful, NoSQL databases.
In this talk we will take a close look at NoSQL database performance, peek under the hood of the most frequently used features to see how they affect performance and discuss performance issues and bottlenecks inherent to all databases.
13. Things we will take a look at
Storage algorithms & Storage
Indexing & Queries
Network
14. Let’s start with something simple:
Storage.
It just works, no?
Famous last words!
15. Before we discuss storage, here is a riddle…
RavenDB server-wide backup failed
• The server hosted multiple databases in a single instance
• Plenty of memory and cores, resource usage is small
• Nothing else was running on the machine EXCEPT RavenDB
• Scheduled backup tasks failed soon after they started
23. What can we do about storage issues?
• Load-test database code to verify:
• Write-through throughput
• Enough IOPS for the expected production load (disk queue length <= 2)
• Provisioned IOPS in the cloud
• Load-test the application to find the limits of the system
• Monitoring! (too-long queues = storage bottleneck)
27. A tale of two primary keys
• One embedded transactional database engine (LMDB)
• 100 transactions, 100 key/value writes per transaction
• Two databases, keys and values have the same size
• One uses sequential keys (UuidCreateSequential)
• One uses random keys (UuidCreate)
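The effect can be sketched with a toy benchmark. This is a hypothetical illustration using SQLite's B-tree-backed tables as a stand-in for LMDB; the key format, value size, and row counts are arbitrary choices:

```python
import sqlite3
import uuid

def insert_batch(keys):
    """Insert key/value pairs into a B-tree-backed table and return the row count."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    with con:
        con.executemany("INSERT INTO kv VALUES (?, ?)",
                        ((k, "x" * 32) for k in keys))
    return con.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

# Sequential keys: every insert lands at the rightmost leaf of the B-tree,
# so the same few pages stay hot.
sequential = [f"Users/{i:08d}" for i in range(10_000)]
# Random keys: every insert lands at a random leaf, touching far more pages
# and causing far more seeks on disk-backed storage.
random_keys = [str(uuid.uuid4()) for _ in range(10_000)]

print(insert_batch(sequential))   # 10000
print(insert_batch(random_keys))  # 10000
```

Timing the two calls on an on-disk database (rather than `:memory:`) is where the gap between the two key patterns shows up.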
28. A tale of two primary keys
[Chart: B-Tree seeks per write TX. Y-axis: # of seeks per TX (0 to 180); X-axis: Transaction # (0 to 16); series: Random, Sequential]
29. A tale of two primary keys
[Chart: Total seek latency per TX. Y-axis: total seek latency per TX (0 to 0.05); X-axis: Transaction # (0 to 16); series: Random, Sequential]
30. Process Monitor
Types of OS operations to listen for (file activity, network activity, etc.)
34. The cost of hops in the tree
Disk seek
https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html
Sequential reads
35. Minimize performance impact of keys
• Sequential keys allow better performance
• 1,2,3,4,5
• Users/1, Users/2, Users/3
• Notice: B-Trees are used to store data AND indexes
• Query performance!
37. The tale of an occasionally slow database
• Sometimes, Cassandra database was fast and sometimes not
• This happened non-deterministically
A page in the Cassandra documentation!
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/slowReads.html
38. Storage algorithms (log-structured storage engines)
[Diagram: writes land in in-memory storage (usually a B-Tree or skip list, holding e.g. Users/1, Users/3) and are flushed to immutable on-disk SSTables (holding e.g. Users/3, Users/5, Users/22)]
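The read path of such an engine can be sketched as follows. This is a deliberately simplified model: the memtable is a plain dict and each flushed SSTable is an immutable sorted list; real engines add bloom filters, tombstones, and compaction:

```python
import bisect

class ToyLSM:
    """Minimal log-structured engine: writes go to a memtable,
    which is flushed to immutable, sorted SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # in-memory storage (a B-Tree or skip list in real engines)
        self.sstables = []   # newest last; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: the memtable becomes an immutable, sorted SSTable.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then every SSTable from newest to oldest -
        # this is why reads may need to search ALL SSTables.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

db = ToyLSM()
for k, v in [("Users/1", "a"), ("Users/3", "b"), ("Users/3", "b2"), ("Users/5", "c")]:
    db.put(k, v)
print(db.get("Users/3"))  # newest value wins: b2
```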
49. ACID guarantees!
• Atomicity, Consistency, Isolation, Durability
• Note: not all DBs support them
• All RDBMSs do
• Some NoSQL databases do – RavenDB, LevelDB, LMDB
50. Write-ahead log (WAL) - Atomicity, Durability
[Diagram: the WAL receives Put "A", Put "B", Commit, Put "X", Commit; on commit, the writes are flushed from the WAL to data storage]
51. Write-ahead log (WAL)
• "Write-Through" writes – no caching (otherwise no durability!)
• Lots of small writes (overhead of each write)
[Diagram: a write-through write bypasses the volatile buffer; a regular write goes through the volatile buffer and is flushed periodically]
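A minimal sketch of the mechanism, assuming a line-per-record text log (the file name and record format are made up for illustration; real WALs use write-through I/O and checksummed binary records):

```python
import os

class WalStore:
    """Toy key/value store: every change is appended to a log and fsync'ed
    before being applied, so a crash cannot lose an acknowledged write."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._recover()

    def _recover(self):
        # Replay the log on startup; only complete lines count as durable.
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for line in f:
                    if line.endswith("\n"):
                        key, _, value = line.rstrip("\n").partition("=")
                        self.data[key] = value

    def put(self, key, value):
        with open(self.wal_path, "a") as f:
            f.write(f"{key}={value}\n")
            f.flush()
            os.fsync(f.fileno())  # write-through: durable before we acknowledge
        self.data[key] = value

store = WalStore("toy.wal")
store.put("A", "1")
store.put("B", "2")
recovered = WalStore("toy.wal")  # simulates a restart after a crash
print(recovered.data)  # {'A': '1', 'B': '2'}
os.remove("toy.wal")
```

The per-write `fsync` is exactly the "lots of small writes" overhead mentioned above, which is why engines batch many operations into one commit.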
62. Why?
Field 1 Field 2
Lyon France
Paris France
Oslo Norway
In some databases (like MongoDB)
• Indexed fields are concatenated into a single index key
• Filtering is possible only by prefix
Index Key Record IDs
LyonFrance [7],[1],[4]
ParisFrance [5],[6]
OsloNorway [12],[2],[9],[34]
The values are concatenated!
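A sketch of why concatenation limits filtering to a prefix, modeling the index as a sorted list searched with binary search (real engines use B-trees, but the prefix restriction is the same):

```python
import bisect

# Compound index where both fields are concatenated into one key.
index = sorted(["LyonFrance", "OsloNorway", "ParisFrance"])

def prefix_search(prefix):
    """Binary-search the sorted keys for the range sharing the prefix: O(log n)."""
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_right(index, prefix + "\uffff")
    return index[lo:hi]

def suffix_search(suffix):
    """Filtering by the second field only cannot use the sort order:
    it degenerates into a full scan, O(n)."""
    return [k for k in index if k.endswith(suffix)]

print(prefix_search("Paris"))   # ['ParisFrance'] - fast, uses the index order
print(suffix_search("France"))  # ['LyonFrance', 'ParisFrance'] - full scan
```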
63. Why?
Field 1 Field 2
Lyon France
Paris France
Oslo Norway
In some databases (RavenDB, any Lucene-based index)
• Indexed terms are stored separately
• Filtering by one or both fields in any order (union/intersect as needed)
Index Key Record IDs
Lyon [7],[1],[4]
Paris [5],[6]
Oslo [12],[2],[9],[34]
Index Key Record IDs
France [7],[1],[4],[5],[6]
Norway [12],[2],[9],[34]
ShipTo.City Index
ShipTo.Country Index
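A sketch of the separate-index approach, using the record IDs from the tables above as toy data (real engines store compressed posting lists, but the union/intersect logic is the same):

```python
# Separate per-field indexes (term -> sorted record IDs), Lucene-style.
city_index = {
    "Lyon": [1, 4, 7],
    "Paris": [5, 6],
    "Oslo": [2, 9, 12, 34],
}
country_index = {
    "France": [1, 4, 5, 6, 7],
    "Norway": [2, 9, 12, 34],
}

def search(city=None, country=None):
    """Filter by either field alone or by both, intersecting posting lists."""
    postings = []
    if city is not None:
        postings.append(set(city_index.get(city, [])))
    if country is not None:
        postings.append(set(country_index.get(country, [])))
    return sorted(set.intersection(*postings)) if postings else []

def search_any_city(*cities):
    """Union of posting lists: 'city IN (...)' style queries."""
    return sorted(set().union(*(city_index.get(c, []) for c in cities)))

print(search(country="Norway"))               # [2, 9, 12, 34]
print(search(city="Lyon", country="France"))  # [1, 4, 7]
print(search_any_city("Paris", "Lyon"))       # [1, 4, 5, 6, 7]
```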
64. Collection/table scans are easily overlooked
Collection scan - development
• Small amount of data
• Extremely small query latency
Collection scan - production
• Large amount of data
• HUGE latency (quite often!)
Latency: 50ms vs 50 hours
65. Indexing
• Indexes are stored as trees (usually B-trees)
• Updates have non-trivial complexity!
https://commons.wikimedia.org/wiki/File:Trie_example.svg
66. Indexing
Search time complexity (WHERE clause):
O(log(N)) + O(log(M)) + O(Max(K,P))
Where:
• N and M are the numbers of rows in the two indexes
• K and P are the sizes of the two index-search result sets
67. And if we use an RDBMS, things become even more interesting...
Join Algorithm | Complexity
Merge Join | O(n*log(n) + m*log(m))
Hash Join | O(n + m)
Index Join | O(m*log(n))
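The O(n + m) hash join, for example, can be sketched in a few lines (the toy rows and column names are invented for illustration; real planners also pick the smaller relation as the build side):

```python
def hash_join(left, right, left_key, right_key):
    """Hash join: build a hash table over one relation (O(n)),
    then probe it with the other (O(m)) - O(n + m) overall."""
    table = {}
    for row in left:  # build phase
        table.setdefault(row[left_key], []).append(row)
    result = []
    for row in right:  # probe phase
        for match in table.get(row[right_key], []):
            result.append({**match, **row})
    return result

customers = [{"cid": 1, "name": "Ada"}, {"cid": 2, "name": "Linus"}]
orders = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 1}, {"oid": 12, "cid": 2}]
joined = hash_join(customers, orders, "cid", "cid")
print(len(joined))  # 3 joined rows
```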
68. And if we have a non-trivial query...
https://dev.to/tyzia/example-of-complex-sql-query-to-get-as-much-data-as-possible-from-database-9he
Those are 10 JOIN statements!
69. …we have complexity between
O(log(n)) and O(too much)!
More often than not it is O(too much)…
70. What can (should!) we do?
• RDBMS
• Proper indexing (kinda obvious, but still…)
• Optimize (remove unnecessary JOINs – depends on business logic)
• Reduce query complexity
• Replace ‘row by row’ cursors with set based queries
• Reduce the amount of work queries do (for example, unnecessary sub-queries)
• Remove ORDER BY where it makes sense (huge overhead)
• Other optimizations are possible
• NoSQL
• Proper modeling
• Well planned indexing
71. Let’s talk a bit about NoSQL Data Modeling
• Documents are independent
• Transaction borders
• Depend on context!
• Depend on query pattern
• Depend on data growth
85. What can we do?
• Refactor to reduce number of requests (kinda obvious, but still…)
• NHibernate – Future Queries
• Entity Framework - QueryFuture
• RavenDB – Lazy Queries
86. May sound trivial, but…
Do take a look at database traffic while stress testing and if possible in
production too.
• Fiddler
• Wireshark
• Profilers
• Any other tool to inspect traffic
87. To sum it up
• Databases are abstractions
• Abstractions are leaky and might be the cause of perf issues
• Such perf issues can be dealt with (if we know about the "leak"!)
I have been working on the internals of a NoSQL database for some time now.
Mostly I work on clusters and distributed stuff like replication.
Also, I do support calls, and in this talk I will cover various performance issues I have seen
… a company called Hibernating Rhinos.
We have created profilers for ORMs, but our main business is (click to next slide)
…our main business is RavenDB – NoSQL document database
.Net Core MENTION that RavenDB is built on .Net Core
Many times developers deal with databases by just tweaking settings and blindly playing around.
It is very tempting to think of databases as magic boxes
You query a database and get answers to complex queries…
Things simply… work.
…but databases are not magic!
Databases are abstractions to complex internals
Many things can go wrong under the outermost layer, the API
Why can things go wrong? Because ALL abstractions are leaky
There are many types of databases
Give concrete examples of each type of database
RDBMS: MySQL, PostgreSQL, MS-SQL, Oracle
Document DBs: CouchDB, MongoDB, RavenDB
GraphDB: Neo4j, OrientDB
Key/value store: LMDB, BerkeleyDB, Cassandra (hybrid between table and key/value)
Note that there are hybrid databases such as
Cassandra (hybrid column-family store) and
OrientDB (hybrid document & graph)
(more than 400 database companies)
Databases are different, but deep under the hood they use the same algorithms
We will take a look at multiple reasons why databases will cry.
Note: mention that things we take a look at only scratch the surface of potential issues
We will start from storage..
Before we jump into details, here is a riddle…
Mention this is a real support case
involve audience here ask them to theorize
Mention looking at the monitoring numbers (more specifically, info about disk perf)
* Mention that this is relevant to many types of DBs
Mention it was SAN timeout
Those databases were stored on a SAN disk SHARED with other systems
The solution? Make the backups happen one after another (so the storage won’t be overloaded)
How many concurrent IOs does the storage support?
Mention that IOPS is input/output operations per second
Load testing of software is understandable, but what about load testing of hardware (storage)?
Tools are needed for it…
Sequential/Random
Sequential – accessing files sequentially (for things like video streaming)
Random – accessing files randomly, causing more seeks
Queues – how many writes each process issues concurrently
Threads – how many processes access the disk concurrently
And now, let’s take a look at yet another reason why databases can cry at night
Tell a bit about LMDB – what is it?
(OpenLDAP, some Linux distros such as Ubuntu, Debian and Fedora)
Tell about WHAT kind of thing I checked,
Just a note – this information is gathered using Sysinternals Process Monitor
On Linux there is an equivalent tool – strace
Process Monitor is useful to trace the ways any application accesses storage, including OS calls and latencies (in this context)
Not the only kind of trees that are used, but B-Trees are the most prevalent
* B-Trees are self-balancing!
Used in: CouchDB, RavenDB, MS-SQL, Oracle
Non-sequential keys can make a B-Tree much deeper.
In here – the same data but with different keys
Conclusion: different data patterns can give DIFFERENT performance on the same engine
How many is many? I am talking about millions of data records here
Mention that this is a true story:
Tell about a user who told such a story at a workshop (London)
Log Structured Merge Trees are used in database storage engines
Mention that it is optimized for writes
* Used in Lucene, LevelDB, HBase, SQLite4, Apache Cassandra, Tarantool (mail.ru)
the data is kept in multiple different structures, each is optimized for different storages (memory/disk);
* SSTable – an ordered, immutable map from keys to values (per column family)
* data is synchronized between the two structures efficiently, in batches
Reading requires searching ALL SSTables: since we store data mutation operations, an insert may be followed by a delete operation, where the delete cancels the originally inserted record
Compaction of SSTables is O(n*S*log(S)) operation
(n = number of SSTables, S = count of key/value pairs in each)
Note:
* the compaction is usually asynchronous
* multi-level compaction can be implemented
* But it adds strain to system resources! (Memory, CPUs, Storage)
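The merge at the heart of compaction can be sketched as a k-way heap merge. This is a simplified model: tombstones are represented by None values, and for duplicate keys the newest table wins:

```python
import heapq

def compact(sstables):
    """Merge n sorted SSTables into a single sorted SSTable.
    `sstables` is ordered oldest-first; for duplicate keys the newest entry
    wins, and a value of None (a tombstone) deletes the key entirely."""
    # Tag entries as (key, -age, value) so that for equal keys the
    # newest table's entry sorts first in the merged stream.
    streams = [
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)
    ]
    result, last_key = [], object()
    # heapq.merge is O(T*log(n)) for T total entries across n tables.
    for key, _, value in heapq.merge(*streams):
        if key == last_key:
            continue  # an older version of a key we already handled
        last_key = key
        if value is not None:  # drop tombstoned keys
            result.append((key, value))
    return result

old = [("Users/1", "a"), ("Users/3", "b")]
new = [("Users/3", "b2"), ("Users/5", None)]  # Users/5 was deleted
print(compact([old, new]))  # [('Users/1', 'a'), ('Users/3', 'b2')]
```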
Usually, different compaction strategies are possible
For example – the LSM compaction strategies in Cassandra
Mention this is a solution!
Since we are talking about storage related algorithms,
There is another thing to consider!
ACID – if a database supports it, you will basically lose data only if your server burns down
So, what’s to talk about them?
Except that not all databases support them…
Mention that WAL is used to implement Atomicity and Durability
Explain how it works (in general terms)
Any database that uses WAL benefits from good no-cache performance
We can test for no-cache writes using tools like ATTO Disk Benchmark
There is a big difference between regular write and a “no-cache” write!
* Note that the larger the write batch, the more efficient it gets
- Mention the metaphor of moving people in cars vs. moving them in buses
* Note the discrepancy for 1MB writes: 7.95 GB/s vs. 313.22 MB/s
Need to EXPLICITLY say what kind of query we are doing:
Select ALL ORDERS that were sent to Paris or Lyon AND ordered after a *certain* day
we want to be able to query Orders by Shipping city and country
Let’s see an example WHY it is important…
We do index scan, so everything is good…
This is not so good! Because collection scans have linear complexity!
Thus, we can only query by first field OR by both fields
Beware of collection scans!
Since the index is TYPICALLY stored as a tree (hash indexes and tries being the exceptions)
- ROUGHLY we would have O(log(n)) complexity
don't forget to mention:
1) searching an index is O(log(n)) EXCEPT for special cases like hash indexes or tries
2) the mentioned complexity DEPENDS on the query plan, of course
3) M + N is the complexity of intersecting two query results
Independent: all data required to process a document is stored within the document itself
Well, not exactly indexing, but still…
MongoDB map/reduce – find how many blogs one author has, by their first name and last name.
Note that such operation will do allocations and will take CPU resources
It needs to reflect business logic AND be as efficient as possible
Explain about RavenDB indexes that are defined via LINQ
The same thing with RavenDB indexes.
Such an index is flexible, but it will do allocations and take CPU resources
Discrepancy between request/response latency and time spent on the server, plus total response size
(a 3MB response took 3 seconds to download)
The solution was to specify server-side projection
So ONLY needed information is sent over the wire