WiredTiger is a new open source database engine designed for modern hardware and big data workloads. It offers high performance, low latency, and cost efficiency through its multi-core scalability, flexible storage formats including row and column stores, and non-locking concurrency control algorithms. WiredTiger's founders have decades of experience with database internals and its design is optimized for consistency, adaptability, and maximizing hardware resources.
2. Data Centers are expensive
Company                Location              Data Center Cost   Data Center Size (MW)
NSA                    Camp Williams, UT     $2B                133
Apple                  Maiden, NC            $1B                67
Internet Villages      Annandale, Scotland   $1.6B              107
Lockerbie DC           Lockerbie, Scotland   $1.5B              100
Social Security        Baltimore, MD         $400M              27
Next Generation Data   Wales, UK             $300M              20
Facebook               Prineville, OR        $215M              15
3. WiredTiger Mission
WiredTiger is rethinking data management for modern hardware with a focus on multi-core scalability and maximizing the value of every byte of RAM.
5. A New Data Management Engine
● Architected for modern computer systems
● Scalable and able to handle big data
● High throughput, consistent low latency
● Row-store, column-store, log structured merge
● ACID transactions, standard isolation levels
● Checkpoint and fine-grained durability
● Supporting columns, indices, projections
● Production quality, fully supported
● NoSQL, Open Source
6. Flexible Storage
● Access methods tailored to workload
o Row store (read-mostly workloads that access all columns)
o Column store (read-mostly workloads that access some columns)
o Log-structured merge trees (mostly random writes)
● Compact storage format
o RLE, key-prefix, dictionary and static compression
o Stream compression
● Adapt workload to storage (RAM, SSD, HDD)
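To make the write-optimized option concrete, here is a minimal sketch of the log-structured merge idea. It is a toy illustration, not WiredTiger's implementation; the class name `TinyLSM` and the `memtable_limit` parameter are hypothetical. Writes land in an in-memory buffer, which is periodically sorted and frozen as a run, while reads may have to consult several runs.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory buffer plus sorted, immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}   # newest writes, unsorted
        self.runs = []       # sorted lists of (key, value), newest run first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write only touches the buffer, so random inserts stay cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Sort the buffer once and freeze it as the newest run.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # A read checks the buffer, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The asymmetry is visible in the code: `put` does constant work, while `get` loops over runs until `compact` merges them.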
7. Flexible Configuration
● API offers a simple key/value store, or
● A complete schema layer
o Specify data types
o Map columns to files
o Automatically maintain indices
o Queries only read required columns
o Projections, index-only scans
● Checkpoint or fine-grained durability
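The "map columns to files" and "queries only read required columns" points can be sketched as follows. This is a simplified model under stated assumptions: plain dictionaries stand in for files, and `ColumnGroupTable`, `insert`, and `select` are hypothetical names, not the WiredTiger API.

```python
class ColumnGroupTable:
    """Toy table whose columns are split across separate 'files' (dicts)."""

    def __init__(self, column_groups):
        # column_groups maps a group name to the tuple of columns it stores.
        self.column_groups = column_groups
        self.files = {name: {} for name in column_groups}
        self.reads = []  # which groups each query had to open

    def insert(self, key, row):
        # Each group stores only its own columns for the row.
        for name, columns in self.column_groups.items():
            self.files[name][key] = tuple(row[c] for c in columns)

    def select(self, key, wanted):
        # Projection: open only the groups that hold a requested column.
        result = {}
        touched = []
        for name, columns in self.column_groups.items():
            if not set(columns) & set(wanted):
                continue
            touched.append(name)
            stored = dict(zip(columns, self.files[name][key]))
            result.update({c: stored[c] for c in wanted if c in stored})
        self.reads.append(touched)
        return result
```

For example, with groups `{"main": ("name", "city"), "stats": ("visits",)}`, a query for `visits` alone opens only the `stats` group.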
8. Improved Efficiency
● Higher CPU Utilization
o Multi-core scalability
o Minimize contention between threads
o Non-locking algorithms
o Hazard pointers
● Lower Power Costs
● Flash Optimized Block Layout
9. Consistent High Performance
● In-cache or I/O bound
● Workload Configuration
o Efficient sparse data (column-store)
o Bounded queries and updates (row-store)
o Write-optimized (LSM)
● Data structures for access at RAM speed
10. Consistent Low Latency
● Non-locking algorithms
● Multi-versioned data
● Optimistic concurrency control
● Deadlock-free transactions
● I/O shifted to background threads
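The combination of multi-versioned data and optimistic concurrency control can be illustrated with a small single-threaded sketch. It is not WiredTiger's algorithm; `VersionedStore`, `Transaction`, and `WriteConflict` are hypothetical names, and conflict checking at commit time is an assumption made for brevity. Readers see a snapshot without taking locks, and a writer that loses a race fails instead of waiting, so no deadlock can form.

```python
class WriteConflict(Exception):
    """Raised when another transaction committed the same key first."""


class VersionedStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # timestamp of the latest commit

    def begin(self):
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # private buffer, invisible to others until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        # Newest version committed at or before our snapshot; no locks taken.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.snapshot_ts:
                return value
        return None

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: fail if any of our keys was committed after our snapshot.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise WriteConflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append(
                (self.store.clock, value)
            )
```

If two transactions begin from the same snapshot and both write `x`, the first commit succeeds and the second raises `WriteConflict`; a reader that started earlier keeps seeing the old value throughout.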
11. Cost Effective
Metric                      WiredTiger   InnoDB
iiBench run cost            $6.44        $12.88
Cost per billion inserts*   $20.30       $40.60
● WiredTiger provides a 50% cost savings for the same AWS workload
● More details on this benchmark are available here.
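The 50% figure can be checked directly from the numbers on the slide. The sketch below assumes the lower value in each pair belongs to WiredTiger and the higher to InnoDB, which is how the speaker notes describe the comparison; the variable and function names are illustrative.

```python
# Figures from the slide; column attribution is an assumption (see above).
run_cost = {"WiredTiger": 6.44, "InnoDB": 12.88}
cost_per_billion_inserts = {"WiredTiger": 20.30, "InnoDB": 40.60}


def savings(costs):
    """Fraction of the InnoDB cost saved by running on WiredTiger."""
    return 1 - costs["WiredTiger"] / costs["InnoDB"]


def speedup(costs):
    """Work done per dollar, relative to InnoDB."""
    return costs["InnoDB"] / costs["WiredTiger"]


print(f"run cost savings: {savings(run_cost):.0%}")
print(f"per-billion-insert savings: {savings(cost_per_billion_inserts):.0%}")
print(f"relative work per dollar: {speedup(run_cost):.1f}x")
```

Both rows give the same ratio, so halving the cost and doubling the work per dollar are two views of the same result.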
13. Management Team
Keith Bostic is a founder and architect at WiredTiger. He was a founder of Sleepycat Software
(acquired by Oracle Corp. in 2006), and one of the architects of Berkeley DB, the most widely used
embedded data management software in the world.
Mr. Bostic was one of the architects of the University of California, Berkeley, 2.10BSD and 4BSD releases,
where he led the 4BSD release Open Source effort. He is the recipient of a USENIX Association
Lifetime Achievement Award (The Flame), which recognizes singular contributions to the UNIX
community.
Dr. Michael Cahill is a founder and architect at WiredTiger. He was an architect of Berkeley DB at
Sleepycat Software and Oracle Corp., responsible for design and implementation of multiversion
concurrency control, as well as SQL interfaces and programming language APIs. Previously, Dr.
Cahill was CTO at Bullant Technology, which grew tenfold and raised over US$30 million from
investors including Intel Capital and JP Morgan during his three-year tenure.
Dr. Cahill’s PhD from the University of Sydney is in the area of transaction processing and
concurrency control. His work on a new algorithm for implementing serializable isolation received an
ACM SIGMOD Best Paper award and was added to PostgreSQL 9.1.
14. Summary and Next Steps
We’d like to discuss how we could help you
with your solution.
Thanks! Questions? info@wiredtiger.com
Editor's Notes
The best number available to estimate the cost of a data center is the number of power supplies: that number determines heating and cooling costs, as well as hardware and software (license units) costs. While the number of CPUs per power supply continues to increase, CPUs are no longer getting faster, and at the data center level we need to look at software efficiencies to gain further scale beyond what the hardware can deliver. For the foreseeable future, multi-core scaling is key to better performance and increased efficiency. Common indexing technology in use today was written for computer architectures of the early 1990s, so better software efficiency yields huge benefits.
WiredTiger is focused on single-node data management in service of high-end applications, improving application scalability and efficiency via software innovation.
WiredTiger is entirely focused on single-node resource cost per transaction. WiredTiger does not include data distribution or other horizontal scaling software. WiredTiger is intended for applications running on a single node which require the maximum possible performance from the indexing technology, or as a storage technology for applications supporting their own horizontal scaling solutions.
Row-store is a traditional database object, where keys are byte strings and all columns of a row are stored together, best for read-mostly workloads where all columns are equally valuable. Column-store groups columns in storage and only the necessary columns are read to satisfy a query. Log-structured merge trees (LSM) support high-speed random inserts, at the cost of slower reads. WiredTiger supports all three access methods and the access methods can be combined (for example, a sparse, wide table configured with a column-store primary, where indexes are stored in an LSM tree).
WiredTiger supports a large number of compression algorithms:
● RLE: run-length encoding when columns repeat
● Key-prefix: Btree key-prefix compression
● Dictionary: unique columns only stored once per write block
● Static: Huffman encoding
● Stream: pluggable stream compression (for example, snappy or zlib); because WiredTiger supports variable-length blocks, stream compression can be applied in all cases, unlike engines where compression must operate in block-sized units.
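Two of the compression techniques named above are simple enough to sketch. The following is a generic illustration of run-length encoding and key-prefix compression, not WiredTiger's on-disk format; all function names are hypothetical.

```python
import os


def rle_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    return [v for v, count in runs for _ in range(count)]


def prefix_encode(sorted_keys):
    """Store each key as (length shared with the previous key, suffix)."""
    out, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decode(entries):
    keys, prev = [], ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys
```

Key-prefix compression pays off on sorted keys, which is why it suits Btree pages: neighbouring keys such as `user:1001` and `user:1002` share most of their bytes, so only the differing suffix is stored.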
Unlike other indexing technologies, for example LevelDB and InnoDB, WiredTiger scales linearly as additional cores are added.
iiBench is a standard benchmark used to measure MySQL performance. Compared to InnoDB, WiredTiger showed consistently better query rates . . .
. . . and much more consistent latency as you scale rows in the data-store.
The ultimate benefit to the customer is reduced cost. This chart shows the cost of a billion inserts on an Amazon Web Services instance for the popular engine InnoDB versus WiredTiger: WiredTiger returns twice the performance on a typical AWS instance.