2. Speed of light vs spinning metal
What                     Time     Scale (RAM = 1)   Distance (RAM = 240 m)   Human time (RAM = 1 s)
L1 cache                 0.5 ns   0.008             2 m
L2 cache                 7 ns     0.12
RAM                      60 ns    1                 240 m                    1 second
1 KB over Gbit network   10 µs    167                                        2.8 minutes
4 KB SSD read            150 µs   2500
Rotating disk seek       10 ms    167000            40000 km                 46 hours
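Every derived column in the table comes from one ratio: latency divided by RAM latency (60 ns), then mapped onto 240 m or 1 second. A minimal sketch of that arithmetic, using only the latencies from the table; the class and method names are illustrative, not from any library.

```csharp
using System;
using System.Collections.Generic;

// Reproduces the table's scaling: latency / RAM latency, then mapped to a
// distance (RAM = 240 m) and a "human" time (RAM = 1 second).
public static class LatencyScale
{
    const double RamNs = 60.0;          // RAM access time from the table
    const double RamMeters = 240.0;     // RAM mapped to 240 m
    const double RamHumanSeconds = 1.0; // RAM mapped to 1 second

    public static double Scale(double latencyNs) => latencyNs / RamNs;

    public static double Meters(double latencyNs) => Scale(latencyNs) * RamMeters;

    public static double HumanSeconds(double latencyNs) => Scale(latencyNs) * RamHumanSeconds;

    // The latencies listed in the table, in nanoseconds.
    public static readonly Dictionary<string, double> LatenciesNs = new Dictionary<string, double>
    {
        ["L1 cache"] = 0.5,
        ["L2 cache"] = 7,
        ["RAM"] = 60,
        ["1 KB over Gbit network"] = 10_000,  // 10 µs
        ["4 KB SSD read"] = 150_000,         // 150 µs
        ["Rotating disk seek"] = 10_000_000, // 10 ms
    };
}
```

For example, a disk seek gives Meters(10_000_000) = 40,000,000 m (the circumference of the earth) and HumanSeconds(10_000_000) / 3600 ≈ 46.3 hours.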
6. B-trees and Transactions
• LOG
• DATA: 64 KB blocks (extents) of eight 8 KB pages
• Logical B-tree of 8 KB data pages, cached in the buffer pool
• Buffer Manager
• Transactions append inserted, deleted, original and modified pages to the LOG
• CHECKPOINT
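A toy sketch of the flow in the diagram, under heavy simplifying assumptions: pages are strings, the log and the data file are in-memory collections, and all names (ToyBufferManager, LogRecord) are hypothetical. It is not how SQL Server is implemented; it only shows the write-ahead ordering (log first, then the cached page) and what a checkpoint does with dirty pages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical log record holding the original and modified page content.
public class LogRecord
{
    public long Lsn;
    public int PageId;
    public string Before; // original page content (enables rollback)
    public string After;  // modified page content (enables redo)
}

public class ToyBufferManager
{
    public readonly Dictionary<int, string> DataFile = new Dictionary<int, string>();   // "disk"
    public readonly Dictionary<int, string> BufferPool = new Dictionary<int, string>(); // cache
    public readonly HashSet<int> DirtyPages = new HashSet<int>();
    public readonly List<LogRecord> Log = new List<LogRecord>();
    long _nextLsn = 1;

    // Reads through the cache, loading the page from the data file on a miss.
    public string Read(int pageId)
    {
        if (!BufferPool.TryGetValue(pageId, out var page))
        {
            DataFile.TryGetValue(pageId, out page);
            BufferPool[pageId] = page;
        }
        return page;
    }

    public void Write(int pageId, string newContent)
    {
        var before = Read(pageId);
        // Write-ahead rule: append the log record first...
        Log.Add(new LogRecord { Lsn = _nextLsn++, PageId = pageId, Before = before, After = newContent });
        // ...then change only the cached copy; the data file is untouched.
        BufferPool[pageId] = newContent;
        DirtyPages.Add(pageId);
    }

    // Flushes dirty pages to the data file and returns the last LSN they cover.
    public long Checkpoint()
    {
        foreach (var id in DirtyPages) DataFile[id] = BufferPool[id];
        DirtyPages.Clear();
        return _nextLsn - 1;
    }
}
```

After a checkpoint, log records up to the returned LSN are no longer needed to redo changes to the data file, which is why the log can then be truncated.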
7. SQL Server In-memory OLTP
Traditional engine → In-memory OLTP
• SPs → Natively compiled SPs
• Logging → Minimal logging and checkpointing
• Latches → Lock-free data structures
• Locks → Multi-version concurrency control
• Buffer Manager I/O → In-memory and memory-optimized data structures
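Multi-version concurrency control replaces locks by keeping several versions of a row. A minimal generic MVCC sketch, not Hekaton's actual implementation: each version carries a [Begin, End) timestamp range, a writer closes the current version and appends a new one instead of overwriting, and a reader with snapshot timestamp ts sees the version whose range contains ts without taking any lock. It assumes a single writer with increasing commit timestamps; the type names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class RowVersion
{
    public string Value;
    public long Begin;
    public long End = long.MaxValue; // "infinity": this is the current version
}

public class VersionedRow
{
    readonly List<RowVersion> _versions = new List<RowVersion>();

    // Closes the current version at commitTs and appends a new one.
    public void Write(string value, long commitTs)
    {
        if (_versions.Count > 0) _versions[_versions.Count - 1].End = commitTs;
        _versions.Add(new RowVersion { Value = value, Begin = commitTs });
    }

    // Returns the version visible at timestamp ts, or null if the row did not exist yet.
    public string ReadAsOf(long ts)
    {
        foreach (var v in _versions)
            if (v.Begin <= ts && ts < v.End) return v.Value;
        return null;
    }
}
```

Old versions that no reader can see any more would be garbage collected in a real system; the sketch keeps them all.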
8. SQL Server In-memory OLTP
• Hekaton = Greek for 100
• Transparent for application
• Integrates with SQL Server, but with limitations
• Enterprise license
• 5-30x performance gain
18. Complete history of events
• Point in time
• Debugging
• Restore
• Queries
• Audit trail
• New interpretations
19. Example – the model
using System;
using System.Collections.Generic;
using OrigoDB.Core;

// The entire database: plain .NET collections held in RAM.
[Serializable]
public class CommerceModel : Model
{
    internal SortedDictionary<Guid, Customer> Customers { get; set; }
    internal SortedDictionary<Guid, Order> Orders { get; set; }
    internal SortedDictionary<Guid, Product> Products { get; set; }

    public CommerceModel()
    {
        Customers = new SortedDictionary<Guid, Customer>();
        Orders = new SortedDictionary<Guid, Order>();
        Products = new SortedDictionary<Guid, Product>();
    }
}
20. Command
using System;
using OrigoDB.Core;

[Serializable]
public class AddCustomer : Command<CommerceModel>
{
    public readonly Guid Id;
    public readonly string Name;

    public AddCustomer(Guid id, string name)
    {
        Id = id;
        Name = name;
    }

    // Executed by the engine, which also writes the command to the journal.
    public override void Execute(CommerceModel model)
    {
        // Validate before modifying the model.
        if (model.Customers.ContainsKey(Id)) Abort("Duplicate customer id");
        var customer = new Customer { Id = Id, Name = Name };
        model.Customers.Add(Id, customer);
    }
}
21. Query
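using System;
using System.Collections.Generic;
using OrigoDB.Core;

[Serializable]
public class CustomerById : Query<CommerceModel, CustomerView>
{
    public readonly Guid Id;

    public CustomerById(Guid id)
    {
        Id = id;
    }

    // Queries read the model but never modify it.
    public override CustomerView Execute(CommerceModel model)
    {
        if (!model.Customers.TryGetValue(Id, out var customer))
            throw new KeyNotFoundException("No such customer");
        // Return a view, not the Customer entity itself.
        return new CustomerView(customer);
    }
}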
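The query returns a CustomerView rather than the Customer entity. A minimal sketch of what such a view might look like, assuming hypothetical Customer and Order shapes since the slides don't show them. The view copies the data it needs, so the client presumably never holds a reference into the in-memory model (which would bypass the engine), and the copy is serializable for a remote client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes; the slides do not show Customer or Order.
[Serializable]
public class Order
{
    public Guid Id { get; set; }
}

[Serializable]
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public List<Order> Orders { get; set; } = new List<Order>();
}

// A read-only copy handed to the client instead of the live entity.
[Serializable]
public class CustomerView
{
    public readonly Guid Id;
    public readonly string Name;
    public readonly IReadOnlyList<Guid> OrderIds;

    public CustomerView(Customer customer)
    {
        Id = customer.Id;
        Name = customer.Name;
        OrderIds = customer.Orders.Select(o => o.Id).ToList(); // a copy, not a live view
    }
}
```

Later changes to the Customer inside the model do not show up in a view that has already been returned, which matches what the Main method on the next slide reads: Name and OrderIds.Count.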
22. Start your engines!
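using System;
using OrigoDB.Core;

class Program
{
    static void Main(string[] args)
    {
        // Starts an engine for CommerceModel; the engine restores the model on startup.
        var engine = Engine.For<CommerceModel>();

        // Commands modify the model...
        var id = Guid.NewGuid();
        var customerCommand = new AddCustomer(id, "Homer");
        engine.Execute(customerCommand);

        // ...queries read it and return a view.
        var customerView = engine.Execute(new CustomerById(id));
        Console.WriteLine(customerView.Name);
        Console.WriteLine("{0} orders", customerView.OrderIds.Count);
        Console.ReadLine();
    }
}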
Hi, I’m Robert from Devrex Labs. We’re a small startup based in Sweden building OrigoDB, an in-memory database for .NET.
Goals: compare the architectures and give a feel for each, not hands-on use; you can figure that out.
Why is memory so much faster?
200 meters to the convenience store down the block and back is 400 meters.
40000 km = circumference of the earth
To give you some perspective...
So why isn’t in-memory the default? Next slide...
99% of all OLTP databases are < 1TB – Michael Stonebraker
https://aws.amazon.com/ec2/instance-types/
r3.8xlarge: 32 vCPUs, 244 GB RAM
Azure: 112 GB
Lots of buzz and claims: NewSQL.
Analytics, Transactions, Both or None
What is a database? Key/value store? Transactions? Queries?
OLAP vs. OLTP
SAP HANA Column Store
Let’s start with SQL Server.
RDBMS architecture conceived in the ’70s, implemented in the ’80s.
How do we organize data on disk to get acceptable performance for general workloads?
Architected for disk access.
Let’s look at how it works
Logical structure of data pages.
B-tree, 8 KB pages, buffer pool
Every table is a B-tree (unless it is a heap), and every index is a B-tree.
Effect logging – log the effect of the transaction: modified pages, new pages.
Support rollback by including deleted pages and the original versions of modified pages.
Memory-optimized structures – no B-trees with 8 KB pages, just linked lists of data rows.
Locking – rows, pages, extents. Read/write locks for data in transactions.
Latches – protect internal data structures, such as B-tree pages, from concurrent transactions.
Logging – no effect logging of deleted, original and modified pages, just row logging. A background process updates FILESTREAM-based checkpoint files.
SPs are translated to C and compiled to native code.
No foreign keys
No outer joins
Not all data types supported
Measure performance
Compare with VoltDB – all in on in-memory, redesigned from scratch, command logging.
Show the Memory Optimization Wizard for AdventureWorks.
Ok, time to look at contestant number 2.
Redis is a very popular in-memory key/value store where the values are complex data structures, not just simple values.
Used by Twitter for session data and caching
(Twitter, Flickr, GitHub, Digg, Disqus, Instagram, Stack Overflow)
AppFabric Cache is going away; Azure offers Redis instead.
Open source
Widespread
Drivers for almost all languages
Fast, optimized algorithms
Replication
Sharding
Complete example with source code in PHP at http://redis.io
Not faster but easier. Simplicity. Consistency. Testing.
MOVING DATA BACK AND FORTH
DUAL DOMAIN MODELS
MAPPING
COMPLEXITY
ADD CACHING BECAUSE TOO SLOW, EVEN MORE PROBLEMS
So let’s start off with a bit of theory.
Current state of a system is a function of the initial state and the sequence of operations applied to it.
Examples: Counter, increment, decrement, set, reset, read
Rubik’s Cube
ACID –
Rubik’s Cube
System scope can be a variable, a data structure, application or entire database.
Deterministic, side-effect-free operations
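The counter example can be sketched as a fold over a list of operations. This only illustrates the principle; the type names (CounterOp, Increment, SetTo, Replay and so on) are hypothetical and not OrigoDB types. Because every operation is deterministic and side-effect free, replaying the same list always yields the same state, and replaying only the first n operations yields the state at that point in time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each operation maps the current state to the next state.
public abstract class CounterOp
{
    public abstract int Apply(int state);
}

public class Increment : CounterOp { public override int Apply(int s) => s + 1; }
public class Decrement : CounterOp { public override int Apply(int s) => s - 1; }
public class Reset : CounterOp { public override int Apply(int s) => 0; }

public class SetTo : CounterOp
{
    public readonly int Value;
    public SetTo(int value) { Value = value; }
    public override int Apply(int s) => Value;
}

public static class Replay
{
    // State = fold(initial state, operations); `count` limits replay to a prefix.
    public static int StateAfter(int initial, IEnumerable<CounterOp> ops, int count = int.MaxValue)
        => ops.Take(count).Aggregate(initial, (state, op) => op.Apply(state));
}
```

For the sequence increment, increment, set 10, decrement, reset, increment, starting from 0, the full replay gives 1 and replaying the first four operations gives 9. Reading the counter is just inspecting the state, so it never needs to be logged.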
OrigoDB state is an object graph defined using .NET types and collections
Restore at system start by replaying commands
One simple idea with many names and applications.
Describe each briefly
WAL – this is what’s going on in your relational database: SQL Server writes to the transaction log.
In-memory database engine/server
Code and data in same process
Write-ahead command logging and snapshots
Open source, single DLL for .NET/Mono
Commercial server with mirror replication
In-memory
In-memory object graph, user defined. Probably collections, entities and references.
Your choice.
Is it a database? Is it an object database? LINQ queries.
Toolkit
Flexible, configurable, kernels, storage, data model, persistence modes, formatting
Bring your own model – this is key.
Usually a product is based on a specific data model: VoltDB, RavenDB.
Naming. LiveDomain -> LiveDB -> OrigoDB
Code and data in same process
Don’t do CRUD. It’s silly. ORMs are based on CRUD.
One of the first things you learn is don’t do SELECT *. EF
Command logging
The in-memory data is a projection of the commands;
compare with event sourcing (ES) with a single aggregate. Same benefits as ES.
What is OrigoDB?
OrigoDB is an in-memory database toolkit. The core component is the Engine. The engine is 100% ACID, runs in-process and hosts a user-defined data model. The data model can be domain specific or generic and is defined using plain old .NET types. Persistence is based on snapshots and write-ahead command logging to the underlying storage.
The Model
is an instance of the user defined data model
lives in RAM only
is the data
is a projection of the entire sequence of commands applied to the initial model, usually empty.
can only be accessed through the engine
The Client
has no direct reference to the model
interacts directly with the Engine either in-process or remote
or indirectly via a proxy with the same interface as the model
passes query and command objects to the engine
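A proxy with the same interface as the model can be sketched by hand: every method call is turned into an operation and passed to the engine, so the client never touches the model. This is a hypothetical stand-in (ICustomerModel, TinyEngine and CustomerModelProxy are invented names); how OrigoDB actually builds its proxies is not shown here, and delegates stand in for the serializable command and query objects a real engine would journal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model interface; the proxy exposes the same methods.
public interface ICustomerModel
{
    void AddCustomer(Guid id, string name);
    string GetCustomerName(Guid id);
}

public class CustomerModel : ICustomerModel
{
    readonly Dictionary<Guid, string> _names = new Dictionary<Guid, string>();
    public void AddCustomer(Guid id, string name) => _names.Add(id, name);
    public string GetCustomerName(Guid id) => _names[id];
}

// Stand-in engine: the only component holding the model.
public class TinyEngine<TModel>
{
    readonly TModel _model;
    public readonly List<Action<TModel>> Journal = new List<Action<TModel>>();
    public TinyEngine(TModel model) { _model = model; }

    public void ExecuteCommand(Action<TModel> command)
    {
        Journal.Add(command); // record the command first
        command(_model);      // then apply it
    }

    // Queries are not journaled.
    public TResult ExecuteQuery<TResult>(Func<TModel, TResult> query) => query(_model);
}

// Same interface as the model, but every call is forwarded to the engine.
public class CustomerModelProxy : ICustomerModel
{
    readonly TinyEngine<CustomerModel> _engine;
    public CustomerModelProxy(TinyEngine<CustomerModel> engine) { _engine = engine; }

    public void AddCustomer(Guid id, string name) => _engine.ExecuteCommand(m => m.AddCustomer(id, name));
    public string GetCustomerName(Guid id) => _engine.ExecuteQuery(m => m.GetCustomerName(id));
}
```

Client code written against ICustomerModel works the same whether it is handed the proxy or, in a unit test, a plain CustomerModel.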
The Engine
The Engine encapsulates an instance of the model and is responsible for atomicity, consistency, isolation and durability. It performs the following tasks:
writes commands to the journal
executes commands and queries
reads and writes snapshots
restores the model on startup
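The four engine tasks fit in a small toy class. A minimal single-threaded sketch, assuming an in-memory store in place of files and a caller-supplied deep copy in place of a serializer; ICommand, ToyStore, ToyEngine, CounterModel and AddToCounter are hypothetical names, not OrigoDB's API. It journals a command before executing it, takes snapshots, restores on startup from the latest snapshot plus the newer journal entries, and rolls back a failed command by a full restore.

```csharp
using System;
using System.Collections.Generic;

public interface ICommand<TModel> { void Execute(TModel model); }

// In-memory stand-in for journal and snapshot storage.
public class ToyStore<TModel>
{
    public readonly List<ICommand<TModel>> Journal = new List<ICommand<TModel>>();
    public bool HasSnapshot;
    public TModel Snapshot;      // copy of the model at SnapshotPosition
    public int SnapshotPosition; // number of journal entries the snapshot covers
}

public class ToyEngine<TModel>
{
    readonly ToyStore<TModel> _store;
    readonly Func<TModel> _createInitial;
    readonly Func<TModel, TModel> _deepCopy; // assumption: supplied by the caller
    TModel _model;

    public ToyEngine(ToyStore<TModel> store, Func<TModel> createInitial, Func<TModel, TModel> deepCopy)
    {
        _store = store;
        _createInitial = createInitial;
        _deepCopy = deepCopy;
        _model = Restore(); // startup
    }

    // Latest snapshot (or the initial, usually empty, model) plus replay of newer commands.
    TModel Restore()
    {
        var model = _store.HasSnapshot ? _deepCopy(_store.Snapshot) : _createInitial();
        for (int i = _store.SnapshotPosition; i < _store.Journal.Count; i++)
            _store.Journal[i].Execute(model);
        return model;
    }

    public void Execute(ICommand<TModel> command)
    {
        _store.Journal.Add(command); // write-ahead: journal first, then execute
        try
        {
            command.Execute(_model);
        }
        catch
        {
            // Rollback = full restore without the failed command.
            _store.Journal.RemoveAt(_store.Journal.Count - 1);
            _model = Restore();
            throw;
        }
    }

    public TResult Execute<TResult>(Func<TModel, TResult> query) => query(_model);

    public void TakeSnapshot()
    {
        _store.Snapshot = _deepCopy(_model);
        _store.SnapshotPosition = _store.Journal.Count;
        _store.HasSnapshot = true;
    }
}

// Demo command that deliberately modifies the model before validating,
// so a failure leaves partial changes that the rollback must undo.
public class CounterModel { public int Value; }

public class AddToCounter : ICommand<CounterModel>
{
    readonly int _amount;
    public AddToCounter(int amount) { _amount = amount; }

    public void Execute(CounterModel model)
    {
        model.Value += _amount;
        if (model.Value > 100) throw new InvalidOperationException("Counter limit exceeded");
    }
}
```

A real engine also has to serialize writers and cannot simply remove an entry that is already on disk; the toy ignores both issues.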
We call it a toolkit because you have a lot of options
Modelling - define your own model or use an existing one. Generic or domain specific. It’s up to you.
Storage - Default is FileStore. SqlStore or write your own module.
Data format - Choose wire and storage format by plugging in different IFormatter implementations. Binary, JSON, ProtoBuf, etc
Read more in the docs on Extensibility
Design goals
Our initial design goals were rapid development, testability, simplicity, correctness, modularity, flexibility and extensibility. Performance was never a goal, but running in-memory with memory-optimized data structures outperforms disk-oriented systems. Of course, a lot of optimization is still possible.
OrigoDB is a cousin of Event Sourcing. The entire database is a single aggregate, and there is a single stream of events: the commands that were executed.
Unless the model is designed to be partitioned, it won’t scale out.
An instance of the model IS the database.
Create your own domain specific model or choose a generic one.
An object IS a strongly typed graph. Constraints.
Guidelines
No side effects or external actions – like send an email
No external dependencies – like DateTime.Now, Random
Unhandled exceptions trigger rollback (full restore)
Call Command.Abort() or throw CommandAbortedException to signal that the command failed
Immutable is good
Serializable
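The guidelines above can be seen in one small command: anything non-deterministic (the clock, random numbers, new Guids) is captured once by the client and stored in the command, so replaying the journal gives identical results. A minimal sketch; AuditModel and AddEntry are hypothetical names and not OrigoDB types.

```csharp
using System;
using System.Collections.Generic;

[Serializable]
public class AuditModel
{
    public readonly List<string> Entries = new List<string>();
}

[Serializable]
public sealed class AddEntry // immutable: all fields are readonly
{
    public readonly string Text;
    public readonly DateTime Timestamp; // supplied by the caller, never read inside Execute

    public AddEntry(string text, DateTime timestamp)
    {
        Text = text;
        Timestamp = timestamp;
    }

    public void Execute(AuditModel model)
    {
        // Deterministic: no DateTime.Now, no Random, no e-mail or other side effects.
        model.Entries.Add($"{Timestamp:O} {Text}");
    }
}
```

The client creates the command with new AddEntry("created", DateTime.UtcNow); the timestamp is then part of the journaled command and the same value is used on every replay.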
Inheritance
Immutable
Point out View and why
Walk through the code
Explain Engine.For<T>();
Show the Geekstream site, search for ndc, ndc oslo, ode to code
Show the statistics page.
Show the Solution Explorer. Show the model and a few commands; mention it’s on GitHub.
So when do you use ...
OrigoDB when you can, SQL Server when you have to, Redis if you have to.
The DISEASE and THE CURE – partners in crime: SQL + cache. CQRS is a symptom of
Redis is cool and lightning fast but has a relatively limited data representation. No querying, so you often need separate read models or persistence.
SQL Server – existing RDBMS, existing operations, policy, infrastructure, licenses
OrigoDB – OLTP + OLAP whenever the data fits in RAM; it can even use a SQL backing store.
Auditing
Debugging
Projections
Speed
100% ACID
Testability
Out of the box?
Degrees, not binary.
Trade-offs. Sacrifices for performance. READ COMMITTED is the default isolation level.
Atomic? Failures within a SQL transaction do not necessarily roll back the transaction: by default (XACT_ABORT OFF), many errors abort only the failing statement.
Isolation – weaker isolation levels allow inconsistencies: phantom reads, non-repeatable reads, dirty reads.