SortaSQL is a proposal to add seamless horizontal scalability to SQL databases by using the filesystem to store and retrieve data. The SQL database would store metadata and handle queries, while an embedded key/value store manages record storage in files on a local or distributed filesystem. Queries can then scale across many servers, because the filesystem handles replication, performance and locking for the distributed data files. In the proposed architecture, an application talks SQL to PostgreSQL, and PostgreSQL uses a SortaSQL plugin to retrieve rows from Kyoto Cabinet key/value files on a POSIX filesystem. Case studies at CloudFlare show how a dataset growing by 400 GB per day can be stored and queried efficiently at scale with this approach.
4. Scaling?
• What happens to joins when your data
doesn’t fit in memory?
• I only need get and set for my data
• Sharding is too hard/unreliable
• A “monopolistically competitive market”?
6. Proposal:
Let the Filesystem do the hard work
• RDBMS presents a full SQL interface to
applications, automatically accessing files to get
data as needed
• RDBMS stores metadata allowing it to find the
right data files
• Embedded key/value store handles record-level
storage, locking, caching, etc.
• FS (local or distributed) stores data and is
responsible for replication, performance, locking,
etc.
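A minimal sketch of the metadata lookup, assuming the RDBMS keeps one row per data file together with the key range that file covers. Python's sqlite3 stands in for PostgreSQL here, and the data_files table, its columns and the file paths are hypothetical, not part of SortaSQL as described.

```python
import sqlite3

# Hypothetical metadata table: one row per data file and the key range it covers.
SCHEMA = """
CREATE TABLE data_files (
    table_name TEXT NOT NULL,
    min_key    TEXT NOT NULL,
    max_key    TEXT NOT NULL,
    path       TEXT NOT NULL
)
"""

def register_file(conn, table_name, min_key, max_key, path):
    """Record that `path` holds keys in [min_key, max_key] for `table_name`."""
    conn.execute(
        "INSERT INTO data_files VALUES (?, ?, ?, ?)",
        (table_name, min_key, max_key, path),
    )

def files_for_range(conn, table_name, lo, hi):
    """Return paths of files whose key range overlaps [lo, hi], in key order."""
    # Two inclusive ranges overlap iff each one starts before the other ends.
    rows = conn.execute(
        "SELECT path FROM data_files "
        "WHERE table_name = ? AND min_key <= ? AND max_key >= ? "
        "ORDER BY min_key",
        (table_name, hi, lo),
    )
    return [path for (path,) in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Hypothetical monthly files on the shared filesystem.
    register_file(conn, "logs", "2012-01-01", "2012-01-31", "/data/logs/2012-01.kct")
    register_file(conn, "logs", "2012-02-01", "2012-02-29", "/data/logs/2012-02.kct")
    print(files_for_range(conn, "logs", "2012-01-20", "2012-02-05"))
```

Only the files returned by the lookup need to be opened, so a query over a narrow key range touches a small slice of the data no matter how many files the filesystem holds.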
7. Major Wins
• Scales continuously from 1-100 servers (FS
permitting)
• Hot/cold storage hierarchy
• Allows ad-hoc queries via mature SQL
• Every language already has built-in SQL bindings
8. Architecture
• Application talks SQL to PostgreSQL
• PostgreSQL stores metadata
• Performs post-processing on rows
retrieved from Kyoto Cabinet (KC) files
• KC files live on a POSIX filesystem
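The post-processing step can be sketched as follows, assuming records are stored as JSON-encoded values under sortable string keys. A plain dict stands in for a Kyoto Cabinet file and an in-memory sqlite3 database stands in for PostgreSQL; in the proposed design this work would happen inside PostgreSQL through the SortaSQL plugin, so fetch_rows, post_process and the hits table are illustrative only.

```python
import json
import sqlite3

def fetch_rows(kv_store, lo, hi):
    """Yield decoded records whose keys fall in [lo, hi], in key order.

    `kv_store` is a dict standing in for a Kyoto Cabinet file; a real B+ tree
    file would walk the range with a cursor instead of sorting every key.
    """
    for key in sorted(kv_store):
        if lo <= key <= hi:
            yield json.loads(kv_store[key])

def post_process(rows):
    """Load retrieved rows into a temporary SQL table and aggregate them there."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TEMP TABLE hits (zone TEXT, bytes INTEGER)")
        # Named placeholders pull the matching fields out of each record dict.
        conn.executemany("INSERT INTO hits (zone, bytes) VALUES (:zone, :bytes)", rows)
        return conn.execute(
            "SELECT zone, SUM(bytes) FROM hits GROUP BY zone ORDER BY zone"
        ).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    store = {
        "001": json.dumps({"zone": "a.com", "bytes": 100}),
        "002": json.dumps({"zone": "b.com", "bytes": 50}),
        "003": json.dumps({"zone": "a.com", "bytes": 25}),
        "004": json.dumps({"zone": "a.com", "bytes": 999}),
    }
    print(post_process(fetch_rows(store, "001", "003")))
```

The key/value layer only answers "which records are in this range"; filtering, grouping and aggregation stay in the SQL engine, which is what lets applications keep issuing ad-hoc SQL.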
10. Bigtable
“A Bigtable is a sparse, distributed, persistent
multidimensional sorted map.”
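Bigtable indexes its map by row key, column key and timestamp. One way to lay that model onto a single sorted key/value file is to pack all three into one byte-string key. The encoding below is an assumption, not SortaSQL's documented format: NUL bytes separate the parts, and the timestamp is inverted and stored big-endian so that byte order sorts by row, then column, then newest version first.

```python
import struct

MAX_TS = 2**64 - 1

def encode_key(row: bytes, column: bytes, timestamp: int) -> bytes:
    """Pack (row, column, timestamp) into one key for a sorted store.

    Assumes row and column contain no NUL bytes, since NUL is the separator.
    """
    if b"\x00" in row or b"\x00" in column:
        raise ValueError("row and column must not contain NUL bytes")
    if not 0 <= timestamp <= MAX_TS:
        raise ValueError("timestamp must fit in an unsigned 64-bit integer")
    # Big-endian keeps byte order equal to numeric order; inverting the value
    # makes newer versions sort before older ones.
    return row + b"\x00" + column + b"\x00" + struct.pack(">Q", MAX_TS - timestamp)

def decode_key(key: bytes):
    """Inverse of encode_key: return (row, column, timestamp)."""
    # Only the first two NULs are separators; the 8 timestamp bytes may contain NULs.
    row, column, ts_bytes = key.split(b"\x00", 2)
    (inverted,) = struct.unpack(">Q", ts_bytes)
    return row, column, MAX_TS - inverted
```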
11. Multi-Dimensional
• Storing values as protocol
buffers allows for arbitrarily
complex maps
• When maps get too big,
they are promoted to
top-level KC stores
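The promotion logic might look like the sketch below. JSON stands in for protocol buffers and dicts stand in for KC stores; MAX_INLINE_ENTRIES, put_nested, get_nested and the "parent/key" store-naming scheme are all hypothetical.

```python
import json

# Hypothetical threshold: maps with more entries than this leave the parent value.
MAX_INLINE_ENTRIES = 3

def put_nested(stores, store_name, key, subkey, value):
    """Set stores[store_name][key][subkey] = value, promoting maps that grow too big.

    `stores` maps store names to dicts standing in for Kyoto Cabinet files.
    Each value is a JSON document (a stand-in for a protocol buffer) holding
    either the map inline or a pointer to the store it was promoted to.
    """
    store = stores[store_name]
    raw = store.get(key)
    doc = json.loads(raw) if raw is not None else {"inline": {}}
    if "promoted_to" in doc:
        # The map already lives in its own top-level store.
        stores[doc["promoted_to"]][subkey] = json.dumps(value)
        return
    doc["inline"][subkey] = value
    if len(doc["inline"]) > MAX_INLINE_ENTRIES:
        # Too big to keep inline: give the map its own store and leave a pointer.
        new_name = f"{store_name}/{key}"
        stores[new_name] = {k: json.dumps(v) for k, v in doc["inline"].items()}
        doc = {"promoted_to": new_name}
    store[key] = json.dumps(doc)

def get_nested(stores, store_name, key, subkey):
    """Return the value at key -> subkey, or None if it is absent."""
    raw = stores[store_name].get(key)
    if raw is None:
        return None
    doc = json.loads(raw)
    if "promoted_to" in doc:
        inner = stores[doc["promoted_to"]].get(subkey)
        return None if inner is None else json.loads(inner)
    return doc["inline"].get(subkey)
```

Small maps stay inside a single value and cost one read; large maps get their own sorted store, so they can grow without rewriting one ever-larger value on every update.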
12. Persistent and Sorted
• Any key/value store that supports binary
values, with keys kept in a B+ tree,
will do
• We use Kyoto Cabinet (successor to Tokyo
Cabinet)
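The property that matters is sorted keys, which make range scans cheap. The class below is an in-memory stand-in, not Kyoto Cabinet's API: it keeps keys in a sorted Python list via the bisect module, so inserts cost O(n), whereas a real B+ tree file (such as Kyoto Cabinet's TreeDB) keeps them logarithmic and persists its pages to disk.

```python
import bisect

class SortedKV:
    """In-memory stand-in for a B+ tree key/value file with bytes keys and values."""

    def __init__(self):
        self._keys = []    # all keys, kept in sorted order
        self._values = {}  # key -> value

    def set(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            # O(n) list insert; a B+ tree would do this in O(log n).
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        """Return the value for key, or None if it is absent."""
        return self._values.get(key)

    def scan(self, lo: bytes, hi: bytes):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        for key in self._keys[start:end]:
            yield key, self._values[key]
```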
13. Sparse
• Values can be arbitrarily different.
• NULLs are free (or cheap)
• Protocol Buffers again to the rescue.
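A sketch of why NULLs cost little in a sparse encoding, assuming JSON as a stand-in for protocol buffers (which likewise write nothing for an unset optional field). Only non-NULL fields are serialized, so each record stores just the fields it has, and the reader fills the missing columns back in as None. The column names are hypothetical.

```python
import json

def encode_sparse(record: dict) -> bytes:
    """Serialize only the non-NULL fields; absent fields cost no bytes."""
    present = {k: v for k, v in record.items() if v is not None}
    return json.dumps(present, separators=(",", ":"), sort_keys=True).encode()

def decode_sparse(data: bytes, columns):
    """Rebuild a full row, filling columns missing from the encoding with None (NULL)."""
    stored = json.loads(data)
    return {col: stored.get(col) for col in columns}
```

In this scheme a NULL and an absent field mean the same thing, which is what lets two records with completely different sets of populated fields share one table.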