Datalevin London-meetup2020

Datalevin
A simple, fast and free Datalog database for everyone
Huahai Yang, Ph.D.
Juji, Inc.
September 22, 2020

Background & Motivation
• Juji is a conversational AI company
• Conversational data query (NLDB)
• Upload a CSV file, then query it
• Natural language => database query
• Context sensitive
• My previous research in NLDB convinced me:
• NLDB is more of a DB problem than an NL problem
• Data themselves provide the best context
• Better DB is the key

Database Design Goals
• Datalog is the best target query language for NLDB
• Declarative
• Composable
• Amenable to code generation
• In-process embedded use
• Bulk writes, frequent reads
• Multiple DB paradigms
• Transparent data replication

Datalevin Design Principle - Simplicity
• Simple to use
• Just a library, add to deps, and start coding
• Simply require a different namespace to get a different DB paradigm
• Current: Key-value, Datalog
• Future: Graph, Document
• Simple to operate
• No need for complex ops: setup, backup and recovery should be dead simple
• No DB maintenance threads or processes
• No need for performance tuning
• Simple to scale
• Just provision more physical resources

Why fork Datascript?
• Datascript is a great baseline Datalog implementation
• Comprehensive test coverage
• Well maintained code base
• Similar API to Datomic
• Lots of users
• We have very different goals from the alternatives
• No interest in building a Datomic clone
• Focus on query performance
• We have plans to go far beyond NLDB
• Juji Slogan for AI: “Symbolic as the bones, machine learning as the flesh”
• High performance graph database is the basis of symbolic AI of the future

Roles of Database
• Operational
• Database as the surrogate of the external world
• ACID is derived from this use: to maintain the illusion of the external world
• Primary, necessary for most use cases
• Focus on present, OLTP
• Archival
• Database as a recording of events and facts
• Don’t need ACID, eventual consistency is fine
• Secondary, necessary for many use cases, but not all
• Focus on provenance and history, OLAP

Merging operational and archival DB is hard
• More stringent performance requirements
• History has more data than present
• More complex APIs
• Need to deal with history
• Need to distinguish history and present
• More complex user mental model
• More things to consider -> less simple
• Mind needs to forget to work properly
• Hyperthymesia is a painful condition

Operational DB should be stateful
• In people’s minds, the external world is stateful
• A wrong assumption about the time model is one of the main sources of programming errors with immutable DBs
• “Why do I get the wrong data with this query?”
• “I have to sort by transaction id to get the latest version?”
• Datalevin is an operational database
• meant to be embedded in applications to manage state

Datalevin Architecture
• LMDB key-value store as the storage
• Optimized Clojure API for LMDB
• EAV index on top of key-value
• User-facing API on top
• Key-value
• EAV index access
• Datalog
[Figure: architecture stack. LMDB at the bottom; key-value processing and EAV index processing above it; key-value API, index access API and Datalog API on top]

LMDB Features
• Lightning Memory Mapped DB
• ACID key value database
• DB is a memory mapped file
• Use OS filesystem cache
• B+ tree, optimized for read
• The fastest key-value store for reads
• Performs well when writing large values (>2KB)
• Works on bytes, supports range queries
• Supports multiple independent tables (DBI)

LMDB Design
• Read and write transactions
• Single writer
• Many concurrent readers (MVCC)
• No locks on read
• Scales linearly with the number of reader threads
• Copy on write
• Similar to immutable data structure
• Reclaim obsolete pages
• Read/write do not block each other

Datalevin Optimizations
• Read transaction pool
• Avoid cost of allocating read transactions
• Pre-allocate off-heap buffers in JVM
• Write buffer (one per DBI)
• Read buffer
• Range query start and end buffers
• Auto-resize value buffers
• Re-allocate on overflows
• Auto-resize DB size
• LMDB requires the total DB size to be specified up front

Datalevin Key-Value API
• Open/close LMDB
• Open/clear/drop DBI
• Transact key-values as a batch
• :put, :del
• Fetch single value
• get-value, get-first
• Range query
• get-range
• Predicate filtering
• get-some, range-filter
• Counts
• entries, range-count, range-filter-count
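
The operations listed above can be modelled with a sorted key list. The following Python sketch is only an illustration of the semantics: `ToyKV` is a hypothetical class, its method names merely echo the operation names on the slide, and the signatures are assumptions rather than the Datalevin API.

```python
import bisect


class ToyKV:
    """In-memory model of an ordered key-value table with byte keys."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._vals = {}   # key -> value

    def transact(self, ops):
        # Apply a batch of ("put", k, v) and ("del", k) operations.
        for op in ops:
            if op[0] == "put":
                _, k, v = op
                if k not in self._vals:
                    bisect.insort(self._keys, k)
                self._vals[k] = v
            elif op[0] == "del":
                _, k = op
                if k in self._vals:
                    del self._vals[k]
                    self._keys.pop(bisect.bisect_left(self._keys, k))
            else:
                raise ValueError("unknown operation: %r" % (op[0],))

    def get_value(self, k):
        return self._vals.get(k)

    def get_range(self, lo, hi):
        # All entries with lo <= key <= hi, in key order.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [(k, self._vals[k]) for k in self._keys[i:j]]

    def range_filter(self, pred, lo, hi):
        return [(k, v) for k, v in self.get_range(lo, hi) if pred(k, v)]

    def range_count(self, lo, hi):
        return len(self.get_range(lo, hi))


kv = ToyKV()
kv.transact([("put", b"k1", 10), ("put", b"k2", 20),
             ("put", b"k3", 30), ("del", b"k2")])
```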

EAV Index Processing
• Entity-Attribute-Value data model
• Versatile
• relational model: entity = tuple, attribute = column, value = value
• graph model: entity = node, attribute = edge, value = node (ref)
• RDF triple: entity = subject, attribute = predicate, value = object
• The triple is called a “datom”
• Covering indices
• EAV: row oriented index, all datoms
• AEV: column oriented index, all datoms
• AVE: support attribute range query, all datoms
• VAE: graph reverse index, only for reference type datoms
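
As a rough illustration of the four indices, the Python sketch below sorts the same datoms in different component orders. The sample data, the `ref_attrs` set and the helper names are made up for the example; real indices are stored as encoded bytes, not Python tuples.

```python
import bisect

# (entity, attribute, value) triples; "friend" holds a reference to an entity.
datoms = [
    (1, "name", "Ada"),
    (1, "age", 36),
    (2, "name", "Bob"),
    (2, "age", 41),
    (2, "friend", 1),
]
ref_attrs = {"friend"}

eav = sorted(datoms)                                    # row oriented
aev = sorted((a, e, v) for e, a, v in datoms)           # column oriented
ave = sorted((a, v, e) for e, a, v in datoms)           # attribute range queries
vae = sorted((v, a, e) for e, a, v in datoms if a in ref_attrs)  # reverse refs


def entity(e):
    # EAV: everything known about one entity.
    return [(a, v) for e2, a, v in eav if e2 == e]


def attr_range(attr, lo, hi):
    # AVE: entities whose attr value lies in [lo, hi], found by binary search.
    i = bisect.bisect_left(ave, (attr, lo))
    j = bisect.bisect_right(ave, (attr, hi, float("inf")))
    return [e for _, _, e in ave[i:j]]


def referrers(target):
    # VAE: which entities point at target, and through which attribute.
    return [(a, e) for v, a, e in vae if v == target]
```

For example, `attr_range("age", 40, 50)` finds entity 2 without scanning the other attributes, and `referrers(1)` walks the graph edge backwards.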

Index Storage
• In memory indices as cache
• Inherits Datascript’s persistent sorted sets
• On disk indices as permanent storage
• Binary encoded datoms into key-values
• LMDB’s maximum key size is fixed at compile time (default: 511 bytes)
• Each index is stored in its own DBI
• Key (up to 511 bytes)
• Small value: encoded datom
• Large value: encoded datom with (truncated value + hash) to support range query
• Value (8 bytes)
• Small: a sentinel long, indicating small value
• Large: a long reference to the key of the full datom in the “giant” DBI

Datom Index Disk Format
• Attribute id (aid): binary encoded 32 bit integer
• Entity id (eid): binary encoded 64 bit long
• Value:
• Data type header byte, use disallowed bytes in UTF-8
• Data types: int, long, id, boolean, float, double, byte, bytes, keyword, symbol, instant, uuid
• Potentially truncated prefix bytes of the value
• Each value data type is encoded differently to ensure: bitwise order = value order
• If truncated, a truncator byte
• If truncated, a 32 bit Clojure hash of the value
• A separator byte
[Figure: layout of a 511-byte key, with fields aid, eid, header, value, truncator, hash and separator]
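
The requirement "bitwise order = value order" can be illustrated for 64-bit longs. The Python sketch below uses a common technique (offset the value so the sign bit is flipped, then write big-endian); whether Datalevin encodes longs exactly this way is an assumption, and `encode_long` / `decode_long` are hypothetical names.

```python
import struct

OFFSET = 1 << 63


def encode_long(n):
    # Shifting by 2**63 maps the signed range onto 0 .. 2**64 - 1, so
    # negatives sort before positives; big-endian puts the most
    # significant byte first, so byte-wise comparison matches numeric order.
    return struct.pack(">Q", n + OFFSET)


def decode_long(b):
    return struct.unpack(">Q", b)[0] - OFFSET


values = [-5, 0, 3, -1, 2**40, -2**63, 2**63 - 1]
by_bytes = sorted(values, key=encode_long)
```

A plain two's-complement encoding would fail here, because negative numbers start with a 1 bit and would sort after all positive numbers.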

More Disk Storage Details
• Giants
• For large values, the full datoms are stored in a giant DBI
• append-only, fast write
• Key: auto-incremental long (gid)
• Value: serialized full datom
• Schema
• Stored in a schema DBI
• Key: attribute name
• Value: serialized Clojure map of attribute properties
• TODO: non-trivial schema migration

Datalog Query
• Retain most Datascript query logic
• Search on-disk indices instead of in-memory cache
• Leverage indices that Datascript does not enable: AVET and VAET
• Adopted a few performance optimization PRs that Datascript did not merge
• Cache the results of all on-disk index access API calls in an LRU cache
• Main reason for the speed advantage shown in query benchmarks
• TODO: move to a more performant query engine
• Datascript query engine does hash joins on returned full datoms
• Nested maps should do less work and be more performant

Datalog Transaction
• Retain Datascript transaction logic
• Reads during transaction: first search in-memory cache, then search on disk
• Transact to in-memory cache
• Identical to Datascript
• Cache content is lost when DB restarts
• Transact to disk storage
• Collect transacted datoms, commit them as a batch
• Sync to disk after each transaction
• Clear on-disk index access cache after a transaction

Status
• Index Access API is identical to Datascript
• Missing features from Datascript
• Composite tuples (TODO)
• Persisted transaction functions (TODO)
• Features that make sense for in-memory DB (Maybe)
• DB serialization
• DB pretty print

Benchmark: Write
• 100K entities of random people information
• Bulk load of datoms is fast
• Bulk transaction is fast too
• Transacting a small number of datoms is slow
• Advice: batch as much data as possible in each transaction

Benchmark: Read
• Datalevin is faster than Datascript across the board for all tested Datalog queries

Benchmark: Multi-threads Read
• Does LMDB’s claim of linear scaling with reader threads hold?
• Yes
• Does Datalevin preserve that scaling?
• Yes

Roadmap
• 0.4.0 Distributed mode with Raft-based replication
• 0.5.0 New Datalog query engine with an optimizer
• 0.6.0 Automatic schema migration
• 0.7.0 Datalog query parity with Datascript
• 0.8.0 Implement loom graph protocols
• 0.9.0 Auto indexing of document fields
• 1.0.0 Materialized views and incremental maintenance

Thank you! Questions?
Huahai Yang
https://github.com/huahaiy
@huahaiy
https://juji.io
