BlackRay is an in-memory database that provides both relational and search engine capabilities. Recent updates include a new SQL parser using Lemon/Quex and initial support for user defined functions. The roadmap includes improved real-time updating, subqueries, and aggregate functions. The team behind BlackRay is working to improve performance, scalability, and platform support going forward.
2. 2 FOSS Asia 2010
The State of the Engine
➔
Brief Technology Overview
➔
New SQL Parser (lemon/quex)
➔
User Defined Functions
➔
BlackRay as a storage engine
➔
Outlook: Realtime Data Updates
4. 4 FOSS Asia 2010
What is BlackRay?
●
BlackRay is a relational, in-memory database
●
Supports SQL, utilizes PostgreSQL drivers
●
Fulltext (Tokenized) Search in Text fields
●
Object-Oriented API Support
●
Persistence via Files, Transaction support
●
Scalable and Fault Tolerant
●
Open Source, Open Community
●
Available under the GPLv2
5. 5 FOSS Asia 2010
Current Release
Current 0.10.0 – Released December 2009
●
Complete rewrite of SQL Parser (boost::spirit)
●
PostgreSQL client compatibility (via network protocol) to
allow JDBC/ODBC... via PostgreSQL driver
●
Rewritten CLI tools
●
Major bugfixes (potential memory leaks)
●
Better Authentication support for Instances
7. 7 FOSS Asia 2010
Why call it Data Engine?
●
BlackRay is a hybrid between a relational database
and a search engine, thus we call it a „data engine“
●
Database features:
●
Relational structure, with Join between tables
●
Wildcards and index functions
●
SQL and JDBC/ODBC
●
Search Engine Features
●
Fulltext retrieval (token index)
●
Phonetic and similar approximation search
●
Extremely low latency
8. 8 FOSS Asia 2010
BlackRay Architecture
[Architecture diagram. Labels shown: client APIs (C++, Java, Python, PHP, C#); SQL Interface for Postgres* clients; Management Server; Instance Server; Data Universe (RAM resident) with Redo Log and Snapshots; 5-Perspective Index with L1: Dictionary, L2: Postings, L3: Row Index, L4: Multi-Tokens, L5: Multi-Values.]
9. 9 FOSS Asia 2010
Data Universe
●
BlackRay features a 5-Perspective Index
●
Layer 1: Dictionary
●
Layer 2: Postings
●
Layer 3: Row Index
●
Layer 4: Multi-Token Layer
●
Layer 5: Multi-Value Layer
●
Layer 1 and 2 comprise a fully inverted Index
●
Statistics in this Index used for Query Plan Building
●
All data - index and raw output - are held in memory
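The first three layers can be pictured with a toy single-column index. This is only a sketch under assumptions: the class and method names are hypothetical, the row index is assumed to map each row to its dictionary entry, and nothing here reflects BlackRay's actual C++ data structures.

```python
from bisect import bisect_left


class ColumnIndex:
    """Toy single-column index with a dictionary, postings, and a row index."""

    def __init__(self, values):
        # Layer 1 (dictionary): sorted list of distinct terms.
        self.dictionary = sorted(set(values))
        # Layer 2 (postings): term id -> sorted list of row ids.
        self.postings = [[] for _ in self.dictionary]
        # Layer 3 (row index, assumed meaning): row id -> term id.
        self.row_index = []
        for row_id, value in enumerate(values):
            term_id = bisect_left(self.dictionary, value)
            self.postings[term_id].append(row_id)
            self.row_index.append(term_id)

    def lookup(self, value):
        """Return the row ids whose column equals `value`."""
        pos = bisect_left(self.dictionary, value)
        if pos < len(self.dictionary) and self.dictionary[pos] == value:
            return self.postings[pos]
        return []

    def match_count(self, value):
        """Postings length: the kind of statistic a query planner could use."""
        return len(self.lookup(value))

    def value_of(self, row_id):
        """Read a value back out through the row index."""
        return self.dictionary[self.row_index[row_id]]


cities = ColumnIndex(["berlin", "hanoi", "berlin", "munich"])
```

Here `cities.lookup("berlin")` walks dictionary and postings only, which is the inverted-index part, while `value_of` needs the row index to produce output.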
10. 10 FOSS Asia 2010
Core BlackRay Features
●
Standard loaders enable high performance loading
of data into tables
●
Persistence is done via file based snapshots
●
Snapshots enable data versioning and simple
backups
●
Basic ACID Transaction compliance is implemented
in BlackRay, without crash recovery support.
11. 11 FOSS Asia 2010
Query Interfaces
●
BlackRay implements the PostgreSQL server socket
interface and binary APIs in Java, C++ and Python
●
PostgreSQL compatible drivers can be utilized
against BlackRay (JDBC/ODBC)
●
Native API enables object oriented data access
●
Performance of native APIs currently is substantially
better than SQL via PostgreSQL drivers
●
Dynamic query building is very efficient with native
APIs
13. 13 FOSS Asia 2010
A New SQL Parser - again?
●
The 0.10 release included a much improved SQL
parser, built with boost::spirit
●
Quite solid, fast and simple to use
●
However, boost deprecates spirit1
●
boost::spirit2 is not compatible with spirit1, requiring a
rewrite anyway
●
Our impression: spirit2 requires too many resources
and large grammars result in huge generated files
●
Also: spirit and C++ templates do not mix well
14. 14 FOSS Asia 2010
What would be a better choice?
●
Flex/Bison:
●
The obvious choice of MySQL and PostgreSQL
●
Two-step compile process, generates C not C++
●
No Unicode support
●
ANTLR:
●
Odd grammar rules, not optimal for C++
●
Recursive Descent parsers are not suited for SQL
●
Lemon/QUEX
●
Our new choice ;)
15. 15 FOSS Asia 2010
Lemon/Quex: Our experience
●
Lemon:
●
Lemon is part of SQLite
●
Much more intuitive syntax than Bison syntax
●
Quex:
●
Generates tokenizers in C++
●
Unicode and external Parser support
●
Partially buggy, but all issues were fixed within days
●
Synopsis: Lemon/Quex are like Bison/Flex, just with
Unicode and C++ support and maybe easier to
debug
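The division of labour between a tokenizer and an externally driven parser can be sketched in a few lines. Lemon-generated parsers are push parsers: the tokenizer hands tokens to the parser one at a time. The sketch below imitates that shape with a hand-written stand-in; it is not generated by Lemon or Quex, accepts only one fixed statement form, and all names in it are hypothetical.

```python
import re

# One token per match: a quoted string, an equals sign, or a word.
# In Python 3, \w also matches non-ASCII letters, so identifiers may be Unicode.
TOKEN_RE = re.compile(r"\s*(?:(?P<STRING>'[^']*')|(?P<EQ>=)|(?P<IDENT>\w+))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}


def tokenize(sql):
    """Yield (kind, text) pairs for the tiny SQL subset used here."""
    sql = sql.rstrip()
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group(match.lastgroup)
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind = text.upper()
        yield kind, text


class PushParser:
    """Accepts exactly: SELECT ident FROM ident WHERE ident = 'string'."""

    EXPECTED = ["SELECT", "IDENT", "FROM", "IDENT", "WHERE", "IDENT", "EQ", "STRING"]

    def __init__(self):
        self.seen = []

    def feed(self, kind, text):
        """Called by the tokenizer side once per token."""
        step = len(self.seen)
        if step >= len(self.EXPECTED) or kind != self.EXPECTED[step]:
            raise SyntaxError(f"unexpected token {text!r}")
        self.seen.append(text)

    def finish(self):
        """Signal end of input and return a small result record."""
        if len(self.seen) != len(self.EXPECTED):
            raise SyntaxError("unexpected end of input")
        _, column, _, table, _, where_column, _, literal = self.seen
        return {"column": column, "table": table,
                "where": (where_column, literal.strip("'"))}


def parse(sql):
    parser = PushParser()
    for kind, text in tokenize(sql):
        parser.feed(kind, text)
    return parser.finish()
```

Because the parser never pulls input itself, the tokenizer can be swapped for a different one without touching the grammar side.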
16. 16 FOSS Asia 2010
Current Progress
●
Basic SQL Features are ported from spirit to
Lemon/Quex
●
The „issue-77“ branch contains all recent SQL
parser code
●
Unit-Testing and Database level testing very solid
●
Will be part of the 0.11 release
17. 17 FOSS Asia 2010
Recent Additions
●
Support for simple (single column) User Defined
Functions is now complete
●
Query portion (no subselect, no aggregate
functions) is very stable
●
Data Definition Language was added recently
●
CREATE SCHEMA
●
CREATE TABLE
●
ALTER TABLE
●
Index is created dynamically, so no CREATE INDEX
required
19. 19 FOSS Asia 2010
User Defined Functions
●
BlackRay was designed with support for Index
functions that operate on data in tables
●
Functions pre-compute index results, improving
speed and enabling queries that are not possible
otherwise
●
Functions are called on data load, and also on
queries.
●
Functions must not maintain state outside of tables
of the same instance they operate on.
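A minimal sketch of the idea, assuming the function is applied to every value at load time and to the search term at query time. The phonetic rule below is a made-up stand-in, not BlackRay's phonetic function, and the class name is hypothetical. The function is pure, matching the rule that functions keep no outside state.

```python
from collections import defaultdict


def fx_phonetic(value):
    """Hypothetical phonetic rule: lower-case and merge a few similar spellings."""
    text = value.lower()
    for source, target in (("ey", "ei"), ("ay", "ei"), ("ai", "ei"), ("y", "i")):
        text = text.replace(source, target)
    return text


class FunctionIndex:
    """Index over function results instead of raw column values."""

    def __init__(self, function):
        self.function = function
        self.rows_by_key = defaultdict(list)

    def load(self, values):
        # The function runs once per value while loading.
        for row_id, value in enumerate(values):
            self.rows_by_key[self.function(value)].append(row_id)

    def search(self, term):
        # The same function runs on the search term, then one lookup suffices.
        return self.rows_by_key.get(self.function(term), [])


names = ["Meyer", "Maier", "Schmidt", "Mayer"]
index = FunctionIndex(fx_phonetic)
index.load(names)
```

A search for "Meier" finds rows 0, 1 and 3 without touching the raw names, which is the speed-up the pre-computed index result buys.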
20. 20 FOSS Asia 2010
A Sample Function
●
Using functions in BlackRay
SELECT name FROM employee_table WHERE
fx_phonetic(name) = 'mike';
●
Functions need to be loaded beforehand:
CREATE FUNCTION fx_phonetic(varchar,
varchar) RETURNS int AS
'DIRECTORY/funcs', 'phonetic' ;
●
The function must implement the BlackRay default
function signature, which is almost identical to the
MySQL and PGSQL signatures
21. 21 FOSS Asia 2010
Current State
●
User Defined Function Repository fully implemented
●
All built-in functions ported to be compatible to User
Defined Functions
●
SQL support for User Defined Functions under way
●
Will be part of the 0.11 release
23. 23 FOSS Asia 2010
Why even bother?
●
In Fall 2009, we embarked on a little adventure to
implement BlackRay as a storage engine
●
The old Engine had only a minimal SQL interface
and we lacked the expertise to build it ourselves
●
Plugging into the MySQL ecosystem seemed like a
very pleasant choice
●
The features of BlackRay would make it a good
query cache for large disk tables.....
24. 24 FOSS Asia 2010
Our First Problem
BlackRay does not support a simple table scan.....
●
It may seem strange, but due to its design as an in-memory
index, we do not separate table and index
●
Each column index basically is the data of the column
●
BlackRay distinguishes select and output columns, both
of which remain in RAM
●
The index therefore was never designed to be forced
back into a row format, for a simple table walk
25. 25 FOSS Asia 2010
Possible Solution?
So, we can walk the Index instead?
●
Rather than scanning the table, it is possible to scan the
index instead
●
This only works for the columns marked „searchable“
●
Causes nasty errors when trying to select against result-
only columns
●
In tokenized index columns, getting the data back out
means concatenation with a blank between values – not
nice, as tokenizing can follow complex rules
●
Requires Refactoring of our Layer 3 (Row-Index)
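Why reading data back out of a tokenized index is "not nice" can be shown with a toy example. The tokenizing rule and the (row id, position) layout below are assumptions made for illustration; BlackRay's tokenizer rules and index layout may differ.

```python
import re


def tokenize_text(text):
    """Assumed rule: lower-case, split on anything that is not a letter or digit."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def build_token_index(rows):
    """Map each token to the (row id, position) pairs where it occurs."""
    index = {}
    for row_id, text in enumerate(rows):
        for position, token in enumerate(tokenize_text(text)):
            index.setdefault(token, []).append((row_id, position))
    return index


def walk_index(index, row_count):
    """Rebuild each row by ordering its tokens and joining them with a blank."""
    slots = [dict() for _ in range(row_count)]
    for token, hits in index.items():
        for row_id, position in hits:
            slots[row_id][position] = token
    return [" ".join(row[p] for p in sorted(row)) for row in slots]


rows = ["O'Brien, Mike", "SoftMethod GmbH"]
rebuilt = walk_index(build_token_index(rows), len(rows))
```

The rebuilt first row is `o brien mike`: case, apostrophe and comma are gone, so an index walk cannot stand in for a table scan on such a column.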
26. 26 FOSS Asia 2010
Next Issue
Optimizing Queries
●
The BlackRay Optimizer uses the Layer 1/2 (Inverted
Index) and Layer 4 (Multi-Tokens) Data to choose a Query
Path
●
In BlackRay „SELECT text FROM t WHERE text LIKE
'*pattern1*' AND text LIKE '*pattern2*'“ is extremely
efficient as the inverted index has all the data
●
Even with OR this is an efficient Query, due to the fact
that we can immediately choose the smaller query first
and eliminate double matches
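The AND and OR cases can be sketched over a plain term-to-postings mapping. This is an illustration only: the function names are hypothetical, and where a real optimizer would use index statistics to pick the smaller side up front, the sketch simply materializes each match set and sorts by size.

```python
def rows_matching(index, pattern):
    """Rows containing any dictionary term that has `pattern` as a substring."""
    rows = set()
    for term, postings in index.items():
        if pattern in term:
            rows.update(postings)
    return rows


def match_and(index, patterns):
    """Intersect match sets, starting with the smallest one."""
    candidates = sorted((rows_matching(index, p) for p in patterns), key=len)
    result = set(candidates[0])
    for rows in candidates[1:]:
        result &= rows
        if not result:
            break  # nothing left to intersect
    return sorted(result)


def match_or(index, patterns):
    """Union match sets; using a set removes double matches."""
    result = set()
    for pattern in patterns:
        result |= rows_matching(index, pattern)
    return sorted(result)


# Toy inverted index: term -> row ids.
index = {
    "blackray": [0, 2],
    "engine": [0, 1, 2, 3],
    "search": [1, 2],
    "searchable": [3],
}
```

Neither function looks at the stored row text; everything is answered from dictionary terms and postings.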
27. 27 FOSS Asia 2010
Next Issue
●
Optimizing in the Storage Engine Interface?
●
In BlackRay, the Optimizer uses the AST from the SQL
Parser to figure out what to optimize
●
Based on a field or single Index level, the number of
matches really is not useful
●
Without utilizing the Layer2 and Layer4 structures, we
lose performance by several orders of magnitude
●
Personal Opinion: The MySQL Optimizer really seems to
like table scans, and tricking it with random vs
sequential read cost did not do the trick
28. 28 FOSS Asia 2010
Functions in the Index
●
Columns can take functions to be used on the data
upon indexing, and when select is carried out
●
The most common functions are
– TOKENIZE – to support multi-token indexes
– PHONETIC – match against defined phonetic rules
– ALIAS – match a token against words with similar meaning
●
Internally these functions could be considered
Meta-Columns on the Index
●
To be able to choose the proper column, we need to
know what function was used in the select
29. 29 FOSS Asia 2010
Functions in the Index
Consider this Query:
SELECT text FROM t WHERE fx_phonetic(text)
LIKE 'maier%';
●
Functions can take more than one parameter, and
may be nested
●
We could not quite figure out how to explain this to
the MySQL Parser
●
The function data would need to be available to the
Index to choose where to look
30. 30 FOSS Asia 2010
Threading Models...
●
BlackRay has a highly optimized Threading Model
●
In RAM, we do not expect I/O-waits, so a model of
two dedicated Threads per CPU core works really
well
●
Locking in the Index is built around this model
●
„One Thread per Connection“ requires at least a
careful review of the way critical data structures are
accessed
31. 31 FOSS Asia 2010
.... Our Conclusion
●
Currently, BlackRay really does not fit too well into
the storage engine architecture
●
Did we lose all hope? Absolutely not.....
●
BlackRay Applications could really benefit from
being able to utilize MySQL features, including the
Archive Engine as well as temporary tables in Heap
●
Thanks to the excellent Blog and postings by Brian
Aker, which allowed us to not make all beginner
mistakes ourselves
33. 33 FOSS Asia 2010
Current Challenges
●
Bulk Updates
●
BlackRay supports Insert and Delete via the Bulk Loader
●
Updates are done via Insert & Delete
●
Insert/Delete via API
●
An API exists for Insert/Delete
●
The Insert/Delete API is separate from the Query API
●
Both APIs cannot be used in the same Thread
●
Insert/Delete via SQL
●
Currently Insert/Delete are not available via SQL
34. 34 FOSS Asia 2010
Supporting Insert/Delete
●
Pull together Insert/Delete and Query APIs
●
Take out the separate APIs
●
Unified API will then support transactions
●
Enable Insert/Delete via SQL
●
Extend the SQL Grammar to include INSERT/DELETE
●
Implement the functions via the unified API
●
The Bulk Loader and SQL
●
Rewrite of the Bulk Loader to utilize the unified API,
rather than SQL
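What a unified API might look like, as a rough sketch: one object offers insert, delete and query, and an update is a delete plus an insert carried out under a single lock so that a query never sees the gap in between. The class and its methods are hypothetical and say nothing about how the planned BlackRay API is actually shaped.

```python
import threading


class UnifiedTable:
    """Toy table with insert, delete, update and query on one object."""

    def __init__(self):
        self._rows = {}
        self._next_id = 0
        self._lock = threading.Lock()

    def _insert_unlocked(self, row):
        row_id = self._next_id
        self._next_id += 1
        self._rows[row_id] = dict(row)
        return row_id

    def insert(self, row):
        with self._lock:
            return self._insert_unlocked(row)

    def delete(self, row_id):
        with self._lock:
            del self._rows[row_id]  # raises KeyError for unknown ids

    def update(self, row_id, row):
        """Delete plus insert as one step; returns the new row id."""
        with self._lock:
            del self._rows[row_id]
            return self._insert_unlocked(row)

    def query(self, predicate):
        with self._lock:
            return [row for row in self._rows.values() if predicate(row)]


table = UnifiedTable()
first = table.insert({"name": "mike"})
second = table.update(first, {"name": "michael"})
```

Because all four operations share one object and one lock, they can be used from the same thread, which the two separate APIs do not allow.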
35. 35 FOSS Asia 2010
Performance Impact
●
Insert and Delete have a severe performance impact
on parallel queries
●
Locking needs to be utilized to ensure transactional
integrity, causing queries to stall on data
modification
●
Currently BlackRay uses sorted lists for the data
dictionary and the other index layers
●
For indexes that have frequent changes, it may be
much more desirable to utilize other basic data
structures underneath the index
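The trade-off of a sorted list can be seen in a small sketch: lookups and prefix scans are cheap binary searches, but every insert has to shift the tail of the list. The class below is a hypothetical illustration built on Python's `bisect` module, not BlackRay's dictionary implementation.

```python
from bisect import bisect_left, insort


class SortedDictionary:
    """Dictionary layer kept as one sorted list of distinct terms."""

    def __init__(self, terms=()):
        self.terms = sorted(set(terms))

    def contains(self, term):
        """Binary search: O(log n)."""
        pos = bisect_left(self.terms, term)
        return pos < len(self.terms) and self.terms[pos] == term

    def prefix(self, prefix):
        """All terms starting with `prefix`, found as one contiguous run."""
        start = bisect_left(self.terms, prefix)
        end = start
        while end < len(self.terms) and self.terms[end].startswith(prefix):
            end += 1
        return self.terms[start:end]

    def add(self, term):
        """Insert in order: O(n), because later elements must be shifted."""
        if not self.contains(term):
            insort(self.terms, term)


dictionary = SortedDictionary(["maier", "meyer", "mike"])
dictionary.add("mayer")
```

For a read-mostly index the contiguous prefix run is a strong argument for the sorted list; with frequent inserts the O(n) shift is what makes a tree-like structure attractive.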
37. 37 FOSS Asia 2010
Immediate Roadmap
●
Planned 0.11.0 – Due in Fall 2010
●
Pluggable Function architecture (loadable libraries)
●
Make all index functions available in SQL
●
Support for Prepared Statements (ODBC/JDBC)
●
Improved thread and memory management (Perftools?)
●
BlackRay Admin Console (Remora) 0.11
●
Engine Statistics via GUI
●
Cluster Node management
38. 38 FOSS Asia 2010
Short-term Roadmap
●
Planned 0.12.0 – Due in February 2012
●
Realtime INSERT/UPDATE/DELETE
●
SQL to support subselect
●
Default aggregate functions (SUM/AVG/....)
●
Fix several potential memory leaks (smart pointers)
●
The 0.12 release should be the last pre-GA release
39. 39 FOSS Asia 2010
Midterm Roadmap
●
Scalability Features
●
Sharding & Partitioning Options
●
Federated Search
●
Fully portable snapshot format (across platforms)
●
Query Performance Analyzer
●
Improved Statistics Module with GUI
●
BlackRay as a Storage Backend for SUN OpenDS
LDAP Engine
40. 40 FOSS Asia 2010
Midterm Roadmap
●
Security Features
●
Improved User and Access Control concepts
●
SSL for all connections
●
External User Store (LDAP/OpenSSO/PAM...)
●
Increased Platform support
●
Windows 7 and Windows Server platforms
●
Embedded platforms
●
Other, random features by popular request.
42. 42 FOSS Asia 2010
SoftMethod GmbH
●
SoftMethod GmbH initiated the project in 2005
●
Company was founded in 2004 and currently has
10 employees
●
Focus of SoftMethod is high performance software
engineering
●
Product portfolio includes telco/contact center and
LDAP applications
●
SoftMethod also offers load testing and technical
software quality assurance support.
43. 43 FOSS Asia 2010
Development Team
●
Felix Schupp, Initiator and Project Sponsor
●
Thomas Wunschel, Architect and Lead Developer
●
Mike Alexeev, Key Contributor (SQL/Functions)
●
Souvik Roy, Performance Analysis and Tools
●
Simon Courtenage, C++ and boost expert
45. 45 FOSS Asia 2010
What to do next
●
Get BlackRay:
●
Register yourself on http://forge.softmethod.de
●
SVN checkout available at
http://svn.softmethod.de/opensource/blackray/trunk
●
Get Involved
●
Anyone can register and create tickets, news etc
●
We have an active mailing list for discussion as well
●
Contribute
●
We require a signed Contributor agreement before
granting commit access to the repository