SlideShare ist ein Scribd-Unternehmen logo
1 von 47
Downloaden Sie, um offline zu lesen
1
Presentation Agenda


➔
    Brief History
➔
    Technology Overview
➔
    Positioning towards other Projects
➔
    Roadmap
➔
    The Team
➔
    Wrap-Up



2
Brief BlackRay History
What is BlackRay?
●
        BlackRay is a relational, in-memory database
●
        SQL with JDBC and ODBC Driver support
●
        Fulltext (Tokenized) Search in Text fields
●
        Object-Oriented API Support
●
        Persistence via Files
●
        Scalable and Fault Tolerant
●
        Open Source, Open Community
●
        Available under the GPLv2

    4
BlackRay History
●
        Concept of BlackRay was developed in 1999 for a
        Web Phone Directory at Deutsche Telekom
●
        Development of current BlackRay started in 2005,
        as a Data Access Library
●
        First Production Use at Deutsche Telekom in 2006
●
        Evolution to Data Engine in 2007 and 2008
●
        Open Source under GPLv2 since June 2009




    5
Why Build Another Database?
●
        Rather unique set of requirements:
        ●
            Phone Directory with approx. 80 Million subscribers
        ●
            All queries in the 10-500 Millisecond range
        ●
            Approximately 2000 concurrent users
        ●
            Over 500 Queries per Second (sustained)
        ●
            Updates once a day may only take several minutes
        ●
            Index needs to be tokenized (SQL: CONTAINS)
        ●
            Phonetic search
        ●
            Extensive Wildcard queries (leading/midspan/trailing)


    6
Decision to Implement BlackRay
●
        Decision was formed in Mid 2005
●
        Designed as a lightweight data access engine
●
        Implementation in C++ for performance and
        maintainability
●
        APIs for access across the network from different
        languages (Java, C++)
●
        Feature set was designed for our specific business
        case in a particular project


    7
Current Release
●
        Current release 0.9.0 released on June 12th, 2009
●
        Production level quality
●
        All relevant index functions for large scale search
        applications
●
        APIs in C++ and Java fully functional
●
        SQL still only a small subset of the API functionality




    8
Some more details
●
        Written in C++
●
        Relies heavily on boost
●
        Compiles well with gcc and Sun Studio 12
●
        Behaves well on Linux, Solaris, OpenSolaris and
        MacOS
●
        Complete 64 Bit development and platform support
●
        Use of cmake for multi platform support



    9
Technology Overview
Why call it Data Engine?
●
     BlackRay is a hybrid between a relational database
     and a search engine → thus we call it „data engine“
●
     Database features:
     ●
         Relational structure, with Join between tables
     ●
         Wildcards and index functions
     ●
         SQL and JDBC/ODBC
●
     Search Engine Features
     ●
         Fulltext retrieval (token index)
     ●
         Phonetic and similar approximation search
     ●
         Extremely low latency
11
BlackRay Architecture

     Postgres*     SQL                Management
      Clients    Interface              Server

                                      Data Universe
     C++ API
                 Instance
                  Server            5-Perspective Index
     Java API
                                      L1: Dictionary
                 Redo Log              L2: Postings
 Python API
                                      L3: Row Index
                                     L4: Multi-Tokens
      C# API
                                     L5: Multi-Values
                        Snapshots          <
     PHP API


12
Hierarchical Model
●
     Each BlackRay node can hold many Instances
●
     One Management process per node
●
     Each Instance runs in a separate Process
●
     Instances are completely separated from each other
●
     Snapshots (for persistence) are taken on an
     Instance level
●
     In an Instance, Schemas and Tables can be created
●
     Within one Instance, Queries can span across
     Tables and Schemas
13
Getting Data Into BlackRay
●
     Once an Instane is created, data can be loaded
●
     Schemas, Tables, and Indexes are created using an
     XML description language
●
     Standard loader utility to load CSV data
●
     Bulk loading is done with logging disabled
●
     Indexing is done with maximum degree of
     parallelism depending on CPUs
●
     After all data is indexed, a snapshot can be taken


14
Basic Load Performance Data
●
     German yellowpage and whitepage data
     ●
         60 Million subscribers
     ●
         100 Million phone numbers
     ●
         Raw data approx 16GB
●
     Indexing performance
     ●
         Total index size 11GB
     ●
         Time to index: 40 Minutes, on dual 2GHz Xeon (Linux)
     ●
         Updates: 300MB, 200K rows in approx 5 minutes
●
     Time to load snapshot: 3.5 Minutes for 11GB

15
Data Universe
●
     BlackRay features a 5-Perspective Index
     ●
         Layer 1: Dictionary
     ●
         Layer 2: Postings
     ●
         Layer 3: Row Index
     ●
         Layer 4: Multi-Token Layer
     ●
         Layer 5: Multi-Value Layer
●
     Layer 1 and 2 comprise a fully inverted Index
●
     Statistics in this Index used for Query Plan Building
●
     All data - index and raw output - are held in memory

16
Snapshots and Data Versioning
●
     Persistence is done via file based snapshots
●
     Snapshots consist of all schemas in one instance
●
     Snapshots have a version number
●
     To make a backup of the data, simply copy the
     snapshot file to a backup media
●
     It is possible to load an older snapshot: Data is
     version controlled if older snapshots are stored
●
     Note: Currently Snapshots are not fully portable.


17
Transactions in BlackRay
●
     BlackRay supports transactions via a Redo Log
●
     All commands that modify data are logged if
     requested
●
     In case of a crash, the latest snapshot will be loaded
●
     Replay of the transaction log will then bring the
     database back to a consistent state
●
     Redo Log is eliminated when a snapshot is persisted
●
     For better performance snapshots should be taken
     periodically, ideally after each bulk update

18
Native Query APIs
●
     C++, Java and Python Object Oriented APIs are
     available
●
     Built with ICE (ZeroC) as the Network and Object
     Brokerage Protocol
●
     Query Objects are constructed using an Object
     Builder
●
     Execution via the network, Results as Objects
●
     Load balacing and failover built into the protocol
●
     Native support for Ruby, PHP and C# upcoming

19
Native Query APIs
●
     Queries can use any combination of OR, AND
●
     Index functions (phonetic, sysnonyms, stopwords)
     can be stacked
●
     Token search (fulltext) and wildcard are also
     supported
●
     Advantage of the APIs:
     ●
         Minimized overhead
     ●
         Very low latency
     ●
         High availability

20
Standard Query Interface
●
     BlackRay implements the PostgreSQL server socket
     interface
●
     All PostgreSQL compatible frontends can be utilized
     against BlackRay
●
     JDBC, ODBC, native drivers using socket interface
     are all supported
●
     Limitations: SQL is only a reasonable subset of
     SQL92, Metadata is not yet fully implemented...
●
     Currently not all index functions can be accessed
     via SQL statements
21
Management Features
●
     Management Server acts as central broker for all
     Instances
●
     Command line tools for administration
●
     SNMP management:
     ●
         Health check of Instances
     ●
         Statistics , including access counters and performance
         measurements




22
Clustering and Replication
●
     BlackRay supports a multi-node setup for fault
     tolerance
●
     Cluster have N query nodes and single update node
●
     All query nodes are equal and require the full index
     data available
●
     Index creation is done offline on the update node
●
     Query nodes can reload snapshots to receive
     updated data, via shared storage or local copy
●
     Load-balancing handled by native APIs

23
Administration Requirements
●
     In-Memory Databases require very little
     administrative tasks
●
     Configuration via one file per Instance
●
     Disk layout etc all are of no importance
●
     Backups are performed by copying snapshots
●
     Recovery is done by restoring a snapshot
●
     No daily administration required
●
     SNMP allows remote supervision with common tools


24
Positioning BlackRay
BlackRay and other Projects
●
     BlackRay is being positioned as a Query intensive
     database addition
●
     Ideally suited where updates are done in Bulk and
     searches outweigh updates by many orders of
     magnitude
●
     Wildcards come with little overhead: No overhead
     for trailing wildcard, some overhead for leading and
     midspan wildcard
●
     Good match when index functions such as phonetic
     or word rotation/position search combined with
     relational data are required
26
FOSS Projects with similar Goals
●
     Relational Databases (the usual suspects):
     ●
         MySQL, MariaDB, Drizzle....
     ●
         PostgreSQL...
     ●
         Many more alternatives, including embedded etc....
●
     Fulltext Search:
     ●
         Sphinx
     ●
         Lucene
●
     In-Memory Databases:
     ●
         FastDB
     ●
         HSQLDB/H2 (Java → Garbage collection issues....)
27
Commercial Alternatives
●
     Commercial In-Memory Databases
     ●
         ORACLE/TimesTen (Acquired by ORACLE in 2005)
     ●
         IBM/SolidDB (Acquired by IBM in 2007)
     ●
         VoltDB (No real data available as of yet)
     ●
         eXtremeDB (embedded use only)

●
     Dual-Licensed Alternatives
     ●
         CSQL (Open Source Version is severely crippled)
     ●
         MySQL with memcached

28
Is it the right thing for me?
●
     BlackRay is not designed as a 100% RDBMS
     replacement
●
     Questions to ask:
     ●
         Do I need ad-hoc data updates, or are updates done in
         bulk?
     ●
         How important are fulltext search and extensive
         wildcards?
     ●
         How large is my data? Gigabytes: OK. Terrabytes: Not yet
     ●
         Do I need a relational data model?
     ●
         Is SQL an important feature?

29
BlackRay will fit you well when...

     … searches outweigh updates
     … data is primarily updated in bulk
     … fulltext search is important
     … you have Gigabytes (not Terabytes) of data
     … index functions (phonetic …) are required
     … SQL is necessary
     … a relational data model is required (JOIN)
     … source code must be available
     … high availability/clustering is required
30
Project Roadmap
Immediate Roadmap
●
     Upcoming 0.10.0 – Due in December 2009
     ●
         Complete rewrite of SQL Parser (boost::spirit2)
     ●
         PostgreSQL client compatibility (via network protocol) to
         allow JDBC/ODBC... via PostgreSQL driver
     ●
         Rewritten CLI tools
     ●
         Major bugfixes (potential memory leaks)
     ●
         Authentication supported for Instances
●
     BlackRay Admin Console (Remora) 0.10



32
Immediate Roadmap
●
     Planned 0.11.0 – Due in February 2010
     ●
         Support for ad-hoc data manilulation via SQL
         (INSERT/UPDATE/DELETE)
     ●
         Aggregate functions for SELECT
     ●
         Make all index functions available in SQL
     ●
         Support for Prepared Statements (ODBC/JDBC)
     ●
         Improved thread and memory management (Perftools?)
●
     BlackRay Admin Console (Remora) 0.11
     ●
         Engine Statistics via GUI
     ●
         Cluster Node management
33
Midterm Roadmap
●
     Scalability Features
     ●
         Sharding & Partitioning Options
     ●
         Federated Search
●
     Fully portable snapshot format (across platforms)
●
     Query Performance Analyzer
●
     Improved Statistics Module with GUI
●
     Removal of ICE as a core component
●
     BlackRay as a Storage Backend for SUN OpenDS
     LDAP Engine
34
Midterm Roadmap
●
     Security Features
     ●
         Improved User and Access Control concepts
     ●
         SSL for all connections
     ●
         External User Store (LDAP/OpenSSO/PAM...)
●
     Increased Platform support
     ●
         Windows 7 and Windows Server platforms
     ●
         Embedded platforms
●
     Other, random features by popular request.


35
Longterm Roadmap
●
     Integration with other DBMSs
     ●
         Storage Engine for other DBMSs
         MariaDB/MySQL: → Depends on potential modification
         needs of the storage engine interfaces
     ●
         Trigger-based updates to support BlackRay as a Query
         cache instance over a regular DBMS
●
     Update: The Storage Engine Interface Issue was
     discussed at the OpenSQL Camp 2009 in Portland,
     Tokutek suggested a portable and more efficient
     Universal Storage Engine Interface

36
Longterm Roadmap
●
     Standalone Engine Improvements
     ●
         SQL92 compliance: SUBSELECT/UNION support
     ●
         Triggers in BlackRay
     ●
         Full ACID Transaction support




37
The Team behind BlackRay
SoftMethod GmbH
●
     SoftMethod GmbH initiated the project in 2005 and
●
     Company was founded in 2004 and currently has
     10 employees
●
     Focus of SoftMethod is high performance software
     engineering
●
     Product portfolio includes directory assistance and
     LDAP enabled applications
●
     SoftMethod also offers load testing and technical
     software quality assurance support.

39
Development Team
●
     SoftMethod Core Team
     ●
         Thomas Wunschel – Lead Developer
     ●
         Felix Schupp – Project Sponsor
     ●
         Andreas Meyer – Documentation, Porting
     ●
         Frank Fiedler – PhD Student in Database Systems
●
     Key Contributors
     ●
         Mike Alexeev – Senior Developer
     ●
         Souvik Roy, Intern at SoftMethod GmbH
     ●
         Andreas Strafner – Developer, Porting to AIX

40
Thomas Wunschel
●
     Director of Development, SoftMethod GmbH
●
     Almost 10 years of development experience
●
     Involved with BlackRay and its applications since
     2005
●
     Currently involved in the Network Protocol Stack
●
     Lead Designer and Decision Lead for new Features




41
Felix Schupp
●
     Managing Director, SoftMethod GmbH
●
     Over 10 years of commercial software development
●
     Designer of first BlackRay predecessor in 1999
●
     Project sponsor and spokesperson
●
     Responsible for funding and applications
●
     Guide and coach in the development process




42
Mike Alexeev
●
     Senior Software Developer
●
     Over 10 years of C++ experience
●
     First outside committer to BlackRay
●
     Currently involved in rewriting the SQL grammar to
     support all index features available via the APIs




43
Souvik Roy
●
     Computer Science Student and Software Developer
●
     Seceral years of C++ and Java experience
●
     Current Intern at SoftMethod GmbH
●
     Google Summer of Code participant
●
     Working on reliable performance comparison with
     other Engines
●
     Provides additional sample applications



44
Wrap-Up
What to do next
●
     Get BlackRay:
     ●
         Register yourself on http://forge.softmethod.de
     ●
         SVN checkout available at
         http://svn.softmethod.de/opensource/blackray/trunk
●
     Get Involved
     ●
         Anyone can register and create tickets, news etc
     ●
         We have an active mailing list for discussion as well
●
     Contribute
     ●
         We require a signed Contributor agreement before being
         allowed commit access to the repository
46
Contact Us
●
     Website:     http://www.blackray.org
●
     Twitter:     http://twitter.com/dataengine
●
     Facebook     http://facebook.com/dataengine
●
     Mailing List: http://lists.softmethod.de
●
     Download:    http://sourceforge.net/projects/blackray

●
     Felix:       felix.schupp@softmethod.de
●
     Thomas:      thomas.wunschel@softmethod.de

47

Weitere ähnliche Inhalte

Was ist angesagt?

Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres OpenPostgresOpen
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1PgTraining
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPavan Deolasee
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streamingdatamantra
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark applicationdatamantra
 
Postgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to KnowPostgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to KnowEDB
 
PelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic TechnologiesPelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic TechnologiesClark & Parsia LLC
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame APIdatamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streamingdatamantra
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLdatamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Sparkdatamantra
 
Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1datamantra
 
Grid Job Management
Grid Job ManagementGrid Job Management
Grid Job Managementpycontw
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good? Alkin Tezuysal
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkabandatamantra
 

Was ist angesagt? (20)

Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL Cluster
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Postgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to KnowPostgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to Know
 
PelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic TechnologiesPelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic Technologies
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Stardog talk-dc-march-17
Stardog talk-dc-march-17Stardog talk-dc-march-17
Stardog talk-dc-march-17
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
 
Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1Building Distributed Systems from Scratch - Part 1
Building Distributed Systems from Scratch - Part 1
 
Grid Job Management
Grid Job ManagementGrid Job Management
Grid Job Management
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good?
 
Interactive workflow management using Azkaban
Interactive workflow management using AzkabanInteractive workflow management using Azkaban
Interactive workflow management using Azkaban
 

Andere mochten auch

For appbank20101119b
For appbank20101119bFor appbank20101119b
For appbank20101119b春秋 山本
 
eラーニングと著作権処理の実際
eラーニングと著作権処理の実際eラーニングと著作権処理の実際
eラーニングと著作権処理の実際Takahiro Sumiya
 
创业融资Abc
创业融资Abc创业融资Abc
创业融资Abccaleng
 
Catálogo KEEN FALL WINTER 2013
Catálogo KEEN FALL WINTER 2013Catálogo KEEN FALL WINTER 2013
Catálogo KEEN FALL WINTER 2013alenkia
 
Lengsunly2.Lnk
Lengsunly2.LnkLengsunly2.Lnk
Lengsunly2.LnkSUNLYLENG
 
著作権について
著作権について著作権について
著作権についてk plus
 
From GNETS to Home School
From GNETS to Home SchoolFrom GNETS to Home School
From GNETS to Home Schooleeniarrol
 
Html vnn canban
Html vnn canbanHtml vnn canban
Html vnn canbanhieusy
 
Splunk for db_connect
Splunk for db_connectSplunk for db_connect
Splunk for db_connectGreg Hanchin
 
Hsinchun Chen.doc.doc
Hsinchun Chen.doc.docHsinchun Chen.doc.doc
Hsinchun Chen.doc.docbutest
 
20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)
20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)
20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)戦略人事コンサルタント
 
ENC Times-January 09,2017
ENC Times-January 09,2017ENC Times-January 09,2017
ENC Times-January 09,2017ENC
 
名古屋大学2009
名古屋大学2009名古屋大学2009
名古屋大学2009env67
 
Christopher O'Dea Bronze Age 2.0
Christopher O'Dea Bronze Age 2.0Christopher O'Dea Bronze Age 2.0
Christopher O'Dea Bronze Age 2.0Mediabistro
 

Andere mochten auch (20)

For appbank20101119b
For appbank20101119bFor appbank20101119b
For appbank20101119b
 
2014 foe
2014 foe2014 foe
2014 foe
 
eラーニングと著作権処理の実際
eラーニングと著作権処理の実際eラーニングと著作権処理の実際
eラーニングと著作権処理の実際
 
创业融资Abc
创业融资Abc创业融资Abc
创业融资Abc
 
Catálogo KEEN FALL WINTER 2013
Catálogo KEEN FALL WINTER 2013Catálogo KEEN FALL WINTER 2013
Catálogo KEEN FALL WINTER 2013
 
Autonomouscars
Autonomouscars Autonomouscars
Autonomouscars
 
Lengsunly2.Lnk
Lengsunly2.LnkLengsunly2.Lnk
Lengsunly2.Lnk
 
著作権について
著作権について著作権について
著作権について
 
Html02
Html02Html02
Html02
 
From GNETS to Home School
From GNETS to Home SchoolFrom GNETS to Home School
From GNETS to Home School
 
Html vnn canban
Html vnn canbanHtml vnn canban
Html vnn canban
 
Splunk for db_connect
Splunk for db_connectSplunk for db_connect
Splunk for db_connect
 
Family Connection Newsletter June 2012
Family Connection Newsletter June 2012Family Connection Newsletter June 2012
Family Connection Newsletter June 2012
 
Dng de-01012014
Dng de-01012014Dng de-01012014
Dng de-01012014
 
Counterfeit drugs and the threat to Mississippi patients
Counterfeit drugs and the threat to Mississippi patientsCounterfeit drugs and the threat to Mississippi patients
Counterfeit drugs and the threat to Mississippi patients
 
Hsinchun Chen.doc.doc
Hsinchun Chen.doc.docHsinchun Chen.doc.doc
Hsinchun Chen.doc.doc
 
20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)
20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)
20130628JAA講演会用配布資料(ソーシャルメディアの基本と活用事例)(slideshare用)
 
ENC Times-January 09,2017
ENC Times-January 09,2017ENC Times-January 09,2017
ENC Times-January 09,2017
 
名古屋大学2009
名古屋大学2009名古屋大学2009
名古屋大学2009
 
Christopher O'Dea Bronze Age 2.0
Christopher O'Dea Bronze Age 2.0Christopher O'Dea Bronze Age 2.0
Christopher O'Dea Bronze Age 2.0
 

Ähnlich wie Blackray @ SAPO CodeBits 2009

Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointChristopher Dubois
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
MySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireMySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireSimon J Mudd
 
The Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage EngineThe Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage Enginefschupp
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedis Labs
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthNicolas Brousse
 
JDG 7 & Spark Integration
JDG 7 & Spark IntegrationJDG 7 & Spark Integration
JDG 7 & Spark IntegrationTed Won
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May MeetupDeepak Garg
 
Automating using Ansible
Automating using AnsibleAutomating using Ansible
Automating using AnsibleAlok Patra
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackMatt Lord
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStackTesora
 

Ähnlich wie Blackray @ SAPO CodeBits 2009 (20)

Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpoint
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
MySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireMySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the Wire
 
The Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage EngineThe Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage Engine
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
PostgreSQL and MySQL
PostgreSQL and MySQLPostgreSQL and MySQL
PostgreSQL and MySQL
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
JDG 7 & Spark Integration
JDG 7 & Spark IntegrationJDG 7 & Spark Integration
JDG 7 & Spark Integration
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
 
Automating using Ansible
Automating using AnsibleAutomating using Ansible
Automating using Ansible
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStack
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStack
 

Kürzlich hochgeladen

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Kürzlich hochgeladen (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Blackray @ SAPO CodeBits 2009

  • 1. 1
  • 2. Presentation Agenda ➔ Brief History ➔ Technology Overview ➔ Positioning towards other Projects ➔ Roadmap ➔ The Team ➔ Wrap-Up 2
  • 4. What is BlackRay? ● BlackRay is a relational, in-memory database ● SQL with JDBC and ODBC Driver support ● Fulltext (Tokenized) Search in Text fields ● Object-Oriented API Support ● Persistence via Files ● Scalable and Fault Tolerant ● Open Source, Open Community ● Available under the GPLv2 4
  • 5. BlackRay History ● Concept of BlackRay was developed in 1999 for a Web Phone Directory at Deutsche Telekom ● Development of current BlackRay started in 2005, as a Data Access Library ● First Production Use at Deutsche Telekom in 2006 ● Evolution to Data Engine in 2007 and 2008 ● Open Source under GPLv2 since June 2009 5
  • 6. Why Build Another Database? ● Rather unique set of requirements: ● Phone Directory with approx. 80 Million subscribers ● All queries in the 10-500 Millisecond range ● Approximately 2000 concurrent users ● Over 500 Queries per Second (sustained) ● Updates once a day may only take several minutes ● Index needs to be tokenized (SQL: CONTAINS) ● Phonetic search ● Extensive Wildcard queries (leading/midspan/trailing) 6
  • 7. Decision to Implement BlackRay ● Decision was formed in Mid 2005 ● Designed as a lightweight data access engine ● Implementation in C++ for performance and maintainability ● APIs for access across the network from different languages (Java, C++) ● Feature set was designed for our specific business case in a particular project 7
  • 8. Current Release ● Current release 0.9.0 released on June 12th, 2009 ● Production level quality ● All relevant index functions for large scale search applications ● APIs in C++ and Java fully functional ● SQL still only a small subset of the API functionality 8
  • 9. Some more details ● Written in C++ ● Relies heavily on boost ● Compiles well with gcc and Sun Studio 12 ● Behaves well on Linux, Solaris, OpenSolaris and MacOS ● Complete 64 Bit development and platform support ● Use of cmake for multi platform support 9
  • 11. Why call it Data Engine? ● BlackRay is a hybrid between a relational database and a search engine → thus we call it „data engine“ ● Database features: ● Relational structure, with Join between tables ● Wildcards and index functions ● SQL and JDBC/ODBC ● Search Engine Features ● Fulltext retrieval (token index) ● Phonetic and similar approximation search ● Extremely low latency 11
  • 12. BlackRay Architecture Postgres* SQL Management Clients Interface Server Data Universe C++ API Instance Server 5-Perspective Index Java API L1: Dictionary Redo Log L2: Postings Python API L3: Row Index L4: Multi-Tokens C# API L5: Multi-Values Snapshots < PHP API 12
  • 13. Hierarchical Model ● Each BlackRay node can hold many Instances ● One Management process per node ● Each Instance runs in a separate Process ● Instances are completely separated from each other ● Snapshots (for persistence) are taken on an Instance level ● In an Instance, Schemas and Tables can be created ● Within one Instance, Queries can span across Tables and Schemas 13
  • 14. Getting Data Into BlackRay ● Once an Instane is created, data can be loaded ● Schemas, Tables, and Indexes are created using an XML description language ● Standard loader utility to load CSV data ● Bulk loading is done with logging disabled ● Indexing is done with maximum degree of parallelism depending on CPUs ● After all data is indexed, a snapshot can be taken 14
  • 15. Basic Load Performance Data ● German yellowpage and whitepage data ● 60 Million subscribers ● 100 Million phone numbers ● Raw data approx 16GB ● Indexing performance ● Total index size 11GB ● Time to index: 40 Minutes, on dual 2GHz Xeon (Linux) ● Updates: 300MB, 200K rows in approx 5 minutes ● Time to load snapshot: 3.5 Minutes for 11GB 15
  • 16. Data Universe ● BlackRay features a 5-Perspective Index ● Layer 1: Dictionary ● Layer 2: Postings ● Layer 3: Row Index ● Layer 4: Multi-Token Layer ● Layer 5: Multi-Value Layer ● Layer 1 and 2 comprise a fully inverted Index ● Statistics in this Index used for Query Plan Building ● All data - index and raw output - are held in memory 16
  • 17. Snapshots and Data Versioning ● Persistence is done via file based snapshots ● Snapshots consist of all schemas in one instance ● Snapshots have a version number ● To make a backup of the data, simply copy the snapshot file to a backup media ● It is possible to load an older snapshot: Data is version controlled if older snapshots are stored ● Note: Currently Snapshots are not fully portable. 17
  • 18. Transactions in BlackRay ● BlackRay supports transactions via a Redo Log ● All commands that modify data are logged if requested ● In case of a crash, the latest snapshot will be loaded ● Replay of the transaction log will then bring the database back to a consistent state ● Redo Log is eliminated when a snapshot is persisted ● For better performance snapshots should be taken periodically, ideally after each bulk update 18
  • 19. Native Query APIs ● C++, Java and Python Object Oriented APIs are available ● Built with ICE (ZeroC) as the Network and Object Brokerage Protocol ● Query Objects are constructed using an Object Builder ● Execution via the network, Results as Objects ● Load balacing and failover built into the protocol ● Native support for Ruby, PHP and C# upcoming 19
  • 20. Native Query APIs ● Queries can use any combination of OR, AND ● Index functions (phonetic, sysnonyms, stopwords) can be stacked ● Token search (fulltext) and wildcard are also supported ● Advantage of the APIs: ● Minimized overhead ● Very low latency ● High availability 20
  • 21. Standard Query Interface ● BlackRay implements the PostgreSQL server socket interface ● All PostgreSQL compatible frontends can be utilized against BlackRay ● JDBC, ODBC, native drivers using socket interface are all supported ● Limitations: SQL is only a reasonable subset of SQL92, Metadata is not yet fully implemented... ● Currently not all index functions can be accessed via SQL statements 21
  • 22. Management Features ● Management Server acts as central broker for all Instances ● Command line tools for administration ● SNMP management: ● Health check of Instances ● Statistics , including access counters and performance measurements 22
  • 23. Clustering and Replication ● BlackRay supports a multi-node setup for fault tolerance ● Cluster have N query nodes and single update node ● All query nodes are equal and require the full index data available ● Index creation is done offline on the update node ● Query nodes can reload snapshots to receive updated data, via shared storage or local copy ● Load-balancing handled by native APIs 23
  • 24. Administration Requirements ● In-Memory Databases require very little administrative tasks ● Configuration via one file per Instance ● Disk layout etc all are of no importance ● Backups are performed by copying snapshots ● Recovery is done by restoring a snapshot ● No daily administration required ● SNMP allows remote supervision with common tools 24
  • 26. BlackRay and other Projects ● BlackRay is being positioned as a Query intensive database addition ● Ideally suited where updates are done in Bulk and searches outweigh updates by many orders of magnitude ● Wildcards come with little overhead: No overhead for trailing wildcard, some overhead for leading and midspan wildcard ● Good match when index functions such as phonetic or word rotation/position search combined with relational data are required 26
  • 27. FOSS Projects with similar Goals ● Relational Databases (the usual suspects): ● MySQL, MariaDB, Drizzle.... ● PostgreSQL... ● Many more alternatives, including embedded etc.... ● Fulltext Search: ● Sphinx ● Lucene ● In-Memory Databases: ● FastDB ● HSQLDB/H2 (Java → Garbage collection issues....) 27
  • 28. Commercial Alternatives ● Commercial In-Memory Databases ● ORACLE/TimesTen (Acquired by ORACLE in 2005) ● IBM/SolidDB (Acquired by IBM in 2007) ● VoltDB (No real data available as of yet) ● eXtremeDB (embedded use only) ● Dual-Licensed Alternatives ● CSQL (Open Source Version is severely crippled) ● MySQL with memcached 28
  • 29. Is it the right thing for me? ● BlackRay is not designed as a 100% RDBMS replacement ● Questions to ask: ● Do I need ad-hoc data updates, or are updates done in bulk? ● How important are fulltext search and extensive wildcards? ● How large is my data? Gigabytes: OK. Terrabytes: Not yet ● Do I need a relational data model? ● Is SQL an important feature? 29
  • 30. BlackRay will fit you well when... … searches outweigh updates … data is primarily updated in bulk … fulltext search is important … you have Gigabytes (not Terabytes) of data … index functions (phonetic …) are required … SQL is necessary … a relational data model is required (JOIN) … source code must be available … high availability/clustering is required 30
  • 32. Immediate Roadmap ● Upcoming 0.10.0 – Due in December 2009 ● Complete rewrite of SQL Parser (boost::spirit2) ● PostgreSQL client compatibility (via network protocol) to allow JDBC/ODBC... via PostgreSQL driver ● Rewritten CLI tools ● Major bugfixes (potential memory leaks) ● Authentication supported for Instances ● BlackRay Admin Console (Remora) 0.10 32
  • 33. Immediate Roadmap ● Planned 0.11.0 – Due in February 2010 ● Support for ad-hoc data manilulation via SQL (INSERT/UPDATE/DELETE) ● Aggregate functions for SELECT ● Make all index functions available in SQL ● Support for Prepared Statements (ODBC/JDBC) ● Improved thread and memory management (Perftools?) ● BlackRay Admin Console (Remora) 0.11 ● Engine Statistics via GUI ● Cluster Node management 33
  • 34. Midterm Roadmap ● Scalability Features ● Sharding & Partitioning Options ● Federated Search ● Fully portable snapshot format (across platforms) ● Query Performance Analyzer ● Improved Statistics Module with GUI ● Removal of ICE as a core component ● BlackRay as a Storage Backend for SUN OpenDS LDAP Engine 34
  • 35. Midterm Roadmap ● Security Features ● Improved User and Access Control concepts ● SSL for all connections ● External User Store (LDAP/OpenSSO/PAM...) ● Increased Platform support ● Windows 7 and Windows Server platforms ● Embedded platforms ● Other, random features by popular request. 35
  • 36. Longterm Roadmap ● Integration with other DBMSs ● Storage Engine for other DBMSs MariaDB/MySQL: → Depends on potential modification needs of the storage engine interfaces ● Trigger-based updates to support BlackRay as a Query cache instance over a regular DBMS ● Update: The Storage Engine Interface Issue was discussed at the OpenSQL Camp 2009 in Portland, Tokutek suggested a portable and more efficient Universal Storage Engine Interface 36
  • 37. Longterm Roadmap ● Standalone Engine Improvements ● SQL92 compliance: SUBSELECT/UNION support ● Triggers in BlackRay ● Full ACID Transaction support 37
  • 38. The Team behind BlackRay
  • 39. SoftMethod GmbH ● SoftMethod GmbH initiated the project in 2005 and ● Company was founded in 2004 and currently has 10 employees ● Focus of SoftMethod is high performance software engineering ● Product portfolio includes directory assistance and LDAP enabled applications ● SoftMethod also offers load testing and technical software quality assurance support. 39
  • 40. Development Team ● SoftMethod Core Team ● Thomas Wunschel – Lead Developer ● Felix Schupp – Project Sponsor ● Andreas Meyer – Documentation, Porting ● Frank Fiedler – PhD Student in Database Systems ● Key Contributors ● Mike Alexeev – Senior Developer ● Souvik Roy, Intern at SoftMethod GmbH ● Andreas Strafner – Developer, Porting to AIX 40
  • 41. Thomas Wunschel ● Director of Development, SoftMethod GmbH ● Almost 10 years of development experience ● Involved with BlackRay and its applications since 2005 ● Currently involved in the Network Protocol Stack ● Lead Designer and Decision Lead for new Features 41
  • 42. Felix Schupp ● Managing Director, SoftMethod GmbH ● Over 10 years of commercial software development ● Designer of first BlackRay predecessor in 1999 ● Project sponsor and spokesperson ● Responsible for funding and applications ● Guide and coach in the development process 42
  • 43. Mike Alexeev ● Senior Software Developer ● Over 10 years of C++ experience ● First outside committer to BlackRay ● Currently involved in rewriting the SQL grammar to support all index features available via the APIs 43
  • 44. Souvik Roy ● Computer Science Student and Software Developer ● Seceral years of C++ and Java experience ● Current Intern at SoftMethod GmbH ● Google Summer of Code participant ● Working on reliable performance comparison with other Engines ● Provides additional sample applications 44
  • 46. What to do next ● Get BlackRay: ● Register yourself on http://forge.softmethod.de ● SVN checkout available at http://svn.softmethod.de/opensource/blackray/trunk ● Get Involved ● Anyone can register and create tickets, news etc ● We have an active mailing list for discussion as well ● Contribute ● We require a signed Contributor agreement before being allowed commit access to the repository 46
  • 47. Contact Us ● Website: http://www.blackray.org ● Twitter: http://twitter.com/dataengine ● Facebook http://facebook.com/dataengine ● Mailing List: http://lists.softmethod.de ● Download: http://sourceforge.net/projects/blackray ● Felix: felix.schupp@softmethod.de ● Thomas: thomas.wunschel@softmethod.de 47