MariaDB's Andrew Hutchings and Shane Johnson walk through new features of the MariaDB ColumnStore storage engine, tools and adapters, then provide a sneak peek at what's planned for the next release.
What’s new in MariaDB ColumnStore
1. What’s new in MariaDB ColumnStore
Andrew Hutchings
Technical Lead, MariaDB ColumnStore
MariaDB Corporation
Shane K Johnson
Senior Director of Product Marketing
MariaDB Corporation
2. Agenda
1. Quick overview of MariaDB ColumnStore
2. The evolution of MariaDB ColumnStore
3. Recap of key MariaDB ColumnStore 1.1 features
4. What’s new in MariaDB ColumnStore 1.2
3. MariaDB ColumnStore – overview (1/2)
[Diagram labels: Server 1, Server 2; MariaDB Server with ColumnStore (interface) and InnoDB; ColumnStore (storage); User Module (UM); Performance Module (PM); Disk]
9. Recap of ColumnStore 1.1 key features
1. Bulk data adapters
2. CDC streaming data adapter
3. User-defined aggregate functions (distributed)
10. Bulk data adapters
[Diagram labels: Application (front end); MariaDB MaxScale; MariaDB Server with ColumnStore (interface), shown twice; ColumnStore (storage) with Write engine, shown three times; Application/Service/Script (back end) using the Bulk data adapter]
Bulk data adapter
1. For each row
a. For each column
bulkInsert->setColumn
b. bulkInsert->writeRow
2. bulkInsert->commit
* Buffers 100,000 rows by default
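The set-column / write-row / commit loop from the bulk data adapter slide can be modelled with a small in-memory stand-in. This is only a sketch of the call pattern and the row buffering: `MockBulkInsert`, `load_rows` and the snake_case method names are hypothetical and are not the adapter's real API, and the assumption that a full buffer is sent to the write engine before commit is an illustration, not something the slides state.

```python
class MockBulkInsert:
    """Hypothetical in-memory stand-in for a bulk insert handle (not the real adapter API)."""

    def __init__(self, column_count, buffer_rows=100_000):
        self.column_count = column_count
        self.buffer_rows = buffer_rows      # the slide's default buffer size
        self._row = [None] * column_count   # row currently being assembled
        self._buffer = []                   # rows written but not yet sent
        self._pending = []                  # rows sent but not yet committed
        self.committed = []                 # rows visible after commit
        self.flushes = 0                    # how many times the buffer was sent

    def set_column(self, index, value):
        """Step 1a: set one column of the current row."""
        self._row[index] = value
        return self

    def write_row(self):
        """Step 1b: append the current row to the buffer, sending it when full."""
        self._buffer.append(tuple(self._row))
        self._row = [None] * self.column_count
        if len(self._buffer) >= self.buffer_rows:
            self._flush()
        return self

    def _flush(self):
        # Assumed behaviour: a full buffer is handed over in one batch.
        if self._buffer:
            self._pending.extend(self._buffer)
            self._buffer.clear()
            self.flushes += 1

    def commit(self):
        """Step 2: send any remaining rows and make everything visible."""
        self._flush()
        self.committed.extend(self._pending)
        self._pending.clear()


def load_rows(bulk, rows):
    """Run the slide's loop: every column of every row, then a single commit."""
    for row in rows:
        for index, value in enumerate(row):
            bulk.set_column(index, value)
        bulk.write_row()
    bulk.commit()
```

For example, `load_rows(MockBulkInsert(2), [(1, "a"), (2, "b")])` assembles two rows and commits them in one batch, since both fit well inside the default buffer.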
15. Pentaho Data Integration adapter
● This adapter implements the Pentaho Data Integration / Kettle SDK to enable
rapid data loading into ColumnStore by leveraging the bulk load API
● This provides an orders-of-magnitude performance improvement over the
DML-based adapter
● Supported on Windows 10, Ubuntu 16, and RHEL / CentOS 7
● For more details:
https://mariadb.com/kb/en/library/columnstore-streaming-data-adapters/#columnstore-pentaho-data-integration-data-adapter
16. Pentaho Data Integration adapter – usage
● As a consumer of the ColumnStore
Bulk API, the adapter requires a copy
of the cluster's ColumnStore.xml
● In addition, a JDBC connection is
required for metadata and to
support update / delete DML
● A target table must be defined
for the data stream
17. Pentaho Data Integration adapter – usage
● After the target table is defined, the
mapping from the input stream to
the target table must be defined
● The "map all inputs" button
attempts to map the columns
automatically where possible
18. Remote import: mcsimport
● Batch
● CSV
● Command line
● Can run outside a UM/PM
● Local source files
● Auto committed
[Diagram labels: Server running mcsimport with a CSV source; MariaDB Server (UM); PM 1, PM 2 ... PM n, each with a Write engine and Files]
19. Windows support for adapters and tools
● Support is now provided for the bulk data adapter, mcsimport and the Pentaho
Data Integration adapter on Microsoft Windows 10
● This opens up a broader range of integration opportunities (ETL and custom
data loading) on the desktop
● A Windows-specific installer is provided, which installs the necessary
dependencies
● Running ColumnStore itself on Windows is still best achieved by using
the Windows Subsystem for Linux or the Docker container with Docker for
Windows
20. Multi-parameter Distributed UDAF
● Distributed user-defined aggregate functions (UDAF) can now take more than
one parameter – both aggregate and window functions are supported
● Enables more complex functions to be distributed to PMs:
○ Multi-column functions (e.g., linear regression, which is implemented using
this framework; details on the next slide)
○ Single-column functions with an extra parameter (e.g., custom percentile)
● Requires the C++ SDK and including the compiled library on each node
● For more details see:
https://mariadb.com/kb/en/library/columnstore-user-defined-aggregate-and-window-functions/
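To illustrate why a multi-parameter aggregate can be distributed, here is a minimal Python sketch of a two-column aggregate (a weighted average) whose per-PM partial states are merged into a final result. The real framework is a C++ SDK; the class `WeightedAvgState`, its method names and the helper `distributed_weighted_avg` are hypothetical and only model the idea of partial aggregation followed by a merge, which is an assumption about how the distribution works rather than a description of the SDK.

```python
from dataclasses import dataclass


@dataclass
class WeightedAvgState:
    """Hypothetical partial state for a two-parameter aggregate: weighted_avg(value, weight)."""

    weighted_sum: float = 0.0
    weight_total: float = 0.0

    def next_value(self, value, weight):
        """Fold one (value, weight) row into the state, skipping NULLs."""
        if value is None or weight is None:
            return
        self.weighted_sum += value * weight
        self.weight_total += weight

    def merge(self, other):
        """Combine a partial state computed elsewhere (e.g. on another PM)."""
        self.weighted_sum += other.weighted_sum
        self.weight_total += other.weight_total

    def evaluate(self):
        """Produce the final result, or None (NULL) when no weight was seen."""
        if self.weight_total == 0:
            return None
        return self.weighted_sum / self.weight_total


def distributed_weighted_avg(partitions):
    """Aggregate each partition independently, then merge the partial states."""
    final = WeightedAvgState()
    for rows in partitions:
        partial = WeightedAvgState()          # one partial state per partition
        for value, weight in rows:
            partial.next_value(value, weight)
        final.merge(partial)
    return final.evaluate()
```

Because the state is just two running sums, merging partial states gives the same answer as aggregating all rows in one place, which is the property that makes an aggregate like this suitable for distribution.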
21. Regression functions (1/2)
● REGR_AVGX(ColumnY, ColumnX)
○ Average of the independent variable (sum(ColumnX)/N), where N is the number of rows
processed by the query
● REGR_AVGY(ColumnY, ColumnX)
○ Average of the dependent variable (sum(ColumnY)/N), where N is the number of rows
processed by the query
● REGR_COUNT(ColumnY, ColumnX)
○ The total number of input rows in which both ColumnY and ColumnX are non-null
● REGR_INTERCEPT(ColumnY, ColumnX)
○ The y-intercept of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs
22. Regression functions (2/2)
● REGR_R2(ColumnY, ColumnX)
○ Square of the correlation coefficient between ColumnY and ColumnX
● REGR_SLOPE(ColumnY, ColumnX)
○ The slope of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs
● REGR_SXX(ColumnY, ColumnX)
○ REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs
● REGR_SXY(ColumnY, ColumnX)
○ REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs
● REGR_SYY(ColumnY, ColumnX)
○ REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs
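The definitions on the two regression slides can be checked with a small pure-Python calculation. This is a sketch of the formulas only, not ColumnStore's implementation; the function name `regr_stats` is hypothetical. It assumes that every function considers only rows where both values are non-null, and that REGR_R2 is NULL when x has zero variance and 1 when only y has zero variance, as in standard SQL.

```python
def regr_stats(pairs):
    """Compute REGR_* style statistics from an iterable of (y, x) pairs."""
    # Assumption: only pairs where both values are non-null take part.
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)
    if n == 0:
        return {"count": 0}

    avg_x = sum(x for _, x in clean) / n
    avg_y = sum(y for y, _ in clean) / n

    # REGR_COUNT * VAR_POP(x), REGR_COUNT * VAR_POP(y), REGR_COUNT * COVAR_POP(y, x)
    sxx = sum((x - avg_x) ** 2 for _, x in clean)
    syy = sum((y - avg_y) ** 2 for y, _ in clean)
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in clean)

    # The least-squares line is undefined when x has no variance.
    slope = sxy / sxx if sxx != 0 else None
    intercept = avg_y - slope * avg_x if slope is not None else None

    # Square of the correlation coefficient.
    if sxx == 0:
        r2 = None
    elif syy == 0:
        r2 = 1.0
    else:
        r2 = (sxy * sxy) / (sxx * syy)

    return {
        "count": n,
        "avgx": avg_x,
        "avgy": avg_y,
        "sxx": sxx,
        "syy": syy,
        "sxy": sxy,
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
    }
```

For points that lie exactly on a line such as y = 2x + 1, the slope and intercept recover 2 and 1 and the squared correlation is 1.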
23. Data types
● An explicit TIME datatype is now supported for capturing the time of day
○ This is very useful for financial applications
○ Avoids use of a custom numeric type as a workaround
○ Uses 8 bytes of storage
○ Supported range is '-838:59:59.999999' to '838:59:59.999999'
● Additionally, DATETIME and TIME now support fractional-second precision down to
milliseconds or microseconds, allowing more fine-grained time values
● The Boolean data type is supported
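As a quick sanity check on the TIME range and the 8-byte figure, the sketch below converts a TIME literal into a signed count of microseconds and rejects values outside the stated range. The helper `time_to_micros` is hypothetical, and packing the result as a signed 64-bit integer only shows that the range fits in 8 bytes; the slides do not describe ColumnStore's actual on-disk encoding.

```python
import re
import struct

# Largest magnitude allowed by the slide: 838:59:59.999999, in microseconds.
MAX_MICROS = ((838 * 60 + 59) * 60 + 59) * 1_000_000 + 999_999

# [-]H:MM:SS with an optional fraction of up to six digits.
_TIME_RE = re.compile(r"(-)?(\d{1,3}):([0-5]\d):([0-5]\d)(?:\.(\d{1,6}))?")


def time_to_micros(text):
    """Convert a TIME literal to signed microseconds, enforcing the stated range."""
    match = _TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a TIME literal: {text!r}")
    sign, hours, minutes, seconds, fraction = match.groups()
    micros = ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1_000_000
    if fraction:
        # Right-pad so ".5" means 500000 microseconds.
        micros += int(fraction.ljust(6, "0"))
    if micros > MAX_MICROS:
        raise ValueError(f"TIME out of range: {text!r}")
    return -micros if sign else micros


# The extreme value fits in a signed 64-bit integer, i.e. 8 bytes.
packed = struct.pack("<q", time_to_micros("838:59:59.999999"))
```

Here `len(packed)` is 8, and a literal such as '839:00:00' is rejected because it exceeds the supported range.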
24. Additional features
● CREATE TABLE .. LIKE ..;
● GROUP BY is pushed down in vtable_mode 0 (executed by MariaDB Server)
● Support for reserved words and non-alphanumeric characters in table/column names
● Cross-engine joins with SSL connections
● Improvements to non-root install so that the install user no longer requires sudo privileges
○ Using the 'mysql' user is recommended
● 80 bug fixes
● 40+ bug fixes coming in the soon-to-be-released 1.2.3 maintenance release
25. Convergence
● Internal refactoring and preparation to move off a MariaDB Server fork
● MariaDB Server 10.4 will include additional optimizer and storage engine API
enhancements so we can complete the process
● Goal is to install ColumnStore on top of a standard MariaDB server installation
● postConfigure will still be required to configure the ColumnStore cluster