Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore: Big Data Analytics

Agenda
Session 1
Overview
Architecture
Session 2
Window Functions
Analytic Functions
Session 3
Demo – DEX Data Explorer

Analytics vs Data Warehouse
What questions do you have?
What happened?

Technical Use Cases
Data Warehousing
•  Selective column-based queries
•  Large number of dimensions
High-Performance Analytics on Large Volumes of Data
•  Reporting and analysis on millions or billions of rows
•  From datasets containing millions to trillions of rows
•  Terabytes to petabytes of data
Analytics Require Statistical Algorithms, Windowing Functions
•  Learning from data and understanding data

Data Scientist/Engineer
•  What tool(s) do I use? SQL interfaces
•  What's inside the dataset? Data exploration
•  What story can I tell? Visualization (a picture is worth 1000 words)

MariaDB ColumnStore
•  GPLv2 open source
•  Columnar, massively parallel MariaDB storage engine
•  Scalable, high-performance analytics platform
•  Built-in redundancy and high availability
•  Runs on premises or in the AWS cloud
•  Full SQL syntax and capabilities regardless of platform
[Diagram: big data sources feed MariaDB ColumnStore (Node 1, Node 2, Node 3 ... Node N, on local, AWS® or GlusterFS® storage) through ELT tools; BI and analytical tools turn the stored data into analytics and insight.]

MariaDB ColumnStore Architecture
[Diagram: user connections reach User Module 1 to n (MariaDB front end and query engine); the User Modules sit above Performance Module 1 to n, which sit above columnar distributed data storage.]
•  User Module: processes SQL requests
•  Performance Module: distributed processing engine
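The split between User Modules and Performance Modules can be pictured as a scatter-gather aggregation. The sketch below is only a simplified illustration of that idea, not ColumnStore's actual implementation: the function names (`pm_partial_sum`, `um_merge`, `distributed_sum`) and the sample `(region, amount)` rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, amount): hypothetical sample schema


def pm_partial_sum(rows: List[Row]) -> Dict[str, int]:
    """Role of one Performance Module: aggregate only its own data partition."""
    partial: Dict[str, int] = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def um_merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Role of the User Module: merge the partial results into the final answer."""
    total: Dict[str, int] = {}
    for partial in partials:
        for region, amount in partial.items():
            total[region] = total.get(region, 0) + amount
    return total


def distributed_sum(partitions: List[List[Row]]) -> Dict[str, int]:
    # Each inner list stands in for the data held by one PM node.
    return um_merge([pm_partial_sum(rows) for rows in partitions])


if __name__ == "__main__":
    partitions = [
        [("east", 100), ("west", 200)],
        [("east", 150)],
        [("west", 180), ("east", 120)],
    ]
    print(distributed_sum(partitions))
```

Adding a PM node in this picture means one more partition to scan in parallel, while the merge step stays the same.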

MariaDB ColumnStore
High-performance columnar storage engine that supports a wide variety of analytical use cases with SQL in highly scalable distributed environments
•  Faster, more efficient queries: parallel query processing for distributed environments
•  Easier enterprise analytics: single SQL interface for OLTP and analytics
•  Better price performance: power of SQL and freedom of open source for big data analytics

Workload – Query Vision/Scope
[Chart: axis ticks at 1, 100, 10,000, 1,000,000, 100,000,000 and 10,000,000,000; data size bands of 10-100GB, 100-1,000GB and 1-10TB; regions labeled OLTP/NoSQL Workloads and OLAP/Analytic/Reporting Workloads.]
Suited for reporting or analysis of millions to billions of rows from data sets containing millions to trillions of rows.

Sizing
Minimum Spec
•  UM: 4 core, 32G RAM
•  PM: 4 core, 16G RAM
Typical Server Spec
•  UM: 8 core, 264G RAM
•  PM: 8 core, 64G RAM
Data Storage
External Data Volumes
•  Maximum 2 data volumes per IO channel per PM node server
•  Up to 2TB on disk per data volume ≈ max 4TB per PM node
Local Disk
•  Up to 2TB on disk per PM node server
Detailed sizing guide based on data size and workload

Sizing - Example
•  MariaDB ColumnStore 60TB uncompressed data = 6TB compressed data at 10x compression
•  2 UMs - 8 core, 512G RAM (based on workload)
•  6TB compressed = 3 data volumes (at 2TB per volume)
-  With 1 data volume per PM node: 3 PMs
•  Data growth - 2TB per month, data retention - 2 years
-  Plan for 2TB × 24 = 48TB additional
-  48TB = 4.8TB compressed ≈ 3 data volumes (at 2TB per volume)
-  With 1 data volume per PM node: 3 additional PMs
•  Total: 6 PMs, 2 UMs
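The arithmetic in this example can be reproduced with a short calculation. This is a minimal sketch of the slide's rule of thumb only; the function name `pm_nodes_needed` and its parameters are hypothetical, and the defaults (10x compression, 2TB per data volume, 1 data volume per PM node) are the figures assumed on the slide, not fixed properties of ColumnStore.

```python
import math


def pm_nodes_needed(uncompressed_tb: float,
                    compression_ratio: float = 10.0,
                    volume_tb: float = 2.0,
                    volumes_per_pm: int = 1) -> int:
    """Estimate PM nodes from uncompressed data size (hypothetical helper)."""
    compressed_tb = uncompressed_tb / compression_ratio
    # Round up: a partially filled data volume still needs a whole volume.
    volumes = math.ceil(compressed_tb / volume_tb)
    return math.ceil(volumes / volumes_per_pm)


if __name__ == "__main__":
    initial_pms = pm_nodes_needed(60)       # 60TB -> 6TB compressed -> 3 volumes
    growth_pms = pm_nodes_needed(2 * 24)    # 2TB/month for 24 months = 48TB
    print("initial PMs:", initial_pms)
    print("additional PMs for growth:", growth_pms)
    print("total PMs:", initial_pms + growth_pms)
```

UM sizing is not covered here, since the slide states it depends on workload rather than on data volume.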

Analytics with MariaDB ColumnStore
SQL Features
Aggregation
Window Functions

ColumnStore SQL Features
•  SELECT query
•  Aggregation
•  Windowing functions
•  Cross-engine joins
•  Disk-based joins
•  CTE
•  DML
•  DDL
Source: InfiniDB SQL Syntax Guide

Windowing Functions
•  Aggregate over a series of related rows
•  Simplified functions for complex statistical analytics over a sliding window per row
-  Cumulative, moving or centered aggregates
-  Simple statistical functions like rank, max, min, average, median
-  More complex functions such as distribution, percentile, lag, lead
-  Without running complex sub-queries
Functions: MAX, MIN, COUNT, SUM, AVG, VARIANCE, VAR_POP, VAR_SAMP, STD, STDDEV, STDDEV_POP, STDDEV_SAMP, ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, NTH_VALUE, FIRST_VALUE, LAST_VALUE, CUME_DIST, LAG, LEAD, NTILE, PERCENTILE_CONT, PERCENTILE_DISC, MEDIAN
Source: InfiniDB SQL Syntax Guide
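A small example may make the cumulative, moving, lag and rank cases concrete. The sketch below runs standard SQL window functions against an in-memory SQLite database through Python's `sqlite3` module, used here only as a convenient stand-in (it needs SQLite 3.25 or later); it is not run against ColumnStore, and the `sales` table and its rows are hypothetical.

```python
import sqlite3


def window_demo():
    """Run a window-function query on a tiny hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", 1, 100), ("east", 2, 150), ("east", 3, 120),
         ("west", 1, 200), ("west", 2, 180)],
    )
    rows = conn.execute("""
        SELECT region, month, amount,
               -- cumulative aggregate per region
               SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total,
               -- moving aggregate: current and previous month
               AVG(amount) OVER (PARTITION BY region ORDER BY month
                                 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_avg,
               -- previous row's value, NULL for the first month
               LAG(amount) OVER (PARTITION BY region ORDER BY month) AS prev_amount,
               -- rank of each month within its region by amount
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
        FROM sales
        ORDER BY region, month
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in window_demo():
        print(row)
```

Each output row keeps its own detail columns while also carrying values computed over related rows, which is what would otherwise require self-joins or correlated sub-queries.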

Dataset Exploration Demo
[Diagram: dataset import, data exploration, data visualization.]
