In this session Alejandro Infanzon, Solutions Engineer, introduces the linear regression and statistical functions that debuted in MariaDB ColumnStore 1.2, and how you can use them to support powerful analytics. He explains how to perform even-more-powerful analytics by writing multi-parameter user-defined functions (UDFs) – also new in MariaDB ColumnStore 1.2.
3. Why Analytics?
Extract value from your data
• Get the most value from your data assets
• Improve the quality of decision making
• Improve planning and forecasting
• Introduce new products and services
5. Types of Analytics
Uncover correlations and patterns
1. Descriptive Analytics: What happened?
2. Diagnostic Analytics: How or why did it happen?
3. Predictive Analytics: What is likely to happen next?
4. Prescriptive Analytics: What should I do about it?
6. Top Tools Used for Analytics
● Jupyter (Julia, Python and R) notebooks are popular with data scientists.
● SQL is important in data science.
● Prebuilt analytical functions / aggregations
* https://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html
25. User Defined Functions (UDF)
• Extend MariaDB with new functions.
• UDFs work like native (built-in) MariaDB functions.
Alternative ways
• Modifying and compiling the server source code.
• Writing a stored function.
26. “Hello World” UDF Example
Four key steps
1. Write the UDF code
2. Compile it to a shared library
3. Move the library to the plugin directory
4. Register, verify and run the UDF
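The first of the four steps can be sketched in C roughly as follows. This is a minimal illustration, not the code shown in the session: the file name `hello_udf.c`, the function name `hello_world`, and the `USE_MYSQL_HEADERS` switch are assumptions made here. The stand-in struct definitions exist only so the sketch compiles on its own; a real build would take `UDF_INIT`, `UDF_ARGS` and `my_bool` from `<mysql.h>`.

```c
/* hello_udf.c: a minimal "Hello World" string UDF (illustrative sketch). */
#include <string.h>

#ifdef USE_MYSQL_HEADERS              /* hypothetical switch, defined at build time */
#include <mysql.h>                    /* real UDF_INIT, UDF_ARGS, my_bool */
#else
/* Simplified stand-ins so this file compiles without the server headers.
   They are NOT the full server structs; use <mysql.h> for a real build. */
typedef char my_bool;
typedef struct {
    my_bool maybe_null;               /* can the function return NULL? */
    unsigned long max_length;         /* maximum length of a string result */
} UDF_INIT;
typedef struct {
    unsigned int arg_count;           /* number of arguments in the SQL call */
} UDF_ARGS;
#endif

static const char GREETING[] = "Hello World";

/* Called once per statement before any rows: validate the arguments. */
my_bool hello_world_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
    if (args->arg_count != 0) {
        strcpy(message, "hello_world() takes no arguments");
        return 1;                     /* non-zero aborts the statement */
    }
    initid->maybe_null = 0;
    initid->max_length = sizeof(GREETING) - 1;
    return 0;
}

/* Called once per row: copy the greeting into the server-provided buffer. */
char *hello_world(UDF_INIT *initid, UDF_ARGS *args, char *result,
                  unsigned long *length, char *is_null, char *error)
{
    (void)initid;
    (void)args;
    memcpy(result, GREETING, sizeof(GREETING) - 1);
    *length = sizeof(GREETING) - 1;   /* the result is not NUL-terminated */
    *is_null = 0;
    *error = 0;
    return result;
}

/* Called once per statement after the last row: nothing to free here. */
void hello_world_deinit(UDF_INIT *initid)
{
    (void)initid;
}
```

For the remaining steps, one plausible sequence (again an assumption, since the slide text does not list commands) is to build with something like `gcc -shared -fPIC -DUSE_MYSQL_HEADERS $(mariadb_config --include) -o hello_udf.so hello_udf.c`, copy `hello_udf.so` into the directory reported by `SHOW VARIABLES LIKE 'plugin_dir';`, register it with `CREATE FUNCTION hello_world RETURNS STRING SONAME 'hello_udf.so';`, check the entry in `mysql.func`, and run `SELECT hello_world();`. Any extra steps specific to ColumnStore are not covered by this sketch.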
31. Center for Information Management (CIM)
Health population data analysis to equip epidemiologists, public health officers, and insurance providers with evidence-based outcomes and efficacy of treatment. The insurance providers further use this insight to measure and predict treatment costs and population health over several years.
Challenges Faced:
• Data loads took 2+ days
• Unsustainable Oracle licensing model
• Historically needed to create and manage indexes to get query performance
Benefits Realized:
• Data loads now take minutes
• Can query 200+ columns with fast performance
• Easy administration: no indexes
• High-performance data visualization with Tableau
32. Genus PLC
Genus provides farmers with superior genetics that enable them to produce higher-quality animal protein more efficiently, in the form of meat and milk.
Challenges Faced:
• Using Oracle was cost prohibitive
• The existing process had slow data loads
• Data scientists wanted to use SQL as the primary interface
• Didn’t want to have to provide heavy database administration
Benefits Realized:
• Fast loading of raw data: from a few GB to 20 GB per load
• Leverage known interfaces:
  • Easy-to-use SQL front end
  • Python bulk data adapters allow them to publish results directly from machine learning data models instead of scheduling cpimport jobs
• Fast query results
• Easy to maintain
• Much more affordable cost structure
33. Customer Use Case: Pinger
Industry: Telco
Data: call and text logs
Use case: mobile app usage analytics
Details:
• 30 million texts and 3 million phone calls per day
• 1.5 billion rows of logs per day
• The text and call volume will continue to grow
• The InnoDB backend hit its scale limit at 6 TB and required a lot of performance tuning and index management
Migrated to MariaDB AX
• Able to process 24 months of data (24 TB), versus a 6-month limit with InnoDB
• The same BI tools and client applications worked seamlessly with MariaDB AX