Applying Linear Regression and Predictive Analytics
Alejandro Infanzon
Solutions Architect
MariaDB Corporation
Topics
● Need for Analytics / SQL
● MariaDB Platform Introduction.
● Demo Statistical Functions
● Demo “Hello World” UDF
Why Analytics?
Extract value from your data
● Get the most value from your data assets
● Improve the quality of decision making
● Improve planning and forecasting
● Introduce new products and services
Analytics Data Challenge
Data must be managed!
Types of Analytics
Uncover correlations and patterns
1. Descriptive Analytics: What happened?
2. Diagnostic Analytics: How or why did it happen?
3. Predictive Analytics: What is likely to happen next?
4. Prescriptive Analytics: What should I do about it?
Top Tools Used for Analytics
● Jupyter (Julia, Python and R) notebooks are popular with data scientists.
● SQL is important in data science.
● Prebuilt analytical functions / aggregations
* https://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html
MariaDB Platform X3
Why MariaDB Platform X3?
MariaDB Platform: use cases
Unified Platform
● OLTP: MariaDB Server with row storage, for transactions
● HTAP: MariaDB Server with row storage and columnar storage, for transactions + analytics
● OLAP: MariaDB Server with columnar storage, for analytics
MariaDB Platform X3 High Level
MariaDB Platform X3 Solution
Encompassing applications, tools, database and support services
MariaDB Server
Enterprise analytics
Enterprise high availability Enterprise scalability
Enterprise performance Enterprise integration
Enterprise security
MariaDB connectors
Enterprise tools Enterprise support
- Management
- Monitoring
- Backup/restore
- Technical
- Consultative
- Notifications
- Alerts
MariaDB Platform X3: Pluggable Storage
Right storage for different workloads
MariaDB Server
● tbl_products: InnoDB (general purpose), product catalog
● tbl_purchases: MyRocks (write optimized), purchases
● tbl_carts: Spider (sharded), shopping carts
● tbl_clicks: ColumnStore (columnar), clickstream events
Jupyter Notebook
Interaction
Literate programming interaction?
Available analytical functions / aggregations?
The Demo Environment
Application Layer: Python, Jupyter, DBeaver
Connectivity Layer: pymysql connector
Data Layer: Docker ColumnStore container
Demo Tables
Using different storage engines
Database Connection
Using the PyMySQL module
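The connection code shown on this slide is not preserved in the text, so the following is only a minimal sketch of how a notebook might connect through PyMySQL. The host, port, credentials and database name are placeholders chosen for illustration, and the helper names (connection_settings, open_connection, fetch_rows) are hypothetical.

```python
def connection_settings(host="127.0.0.1", port=3306, user="demo_user",
                        password="demo_password", database="demo"):
    """Collect the keyword arguments for pymysql.connect().

    Every default here is a placeholder, not a value from the demo.
    """
    return {
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "database": database,
    }


def open_connection(settings=None):
    """Open a connection to the MariaDB server described by `settings`."""
    import pymysql  # third-party package: pip install pymysql

    return pymysql.connect(**(settings or connection_settings()))


def fetch_rows(connection, sql, params=None):
    """Run one query and return all of its rows as a sequence of tuples."""
    with connection.cursor() as cursor:
        cursor.execute(sql, params)
        return cursor.fetchall()
```

With a server running, something like fetch_rows(open_connection(), "SELECT 1") would be enough to confirm that the connectivity layer works before running the statistical queries.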
Basic Statistical Functions
Top 10 records from the breeds table
Basic Statistical Functions (Cont.)
Using the PyMySQL module
Using Basic Statistics
Find out extra-small and extra-large dog breeds
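The query behind this slide is not preserved, so the sketch below only illustrates one plausible approach: flag breeds whose weight lies more than k standard deviations from the mean. It uses the population standard deviation, which is also what MariaDB's STDDEV() computes. The breed names, weights and the threshold k are made up for illustration.

```python
from statistics import mean, pstdev


def size_outliers(weights, k=1.0):
    """Return (extra_small, extra_large) breed names.

    `weights` maps a breed name to its typical weight. A breed counts as an
    outlier when it is more than k population standard deviations away
    from the mean weight.
    """
    values = list(weights.values())
    average = mean(values)
    spread = pstdev(values)
    extra_small = sorted(name for name, w in weights.items()
                         if w < average - k * spread)
    extra_large = sorted(name for name, w in weights.items()
                         if w > average + k * spread)
    return extra_small, extra_large


# Made-up sample data, not rows from the demo's breeds table.
sample = {"breed_a": 2.0, "breed_b": 10.0, "breed_c": 12.0,
          "breed_d": 11.0, "breed_e": 30.0}
small, large = size_outliers(sample)
print("extra-small:", small, "extra-large:", large)
```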
Advanced Statistical Functions
Top 10 rows of the employees table
Advanced Statistical Functions (Cont.)
Covariance between days since hired and salary
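As a reference for what a covariance aggregate returns, here is a small pure-Python sketch of the population and sample covariance between two series. The numbers are invented, and the function name is hypothetical; the slide's actual SQL is not preserved in the text.

```python
def covariance(xs, ys, sample=False):
    """Covariance of two equally long series.

    Divides by n for the population covariance, or by n - 1 when
    `sample` is True.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Made-up values standing in for the employees table.
days_since_hired = [100, 200, 300, 400]
salary = [30000.0, 35000.0, 40000.0, 45000.0]
print(covariance(days_since_hired, salary))
```

A positive covariance means salary tends to rise with tenure, but its magnitude depends on the units of both columns, which is why the next slide moves on to correlation.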
Advanced Statistical Functions (Cont.)
Calculate the correlation between hire date and salary
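A hire date has to be turned into a number before it can be correlated with salary. The sketch below converts dates to days since hired and then computes the Pearson correlation coefficient. The dates, salaries and reference date are invented for illustration, and the function names are hypothetical.

```python
from datetime import date
from math import sqrt


def days_since_hired(hire_date, as_of):
    """Number of days between the hire date and a reference date."""
    return (as_of - hire_date).days


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient, a unit-free value in [-1, 1]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Made-up rows standing in for the employees table.
as_of = date(2020, 1, 1)
hire_dates = [date(2019, 1, 1), date(2018, 1, 1), date(2017, 1, 1),
              date(2016, 1, 1), date(2015, 1, 1)]
salaries = [41000.0, 45000.0, 52000.0, 50000.0, 58000.0]
tenure = [days_since_hired(d, as_of) for d in hire_dates]
print(round(pearson_correlation(tenure, salaries), 3))
```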
Advanced Statistical Functions (Cont.)
Calculate Linear Regression between hire date and salary
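A simple linear regression fits salary = intercept + slope * x by ordinary least squares. The sketch below shows the arithmetic in plain Python on made-up points; it illustrates the technique and is not the query used in the demo. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Made-up points: x could be years since hired, y a salary in thousands.
slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(slope, intercept, predict(slope, intercept, 6))
```

The slope is the covariance divided by the variance of x, so it ties directly back to the covariance slide; the prediction step is what makes the model usable for forecasting.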
Advanced Statistical Functions (Cont.)
Plotting the linear regression
Custom Analytical Functions
Can I build my own analytical functions?
User Defined Functions (UDF)
• Extend MariaDB with new functions.
• UDFs work like native (built-in) MariaDB functions.
Alternative ways
• Modifying and compiling the server source code.
• Writing a stored function.
“Hello World” UDF Example
Four key steps
1. Write UDF code
2. Compile to shared library
3. Move library to plugin directory
4. Register, verify and run the UDF
“Hello World” UDF Anatomy
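Main UDF structure
/* Libraries */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <my_global.h>
#include <my_sys.h>
#include <mysql.h>
#include <ctype.h>
#include <math.h>

/* Function initialization: validate the arguments before any row is processed */
my_bool radius_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
    if (args->arg_count != 1) {
        strcpy(message, "radius() requires exactly one argument");
        return 1;
    }
    /* Ask the server to coerce the argument to a double */
    args->arg_type[0] = REAL_RESULT;
    /* The function body returns NULL when its argument is NULL */
    initid->maybe_null = 1;
    return 0;
}

/* Function body: called once per row.
   Despite its name, it returns the circumference 2 * pi * r. */
double radius(UDF_INIT *initid, UDF_ARGS *args, char *result,
              unsigned long *length, char *is_null, char *error)
{
    if (args->args[0] == NULL) {
        /* SQL NULL in, SQL NULL out */
        *is_null = 1;
        return 0.0;
    }
    double r = *((double *) args->args[0]);
    return 2.0 * M_PI * r;
}

/* De-allocate resources: nothing was allocated in radius_init */
void radius_deinit(UDF_INIT *initid)
{
}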
Compile and Move
Place .so in the Plugin Directory
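The exact commands from this slide are not preserved, so this is a hedged sketch of what the compile step might look like, written as a Python helper that assembles the command line. The source and library file names and the include directory are assumptions for illustration; the include path in particular varies by distribution. The plugin directory itself can be read from the server with the SHOW VARIABLES statement below.

```python
import shlex


def compile_command(source="radius.c", library="radius.so",
                    include_dir="/usr/include/mysql"):
    """Build a gcc command that compiles the UDF into a shared library.

    All three defaults are placeholders: adjust them to the actual file
    names and to wherever the server headers are installed.
    """
    return ["gcc", "-shared", "-fPIC", "-I", include_dir,
            "-o", library, source]


# Statement that reports where the server expects to find plugin libraries.
PLUGIN_DIR_QUERY = "SHOW VARIABLES LIKE 'plugin_dir'"

print(shlex.join(compile_command()))
print(PLUGIN_DIR_QUERY)
```

Once the library is built, it is copied into the directory reported by plugin_dir so the server can load it.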
Register, Verify and Run
Place .so in the Plugin Directory
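The statements on this slide are not preserved in the text. As a sketch, registering, verifying and running a UDF usually comes down to the SQL statements assembled below. The library file name radius.so and the sample argument are assumptions; the function name radius comes from the UDF code in this deck.

```python
def udf_statements(name="radius", library="radius.so", sample_argument=2.0):
    """Return the SQL for each step of the UDF life cycle.

    `library` must match the file placed in the plugin directory; the
    default here is a placeholder.
    """
    return {
        # Register: tell the server which shared library provides the function.
        "register": f"CREATE FUNCTION {name} RETURNS REAL SONAME '{library}'",
        # Verify: registered UDFs are listed in the mysql.func table.
        "verify": f"SELECT name, dl FROM mysql.func WHERE name = '{name}'",
        # Run: call it like any built-in function.
        "run": f"SELECT {name}({sample_argument})",
        # Clean up when the function is no longer needed.
        "unregister": f"DROP FUNCTION IF EXISTS {name}",
    }


for step, sql in udf_statements().items():
    print(f"{step}: {sql}")
```

Each statement can be sent through any client, for example a PyMySQL cursor or DBeaver.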
THANK YOU!
Center for Information Management (CIM)
Health population data analysis to equip epidemiologists, public health officers, and insurance providers with evidence-based outcomes and efficacy of treatment. The insurance providers further use this insight to measure and predict treatment costs and population health over several years.
Challenges Faced:
● Data took 2+ days to load
● Unsustainable Oracle licensing model
● Historically needed to create and manage indexes to get query performance
Benefits Realized:
● Data loads now take minutes
● Can query 200+ columns with fast performance
● Easy administration: no indexes
● High-performance data visualization with Tableau
Genus PLC
Genus provides farmers with superior genetics that enable them to produce higher-quality animal protein more efficiently, in the form of meat and milk.
Challenges Faced:
● Using Oracle was cost prohibitive
● Existing process had slow data loads
● Data scientists wanted to use SQL as their primary interface
● Did not want to take on heavy database administration
Benefits Realized:
● Fast loading of raw data: a few GB to 20 GB per load
● Leverage known interfaces:
- Easy-to-use SQL front end
- Python bulk data adapters allow them to publish results directly from machine learning data models instead of scheduling cpimport jobs
● Fast query results
● Easy to maintain
● Much more affordable cost structure
Customer Use Case: Pinger
Industry: Telco
Data: call and text logs
Use case: Mobile app use analytics
Details:
● 30 million texts and 3 million phone calls per day
● 1.5 billion rows of logs per day
● Text and call volume will continue to grow
● The InnoDB backend hit its scale limit of 6 TB and required a lot of performance tuning and index management
● Migrated to MariaDB AX
● Able to process 24 months (24 TB) of data, versus the 6-month limit with InnoDB
● The same BI tools and client applications worked seamlessly with MariaDB AX

Osi security architecture in network.pptxVinzoCenzo
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 


Applying linear regression and predictive analytics

  • 1. Applying Linear Regression and Predictive Analytics Alejandro Infanzon Solutions Architect MariaDB Corporation
  • 2. Topics ● Need for Analytics / SQL ● MariaDB Platform Introduction. ● Demo Statistical Functions ● Demo “Hello World” UDF
  • 3. Why Analytics? Extract value from your data: get the most value out of your data assets; improve the quality of decision making; improve planning and forecasting; introduce new products and services.
  • 4. Analytics Data Challenge Data must be managed!
  • 5. Types of Analytics: uncover correlations and patterns. 1. Descriptive Analytics: What happened? 2. Diagnostic Analytics: How or why did it happen? 3. Predictive Analytics: What is likely to happen next? 4. Prescriptive Analytics: What should I do about it?
  • 6. Top Tools Used for Analytics ● Jupyter (Julia, Python and R) notebooks are popular with Data Scientists. ● SQL is important in data science. ● Prebuilt analytical functions / aggregations * https://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html
  • 7. MariaDB Platform X3 Why MariaDB Platform X3 ?
  • 8. MariaDB Platform: use cases Unified Platform MariaDB Server Row storage OLTP MariaDB Server HTAP Row storage Columnar storage MariaDB Server Columnar storage OLAP Transactions Transactions + Analytics Analytics
  • 9. MariaDB Platform X3 High Level
  • 10. MariaDB Platform X3 Solution Encompassing applications, tools, database and support services MariaDB Server Enterprise analytics Enterprise high availability Enterprise scalability Enterprise performance Enterprise integration Enterprise security MariaDB connectors Enterprise tools Enterprise support - Management - Monitoring - Backup/restore - Technical - Consultative - Notifications - Alerts
  • 11. MariaDB Platform X3: Pluggable Storage Right storage for different workloads MariaDB Server tbl_products tbl_purchases tbl_carts tbl_clicks InnoDB MyRocks Spider ColumnStore General purpose Write optimized Sharded Columnar Product catalog Purchases Shopping carts Clickstream events
  • 12. Jupyter Notebook Interaction Literate programming interaction? Available analytical functions / aggregations?
  • 13. The Demo Environment Data Layer Docker ColumnStore Container Application Layer Connectivity Layer pymysql connector Python Jupyter DBeaver
  • 14. Demo Tables Using different storage engines
  • 16. Basic Statistical Functions Top 10 records from the breeds table
  • 17. Basic Statistical Functions (Cont.) Using the PyMySQL module
  • 18. Using Basic Statistics Find out extra-small and extra-large dog breeds
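The transcript does not preserve the query behind this step. One plausible approach, sketched here in Python, is to flag breeds whose average weight lies more than one standard deviation from the mean. The breed names, weights and the one-standard-deviation cutoff are illustrative assumptions, not the demo's data or method.

```python
from statistics import mean, pstdev

# Hypothetical average weights in kg; NOT taken from the demo's breeds table.
breed_weights = {
    "Chihuahua": 2.0,
    "Beagle": 11.0,
    "Border Collie": 18.0,
    "Bulldog": 23.0,
    "Boxer": 29.0,
    "Labrador": 31.0,
    "German Shepherd": 33.0,
    "Great Dane": 70.0,
}


def size_outliers(weights, k=1.0):
    """Return (extra_small, extra_large) lists of breed names.

    A breed is an outlier when its weight is more than k population
    standard deviations below or above the mean weight.
    """
    values = list(weights.values())
    avg = mean(values)
    spread = pstdev(values)
    low, high = avg - k * spread, avg + k * spread
    extra_small = sorted(name for name, w in weights.items() if w < low)
    extra_large = sorted(name for name, w in weights.items() if w > high)
    return extra_small, extra_large


extra_small, extra_large = size_outliers(breed_weights)
print("extra-small:", extra_small)
print("extra-large:", extra_large)
```

In SQL, the same thresholds could be derived from the AVG() and STDDEV_POP() aggregates and applied in a WHERE clause.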
  • 19. Advanced Statistical Functions Top 10 rows of the employees table
  • 20. Advanced Statistical Functions ( Cont. ) Covariance between days since hired and salary
  • 21. Advanced Statistical Functions ( Cont. ) Calculate the correlation between hire date and salary
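As a rough illustration of what the covariance and correlation in the two slides above measure, here is a minimal pure-Python sketch. The sample values for days since hired and salary are invented for the example, and the population (divide by n) form of covariance is an assumption; the transcript does not say which variant the demo used.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance: the mean product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical sample; NOT the demo's employees table.
days_since_hired = [100, 200, 300, 400, 500]
salary = [2000, 4000, 5000, 4000, 5000]

cov = covariance(days_since_hired, salary)
corr = correlation(days_since_hired, salary)
print(f"covariance = {cov:.1f}, correlation = {corr:.4f}")
```

Covariance depends on the units of both columns, while correlation is unitless and always falls between -1 and 1, which makes it easier to compare across column pairs.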
  • 22. Advanced Statistical Functions ( Cont. ) Calculate Linear Regression between hire date and salary
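A simple linear regression of salary on days since hired can be sketched with ordinary least squares. The function names and the sample data below are assumptions made for illustration; they do not reproduce the demo's query or results.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical sample; NOT the demo's employees table.
days_since_hired = [100, 200, 300, 400, 500]
salary = [2000, 4000, 5000, 4000, 5000]

slope, intercept = fit_line(days_since_hired, salary)
predicted = predict(600, slope, intercept)
print(f"salary = {slope:.2f} * days + {intercept:.2f}")
print(f"predicted salary after 600 days: {predicted:.2f}")
```

The slope is the covariance of the two columns divided by the variance of the predictor, so the regression reuses the same sums that the covariance and correlation calculations need.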
  • 23. Advanced Statistical Functions ( Cont. ) Plotting the linear regression
  • 24. Custom Analytical Functions Can I build my own analytical functions ?
  • 25. User Defined Functions (UDF) • Extend MariaDB with new functions. • UDFs work like native (built-in) MariaDB functions. Alternative ways • Modifying and compiling the server source code. • Writing a stored function.
  • 26. “Hello World” UDF Example Four key steps: 1. Write the UDF code. 2. Compile it to a shared library. 3. Move the library to the plugin directory. 4. Register, verify and run the UDF.
  • 27. “Hello World” UDF Anatomy. Main UDF structure:

        /* Libraries */
        #include <stdlib.h>
        #include <stdio.h>
        #include <string.h>
        #include <my_global.h>
        #include <my_sys.h>
        #include <mysql.h>
        #include <ctype.h>
        #include <math.h>

        /* Function initialization */
        my_bool radius_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
        {
            if (args->arg_count != 1) {
                strcpy(message, "AICA: Message");
                return 1;
            }
            args->arg_type[0] = REAL_RESULT;
            return 0;
        }

        /* Function body */
        double radius(UDF_INIT *initid, UDF_ARGS *args, char *result,
                      unsigned long *length, char *is_null, char *error)
        {
            double r = *((double *) (args->args[0]));
            double pi = 3.14;
            return pi * 2 * r;
        }

        /* De-allocate resources */
        void radius_deinit(UDF_INIT *initid)
        {
            // nothing
        }
  • 28. Compile and Move Place .so in the Plugin Directory
  • 29. Register, Verify and Run Place .so in the Plugin Directory
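The commands for this step are not preserved in the transcript. As a hedged sketch, the register, verify and run sequence for the `radius` UDF could be generated from Python as below. The library file name `radius.so` and the helper `udf_statements` are hypothetical; the SQL uses MariaDB's `CREATE FUNCTION ... RETURNS ... SONAME` syntax and the `mysql.func` table, where registered UDFs are listed.

```python
import re

# Return types accepted by CREATE FUNCTION ... RETURNS for UDFs.
ALLOWED_RETURN_TYPES = {"STRING", "INTEGER", "REAL", "DECIMAL"}


def udf_statements(name, returns, library, sample_arg):
    """Build the SQL to register, verify and run a UDF (hypothetical helper).

    name       -- SQL name of the function, matching the C symbol
    returns    -- one of ALLOWED_RETURN_TYPES
    library    -- shared library file name inside the plugin directory
    sample_arg -- numeric argument for a smoke-test call
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid function name: {name!r}")
    if returns not in ALLOWED_RETURN_TYPES:
        raise ValueError(f"invalid return type: {returns!r}")
    if not re.fullmatch(r"[A-Za-z0-9_.]+", library):
        raise ValueError(f"invalid library name: {library!r}")
    return [
        # Register: bind the SQL name to the symbol in the shared library.
        f"CREATE FUNCTION {name} RETURNS {returns} SONAME '{library}'",
        # Verify: registered UDFs are recorded in mysql.func.
        f"SELECT name, ret, dl, type FROM mysql.func WHERE name = '{name}'",
        # Run: call the UDF like a built-in function.
        f"SELECT {name}({float(sample_arg)!r})",
    ]


# 'radius.so' is an assumed file name for the compiled library.
statements = udf_statements("radius", "REAL", "radius.so", 2)
for statement in statements:
    print(statement + ";")
```

Each string could be passed to a PyMySQL cursor's `execute()` method or pasted into a SQL client; the plugin directory itself can be looked up with `SHOW VARIABLES LIKE 'plugin_dir'`.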
  • 31. Center for Information Management (CIM). Health population data analysis to equip epidemiologists, public health officers, and insurance providers with evidence-based outcomes and efficacy of treatment. The insurance providers further use this insight to measure and predict treatment costs and population health over several years.
      Challenges Faced:
        - Took 2+ days to load data
        - Unsustainable Oracle licensing model
        - Historically needed to create and manage indexes to get query performance
      Benefits Realized:
        - Data loads now take minutes
        - Can query 200+ columns with fast performance
        - Easy administration: no indexes
        - High-performance data visualization with Tableau
  • 32. Genus PLC. Genus provides farmers with superior genetics that enable them to produce higher-quality animal protein more efficiently, in the form of meat and milk.
      Challenges Faced:
        - Leveraging Oracle was cost prohibitive
        - Existing process had slow data loads
        - Data scientists wanted to use SQL as the primary interface
        - Did not want to provide heavy database administration
      Benefits Realized:
        - Fast loading of raw data: from a few GB to 20 GB per load
        - Leverages known interfaces: easy-to-use SQL front end
        - Python bulk data adapters allow them to publish results directly from machine learning data models instead of scheduling cpimport jobs
        - Fast query results
        - Easy to maintain
        - Much more affordable cost structure
  • 33. Customer Use Case: Pinger
      Industry: Telco
      Data: call and text logs
      Use case: mobile app use analytics
      Details:
        - 30 million texts and 3 million phone calls per day
        - 1.5 billion rows of logs per day
        - The text and call volume will continue to grow
        - The InnoDB backend hit its scale limit of 6 TB and required a lot of performance tuning and index management
        - Migrated to MariaDB AX
        - Able to process 24 months (24 TB) of data versus the 6-month limit with InnoDB
        - The same BI tools and client applications worked seamlessly with MariaDB AX