With businesses today needing to store a lot more data and for longer periods of time, while also empowering their customers to analyze it in real time and in unexpected ways, analytical workloads are fast exceeding what traditional databases and data warehouses are capable of. In this session, MariaDB's Shane Johnson describes modern analytics requirements and employs real-world use cases to show how MariaDB ColumnStore is helping customers meet these new requirements.
3. Customer use case (Healthcare)
Industry: healthcare (Medicaid)
Data: surveys
Use case: decision support system
Details:
1. Identify trends and patterns
2. Determine population cohorts
3. Predict health outcomes
4. Anticipate funding / capacity
5. Recommend intervention
Can’t do complex queries on current hardware with Oracle and snowflake schemas
Limited to optimizing for simple, known queries (2-3 columns)
Replaced with ColumnStore
> a single table
> 2.5 million rows, 248 columns
> complex, ad-hoc queries
> query 20+ columns in seconds
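A rough way to see why a wide, single-table design can work in a columnar engine is to count how many values a query has to touch. The sketch below is a toy model only: the 248-column shape comes from the slide, while the row count, the data, and the function names are made up for illustration. It ignores compression and every other engine detail.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# Assumptions: synthetic integer data, 1,000 rows standing in for 2.5 million.

NUM_COLUMNS = 248
NUM_ROWS = 1000

# Row-oriented layout: each row keeps all 248 column values together.
rows = [[r * NUM_COLUMNS + c for c in range(NUM_COLUMNS)] for r in range(NUM_ROWS)]

# Column-oriented layout: each column is stored as its own list.
columns = [[rows[r][c] for r in range(NUM_ROWS)] for c in range(NUM_COLUMNS)]


def values_touched_row_store(wanted):
    """A row store reads whole rows, regardless of how many columns are wanted."""
    return NUM_ROWS * NUM_COLUMNS


def values_touched_column_store(wanted):
    """A column store reads only the columns the query asks for."""
    return NUM_ROWS * len(wanted)


def sum_columns(wanted):
    """Aggregate the requested columns using the columnar layout."""
    return {c: sum(columns[c]) for c in wanted}


wanted = list(range(20))  # an ad hoc query over 20 of the 248 columns
row_cost = values_touched_row_store(wanted)
col_cost = values_touched_column_store(wanted)
totals = sum_columns(wanted)
```

In this toy model the 20-column query touches 20/248 of the values a row scan would, which is the intuition behind querying 20+ columns of a wide table in seconds.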
5. Modern analytics in the real world
– financial services –
Drivers
Become customer-centric
Facilitate regulatory compliance
Create competitive advantages
Goals
Improve customer satisfaction
Mitigate financial risks
Predict market changes
Use cases
Fraud detection: identify patterns + detect anomalies in financial transactions
Compliance archiving: store financial trade history for long-term retention
Investment forecasting: analyze financial markets + securities to predict ROI
6. Modern analytics in the real world
– OTC Markets Group –
About
Provides investors with the information
necessary to intelligently analyze,
value and trade securities through the
broker of their choice.
Use cases
Subscribers analyze quote and trading
data
Regulatory agencies build compliance
reports on demand
7. Modern analytics in the real world
– OTC Markets Group –
Data
10TB of rolling data (5 years)
10,000 U.S. and global securities
100,000 trades
24 million quotes
Why MariaDB AX?
HPE Vertica license (8TB)
Cost
Scalability
Performance
8. Modern analytics in the real world
– healthcare –
Drivers
Digital transformation
Electronic health records (EHRs)
Value-based care (VBC)
Goals
Improved population health
Better patient experiences
Reduced cost of care
Use cases
Population health mgt: analyze claims/surveys to recommend interventions
Evidence-based medicine: improve diagnostic accuracy by analyzing EHRs
Precision medicine: identify targeted treatments by analyzing genomes
9. Modern analytics in the real world
– Institute for Health Metrics and Evaluation –
About
An independent population health
research center at UW Medicine, and
coordinating center for the Global
Burden of Disease study.
Use cases
Enable the public to analyze global
health population data via online data
visualization tools
11. Modern analytics in the real world
– Institute for Health Metrics and Evaluation –
Data
30TB of data
100 billion data points
Multi-billion row tables
Why MariaDB AX?
Outgrown Oracle MySQL
2010 – 2 billion data points
2018 – 100 billion data points
Open source
Performance
Compression (40%)
12. Modern analytics in the real world
– telecommunications –
Drivers
Improve sales, marketing and
operational efficiency
Goals
High customer retention
Better network optimizations
New services and revenue
Use cases
Churn prevention: analyze customer plans/usage to create retention programs
Cross-selling: identify opportunities by analyzing call detail records (CDRs)
Network optimization: analyze traffic/cell tower data to optimize capacity
13. Modern analytics in the real world
– Pinger, Inc. –
About
An innovator in mobile
communications with mobile apps
connecting hundreds of millions of
people globally via 3+ billion calling
minutes and 100+ billion text
messages.
Use cases
To support customer behavioural
analysis based on historical data and
usage
14. Modern analytics in the real world
– Pinger, Inc. –
Data
30 million texts
3 million phone calls
1.5 billion logs a day
24 months’ worth of data
Why MariaDB AX?
Outgrown Oracle MySQL (6TB)
Limited to 6 month history
Less performance tuning
No index management
MySQL protocol
15. Modern analytics in the real world
– manufacturing –
Drivers
Smart manufacturing
Industry 4.0
Goals
Improve operational efficiency
Reduce manufacturing overhead
Minimize non-productive time (NPT)
Use cases
Yield optimization: improve first time yield (FTY) via thousands of simulations
Predictive maintenance: analyze sensor data to predict machine failure
Quality control: detect defects and process anomalies by analyzing assembly data
16. Modern analytics in the real world
– SMART InSight –
About
A provider of big data discovery and
analytics software for enterprises who
want a 360-degree view of customers
and products.
Use cases
Enable manufacturers to correlate
customer issues with design and
manufacturing flaws with
comprehensive analysis of
manufacturing sites
17. Modern analytics in the real world
– SMART InSight –
Why MariaDB AX?
Performance: high-throughput, low-latency aggregates
Easy to use for everyone: standard SQL
18. Modern analytics in the real world
– digital advertising –
Drivers
Granularity of data
Demographic and behavioral
Social and location
Goals
Deliver the right ad to the right person at
the right time, in the right location and
through the right medium
Use cases
Audience segmentation: improve ad relevance via fine-grained visitor profiles
Ad placement: choose where to show ads based on click and conversion data
Real-time bidding: analyze big request/response history to optimize prices
19. Modern analytics in the real world
– digital advertising vendor –
About
A leader in analytics software with a
unified advertising platform for ad
publishers running on scalable cloud
infrastructure enabling them to
manage the entire ad serving process.
Use cases
Enable customers to create a custom
report on up to 30 columns on
demand
20. Modern analytics in the real world
– digital advertising vendor –
Data
300 million impressions a month
70 million rows a day
60TB of uncompressed data
Why MariaDB AX?
Outgrown Oracle MySQL
Limited to 16 columns
High-performance, no indexes
21. Customer use cases by industry
Finance
Identify trade patterns
Detect fraud and anomalies
Predict trading outcomes
Manufacturing
Simulations to improve design/yield
Detect production anomalies
Predict machine failures (sensor data)
Telecom
Behavioral analysis of customer calls
Network analysis (perf and reliability)
Healthcare
Find genetic profiles/matches
Analyze health vs spending
Predict viral outbreaks
23. Application
(eCommerce)
Transactional
Show me all new products in
the science fiction category
Analytical
Show me the top products
added to shopping carts or
purchased today, and with
low inventory.
Actionable insight
I should buy one now
because everyone wants
one, and they’ll be sold out
by the end of the day!
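The analytical question on this slide can be sketched as a small aggregation. This is only an illustration: the event data, product names, stock levels, and the low-inventory threshold are all hypothetical, and a real application would push this work down to the database as an analytical query.

```python
from collections import Counter

# Hypothetical data for one day: (event type, product) pairs and current stock.
events_today = [
    ("cart", "telescope"), ("purchase", "telescope"), ("cart", "robot kit"),
    ("cart", "telescope"), ("purchase", "star map"), ("cart", "robot kit"),
]
inventory = {"telescope": 2, "robot kit": 40, "star map": 1}


def top_low_inventory(events, stock, low_threshold=5, limit=3):
    """Rank products by cart adds plus purchases, keeping only low-stock ones."""
    demand = Counter(product for _kind, product in events)
    low = [(p, n) for p, n in demand.most_common() if stock.get(p, 0) <= low_threshold]
    return low[:limit]


result = top_low_inventory(events_today, inventory)
```

The transactional query on the same slide would be a simple lookup by category. The analytical one has to aggregate every event of the day and combine it with inventory.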
24. Application
(banking)
Transactional
Show me all pending
transactions
Analytical
Show me when my balance
will run low based on
historical transactions and
my current spending rate
Actionable insight
I should transfer some
money from my savings
account to my checking
account until I get paid again!
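The arithmetic behind the "balance will run low" question can be sketched as follows. It assumes a constant spending rate estimated from past debits and a fixed low-balance threshold; both are assumptions made for illustration, and the function names are hypothetical.

```python
def average_daily_spend(debits, days):
    """Estimate the spending rate from historical debits over a number of days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return sum(debits) / days


def days_until_low(balance, daily_spend, low_threshold=100.0):
    """Whole days until the balance reaches the threshold at the current rate."""
    if balance <= low_threshold:
        return 0
    if daily_spend <= 0:
        return None  # no spending, so the balance never runs low
    return int((balance - low_threshold) // daily_spend)


# Example: 500.00 spent over the last 10 days, 400.00 currently available.
rate = average_daily_spend([120.0, 80.0, 200.0, 100.0], days=10)
remaining = days_until_low(balance=400.0, daily_spend=rate)
```

With these made-up numbers the rate is 50.00 a day, so the balance reaches the 100.00 threshold in 6 days.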
25. Application
(pay-per-click ads)
Transactional
Create campaign
Create ad group
Create ad
Capture impressions
Capture clicks
Capture charges
Analytical
Show me which ads will
perform worse based on past
performance and changes in
keyword costs and searches
Actionable insight
I should pause this ad and
raise the budget on others in
order to meet my goals!
30. MariaDB TX 3.0
MariaDB Server 10.3
MariaDB MaxScale 2.2
InnoDB/MyRocks
MariaDB AX 2.0
MariaDB Server 10.2
MariaDB MaxScale 2.2
ColumnStore 1.2
MariaDB Platform X3
MariaDB MaxScale 2.3
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
31. The database proxy inspects queries and routes them to transactional
and/or analytical database instances.
The change-data-capture stream replicates all writes from transactional
databases to analytical databases within microbatches.
MariaDB Platform X3
MariaDB MaxScale 2.3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
Transactional Analytical
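The routing idea described on this slide can be sketched with a toy classifier. This is not how MariaDB MaxScale is implemented or configured; the keyword heuristic, the `route` function, and the target names are assumptions chosen only to make the concept concrete.

```python
import re

# Hypothetical heuristic: aggregates and GROUP BY suggest an analytical query.
ANALYTICAL_HINTS = re.compile(r"\b(GROUP\s+BY|SUM|AVG|COUNT|MIN|MAX)\b", re.IGNORECASE)
WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE")


def route(query):
    """Return 'transactional' or 'analytical' for a SQL string."""
    statement = query.lstrip().upper()
    if statement.startswith(WRITE_PREFIXES):
        return "transactional"  # all writes go to the transactional side
    if ANALYTICAL_HINTS.search(query):
        return "analytical"     # aggregate reads go to the columnar side
    return "transactional"      # simple lookups stay transactional


targets = [
    route("SELECT * FROM orders WHERE id = 1"),
    route("SELECT region, SUM(total) FROM orders GROUP BY region"),
    route("INSERT INTO orders (id, total) VALUES (1, 9.99)"),
]
```

Writes always land on the transactional side in this sketch, which matches the slide: the analytical side receives them later through the change-data-capture stream.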
32. Applications
Containers
MariaDB Platform X3
MariaDB MaxScale 2.3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
Transactional Analytical
Kubernetes (Helm) Docker (Compose)
C JDBC ODBC Node.js
Ingest streaming data
Kafka connector
Administration
SQL Diagnostic
Manager
SQLyog
MariaDB Backup
MariaDB Flashback
Import bulk data
Spark connector
C/Java/Python API
34. Hybrid workloads: why scalability is needed
Applications have transactional
and analytical queries
1. Constrained by limited,
lightweight analytics
2. Need full analytics to
create competitive features
Outgrowing OLTP
Applications with lots of
customers, lots of transactions
1. Limited to current or recent
transaction data (months)
2. Need access to all
historical data (years)
Using historical data
SaaS customers are becoming
data-driven organizations
1. They don’t have access to
their own data
2. They need to analyze it in
unknown/unexpected ways
Exposing analytics
36. Real-world use case #1: THINQ
THINQ is a communications firm providing a cloud-based communications platform
(e.g., IP telephony and messaging) and least cost routing to telcos, contact centers
and enterprise organizations.
Problem #1: They reached the analytical limits of InnoDB
● They capture call detail records (OLTP), but need to analyze them as well
○ They are wide rows with lots of columns, but only a few columns are needed
● They couldn’t buffer entire rows in memory because it would be cost prohibitive
○ They need to support frequent, ad hoc requests for real-time aggregates
○ For example, show the number of calls in the past hour, and then refresh
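The "calls in the past hour" aggregate needs only one column of each wide call detail record. A minimal sketch, assuming the call start times are available as a single list (the data and function name are hypothetical):

```python
from datetime import datetime, timedelta


def calls_in_past_hour(start_times, now):
    """Count calls that started within the hour ending at 'now'."""
    cutoff = now - timedelta(hours=1)
    return sum(1 for t in start_times if cutoff < t <= now)


# Only the start-time column is read; the other CDR columns are never touched.
now = datetime(2019, 1, 1, 12, 0)
start_times = [
    datetime(2019, 1, 1, 10, 59),  # too old
    datetime(2019, 1, 1, 11, 0),   # exactly one hour ago, excluded
    datetime(2019, 1, 1, 11, 30),
    datetime(2019, 1, 1, 11, 59),
    datetime(2019, 1, 1, 12, 0),
]
recent_calls = calls_in_past_hour(start_times, now)
```

Refreshing the figure means re-running the same aggregate with a new `now`, which is why frequent, ad hoc requests like this are costly when every request has to read entire rows.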
37. Real-world use case #1: THINQ
Problem #2: Their data pipeline was too complex and expensive
● First, the data was collected and stored
● Next, it was processed via ETL and stored in expensive HA storage
● Finally, it was imported into the data warehouse
38. Real-world use case #1: THINQ
Solution: CDRs written to both InnoDB and ColumnStore (for analytics)
● The platform has two data pipelines for the same data (e.g., CDRs)
○ The first pipeline is for transactions (to capture the CDRs in real time)
○ The second pipeline is for analytics (to analyze all of the CDRs)
● The data for analytics is collected in a message queue (RabbitMQ)
○ It is written to cheaper storage on CS nodes, then imported via CS import tool
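The two-pipeline design can be sketched in a few lines. Everything here is a stand-in: a list plays the role of the InnoDB table, a `deque` plays the role of the message queue, and the CSV text represents a batch file handed to a bulk import tool. The field names are hypothetical.

```python
import csv
import io
from collections import deque

transactional_store = []   # stand-in for the transactional (InnoDB) table
analytics_queue = deque()  # stand-in for the message queue


def capture_cdr(cdr):
    """Pipeline 1 stores the CDR right away; pipeline 2 queues it for analytics."""
    transactional_store.append(cdr)
    analytics_queue.append(cdr)


def drain_to_csv(queue, batch_size):
    """Move up to batch_size queued CDRs into CSV text for a bulk import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    written = 0
    while queue and written < batch_size:
        cdr = queue.popleft()
        writer.writerow([cdr["call_id"], cdr["caller"], cdr["duration_s"]])
        written += 1
    return buffer.getvalue(), written


for i, caller in enumerate(["a", "b", "c"], start=1):
    capture_cdr({"call_id": i, "caller": caller, "duration_s": i * 10})

batch_text, batch_count = drain_to_csv(analytics_queue, batch_size=2)
```

Batching through a queue keeps the real-time capture path short, while the analytical side receives data in larger chunks that suit a bulk loader.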
39. Real-world use case #2: Qberg
Qberg is a market research firm providing e-commerce and retail customers with
pricing data – for example, customers can see a) the latest price of a product
(transactional) or b) how a product’s price has changed over time (analytical).
Problem #1: They reached the analytical limits of InnoDB
● They needed better performance
○ Queries were taking minutes, too long for customers
● They needed to store more data
○ Limited to 6 months of data, but needed a year’s worth
40. Real-world use case #2: Qberg
Problem #2: They faced evolving customer requirements
● Customers became more demanding
○ Started looking for trends and comparing data
● Customers had more control
○ Were creating their own queries
○ Were constantly changing queries
41. Real-world use case #2: Qberg
Solution: InnoDB for current data (TX) and ColumnStore for historical data (AX)
● Scalability: going from 90 days of data to multiple years’ worth
● Performance: analytical queries went from minutes to seconds
● Isolation: transactional inserts/updates don’t impact analytics
● Routing: transactional to InnoDB, analytical to ColumnStore
● ETL: replacing in-house ETL with built-in CDC
42. Real-world use case #3: Finance (Trading)
A financial services company providing investors with price and liquidity information
for 10,000 US and global securities.
Problem: They reached the analytical limits of InnoDB (performance)
● They needed to make current and historical trade data available for analytics
○ Analyzing trades across days was taking too long
● They needed to maintain years’ worth of data to comply with regulations
○ They could not store enough data in InnoDB
○ They tried exporting data from MariaDB to Vertica (not enough capacity)
43. Real-world use case #3: Finance (Trading)
Solution: They use InnoDB for current trades and ColumnStore for historical
● They store current/live trade data on premises
● They store historical trade data in the cloud (AWS)
● They use CDC to replicate data from InnoDB to ColumnStore (hybrid cloud)
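Replicating changes in microbatches, as the CDC stream is described earlier in the deck, can be sketched as below. This is a conceptual illustration and not the MariaDB CDC implementation: the change tuples, the dictionary replica, and both function names are assumptions.

```python
def microbatches(changes, batch_size):
    """Group an ordered stream of change events into fixed-size batches."""
    batch = []
    for change in changes:
        batch.append(change)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch


def apply_batch(replica, batch):
    """Apply one batch of (operation, key, value) changes to the replica."""
    for op, key, value in batch:
        if op == "delete":
            replica.pop(key, None)
        else:  # "insert" and "update" both set the latest value
            replica[key] = value


# Hypothetical trade changes captured on the transactional side, in order.
changes = [
    ("insert", 1, "a"), ("insert", 2, "b"), ("update", 1, "c"),
    ("delete", 2, None), ("insert", 3, "d"),
]
historical = {}
batch_sizes = []
for batch in microbatches(changes, batch_size=2):
    apply_batch(historical, batch)
    batch_sizes.append(len(batch))
```

Applying changes in order, one batch at a time, lets the analytical copy trail the transactional one by a short interval instead of waiting for a nightly ETL job.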