Learn about the three advances in database technologies that eliminate the need for star schemas and the resulting maintenance nightmare.
Relational databases in the 1980s were typically designed using the Codd-Date rules for data normalization. It was the most efficient way to store data used in operations. As BI and multi-dimensional analysis became popular, the relational databases began to have performance issues when multiple joins were requested. The development of the star schema was a clever way to get around performance issues and ensure that multi-dimensional queries could be resolved quickly. But this design came with its own set of problems.
Unfortunately, the analytic process is never simple. Business users always think up unanticipated ways to query the data, and the data itself often changes in unpredictable ways. The result is a need for new dimensions, new and mostly redundant star schemas and their indexes, and difficult maintenance of slowly changing dimensions. The analytical environment becomes overly complex and very difficult to maintain, new capabilities are long delayed, and both the users and those maintaining the environment end up dissatisfied.
There must be a better way!
Watch this webinar to learn:
- The three technological advances in data storage that eliminate star schemas
- How these innovations benefit analytical environments
- The steps you will need to take to reap the benefits of being star schema-free
2. Speakers
Nick Jewell
Sr Director, Product Marketing at Incorta
Technology Evangelist. 25+ years Analytics Expertise in Computer-Aided Drug Design, Financial Services & Consulting
dataIQ100 (2018, 2020, 2021)
DataKind Ambassador
Claudia Imhoff
Founder of Boulder BI Brain Trust (BBBT)
A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives.
Pallavi Mishra
Sales Engineer at Incorta
Focused. Determined. Passionate. I am a keen observer and a quick learner. I have an inclination towards leveraging the evolving technology in conjunction with the right business acumen to solve complex problems.
3. 1970’s - 2000’s
Relational Databases
• Good for highly structured data
• Simple and Reliable
• Good for small to medium data sets
“How much information is there in the
world”
1997 Michael Lesk
“There may be a few thousand petabytes of
information…we will be able to save
everything..no information thrown away…typical
information will never be looked at”
https://www.lesk.com/mlesk/ksg97/ksg.html
4. Rise of Internet 1990’s - early 2000’s
Rise in data (Doug Laney)
Volume: Clickstream
Velocity: High velocity transactions,
digitalisation, multi-channels
Variety:
• Structured
• Semi-Structured
• Unstructured
[Chart: telecommunication in optimally compressed MB]
5. Agenda
Life of the Star Schema
Death of the Star Schema
Benefits of Eliminating Star Schemas
Getting Started
7. Genesis of the Star Schema
The Data Warehouse
Era begins
• Contains integrated data
from multiple sources
• Sole purpose was decision
support
Relational DBMS
technology used Date-
Codd rules for data design
• Most efficient way to store data
• Least efficient performance for
multi-join queries
Enter the Star Schema:
a database design that
mirrors the business
• Allows the business
community to ask many
questions
• And get reasonable response
times
It’s the 80’s!
8. Genesis of the Star Schema
Star Schema – a physical
instantiation of a multi-join
process
• Significant data denormalization process
to improve join performance
• Fact table surrounded by dimension
tables
• Great way to perform multi-dimensional
analysis…
• As long as analytical processes
or data never change…
[Diagram: a star schema]
Fact table keys: Time_ID, Product_ID, Program_ID, Location_ID, Customer_ID, Order_ID, etc.
Fact table measures: Counts, Usage, Dollars
Dimension tables: Customer, Order, Location, Time, Product, Channel, Program, Campaign
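To make the fact-and-dimension layout concrete, here is a minimal sketch of a multi-dimensional query against a tiny star schema, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_time`), the sample rows, and the function `sales_by_product_and_year` are all hypothetical and chosen only for illustration; they are not taken from the webinar.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table keyed to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    dollars REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_time VALUES (1, 2020), (2, 2021);
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 2, 80.0);
""")


def sales_by_product_and_year(connection):
    """Join the fact table to each dimension and sum the Dollars measure."""
    query = """
        SELECT p.product_name, t.year, SUM(f.dollars)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_time AS t ON t.time_id = f.time_id
        GROUP BY p.product_name, t.year
        ORDER BY p.product_name, t.year
    """
    return connection.execute(query).fetchall()


rows = sales_by_product_and_year(conn)
```

Every new question that needs a dimension not already attached to the fact table means changing this physical design, which is the rigidity the following slides describe.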
9. Difficulties Develop…
…As long as the
analytical processes or
data never change…
• But they do – They are
unpredictable, fluid,
always changing
The result?
• Slowly changing dimensional
maintenance skyrockets
• Need for new dimensions
constantly
• Need for new (mostly
redundant) star schemas
• Loss of flexibility and agility!
Analytical environments
become nightmares of
complexity
Business community
is not amused…
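As an illustration of why slowly changing dimension maintenance grows, here is a minimal sketch of one common approach, often called a Type 2 change (a term the slides themselves do not use): the current dimension row is closed and a new row is appended. The column names and the function `apply_scd2_change` are hypothetical.

```python
from datetime import date


def apply_scd2_change(history, customer_id, new_city, change_date):
    """Close the customer's current row and append a new current row.

    `history` is a list of dicts; a row with valid_to=None is the current one.
    """
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["city"] == new_city:
                return history  # attribute unchanged, nothing to record
            row["valid_to"] = change_date  # close the old version
    history.append(
        {
            "customer_id": customer_id,
            "city": new_city,
            "valid_from": change_date,
            "valid_to": None,
        }
    )
    return history


# Hypothetical customer dimension with one current row.
customer_dim = [
    {"customer_id": 1, "city": "Boulder", "valid_from": date(2019, 1, 1), "valid_to": None}
]
apply_scd2_change(customer_dim, 1, "Denver", date(2021, 6, 1))
```

Each tracked attribute in each dimension needs logic like this in the ETL process, so the amount of code to maintain grows with every new dimension and every redundant star schema.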
11. Hurrah for technological advances!
1. Cloud storage of data
2. In-memory
3. New query engines
Today
There Must Be A Better Way!
12. Data is stored in the cloud
(Parquet)
First Leg: Columnar Storage of Data
Much reduced costs (elasticity of cloud
implementations)
Data storage orchestration over different
storage tiers
• RAM (Random Access Memory)
• SSD (Solid State Drive)
• HDD (Hard Disk Drive, spinning disks)
Optimization that improves performance by
better I/O, usage of query engines,
columnar/in-memory storage
13. Most recently, reduced costs of
memory mean data can now
reside there rather than on disk
Second Leg: In-Memory
• Optimizes performance for queries by
eliminating requests to disk-stored
data
• Improves scalability with decreased
cost of memory
14. New query engines are what
make star schemas irrelevant
Third Leg: New Query Engines
• These engines provide real-time joins between complex data tables = virtual star schemas
• They create the needed aggregations at the same time
• This yields much-needed flexibility in the number of queries that can be resolved
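The idea of joining and aggregating at query time, instead of materializing a star schema in advance, can be sketched with a simple in-memory hash join. This is only an illustration of the general technique, not how any particular query engine is implemented; the function `join_and_aggregate` and the sample data are hypothetical.

```python
from collections import defaultdict


def join_and_aggregate(facts, dimension, key, group_by, measure):
    """Hash-join fact rows to a dimension and sum a measure in one pass."""
    lookup = {row[key]: row for row in dimension}  # build side of the hash join
    totals = defaultdict(float)
    for fact in facts:  # probe side
        dim_row = lookup.get(fact[key])
        if dim_row is None:
            continue  # inner-join semantics: skip facts with no match
        totals[dim_row[group_by]] += fact[measure]
    return dict(totals)


# Hypothetical detail tables, held in memory as they were loaded.
orders = [
    {"customer_id": 1, "dollars": 100.0},
    {"customer_id": 2, "dollars": 40.0},
    {"customer_id": 1, "dollars": 60.0},
]
customers = [
    {"customer_id": 1, "region": "West"},
    {"customer_id": 2, "region": "East"},
]

dollars_by_region = join_and_aggregate(orders, customers, "customer_id", "region", "dollars")
```

Because the join and the aggregation happen when the question is asked, grouping by a different attribute is a change of arguments, not a change to a stored schema.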
From: www.biodataanalysis.de
15. With all three legs in place, a
star schema is replaced easily
Death of the Star Schema
From: www.newsweek.com
• Data is quickly ingested and integrated
• ETL process is simplified by removing
star schema creation/maintenance
from it
• Data from many complex data tables is
quickly joined and presented
• For example, a fact joined to a fact is
almost impossible to do in star schema
implementations
• With the improved performance as
discussed, this is now possible!
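A fact-to-fact join can be sketched as follows, again with Python's built-in `sqlite3` module. The two fact tables (`order_lines`, `shipments`) and their rows are hypothetical. The sketch aggregates each fact table to a shared key before joining, one common way to avoid double counting when both tables have several rows per order.

```python
import sqlite3

# Two hypothetical fact tables at different grains, both keyed by order_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_lines (order_id INTEGER, dollars REAL);
CREATE TABLE shipments (order_id INTEGER, shipped_units INTEGER);
INSERT INTO order_lines VALUES (1, 50.0), (1, 25.0), (2, 40.0);
INSERT INTO shipments VALUES (1, 3), (1, 2), (2, 4);
""")

# Aggregate each fact to the shared key first, then join the two results.
FACT_TO_FACT = """
    WITH o AS (
        SELECT order_id, SUM(dollars) AS dollars
        FROM order_lines GROUP BY order_id
    ),
    s AS (
        SELECT order_id, SUM(shipped_units) AS units
        FROM shipments GROUP BY order_id
    )
    SELECT o.order_id, o.dollars, s.units
    FROM o JOIN s ON s.order_id = o.order_id
    ORDER BY o.order_id
"""

result = conn.execute(FACT_TO_FACT).fetchall()
```

Joining the raw tables directly on `order_id` would pair every order line with every shipment for that order and inflate both sums, which is part of why this kind of query is awkward in a conventional star schema.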
17. Benefits of Star Schema-less Environment
Individualized
Reusable
Artistic
Experimental
Industrial
Built-for-purpose
based on users
and queries
18. Benefits of Star Schema-less Environment
Flexibility and agility return to
the data warehouse
environment
• Business users can ask impromptu
questions – with virtually unlimited
dimensionality
• They can use much more complex,
detailed data
• All while receiving better response times
Maintenance is greatly
simplified!
• Design sessions are reduced
• ETL is simplified
• Maintenance is lessened
19. Data storage
requirements are
reduced
• Columnar storage
compresses the data
• No indexes are needed
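One reason columnar storage compresses well is that values within a single column tend to repeat. A minimal sketch, assuming run-length encoding as the technique (one of several encodings that columnar formats commonly use); the sample rows and function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def run_length_decode(encoded):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in encoded for _ in range(count)]


# Hypothetical row-oriented data: (region, product, units).
rows = [
    ("West", "Widget", 10),
    ("West", "Gadget", 12),
    ("West", "Widget", 7),
    ("East", "Widget", 5),
    ("East", "Gadget", 9),
]

# Pivot one attribute into a column; equal neighbours now sit side by side.
region_column = [row[0] for row in rows]
encoded_region = run_length_encode(region_column)
```

Here five region values are stored as two pairs. In a row layout the same values are interleaved with the other fields, so runs like this rarely form.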
Developers are freed up to
do more valuable activities
than maintaining star
schemas
• They can focus on increased
availability and volumes of new
data sources
• They can focus on more advanced
forms of analyses and
experimentation capabilities
Re-evaluating star
schemas can uncover
unknown errors
21. Getting Started
Many organizations have
“legacy” data warehouses. If
so, here are the steps to use in
migrating to a star schema-
less environment:
01
Evaluate your ETL processes
• Determine where the star schema
bottlenecks are
• Decide which star schemas are
particularly burdensome in terms of
creation/maintenance
• Target these for migration
22. Getting Started
02
Group selected star schemas by
the business problems they solve
• Prioritize those business problem stars as to
their criticality, maintenance difficulty,
requests for updates
• Each grouping may become its own project
• This gives you a clear path forward
03
Begin analyzing the detailed
data from which the star
schema was developed
• This data can add even more flexibility
and agility to the overall environment
• You may discover errors in previous
implementations
• It’s also a quick win for developers &
business users
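The prioritization in step 02 (criticality, maintenance difficulty, requests for updates) can be sketched as a simple weighted score. The class `StarSchemaGroup`, the 1 to 5 rating scales, the cap on update requests, and the weights are all assumptions made for illustration; the webinar does not prescribe a scoring formula.

```python
from dataclasses import dataclass


@dataclass
class StarSchemaGroup:
    business_problem: str
    criticality: int             # assumed scale: 1 (low) to 5 (high)
    maintenance_difficulty: int  # assumed scale: 1 (easy) to 5 (hard)
    update_requests: int         # open change requests against these stars


def priority_score(group, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three criteria; the weights are hypothetical."""
    w_crit, w_maint, w_req = weights
    capped_requests = min(group.update_requests, 5)  # keep all terms on a 0-5 scale
    return (
        w_crit * group.criticality
        + w_maint * group.maintenance_difficulty
        + w_req * capped_requests
    )


def migration_order(groups):
    """Return groups sorted so the highest-priority group migrates first."""
    return sorted(groups, key=priority_score, reverse=True)


groups = [
    StarSchemaGroup("Customer churn", criticality=5, maintenance_difficulty=4, update_requests=7),
    StarSchemaGroup("Inventory planning", criticality=3, maintenance_difficulty=2, update_requests=1),
    StarSchemaGroup("Campaign reporting", criticality=4, maintenance_difficulty=5, update_requests=3),
]
ordered = [g.business_problem for g in migration_order(groups)]
```

The resulting order gives each grouping a place in the migration schedule; in practice the weights would be agreed with the business rather than fixed in code.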
23. Getting Started
04
Create a migration path
• Move the set of star schema data for each
business problem into the new environment
according to the priority schedule
• Quick win!
05
Expand data acquisition
horizons
• There is data that you might have
thought was beyond your development
capabilities
• BUT data volumes, query performance,
and time to delivery are not big
problems now
24. Getting Started
06
Life is good!
• Reduced burden of star schema design,
creation, & maintenance means freed up time
for development
• Use that time to begin reducing backlogs of
analytical requests
07
If you have a green field situation
– lucky you!
• You still need to understand the business
users’ needs but go beyond those needs
and embellish
• You still need to determine how much ETL
and data quality processing will be required
• Matthew will talk about a new approach to
analytics in the next section
25. Summary
Given the advances in analytical
technologies, it is time to rethink
data warehouse design and
processes
• You still need design sessions BUT fewer of them, with no star schema design step
• You still need a repository of analytical data
• You still need ETL or some form of data
integration and quality processes BUT less of it
• You still need to perform maintenance on the
stored data BUT there is less of it, no indexes,
and simpler data schemas
You can now solve many of the
past, difficult problems
• By bringing in better, faster, and more flexible
decision-making into your organization
From:
LifeIsGood.com
26. Star Schemas in the Real World
Powerful Insights … but with a huge supporting cast
27. “Modern” Data Architecture
A Complex and Inflexible Nightmare That Limits Insights from Perishable Data
[Diagram: layers of the “modern” data architecture]
• Business data sources: Human Resources, Finance, Supply Chain
• Tools
• Raw data zone: Data Lake
• Refined data zone: Data Warehouses
• Business data zone: Star Schemas
• Extract: 100%; Transform: 25%; Aggregate: 10%
29. Do it all again for every new question
THE “MODERN” WAY: Bringing data to BI
• Question! New Data?
• Call IT: Get on a list
• Transform Data: Lots of SQL/ETL
• Prep Data: Cubes & Marts
• Ready! Only a few weeks later! (Weeks of work)
THE AGILE WAY: Bringing BI to the data
• Question!
• I see it already and I can load it myself
• New insights within minutes
Data Architecture to Transform Business
What Changes When You Deliver 100% of Your Data for Analytics
30. Incorta Unified Data & Analytics Platform
Data Acquisition (Loader Service)
• Connectors
• Parallel Data Loader
• Schema Detection
• Direct Data Mapping
Data Enrichment (Spark Cluster: Advanced Analytics & Machine Learning)
• Data Science Notebooks
• Custom Logic
• Materialized Views
• Machine Learning
Shared Storage
• Metadata Admin
• Parquet Columnar Storage
• Direct Data Map
31. Data Acquisition
Incorta Unified Data & Analytics Platform
Data Acquisition (Loader Service)
• Connectors
• Parallel Data Loader
• Schema Detection
• Direct Data Mapping
Data Enrichment (Spark Cluster: Advanced Analytics & Machine Learning)
• Data Science Notebooks
• Custom Logic
• Materialized Views
• Machine Learning
Shared Storage
• Metadata Admin
• Parquet Columnar Storage
• Direct Data Map
Data Analytics (Analytics Service)
• In-Memory Analytics Engine
• Business Views, Security
• Data Visualization
• SQL / Open Access
32. “Data Architecture…
…defines the blueprint for managing data assets
by aligning with organizational strategy…”
Aligning Data Architecture to Business Needs
Data Management Body of Knowledge Definition
33. Blueprints Provide a Huge Head Start
Pre-Built Dashboard and Schemas Get You Up and Running Quickly on Enterprise Data
Raw tables Helper tables
Aggregated
Business
Views
Blueprints
Business
logic
37. SEE YA LATER
STAR SCHEMA
Find out why the world’s most valuable companies rely
on Incorta to acquire, enrich, analyze and act on data
with unmatched speed.
START YOUR CLOUD TRIAL TODAY
cloud.incorta.com/signup