The Death of the Star Schema


Learn about the three advances in database technologies that eliminate the need for star schemas and the resulting maintenance nightmare.

Relational databases in the 1980s were typically designed using the Codd-Date rules for data normalization. It was the most efficient way to store data used in operations. As BI and multi-dimensional analysis became popular, the relational databases began to have performance issues when multiple joins were requested. The development of the star schema was a clever way to get around performance issues and ensure that multi-dimensional queries could be resolved quickly. But this design came with its own set of problems.
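
As a minimal sketch of the design described above, the following uses Python's built-in `sqlite3` module to build a tiny star schema and answer one multi-dimensional question. The table names (`fact_sales`, `dim_product`, `dim_time`), the function name, and all rows are hypothetical and chosen only for illustration; they do not come from the webinar.

```python
import sqlite3


def sales_by_product_and_year():
    """Build a tiny in-memory star schema and total dollars per product and year."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: one row per descriptive entity (hypothetical names).
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER);

        -- Fact table: foreign keys to each dimension plus a numeric measure.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            time_id INTEGER REFERENCES dim_time(time_id),
            dollars REAL
        );

        INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO dim_time VALUES (1, 2020), (2, 2021);
        INSERT INTO fact_sales VALUES
            (1, 1, 100.0), (1, 1, 50.0), (1, 2, 70.0), (2, 2, 30.0);
        """
    )
    # A multi-dimensional query is one join per dimension, then an aggregation.
    result = conn.execute(
        """
        SELECT p.product_name, t.year, SUM(f.dollars)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_time AS t ON t.time_id = f.time_id
        GROUP BY p.product_name, t.year
        ORDER BY p.product_name, t.year
        """
    ).fetchall()
    conn.close()
    return result


rows = sales_by_product_and_year()
for product_name, year, dollars in rows:
    print(product_name, year, dollars)
```

Every new question that needs a new dimension means another table, another key in the fact table, and another load step, which is the maintenance burden the abstract goes on to describe.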

Unfortunately, the analytic process is never simple. Business users always think up unimaginable ways to query the data, and the data itself often changes in unpredictable ways. The result is a need for new dimensions, new and mostly redundant star schemas and their indexes, and difficult maintenance of slowly changing dimensions. The analytical environment becomes overly complex and very difficult to maintain, new capabilities are long delayed, and the outcome is unsatisfactory for both the users and those maintaining it.

There must be a better way!

Watch this webinar to learn:

- The three technological advances in data storage that eliminate star schemas
- How these innovations benefit analytical environments
- The steps you will need to take to reap the benefits of being star schema-free

The Death of the Star Schema

  1. The Death of the Star Schema (webinar)
  2. Speakers
     - Nick Jewell: Sr Director, Product Marketing at Incorta. Technology evangelist with 25+ years of analytics expertise in computer-aided drug design, financial services, and consulting. dataIQ100 (2018, 2020, 2021). DataKind Ambassador.
     - Claudia Imhoff: Founder of Boulder BI Brain Trust (BBBT). A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives.
     - Pallavi Mishra: Sales Engineer at Incorta. "Focused. Determined. Passionate. I am a keen observer and a quick learner. I have an inclination towards leveraging the evolving technology in conjunction with the right business acumen to solve complex problems."
  3. 1970s - 2000s: Relational Databases
     - Good for highly structured data
     - Simple and reliable
     - Good for small to medium data sets
     - Michael Lesk, "How much information is there in the world" (1997): "There may be a few thousand petabytes of information…we will be able to save everything..no information thrown away…typical information will never be looked at" https://www.lesk.com/mlesk/ksg97/ksg.html
  4. Rise of the Internet, 1990s - early 2000s: rise in data (Doug Laney)
     - Volume: clickstream
     - Velocity: high-velocity transactions, digitalisation, multi-channels
     - Variety: structured, semi-structured, unstructured
     - Chart: telecommunication in optimally compressed MB
  5. Agenda
     - Life of the Star Schema
     - Death of the Star Schema
     - Benefits of Eliminating Star Schemas
     - Getting Started
  6. Life of the Star Schema
  7. Genesis of the Star Schema (it's the '80s!)
     - The data warehouse era begins: it contains integrated data from multiple sources, and its sole purpose was decision support
     - Relational DBMS technology is used, with Date-Codd rules for data design: the most efficient way to store data, but the least efficient performance for multi-join queries
     - Enter the star schema, a database design that mirrors the business: it allows the business community to ask many questions and get reasonable response times
  8. Genesis of the Star Schema: a physical instantiation of a multi-join process
     - Significant data denormalization process to improve join performance
     - Fact table surrounded by dimension tables
     - Great way to perform multi-dimensional analysis…
     - As long as analytical processes or data never change…
     - Diagram: a fact table holding keys (Time_ID, Product_ID, Program_ID, Location_ID, Customer_ID, Order_ID, etc.) and measures (Counts, Usage, Dollars), surrounded by the dimensions Customer, Order, Location, Time, Product, Channel, Program, and Campaign
  9. Difficulties Develop…
     - …As long as the analytical processes or data never change… but they do: they are unpredictable, fluid, always changing
     - The result?
       - Slowly changing dimension maintenance skyrockets
       - Need for new dimensions constantly
       - Need for new (mostly redundant) star schemas
       - Loss of flexibility and agility!
     - Analytical environments become nightmares of complexity
     - The business community is not amused…
  10. Death of the Star Schema
  11. There Must Be A Better Way! Today: hurrah for technological advances!
      1. Cloud storage of data
      2. In-memory
      3. New query engines
  12. First Leg: Columnar Storage of Data
      - Data is stored in the cloud (Parquet)
      - Much reduced costs (elasticity of cloud implementations)
      - Data storage orchestration over different storage formats: RAM (random access memory), SSD (solid state drive), HDD (spinning discs)
      - Optimization that improves performance through better I/O, usage of query engines, and columnar/in-memory storage
  13. Second Leg: In-Memory
      - Most recently, reduced costs of memory mean data can now reside there rather than on disk
      - Optimizes performance for queries by eliminating requests to disk-stored data
      - Improves scalability with decreased cost of memory
  14. Third Leg: New Query Engines
      - New query engines are what make star schemas irrelevant
      - These engines provide real-time joins between complex data tables (virtual star schemas)
      - They create the needed aggregations at the same time
      - This yields much-needed flexibility in the number of queries resolved
      - Image from: www.biodataanalysis.de
  15. Death of the Star Schema
      - With all three legs in place, a star schema is replaced easily
      - Data is quickly ingested and integrated
      - The ETL process is simplified by removing star schema creation/maintenance from it
      - Data from many complex data tables is quickly joined and presented
      - For example, a fact joined to a fact is almost impossible to do in star schema implementations
      - With the improved performance as discussed, this is now possible!
      - Image from: www.newsweek.com
  16. Benefits of Eliminating Star Schemas
  17. Benefits of Star Schema-less Environment
      - Diagram labels: Individualized, Reusable, Artistic, Experimental, Industrial
      - Built-for-purpose based on users and queries
  18. Benefits of Star Schema-less Environment
      - Flexibility and agility return to the data warehouse environment
        - Business users can ask impromptu questions, with virtually unlimited dimensionality
        - They can use much more complex, detailed data
        - All while receiving better response times
      - Maintenance is greatly simplified!
        - Design sessions are reduced
        - ETL is simplified
        - Maintenance is lessened
  19. Benefits of Star Schema-less Environment
      - Data storage requirements are reduced
        - Columnar storage compresses the data
        - No indexes are needed
      - Developers are freed up to do more valuable activities than maintaining star schemas
        - They can focus on increased availability and volumes of new data sources
        - They can focus on more advanced forms of analyses and experimentation capabilities
      - Re-evaluating star schemas can uncover unknown errors
  20. Getting Started
  21. Getting Started: Many organizations have "legacy" data warehouses. If so, here are the steps to use in migrating to a star schema-less environment:
      - 01 Evaluate your ETL processes
        - Determine where the star schema bottlenecks are
        - Decide which star schemas are particularly burdensome in terms of creation/maintenance
        - Target these for migration
  22. Getting Started
      - 02 Group selected star schemas by the business problems they solve
        - Prioritize those business problem stars as to their criticality, maintenance difficulty, requests for updates
        - Each grouping may become its own project
        - This gives you a clear path forward
      - 03 Begin analyzing the detailed data from which the star schema was developed
        - This data can add even more flexibility and agility to the overall environment
        - You may discover errors in previous implementations
        - It's also a quick win for developers & business users
  23. Getting Started
      - 04 Create a migration path
        - Move the set of star schema data for each business problem into the new environment according to the priority schedule
        - Quick win!
      - 05 Expand data acquisition horizons
        - There is data that you might have thought was beyond your development capabilities
        - BUT data volumes, query performance, and time to delivery are not big problems now
  24. Getting Started
      - 06 Life is good!
        - Reduced burden of star schema design, creation, & maintenance means freed up time for development
        - Use that time to begin reducing backlogs of analytical requests
      - 07 If you have a green field situation, lucky you!
        - You still need to understand the business users' needs but go beyond those needs and embellish
        - You still need to determine how much ETL and data quality processing will be required
        - Matthew will talk about a new approach to analytics in the next section
  25. Summary
      - Given the advances in analytical technologies, it is time to rethink data warehouse design and processes
        - You still need a design phase BUT with fewer design sessions and without the burden of star schema design
        - You still need a repository of analytical data
        - You still need ETL or some form of data integration and quality processes BUT less of it
        - You still need to perform maintenance on the stored data BUT there is less of it, no indexes, and simpler data schemas
      - You can now solve many of the past, difficult problems
        - By bringing better, faster, and more flexible decision-making into your organization
      - Image from: LifeIsGood.com
  26. Star Schemas in the Real World: Powerful Insights … but with a huge supporting cast
  27. "Modern" Data Architecture: A Complex and Inflexible Nightmare That Limits Insights from Perishable Data
      - Diagram labels: BUSINESS DATA SOURCES (Sources: Human Resources, Finance, Supply Chain), Tools, RAW DATA ZONE (Data Lake), REFINED DATA ZONE (Data Warehouses), BUSINESS DATA ZONE (Star Schemas)
      - Transform 25%, Extract 100%, Aggregate 10%
  28. Diagram labels: Data Ingest/Loading, Querying, 3NF / Bronze Data (© Incorta, Inc. All Rights Reserved. Internal Use Only)
  29. Data Architecture to Transform Business: What Changes When You Deliver 100% of Your Data for Analytics
      - THE "MODERN" WAY (bringing data to BI): Question! New Data? Weeks of work. Call IT. Get on a list. Transform Data (lots of SQL/ETL). Prep Data (cubes & marts). Ready! Only a few weeks later! Do it all again for every new question.
      - THE AGILE WAY (bringing BI to the data): Question! I see it already and I can load it myself. New insights within minutes.
  30. Incorta Unified Data & Analytics Platform
      - Data Acquisition (LOADER SERVICE): Connectors, Parallel Data Loader, Schema Detection, Direct Data Mapping
      - Data Enrichment (Advanced Analytics & Machine Learning): Data Science Notebooks, Custom Logic, Materialized Views, Machine Learning, Spark Cluster
      - Shared Storage: Metadata, Admin, Parquet Columnar Storage, Direct Data Map
  31. Incorta Unified Data & Analytics Platform
      - Data Acquisition (LOADER SERVICE): Connectors, Parallel Data Loader, Schema Detection, Direct Data Mapping
      - Data Enrichment (Advanced Analytics & Machine Learning): Data Science Notebooks, Custom Logic, Materialized Views, Machine Learning, Spark Cluster
      - Shared Storage: Metadata, Admin, Parquet Columnar Storage, Direct Data Map
      - Data Analytics (ANALYTICS SERVICE): In-Memory Analytics Engine, Business Views, Security, Data Visualization, SQL / Open Access
  32. Aligning Data Architecture to Business Needs
      - Data Management Body of Knowledge definition: "Data Architecture… …defines the blueprint for managing data assets by aligning with organizational strategy…"
  33. Blueprints Provide a Huge Head Start: Pre-Built Dashboard and Schemas Get You Up and Running Quickly on Enterprise Data
      - Diagram labels: Raw tables, Helper tables, Aggregated Business Views, Blueprints, Business logic
  34. Essential Components for Modern Data Architecture: From Raw Data to Actionable Insights
  35. Demo
  36. Q&A
  37. SEE YA LATER STAR SCHEMA: Find out why the world's most valuable companies rely on Incorta to acquire, enrich, analyze and act on data with unmatched speed. START YOUR CLOUD TRIAL TODAY: cloud.incorta.com/signup
  38. The Direct Data Platform™
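
Slide 19 states that columnar storage compresses the data. One reason, sketched below in Python, is that storing each column contiguously places repeated values next to each other, where even a simple scheme such as run-length encoding can collapse them. The rows and function names are hypothetical, and this illustrates the general idea only: real columnar formats such as Parquet use more elaborate encodings, and nothing here describes Incorta's implementation.

```python
from itertools import groupby

# Hypothetical row-oriented records: (region, product, dollars).
rows = [
    ("EU", "Widget", 100),
    ("EU", "Widget", 50),
    ("EU", "Gadget", 70),
    ("US", "Gadget", 30),
    ("US", "Gadget", 20),
]


def to_columns(records):
    """Pivot row tuples into one list per column."""
    return [list(column) for column in zip(*records)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


region, product, dollars = to_columns(rows)
encoded_region = run_length_encode(region)
encoded_product = run_length_encode(product)

print(encoded_region)
print(encoded_product)
```

In this sketch the five-value `region` column shrinks to two pairs and decodes back without loss, while the `dollars` column has no repeats and would gain nothing, which is why low-cardinality columns tend to benefit most.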
