SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Open Source Data Warehousing:
     MySQL and Beyond



             Alex Meadows
          Twitter: @DBA_Alex
        Percona MySQL University
               Raleigh, NC
                1/29/2013
What Is Data Warehousing?
●   Central repository
●   Oriented on Reporting and Analysis
●   Integrates multiple sources
●   Core to Business Intelligence and Advanced
    Analytics
●   Helps keep source systems clean and lean
Warehouse Methodologies
●   Inmon’s 3NF/Hub and Spoke Model
●   Kimball’s Conformed Dimension Model
●   Linstedt’s Data Vault Model
●   Rönnbäck’s Anchor Model/6NF
Source: http://www.anchormodeling.com/wp-content/uploads/2011/05/Anchor-Modeling-GSE.pdf
Common DW Challenges
●   Data storage increases significantly
        ●   Time based snapshots
        ●   Storing source changes
●   Massive queries
        ●   Joining many tables, from multiple sources
        ●   Exploratory vs reporting
●   Source Issues Magnified
●   Scalability
Inmon’s 3NF Model
●   Original data warehouse model
●   Move historical data into own data store
●   Data transformed to 3NF
        ●   Entities and relationships
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum (PostgreSQL derivative)
●   Any other traditional RDBMS
Cautions
●   Indexing
●   Replication
●   Partitioning
Kimball’s Conformed Dimensions
●   Normal database modeling does not meet needs of
    reporting and analysis
●   Denormalize data
●   Dimensions
       ●    How does data need to be filtered?
●   Facts
       ●    What are we wanting to analyze/measure?
Source: http://blog-mstechnology.blogspot.com/2010/06/bi-dimensional-model-star-schema.html
Open Source Software
●   Greenplum (PostgreSQL derivative)
●   InfiniDB (MySQL derivative)
●   Infobright (MySQL derivative)
●   Other columnar data stores
Columnar Data Stores
●   Designed for conformed dimensions
●   High Performance
       ●   Self-indexing based on usage
       ●   High compression of data
Row vs Columnar Databases




Source: http://dbbest.com/blog/column-oriented-database-technologies/
Cautions
●   Traditional RDBMS
       ●   Not built for conformed dimensions!
       ●   Performance will become issue
Inmon’s Hub and Spoke
●   Combines
        ●   3NF central data warehouse
        ●   Conformed dimensions
●   Becomes foundation for further variants
●   Linstedt’s Data Vault Model
●   Mixes 3NF and Conformed Dimensions
●   Model data per business entities and their
    relationships
●   Hubs
        ●   Store unique business entity identifiers (keys)
●   Links
        ●   Relate hubs and other links to form relationships
●   Satellites
        ●   Store unique information regarding entity or
              relationship
Source: http://danlinstedt.com/about/data-vault-basics/
Cautions
●   While you get the best mix between 3NF and
    conformed dimensions, data marts are still needed
●   Issues seen with both 3NF and conformed
    dimensions can be found here
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum
●   Other Traditional RDBMS
●   NoSQL
       ●   Hadoop
●   Rönnbäck’s Anchor Model/6NF
●   Focus is on the data and it’s relationships.
●   Anchors
        ●   Model entities and events
●   Attributes
        ●   Model properties of anchors
●   Ties
        ●   Model relationships between anchors
●   Knots
        ●   Model relationships between shared properties
Source: http://en.wikipedia.org/wiki/Anchor_Modeling
Cautions
●   Number of joins will be an issue for some databases
●   Queries will become complex
        ●   Joins
        ●   Finding properties/valuable information
        ●   Every column in traditional tables becomes own
             unique table
?
Open Source Software
●   Anchor Modeling website
        ●   http://www.anchormodeling.com
        ●   Web based design tools
●   No databases built specifically for 6NF

Weitere ähnliche Inhalte

Was ist angesagt?

Graphing Your Data
Graphing Your DataGraphing Your Data
Graphing Your DataAlex Meadows
 
How Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryHow Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryAlex Meadows
 
Introduction to mongo db by zain
Introduction to mongo db by zainIntroduction to mongo db by zain
Introduction to mongo db by zainKenAndTea
 
Apache Marmotta (incubating)
Apache Marmotta (incubating)Apache Marmotta (incubating)
Apache Marmotta (incubating)Sergio Fernández
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.jsMax Neunhöffer
 
What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?FlyData Inc.
 
Semantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaSemantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaThomas Kurz
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsShawn Zhu
 
Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2Tsendsuren Munkhdalai
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model databaseMahdi Atawneh
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar PresentationMuntazir Mehdi
 

Was ist angesagt? (20)

NoSQL
NoSQLNoSQL
NoSQL
 
CSCi226PPT1
CSCi226PPT1CSCi226PPT1
CSCi226PPT1
 
Database
DatabaseDatabase
Database
 
Graphing Your Data
Graphing Your DataGraphing Your Data
Graphing Your Data
 
How Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryHow Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information Discovery
 
NoSQL
NoSQLNoSQL
NoSQL
 
Introduction to mongo db by zain
Introduction to mongo db by zainIntroduction to mongo db by zain
Introduction to mongo db by zain
 
Apache Marmotta (incubating)
Apache Marmotta (incubating)Apache Marmotta (incubating)
Apache Marmotta (incubating)
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.js
 
No sql
No sqlNo sql
No sql
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?
 
Semantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaSemantic Media Management with Apache Marmotta
Semantic Media Management with Apache Marmotta
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data Scientists
 
Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model database
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar Presentation
 
Core Data
Core DataCore Data
Core Data
 

Ähnlich wie Open source data_warehousing_overview

Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland Bouman
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBKnoldus Inc.
 
Heterogenous Persistence
Heterogenous PersistenceHeterogenous Persistence
Heterogenous PersistenceJervin Real
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLdatamantra
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresqlZaid Shabbir
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageBethmi Gunasekara
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...ScyllaDB
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesnicolascombin1
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL DatabasesAbiral Gautam
 
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...adeel8937
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 

Ähnlich wie Open source data_warehousing_overview (20)

Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
 
Big Data Pitfalls
Big Data PitfallsBig Data Pitfalls
Big Data Pitfalls
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Heterogenous Persistence
Heterogenous PersistenceHeterogenous Persistence
Heterogenous Persistence
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
 
Datastore PPT.pptx
Datastore PPT.pptxDatastore PPT.pptx
Datastore PPT.pptx
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
MongoDB
MongoDBMongoDB
MongoDB
 
NoSql
NoSqlNoSql
NoSql
 
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examples
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL Databases
 
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 

Mehr von Alex Meadows

Ethics In A Data Driven World
Ethics In A Data Driven WorldEthics In A Data Driven World
Ethics In A Data Driven WorldAlex Meadows
 
SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?Alex Meadows
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data WarehousingAlex Meadows
 
Continuous Integration As A Service
Continuous Integration As A ServiceContinuous Integration As A Service
Continuous Integration As A ServiceAlex Meadows
 
Building next generation data warehouses
Building next generation data warehousesBuilding next generation data warehouses
Building next generation data warehousesAlex Meadows
 
Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To AnalyticsAlex Meadows
 
Continuous integration with business intelligence and analytics
Continuous integration with business intelligence and analyticsContinuous integration with business intelligence and analytics
Continuous integration with business intelligence and analyticsAlex Meadows
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - IntroductionAlex Meadows
 
Open Source BI Overview
Open Source BI Overview Open Source BI Overview
Open Source BI Overview Alex Meadows
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business IntelligenceAlex Meadows
 
Data quality overview
Data quality overviewData quality overview
Data quality overviewAlex Meadows
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP OverviewAlex Meadows
 
Open Source Business Intelligence Overview
Open Source Business Intelligence OverviewOpen Source Business Intelligence Overview
Open Source Business Intelligence OverviewAlex Meadows
 
Choosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettleChoosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettleAlex Meadows
 

Mehr von Alex Meadows (14)

Ethics In A Data Driven World
Ethics In A Data Driven WorldEthics In A Data Driven World
Ethics In A Data Driven World
 
SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
 
Continuous Integration As A Service
Continuous Integration As A ServiceContinuous Integration As A Service
Continuous Integration As A Service
 
Building next generation data warehouses
Building next generation data warehousesBuilding next generation data warehouses
Building next generation data warehouses
 
Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To Analytics
 
Continuous integration with business intelligence and analytics
Continuous integration with business intelligence and analyticsContinuous integration with business intelligence and analytics
Continuous integration with business intelligence and analytics
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - Introduction
 
Open Source BI Overview
Open Source BI Overview Open Source BI Overview
Open Source BI Overview
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
 
Data quality overview
Data quality overviewData quality overview
Data quality overview
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP Overview
 
Open Source Business Intelligence Overview
Open Source Business Intelligence OverviewOpen Source Business Intelligence Overview
Open Source Business Intelligence Overview
 
Choosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettleChoosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettle
 

Kürzlich hochgeladen

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Open source data_warehousing_overview

  • 1. Open Source Data Warehousing: MySQL and Beyond Alex Meadows Twitter: @DBA_Alex Percona MySQL University Raleigh, NC 1/29/2013
  • 2. What Is Data Warehousing? ● Central repository ● Oriented on Reporting and Analysis ● Integrates multiple sources ● Core to Business Intelligence and Advanced Analytics ● Helps keep source systems clean and lean
  • 3. Warehouse Methodologies ● Inmon’s 3NF/Hub and Spoke Model ● Kimball’s Conformed Dimension Model ● Linstedt’s Data Vault Model ● Rönnbäck’s Anchor Model/6NF
  • 5. Common DW Challenges ● Data storage increases significantly ● Time based snapshots ● Storing source changes ● Massive queries ● Joining many tables, from multiple sources ● Exploratory vs reporting ● Source Issues Magnified ● Scalability
  • 6. Inmon’s 3NF Model ● Original data warehouse model ● Move historical data into own data store ● Data transformed to 3NF ● Entities and relationships
  • 7. Open Source Software ● MySQL ● PostgreSQL ● Greenplum (PostgreSQL derivative) ● Any other traditional RDBMS
  • 8. Cautions ● Indexing ● Replication ● Partitioning
  • 9. Kimball’s Conformed Dimensions ● Normal database modeling does not meet needs of reporting and analysis ● Denormalize data ● Dimensions ● How does data need to be filtered? ● Facts ● What are we wanting to analyze/measure?
  • 11. Open Source Software ● Greenplum (PostgreSQL derivative) ● InfiniDB (MySQL derivative) ● Infobright (MySQL derivative) ● Other columnar data stores
  • 12. Columnar Data Stores ● Designed for conformed dimensions ● High Performance ● Self-indexing based on usage ● High compression of data
  • 13. Row vs Columnar Databases Source: http://dbbest.com/blog/column-oriented-database-technologies/
  • 14. Cautions ● Traditional RDBMS ● Not built for conformed dimensions! ● Performance will become issue
  • 15. Inmon’s Hub and Spoke ● Combines ● 3NF central data warehouse ● Conformed dimensions ● Becomes foundation for further variants
  • 16. Linstedt’s Data Vault Model ● Mixes 3NF and Conformed Dimensions ● Model data per business entities and their relationships ● Hubs ● Store unique business entity identifiers (keys) ● Links ● Relate hubs and other links to form relationships ● Satellites ● Store unique information regarding entity or relationship
  • 18. Cautions ● While you get the best mix between 3NF and conformed dimensions, data marts are still needed ● Issues seen with both 3NF and conformed dimensions can be found here
  • 19. Open Source Software ● MySQL ● PostgreSQL ● Greenplum ● Other Traditional RDBMS ● NoSQL ● Hadoop
  • 20. Rönnbäck’s Anchor Model/6NF ● Focus is on the data and it’s relationships. ● Anchors ● Model entities and events ● Attributes ● Model properties of anchors ● Ties ● Model relationships between anchors ● Knots ● Model relationships between shared properties
  • 22. Cautions ● Number of joins will be an issue for some databases ● Queries will become complex ● Joins ● Finding properties/valuable information ● Every column in traditional tables becomes own unique table
  • 23. ?
  • 24. Open Source Software ● Anchor Modeling website ● http://www.anchormodeling.com ● Web based design tools ● No databases built specifically for 6NF