What is a Data Warehouse?
A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization. That is why many data warehouses are considered to be DSS (Decision-Support Systems).
Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data.
A data warehouse stores current and historical data.
OLTP:
This is a standard, normalized database structure.

Star schema

  • 1.
  • 2. A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization. That is why many data warehouses are considered to be DSS (Decision-Support Systems).
  • 3. Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data.
  • 5.
  • 6. This is a standard, normalized database structure.
  • 7.
  • 8.
  • 9. Therefore, with each transaction, these indexes must be updated along with the table. This overhead can significantly decrease our performance.
  • 10. There are some disadvantages to an OLTP structure, especially when we go to retrieve the data for analysis.
  • 11. For one, we now must use joins and query multiple tables to get all the data we want. Joins tend to be slower than reading from a single table, so we want to minimize the number of tables in any single query.
  • 12. One of the advantages of OLTP is also a disadvantage: fewer indexes per table.
  • 13. In general terms, the fewer indexes we have, the faster inserts, updates, and deletes will be.
  • 14. However, again in general terms, the fewer indexes we have, the slower select queries will run.
  • 15. Since one of our design goals to speed transactions is to minimize the number of indexes, we are limiting ourselves when it comes to doing data retrieval.
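The join cost described above can be made concrete with a minimal sketch. It assumes a hypothetical normalized layout (the `category`, `product`, and `order_line` tables and their rows are invented for illustration, not taken from the slides) and uses Python's built-in `sqlite3` module. Getting total sales by category already requires two joins.

```python
import sqlite3

# Hypothetical normalized (OLTP-style) tables; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category   (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product    (product_id  INTEGER PRIMARY KEY, name TEXT,
                             category_id INTEGER REFERENCES category(category_id));
    CREATE TABLE order_line (order_id    INTEGER,
                             product_id  INTEGER REFERENCES product(product_id),
                             amount      REAL);
    INSERT INTO category   VALUES (1, 'Dairy Products'), (2, 'Beverages');
    INSERT INTO product    VALUES (10, 'Milk', 1), (11, 'Cheese', 1), (20, 'Tea', 2);
    INSERT INTO order_line VALUES (100, 10, 2.5), (100, 20, 4.0), (101, 11, 6.0);
""")

# The category name is two tables away from the sales amount,
# so the analytical query has to join all three tables.
oltp_totals = dict(conn.execute("""
    SELECT c.name, SUM(ol.amount)
    FROM order_line AS ol
    JOIN product  AS p ON p.product_id  = ol.product_id
    JOIN category AS c ON c.category_id = p.category_id
    GROUP BY c.name
""").fetchall())
print(oltp_totals)
```

In a real OLTP schema the chain of tables is usually longer, which is the retrieval cost the slides are pointing at.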
  • 16.
  • 17. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star, where one fact table is connected to multiple dimensions.
  • 18.
  • 19. Identify measures or facts (sales dollar).
  • 20. Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
  • 21. List the columns that describe each dimension (region name, branch name, region name).
  • 22.
  • 23. In a star schema, a dimension table will not have any parent table.
  • 24. In a snowflake schema, by contrast, a dimension table will have one or more parent tables.
  • 25. In a star schema, the hierarchies for a dimension are stored in the dimension table itself.
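The difference between the two layouts can be sketched side by side. This is an illustrative example only: the table names (`star_dim_product`, `snow_dim_product`, `snow_dim_category`) and the single sample row are assumptions, and SQLite is used purely because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: the hierarchy level (category) lives in the dimension table itself.
    CREATE TABLE star_dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_name TEXT
    );

    -- Snowflake: the dimension points at a parent table that holds the hierarchy level.
    CREATE TABLE snow_dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE snow_dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category_key  INTEGER REFERENCES snow_dim_category(category_key)
    );

    -- Hypothetical sample data.
    INSERT INTO star_dim_product  VALUES (1, 'Milk', 'Dairy Products');
    INSERT INTO snow_dim_category VALUES (1, 'Dairy Products');
    INSERT INTO snow_dim_product  VALUES (1, 'Milk', 1);
""")

# Star: one table is enough to read the product together with its category.
star_row = conn.execute(
    "SELECT product_name, category_name FROM star_dim_product"
).fetchone()

# Snowflake: the same information needs a join to the parent table.
snow_row = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM snow_dim_product AS p
    JOIN snow_dim_category AS c ON c.category_key = p.category_key
""").fetchone()

print(star_row, snow_row)
```

Both queries return the same information; the star layout trades some redundancy in the dimension table for one fewer join.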
  • 26.
  • 27. When I talk about “by” conditions, I am referring to looking at data by certain conditions.
  • 28. For example, if we take the question “On a quarterly and then monthly basis, are Dairy Product sales cyclical?” we can break this down into this: “We want to see total sales by category (just Dairy Products in this case), by quarter or by month.”
  • 29. Here we are looking at an aggregated value, the sum of sales, by specific criteria.
  • 30. When we talk about the way we want to look at data, we usually want to see some sort of aggregated data. These data are called measures.
  • 31. These measures are numeric values that are measurable and additive.
  • 32. We need to look at our measures using those “by” conditions. These “by” conditions are called dimensions.
  • 33. When we say we want to know our sales dollars, we almost always mean by day, or by quarter, or by year.
  • 34. These “by” conditions map onto dimensions: there is almost always a time dimension, and product and geographic dimensions are very common as well.
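As a small illustration of a measure viewed through “by” conditions, the sketch below sums a sales measure by category and by quarter in plain Python. The sample rows and the helper name `total_sales_by_quarter` are hypothetical and exist only to make the idea concrete.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows: (sale date, product category, sales dollars).
sales = [
    (date(2023, 1, 15), "Dairy Products", 120.0),
    (date(2023, 2, 3), "Dairy Products", 80.0),
    (date(2023, 4, 20), "Dairy Products", 150.0),
    (date(2023, 4, 21), "Beverages", 60.0),
]


def total_sales_by_quarter(rows, category):
    """Sum the sales measure for one category, grouped by (year, quarter)."""
    totals = defaultdict(float)
    for sale_date, row_category, amount in rows:
        if row_category == category:
            quarter = (sale_date.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
            totals[(sale_date.year, quarter)] += amount
    return dict(totals)


dairy_by_quarter = total_sales_by_quarter(sales, "Dairy Products")
print(dairy_by_quarter)
```

Here the sales amount plays the role of the measure, while category and quarter are the dimensions it is sliced by.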
  • 35. Therefore, in designing a star schema, our first order of business is usually to determine
  • 36.
  • 37. This key is often just an identity column, consisting of an automatically incrementing number.
  • 38. (The value of the primary key is meaningless; our information is stored in the other fields.)
  • 39. These other fields contain the full descriptions of what we are after.
  • 40. For example, if we have a Product dimension (which is common) we have fields in it that contain the description, the category name, the sub-category name, etc.
  • 41. These fields do not contain codes that link us to other tables. Because the fields are the full descriptions, the dimension tables are often fat; they contain many large fields.
  • 42. Dimension tables are often short, however. We may have many products, but even so, the dimension table cannot compare in size to a normal fact table.
  • 44. Our dimension table might look something like this:
  • 45. Notice that both Category and Subcategory are stored in the table and not linked in through joined tables that store the hierarchy information.
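The table that originally accompanied these slides is not in the transcript, so the following is only a guess at its general shape. It sketches a product dimension with an auto-incrementing surrogate key and the category and subcategory stored as full text in the same table. The table name `dim_product`, the column names, and the rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_product (
        product_key      INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, value carries no meaning
        product_name     TEXT NOT NULL,
        category_name    TEXT NOT NULL,  -- hierarchy stored as text, not as a code into another table
        subcategory_name TEXT NOT NULL
    )
""")

# Hypothetical rows; the key column is left out so the database assigns it.
conn.executemany(
    "INSERT INTO dim_product (product_name, category_name, subcategory_name) "
    "VALUES (?, ?, ?)",
    [
        ("Whole Milk 1L", "Dairy Products", "Milk"),
        ("Cheddar 200g", "Dairy Products", "Cheese"),
    ],
)

dim_rows = conn.execute(
    "SELECT product_key, product_name, category_name, subcategory_name "
    "FROM dim_product ORDER BY product_key"
).fetchall()
for row in dim_rows:
    print(row)
```

Each row is wide because it carries the descriptive text, but there are only as many rows as there are products.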
  • 46.
  • 47. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables.
  • 48. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
  • 49. A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
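A minimal sketch of such a fact table follows, again with SQLite and invented names (`fact_sales`, `dim_product`, `dim_store`, `dim_time`). It declares the foreign keys and builds the composite primary key from them, then shows that a repeated key combination and a row pointing at a missing dimension member are both rejected. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name  TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name    TEXT);
    CREATE TABLE dim_time    (time_key    INTEGER PRIMARY KEY, calendar_date TEXT);

    CREATE TABLE fact_sales (
        product_key   INTEGER NOT NULL REFERENCES dim_product(product_key),
        store_key     INTEGER NOT NULL REFERENCES dim_store(store_key),
        time_key      INTEGER NOT NULL REFERENCES dim_time(time_key),
        sales_dollars REAL    NOT NULL,                    -- the fact (measure)
        PRIMARY KEY (product_key, store_key, time_key)     -- composite key made of the foreign keys
    );

    -- Hypothetical sample data.
    INSERT INTO dim_product VALUES (1, 'Milk');
    INSERT INTO dim_store   VALUES (1, 'Downtown');
    INSERT INTO dim_time    VALUES (1, '2023-01-15');
    INSERT INTO fact_sales  VALUES (1, 1, 1, 120.0);
""")


def try_insert(row):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert((1, 1, 1, 99.0))  # same product/store/time combination
orphan_accepted = try_insert((2, 1, 1, 50.0))     # product_key 2 does not exist in dim_product
print(duplicate_accepted, orphan_accepted)
```

Many warehouses relax or skip constraint enforcement at load time for speed; the constraints are shown here only to make the key structure visible.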
  • 50.
  • 54.
  • 55. The measures are numeric and additive across some or all of the dimensions.
  • 56. For example, sales are numeric and we can look at total sales for a product, or category, and we can look at total sales by any time period.
  • 57. While the dimension tables are short and fat, the fact tables are generally long and skinny.
  • 58. They are long because they can hold the number of records represented by the product of the counts in all the dimension tables.
  • 59. In this schema, we have product, time and store dimensions. If we assume we have ten years of daily data, 200 stores, and we sell 500 products, we have a potential of 365,000,000 records (3650 days * 200 stores * 500 products). As you can see, this makes the fact table long.
  • 60. The fact table is skinny because of the fields it holds. The primary key is made up of foreign keys that have migrated from the dimension tables.
  • 61. These fields are just some sort of numeric value. In addition, our measures are also numeric. Therefore, the size of each record is generally much smaller than those in our dimension tables.
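The "long and skinny" arithmetic can be checked directly. The row count uses the figures from the slides; the byte widths (4-byte integer keys, one 8-byte measure) are assumptions added here to give a rough size estimate, and real storage engines add per-row overhead on top.

```python
# Figures from the slides: ten years of daily data, 200 stores, 500 products.
days, stores, products = 3650, 200, 500

# Upper bound on fact rows: one row per (day, store, product) combination.
max_fact_rows = days * stores * products

# Assumed column widths (not from the slides): three 4-byte keys plus one 8-byte measure.
key_bytes, measure_bytes = 4, 8
bytes_per_row = 3 * key_bytes + measure_bytes

approx_size_gb = max_fact_rows * bytes_per_row / 1e9

print(f"{max_fact_rows:,} potential rows")
print(f"{bytes_per_row} bytes per row, about {approx_size_gb:.1f} GB of raw data")
```

In practice the table is sparser than this bound, since not every product sells in every store on every day.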
  • 62.
  • 63. Non-additive: measures that cannot be added across any dimension.
  • 64. Semi-additive: measures that can be added across some dimensions but not others.
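The three kinds of additivity can be illustrated with a small sketch. The examples are not from the slides: deposits stand in for an additive measure, account balances for a semi-additive one, and a margin percentage for a non-additive one. All rows are hypothetical.

```python
# Hypothetical daily snapshot rows: (day, account, balance, deposits made that day).
snapshots = [
    ("2023-01-01", "A", 100.0, 100.0),
    ("2023-01-01", "B", 50.0, 50.0),
    ("2023-01-02", "A", 130.0, 30.0),
    ("2023-01-02", "B", 50.0, 0.0),
]

# Additive: deposits can be summed across every dimension (days and accounts).
total_deposits = sum(deposit for _, _, _, deposit in snapshots)

# Semi-additive: balances can be summed across accounts for a single day...
balance_on_day2 = sum(bal for day, _, bal, _ in snapshots if day == "2023-01-02")
# ...but summing them across days counts the same money more than once.
misleading_sum_over_time = sum(bal for _, _, bal, _ in snapshots)

# Non-additive: a ratio has to be recomputed from its components, not summed.
sales = [(100.0, 60.0), (300.0, 240.0)]  # hypothetical (revenue, cost) pairs
margin_pcts = [(rev - cost) / rev for rev, cost in sales]
total_revenue = sum(rev for rev, _ in sales)
total_cost = sum(cost for _, cost in sales)
overall_margin = (total_revenue - total_cost) / total_revenue

print(total_deposits, balance_on_day2, misleading_sum_over_time)
print(margin_pcts, overall_margin)
```

The overall margin is neither the sum nor the simple average of the per-row percentages, which is why such measures are usually stored as their additive components.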