What is a Data Warehouse?
A data warehouse (or data mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization. That is why many data warehouses are considered to be DSS (Decision-Support Systems).
Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data.
A data warehouse stores current and historical data.
OLTP: a standard, normalized database structure.

Star schema

  • 9. With each transaction, the indexes on a table must be updated along with the table itself. This overhead can significantly decrease performance.
  • 10. There are some disadvantages to an OLTP structure, especially when we go to retrieve the data for analysis.
  • 11. For one, we now must utilize joins and query multiple tables to get all the data we want. Joins tend to be slower than reading from a single table, so we want to minimize the number of tables in any single query.
  • 12. One of the advantages of OLTP is also a disadvantage: fewer indexes per table.
  • 13. In general terms, the fewer indexes we have, the faster inserts, updates, and deletes will be.
  • 14. However, again in general terms, the fewer indexes we have, the slower select queries will run.
  • 15. Since one of our design goals to speed transactions is to minimize the number of indexes, we are limiting ourselves when it comes to doing data retrieval.
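The index trade-off described above can be observed with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical, and other database engines report their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT amount FROM orders WHERE customer_id = 7"


def plan(connection, sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


# Without an index, the select has to scan the whole table.
plan_without_index = plan(conn, QUERY)

# Adding an index lets the select search instead of scan, but every later
# insert, update, or delete on orders must now maintain the index as well.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

The exact plan wording depends on the SQLite version, so the sketch only prints it; the point is that the second plan names the index and the first does not.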
  • 16.
  • 17. It is called a star schema because the entity-relationship diagram between the dimension tables and the fact table resembles a star, where one fact table is connected to multiple dimensions.
  • 18.
  • 19. Identify measures or facts (for example, sales dollars).
  • 20. Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
  • 21. List the columns that describe each dimension (for example, region name and branch name).
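The three design steps above can be sketched as a small schema using Python's built-in sqlite3 module. Every table and column name here (`sales_fact`, `product_dim`, `location_dim`, `time_dim`, and so on) is a hypothetical choice for illustration, the sample rows are invented, and the organization dimension is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Steps 2 and 3: one table per dimension, with its descriptive columns.
    CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, branch_name TEXT, region_name TEXT);
    CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, calendar_day TEXT, quarter TEXT);

    -- Step 1: the measure (sales dollars) lives in the fact table, which
    -- points at every dimension, giving the diagram its star shape.
    CREATE TABLE sales_fact (
        product_key   INTEGER REFERENCES product_dim (product_key),
        location_key  INTEGER REFERENCES location_dim (location_key),
        time_key      INTEGER REFERENCES time_dim (time_key),
        sales_dollars REAL
    );

    INSERT INTO product_dim  VALUES (1, 'Milk', 'Dairy Products'), (2, 'Bread', 'Bakery');
    INSERT INTO location_dim VALUES (1, 'Downtown', 'North'), (2, 'Harbor', 'South');
    INSERT INTO time_dim     VALUES (1, '2024-01-15', 'Q1'), (2, '2024-04-15', 'Q2');
    INSERT INTO sales_fact   VALUES (1, 1, 1, 10.0), (1, 2, 1, 5.0), (2, 1, 2, 7.5);
    """
)

# A typical star query: join the fact table to a dimension and aggregate.
sales_by_region = conn.execute(
    """
    SELECT l.region_name, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.region_name
    ORDER BY l.region_name
    """
).fetchall()
print(sales_by_region)  # [('North', 17.5), ('South', 5.0)]
```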
  • 22.
  • 23. In a star schema, a dimension table does not have any parent table.
  • 24. In a snowflake schema, by contrast, a dimension table has one or more parent tables.
  • 25. In a star schema, the hierarchies for a dimension are stored in the dimension table itself.
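As a rough illustration of the difference, the sketch below takes a snowflake-style product dimension, where the hierarchy sits in parent tables, and flattens it into a star-style dimension that carries the hierarchy in each row. The data, the field names, and the helper `flatten_to_star` are all hypothetical.

```python
# Snowflake-style layout (hypothetical data): the product dimension points
# to a parent subcategory table, which points to a parent category table.
categories = {1: "Dairy Products"}
subcategories = {10: {"name": "Cheese", "category_id": 1}}
snowflake_products = [
    {"product_key": 100, "name": "Cheddar", "subcategory_id": 10},
    {"product_key": 101, "name": "Brie", "subcategory_id": 10},
]


def flatten_to_star(products, subcategories, categories):
    """Copy the hierarchy names into each product row, removing parent lookups."""
    star_rows = []
    for product in products:
        subcategory = subcategories[product["subcategory_id"]]
        star_rows.append(
            {
                "product_key": product["product_key"],
                "name": product["name"],
                "subcategory": subcategory["name"],
                "category": categories[subcategory["category_id"]],
            }
        )
    return star_rows


star_product_dim = flatten_to_star(snowflake_products, subcategories, categories)
for row in star_product_dim:
    print(row)
```

The star version repeats "Cheese" and "Dairy Products" on every row, which costs some storage but removes the joins to parent tables at query time.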
  • 26.
  • 27. When I talk about “by” conditions, I am referring to looking at data by certain conditions.
  • 28. For example, if we take the question “On a quarterly and then monthly basis, are Dairy Product sales cyclical?” we can break this down into this: “We want to see total sales by category (just Dairy Products in this case), by quarter or by month.”
  • 29. Here we are looking at an aggregated value, the sum of sales, by specific criteria.
  • 30. When we talk about the way we want to look at data, we usually want to see some sort of aggregated data. These data are called measures.
  • 31. These measures are numeric values that are measurable and additive.
  • 32. We need to look at our measures using those “by” conditions. These “by” conditions are called dimensions.
  • 33. When we say we want to know our sales dollars, we almost always mean by day, or by quarter, or by year.
  • 34. These “by” conditions map into dimensions: there is almost always a time dimension, and product and geographic dimensions are very common as well.
  • 35. Therefore, in designing a star schema, our first order of business is usually to determine the measures we want to see and the dimensions by which we want to see them.
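The Dairy Products question above amounts to summing one measure (sales) grouped by a "by" condition (quarter or month). A minimal sketch in plain Python, using invented sales records and a hypothetical helper `total_sales_by`:

```python
from collections import defaultdict

# Hypothetical sales records: (category, quarter, month, sales_dollars).
sales = [
    ("Dairy Products", "Q1", "Jan", 120.0),
    ("Dairy Products", "Q1", "Feb", 80.0),
    ("Dairy Products", "Q2", "Apr", 150.0),
    ("Bakery", "Q1", "Jan", 60.0),
]


def total_sales_by(records, category, period_index):
    """Sum the sales measure for one category, grouped by a time column."""
    totals = defaultdict(float)
    for record in records:
        if record[0] == category:
            totals[record[period_index]] += record[3]
    return dict(totals)


by_quarter = total_sales_by(sales, "Dairy Products", period_index=1)
by_month = total_sales_by(sales, "Dairy Products", period_index=2)
print(by_quarter)  # {'Q1': 200.0, 'Q2': 150.0}
print(by_month)    # {'Jan': 120.0, 'Feb': 80.0, 'Apr': 150.0}
```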
  • 36.
  • 37. The primary key of a dimension table is often just an identity column, consisting of an automatically incrementing number.
  • 38. (The value of the primary key is meaningless; our information is stored in the other fields.)
  • 39. These other fields contain the full descriptions of what we are after.
  • 40. For example, if we have a Product dimension (which is common) we have fields in it that contain the description, the category name, the sub-category name, etc.
  • 41. These fields do not contain codes that link us to other tables. Because the fields are the full descriptions, the dimension tables are often fat; they contain many large fields.
  • 42. Dimension tables are often short, however. We may have many products, but even so, the dimension table cannot compare in size to a normal fact table.
  • 44. Our dimension table might look something like this:
  • 45. Notice that both Category and Subcategory are stored in the table and not linked in through joined tables that store the hierarchy information.
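One possible shape for such a product dimension, sketched with Python's built-in sqlite3 module. The table name `product_dim`, its column names, and the sample rows are assumptions made for illustration; they are not taken from the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical product dimension. The key is an auto-incrementing number
# with no business meaning; Category and Subcategory are stored as full
# text in the same row rather than as codes pointing to other tables.
conn.execute(
    """
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY AUTOINCREMENT,
        description TEXT,
        category    TEXT,
        subcategory TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO product_dim (description, category, subcategory) VALUES (?, ?, ?)",
    [
        ("Aged cheddar, 200 g block", "Dairy Products", "Cheese"),
        ("Whole milk, 1 litre", "Dairy Products", "Milk"),
    ],
)

product_rows = conn.execute(
    "SELECT product_key, description, category, subcategory FROM product_dim ORDER BY product_key"
).fetchall()
for row in product_rows:
    print(row)
```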
  • 46.
  • 47. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables.
  • 48. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
  • 49. A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
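The composite key can be sketched as follows, again with sqlite3. The `sales_fact` table and its columns are hypothetical, and the dimension tables the keys would point to are omitted so the example stays focused on the key itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table: three foreign-key columns plus one fact column.
# The primary key is the combination of all three foreign keys.
conn.execute(
    """
    CREATE TABLE sales_fact (
        product_key   INTEGER NOT NULL,
        store_key     INTEGER NOT NULL,
        time_key      INTEGER NOT NULL,
        sales_dollars REAL,
        PRIMARY KEY (product_key, store_key, time_key)
    )
    """
)
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 10.0)")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 2, 12.5)")  # new day: allowed

try:
    # Same product, store, and day as the first row: rejected.
    conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

With this key, the table holds at most one row per combination of product, store, and day, which is what makes each row a single detail-level fact.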
  • 54.
  • 55. The measures are numeric and additive across some or all of the dimensions.
  • 56. For example, sales are numeric and we can look at total sales for a product, or category, and we can look at total sales by any time period.
  • 57. While the dimension tables are short and fat, the fact tables are generally long and skinny.
  • 58. They are long because they can hold the number of records represented by the product of the counts in all the dimension tables.
  • 59. In this schema, we have product, time and store dimensions. If we assume we have ten years of daily data, 200 stores, and we sell 500 products, we have a potential of 365,000,000 records (3650 days * 200 stores * 500 products). As you can see, this makes the fact table long.
  • 60. The fact table is skinny because of the fields it holds. The primary key is made up of foreign keys that have migrated from the dimension tables.
  • 61. These fields are just some sort of numeric value. In addition, our measures are also numeric. Therefore, the size of each record is generally much smaller than those in our dimension tables.
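The row-count arithmetic above can be checked in a few lines of Python. The byte sizes in the second half are an assumed row layout used only to illustrate "skinny"; real storage depends on the database engine.

```python
# Figures from the example above; leap days are ignored, as in the text.
days = 10 * 365          # ten years of daily data
stores = 200
products = 500

# Upper bound on fact rows: one row per (day, store, product) combination.
potential_rows = days * stores * products
print(f"{potential_rows:,}")  # 365,000,000

# Hypothetical row layout, only to illustrate why the table is "skinny":
# three 4-byte integer keys plus one 8-byte numeric measure.
bytes_per_row = 3 * 4 + 8
print(f"{potential_rows * bytes_per_row / 1e9:.1f} GB of raw row data")  # 7.3 GB
```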
  • 62.
  • 63. Non-additive: measures that cannot be added across any dimension (ratios and percentages are typical examples).
  • 64. Semi-additive: measures that can be added across some dimensions but not across others.
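A stock level is a common example of a semi-additive measure: it can be summed across stores but not across days. The sketch below uses invented inventory figures to show the difference.

```python
# Hypothetical end-of-day stock levels: (day, store, units_on_hand).
inventory = [
    ("Mon", "Store A", 40),
    ("Mon", "Store B", 25),
    ("Tue", "Store A", 35),
    ("Tue", "Store B", 30),
]

# Summing across the store dimension is meaningful: units on hand on Tuesday.
tuesday_total = sum(units for day, _, units in inventory if day == "Tue")

# Summing across the time dimension is not: Store A never held 75 units.
misleading_sum = sum(units for _, store, units in inventory if store == "Store A")

# Across time, a semi-additive measure is usually averaged or taken at period end.
store_a_levels = [units for _, store, units in inventory if store == "Store A"]
store_a_average = sum(store_a_levels) / len(store_a_levels)

print(tuesday_total, misleading_sum, store_a_average)  # 65 75 37.5
```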