Data Warehouse
Agenda
 Data Warehouse architecture & building blocks
 ER modeling review
 Need for Dimensional Modeling
 Dimensional modeling and its internals
 Comparison of ER with dimensional modeling
Components
 Major components
 Source data component
 Data staging component
 Data storage component
 Information delivery component
 Metadata component
 Management and control component
1. Source Data Components
 Source data can be grouped into four categories
 Production data
   Comes from the operational systems of the enterprise
   Only selected segments are extracted
   Narrow scope, e.g. order details
 Internal data
   Private spreadsheets, documents, customer profiles, etc.
   E.g. customer profiles for a specific offering
   Needs special strategies to transform it (e.g. text documents) for the DW
 Archived data
   Old operational data is archived
   The DW holds snapshots of historical data
 External data
   Executives depend on external sources
   E.g. market data on competitors; a car rental company needs data on new vehicle manufacturing
   Define conversions so that external data fits the DW format
Architecture of DW
2. Data Staging Components
 After data is extracted, it must be prepared for the warehouse
 Data extracted from the sources needs to be changed, converted and made ready in a suitable format
 Three major functions make the data ready
   Extract
   Transform
   Load
 The staging area provides a place and a set of functions to
   Clean
   Change
   Combine
   Convert
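The extract, transform and load functions listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the two source systems, their field names and the sample rows are assumptions made up for the example, not part of the original slides.

```python
from datetime import datetime

# Hypothetical rows from two operational systems with different layouts.
SOURCE_A = [
    {"order_id": "A-1", "cust": " alice ", "amount": "19.90", "date": "2024-01-05"},
    {"order_id": "A-2", "cust": "BOB", "amount": "5.00", "date": "2024-01-06"},
]
SOURCE_B = [
    {"id": "B-7", "customer_name": "Alice", "total_cents": 1250, "day": "06/01/2024"},
]


def extract():
    """Extract: pull the raw rows from each source system."""
    return SOURCE_A, SOURCE_B


def transform(rows_a, rows_b):
    """Transform: clean, change, convert and combine into one common format."""
    staged = []
    for row in rows_a:
        staged.append({
            "order_id": row["order_id"],
            "customer": row["cust"].strip().title(),  # clean stray spaces and casing
            "amount": float(row["amount"]),           # convert text to a number
            "order_date": row["date"],                # already in ISO format
        })
    for row in rows_b:
        staged.append({
            "order_id": row["id"],
            "customer": row["customer_name"].strip().title(),
            "amount": row["total_cents"] / 100,       # change cents to currency units
            "order_date": datetime.strptime(row["day"], "%d/%m/%Y").date().isoformat(),
        })
    return staged  # both sources combined into a single list


def load(staged_rows, warehouse):
    """Load: append the prepared rows to the warehouse store."""
    warehouse.extend(staged_rows)
    return len(staged_rows)


warehouse = []
loaded = load(transform(*extract()), warehouse)
```

Here a plain list stands in for the warehouse; a real staging area would write to database tables instead.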
3. Data Storage Components
 Separate repository
 Data structured for efficient processing
 Redundancy is deliberately increased
 Updated periodically, not continuously
 Read-only for users
4. Information Delivery Component
 Authentication issues
 Active monitoring services
   Performance: the DBA notes which aggregates are selected in order to change storage
   User performance
   Aggregate awareness
 E.g. data mining, OLAP, etc.
DW Design
Background (ER Modeling)
 In ER modeling, entities are collected from the environment
 Each entity acts as a table
 Reasons for its success
   The model is normalized after ER design, which removes redundancy (to avoid update/delete anomalies)
   But the number of tables increases
   Useful for fast access to small amounts of data
ER Drawbacks for DW / Need for Dimensional Modeling
 ER models are hard to remember because of the large number of tables
 Queries spanning multiple tables are complex (table joins)
 Conventional RDBMSs are optimized for a small number of tables, whereas a DW might require a large number of tables
 Ideally no calculated attributes
 A DW does not need to update data the way an OLTP system does, so there is no need for normalization
 OLAP is not the only purpose of a DW; we need a model that facilitates data integration, data mining and historically consolidated data
 An efficient indexing scheme is needed to avoid scanning all the data
 De-normalization (in DW)
   Add primary key
   Direct relationships
   Re-introduce redundancy
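The effect of de-normalization can be illustrated with a small sketch using Python's built-in sqlite3 module. The tables, column names and rows below are hypothetical; the point is only that the de-normalized table answers the same question without joins, at the cost of repeating brand and category on every row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Normalized (ER-style) design: three tables, no repeated values
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE brand    (brand_id INTEGER PRIMARY KEY, brand_name TEXT,
                       category_id INTEGER REFERENCES category(category_id));
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, description TEXT,
                       brand_id INTEGER REFERENCES brand(brand_id));
INSERT INTO category VALUES (1, 'Beverages');
INSERT INTO brand    VALUES (10, 'Acme Cola Co', 1);
INSERT INTO product  VALUES (100, 'Cola 330ml', 10), (101, 'Cola 1l', 10);

-- De-normalized design: one wide table with its own key;
-- brand and category are stored redundantly on every product row
CREATE TABLE product_dim AS
SELECT p.product_id    AS product_key,
       p.description   AS description,
       b.brand_name    AS brand,
       c.category_name AS category
FROM product p
JOIN brand b    ON p.brand_id = b.brand_id
JOIN category c ON b.category_id = c.category_id;
""")

# Same question, two joins in the normalized model ...
normalized = con.execute("""
    SELECT p.description, c.category_name
    FROM product p
    JOIN brand b    ON p.brand_id = b.brand_id
    JOIN category c ON b.category_id = c.category_id
    ORDER BY p.product_id
""").fetchall()

# ... and no joins in the de-normalized one.
denormalized = con.execute(
    "SELECT description, category FROM product_dim ORDER BY product_key"
).fetchall()
```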
Dimensional Modeling
 Dimensional modeling focuses on subject orientation and the critical factors of the business
 Critical factors are stored in facts
 Redundancy is not a problem; it is accepted to achieve efficiency
 Logical design technique for high performance
 It is the modeling technique used for DW storage
Dimensional Modeling (cont.)
 Two important concepts
Fact
 Numeric measurements that represent a business activity/event
 Are pre-computed, redundant
 Example: profit, quantity sold
Dimension
 Qualifying characteristics that give a perspective on a fact
 Example: date (date, month, quarter, year)
Dimensional Modeling (cont.)
 Facts are stored in a fact table
 Dimensions are represented by dimension tables
 Dimensions are the perspectives along which facts can be judged
 Each fact table is surrounded by dimension tables
 The layout looks like a star, hence the name Star Schema
Example
TIME: time_key (PK), SQL_date, day_of_week, month
STORE: store_key (PK), store_ID, store_name, address, district, floor_type
CLERK: clerk_key (PK), clerk_id
PRODUCT: product_key (PK), SKU, description, brand, category
CUSTOMER: customer_key (PK), customer_name, purchase_profile, credit_profile, address
PROMOTION: promotion_key (PK), promotion_nam
FACT: time_key (FK), store_key (FK), clerk_key (FK), product_key (FK), customer_key (FK), promotion_key (FK), dollars_sold, units_sold
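A cut-down version of this star schema can be tried out with Python's built-in sqlite3 module. The sketch below is an assumption-laden illustration: it keeps only three of the six dimensions, the table names (time_dim, store_dim, product_dim, sales_fact) are chosen for the example, and all rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, sql_date TEXT,
                          day_of_week TEXT, month TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT, district TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku TEXT,
                          brand TEXT, category TEXT);

-- Fact table: foreign keys to each dimension plus the numeric facts
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    PRIMARY KEY (time_key, store_key, product_key)
);

INSERT INTO time_dim VALUES
    (1, '2024-01-05', 'Friday', 'January'),
    (2, '2024-02-03', 'Saturday', 'February');
INSERT INTO store_dim VALUES (1, 'Downtown', 'North'), (2, 'Mall', 'South');
INSERT INTO product_dim VALUES
    (1, 'SKU-1', 'Acme', 'Snacks'),
    (2, 'SKU-2', 'Zen', 'Drinks');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 10.0, 2),
    (1, 2, 2, 6.0, 3),
    (2, 1, 2, 4.0, 2),
    (2, 1, 1, 5.0, 1);
""")

# Typical star query: join the fact table to the dimensions of interest
# and aggregate the facts by dimension attributes.
rows = con.execute("""
    SELECT t.month, p.category, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact f
    JOIN time_dim t    ON f.time_key = t.time_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY t.month, p.category
    ORDER BY t.month, p.category
""").fetchall()
```

Every query follows the same shape: one join from the fact table to each dimension that is needed, never a chain of joins between dimensions.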
Inside Dimensional Modeling
 Inside a dimension table
   Key attribute for identification
   Large number of columns (wide table)
   Non-calculated, textual attributes
   Attributes are not directly related to one another
   Un-normalized in a star schema
   Drill-down and drill-up are two ways of exploiting dimensions
   Can have multiple hierarchies
   Relatively small number of records
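Drill-down and drill-up amount to aggregating the same facts at different levels of a dimension hierarchy. A minimal sketch in plain Python, assuming a hypothetical year > quarter > month hierarchy and invented sales rows:

```python
from collections import defaultdict

# Hypothetical hierarchy of the time dimension, coarsest level first.
HIERARCHY = ("year", "quarter", "month")

# Invented fact rows already joined to their time attributes.
SALES = [
    {"year": 2024, "quarter": "Q1", "month": "Jan", "units_sold": 5},
    {"year": 2024, "quarter": "Q1", "month": "Feb", "units_sold": 7},
    {"year": 2024, "quarter": "Q2", "month": "Apr", "units_sold": 4},
]


def aggregate(rows, level):
    """Sum units_sold grouped by all hierarchy attributes down to `level`."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[attr] for attr in HIERARCHY[:depth])
        totals[key] += row["units_sold"]
    return dict(totals)


by_year = aggregate(SALES, "year")        # drill-up: coarsest view
by_quarter = aggregate(SALES, "quarter")  # drill-down one level
by_month = aggregate(SALES, "month")      # drill-down to the finest level
```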
Inside Dimensional Modeling
 Inside a fact table
   Has two types of attributes
     Key attributes, for connections to the dimensions
     Facts
   Concatenated key
   Grain (level of detail) of the data is identified
   Large number of records
   Limited attributes
   Sparse data set
   Degenerate dimensions (e.g. order number, used to compute average products per order)
   Fact-less fact table
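The degenerate dimension mentioned above can be illustrated as follows. In this sketch the order number lives directly in the fact rows, with no dimension table of its own, and is used only for grouping. The row layout and the sample values are assumptions for the example.

```python
from collections import defaultdict

# Invented fact rows; order_number is a degenerate dimension
# (a key stored in the fact table with no dimension table behind it).
FACT_ROWS = [
    {"order_number": "O-1", "product_key": 1, "units_sold": 2},
    {"order_number": "O-1", "product_key": 2, "units_sold": 1},
    {"order_number": "O-1", "product_key": 3, "units_sold": 1},
    {"order_number": "O-2", "product_key": 1, "units_sold": 4},
]


def average_products_per_order(rows):
    """Average number of distinct products per order number."""
    products_by_order = defaultdict(set)
    for row in rows:
        products_by_order[row["order_number"]].add(row["product_key"])
    if not products_by_order:
        return 0.0
    return sum(len(p) for p in products_by_order.values()) / len(products_by_order)


avg_products = average_products_per_order(FACT_ROWS)
```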
Star Schema Keys
 Primary keys
   Identifying attribute in a dimension table
   Relationship attributes combine to form the P.K.
 Surrogate keys
   Replacement for the primary key
   System generated
 Foreign keys
   Collection of the primary keys of the dimension tables
 Primary key of the fact table
   System generated, or
   Collection of the dimension P.K.s
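One way to picture a system-generated surrogate key is a counter that hands out a new integer the first time a source-system key is seen and returns the same integer afterwards. The class name and the sample store identifiers below are hypothetical; real warehouses typically rely on database sequences or identity columns for this.

```python
import itertools


class SurrogateKeyGenerator:
    """Maps natural (source-system) keys to system-generated integer keys."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Assign a new surrogate key only the first time a natural key appears.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._counter)
        return self._by_natural_key[natural_key]


stores = SurrogateKeyGenerator()
first = stores.key_for("STORE-0042")
second = stores.key_for("STORE-0099")
repeat = stores.key_for("STORE-0042")  # same natural key, same surrogate key
```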
Advantage of Star Schema
 Easy for users to understand
 Optimized for navigation (fewer joins, so faster)
 Most suitable for query processing
Karen Corral, et al. (2006) The impact of alternative
diagrams on the accuracy of recall: A comparison of
star-schema diagrams and entity-relationship diagrams,
Decision Support Systems, 42(1), 450-468.
DATA WAREHOUSES AND DATA MARTS
 Bill Inmon stated, “The single most important issue facing the IT manager
this year is whether to build the data warehouse first or the data mart first.”
 This statement is true even today. Let us examine this statement and take a stand.
 Before deciding to build a data warehouse for your organization, you need to ask the following basic and fundamental questions and address the relevant issues:
 Top-down or bottom-up approach?
 Enterprise-wide or departmental?
 Which first—data warehouse or data mart?
 Build pilot or go with a full-fledged implementation?
 Dependent or independent data marts?
Data Granularity
Top-Down Versus Bottom-Up Approach
A Practical Approach
 In order to formulate an approach for your organization, you need to examine what exactly your organization wants. Is your organization looking for long-term results or fast data marts for only a few subjects for now? Does your organization want quick, proof-of-concept, throw-away implementations? Or, do you want to look into some other practical approach?
 Although both the top-down and the bottom-up approaches each have their own
advantages and drawbacks, a compromise approach accommodating both views
appears to be practical.
 The chief proponent of this practical approach is Ralph Kimball, an eminent
author and data warehouse expert. The steps in this practical approach are as
follows:
1. Plan and define requirements at the overall corporate level
2. Create a surrounding architecture for a complete warehouse
3. Conform and standardize the data content
4. Implement the data warehouse as a series of supermarts, one at a time
METADATA IN THE DATA WAREHOUSE
Types of Metadata
 Metadata in a data warehouse fall into three major categories:
 Operational Metadata
 Extraction and Transformation Metadata
 End-User Metadata
Operational Metadata
 As you know, data for the data warehouse comes from several operational
systems of the enterprise. These source systems contain different data structures.
 The data elements selected for the data warehouse have various field lengths and
data types.
 In selecting data from the source systems for the data warehouse, you split
records, combine parts of records from different source files, and deal with
multiple coding schemes and field lengths.
 When you deliver information to the end-users, you must be able to tie that back
to the original source data sets.
 Operational metadata contain all of this information about the operational data
sources.
Extraction and Transformation Metadata
 Extraction and transformation metadata contain data about the extraction of data
from the source systems, namely, the extraction frequencies, extraction methods,
and business rules for the data extraction.
 Also, this category of metadata contains information about all the data
transformations that take place in the data staging area.
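The kinds of information described for operational and extraction/transformation metadata could be recorded in a structure like the one below. This is only a sketch: the class names, field names and the sample source systems are all assumptions, not a standard metadata model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceField:
    """Operational metadata: one field in a source system."""
    system: str
    name: str
    data_type: str
    length: int


@dataclass
class ExtractionMetadata:
    """Extraction and transformation metadata for one warehouse column."""
    target_column: str
    sources: list        # SourceField entries the column is built from
    frequency: str       # extraction frequency
    method: str          # extraction method
    business_rule: str   # rule applied during extraction
    transformations: list = field(default_factory=list)

    def lineage(self):
        """Tie the warehouse column back to its original source fields."""
        return [f"{s.system}.{s.name}" for s in self.sources]


# Hypothetical entry for a customer name column.
customer_name_meta = ExtractionMetadata(
    target_column="customer_name",
    sources=[
        SourceField("billing", "CUST_NM", "CHAR", 30),
        SourceField("crm", "customer_name", "VARCHAR", 80),
    ],
    frequency="daily",
    method="incremental",
    business_rule="prefer the crm value when both systems have one",
    transformations=["trim whitespace", "convert to title case"],
)
```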
End-User Metadata
 The end-user metadata is the navigational map of the data warehouse.
 It enables the end-users to find information from the data warehouse.
 The end-user metadata allows the end-users to use their own business
terminologies.
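The "navigational map" idea can be sketched as a glossary that translates business terms into warehouse columns. The glossary entries and the table names (sales_fact, product_dim) are hypothetical choices for the illustration.

```python
# Hypothetical end-user metadata: business term -> (table, column).
BUSINESS_GLOSSARY = {
    "revenue": ("sales_fact", "dollars_sold"),
    "quantity": ("sales_fact", "units_sold"),
    "product family": ("product_dim", "category"),
}


def resolve(term):
    """Translate a business term into a qualified warehouse column name."""
    key = term.strip().lower()
    if key not in BUSINESS_GLOSSARY:
        raise KeyError(f"no end-user metadata for term: {term!r}")
    table, column = BUSINESS_GLOSSARY[key]
    return f"{table}.{column}"


revenue_column = resolve(" Revenue ")
```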
Significance
 Why is metadata especially important in a data warehouse?
 First, it acts as the glue that connects all parts of the data
warehouse.
 Next, it provides information about the contents and structures to
the developers.
 Finally, it opens the door to the end-users and makes the contents
recognizable in their own terms.
Exercise
 A data warehouse is subject-oriented. What would be the major critical
business subjects for the following companies?
 An international manufacturing company
 A local community bank
 A domestic hotel chain
 You are the data analyst on the project team building a data warehouse for
an insurance company. List the possible data sources from which you will
bring the data into your data warehouse. State your assumptions.
 For an airline company, identify three operational applications that would
feed into the data warehouse. What would be the data load and refresh
cycles?
 Prepare a table showing all the potential users and information delivery
methods for a data warehouse supporting a large national grocery chain.
Copyright © 2014 Pearson Education, Inc. Publishing as Prentice Hall
