An Introduction to Data Warehousing
Presented by Animesh Srivastava
In the beginning..
 In the early 80s the concept of the RDBMS ushered in an era of improved access to
the valuable information contained deep within data
 Our need for information grew exponentially, and we needed an efficient
Decision Support System (DSS) to run the business (OLAP)
 As data grew, OLTP systems became inefficient for complex query processing,
reporting and analytical needs
In the beginning..
 In the 1970s Bill Inmon, often called “The Father of the Data Warehouse”,
began defining and discussing the term “Data Warehouse”
 According to Inmon “Data Warehouse is a collection of integrated, subject-
oriented databases designed to support a DSS (decision support system),
where each unit of data is non-volatile and relevant to some moment in
time”
These concerns have existed for more than three decades:
 “We collect tons of data, but we can’t
access it.”
 “We need to slice and dice the data
every which way.”
 “Business people need to get at the
data easily.”
 “Just show me what is important.”
 “We spend entire meetings arguing
about who has the right numbers rather
than making decisions.”
 “We want people to use information to
support more fact-based decision
making.”
Why do we need a DWH?
 Consolidation of information
resources across multiple
platforms and even geographies in
a single place by extracting (E),
transforming (T) and finally
loading (L) the data
 Improved query performance
 Foundation of Data Mining, Data
Visualization, Advanced reporting
using BI, OLAP tools
What is DWH
used for?
 Information and Knowledge:
 Intelligent Reporting and analytics
 Studying the trends and nature of
business over time
 Predicting the future with the help of
data
 Help business take key decisions and
make strategies
 Manage the history of transactions
that happened within an organization
Examples:
What is DWH used
for?
Examples
 E-commerce
Cross-selling and product
suggestions, such as Amazon's
‘People also bought/viewed’
feature.
Companies often use customer data
to predict or influence customer
requirements, considering
parameters such as gender,
geography, age etc.
What is DWH used
for?
Examples
 Supermarkets
One notable example of this was
the US retailer Target. As part of its Data
Mining program, the company developed
rules to predict whether its shoppers were likely
to be pregnant. By looking at the contents of
its customers' shopping baskets, it could
spot customers who were likely
to be expecting and begin targeting
promotions for nappies, cotton wool and so
on. The prediction was reportedly so accurate that
Target made the news by sending
promotional coupons to a household before the
family knew about the pregnancy!
What is DWH used for?
Examples
 Crime Agencies (where is crime
most likely to happen and
when?)
Crime prevention agencies use
analytics and Data Mining to spot
trends across myriad data sources,
helping with decisions such as
where to deploy police manpower.
What is DWH used
for?
Examples
 Service Providers
Mobile phone and utilities companies
use Data Mining and Business
Intelligence to predict ‘churn’, the
term they use for when a customer
leaves their company to get their
phone/gas/broadband from another
provider. They collate billing
information, customer services
interactions, website visits and other
metrics to give each customer a
probability score, then target offers
and incentives to customers whom they
perceive to be at a higher risk of
churning.
Responsibilities of DW/BI Managers:
 Understand the business users and determine the decisions that the business
users want to make with the help of the DW/BI system.
 Deliver high-quality, relevant, and accessible information and analytics to the
business users:
 Produce robust, presentable and meaningful data
 Continuously monitor the accuracy of the data and analyses
 Adapt to changing user profiles, requirements, and business priorities, along with
the availability of new data sources
 Sustain the DW/BI environment by updating the DW/BI system on a regular
basis and justify staffing and on-going expenditures
Pre-requisites before diving in…
 Types of Databases:
 OLTP
 OLAP
 What is Normalization?
 Types (1NF, 2NF, 3NF)
 Keys (Primary, Foreign, Composite, Surrogate)
 Data Modelling
 Conceptual
 Logical
 Physical
 E-R models
 Dimensional Modelling
 Star Schema
 Fact Tables
 Dimension tables
 Slowly changing dimensions (SCD – I, SCD – II, SCD – III)
OLTP and OLAP Databases
 Types of Databases:
 OLTP - Online transaction processing
 OLAP - Online analytical processing
 One of the most important assets of any organization is its information. This
asset is almost always used for two purposes: operational record keeping and
analytical decision making.
 Simply speaking, the operational systems (OLTP) are where you put the data
in, and the DW/BI system (OLAP) is where you get the data out.
OLTP vs OLAP
OLTP
 OLTPs are the original source of
the data.
 To control and run fundamental
business tasks
 Reveals a snapshot of ongoing
business processes
 Highly normalized with many
tables
 Small storage footprint
 Volatile
OLAP
 OLAP data comes from the various
OLTP Databases
 To help with planning, problem
solving, and decision support
 Multi-dimensional views of various
kinds of business activities
 De-normalized, star schema
 Large storage footprint
 Non-volatile
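The contrast above can be sketched in a few lines. This is only an illustration, assuming a hypothetical `orders` table in an in-memory SQLite database; in practice the OLTP and OLAP workloads run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small writes, each touching a single row.
with conn:  # commits the transaction on success
    conn.execute("INSERT INTO orders VALUES (1, 'North', 120.0)")
    conn.execute("INSERT INTO orders VALUES (2, 'South', 80.0)")
    conn.execute("INSERT INTO orders VALUES (3, 'North', 40.0)")
    conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")

# OLAP-style work: one read that scans and aggregates many rows.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)
```

The writes are short and row-oriented, while the read summarizes the whole table, which is why the two workloads are usually tuned and stored differently.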
What is Normalization?
 Normalization is a database design technique that organizes tables in a
manner that reduces data redundancy and undesirable dependencies.
 It divides larger tables into smaller tables and links them using relationships.
 The three most commonly used normal forms are listed below; higher normal
forms (BCNF, 4NF, 5NF and 6NF) also exist.
 1NF
 2NF
 3NF
Types of
Normalization:
1NF
 Rules:
 Each table cell should
contain a single
value.
 Each record needs to
be unique
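A minimal sketch of the first rule, assuming a hypothetical orders list in which one cell holds several products. The field names are illustrative and do not come from the slides.

```python
# Hypothetical un-normalized rows: the "products" cell holds several values.
raw_orders = [
    {"order_id": 1, "customer": "Asha", "products": "pen, notebook"},
    {"order_id": 2, "customer": "Ravi", "products": "stapler"},
]

def to_1nf(rows):
    """Split multi-valued cells so that each row holds a single product."""
    flat = []
    for row in rows:
        for product in row["products"].split(","):
            flat.append({
                "order_id": row["order_id"],
                "customer": row["customer"],
                "product": product.strip(),
            })
    return flat

orders_1nf = to_1nf(raw_orders)
for row in orders_1nf:
    print(row)
```

After the split, the pair (order_id, product) identifies each record uniquely, which covers the second rule as well.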
Types of
Normalization:
2NF
 Rules:
 The entity should be
in 1NF
 All non-key attributes within the
entity should depend
on the whole unique
identifier of the entity, not on just a part of it
Types of Normalization:
3NF
(Figure: Product table and Brand table)
Rules:
• The entity should be in 2NF
• No non-key column should depend on any column other
than the key for the table (no transitive dependencies)
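The Product/Brand split can be sketched as follows. The column names (`brand_name`, `country` and so on) are assumptions for illustration, since the original tables were shown only as images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Brand attributes live in their own table, keyed by brand_id.
    CREATE TABLE brand (
        brand_id   INTEGER PRIMARY KEY,
        brand_name TEXT NOT NULL,
        country    TEXT NOT NULL
    );
    -- Product keeps only a reference to the brand, not the brand's attributes.
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        brand_id     INTEGER NOT NULL REFERENCES brand(brand_id)
    );
    INSERT INTO brand VALUES (1, 'Acme', 'India');
    INSERT INTO product VALUES (10, 'Pen', 1), (11, 'Notebook', 1);
""")

# Changing the brand's country now touches exactly one row.
conn.execute("UPDATE brand SET country = 'Japan' WHERE brand_id = 1")

rows_3nf = conn.execute("""
    SELECT p.product_name, b.brand_name, b.country
    FROM product p JOIN brand b ON b.brand_id = p.brand_id
    ORDER BY p.product_id
""").fetchall()
print(rows_3nf)
```

Had `country` been stored on every product row, the same change would have required updating each product of that brand.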
Types of Normalization: 3NF (contd.)
(Figure: Product brand table and its ER model)
Different types of keys
 Primary Key - It identifies each record uniquely in a table. A primary key does not allow null
values and keeps values unique throughout the column.
 Foreign Key - In a relationship between two tables, the primary key of one table is referenced
by a foreign key in the other table. A foreign key can hold duplicate values and can also hold
null values if the column is defined to accept nulls.
 Candidate Key - It can be selected as the primary key of the table. A table can have multiple
candidate keys, out of which one is selected as the primary key.
 Unique Key - It can contain only unique values, but unlike a primary key it also permits NULL values.
 Alternate Key - It is a candidate key that was not selected as the primary key of the table.
 Composite Key (also known as compound key or concatenated key) - It is a group of two or
more columns that together identify each row of a table uniquely. An individual column of a composite
key might not be able to identify the record uniquely on its own. It can be a primary key or a
candidate key.
 Super Key - It is a set of columns that uniquely identifies each row in a table. A super key may
hold some additional columns which are not strictly required to uniquely identify each row.
Primary keys and candidate keys are minimal super keys, i.e. super keys with no redundant columns.
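A short sketch of how some of these keys behave, assuming hypothetical `customer` and `order_line` tables in SQLite. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key
        email       TEXT UNIQUE            -- unique key (an alternate key here)
    );
    CREATE TABLE order_line (
        order_id    INTEGER NOT NULL,
        line_no     INTEGER NOT NULL,
        customer_id INTEGER REFERENCES customer(customer_id),  -- foreign key
        PRIMARY KEY (order_id, line_no)    -- composite key
    );
    INSERT INTO customer VALUES (1, 'a@example.com');
    INSERT INTO order_line VALUES (100, 1, 1), (100, 2, 1);
""")

def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

dup_primary = try_insert("INSERT INTO order_line VALUES (100, 1, 1)")      # repeats the composite key
bad_foreign = try_insert("INSERT INTO order_line VALUES (101, 1, 999)")    # no such customer
null_foreign = try_insert("INSERT INTO order_line VALUES (102, 1, NULL)")  # nullable foreign key
print(dup_primary, bad_foreign, null_foreign)
```

The duplicate composite key and the dangling foreign key are rejected, while the NULL foreign key is accepted because the column allows nulls.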
Data Modelling
 A data model is a conceptual representation of the database tables/objects that serves as
the blueprint of the whole schema.
 There are 3 types of data models, listed here from the most abstract to the
most detailed:
 Conceptual
 Logical
 Physical
Data Modelling
(Conceptual Model)
 The main aim of this model is to
establish the entities, their attributes,
and their relationships. It contains very few
details of the actual database
structure.
 The 3 basic tenets of a Data Model are
 Entity: A real-world thing (e.g.
Customer, Product)
 Attribute: Characteristics or
properties of an entity
 Relationship: Dependency or
association between two entities.
Data Modelling
(Logical Model)
 It defines the structure of the data
elements and sets the relationships
between them.
 The advantage of the Logical data
model is that it provides the
foundation for the Physical
model.
 At this Data Modeling level, no
primary or secondary key is
defined.
Data Modelling
(Physical Model)
 It describes the database specific
implementation of the data model.
 It offers an abstraction of the
database and helps generate
schema.
 This type of Data model also helps
to visualize database structure.
 It helps to model database columns,
keys, constraints, indexes,
triggers, and other RDBMS
features.
Entity Relationship Models/Diagrams
 Entity-relationship diagrams (ER diagrams or ERDs) are drawings that
communicate the relationships between tables.
 It is a type of flowchart that illustrates how “entities” such as people, objects
or concepts relate to each other within a system.
 ER Diagrams are most often used to design or debug relational databases.
 Also known as ERDs or ER Models, they use a defined set of symbols such as
rectangles, diamonds, ovals and connecting lines to depict the
interconnectedness of entities, relationships and their attributes.
Dimensional Modelling
 Dimensional modeling is widely accepted as the preferred technique for
presenting analytic data because it addresses two simultaneous requirements:
 Deliver data that’s understandable to the business users.
 Deliver fast query performance.
 Both 3NF and dimensional models can be represented in ERDs because both
consist of joined relational tables; the key difference between 3NF and
dimensional models is the degree of normalization.
 Normalized 3NF structures divide data into many discrete entities, each of
which becomes a relational table. A database of sales orders might start with
a record for each order line but turn into a complex spider web diagram as a
3NF model, perhaps consisting of hundreds of normalized tables.
Dimensional Modelling (contd.)
 Normalized 3NF structures are immensely useful in operational processing
(OLTP) because an update or insert transaction touches the database in only
one place.
 Normalized models, however, are too complicated for BI queries. Users can’t
understand, navigate, or remember normalized models that resemble a map
of a city.
 The complexity of users’ unpredictable queries overwhelms the database
optimizers, resulting in disastrous query performance.
 Fortunately, dimensional modeling addresses the problem of overly complex
schemas in the presentation area.
 A dimensional model contains the same information as a normalized model,
but packages the data in a format that delivers user understandability, query
performance, and resilience to change.
Dimensional Modelling
– Star Schema
 Dimensional models
implemented in relational
database management systems
are referred to as star schemas
because of their resemblance to
a star-like structure.
 The downside is that the
query-side benefits of dimensional
modelling come at a load
performance price, especially with large
data sets.
Dimensional Modelling
– Fact Tables
 The fact table in a dimensional
model stores the performance
measurements resulting from an
organization’s business process
events.
 You should strive to store the low-level
measurement data resulting
from a business process in a single
dimensional model. The level of detail of
each row is the grain (e.g. "balls per innings").
 Imagine standing in the marketplace
watching products being sold and
writing down the unit quantity and
dollar sales amount for each product
in each sales transaction.
Dimensional Modelling – Fact Tables
(contd.)
 Types of facts: additive, semi-additive, non-additive
 Additive - Additive measures can be summed across any of the dimensions
associated with the fact table (e.g. sales amount).
 Semi-additive - They can be summed across some dimensions, but not all; balance
amounts are common semi-additive facts because they are additive across all
dimensions except time.
 Non-additive - Some measures are completely non-additive, such as ratios,
percentages and percentiles.
 Despite their sparsity, fact tables usually make up 90 percent or more of the
total space consumed by a dimensional model.
 Fact tables tend to be deep in terms of the number of rows, but narrow in
terms of the number of columns.
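The difference between additive and semi-additive facts can be sketched with a few made-up rows. The accounts, dates and amounts below are hypothetical.

```python
from collections import defaultdict

# Hypothetical daily fact rows: (date, account, deposit_amount, end_of_day_balance)
facts = [
    ("2013-01-01", "A", 10, 110),
    ("2013-01-01", "B", 5, 55),
    ("2013-01-02", "A", 20, 130),
    ("2013-01-02", "B", 0, 55),
]

# Additive: deposits can be summed across every dimension, including time.
total_deposits = sum(row[2] for row in facts)

# Semi-additive: balances can be summed across accounts for a single date...
balance_by_date = defaultdict(int)
for date, _account, _deposit, balance in facts:
    balance_by_date[date] += balance

# ...but across time we take the latest value (or an average), never the sum.
closing_balance = balance_by_date[max(balance_by_date)]
naive_sum_over_time = sum(balance_by_date.values())  # double counts, not meaningful

print(total_deposits, dict(balance_by_date), closing_balance, naive_sum_over_time)
```

Summing the balances over both days gives 350, a figure that corresponds to no real amount of money, which is why balances are treated as semi-additive.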
Dimensional Modelling
– Dimension Tables
 The dimension tables contain the textual
context associated with a business process
measurement event.
 They describe the “who, what, where, when,
how, and why” associated with the event.
 Dimension tables often have many columns or
attributes.
 Dimension tables tend to have fewer rows
than fact tables, but can be wide with many
large text columns.
 Each dimension is defined by a single primary
key, which serves as the basis for referential
integrity with any given fact table to which it
is joined.
Dimensional Modelling
– Dimension Tables
 Dimension attributes serve as the primary
source of query constraints, groupings, and
report labels.
 Dimension attributes are critical to making
the DW/BI system usable and understandable.
 The analytic power of the DW/BI environment
is directly proportional to the quality and
depth of the dimension attributes.
 Instead of third normal form, dimension
tables typically are highly denormalized with
flattened many-to-one relationships within a
single dimension table.
 We can almost always trade off dimension
table space for simplicity and accessibility.
Facts and Dimensions joined in a star
schema
 The first thing to notice about the dimensional schema is its simplicity and
symmetry.
 The charm of the design is that it is highly recognizable to business users.
 Furthermore, the reduced number of tables and use of meaningful business
descriptors make it easy to navigate and less likely that mistakes will occur.
 The simplicity of a dimensional model also has performance benefits.
 Database optimizers process these simple schemas with fewer joins more
efficiently.
 Dimension attributes supply the report filters and labeling, whereas the fact
tables supply the report’s numeric values.
Facts and Dimensions
joined in a star
schema
SELECT st.district_name,
       pd.brand,
       SUM(sf.sales_dollars) AS "Sales Dollars"
FROM store st,       -- dimension table
     product pd,     -- dimension table
     date_dim dt,    -- dimension table (DATE is a reserved word in some databases)
     sales_facts sf  -- fact table
WHERE dt.month_name = 'January'  -- string literals take single quotes
  AND dt.year = 2013
  AND st.store_key = sf.store_key
  AND pd.product_key = sf.product_key
  AND dt.date_key = sf.date_key
GROUP BY st.district_name, pd.brand;
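To try a query of this shape end to end, here is a small sketch using an in-memory SQLite database. The sample rows are invented, the date dimension is named `date_dim` as an assumption, and explicit JOIN syntax is used in place of the comma-separated FROM list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store    (store_key INTEGER PRIMARY KEY, district_name TEXT);
    CREATE TABLE product  (product_key INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month_name TEXT, year INTEGER);
    CREATE TABLE sales_facts (
        store_key     INTEGER REFERENCES store(store_key),
        product_key   INTEGER REFERENCES product(product_key),
        date_key      INTEGER REFERENCES date_dim(date_key),
        sales_dollars REAL
    );
    INSERT INTO store    VALUES (1, 'North'), (2, 'South');
    INSERT INTO product  VALUES (1, 'Acme'), (2, 'Zenith');
    INSERT INTO date_dim VALUES (20130115, 'January', 2013), (20130210, 'February', 2013);
    INSERT INTO sales_facts VALUES
        (1, 1, 20130115, 100.0),
        (1, 1, 20130115, 50.0),
        (2, 2, 20130115, 30.0),
        (1, 2, 20130210, 999.0);
""")

# Dimensions supply the filters and labels; the fact table supplies the numbers.
query = """
    SELECT st.district_name, pd.brand, SUM(sf.sales_dollars) AS sales_dollars
    FROM sales_facts sf
    JOIN store st    ON st.store_key = sf.store_key
    JOIN product pd  ON pd.product_key = sf.product_key
    JOIN date_dim dt ON dt.date_key = sf.date_key
    WHERE dt.month_name = 'January' AND dt.year = 2013
    GROUP BY st.district_name, pd.brand
    ORDER BY st.district_name, pd.brand
"""
january_sales = conn.execute(query).fetchall()
print(january_sales)
```

The February row is excluded by the date dimension filter, and the two North/Acme rows are summed into one result line.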
Slowly changing dimensions
 It is a dimension that stores and manages both current and historical data
over time in a data warehouse. It is considered and implemented as one of
the most critical ETL tasks in tracking the history of dimension records.
 There are three commonly used types of SCDs. Tools such as Oracle Warehouse Builder
can define, deploy, and load all three types.
 Type 1 SCDs - Overwriting
 Type 2 SCDs - Creating another dimension record
 Type 3 SCDs - Creating a current value field
Slowly Changing
Dimensions (contd.)
 Type 1 - Overwriting the old
value. In this method no history
of dimension changes is kept in
the database. The old dimension
value is simply overwritten by
the new one.
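A minimal sketch of a Type 1 change, assuming a hypothetical customer dimension held in a Python dict keyed by surrogate key.

```python
# Hypothetical customer dimension keyed by surrogate key.
customer_dim = {1: {"customer_id": "C001", "city": "Pune"}}

def scd_type1_update(dim, surrogate_key, attribute, new_value):
    """Overwrite the attribute in place; the previous value is lost."""
    dim[surrogate_key][attribute] = new_value

scd_type1_update(customer_dim, 1, "city", "Mumbai")
print(customer_dim)
```

The dimension still has one row for the customer, and nothing records that the city was ever "Pune".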
Slowly Changing
Dimensions (contd.)
 Type 2 - Creating a new
additional record. In this
methodology all history of
dimension changes is kept in the
database. Changes in the
attributes are captured by
adding a new row with a new
surrogate key to the dimension
table.
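A sketch of a Type 2 change under some assumptions: the `valid_from`, `valid_to` and `is_current` columns are a common convention for marking row versions, not something the slides prescribe, and the data is invented.

```python
from datetime import date

# Hypothetical dimension rows; surrogate_key is the warehouse key,
# customer_id is the natural key from the source system.
customer_dim = [
    {"surrogate_key": 1, "customer_id": "C001", "city": "Pune",
     "valid_from": date(2012, 1, 1), "valid_to": None, "is_current": True},
]

def scd_type2_update(dim, customer_id, new_city, change_date):
    """Expire the current row and append a new row with a new surrogate key."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # nothing changed, keep the current row
            row["valid_to"] = change_date
            row["is_current"] = False
    dim.append({
        "surrogate_key": max((r["surrogate_key"] for r in dim), default=0) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })

scd_type2_update(customer_dim, "C001", "Mumbai", date(2013, 6, 1))
for row in customer_dim:
    print(row)
```

Fact rows loaded before the change keep pointing at surrogate key 1, so historical reports still show the old city.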
Slowly Changing
Dimensions (contd.)
 Type 3 - Adding a new column. In
this type usually only the current
and previous values of the dimension
are kept in the database. The new
value is loaded into the 'current/new'
column and the old one into the
'old/previous' column. Generally
speaking, the history is limited to
the number of columns created
for storing historical data. This is
the least commonly needed
technique.
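A sketch of a Type 3 change, assuming hypothetical `current_city` and `previous_city` columns on a customer dimension.

```python
# Hypothetical customer dimension with one "previous" column per tracked attribute.
customer_dim = {1: {"customer_id": "C001", "current_city": "Pune", "previous_city": None}}

def scd_type3_update(dim, surrogate_key, new_city):
    """Shift the current value into the 'previous' column, then store the new value."""
    row = dim[surrogate_key]
    row["previous_city"] = row["current_city"]
    row["current_city"] = new_city

scd_type3_update(customer_dim, 1, "Mumbai")
scd_type3_update(customer_dim, 1, "Delhi")
print(customer_dim)
```

After the second change only "Delhi" and "Mumbai" remain; "Pune" is gone, which illustrates how the history is capped by the number of columns.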
Data Marts
 A data mart is focused on a single functional
area of an organization and contains a subset
of data stored in a Data Warehouse.
 A data mart is a condensed version of Data
Warehouse and is designed for use by a
specific department, unit or set of users in an
organization. E.g., Marketing, Sales, HR or
finance.
 It is often controlled by a single department
in an organization.
 Data Mart usually draws data from only a few
sources compared to a Data warehouse.
 Data marts are small in size and are more
flexible compared to a Data warehouse.
A glimpse of Data Warehouse
Architecture
Data Warehouse Architecture
Data Warehouse Architecture (Inmon)
Data Warehouse Architecture (Kimball)
Data Warehouse Architectures (In Detail)
 Topics:
 Independent Data Mart Architecture
 Kimball Architecture
 Inmon Architecture (Hub-and-Spoke Corporate Information Factory)
 Key differences between Inmon and Kimball architectures
Independent Data
Mart Architecture
 Data is deployed on a departmental
basis without regard for sharing and
integrating information across the
enterprise.
 It is not recommended, but this approach
is prevalent, especially in large
organizations.
 It’s the path of least resistance for fast
development at relatively low cost, at
least in the short run.
 Multiple uncoordinated extracts from
the same operational sources and
redundant storage of analytic data are
inefficient and wasteful in the long run.
Kimball Architecture
 There are four separate and distinct components to consider in the DW/BI
environment:
 Operational source systems (OLTP)
 ETL system
 Data presentation area (Dimensional Model)
 Business intelligence applications.
Kimball Architecture (contd.)
 Operational Source Systems:
 These are the operational systems of record that capture the business’s
transactions (OLTP)
 Extract, Transformation, and Load System:
 Extraction is the first step in the process of getting data into the data warehouse
environment
 There are numerous potential transformations, such as:
 Cleansing the data, e.g. converting to standard date formats or correcting misspellings.
 Combining data from multiple sources, and de-duplicating data.
 The primary mission of the ETL system is to hand off the dimension and fact tables in
the delivery step, so these subsystems are critical.
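The transformations listed above can be sketched as follows. The two source extracts, their field names and their date formats are hypothetical, and a real ETL system would do far more.

```python
from datetime import datetime

# Hypothetical extracts from two source systems with different date formats.
crm_rows = [{"customer_id": "C001", "name": "Asha", "signup": "15/01/2013"}]
billing_rows = [
    {"customer_id": "C001", "name": "Asha", "signup": "2013-01-15"},
    {"customer_id": "C002", "name": "Ravi", "signup": "2013-02-10"},
]

def standardize_date(text):
    """Cleansing: convert known source formats to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def transform(*sources):
    """Combine sources, cleanse dates, and de-duplicate on customer_id."""
    seen = {}
    for rows in sources:
        for row in rows:
            cleaned = dict(row, signup=standardize_date(row["signup"]))
            seen.setdefault(cleaned["customer_id"], cleaned)  # keep the first occurrence
    return list(seen.values())

customer_dimension = transform(crm_rows, billing_rows)
for row in customer_dimension:
    print(row)
```

Keeping the first occurrence is only one possible de-duplication rule; which source wins a conflict is a business decision.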
Kimball Architecture (contd.)
 Data presentation area:
 The dimensional model with the star schema, which is the end product of the whole
ETL design.
 Business Intelligence Applications:
 By definition, all BI applications query the data in the DW/BI presentation area and
present it in a more meaningful and descriptive form using reporting
techniques such as pie charts, bar graphs, cross tabs etc.
Hub and spoke
Corporate Information
Factory (Inmon)
 This model identifies the key subject areas,
and most importantly, the key entities the
business operates with and cares about, like
customer, product, vendor, etc.
 A detailed logical model is created for each
major entity. Ex: A logical model will be built
for Customer with all the details related to
that entity.
 The key point here is that the entity structure
is built in normalized form (3NF), data
redundancy is avoided as much as possible.
 Data marts specific for departments are built
on top of the 3NF model and the data marts
can have de-normalized data to help with
reporting/ fast querying.
Kimball vs Inmon Architectures
Characteristics | Favours Kimball | Favours Inmon
Business decision support requirements | Tactical | Strategic
Data integration requirements | Individual business requirements | Enterprise-wide integration
The structure of data | KPI, business performance measures, scorecards… | Data that meet multiple and varied information needs and non-metric data
Persistence of data in source systems | Source systems are quite stable | Source systems have high rate of change
Skill sets | Small team of generalists | Bigger team of specialists
Time constraint | Urgent needs for the first data warehouse | Longer time is allowed to meet business’ needs
Cost to build | Low start-up cost | High start-up costs
Thank You!

VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 

Introduction to Data Warehousing

  • 2. In the beginning..  In the early 80s the concept of RDBMS ushered in an era of improved access to the valuable information contained deep within data  Our need for information grew exponentially and we needed an efficient Decision Support System (DSS) to run the business (OLAP)  As data volumes grew, OLTP systems became inefficient for complex query processing, reporting and analytical needs
  • 3. In the beginning..  In the early 70s Bill Inmon, also known as “The Father of Data Warehouse”, coined the term “Data Warehouse”  According to Inmon, “Data Warehouse is a collection of integrated, subject-oriented databases designed to support a DSS (decision support system), where each unit of data is non-volatile and relevant to some moment in time”
  • 4. These concerns have existed for more than three decades-  “We collect tons of data, but we can’t access it.”  “We need to slice and dice the data every which way.”  “Business people need to get at the data easily.”  “Just show me what is important.”  “We spend entire meetings arguing about who has the right numbers rather than making decisions.”  “We want people to use information to support more fact-based decision making.”
  • 5. Why we need DWH?  Consolidation of information resources across multiple platforms and even geographies in a single place by extracting (E), transforming (T) and finally loading (L)  Improved query performance  Foundation of Data Mining, Data Visualization, Advanced reporting using BI, OLAP tools
  • 7. What is DWH used for?  Information and Knowledge:  Intelligent Reporting and analytics  Studying the trends and nature of business over time  Predicting the future with the help of data  Help business take key decisions and make strategies  Manage the history of transactions that happened within an organization Examples:
  • 8. What is DWH used for? Examples  E-commerce Driving cross-sales by suggesting related products, as AMAZON does with features like ‘People also bought/viewed’. Companies often use customer data to predict or influence customer requirements, considering parameters such as gender, geography, age etc.
  • 9. What is DWH used for? Examples  Supermarkets One notable example of this was with the US retailer Target. As part of its Data Mining program, the company developed rules to predict if their shoppers were likely to be pregnant. By looking at the contents of their customers’ shopping baskets, they could spot customers who they thought were likely to be expecting and begin targeting promotions for nappies, cotton wool and so on. The prediction was so accurate that Target made the news by sending promotional coupons to families who did not yet realize they were pregnant!
  • 10. What is DWH used for? Examples  Crime Agencies (where is crime most likely to happen and when?) Crime prevention agencies use analytics and Data Mining to spot trends across large volumes of data, helping with decisions such as where to deploy police manpower.
  • 11. What is DWH used for? Examples  Service Providers: Mobile phone and utilities companies use Data Mining and Business Intelligence to predict ‘churn’, the term they use for when a customer leaves their company to get their phone/gas/broadband from another provider. They collate billing information, customer services interactions, website visits and other metrics to give each customer a probability score, then target offers and incentives to customers whom they perceive to be at a higher risk of churning.
  • 12. Responsibilities of DW/BI Managers:  Understand the business users and determine the decisions that the business users want to make with the help of the DW/BI system.  Deliver high-quality, relevant, and accessible information and analytics to the business users:  Produce robust, presentable and meaningful data  Continuously monitor the accuracy of the data and analyses  Adapt to changing user profiles, requirements, and business priorities, along with the availability of new data sources  Sustain the DW/BI environment by updating the DW/BI system on a regular basis and justify staffing and on-going expenditures
  • 13. Pre-requisites before diving in…  Types of Databases:  OLTP  OLAP  What is Normalization?  Types (1NF, 2NF, 3NF)  Keys (Primary, Foreign, Composite, Surrogate)  Data Modelling  Conceptual  Logical  Physical  E-R models  Dimensional Modelling  Star Schema  Fact Tables  Dimension tables  Slowly changing dimensions (SCD – I, SCD – II, SCD – III)
  • 14. OLTP and OLAP Databases  Types of Databases:  OLTP - Online transactional processing  OLAP - Online analytical processing  One of the most important assets of any organization is its information. This asset is almost always used for two purposes: operational record keeping and analytical decision making.  Simply speaking, the operational systems (OLTP) are where you put the data in, and the DW/BI system (OLAP) is where you get the data out.
  • 15. OLTP vs OLAP
    OLTP | OLAP
    OLTPs are the original source of the data | OLAP data comes from the various OLTP Databases
    To control and run fundamental business tasks | To help with planning, problem solving, and decision support
    Reveals a snapshot of ongoing business processes | Multi-dimensional views of various kinds of business activities
    Highly normalized with many tables | De-normalized, star schema
    Low in space | High space
    Volatile | Non-volatile
  • 16. What is Normalization?  Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data.  It divides larger tables into smaller tables and links them using relationships.  The three most commonly used normal forms are listed below; higher normal forms (such as BCNF, 4NF and 5NF) also exist.  1NF  2NF  3NF
  • 17. Types of Normalization: 1NF  Rules:  Each table cell should contain a single value.  Each record needs to be unique
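The two 1NF rules can be illustrated with a minimal sketch. The order records, column names and the comma-separated `products` cell below are hypothetical, invented only to show a multi-valued cell being split into single-valued rows.

```python
# Hypothetical un-normalized rows: the "products" cell holds several values.
raw_orders = [
    {"order_id": 1, "customer": "Asha", "products": "pen, notebook"},
    {"order_id": 2, "customer": "Ravi", "products": "stapler"},
]


def to_first_normal_form(rows):
    """Emit one row per (order_id, product) so every cell holds a single value."""
    normalized = []
    for row in rows:
        for product in row["products"].split(","):
            normalized.append(
                {
                    "order_id": row["order_id"],
                    "customer": row["customer"],
                    "product": product.strip(),
                }
            )
    return normalized


orders_1nf = to_first_normal_form(raw_orders)
for row in orders_1nf:
    print(row)
```

After the split, the pair (order_id, product) identifies each record uniquely, which covers the second rule for this sample data.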
  • 18. Types of Normalization: 2NF  Rules:  The entity should be in 1NF  All non-key attributes within the entity should depend on the whole unique identifier (primary key) of the entity, not on just part of it
  • 19. Types of Normalization: 3NF  Product table  Brand table Rules: • The entity should be in 2NF • No non-key column should depend on any column other than the key of the table (no transitive dependencies)
  • 20. Types of Normalization: 3NF  Product brand table  ER model:
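As a rough sketch of the product/brand split in 3NF, the snippet below uses Python's built-in sqlite3 module. The column names and sample rows are assumptions made for illustration; the slides only name a product table and a brand table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Brand attributes live in one place only (hypothetical columns).
CREATE TABLE brand (
    brand_id   INTEGER PRIMARY KEY,
    brand_name TEXT NOT NULL,
    country    TEXT NOT NULL
);
-- Product rows point at a brand instead of repeating its attributes.
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    brand_id     INTEGER NOT NULL REFERENCES brand(brand_id)
);
INSERT INTO brand VALUES (1, 'Acme', 'US'), (2, 'Globex', 'DE');
INSERT INTO product VALUES (10, 'Anvil', 1), (11, 'Rocket', 1), (12, 'Widget', 2);
""")

# A brand-level change touches exactly one row, however many products share the brand.
conn.execute("UPDATE brand SET country = 'CA' WHERE brand_id = 1")

rows = conn.execute("""
    SELECT p.product_name, b.brand_name, b.country
    FROM product p
    JOIN brand b ON b.brand_id = p.brand_id
    ORDER BY p.product_id
""").fetchall()
for row in rows:
    print(row)
```

Because the brand's country depends on the brand and not on the product key, keeping it in the product table would be a transitive dependency; moving it out is what the 3NF rule asks for.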
  • 21. Different types of keys -  Primary Key - It identifies each record uniquely in a table. A primary key does not allow null values in the column and keeps values unique throughout the column.  Foreign Key - In a relationship between two tables, the primary key of one table is referenced as a foreign key in another table. A foreign key can have duplicate values and can also hold null values if the column is defined to accept nulls.  Candidate Key - It can be selected as the primary key of the table. A table can have multiple candidate keys, out of which one is selected as the primary key.  Unique Key - It can contain only unique values, but it also permits NULL values.  Alternate Key - It is a candidate key that was not selected as the primary key of the table.  Composite Key (also known as compound key or concatenated key) - It is a group of two or more columns that identifies each row of a table uniquely. An individual column of a composite key might not be able to uniquely identify the record. It can be a primary key or a candidate key.  Super Key - It is a set of columns that uniquely identifies each row in a table. A super key may hold additional columns which are not strictly required to uniquely identify each row. Primary keys and candidate keys are minimal super keys, that is, a subset of the super keys.
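The behaviour of several of these key types can be tried out with a small sketch using Python's built-in sqlite3 module. The `customer` and `order_line` tables and their columns are hypothetical, and NULL handling in unique columns follows SQLite's rules, which may differ in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key
    email       TEXT UNIQUE            -- unique key: no duplicates, NULL allowed
);
CREATE TABLE order_line (
    order_id    INTEGER NOT NULL,
    line_no     INTEGER NOT NULL,
    customer_id INTEGER REFERENCES customer(customer_id),  -- foreign key, nullable
    PRIMARY KEY (order_id, line_no)    -- composite key
);
INSERT INTO customer VALUES (1, 'a@example.com'), (2, NULL), (3, NULL);
INSERT INTO order_line VALUES (100, 1, 1), (100, 2, 1), (101, 1, NULL);
""")


def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Repeats the composite key (100, 1).
duplicate_pk = try_insert("INSERT INTO order_line VALUES (100, 1, 2)")
# Repeats an existing unique email.
duplicate_email = try_insert("INSERT INTO customer VALUES (4, 'a@example.com')")
# References a customer that does not exist.
orphan_fk = try_insert("INSERT INTO order_line VALUES (102, 1, 999)")
print(duplicate_pk, duplicate_email, orphan_fk)
```

Note that `order_id` alone repeats (100 appears twice) and so does `customer_id`; only the combination of `order_id` and `line_no` is unique, which is the point of a composite key.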
  • 23. Data Modelling  A data model is a conceptual representation of the database tables/objects which depicts the blueprint of the whole schema.  There are 3 types of data models, listed here from the most abstract to the most detailed:  Conceptual  Logical  Physical
  • 24. Data Modelling (Conceptual Model)  The main aim of this model is to establish the entities, their attributes, and their relationships. It contains very little detail about the actual database structure.  The 3 basic tenets of the Data Model are  Entity: A real-world thing (Customer & Product)  Attribute: Characteristics or properties of an entity  Relationship: Dependency or association between two entities.
  • 25. Data Modelling (Logical Model)  It defines the structure of the data elements and sets the relationships between them.  The advantage of the Logical data model is that it provides the foundation for the Physical model.  At this Data Modeling level, no primary or secondary key is defined.
  • 26. Data Modelling (Physical Model)  It describes the database specific implementation of the data model.  It offers an abstraction of the database and helps generate schema.  This type of Data model also helps to visualize database structure.  It helps to model database columns, keys, constraints, indexes, triggers, and other RDBMS features.
  • 27. Entity Relationship Models/Diagrams  Entity-relationship diagrams (ER diagrams or ERDs) are drawings that communicate the relationships between tables.  It is a type of flowchart that illustrates how “entities” such as people, objects or concepts relate to each other within a system.  ER Diagrams are most often used to design or debug relational databases.  Also known as ERDs or ER Models, they use a defined set of symbols such as rectangles, diamonds, ovals and connecting lines to depict the interconnectedness of entities, relationships and their attributes.
  • 29. Dimensional Modelling  Dimensional modeling is widely accepted as the preferred technique for presenting analytic data because it addresses two simultaneous requirements:  Deliver data that’s understandable to the business users.  Deliver fast query performance.  Both 3NF and dimensional models can be represented in ERDs because both consist of joined relational tables; the key difference between 3NF and dimensional models is the degree of normalization.  Normalized 3NF structures divide data into many discrete entities, each of which becomes a relational table. A database of sales orders might start with a record for each order line but turn into a complex spider web diagram as a 3NF model, perhaps consisting of hundreds of normalized tables.
  • 31. Dimensional Modelling (contd.)  Normalized 3NF structures are immensely useful in operational processing (OLTP) because an update or insert transaction touches the database in only one place.  Normalized models, however, are too complicated for BI queries. Users can’t understand, navigate, or remember normalized models that resemble a map of a city.  The complexity of users’ unpredictable queries overwhelms the database optimizers, resulting in disastrous query performance.  Fortunately, dimensional modeling addresses the problem of overly complex schemas in the presentation area.  A dimensional model contains the same information as a normalized model, but packages the data in a format that delivers user understandability, query performance, and resilience to change.
  • 32. Dimensional Modelling – Star Schema  Dimensional models implemented in relational database management systems are referred to as star schemas because of their resemblance to a star-like structure.  The downside of dimensional modelling is that you pay a load performance price for these capabilities, especially with large data sets.
  • 33. Dimensional Modelling – Fact Tables  The fact table in a dimensional model stores the performance measurements resulting from an organization’s business process events.  You should strive to store the low-level measurement data resulting from a business process in a single dimensional model (grain ex: "balls per innings").  Imagine standing in the marketplace watching products being sold and writing down the unit quantity and dollar sales amount for each product in each sales transaction.
  • 34. Dimensional Modelling – Fact Tables (contd.)  Types of facts: additive, semi-additive and non-additive:  Additive - Additive measures can be summed across any of the dimensions associated with the fact table. (sales amount)  Semi-Additive – They can be summed across some dimensions, but not all; balance amounts are common semi-additive facts because they are additive across all dimensions except time  Non-Additive - Some measures are completely non-additive, such as ratios, percentages and percentiles.  Despite their sparsity, fact tables usually make up 90 percent or more of the total space consumed by a dimensional model.  Fact tables tend to be deep in terms of the number of rows, but narrow in terms of the number of columns.
  • 35. Dimensional Modelling – Dimension Tables  The dimension tables contain the textual context associated with a business process measurement event.  They describe the “who, what, where, when, how, and why” associated with the event.  Dimension tables often have many columns or attributes.  Dimension tables tend to have fewer rows than fact tables, but can be wide with many large text columns.  Each dimension is defined by a single primary key, which serves as the basis for referential integrity with any given fact table to which it is joined.
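The difference between additive and semi-additive facts can be sketched with a few hypothetical fact rows. The dates, accounts and amounts are invented; the sketch assumes one row per account per day.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, account, sales_amount, balance)
facts = [
    ("2013-01-01", "A", 100.0, 500.0),
    ("2013-01-01", "B", 50.0, 200.0),
    ("2013-01-02", "A", 30.0, 530.0),
    ("2013-01-02", "B", 20.0, 220.0),
]

# Additive: sales can be summed across every dimension, including time.
total_sales = sum(row[2] for row in facts)

# Semi-additive: balances may be summed across accounts for a single date...
balance_by_date = defaultdict(float)
for day, _account, _sales, balance in facts:
    balance_by_date[day] += balance

# ...but across time they are averaged (or the closing value is taken), never summed.
average_daily_balance = sum(balance_by_date.values()) / len(balance_by_date)
closing_balance = balance_by_date[max(balance_by_date)]

print(total_sales, dict(balance_by_date), average_daily_balance, closing_balance)
```

Summing the four balance values directly would give 1450.0, a figure that corresponds to no real amount of money, which is why balances are treated as non-additive over time.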
  • 36. Dimensional Modelling – Dimension Tables  Dimension attributes serve as the primary source of query constraints, groupings, and report labels.  Dimension attributes are critical to making the DW/BI system usable and understandable.  The analytic power of the DW/BI environment is directly proportional to the quality and depth of the dimension attributes.  Instead of third normal form, dimension tables typically are highly denormalized with flattened many-to-one relationships within a single dimension table.  We can almost always trade off dimension table space for simplicity and accessibility.
  • 37. Facts and Dimensions joined in a star schema  The first thing to notice about the dimensional schema is its simplicity and symmetry.  The charm of the design is that it is highly recognizable to business users.  Furthermore, the reduced number of tables and use of meaningful business descriptors make it easy to navigate and less likely that mistakes will occur.  The simplicity of a dimensional model also has performance benefits.  Database optimizers process these simple schemas with fewer joins more efficiently.  Dimension attributes supply the report filters and labeling, whereas the fact tables supply the report’s numeric values.
  • 38. Facts and Dimensions joined in a star schema
    SELECT st.district_name,
           pd.brand,
           SUM(sf.sales_dollars) AS "Sales Dollars"
    FROM store st,        -- dimension table
         product pd,      -- dimension table
         date dt,         -- dimension table
         sales_facts sf   -- fact table
    WHERE dt.month_name = 'January'
      AND dt.year = 2013
      AND st.store_key = sf.store_key
      AND pd.product_key = sf.product_key
      AND dt.date_key = sf.date_key
    GROUP BY st.district_name,
             pd.brand
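To see the star-schema query return results, here is a minimal runnable sketch using Python's built-in sqlite3 module. The table layouts and sample rows are assumptions; the date dimension is named `date_dim` here because DATE is a reserved word in some databases, and the joins are written with explicit JOIN syntax, which is equivalent to the comma-style joins on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Three small dimension tables and one fact table (hypothetical sample data).
CREATE TABLE store    (store_key INTEGER PRIMARY KEY, district_name TEXT);
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month_name TEXT, year INTEGER);
CREATE TABLE sales_facts (
    store_key INTEGER, product_key INTEGER, date_key INTEGER, sales_dollars REAL
);
INSERT INTO store VALUES (1, 'North'), (2, 'South');
INSERT INTO product VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO date_dim VALUES (20130115, 'January', 2013), (20130210, 'February', 2013);
INSERT INTO sales_facts VALUES
    (1, 1, 20130115, 100.0),
    (1, 1, 20130115,  50.0),
    (2, 2, 20130115,  75.0),
    (1, 2, 20130210, 999.0);
""")

# Dimensions supply the filters and labels; the fact table supplies the numbers.
query = """
SELECT st.district_name, pd.brand, SUM(sf.sales_dollars) AS sales_dollars
FROM sales_facts sf
JOIN store st    ON st.store_key = sf.store_key
JOIN product pd  ON pd.product_key = sf.product_key
JOIN date_dim dt ON dt.date_key = sf.date_key
WHERE dt.month_name = 'January' AND dt.year = 2013
GROUP BY st.district_name, pd.brand
ORDER BY st.district_name, pd.brand
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

The February row is filtered out by the date dimension, and the two January sales for the North district and the Acme brand are rolled up into a single line of the report.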
  • 39. Slowly changing dimensions  It is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.  There are three types of SCDs and you can use Warehouse Builder to define, deploy, and load all three types of SCDs.  Type 1 SCDs - Overwriting  Type 2 SCDs - Creating another dimension record  Type 3 SCDs - Creating a current value field
  • 40. Slowly Changing Dimensions (contd.)  Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the database. The old dimension value is simply overwritten by the new one.
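A minimal sketch of a Type 1 change, assuming a hypothetical customer dimension held as a Python dictionary keyed by the business key:

```python
# Hypothetical customer dimension keyed by the natural (business) key.
customer_dim = {"C001": {"name": "Asha", "city": "Pune"}}


def apply_type1(dim, natural_key, changes):
    """Type 1: overwrite the attribute in place; the previous value is lost."""
    dim[natural_key].update(changes)


apply_type1(customer_dim, "C001", {"city": "Mumbai"})
print(customer_dim)
```

After the update there is still one row per customer and no trace of the old city, so reports on past facts will show the new value.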
  • 41. Slowly Changing Dimensions (contd.)  Type 2 - Creating a new additional record. In this methodology all history of dimension changes is kept in the database. Changes in the attributes are captured by adding a new row with a new surrogate key to the dimension table.
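A Type 2 change might be sketched as follows. The slide only states that a new row with a new surrogate key is added; the `effective_from`, `effective_to` and `is_current` columns are a common convention and are assumptions here, as are all names and values.

```python
from datetime import date

# Hypothetical dimension rows; the date range and current-row flag are assumed columns.
customer_dim = [
    {"customer_sk": 1, "customer_id": "C001", "city": "Pune",
     "effective_from": date(2012, 1, 1), "effective_to": None, "is_current": True},
]


def apply_type2(dim, customer_id, new_city, change_date):
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    current = next(
        r for r in dim if r["customer_id"] == customer_id and r["is_current"]
    )
    if current["city"] == new_city:
        return current  # nothing changed, keep the existing row
    current["effective_to"] = change_date
    current["is_current"] = False
    new_row = {
        "customer_sk": max(r["customer_sk"] for r in dim) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "effective_from": change_date,
        "effective_to": None,
        "is_current": True,
    }
    dim.append(new_row)
    return new_row


apply_type2(customer_dim, "C001", "Mumbai", date(2013, 6, 1))
for row in customer_dim:
    print(row)
```

Fact rows loaded before the change keep pointing at surrogate key 1 and facts loaded afterwards use surrogate key 2, which is how the full history stays queryable.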
  • 42. Slowly Changing Dimensions (contd.)  Type 3 - Adding a new column. In this type usually only the current and previous value of dimension is kept in the database. The new value is loaded into 'current/new' column and the old one into 'old/previous' column. Generally speaking the history is limited to the number of columns created for storing historical data. This is the least commonly needed technique.
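A Type 3 change can be sketched with one 'current' and one 'previous' column. The column names `current_city` and `previous_city` and the sample values are hypothetical.

```python
# Hypothetical dimension row with one "current" and one "previous" column.
customer_dim = {
    "C001": {"name": "Asha", "current_city": "Pune", "previous_city": None},
}


def apply_type3(dim, natural_key, new_city):
    """Type 3: shift the current value into the previous column, then store the new one."""
    row = dim[natural_key]
    row["previous_city"] = row["current_city"]
    row["current_city"] = new_city


apply_type3(customer_dim, "C001", "Mumbai")
apply_type3(customer_dim, "C001", "Delhi")
print(customer_dim)
```

After the second change the original city is no longer stored anywhere, which shows how the history is limited by the number of columns set aside for it.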
  • 43. Data Marts  A data mart is focused on a single functional area of an organization and contains a subset of data stored in a Data Warehouse.  A data mart is a condensed version of Data Warehouse and is designed for use by a specific department, unit or set of users in an organization. E.g., Marketing, Sales, HR or finance.  It is often controlled by a single department in an organization.  Data Mart usually draws data from only a few sources compared to a Data warehouse.  Data marts are small in size and are more flexible compared to a Data warehouse.
  • 44. A glimpse of Data Warehouse Architecture
  • 48. Data Warehouse Architectures (In Detail)  Topics:  Independent Data Mart Architecture  Kimball Architecture  Inmon Architecture (Hub-and-Spoke Corporate Information Factory)  Key differences between Inmon and Kimball architectures
  • 49. Independent Data Mart Architecture  Data is deployed on a departmental basis without concern for sharing and integrating information across the enterprise.  It is not the recommended approach, but it is prevalent, especially in large organizations.  It’s the path of least resistance for fast development at relatively low cost, at least in the short run.  Multiple uncoordinated extracts from the same operational sources and redundant storage of analytic data are inefficient and wasteful in the long run.
  • 50. Kimball Architecture  There are four separate and distinct components to consider in the DW/BI environment:  Operational source systems (OLTP)  ETL system  Data presentation area (Dimensional Model)  Business intelligence applications.
  • 52. Kimball Architecture (contd.)  Operational Source Systems:  These are the operational systems of record that capture the business’s transactions (OLTP)  Extract, Transformation, and Load System:  Extraction is the first step in the process of getting data into the data warehouse environment  There are numerous potential transformations, such as:  Cleansing the data, e.g. converting to standard date formats or correcting misspellings.  Combining data from multiple sources, and de-duplicating data.  The primary mission of the ETL system is to hand off the dimension and fact tables in the delivery step, so these subsystems are critical.
  • 53. Kimball Architecture (contd.)  Data presentation area:  This is the dimensional model with the star schema, which is the end product of the whole ETL design.  Business Intelligence Applications:  By definition, all BI applications query the data in the DW/BI presentation area to present it in a more meaningful and descriptive form using reporting techniques like pie charts, bar graphs, cross tabs etc.
  • 54. Hub and spoke Corporate Information Factory (Inmon)  This model identifies the key subject areas, and most importantly, the key entities the business operates with and cares about, like customer, product, vendor, etc.  A detailed logical model is created for each major entity. Ex: A logical model will be built for Customer with all the details related to that entity.  The key point here is that the entity structure is built in normalized form (3NF), so data redundancy is avoided as much as possible.  Data marts specific to departments are built on top of the 3NF model, and the data marts can have de-normalized data to help with reporting/fast querying.
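The transformation examples above (standardizing dates, combining sources, de-duplicating) can be sketched in a few lines of Python. The two source extracts, the field names and the list of accepted date formats are all hypothetical; a real ETL job would need rules specific to its own feeds.

```python
from datetime import datetime

# Hypothetical extracts from two source systems with inconsistent formats.
source_a = [{"customer_id": "C001", "name": "Asha ", "signup": "2013-01-15"}]
source_b = [
    {"customer_id": "C001", "name": "asha", "signup": "15/01/2013"},
    {"customer_id": "C002", "name": "Ravi", "signup": "03/02/2013"},
]

# Assumed input formats; real feeds need their own list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(text):
    """Cleansing step: convert any accepted input format to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def transform(*sources):
    """Combine the sources, cleanse each record and keep one record per customer."""
    cleaned = {}
    for rows in sources:
        for row in rows:
            key = row["customer_id"]
            if key in cleaned:
                continue  # de-duplicate: keep the first record seen per customer
            cleaned[key] = {
                "customer_id": key,
                "name": row["name"].strip().title(),
                "signup": standardize_date(row["signup"]),
            }
    return list(cleaned.values())


customers = transform(source_a, source_b)
for row in customers:
    print(row)
```

Keeping the first record seen per customer is only one possible survivorship rule; which source wins when records conflict is a design decision to agree with the business.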
  • 55. Kimball vs Inmon Architectures
    Characteristics | Favours Kimball | Favours Inmon
    Business decision support requirements | Tactical | Strategic
    Data integration requirements | Individual business requirements | Enterprise-wide integration
    The structure of data | KPI, business performance measures, scorecards… | Data that meet multiple and varied information needs and non-metric data
    Persistence of data in source systems | Source systems are quite stable | Source systems have high rate of change
    Skill sets | Small team of generalists | Bigger team of specialists
    Time constraint | Urgent needs for the first data warehouse | Longer time is allowed to meet business’ needs
    Cost to build | Low start-up cost | High start-up costs
  • 56. References  The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, Ralph Kimball, Margy Ross  Introduction to Data Warehousing Concepts, docs.oracle.com  A Short History of Data Warehousing – DATAVERSITY, www.dataversity.net  Data integration – Wikipedia, en.wikipedia.org  Data Warehouse Design: The Good, the Bad, the Ugly, blog.panoply.io  ER Model Basic Concepts, www.tutorialspoint.com  Oracle ATG Web Commerce - Search ERD, docs.oracle.com  Document Entities | Oracle Magazine, blogs.oracle.com