Bill Inmon – the “father of the data warehouse” – has written 53 books published in nine languages. Bill’s latest adventure is the building of technology known as textual disambiguation – technology that reads raw text in a narrative format and allows the text to be placed in a conventional database so that it can be analyzed by standard analytical technology, thereby creating unique business value for Big Data/unstructured data. Bill was named by ComputerWorld as one of the ten most influential people in the history of the computer profession. Bill lives in Castle Rock, Colorado. For more information about textual disambiguation, refer to www.forestrimtech.com.
2. The data warehouse - a definition
A subject-oriented, non-volatile, integrated, time-variant collection of data for the support of management’s decisions
Forest Rim Technology
Copyright Inmon Consulting Services, 2008
3. Granular, detailed data and lots of it
Data that can be shaped and reshaped
A foundation of reconcilability
A basis for new, unknown analysis
5. key
An identifier
Unique or non unique
Often a compound key
May be natural or blind
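The key properties above can be illustrated with a small sketch. It is a hypothetical example, not taken from the slides: the names `OrderLineKey`, `order_no`, `line_no` and `assign_blind_key` are assumptions, and a "blind" key is read here as a system-generated identifier that carries no business meaning.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class OrderLineKey:
    """A compound, natural key built from two business values (assumed names)."""
    order_no: str
    line_no: int


# Counter that hands out blind keys: identifiers with no business meaning.
_next_blind_key = count(1)
_blind_keys = {}


def assign_blind_key(natural_key):
    """Return the blind key for a natural key, creating one on first sight."""
    if natural_key not in _blind_keys:
        _blind_keys[natural_key] = next(_next_blind_key)
    return _blind_keys[natural_key]


first = assign_blind_key(OrderLineKey("A-100", 1))
second = assign_blind_key(OrderLineKey("A-100", 2))
again = assign_blind_key(OrderLineKey("A-100", 1))  # same natural key, same blind key
```

The frozen dataclass makes the compound key hashable, so it can serve directly as a dictionary key.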
6. time
Time variancy
- continuous
- from date/to date
- periodic discrete
9. No overlap
Discontinuity is a possibility
From the beginning of time to the end of time
Continuous time span data
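A minimal sketch of continuous time span data, assuming each record carries an inclusive from date and to date. The record layout, the function names and the sample values are hypothetical; `date.max` stands in for "the end of time".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimeSpanRecord:
    key: str
    from_date: date
    to_date: date  # inclusive end of the span
    value: str


def has_overlap(records):
    """True if any two spans for the same key share at least one day."""
    ordered = sorted(records, key=lambda r: (r.key, r.from_date))
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.key == cur.key and cur.from_date <= prev.to_date:
            return True
    return False


def value_as_of(records, key, day):
    """Return the value in effect on `day`, or None if the day falls in a gap."""
    for r in records:
        if r.key == key and r.from_date <= day <= r.to_date:
            return r.value
    return None


# Made-up history for one key, with a deliberate gap in July 2007.
history = [
    TimeSpanRecord("cust-1", date(2007, 1, 1), date(2007, 6, 30), "address A"),
    TimeSpanRecord("cust-1", date(2007, 8, 1), date.max, "address B"),
]
```

With this layout, any date inside a span returns a value, while a date inside the gap returns `None`, which is how the discontinuity shows up.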
10. Periodic discrete structure
Snapshots taken on Jan 1, Feb 1, Mar 1 and Apr 1, each recording:
Expenses
Revenues
No of employees
Stock price
Price per share
………………….
The notion of taking a snapshot as of one moment in time
11. Periodic discrete structure
Snapshots taken on Jan 1, Feb 1, Mar 1 and Apr 1, each recording:
Expenses
Revenues
No of employees
Stock price
Price per share
………………….
The structure says nothing about values as of any other date
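A minimal sketch of a periodic discrete structure, assuming one snapshot per month keyed by its snapshot date. The field names and all figures are made up for illustration.

```python
from datetime import date

# Hypothetical monthly snapshots; the figures are invented for illustration.
snapshots = {
    date(2008, 1, 1): {"expenses": 100, "revenues": 150, "employees": 12},
    date(2008, 2, 1): {"expenses": 110, "revenues": 140, "employees": 12},
    date(2008, 3, 1): {"expenses": 105, "revenues": 160, "employees": 13},
}


def snapshot_as_of(day):
    """Return the snapshot taken exactly on `day`, or None if none was taken."""
    return snapshots.get(day)


on_snapshot_date = snapshot_as_of(date(2008, 2, 1))
between_snapshots = snapshot_as_of(date(2008, 2, 15))  # no snapshot on this date
```

The lookup for Feb 15 returns `None`: the structure holds values only for the snapshot dates themselves.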
12. Periodic discrete structure
- for few variables
- for slowly changing variables
Continuous time span data
- for many variables
- for quickly changing variables
13. Primary data
Primary data relates directly to the key
Example – key – ssno
- primary data – name, date of birth
14. Secondary data
Secondary data relates directly to the primary data
Example – key – ssno
- primary data – name, date of birth
- secondary data – address, zip, phone
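One way to picture the key, primary data and secondary data in the example is the sketch below. The class names are hypothetical, nesting the secondary data inside the primary data is only one possible reading of "relates directly to", and the values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class SecondaryData:
    address: str
    zip_code: str
    phone: str


@dataclass
class PrimaryData:
    name: str
    date_of_birth: str
    secondary: SecondaryData  # hangs off the primary data, not off the key


# The key (ssno) identifies the record; all values are placeholders.
people = {
    "000-00-0000": PrimaryData(
        name="Jane Doe",
        date_of_birth="1970-01-01",
        secondary=SecondaryData("1 Main St", "00000", "555-0100"),
    )
}
```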
15. The granular data in the data warehouse –
- serves as a basis for many other forms of DSS
- is instantly available
- forms a foundation of reconcilability
16. Relational structures and star joins
The data warehouse is shaped by the data model;
the star join world is shaped by requirements
17. Often called multi-dimensional data
Often called atomic data
21. ETL
Extract/transform/load
The integration and conversion of data is the most difficult part of the data warehouse process
22. Transformation code can be generated manually or automatically.
Automatic generation is always preferred
23. The functions performed by the ETL process are not trivial -
Convert
Reformat
Add time element
Restructure
New key
Add default values
Change DBMS
Change operating system
Summarize
Break into multiple records
Convert key structure
Merge records
Collect metadata
Conform to data model
Select data/reject data
Add indexes
Change encoding
Change hardware environments
Resequence data
ASCII to EBCDIC; EBCDIC to ASCII
Partition data
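A few of the functions listed above can be sketched in one small transformation step. This is an illustration under stated assumptions, not the ETL process from the slides: the 13-character fixed-width layout, the field names, the default status "U" and the sequential key scheme are all hypothetical. The only library feature relied on is Python's standard `cp037` codec, one of the EBCDIC code pages.

```python
from datetime import date
from itertools import count

_new_keys = count(1)  # source of new warehouse keys (assumed scheme)


def transform(raw: bytes, load_date: date) -> dict:
    """Turn one fixed-width EBCDIC source record into a warehouse row."""
    text = raw.decode("cp037")        # change encoding: EBCDIC bytes to text
    source_id = text[0:5].strip()     # reformat: fixed-width slices to fields
    amount_text = text[5:12].strip()
    status = text[12:13].strip()
    return {
        "warehouse_key": next(_new_keys),   # new key
        "source_id": source_id,
        "amount": int(amount_text) / 100,   # convert: cents to currency units
        "status": status or "U",            # add default value when blank
        "load_date": load_date,             # add time element
    }


# Hypothetical source record: id "A1234", amount 1250 cents, blank status.
row = transform("A12340001250 ".encode("cp037"), date(2008, 1, 1))
```

Even this toy version touches five of the listed functions, which gives a sense of why hand-written transformation code grows quickly.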
24. ETL performed in host environment
ETL performed in source environment
ETL processing can be performed in different places
25. Data warehouse – at the center of the decision making of the corporation