1) Data warehousing aims to bring together information from multiple sources to provide a consistent database for decision support queries and analytical applications, offloading these tasks from operational transaction systems.
2) OLAP is focused on efficient multidimensional analysis of large data volumes for decision making, while OLTP is aimed at reliable processing of high-volume transactions.
3) A data warehouse is a subject-oriented, integrated collection of historical and summarized data used for analysis and decision making, separate from operational databases.
2. Motivation
Aims of information technology:
• To help workers in their everyday business activity
and improve their productivity ie. clerical data
processing tasks
• To help knowledge workers (executives,
managers, analysts) make faster and better
decisions ie. decision support systems
• Two types of applications:
– Operational applications
– Analytical applications
3. Motivation
• In most organizations, data about specific parts of
business is there - lots and lots of data,
somewhere, in some form.
On the other hand:
Data is available but not information -- and not
the right information at the right time.
Aim of data warehouse is to:
• bring together information from multiple sources
as to provide a consistent database source for
decision support queries.
• off-load decision support applications from the on-
line transaction system.
4. Warehousing
• Growing industry: $ 8 billion in 1998
• Range from desktop to huge
warehouses
– Walmart: 900-CPU, 2,700 disks, 23TB
– Teradata system
• Lots of new terms
– ROLAP, MOLAP, HOLAP
– rollup. drill-down, slice& dice
5. The Architecture of Data
What’s has been
learned from data
summaries by Logical model
Business
who, what, rules physical layout of
when,
data
where,... Metadata
Database schema who,
what,
Summary data
when,
Operational data where,
6. Decision Support and OLAP
• DSS: Information technology to help
knowledge workers (executives, managers,
analysts) make faster and better decisions:
what were the sales volumes by region and by
product category in the last year?
how did the share price of computer manufacturers
correlate with quarterly profits over the past 10
years?
will a 10% discount increase sales volume
sufficiently?
• OLAP is an element of decision support
system
• Data mining is a powerful, high-
performance data analysis tool for decision
7. Data Processing Models
There are two basic data processing
models:
• OLTP – the main aim of OLTP is reliable
and efficient processing of a large number
of transactions and ensuring data
consistency.
• OLAP – the main aim of OLAP is efficient
multidimensional processing of large data
volumes.
8. Traditional OLTP
Traditionally, DBMS have been used for
on-line transaction processing (OLTP)
• order entry: pull up order xx-yy-zz and update
status field
• banking: transfer $100 from account X to account
Y
• clerical data processing tasks
• detailed up-to-date data
• structured, repetitive tasks
• short transactions are the unit of work
• read and/or update a few records
• isolation, recovery, and integrity are critical
9. OLTP vs. OLAP
• OLTP: On Line Transaction Processing
– Describes processing at operational sites
• OLAP: On Line Analytical Processing
– Describes processing at warehouse
10. OLTP vs. OLAP
OLTP OLAP
users Clerk, IT professional Knowledge worker
function day to day operations decision support
DB design application-oriented subject-oriented
data current, up-to-date historical, summarized
detailed, flat relational multidimensional
isolated integrated, consolidated
usage repetitive ad-hoc
access read/write, lots of scans
index/hash on prim. key
unit of work short, simple transaction complex query
# records accessed tens millions
#users thousands hundreds
DB size 100MB-GB 100GB-TB
metric transaction throughput query throughput, response
11. What is a Data Warehouse
• “A data warehouse is a subject-
oriented, integrated, time-variant,
and nonvolatile collection of data in
support of management’s decision-
making process.” --- W. H. Inmon
• Collection of data that is used primarily
in organizational decision making
• A decision support database that is
maintained separately from the
organization’s operational database
12. Data Warehouse - Subject
Oriented
• Subject oriented: oriented to the major
subject areas of the corporation that
have been defined in the data model.
• E.g. for an insurance company: customer,
product, transaction or activity, policy,
claim, account, and etc.
• Operational DB and applications may
be organized differently
• E.g. based on type of insurance's: auto,
life, medical, fire, ...
13. Data Warehouse – Integrated
• There is no consistency in encoding,
naming conventions, …, among
different data sources
• Heterogeneous data sources
• When data is moved to the warehouse,
it is converted.
14. Data Warehouse - Non-Volatile
• Operational data is regularly accessed
and manipulated a record at a time, and
update is done to data in the
operational environment.
• Warehouse Data is loaded and
accessed. Update of data does not
occur in the data warehouse
environment.
15. Data Warehouse - Time Variance
• The time horizon for the data warehouse is
significantly longer than that of operational
systems.
• Operational database: current value data.
• Data warehouse data : nothing more than a
sophisticated series of snapshots, taken of at
some moment in time.
• The key structure of operational data may or
may not contain some element of time. The
key structure of the data warehouse always
contains some element of time.
16. Why Separate Data
Warehouse?
• Performance
• special data organization, access methods,
and implementation methods are needed
to support multidimensional views and
operations typical of OLAP
• Complex OLAP queries would degrade
performance for operational transactions
• Concurrency control and recovery modes
of OLTP are not compatible with OLAP
analysis
17. Why Separate Data
Warehouse?
• Function
• missing data: Decision support requires
historical data which operational DBs do
not typically maintain
• data consolidation: DS requires
consolidation (aggregation, summarization)
of data from heterogeneous sources:
operational DBs, external sources
• data quality: different sources typically use
inconsistent data representations, codes
and formats which have to be reconciled.
18. Advantages of Warehousing
• High query performance
• Local processing at sources unaffected
• Can operate when sources unavailable
• Extra information at warehouse
– Modify, summarize (store aggregates)
– Add historical information