The document discusses online analytical processing (OLAP) and the need for OLAP capabilities beyond basic data analysis. It describes how OLAP uses multidimensional data models and pre-computed aggregates to provide fast and interactive analysis of data across multiple dimensions. Different approaches for implementing OLAP like ROLAP, MOLAP, and hybrid systems are covered.
2. OBJECTIVES
What is OLAP
Need for OLAP
Features & functions of OLAP
Different OLAP models
OLAP implementations
3. DEMAND FOR OLAP
There are three approaches to developing data marts (DM)
In all approaches, data marts rest on the dimensional model
Data marts are sufficient for basic data analysis
Users need to go beyond such basic analysis
4. DEMAND FOR OLAP
Need for multidimensional analysis
Fast access and powerful calculations
Limitations of other analysis methods, such as:
SQL
Spreadsheets
Report writers
5. DEMAND FOR OLAP
Traditional tools (report writers, query products, spreadsheets, and language interfaces) do not meet user expectations for multidimensional analysis with complex calculations.
Tools used with OLTP and basic DW environments are not up to the task.
6. OLAP IS THE ANSWER!
OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
7. Why is OLAP useful?
Facilitates multidimensional data analysis by pre-computing aggregates across many sets of dimensions
Provides for:
Greater speed and responsiveness
Improved user interactivity
8. DATA WAREHOUSES
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube
A data cube allows data to be modeled and viewed in multiple dimensions
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
9. LATTICE OF CUBOIDS
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
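The lattice for the four dimensions above can be generated mechanically, since each cuboid is just a subset of the dimension names. The following is a minimal sketch; the function name `cuboids_by_level` is an illustrative choice, not something from the slides.

```python
from itertools import combinations

# Dimension names taken from the lattice slide.
DIMENSIONS = ("time", "item", "location", "supplier")

def cuboids_by_level(dimensions):
    """Map k to the list of k-D cuboids (each a tuple of grouped dimensions)."""
    return {k: list(combinations(dimensions, k))
            for k in range(len(dimensions) + 1)}

lattice = cuboids_by_level(DIMENSIONS)
for k, cuboids in lattice.items():
    print(f"{k}-D:", cuboids)

# Total number of cuboids in the lattice (2^4 for four dimensions).
total = sum(len(cuboids) for cuboids in lattice.values())
print(total)
```

The empty tuple at level 0 stands for the apex cuboid ("all"), and the single tuple at level 4 is the base cuboid.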
11. AGGREGATES
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Result: 81
12. AGGREGATES
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

ans
date  sum
1     81
2     48
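The two queries on the aggregate slides can be tried directly against the sample rows. This is a small sketch that uses Python's built-in sqlite3 module with an in-memory database; the choice of SQLite and the column types are assumptions made only for the illustration.

```python
import sqlite3

# Sample rows from the "sale" table on the slides.
SALE_ROWS = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", SALE_ROWS)

# Add up amounts for day 1.
day1_total = conn.execute(
    "SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date").fetchall()

print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
```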
13. Aggregates
Operators: sum, count, max, min, median, avg
“Having” clause
Using dimension hierarchy:
average by region (within store)
maximum by month (within date)
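As an illustration of several operators and of the "Having" clause, the sketch below reuses the sample sale rows in an in-memory SQLite database (an assumption made for the example). Median is left out because the sketch sticks to aggregate functions that are always available in SQLite; the threshold of 50 is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])

# Several aggregate operators per product in a single GROUP BY.
stats = conn.execute(
    "SELECT prodId, count(*), min(amt), max(amt), avg(amt) "
    "FROM sale GROUP BY prodId ORDER BY prodId").fetchall()

# HAVING filters whole groups after aggregation:
# keep only the days whose total exceeds 50 (arbitrary threshold).
busy_days = conn.execute(
    "SELECT date, sum(amt) FROM sale "
    "GROUP BY date HAVING sum(amt) > 50").fetchall()

print(stats)      # [('p1', 4, 4, 50, 27.5), ('p2', 2, 8, 11, 9.5)]
print(busy_days)  # [(1, 81)]
```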
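17. AGGREGATION USING HIERARCHIES
Hierarchy: customer -> region -> country
(customer c1 in Region A; customers c2, c3 in Region B)

Sales by product and customer, day 1
     c1   c2   c3
p1   12        50
p2   11   8

Sales by product and customer, day 2
     c1   c2   c3
p1   44   4
p2

Rolled up to region (summed over both days)
     region A   region B
p1   56         54
p2   11         8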
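The roll-up from customer to region shown on this slide amounts to replacing each customer key by its parent in the hierarchy and re-aggregating. A minimal sketch follows, with the customer-to-region mapping taken from the slide and the function name `roll_up_to_region` chosen for illustration.

```python
from collections import defaultdict

# (product, customer, day, amount) rows from the sample sale data.
SALES = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

# One step up the hierarchy: customer -> region (from the slide).
REGION_OF = {"c1": "A", "c2": "B", "c3": "B"}

def roll_up_to_region(rows):
    """Sum amounts per (product, region), collapsing customers and days."""
    totals = defaultdict(int)
    for product, customer, _day, amount in rows:
        totals[(product, REGION_OF[customer])] += amount
    return dict(totals)

by_region = roll_up_to_region(SALES)
print(by_region)
```

The result matches the region table on the slide: p1 has 56 in region A and 54 in region B, p2 has 11 and 8.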
19. CUBE AGGREGATES LATTICE
Lattice nodes:
all
city; product; date
(city, product); (city, date); (product, date)
(city, product, date)

(city, product, date): base data
day 1
     c1   c2   c3
p1   12        50
p2   11   8
day 2
     c1   c2   c3
p1   44   4
p2

(city, product): summed over date
     c1   c2   c3
p1   56   4    50
p2   11   8

city: summed over product and date
c1   c2   c3
67   12   50

all: 129

Use a greedy algorithm to decide what to materialize
21. DIMENSION HIERARCHIES
Lattice nodes (not all arcs shown):
all
city; state; product; date
(city, product); (city, date); (product, date); (state, product); (state, date)
(city, product, date); (state, product, date)
22. INTERESTING HIERARCHY
Levels: all; years; quarters; months; weeks; days

Conceptual dimension table: time
day  week  month  quarter  year
1    1     1      1        2000
2    1     1      1        2000
3    1     1      1        2000
4    1     1      1        2000
5    1     1      1        2000
6    1     1      1        2000
7    1     1      1        2000
8    2     1      1        2000
23. SAMPLE CUBE
[Figure: a 3-D sales cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr), Product (TV, PC, VCR), and Country (U.S.A., Canada, Mexico), plus a "sum" member on each dimension. Labeled cells include: total annual sales of TV in U.S.A.; total annual sales of PC in U.S.A.; total Q1 sales of VCR in U.S.A.; total Q1 sales in U.S.A., in Canada, and in Mexico; total Q1 sales and total Q2 sales in all countries; total sales in U.S.A., in Canada, and in Mexico; and TOTAL SALES.]
30. OTHER OLAP OPERATIONS
o Drill-Across: queries involving more than one fact table
o Drill-Through: uses SQL to drill through the bottom level of a data cube down to its back-end relational tables
o Pivot (rotate): a visualization operation that rotates the data axes in view to provide an alternative presentation of the data. Other examples include rotating the axes in a 3-D cube, or transforming a 3-D cube into a series of 2-D planes.
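A pivot on a 2-D view can be sketched as swapping the row and column axes of a cross-tab. The example below uses the (product, city) totals from the sample sale data, stored as a nested dict; both that representation and the function name `pivot` are illustrative assumptions.

```python
# (product, city) view of the sample data, summed over date.
view = {
    "p1": {"c1": 56, "c2": 4, "c3": 50},
    "p2": {"c1": 11, "c2": 8},
}

def pivot(table):
    """Swap the row and column axes of a nested-dict cross-tab."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

rotated = pivot(view)
print(rotated)
```

Pivoting changes only the presentation: applying `pivot` twice returns the original view, and no cell value is recomputed.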
31. OTHER OLAP OPERATIONS
o Moving Averages
o Growth Rates
o Depreciation
o Currency Conversion
o Statistical Functions
o Top N or Bottom N queries
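Three of the calculations listed above are simple enough to sketch in a few lines. The quarterly figures are hypothetical values invented for the example; the city totals reuse the sample sale data. The function names are illustrative.

```python
import heapq

def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def growth_rates(values):
    """Relative change from each period to the next."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]

def top_n(totals, n):
    """The n largest (member, total) pairs, largest first."""
    return heapq.nlargest(n, totals.items(), key=lambda pair: pair[1])

quarterly_sales = [100, 120, 90, 150]          # hypothetical figures
city_totals = {"c1": 67, "c2": 12, "c3": 50}   # from the sample sale data

averages = moving_average(quarterly_sales, 2)
rates = growth_rates(quarterly_sales)
best_two = top_n(city_totals, 2)
print(averages, rates, best_two)
```

A bottom-N query follows the same pattern with `heapq.nsmallest`.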
32. Conceptual vs. Actual
The “cube” is a logical way of visualizing the data in an OLAP setting
It is not how the data is actually represented on disk
Two ways of storing data:
ROLAP: Relational OLAP
MOLAP: Multidimensional OLAP
33. OLAP & CUBE
Construction of the data cube is key to the operation of OLAP
The computation process creates a set of aggregates on the various dimensions of the data
The CUBE operator
35. The CUBE Operator
Proposed by Gray et al.*
Effectively involves a series of GROUP-BY operations to aggregate data
Computes an aggregate for every subset (the power set) of the grouping attributes, according to:
A measure
An aggregator function
*J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.
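To make the "series of GROUP-BY operations" concrete, here is a deliberately naive sketch that runs one group-by per subset of the dimensions over the sample sale data. It is not one of the optimized cubing algorithms and not the operator's actual implementation in any DBMS; the function name `cube` and the row layout (measure in the last column) are assumptions for the example.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "city", "date")

# (product, city, date, amount) rows from the sample sale data.
SALES = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

def cube(rows, dimensions, aggregate=sum):
    """Return {grouping: {key: aggregate}} for every subset of dimensions."""
    result = {}
    for k in range(len(dimensions) + 1):
        for positions in combinations(range(len(dimensions)), k):
            groups = defaultdict(list)
            for row in rows:
                key = tuple(row[p] for p in positions)
                groups[key].append(row[-1])  # last column is the measure
            grouping = tuple(dimensions[p] for p in positions)
            result[grouping] = {key: aggregate(values)
                                for key, values in groups.items()}
    return result

cuboids = cube(SALES, DIMENSIONS)
print(len(cuboids))          # 8 group-bys for 3 dimensions
print(cuboids[()])           # {(): 129}
print(cuboids[("city",)])    # {('c1',): 67, ('c3',): 50, ('c2',): 12}
```

The measure is the amount and the aggregator function is `sum`; passing `max` or `min` as `aggregate` changes the aggregator without touching the grouping logic.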
36. CUBING Problem
Problem: this generates a lot of data and work (2^n group-by sets in total, where n is the number of dimensions)
Solution: optimized algorithms that run faster, consume less memory, and perform fewer I/Os.
37. Efficient Computation of Data Cubes
o ROLAP-based cubing algorithms (Agarwal et al. '96)
o Array-based cubing algorithm (Zhao et al. '97)
S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In VLDB'96.
Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
38. Efficient Computation of Data Cubes
o How many cuboids are there in a cube with 3 dimensions?
o Answer: 2^3 = 8, as many as there are group-by operations
o That count assumes no hierarchies are involved
o With hierarchies: ∏ (L_i + 1) over all dimensions i, where L_i is the number of levels associated with dimension i
o Example: 10 dimensions and 4 levels for each dimension
o Total cuboids = 5^10 = 9,765,625
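The cuboid count is a one-line product, which makes it easy to check both cases on this slide. In the sketch below, a dimension without a hierarchy is treated as having one level (plus the implicit "all"), so the formula reduces to 2^n; the function name `cuboid_count` is illustrative.

```python
from math import prod

def cuboid_count(levels_per_dimension):
    """Number of cuboids: product of (L_i + 1) over all dimensions i."""
    return prod(levels + 1 for levels in levels_per_dimension)

# Three dimensions, no hierarchies: one level each, so 2^3 cuboids.
flat = cuboid_count([1, 1, 1])

# Ten dimensions with four levels each: 5^10 cuboids.
deep = cuboid_count([4] * 10)

print(flat, deep)  # 8 9765625
```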
39. Approaches to OLAP Servers
It is all about which DBMS you choose to store your data warehouse data
RDBMS – ROLAP
MDDB – MOLAP
BOTH – HOLAP
40. Approaches to OLAP Servers
Three possibilities for OLAP servers
(1) Relational OLAP (ROLAP)
Relational and specialized relational DBMS to store and manage warehouse data
OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures
(3) Hybrid OLAP (HOLAP)
Storing detailed data in RDBMS
Storing aggregated data in MDBMS
User access via MOLAP tools
41. ROLAP
Special schema design: star, snowflake
Special indexes: bitmap, multi-table join
Proven technology (relational model, DBMS); tends to outperform specialized MDDBs, especially on large data sets
Products
IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
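A star schema keeps the measures in a fact table whose foreign keys point to dimension tables, and a ROLAP query joins them before aggregating. The sketch below is a toy illustration in an in-memory SQLite database; the table names `sale_fact` and `store_dim` are hypothetical, and the store-to-region assignment reuses the region mapping from the hierarchy slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: one row per store, with its region attribute.
CREATE TABLE store_dim (storeId TEXT PRIMARY KEY, region TEXT);

-- Fact table: foreign key into the dimension plus the measure.
CREATE TABLE sale_fact (
    prodId  TEXT,
    storeId TEXT REFERENCES store_dim(storeId),
    date    INTEGER,
    amt     INTEGER
);

INSERT INTO store_dim VALUES ('c1', 'A'), ('c2', 'B'), ('c3', 'B');
INSERT INTO sale_fact VALUES
    ('p1', 'c1', 1, 12), ('p2', 'c1', 1, 11), ('p1', 'c3', 1, 50),
    ('p2', 'c2', 1, 8),  ('p1', 'c1', 2, 44), ('p1', 'c2', 2, 4);
""")

# Star join: fact table joined to a dimension, grouped by a dimension attribute.
sales_by_region = conn.execute("""
    SELECT d.region, sum(f.amt)
    FROM sale_fact AS f
    JOIN store_dim AS d ON d.storeId = f.storeId
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()

print(sales_by_region)  # [('A', 67), ('B', 62)]
```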
42. ROLAP
Defines complex, multidimensional data with a simple model
Reduces the number of joins a query has to process
Allows the data warehouse to evolve with relatively low maintenance
Can contain both detailed and summarized data.
ROLAP is based on familiar, proven, and already selected technologies.
BUT!!!
SQL is poorly suited to multidimensional manipulation and calculations.
43. MOLAP
MDDB: a special-purpose data model
Facts stored in multidimensional arrays
Dimensions used to index the array
Sometimes on top of a relational DB
Products
Pilot, Arbor Essbase, Gentia
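The idea of facts stored in an array indexed by dimension members can be sketched with a dense nested list. This is only a toy model of the storage scheme, not how any particular MDDB product lays out data; the names `cells`, `pos`, and `lookup` are illustrative.

```python
# (product, store, day, amount) rows from the sample sale data.
SALES = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]
PRODUCTS = ["p1", "p2"]
STORES = ["c1", "c2", "c3"]
DAYS = [1, 2]

# Each dimension member maps to a position along its array axis.
pos = {
    "product": {member: i for i, member in enumerate(PRODUCTS)},
    "store": {member: i for i, member in enumerate(STORES)},
    "day": {member: i for i, member in enumerate(DAYS)},
}

# Dense 3-D array; None marks an empty cell.
cells = [[[None for _ in DAYS] for _ in STORES] for _ in PRODUCTS]
for product, store, day, amount in SALES:
    cells[pos["product"][product]][pos["store"][store]][pos["day"][day]] = amount

def lookup(product, store, day):
    """Direct array access: no join or scan, just three index lookups."""
    return cells[pos["product"][product]][pos["store"][store]][pos["day"][day]]

allocated = len(PRODUCTS) * len(STORES) * len(DAYS)
populated = sum(v is not None for plane in cells for row in plane for v in row)
print(lookup("p1", "c3", 1), allocated, populated)  # 50 12 6
```

Even in this tiny example half of the allocated cells are empty, which hints at why dense multidimensional storage can grow much faster than the input data.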
44. MOLAP
Pre-calculating or pre-consolidating transactional data improves speed.
BUT
To fully pre-consolidate incoming data, MDDs require an enormous amount of overhead, both in processing time and in storage. An input file of 200 MB can easily expand to 5 GB.
MDDBs are great candidates for departmental data marts under 100 GB.
With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.
45. OLAP Needs
User Needs
Multidimensional View
Excellent Performance
Analytical Flexibility
Real-Time Data Access
High Data Capacity
MIS Needs
Leverages Data Warehouse
Easy Development
Low Structure Maintenance
Low Aggregate Maintenance
46. OLAP Needs: User Needs
Multidimensional View
All true OLAP tools, whether they work with an MDDB or an RDBMS, provide a multidimensional view of data.
For example, decision makers may view sales by office, quarter, representative, product, etc. This perspective on data, which mirrors the way business professionals think, allows for more intuitive and more powerful analysis.
47. OLAP Needs: User Needs
Excellent Performance
The performance of a decision support tool depends directly on the way it manages aggregates.
With an RDBMS, there are two options:
Calculate aggregates on the fly (response time suffers)
Have the DBA create summary tables to store aggregates (this takes an enormous amount of disk space)
48. OLAP Needs: User Needs
Excellent Performance
For example, suppose you have a Sales indicator with six dimensions: Representatives, Products, Customers, Regions, Months, and Years.
MOLAP tools will store a given aggregate, such as the November 1997 government sales of product A504 by representative 1040 in New York, in one cell of the MDDB.
In contrast, ROLAP tools consume 600% more space, because they require a record of seven values (six foreign keys and the actual aggregate) in a relational summary table.
50. OLAP Needs: User Needs
Excellent Performance
RDBMSs must use several summary tables to store the aggregates that a MOLAP tool could store in just one cube. For example, consider a Sales indicator with three dimensions: Months, Regions, and Products. The indicator cube will contain seven sets of aggregates:
• Sales by month
• Sales by product
• Sales by region
• Sales by month and product
• Sales by month and region
• Sales by product and region
• Sales by product, month, and region
To store these aggregates in an RDBMS, you’d have to create seven summary tables, one for each aggregate set.
HOW MANY SUMMARY TABLES FOR 6 DIMENSIONS?
(Separate fact table and shrunken dimension table approach for storing aggregates)
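Counting the way the slide does, with one summary table per non-empty combination of dimensions and no hierarchy levels within a dimension, the question about six dimensions can be answered by enumeration. This is a sketch under that assumption; the function name `summary_tables` is illustrative, and the six dimension names come from the earlier Sales indicator example.

```python
from itertools import combinations

def summary_tables(dimensions):
    """One summary table per non-empty combination of dimensions."""
    return [combo
            for k in range(1, len(dimensions) + 1)
            for combo in combinations(dimensions, k)]

three = summary_tables(["Months", "Regions", "Products"])
six = summary_tables(["Representatives", "Products", "Customers",
                      "Regions", "Months", "Years"])

print(len(three))  # 7, i.e. 2^3 - 1
print(len(six))    # 63, i.e. 2^6 - 1
```

If each dimension also carried several hierarchy levels, the count would be larger still, which is the point of the question.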
51. OLAP Needs: User Needs
Excellent Performance
• A huge amount of extra storage space is required (even if there is no sparsity failure)
• Maintenance costs are high
• A lot of statistical analysis is needed to decide which aggregates to precompute
• The DBA must keep the cost/performance ratio in check
52. OLAP Needs: User Needs
Excellent Performance
• In contrast, we’ve seen that multidimensional databases store aggregates in a very compact structure that consumes very little disk space and requires very little maintenance
• All levels of consolidation can therefore be precomputed and stored in the MDDB
• As a result, fast response time is not limited to the most frequently accessed queries; all aggregates can be accessed with lightning speed.
53. OLAP Needs: User Needs
Analytical Flexibility
Both ROLAP & MOLAP tools offer comparable performance for:
Comparative Analysis
Roll-up and Drill-down
Slicing & Dicing
Only MOLAP tools offer ‘what-if’ analysis
54. OLAP Needs: User Needs
Real-Time Data Access
MOLAP tools load data into multidimensional cubes. Consequently, the data being accessed is only as recent as the last load.
Some applications require real-time data access
Continually refreshing the data raises the cost of operating a MOLAP system
Some MOLAP tools offer reach-through functionality to access volatile data stored outside the MDDB
Unfortunately, users must then be aware of the underlying database structure
Relational data access is too complex for the typical user
55. OLAP Needs: User Needs
Real-Time Data Access
ROLAP tools maintain a constant link to the operational RDBMS, which provides users with up-to-the-minute, accurate data (Real-Time Data Warehousing)
Industries & organizations with highly volatile data particularly benefit from this access to live, operational data.
56. OLAP Needs: User Needs
High Data Capacity
MOLAP products are limited by the size of the cube defined by the multidimensional view. When dimension elements are predefined, the scope of available data is limited at the outset.
ROLAP tools circumvent this barrier. Dynamic dimensions are not stored in the predefined multidimensional model, but fetched at run time from the RDBMS.
57. OLAP Needs: User Needs
High Data Capacity
o In MOLAP, only aggregates are stored in the cube. Atomic, operational data are forced out of the user’s analytical realm.
o ROLAP systems can access extremely detailed operational data, as well as aggregated data stored in summary tables.
58. OLAP Needs
MIS Needs
Administrators should be able to leverage their existing relational databases without devoting large amounts of time and effort to intricate development, fine tuning, or intensive maintenance.
59. OLAP Needs: MIS Needs
Leveraging Data Warehouse
Both the finance and the MIS departments of your organization will appreciate a decision support tool that leverages existing investments in data warehousing.
MIS staff who opt for a MOLAP tool must duplicate data in the tool’s proprietary MDDB.
MIS staff who choose a ROLAP tool will be able to access the data warehouse directly.
60. OLAP Needs: MIS Needs
Easy Development
MOLAP development is straightforward: it requires no fine tuning and creates its own aggregates.
ROLAP tools, on the other hand, require a specific schema for the relational database.
Skilled DBAs must provide the appropriate schema (star or snowflake), tune the database, and create the appropriate summary tables.
However, many ROLAP tools are metadata-driven, which means the multidimensional view is generated and maintained more easily.
61. OLAP Needs: MIS Needs
Low Structure Maintenance
The structure of a MOLAP tool’s underlying MDDB greatly depends on each of its dimensions. When one dimension changes, the entire MDDB must be restructured.
Multi-matrix MDDBs reduce the maintenance burden
ROLAP systems do not store data in a proprietary structure. They build and maintain a constant link between the multidimensional view and the underlying RDBMS using the metadata.
No database restructuring is required.
62. OLAP Needs: MIS Needs
Low Aggregate Maintenance
MOLAP tools automatically create high-level aggregates based on your lower-level MDDB data and aggregate definitions.
When data is updated, the aggregates are automatically updated and stored in the MDDB.
With ROLAP tools, MIS staff must continually monitor the use of summary tables to keep their cost/performance ratio in check.
DBAs inevitably use sophisticated statistics to isolate only the most frequently accessed aggregates, and store them in summary tables.
These tables leave ROLAP administrators with a heavy maintenance burden.
64. ROLAP vs. MOLAP
1) Performance:
• How fast will the system appear to the end-user?
• MDD server vendors believe this is a key point in their favor.
2) Data volume and scalability:
• While MDD servers can handle up to 100 GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.
65. Hybrid OLAP - HOLAP
o Best of both worlds
o Storing detailed data in RDBMS
o Storing aggregated data in MDBMS
o User access via MOLAP tools
66. HOLAP
[Figure: HOLAP architecture. Components: a client with a multidimensional viewer and a relational viewer; an MDBMS server holding multidimensional (derived) data; an RDBMS server holding user data and metadata. Connection labels: multidimensional access, SQL-Read, SQL-Reach Through.]
67. ROLAP, MOLAP, or HOLAP
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You’re developing a general-purpose application for inventory movement or assets management
THEN
Consider an MDD/MOLAP solution for your data mart
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN
Consider an RDBMS/ROLAP solution for your data mart.
IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN
Consider a HOLAP solution for your data mart
68. Conclusions
ROLAP: RDBMS -> star/snowflake schema
MOLAP: MDDB -> Cube structures
ROLAP or MOLAP: the data models used play a major role in performance differences
MOLAP: for summarized and relatively smaller volumes of data (up to 100 GB)
ROLAP: for detailed and larger volumes of data
Both storage methods have strengths and weaknesses
The choice is requirement-specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.
HOLAP is emerging as the OLAP server of choice