2. DWH & OLAP
•Relationship between DWH &
OLAP
•Data Warehouse & OLAP go
together.
•Analysis supported by OLAP
2
3. Supporting the human thought process
How many such query sequences can be programmed in advance? 3
THOUGHT PROCESSTHOUGHT PROCESS QUERY SEQUENCEQUERY SEQUENCE
An enterprise wide fall in profit
Profit down by a large
percentage consistently during
last quarter only. Rest is OK
What is special about last
quarter ?
Products alone doing OK, but
North region is most
problematic.
What was the quarterly sales
during last year ??
What was the quarterly sales at
regional level during last year ??
What was the monthly sale for
last quarter group by products
What was the monthly sale of
products in north at store level
group by products purchased
OK. So the problem is the high
cost of products purchased
in north.
What was the quarterly sales at
product level during last year?
What was the monthly sale for
last quarter group by region
4. Analysis of last example
• Analysis is Ad-hoc
• Analysis is interactive (user driven)
• Analysis is iterative
– Answer to one question leads to a dozen more
• Analysis is directional
– Drill Down
– Roll Up
– Pivot
4
More in
subsequent
slides
5. Challenges…
• Not feasible to write predefined queries.
– Fails to remain user_driven (becomes programmer
driven).
– Fails to remain ad_hoc and hence is not interactive.
• Enable ad-hoc query support
– Business user can not build his/her own queries
(does not know SQL, should not know it).
– On_the_go SQL generation and execution too slow.
5
6. Challenges
• Contradiction
– Want to compute answers in advance, but don't
know the questions
• Solution
– Compute answers to “all” possible “queries”. But
how?
– NOTE: Queries are multidimensional aggregates at
some level
6
7. “All” possible queries (level aggregates)
7
Province Frontier Punjab...
Division MultanLahorePeshawarMardan ......
Lahore ... Gugranwala
City
Zone GulbergDefense ...
District LahorePeshawar
ALL ALLALL ALL
8. Where Does OLAP Fit In?
• It is a classification of applications, NOT a
database design technique.
• Analytical processing uses multi-level
aggregates, instead of record level access.
• Objective is to support very
I. fast
II. iterative and
III. ad-hoc decision-making.
8
9. Where does OLAP fit in?
9
Transaction
Data
Presentation
Tools
Reports
OLAP
Data Cube
(MOLAP)
Data
Loading
?
Decision
Maker
10. OLTP vs. OLAP
Feature OLTP OLAP
Level of data Detailed Aggregated
Amount of data per
transaction
Small Large
Views Pre-defined User-defined
Typical write
operation
Update, insert, delete Bulk insert
“age” of data Current (60-90 days) Historical 5-10 years and
also current
Number of users High Low-Med
Tables Flat tables Multi-Dimensional tables
Database size Med (109
B – 1012
B) High (1012
B – 1015
B)
Query Optimizing Requires experience Already “optimized”
Data availability High Low-Med
10
11. OLAP FASMI Test
Fast: Delivers information to the user at a fairly
constant rate. Most queries answered in under five
seconds.
Analysis: Performs basic numerical and statistical
analysis of the data, pre-defined by an application
developer or defined ad-hocly by the user.
Shared: Implements the security requirements
necessary for sharing potentially confidential data
across a large user population.
Multi-dimensional: The essential characteristic of
OLAP.
Information: Accesses all the data and information
necessary and relevant for the application, wherever
it may reside and not limited by volume.
...from the OLAP Report by Pendse and Creeth.
11
Hinweis der Redaktion
In olap nothing is online, it is online that it has requirement that ans should come under 5 sec, so an impression of online goes to user. Oltp sys are fully normalized and olap are highly denormalized. We will discuss the ways to make the olap.
The concept of dwh without olap is impossible. When a dwh is developed, olap is used for analysis. Olap is an enviornment or tool which must need a dwh. Olap is not a physical implementation it is a framework, will discuss in detail.
First see how the decision makers think and how could we fill it with the help of queries. The decision maker need the ans of questions so that he could take a decision and we have to mimic it with sql queries. A profit fall or drop occurs and decision maker wants to know why. So the corresponding query will be “what was the quarterly sales during last year” means he wants to know what was the sale in 4 quarters of last year. Then he wants to anlyse w.r.t region and further he see it from another angle that What was the quarterly sales at product level during last year?
Now he see the data from a different angles, because it shouldn’t be the case that some informative area of data should not remain untouch. The ans of these queries will be displayed before him in the form of charts. So when he saw chart he found that rest is ok but the fall of profit occurred only in the last quarter. But then he wants to explore the data of last quarter and wants to explore it, so he say what is special about last quarter. So he say that what was the quarterly sales at product level during last year? And then he saw monthly sales w.r.t products. And then he saw products w.r.t region. So he identified the region which is problematic. Then he saw w.r.t another angle he say What was the monthly sale of products in north at store level group by products purchased
So it revealed that the products purchased at north region was high but to meet the challenge of compititors it may not be feasible to sale the products on those prices, so the fall of profit occurs. Because when you take less profit automatically your gross profit reduces. You will remain in profit but its percentage will fall as compare to the previous years profit. So this is the thought process having support from the anallysis tool known as the olap.
But here a question come that How many such query sequences can be programmed in advance? So can you write the sequence of queries, ans is no. because the questions coming in the mind of decision makers cant be told in advance, so he reveals by exploring from different angles. So don’t you need the speed, and also if you are a decision maker and wants to know the ans from IT people, and they say we will answer them after 2 months. So you will be frustrated. So now we have to design the syst by how your business runs.
So last slide reveals. That query seq is adhoc, no one knows in adv which and how many queries will come first. So a major diff b/w olap and oltp sys. The tool is used by decision maker interactively i.e using mouse on screen. So an interactive process. It should be interactive otherwise thought process breaks. Another thing is the process is iterative, so you know the ans of one question then other then next. It has direction i.e. you aggregate or see the elements i.e. from week to month, or months to days.
Piovting is the way to change the things in which respect you were seeing the data.
You cannot program the queries in adv because you don’t know which queries will come, and you also don’t know how many queries are there. So these are challenges, so we can say, when a decision maker moves the mouse we track the mouse and generate the sql on the go. But suppose it is true, still you have to address the vldbs with billions of rows so how can you bring the answer within 5 secs.
When we discuss analysis, we do not retrieve data at record level, decision maker wants to see the aggregates, then he needs to explore. So the all possible questions are actually all possible aggregates, and you give an enviorment to deciosn maker where he can explore them.
Aggregates have hierarchy which could be drilled down.
Olap is a framework, having different flavours. It has a few chractristics i.e the support of adhoc queires, which should be fast, and iterative, and a point missing that is interactive. So you have to see these chractristics in an application to consider it as the olap.
A diagram from past. Here data from oltp systems loads in dwh and data is loaded in an implementation of olap here the implementation is multidimensional olap, it is a multidimensional data structure and we displayed 3 dimesions because cannot display four dimensions in an instant. Then generate graphical reports.
The data cube contains the information based on the transaction data.
Which tool or sys could considered as olap. They should have 5 chractristics.
Shared: user must be able to see their specific data only.
Multidimensional: databases or data structures should be multidimensional
Info: it should not say I can show this info but not this. It should be able to show all info.