MF-Retarget: Aggregate Awareness in Multiple Fact
            Table Schema Data Warehouses

               Karin Becker, Duncan Dubugras Ruiz, and Kellyne Santos

       Faculdade de Informática – Pontifícia Universidade Católica do Rio Grande do Sul
                       http://www.inf.pucrs.br/{~kbecker | ~duncan}
             {kbecker, duncan} @inf.pucrs.br, kellyne@ufs.br



       Abstract. Performance is a critical issue in Data Warehouse systems (DWs),
       due to the large amounts of data manipulated, and the type of analysis
       performed. A common technique used to improve performance is the use of
       pre-computed aggregate data, but the use of aggregates must be transparent for
       DW users. In this work, we present MF-Retarget, a query retargeting
       mechanism that deals with both conventional star schemas and multiple fact
       table (MFT) schemas. This type of multidimensional schema is often used to
       implement a DW using distinct, but interrelated Data Marts. The paper presents
       the retargeting algorithm and initial performance tests.




1    Introduction

Data warehouses (DW) are analytical databases aimed at providing intuitive access to
information useful for decision-making processes. A Data Mart (DM), often referred
to as a subject-oriented DW, represents a subset of the DW, comprised of relevant
data for a particular business function (e.g. marketing, sales). DW/DM handle large
volumes of data, and they are often designed using a star schema, which contains
relatively few tables and well-defined join paths. On-line Analytical Processing
(OLAP) systems are the predominant front-end tools used in DW environments,
which typically explore this multidimensional data structure [3, 13]. OLAP operations
(e.g. drill down, roll up, slice and dice) typically result in SQL queries in which
aggregation functions (e.g. SUM, COUNT) are applied to fact table attributes, using
dimension table attributes as grouping columns (group by clause).
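To make this query pattern concrete, the following sketch (using Python's sqlite3 module over a toy star schema invented for illustration) shows the kind of group-by aggregation query an OLAP roll-up typically generates:

```python
import sqlite3

# Toy star schema (invented for illustration): one fact table and one dimension.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (Code INTEGER PRIMARY KEY, Division TEXT);
CREATE TABLE Sales (Prod_ID INTEGER REFERENCES Product(Code), UnitsSold INTEGER);
INSERT INTO Product VALUES (1, 'Hardware'), (2, 'Hardware'), (3, 'Software');
INSERT INTO Sales VALUES (1, 10), (2, 5), (3, 7), (1, 3);
""")

# A typical roll-up: an aggregation function (SUM) applied to a fact table
# attribute, with a dimension attribute as the grouping column.
rows = con.execute("""
    SELECT P.Division, SUM(S.UnitsSold)
    FROM Sales S, Product P
    WHERE S.Prod_ID = P.Code
    GROUP BY P.Division
    ORDER BY P.Division
""").fetchall()
print(rows)  # -> [('Hardware', 18), ('Software', 7)]
```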
   A multiple fact tables (MFT) schema is a variation of the star schema, in which
there are several fact tables, necessary to represent unrelated facts, facts of different
granularity, or even to improve performance [10]. A major use of MFT schemas is to
implement a DW through a set of distributed subject-oriented DMs [8, 10], preferably
related through a set of conformed dimensions [6], i.e. dimensions that have the same
meaning at every fact table to which they can be joined. In such an architecture, a
major responsibility of the central DW design team is to establish, publish and
enforce the conformed dimensions. However, these efforts of the design team are not
enough to guarantee that end users can easily combine facts coming from more
than one DM. Indeed, the straightforward join of facts and dimensions in MFT
schemas imposes a number of restrictions that cannot always be


Y. Manolopoulos and P. Návrat (Eds): ADBIS 2002, pp. 41-51, 2002.

observed; otherwise, one risks producing incorrect results. Most users do not have the
technical skills to realise the subtleties involved and their implications in terms of
query formulation. Therefore, for most users, queries involving MFT schemas are
more easily handled through appropriate interfaces or specific applications that hide
all the difficulties involved.
   In this paper, we propose MF-Retarget, a query retargeting mechanism that handles
MFT schemas and which is additionally aggregate aware. Indeed, precomputed
aggregation is one of the most efficient performance strategies to solve queries in DW
environments [8]. The retargeting service provides users with transparency from both
the aggregate retargeting perspective (aggregate unawareness) and the multiple fact
table schema complexity perspective, freeing users from query formulation
idiosyncrasies. The algorithm is generic enough to work properly regardless of the
number of fact tables involved in the query.
   The remainder of this paper is structured as follows. Section 2 presents related
work on the use of aggregates. The retargeting algorithm is described in Section 3,
and Section 4 presents some initial performance tests. Conclusions and future work
are addressed in Section 5.


2    Related Work

2.1 Computation of Aggregates

In a DW, users are most frequently interested in some level of summarisation of the
original data. One of the most efficient strategies for handling this problem is the use
of pre-computed aggregates for the combination of dimensions/dimension attributes
providing the greatest benefit for answering queries (e.g. frequent or expensive
queries) [4, 8, 15, 16].
   The computation of aggregates can be dynamic or static. In the former case, it is up
to the OLAP tool or database engine to decide which aggregates are “beneficial”, a
concept that varies from tool to tool. Works such as [1, 2, 4, 5, 7, 14] address dynamic
computation of aggregates. These approaches differ from the static context in that not
only the cost of executing the query is considered, but also the maintenance/
reorganisation costs that arise as queries are processed [2, 5]. In
the static context, aggregates are created off-line, and therefore maintenance/
reorganisation costs are not that critical. It should be clear that dynamic and static
aggregate computations are complementary mechanisms. The first addresses
performance tuning from a technical perspective. The latter, addressed in this paper, is
essential from a corporate point of view.
   Organisationally, static aggregate computation is fundamental because aggregates
are created based on corporate decisional requirements, prioritising types of analysis
or types of users. Of course, decisional support requirements vary over time, so it is
fundamental that the DBA monitors the use of the analytical database in order to
revise the necessity of existing and/or new aggregates.
   Design alternatives for representing aggregates are extensively discussed in
pragmatic literature such as [6, 10]. Storing each aggregate in its own fact table
presents many advantages in terms of ease of manipulation, maintenance,
performance and storage requirements. Aggregation also leads to smaller
representations of dimensions, commonly referred to as shrunken dimensions.
Aggregates should, whenever possible, refer to shrunken dimensions, instead of
original dimensions. A shrunken dimension is commonly stored in a separate table
with its own primary key.
   User tools or applications should not reference the aggregate to be used in SQL
queries. First, it must be possible to include/remove aggregates without affecting
users or existing applications. Second, users should not be in charge of improving
performance by selecting the appropriate aggregate.


2.2 Aggregate Retargeting Services

There are three major options where query-retargeting services can be located in the
DW architecture: the desktop, the OLAP server or the database engine [16]. The
query retargeting service can also be located in between these layers, in case no
access to the DBMS engine/OLAP tool source code is provided. Most works in the
literature (e.g. [1, 2, 3, 4, 5, 7, 14]) focus on dynamic computation of aggregates,
considering strategies that are embedded in query processors, such that the retargeting
service can completely change the query execution plan. Dynamic aggregation also
considers a specific moment of user analysis (e.g. a sequence of related drills), and
not the organisational requirements as a whole.
    Kimball et al. [6] sketch a query-retargeting algorithm for statically pre-computed
aggregates, which can be inserted as a layer between the front-end tool and the
OLAP server/DBMS engine. The algorithm is based on the concept of a “family of
schemas”, composed of one base fact table and all of its related aggregate tables. One
of the advantages of this algorithm is that it requires very little metadata, basically the
size of each fact table and the available attributes of each aggregate. In this paper, we
extend this algorithm to deal with MFT schemas. Such an extension is useful for DW
architectures implemented as a set of subject-oriented DMs, in which users wish to
perform both separate and integrated analyses.


3     MF-Retarget

The striking feature of MF-Retarget is its ability to handle MFT schemas with the use
of aggregates, while providing total transparency for users. In the general case,
joining several fact tables requires that each individual table first be summarised to
the same summarisation level (exactly the same dimensions) before the join is
performed. However, most users do not have the technical skills to realise the
problems involved, nor the requirements in terms of query formulation. See [11] for a
deeper discussion of the subtleties involved. Additionally, the service should benefit
from the use of aggregates as a query performance tuning mechanism. Hence,
transparency in the context of MFT schemas must have a twofold meaning:
a) aggregate unawareness, and b) MFT join complexity unawareness.

   MF-Retarget is a retargeting service intended to lie between the front-end tool and
the DBMS, which accepts as input a query written by a user through a proper user
interface (e.g. a graphical one intended for naive users, a specific application). The
algorithm assumes that:
   − Users are unaware of MFT joining complexities, and always write a single
      query in terms of desired facts and dimensions. The retargeting service is
      responsible for rewriting the query to produce correct results, assuming as a
      premise that it is always necessary to bring each fact table to the same
      summarisation level before joining them.
   − Users are unaware of the existence of aggregates, and always formulate the
      query in terms of the original base tables and dimensions. The retargeting
      service is responsible for rewriting the query in terms of aggregates, if possible.
   − Retargeting queries involving a single fact table is a special case of MFT
      schemas, and therefore, the algorithm should provide good results in both cases.
   The remainder of this section presents an illustration scenario and describes the
algorithm. Further details on the algorithm, the required metadata and the
MF-Retarget prototype can be found in [11].


3.1 Algorithm Illustration

To illustrate the functioning of the algorithm, let us consider the example depicted in
Figure 1, a simplification of the MFT schema proposed in the APB-1 OLAP Council
benchmark [9]. For each fact table the fields prefixed by * compose its primary key.
In dimension tables, only one field, the lower one in the hierarchy, composes its
primary key (also prefixed by *). The branches show the referential integrity from a
fact or aggregate table for each of its dimensions. The MFT schema of Figure 1(a)
shows two fact tables (Sales and Inventory), related by three conformed dimensions:
Customer, TimeDim and Product. Sales have an additional dimension, namely
Channel. Also, Figure 1(a) shows some possible aggregates for this schema. In the
picture, grey boxes correspond to shrunken dimensions, i.e. hierarchic dimensions
without one or more lower-level fields.
   For example, consider a user who wishes to compare the quarterly Sales of
product divisions with the corresponding status of the Inventory. This query cannot
be answered simply by joining facts of distinct tables, because these facts represent
information on different granularity, and therefore, they should be brought to the
same summarisation level before they can be joined, otherwise inaccurate results will
be produced. To free the user from the difficulties involved in MFT schemas, MF-
Retarget assumes the user states a single query in terms of facts, dimensions and
desired aggregation level (the input shown in Figure 2, for the example considered).
The retargeting mechanism has then two goals: to correct the query, and try to make it
more efficient with the use of aggregates.
   Considering the aggregates illustrated in Figure 1(a), the algorithm realises that
Aggregate4 is the best candidate to answer the question, because it contains all the
necessary data, is the smallest one, and already joins the Sales and Inventory tables
(in that order). In the absence of Aggregate4, Aggregate1 and Aggregate2 will be used.
If the algorithm does not find any aggregate that can answer the query in a more
[Figure 1(a) shows the MFT schema and possible aggregates. The fact table Sales
(*Cust_ID, *Prod_ID, *Chan_ID, *Time_ID; UnitsSold, DollarSales) and the fact
table Inventory (*Cust_ID, *Prod_ID, *Time_ID; StockUnits) share the conformed
dimensions Customer (Retailer, *Store), TimeDim (Year, Quarter, *Month) and
Product (Division, Line, Family, Group, Class, *Code); Sales additionally references
the Channel dimension (*Base). The aggregates are Aggregate1 (*Cust_ID,
*Prod_ID, *Chan_ID, *Time_ID; UnitsSold, DollarSales), Aggregate2 (*Cust_ID,
*Prod_ID, *Time_ID; StockUnits), Aggregate3 (*Prod_ID, *Time_ID; StockUnits)
and Aggregate4 (*Prod_ID, *Time_ID; UnitsSold, DollarSales, StockUnits), which
reference the shrunken dimensions (grey boxes) Sh_Customer (*Retailer),
Sh_TimeDim (Year, *Quarter) and Sh_Product (Division, *Line). Figure 1(b) shows
the schema derivation graph: Sales derives Aggregate1; Inventory derives Aggregate2
and Aggregate3; Aggregate1 and Aggregate2 derive Aggregate4.]

              Fig. 1. MFT schema and possible aggregates, and schema derivation graph


efficient way, at least it transforms the query to produce correct results. Figure 3
shows the results from the algorithm for these three situations.
   It should be pointed out that the best aggregate is not always the one that already
joins distinct fact tables. Indeed, in case smaller individual aggregates exist, the cost
of joining them can be smaller than the cost of summarising a much bigger joined pre-
computed aggregate.


3.2 The Algorithm

The algorithm assumes that users have to inform only the tables (fact/dimensions), the
grouping columns (which are the same ones listed in the select clause), the
summarisation functions applied to the measurements, and possibly additional
restrictions in the where clause. It considers the following restrictions to input
queries: a) monoblock queries (select from where group by); b) only transitive
aggregation functions are used; c) all dimensions listed in the from clause apply to all
fact tables listed.
   For the algorithm, the relationship between schemas is represented by a directed
acyclic graph G(V, E), where V represents a set of star schemas and E corresponds to
a set of derivation relationships between pairs of schemas. The edges of E form
derivation paths, meaning that the schema at the end of any path can be derived by
aggregating the schema at the start of that path. The use of graph structures for
representing relationships between aggregates is well known [4, 13].
Figure 1(b) presents the derivation graph for the example of Figure 1(a). We assume
that only transitive aggregation functions (i.e. SUM, MAX and MIN) are used in both
the derivation relationships and queries.

          “The units sold and units in stock, per quarter and product division”
Select P.Division Division, T.Quarter Quarter, SUM(S.UnitsSold) UnitsSold,
SUM(I.StockUnits) StockUnits From TimeDim T, Product P, Sales S, Inventory I
Where T.Month=S.Time_ID and P.Code=S.Prod_ID and T.Month=I.Time_ID
   and P.Code=I.Prod_ID Group by P.Division, T.Quarter

                       Fig. 2. Input SQL query from a naive DW user

a) considering the existence of Aggregate4:
Select P.Division Division, T.Quarter Quarter, SUM (UnitsSold) UnitsSold,
SUM (StockUnits) StockUnits From Sh_TimeDim T, Sh_Product P, Aggregate4 A4
Where T.Quarter=A4.Time_ID and P.Line=A4.Prod_ID Group by P.Division, T.Quarter

b) in the absence of Aggregate4:
Create view V1 (Division, Quarter, UnitsSold) as Select P.Division, T.Quarter,
SUM (S.UnitsSold) From Sh_TimeDim T, Product P, Aggregate1 A1
Where T.Quarter=A1.Time_ID and P.Code=A1.Prod_ID Group by P.Division, T.Quarter

Create view V2 (Division, Quarter, StockUnits) as Select P.Division, T.Quarter,
SUM (I.StockUnits) From Sh_TimeDim T, Product P, Aggregate2 A2
Where T.Quarter=A2.Time_ID and P.Code=A2.Prod_ID Group by P.Division, T.Quarter
Select V1.Division Division, V1.Quarter Quarter, UnitsSold, StockUnits
From V1, V2 Where V1.Division = V2.Division and V1.Quarter = V2.Quarter

c) if no aggregates are found:
Create view V1 (Division, Quarter, UnitsSold) as
Select P.Division, T.Quarter, SUM (S.UnitsSold) From TimeDim T, Product P, Sales S
Where T.Month=S.Time_ID and P.Code=S.Prod_ID Group by P.Division, T.Quarter

Create view V2 (Division, Quarter, StockUnits) as Select P.Division, T.Quarter,
SUM (I.StockUnits) From TimeDim T, Product P, Inventory I
Where T.Month=I.Time_ID and P.Code= I.Prod_ID Group by P.Division, T.Quarter

Select V1.Division Division, V1.Quarter Quarter, UnitsSold, StockUnits
From V1, V2 Where V1.Division = V2.Division and V1.Quarter = V2.Quarter

                          Fig. 3. Possible outputs of the algorithm

   The algorithm is divided into four steps, which for clarity are individually
presented and illustrated using the example of Section 3.1:
   1. Divide the original query into component queries;
   2. For each component query, select candidate schema(s) for answering it;
   3. Select the best candidates;
   4. Rewrite the query.

Step 1: Division into Component Queries.
For each fact table Fi listed in the from clause of the original query Q, a component
query Ci (i>0) is created, according to the following algorithm:
1. For each fact table Fi listed in the from clause of Q, create a component query
    Ci such that:
    1.1. Ci from clause := Fi and all dimensions listed in the from clause of Q;
    1.2. Ci where clause := all join conditions of Q necessary to relate Fi to the
         dimensions, together with any additional conditions involving these
         dimensions or Fi;
    1.3. Ci group by clause := all attributes used in the group by clause of Q;
    1.4. Ci select clause := all attributes used in the group by clause of Q, in
         addition to all aggregation function(s) applied to Fi attributes.
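As a sketch, Step 1 can be rendered over a structural representation of the input query. The dict layout and names below are our own invention for illustration; MF-Retarget itself works on SQL text:

```python
# Hypothetical structural form of the monoblock query of Figure 2.
query = {
    "facts": ["Sales", "Inventory"],
    "dims": ["TimeDim", "Product"],
    "group_by": ["P.Division", "T.Quarter"],
    "measures": {"Sales": ["SUM(S.UnitsSold)"],
                 "Inventory": ["SUM(I.StockUnits)"]},
    "joins": {"Sales": ["T.Month=S.Time_ID", "P.Code=S.Prod_ID"],
              "Inventory": ["T.Month=I.Time_ID", "P.Code=I.Prod_ID"]},
}

def split_into_components(q):
    """Step 1: create one component query per fact table, keeping all the
    dimensions, the join conditions that involve that fact table, the full
    group-by list, and only that table's aggregation functions."""
    components = []
    for fact in q["facts"]:
        components.append({
            "from": [fact] + q["dims"],
            "where": q["joins"][fact],
            "group_by": q["group_by"],
            "select": q["group_by"] + q["measures"][fact],
        })
    return components

c1, c2 = split_into_components(query)
print(c1["from"])    # -> ['Sales', 'TimeDim', 'Product']
print(c2["select"])  # -> ['P.Division', 'T.Quarter', 'SUM(I.StockUnits)']
```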

Notice that a query referring to a single fact table is treated as a special case of
queries involving several fact tables. In that case, Step 1 produces a single component query
that is equal to the original query Q. Figure 4 shows the component queries created
for the query input illustrated in Figure 2.

C1 -> Select P.Division, T.Quarter, SUM (S.UnitsSold) UnitsSold
From TimeDim T, Product P, Sales S
Where T.Month=S.Time_ID and P.Code=S.Prod_ID Group by P.Division, T.Quarter
C2 -> Select P.Division, T.Quarter, SUM (I.StockUnits) StockUnits
From TimeDim T, Product P, Inventory I
Where T.Month=I.Time_ID and P.Code=I.Prod_ID Group by P.Division, T.Quarter

                      Fig. 4. Component queries for input of Figure 2


Step 2: Candidates for Component Queries
This step generates for each component query Ci (i>0) resulting from Step 1 the
respective candidate set CSi. Each candidate belonging to CSi is a schema (base or
aggregate) that answers Ci.
2    For each component query Ci, generated in Step 1:
     2.1. Let n:= the node that corresponds to the base schema of Ci; mark n as
          “visited”; let CSi := n;
     2.2. Using a depth-first traversal, examine all schemas derived from n until all
          nodes that can be reached from it are marked as “visited”;
         2.2.1. Let n := next node; mark n as “visited”;
         2.2.2. If all query attributes (select and where clauses) of Ci belong to
                schema n and each aggregation function of the select clause of Ci is
                exactly the same as one used in the fact table of schema n
                 Then CSi := CSi ∪ n;
                 Else mark all nodes that can be reached from n as “visited”.

   Each time this step is executed for a component query, the graph is traversed using
a depth-first algorithm starting from the corresponding base schema. When the
algorithm detects that a schema cannot answer the component query, all schemas at
the end of a derivation path starting from it are disregarded. Each CSi element is a
valid candidate to replace Fi in the from clause of Q. Therefore, every tuple (e1, …,
en), where e1 ∈ CS1, …, en ∈ CSn, is a valid combination for the rewrite of Q.
Considering the graph depicted in Figure 1(b), and the component queries of Figure 4:
   − CS1 = {Sales, Aggregate1, Aggregate4} for component query C1;
   − CS2 = {Inventory, Aggregate2, Aggregate3, Aggregate4} for C2.
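Under the same simplified representation, Step 2 can be sketched as a pruned depth-first traversal of the derivation graph. The attribute sets below are abridged from Figure 1 to the fields relevant to the running example, and the aggregation-function compatibility check of step 2.2.2 is omitted for brevity:

```python
# Derivation graph of Figure 1(b): an edge u -> v means v derives from u.
GRAPH = {
    "Sales": ["Aggregate1"], "Inventory": ["Aggregate2", "Aggregate3"],
    "Aggregate1": ["Aggregate4"], "Aggregate2": ["Aggregate4"],
    "Aggregate3": [], "Aggregate4": [],
}
# Attributes reachable from each schema (abridged for illustration).
ATTRS = {
    "Sales": {"Division", "Quarter", "Month", "UnitsSold", "DollarSales"},
    "Aggregate1": {"Division", "Quarter", "UnitsSold", "DollarSales"},
    "Aggregate4": {"Division", "Quarter", "UnitsSold", "DollarSales", "StockUnits"},
    "Inventory": {"Division", "Quarter", "Month", "StockUnits"},
    "Aggregate2": {"Division", "Quarter", "StockUnits"},
    "Aggregate3": {"Division", "Quarter", "Month", "StockUnits"},
}

def candidates(base, needed):
    """Step 2: DFS from the component query's base schema; a schema that
    cannot answer the query prunes its whole derivation subtree."""
    cs, visited, stack = [], set(), [base]
    while stack:
        n = stack.pop()
        if n in visited:
            continue
        visited.add(n)
        if needed <= ATTRS[n]:       # schema holds all required attributes
            cs.append(n)
            stack.extend(GRAPH[n])   # keep exploring schemas derived from n
        else:                        # prune everything derivable from n
            sub = list(GRAPH[n])
            while sub:
                m = sub.pop()
                if m not in visited:
                    visited.add(m)
                    sub.extend(GRAPH[m])
    return cs

cs1 = candidates("Sales", {"Division", "Quarter", "UnitsSold"})
cs2 = candidates("Inventory", {"Division", "Quarter", "StockUnits"})
print(sorted(cs1))  # -> ['Aggregate1', 'Aggregate4', 'Sales']
print(sorted(cs2))  # -> ['Aggregate2', 'Aggregate3', 'Aggregate4', 'Inventory']
```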

Step 3: Selection of Best Candidates
Let T be the set of tuples (e1, …, en), where e1 ∈ CS1, …, en ∈ CSn (n>0), representing
the Cartesian product of the candidate sets CS1 × … × CSn, and let t be a tuple of T.
The present version of the algorithm chooses the best candidate based on the concept
of accumulated size. The accumulated size of t(e1, …, en), AS(t), is a function that
returns the number of records that must be handled if the query were rewritten using
t. When summing the number of records, AS(t) computes only once the size of a given
table, in case it is included in more than one candidate set CSi. Thus, if Aggregate4 is
chosen only its records need to be processed, and only once. In all other cases, records
from the different fact tables in t are processed, considering each table only once.
   This may suggest that, in a multi-fact query, the best t will always be the one where
e1 = … = en. However, this is not true: the cost of processing more records
(I/O cost) has a stronger impact than the cost of joining tables. Notice that this step
can be improved in many ways, by varying the cost function used to prioritise the
candidates for query rewrite. An immediate improvement of this function is to
combine index information with table size.

3    Consider all CSi sets generated in Step 2 and T, the Cartesian product of the
     candidate sets CS1 × … × CSn:
     3.1. t' := the tuple t(e1, …, en) ∈ T with the smallest accumulated size AS(t),
          considering all t(e1, …, en) ∈ T, where e1 ∈ CS1, …, en ∈ CSn.
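Step 3 can be sketched as follows. The record counts below are invented for illustration (MF-Retarget reads the real sizes from its metadata) and are chosen to show the point made above: although AS(t) counts a shared joined aggregate only once, a pair of small individual aggregates can still be cheaper:

```python
from itertools import product

# Record counts, invented for illustration.
SIZE = {"Sales": 1_239_300, "Inventory": 270_000,
        "Aggregate1": 210_000, "Aggregate2": 60_000, "Aggregate4": 500_000}

def accumulated_size(t):
    """AS(t): records to handle for rewrite tuple t, counting a table only
    once even when it answers several component queries."""
    return sum(SIZE[e] for e in set(t))

cs1 = ["Sales", "Aggregate1", "Aggregate4"]      # candidates for C1
cs2 = ["Inventory", "Aggregate2", "Aggregate4"]  # candidates for C2 (abridged)

# Pick the tuple of T = CS1 x CS2 with the smallest accumulated size.
best = min(product(cs1, cs2), key=accumulated_size)
print(best, accumulated_size(best))  # -> ('Aggregate1', 'Aggregate2') 270000
print(accumulated_size(("Aggregate4", "Aggregate4")))  # -> 500000 (counted once)
```

With these sizes, joining the two small individual aggregates beats summarising the bigger pre-joined Aggregate4, exactly the situation discussed in Section 3.1.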

Step 4: Query Reformulation
Once the best candidate for each component query is determined, the query is
rewritten. If the set of best candidates resulting from Step 3 has a single element, i.e.
a common aggregate for all component queries, a single query is written using such
aggregate and respective shrunken dimensions. This is the case for our example,
where Figure 3(a) displays the rewritten query. Otherwise, the query is rewritten in
terms of views that summarise the best aggregates individually, and then join them
(e.g. as in Figure 3(b) and (c)). If there is a common best candidate that answers more
than one component query, but not all of them, a single view is created for that set of
component queries. This trivial algorithm is not presented due to space limitations.
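The decision above can be sketched as follows; this is a simplified rendering of our own, since the real rewriting also substitutes shrunken dimensions and adjusts join keys, as in Figure 3:

```python
def rewrite(best):
    """Step 4 (simplified): emit a single query if one common candidate
    answers all component queries; otherwise, emit one summarising view per
    candidate and then join the views on the common grouping columns."""
    if len(set(best)) == 1:
        return [f"single query over {best[0]}"]
    plan = [f"view V{i + 1} summarising {e}" for i, e in enumerate(best)]
    plan.append("join " + ", ".join(f"V{i + 1}" for i in range(len(best))) +
                " on the common grouping columns")
    return plan

print(rewrite(("Aggregate4", "Aggregate4")))
# -> ['single query over Aggregate4']
print(rewrite(("Aggregate1", "Aggregate2")))
# -> ['view V1 summarising Aggregate1', 'view V2 summarising Aggregate2',
#     'join V1, V2 on the common grouping columns']
```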


4    Tests
Initial performance tests were performed based on the MFT schema presented in the
APB-1 OLAP Benchmark [9], which comprises 4 fact tables. For the tests, we used
Inventory, Sales and corresponding dimensions, as depicted in Figure 1(a). The
aggregates were defined to experiment performance under different levels of
aggregation (compression factor). We did not use the semantics of the aggregates, nor
user requirements expressed in the benchmark (e.g. queries). We also disregarded the
number of records of the resulting database for aggregate selection. Two tests were
executed, referred to as Test1 and Test2, described in the remainder of this section.

4.1 Test1
The goal of Test1 was to verify whether the algorithm performed well and correctly,
considering both star and MFT schemas. The APB-1 program was executed with
parameters 10, 0.1 and 10, which resulted in a Sales table with 1,239,300 records,
whereas Inventory comprised 270,000 records. We executed five queries, three of
them involving a single fact table (Sales), and two of them the join of both fact tables.
For each query, we calculated the number of records of the resulting table, and the
processing time considering all possible alternatives.

   It was possible to verify that the algorithm always chose the aggregate with the
smallest processing time, regardless of whether the query involved a single fact table
or a join of multiple fact tables. Proportionally, there was a significant gain in the vast
majority of cases, but absolute gains were not always significant. The magnitude of
performance gain seems to be a function of the (fact) table size, aggregate
compression factor, and output table size.

4.2 Test2
Test2 was executed running APB-1 with parameters 10, 1 and 10. The derivation
graph for this test is depicted in Figure 5, and Table 1 describes the properties of the
schemas: number of records, compression factor (CF) with regard to both the base
fact table and its deriving schema(s), and the difference between the derived/
deriving schemas in terms of dimensions.

[Figure 5 depicts the derivation graph used in Test2: Inventory derives I1, which in
turn derives I3; Sales derives S1, which in turn derives S3; I1 and S1 jointly derive
I2S2.]

                   Fig. 5. Derivation graph used in Test2
   We executed a single query that
involved facts from both Sales and Inventory tables, and which could be answered by
(the combination of) all aggregates. The goal was to compare the respective absolute
processing times. Table 2 displays in the first row the elapsed time measured for each
execution. Subsequent rows show the gains in using an aggregate with regard to
bigger alternatives. For instance, the use of aggregate I2S2 represents a gain of 52%
with respect to the join of I1 and S1, calculated as (time(I1 and S1) –
time(I2S2))/time(I1 and S1), and 96% with regard to the join of Inventory and
Sales. It is possible to verify that the gains in absolute times are very significant this
time. The use of the accumulated size AS(t) as the main criterion to prioritise
aggregate candidates seems simple but efficient, although it can still be
improved in many ways, particularly by considering indexing.
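The gain percentages follow directly from the elapsed times reported in Table 2, using the formula stated above; for instance:

```python
def seconds(hms):
    """Parse an 'hours:min:sec' elapsed time into seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# Elapsed times copied from Table 2.
TIMES = {"Inventory and Sales": "20:35:43", "I1 and S1": "1:27:04",
         "I2S2": "0:42:09", "I3 and S3": "0:15:22"}

def gain(faster, slower):
    """Gain of the faster alternative over the slower one, computed as
    (time(slower) - time(faster)) / time(slower), in percent."""
    t_f, t_s = seconds(TIMES[faster]), seconds(TIMES[slower])
    return round(100 * (t_s - t_f) / t_s)

print(gain("I2S2", "I1 and S1"))       # -> 52
print(gain("I3 and S3", "I2S2"))       # -> 64
print(gain("I3 and S3", "I1 and S1"))  # -> 82
```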


5     Conclusions

In this work we presented MF-Retarget, a retargeting mechanism that deals with both
MFT schemas and statically computed aggregates. The algorithm provides two types
of transparency:
     a) aggregate unawareness, and
     b) users are spared from the complexities of queries in MFT schemas.

   This retargeting service is intended to be implemented as a layer between the user
front-end tool and the DBMS engine. Thus, it can complement the gains already
provided by OLAP tools/DBMS engines in the context of dynamic computation of
aggregates. Further details on the implementation can be found in [11].

                         Tab. 1. Inventory/Sales derivation graph description

     Table or        Records       CF (base     Deriving     CF (deriving   Shrunken   Eliminated
     aggregate                     schema)      schema       schema)        dims.      dims.
     Sales (S)       13,122,000
     S1               2,400,948    18.3 %       Sales        18.3 %         Yes        Yes
     S3                 614,520     4.7 %       S1           25.6 %         Yes
     Inventory (I)   12,396,150
     I1               2,496,762    20.1 %       Inventory    20.1 %         Yes
     I3                 631,800     5.1 %       I1           25.3 %         Yes
     I2S2             2,400,948    18.3 % (S)   S1 and I1    96.2 % (I1)
                                   19.4 % (I)                100 % (S1)


                                Tab. 2. Results with larger tables

     Query1: 614,520 resulting records
                                   Inventory and Sales   I1 and S1   I2S2        I3 and S3
     Time (hours:min:sec)          20:35:43              1:27:04     0:42:09     0:15:22
     Gain over Inventory/Sales (%)                       93          96          99
     Gain over I1 and S1 (%)                                         52          82
     Gain over I2S2 (%)                                                          64
     AS(t)                         25,518,150            4,897,710   2,400,948   1,246,320



   Preliminary tests confirmed that the algorithm always provided the best response
time. Proportional gains are always significant, but absolute gains increase with
bigger fact tables. Additional tests are required to determine precise directives for the
construction of aggregates in MFT schemas, and to establish under which
circumstances the processing gains are significant. It is also important to refine our
criteria for selecting the best candidate, and to use indexing information in addition
to the number of table records for aggregate selection.
   Future work also includes, among other topics, a better definition of the cost
functions for prioritising candidate aggregates, the use of indexes in the cost function,
the integration of the retargeting mechanism into a DW architecture, support for
monitoring aggregates and recommending their reorganisation, and the use of the
proposed algorithm in the context of dynamic aggregate computation.


Acknowledgements

This work was partially financed by FAPERGS, Brazil.


References
1. Baralis, E., Paraboschi, S., Teniente, E. Materialized Views Selection in a Multidimensional
   Database. Proceedings of VLDB'97 (1997) 156-165.
2. Chaudhuri, S., Shim, K. An Overview of Cost-Based Optimization of Queries with
   Aggregates. Bulletin of the TCDE (IEEE), v. 18, n. 3 (1995) 3-9.
3. Gray, J., Chaudhuri, S., et al. Data Cube: a Relational Aggregation Operator Generalizing
   Group-by, Cross-tab and Subtotals. Data Mining and Knowledge Discovery, v. 1, n. 1
   (1997) 29-53.
4. Gupta, A., Harinarayan, V., Quass, D. Aggregate-query Processing in Data Warehousing
   Environments. Proceedings of VLDB'95 (1995) 358-369.
5. Gupta, H., Mumick, I. Selection of Views to Materialize under a Maintenance Cost
   Constraint. Proceedings of ICDT (1999) 453-470.
6. Kimball, R., et al. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing,
   Developing, and Deploying Data Warehouses. John Wiley & Sons (1998).
7. Kotidis, Y., Roussopoulos, N. DynaMat: A Dynamic View Management System for Data
   Warehouses. Proceedings of ACM SIGMOD 1999 (1999) 371-382.
8. Meredith, M., Khader, A. Divide and Aggregate: Designing Large Warehouses. Database
   Programming & Design, June (1996) 24-30.
9. OLAP Council. APB-1 OLAP Benchmark (Release II). Nov. 1998. Available at
   http://www.olapcouncil.org/research/bmarkco.htm (captured May 2000).
10. Poe, V., Klauer, P., Brobst, S. Building a Data Warehouse for Decision Support, 2nd
   edition. Prentice Hall (1998).
11. Santos, K. MF-Retarget: a Multiple-Fact Table Aggregate Retargeting Mechanism. MSc
   Dissertation, Faculdade de Informática, PUCRS, Brazil (2000) (in Portuguese).
12. Sapia, C. On Modeling and Predicting Query Behavior in OLAP Systems. Proceedings of
   DMDW'99 (1999).
13. Sarawagi, S., Agrawal, R., Gupta, A. On Computing the Data Cube. Technical Report,
   IBM Almaden Research Center, San Jose, CA (1996).
14. Srivastava, D., et al. Answering Queries with Aggregation Using Views. Proceedings of
   VLDB'96 (1996) 318-329.
15. Wedekind, H., et al. Preaggregation in Multidimensional Data Warehouse Environments.
   Proceedings of ISDSS'97 (1997) 581-59.
16. Winter, R. Be Aggregate Aware. Intelligent Enterprise Magazine, v. 2, n. 13 (1999).

Aggreagate awareness

  • 1. MF-Retarget: Aggregate Awareness in Multiple Fact Table Schema Data Warehouses Karin Becker, Duncan Dubugras Ruiz, and Kellyne Santos Faculdade de Informática – Pontifícia Universidade Católica do Rio Grande do Sul http://www.inf.pucrs.br/{~kbecker | ~duncan} {kbecker, duncan} @inf.pucrs.br, kellyne@ufs.br Abstract. Performance is a critical issue in Data Warehouse systems (DWs), due to the large amounts of data manipulated, and the type of analysis performed. A common technique used to improve performance is the use of pre-computed aggregate data, but the use of aggregates must be transparent for DW users. In this work, we present MF-Retarget, a query retargeting mechanism that deals with both conventional star schemas and multiple fact table (MFT) schemas. This type of multidimensional schema is often used to implement a DW using distinct, but interrelated Data Marts. The paper presents the retargeting algorithm and initial performance tests. 1 Introduction Data warehouses (DW) are analytical databases aimed at providing intuitive access to information useful for decision-making processes. A Data Mart (DM), often referred to as a subject-oriented DW, represents a subset of the DW, comprised of relevant data for a particular business function (e.g. marketing, sales). DW/DM handle large volumes of data, and they are often designed using a star schema, which contains relatively few tables and well-defined join paths. On-line Analytical Processing (OLAP) systems are the predominant front-end tools used in DW environments, which typically explore this multidimensional data structure [3, 13]. OLAP operations (e.g. drill down, roll up, slice and dice) typically result in SQL queries in which aggregation functions (e.g. SUM, COUNT) are applied to fact table attributes, using dimension table attributes as grouping columns (group by clause). 
A multiple fact tables (MFT) schema is a variation of the star schema, in which there are several fact tables, necessary to represent unrelated facts, facts of different granularity, or even to improve performance [10]. A major use of MFT schemas is to implement a DW through a set of distributed subject-oriented DMs [8, 10], preferably related through a set of conformed dimensions [6], i.e. dimensions that have the same meaning at every possible fact table to which it can be joined. In such architecture, a major responsibility of the central DW design team is to establish, publish and enforce the conformed dimensions. However, these efforts of the design team are not enough to guarantee the ease combination, by end users, of facts coming from more than one DM. Indeed, the straightforward join of facts and dimensions in MFT schemas imposes a number of restrictions, which are not always possible to be Y. Manolopoulos and P. Návrat (Eds): ADBIS 2002, pp. 41-51, 2002.
  • 2. 42 Karin Becker et al. observed, otherwise, one risks to produce incorrect results. Most users do not have the technical skills for realising the involved subtleties and their implications in terms of query formulation. Therefore, for most users, queries involving MFT schemas are more easily handled through appropriate interfaces or specific applications that hide from them all difficulties involved. In this paper, we propose MF-Retarget, a query retargeting mechanism that handles MFT schemas and which is additionally aggregate aware. Indeed, precomputed aggregation is one of the most efficient performance strategies to solve queries in DW environments [8]. The retargeting service provides users with transparency from both an aggregate retargeting perspective (aggregate unawareness) and multiple fact tables’ schema complexity perspective, freeing users from query formulation idiosyncrasies. The algorithm is generic to work properly regardless the number of fact tables involved in the query. The remainder of this paper is structured as follows. Section 2 presents related work on the use of aggregates. The retargeting algorithm is described in Section 3, and Section 4 presents some initial performance tests. Conclusions and future work are addressed in Section 5. 2 Related Work 2.1 Computation of Aggregates In a DW, most frequently users are interested in some level of summarisation of original data. One of the most efficient strategies for handling this problem is the use of pre-computed aggregates for the combination of dimensions/dimension attributes providing the greatest benefit for answering queries (e.g. frequent or expensive queries) [4, 8,15, 16]. The computation of aggregates can be dynamic or static. In the former case, it is up to the OLAP tool or database engine to decide which aggregates are “beneficial”, a concept that varies from tool to tool. Works such as [1, 2, 4, 5, 7, 14] address dynamic computation of aggregates. 
These approaches are different from the static context in that not only the cost of executing the query is considered, but also maintenance/reorganisation costs which takes place as query is processed [2, 5]. In the static context, aggregates are created off-line, and therefore maintenance/ reorganisation costs are not that critical. It should be clear that dynamic and static aggregate computations are complementary mechanisms. The first addresses performance tuning from a technical perspective. The latter, addressed in this paper, is essential from a corporate point of view. Organisationally, static aggregate computation is fundamental because aggregates are created based on corporate decisional requirements, prioritising types of analysis or types of users. Of course, decisional support requirements vary overtime, so it is fundamental that the DBA monitors the use of the analytical database in order to revise the necessity of existing and/or new aggregates. Design alternatives for representing aggregates are extensively discussed in pragmatic literature such as [6, 10]. Storing each aggregate in its own fact table
  • 3. MF-Retarget: Aggregate Awareness in Multiple Fact Table Schema Data Warehouses 43 presents many advantages in terms of easiness of manipulation, maintenance, performance and storage requirements. Aggregation also leads to smaller representations of dimensions, commonly referred to as shrunken dimensions. Aggregates should, whenever possible, refer to shrunken dimensions, instead of original dimensions. A shrunken dimension is commonly stored in a separate table with its own primary key. User tools or applications should not reference the aggregate to be used in SQL queries. First, it must be possible to include/remove aggregates without affecting users or existing applications. Second, users cannot be in charge of performance improvement by the selection of the appropriate aggregate. 2.2 Aggregate Retargeting Services There are three major options where query-retargeting services can be located in the DW architecture: the desktop, the OLAP server or the database engine [16]. The query retargeting service can also be located in between these layers, in case no access to the DBMS engine/OLAP tool source code is provided. Most works in the literature (e.g. [1, 2, 3, 4, 5, 7, 14]) focus on dynamic computation of aggregates, considering strategies that are embedded in query processors, such that the retargeting service can change completely the query execution plan. Dynamic aggregation also considers a specific moment of user analysis (e.g. a sequence of related drills), and not the organisational requirements as a whole. Kimball et al. [6] sketch a query-retargeting algorithm for statically pre-computed aggregates, which could be inserted as a layer between front-end toll and OLAP/server DBMS engine. The algorithm is based on the concept of “family of schemas”, composed of one base fact table and all of its related aggregate tables. 
One of the advantages of this algorithm is that it requires very little metadata: basically the size of each fact table and the available attributes of each aggregate. In this paper, we extend this algorithm to deal with MFT schemas. Such an extension is useful for DW architectures implemented as a set of subject-oriented DMs, in which users wish to perform both separate and integrated analyses.

3 MF-Retarget

The striking feature of MF-Retarget is its ability to handle MFT schemas with the use of aggregates, while providing total transparency for users. Joining several fact tables requires, in the general case, that each individual table first be summarised individually until all tables are at the same summarisation level (exactly the same dimensions), and only then joined. However, most users have neither the technical skills to realise the problems involved, nor to meet the requirements in terms of query formulation. See [11] for a deeper discussion of the subtleties involved. Additionally, users should benefit from the use of aggregates as a query performance tuning mechanism. Hence, transparency in the context of the MFT schemas considered must have a twofold meaning: a) aggregate unawareness, and b) MFT join complexity unawareness.
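Why joining fact tables at different summarisation levels produces wrong numbers can be seen on a toy example (the data below is invented purely for illustration): joining before grouping pairs each sales row with every matching inventory row, so measures are multiplied across matches.

```python
# Toy fact rows at month grain (invented data).
sales     = [("Jan", 10), ("Feb", 20), ("Mar", 30)]   # (month, units sold)
inventory = [("Jan", 5), ("Jan", 7), ("Feb", 4)]      # two Jan inventory rows

# Wrong: join first, then sum -- the Jan sales row pairs with BOTH Jan
# inventory rows, so its UnitsSold is counted twice.
joined = [(m1, u, s) for (m1, u) in sales for (m2, s) in inventory if m1 == m2]
wrong_units = sum(u for (_, u, _) in joined)          # 10 + 10 + 20 = 40

# Right: summarise each fact table to the common level first, then join.
units_by_month = {}
for m, u in sales:
    units_by_month[m] = units_by_month.get(m, 0) + u
right_units = sum(units_by_month[m] for m in units_by_month
                  if any(m2 == m for (m2, _) in inventory))   # 10 + 20 = 30
```

MF-Retarget's rewrite (summarise-then-join, as in Figure 3(b) and (c)) avoids exactly this kind of double counting.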
MF-Retarget is a retargeting service intended to lie between the front-end tool and the DBMS, which accepts as input a query written by a user through a proper user interface (e.g. a graphical one intended for naive users, or a specific application). The algorithm assumes that:
− Users are unaware of MFT joining complexities, and always write a single query in terms of the desired facts and dimensions. The retargeting service is responsible for rewriting the query to produce correct results, under the premise that it is always necessary to bring each fact table to the same summarisation level before joining them.
− Users are unaware of the existence of aggregates, and always formulate the query in terms of the original base tables and dimensions. The retargeting service is responsible for rewriting the query in terms of aggregates, if possible.
− Retargeting queries involving a single fact table is a special case of MFT schemas, and therefore the algorithm should provide good results in both cases.
The remainder of this section presents an illustration scenario and describes the algorithm. Further details on the algorithm, the required metadata and the MF-Retarget prototype can be found in [11].

3.1 Algorithm Illustration

To illustrate the functioning of the algorithm, let us consider the example depicted in Figure 1, a simplification of the MFT schema proposed in the APB-1 OLAP Council benchmark [9]. For each fact table, the fields prefixed by * compose its primary key. In dimension tables, only one field, the lowest one in the hierarchy, composes the primary key (also prefixed by *). The branches show the referential integrity from a fact or aggregate table to each of its dimensions. The MFT schema of Figure 1(a) shows two fact tables (Sales and Inventory), related by three conformed dimensions: Customer, TimeDim and Product. Sales has an additional dimension, namely Channel.
Figure 1(a) also shows some possible aggregates for this schema. In the picture, grey boxes correspond to shrunken dimensions, i.e. hierarchic dimensions without one or more lower-level fields. For example, consider a user who wishes to comparatively analyse quarterly Sales of product divisions against the corresponding status of the Inventory. This query cannot be answered simply by joining facts of distinct tables, because these facts represent information at different granularities; they should therefore be brought to the same summarisation level before they can be joined, otherwise inaccurate results will be produced. To free the user from the difficulties involved in MFT schemas, MF-Retarget assumes the user states a single query in terms of facts, dimensions and the desired aggregation level (the input shown in Figure 2, for the example considered). The retargeting mechanism then has two goals: to correct the query, and to try to make it more efficient with the use of aggregates. Considering the aggregates illustrated in Figure 1(a), the algorithm realises that Aggregate4 is the best candidate to answer the question, because it contains all necessary data, is the smallest one, and already joins the Sales and Inventory tables (in that order). In the absence of Aggregate4, Aggregate1 and Aggregate2 will be used. If the algorithm does not find any aggregate that can answer the query in a more
efficient way, it at least transforms the query to produce correct results. Figure 3 shows the results of the algorithm for these three situations.

Fig. 1. MFT schema and possible aggregates (a), and schema derivation graph (b)

It should be pointed out that the best aggregate is not always the one that already joins distinct fact tables. Indeed, in case smaller individual aggregates exist, the cost of joining them can be smaller than the cost of summarising a much bigger joined pre-computed aggregate.

3.2 The Algorithm

The algorithm assumes that users inform only the tables (fact/dimensions), the grouping columns (which are the same ones listed in the select clause), the summarisation functions applied to the measures, and possibly additional restrictions in the where clause. It places the following restrictions on input queries: a) monoblock queries (select from where group by); b) only transitive aggregation functions are used; c) all dimensions listed in the from clause apply to all fact tables listed. For the algorithm, the relationship between schemas is represented by a directed acyclic graph G(V, E). In the graph, V represents a set of star schemas, and E corresponds to the set of derivation relationships between any two schemas.
The edges of E form derivation paths, meaning that the schema at the end of any path can be derived by aggregation of the schema at the start of that path. The use of graph structures for representing relationships between aggregates is well known [4, 13]. Figure 1(b) presents the derivation graph for the example of Figure 1(a). We assume that only transitive aggregation functions (i.e. SUM, MAX and MIN) are used in both the derivation relationships and the queries.
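The derivation graph G(V, E) can be encoded as a plain adjacency list; a minimal sketch in Python, where the edges are our reading of Figure 1(b):

```python
# Edge A -> B: schema B can be derived by aggregating schema A
# (edges are our reading of Figure 1(b)).
DERIVES = {
    "Sales":      ["Aggregate1"],
    "Inventory":  ["Aggregate2", "Aggregate3"],
    "Aggregate1": ["Aggregate4"],
    "Aggregate2": ["Aggregate4"],
    "Aggregate3": [],
    "Aggregate4": [],
}

def derivable_from(start):
    """All schemas reachable from `start` along derivation paths."""
    seen, stack = set(), list(DERIVES[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(DERIVES[node])
    return seen
```

Note that the reachable sets from Sales and Inventory match the candidate sets CS1 and CS2 computed for the example (plus the base tables themselves).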
"The units sold and units in stock, per quarter and product division"

Select P.Division Division, T.Quarter Quarter,
       SUM(S.UnitsSold) UnitsSold, SUM(I.StockUnits) StockUnits
From TimeDim T, Product P, Sales S, Inventory I
Where T.Month=S.Time_ID and P.Code=S.Prod_ID
  and T.Month=I.Time_ID and P.Code=I.Prod_ID
Group by P.Division, T.Quarter

Fig. 2. Input SQL query from a naive DW user

a) considering the existence of Aggregate4:

Select P.Division Division, T.Quarter Quarter,
       SUM(UnitsSold) UnitsSold, SUM(StockUnits) StockUnits
From Sh_TimeDim T, Sh_Product P, Aggregate4 A4
Where T.Quarter=A4.Time_ID and P.Line=A4.Prod_ID
Group by P.Division, T.Quarter

b) in the absence of Aggregate4:

Create view V1 (Division, Quarter, UnitsSold) as
  Select P.Division, T.Quarter, SUM(A1.UnitsSold)
  From Sh_TimeDim T, Product P, Aggregate1 A1
  Where T.Quarter=A1.Time_ID and P.Code=A1.Prod_ID
  Group by P.Division, T.Quarter

Create view V2 (Division, Quarter, StockUnits) as
  Select P.Division, T.Quarter, SUM(A2.StockUnits)
  From Sh_TimeDim T, Product P, Aggregate2 A2
  Where T.Quarter=A2.Time_ID and P.Code=A2.Prod_ID
  Group by P.Division, T.Quarter

Select V1.Division Division, V1.Quarter Quarter, UnitsSold, StockUnits
From V1, V2
Where V1.Division = V2.Division and V1.Quarter = V2.Quarter

c) if no aggregates are found:

Create view V1 (Division, Quarter, UnitsSold) as
  Select P.Division, T.Quarter, SUM(S.UnitsSold)
  From TimeDim T, Product P, Sales S
  Where T.Month=S.Time_ID and P.Code=S.Prod_ID
  Group by P.Division, T.Quarter

Create view V2 (Division, Quarter, StockUnits) as
  Select P.Division, T.Quarter, SUM(I.StockUnits)
  From TimeDim T, Product P, Inventory I
  Where T.Month=I.Time_ID and P.Code=I.Prod_ID
  Group by P.Division, T.Quarter

Select V1.Division Division, V1.Quarter Quarter, UnitsSold, StockUnits
From V1, V2
Where V1.Division = V2.Division and V1.Quarter = V2.Quarter

Fig. 3.
Possible outputs of the algorithm

The algorithm is divided into four steps, which for clarity are individually presented and illustrated using the example of Section 3.1:
1. Divide the original query into component queries;
2. For each component query, select candidate schema(s) for answering it;
3. Select the best candidates;
4. Rewrite the query.

Step 1: Division into Component Queries. For each fact table Fi listed in the from clause of the original query Q, a component query Ci (i>0) is created, according to the following algorithm:
1. For each fact table Fi listed in the from clause of Q, create a component query Ci such that:
   1.1. Ci from clause := Fi and all dimensions listed in the from clause of Q;
   1.2. Ci where clause := all join conditions of Q necessary to relate Fi to the dimensions, together with any additional conditions involving these dimensions or Fi;
   1.3. Ci group by clause := all attributes used in the group by clause of Q;
   1.4. Ci select clause := all attributes used in the group by clause of Q, in addition to the aggregation function(s) applied to Fi attributes.
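Step 1 can be sketched as follows; this is our own simplified model (queries as dictionaries, join conditions as (dimension, fact) pairs) rather than real SQL parsing:

```python
def split_into_components(q):
    """Step 1: one component query per fact table in q, copying the
    dimensions and group-by columns, and keeping only the joins and
    aggregations that involve that fact table."""
    components = []
    for fact in q["facts"]:
        components.append({
            "facts":    [fact],
            "dims":     list(q["dims"]),
            "group_by": list(q["group_by"]),
            # select clause: group-by attributes plus this table's aggregations
            "select":   list(q["group_by"]) +
                        [a for a in q["aggs"] if a[1] == fact],
            # where clause: the join conditions relating this table to the dims
            "where":    [j for j in q["joins"] if j[1] == fact],
        })
    return components

# The input query of Figure 2, in this simplified form:
Q = {
    "facts":    ["Sales", "Inventory"],
    "dims":     ["TimeDim", "Product"],
    "group_by": ["Division", "Quarter"],
    "aggs":     [("SUM(UnitsSold)", "Sales"), ("SUM(StockUnits)", "Inventory")],
    "joins":    [("TimeDim", "Sales"), ("Product", "Sales"),
                 ("TimeDim", "Inventory"), ("Product", "Inventory")],
}
```

Applied to Q, this yields the two component queries C1 and C2 shown in Figure 4.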
Notice that a query referring to a single fact table is treated as a special case of queries involving several fact tables. In that case, Step 1 produces a single component query that is equal to the original query Q. Figure 4 shows the component queries created for the query input illustrated in Figure 2.

C1 -> Select P.Division, T.Quarter, SUM(S.UnitsSold) UnitsSold
      From TimeDim T, Product P, Sales S
      Where T.Month=S.Time_ID and P.Code=S.Prod_ID
      Group by P.Division, T.Quarter

C2 -> Select P.Division, T.Quarter, SUM(I.StockUnits) StockUnits
      From TimeDim T, Product P, Inventory I
      Where T.Month=I.Time_ID and P.Code=I.Prod_ID
      Group by P.Division, T.Quarter

Fig. 4. Component queries for the input of Figure 2

Step 2: Candidates for Component Queries. This step generates, for each component query Ci (i>0) resulting from Step 1, the respective candidate set CSi. Each candidate belonging to CSi is a schema (base or aggregate) that answers Ci.
2. For each component query Ci generated in Step 1:
   2.1. Let n := the node that corresponds to the base schema of Ci; mark n as "visited"; let CSi := {n};
   2.2. Using a depth-first traversal, examine all schemas derived from n until all nodes that can be reached from it are marked as "visited";
      2.2.1. Let n := next node; mark n as "visited";
      2.2.2. If all query attributes (select and where clauses) of Ci belong to schema n, and each aggregation function in the select clause of Ci is exactly the same as the one used in the fact table of schema n,
             Then CSi := CSi ∪ {n};
             Else mark all nodes that can be reached from n as "visited".
Each time this step is executed for a component query, the graph is traversed using a depth-first algorithm starting from the corresponding base schema. When the algorithm detects that a schema cannot answer the component query, all schemas at the end of a derivation path starting from it are disregarded.
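The pruning traversal of Step 2 can be sketched as follows (illustrative Python; the aggregation-function check of step 2.2.2 is folded into a simple attribute test, and the graph and attribute sets mirror only the Sales branch of Figure 1, in simplified form):

```python
GRAPH = {                       # Sales branch of the derivation graph
    "Sales":      ["Aggregate1"],
    "Aggregate1": ["Aggregate4"],
    "Aggregate4": [],
}
ATTRS = {                       # attributes available per schema (simplified)
    "Sales":      {"Month", "Quarter", "Code", "Division", "UnitsSold"},
    "Aggregate1": {"Quarter", "Code", "Division", "UnitsSold"},
    "Aggregate4": {"Quarter", "Division", "UnitsSold"},
}

def candidate_set(base, needed):
    """Step 2: depth-first search from the base schema; a schema that cannot
    answer the component query prunes every schema derivable from it."""
    candidates, visited = [base], {base}
    stack = list(GRAPH[base])
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if needed <= ATTRS[node]:
            candidates.append(node)
            stack.extend(GRAPH[node])
        else:                       # prune: mark the whole subgraph visited
            sub = list(GRAPH[node])
            while sub:
                m = sub.pop()
                if m not in visited:
                    visited.add(m)
                    sub.extend(GRAPH[m])
    return candidates
```

For the quarterly/division query, all three schemas qualify; a query needing month-level detail stops at the base table, pruning both aggregates in one step.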
Each element of CSi is a valid candidate to replace Fi in the from clause of Q. Therefore, every tuple (e1, …, en), where e1 ∈ CS1, …, en ∈ CSn, is a valid combination for the rewrite of Q. Considering the graph depicted in Figure 1(b) and the component queries of Figure 4:
− CS1 = {Sales, Aggregate1, Aggregate4} for component query C1;
− CS2 = {Inventory, Aggregate2, Aggregate3, Aggregate4} for C2.

Step 3: Selection of Best Candidates. Let T be the set of tuples (e1, …, en), where e1 ∈ CS1, …, en ∈ CSn (n>0), representing the Cartesian product CS1 × … × CSn of the candidate sets, and let t be a tuple of T. The present version of the algorithm bases the choice of best candidate on the concept of accumulated size. The accumulated size of t(e1, …, en), AS(t), is a function that returns the number of records that must be handled if the query were rewritten using t. When summing the number of records, AS(t) computes only once the size of a given
table, in case it is included in more than one candidate set CSi. Thus, if Aggregate4 is chosen, only its records need to be processed, and only once. In all other cases, records from the different fact tables in t are processed, considering each table only once. This may suggest that, in a multi-fact query, the best t will always be the one where e1 = … = en. However, this is not true. Indeed, the cost of processing more records (I/O cost) has a stronger impact than the cost of joining tables. Notice that this step can be improved in many ways, by varying the cost function used to prioritise the candidates for query rewriting. An immediate improvement of this function is the consideration of index information combined with table size.
3. Consider all CSi sets generated in Step 2 and T, the Cartesian product CS1 × … × CSn of the candidate sets:
   3.1. t' := the t(e1, …, en) ∈ T with the smallest accumulated size AS(t), considering all t(e1, …, en) ∈ T, where e1 ∈ CS1, …, en ∈ CSn.

Step 4: Query Reformulation. Once the best candidate for each component query is determined, the query is rewritten. If the set of best candidates resulting from Step 3 has a single element, i.e. a common aggregate for all component queries, a single query is written using that aggregate and the respective shrunken dimensions. This is the case for our example, where Figure 3(a) displays the rewritten query. Otherwise, the query is rewritten in terms of views that summarise the best aggregates individually, and then joins them (e.g. as in Figure 3(b) and (c)). If there is a common best candidate that answers more than one component query, but not all of them, a single view is created for that set of component queries. This trivial algorithm is not presented due to space limitations.

4 Tests

Initial performance tests were performed based on the MFT schema presented in the APB-1 OLAP Benchmark [9], which comprises 4 fact tables.
For the tests, we used Inventory, Sales and the corresponding dimensions, as depicted in Figure 1(a). The aggregates were defined to exercise performance under different levels of aggregation (compression factors). We did not use the semantics of the aggregates, nor the user requirements expressed in the benchmark (e.g. queries). We also disregarded the number of records of the resulting database for aggregate selection. Two tests were executed, referred to as Test1 and Test2, described in the remainder of this section.

4.1 Test1

The goal of Test1 was to verify whether the algorithm performed well and correctly, considering both star and MFT schemas. The APB-1 program was executed with parameters 10, 0.1 and 10, which resulted in a Sales table with 1,239,300 records, whereas Inventory comprised 270,000 records. We executed five queries, three of them involving a single fact table (Sales), and two of them involving the join of both fact tables. For each query, we calculated the number of records of the resulting table, and the processing time considering all possible alternatives.
It was possible to verify that the algorithm always chose the aggregate with the smallest processing time, regardless of whether the query involved a single fact table or a join of multiple fact tables. Proportionally, there was a significant gain in the vast majority of cases, but absolute gains were not always significant. The magnitude of the performance gain seems to be a function of the (fact) table size, the aggregate compression factor, and the output table size.

4.2 Test2

Test2 was executed running APB-1 with parameters 10, 1 and 10. The derivation graph for this test is depicted in Figure 5, and Table 1 describes the properties of the schemas: number of records, compression factor (CF) with regard to both the base fact table and its deriving schema(s), and the difference between the derived/deriving schemas in terms of dimensions.

Fig. 5. Derivation graph used in Test2 (Inventory → I1 → {I2S2, I3}; Sales → S1 → {I2S2, S3})

We executed a single query that involved facts from both the Sales and Inventory tables, and which could be answered by (the combination of) all aggregates. The goal was to compare the respective absolute processing times. Table 2 displays in its first row the elapsed time measured for each execution. Subsequent rows show the gains of using an aggregate with regard to the bigger alternatives. For instance, the use of aggregate I2S2 represents a gain of 52% with respect to the join of I1 and S1, calculated as (time(I1 and S1) – time(I2S2))/time(I1 and S1), and of 96% with regard to the join of Inventory and Sales. It is possible to verify that the gains in absolute times are very significant this time. The use of the accumulated size AS(t) as the main criterion to prioritise aggregate candidates seems to be simple but effective, although it can still be improved in many ways, particularly with indexing.
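The percentages in Table 2 follow directly from gain = (time(bigger) − time(smaller)) / time(bigger); a quick check in Python (rounding conventions may differ by a percentage point from the published table):

```python
def secs(hms):
    """Convert an 'h:mm:ss' elapsed time to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def gain_pct(bigger, smaller):
    """Relative gain of the smaller alternative over the bigger one, in %."""
    return round(100 * (secs(bigger) - secs(smaller)) / secs(bigger))

# Elapsed times from Table 2: join of I1 and S1, versus aggregate I2S2.
t_i1_s1, t_i2s2 = "1:27:04", "0:42:09"
```

Here gain_pct(t_i1_s1, t_i2s2) reproduces the 52% gain of I2S2 over the join of I1 and S1 quoted above.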
5 Conclusions

In this work we presented MF-Retarget, a retargeting mechanism that deals with both MFT schemas and statically computed aggregates. The algorithm provides two types of transparency: a) aggregate unawareness, and b) users are spared the complexities of queries over MFT schemas. The retargeting service is intended to be implemented as a layer between the user front-end tool and the DBMS engine. Thus, it can be complementary to the gains already provided by OLAP tools/DBMS engines in the context of dynamic computation of aggregates. Further details on the implementation can be obtained in [11].
Tab. 1. Inventory/Sales derivation graph description

Table or       Records     CF (base   Deriving   CF (deriving  Shrunken  Eliminated
aggregate                  schema)    schema     schema)       dims.     dims.
Sales (S)      13,122,000
S1              2,400,948  18.3%      Sales      18.3%         Yes       Yes
S3                614,520   4.7%      S1         25.6%         Yes
Inventory (I)  12,396,150
I1              2,496,762  20.1%      Inventory  20.1%         Yes
I3                631,800   5.1%      I1         25.3%         Yes
I2S2            2,400,948  18.3% (S)  S1 and I1  100% (S1)
                           19.4% (I)             96.2% (I1)

Tab. 2. Results with larger tables (Query1 result: 614,520 records)

                              Inventory and Sales  I1 and S1   I2S2       I3 and S3
Time (hours:min:sec)          20:35:43             1:27:04     0:42:09    0:15:22
Gain vs. Inventory/Sales (%)                       93          96         99
Gain vs. I1 and S1 (%)                                         52         82
Gain vs. I2S2 (%)                                                         64
AS(t)                         25,518,150           4,897,710   2,400,948  1,246,320

Preliminary tests confirmed that the algorithm always provided the best response time. Proportional gains are always significant, but absolute gains increase with bigger fact tables. Obviously, additional tests are required to determine precise directives for the construction of aggregates in MFT schemas, and to establish under which circumstances the processing gains are significant. It is also important to refine our criteria for selecting the best candidate, using indexing information in addition to the number of records of each table. Future work also includes, among other topics, a better definition of the cost functions for prioritising candidate aggregates, the integration of the retargeting mechanism into a DW architecture, support for aggregate monitoring and recommendations for aggregate reorganisation, and the use of the proposed algorithm in the context of dynamic aggregate computation.

Acknowledgements

This work was partially financed by FAPERGS, Brazil.

References

1. Baralis, E., Paraboschi, S., Teniente, E.: Materialized Views Selection in a Multidimensional Database. In: Proceedings of VLDB'97 (1997) 156-165.
2. Chaudhuri, S., Shim, K.: An Overview of Cost-Based Optimization of Queries with Aggregates. Bulletin of the TCDE (IEEE), v. 18, n. 3 (1995) 3-9.
3. Gray, J., Chaudhuri, S., et al.: Data Cube: a Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub-Totals. Data Mining and Knowledge Discovery, v. 1, n. 1 (1997) 29-53.
4. Gupta, A., Harinarayan, V., Quass, D.: Aggregate-Query Processing in Data Warehousing Environments. In: Proceedings of VLDB'95 (1995) 358-369.
5. Gupta, H., Mumick, I.: Selection of Views to Materialize under a Maintenance Cost Constraint. In: Proceedings of ICDT (1999) 453-470.
6. Kimball, R., et al.: The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. John Wiley & Sons (1998).
7. Kotidis, Y., Roussopoulos, N.: DynaMat: a Dynamic View Management System for Data Warehouses. In: Proceedings of ACM SIGMOD 1999 (1999) 371-382.
8. Meredith, M., Khader, A.: Divide and Aggregate: Designing Large Warehouses. Database Programming & Design, June (1996) 24-30.
9. OLAP Council: APB-1 OLAP Benchmark (Release II). Online; captured in May 2000. Available at: http://www.olapcouncil.org/research/bmarkco.htm, Nov. 1998.
10. Poe, V., Klauer, P., Brobst, S.: Building a Data Warehouse for Decision Support, 2nd edition. Prentice Hall (1998).
11. Santos, K.: MF-Retarget: a Multiple-Fact Table Aggregate Retargeting Mechanism. MSc Dissertation, Faculdade de Informática - PUCRS, Brazil (2000) (in Portuguese).
12. Sapia, C.: On Modeling and Predicting Query Behavior in OLAP Systems. In: Proceedings of DMDW'99 (1999).
13. Sarawagi, S., Agrawal, R., Gupta, A.: On Computing the Data Cube. Technical Report, IBM Almaden Research Center, San Jose, CA (1996).
14. Srivastava, D., et al.: Answering Queries with Aggregation Using Views. In: Proceedings of VLDB'96 (1996) 318-329.
15. Wedekind, H., et al.: Preaggregation in Multidimensional Data Warehouse Environments. In: Proceedings of ISDSS'97 (1997) 581-59.
16. Winter, R.: Be Aggregate Aware. Intelligent Enterprise Magazine, v. 2, n. 13 (1999).