H-Frag: A Hybrid Memory Data Cube Approach for High Dimension Relations
Rodrigo Rocha Silva
Doctoral Student
Prof. Dr. Celso Massaki Hirata
Advisor
Prof. Dr. Joubert de Castro Lima
Co-Advisor
ITA – AERONAUTICS INSTITUTE OF TECHNOLOGY
Electronic Engineering and Computer Science Division - EEC/I
Department of Computer Science
Brazil
Monday, April 27, 2015
H-Frag: A Hybrid Memory Data Cube Approach for High Dimension Relations
17th International Conference on Enterprise Information Systems - Rodrigo Rocha Silva
What is H-Frag?
H-Frag is a method for data cube computation that extends the Frag-Cubing approach, enabling the computation of massive data cubes by making use of external memory rather than relying on main memory only.
Topics
– Motivation;
– Data Cube;
– Frag-Cubing;
– H-Frag approach;
– Experiments;
– Results;
– Conclusions;
Motivation
Users need to view data in a tangible way; reports, cross tables and dashboards are the most commonly used tools for visualizing data.
• Approaches that use inverted indexes, such as Frag-Cubing, are considered efficient in terms of runtime and main memory usage for massive data cube computation and query.
• Approaches that use main memory only are limited when the data cube size exceeds the main memory capacity.
Data Cube
A data cube allows the materialization of all or some of the cells (tuples) of a cube, each described by measures and dimensions.
A data cube has exponential runtime and storage complexity in the number of dimensions: each additional dimension doubles the number of cuboids. For an input with d dimensions, the output has 2^d cuboids.
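To see the 2^d growth concretely, here is a minimal Python sketch that lists every cuboid (group-by set) of a cube without hierarchies. It only illustrates the counting argument; `cuboids` is a hypothetical helper, not part of H-Frag.

```python
from itertools import combinations


def cuboids(dimensions):
    """List every cuboid (set of grouped dimensions) of a cube without hierarchies."""
    result = []
    for k in range(len(dimensions) + 1):
        # Cuboids that keep exactly k dimensions; the others are aggregated ('*').
        result.extend(combinations(dimensions, k))
    return result


for d in range(1, 6):
    dims = [chr(ord("A") + i) for i in range(d)]
    print(d, len(cuboids(dims)))  # 2, 4, 8, 16, 32 cuboids
```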
Data Cube
[Figure: example dimension hierarchies built from the members Subjects, Department, Year, Hour and Day]
The members of a dimension may be hierarchically related to each other; that is, a dimension may contain a hierarchy of two or more levels.
Data Cube
Base Relation R – 11 tuples
A B C COUNT
a1 b1 c1 1
a3 b3 c2 1
a2 b3 c2 1
a3 b1 c1 1
a2 b1 c1 1
a2 b2 c2 1
a1 b1 c2 1
a2 b2 c1 1
a3 b1 c2 1
a1 b3 c2 1
a2 b1 c2 1
Full 3-D cube: 38 cells (including the 11 base cells)
A B C COUNT
* * * 11
a1 * * 3
a2 * * 5
a3 * * 3
* b1 * 6
* b2 * 2
* b3 * 3
* * c1 4
* * c2 7
a1 b1 * 2
a1 b3 * 1
a2 b1 * 2
a2 b2 * 2
a2 b3 * 1
a3 b1 * 2
a3 b3 * 1
a1 * c1 1
a1 * c2 2
a2 * c1 2
a2 * c2 3
a3 * c1 1
a3 * c2 2
* b1 c1 3
* b1 c2 3
* b2 c1 1
* b2 c2 1
* b3 c2 3
a1 b1 c1 1
a3 b3 c2 1
a2 b3 c2 1
a3 b1 c1 1
a2 b1 c1 1
a2 b2 c2 1
a1 b1 c2 1
a2 b2 c1 1
a3 b1 c2 1
a1 b3 c2 1
a2 b1 c2 1
Construction of a complete data cube is an exponential problem.
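The 38 cells can be reproduced by brute force. The Python sketch below assumes nothing beyond the table itself: it generalizes each tuple of R to all of its 2^3 cells and counts them. This naive enumeration is for illustration only and is not how Frag-Cubing or H-Frag compute cubes.

```python
from collections import Counter
from itertools import product

# Base relation R from the slide (11 tuples over dimensions A, B, C).
R = [
    ("a1", "b1", "c1"), ("a3", "b3", "c2"), ("a2", "b3", "c2"),
    ("a3", "b1", "c1"), ("a2", "b1", "c1"), ("a2", "b2", "c2"),
    ("a1", "b1", "c2"), ("a2", "b2", "c1"), ("a3", "b1", "c2"),
    ("a1", "b3", "c2"), ("a2", "b1", "c2"),
]


def full_cube(relation):
    """COUNT of every non-empty cell of the full cube; '*' means 'all values'."""
    cube = Counter()
    for row in relation:
        # Each tuple contributes to 2^d cells: keep or generalize each value.
        for mask in product((True, False), repeat=len(row)):
            cell = tuple(v if keep else "*" for v, keep in zip(row, mask))
            cube[cell] += 1
    return cube


cube = full_cube(R)
print(len(cube))               # 38
print(cube[("a2", "*", "*")])  # 5
```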
Related Work – Frag-Cubing Approach
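• Splits the data vertically into fragments of dimensions;
• Reduces a high-dimensional cube to a set of lower-dimensional cubes (shell fragments);
• Offers tradeoffs between the data cube computation runtime and the pre-processing of aggregations;
[Figure: the dimensions A, B, C, D, E, F, … of a relation are split into the fragment cubes ABC, DEF, …]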
From book Han and Kamber: Data Mining Concepts and Techniques
Related Work – Frag-Cubing Example
For a 5-dimension relation, two shell fragments can be built: (A, B, C) and (D, E).
tid A B C D E
1 a1 b1 c1 d1 e1
2 a1 b2 c1 d2 e1
3 a1 b2 c1 d1 e2
4 a2 b1 c1 d1 e2
5 a2 b1 c1 d1 e3
From book Han and Kamber: Data Mining Concepts and Techniques
Related Work – Frag-Cubing 1-D Inverted Indexes
• Build a traditional inverted index (RID list) for each attribute value:
Attribute Value TID List List Size
a1 {1, 2, 3} 3
a2 {4, 5} 2
b1 {1, 4, 5} 3
b2 {2, 3} 2
c1 {1, 2, 3, 4, 5} 5
d1 {1, 3, 4, 5} 4
d2 {2} 1
e1 {1, 2} 2
e2 {3, 4} 2
e3 {5} 1
From book Han and Kamber: Data Mining Concepts and Techniques
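A minimal Python sketch of the 1-D inverted index construction for the 5-tuple relation follows. It assumes, as in this example, that attribute values are unique across dimensions (each value carries its dimension letter); `inverted_index` is a hypothetical name.

```python
from collections import defaultdict

# The 5-tuple, 5-dimension example relation: tid -> values of A..E.
RELATION = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}


def inverted_index(relation):
    """Map each attribute value to its sorted TID list."""
    index = defaultdict(list)
    for tid in sorted(relation):
        for value in relation[tid]:
            index[value].append(tid)
    return dict(index)


index = inverted_index(RELATION)
for value, tids in sorted(index.items()):
    print(value, tids, len(tids))  # attribute value, TID list, list size
```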
Related Work – Frag-Cubing Approach
• Generalizes the 1-D inverted indexes to multi-dimensional ones in the data cube sense: computes all cuboids of the fragment cubes ABC and DE while retaining their inverted indexes.
For example, the shell fragment cube ABC contains 7 cuboids:
A, B, C
AB, AC, BC
ABC
Cell Intersection TID List List Size
a1 b1 {1, 2, 3} ∩ {1, 4, 5} {1} 1
a1 b2 {1, 2, 3} ∩ {2, 3} {2, 3} 2
a2 b1 {4, 5} ∩ {1, 4, 5} {4, 5} 2
a2 b2 {4, 5} ∩ {2, 3} {} 0
From book Han and Kamber: Data Mining Concepts and Techniques
This completes the offline computation stage.
Frag-Cubing computes only the cuboids of each fragment during data cube computation.
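The fragment computation can be sketched in Python by intersecting the 1-D TID lists for every cuboid of fragment (A, B, C). This is an illustrative reimplementation on the example's data, not Frag-Cubing's own code; empty cells such as (a2, b2) are kept to match the table, and `fragment_cube` is a hypothetical name.

```python
from itertools import combinations, product

# 1-D inverted indexes of fragment (A, B, C), taken from the example.
INDEX = {
    "A": {"a1": {1, 2, 3}, "a2": {4, 5}},
    "B": {"b1": {1, 4, 5}, "b2": {2, 3}},
    "C": {"c1": {1, 2, 3, 4, 5}},
}


def fragment_cube(index):
    """Compute every cuboid of one shell fragment by intersecting TID lists."""
    cube = {}
    dims = sorted(index)
    for k in range(1, len(dims) + 1):
        for cuboid in combinations(dims, k):
            # One cell per combination of values of the cuboid's dimensions.
            for values in product(*(index[d] for d in cuboid)):
                tid_sets = (index[d][v] for d, v in zip(cuboid, values))
                cube[values] = sorted(set.intersection(*tid_sets))
    return cube


cube = fragment_cube(INDEX)
print(cube[("a1", "b2")], cube[("a2", "b2")])  # [2, 3] []
```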
Related Work – Frag-Cubing Measure Table
• If measures other than count are present, they are stored in an ID_measure table, separate from the shell fragments:
tid count sum
1 5 70
2 3 10
3 8 20
4 5 40
5 2 30
From book Han and Kamber: Data Mining Concepts and Techniques
Related Work – Frag-Cubing Query
Once the data cube is computed into fragments, the online query process follows these steps:
• Divides the query into fragments;
• Fetches the corresponding tid-list for each fragment from the fragment cube;
• Intersects the tid-lists from each fragment in order to construct an instantiated base table;
• Computes the data cube from the instantiated base table with any cubing algorithm.
From book Han and Kamber: Data Mining Concepts and Techniques
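The online query steps can be sketched as follows, simplified to a point query that touches both fragments (a1 from ABC, d1 from DE). The names `FRAGMENTS`, `ID_MEASURE` and `answer` are hypothetical; for inquire dimensions, a cubing algorithm would run over the instantiated base table instead of the simple sums at the end.

```python
# 1-D inverted indexes of the shell fragments (A, B, C) and (D, E).
FRAGMENTS = {
    "ABC": {"a1": {1, 2, 3}, "a2": {4, 5}, "b1": {1, 4, 5}, "b2": {2, 3},
            "c1": {1, 2, 3, 4, 5}},
    "DE": {"d1": {1, 3, 4, 5}, "d2": {2}, "e1": {1, 2}, "e2": {3, 4},
           "e3": {5}},
}
# ID_measure table: tid -> (count, sum), stored apart from the fragments.
ID_MEASURE = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}


def answer(query_values):
    """Point query over several fragments: intersect, then aggregate measures."""
    tid_sets = []
    for value in query_values:
        # Divide the query by fragment and fetch each value's TID list.
        fragment = next(f for f in FRAGMENTS.values() if value in f)
        tid_sets.append(fragment[value])
    # The intersection identifies the instantiated base table.
    tids = sorted(set.intersection(*tid_sets))
    base_table = {tid: ID_MEASURE[tid] for tid in tids}
    count = sum(c for c, _ in base_table.values())
    total = sum(s for _, s in base_table.values())
    return tids, count, total


print(answer(["a1", "d1"]))  # ([1, 3], 13, 90)
```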
H-Frag Approach
• Like Frag-Cubing, H-Frag keeps a set of tuple identifiers (a tid-list) for each attribute value of each dimension;
• H-Frag supports larger cubes by storing some of the computed cubes in external memory, rather than relying on main memory only.
The main challenge of using external memory is defining the criteria for selecting which fragments of the cube should be kept in main memory.
H-Frag selects the fragments of the cube to be stored in main memory according to the attribute value frequencies and the dimension cardinalities.
H-Frag Architecture
H-Frag Computation
First, the computation component scans the Entry Relation completely to obtain the frequency of each attribute value in each dimension.
Then, the average frequency of each dimension is computed, and the attribute values whose frequencies are not above the average are marked to be stored in the external memory.
[Figure: the computation component scans the Entry Relation, obtains the frequencies of the attribute values, and marks attribute values to be stored in the External Memory]
H-Frag Computation – Example
tid A B C M1 M2
1 a1 b1 c1 1.5 1
2 a2 b2 c2 2.5 1
3 a2 b2 c2 2 3
4 a3 b3 c2 78.5 2
5 a1 b1 c1 100 5
6 a2 b1 c2 102.5 4
7 a3 b1 c1 100 2
8 a1 b3 c2 22.5 3
9 a1 b3 c2 13.89 8
The frequency of each attribute value is: fa1=4, fa2=3, fa3=2, fb1=4, fb2=2, fb3=3, fc1=3 and fc2=6.
H-Frag Computation – First Step
- The average frequency in dimensions A and B is 3:
fa1=4, fa2=3, fa3=2 -> (4+3+2)/3 = 3;
fb1=4, fb2=2, fb3=3 -> (4+2+3)/3 = 3;
- In dimension C, the average frequency is 4.5 (let's consider 4):
fc1=3, fc2=6 -> (3+6)/2 = 4.5.
External Memory: a2, a3, b2, b3 and c1
The attribute values a2, a3, b2, b3 and c1 are marked to be stored in the external memory, because their frequencies are not above the average of their dimension.
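The first scan can be sketched in Python for the 9-tuple example. One assumption needs flagging: the marking rule is written as "frequency not above the dimension average", which is the reading that matches the external-memory tables used later in the example (a2, a3, b2, b3 and c1); with that rule, rounding 4.5 down to 4 does not change the result. `mark_for_external` is a hypothetical name.

```python
from collections import Counter

# Example Entry Relation: tid -> (A, B, C); measures omitted.
ENTRY = {
    1: ("a1", "b1", "c1"), 2: ("a2", "b2", "c2"), 3: ("a2", "b2", "c2"),
    4: ("a3", "b3", "c2"), 5: ("a1", "b1", "c1"), 6: ("a2", "b1", "c2"),
    7: ("a3", "b1", "c1"), 8: ("a1", "b3", "c2"), 9: ("a1", "b3", "c2"),
}


def mark_for_external(entry):
    """First scan: per-dimension frequencies, averages and marked values."""
    n_dims = len(next(iter(entry.values())))
    freqs = [Counter(row[d] for row in entry.values()) for d in range(n_dims)]
    marked = set()
    for counter in freqs:
        average = sum(counter.values()) / len(counter)
        # Assumption: values at or below the dimension average go to external memory.
        marked |= {value for value, f in counter.items() if f <= average}
    return freqs, marked


freqs, marked = mark_for_external(ENTRY)
print(sorted(marked))  # ['a2', 'a3', 'b2', 'b3', 'c1']
```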
H-Frag Computation – Second Step
The computation component scans the Entry Relation a second time to select the attribute values to be stored in the external memory;
Each selected attribute value is stored in the external memory together with its tid-list;
H-Frag splits the Entry Relation into complementary portions of a user-defined size, each containing several tuples;
A single attribute value can have several complementary tid-lists in external memory, since RAM can become full;
[Figure: the computation component scans the Entry Relation to select the attribute values to be stored, and writes each attribute value and its tid-list to the External Memory]
H-Frag Computation – Second Step
• To avoid storing attribute values with a low number of tids in the external memory, H-Frag defines an occurrence percentage for each attribute value inside a portion.
Entry Relation
tid A B C D E F G H
1 a1 b1 c1 d1 e1 f1 g1 h1
2 a1 b2 c2 d1 e2 f2 g2 h2
3 a3 b8 c3 d3 e4 f5 g6 h7
4 a5 b6 c5 d5 e4 f5 g5 h6
5 a9 b9 c9 d9 e9 f9 g9 h9
6 a9 b4 c4 d4 e3 f4 g4 h4
7 a5 b7 c7 d7 e7 f7 g7 h7
8 a7 b7 c7 d7 e7 f7 g7 h7
With a portion size of 4, the first portion holds tids 1-4 and the second portion tids 5-8. In the first portion, the tid-lists of a1 and d1 (tids 1, 2) and of e4 and f5 (tids 3, 4) are stored; in the second portion, the tid-list of a9 (tids 5, 6) is stored, and so on.
Each attribute value that occurs in at least 50% of the tuples of a portion (relative to the total number of tuples in the portion) has its tid-list stored in the external memory.
Each portion must fit entirely in main memory.
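One possible reading of the 50% occurrence rule, sketched in Python under stated assumptions: every attribute value is treated as marked for external memory, a value's buffered tids are written out as soon as it has occurred in ceil(50% × portion size) tuples of the current portion, and whatever is still buffered at the end of the scan is written out too. `second_scan` and its parameters are hypothetical. With a portion size of 4 it reproduces the writes shown for a1, d1, e4, f5 and a9.

```python
import math
from collections import defaultdict

# The 8-tuple Entry Relation from the slide (dimensions A..H); tid = position + 1.
ENTRY = [
    ("a1", "b1", "c1", "d1", "e1", "f1", "g1", "h1"),
    ("a1", "b2", "c2", "d1", "e2", "f2", "g2", "h2"),
    ("a3", "b8", "c3", "d3", "e4", "f5", "g6", "h7"),
    ("a5", "b6", "c5", "d5", "e4", "f5", "g5", "h6"),
    ("a9", "b9", "c9", "d9", "e9", "f9", "g9", "h9"),
    ("a9", "b4", "c4", "d4", "e3", "f4", "g4", "h4"),
    ("a5", "b7", "c7", "d7", "e7", "f7", "g7", "h7"),
    ("a7", "b7", "c7", "d7", "e7", "f7", "g7", "h7"),
]


def second_scan(entry, portion_size, share=0.5):
    """Write a value's buffered TIDs out once it occurs in `share` of a portion."""
    threshold = max(1, math.ceil(share * portion_size))
    buffer = defaultdict(list)   # value -> TIDs not yet written to external memory
    external = []                # (value, tid-sublist) pairs, in write order
    for start in range(0, len(entry), portion_size):
        seen = defaultdict(int)  # occurrences since the last write, in this portion
        for offset, row in enumerate(entry[start:start + portion_size]):
            tid = start + offset + 1
            for value in row:
                buffer[value].append(tid)
                seen[value] += 1
                if seen[value] >= threshold:
                    external.append((value, buffer.pop(value)))
                    seen[value] = 0
    # End of the scan: the remaining, smaller sublists are written as well.
    external.extend(sorted(buffer.items()))
    return external


for value, tids in second_scan(ENTRY, portion_size=4)[:5]:
    print(value, tids)
```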
H-Frag Computation – Second Step
If 80% of the available working memory is in use, all the tid-lists of the processed attribute values and all measure values are stored in the external memory.
[Figure: working memory holding the tid-lists a1 = {1, .., 4}, a2 = {2, .., 8}, b1 = {1, .., 3}, c1 = {3, .., 7} and c2 = {2, .., 4}; with only 20% of it left free, all of them are stored in external memory]
This way, H-Frag handles the case in which many attribute values occur in fewer than 50% of the tuples of a portion, which can happen in relations with high cardinality and low skew.
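The 80% rule can be sketched as a small buffer class. Working-memory usage is approximated here by counting stored tids and measure rows, which is an assumption made for illustration; a real implementation would measure actual memory use. `SpillingBuffer` is a hypothetical name.

```python
class SpillingBuffer:
    """Hypothetical RAM buffer that writes everything out at 80% usage."""

    def __init__(self, capacity, limit=0.8):
        self.capacity = capacity  # working memory, counted in stored entries
        self.limit = limit        # fraction of capacity that triggers a spill
        self.tid_lists = {}       # attribute value -> buffered TIDs
        self.measures = {}        # tid -> tuple of measure values
        self.spills = []          # snapshots written to external memory

    def used(self):
        return sum(len(t) for t in self.tid_lists.values()) + len(self.measures)

    def add(self, tid, values, measures):
        for value in values:
            self.tid_lists.setdefault(value, []).append(tid)
        self.measures[tid] = measures
        if self.used() >= self.limit * self.capacity:
            self.spill()

    def spill(self):
        # All tid-lists and all measure values go to external memory at once.
        self.spills.append((self.tid_lists, self.measures))
        self.tid_lists, self.measures = {}, {}


buffer = SpillingBuffer(capacity=10)
buffer.add(1, ("a1", "b1", "c1"), (1.5, 1))  # 4 of 10 entries used
buffer.add(2, ("a2", "b2", "c2"), (2.5, 1))  # 8 of 10 entries: 80% reached, spill
print(len(buffer.spills), buffer.used())     # 1 0
```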
H-Frag Computation – Measure Values
The measure values are grouped by portion;
Each group of measure values is identified by a tid interval (range);
This way, H-Frag generates only a few files with measure values.
For example, when a portion of 10 tuples whose initial tid is 1 is processed, a file with measure values identified as 1_10 is generated.
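A minimal sketch of the grouping and naming of measure files; `measure_groups` is a hypothetical helper, and the "first_last" key format follows the 1_10 example above.

```python
def measure_groups(measures, portion_size, first_tid=1):
    """Group measure rows by portion, keyed by the 'first_last' tid range."""
    groups = {}
    for start in range(0, len(measures), portion_size):
        rows = measures[start:start + portion_size]
        first = first_tid + start
        last = first + len(rows) - 1
        groups[f"{first}_{last}"] = rows  # one measure file per group
    return groups


# (M1, M2) of the 9-tuple example relation, grouped in portions of 3 tuples.
M = [(1.5, 1), (2.5, 1), (2, 3), (78.5, 2), (100, 5),
     (102.5, 4), (100, 2), (22.5, 3), (13.89, 8)]
print(list(measure_groups(M, 3)))  # ['1_3', '4_6', '7_9']
```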
H-Frag Computation – Third Step
The Entry Relation is scanned a third time. As a result, H-Frag generates a map with the most frequent attribute values of the relation and their tid-lists.
This map is kept in main memory.
[Figure: the Entry Relation is scanned, and the map with the tid-lists of the most frequent attribute values is kept in Main Memory]
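The third scan can be sketched as follows. The set of values kept out of main memory is taken from the example's external-memory tables (a2, a3, b2, b3, c1) rather than recomputed; `third_scan` is a hypothetical name.

```python
from collections import defaultdict

# Example Entry Relation: tid -> (A, B, C).
ENTRY = {
    1: ("a1", "b1", "c1"), 2: ("a2", "b2", "c2"), 3: ("a2", "b2", "c2"),
    4: ("a3", "b3", "c2"), 5: ("a1", "b1", "c1"), 6: ("a2", "b1", "c2"),
    7: ("a3", "b1", "c1"), 8: ("a1", "b3", "c2"), 9: ("a1", "b3", "c2"),
}
# Values kept in external memory in the example's tables.
EXTERNAL_VALUES = {"a2", "a3", "b2", "b3", "c1"}


def third_scan(entry, external_values):
    """Build the main-memory map: frequent attribute value -> TID list."""
    memory_map = defaultdict(list)
    for tid in sorted(entry):
        for value in entry[tid]:
            if value not in external_values:
                memory_map[value].append(tid)
    return dict(memory_map)


print(third_scan(ENTRY, EXTERNAL_VALUES))
```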
tid A B C M1 M2
1 a1 b1 c1 1.5 1
2 a2 b2 c2 2.5 1
3 a2 b2 c2 2 3
4 a3 b3 c2 78.5 2
5 a1 b1 c1 100 5
6 a2 b1 c2 102.5 4
7 a3 b1 c1 100 2
8 a1 b3 c2 22.5 3
9 a1 b3 c2 13.89 8
Example of the computing process for the relation above.
Recall that the frequencies of the attribute values are: fa1=4, fa2=3, fa3=2, fb1=4, fb2=2, fb3=3, fc1=3 and fc2=6.
Attribute Values in External Memory
Attribute Value Tids
a2 2, 3
a2 6
a3 4, 7
b2 2, 3
b3 4, 8
b3 9
c1 1, 5, 7
H-Frag stores a tid-sublist each time the attribute value is associated with 50% or more of the tids of the defined portion.
In this example, let's consider a portion size equal to 2.
The attribute value a2, whose frequency is 3, will have one sublist with tids 2 and 3 and another sublist with tid 6 stored.
The attribute value c1, whose frequency is 3, will have only one tid-list stored in the external memory, with three tids.
Frequent Attribute Values in Main Memory (a1, b1 and c2, which are the most frequent)
Attribute Value tids
a1 1, 5, 8, 9
b1 1, 5, 6, 7
c2 2, 3, 4, 6, 8, 9
Measure Values Relation in External Memory
tids M1 M2 Group ID
1 1.5 1 1_3
2 2.5 1 1_3
3 2 3 1_3
4 78.5 2 4_6
5 100 5 4_6
6 102.5 4 4_6
7 100 2 7_9
8 22.5 3 7_9
9 13.89 8 7_9
The Group ID identifies the tid range of the processed tuples.
H-Frag Update
Updates are processed with the same H-Frag Computation algorithm.
H-Frag Update Relation: New Tuples
New tuples
tid A B C M1 M2
10 a4 b4 c4 3 7
11 a3 b3 c1 4.7 12
12 a1 b1 c2 5.5 6
Attribute Values in External Memory
Attribute Value tids
a2 2, 3
a2 6
a3 4, 7
a4 10
a3 11
b2 2, 3
b3 4, 8
b3 9
b3 11
b4 10
c1 1, 5, 7
c1 11
c4 10
New records are created in the external memory.
Frequent Attribute Values in Main Memory
Attribute Value tids
a1 1, 5, 8, 9, 12
b1 1, 5, 6, 7, 12
c2 2, 3, 4, 6, 8, 9, 12
Records in the main memory are updated with the new tids, or are replaced by attribute values that become more frequent.
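Appending new tuples can be sketched in Python as below. This is a simplification under stated assumptions: values already in main memory get the new tids, every other value gets one new complementary sublist in external memory, and promoting values that become more frequent into main memory is not modelled. `append_tuples` is a hypothetical name.

```python
MAIN = {"a1": [1, 5, 8, 9], "b1": [1, 5, 6, 7], "c2": [2, 3, 4, 6, 8, 9]}
EXTERNAL = [("a2", [2, 3]), ("a2", [6]), ("a3", [4, 7]), ("b2", [2, 3]),
            ("b3", [4, 8]), ("b3", [9]), ("c1", [1, 5, 7])]
NEW_TUPLES = [(10, ("a4", "b4", "c4")), (11, ("a3", "b3", "c1")),
              (12, ("a1", "b1", "c2"))]


def append_tuples(main, external, new_tuples):
    """Extend in-memory TID lists; write new complementary sublists for the rest."""
    main = {value: list(tids) for value, tids in main.items()}  # copy, do not mutate
    external = list(external)
    new_sublists = {}
    for tid, values in new_tuples:
        for value in values:
            if value in main:
                main[value].append(tid)
            else:
                new_sublists.setdefault(value, []).append(tid)
    # Existing external sublists are left untouched; new records are appended.
    external.extend(sorted(new_sublists.items()))
    return main, external


main, external = append_tuples(MAIN, EXTERNAL, NEW_TUPLES)
print(main["a1"], external[-6:])
```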
H-Frag Updates: attribute values are merged
Suppose that attribute values a2 and a3 are merged into attribute value a9.
Attribute value a9 will then have the highest frequency (5 tids) and will replace attribute value a1 in the main memory.
Therefore, attribute value a1 will be stored in external memory.
External Memory (before the merge)
Attribute Value Tids
a2 2, 3
a2 6
a3 4, 7
b2 2, 3
b3 4, 8
b3 9
c1 1, 5, 7
a2 + a3 = a9 : {2, 3, 6, 4, 7}
Main Memory (after the merge)
Attribute Value tids
a9 2, 3, 6, 4, 7
b1 1, 5, 6, 7
c2 2, 3, 4, 6, 8, 9
External Memory (after the merge)
Attribute Value Tids
a1 1, 5, 8, 9
b2 2, 3
b3 4, 8
b3 9
c1 1, 5, 7
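The merge and the resulting swap between main and external memory can be sketched as below. Two assumptions are labeled in the code: the dimension of a value is taken from its first letter (a hypothetical convention of this toy example), and a merged value replaces the in-memory value of its dimension only if its tid-list is longer. `merge_values` is a hypothetical name.

```python
MAIN = {"a1": [1, 5, 8, 9], "b1": [1, 5, 6, 7], "c2": [2, 3, 4, 6, 8, 9]}
EXTERNAL = [("a2", [2, 3]), ("a2", [6]), ("a3", [4, 7]), ("b2", [2, 3]),
            ("b3", [4, 8]), ("b3", [9]), ("c1", [1, 5, 7])]


def merge_values(main, external, old_values, new_value):
    """Merge attribute values, then re-decide which value of the dimension stays in RAM."""
    main = {value: list(tids) for value, tids in main.items()}
    merged, kept = [], []
    for value, tids in external:
        if value in old_values:
            merged.extend(tids)          # concatenate every sublist of the old values
        else:
            kept.append((value, tids))
    for value in old_values:             # old values might also live in main memory
        merged.extend(main.pop(value, []))
    # Hypothetical convention: the first letter of a value names its dimension.
    rival = next((v for v in main if v[0] == new_value[0]), None)
    # Assumption: the merged value takes the rival's place only if it is more frequent.
    if rival is None or len(merged) > len(main[rival]):
        if rival is not None:
            kept.append((rival, main.pop(rival)))  # demoted to external memory
        main[new_value] = merged
    else:
        kept.append((new_value, merged))
    return main, kept


main, external = merge_values(MAIN, EXTERNAL, ("a2", "a3"), "a9")
print(main["a9"], [value for value, _ in external])
```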
H-Frag Update: new dimensions and measures
tid A B C D M1 M2 M3
1 a1 b1 c1 d1 1.5 1 6
2 a2 b2 c2 d1 2.5 1 5.66
3 a2 b2 c2 d1 2 3 78.98
4 a3 b3 c2 d1 78.5 2 2.98
5 a1 b1 c1 d3 100 5 1.65
6 a2 b1 c2 d2 102.5 4 2.69
7 a3 b1 c1 d1 100 2 6.87
8 a1 b3 c2 d3 22.5 3 98.999
9 a1 b3 c2 d2 13.89 8 78.995
Frequent Attribute Values in Main Memory
Attribute Value tids
a1 1, 5, 8, 9
b1 1, 5, 6, 7
c2 2, 3, 4, 6, 8, 9
d1 1, 2, 3, 4, 7
Attribute Values in External Memory
Attribute Value tids
a2 2, 3
a2 6
a3 4, 7
b2 2, 3
b3 4, 8
b3 9
c1 1, 5, 7
d2 6, 9
d3 5, 8
Measure Values Relation in External Memory
tids M1 M2 M3
1 1.5 1 6
2 2.5 1 5.66
3 2 3 78.98
4 78.5 2 2.98
5 100 5 1.65
6 102.5 4 2.69
7 100 2 6.87
8 22.5 3 98.999
9 13.89 8 78.995
The computation algorithm processes only the new dimensions and measures.
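Adding dimension D can be sketched as processing only the new column with the first-step rule. As in the earlier steps, the rule "frequency not above the average goes to external memory" is an assumption; here the average frequency of D is 9 / 3 = 3, so only d1 stays in main memory. `add_dimension` is a hypothetical name.

```python
from collections import defaultdict

# Only the new dimension D is scanned: tid -> value of D.
D = {1: "d1", 2: "d1", 3: "d1", 4: "d1", 5: "d3", 6: "d2", 7: "d1", 8: "d3", 9: "d2"}


def add_dimension(column):
    """Build the TID lists of a new dimension and split them by average frequency."""
    tids = defaultdict(list)
    for tid in sorted(column):
        tids[column[tid]].append(tid)
    average = len(column) / len(tids)  # average frequency of the new dimension
    # Assumption: values above the average stay in RAM, the others go to disk.
    main = {v: t for v, t in tids.items() if len(t) > average}
    external = {v: t for v, t in tids.items() if len(t) <= average}
    return main, external


main, external = add_dimension(D)
print(main, external)
```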
H-Frag Range and Inquire Query
The H-Frag approach enables point queries and range queries:
rOp = (greater than + less than + between + some + different + similar x (fv1 … fvn))
It also allows inquire queries, such as sub-cube and distinct:
iOp = (sub-cube + distinct + top-k similar x (fv1 … fvn))
H-Frag Range and Inquire Query
To achieve better performance, H-Frag orders sub-cube queries so that it always starts with the queries that generate fewer intersections.
The result of a query Q is qR = (TID1, TID2, …, TIDk), where TIDi is the i-th tuple identifier of relation R.
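A sketch of a range operator and of smallest-first intersection over the example's tid-lists. Ordering intersections by list size is one common way to reduce work and is offered here as an assumption, not as H-Frag's exact ordering rule; `range_tids` and `intersect` are hypothetical names.

```python
# TID lists of the example (external sublists already concatenated).
TIDS = {
    "a1": [1, 5, 8, 9], "a2": [2, 3, 6], "a3": [4, 7],
    "b1": [1, 5, 6, 7], "b2": [2, 3], "b3": [4, 8, 9],
    "c1": [1, 5, 7], "c2": [2, 3, 4, 6, 8, 9],
}


def range_tids(dimension, predicate):
    """Range operator: union of the TID lists of the values that satisfy it."""
    result = set()
    for value, tids in TIDS.items():
        if value.startswith(dimension) and predicate(value):
            result.update(tids)
    return result


def intersect(*tid_sets):
    """Intersect the smallest sets first, so the candidate set shrinks early."""
    ordered = sorted(tid_sets, key=len)
    result = set(ordered[0])
    for tids in ordered[1:]:
        if not result:
            break                    # already empty: no need to read more lists
        result &= tids
    return sorted(result)


# A "different" operator (A != a1) combined with the point value c2.
print(intersect(range_tids("a", lambda v: v != "a1"), set(TIDS["c2"])))  # [2, 3, 4, 6]
```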
H-Frag Query
For each type of query, H-Frag checks whether each attribute value is stored in the external memory.
After fetching the tid-list of each attribute value that meets the user's query, H-Frag intersects those lists; this intersection is the final query result.
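The hybrid lookup can be sketched as below: a value found in the main-memory map is used directly; otherwise all of its complementary sublists are collected from external memory (modelled here as a plain list) before the intersection. `fetch` and `point_query` are hypothetical names.

```python
MAIN = {"a1": [1, 5, 8, 9], "b1": [1, 5, 6, 7], "c2": [2, 3, 4, 6, 8, 9]}
EXTERNAL = [("a2", [2, 3]), ("a2", [6]), ("a3", [4, 7]), ("b2", [2, 3]),
            ("b3", [4, 8]), ("b3", [9]), ("c1", [1, 5, 7])]


def fetch(value):
    """Return the full TID list of a value, from main or external memory."""
    if value in MAIN:
        return MAIN[value]
    # External memory may hold several complementary sublists for one value.
    tids = []
    for stored_value, sublist in EXTERNAL:
        if stored_value == value:
            tids.extend(sublist)
    return sorted(tids)


def point_query(values):
    """Intersect the TID lists of all queried values; the result is qR."""
    lists = sorted((fetch(v) for v in values), key=len)
    result = set(lists[0])
    for tids in lists[1:]:
        result &= set(tids)
    return sorted(result)


print(point_query(["a1", "b3", "c2"]))  # [8, 9]
```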
H-Frag Query - example
q = {?, ?, c2}
A query like this, with two inquire operators, would be executed in SQL as follows:
SELECT A, '*' AS B, 'c2' AS C, COUNT(*) FROM R WHERE C = 'c2' GROUP BY A
UNION ALL
SELECT '*', B, 'c2', COUNT(*) FROM R WHERE C = 'c2' GROUP BY B
UNION ALL
SELECT A, B, 'c2', COUNT(*) FROM R WHERE C = 'c2' GROUP BY A, B;
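The same sub-cube can be computed directly from the example relation with a short Python sketch. For brevity it selects the c2 tuples by scanning, where H-Frag would fetch the c2 tid-list, and, like the SQL above, it returns the (A), (B) and (A, B) group-bys; `subcube` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

# Example relation: tid -> (A, B, C).
ENTRY = {
    1: ("a1", "b1", "c1"), 2: ("a2", "b2", "c2"), 3: ("a2", "b2", "c2"),
    4: ("a3", "b3", "c2"), 5: ("a1", "b1", "c1"), 6: ("a2", "b1", "c2"),
    7: ("a3", "b1", "c1"), 8: ("a1", "b3", "c2"), 9: ("a1", "b3", "c2"),
}


def subcube(entry, fixed, inquire):
    """COUNT for every cuboid over the inquired dimensions, given fixed values.

    `fixed` maps a dimension index to its instantiated value; `inquire` lists
    the dimension indexes marked with '?'.
    """
    tids = [t for t, row in entry.items()
            if all(row[d] == v for d, v in fixed.items())]
    result = Counter()
    for k in range(1, len(inquire) + 1):
        for cuboid in combinations(inquire, k):
            for tid in tids:
                cell = tuple(entry[tid][d] if d in cuboid else "*" for d in inquire)
                result[cell] += 1
    return result


cells = subcube(ENTRY, fixed={2: "c2"}, inquire=[0, 1])
for cell, count in sorted(cells.items()):
    print(cell, "c2", count)
```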
Experiments
• We compared the H-Frag Computation and Query algorithms against the Frag-Cubing algorithm used in [Li et al. 2004];
• The H-Frag algorithms were implemented in 64-bit Java;
• Frag-Cubing is a free and open-source C++ application (http://illimine.cs.uiuc.edu/);
• The synthetic base relations were created with the data generator provided by the IlliMine project;
• The IlliMine project is an open-source project that provides various approaches for data mining and machine learning;
• The Frag-Cubing approach is part of the IlliMine project;
• We ran the algorithms on two six-core Intel Xeon processors (2.4 GHz per core, 12 MB cache) with 128 GB of DDR3 1333 MHz RAM;
• The system runs Windows Server 2008 64-bit, High Performance version.
Results - Performance Evaluation of Point Queries
Response time per query over 100 trials: T = 10^7, C = 10^4, D = 30, S = 0 (T: number of tuples; C: cardinality of each dimension; D: number of dimensions; S: skew).
On average, H-Frag answered point queries 3 times slower than Frag-Cubing.
Results - Performance Evaluation of Inquire Operators
Response time of queries with inquire operators: T = 10^7, C = 10^4, D = 30, S = 0.
H-Frag answered queries with two inquire operators about 2.5 times slower than Frag-Cubing.
• Frag-Cubing could not answer queries with three inquire operators due to memory overflow.
Results - Relations with Different Numbers of Dimensions
T = 10^7, C = 10^4, D = 30, S = 0.
The runtime was linear in both approaches.
On average, thanks to its hybrid memory usage, H-Frag consumed 3 times less main memory than Frag-Cubing.
Results - Massive Data Cube
One relation with T = 10^9 tuples was computed by the H-Frag approach.
This experiment took 64 hours and consumed 126 GB of RAM.
Queries with five range operators, ten point operators and one inquire operator were answered in less than 35 seconds.
Data cubes with a high number of tuples could not be computed by Frag-Cubing, which uses main memory only. This was demonstrated by trying to compute a base relation with 200 million tuples and 60 dimensions.
Conclusions
• H-Frag has linear runtime and memory consumption, similar to Frag-Cubing;
• When compared to Frag-Cubing, H-Frag is faster to answer sub-cube queries;
• It introduces a different cube representation with fewer empty cells than Frag-Cubing;
• Frag-Cubing cannot answer queries with two sub-cube operators over a data cube with 200 million tuples, C = 10^4, D = 30 and S = 0;
• We had scenarios where the Frag-Cubing approach failed to compute the data cube due to lack of main memory.
Conclusions
Interesting research directions to further extend H-Frag:
• First, H-Frag must be evaluated with holistic measures.
• Top-k queries are also of interest, since inverted indexes are useful for this type of problem.
• Multicore and multicomputer versions of H-Frag must be implemented.
Acknowledgements
Thank you very much
e-mail rrochas@gmail.com

Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 
Certified Big Data Science Analyst (CBDSA)
Certified Big Data Science Analyst (CBDSA)Certified Big Data Science Analyst (CBDSA)
Certified Big Data Science Analyst (CBDSA)
 
Data Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
Data Insights and Analytics: Simplifying Data Lake and Modern BI ArchitectureData Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
Data Insights and Analytics: Simplifying Data Lake and Modern BI Architecture
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
 
FDS_dept_ppt.pptx
FDS_dept_ppt.pptxFDS_dept_ppt.pptx
FDS_dept_ppt.pptx
 
Comments on Simulations Watch Case Part I.pdf
Comments on  Simulations Watch Case Part I.pdfComments on  Simulations Watch Case Part I.pdf
Comments on Simulations Watch Case Part I.pdf
 
Agents for Agility - The Just-in-Time Enterprise Has Arrived
Agents for Agility - The Just-in-Time Enterprise Has ArrivedAgents for Agility - The Just-in-Time Enterprise Has Arrived
Agents for Agility - The Just-in-Time Enterprise Has Arrived
 

Recently uploaded

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 

Recently uploaded (20)

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 

A Hybrid Memory Data Cube Approach for High Dimension Relations

  • 1. H-Frag: A Hybrid Memory Data Cube Approach for High Dimension Relations. Rodrigo Rocha Silva (Doctoral Student); Prof. Dr. Celso Massaki Hirata (Advisor); Prof. Dr. Joubert de Castro Lima (Co-Advisor). ITA – Aeronautics Institute of Technology, Electronic Engineering and Computer Science Division (EEC/I), Department of Computer Science, Brazil.
  • 2. Monday, April 27, 2015. H-Frag: A Hybrid Memory Data Cube Approach for High Dimension Relations. 17th International Conference on Enterprise Information Systems, Rodrigo Rocha Silva. What is H-Frag? H-Frag is a method for data cube computation that extends the Frag-Cubing approach so that massive data cubes can be computed using external memory, instead of relying on main memory alone.
  • 3. Topics: Motivation; Data Cube; Frag-Cubing; H-Frag approach; Experiments; Results; Conclusions.
  • 4. Motivation: Users need to view data in a tangible form; reports, cross tables and dashboards are the most commonly used tools for visualizing data.
  • 5. Motivation: Approaches based on inverted indexes, such as Frag-Cubing, are considered efficient in runtime and main-memory usage for computing and querying massive data cubes. However, approaches that use main memory only are limited when the data cube size exceeds main-memory capacity.
  • 6. Data Cube: A data cube allows the materialization of all or some of its cells (tuples), each described by dimension values and measures. Its runtime and storage grow exponentially as the number of dimensions grows linearly: for a relation with d dimensions, the full cube has 2^d cuboids.
  • 7. Data Cube: A dimension may contain hierarchical relations between two or more of its members. (Figure: example dimension hierarchies with the levels Subjects, Department, Year and Hour, Day, Year.)
  • 8. Data Cube: Construction of a complete data cube is an exponential problem.
    Base relation R (11 tuples):
    A   B   C   COUNT
    a1  b1  c1  1
    a3  b3  c2  1
    a2  b3  c2  1
    a3  b1  c1  1
    a2  b1  c1  1
    a2  b2  c2  1
    a1  b1  c2  1
    a2  b2  c1  1
    a3  b1  c2  1
    a1  b3  c2  1
    a2  b1  c2  1
    Full 3-D cube (38 cells in total):
    A   B   C   COUNT
    *   *   *   11
    a1  *   *   3
    a2  *   *   5
    a3  *   *   3
    *   b1  *   6
    *   b2  *   2
    *   b3  *   3
    *   *   c1  4
    *   *   c2  7
    a1  b1  *   2
    a1  b3  *   1
    a2  b1  *   2
    a2  b2  *   2
    a2  b3  *   1
    a3  b1  *   2
    a3  b3  *   1
    a1  *   c1  1
    a1  *   c2  2
    a2  *   c1  2
    a2  *   c2  3
    a3  *   c1  1
    a3  *   c2  2
    *   b1  c1  3
    *   b1  c2  3
    *   b2  c1  1
    *   b2  c2  1
    *   b3  c2  3
    plus the 11 base cells of cuboid ABC (the tuples of R, each with COUNT 1).
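To make the exponential growth concrete, here is a naive Python sketch (not Frag-Cubing or H-Frag) that materializes every cell of all 2^d cuboids of relation R with a COUNT measure. The function name `full_cube` is hypothetical, and `*` marks an aggregated dimension as in the table above.

```python
from collections import Counter
from itertools import combinations

# Base relation R from the slide: one (A, B, C) tuple per row, COUNT = 1 each.
R = [
    ("a1", "b1", "c1"), ("a3", "b3", "c2"), ("a2", "b3", "c2"),
    ("a3", "b1", "c1"), ("a2", "b1", "c1"), ("a2", "b2", "c2"),
    ("a1", "b1", "c2"), ("a2", "b2", "c1"), ("a3", "b1", "c2"),
    ("a1", "b3", "c2"), ("a2", "b1", "c2"),
]

def full_cube(rows):
    """Return {cell: count} for every cell of all 2^d cuboids; '*' marks an aggregated dimension."""
    d = len(rows[0])
    cube = Counter()
    for k in range(d + 1):
        for dims in combinations(range(d), k):  # one cuboid per subset of dimensions
            for row in rows:
                cell = tuple(row[i] if i in dims else "*" for i in range(d))
                cube[cell] += 1
    return cube

cube = full_cube(R)
print(len(cube))               # 38
print(cube[("*", "*", "*")])   # 11
print(cube[("a2", "*", "*")])  # 5
```

Every added dimension doubles the number of cuboids the nested loop visits, which is exactly the growth that fragment-based approaches try to avoid.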
  • 9. Related Work – Frag-Cubing Approach: splits the data vertically; reduces a high-dimensional cube into lower-dimensional cuboids; offers trade-offs between data cube computation runtime and the pre-processing of aggregations. (Figure: dimensions A, B, C, D, E, F, … split into the fragment cubes ABC and DEF.) Source: Han and Kamber, Data Mining: Concepts and Techniques.
  • 10. Related Work – Frag-Cubing Example: For a 5-dimensional relation, two shell fragments can be built, (A, B, C) and (D, E).
    tid  A   B   C   D   E
    1    a1  b1  c1  d1  e1
    2    a1  b2  c1  d2  e1
    3    a1  b2  c1  d1  e2
    4    a2  b1  c1  d1  e2
    5    a2  b1  c1  d1  e3
    Source: Han and Kamber, Data Mining: Concepts and Techniques.
  • 11. Related Work – Frag-Cubing 1-D Inverted Indexes: Build a traditional inverted index (RID list) for each attribute value.
    Attribute Value  TID List    List Size
    a1               1 2 3       3
    a2               4 5         2
    b1               1 4 5       3
    b2               2 3         2
    c1               1 2 3 4 5   5
    d1               1 3 4 5     4
    d2               2           1
    e1               1 2         2
    e2               3 4         2
    e3               5           1
    Source: Han and Kamber, Data Mining: Concepts and Techniques.
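The 1-D inverted indexes above can be reproduced with a short Python sketch. It assumes the relation fits in a dictionary keyed by tid; the function name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict

# 5-dimensional relation from the Frag-Cubing example: tid -> (A, B, C, D, E)
relation = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(rel):
    """Map each attribute value to the sorted list of tids (RID list) that contain it."""
    index = defaultdict(list)
    for tid in sorted(rel):
        for value in rel[tid]:
            index[value].append(tid)
    return dict(index)

index = build_inverted_index(relation)
for value, tids in sorted(index.items()):
    print(value, tids, len(tids))  # attribute value, TID list, list size
```

Scanning tids in ascending order keeps every list sorted, which makes later intersections cheap.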
  • 12. Related Work – Frag-Cubing Approach: Generalize the 1-D inverted indexes to multi-dimensional ones in the data cube sense, and compute all cuboids of the fragment cubes ABC and DE while retaining the inverted indexes. For example, shell fragment cube ABC contains 7 cuboids: A, B, C; AB, AC, BC; ABC.
    Cell    Intersection         TID List  List Size
    a1 b1   {1 2 3} ∩ {1 4 5}    1         1
    a1 b2   {1 2 3} ∩ {2 3}      2 3       2
    a2 b1   {4 5} ∩ {1 4 5}      4 5       2
    a2 b2   {4 5} ∩ {2 3}        (empty)   0
    This completes the offline computation stage: Frag-Cubing computes only the cuboids within each fragment during data cube computation. Source: Han and Kamber, Data Mining: Concepts and Techniques.
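The offline stage for fragment ABC can be sketched in Python by intersecting the 1-D tid-sets for every combination of values in every cuboid of the fragment. This is a minimal sketch assuming each 1-D index fits in memory as a Python set; the name `fragment_cube` is hypothetical, and the sketch keeps empty cells such as a2 b2 for clarity.

```python
from itertools import combinations, product

# 1-D inverted indexes of fragment (A, B, C), taken from the previous slide.
fragment_index = {
    "A": {"a1": {1, 2, 3}, "a2": {4, 5}},
    "B": {"b1": {1, 4, 5}, "b2": {2, 3}},
    "C": {"c1": {1, 2, 3, 4, 5}},
}

def fragment_cube(index):
    """Compute every cuboid of one shell fragment by intersecting 1-D tid-sets."""
    dims = sorted(index)
    cube = {}
    for k in range(1, len(dims) + 1):
        for cuboid in combinations(dims, k):  # A, B, C, AB, AC, BC, ABC
            for values in product(*(index[d] for d in cuboid)):
                tids = set.intersection(*(index[d][v] for d, v in zip(cuboid, values)))
                cube[values] = sorted(tids)
    return cube

cube = fragment_cube(fragment_index)
print(cube[("a1", "b1")])  # [1]
print(cube[("a2", "b2")])  # []
```

Only the 2^3 - 1 = 7 cuboids of the fragment are computed, never the cuboids that combine dimensions across fragments.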
  • 13. Related Work – Frag-Cubing Measure Table: If measures other than count are present, they are stored in an ID_measure table, separate from the shell fragments.
    tid  count  sum
    1    5      70
    2    3      10
    3    8      20
    4    5      40
    5    2      30
    Source: Han and Kamber, Data Mining: Concepts and Techniques.
  • 14. Related Work – Frag-Cubing Query (online computation): Once the data cube is computed into fragments, a query is processed as follows: (1) divide the query into fragments; (2) fetch the corresponding tid-list for each fragment from the fragment cube; (3) intersect the tid-lists of all fragments to construct an instantiated base table; (4) compute the data cube from this base table with any cubing algorithm. Source: Han and Kamber, Data Mining: Concepts and Techniques.
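The query steps above can be sketched in Python for a point query such as (a1, *, *, d1, *). This is a simplified sketch: the fragment cubes are hypothetical dictionaries holding only the cells the example needs, the name `answer_point_query` is hypothetical, and step (4) is reduced to aggregating the queried cell from the ID_measure table instead of running a full cubing algorithm.

```python
# Fragment cubes for (A, B, C) and (D, E), reduced to the cells this example needs.
frag_abc = {("a1",): {1, 2, 3}, ("a1", "b2"): {2, 3}}
frag_de = {("d1",): {1, 3, 4, 5}, ("d1", "e1"): {1}}

# ID_measure table: tid -> (count, sum)
id_measure = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}

def answer_point_query(parts):
    """parts: one (fragment_cube, cell) pair per fragment touched by the query."""
    tids = None
    for fragment, cell in parts:
        cell_tids = fragment.get(cell, set())  # fetch this fragment's tid-list
        tids = cell_tids if tids is None else tids & cell_tids
    # Instantiated base table: surviving tids joined with their measures
    base = {tid: id_measure[tid] for tid in sorted(tids or set())}
    return base, sum(c for c, _ in base.values()), sum(s for _, s in base.values())

# Query (a1, *, *, d1, *): one part for fragment ABC, one for fragment DE
base, total_count, total_sum = answer_point_query([(frag_abc, ("a1",)), (frag_de, ("d1",))])
print(sorted(base), total_count, total_sum)  # [1, 3] 13 90
```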
  • 15. H-Frag Approach: Like Frag-Cubing, H-Frag keeps a set of tuple identifiers (tid-list) for each attribute value of each dimension. H-Frag allows larger cubes by storing some of the computed cube fragments in external memory, rather than relying on main memory only. The main challenge of using external memory is deciding which fragments of the cube should stay in main memory; H-Frag selects them according to attribute value frequencies and dimension cardinalities.
  • 16. H-Frag Architecture (figure).
  • 17. H-Frag Computation: First, the computation component scans the entire Entry Relation to obtain the frequency of each attribute value in each dimension. Then the average frequency of each dimension is computed, and attribute values with frequencies lower than the average are marked to be stored in external memory. (Figure: the Entry Relation is scanned, the frequencies of the attribute values are computed, and values are marked for External Memory.)
  • 18. H-Frag Computation – Example:
    tid  A   B   C   M1     M2
    1    a1  b1  c1  1.5    1
    2    a2  b2  c2  2.5    1
    3    a2  b2  c2  2      3
    4    a3  b3  c2  78.5   2
    5    a1  b1  c1  100    5
    6    a2  b1  c2  102.5  4
    7    a3  b1  c1  100    2
    8    a1  b3  c2  22.5   3
    9    a1  b3  c2  13.89  8
    The frequency of each attribute value is: fa1=4, fa2=3, fa3=2, fb1=4, fb2=2, fb3=3, fc1=3 and fc2=6.
  • 19. H-Frag Computation – First Step: The average frequency is 3 in dimensions A and B; in dimension C it is 4.5 (rounded down to 4).
    A: fa1=4, fa2=3, fa3=2 → (4+3+2)/3 = 3
    B: fb1=4, fb2=2, fb3=3 → (4+2+3)/3 = 3
    C: fc1=3, fc2=6 → (3+6)/2 = 4.5
    The attribute values a3, b2, b3 and c1 are marked to be stored in external memory because they are below the average.
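The first step can be sketched in Python as below, assuming "average" means the mean frequency of the distinct values within each dimension, as in the worked example. The function name `mark_for_external_memory` and its `inclusive` flag are hypothetical. With the strict "lower than the average" rule of slide 17, the sketch marks a3, b2 and c1; counting values equal to the average as well also marks a2 and b3, which are the values later shown in external memory in slides 26 to 29.

```python
from collections import Counter

# Example relation from slide 18 (dimensions A, B, C only).
rows = [
    ("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a2", "b2", "c2"),
    ("a3", "b3", "c2"), ("a1", "b1", "c1"), ("a2", "b1", "c2"),
    ("a3", "b1", "c1"), ("a1", "b3", "c2"), ("a1", "b3", "c2"),
]

def mark_for_external_memory(rows, inclusive=False):
    """First scan: per-dimension frequencies, their average, and the values to mark."""
    marked = set()
    for dim in range(len(rows[0])):
        freq = Counter(row[dim] for row in rows)
        avg = sum(freq.values()) / len(freq)  # e.g. (4 + 3 + 2) / 3 = 3.0 for A
        for value, f in freq.items():
            if f < avg or (inclusive and f == avg):
                marked.add(value)
    return marked

print(sorted(mark_for_external_memory(rows)))                  # ['a3', 'b2', 'c1']
print(sorted(mark_for_external_memory(rows, inclusive=True)))  # ['a2', 'a3', 'b2', 'b3', 'c1']
```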
  • 20. H-Frag Computation – Second Step: The computation component scans the Entry Relation a second time to select the attribute values to be stored in external memory, and each selected attribute value is stored there together with its tid-list. H-Frag splits the Entry Relation into complementary portions of user-defined size, each containing several tuples. Because RAM can fill up, a single attribute value can have several complementary tid-lists in external memory. (Figure: the Entry Relation is scanned to select attribute values, and each value with its tid-list is written to External Memory.)
  • 21. H-Frag Computation – Second Step: To avoid storing attribute values with few tids in external memory, H-Frag defines an occurrence percentage for each attribute value within a portion: each attribute value whose tids cover at least 50% of the tuples processed in the portion has its tid-list stored in external memory. Each portion should fit entirely in main memory. Example Entry Relation with a portion size of 4:
    tid  A   B   C   D   E   F   G   H
    1    a1  b1  c1  d1  e1  f1  g1  h1
    2    a1  b2  c2  d1  e2  f2  g2  h2
    3    a3  b8  c3  d3  e4  f5  g6  h7
    4    a5  b6  c5  d5  e4  f5  g5  h6
    5    a9  b9  c9  d9  e9  f9  g9  h9
    6    a9  b4  c4  d4  e3  f4  g4  h4
    7    a5  b7  c7  d7  e7  f7  g7  h7
    8    a7  b7  c7  d7  e7  f7  g7  h7
    First portion (tids 1-4): the tid-lists of a1, d1, e4 and f5 are stored.
    Second portion (tids 5-8): the tid-list of a9 is stored, and so on.
  • 22. H-Frag Computation – Second Step: If 80% of the available working memory is in use, all tid-lists of the processed attribute values and all measure values are stored in external memory. This solves the case in which many attribute values stay below 50% of a portion, which can happen in relations with high cardinality and low skew. (Figure: tid-lists held in working memory, such as a1 = {1, .., 4}, a2 = {2, .., 8}, b1 = {1, .., 3}, c1 = {3, .., 7} and c2 = {2, .., 4}, are all stored once only 20% of working memory remains free.)
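The portion-based second step can be approximated by the Python sketch below. It is a simplified model, not the authors' implementation: it assumes a value's accumulated tids are written out at the end of a portion once they reach pct × portion size, and it models the 80% working-memory rule with a hypothetical `max_pending` budget on buffered tids. The function name `spill_by_portion` and the list standing in for external memory are also hypothetical. On the slide-21 relation it reproduces the first-portion outcome (a1, d1, e4, f5) and stores a9 in the second portion.

```python
import math
from collections import defaultdict

# Entry Relation of slide 21: (tid, attribute values of dimensions A..H)
ROWS = [
    (1, "a1 b1 c1 d1 e1 f1 g1 h1".split()), (2, "a1 b2 c2 d1 e2 f2 g2 h2".split()),
    (3, "a3 b8 c3 d3 e4 f5 g6 h7".split()), (4, "a5 b6 c5 d5 e4 f5 g5 h6".split()),
    (5, "a9 b9 c9 d9 e9 f9 g9 h9".split()), (6, "a9 b4 c4 d4 e3 f4 g4 h4".split()),
    (7, "a5 b7 c7 d7 e7 f7 g7 h7".split()), (8, "a7 b7 c7 d7 e7 f7 g7 h7".split()),
]

def spill_by_portion(rows, portion_size, pct=0.5, marked=None, max_pending=None):
    """Write tid sublists to a simulated external memory, portion by portion.

    pct         -- occurrence percentage within a portion (50% on the slide)
    marked      -- values chosen in the first step; None means every value
    max_pending -- hypothetical stand-in for the 80% working-memory rule
    """
    threshold = math.ceil(pct * portion_size)
    pending = defaultdict(list)  # tids buffered in RAM, not yet written
    external = []                # (value, tid sublist) records, in write order
    for start in range(0, len(rows), portion_size):
        for tid, values in rows[start:start + portion_size]:
            for value in values:
                if marked is None or value in marked:
                    pending[value].append(tid)
        # End of portion: write out sublists that reached the occurrence threshold
        for value in list(pending):
            if len(pending[value]) >= threshold:
                external.append((value, pending.pop(value)))
        # Memory pressure: write out everything still buffered
        if max_pending is not None and sum(map(len, pending.values())) > max_pending:
            for value in list(pending):
                external.append((value, pending.pop(value)))
    return external, dict(pending)

external, still_pending = spill_by_portion(ROWS, portion_size=4)
print(external[:4])  # [('a1', [1, 2]), ('d1', [1, 2]), ('e4', [3, 4]), ('f5', [3, 4])]
```

Values that never reach the threshold (for example b1 with tid 1) are returned in `still_pending`; setting `max_pending=0` forces every buffered sublist out after each portion.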
  • 23. H-Frag Computation – Measure Values: Measure values are grouped by portion, and each group is identified by its tid interval (range). As a result, H-Frag generates only a few files of measure values. For example, when a portion of 10 tuples starting at tid 1 is processed, a file of measure values identified as 1_10 is generated.
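Grouping measure values by tid range can be sketched in Python as follows, assuming tids are contiguous and sorted. A dictionary stands in for the per-group files, and the names `group_measures` and `find_group` are hypothetical.

```python
def group_measures(measure_rows, portion_size):
    """Group (tid, measures) rows by portion; key each group by its 'first_last' tid range."""
    groups = {}
    for start in range(0, len(measure_rows), portion_size):
        portion = measure_rows[start:start + portion_size]
        group_id = f"{portion[0][0]}_{portion[-1][0]}"  # e.g. '1_10'
        groups[group_id] = {tid: measures for tid, measures in portion}
    return groups

def find_group(groups, tid):
    """Locate the group whose 'first_last' range contains tid, or None."""
    for group_id in groups:
        first, last = map(int, group_id.split("_"))
        if first <= tid <= last:
            return group_id
    return None

# Measures (M1, M2) of the slide-18 relation, grouped in portions of 3 tuples
measures = [(1, (1.5, 1)), (2, (2.5, 1)), (3, (2, 3)), (4, (78.5, 2)), (5, (100, 5)),
            (6, (102.5, 4)), (7, (100, 2)), (8, (22.5, 3)), (9, (13.89, 8))]
groups = group_measures(measures, portion_size=3)
print(list(groups))           # ['1_3', '4_6', '7_9']
print(find_group(groups, 5))  # 4_6
```

Because each key encodes its range, a tid's measures can be located by parsing file names alone, without a per-tid directory.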
  • 24. H-Frag Computation – Third Step: The relation is scanned a third time, producing a map of the most frequent attribute values of the relation and their tid-lists. This map is kept in main memory. (Figure: the Entry Relation is scanned, and the map with the tid-lists of the most frequent attribute values is kept in Main Memory.)
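The third step can be sketched in Python as below. It assumes "most frequent" means the `top_k` most frequent values of each dimension; `top_k=1` reproduces a1, b1 and c2, the values kept in main memory in slide 29. The function name and the `top_k` parameter are hypothetical. The frequencies are recomputed here so the block runs on its own, whereas H-Frag already has them from the first scan.

```python
from collections import Counter

rows = {  # slide-18 relation: tid -> (A, B, C)
    1: ("a1", "b1", "c1"), 2: ("a2", "b2", "c2"), 3: ("a2", "b2", "c2"),
    4: ("a3", "b3", "c2"), 5: ("a1", "b1", "c1"), 6: ("a2", "b1", "c2"),
    7: ("a3", "b1", "c1"), 8: ("a1", "b3", "c2"), 9: ("a1", "b3", "c2"),
}

def top_frequent_map(rel, top_k=1):
    """Keep the top_k most frequent values of each dimension with their full tid-lists."""
    num_dims = len(next(iter(rel.values())))
    # Per-dimension frequencies (known from the first scan in H-Frag)
    freqs = [Counter(row[d] for row in rel.values()) for d in range(num_dims)]
    selected = {v for f in freqs for v, _ in f.most_common(top_k)}
    main_memory = {v: [] for v in sorted(selected)}
    for tid in sorted(rel):  # third scan: collect tid-lists only for the selected values
        for value in rel[tid]:
            if value in main_memory:
                main_memory[value].append(tid)
    return main_memory

print(top_frequent_map(rows))
# {'a1': [1, 5, 8, 9], 'b1': [1, 5, 6, 7], 'c2': [2, 3, 4, 6, 8, 9]}
```

Collecting tid-lists only for the selected values keeps the third scan's memory footprint bounded by the size of the map itself.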
  • 25. Example of the computing process, using the relation of slide 18. Recall that the attribute value frequencies are fa1=4, fa2=3, fa3=2, fb1=4, fb2=2, fb3=3, fc1=3 and fc2=6.
  • 26. Example of the computing process (relation of slide 18): H-Frag stores a tid-sublist each time an attribute value is associated with 50% or more of the tids of a defined portion.
    Attribute values in external memory:
    Attribute Value  Tids
    a2               2, 3
    a2               6
    a3               4, 7
    b2               2, 3
    b3               4, 8
    b3               9
    c1               1, 5, 7
  • 27. Example of the computing process (relation of slide 18; external memory as in slide 26): In this example, consider a portion size of 2. The attribute value a2, whose frequency is 3, has two sublists stored: one with tids 2 and 3, and another with tid 6.
  • 28. Example of the computing process (relation of slide 18; external memory as in slide 26): With a portion size of 2, the attribute value c1, whose frequency is 3, has only one tid-list stored in external memory, containing three tids.
  • 29. Example of the computing process (relation of slide 18): The most frequent attribute values, a1, b1 and c2, are kept in main memory; the remaining values are in external memory as in slide 26.
    Frequent attribute values in main memory:
    Attribute Value  Tids
    a1               1, 5, 8, 9
    b1               1, 5, 6, 7
    c2               2, 3, 4, 6, 8, 9
  • 30. Example of the computing process (same content as slide 29).
  • 31. Example of the computing process (relation of slide 18; main and external memory as in slides 26 and 29): The measure values relation in external memory identifies the tid range of the processed tuples.
    Group ID  tid  M1     M2
    1_3       1    1.5    1
              2    2.5    1
              3    2      3
    4_6       4    78.5   2
              5    100    5
              6    102.5  4
    7_9       7    100    2
              8    22.5   3
              9    13.89  8
  • 32. H-Frag Update: Updates use the same algorithm as H-Frag Computation.
  • 33. H-Frag Update: New tuples are appended to the relation of slide 18:
    tid  A   B   C   M1   M2
    10   a4  b4  c4  3    7
    11   a3  b3  c1  4.7  12
    12   a1  b1  c2  5.5  6
    Records in main memory are updated with the new tids, or are replaced by attribute values that become more frequent; new records are created in external memory.
    Main memory:
    Attribute Value  Tids
    a1               1, 5, 8, 9, 12
    b1               1, 5, 6, 7, 12
    c2               2, 3, 4, 6, 8, 9, 12
    External memory:
    Attribute Value  Tids
    a2               2, 3
    a2               6
    a3               4, 7
    a4               10
    a3               11
    b2               2, 3
    b3               4, 8
    b3               9
    b3               11
    b4               10
    c1               1, 5, 7
    c1               11
    c4               10
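Appending new tuples can be sketched in Python as below, assuming main memory is a dictionary of frequent values and external memory a list of (value, tid sublist) records. The function name `apply_new_tuples` is hypothetical, and the sketch deliberately omits the case in which a value becomes more frequent than the one held in main memory and replaces it.

```python
def apply_new_tuples(main_memory, external, new_rows):
    """Extend in-RAM tid-lists and append new sublists to external memory.

    main_memory -- {value: tid list} of the frequent values kept in RAM
    external    -- list of (value, tid sublist) records
    new_rows    -- {tid: attribute values} of the appended tuples
    """
    new_sublists = {}
    for tid in sorted(new_rows):
        for value in new_rows[tid]:
            if value in main_memory:
                main_memory[value].append(tid)  # update the record in RAM
            else:
                new_sublists.setdefault(value, []).append(tid)
    external.extend(new_sublists.items())  # complementary sublists; old ones untouched
    return main_memory, external

main = {"a1": [1, 5, 8, 9], "b1": [1, 5, 6, 7], "c2": [2, 3, 4, 6, 8, 9]}
external = [("a2", [2, 3]), ("a2", [6]), ("a3", [4, 7]), ("b2", [2, 3]),
            ("b3", [4, 8]), ("b3", [9]), ("c1", [1, 5, 7])]
new = {10: ("a4", "b4", "c4"), 11: ("a3", "b3", "c1"), 12: ("a1", "b1", "c2")}
apply_new_tuples(main, external, new)
print(main["a1"])  # [1, 5, 8, 9, 12]
```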
  • 34. H-Frag Update: attribute values are merged. Suppose attribute values a2 and a3 are merged into attribute value a9: a2 + a3 = a9 with tids {2, 3, 6, 4, 7}. With 5 tids, a9 becomes the most frequent value of dimension A and replaces attribute value a1 in main memory; a1 is therefore stored in external memory.
    Before the merge, external memory: a2 (2, 3); a2 (6); a3 (4, 7); b2 (2, 3); b3 (4, 8); b3 (9); c1 (1, 5, 7).
    After the merge, main memory: a9 (2, 3, 6, 4, 7); b1 (1, 5, 6, 7); c2 (2, 3, 4, 6, 8, 9).
    After the merge, external memory: a1 (1, 5, 8, 9); b2 (2, 3); b3 (4, 8); b3 (9); c1 (1, 5, 7).
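The merge-and-promote case can be sketched in Python as below. It assumes the merged source values live only in external memory, as in the example, and that the caller knows which value currently represents the dimension in RAM. The function name `merge_values` and its `ram_value` parameter are hypothetical.

```python
def merge_values(main_memory, external, sources, target, ram_value):
    """Merge the values in `sources` into `target`; promote it if it beats `ram_value`.

    main_memory -- {value: tid list} kept in RAM
    external    -- list of (value, tid sublist) records
    ram_value   -- value currently held in RAM for the same dimension (e.g. 'a1')
    """
    merged, kept = [], []
    for value, tids in external:
        if value in sources:
            merged.extend(tids)  # gather every complementary sublist of the merged values
        else:
            kept.append((value, tids))
    if len(merged) > len(main_memory[ram_value]):
        kept.append((ram_value, main_memory.pop(ram_value)))  # demote to external memory
        main_memory[target] = merged
    else:
        kept.append((target, merged))
    return main_memory, kept

main = {"a1": [1, 5, 8, 9], "b1": [1, 5, 6, 7], "c2": [2, 3, 4, 6, 8, 9]}
external = [("a2", [2, 3]), ("a2", [6]), ("a3", [4, 7]), ("b2", [2, 3]),
            ("b3", [4, 8]), ("b3", [9]), ("c1", [1, 5, 7])]
main, external = merge_values(main, external, {"a2", "a3"}, "a9", ram_value="a1")
print(main["a9"])  # [2, 3, 6, 4, 7]
```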
  • 35. H-Frag Update: new dimensions and measures
  The computing algorithm processes only the new dimensions and measures (here, dimension D and measure M3).
  Relation R:
  tid  A   B   C   D   M1     M2  M3
  1    a1  b1  c1  d1  1.5    1   6
  2    a2  b2  c2  d1  2.5    1   5.66
  3    a2  b2  c2  d1  2      3   78.98
  4    a3  b3  c2  d1  78.5   2   2.98
  5    a1  b1  c1  d3  100    5   1.65
  6    a2  b1  c2  d2  102.5  4   2.69
  7    a3  b1  c1  d1  100    2   6.87
  8    a1  b3  c2  d3  22.5   3   98.999
  9    a1  b3  c2  d2  13.89  8   78.995
  Frequent attribute values in main memory:
  Attribute value  tids
  a1               1, 5, 8, 9
  b1               1, 5, 6, 7
  c2               2, 3, 4, 6, 8, 9
  d1               1, 2, 3, 4, 7
  Attribute values in external memory:
  Attribute value  tids
  a2               2, 3
  a2               6
  a3               4, 7
  b2               2, 3
  b3               4, 8
  b3               9
  c1               1, 5, 7
  d2               6, 9
  d3               5, 8
  Measure values relation in external memory:
  tid  M1     M2  M3
  1    1.5    1   6
  2    2.5    1   5.66
  3    2      3   78.98
  4    78.5   2   2.98
  5    100    5   1.65
  6    102.5  4   2.69
  7    100    2   6.87
  8    22.5   3   98.999
  9    13.89  8   78.995
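A sketch of why only the new columns need processing: a new dimension gets its own inverted index built from one scan of that column, and a new measure only appends one value per tid to the measure rows. This assumes, as before, that only the most frequent value of each dimension stays in main memory and that external memory is a map; `addDimension` and `addMeasure` are hypothetical names.

```java
import java.util.*;

// Sketch of schema evolution, assuming the two-tier maps used above, one value per
// dimension in main memory, and measures stored per tid. Only the new column is scanned;
// existing tid-lists are not touched. Method names are hypothetical.
class HybridFragAddColumns {
    static void addDimension(String[] column, Map<String, List<Integer>> mainMem,
                             Map<String, List<Integer>> extMem) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < column.length; i++) {
            index.computeIfAbsent(column[i], k -> new ArrayList<>()).add(i + 1); // tid = i + 1
        }
        String hot = null; // most frequent value of the new dimension
        for (String v : index.keySet()) {
            if (hot == null || index.get(v).size() > index.get(hot).size()) hot = v;
        }
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            (e.getKey().equals(hot) ? mainMem : extMem).put(e.getKey(), e.getValue());
        }
    }

    // A new measure appends one value to the measure row of each tid.
    static void addMeasure(double[] column, Map<Integer, List<Double>> measureRows) {
        for (int i = 0; i < column.length; i++) {
            measureRows.computeIfAbsent(i + 1, k -> new ArrayList<>()).add(column[i]);
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> mainMem = new TreeMap<>();
        Map<String, List<Integer>> extMem = new TreeMap<>();
        addDimension(new String[] {"d1", "d1", "d1", "d1", "d3", "d2", "d1", "d3", "d2"}, mainMem, extMem);
        System.out.println("main memory:     " + mainMem);
        System.out.println("external memory: " + extMem);
    }
}
```

For column D this yields d1 = {1, 2, 3, 4, 7} in main memory and d2 = {6, 9}, d3 = {5, 8} in external memory, as in the tables above.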
  • 36. H-Frag Range and Inquire Query
  The H-Frag approach enables point queries and range queries:
  rOp = (greater than + less than + between + some + different + similar x (fv1 … fvn))
  It also allows inquire queries, such as sub-cube and distinct:
  iOp = (sub-cube + distinct + top-k similar x (fv1 … fvn))
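Over an inverted index, a range operator on one dimension amounts to a union of the tid-lists of every attribute value that satisfies the operator. The sketch below assumes, hypothetically, that the tid-lists of a dimension (gathered from both memory tiers) are available in a `NavigableMap` ordered by attribute value, so "between" can read a contiguous slice; `select` and `between` are illustrative names, and only a few operators from rOp are shown.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of range operators on one dimension, assuming the tid-lists of that dimension
// (from both memory tiers) are available in a NavigableMap ordered by attribute value.
// Method names are hypothetical.
class HybridFragRange {
    // Generic operator (greater than, less than, different, ...): union of the
    // tid-lists of all values that satisfy the predicate.
    static List<Integer> select(NavigableMap<String, List<Integer>> dim, Predicate<String> p) {
        TreeSet<Integer> tids = new TreeSet<>();
        for (Map.Entry<String, List<Integer>> e : dim.entrySet()) {
            if (p.test(e.getKey())) tids.addAll(e.getValue());
        }
        return new ArrayList<>(tids);
    }

    // "between lo and hi" (inclusive) reads only the matching slice of the ordered keys.
    static List<Integer> between(NavigableMap<String, List<Integer>> dim, String lo, String hi) {
        TreeSet<Integer> tids = new TreeSet<>();
        for (List<Integer> l : dim.subMap(lo, true, hi, true).values()) tids.addAll(l);
        return new ArrayList<>(tids);
    }

    public static void main(String[] args) {
        NavigableMap<String, List<Integer>> a = new TreeMap<>();
        a.put("a1", List.of(1, 5, 8, 9));
        a.put("a2", List.of(2, 3, 6));
        a.put("a3", List.of(4, 7));
        System.out.println("A between a1 and a2: " + between(a, "a1", "a2"));
        System.out.println("A different a2:      " + select(a, v -> !v.equals("a2")));
        System.out.println("A greater than a1:   " + select(a, v -> v.compareTo("a1") > 0));
    }
}
```

The resulting tid-list is then intersected with the tid-lists of the other query operators, exactly like a point operator.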
  • 37. H-Frag Range and Inquire Query
  To achieve better performance, H-Frag organizes sub-cube queries by always starting with the queries that generate fewer intersections. The result of a query Q is qR = (TID1, TID2, …, TIDk), where TIDi is the i-th tuple identifier of relation R.
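The slide does not detail the ordering rule, so the sketch below swaps in a related, well-known heuristic rather than claiming H-Frag's exact strategy: intersect tid-lists from the shortest to the longest, so every intermediate result is as small as possible, and stop as soon as it is empty. `intersect` and `intersectAll` are hypothetical names; the tid-lists are assumed to be sorted ascending.

```java
import java.util.*;

// Smallest-list-first intersection (a common heuristic, used here as an illustration).
class HybridFragIntersect {
    // Intersect two ascending tid-lists with a linear merge.
    static List<Integer> intersect(List<Integer> x, List<Integer> y) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.size() && j < y.size()) {
            int c = Integer.compare(x.get(i), y.get(j));
            if (c == 0) { out.add(x.get(i)); i++; j++; }
            else if (c < 0) i++;
            else j++;
        }
        return out;
    }

    // Assumes at least one list. Start with the shortest list and stop early
    // once the running result becomes empty.
    static List<Integer> intersectAll(List<List<Integer>> lists) {
        List<List<Integer>> sorted = new ArrayList<>(lists);
        sorted.sort((p, q) -> Integer.compare(p.size(), q.size()));
        List<Integer> result = sorted.get(0);
        for (int k = 1; k < sorted.size() && !result.isEmpty(); k++) {
            result = intersect(result, sorted.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a1 = List.of(1, 5, 8, 9);
        List<Integer> b3 = List.of(4, 8, 9);
        List<Integer> c2 = List.of(2, 3, 4, 6, 8, 9);
        System.out.println("qR for (a1, b3, c2): " + intersectAll(List.of(a1, b3, c2)));
    }
}
```

For the cell (a1, b3, c2) of the example relation, the result is qR = (8, 9).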
  • 38. H-Frag Query
  For each type of query, H-Frag checks whether each attribute value is stored in external memory when retrieving its tid-list. Once the tid-lists of the attribute values that meet the user's query have been retrieved, an intersection of those lists is performed, which produces the query result.
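The two-tier lookup can be sketched as follows: a tid-list is served from main memory when the value is frequent, and otherwise fetched from external memory, here simulated by a second map with a counter standing in for disk reads. `tidList`, `query`, `externalReads` and `exampleState` are hypothetical names, and the state is the example relation of slide 31.

```java
import java.util.*;

// Sketch of a point query over the two memory tiers. External memory is simulated
// by a map; externalReads counts the accesses that would hit the disk.
class HybridFragPointQuery {
    final Map<String, List<Integer>> mainMemory = new HashMap<>();
    final Map<String, List<Integer>> externalMemory = new HashMap<>(); // stand-in for disk
    int externalReads = 0;

    List<Integer> tidList(String value) {
        List<Integer> l = mainMemory.get(value);
        if (l != null) return l;
        externalReads++; // not a frequent value: read its record from external memory
        return externalMemory.getOrDefault(value, List.of());
    }

    // Fetch the tid-list of each queried value, then intersect them.
    List<Integer> query(String... values) {
        List<Integer> result = null;
        for (String v : values) {
            List<Integer> l = tidList(v);
            if (result == null) result = new ArrayList<>(l);
            else result.retainAll(new HashSet<>(l));
        }
        return result == null ? List.of() : result;
    }

    static HybridFragPointQuery exampleState() {
        HybridFragPointQuery q = new HybridFragPointQuery();
        q.mainMemory.put("a1", List.of(1, 5, 8, 9));
        q.mainMemory.put("b1", List.of(1, 5, 6, 7));
        q.mainMemory.put("c2", List.of(2, 3, 4, 6, 8, 9));
        q.externalMemory.put("a2", List.of(2, 3, 6));
        q.externalMemory.put("a3", List.of(4, 7));
        q.externalMemory.put("b2", List.of(2, 3));
        q.externalMemory.put("b3", List.of(4, 8, 9));
        q.externalMemory.put("c1", List.of(1, 5, 7));
        return q;
    }

    public static void main(String[] args) {
        HybridFragPointQuery q = exampleState();
        System.out.println(q.query("a1", "b3", "c2") + ", external reads: " + q.externalReads);
    }
}
```

The query (a1, b3, c2) touches external memory once (for b3) and returns (8, 9). This trade-off is the likely source of the slower point queries reported in the results.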
  • 39. H-Frag Query - example
  q = {?, ?, c2}: a query like this, with two inquire operators, would be executed in SQL as follows:
  SELECT A, '*', 'c2', COUNT(*) FROM R WHERE C = 'c2' GROUP BY A
  UNION
  SELECT '*', B, 'c2', COUNT(*) FROM R WHERE C = 'c2' GROUP BY B
  UNION
  SELECT A, B, 'c2', COUNT(*) FROM R WHERE C = 'c2' GROUP BY A, B;
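The same answer can be computed from tid-lists instead of SQL. In the sketch below, each non-empty cell (a, *, c2), (*, b, c2) and (a, b, c2) is obtained by intersecting tid-lists, and its COUNT is the size of the intersection, mirroring the three SELECTs of the SQL query. `inquire` and `and` are hypothetical names, and cells are encoded as strings purely for readability.

```java
import java.util.*;

// Sketch of the inquire query q = {?, ?, c} over dimensions A and B using tid-lists.
class HybridFragInquire {
    static Set<Integer> and(Collection<Integer> x, Collection<Integer> y) {
        Set<Integer> s = new TreeSet<>(x);
        s.retainAll(y);
        return s;
    }

    // Returns every non-empty cell (a,*,c), (*,b,c) and (a,b,c) with its COUNT.
    static Map<String, Integer> inquire(Map<String, List<Integer>> dimA,
                                        Map<String, List<Integer>> dimB,
                                        String c, List<Integer> cTids) {
        Map<String, Integer> cells = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> a : dimA.entrySet()) {
            Set<Integer> ac = and(a.getValue(), cTids);
            if (ac.isEmpty()) continue; // no (a,b,c) cell can be non-empty either
            cells.put(a.getKey() + ",*," + c, ac.size());
            for (Map.Entry<String, List<Integer>> b : dimB.entrySet()) {
                Set<Integer> abc = and(ac, b.getValue());
                if (!abc.isEmpty()) cells.put(a.getKey() + "," + b.getKey() + "," + c, abc.size());
            }
        }
        for (Map.Entry<String, List<Integer>> b : dimB.entrySet()) {
            Set<Integer> bc = and(b.getValue(), cTids);
            if (!bc.isEmpty()) cells.put("*," + b.getKey() + "," + c, bc.size());
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> a = new TreeMap<>();
        a.put("a1", List.of(1, 5, 8, 9));
        a.put("a2", List.of(2, 3, 6));
        a.put("a3", List.of(4, 7));
        Map<String, List<Integer>> b = new TreeMap<>();
        b.put("b1", List.of(1, 5, 6, 7));
        b.put("b2", List.of(2, 3));
        b.put("b3", List.of(4, 8, 9));
        System.out.println(inquire(a, b, "c2", List.of(2, 3, 4, 6, 8, 9)));
    }
}
```

On the example relation this yields ten non-empty cells, for instance (a2, *, c2) with count 3 and (*, b3, c2) with count 3.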
  • 40. Experiments
  • We compared the H-Frag Computation and Query algorithms against the Frag-Cubing algorithm [Li et al. 2004];
  • The H-Frag algorithms were implemented in 64-bit Java;
  • Frag-Cubing is a free and open-source C++ application (http://illimine.cs.uiuc.edu/);
  • The synthetic base relations were created using the data generator provided by the IlliMine project;
  • The IlliMine project is an open-source project that provides various approaches for data mining and machine learning;
  • The Frag-Cubing approach is part of the IlliMine project;
  • We ran the algorithms on two Intel Xeon six-core processors at 2.4 GHz per core, with 12 MB of cache and 128 GB of DDR3 1333 MHz RAM;
  • The system runs Windows Server 2008 64-bit, High Performance version.
  • 41. Results - Performance Evaluation of Point Queries
  Response time per query over 100 trials: T = 10^7, C = 10^4, D = 30, S = 0 (T = number of tuples, C = cardinality of each dimension, D = number of dimensions, S = skew). On average, point queries were answered 3 times slower by the H-Frag approach than by the Frag-Cubing approach.
  • 42. Results - Performance Evaluation of Inquire Operators
  Response time of queries with inquire operators: T = 10^7, C = 10^4, D = 30, S = 0. Queries with two inquire operators were answered by the H-Frag approach about 2.5 times slower than by the Frag-Cubing approach.
  • The Frag-Cubing approach could not perform queries with three inquire operators due to memory overflow.
  • 43. Results: relations with different numbers of dimensions were computed. T = 10^7, C = 10^4, D = 30, S = 0. The runtime was linear in both approaches. On average, the hybrid memory usage caused the H-Frag approach to consume 3 times less main memory than the Frag-Cubing approach.
  • 44. Results - Massive Data Cube
  One relation with T = 10^9 tuples was computed by the H-Frag approach. This experiment took 64 hours and consumed 126 GB of RAM. Queries with five range operators, ten point operators, and one inquire operator were answered in less than 35 seconds.
  Data cubes with a high number of tuples could not be computed by the Frag-Cubing approach using main memory only. This was demonstrated by trying to compute a base relation with 200 million tuples and 60 dimensions.
  • 45. Conclusions
  • H-Frag has linear runtime and memory consumption, similar to Frag-Cubing;
  • When compared to Frag-Cubing, H-Frag is faster at answering sub-cube queries;
  • It introduces a different cube representation, with fewer empty cells than Frag-Cubing;
  • Frag-Cubing cannot answer queries with two sub-cube operators on a data cube with 200 million tuples, C = 10^4, D = 30 and S = 0;
  • We had scenarios where the Frag-Cubing approach failed to compute the data cube due to lack of main memory.
  • 46. Conclusions
  Interesting research directions to further extend H-Frag:
  • First, H-Frag must be evaluated with holistic measures.
  • Top-k queries are part of our interest, since inverted indexes are also useful for this type of problem.
  • Multicore and multicomputer versions of H-Frag must be implemented.
  • 47. Acknowledgements
  Thank you very much. E-mail: rrochas@gmail.com