An Ontology-Based Autonomic System for Improving Data Warehouses by Cache Allocation Management
www.sp2.fr http://www.polytech.univ-nantes.fr/COD/
Vlad Nicolicin-Georgescu, Henri Briand
Remi Lehn and Vincent Benatier
Knowledge and Experience Management Workshop
FG-WM 2009
22/09/2009
Contents
1 Introduction
2 Problematic
3 Knowledge Management
4 Autonomic Computing
5 Combining the Elements
6 Results
7 Conclusions and Future Directions
22/09/2009 Vlad Nicolicin Georgescu
Introduction
Decision Support Systems
Computerized systems whose main goal is to analyze a set of
facts and propose actions based on those facts (Business
Intelligence)
Their core is analytical (derived) data, organized into a
data warehouse (the architecture) built from data marts (the
bricks) (Inmon, 2005)
The challenge: managing data warehouses efficiently
(cost, performance and resource scaling)
Problematic – Industrial
Enterprises’ decision support systems: by the end of the
first year, up to 90% of data warehouse efforts are
considered failures (Frolick and Lindsey, 2003)
The main causes
Bad management: manual configuration, manual maintenance
operations, poor scaling of system resources
Poor performance due to inefficient sharing of common
resources between groups and conglomerates
Growth of the data warehouse size over time
Any of the data may be accessed at any time: ‘Give me what I
want so I can tell you what I really want’
Problematic – Industrial
High data warehouse maintenance costs (due to the
previous causes) translate into:
A need to increase the system's hardware resources
(normal cost)
A need for decision support experts to configure and maintain
data warehouses (more costly)
Problematic – Industrial
Example
10 data warehouses sharing the same RAM
1 data warehouse requires 20 GB of RAM -> 200 GB of RAM in total
• Very costly (sometimes not a problem)
• Architecturally impossible (stuck!)
How to reallocate and manage?
To manage them, the enterprise relies on an expert to
configure and maintain the memory allocation based on
each data warehouse’s needs: priority, usage period, changes
in the architecture, etc.
The problem repeats recursively
Too hard to sustain due to cost and human limits
Problematic – Scientific
How to manage decision support systems efficiently:
How to formalize unstructured data from different
sources (vendor readme files, forums, HTML pages, ...)
How to make various processes (RAM
allocation between groups of data warehouses)
autonomic, based on the formalized knowledge
How to find suitable algorithms for resource allocation
and parameter configuration (cache memory) in
groups of data warehouses
Problematic – Scientific
Building knowledge bases for decision support
systems: ontologies and ontology-based rules
Autonomic computing based on the knowledge bases
and on algorithms for improving data warehouse performance
Combining knowledge formalization with
autonomic computing for data warehouse
management
Knowledge Management
Managing data warehouses to improve their
performance
Dividing the knowledge in the knowledge base to
describe a decision support system
Knowledge Management – Data Warehouse Performance
The measure of performance: query response time for
data retrieval operations
Analytical data, as opposed to operational data, is presented
as tolerant of longer retrieval times (Inmon, 2005)
True: when the operations concern aggregation and
calculation (e.g. during the night)
Not so true: when performing data retrieval tasks for report
generation (daytime usage of the data warehouse)
Knowledge Management – Data Warehouse Performance
Several proposals for improving query response
time:
(Malik et al., 2008): how to design databases physically
through caches; database and architecture oriented
(Saharia and Babad, 2000): determining which data is most
likely to be accessed so it can be stored in caches; works
well for improving a single data warehouse and concerns the
data requested rather than how to modify the data
warehouse parameters
Knowledge Management – Knowledge Division
Our proposal for dividing the knowledge that represents a
decision support system
Three main types
Architectural
Configuration and performance
Experience and advice/best practices
Knowledge Management – Knowledge Division
Architectural information
Which components are part of a decision support system
How are these entities linked and how do they exchange
What are the common resources characteristic for each entity
and shared between the
Knowledge Management – Knowledge Division
Configuration and performance indicators (for
Essbase multidimensional cubes)
For each data warehouse: index file and data file sizes
(how much disk space they occupy)
Three types of caches: index, data file and data cache
Query response time for data retrieval operations
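The indicators listed on this slide could be gathered in one small record per data warehouse. The sketch below is only an illustration: the class name, field names, units and sample values are assumptions and are not taken from Essbase or from the authors' knowledge base.

```python
from dataclasses import dataclass


@dataclass
class WarehouseIndicators:
    """Hypothetical record of the indicators listed for one data warehouse."""
    name: str
    index_file_mb: float        # index file size on disk
    data_file_mb: float         # data file size on disk
    index_cache_mb: float       # index cache allocation
    data_file_cache_mb: float   # data file cache allocation
    data_cache_mb: float        # data cache allocation
    avg_response_time_s: float  # average query response time

    def disk_size_mb(self) -> float:
        """Disk space occupied by the index and data files."""
        return self.index_file_mb + self.data_file_mb

    def total_cache_mb(self) -> float:
        """Sum of the three cache allocations."""
        return self.index_cache_mb + self.data_file_cache_mb + self.data_cache_mb


# Illustrative values only; they are not measurements from the presentation.
dw1 = WarehouseIndicators("DW1", 120.0, 900.0, 64.0, 128.0, 256.0, 3.5)
print(dw1.disk_size_mb(), dw1.total_cache_mb())
```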
Knowledge Management – Knowledge Division
Experience and best practices
More delicate because it is subjective and the information
comes in unstructured form
Covers all knowledge about decision support
system and data warehouse management (in any form)
Comes from several sources
Formalized as a rule knowledge base, for example
Event-Condition-Action (ECA) rules (Huebscher and McCann, 2008)
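A rule of the event/condition kind can be pictured as a triple: the event that triggers it, a condition checked against the current state, and an action applied when the condition holds. The sketch below is a generic illustration under that assumption; the rule, the state keys and the thresholds are hypothetical and do not come from the authors' rule base.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical event/condition/action rule."""
    event: str
    condition: Callable[[State], bool]
    action: Callable[[State], None]


def fire(rules: List[Rule], event: str, state: State) -> int:
    """Apply every rule registered for `event` whose condition holds."""
    fired = 0
    for rule in rules:
        if rule.event == event and rule.condition(state):
            rule.action(state)
            fired += 1
    return fired


def shrink_cache(state: State) -> None:
    # Assumed best practice: release 10 MB when the warehouse is fast enough.
    state["cache_mb"] -= 10.0


rules = [Rule("end_of_day", lambda s: s["avg_response_time_s"] < 2.0, shrink_cache)]
state = {"cache_mb": 100.0, "avg_response_time_s": 1.5}
fired = fire(rules, "end_of_day", state)
```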
Autonomic Computing
Previous proposals for representing self-managing
systems:
Inspired by the functioning of the human body (Wang, 2007)
Self-healing systems, later generalized to self-X
systems (Ghosh et al., 2007)
Proposal made by IBM in 2001 and later refined into its
currently known form (IBM, 2001)
Autonomic Computing
Autonomic computing: the ability of an IT
infrastructure to adapt and change in accordance with
business policies and objectives, guiding systems to be
(IBM, 2001):
Self-configuring
Self-healing
Self-optimizing
Self-protecting
Autonomic Computing – Autonomic Computing Manager
Autonomic Computing Manager: automates the self-X
functions and externalizes these functions according to
the behavior defined by the management interfaces
(IBM, 2001). It runs the MAPE-K loop: Monitor, Analyze, Plan
and Execute phases around a shared Knowledge base
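One pass of such a loop can be sketched as four small steps that read from and write to a shared knowledge store. This is a minimal illustration, not the authors' manager: the class name, the response-time threshold and the "increase_cache" action are all assumptions.

```python
from typing import Dict, Optional


class MapeKManager:
    """Hypothetical manager running one Monitor-Analyze-Plan-Execute pass."""

    def __init__(self, max_response_time_s: float) -> None:
        # Knowledge shared by the four phases.
        self.knowledge = {"max_response_time_s": max_response_time_s, "history": []}

    def monitor(self, element: Dict[str, float]) -> float:
        return element["avg_response_time_s"]

    def analyze(self, response_time: float) -> bool:
        self.knowledge["history"].append(response_time)
        return response_time > self.knowledge["max_response_time_s"]

    def plan(self, problem: bool) -> Optional[str]:
        return "increase_cache" if problem else None

    def execute(self, element: Dict[str, float], action: Optional[str]) -> None:
        if action == "increase_cache":
            element["cache_mb"] += 10.0  # assumed fixed increment

    def run_once(self, element: Dict[str, float]) -> Optional[str]:
        response_time = self.monitor(element)
        problem = self.analyze(response_time)
        action = self.plan(problem)
        self.execute(element, action)
        return action


element = {"avg_response_time_s": 5.0, "cache_mb": 50.0}
manager = MapeKManager(max_response_time_s=2.0)
action = manager.run_once(element)
```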
Autonomic Computing – Autonomic Computing Manager
We propose implementing the loop at each
level of the decision support system's
architecture
Each entity has its own individual loop and is linked
only to the entities above it
Each entity’s manager has two ‘responsibilities’:
Its own self-management
The management of its direct children
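The two responsibilities can be illustrated with a small tree of entities, each running its own loop. The entity names ("System", "Application", "DW1", "DW2") and the logging are hypothetical, and the traversal order is an assumption made only so that every loop runs once.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Hypothetical managed entity with its own loop and direct children."""
    name: str
    children: List["Entity"] = field(default_factory=list)

    def run_loop(self, log: List[str]) -> None:
        # Responsibility 1: the entity manages itself.
        log.append(f"{self.name}: self-management")
        # Responsibility 2: the entity manages its direct children only.
        for child in self.children:
            log.append(f"{self.name}: managing {child.name}")


def run_all(root: Entity, log: List[str]) -> None:
    """Run every entity's loop once, parents before children."""
    root.run_loop(log)
    for child in root.children:
        run_all(child, log)


application = Entity("Application", [Entity("DW1"), Entity("DW2")])
system = Entity("System", [application])
log: List[str] = []
run_all(system, log)
```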
Autonomic Computing – Autonomic Computing Manager
Returning to the decision support system’s schema (figure not reproduced)
Autonomic Computing – Self-Improvement Algorithm
Self-improvement algorithm:
Specific to the individual loop of each data warehouse
Executed at the end of each day, when statistics on the usage of
the data warehouse have been gathered and its parameters can be
changed
Tries to improve the cache allocation of a data warehouse by
repeatedly decreasing the cache values down to a certain limit:
• Step: the amount by which the cache is decreased at each time period (CV:
cache value)
CV1 = CV0 - (CVmax - CV0) * step
• Delta: the threshold at which the algorithm stops, i.e. the accepted impact
of a cache modification. If (RT1 - RT0) / RT0 < delta, the
new cache proposition is accepted (RT: average query response time)
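A minimal sketch of one daily iteration, implementing the update rule and the acceptance test exactly as written on the slide. The `measure` callback is a hypothetical stand-in for a day of query statistics, and all numeric values are illustrative. Note that with CV0 equal to CVmax the formula as written yields no decrease, so the example starts below CVmax.

```python
from typing import Callable, Tuple


def next_cache_value(cv0: float, cv_max: float, step: float) -> float:
    """CV1 = CV0 - (CVmax - CV0) * step, as stated on the slide."""
    return cv0 - (cv_max - cv0) * step


def accept(rt0: float, rt1: float, delta: float) -> bool:
    """Accept the new cache value if the relative slowdown stays under delta."""
    return (rt1 - rt0) / rt0 < delta


def self_improve(cv0: float, cv_max: float, step: float, delta: float,
                 rt0: float, measure: Callable[[float], float]) -> Tuple[float, float]:
    """Propose a smaller cache, measure it, and keep it only if accepted."""
    cv1 = next_cache_value(cv0, cv_max, step)
    rt1 = measure(cv1)  # hypothetical: average response time with cache cv1
    if accept(rt0, rt1, delta):
        return cv1, rt1
    return cv0, rt0


# Illustrative run: a 5% slowdown is accepted with delta = 10%.
kept = self_improve(80.0, 100.0, 0.5, 0.1, 2.0, lambda cv: 2.1)
# Illustrative run: a 50% slowdown is rejected, the old cache value is kept.
rejected = self_improve(80.0, 100.0, 0.5, 0.1, 2.0, lambda cv: 3.0)
```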
Autonomic Computing – Group-Improvement Algorithm
Group-improvement algorithm
Specific to each application (seen as a group of data
warehouses)
Periodically reallocates caches between the data
warehouses in the group depending on their average
performance
‘The catch’: a small sacrifice (delta) by some data
warehouses yields an important performance gain for others
How to distinguish between performance and non-performance
data warehouses?
Autonomic Computing – Group-Improvement Algorithm
Performance data warehouse: one whose average query
response time is below the average response time of the
group
Non-performance data warehouse: one whose average query response time is
above it (a data warehouse exactly at the average can go in either category)
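The classification can be sketched as a comparison of each data warehouse's average response time with the group average. In this illustration a data warehouse exactly at the average is placed in the performance category, which is one of the two choices the slide allows; the function name and the sample response times are assumptions.

```python
from typing import Dict, List, Tuple


def split_group(avg_rt: Dict[str, float]) -> Tuple[float, List[str], List[str]]:
    """Split a group into performance and non-performance data warehouses."""
    group_avg = sum(avg_rt.values()) / len(avg_rt)
    # Ties (exactly at the group average) are counted as performance here.
    performance = sorted(n for n, rt in avg_rt.items() if rt <= group_avg)
    non_performance = sorted(n for n, rt in avg_rt.items() if rt > group_avg)
    return group_avg, performance, non_performance


# Illustrative response times in seconds, not results from the presentation.
group_avg, performance, non_performance = split_group(
    {"DW1": 6.0, "DW2": 2.0, "DW3": 4.0}
)
```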
Combining the Elements
Bringing knowledge management, autonomic
computing and the algorithms together
Knowledge bases are formalized with OWL
ontologies and ontology-based rules
Autonomic Computing Managers are implemented
with ontology-based rules and Java
programs
Algorithms are formalized with ontologies, rules and
Java programs
Combining the Elements – Knowledge Base
Ontology: an explicit formal specification of the terms in
a domain and the relations among them (Gruber, 1992)
It expresses:
The hierarchical inclusion relations between entities (taxonomy)
The relations between concepts across entities, which make it much more
powerful than a taxonomy
Used with several knowledge formalization approaches
Combining the Elements – Knowledge Base
OWL:
W3C recommendation for ontology representation in an XML-based format
Evolved from RDF
It provides the main concepts of:
Individual: an instance of ‘something’, the actual concept itself (e.g.
John, Mary, Bob)
Class: a group of individuals belonging to the same set and having common
properties (e.g. John, Mary and Bob are Human; John and Bob are Men)
Property: a characteristic of an individual that distinguishes it from
others and allows it to belong to a class
• Data type property: links an individual to a literal value (John is 30
years old)
• Object property: links an individual to other individuals (John is the
friend of Mary, Mary hates Bob)
Sentence representation: (subject, predicate, object), e.g. (John,
hasAge, 30)
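The (subject, predicate, object) representation can be illustrated with plain tuples, using the John, Mary and Bob example from this slide. This sketch uses no OWL or RDF library; the predicate names "type", "isFriendOf" and "hates" and the two helper functions are assumptions chosen for the illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, object]

triples: List[Triple] = [
    # Class membership of individuals
    ("John", "type", "Human"), ("Mary", "type", "Human"), ("Bob", "type", "Human"),
    ("John", "type", "Man"), ("Bob", "type", "Man"),
    # Data type property: individual -> literal value
    ("John", "hasAge", 30),
    # Object properties: individual -> individual
    ("John", "isFriendOf", "Mary"),
    ("Mary", "hates", "Bob"),
]


def individuals_of(cls: str) -> List[str]:
    """Individuals declared as members of the given class."""
    return sorted(s for s, p, o in triples if p == "type" and o == cls)


def objects(subject: str, predicate: str) -> List[object]:
    """Values linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


men = individuals_of("Man")
age = objects("John", "hasAge")
```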
Combining the Elements – Knowledge Base
OWL ontologies are used to formalize the first two types of information:
architectural and configuration/performance
The ‘static’ aspect of the approach
An OWL representation of a data warehouse (figure not reproduced)
Combining the Elements – Autonomic Computing
The dynamic part of the knowledge management aspect
The rules formalize:
The transitions between the four states of the Autonomic
Computing Manager
How the knowledge base in the middle of the loop
connects with each state
How the two algorithms are implemented over the loop
We base our approach on previous work on using
autonomic computing with ontologies (Stojanovic et al., 2004)
Combining the Elements – Autonomic Computing
Autonomic Computing Manager loop phases applied at
each level of the decision support system (figure not reproduced)
Combining the Elements – Algorithms
Described using Jena ontology-based rules
Example: the individual self-improvement algorithm of a
data warehouse (figure not reproduced)
Results
Scenario:
Uses the Oracle Hyperion Essbase BI solution
An Essbase application with two data warehouses (DW1 and
DW2)
A period of 14 days to observe how each data warehouse improves
and how the application reallocates the memory
A random series of queries (from a given pool) is run on each
data warehouse each day
The individual self-improvement algorithm runs every day
The group reallocation algorithm runs every 4 days
Results
At the end of day 5 we have a good ratio of response
time to cache allocation
The data warehouses improve themselves quickly (individual
algorithm) and then oscillate around this point
(DW2)
At the end of the 6th day:
DW2 loses 2% in response time
DW1 gains around 80%
The application has reduced its memory consumption by 60%.
Conclusions & Future Directions – Conclusions
We have presented a common problem in
enterprises today: knowledge management in decision
support systems
We have presented how data
warehouses can be formalized with ontologies and
ontology-based rules
We have seen how autonomy can be enabled by using
Autonomic Computing
We have presented the results of a test on a real application
Conclusions & Future Directions – Future Directions
Extending the parameters used for data warehouse
performance: calculation time, aggregation time, etc.
Introducing Service Level Agreement (SLA)
notions for defining data warehouse usage
specifications
Extending the knowledge base so that it can be enriched
in an autonomic way
Introducing attenuation in the algorithms to avoid
oscillation
References
Mark N. Frolick and Keith Lindsey. Critical factors for data warehouse failure. Business Intelligence Journal, Vol. 8, No. 3, 2003.
Debanjan Ghosh, Raj Sharman, H. Raghav Rao, and Shambhu Upadhyaya. Self-healing systems – survey and synthesis. Decision Support Systems, Vol. 42: p. 2164–2185, 2007.
T. Gruber. What is an ontology? Academic Press Pub., 1992.
M.C. Huebscher and J.A. McCann. A survey on autonomic computing – degrees, models and applications. ACM Computing Surveys, Vol. 40, No. 3, 2008.
IBM Corporation. An architectural blueprint for autonomic computing. IBM Corporation, 2001.
IBM Corporation. Autonomic computing. Powering your business for success. International Journal of Computer Science and Network Security, Vol. 7, No. 10: p. 2–4, 2005.
W.H. Inmon. Building the data warehouse, fourth edition. Wiley Publishing, 2005.
S.S. Lightstone, G. Lohman, and D. Zilio. Toward autonomic computing with DB2 universal database. ACM SIGMOD Record, Vol. 31, Issue 3, 2002.
A. Mateen, B. Raza, and T. Hussain. Autonomic computing in SQL Server. In 7th IEEE/ACIS International Conference on Computer and Information Science, 2008.
L. Stojanovic, J. Schneider, A. Maedche, S. Libischer, R. Studer, Th. Lumpp, A. Abecker, G. Breiter, and J. Dinger. The role of ontologies in autonomic computing systems. IBM Systems Journal, Vol. 43, No. 3: p. 598–616, 2004.
V. Markl, G. M. Lohman, and V. Raman. LEO: An autonomic optimizer for DB2. IBM Systems Journal, Vol. 42, No. 1, 2003.
A. N. Saharia and Y.M. Babad. Enhancing data warehouse performance through query caching. The DATA BASE for Advances in Information Systems, Vol. 31, No. 3, 2000.
Yingxu Wang. Toward theoretical foundations of autonomic computing. Int’l Journal of Cognitive Informatics and Natural Intelligence, 1(3): p. 1–16, July-September 2007.