G N Wikramanayake (1997) "Re-engineering Databases using Meta-Programming Technology" In:16th National Information Technology Conference on Information Technology for Better Quality of Life Edited by:R. Ganepola et al. pp. 1-14. Computer Society of Sri Lanka, Colombo: CSSL Jul 11-13, ISBN 955-9155-05-9
Re-Engineering Databases using Meta-Programming Technology
1. RE-ENGINEERING DATABASES USING META-PROGRAMMING
TECHNOLOGY
G.N. Wikramanayake
Department of Statistics and Computer Science,
University of Colombo.
gihanw@hadawatha.cmb.ac.lk
Key words: database re-engineering and migration, meta-programming and legacy systems.
ABSTRACT
A wealth of information is held in databases supporting the IT capabilities of organisations. Many of
these databases are called legacy databases in that they and their associated applications were developed with
software systems that are now technologically obsolete, particularly when compared with the more recent
systems being used for new developments in the organisation. These legacy databases need to be evolved and
migrated to modern computing environments, so that their existence remains beneficial to their community of
users. The evolution path of these databases is based on a re-engineering process. The Conceptualised
Constraint Visualisation and Enhancement System (CCVES) for relational legacy databases, developed at
Cardiff, is a database software tool that assists with the migration process of legacy databases, and its re-
engineering databases using meta-programming technology is described here.
This tool is initially used to create a graphical model of a relational legacy database which shows its
current integrity constraints applicable to the elements of the model. CCVES was developed using meta-
translation techniques and can accept input from a variety of relational systems (INGRES, Oracle and
POSTGRES have been tested) to produce graphical models of a database’s schema as either as ER model or an
OMT model accompanied by a display of the integrity constraints in force in the database. CCVES can also be
used to enhance the legacy database by accepting input of explicit new constraints which the underlying
database does not support or which are incomplete in the database in that they should be enforced. This enables
further constraints to appear in the graphical model. These constraints are used to enhance the original
database’s meta-data model; and to assist legacy databases to be evolved and viewed in new ways. These
constraints can also be used to detect inconsistent legacy data prior to its migration from its current database
and help in the transparent migration of legacy databases which permits users to continue using them while
they migrate.
Meta-programming technology have been successfully used in several recent research projects to
address heterogeneity issues. A key to this approach is the transformation of the source meta-data or query into
a common internal representation which is then separately transformed into a chosen target representation.
Thus components of a schema, referred to as meta-data, are classified as entity and attribute on input, and are
stored in a database language independent fashion in the internal representation. This meta-data is then
processed to derive the appropriate schema information of a particular DBMS. In this way it is possible to use a
single representation and yet deal with issues related to most types of DBMSs. A similar approach is used for
query transformation between source and target representations.
16th National IT Conference, Sri Lanka, 11-13 July 1997 1
2. 1. INTRODUCTION technology itself limits them from being
adapted to meet the changing business needs
Over the years rapid technological changes catalysed by new technology. The older
have taken place in all fields of computing. systems which have been developed using
Most of these changes have been due to the 3GLs and in operation for many years, often
advances in data communications, computer suffer from failures, inappropriate
hardware and software which together have functionality, lack of documentation, poor
provided a reliable and powerful networking performance and are referred to as legacy
environment (i.e. standard local and wide area information systems.
networks) that allow the management of data
stored in computing facilities at many nodes The current technology is much more flexible
of the network. These changes have turned as it supports methods to evolve (e.g. 4GLs,
round the hardware technology from CASE tools, GUI toolkits and reusable
centralised mainframes to networked file- software libraries), and can share resources
server and client-server architectures which through software that allows interoperability
support various ways to use and share data. (e.g. ODBC). This evolution reflects the
changing business needs. However, modern
Simultaneous developments in the software systems need to be properly designed and
industry have produced techniques (e.g. for implemented to benefit from this technology,
system design and development) and products which may still be unable to prevent such
capable of utilising the new hardware systems themselves being considered to be
resources (e.g. multi-user environments with legacy information systems in the near future
GUIs). These new developments are being due to the advent of the next generation of
used for a wide variety of applications, technology with its own special features. The
including modern distributed information only salvation would appear to be building in
processing applications, such as office evolution paths in the current systems. This
automation where users can create and use will ensure that any attempts to incorporate
databases with forms and reports with minimal the modern technology will not adversely
effort, compared to the development efforts affect the ongoing functionality of existing
using 3GLs. Such applications are being systems.
developed with the aid of database technology
as this field too has advanced by allowing Re-engineering of legacy databases using
users to represent and manipulate advanced meta-programming technology in such a way
forms of data and their functionalities. Due to that the process is transparent to the current
the program data independence feature of users has proved to be a successful. This paper
DBMSs the maintenance of database describes the benefits of this technology to the
application programs had become easier as very important application areas of enhancing
functionalities that were traditionally and evolving heterogeneous distributed legacy
performed by procedural application routines databases to assist the legacy database
are now supported declaratively using migration process. The role of meta-
database concepts such as constraints and programming in this context is described here
rules. by considering the complementary roles of
schema and query meta-translation systems,
In the field of databases, the recent advances the schema meta-visualisation system and
resulting from technological transformation schema meta-integration system.
include many areas such as the use of
distributed database technology, object- The rest of the paper is organised as follows.
oriented-oriented technology, constraints, Section 2 identifies the re-engineering of
knowledge-based systems, 4GLs and CASE databases with special emphasis on the
tools. Meanwhile, the older technology was relational model. This is followed by an
dealing with files and primitive database overview of the meta-programming
systems which now appear inflexible, as the technology. Three main stages in the
2 16th National IT Conference, Sri Lanka, 11-13 July 1997
3. application of our system are described next. • Determine keys, e.g. primary keys,
The role of our system in context of meta- candidate keys and foreign keys.
programming technology is then described. • Determine entity and relationship types.
Finally, we complete the paper by identifying • Construct suitable data abstractions, such
some of our experiences and drawing as generalisation and aggregation
conclusions. structures.
2.1 Contents of a relational database
2. RE-ENGINEERING DATABASES
Diverse sources provide information that leads
Software such as programming code and to the identification of a database’s contents.
databases is re-engineered for a number of These include the database’s schema,
reasons: for example, to allow reuse of past observed patterns of data, semantic
development efforts, reduce maintenance understanding of application and user
expense and improve software flexibility 11. manuals. Among these the most informative
This re-engineering process consists of two source is the database’s schema, which can be
stages, namely: a reverse-engineering and a extracted from the data dictionary of a DBMS.
forward-engineering process. In database The observed patterns of data usually provide
migration the reverse-engineering process may information such as possible key fields,
be applied to help migrate databases between domain ranges and the related data elements.
different vendor implementations of a This source of information is usually not
particular database paradigm (e.g. from reliable as invalid, inconsistent, and
Informix to Oracle), between different incomplete data exists in most legacy
versions of a particular DBMS (e.g. Oracle applications. The reliability can be increased
version 3 to Oracle version 7) and between by using the semantics of an application. The
database types (e.g. hierarchical to modern availability of user manuals for a legacy IS is
relational database systems). The forward- rare and they are usually out of date, which
engineering process, which is the second stage means they provide little or no useful
of re-engineering, is performed on the information to this search.
conceptual model derived from the original
reverse-engineering process. At this stage, the Data dictionaries of relational databases store
objective is to redesign and / or enhance an information about relations, attributes of
existing database system with missing and / or relations, and rapid data access paths of an
new information. application. Modern relational databases
record additional information, such as primary
The application of reverse-engineering to and foreign keys (e.g. Oracle), rules /
relational databases has been widely described constraints on relations (e.g. INGRES,
and applied 2-4, 8-11, 18. The latest approaches POSTGRES, Oracle) and generalisation
have been extended to construct a higher level hierarchies (e.g. POSTGRES). Hence, analysis
of abstraction than the original E-R model. of the data dictionaries of relational databases
This includes the representation of object- provides the basic elements of a database
oriented concepts such as generalisation / schema, i.e. entities, their attributes, and
specialisation hierarchies in a reversed- sometimes the keys and constraints, which are
engineered conceptual model. then used to discover the entity and
relationship types that represent the basic
The techniques used in the reverse- components of a conceptual model for the
engineering process consist of identifying application. The trend is for each new product
common characteristics as identified below: release to support more sophisticated facilities
for representing knowledge about the data.
• Identify the database’s contents such as
relations and attributes of relations.
16th National IT Conference, Sri Lanka, 11-13 July 1997 3
4. 2.2 Keys of relational data model between two entities will have the same
attribute names in the entities involved. This
Theoretically, three types of key are specified naming convention was used in 2 to determine
in a relational data model. They are primary, relationship types, as foreign key
candidate and foreign keys. Early relational specifications are not supported by all
DBMSs were not capable of implicitly databases. An important contribution of our
representing these. However, sometimes work is to support the identification of foreign
indexes which are used for rapid data access key specifications for any database and hence
can be used as a clue to determine some keys the detection of relationships, without
of an application database. For instance, the performing any name conversions. We note
analysis of the unique index keys of a that some reverse-engineering methods rely on
relational database provides sufficient candidate keys (e.g. 8, 10), while others rely on
information to determine possible primary or primary keys (e.g. 2). These approaches insist
candidate keys of an application. The on their users meeting their pre-requisites (e.g.
observed attribute names and data patterns specification of missing keys) to enable the
may also be used to assist this process. This user to successfully apply their reverse-
includes attribute names ending with ‘#’ or engineering process. This means it is not
‘no’ as possible candidate keys, and attributes possible to produce a suitable conceptual
in different relations having the same name for model until the pre-requisites are supplied. For
possible foreign key attributes. In the latter a large legacy database application the number
case, we need to consider homonyms to of these could exceed a hundred and hence, it
eliminate incorrect detections and synonyms is not appropriate to rely on such pre-
to prevent any omissions due to the use of requisites being met to derive an initial
different names for the same purpose. Such conceptual model. Therefore, we concentrate
attributes may need to be further verified on providing an initial conceptual model using
using the data elements of the database. This only the available information. This will
includes explicit checks on data for validity of ensure that the reverse-engineering process
uniqueness and referential integrity properties. will not fail due to the absence of any vital
However the reverse of this process, i.e. information (e.g. the key specification for an
determining a uniqueness property from the entity).
data values in the extensional database is not a
reliable source of information, as the data 2.3 Entity and Relationship Types of a
itself is usually not complete (i.e. it may not data model
contain all possible values) and may not be
fully accurate. Hence we do not use this In the context of an E-R model an entity is
process although it has been used in 2, 11. classified as strong (regular) or weak
The lack of information on keys in some depending on an existence-dependent property
existing database specifications has led to the of the entity. A weak entity cannot exist
use of data instances to derive possible keys. without the entity it is dependent on. The
However it is not practicable to automate this enhanced E-R model (EER) 5 identifies more
process as some entities have keys consisting entity types, namely: composite, generalised
of multiple attributes. This means many and specialised entities. Different
permutations would have to be considered to classifications of entities are due to their
test for all possibilities. This is an expensive associative properties with other entities. The
operation when the volume of data and / or the identification of an appropriate entity type for
number of attributes is large. each entity will assist in constructing a
graphically informative conceptual model for
In 2, a consistent naming convention is applied its users. The extraction of information from
to key attributes. Here attributes used to legacy systems to classify the appropriate
represent the same information must have the entity type is a difficult task as such
same name, and as a result referencing and information is usually lost during an
referenced attributes of a binary relationship implementation. This is because
4 16th National IT Conference, Sri Lanka, 11-13 July 1997
5. implementations take different forms even abstractions can only be introduced either by
within a particular data model 5. Hence, an introducing them without affecting the
information extraction process may need to existing data structures or by transforming
interact with a user to determine some of the existing entities and relationships to support
entity and relationship types. The type of their representation. For example, entities
interaction required depends on the Staff and Student may be transformed to
information available for processing and will represent a generalisation structure by
take different forms. For this reason we focus introducing a Person entity.
only on our approach, i.e. determining entity
and relationship types using enhanced Other forms of transformation can also be
knowledge such as primary and foreign key performed. These include decomposing all n-
information. ary relationships for n > 3 into their
constituent relationships of order 2 to remove
2.4 Suitable Data Abstractions for a data such relationships and hence simplify the
model association among their entities. At this stage
double buried relationships are identified and
Entities and relationships form the basic merged and relationships formed with
components of a conceptual data model. These subclasses are eliminated. Transitive closure
components describe specific structures of a relationships are also identified and changed
data model. A collection of entities may be to form simplified hierarchies. We use
used to represent more than one data structure. constraints to determine relationships and
For example, entities Person and Student may hierarchies. By controlling these constraints
be represented as a 1:1 relationship or as a is-a (i.e. modifying or deleting them) it is possible
relationship. Each representation has its own to transform or eliminate necessary
view and hence the user understanding of the relationships and hierarchies.
data model will differ with the choice of data
structure. Hence it is important to be able to
introduce any data structure for a conceptual 3. META-PROGRAMMING
model and view using the most suitable data TECHNOLOGY
abstraction.
Meta-programming technology allows the
Data structures such as generalisation and meta-data (schema information) of a database
aggregation have inherent behavioural to be held and processed independently of its
properties which give additional information source specification language. This allows us
about their participating entities (e.g. an to work on a database language independent
instance of a specialised entity of a environment and hence overcome many
generalisation hierarchy is made up from an logical heterogeneity issues. Prolog based
instance of its generalised entity). These meta-programming technology has been used
structures are specialised relationships and in previous research at Cardiff in the area of
representation of them in a conceptual model logical heterogeneity 6, 14. Using this
provides a higher level of data abstraction and technology the meta-translation of database
a better user understanding than the basic E-R query languages 7 and database schemas 15 has
data model gives. These data abstractions been performed. This work has shown how the
originated in the object-oriented data model heterogeneity issues of different DBMSs can
and they are not implicitly represented in be addressed without having to reprogram the
existing relational DBMSs. Extended- same functionality for each and every DBMS.
relational DBMSs support the O-O paradigm We use meta-programming technology for our
(e.g. POSTGRES) with generalisation legacy database migration approach as we
structures being created using inheritance need to be able to start with a legacy source
definitions on entities. However in the context database and end with a modern target
of legacy DBMSs such information is not database where the respective database
normally available, and as a result such data schema and query languages may be different
16th National IT Conference, Sri Lanka, 11-13 July 1997 5
6. from each other. In this approach the source applied experimentally to the legacy database
database schema or query language is mapped to determine the extent to which it conforms to
on input into an internal canonical form. All them. This process is done at stage 3 (cf. paths
the required processing is then done using the C-1 and C-2 of figure i). The user can then
information held in this internal form. This decide whether these constraints should be
information is finally mapped to the target enforced to improve the quality of the legacy
schema or query language to produce the database prior to its migration. At this point
desired output. The advantage of this approach the three preparatory stages in the application
is that processing is not affected by of our approach are complete. The actual
heterogeneity as it is always performed on migration process is then performed. All
data held in the canonical form. This stages are further described below to enable us
canonical form is an enriched collection of to identify the main processing components of
semantic data modelling features. our proposed system as well as to explain how
we deal with different levels of heterogeneity.
4. APPLICATION 4.1 Stage 1: Reverse Engineering
We view our re-engineering approach as In stage 1, the data definition of the selected
consisting of 3 stages. At stage 1, the data database is reverse-engineered to produce a
definition of the selected database is reverse- graphical display of the database. To perform
engineered to produce a graphical display (cf. this task, the database’s meta-data must be
paths A-1 and A-2 of figure i). However, in extracted (cf. path A-1 of figure i). This is
legacy systems much of the information achieved by connecting directly to the
needed to present the database schema in this heterogeneous database. The accessed meta-
way is not available as part of the database data needs to be represented using our internal
meta-data and hence these links which are form. This is achieved through a schema
present in the database cannot be shown in mapping process as used in the SMTS
this conceptual model. In modern systems (Schema Meta-Translation System) of Ramfos
15
such links can be identified using constraint . The meta-data in our internal formalism
specifications. Thus, if the database does not then needs to be processed to derive the
have any explicit constraints, or it does but graphical constructs present for the database
these are incomplete, new knowledge about concerned (cf. path A-2 of figure i). These
the database needs to be entered at stage 2 (cf. constructs are in the form of entity types and
path B-1 of figure i), which will then be the relationships and their derivation process
reflected in the enhanced schema appearing in is the main processing component in stage 1.
the graphical display (cf. path B-2 of figure i). The identified graphical constructs are
This enhancement will identify new links that mapped to a display description language to
should be present for the database concerned. produce a graphical display of the database.
These new database constraints can next be
6 16th National IT Conference, Sri Lanka, 11-13 July 1997
7. Schema
Enhanced Visualisation Enforced
Constraints (EER or OMT) Constraints
with Constraints
B-1 C-1
B-2 A-2
Internal Processing
B-3 C-2
A-1
Heterogeneous Databases
Stage 1 (Reverse Engineering) Stage 2 (Knowledge Augmentation)
Stage 3 (Constraint Enforcement)
Figure i: Information flow in the 3 stages of our approach prior to migration
a) Database connectivity for point for the meta-translation process as in
heterogeneous database access previous Cardiff systems 12, 15,. We found that
it is not essential to produce such a textual
Unlike the previous Cardiff meta-translation file, as the required intermediate
systems 7, 12, 15, which addressed heterogeneity representation can be directly produced by the
at the logical and data management levels, our database access process. This means that we
system looks at the physical level as well. could also by-pass the meta-translation
While these previous systems processed process that performs the analysis of the DDL
schemas in textual form and did not access text to translate it into the intermediate
actual databases to extract their DDL representation. However the DDL formalism
specification, our system addresses physical of the schema can be used for optional textual
heterogeneity by accessing databases running viewing and could also serve as the starting
on different hardware / software platforms point for other tools (e.g. The Schema Meta-
(e.g. computer systems, operating systems, Integration System (SMIS) of Qutaishat 12.)
DBMSs and network protocols). Our aim is to developed at Cardiff for meta-programming
directly access the meta-data of a given database applications.
database application by specifying its name,
the name and version of the host DBMS, and The initial functionality of the Stage 1
the address of the host machine (we assume database connectivity process is to access a
that access privileges for this host machine heterogeneous database and supply the
and DBMS have been granted). If this accessed meta-data as input to our schema
database access process can produce a meta-translator (SMTS). This module needs to
description of the database in DDL formalism, deal with heterogeneity at the physical and
then this textual file is used as the starting data management levels. We achieve this by
16th National IT Conference, Sri Lanka, 11-13 July 1997 7
8. using DML commands of the specific DBMS Graphical data models of schemas employ a
to extract the required meta-data held in set of data modelling concepts and a language-
database data dictionaries treated like user independent graphical notation (e.g. the Entity
defined tables. Relationship (E-R) model,
Extended/Enhanced Entity Relationship
Relatively recently, the functionalities of a (EER) model 5 or the Object Modelling
heterogeneous database access process have Technique (OMT) 17). In a heterogeneous
been provided by means of drivers such as environment different users may prefer
ODBC 16. Use of such drivers will allow different graphical models, and an
access to any database supported by them and understanding of the database structure and
hence obviate the need to develop specialised architecture beyond that given by the
tools for each database type as happened in traditional entities and their properties.
our case. These driver products were not Therefore, there is a need to produce graphical
available when we undertook this stage of our models of a database’s schema using different
work. graphical notations such as either E-R/EER or
OMT, and to accompany them with additional
b) Schema meta-translation information such as a display of the integrity
constraints in force in the database 18. The
The schema meta-translation process 15 display of integrity constraints allows users to
accepts input of any database schema look at intra- and inter-object constraints and
irrespective of its DDL and features. The gain a better understanding of domain
information captured during this process is restrictions applicable to particular entities.
represented internally to enable it to be Current reverse engineering tools do not
mapped from one database schema to another support this type of display.
or to further process and supply information to
other modules such as the schema meta- The generated graphical constructs are held
visualisation system (SMVS) 13 and the query internally in a similar form to the meta-data of
meta-translation system (QMTS) 7. Thus, the the database schema. Hence using a schema
use of an internal canonical form for meta meta visualisation process (SMVS) it is
representation has successfully accommodated possible to map the internally held graphical
heterogeneity at the data management and constructs into appropriate graphical symbols
logical levels. and coordinates for the graphical display of
the schema. This approach has a similarity to
c) Schema meta-visualisation the SMTS, the main difference being that the
output is graphical rather than textual.
Schema visualisation using graphical notation
and diagrams has proved to be an important 4.2 Stage 2: Knowledge Augmentation
step in a number of applications, e.g. during
the initial stages of the database design In a heterogeneous distributed database
process; for database maintenance; for environment, evolution is expected, especially
database re-design; for database enhancement; in legacy databases. This evolution can affect
for database integration; or for database the schema description and in particular
migration; as it gives users a sound schema constraints that are not reflected in the
understanding of an existing database’s stage 1 (path A-2) graphical display as they
structure in an easily assimilated format 1, 5. may be implicit in applications. Thus our
Database users need to see a visual picture of system is designed to accept new constraint
their database structure instead of textual specifications (cf. path B-1 of figure i) and
descriptions of the defining schema as it is add them to the graphical display (cf. path B-2
easier for them to comprehend a picture. This of figure i) so that these hidden constraints
has led to the production of graphical become explicit.
representations of schema information,
effected by a reverse engineering process.
8 16th National IT Conference, Sri Lanka, 11-13 July 1997
9. The new knowledge accepted at this point is for the same additional knowledge. The
used to enhance the schema and is retained in augmented tables are created and maintained
the database using a database augmentation in a similar way to user-defined tables, but
process (cf. path B-3 of figure i). The new have a special identification to distinguish
information is stored in a form that conforms them. Their structure is in line with the
with the enhanced target DBMS’s methods of international standards and the newer versions
storing such information. This assists the of commercial DBMSs, so that the enhanced
subsequent migration stage. database can be easily migrated to either a
newer version of the host DBMS or to a
a) Schema enhancement different DBMS supporting the latest SQL
standards. Migration should then mean that
Our system needs to permit a database schema the newer system can enforce the constraints.
to be enhanced by specifying new constraints Our approach should also mean that it is easy
applicable to the database. This process is to map our tables for holding this information
performed via the graphical display. These into the representation used by the target
constraints, which are in the form of integrity DBMS even if it is different, as we are
constraints (e.g. primary key, foreign key, mapping from a well defined structure.
check constraints) and structural components
(e.g. inheritance hierarchies, entity Legacy databases that do not support explicit
modifications) are specified using a GUI. constraints can be enhanced by using the
When they are entered they will appear in the above knowledge augmentation method. This
graphical display. requirement is less likely to occur for
databases managed by more recent DBMSs as
b) Database augmentation they already hold some constraint
specification information in their system
The input data to enhance a schema provides tables. The direction taken by Oracle version 6
new knowledge about a database. It is was a step towards our augmentation
essential to retain this knowledge within the approach, as it allowed the database
database itself, if it is to be readily available administrator to specify integrity constraints
for any further processing. Typically, this such as primary and foreign keys, but did not
information is retained in the knowledge base yet enforce them. The next release of Oracle,
of the tool used to capture the input data, so i.e. version 7, implemented this constraint
that it can be reused by the same tool. This enforcement process.
approach restricts the use of this knowledge
by other tools and hence it must be re-entered 4.3 Stage 3: Constraint Enforcement
every time the re-engineering process is
applied to that database. This makes it harder The enhanced schema can be held in the
for the user to gain a consistent understanding database, but the DBMS can only enforce
of an application, as different constraints may these constraints if it has the capability to do
be specified during two separate re- so. This will not normally be the case in
engineering processes. To overcome this legacy systems. In this situation, the new
problem, we augment the database itself using constraints may be enforced via a newer
the techniques proposed in SQL-3, wherever version of the DBMS or by migrating the
possible. When it is not possible to use SQL-3 database to another DBMS supporting
structures we store the information in our own constraint enforcement. However, the data
augmented table format which is a natural being held in the database may not conform to
extension of the SQL-3 approach. the new constraints, and hence existing data
may be rejected by the target DBMS in the
When a database is augmented using this migration, thus losing data and / or delaying
method, the new knowledge is available in the the migration process. To address this problem
database itself. Hence, any further re- and to assist the migration process, we provide
engineering processes need not make requests an optional constraint enforcement process
16th National IT Conference, Sri Lanka, 11-13 July 1997 9
10. module which can be applied to a database However, we demonstrate how to create and
before it is migrated. The objective of this populate a legacy database schema in the
process is to give users the facility to ensure desired target environment while showing the
that the database conforms to all the enhanced role of SMTS and QMTS in such a process.
constraints before migration occurs. This
process is optional so that the user can decide
whether these constraints should be enforced 5. THE ROLE IN CONTEXT OF
to improve the quality of the legacy data prior META-PROGRAMMING
to its migration, whether it is best left as it TECHNOLOGY
stands, or whether the new constraints are too
severe. Our approach described in section 4 is based
on preparing a legacy database schema for
The constraint definitions in the augmented graceful migration. This involves visualisation
schema are employed to perform this task. As of database schemas with constraints and
all constraints held have already been enhancing them with constraints to capture
internally represented in the form of logical more knowledge. Hence we call our system
expressions, these can be used to produce data the Conceptualised Constraint Visualisation
manipulation statements suitable for the host and Enhancement System (CCVES).
DBMS. Once these statements are produced,
they are executed against the current database CCVES has been developed to fit in with the
to identify the existence of data violating a previously developed schema (SMTS) 15 and
constraint. query (QMTS) 7 meta-translation systems, and
the schema meta-visualisation system (SMVS)
13
4.4 Stage 4: Migration Process . This allows us to consider the
complementary roles of CCVES, SMTS,
The migration process itself is incrementally QMTS and SMVS during Heterogeneous
performed by initially creating the target Distributed Database access in a uniform way
6, 14
database and then copying the legacy data . The combined set of tools achieves
over to it. The schema meta-translation semantic coordination and promotes
(SMTS) technique of Ramfos 15 is used to interoperability in a heterogeneous
produce the target database schema. The environment at logical, physical and data
legacy data can be copied using the import / management levels.
export tools of source and target DBMS or
DML statements of the respective DBMSs. Figure ii illustrates the architecture of CCVES
During this process, the legacy applications in the context of meta-data processing
must continue to function until they too are modules. It outlines in general terms the
migrated. To achieve this an interface can be process of accessing a remote (legacy)
used to capture and process all database database to perform various database tasks,
queries of the legacy applications during such as querying, visualisation, enhancement,
migration. This interface can decide how to migration and integration. All these processes
process database queries against the current uses the meta-data for their internal process.
state of the migration and re-direct those
newly related to the target database. The query There are seven sub-processes: the schema
meta-translation (QMTS) technique of mapping process 15, query mapping process 7,
Howells 7 can be used to convert these queries schema integration process 12, schema
to the target DML. This approach will visualisation process 13, database connectivity
facilitate transparent migration for legacy process, database enhancement process and
databases. Our work does not involve the database migration process. The first two
development of an interface to capture and processes together have been called the
process all database queries, as interaction Integrated Translation Support Environment 6,
with the query interface of the legacy IS is and the first four processes together have been
embedded in the legacy application code. called the Meta-Integration/Translation
10 16th National IT Conference, Sri Lanka, 11-13 July 1997
11. Support Environment 12. The last three query mapping process, referred to as QMTS,
processes were introduced as CCVES to to generate the required queries to update the
perform database enhancement and migration database via the DBC process. At this stage
in such an environment. any existing or enhanced constraints may be
applied to the database to determine the extent
The schema mapping process, referred to as to which it conforms to the new
SMTS, translates the definition of a source enhancements. Carrying out this process will
schema to a target schema definition (e.g. an also ensure that legacy data will not be
INGRES schema to a POSTGRES schema). rejected by the target DBMS due to possible
The query mapping process, referred to as violations. Finally, the database migration
QMTS, translates a source query to a target process, referred to as DBMI, assists
query (e.g. an SQL query to a QUEL query). migration by incrementally migrating the
The meta-integration process, referred to as database to the target environment (route C-1
SMIS, tackles heterogeneity at the logical to C-6 in figure ii). Target schema constructs
level in a distributed environment containing for each migratable component are produced
multiple database schemas (e.g. Ontos and via SMTS, and DDL statements are issued to
Exodus local schemas with a POSTGRES the target DBMS to create the new database
global schema) - it integrates the local schema. The data for these migrated tables are
schemas to create the global schema. The extracted by instructing the source DBMS to
meta-visualisation process, referred to as export the source data to the target database
SMVS, generates a graphical representation of via QMTS. Here too, the queries which
a schema. The remaining three processes, implement this export are issued to the DBMS
namely: database connectivity, enhancement via the DBC process.
and migration with their associated processes,
namely: SMVS, SMTS and QMTS, are the 6. EXPERIENCES AND
subject of the present thesis, as they together CONCLUSIONS
form CCVES (centre section of figure ii).
CCVES, although it has been tested for only
The database connectivity process (DBC), three types of DBMS, namely: INGRES,
queries meta-data from a remote database POSTGRES and Oracle, could be easily
(route A-1 in figure ii) to supply meta- adapted for other relational DBMSs as they
knowledge (route A-2 in figure ii) to the represent their meta-data similarly - i.e. in the
schema mapping process referred to as SMTS. form of system tables, with minor differences
SMTS translates this meta-knowledge to an such as table and attribute names and some
internal representation which is based on SQL table structures. Non relational database
schema constructs. These SQL constructs are models accessible via ODBC or other tools
supplied to SMVS for further processing (e.g. Data Extract for DB2, which permits
(route A-3 in figure ii) which results in the movement of data from IMS/VS, DL/1,
production of a graphical view of the schema VSAM, SAM to SQL/DS or DB2), could also
(route A-4 in figure ii). Our reverse- be easily adapted as the meta-data required by
engineering techniques 18 are applied to CCVES could be extracted from them.
identify entity and relationship types to be Previous work related to meta-translation 7 has
used in the graphical model. Meta-knowledge investigated the translation of dBase code to
enhancements are solicited at this point by the INGRES/QUEL, demonstrating the
database enhancement process (DBE) (route applicability of this technique in general, not
B-1 in figure ii), which allows the definition only to the relational data model but also to
of new constraints and changes to the existing others such as CODASYL and hierarchical
schema. These enhancements are reflected in data models. This means CCVES is capable in
the graphical view (route B-2 and B-3 in principle of being extended to cope with other
figure ii) and may be used to augment the data models.
database (route B-4 to B-8 in figure ii). This
approach to augmentation makes use of the
16th National IT Conference, Sri Lanka, 11-13 July 1997 11
12. 12 16th National IT Conference, Sri Lanka, 11-13 July 1997
13. The meta-programming approach enabled us 4. Dumpala S.R. and Arora S.K., ‘Schema
to implement many other features, such as the translation using the entity-relationship
ability to easily customise our system for approach’, Proceedings of 2nd
different data models, e.g. relational and International Conference on Entity-
object-oriented, the ability to easily enhance Relationship Approach, Chen P.P.-C.
or customise for different display models, e.g. (Ed.), Washington, 1981, pp. 339-360.
E-R, EER and OMT, and the ability to deal
with heterogeneity due to differences in local 5. Elmasri R. and Navathe S.B.,
databases (e.g. at the global level the user ‘Fundamentals of database systems’, 2nd
views all local databases as if they come from edition, Benjamin/Cummings, 1994.
the same DBMS, and is also able to view
databases using a preferred DDL syntax). 6. Fiddian N.J., Gray W.A., Ramfos A. and
Cooke A., ‘Database meta-translation
We were able to successfully reverse-engineer technology: integration, status and
a leading telecommunication database extract application’, Database Technology, Vol.
consisting of over 50 entities. This enabled us 4, 1992, pp. 259-263.
to test our tool on a scale greater than that of
our test databases. Beside all or parts of our 7. Howells D.I., Fiddian N.J. and Gray
system have been successfully used for other W.A., ‘A source-to-source meta-
research work 18. translation system for relational query
languages’, Proceedings of 13th
International Conference on Very Large
7. ACKNOWLEDGEMENTS Data Bases, Stocker P., Kent W. and
Hammersley P. (Eds.), Brighton, 1987, pp.
This work was supervised by Prof. W.A. Gray 227-234.
and Dr. N.J. Fiddian, and was carriedout at
University of Wales, College of Cardiff. This 8. Johannesson P. and Kalman K., ‘A
work was partially funded by the Association methodology for translating relational
of Commonwealth. schemas into conceptual schemas’,
Proceedings of 8th International
Conference on Entity-Relationship
8. REFERENCES Approach, 1989, pp. 279-294.
1. Batini C., Ceri S. and Navathe S.B., 9. Markowitz V.M. and Makowsky J.A.,
‘Conceptual database design: an entity- ‘Identifying extended entity-relationship
relationship approach’, object structures in relational schemas’,
Benjamin/Cummings, 1992. IEEE Transactions on Software
2. Chiang R.H.L., Barron T.M. and Storey Engineering, Vol. 16, No. 8, 1990, pp.
V.C., ‘Reverse engineering of relational 777-790.
database: extraction of an EER model
from a relational database’, Data and 10. Navathe S.B. and Awong A.M.,
Knowledge Engineering, Vol. 12, No. 2, ‘Abstracting relational and hierarchical
1994, pp. 107-142. data with a semantic data model’,
Proceedings of the 6th International
3. Davis A., ‘Converting a relational Conference on Entity-Relationship
database model into an entity Approach, March S.T. (Ed.), New York,
relationship model’, Proceedings of 6th 1987, pp. 305-333.
International Conference on Entity-
Relationship Approach, March S.T. (Ed.), 11. Premerlani W.J. and Blaha M.R., ‘An
New York, 1987, pp. 271-285. approach for reverse engineering of
16th National IT Conference, Sri Lanka, 11-13 July 1997 13
14. relational databases’, Communications of on Database and Expert Systems,
the ACM, Vol. 37, No. 5, 1994, pp. 42- International Institute for Advanced
49,134. Studies, 1994, pp. 148-154.
12. Qutaishat M.A., ’A schema meta- 15. Ramfos A., Fiddian N.J. and Gray W.A.,
integration system for a heterogeneous ‘A meta-translation system for object-
object-oriented database environment’, oriented to relational schema
Proceedings of NordDATA'92 translations’, Proceedings of 9th British
Conference, Tampere, 1992, pp. 74-92. National Conference on Databases,
Wolverhampton, 1991, pp. 245-268.
13. Qutaishat M.A., Gray W.A. and Fiddian
N.J., ‘A highly-customisable schema meta- 16. Richter J., ‘ODBC 2.0 further establishes
visualisation system for object-oriented cross-product data sharing standard’,
database schemas: overview’, BYTE, November 1994, pp. 52.
Proceedings of 4th International
Conference on Database and Expert 17. Rumbaugh J., Blaha M., Premerlani W.,
Systems Applications, Springer-Verlag, Eddy F. and Lorensen W., ‘Object-
1993, pp. 756-759. oriented modeling and design’, Prentice-
Hall, 1991.
14. Qutaishat M.A., Fiddian N.J. and Gray
W.A., ‘Review and potential of meta- 18. Wikramanayake G.N., ‘Assisting
programmed expert systems in a Migration and Evolution of Relational
heterogeneous distributed database Legacy Databases’, PhD Thesis,
environment’, Proceedings of Symposium University of Wales Cardiff, 1996.
14 16th National IT Conference, Sri Lanka, 11-13 July 1997