Assisting Migration and Evolution of Relational Legacy Databases

by

G.N. Wikramanayake

Department of Computer Science,
University of Wales Cardiff,
Cardiff

September 1996
Abstract

The research work reported here is concerned with enhancing and preparing databases with limited
DBMS capability for migration to keep up with current database technology. In particular, we have
addressed the problem of re-engineering heterogeneous relational legacy databases to assist them in
a migration process. Special attention has been paid to the case where the legacy database service
lacks the specification, representation and enforcement of integrity constraints. We have shown how
constraint knowledge reflecting modern DBMS capabilities can be incorporated into these systems to
ensure that, once migrated, they can benefit from current database technology.

To this end, we have developed a prototype conceptual constraint visualisation and enhancement
system (CCVES) to automate as efficiently as possible the process of re-engineering for a
heterogeneous distributed database environment, thereby assisting the global system user in
preparing their heterogeneous database systems for a graceful migration. Our prototype system has
been developed using a knowledge based approach to support the representation and manipulation of
structural and semantic information about schemas that the re-engineering and migration process
requires. It has a graphical user interface, including graphical visualisation of schemas with
constraints using user-preferred modelling techniques for the convenience of the user. The system
has been implemented using meta-programming technology because of the proven power and
flexibility that this technology offers to this type of research application.

The important contributions resulting from our research include extending the benefits of meta-
programming technology to the very important application area of evolution and migration of
heterogeneous legacy databases. In addition, we have provided an extension to various relational
database systems to enable them to overcome their limitations in the representation of meta-data.
These extensions contribute towards the automation of the reverse-engineering process of legacy
databases, while allowing the user to analyse them using extended database modelling concepts.




CHAPTER 1

                                          Introduction

This chapter introduces the thesis. Section 1.1 is devoted to the background and motivations of the
research undertaken. Section 1.2 presents the broad goals of the research. The original achievements
which have resulted from the research are summarised in Section 1.3. Finally, the overall
organisation of the thesis is described in Section 1.4.

1.1 Background and Motivations of the Research

         Over the years rapid technological changes have taken place in all fields of computing. Most
of these changes have been due to the advances in data communications, computer hardware and
software [CAM89], which together have provided a reliable and powerful networking environment
(i.e. standard local and wide area networks) that allows the management of data stored in computing
facilities at many nodes of the network [BLI92]. These changes have shifted hardware technology
from centralised mainframes to networked file-server and client-server architectures
[KHO92] which support various ways to use and share data. Modern computers are much more
powerful than the previous generations and perform business tasks at a much faster rate by using
their increased processing power [CAM88, CAM89]. Simultaneous developments in the software
industry have produced techniques (e.g. for system design and development) and products capable of
utilising the new hardware resources (e.g. multi-user environments with GUIs). These new
developments are being used for a wide variety of applications, including modern distributed
information processing applications, such as office automation where users can create and use
databases with forms and reports with minimal effort, compared to the development efforts using
3GLs [HIR85, WOJ94]. Such applications are being developed with the aid of database technology
[ELM94, DAT95] as this field too has advanced by allowing users to represent and manipulate
advanced forms of data and their functionalities. Due to the program-data independence feature of
DBMSs, the maintenance of database application programs has become easier, as functionalities that
were traditionally performed by procedural application routines are now supported declaratively
using database concepts such as constraints and rules.
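
For instance, a minimal sketch (using hypothetical employee and department tables, not an example from the thesis itself): a rule that every employee must belong to an existing department, which would traditionally be checked by procedural routines in every application program, can instead be declared once in the schema and enforced by the DBMS:

    -- Hypothetical tables: the rules are declared in the schema, so the DBMS
    -- enforces them for every application program that updates the data.
    CREATE TABLE department (
        dept_no   INTEGER     NOT NULL PRIMARY KEY,
        dept_name VARCHAR(30) NOT NULL
    );

    CREATE TABLE employee (
        emp_no    INTEGER      NOT NULL PRIMARY KEY,
        emp_name  VARCHAR(30)  NOT NULL,
        salary    DECIMAL(8,2) CHECK (salary >= 0),            -- declarative rule
        dept_no   INTEGER      REFERENCES department (dept_no) -- declarative link
    );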

       In the field of databases, the recent advances resulting from technological transformation
include many areas such as the use of distributed database technology [OZS91, BEL92], object-
oriented technology [ATK89, ZDO90], constraints [DAT83, GRE93], knowledge-based systems
[MYL89, GUI94], 4GLs and CASE tools [COMP90, SCH95, SHA95]. Meanwhile, the older
technology was dealing with files and primitive database systems which now appear inflexible, as
the technology itself limits them from being adapted to meet the current changing business needs
catalysed by newer technologies. The older systems, which have been developed using 3GLs and
have been in operation for many years, often suffer from failures, inappropriate functionality, lack of
documentation and poor performance, and are referred to as legacy information systems [BRO93,
COMS94, IEE94, BRO95, IEEE95].

       The current technology is much more flexible as it supports methods to evolve (e.g. 4GLs,
CASE tools, GUI toolkits and reusable software libraries [HAR90, MEY94]), and can share
resources through software that allows interoperability (e.g. ODBC [RIC94, GEI95]). This evolution
reflects the changing business needs. However, modern systems need to be properly designed and
implemented to benefit from this technology, which may still be unable to prevent such systems
themselves being considered to be legacy information systems in the near future due to the advent of
the next generation of technology with its own special features. The only salvation would appear to
be building in evolution paths in the current systems.

        The increasing power of computers and their software has meant they have already taken
over many day to day functions and are taking over more of these tasks as time passes. Thus
computers are managing a larger volume of information in a more efficient manner. Over the years
most enterprises have adopted the computerisation option to enable them to efficiently perform their
business tasks and to be able to compete with their counterparts. As the performance of computers
has increased, enterprises still using early computer technology face serious problems due to the
difficulties that are inherent in their legacy systems.

        This means that new enterprises using systems purely based on the latest technology have an
advantage over those which need to continue to use legacy information systems (ISs), as modern ISs
have been developed using current technology which provides not only better performance, but also
utilises the benefits of improved functionality. Hence, managers of legacy IS enterprises want to
retire their legacy code and use modern database management systems (DBMSs) in the latest
environment to gain the full benefits from this newer technology. However they want to use this
technology on the information and data they already hold as well as on data yet to be captured. They
also want to ensure that any attempts to incorporate the modern technology will not adversely affect
the ongoing functionality of their existing systems. This means legacy ISs need to be evolved and
migrated to a modern environment in such a way that the migration is transparent to the current
users. The theme of this thesis is how we can support this form of system evolution.

         1.1.1 The Barriers to Legacy Information System Migration

       Legacy ISs are usually those systems that have stood the test of time and have become a core
service component for a business’s information needs. These systems are a mix of hardware and
software, sometimes proprietary, often out of date, and built to earlier styles of design,
implementation and operation. Although they were productive and fulfilled their original
performance criteria and their requirements, these systems lack the ability to change and evolve. The
following can be seen as barriers to evolution in legacy IS [IEE94].

    • The technology used to build and maintain the legacy IS is obsolete,
    • The system is unable to reflect changes in the business world and to support new needs,
    • The system cannot integrate with other sub-systems,
    • The cost, time and risk involved in producing new alternative systems to the legacy IS.

        The risk factor is that a new system may not provide the full functionality of the current
system for a period because of teething problems. Due to these barriers, large organisations [PHI94]
prefer to write independent sub-systems to perform new tasks using modern technology which will
run alongside the existing systems, rather than attempt to achieve this by adapting existing code or
by writing a new system that replaces the old and has new facilities as well. We see the following
immediate advantages of this low risk approach.



• The performance, reliability and functionality of the existing system is not affected,
    • New applications can take advantage of the latest technology,
    • There is no need to retrain those staff who only need the facilities of the old system.

       However with this approach, as business requirements evolve with time, more and more new
needs arise, resulting in the development and regular use of many diverse systems within the same
organisation. Hence, in the long term the above advantages are overshadowed by the more serious
disadvantages of this approach, such as:

    • The existing systems continue to exist as legacy ISs running on increasingly outdated
      technology,
    • The need to maintain many different systems to perform similar tasks increases the
      maintenance and support costs of the organisation,
    • Data becomes duplicated in different systems which implies the maintenance of redundant data
      with its associated increased risk of inconsistency between the data copies if updating occurs,
    • The overall maintenance cost for hardware, software and support personnel increases as many
      platforms are being supported,
    • The performance of the integrated information functions of the organisation decreases due to
      the need to interface many disparate systems.

        To address the above issues, legacy ISs need to be evolved and migrated to new computing
environments, when their owning organisation upgrades. This migration should occur within a
reasonable time after the upgrade occurs. This means that it is necessary to migrate legacy ISs to
new target environments in order to allow the organisation to dispose of the technology which is
becoming obsolete. Managers of some enterprises have chosen an easy way to overcome this
problem, by emulating [CAM89, PHI94] the current environment on the new platforms (e.g. AS/400
emulators for IBM S/360 and ICL’s DME emulators for 1900 and System 4 users). An alternative
strategy is achieved by translating [SHA93, PHI94, SHE94, BRO95] the software to run in new
environments (i.e. code-to-code level translation). The emulator approach perpetuates all the
software deficiencies of the legacy ISs, although it successfully removes the old-fashioned hardware
technology and so does enjoy the increased processing power of the new hardware. The translation
approach takes advantage of some of the modern technological benefits in the target environment as
the conversions - such as IBM’s JCL and ICL’s SCL code to Unix shell scripts, Assembler to
COBOL, COBOL to COBOL embedded with SQL, and COBOL data structures to relational DBMS
tables - are also done as part of the translation process. This approach, although a step forward, still
carries over most of the legacy code as legacy systems are not evolved by this process. For example,
the basic design is not changed. Hence the barrier to change and/or integration to a common sub-
system still remains, and the translated systems were not designed for the environment they are now
running in, so they may not be compatible with it.

       There are other approaches to overcoming this problem which have been used by enterprises
[SHA93, BRO95]. These include re-implementing systems under the new environment and/or
upgrading existing systems to achieve performance improvements. As computer technology
continues to evolve at an ever quicker pace, the need to migrate arises more rapidly. This means
most small organisations and individuals are left behind and are forced to work in a technologically
obsolete environment, mainly due to the high cost of frequently migrating to newer systems and/or
upgrading existing software, as this process involves time and manpower which cost money. The
gap between the older and newer system users will very soon create a barrier to information sharing
unless some tools are developed to assist the older technology users’ migration to new technology
environments. This assistance for the older technology users may take many forms, including tools
for: analysing and understanding existing systems; enhancing and modifying existing systems;
migrating legacy ISs to newer platforms. The complete migration process for a legacy IS needs to
consider these requirements and many other aspects, as recently identified by Brodie and
Stonebraker in [BRO95]. Our work was primarily motivated by these business oriented legacy
database issues and by work in the area of extending relational database technology to enable it to
represent more knowledge about its stored data [COD79, STO86a, STO86b, WIK90]. This second
consideration is an important aspect of legacy system migration, since if a graceful migration is to be
achieved we must be able to enhance a legacy relational database with such knowledge to take full
advantage of the new system environment.

         1.1.2 Heterogeneous Distributed Environments

        As well as the problem of having to use legacy ISs, most large enterprises are faced with the
problem of heterogeneity and the need for interoperability between existing ISs [IMS91]. This arises
due to the increased use of different computer systems and software tools for information processing
within an organisation as time passes. The development of networking capabilities to manage and
share information stored over a network has made interoperability a requirement, and the broad
acceptance of local area networks in business enterprises has increased the need to perform this task
within organisations. Network file servers, client-server technology and the use of distributed
databases [OZS91, BEL92, KHO92] are results of these challenging innovations. This technology is
currently being used to create and process information held in heterogeneous databases, which
involves linking different databases in an interoperable environment. An aspect of this work is
legacy database interoperation, since as time passes these databases will have been built using
different generations of software.

        In recent years, the demand for distributed database capabilities has been fuelled mostly by
the decentralisation of business functions in large organisations to address customer needs, and by
mergers and acquisitions that have taken place in the corporate world. As a consequence, there is a
strong requirement among enterprises for the ability to cross-correlate data stored in different
existing heterogeneous databases. This has led to the development of products referred to as
gateways, to enable users to link different databases together, e.g. Microsoft’s Open Database
Connectivity (ODBC) drivers can link Access, FoxPro, Btrieve, dBASE and Paradox databases
together [COL94, RIC94]. There are similar products for other database vendors, such as Oracle (for
IBM's DB2, UNISYS's DMS and DEC RMS) [HOL93] and others (for INGRES, SYBASE, Informix
and other popular SQL DBMSs) [PUR93, SME93, RIC94, BRO95]. Database vendors have targeted
cross-platform compatibility via SQL access protocols to support interoperability in a heterogeneous
environment. As heterogeneity in distributed systems may occur in various forms, ranging from
different hardware platforms, operating systems, networking protocols and local database systems,
cross-platform compatibility via SQL provides only a simple form of heterogeneous distributed
database access. The biggest challenge comes in addressing heterogeneity due to differences in local
databases [OZS91, BEL92]. This challenge is also addressed in the design and development of our
system.

        Distributed DBMSs have become increasingly popular in organisations as they offer the
ability to interconnect existing databases, as well as having many other advantages [OZS91,
BEL92]. The interconnection of existing databases leads to two types of distributed DBMS, namely:
homogeneous and heterogeneous distributed DBMSs. In homogeneous systems all of the constituent
nodes run the same DBMS and the databases can be designed in harmony with each other. This
simplifies both the processing of queries at different nodes and the passing of data between nodes. In
heterogeneous systems the situation is more complex, as each node can be running a different
DBMS and the constituent databases can be designed independently. This is the normal situation
when we are linking legacy databases, as the DBMS and the databases used are more likely to be
heterogeneous since they are usually implemented for different platforms during different
technological eras. In such a distributed database environment, heterogeneity may occur in various
forms, at different levels [OZS91, BEL92], namely:

    • The logical level (i.e. involving different database designs),
    • The data management level (i.e. involving different data models),
    • The physical level (i.e. involving different hardware, operating systems and network
      protocols), and
    • At all three or any pair of these levels.

         1.1.3 The Problems and Search for a Solution

        The concept of heterogeneity itself is valuable as it allows designers a freedom of choice
between different systems and design approaches, thus enabling them to identify those most suitable
for different applications. The exploitation of this freedom over the years in many organisations has
resulted in the creation of multiple local and remote information systems which now need to be
made interoperable to provide an efficient and effective information service to the enterprise
managers. Open Database Connectivity (ODBC) [RIC94, GEI95] and its standards have been proposed
to support interoperability among databases managed by different DBMSs. Database vendors such
as Oracle, INGRES, INFORMIX and Microsoft have already produced tools, engines and
connectivity products to fulfil this task [HOL93, PUR93, SME93, COL94, RIC94, BRO95]. These
products allow limited data transfer and query facilities among databases to support interoperability
among heterogeneous DBMSs. These features, although they permit easy, transparent heterogeneous
database access, still do not provide a solution to legacy IS where a primary concern is to evolve and
migrate the system to a target environment so that obsolete support systems can be retired.
Furthermore, the ODBC facilities are developed for current DBMSs and hence may not be capable
of accessing older generation DBMSs, and, if they are, are unlikely to be able to enhance them to
take advantage of the newer technologies. Hence there is a need to create tools that will allow ODBC
equivalent functionality for older generation DBMSs. Our work provides such functionality for all
the DBMSs we have chosen for this research. It also provides the ability to enhance and evolve
legacy databases.



In order to evolve an information system, one needs to understand the existing system’s
structure and code. Most legacy information systems are not properly documented and hence
understanding such systems is a complex process. This means that changing any legacy code
involves a high risk as it could result in unexpected system behaviour. Therefore one needs to
analyse and understand existing system code before performing any changes to the system.

        Database system design and implementation tools have appeared recently which have the
aim of helping new information system development. Reverse and re-engineering tools are also
appearing in an attempt to address issues concerned with existing databases [SHA93, SCH95]. Some
of these tools allow the examination of databases built using certain types of DBMSs; however, the
enhancements they allow are done within the limitations of that system. Due to continuous ongoing
technology changes, most current commercial DBMSs do not support the most recent software
modelling techniques and features (e.g. Oracle version 7 does not support Object-Oriented features).
Hence a system built using current software tools is guaranteed to become a legacy system in the
near future (i.e. when new products with newer techniques and features begin to appear in the
commercial market place).

        Reverse engineering tools [SHA93] are capable of recreating the conceptual model of an
existing database and hence they are an ideal starting point when trying to gain a comprehensive
understanding of the information held in the database and its current state, as they create a visual
picture of that state. However, in legacy systems the schemas are basic, since most of the
information used to compose a conceptual model is not available in these databases. Information
such as constraints that show links between entities is usually embedded in the legacy application
code and users find it difficult to reverse engineer these legacy ISs. Our work addresses these issues
while assisting in overcoming this barrier within the knowledge representation limitations of existing
DBMSs.

         1.1.4 Primary and Secondary Motivations

        The research reported in this thesis was therefore primarily prompted by the need to provide,
for a logically heterogeneous distributed database environment, a design tool that allows users not
only to understand their existing systems but also to enhance and visualise an existing database’s
structure using new techniques that are either not yet present in existing systems or not supported by
the existing software environment. It was also motivated by:

a) Its direct applicability in the business world, as the new technique can be applied to incrementally
    enhance existing systems and prepare them to be easily migrated to new target environments,
    hence avoiding continued use of legacy information systems in the organisation.

        Although previous work and some design tools address the issue of legacy information
system analysis, evolution and migration, these are mainly concerned with 3GL languages such as
COBOL and C [COMS94, BRO95, IEEE95]. Little work has been reported which addresses the new
issues that arise due to the Object-Oriented (O-O) data model or the extended relational data model
[CAT94]. There are no reports yet of enhancing legacy systems so that they can migrate to O-O or
extended relational environments in a graceful migration from a relational system. There has been
some work in the related areas of identifying extended entity relationship structures in relational
schemas, and some attempts at reverse-engineering relational databases [MAR90, CHI94, PRE94].

b) The lack of previous research in visualising pre-existing heterogeneous database schemas and
   evolving them by enhancing them with modern concepts supported in more recent releases of
   software.

         Most design tools [COMP90, SHA93] which have been developed to assist in Entity-
Relationship (E-R) modelling [ELM94] and Object Modelling Technique (OMT) modelling
[RUM91] are used in a top-down database design approach (i.e. forward engineering) to assist in
developing new systems. However, relatively few tools attempt to support a bottom-up approach
(i.e. reverse engineering) to allow visualisation of pre-existing database schemas as E-R or OMT
diagrams. Among these tools only a very few allow enhancement of the pre-existing database
schemas, i.e. they apply forward engineering to enhance a reverse-engineered schema. Even those
which do permit this action to some extent always operate on a single database management system
and work mostly with schemas originally designed using such systems (e.g. CASE tools). The tools
that permit only the bottom-up approach are referred to as reverse-engineering tools and those which
support both (i.e. bottom-up and top-down) are called re-engineering tools [SHA93]. This thesis is
primarily concerned with creating re-engineering tools that assist legacy database migration.

        The commercially available re-engineering tools are customised for particular DBMSs and
are not easily usable in a heterogeneous environment. This barrier against widespread usability of re-
engineering tools means that a substantial adaptation and reprogramming effort (costing time and
money) is involved every time a new DBMS appears in a heterogeneous environment. An obvious
example that reflects this limitation arises in a heterogeneous distributed database environment
where there may be a need to visualise each participant database’s schema. In such an environment
if the heterogeneity occurs at the database management level (where each node uses a different
DBMS, for example, one node uses INGRES [DAT87] and another uses Oracle [ROL92]), then we
have to use two different re-engineering tools to display these schemas. This situation is exacerbated
for each additional DBMS that is incorporated into the given heterogeneous context. Also, legacy
databases are migrated to different DBMS environments as newer versions and better database
products have appeared since the original release of their DBMS. This means that a re-engineering
tool that assists legacy database migration must work in a heterogeneous environment so that its
use will not be restricted to particular types of ISs.

        Existing re-engineering tools provide a single target graphical data model (usually the E-R
model or a variant of it), which may differ in presentation style between tools and therefore inhibits
the uniformity of visualisation that is highly desirable in an interoperable heterogeneous distributed
database environment. This limitation means that users may need to use different tools to provide the
required uniformity of display in such an environment. The ability to visualise the conceptual model
of an information system using a user-preferred graphical data model is important as it ensures that
no inaccurate enhancements are made to the system due to any misinterpretation of graphical
notations used.

c) The need to apply rules and constraints to pre-existing databases to identify and clean inconsistent
    legacy data, as preparation for migration or as an enhancement of the database’s quality.



The inability to define and apply rules and constraints in early database systems, due to
system limitations, meant that constraints were not used to increase the accuracy and consistency of
the data held by these systems. This limitation is now a barrier to information system migration as a
new target DBMS is unable to enforce constraints on a migrated database until all violations are
investigated and resolved either by omitting the violating data or by cleaning it. This investigation
may also show that a constraint has to be adjusted as the violating data is needed by the organisation.
The enhancement of such a system by rules and constraints provides knowledge that is usable to
determine possible data violations. The process of detecting constraint violations may be done by
applying queries that are generated from these enhanced constraints. Similar methods have been
used to implement integrity constraints [STO75], optimise queries [OZS91] and obtain intensional
answers [FON92, MOT89]. This is essential as constraints may have been implemented at the
application coding level and that can lead to their inconsistent application.
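
As a sketch of this idea (reusing the hypothetical employee and department tables introduced earlier in this chapter, not the thesis's actual test databases), a referential constraint that an early DBMS cannot enforce can be turned mechanically into a query whose answer is exactly the violating data:

    -- Intended (but unenforced) constraint: every employee.dept_no must
    -- match an existing department.dept_no.
    -- Generated violation-detection query: any rows returned are inconsistent
    -- data to be cleaned, omitted or used to adjust the constraint before migration.
    SELECT e.emp_no, e.dept_no
    FROM   employee e
    WHERE  e.dept_no IS NOT NULL
      AND  NOT EXISTS (SELECT 1
                       FROM   department d
                       WHERE  d.dept_no = e.dept_no);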

d) An awareness of the potential contribution that knowledge-based systems and meta-programming
   technologies, in association with extended relational database technology, have to offer in coping
   with semantic heterogeneity.

        The successful production of a conceptual model is highly dependent on the semantic
information available, and on the ability to reason about these semantics. A knowledge-based system
can be used to assist in this task, as the process of generalising the effective exploitation of semantic
information from pre-existing heterogeneous databases involves three sub-processes, namely:
knowledge acquisition, representation and manipulation. The knowledge acquisition process extracts
the existing knowledge from a database’s data dictionaries. This knowledge may include subsequent
enhancements made by the user, as the use of a database to store such knowledge will provide easy
access to this information along with its original knowledge. The knowledge representation process
represents existing and enhanced knowledge. The knowledge manipulation process is concerned
with deriving new knowledge and ensuring consistency of existing knowledge. These stages are
addressable using specific processes. For instance, the reverse-engineering process used to produce a
conceptual model can be used to perform the knowledge acquisition task. Then the derived and
enhanced knowledge can be stored in the same database by adopting a process that will allow us to
distinguish this knowledge from its original meta-data. Finally, knowledge manipulation can be done
with the assistance of a Prolog based system [GRA88], while data and knowledge consistency can be
verified using the query language of the database.
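
As an illustration of the knowledge acquisition step (assuming an Oracle 7 source database and its standard data dictionary views; the meta-data access actually used for our test DBMSs is described in Chapter 6), the existing schema knowledge can be extracted with ordinary queries against the dictionary:

    -- Attribute names and data types of every user table
    SELECT table_name, column_name, data_type, nullable
    FROM   user_tab_columns
    ORDER  BY table_name, column_id;

    -- Declared primary (P) and foreign (R) key constraints, where the system
    -- records them; these seed the conceptual model before user enhancement
    SELECT table_name, constraint_name, constraint_type, r_constraint_name
    FROM   user_constraints
    WHERE  constraint_type IN ('P', 'R');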

1.2 Goals of the Research

         The broad goals of the research reported in this thesis are highlighted here, with detailed aims
and objectives presented in section 2.4. These goals are to investigate interoperability problems,
schema enhancement and migration in a heterogeneous distributed database environment, with
particular emphasis on extended relational systems. This should provide a basis for the design and
implementation of a prototype software system that brings together new techniques from the areas of
knowledge-based systems, meta-programming and O-O conceptual data modelling with the aim of
facilitating schema enhancement, by means of generalising the efficient representation of constraints
using the current standards. Such a system is a tool that would be a valuable asset in a logically
heterogeneous distributed extended relational database environment as it would make it possible for
global users to incrementally enhance legacy information systems. This offers the potential for users
in this type of environment to work in terms of such a global schema, through which they can
prepare their legacy systems to easily migrate to target environments and so gain the benefits of
modern computer technology.

1.3 Original Achievements of the Research

        The importance of this research lies in establishing the feasibility of enhancing, cleaning and
migrating heterogeneous legacy databases using meta-programming technology, knowledge-based
system technology, database system technology and O-O conceptual data modelling concepts, to
create a comprehensive set of techniques and methods that form an efficient and useful generalised
database re-engineering tool for heterogeneous sets of databases. The benefits such a tool can bring
are also demonstrated and assessed.

       A prototype Conceptual Constraint Visualisation and Enhancement System (CCVES)
[WIK95a] has been developed as a result of the research. To be more specific, our work has made
four important contributions to progress in the database topic area of Computer Science:

1) CCVES is the first system to bring the benefits of meta-programming technology to the very
   important application area of enhancing and evolving heterogeneous distributed legacy databases
   to assist the legacy database migration process [GRA94, WIK95c].

2) CCVES is also the first system to enhance existing databases with constraints to improve their
   visual presentation and hence provide a better understanding of existing applications [WIK95b].
   This process is applicable to any relational database application, including those which are unable
   to naturally support the specification and enforcement of constraints. More importantly, this
   process does not affect the performance of an existing application.

3) As will be seen later, we have chosen the current SQL-3 standards [ISO94] as the basis for
   knowledge representation in our research. This project provides an extension to the
   representation of the relational data model to cope with automated reuse of knowledge in the re-
   engineering process. In order to cope with technological changes that result from the emergence
   of new systems or new versions of existing DBMSs, we also propose a series of extended
   relational system tables conforming to SQL-3 standards to enhance existing relational DBMSs
   [WIK95b] (a sketch of this idea is given at the end of this section).

4) The generation of queries using the constraint specifications of the enhanced legacy systems is an
   easy and convenient method of detecting any constraint violating data in existing systems. The
   application of this technique in the context of a heterogeneous environment for legacy
   information systems is a significant step towards detecting and cleaning inconsistent data in
   legacy systems prior to their migration. This is essential if a graceful migration is to be effected
   [WIK95c].
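
Returning to point 3: a minimal sketch of the extended-system-table idea follows. The table and column names here are illustrative only and do not reproduce the actual extended system tables defined later in the thesis; the principle is that constraint meta-data is stored in ordinary relations, shaped along the lines of SQL-3 style catalog tables, inside a DBMS whose own catalog cannot record it:

    -- Illustrative user-level "system" tables holding enhanced constraint
    -- meta-data for a legacy relational DBMS.
    CREATE TABLE x_table_constraints (
        constraint_name VARCHAR(32) NOT NULL PRIMARY KEY,
        table_name      VARCHAR(32) NOT NULL,
        constraint_type VARCHAR(12) NOT NULL  -- e.g. PRIMARY KEY, FOREIGN KEY, CHECK
    );

    CREATE TABLE x_referential_constraints (
        constraint_name       VARCHAR(32) NOT NULL PRIMARY KEY,
        referenced_table      VARCHAR(32) NOT NULL,
        referenced_constraint VARCHAR(32) NOT NULL
    );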

1.4 Organisation of the Thesis




The thesis is organised into 8 chapters. This first chapter has given an introduction to the
research done, covering background and motivations, and outlining original achievements. The rest
of the thesis is organised as follows:

Chapter 2 is devoted to presenting an overview of the research together with detailed aims and
objectives for the work undertaken. It begins by identifying the scope of the work in terms of
research constraints and development technologies. This is followed by an overview of the research
undertaken, where a step by step discussion of the approach adopted and its role in a heterogeneous
distributed database environment is given. Finally, detailed aims and objectives are drawn together
to conclude the chapter.

Chapter 3 identifies the relational data model as the current dominant database model and presents
its development along with its terminology, features and query languages. This is followed by a
discussion of conceptual data models with special emphasis on the data models and symbols used in
our project. Finally, we pay attention to key concepts related to our project, mainly the notion of
semantic integrity constraints and extensions to the relational model. Here, we present important
integrity constraint extensions to the relational model and its support using different SQL standards.

Chapter 4 addresses the issue of legacy information system migration. The discussion commences
with an introduction to legacy and our target information systems. This is followed by migration
strategies and methods for such ISs. Finally, we conclude by referring to current techniques and
identify the trends and existing tools applicable to database migration.

Chapter 5 addresses the re-engineering process for relational databases. Techniques currently used
for this purpose are identified first. Our approach, which uses constraints to re-engineer a relational
legacy database is described next. This is followed by a process for detecting possible keys and
structures of legacy databases. Our schema enhancement and knowledge representation techniques
are then introduced. Finally, we present a process to detect and resolve conflicts that may occur due
to schema enhancement.

Chapter 6 introduces some example test databases which were chosen to represent a legacy
heterogeneous distributed database environment and its access processes. Initially, we present the
design of our test databases, the selection of our test DBMSs and the prototype system environment.
This is followed by the application of our re-engineering approach to our test databases. Finally, the
organisation of relational meta-data and its access is described using our test DBMSs.

Chapter 7 presents the internal and external architecture and operation of our conceptual constraint
visualisation and enhancement system (CCVES) in terms of the design, structure and operation of its
interfaces, and its intermediate modelling system. The internal schema mappings, e.g. mapping from
INGRES QUEL to SQL and vice-versa, and internal database migration processes are presented in
detail here.

Chapter 8 provides an evaluation of CCVES, identifying its limitations and improvements that could
be made to the system. A discussion of potential applications is presented. Finally we conclude the
chapter by drawing conclusions about the research project as a whole.




CHAPTER 2

                   Research Scope, Approach, Aims and Objectives

This chapter describes, in some detail, the aims and objectives of the research that has been
undertaken. Firstly, the boundaries of the research are defined in section 2.1, which considers the
scope of the project. Secondly, an overview of the research approach we have adopted in dealing
with heterogeneous distributed legacy database evolution and migration is given in section 2.2.
Next, in section 2.3, the discussion is extended to the wider aspects of applying our approach in a
heterogeneous distributed database environment using the existing meta-programming technology
developed at Cardiff in other projects. Finally, the research aims and objectives are detailed in
section 2.4, illustrating what we intend to achieve, and the benefits expected from achieving the
stated aims.

2.1 Scope of the Project

       We identify the scope of the work in terms of research constraints and the limitations of
current development technologies. An overview of the problem is presented along with the
drawbacks and limitations of database software development technology in addressing the
problem. This will assist in identifying our interests and focussing the issues to be addressed.

       2.1.1 Overview of the Problem

        In most database designs, a conceptual design and modelling technique is used in
developing the specifications at the user requirements and analysis stage of the design. This stage
usually describes the real world in terms of object/entity types that are related to one another in
various ways [BAT92, ELM94]. Such a technique is also used in reverse-engineering to portray
the current information content of existing databases, as the original designs are usually either
lost, or inappropriate because the database has evolved from its original design. The resulting
pictorial representation of a database can be used for database maintenance, for database re-
design, for database enhancement, for database integration or for database migration, as it gives its
users a sound understanding of an existing database’s architecture and contents.

        Only a few current database tools [COMP90, BAT92, SHA93, SCH95] allow the capture
and presentation of database definitions from an existing database, and the analysis and display of
this information at a higher level of abstraction. Furthermore, these tools are either restricted to
accessing a specific database management system’s databases or permit modelling with only a
single given display formalism, usually a variant of the EER [COMP90]. Consequently, there is a
need to cater for multiple database platforms and different user needs, allowing access to the set of
databases comprising a heterogeneous database and providing a facility to visualise each database
using a preferred conceptual modelling technique that is familiar to the different user
communities of the heterogeneous system.

        The fundamental modelling constructs of current reverse and re-engineering tools are
entities, relationships and associated attributes. These constructs are useful for database design at
a high level of abstraction. However, the semantic information now available in the form of rules
and constraints in modern DBMSs provides their users with a better understanding of the
underlying database as its data conforms to these constraints. This may not necessarily be true for
legacy systems, which may have constraints defined that were not enforced. The ability to
visualise rules and constraints as part of the conceptual model increases user understanding of a
database. Users could also exploit this information to formulate queries that more effectively
utilise the information held in a database. Having these features in mind, we concentrated on
providing a tool that permits specification and visualisation of constraints as part of the graphical
display of the conceptual model of a database. With modern technology increasing the number of
legacy systems and with increasing awareness of the need to use legacy data [BRO95, IEEE95],
the availability of such a visualisation tool will be more important in future as it will let users see
the full definition of the contents of their databases in a familiar format.

         Three types of abstraction mechanism, namely: classification, aggregation and
generalisation, are used in conceptual design [ELM94]. However, most existing DBMSs do not
maintain sufficient meta-data information to assist in identifying all these abstraction mechanisms
within their data models. This means that reverse and re-engineering tools are semi-automated, in
that they extract information, but users have to guide them and decide what information to look
for [WAT94]. This requires interactions with the database designer in order to obtain missing
information and to resolve possible conflicts. Such additional information is supplied by the tool
users when performing the reverse-engineering process. As this additional information is not
retained in the database, it must be re-entered every time a reverse engineering process is
undertaken if the full representation is to be achieved. To overcome this problem, knowledge
bases are being used to retain this information when it is supplied. However, this approach
restricts the use of this knowledge by other tools which may exist in the database’s environment.
The ability to hold this knowledge in the database itself would enhance an existing database with
information that can be widely used. This would be particularly useful in the context of legacy
databases as it would enrich their semantics. One of the issues considered in this thesis is how this
can be achieved.

        Most existing relational database applications record only entities and their properties (i.e.
attribute names and data types) as system meta-data. This is because these systems conformed to
early database standards (e.g. the SQL/86 standard [ANSI86], supported by INGRES version 5
and Oracle version 5). However, more recent relational systems record additional information
such as constraint and rule definitions, as they conform to the SQL/92 standards [ANSI92] (e.g.
Oracle version 7). This additional information includes, for example, primary and foreign key
specifications, and can be used to identify classification and aggregation abstractions used in a
conceptual model [CHI94, PRE94, WIK95b]. However, the SQL/92 standard does not capture the
full range of modelling abstractions, e.g. inheritance representing generalisation hierarchies. This
means that early relational database applications are now legacy systems as they fail to naturally
represent additional information such as constraint and rule definitions. Such legacy database
systems are being migrated to modern database systems not only to gain the benefits of the current
technology but also to be compatible with new applications built with the modern technology. The
SQL standards are currently subject to review to permit the representation of extra knowledge
(e.g. object-oriented features), and we have anticipated some of these proposals in our work - i.e.
SQL-3 [ISO94] (during the life-time of this project the SQL-3 standards moved from a preliminary
draft, through several modifications, before being finalised in 1995) will be adopted by commercial
systems and thus the current modern DBMSs will become legacy databases in the near future or
already may be considered to be legacy
databases in that their data model type will have to be mapped onto the newer version. Having
experienced the development process of recent DBMSs it is inevitable that most current databases
will have to be migrated, either to a newer version of the existing DBMS or to a completely
different newer technology DBMS for a variety of reasons. Thus the migration of legacy
databases is perceived to be a continuing requirement, in any organisation, as technology
advances continue to be made.
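
To make the difference concrete (a hypothetical pair of definitions, not taken from our test databases): under an SQL/86-era system only attribute names and types reach the catalog, whereas the SQL/92 definition of the same table also records the key and referential meta-data from which classification and aggregation abstractions can be recovered:

    -- SQL/86-era definition: the catalog records attributes only
    CREATE TABLE enrolment (
        student_no INTEGER,
        course_no  INTEGER,
        grade      CHAR(2)
    );

    -- SQL/92 definition of the same table: the declared keys let a
    -- reverse-engineering tool infer the links to student and course
    CREATE TABLE enrolment (
        student_no INTEGER NOT NULL REFERENCES student (student_no),
        course_no  INTEGER NOT NULL REFERENCES course (course_no),
        grade      CHAR(2),
        PRIMARY KEY (student_no, course_no)
    );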

        Most migrations currently being undertaken are based on code-to-code level translations of
the applications and associated databases to enable the older system to be functional in the target
environment. Minimal structural changes are made to the original system and database, thus the
design structures of these systems are still old-fashioned, although they are running in a modern
computing environment. This means that such systems are inflexible and cannot be easily
enhanced with new functions or integrated with other applications in their new environment. We
have also observed that more recent database systems have often failed to benefit from modern
database technology due to inherent design faults that have resulted in the use of unnormalised
structures, which lead to the omission of integrity constraint enforcement even where the DBMS
supports it. The ability to create and use databases without the benefit of a database design course is
one reason for such design faults. Hence there is a need to assist existing systems to be evolved,
not only to perform new tasks but also to improve their structure so that these systems can
maximise the gains they receive from their current technology environment and any environment
they migrate to in the future.
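
A small hypothetical example of such a design fault (not one of our test cases): an unnormalised order table that repeats customer details in every row leaves no way to declare the customer as a separately keyed entity, so the corresponding integrity constraints are simply omitted; restructuring the table allows them to be stated and enforced:

    -- Unnormalised: customer details repeated per order, no enforceable keys
    CREATE TABLE order_unf (
        order_no      INTEGER,
        customer_no   INTEGER,
        customer_name VARCHAR(30),
        customer_addr VARCHAR(60),
        order_date    DATE
    );

    -- Restructured: the dependency becomes a declared, enforceable constraint
    CREATE TABLE customer (
        customer_no   INTEGER     NOT NULL PRIMARY KEY,
        customer_name VARCHAR(30) NOT NULL,
        customer_addr VARCHAR(60)
    );

    CREATE TABLE customer_order (
        order_no    INTEGER NOT NULL PRIMARY KEY,
        customer_no INTEGER NOT NULL REFERENCES customer (customer_no),
        order_date  DATE
    );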


       2.1.2 Narrowing Down the Problem

        Technological advances in both hardware and software have improved the performance
and maintenance functionality of all information systems (ISs), and as a result, older ISs suffer
from comparatively poor performance and inappropriate functionality when compared with more
modern systems. Most of these legacy systems are written in a 3GL such as COBOL, have been
around for many years, and run on old-fashioned mainframes. Problems associated with legacy
systems are being identified and various solutions are being developed [BRO93, SHE94, BRO95].
These systems basically have three functional components, namely: interface, application and a
database service, which are sometimes inter-related, depending on how they were used during the
design and implementation stages of the IS development. This means that the
complexity of a legacy IS depends on what occurred during the design and implementation of the
system. These systems may range from a simple single user database application using separate
interfaces and applications, to a complex multi-purpose unstructured application. Due to the
complex nature of the problem area we do not address this issue as a whole, but focus only on
problems associated with one sub-component of such legacy information systems, namely the
database service. This in itself is a wide field, and we have further restricted ourselves to legacy
ISs using a specific DBMS for their database service. We considered data models ranging from
original flat file and relational systems, to modern relational DBMSs and object-oriented DBMSs.
From these data models we have chosen the traditional relational model for the following reasons.

      • The relational model is currently the most widely used database model.
      • During the last two decades the relational model has been the most popular model;
     therefore it has been used to develop many database applications and most of these are now
     legacy systems.
     • There have been many extensions and variations of the relational model, which has
     resulted in many heterogeneous relational database systems being used in organisations.
     • The relational model can be enhanced to represent additional semantics currently
     supported only by modern DBMSs (e.g. extended relational systems [ZDO90, CAT94]).

        As most business requirements change with time, the need to enhance and migrate legacy
information systems exists for almost every organisation. We address problems faced by these
users while seeking a solution that prevents new systems from becoming legacy systems in the near
future. The selection of the relational model as our database service to demonstrate how one could
meet these needs means that we shall be addressing only relational legacy database systems and
not looking at any other type of legacy information systems.

         This decision means we are not considering the many common legacy IS migration
problems identified by Brodie [BRO95] (e.g. migration of legacy database services such as flat-
file structures or hierarchical databases into modern extended relational databases; migration of
legacy applications with millions of lines of code written in some COBOL-like language into a
modern 4GL/GUI environment). However, as shown later, addressing the problems associated
with relational legacy databases has enabled us to identify and solve problems associated with
more recent DBMSs, and it also assists in identifying precautions which, if implemented by
designers of new systems, will minimise the chance of similar problems being faced by these
systems as IS developments occur in the future.

2.2 Overview of the Research Approach

       Having presented an overview and narrowing down of our problem, we identify the
following as the main functionalities that should be provided to fulfil our research goal:

     • Reverse-engineering of a relational legacy database to fully portray its current information
     content.
     • Enhancing a legacy database with new knowledge to identify modelling concepts that
     should be available to the database concerned or to applications using that database.
     • Determining the extent to which the legacy database conforms to its existing and enhanced
     descriptions.
     • Ensuring that the migrated IS will not become a legacy IS in the future.

        We need to consider the heterogeneity issue in order to be able to reverse-engineer any
given relational legacy database. Three levels of heterogeneity are present for a particular data
model, namely: at a physical, logical and data management level. The physical level of
heterogeneity usually arises due to different data model implementation techniques, use of
different computer platforms and use of different DBMSs. The physical / logical data
independence of DBMSs hides implementation differences from users; hence we need only
address how to access databases that are built using different DBMSs, running on different
computer platforms.


Differences in DBMS characteristics lead to heterogeneity at the logical level. Here, the
different DBMSs conform to a particular standard (e.g. SQL/86 or SQL/92), which supports a
particular database query language (e.g. SQL or QUEL) and different relational data model
features (e.g. handling of integrity constraints and availability of object-oriented features). To
tackle heterogeneity at the logical level, we need to be aware of different standards, and to model
ISs supporting different features and query languages.

        Heterogeneity at the data management level arises due to the physical limitations of a
DBMS, differences in the logical design and inconsistencies that occurred when populating the
database. Logical differences in different database schemas have to be resolved only if we are
going to integrate them. The schema integration process is concerned with merging different
related database applications. Such a facility can assist the migration of heterogeneous database
systems. However any attempt to integrate legacy database schemas prior to the migration process
complicates the entire process as it is similar to attempting to provide new functionalities within
the system which is being migrated. Such attempts increase the chance of failure of the overall
migration process. Hence we consider any integration or enhancements in the form of new
functionalities only after successfully migrating the original legacy IS. However, the physical
limitations of a DBMS and data inconsistencies in the database need to be addressed beforehand
to ensure a successful migration.

       Our work addresses the heterogeneity issues associated with database migration by
adopting an approach that allows users to incrementally increase the number of DBMSs the
system can handle without having to reprogram its main application modules. Here, the user needs
to supply specific knowledge about the new DBMS's schema and query language constructs. This
is held together with the knowledge of the DBMSs already supported and has no effect on the
application's main processing modules.

       2.2.1 Meta-Programming

        Meta-programming technology allows the meta-data (schema information) of a database to
be held and processed independently of its source specification language. This allows us to work
in a database-language-independent environment and hence overcome many logical heterogeneity
issues. Prolog based meta-programming technology has been used in previous research at Cardiff
in the area of logical heterogeneity [FID92, QUT94]. Using this technology the meta-translation
of database query languages [HOW87] and database schemas [RAM91] has been performed. This
work has shown how the heterogeneity issues of different DBMSs can be addressed without
having to reprogram the same functionality for each and every DBMS. We use meta-programming
technology for our legacy database migration approach as we need to be able to start with a legacy
source database and end with a modern target database where the respective database schema and
query languages may be different from each other. In this approach the source database schema or
query language is mapped on input into an internal canonical form. All the required processing is
then done using the information held in this internal form. This information is finally mapped to
the target schema or query language to produce the desired output. The advantage of this approach
is that processing is not affected by heterogeneity as it is always performed on data held in the
canonical form. This canonical form is an enriched collection of semantic data modelling features.
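
To illustrate the kind of logical heterogeneity this canonical form absorbs (a hypothetical request against the employee table from the earlier illustration in Chapter 1; the QUEL form is shown as a comment for comparison), the same retrieval is expressed differently in each source language but is reduced on input to a single internal representation, so all later processing is unaffected by the source DBMS:

    -- SQL form of the request (e.g. for an Oracle node)
    SELECT emp_name
    FROM   employee
    WHERE  dept_no = 10;

    -- QUEL form of the same request (e.g. for an INGRES version 5 node):
    --   range of e is employee
    --   retrieve (e.emp_name) where e.dept_no = 10
    -- Both are mapped into the canonical form before any further processing.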


       2.2.2 Application

        We view our migration approach as consisting of a series of stages, with the final stage
being the actual migration and earlier stages being preparatory. At stage 1, the data definition of
the selected database is reverse-engineered to produce a graphical display (cf. paths A-1 and A-2
of figure 2.1). However, in legacy systems much of the information needed to present the database
schema in this way is not available as part of the database meta-data and hence these links which
are present in the database cannot be shown in this conceptual model. In modern systems such
links can be identified using constraint specifications. Thus, if the database does not have any
explicit constraints, or it does but these are incomplete, new knowledge about the database needs
to be entered at stage 2 (cf. path B-1 of figure 2.1), which will then be reflected in the enhanced
schema appearing in the graphical display (cf. path B-2 of figure 2.1). This enhancement will
identify new links that should be present for the database concerned. These new database
constraints can next be applied experimentally to the legacy database to determine the extent to
which it conforms to them. This process is done at stage 3 (cf. paths C-1 and C-2 of figure 2.1).
The user can then decide whether these constraints should be enforced to improve the quality of
the legacy database prior to its migration. At this point the three preparatory stages in the
application of our approach are complete. The actual migration process is then performed. All
stages are further described below to enable us to identify the main processing components of our
proposed system as well as to explain how we deal with different levels of heterogeneity.

       Stage 1: Reverse Engineering

        In stage 1, the data definition of the selected database is reverse-engineered to produce a
graphical display of the database. To perform this task, the database’s meta-data must be extracted
(cf. path A-1 of figure 2.1). This is achieved by connecting directly to the heterogeneous database.
The accessed meta-data needs to be represented using our internal form. This is achieved through
a schema mapping process as used in the SMTS (Schema Meta-Translation System) of Ramfos
[RAM91]. The meta-data in our internal formalism then needs to be processed to derive the
graphical constructs present for the database concerned (cf. path A-2 of figure 2.1). These
constructs are in the form of entity types and the relationships and their derivation process is the
main processing component in stage 1. The identified graphical constructs are mapped to a display
description language to produce a graphical display of the database.




        [Figure 2.1: Information flow in the 3 stages of our approach prior to migration. The
diagram links the heterogeneous databases, through an internal processing layer, to the schema
visualisation (EER or OMT) with constraints (paths A-1, A-2), to the enhanced constraints entered
by the user (paths B-1, B-2, B-3) and to the enforced constraints (paths C-1, C-2), corresponding to
Stage 1 (Reverse Engineering), Stage 2 (Knowledge Augmentation) and Stage 3 (Constraint
Enforcement).]


       a) Database connectivity for heterogeneous database access

        Unlike the previous Cardiff meta-translation systems [HOW87, RAM91, QUT92], which
addressed heterogeneity at the logical and data management levels, our system looks at the
physical level as well. While these previous systems processed schemas in textual form and did
not access actual databases to extract their DDL specification, our system addresses physical
heterogeneity by accessing databases running on different hardware / software platforms (e.g.
computer systems, operating systems, DBMSs and network protocols). Our aim is to directly
access the meta-data of a given database application by specifying its name, the name and version
of the host DBMS, and the address of the host machine4. If this database access process can
produce a description of the database in DDL formalism, then this textual file is used as the
starting point for the meta-translation process as in previous Cardiff systems [RAM91, QUT92].
We found that it is not essential to produce such a textual file, as the required intermediate
representation can be directly produced by the database access process. This means that we could
also by-pass the meta-translation process that performs the analysis of the DDL text to translate it
into the intermediate representation5. However the DDL formalism of the schema can be used for
optional textual viewing and could also serve as the starting point for other tools6 developed at
Cardiff for meta-programming database applications.

       The initial functionality of the Stage 1 database connectivity process is to access a
heterogeneous database and supply the accessed meta-data as input to our schema meta-translator

   4 We assume that access privileges for this host machine and DBMS have been granted.
   5 A list of tokens ready for syntactic analysis in the parsing phase is produced and processed based on the BNF syntax specification of the DDL [QUT92].
   6 e.g. The Schema Meta-Integration System (SMIS) of Qutaishat [QUT92].


(SMTS). This module needs to deal with heterogeneity at the physical and data management
levels. We achieve this by using DML commands of the specific DBMS to extract the required
meta-data held in its data dictionary, whose system tables can be queried in the same way as
user-defined tables.
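
        For illustration, dictionary access of this kind might take the following form; the catalogue
and column names shown (all_tab_columns for an Oracle-style dictionary, iicolumns for an
INGRES-style one) are indicative only and are not necessarily the exact statements issued by our
system:

            -- Oracle-style data dictionary: column meta-data for one application table
            SELECT table_name, column_name, data_type, nullable
            FROM   all_tab_columns
            WHERE  table_name = 'STUDENT';

            -- INGRES-style standard catalogue: the equivalent request against iicolumns
            SELECT table_name, column_name, column_datatype
            FROM   iicolumns
            WHERE  table_name = 'student';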

       Relatively recently, the functionalities of a heterogeneous database access process have
been provided by means of drivers such as ODBC [RIC94]. Use of such drivers will allow access
to any database supported by them and hence obviate the need to develop specialised tools for
each database type as happened in our case. These driver products were not available when we
undertook this stage of our work.

       b) Schema meta-translation

        The schema meta-translation process [RAM91] accepts input of any database schema
irrespective of its DDL and features. The information captured during this process is represented
internally to enable it to be mapped from one database schema to another or to further process and
supply information to other modules such as the schema meta-visualisation system (SMVS)
[QUT93] and the query meta-translation system (QMTS) [HOW87]. Thus, the use of an internal
canonical form for meta representation has successfully accommodated heterogeneity at the data
management and logical levels.

       c) Schema meta-visualisation

        Schema visualisation using graphical notation and diagrams has proved to be an important
step in a number of applications, e.g. the initial stages of database design, database maintenance,
re-design, enhancement, integration and migration, as it gives users a sound understanding of an
existing database's structure in an easily assimilated format [BAT92, ELM94]. Database users need to see a visual
picture of their database structure instead of textual descriptions of the defining schema as it is
easier for them to comprehend a picture. This has led to the production of graphical
representations of schema information, effected by a reverse engineering process. Graphical data
models of schemas employ a set of data modelling concepts and a language-independent graphical
notation (e.g. the Entity Relationship (E-R) model [CHE76], Extended/Enhanced Entity
Relationship (EER) model [ELM94] or the Object Modelling Technique (OMT) [RUM91]). In a
heterogeneous environment different users may prefer different graphical models, and may require an
understanding of the database structure and architecture beyond that given by the traditional
entities and their properties. Therefore, there is a need to produce graphical models of a database's
schema using different graphical notations such as either E-R/EER or OMT, and to accompany
them with additional information such as a display of the integrity constraints in force in the
database [WIK95b]. The display of integrity constraints allows users to look at intra- and inter-
object constraints and gain a better understanding of domain restrictions applicable to particular
entities. Current reverse engineering tools do not support this type of display.

        The generated graphical constructs are held internally in a similar form to the meta-data of
the database schema. Hence using a schema meta visualisation process (SMVS) it is possible to
map the internally held graphical constructs into appropriate graphical symbols and coordinates
for the graphical display of the schema. This approach has a similarity to the SMTS, the main
difference being that the output is graphical rather than textual.

       Stage 2: Knowledge Augmentation

        In a heterogeneous distributed database environment, evolution is expected, especially in
legacy databases. This evolution can affect the schema description and in particular schema
constraints that are not reflected in the stage 1 (path A-2) graphical display as they may be
implicit in applications. Thus our system is designed to accept new constraint specifications (cf.
path B-1 of figure 2.1) and add them to the graphical display (cf. path B-2 of figure 2.1) so that
these hidden constraints become explicit.

        The new knowledge accepted at this point is used to enhance the schema and is retained in
the database using a database augmentation process (cf. path B-3 of figure 2.1). The new
information is stored in a form that conforms with the enhanced target DBMS’s methods of
storing such information. This assists the subsequent migration stage.

       a) Schema enhancement

        Our system needs to permit a database schema to be enhanced by specifying new
constraints applicable to the database. This process is performed via the graphical display. These
constraints, which are in the form of integrity constraints (e.g. primary key, foreign key, check
constraints) and structural components (e.g. inheritance hierarchies, entity modifications) are
specified using a GUI. When they are entered they will appear in the graphical display.
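
        For example, the enhancements captured through the GUI correspond to constraint
definitions of the following kind; the table and column names are illustrative only and do not come
from a particular test database:

            -- Illustrative constraint enhancements expressed in SQL
            ALTER TABLE student
              ADD CONSTRAINT student_pk PRIMARY KEY (sno);

            ALTER TABLE enrolment
              ADD CONSTRAINT enrolment_fk FOREIGN KEY (sno) REFERENCES student (sno);

            ALTER TABLE student
              ADD CONSTRAINT sno_pattern CHECK (sno LIKE 'S%');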

       b) Database augmentation

        The input data to enhance a schema provides new knowledge about a database. It is
essential to retain this knowledge within the database itself, if it is to be readily available for any
further processing. Typically, this information is retained in the knowledge base of the tool used
to capture the input data, so that it can be reused by the same tool. This approach restricts the use
of this knowledge by other tools and hence it must be re-entered every time the re-engineering
process is applied to that database. This makes it harder for the user to gain a consistent
understanding of an application, as different constraints may be specified during two separate re-
engineering processes. To overcome this problem, we augment the database itself using the
techniques proposed in SQL-3 [ISO94], wherever possible. When it is not possible to use SQL-3
structures we store the information in our own augmented table format which is a natural
extension of the SQL-3 approach.
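
        A minimal sketch of such an augmented table is given below; the table name and columns
are illustrative assumptions rather than the exact format used by our system:

            -- Hypothetical augmented table recording constraints that the host DBMS
            -- cannot itself store or enforce
            CREATE TABLE ccves_constraints (
                constraint_name  CHAR(32)  NOT NULL,  -- identifier of the constraint
                table_name       CHAR(32)  NOT NULL,  -- table the constraint applies to
                constraint_type  CHAR(12)  NOT NULL,  -- e.g. PRIMARY KEY, FOREIGN KEY, CHECK
                definition       CHAR(240) NOT NULL   -- SQL-3 style textual definition
            );

            INSERT INTO ccves_constraints
            VALUES ('sno_pattern', 'student', 'CHECK', 'sno LIKE ''S%''');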

        When a database is augmented using this method, the new knowledge is available in the
database itself. Hence, any further re-engineering processes need not make requests for the same
additional knowledge. The augmented tables are created and maintained in a similar way to user-
defined tables, but have a special identification to distinguish them. Their structure is in line with
the international standards and the newer versions of commercial DBMSs, so that the enhanced
database can be easily migrated to either a newer version of the host DBMS or to a different
DBMS supporting the latest SQL standards. Migration should then mean that the newer system
can enforce the constraints. Our approach should also mean that it is easy to map our tables for
holding this information into the representation used by the target DBMS even if it is different, as
we are mapping from a well defined structure.

       Legacy databases that do not support explicit constraints can be enhanced by using the
above knowledge augmentation method. This requirement is less likely to occur for databases
managed by more recent DBMSs as they already hold some constraint specification information
in their system tables. The direction taken by Oracle version 6 was a step towards our
augmentation approach, as it allowed the database administrator to specify integrity constraints
such as primary and foreign keys, but did not yet enforce them [ROL92]. The next release of
Oracle, i.e. version 7, implemented this constraint enforcement process.


       Stage 3: Constraint Enforcement

        The enhanced schema can be held in the database, but the DBMS can only enforce these
constraints if it has the capability to do so. This will not normally be the case in legacy systems. In
this situation, the new constraints may be enforced via a newer version of the DBMS or by
migrating the database to another DBMS supporting constraint enforcement. However, the data
being held in the database may not conform to the new constraints, and hence existing data may
be rejected by the target DBMS in the migration, thus losing data and / or delaying the migration
process. To address this problem and to assist the migration process, we provide an optional
constraint enforcement process module which can be applied to a database before it is migrated.
The objective of this process is to give users the facility to ensure that the database conforms to all
the enhanced constraints before migration occurs. This process is optional so that the user can
decide whether these constraints should be enforced to improve the quality of the legacy data prior
to its migration, whether it is best left as it stands, or whether the new constraints are too severe.

       The constraint definitions in the augmented schema are employed to perform this task. As
all constraints held have already been internally represented in the form of logical expressions,
these can be used to produce data manipulation statements suitable for the host DBMS. Once
these statements are produced, they are executed against the current database to identify the
existence of data violating a constraint.
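
        For instance, a proposed primary key, foreign key or check constraint can be tested with
queries of the following kind, which retrieve the offending rows; the table and column names are
again illustrative:

            -- Rows violating a proposed primary key (duplicate SNO values)
            SELECT sno, COUNT(*)
            FROM   student
            GROUP BY sno
            HAVING COUNT(*) > 1;

            -- Rows violating a proposed foreign key from enrolment.sno to student.sno
            SELECT *
            FROM   enrolment e
            WHERE  NOT EXISTS (SELECT * FROM student s WHERE s.sno = e.sno);

            -- Rows violating the pattern constraint on SNO
            SELECT *
            FROM   student
            WHERE  sno NOT LIKE 'S%';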

       Stage 4: Migration Process

        The migration process itself is incrementally performed by initially creating the target
database and then copying the legacy data over to it. The schema meta-translation (SMTS)
technique of Ramfos [RAM91] is used to produce the target database schema. The legacy data can
be copied using the import / export tools of source and target DBMS or DML statements of the
respective DBMSs. During this process, the legacy applications must continue to function until
they too are migrated. To achieve this an interface can be used to capture and process all database
queries of the legacy applications during migration. This interface can decide how to process
database queries against the current state of the migration and redirect those that now relate to data
held in the target database. The query meta-translation (QMTS) technique of Howells [HOW87] can be used
to convert these queries to the target DML. This approach will facilitate transparent migration for
legacy databases. Our work does not involve the development of an interface to capture and
process all database queries, as interaction with the query interface of the legacy IS is embedded
in the legacy application code. However, we demonstrate how to create and populate a legacy
database schema in the desired target environment while showing the role of SMTS and QMTS in
such a process.
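
        A minimal sketch of the create-and-copy step is shown below, assuming both source and
target DBMSs accept SQL and that the legacy table is visible to the target system, for example
through an export/import facility; the table and column names are illustrative:

            -- Create the target table from the schema produced by SMTS
            CREATE TABLE student (
                sno      CHAR(6)  NOT NULL PRIMARY KEY,
                name     CHAR(30),
                address  CHAR(40)
            );

            -- Copy the legacy data; in practice such statements are generated via QMTS
            -- and issued through the DBC process, or replaced by export/import tools
            INSERT INTO student (sno, name, address)
            SELECT sno, name, address
            FROM   legacy_student;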

2.3 The Role of CCVES in the Context of Heterogeneous Distributed Databases

       Our approach described in section 2.2 is based on preparing a legacy database schema for
graceful migration. This involves visualisation of database schemas with constraints and
enhancing them with constraints to capture more knowledge. Hence we call our system the
Conceptualised Constraint Visualisation and Enhancement System (CCVES).

        CCVES has been developed to fit in with the previously developed schema (SMTS)
[RAM91] and query (QMTS) [HOW87] meta-translation systems, and the schema meta-
visualisation system (SMVS) [QUT93]. This allows us to consider the complementary roles of
CCVES, SMTS, QMTS and SMVS during Heterogeneous Distributed Database access in a
uniform way [FID92, QUT94]. The combined set of tools achieves semantic coordination and
promotes interoperability in a heterogeneous environment at logical, physical and data
management levels.

        Figure 2.2 illustrates the architecture of CCVES in the context of heterogeneous
distributed databases. It outlines in general terms the process of accessing a remote (legacy)
database to perform various database tasks, such as querying, visualisation, enhancement,
migration and integration.

        There are seven sub-processes: the schema mapping process [RAM91], query mapping
process [HOW87], schema integration process [QUT92], schema visualisation process [QUT93],
database connectivity process, database enhancement process and database migration process. The
first two processes together have been called the Integrated Translation Support Environment
[FID92], and the first four processes together have been called the Meta-Integration/Translation
Support Environment [QUT92]. The last three processes were introduced as CCVES to perform
database enhancement and migration in such an environment.

        The schema mapping process, referred to as SMTS, translates the definition of a source
schema to a target schema definition (e.g. an INGRES schema to a POSTGRES schema). The
query mapping process, referred to as QMTS, translates a source query to a target query (e.g. an
SQL query to a QUEL query). The meta-integration process, referred to as SMIS, tackles
heterogeneity at the logical level in a distributed environment containing multiple database
schemas (e.g. Ontos and Exodus local schemas with a POSTGRES global schema) - it integrates
the local schemas to create the global schema. The meta-visualisation process, referred to as
SMVS, generates a graphical representation of a schema. The remaining three processes, namely:
database connectivity, enhancement and migration with their associated processes, namely:
SMVS, SMTS and QMTS, are the subject of the present thesis, as they together form CCVES
(centre section of figure 2.2).

        The database connectivity process (DBC) queries meta-data from a remote database (route
A-1 in figure 2.2) to supply meta-knowledge (route A-2 in figure 2.2) to the schema mapping
process referred to as SMTS. SMTS translates this meta-knowledge to an internal representation
which is based on SQL schema constructs. These SQL constructs are supplied to SMVS for
further processing (route A-3 in figure 2.2) which results in the production of a graphical view of
the schema (route A-4 in figure 2.2). Our reverse-engineering techniques [WIK95b] are applied to
identify entity and relationship types to be used in the graphical model. Meta-knowledge
enhancements are solicited at this point by the database enhancement process (DBE) (route B-1 in
figure 2.2), which allows the definition of new constraints and changes to the existing schema.
These enhancements are reflected in the graphical view (route B-2 and B-3 in figure 2.2) and may
be used to augment the database (route B-4 to B-8 in figure 2.2). This approach to augmentation
makes use of the query mapping process, referred to as QMTS, to generate the required queries to
update the database via the DBC process. At this stage any existing or enhanced constraints may
be applied to the database to determine the extent to which it conforms to the new enhancements.
Carrying out this process will also ensure that legacy data will not be rejected by the target DBMS
due to possible violations. Finally, the database migration process, referred to as DBMI, assists
migration by incrementally migrating the database to the target environment (route C-1 to C-6 in
figure 2.2). Target schema constructs for each migratable component are produced via SMTS, and
DDL statements are issued to the target DBMS to create the new database schema. The data for
these migrated tables are extracted by instructing the source DBMS to export the source data to
the target database via QMTS. Here too, the queries which implement this export are issued to the
DBMS via the DBC process.

2.4 Research Aims and Objectives

       Our relational database enhancement and augmentation approach is important in three
respects, namely:

    1) by holding the additional defining information in the database itself, this information is
      usable by any design tool in addition to assisting the full automation of any future re-
      engineering of the same database;
    2) it allows better user understanding of database applications, as the associated constraints
      are shown in addition to the traditional entities and attributes at the conceptual level;




     3) the process which assists a database administrator to clean inconsistent legacy data ensures a
       safe migration. Performing this latter task in a real-world situation without an automated support
       tool is a very difficult, tedious, time-consuming and error-prone task.

        Therefore the main aim of this project has been the design and development of a tool to
assist database enhancement and migration in a heterogeneous distributed relational database
environment. Such a system is concerned with enhancing the constituent databases in this type of
environment to exploit potential knowledge both to automate the re-engineering process and to
assist in evolving and cleaning the legacy data to prevent data rejection, possible losses of data
and/or delays in the migration process. To this end, the following detailed aims and objectives
have been pursued in our research:

1. Investigation of the problems inherent in schema enhancement and migration for a
heterogeneous distributed relational legacy database environment, in order to fully understand
these processes.

2. Identification of the conceptual foundation on which to successfully base the design and
development of a tool for this purpose. This foundation includes:

    • A framework to establish meta-data representation and manipulation.
    • A real world data modelling framework that facilitates the enhancement of existing working
      systems and which supports applications during migration.
    • A framework to retain the enhanced knowledge for future use which is in line with current
      international standards and techniques used in newer versions of relational DBMSs.
    • Exploiting existing databases in new ways, particularly linking them with data held in other
      legacy systems or more modern systems.
    • Displaying the structure of databases in a graphical form to make it easy for users to
      comprehend their contents.
    • The provision of an interactive graphical response when enhancements are made to a
      database.
    • A higher level of data abstraction for tasks associated with visualising the contents,
      relationships and behavioural properties of entities and constraints.
    • Determining the constraints on the information held and the extent to which the data
      conforms to these constraints.
    • Integrating with other tools to maximise the benefits of the new tool to the user community.

3. Development of a prototype tool to automate the re-engineering process and the migration
assisting tasks as far as possible. The following development aims have been chosen for this
system:

    • It should provide a realistic solution to the schema enhancement and migration assistance
      process.
    • It should be able to access and perform this task for legacy database systems.
    • It should be suitable for the data model at which it is targeted.
    • It should be as generic as possible so that it can be easily customised for other data models.
    • It should be able to retain the enhanced knowledge for future analysis by itself and other tools.
    • It should logically support a model using modern data modelling techniques irrespective of
      whether it is supported by the DBMS in use.
    • It should make extensive use of modern graphical user interface facilities for all graphical
      displays of the database schema.
    • Graphical displays should also be as generic as possible so that they can be easily enhanced or
      customised for other display methods.




CHAPTER 3
                        Database Technology, Relational Model,
                     Conceptual Modelling and Integrity Constraints

The origins and historical development of database technology are initially presented here to trace
the evolution of ISs and the emergence of database models. The relational data model is identified as
currently the most commonly used database model, and some terminology for this data model, along
with its features, including query languages, is then presented. A discussion of conceptual data
models with special emphasis on EER and OMT is provided to introduce these data models and the
symbols used in our project. Finally, we pay attention to crucial concepts relating to our work,
namely the notion of semantic integrity constraints, with special emphasis on those used in semantic
extensions to the relational model. The relational database language SQL is also discussed,
identifying how and when it supports the implementation of these semantic integrity constraints.

3.1 Origins and Historical Developments

        The origin of data management goes back to the 1950's and hence this section is subdivided
into two parts: the first part describes database technology prior to the relational data model, and the
second part describes developments since. This division was chosen as the relational model is
currently the most dominant database model for information management [DAT90].

       3.1.1 Database Technology Prior to the Relational Data Model

        Database technology emerged from the need to manipulate large collections of data for
frequently used data queries and reports. The first major step in mechanisation of information
systems came with the advent of punched card machines which worked sequentially on fixed-length
fields [SEN73, SEN77]. With the appearance of stored program computers, tape-oriented systems
were used to perform these tasks with an increase in user efficiency. These systems used sequential
processing of files in batch mode, which was adequate until peripheral storage with random access
capabilities (e.g. DASD) and time sharing operating systems with interactive processing appeared to
support real-time processing in computer systems.

        Access methods such as direct and indexed sequential access methods (e.g. ISAM, VSAM)
[BRA82, MCF91] were used to assist with the storage and location of physical records in stored
files. Enhancements were made to procedural languages (e.g. COBOL) to define and manage
application files, making the application program dependent on the organisation of the file. This
technique caused data redundancy as several files were used in systems to hold the same data (e.g.
emp_name and address in a payroll file; insured_name and address in an insurance file; and
depositors_name and address in a bank file). These stored data files used in the applications of the
1960's are now referred to as conventional file systems, and they were maintained using third
generation programming languages such as COBOL and PL/1. This evolution of mechanised
information systems was influenced by the hardware and software developments which occurred in
the 1950’s and early 1960’s. Most long existing legacy ISs are based on this technology. Our work
does not address this type of IS as they do not use a DBMS for their data management.

       The evolution of databases and database management systems [CHA76, FRY76, SIB76,
SEN77, KIM79, MCG81, SEL87, DAT90, ELM94] was to a large extent the result of addressing the
main deficiencies in the use of files, i.e. by reducing data redundancy and making application
programs less dependent on file organisation. An important factor in this evolution was the
development of data definition languages which allowed the description of a database to be
separated from its application programs. This facility allowed the data definition (often called a
schema) to be shared and integrated to provide a wide variety of information to the users. The
repository of all data definitions (meta data) is called data dictionaries and their use allows data
definitions to be shared and widely available to the user community.

        In the late 1960's applications began to share their data files using an integrated layer of
stored data descriptions, making the first true database, e.g. the IMS hierarchical database [MCG77,
DAT90]. This type of database was navigational in nature and applications explicitly followed the
physical organisation of records in files to locate data using commands such as GNP - get next under
parent. These databases provided centralised storage management, transaction management,
recovery facilities in the event of failure and system maintained access paths. These were the typical
characteristics of early DBMSs.

        Work on extending COBOL to handle databases was carried out in the late 60s and 70s. This
resulted in the establishment of the DBTG (i.e. DataBase Task Group) of CODASYL and the formal
introduction of the network model along with its data manipulation commands [DBTG71]. The
relational model was proposed during the same period [COD70], followed by the 3 level
ANSI/SPARC architecture [ANSI75] which made databases more independent of applications, and
became a standard for the organisation of DBMSs. Three popular types of commercial database
systems7 classified by their underlying data model emerged during the 70s [DAT90, ELM94],
namely:

         • hierarchical
         • network
         • relational

and these have been the dominant types of DBMS from the late 60s on into the 80s and 90s.

         3.1.2 Database Technology Since the Relational Data Model

        At the same time as the relational data model appeared, database systems introduced another
layer of data description on top of the navigational functionality of the early hierarchical and
network models to bring extra logical data independence8. The relational model also introduced the
use of non-procedural (i.e. declarative) languages such as SQL [CHA74]. By the early 1980's many
relational database products, e.g. System R [AST76], DB2 [HAD84], INGRES [STO76] and Oracle,
were in use. Owing to their growing maturity in the mid 80s, and to the complexity of programming,
navigating and changing data structures in the older DBMS data models, the relational data model
was able to take over the commercial database market, with the result that it is now dominant.



   7 Other types, such as flat file and inverted file systems, were also used.
   8 This allows changes to the logical structure of data without changing the application programs.


       The advent of inexpensive and reliable communication between computer systems, through
the development of national and international networks, has brought further changes in the design of
these systems. These developments led to the introduction of distributed databases, where a
processor uses data at several locations and links it as though it were at a single site. This technology
has led to distributed DBMSs and the need for interoperability among different database systems
[OZS91, BEL92].

        Several shortcomings of the relational model have been identified, including its inability to
support compute-intensive applications such as simulation efficiently, to cope with computer-aided
design (CAD) and programming language environments, and to represent and manipulate effectively
concepts such as [KIM90]:

       • Complex nested entities (e.g. design and engineering objects),
       • Unstructured data (e.g. images, textual documents),
       • Generalisation and aggregation within a data structure,
       • The notion of time and versioning of objects and schemas,
       • Long duration transactions.

        The notion of a conceptual schema for application-independent modelling introduced by the
ANSI/SPARC architecture led to another data model, namely: the semantic model. One of the most
successful semantic models is the entity-relationship (E-R) model [CHE76]. Its concepts include
entities, relationships, value sets and attributes. These concepts are used in traditional database
design as they are application-independent. Many modelling concepts based on variants/extensions
to the E-R model have appeared since Chen’s paper. The enhanced/extended entity-relationship
model (EER) [TEO86, ELM94], the entity-category-relationship model (ECR) [ELM85], and the
Object Modelling Technique (OMT) [RUM91] are the most popular of these.

        The DAPLEX functional model [SHI81] and the Semantic Data Model [HAM81] are also
semantic models. They capture a richer set of semantic relationships among real-world entities in a
database than the E-R based models. Semantic relationships such as generalisation / specialisation
between a superclass and its subclass, the aggregation relationship between a class and its attributes,
the instance-of relationship between an instance and its class, the part-of relationship between
objects forming a composite object, and the version-of relationship between abstracted versioned
objects are semantic extensions supported in these models. The object-oriented data model with its
notions of class hierarchy, class-composition hierarchy (for nested objects) and methods could be
regarded as a subset of this type of semantic data model in terms of its modelling power, except for
the fact that the semantic data model lacks the notion of methods [KIM90] which is an important
aspect of the object-oriented model.

       The relational model of data and the relational query language have been extended [ROW87]
to allow modelling and manipulation of additional semantic relationships and database facilities.
These extensions include data abstraction, encapsulation, object identity, composite objects, class
hierarchies, rules and procedures. However, these extended relational systems are still being
evolved to fully incorporate features such as implementation of domain and extended data types,
enforcement of primary and foreign key and referential integrity checking, prohibition of duplicate
rows in tables and views, handling missing information by supporting four-valued predicate logic
(i.e. true, false, unknown, not applicable) and view updatability [KIV92], and they are not yet
available as commercial products.

        The early 1990's saw the emergence of new database systems by a natural evolution of
database technology, with many relational database systems being extended and other data models
(e.g. the object-oriented model) appearing to satisfy more diverse application needs. This opened
opportunities to use databases for a greater diversity of applications which had not been previously
exploited as they were not perceived as tractable by a database approach (e.g. image, medical,
document management, engineering design and multi-media information, used in complex
information processing applications such as office automation (OA), computer-aided design (CAD),
computer-aided manufacturing (CAM) and hyper media [KIM90, ZDO90, CAT94]). The object-
oriented (O-O) paradigm represents a sound basis for making progress in these areas and as a result
two types of DBMS are beginning to dominate in the mid 90s [ZDO90], namely: the object-oriented
DBMS, and the extended relational DBMS.

        There are two styles of O-O DBMS, depending on whether they have evolved from
extensions to an O-O programming language or from extensions to a database model. Extensions have been
created for two database models, namely: the relational and the functional models. The extensions to
existing relational DBMSs have resulted in the so-called Extended Relational DBMSs which have
O-O features (e.g. POSTGRES and Starburst), while extensions to the functional model have
produced PROBE and OODAPLEX. The approach of extending O-O programming language
systems with database management features has resulted in many systems (e.g. Smalltalk into
GemStone and ALLTALK, and C++ into many DBMSs including VBase / ONTOS, IRIS and O2).
References to these systems with additional information and references can be found in [CAT94].

       Research is currently taking place into other kinds of database such as active, deductive and
expert database systems [DAT90]. This thesis focuses on the relational model and possible
extensions to it which can represent semantics in existing relational database information systems in
such a way that these systems can be viewed in new ways and easily prepared for migration to more
modern database environments.

3.2 Relational Data Model

        In this section we introduce some of the commonly used terminology of the relational model.
This is followed by a selective description of the features and query languages of this model. Further
details of this data model can be found in most introductory database text books, e.g. [MCF91,
ROB93, ELM94, DAT95].

        A relation is represented as a table (entity) in which each row represents a tuple (record), the
number of columns being the degree of the relation and the number of rows being its cardinality. An
example of this representation is shown in figure 3.1, which shows a relation holding Student details,
with degree 3 and cardinality 5. This table and each of its columns are named, so that a unique
identity for a table column of a given schema is achieved via its table name and column name. The
columns of a table are called attributes (fields) each having its own domain (data type) representing
its pool of legal data. Basic types of domains are used (e.g. integer, real, character, text, date) to
define the domains of attributes. Constraints may be enforced to further restrict the pool of legal



Page 31
Chapter 3              Database Technology, Relational Model, Conceptual Modelling and Integrity
Constraints
values for an attribute. Tables which actually hold data are called base tables to distinguish them
from view tables which can be used for viewing data associated with one or more base tables. A
view table can also be an abstraction from a single base table which is used to control access to parts
of the data.
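
        For instance, a view giving access to only part of the Student relation of figure 3.1 might be
defined as follows; this is an illustrative definition only:

            -- A view exposing only the students based in Cardiff
            CREATE VIEW cardiff_students (sno, name) AS
            SELECT sno, name
            FROM   student
            WHERE  address = 'Cardiff';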

        A column or set of columns whose values uniquely identify a row of a relation is called a
candidate key (key) of the relation. It is customary to designate one candidate key of a relation as a
primary key (e.g. SNO in figure 3.1). The specification of keys restricts the possible values the key
attribute(s) may hold (e.g. no duplicate values), and is a type of constraint enforceable on a relation.
Additional constraints may be imposed on an attribute to further restrict its legal values. In such
cases, there should be a common set of legal values satisfying all the constraints of that attribute,
ensuring its ability to accept some data. For example, a pattern constraint which ensures that the first
character of SNO is ‘S’ further restricts the possible values of SNO - see figure 3.1. Many other
concepts and constraints are associated with the relational model, although most of them are not
supported by early relational systems, nor indeed by some of the more recent relational systems (e.g. a
value set constraint for the Address field as shown in figure 3.1).

        [Figure 3.1: The Student relation - a table of degree 3 (attributes SNO, Name and Address)
and cardinality 5 (tuples S1 Jones Cardiff, S2 Smith Bristol, S3 Gray Swansea, S4 Brown Cardiff,
S5 Jones Newport). The annotations mark SNO as the primary key (unique values) drawn from a
character domain, a pattern constraint requiring all SNO values to begin with ‘S’, and a value set
constraint on the Address field.]
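
        A possible SQL definition of this relation, expressing the primary key, pattern and value set
constraints of figure 3.1, is sketched below; it assumes a DBMS supporting SQL-3 style check
constraints, and the value set shown is purely illustrative:

            CREATE TABLE student (
                sno      CHAR(6)  NOT NULL PRIMARY KEY,            -- primary key: unique values
                name     CHAR(30),
                address  CHAR(40),
                CONSTRAINT sno_pattern CHECK (sno LIKE 'S%'),      -- pattern constraint
                CONSTRAINT address_set CHECK (address IN
                    ('Cardiff', 'Bristol', 'Swansea', 'Newport'))  -- value set constraint
            );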


       3.2.1 Requisite Features of the Relational Model

        During the early stages of the development of relational database systems, many requisite
features were identified which a comprehensive relational system should have [KIM79, DAT90].
We shall now examine these features to illustrate the kind of features expected from early relational
database management systems. They included support for:

      • Recovery from both soft and hard crashes,
      • A report generator for formatted display of the results of queries,
      • An efficient optimiser to meet the response-time requirements of users,
      • User views of the stored database,


Page 32
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases
Assisting Migration and Evolution of Relational Legacy Databases

More Related Content

What's hot

Topic1 Understanding Distributed Information Systems
Topic1 Understanding Distributed Information SystemsTopic1 Understanding Distributed Information Systems
Topic1 Understanding Distributed Information Systemssanjoysanyal
 
16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSNDhaya kanthavel
 
Case Study: Synchroniztion Issues in Mobile Databases
Case Study: Synchroniztion Issues in Mobile DatabasesCase Study: Synchroniztion Issues in Mobile Databases
Case Study: Synchroniztion Issues in Mobile DatabasesG. Habib Uddin Khan
 
DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...
DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...
DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...Tal Lavian Ph.D.
 
Case4 lego embracing change by combining bi with flexible information system 2
Case4  lego embracing change by combining bi with flexible  information system 2Case4  lego embracing change by combining bi with flexible  information system 2
Case4 lego embracing change by combining bi with flexible information system 2dyadelm
 
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEMLEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEMmyteratak
 
Pmit 6102-14-lec1-intro
Pmit 6102-14-lec1-introPmit 6102-14-lec1-intro
Pmit 6102-14-lec1-introJesmin Rahaman
 
0321210255 ch01
0321210255 ch010321210255 ch01
0321210255 ch01MsKamala
 
Toward Cloud Computing: Security and Performance
Toward Cloud Computing: Security and PerformanceToward Cloud Computing: Security and Performance
Toward Cloud Computing: Security and Performanceijccsa
 
A database management system
A database management systemA database management system
A database management systemghulam120
 
Lesson - 02 Network Design and Management
Lesson - 02 Network Design and ManagementLesson - 02 Network Design and Management
Lesson - 02 Network Design and ManagementAngel G Diaz
 
Distributed Systems - Information Technology
Distributed Systems - Information TechnologyDistributed Systems - Information Technology
Distributed Systems - Information TechnologySagar Mehta
 
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTUREA HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTUREijccsa
 

What's hot (15)

Topic1 Understanding Distributed Information Systems
Topic1 Understanding Distributed Information SystemsTopic1 Understanding Distributed Information Systems
Topic1 Understanding Distributed Information Systems
 
16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN
 
Case Study: Synchroniztion Issues in Mobile Databases
Case Study: Synchroniztion Issues in Mobile DatabasesCase Study: Synchroniztion Issues in Mobile Databases
Case Study: Synchroniztion Issues in Mobile Databases
 
DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...
DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...
DWDM-RAM: a data intensive Grid service architecture enabled by dynamic optic...
 
Case4 lego embracing change by combining bi with flexible information system 2
Case4  lego embracing change by combining bi with flexible  information system 2Case4  lego embracing change by combining bi with flexible  information system 2
Case4 lego embracing change by combining bi with flexible information system 2
 
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEMLEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
 
Pmit 6102-14-lec1-intro
Pmit 6102-14-lec1-introPmit 6102-14-lec1-intro
Pmit 6102-14-lec1-intro
 
0321210255 ch01
0321210255 ch010321210255 ch01
0321210255 ch01
 
Toward Cloud Computing: Security and Performance
Toward Cloud Computing: Security and PerformanceToward Cloud Computing: Security and Performance
Toward Cloud Computing: Security and Performance
 
A database management system
A database management systemA database management system
A database management system
 
Fs2510501055
Fs2510501055Fs2510501055
Fs2510501055
 
Lesson - 02 Network Design and Management
Lesson - 02 Network Design and ManagementLesson - 02 Network Design and Management
Lesson - 02 Network Design and Management
 
Distributed Systems - Information Technology
Distributed Systems - Information TechnologyDistributed Systems - Information Technology
Distributed Systems - Information Technology
 
Case study 9
Case study 9Case study 9
Case study 9
 
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTUREA HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
 

Assisting Migration and Evolution of Relational Legacy Databases

  • 1. Assisting Migration and Evolution of Relational Legacy Databases by G.N. Wikramanayake Department of Computer Science, University of Wales Cardiff, Cardiff September 1996
  • 3. Abstract The research work reported here is concerned with enhancing and preparing databases with limited DBMS capability for migration to keep up with current database technology. In particular, we have addressed the problem of re-engineering heterogeneous relational legacy databases to assist them in a migration process. Special attention has been paid to the case where the legacy database service lacks the specification, representation and enforcement of integrity constraints. We have shown how knowledge constraints of modern DBMS capabilities can be incorporated into these systems to ensure that when migrated they can benefit from the current database technology. To this end, we have developed a prototype conceptual constraint visualisation and enhancement system (CCVES) to automate as efficiently as possible the process of re-engineering for a heterogeneous distributed database environment, thereby assisting the global system user in preparing their heterogeneous database systems for a graceful migration. Our prototype system has been developed using a knowledge based approach to support the representation and manipulation of structural and semantic information about schemas that the re-engineering and migration process requires. It has a graphical user interface, including graphical visualisation of schemas with constraints using user preferred modelling techniques for the convenience of the user. The system has been implemented using meta-programming technology because of the proven power and flexibility that this technology offers to this type of research applications. The important contributions resulting from our research includes extending the benefits of meta- programming technology to the very important application area of evolution and migration of heterogeneous legacy databases. In addition, we have provided an extension to various relational database systems to enable them to overcome their limitations in the representation of meta-data. These extensions contribute towards the automation of the reverse-engineering process of legacy databases, while allowing the user to analyse them using extended database modelling concepts. Page v
  • 4. CHAPTER 1 Introduction This chapter introduces the thesis. Section 1.1 is devoted to the background and motivations of the research undertaken. Section 1.2 presents the broad goals of the research. The original achievements which have resulted from the research are summarised in Section 1.3. Finally, the overall organisation of the thesis is described in Section 1.4. 1.1 Background and Motivations of the Research Over the years rapid technological changes have taken place in all fields of computing. Most of these changes have been due to the advances in data communications, computer hardware and software [CAM89] which together have provided a reliable and powerful networking environment (i.e. standard local and wide area networks) that allow the management of data stored in computing facilities at many nodes of the network [BLI92]. These changes have turned round the hardware technology from centralised mainframes to networked file-server and client-server architectures [KHO92] which support various ways to use and share data. Modern computers are much more powerful than the previous generations and perform business tasks at a much faster rate by using their increased processing power [CAM88, CAM89]. Simultaneous developments in the software industry have produced techniques (e.g. for system design and development) and products capable of utilising the new hardware resources (e.g. multi-user environments with GUIs). These new developments are being used for a wide variety of applications, including modern distributed information processing applications, such as office automation where users can create and use databases with forms and reports with minimal effort, compared to the development efforts using 3GLs [HIR85, WOJ94]. Such applications are being developed with the aid of database technology [ELM94, DAT95] as this field too has advanced by allowing users to represent and manipulate advanced forms of data and their functionalities. Due to the program data independence feature of DBMSs the maintenance of database application programs has become easier as functionalities that were traditionally performed by procedural application routines are now supported declaratively using database concepts such as constraints and rules. In the field of databases, the recent advances resulting from technological transformation include many areas such as the use of distributed database technology [OZS91, BEL92], object- oriented technology [ATK89, ZDO90], constraints [DAT83, GRE93], knowledge-based systems [MYL89, GUI94], 4GLs and CASE tools [COMP90, SCH95, SHA95]. Meanwhile, the older technology was dealing with files and primitive database systems which now appear inflexible, as the technology itself limits them from being adapted to meet the current changing business needs catalysed by newer technologies. The older systems which have been developed using 3GLs and in operation for many years, often suffer from failures, inappropriate functionality, lack of documentation, poor performance and are referred to as legacy information systems [BRO93, COMS94, IEE94, BRO95, IEEE95]. The current technology is much more flexible as it supports methods to evolve (e.g. 4GLs, CASE tools, GUI toolkits and reusable software libraries [HAR90, MEY94]), and can share resources through software that allows interoperability (e.g. ODBC [RIC94, GEI95]). This evolution
  • 5. reflects the changing business needs. However, modern systems need to be properly designed and implemented to benefit from this technology, which may still be unable to prevent such systems themselves being considered to be legacy information systems in the near future due to the advent of the next generation of technology with its own special features. The only salvation would appear to be building in evolution paths in the current systems. The increasing power of computers and their software has meant they have already taken over many day to day functions and are taking over more of these tasks as time passes. Thus computers are managing a larger volume of information in a more efficient manner. Over the years most enterprises have adopted the computerisation option to enable them to efficiently perform their business tasks and to be able to compete with their counterparts. As the performance ability of computers has increased, the enterprises still using early computer technology face serious problems due to the difficulties that are inherent in their legacy systems. This means that new enterprises using systems purely based on the latest technology have an advantage over those which need to continue to use legacy information systems (ISs), as modern ISs have been developed using current technology which provides not only better performance, but also utilises the benefits of improved functionality. Hence, managers of legacy IS enterprises want to retire their legacy code and use modern database management systems (DBMSs) in the latest environment to gain the full benefits from this newer technology. However they want to use this technology on the information and data they already hold as well as on data yet to be captured. They also want to ensure that any attempts to incorporate the modern technology will not adversely affect the ongoing functionality of their existing systems. This means legacy ISs need to be evolved and migrated to a modern environment in such a way that the migration is transparent to the current users. The theme of this thesis is how we can support this form of system evolution. 1.1.1 The Barriers to Legacy Information System Migration Legacy ISs are usually those systems that have stood the test of time and have become a core service component for a business’s information needs. These systems are a mix of hardware and software, sometimes proprietary, often out of date, and built to earlier styles of design, implementation and operation. Although they were productive and fulfilled their original performance criteria and their requirements, these systems lack the ability to change and evolve. The following can be seen as barriers to evolution in legacy IS [IEE94]. • The technology used to build and maintain the legacy IS is obsolete, • The system is unable to reflect changes in the business world and to support new needs, • The system cannot integrate with other sub-systems, • The cost, time and risk involved in producing new alternative systems to the legacy IS. The risk factor is that a new system may not provide the full functionality of the current system for a period because of teething problems. Due to these barriers, large organisations [PHI94] prefer to write independent sub-systems to perform new tasks using modern technology which will run alongside the existing systems, rather than attempt to achieve this by adapting existing code or by writing a new system that replaces the old and has new facilities as well. 
We see the following immediate advantages of this low-risk approach. Page 4
  • 6. • The performance, reliability and functionality of the existing system is not affected, • New applications can take advantage of the latest technology, • There is no need to retrain those staff who only need the facilities of the old system. However with this approach, as business requirements evolve with time, more and more new needs arise, resulting in the development and regular use of many diverse systems within the same organisation. Hence, in the long term the above advantages are overshadowed by the more serious disadvantages of this approach, such as: • The existing systems continue to exist and are legacy IS running on older and older technology, • The need to maintain many different systems to perform similar tasks increases the maintenance and support costs of the organisation, • Data becomes duplicated in different systems which implies the maintenance of redundant data with its associated increased risk of inconsistency between the data copies if updating occurs, • The overall maintenance cost for hardware, software and support personnel increases as many platforms are being supported, • The performance of the integrated information functions of the organisation decreases due to the need to interface many disparate systems. To address the above issues, legacy ISs need to be evolved and migrated to new computing environments, when their owning organisation upgrades. This migration should occur within a reasonable time after the upgrade occurs. This means that it is necessary to migrate legacy ISs to new target environments in order to allow the organisation to dispose of the technology which is becoming obsolete. Managers of some enterprises have chosen an easy way to overcome this problem, by emulating [CAM89, PHI94] the current environment on the new platforms (e.g. AS/400 emulators for IBM S/360 and ICL’s DME emulators for 1900 and System 4 users). An alternative strategy is achieved by translating [SHA93, PHI94, SHE94, BRO95] the software to run in new environments (i.e. code-to-code level translation). The emulator approach perpetuates all the software deficiencies of the legacy ISs although successfully removing the old-fashioned hardware technology and so it does enjoy the increased processing power of the new hardware. The translation approach takes advantage of some of the modern technological benefits in the target environment as the conversions - such as IBM’s JCL and ICL’s SCL code to Unix shell scripts, Assembler to COBOL, COBOL to COBOL embedded with SQL, and COBOL data structures to relational DBMS tables - are also done as part of the translation process. This approach, although a step forward, still carries over most of the legacy code as legacy systems are not evolved by this process. For example, the basic design is not changed. Hence the barrier to change and/or integration to a common sub- system still remains, and the translated systems were not designed for the environment they are now running in, so they may not be compatible with it. There are other approaches to overcoming this problem which have been used by enterprises [SHA93, BRO95]. These include re-implementing systems under the new environment and/or upgrading existing systems to achieve performance improvements. As computer technology continues to evolve at an ever quicker pace the need to migrate arises more rapidly. This means, most small organisations and individuals are left behind and are forced to work in a technologically Page 5
  • 7. obsolete environment, mainly due to the high cost of frequently migrating to newer systems and/or upgrading existing software, as this process involves time and manpower which cost money. The gap between the older and newer system users will very soon create a barrier to information sharing unless some tools are developed to assist the older technology users’ migration to new technology environments. This assistance for the older technology users may take many forms, including tools for: analysing and understanding existing systems; enhancing and modifying existing systems; migrating legacy ISs to newer platforms. The complete migration process for a legacy IS needs to consider these requirements and many other aspects, as recently identified by Brodie and Stonebraker in [BRO95]. Our work was primarily motivated by these business oriented legacy database issues and by work in the area of extending relational database technology to enable it to represent more knowledge about its stored data [COD79, STO86a, STO86b, WIK90]. This second consideration is an important aspect of legacy system migration, since if a graceful migration is to be achieved we must be able to enhance a legacy relational database with such knowledge to take full advantage of the new system environment. 1.1.2 Heterogeneous Distributed Environments As well as the problem of having to use legacy ISs, most large enterprises are faced with the problem of heterogeneity and the need for interoperability between existing ISs [IMS91]. This arises due to the increased use of different computer systems and software tools for information processing within an organisation as time passes. The development of networking capabilities to manage and share information stored over a network has made interoperability a requirement and local area networks finding broad acceptance in business enterprises has enhanced the need to perform this task within organisations. Network file servers, client-server technology and the use of distributed databases [OZS91, BEL92, KHO92] are results of these challenging innovations. This technology is currently being used to create and process information held in heterogeneous databases, which involves linking different databases in an interoperable environment. An aspect of this work is legacy database interoperation, since as time passes these databases will have been built using different generations of software. In recent years, the demand for distributed database capabilities has been fuelled mostly by the decentralisation of business functions in large organisations to address customer needs, and by mergers and acquisitions that have taken place in the corporate world. As a consequence, there is a strong requirement among enterprises for the ability to cross-correlate data stored in different existing heterogeneous databases. This has led to the development of products referred to as gateways, to enable users to link different databases together, e.g. Microsoft’s Open Database Connectivity (ODBC) drivers can link Access, FoxPro, Btrieve, dBASE and Paradox databases together [COL94, RIC94]. There are similar products for other database vendors, such as Oracle1 [HOL93] and others2 [PUR93, SME93, RIC94, BRO95]. Database vendors have targetted cross- platform compatibility via SQL access protocols to support interoperability in a heterogeneous environment. As heterogeneity in distributed systems may occur in various forms ranging from 1 For IBM’s DB2, UNISYS’s DMS, DEC RMS. 
2 For INGRES, SYBASE, Informix and other popular SQL DBMSs. 3 During the life-time of this project the SQL-3 standards moved from a preliminary draft, through several modifications before being finalised in 1995. Page 6
  • 8. different hardware platforms, operating systems, networking protocols and local database systems, cross-platform compatibility via SQL provides only a simple form of heterogeneous distributed database access. The biggest challenge comes in addressing heterogeneity due to differences in local databases [OZS91, BEL92]. This challenge is also addressed in the design and development of our system. Distributed DBMSs have become increasingly popular in organisations as they offer the ability to interconnect existing databases, as well as having many other advantages [OZS91, BEL92]. The interconnection of existing databases leads to two types of distributed DBMS, namely: homogeneous and heterogeneous distributed DBMSs. In homogeneous systems all of the constituent nodes run the same DBMS and the databases can be designed in harmony with each other. This simplifies both the processing of queries at different nodes and the passing of data between nodes. In heterogeneous systems the situation is more complex, as each node can be running a different DBMS and the constituent databases can be designed independently. This is the normal situation when we are linking legacy databases, as the DBMS and the databases used are more likely to be heterogeneous since they are usually implemented for different platforms during different technological eras. In such a distributed database environment, heterogeneity may occur in various forms, at different levels [OZS91, BEL92], namely:
• The logical level (i.e. involving different database designs),
• The data management level (i.e. involving different data models),
• The physical level (i.e. involving different hardware, operating systems and network protocols), and
• At all three or any pair of these levels.
1.1.3 The Problems and Search for a Solution
The concept of heterogeneity itself is valuable as it allows designers a freedom of choice between different systems and design approaches, thus enabling them to identify those most suitable for different applications. The exploitation of this freedom over the years in many organisations has resulted in the creation of multiple local and remote information systems which now need to be made interoperable to provide an efficient and effective information service to the enterprise managers. Open database connectivity (ODBC) [RIC94, GEI95] and its standards have been proposed to support interoperability among databases managed by different DBMSs. Database vendors such as Oracle, INGRES, INFORMIX and Microsoft have already produced tools, engines and connectivity products to fulfil this task [HOL93, PUR93, SME93, COL94, RIC94, BRO95]. These products allow limited data transfer and query facilities among databases to support interoperability among heterogeneous DBMSs. These features, although they permit easy, transparent heterogeneous database access, still do not provide a solution for legacy ISs where a primary concern is to evolve and migrate the system to a target environment so that obsolete support systems can be retired. Furthermore, the ODBC facilities are developed for current DBMSs and hence may not be capable of accessing older generation DBMSs, and, if they are, are unlikely to be able to enhance them to take advantage of the newer technologies. Hence there is a need to create tools that will allow ODBC-equivalent functionality for older generation DBMSs. Our work provides such functionality for all the DBMSs we have chosen for this research.
It also provides the ability to enhance and evolve legacy databases. Page 7
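To make this form of heterogeneity concrete, the following sketch shows the same request expressed for two of the DBMS families discussed above: INGRES's QUEL and an SQL-based system such as Oracle. It is purely illustrative; the table and column names (employee, dept, name) are hypothetical and are not drawn from the test databases described later in the thesis. Bridging exactly this kind of difference is one of the internal mappings (QUEL to SQL and vice-versa) handled by CCVES, as described in Chapter 7.

    -- The same retrieval under two heterogeneous DBMSs (illustrative names only)
    --
    -- INGRES QUEL form:
    --   range of e is employee
    --   retrieve (e.name) where e.dept = "Sales"
    --
    -- Equivalent SQL form for an SQL-based DBMS such as Oracle:
    SELECT name
    FROM   employee
    WHERE  dept = 'Sales';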
  • 9. In order to evolve an information system, one needs to understand the existing system’s structure and code. Most legacy information systems are not properly documented and hence understanding such systems is a complex process. This means that changing any legacy code involves a high risk as it could result in unexpected system behaviour. Therefore one needs to analyse and understand existing system code before performing any changes to the system. Database system design and implementation tools have appeared recently which have the aim of helping new information system development. Reverse and re-engineering tools are also appearing in an attempt to address issues concerned with existing databases [SHA93, SCH95]. Some of these tools allow the examination of databases built using certain types of DBMSs, however, the enhancements they allow are done within the limitation of that system. Due to continuous ongoing technology changes, most current commercial DBMSs do not support the most recent software modelling techniques and features (e.g. Oracle version 7 does not support Object-Oriented features). Hence a system built using current software tools is guaranteed to become a legacy system in the near future (i.e. when new products with newer techniques and features begin to appear in the commercial market place). Reverse engineering tools [SHA93] are capable of recreating the conceptual model of an existing database and hence they are an ideal starting point when trying to gain a comprehensive understanding of the information held in the database and its current state, as they create a visual picture of that state. However, in legacy systems the schemas are basic, since most of the information used to compose a conceptual model is not available in these databases. Information such as constraints that show links between entities is usually embedded in the legacy application code and users find it difficult to reverse engineer these legacy ISs. Our work addresses these issues while assisting in overcoming this barrier within the knowledge representation limitations of existing DBMSs. 1.1.4 Primary and Secondary Motivations The research reported in this thesis therefore was primarily promoted by the need to provide, for a logically heterogeneous distributed database environment, a design tool that allows users not only to understand their existing systems but also to enhance and visualise an existing database’s structure using new techniques that are either not yet present in existing systems or not supported by the existing software environment. It was also motivated by: a) Its direct applicability in the business world, as the new technique can be applied to incrementally enhance existing systems and prepare them to be easily migrated to new target environments, hence avoiding continued use of legacy information systems in the organisation. Although previous work and some design tools address the issue of legacy information system analysis, evolution and migration, these are mainly concerned with 3GL languages such as COBOL and C [COMS94, BRO95, IEEE95]. Little work has been reported which addresses the new issues that arise due to the Object-Oriented (O-O) data model or the extended relational data model [CAT94]. There are no reports yet of enhancing legacy systems so that they can migrate to O-O or extended relational environments in a graceful migration from a relational system. There has been Page 8
  • 10. some work in the related areas of identifying extended entity relationship structures in relational schemas, and some attempts at reverse-engineering relational databases [MAR90, CHI94, PRE94]. b) The lack of previous research in visualising pre-existing heterogeneous database schemas and evolving them by enhancing them with modern concepts supported in more recent releases of software. Most design tools [COMP90, SHA93] which have been developed to assist in Entity- Relationship (E-R) modelling [ELM94] and Object Modelling Technique (OMT) modelling [RUM91] are used in a top-down database design approach (i.e. forward engineering) to assist in developing new systems. However, relatively few tools attempt to support a bottom-up approach (i.e. reverse engineering) to allow visualisation of pre-existing database schemas as E-R or OMT diagrams. Among these tools only a very few allow enhancement of the pre-existing database schemas, i.e. they apply forward engineering to enhance a reverse-engineered schema. Even those which do permit this action to some extent, always operate on a single database management system and work mostly with schemas originally designed using such systems (e.g. CASE tools). The tools that permit only the bottom-up approach are referred to as reverse-engineering tools and those which support both (i.e. bottom-up and top-down) are called re-engineering tools [SHA93]. This thesis is primarily concerned with creating re-engineering tools that assist legacy database migration. The commercially available re-engineering tools are customised for particular DBMSs and are not easily usable in a heterogeneous environment. This barrier against widespread usability of re- engineering tools means that a substantial adaptation and reprogramming effort (costing time and money) is involved every time a new DBMS appears in a heterogeneous environment. An obvious example that reflects this limitation arises in a heterogeneous distributed database environment where there may be a need to visualise each participant database’s schema. In such an environment if the heterogeneity occurs at the database management level (where each node uses a different DBMS, for example, one node uses INGRES [DAT87] and another uses Oracle [ROL92]), then we have to use two different re-engineering tools to display these schemas. This situation is exacerbated for each additional DBMS that is incorporated into the given heterogeneous context. Also, legacy databases are migrated to different DBMS environments as newer versions and better database products have appeared since the original release of their DBMS. This means that a re-engineering tool that assists legacy database migration must work in an heterogeneous environment so that its use will not be restricted to particular types of ISs. Existing re-engineering tools provide a single target graphical data model (usually the E-R model or a variant of it), which may differ in presentation style between tools and therefore inhibits the uniformity of visualisation that is highly desirable in an interoperable heterogeneous distributed database environment. This limitation means that users may need to use different tools to provide the required uniformity of display in such an environment. The ability to visualise the conceptual model of an information system using a user-preferred graphical data model is important as it ensures that no inaccurate enhancements are made to the system due to any misinterpretation of graphical notations used. 
c) The need to apply rules and constraints to pre-existing databases to identify and clean inconsistent legacy data, as preparation for migration or as an enhancement of the database’s quality. Page 9
  • 11. The inability to define and apply rules and constraints on early database systems due to system limitations resulted in them not using constraints to increase the accuracy and consistency of the data held by these systems. This limitation is now a barrier to information system migration as a new target DBMS is unable to enforce constraints on a migrated database until all violations are investigated and resolved either by omitting the violating data or by cleaning it. This investigation may also show that a constraint has to be adjusted as the violating data is needed by the organisation. The enhancement of such a system by rules and constraints provides knowledge that is usable to determine possible data violations. The process of detecting constraint violations may be done by applying queries that are generated from these enhanced constraints. Similar methods have been used to implement integrity constraints [STO75], optimise queries [OZS91] and obtain intensional answers [FON92, MOT89]. This is essential as constraints may have been implemented at the application coding level and that can lead to their inconsistent application.
d) An awareness of the potential contribution that knowledge-based systems and meta-programming technologies, in association with extended relational database technology, have to offer in coping with semantic heterogeneity.
The successful production of a conceptual model is highly dependent on the semantic information available, and on the ability to reason about these semantics. A knowledge-based system can be used to assist in this task, as the process to generalise effective exploitation of semantic information for pre-existing heterogeneous databases needs to undergo three sub-processes, namely: knowledge acquisition, representation and manipulation. The knowledge acquisition process extracts the existing knowledge from a database’s data dictionaries. This knowledge may include subsequent enhancements made by the user, as the use of a database to store such knowledge will provide easy access to this information along with its original knowledge. The knowledge representation process represents existing and enhanced knowledge. The knowledge manipulation process is concerned with deriving new knowledge and ensuring consistency of existing knowledge. These stages are addressable using specific processes. For instance, the reverse-engineering process used to produce a conceptual model can be used to perform the knowledge acquisition task. Then the derived and enhanced knowledge can be stored in the same database by adopting a process that will allow us to distinguish this knowledge from its original meta-data. Finally, knowledge manipulation can be done with the assistance of a Prolog based system [GRA88], while data and knowledge consistency can be verified using the query language of the database.
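As a purely illustrative sketch of the two points above, an enhanced referential constraint might be recorded as ordinary meta-data in the database being re-engineered, and a violation-detection query might then be generated mechanically from that record. The table and column names used (x_enhanced_constraints, enrolment, student, student_id) are hypothetical; they do not reproduce the SQL-3 conforming layout or the query generation process defined later in the thesis.

    -- 1. Hold the enhanced knowledge alongside, but distinguishable from, the
    --    DBMS's own data dictionary (illustrative layout only).
    CREATE TABLE x_enhanced_constraints (
        constraint_name  VARCHAR(32),
        child_table      VARCHAR(32),
        child_column     VARCHAR(32),
        parent_table     VARCHAR(32),
        parent_column    VARCHAR(32)
    );
    INSERT INTO x_enhanced_constraints
    VALUES ('fk_enrolment_student', 'enrolment', 'student_id', 'student', 'student_id');

    -- 2. Query generated from that specification to detect violating legacy data:
    --    enrolment rows that refer to no existing student.
    SELECT e.*
    FROM   enrolment e
    WHERE  e.student_id IS NOT NULL
      AND  NOT EXISTS (SELECT 1
                       FROM   student s
                       WHERE  s.student_id = e.student_id);

Rows returned by such a query must either be cleaned or the constraint itself adjusted before the constraint can be enforced in the target DBMS.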
1.2 Goals of the Research
The broad goals of the research reported in this thesis are highlighted here, with detailed aims and objectives presented in section 2.4. These goals are to investigate interoperability problems, schema enhancement and migration in a heterogeneous distributed database environment, with particular emphasis on extended relational systems. This should provide a basis for the design and implementation of a prototype software system that brings together new techniques from the areas of knowledge-based systems, meta-programming and O-O conceptual data modelling with the aim of facilitating schema enhancement, by means of generalising the efficient representation of constraints using the current standards. Such a system is a tool that would be a valuable asset in a logically heterogeneous distributed extended relational database environment as it would make it possible for Page 10
  • 12. global users to incrementally enhance legacy information systems. This offers the potential for users in this type of environment to work in terms of such a global schema, through which they can prepare their legacy systems to easily migrate to target environments and so gain the benefits of modern computer technology. 1.3 Original Achievements of the Research The importance of this research lies in establishing the feasibility of enhancing, cleaning and migrating heterogeneous legacy databases using meta-programming technology, knowledge-based system technology, database system technology and O-O conceptual data modelling concepts, to create a comprehensive set of techniques and methods that form an efficient and useful generalised database re-engineering tool for heterogeneous sets of databases. The benefits such a tool can bring are also demonstrated and assessed. A prototype Conceptual Constraint Visualisation and Enhancement System (CCVES) [WIK95a] has been developed as a result of the research. To be more specific, our work has made four important contributions to progress in the database topic area of Computer Science: 1) CCVES is the first system to bring the benefits of meta-programming technology to the very important application area of enhancing and evolving heterogeneous distributed legacy databases to assist the legacy database migration process [GRA94, WIK95c]. 2) CCVES is also the first system to enhance existing databases with constraints to improve their visual presentation and hence provide a better understanding of existing applications [WIK95b]. This process is applicable to any relational database application, including those which are unable to naturally support the specification and enforcement of constraints. More importantly, this process does not affect the performance of an existing application. 3) As will be seen later, we have chosen the current SQL-3 standards [ISO94] as the basis for knowledge representation in our research. This project provides an extension to the representation of the relational data model to cope with automated reuse of knowledge in the re- engineering process. In order to cope with technological changes that result from the emergence of new systems or new versions of existing DBMSs, we also propose a series of extended relational system tables conforming to SQL-3 standards to enhance existing relational DBMSs [WIK95b]. 4) The generation of queries using the constraint specifications of the enhanced legacy systems is an easy and convenient method of detecting any constraint violating data in existing systems. The application of this technique in the context of a heterogeneous environment for legacy information systems is a significant step towards detecting and cleaning inconsistent data in legacy systems prior to their migration. This is essential if a graceful migration is to be effected [WIK95c]. 1.4 Organisation of the Thesis Page 11
The thesis is organised into 8 chapters. This first chapter has given an introduction to the research done, covering background and motivations, and outlining original achievements. The rest of the thesis is organised as follows:

Chapter 2 is devoted to presenting an overview of the research together with detailed aims and objectives for the work undertaken. It begins by identifying the scope of the work in terms of research constraints and development technologies. This is followed by an overview of the research undertaken, where a step by step discussion of the approach adopted and its role in a heterogeneous distributed database environment is given. Finally, detailed aims and objectives are drawn together to conclude the chapter.

Chapter 3 identifies the relational data model as the current dominant database model and presents its development along with its terminology, features and query languages. This is followed by a discussion of conceptual data models with special emphasis on the data models and symbols used in our project. Finally, we pay attention to key concepts related to our project, mainly the notion of semantic integrity constraints and extensions to the relational model. Here, we present important integrity constraint extensions to the relational model and its support using different SQL standards.

Chapter 4 addresses the issue of legacy information system migration. The discussion commences with an introduction to legacy and our target information systems. This is followed by migration strategies and methods for such ISs. Finally, we conclude by referring to current techniques and identify the trends and existing tools applicable to database migration.

Chapter 5 addresses the re-engineering process for relational databases. Techniques currently used for this purpose are identified first. Our approach, which uses constraints to re-engineer a relational legacy database, is described next. This is followed by a process for detecting possible keys and structures of legacy databases. Our schema enhancement and knowledge representation techniques are then introduced. Finally, we present a process to detect and resolve conflicts that may occur due to schema enhancement.

Chapter 6 introduces some example test databases which were chosen to represent a legacy heterogeneous distributed database environment and its access processes. Initially, we present the design of our test databases, the selection of our test DBMSs and the prototype system environment. This is followed by the application of our re-engineering approach to our test databases. Finally, the organisation of relational meta-data and its access is described using our test DBMSs.

Chapter 7 presents the internal and external architecture and operation of our conceptual constraint visualisation and enhancement system (CCVES) in terms of the design, structure and operation of its interfaces, and its intermediate modelling system. The internal schema mappings, e.g. mapping from INGRES QUEL to SQL and vice-versa, and internal database migration processes are presented in detail here.

Chapter 8 provides an evaluation of CCVES, identifying its limitations and improvements that could be made to the system. A discussion of potential applications is presented. Finally we conclude the
chapter by drawing conclusions about the research project as a whole.
CHAPTER 2

Research Scope, Approach, Aims and Objectives

This chapter describes, in some detail, the aims and objectives of the research that has been undertaken. Firstly, the boundaries of the research are defined in section 2.1, which considers the scope of the project. Secondly, an overview of the research approach we have adopted in dealing with heterogeneous distributed legacy database evolution and migration is given in section 2.2. Next, in section 2.3, the discussion is extended to the wider aspects of applying our approach in a heterogeneous distributed database environment using the existing meta-programming technology developed at Cardiff in other projects. Finally, the research aims and objectives are detailed in section 2.4, illustrating what we intend to achieve, and the benefits expected from achieving the stated aims.

2.1 Scope of the Project

We identify the scope of the work in terms of research constraints and the limitations of current development technologies. An overview of the problem is presented along with the drawbacks and limitations of database software development technology in addressing the problem. This will assist in identifying our interests and focussing the issues to be addressed.

2.1.1 Overview of the Problem

In most database designs, a conceptual design and modelling technique is used in developing the specifications at the user requirements and analysis stage of the design. This stage usually describes the real world in terms of object/entity types that are related to one another in various ways [BAT92, ELM94]. Such a technique is also used in reverse-engineering to portray the current information content of existing databases, as the original designs are usually either lost, or inappropriate because the database has evolved from its original design. The resulting pictorial representation of a database can be used for database maintenance, for database re-design, for database enhancement, for database integration or for database migration, as it gives its users a sound understanding of an existing database's architecture and contents.

Only a few current database tools [COMP90, BAT92, SHA93, SCH95] allow the capture and presentation of database definitions from an existing database, and the analysis and display of this information at a higher level of abstraction. Furthermore, these tools are either restricted to accessing a specific database management system's databases or permit modelling with only a single given display formalism, usually a variant of the EER [COMP90]. Consequently there is a need to cater for multiple database platforms and different user needs, allowing access to the set of databases comprising a heterogeneous database and providing a facility to visualise databases using a preferred conceptual modelling technique which is familiar to the different user communities of the heterogeneous system.

The fundamental modelling constructs of current reverse and re-engineering tools are entities, relationships and associated attributes. These constructs are useful for database design at
a high level of abstraction. However, the semantic information now available in the form of rules and constraints in modern DBMSs provides their users with a better understanding of the underlying database, as its data conforms to these constraints. This may not necessarily be true for legacy systems, which may have constraints defined that were not enforced. The ability to visualise rules and constraints as part of the conceptual model increases user understanding of a database. Users could also exploit this information to formulate queries that more effectively utilise the information held in a database. Having these features in mind, we concentrated on providing a tool that permits specification and visualisation of constraints as part of the graphical display of the conceptual model of a database. With modern technology increasing the number of legacy systems and with increasing awareness of the need to use legacy data [BRO95, IEEE95], the availability of such a visualisation tool will be more important in future, as it will let users see the full definition of the contents of their databases in a familiar format.

Three types of abstraction mechanism, namely: classification, aggregation and generalisation, are used in conceptual design [ELM94]. However, most existing DBMSs do not maintain sufficient meta-data information to assist in identifying all these abstraction mechanisms within their data models. This means that reverse and re-engineering tools are semi-automated, in that they extract information, but users have to guide them and decide what information to look for [WAT94]. This requires interactions with the database designer in order to obtain missing information and to resolve possible conflicts. Such additional information is supplied by the tool users when performing the reverse-engineering process. As this additional information is not retained in the database, it must be re-entered every time a reverse engineering process is undertaken if the full representation is to be achieved. To overcome this problem, knowledge bases are being used to retain this information when it is supplied. However, this approach restricts the use of this knowledge by other tools which may exist in the database's environment. The ability to hold this knowledge in the database itself would enhance an existing database with information that can be widely used. This would be particularly useful in the context of legacy databases as it would enrich their semantics. One of the issues considered in this thesis is how this can be achieved.

Most existing relational database applications record only entities and their properties (i.e. attribute names and data types) as system meta-data. This is because these systems conformed to early database standards (e.g. the SQL/86 standard [ANSI86], supported by INGRES version 5 and Oracle version 5). However, more recent relational systems record additional information such as constraint and rule definitions, as they conform to the SQL/92 standards [ANSI92] (e.g. Oracle version 7). This additional information includes, for example, primary and foreign key specifications, and can be used to identify classification and aggregation abstractions used in a conceptual model [CHI94, PRE94, WIK95b]. However, the SQL/92 standard does not capture the full range of modelling abstractions, e.g. inheritance representing generalisation hierarchies.
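To make this concrete, the sketch below shows the kind of SQL/92 declarations a more recent system can record in its catalogue; the table and column names are illustrative only. A reverse-engineering tool can read the primary keys as evidence of classification (entity types) and the foreign key as evidence of an aggregation (relationship) between them, whereas nothing in this syntax records that, say, one table is a specialisation of another.

```sql
-- Illustrative SQL/92 definitions (hypothetical tables).
CREATE TABLE department (
    dno   CHAR(4)     NOT NULL,
    dname VARCHAR(30) NOT NULL,
    PRIMARY KEY (dno)
);

CREATE TABLE employee (
    eno   CHAR(6)     NOT NULL,
    ename VARCHAR(30) NOT NULL,
    dno   CHAR(4),
    PRIMARY KEY (eno),
    FOREIGN KEY (dno) REFERENCES department (dno)
);
-- The FOREIGN KEY clause lets a tool infer an employee-department
-- relationship; a generalisation such as "manager IS-A employee"
-- cannot be declared in SQL/92 and must be supplied as extra knowledge.
```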
This means that early relational database applications are now legacy systems as they fail to naturally represent additional information such as constraint and rule definitions. Such legacy database systems are being migrated to modern database systems not only to gain the benefits of the current technology but also to be compatible with new applications built with the modern technology. The SQL standards are currently subject to review to permit the representation of extra knowledge (e.g. object-oriented features), and we have anticipated some of these proposals in our work - i.e. SQL-3 [ISO94] will be adopted by commercial systems and thus the current modern DBMSs
will become legacy databases in the near future, or may already be considered to be legacy databases in that their data model type will have to be mapped onto the newer version. Having observed the development of recent DBMSs, we regard it as inevitable that most current databases will have to be migrated, either to a newer version of the existing DBMS or to a completely different newer technology DBMS, for a variety of reasons. Thus the migration of legacy databases is perceived to be a continuing requirement, in any organisation, as technology advances continue to be made.

Most migrations currently being undertaken are based on code-to-code level translations of the applications and associated databases to enable the older system to be functional in the target environment. Minimal structural changes are made to the original system and database, thus the design structures of these systems are still old-fashioned, although they are running in a modern computing environment. This means that such systems are inflexible and cannot be easily enhanced with new functions or integrated with other applications in their new environment. We have also observed that more recent database systems have often failed to benefit from modern database technology due to inherent design faults that have resulted in the use of unnormalised structures, which cause omission of the features enforcing integrity constraints even when this is possible. The ability to create and use databases without the benefit of a database design course is one reason for such design faults. Hence there is a need to assist existing systems to evolve, not only to perform new tasks but also to improve their structure, so that these systems can maximise the gains they receive from their current technology environment and any environment they migrate to in the future.

2.1.2 Narrowing Down the Problem

Technological advances in both hardware and software have improved the performance and maintenance functionality of all information systems (ISs), and as a result, older ISs suffer from comparatively poor performance and inappropriate functionality when compared with more modern systems. Most of these legacy systems are written in a 3GL such as COBOL, have been around for many years, and run on old-fashioned mainframes. Problems associated with legacy systems are being identified and various solutions are being developed [BRO93, SHE94, BRO95]. These systems basically have three functional components, namely: interface, application and a database service, which are sometimes inter-related to each other, depending on how they were used during the design and implementation stages of the IS development. This means that the complexity of a legacy IS depends on what occurred during the design and implementation of the system. These systems may range from a simple single-user database application using separate interfaces and applications, to a complex multi-purpose unstructured application.

Due to the complex nature of the problem area we do not address this issue as a whole, but focus only on problems associated with one sub-component of such legacy information systems, namely the database service. This in itself is a wide field, and we have further restricted ourselves to legacy ISs using a specific DBMS for their database service. We considered data models ranging from original flat file and relational systems, to modern relational DBMSs and object-oriented DBMSs.
From these data models we have chosen the traditional relational model for the following reasons.
• The relational model is currently the most widely used database model.
• During the last two decades the relational model has been the most popular model; therefore it has been used to develop many database applications and most of these are now legacy systems.
• There have been many extensions and variations of the relational model, which has resulted in many heterogeneous relational database systems being used in organisations.
• The relational model can be enhanced to represent additional semantics currently supported only by modern DBMSs (e.g. extended relational systems [ZDO90, CAT94]).

As most business requirements change with time, the need to enhance and migrate legacy information systems exists for almost every organisation. We address problems faced by these users while seeking a solution that prevents new systems becoming legacy systems in the near future. The selection of the relational model as our database service to demonstrate how one could achieve these needs means that we shall be addressing only relational legacy database systems and not looking at any other type of legacy information system. This decision means we are not considering the many common legacy IS migration problems identified by Brodie [BRO95] (e.g. migration of legacy database services such as flat-file structures or hierarchical databases into modern extended relational databases; migration of legacy applications with millions of lines of code written in some COBOL-like language into a modern 4GL/GUI environment). However, as shown later, addressing the problems associated with relational legacy databases has enabled us to identify and solve problems associated with more recent DBMSs, and it also assists in identifying precautions which, if implemented by designers of new systems, will minimise the chance of similar problems being faced by these systems as IS developments occur in the future.

2.2 Overview of the Research Approach

Having presented an overview of our problem and narrowed it down, we identify the following as the main functionalities that should be provided to fulfil our research goal:
• Reverse-engineering of a relational legacy database to fully portray its current information content.
• Enhancing a legacy database with new knowledge to identify modelling concepts that should be available to the database concerned or to applications using that database.
• Determining the extent to which the legacy database conforms to its existing and enhanced descriptions.
• Ensuring that the migrated IS will not become a legacy IS in the future.

We need to consider the heterogeneity issue in order to be able to reverse-engineer any given relational legacy database. Three levels of heterogeneity are present for a particular data model, namely: the physical, logical and data management levels. The physical level of heterogeneity usually arises due to different data model implementation techniques, use of different computer platforms and use of different DBMSs. The physical / logical data independence of DBMSs hides implementation differences from users, hence we need only address how to access databases that are built using different DBMSs, running on different computer platforms.
Differences in DBMS characteristics lead to heterogeneity at the logical level. Here, the different DBMSs conform to a particular standard (e.g. SQL/86 or SQL/92), which supports a particular database query language (e.g. SQL or QUEL) and different relational data model features (e.g. handling of integrity constraints and availability of object-oriented features). To tackle heterogeneity at the logical level, we need to be aware of different standards, and to model ISs supporting different features and query languages. Heterogeneity at the data management level arises due to the physical limitations of a DBMS, differences in the logical design and inconsistencies that occurred when populating the database.

Logical differences in different database schemas have to be resolved only if we are going to integrate them. The schema integration process is concerned with merging different related database applications. Such a facility can assist the migration of heterogeneous database systems. However, any attempt to integrate legacy database schemas prior to the migration process complicates the entire process, as it is similar to attempting to provide new functionalities within the system which is being migrated. Such attempts increase the chance of failure of the overall migration process. Hence we consider any integration or enhancements in the form of new functionalities only after successfully migrating the original legacy IS. However, the physical limitations of a DBMS and data inconsistencies in the database need to be addressed beforehand to ensure a successful migration.

Our work addresses the heterogeneity issues associated with database migration by adopting an approach that allows its users to incrementally increase the number of DBMSs it can handle without having to reprogram its main application modules. Here, the user needs to supply specific knowledge about DBMS schema and query language constructs. This is held together with the knowledge of the DBMSs already supported and has no effect on the application's main processing modules.

2.2.1 Meta-Programming

Meta-programming technology allows the meta-data (schema information) of a database to be held and processed independently of its source specification language. This allows us to work in a database-language-independent environment and hence overcome many logical heterogeneity issues. Prolog based meta-programming technology has been used in previous research at Cardiff in the area of logical heterogeneity [FID92, QUT94]. Using this technology the meta-translation of database query languages [HOW87] and database schemas [RAM91] has been performed. This work has shown how the heterogeneity issues of different DBMSs can be addressed without having to reprogram the same functionality for each and every DBMS.

We use meta-programming technology for our legacy database migration approach as we need to be able to start with a legacy source database and end with a modern target database, where the respective database schema and query languages may be different from each other. In this approach the source database schema or query language is mapped on input into an internal canonical form. All the required processing is then done using the information held in this internal form. This information is finally mapped to the target schema or query language to produce the desired output. The advantage of this approach is that processing is not affected by heterogeneity, as it is always performed on data held in the canonical form.
This canonical form is an enriched collection of semantic data modelling features.
2.2.2 Application

We view our migration approach as consisting of a series of stages, with the final stage being the actual migration and earlier stages being preparatory. At stage 1, the data definition of the selected database is reverse-engineered to produce a graphical display (cf. paths A-1 and A-2 of figure 2.1). However, in legacy systems much of the information needed to present the database schema in this way is not available as part of the database meta-data, and hence these links which are present in the database cannot be shown in this conceptual model. In modern systems such links can be identified using constraint specifications. Thus, if the database does not have any explicit constraints, or it does but these are incomplete, new knowledge about the database needs to be entered at stage 2 (cf. path B-1 of figure 2.1), which will then be reflected in the enhanced schema appearing in the graphical display (cf. path B-2 of figure 2.1). This enhancement will identify new links that should be present for the database concerned. These new database constraints can next be applied experimentally to the legacy database to determine the extent to which it conforms to them. This process is done at stage 3 (cf. paths C-1 and C-2 of figure 2.1). The user can then decide whether these constraints should be enforced to improve the quality of the legacy database prior to its migration. At this point the three preparatory stages in the application of our approach are complete. The actual migration process is then performed. All stages are further described below to enable us to identify the main processing components of our proposed system, as well as to explain how we deal with different levels of heterogeneity.

Stage 1: Reverse Engineering

In stage 1, the data definition of the selected database is reverse-engineered to produce a graphical display of the database. To perform this task, the database's meta-data must be extracted (cf. path A-1 of figure 2.1). This is achieved by connecting directly to the heterogeneous database. The accessed meta-data needs to be represented using our internal form. This is achieved through a schema mapping process as used in the SMTS (Schema Meta-Translation System) of Ramfos [RAM91]. The meta-data in our internal formalism then needs to be processed to derive the graphical constructs present for the database concerned (cf. path A-2 of figure 2.1). These constructs are in the form of entity types and relationships, and their derivation process is the main processing component in stage 1. The identified graphical constructs are mapped to a display description language to produce a graphical display of the database.
[Figure 2.1: Information flow in the 3 stages of our approach prior to migration. The figure shows information flowing between the heterogeneous databases, an internal processing component and a schema visualisation with constraints (EER or OMT), via routes A-1/A-2 for Stage 1 (Reverse Engineering), routes B-1/B-2/B-3 for Stage 2 (Knowledge Augmentation, adding enhanced constraints) and routes C-1/C-2 for Stage 3 (Constraint Enforcement).]

a) Database connectivity for heterogeneous database access

Unlike the previous Cardiff meta-translation systems [HOW87, RAM91, QUT92], which addressed heterogeneity at the logical and data management levels, our system looks at the physical level as well. While these previous systems processed schemas in textual form and did not access actual databases to extract their DDL specification, our system addresses physical heterogeneity by accessing databases running on different hardware / software platforms (e.g. computer systems, operating systems, DBMSs and network protocols). Our aim is to directly access the meta-data of a given database application by specifying its name, the name and version of the host DBMS, and the address of the host machine (we assume that access privileges for this host machine and DBMS have been granted). If this database access process can produce a description of the database in DDL formalism, then this textual file is used as the starting point for the meta-translation process as in previous Cardiff systems [RAM91, QUT92]. We found that it is not essential to produce such a textual file, as the required intermediate representation can be directly produced by the database access process. This means that we could also by-pass the meta-translation process that performs the analysis of the DDL text to translate it into the intermediate representation (a list of tokens ready for syntactic analysis in the parsing phase is produced and processed based on the BNF syntax specification of the DDL [QUT92]). However, the DDL formalism of the schema can be used for optional textual viewing and could also serve as the starting point for other tools developed at Cardiff for meta-programming database applications, e.g. the Schema Meta-Integration System (SMIS) of Qutaishat [QUT92].

The initial functionality of the Stage 1 database connectivity process is to access a heterogeneous database and supply the accessed meta-data as input to our schema meta-translator (SMTS). This module needs to deal with heterogeneity at the physical and data management levels. We achieve this by using DML commands of the specific DBMS to extract the required meta-data held in database data dictionaries, which are treated like user defined tables.
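For instance, in an Oracle 7 database the data dictionary can be read with ordinary SELECT statements, in the same way as user tables. The queries below are illustrative of this style of catalogue access rather than the exact statements issued by our system.

```sql
-- Column definitions recorded for one application table.
SELECT column_name, data_type, nullable
FROM   user_tab_columns
WHERE  table_name = 'STUDENT';

-- Declared constraints on the same table
-- ('P' = primary key, 'R' = referential / foreign key, 'C' = check).
SELECT constraint_name, constraint_type
FROM   user_constraints
WHERE  table_name = 'STUDENT';
```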
Relatively recently, the functionalities of a heterogeneous database access process have been provided by means of drivers such as ODBC [RIC94]. Use of such drivers will allow access to any database supported by them and hence obviate the need to develop specialised tools for each database type, as happened in our case. These driver products were not available when we undertook this stage of our work.

b) Schema meta-translation

The schema meta-translation process [RAM91] accepts input of any database schema irrespective of its DDL and features. The information captured during this process is represented internally to enable it to be mapped from one database schema to another, or to be further processed and supplied to other modules such as the schema meta-visualisation system (SMVS) [QUT93] and the query meta-translation system (QMTS) [HOW87]. Thus, the use of an internal canonical form for meta representation has successfully accommodated heterogeneity at the data management and logical levels.

c) Schema meta-visualisation

Schema visualisation using graphical notation and diagrams has proved to be an important step in a number of applications, e.g. during the initial stages of the database design process; for database maintenance; for database re-design; for database enhancement; for database integration; or for database migration; as it gives users a sound understanding of an existing database's structure in an easily assimilated format [BAT92, ELM94]. Database users need to see a visual picture of their database structure instead of textual descriptions of the defining schema, as it is easier for them to comprehend a picture. This has led to the production of graphical representations of schema information, effected by a reverse engineering process. Graphical data models of schemas employ a set of data modelling concepts and a language-independent graphical notation (e.g. the Entity Relationship (E-R) model [CHE76], the Extended/Enhanced Entity Relationship (EER) model [ELM94] or the Object Modelling Technique (OMT) [RUM91]). In a heterogeneous environment different users may prefer different graphical models, and an understanding of the database structure and architecture beyond that given by the traditional entities and their properties. Therefore, there is a need to produce graphical models of a database's schema using different graphical notations, such as either E-R/EER or OMT, and to accompany them with additional information such as a display of the integrity constraints in force in the database [WIK95b]. The display of integrity constraints allows users to look at intra- and inter-object constraints and gain a better understanding of domain restrictions applicable to particular entities. Current reverse engineering tools do not support this type of display.

The generated graphical constructs are held internally in a similar form to the meta-data of the database schema. Hence, using a schema meta-visualisation process (SMVS), it is possible to map the internally held graphical constructs into appropriate graphical symbols and coordinates for the graphical display of the schema. This approach has a similarity to the SMTS, the main
difference being that the output is graphical rather than textual.

Stage 2: Knowledge Augmentation

In a heterogeneous distributed database environment, evolution is expected, especially in legacy databases. This evolution can affect the schema description and in particular schema constraints that are not reflected in the stage 1 (path A-2) graphical display, as they may be implicit in applications. Thus our system is designed to accept new constraint specifications (cf. path B-1 of figure 2.1) and add them to the graphical display (cf. path B-2 of figure 2.1) so that these hidden constraints become explicit. The new knowledge accepted at this point is used to enhance the schema and is retained in the database using a database augmentation process (cf. path B-3 of figure 2.1). The new information is stored in a form that conforms with the enhanced target DBMS's methods of storing such information. This assists the subsequent migration stage.

a) Schema enhancement

Our system needs to permit a database schema to be enhanced by specifying new constraints applicable to the database. This process is performed via the graphical display. These constraints, which are in the form of integrity constraints (e.g. primary key, foreign key, check constraints) and structural components (e.g. inheritance hierarchies, entity modifications), are specified using a GUI. When they are entered they will appear in the graphical display.

b) Database augmentation

The input data to enhance a schema provides new knowledge about a database. It is essential to retain this knowledge within the database itself, if it is to be readily available for any further processing. Typically, this information is retained in the knowledge base of the tool used to capture the input data, so that it can be reused by the same tool. This approach restricts the use of this knowledge by other tools and hence it must be re-entered every time the re-engineering process is applied to that database. This makes it harder for the user to gain a consistent understanding of an application, as different constraints may be specified during two separate re-engineering processes. To overcome this problem, we augment the database itself using the techniques proposed in SQL-3 [ISO94], wherever possible. When it is not possible to use SQL-3 structures we store the information in our own augmented table format, which is a natural extension of the SQL-3 approach (a sketch of this idea is given at the end of this subsection). When a database is augmented using this method, the new knowledge is available in the database itself. Hence, any further re-engineering processes need not make requests for the same additional knowledge. The augmented tables are created and maintained in a similar way to user-defined tables, but have a special identification to distinguish them. Their structure is in line with the international standards and the newer versions of commercial DBMSs, so that the enhanced database can be easily migrated to either a newer version of the host DBMS or to a different DBMS supporting the latest SQL standards. Migration should then mean that the newer system can enforce the constraints. Our approach should also mean that it is easy to map our tables for holding this information into the representation used by the target DBMS even if it is different, as we are mapping from a well defined structure.

Legacy databases that do not support explicit constraints can be enhanced by using the above knowledge augmentation method. This requirement is less likely to occur for databases managed by more recent DBMSs, as they already hold some constraint specification information in their system tables. The direction taken by Oracle version 6 was a step towards our augmentation approach, as it allowed the database administrator to specify integrity constraints such as primary and foreign keys, but did not yet enforce them [ROL92]. The next release of Oracle, i.e. version 7, implemented this constraint enforcement process.
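The following sketch conveys the flavour of such an augmented table for a legacy DBMS that cannot store constraint definitions in its own catalogue. The table name, its columns and the recorded constraint are invented for illustration; they do not reproduce the exact augmented-table layout defined later in the thesis.

```sql
-- Hypothetical augmented table, created alongside the user tables and
-- specially identified, recording enhanced constraints that the legacy
-- DBMS cannot yet represent or enforce.
CREATE TABLE ccves_constraints (
    table_name      CHAR(32)  NOT NULL,
    constraint_name CHAR(32)  NOT NULL,
    constraint_type CHAR(12)  NOT NULL,   -- e.g. PRIMARY KEY, FOREIGN KEY, CHECK
    definition      CHAR(240) NOT NULL    -- SQL-3 style text of the constraint
);

INSERT INTO ccves_constraints
VALUES ('ENROLMENT', 'ENROL_FK1', 'FOREIGN KEY',
        'FOREIGN KEY (sno) REFERENCES student (sno)');
```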
Stage 3: Constraint Enforcement

The enhanced schema can be held in the database, but the DBMS can only enforce these constraints if it has the capability to do so. This will not normally be the case in legacy systems. In this situation, the new constraints may be enforced via a newer version of the DBMS or by migrating the database to another DBMS supporting constraint enforcement. However, the data being held in the database may not conform to the new constraints, and hence existing data may be rejected by the target DBMS in the migration, thus losing data and / or delaying the migration process. To address this problem and to assist the migration process, we provide an optional constraint enforcement process module which can be applied to a database before it is migrated. The objective of this process is to give users the facility to ensure that the database conforms to all the enhanced constraints before migration occurs. This process is optional so that the user can decide whether these constraints should be enforced to improve the quality of the legacy data prior to its migration, whether it is best left as it stands, or whether the new constraints are too severe. The constraint definitions in the augmented schema are employed to perform this task. As all constraints held have already been internally represented in the form of logical expressions, these can be used to produce data manipulation statements suitable for the host DBMS. Once these statements are produced, they are executed against the current database to identify the existence of data violating a constraint.
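To illustrate the kind of data manipulation statements this module produces, the queries below check two enhanced constraints against hypothetical tables; they are sketches of the approach rather than the literal statements generated by CCVES.

```sql
-- Rows that would violate a proposed primary key on student(sno):
SELECT sno, COUNT(*)
FROM   student
GROUP  BY sno
HAVING COUNT(*) > 1;

-- Rows that would violate a proposed check constraint
-- CHECK (mark BETWEEN 0 AND 100) on an enrolment table:
SELECT *
FROM   enrolment
WHERE  NOT (mark BETWEEN 0 AND 100);
```

Any rows returned identify data that must be cleaned, or constraints that must be relaxed, before the constraints are enforced in the target DBMS.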
Stage 4: Migration Process

The migration process itself is incrementally performed by initially creating the target database and then copying the legacy data over to it. The schema meta-translation (SMTS) technique of Ramfos [RAM91] is used to produce the target database schema. The legacy data can be copied using the import / export tools of the source and target DBMSs or DML statements of the respective DBMSs. During this process, the legacy applications must continue to function until they too are migrated. To achieve this an interface can be used to capture and process all database queries of the legacy applications during migration. This interface can decide how to process database queries against the current state of the migration and re-direct those newly related to the target database. The query meta-translation (QMTS) technique of Howells [HOW87] can be used to convert these queries to the target DML. This approach will facilitate transparent migration for legacy databases. Our work does not involve the development of an interface to capture and process all database queries, as interaction with the query interface of the legacy IS is embedded in the legacy application code. However, we demonstrate how to create and populate a legacy database schema in the desired target environment while showing the role of SMTS and QMTS in such a process.

2.3 The Role of CCVES in Context of Heterogeneous Distributed Databases

Our approach described in section 2.2 is based on preparing a legacy database schema for graceful migration. This involves visualisation of database schemas with constraints and enhancing them with constraints to capture more knowledge. Hence we call our system the Conceptual Constraint Visualisation and Enhancement System (CCVES). CCVES has been developed to fit in with the previously developed schema (SMTS) [RAM91] and query (QMTS) [HOW87] meta-translation systems, and the schema meta-visualisation system (SMVS) [QUT93]. This allows us to consider the complementary roles of CCVES, SMTS, QMTS and SMVS during heterogeneous distributed database access in a uniform way [FID92, QUT94]. The combined set of tools achieves semantic coordination and promotes interoperability in a heterogeneous environment at the logical, physical and data management levels.

Figure 2.2 illustrates the architecture of CCVES in the context of heterogeneous distributed databases. It outlines in general terms the process of accessing a remote (legacy) database to perform various database tasks, such as querying, visualisation, enhancement, migration and integration. There are seven sub-processes: the schema mapping process [RAM91], query mapping process [HOW87], schema integration process [QUT92], schema visualisation process [QUT93], database connectivity process, database enhancement process and database migration process. The first two processes together have been called the Integrated Translation Support Environment [FID92], and the first four processes together have been called the Meta-Integration/Translation Support Environment [QUT92]. The last three processes were introduced as CCVES to perform database enhancement and migration in such an environment.

The schema mapping process, referred to as SMTS, translates the definition of a source schema to a target schema definition (e.g. an INGRES schema to a POSTGRES schema). The query mapping process, referred to as QMTS, translates a source query to a target query (e.g. an SQL query to a QUEL query). The meta-integration process, referred to as SMIS, tackles heterogeneity at the logical level in a distributed environment containing multiple database schemas (e.g. Ontos and Exodus local schemas with a POSTGRES global schema) - it integrates the local schemas to create the global schema. The meta-visualisation process, referred to as SMVS, generates a graphical representation of a schema. The remaining three processes, namely database connectivity, enhancement and migration, together with their associated processes SMVS, SMTS and QMTS, are the subject of the present thesis, as they together form CCVES (centre section of figure 2.2).

The database connectivity process (DBC) queries meta-data from a remote database (route
A-1 in figure 2.2) to supply meta-knowledge (route A-2 in figure 2.2) to the schema mapping process referred to as SMTS. SMTS translates this meta-knowledge to an internal representation which is based on SQL schema constructs. These SQL constructs are supplied to SMVS for further processing (route A-3 in figure 2.2), which results in the production of a graphical view of the schema (route A-4 in figure 2.2). Our reverse-engineering techniques [WIK95b] are applied to identify entity and relationship types to be used in the graphical model. Meta-knowledge enhancements are solicited at this point by the database enhancement process (DBE) (route B-1 in figure 2.2), which allows the definition of new constraints and changes to the existing schema. These enhancements are reflected in the graphical view (route B-2 and B-3 in figure 2.2) and may be used to augment the database (route B-4 to B-8 in figure 2.2). This approach to augmentation makes use of the query mapping process, referred to as QMTS, to generate the required queries to update the database via the DBC process. At this stage any existing or enhanced constraints may be applied to the database to determine the extent to which it conforms to the new enhancements. Carrying out this process will also ensure that legacy data will not be rejected by the target DBMS due to possible violations. Finally, the database migration process, referred to as DBMI, assists migration by incrementally migrating the database to the target environment (route C-1 to C-6 in figure 2.2). Target schema constructs for each migratable component are produced via SMTS, and DDL statements are issued to the target DBMS to create the new database schema. The data for these migrated tables are extracted by instructing the source DBMS to export the source data to the target database via QMTS. Here too, the queries which implement this export are issued to the DBMS via the DBC process.

2.4 Research Aims and Objectives

Our relational database enhancement and augmentation approach is important in three respects, namely:

1) by holding the additional defining information in the database itself, this information is usable by any design tool, in addition to assisting the full automation of any future re-engineering of the same database;

2) it allows better user understanding of database applications, as the associated constraints are shown in addition to the traditional entities and attributes at the conceptual level;
3) the process which assists a database administrator to clean inconsistent legacy data ensures a safe migration.

Performing this latter task in a real world situation without an automated support tool is very difficult, tedious, time consuming and error prone. Therefore the main aim of this project has been the design and development of a tool to assist database enhancement and migration in a heterogeneous distributed relational database environment. Such a system is concerned with enhancing the constituent databases in this type of environment to exploit potential knowledge, both to automate the re-engineering process and to assist in evolving and cleaning the legacy data to prevent data rejection, possible losses of data and/or delays in the migration process. To this end, the following detailed aims and objectives have been pursued in our research:

1. Investigation of the problems inherent in schema enhancement and migration for a heterogeneous distributed relational legacy database environment, in order to fully understand these processes.

2. Identification of the conceptual foundation on which to successfully base the design and development of a tool for this purpose. This foundation includes:
• A framework to establish meta-data representation and manipulation.
• A real world data modelling framework that facilitates the enhancement of existing working systems and which supports applications during migration.
• A framework to retain the enhanced knowledge for future use which is in line with current international standards and techniques used in newer versions of relational DBMSs.
• Exploiting existing databases in new ways, particularly linking them with data held in other legacy systems or more modern systems.
• Displaying the structure of databases in a graphical form to make it easy for users to comprehend their contents.
• The provision of an interactive graphical response when enhancements are made to a database.
• A higher level of data abstraction for tasks associated with visualising the contents, relationships and behavioural properties of entities and constraints.
• Determining the constraints on the information held and the extent to which the data conforms to these constraints.
• Integrating with other tools to maximise the benefits of the new tool to the user community.

3. Development of a prototype tool to automate the re-engineering process and the migration assisting tasks as far as possible. The following development aims have been chosen for this system:
• It should provide a realistic solution to the schema enhancement and migration assistance process.
• It should be able to access and perform this task for legacy database systems.
• It should be suitable for the data model at which it is targeted.
• It should be as generic as possible so that it can be easily customised for other data models.
• It should be able to retain the enhanced knowledge for future analysis by itself and other
tools.
• It should logically support a model using modern data modelling techniques irrespective of whether it is supported by the DBMS in use.
• It should make extensive use of modern graphical user interface facilities for all graphical displays of the database schema.
• Graphical displays should also be as generic as possible so that they can be easily enhanced or customised for other display methods.
CHAPTER 3

Database Technology, Relational Model, Conceptual Modelling and Integrity Constraints

The origins and historical development of database technology are initially presented here to set in context the evolution of ISs and the emergence of database models. The relational data model is identified as currently the most commonly used database model, and some terminology for this data model, along with its features including query languages, is then presented. A discussion of conceptual data models with special emphasis on EER and OMT is provided to introduce these data models and the symbols used in our project. Finally, we pay attention to crucial concepts relating to our work, namely the notion of semantic integrity constraints, with special emphasis on those used in semantic extensions to the relational model. The relational database language SQL is also discussed, identifying how and when it supports the implementation of these semantic integrity constraints.

3.1 Origins and Historical Developments

The origin of data management goes back to the 1950's and hence this section is subdivided into two parts: the first part describes database technology prior to the relational data model, and the second part describes developments since. This division was chosen as the relational model is currently the most dominant database model for information management [DAT90].

3.1.1 Database Technology Prior to the Relational Data Model

Database technology emerged from the need to manipulate large collections of data for frequently used data queries and reports. The first major step in the mechanisation of information systems came with the advent of punched card machines which worked sequentially on fixed-length fields [SEN73, SEN77]. With the appearance of stored program computers, tape-oriented systems were used to perform these tasks with an increase in user efficiency. These systems used sequential processing of files in batch mode, which was adequate until peripheral storage with random access capabilities (e.g. DASD) and time sharing operating systems with interactive processing appeared to support real-time processing in computer systems. Access methods such as direct and indexed sequential access methods (e.g. ISAM, VSAM) [BRA82, MCF91] were used to assist with the storage and location of physical records in stored files. Enhancements were made to procedural languages (e.g. COBOL) to define and manage application files, making the application program dependent on the organisation of the file. This technique caused data redundancy, as several files were used in systems to hold the same data (e.g. emp_name and address in a payroll file; insured_name and address in an insurance file; and depositors_name and address in a bank file). These stored data files used in the applications of the 1960's are now referred to as conventional file systems, and they were maintained using third generation programming languages such as COBOL and PL/1. This evolution of mechanised information systems was influenced by the hardware and software developments which occurred in the 1950's and early 1960's. Most long existing legacy ISs are based on this technology. Our work does not address this type of IS as they do not use a DBMS for their data management.

The evolution of databases and database management systems [CHA76, FRY76, SIB76,
SEN77, KIM79, MCG81, SEL87, DAT90, ELM94] was to a large extent the result of addressing the main deficiencies in the use of files, i.e. by reducing data redundancy and making application programs less dependent on file organisation. An important factor in this evolution was the development of data definition languages, which allowed the description of a database to be separated from its application programs. This facility allowed the data definition (often called a schema) to be shared and integrated to provide a wide variety of information to the users. The repository of all data definitions (meta-data) is called a data dictionary, and its use allows data definitions to be shared and made widely available to the user community.

In the late 1960's applications began to share their data files using an integrated layer of stored data descriptions, making the first true database, e.g. the IMS hierarchical database [MCG77, DAT90]. This type of database was navigational in nature and applications explicitly followed the physical organisation of records in files to locate data, using commands such as GNP - get next under parent. These databases provided centralised storage management, transaction management, recovery facilities in the event of failure and system maintained access paths. These were the typical characteristics of early DBMSs. Work on extending COBOL to handle databases was carried out in the late 60s and 70s. This resulted in the establishment of the DBTG (i.e. DataBase Task Group) of CODASYL and the formal introduction of the network model along with its data manipulation commands [DBTG71]. The relational model was proposed during the same period [COD70], followed by the 3 level ANSI/SPARC architecture [ANSI75], which made databases more independent of applications and became a standard for the organisation of DBMSs. Three popular types of commercial database systems, classified by their underlying data model, emerged during the 70s [DAT90, ELM94] (other types, such as flat file and inverted file systems, were also used), namely:
• hierarchical
• network
• relational
and these have been the dominant types of DBMS from the late 60s on into the 80s and 90s.

3.1.2 Database Technology Since the Relational Data Model

At the same time as the relational data model appeared, database systems introduced another layer of data description on top of the navigational functionality of the early hierarchical and network models to bring extra logical data independence, i.e. allowing changes to the logical structure of data without changing the application programs. The relational model also introduced the use of non-procedural (i.e. declarative) languages such as SQL [CHA74]. By the early 1980's many relational database products, e.g. System R [AST76], DB2 [HAD84], INGRES [STO76] and Oracle, were in use and, due to their growing maturity in the mid 80s and the complexity of programming, navigating, and changing data structures in the older DBMS data models, the relational data model was able to take over the commercial database market, with the result that it is now dominant.
The advent of inexpensive and reliable communication between computer systems, through the development of national and international networks, has brought further changes in the design of these systems. These developments led to the introduction of distributed databases, where a processor uses data at several locations and links it as though it were at a single site. This technology has led to distributed DBMSs and the need for interoperability among different database systems [OZS91, BEL92].

Several shortcomings of the relational model have been identified, including its inability to perform efficiently compute-intensive applications such as simulation, to cope with computer-aided design (CAD) and programming language environments, and to represent and manipulate effectively concepts such as [KIM90]:
• Complex nested entities (e.g. design and engineering objects),
• Unstructured data (e.g. images, textual documents),
• Generalisation and aggregation within a data structure,
• The notion of time and versioning of objects and schemas,
• Long duration transactions.

The notion of a conceptual schema for application-independent modelling introduced by the ANSI/SPARC architecture led to another data model, namely: the semantic model. One of the most successful semantic models is the entity-relationship (E-R) model [CHE76]. Its concepts include entities, relationships, value sets and attributes. These concepts are used in traditional database design as they are application-independent. Many modelling concepts based on variants/extensions to the E-R model have appeared since Chen's paper. The enhanced/extended entity-relationship model (EER) [TEO86, ELM94], the entity-category-relationship model (ECR) [ELM85], and the Object Modelling Technique (OMT) [RUM91] are the most popular of these. The DAPLEX functional model [SHI81] and the Semantic Data Model [HAM81] are also semantic models. They capture a richer set of semantic relationships among real-world entities in a database than the E-R based models. Semantic relationships such as generalisation / specialisation between a superclass and its subclass, the aggregation relationship between a class and its attributes, the instance-of relationship between an instance and its class, the part-of relationship between objects forming a composite object, and the version-of relationship between abstracted versioned objects are semantic extensions supported in these models. The object-oriented data model, with its notions of class hierarchy, class-composition hierarchy (for nested objects) and methods, could be regarded as a subset of this type of semantic data model in terms of its modelling power, except for the fact that the semantic data model lacks the notion of methods [KIM90], which is an important aspect of the object-oriented model.

The relational model of data and the relational query language have been extended [ROW87] to allow modelling and manipulation of additional semantic relationships and database facilities. These extensions include data abstraction, encapsulation, object identity, composite objects, class hierarchies, rules and procedures.
However, these extended relational systems are still being evolved to fully incorporate features such as implementation of domain and extended data types, enforcement of primary and foreign key and referential integrity checking, prohibition of duplicate rows in tables and views, handling missing information by supporting four-valued predicate logic
(i.e. true, false, unknown, not applicable) and view updatability [KIV92], and they are not yet available as commercial products. The early 1990's saw the emergence of new database systems by a natural evolution of database technology, with many relational database systems being extended and other data models (e.g. the object-oriented model) appearing to satisfy more diverse application needs. This opened opportunities to use databases for a greater diversity of applications which had not been previously exploited, as they were not perceived as tractable by a database approach (e.g. image, medical, document management, engineering design and multi-media information, used in complex information processing applications such as office automation (OA), computer-aided design (CAD), computer-aided manufacturing (CAM) and hypermedia [KIM90, ZDO90, CAT94]). The object-oriented (O-O) paradigm represents a sound basis for making progress in these areas and as a result two types of DBMS are beginning to dominate in the mid 90s [ZDO90], namely: the object-oriented DBMS, and the extended relational DBMS.

There are two styles of O-O DBMS, depending on whether they have evolved from extensions to an O-O programming language or by evolving a database model. Extensions have been created for two database models, namely: the relational and the functional models. The extensions to existing relational DBMSs have resulted in the so-called Extended Relational DBMSs which have O-O features (e.g. POSTGRES and Starburst), while extensions to the functional model have produced PROBE and OODAPLEX. The approach of extending O-O programming language systems with database management features has resulted in many systems (e.g. Smalltalk into GemStone and ALLTALK, and C++ into many DBMSs including VBase / ONTOS, IRIS and O2). References to these systems, with additional information, can be found in [CAT94]. Research is currently taking place into other kinds of database such as active, deductive and expert database systems [DAT90].

This thesis focuses on the relational model and possible extensions to it which can represent semantics in existing relational database information systems in such a way that these systems can be viewed in new ways and easily prepared for migration to more modern database environments.

3.2 Relational Data Model

In this section we introduce some of the commonly used terminology of the relational model. This is followed by a selective description of the features and query languages of this model. Further details of this data model can be found in most introductory database text books, e.g. [MCF91, ROB93, ELM94, DAT95].

A relation is represented as a table (entity) in which each row represents a tuple (record), the number of columns being the degree of the relation and the number of rows being its cardinality. An example of this representation is shown in figure 3.1, which shows a relation holding Student details, with degree 3 and cardinality 5. This table and each of its columns are named, so that a unique identity for a table column of a given schema is achieved via its table name and column name. The columns of a table are called attributes (fields), each having its own domain (data type) representing its pool of legal data. Basic types of domains are used (e.g. integer, real, character, text, date) to define the domains of attributes.
Constraints may be enforced to further restrict the pool of legal values for an attribute. Tables which actually hold data are called base tables, to distinguish them from view tables which can be used for viewing data associated with one or more base tables. A view table can also be an abstraction from a single base table which is used to control access to parts of the data. A column or set of columns whose values uniquely identify a row of a relation is called a candidate key (key) of the relation. It is customary to designate one candidate key of a relation as a primary key (e.g. SNO in figure 3.1). The specification of keys restricts the possible values the key attribute(s) may hold (e.g. no duplicate values), and is a type of constraint enforceable on a relation. Additional constraints may be imposed on an attribute to further restrict its legal values. In such cases, there should be a common set of legal values satisfying all the constraints of that attribute, ensuring its ability to accept some data. For example, a pattern constraint which ensures that the first character of SNO is 'S' further restricts the possible values of SNO - see figure 3.1. Many other concepts and constraints are associated with the relational model, although most of them are not supported by early relational systems, nor indeed by some of the more recent relational systems (e.g. a value set constraint for the Address field as shown in figure 3.1).

    Student
    SNO   Name    Address
    S1    Jones   Cardiff
    S2    Smith   Bristol
    S3    Gray    Swansea
    S4    Brown   Cardiff
    S5    Jones   Newport

Figure 3.1: The Student relation. SNO is the primary key (unique values), drawn from a character domain and subject to a pattern constraint (all values begin with 'S'); the Address field is subject to a value set constraint. The three attributes give the relation degree 3, and its five tuples give cardinality 5.
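A modern relational DBMS would let the constraints annotated in figure 3.1 be declared directly in the table definition. The sketch below shows one way this could be written in SQL/92-style syntax; the particular value set chosen for Address is invented for illustration.

```sql
-- Illustrative declaration of the Student relation of figure 3.1.
CREATE TABLE student (
    sno     CHAR(3)     NOT NULL,
    name    VARCHAR(20) NOT NULL,
    address VARCHAR(20),
    PRIMARY KEY (sno),                           -- unique values
    CHECK (sno LIKE 'S%'),                       -- pattern constraint
    CHECK (address IN ('Cardiff', 'Bristol',
                       'Swansea', 'Newport'))    -- value set constraint
);
```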
3.2.1 Requisite Features of the Relational Model

During the early stages of the development of relational database systems there were many requisite features identified which a comprehensive relational system should have [KIM79, DAT90]. We shall now examine these features to illustrate the kind of features expected from early relational database management systems. They included support for:
• Recovery from both soft and hard crashes,
• A report generator for formatted display of the results of queries,
• An efficient optimiser to meet the response-time requirements of users,
• User views of the stored database,