Database Normalization
CHAPTER 4: NORMALIZATION
Chapter Objectives
At the end of the chapter, you should be able to:
understand the purpose of normalization;
normalize relations to first, second and third normal form;
merge relations (view integration);
transform E-R diagrams to relations.
Essential Reading
Modern Database Management (4th Edition), Fred R. McFadden & Jeffrey A. Hoffer (1994),
Benjamin/Cummings. [Chapter 6, pages 199-237]
Useful Websites to learn Database and Programming:
http://erwinglobio.wix.com/ittraining
http://ittrainingsolutions.webs.com/
http://erwinglobio.sulit.com.ph/
http://erwinglobio.multiply.com/
Prof. Erwin M. Globio, MSIT 4-1
DB212 CHAPTER 4: NORMALISATION
4.1 Basic Concepts
Normalization is a process for converting complex data structures into simple, stable data
structures.
Why is normalisation necessary?
The database design must be efficient (performance-wise).
The amount of data should be reduced if possible.
The design should be free of update, insertion and deletion anomalies.
The design must comply with rules regarding relational databases.
The design has to show the pertinent relationships between entities.
The design should permit simple retrieval, simplify data maintenance and reduce the need
to restructure data.
Table with repeating groups
  | remove repeating groups
First normal form
  | remove partial dependencies
Second normal form
  | remove transitive dependencies
Third normal form
  | remove remaining anomalies resulting from functional dependencies
Boyce-Codd normal form
  | remove multivalued dependencies
Fourth normal form
  | remove remaining anomalies
Fifth normal form
Figure 4-1: Steps in normalisation
4.1.1 Functional Dependency
Normalisation is based on the analysis of functional dependencies. A functional dependency is a
particular relationship between two attributes. For any relation R, attribute B is
functionally dependent on attribute A if, for every valid instance of A, that value of A
uniquely determines the value of B. This is usually represented by an arrow, as follows:
A --> B
An attribute may be functionally dependent on two (or more) attributes, rather than on a single
attribute. For example, in the following relation:
ORDER (ORDER-NO, PART-NO, NO-ORDERED, PART-DESC, QUOTED-PRICE)
ORDER-NO, PART-NO --> NO-ORDERED, PART-DESC, QUOTED-PRICE
The attribute (or combination of attributes) on the left-hand side of the arrow is called a
determinant.
For example:
CUST-NO --> CUST-NAME, ADDRESS, COMPANY
INVOICE-NO --> INVOICE-DATE, CUST-NO, ORDER-NO
CUST-NO and INVOICE-NO are examples of determinants.
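The definition of a functional dependency can be checked against sample rows. The following is a minimal sketch in Python; the function name fd_holds and the sample customer data are hypothetical and not part of the chapter. Note that sample data can only refute a dependency; whether a dependency truly holds is decided by the business rules, not by the rows that happen to exist.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The first value seen for a determinant must match all later ones.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical sample data for illustration only.
customers = [
    {"CUST-NO": 1, "CUST-NAME": "Ana", "ADDRESS": "Manila"},
    {"CUST-NO": 2, "CUST-NAME": "Ben", "ADDRESS": "Cebu"},
    {"CUST-NO": 3, "CUST-NAME": "Cara", "ADDRESS": "Manila"},
]

# CUST-NO --> CUST-NAME, ADDRESS is consistent with the sample.
cust_no_is_determinant = fd_holds(customers, ["CUST-NO"], ["CUST-NAME", "ADDRESS"])
# ADDRESS --> CUST-NO is refuted: two customers share the address "Manila".
address_is_determinant = fd_holds(customers, ["ADDRESS"], ["CUST-NO"])
```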
4.1.2 Keys
An attribute (or field), K, is the primary key of a table if:
All columns (all the fields in the table) are functionally dependent on K.
Each value of K is unique.
If K is a composite (concatenated) key then it must comply with the following conditions:
No portion of the key is itself a primary key (that is, the key is minimal).
All attributes that make up the key are not null.
4.2 Steps in Normalisation
First normal form (1NF). Any repeating groups have been removed, so that there is a
single value at the intersection of each row and column of the table.
Second normal form (2NF). Any partial functional dependencies have been removed.
Third normal form (3NF). Any transitive dependencies have been removed.
Note: If a relation meets the criteria for 3NF, it also meets criteria for 2NF and 1NF. Most
design problems can be avoided if the relations are in 3NF.
4.2.1 First Normal Form
Example:
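UNF:
  Order-no, Date, Part-no, Qty-ordered, Part-description, Quote-price, Cust-no, Cust-name, Cust-address
  (Part-no, Qty-ordered, Part-description and Quote-price form the repeating group.)
1NF:
  Relation 1: Order-no, Date, Cust-no, Cust-name, Cust-address
  Relation 2: Order-no, Part-no, Qty-ordered, Part-description, Quote-price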
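Removing a repeating group can be sketched in a few lines of Python. This is an illustration only: the nested record layout, the key name "items", the function name to_1nf and the sample values are all assumptions, not taken from the chapter.

```python
def to_1nf(orders):
    """Split orders that carry a repeating group of parts into two flat row sets."""
    order_rows = []
    item_rows = []
    for order in orders:
        # Everything except the repeating group stays with the order.
        order_rows.append({k: v for k, v in order.items() if k != "items"})
        # Each entry of the repeating group becomes its own row,
        # identified by the order number plus the part number.
        for item in order["items"]:
            item_rows.append({"order_no": order["order_no"], **item})
    return order_rows, item_rows


# Hypothetical unnormalised record: one order with two parts.
unf = [
    {
        "order_no": 1001,
        "date": "1994-01-15",
        "cust_no": 1,
        "cust_name": "Ana",
        "cust_address": "Manila",
        "items": [
            {"part_no": 10, "qty_ordered": 5, "part_description": "Logic chip", "quote_price": 10.0},
            {"part_no": 20, "qty_ordered": 2, "part_description": "Memory chip", "quote_price": 3.0},
        ],
    }
]

order_rows, item_rows = to_1nf(unf)
```

After the split, every row holds a single value at the intersection of each row and column, which is the 1NF requirement.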
4.2.2 Second Normal Form
A relation is in 2NF if:
It is in 1NF, and
all non-key attributes are fully functionally dependent on the primary key and not on only
a portion of the primary key.
Steps to transform into 2NF
Identify all functional dependencies in 1NF.
Make each determinant the primary key of a new relation.
Place all attributes that depend on a given determinant in the relation with that
determinant as non-key attributes.
All the functional dependencies in this case are:
ORDER-NO --> DATE, CUST-NO, CUST-NAME, CUST-ADDRESS
PART-NO --> PART-DESC
(ORDER-NO, PART-NO) --> QTY-ORDERED, QUOTED-PRICE
Note: In this case, we say that PART-DESC is only partially functionally dependent on the key
(ORDER-NO, PART-NO), because it is determined by PART-NO alone.
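A partial dependency is one whose determinant is a proper subset of a composite primary key. The check can be sketched in Python as follows; the function name partial_dependencies and the way dependencies are encoded as (determinant, dependents) pairs are assumptions made for this illustration.

```python
def partial_dependencies(attributes, primary_key, fds):
    """List dependencies whose determinant is only part of the primary key."""
    attrs = set(attributes)
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        # Only non-key attributes that actually belong to this relation matter.
        dependents = [a for a in rhs if a in attrs and a not in key]
        # set(lhs) < key is True only for a proper subset of the key.
        if set(lhs) < key and dependents:
            found.append((tuple(lhs), tuple(dependents)))
    return found


# The 1NF relation that holds the order lines, with its composite key.
item_attributes = ["ORDER-NO", "PART-NO", "QTY-ORDERED", "PART-DESC", "QUOTED-PRICE"]
item_key = ["ORDER-NO", "PART-NO"]

# The functional dependencies listed in the text.
fds = [
    (["ORDER-NO"], ["DATE", "CUST-NO", "CUST-NAME", "CUST-ADDRESS"]),
    (["PART-NO"], ["PART-DESC"]),
    (["ORDER-NO", "PART-NO"], ["QTY-ORDERED", "QUOTED-PRICE"]),
]

partials = partial_dependencies(item_attributes, item_key, fds)
```

Only PART-NO --> PART-DESC is reported, so PART-DESC is the attribute that has to move to a relation of its own with PART-NO as the primary key.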
The partial functional dependency in
ITEM (ORDER-NO, PART-NO, QTY-ORDERED, PART-DESC, QUOTED-PRICE)
creates redundancy in that relation, which results in anomalies when the table is updated.
Insertion anomaly. To insert a row into the ITEM table, we must provide the part
description information too.
Deletion anomaly. If we delete a row from the ITEM table, we may lose some PART
information.
Modification anomaly. If a PART's description changes, we must record the change in
multiple rows in the ITEM table.
Example:
1NF:
  Relation 1: Order-no, Date, Cust-no, Cust-name, Cust-address
  Relation 2: Order-no, Part-no, Qty-ordered, Part-description, Quoted-price
2NF:
  Relation 1: Order-no, Date, Cust-no, Cust-name, Cust-address
  Relation 2: Order-no, Part-no, Qty-ordered, Quoted-price
  Relation 3: Part-no, Part-description
Note: A relation that is in first normal form will be in second normal form if any one
of the following conditions applies:
The primary key consists of only one attribute (such as the attribute ORDER-NO in
ORDER).
No nonkey attributes exist in the relation.
Every nonkey attribute is functionally dependent on the full set of primary key attributes.
4.2.3 Third Normal Form
A relation is in 3NF if:
It is in 2NF, and
it has no transitive dependencies.
A transitive dependency exists when A --> B --> C, where B is a non-key attribute. It can be
split into A --> B and B --> C.
Steps to transform into 3NF:
Create one relation for each determinant in the transitive dependency.
Make the determinants the primary keys in their respective relations.
Include as non-key attributes those attributes that depend on the determinant.
In the relation:
ORDER (ORDER-NO, DATE, CUST-NO, CUST-NAME, CUST-ADDRESS)
there is a transitive dependency. That is, one of the non-key attributes can be used to determine
other attributes.
CUST-NO --> CUST-NAME, CUST-ADDRESS
Therefore, there are update anomalies in this table.
Insertion anomaly. A new customer cannot be entered until that customer has placed an
order.
Deletion anomaly. If an order is deleted from the ORDER table, we may lose some
CUSTOMER information.
Modification anomaly. If the address of a customer changes, we have to update all the
associated past order records.
To remove such anomalies, we can decompose the ORDER relation into two relations.
Example:
2NF:
  Relation 1: Order-no, Date, Cust-no, Cust-name, Cust-address
  Relation 2: Order-no, Part-no, Qty-ordered, Quoted-price
  Relation 3: Part-no, Part-description
3NF:
  Relation 1: Order-no, Date, Cust-no
  Relation 2: Order-no, Part-no, Qty-ordered, Quoted-price
  Relation 3: Cust-no, Cust-name, Cust-address
  Relation 4: Part-no, Part-description
Notice that CUST-NO is the primary key of a new relation and is a foreign key in the ORDER
relation.
A foreign key is an attribute that appears as a nonkey attribute in one relation and as a
primary key attribute in another relation.
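Decomposing the ORDER relation amounts to projecting its rows onto two attribute lists and dropping duplicate tuples. The Python sketch below illustrates this; the helper name project and the sample rows are hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs, keeping each distinct tuple once (order preserved)."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Hypothetical 2NF ORDER rows: customer 1 has placed two orders,
# so the name and address are stored twice.
order_2nf = [
    {"ORDER-NO": 1001, "DATE": "1994-01-15", "CUST-NO": 1, "CUST-NAME": "Ana", "CUST-ADDRESS": "Manila"},
    {"ORDER-NO": 1002, "DATE": "1994-01-20", "CUST-NO": 1, "CUST-NAME": "Ana", "CUST-ADDRESS": "Manila"},
    {"ORDER-NO": 1003, "DATE": "1994-02-02", "CUST-NO": 2, "CUST-NAME": "Ben", "CUST-ADDRESS": "Cebu"},
]

# CUST-NO stays in ORDER as a foreign key and becomes the primary key of CUSTOMER.
order_3nf = project(order_2nf, ["ORDER-NO", "DATE", "CUST-NO"])
customer = project(order_2nf, ["CUST-NO", "CUST-NAME", "CUST-ADDRESS"])
```

Each customer's name and address now appear once, so a change of address touches a single CUSTOMER row instead of every past order.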
Therefore the final result is
ORDER (ORDER-NO, DATE, CUST-NO)
ITEMS (ORDER-NO, PART-NO, NO-ORDERED, QUOTED-PRICE)
CUSTOMER (CUST-NO, CUST-NAME, CUST-ADDRESS)
PART (PART-NO, PART-DESC)
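The four relations can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names are lower-case adaptations chosen here (ORDER is a reserved word in SQL, so the table is called orders), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    cust_no      INTEGER PRIMARY KEY,
    cust_name    TEXT,
    cust_address TEXT
);
CREATE TABLE part (
    part_no   INTEGER PRIMARY KEY,
    part_desc TEXT
);
CREATE TABLE orders (
    order_no   INTEGER PRIMARY KEY,
    order_date TEXT,
    cust_no    INTEGER NOT NULL REFERENCES customer (cust_no)
);
CREATE TABLE items (
    order_no     INTEGER NOT NULL REFERENCES orders (order_no),
    part_no      INTEGER NOT NULL REFERENCES part (part_no),
    no_ordered   INTEGER,
    quoted_price REAL,
    PRIMARY KEY (order_no, part_no)
);
""")

# A customer and a part can be stored before any order exists (no insertion anomaly).
conn.execute("INSERT INTO customer VALUES (1, 'Ana', 'Manila')")
conn.execute("INSERT INTO part VALUES (10, 'Logic chip')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1001, "1994-01-15", 1), (1002, "1994-01-20", 1)])
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)",
                 [(1001, 10, 5, 10.0), (1002, 10, 3, 10.0)])

# The description is stored once, so one UPDATE fixes it for every order line.
conn.execute("UPDATE part SET part_desc = 'Logic chip (rev. B)' WHERE part_no = 10")
lines = conn.execute("""
    SELECT i.order_no, p.part_desc
    FROM items AS i JOIN part AS p ON p.part_no = i.part_no
    ORDER BY i.order_no
""").fetchall()
```

The join reassembles the information that the unnormalised table held in one place, without storing the part description redundantly.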
4.3 Transforming E-R Diagram to Relations
4.3.1 Represent Entities
Each entity type in an E-R diagram is transformed into a relation. The primary key of the
entity type becomes the primary key of the corresponding relation.
Taking the following E-R diagram as an example,
[E-R diagram: CUSTOMER (Cust-no, Cust-name, Address) PLACES ORDER (Order-no, Date, Cust-no);
ORDER CONSISTS of PART (Part-no, Part-description); the CONSISTS relationship carries the
attributes Qty-ordered and Quoted-price.]
the ORDER entity is transformed into the following relation:
ORDER ( ORDER-NO, DATE, CUST-NO )
4.3.2 Represent Relationships
Binary 1:N Relationship
A binary one-to-many (1:N) relationship in an E-R diagram is represented by adding the
primary key attribute of the entity on the one side of the relationship as a foreign key
in the relation on the many side.
Thus the CUSTOMER and ORDER entities in the E-R diagram are transformed
into
ORDER ( ORDER-NO, DATE, CUST-NO )
CUSTOMER ( CUST-NO, CUST-NAME, CUST-ADDRESS )
CUST-NO is a foreign key in the ORDER relation but a primary key in the CUSTOMER
relation.
Binary M:N Relationship
For a binary many-to-many relationship between two entity types A and B, create a separate
relation C. The primary key of this C relation is the composite key consisting of the
primary keys for entities A and B.
Thus, for the entity types PART and ORDER, a relation called ORDER-LINE is created
which consists of the two primary keys of PART and ORDER as well as the attributes
QTY-ORDERED and QUOTED-PRICE.
That is,
ORDER-LINE (ORDER-NO, PART-NO, QTY-ORDERED, QUOTED-PRICE)
Unary Relationships
In a unary relationship (recursive relationship), the primary key of that relation is the
same as for the entity type. A foreign key is added to the relation that references the
primary key values. This is known as the recursive foreign key.
Example:
EMPLOYEE (EMP-ID, NAME, BIRTHDATE, MANAGER-ID)
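A recursive foreign key can be exercised with a self-join. The sketch below uses Python's built-in sqlite3 module; the lower-case column names and the three sample employees are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT,
        birthdate  TEXT,
        -- Recursive foreign key: points back at the same table's primary key.
        manager_id INTEGER REFERENCES employee (emp_id)
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Reyes", "1960-03-02", None),  # top of the hierarchy, no manager
        (2, "Santos", "1971-08-19", 1),
        (3, "Lim", "1975-11-30", 1),
    ],
)

# Join the table to itself: e is the employee, m is that employee's manager.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employee AS e
    LEFT JOIN employee AS m ON m.emp_id = e.manager_id
    ORDER BY e.emp_id
""").fetchall()
```

The LEFT JOIN keeps the employee who has no manager; an inner join would drop that row.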
4.4 Merging Relations
As part of the logical design process, normalised relations may have been created from a
number of separate E-R diagrams and other user views. Some of these relations may be
redundant and can be integrated with other relations (view integration).
Example:
Suppose that modelling a user view results in the following 3NF relation:
STUDENT1 (STUDENTID, NAME, ADDRESS, PHONE, GUARDIAN).
Modelling a second user view might result in the following relation:
STUDENT2 (STUDENTID, NAME, ADDRESS, DEPT)
Since these two relations have the same primary key (STUDENTID), they describe the same
entity and may be merged into one relation. Therefore the result of the merging is:
STUDENT (STUDENTID, NAME, ADDRESS, PHONE, GUARDIAN, DEPT)
This reduces duplication of NAME and ADDRESS.
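The merge can be expressed as a union of attribute lists that share a primary key. The following Python sketch illustrates the idea; the function name merge_relations is hypothetical, and the sketch deliberately ignores synonyms and homonyms, which an analyst must resolve before merging.

```python
def merge_relations(primary_key, *relations):
    """Merge attribute lists that share a primary key, keeping first-seen order."""
    merged = []
    for attributes in relations:
        # Relations may only be merged if they are keyed on the same attribute(s).
        if not all(k in attributes for k in primary_key):
            raise ValueError("relations do not share the primary key")
        for attribute in attributes:
            if attribute not in merged:
                merged.append(attribute)
    return merged


student1 = ["STUDENTID", "NAME", "ADDRESS", "PHONE", "GUARDIAN"]
student2 = ["STUDENTID", "NAME", "ADDRESS", "DEPT"]

student = merge_relations(["STUDENTID"], student1, student2)
```

The result lists NAME and ADDRESS once, matching the merged STUDENT relation in the text.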
4.5 Review Questions
1. For each of the following relations, indicate the normal form for that relation. If
the relation is not in 3NF, normalise it.
(Note: Functional dependencies are shown where appropriate.)
a. CLASS (COURSE NO, SECTION NO)
b. CLASS (COURSE NO, SECTION NO, ROOM)
c. CLASS (COURSE NO, SECTION NO, ROOM, CAPACITY)
   ROOM --> CAPACITY
d. CLASS (COURSE NO, SECTION NO, COURSE NAME, ROOM, CAPACITY)
   ROOM --> CAPACITY
   COURSE NO --> COURSE NAME
2. The table below contains sample data for parts and for vendors.
Part No.   Description    Vendor Name      Address      Unit Cost
123        Logic Chip     Fast Chips       Cupertino    10.00
                          Smart Chips      Phoenix       8.00
5678       Memory chip    Fast Chips       Cupertino     3.00
                          Quality Chips    Austin        2.00
                          Smart Chips      Phoenix       5.00
a. Convert this table to a relation (named PART SUPPLIER) in first normal form.
b. List the functional dependencies in PART SUPPLIER and identify a candidate key.
c. Identify each of the following: an insert anomaly, a delete anomaly, and a
modification anomaly in the above 1NF relation.
d. Convert the relation to 3NF.
3. When integrating relations, the database analyst must understand the meaning of the data and
try to resolve problems arising from synonyms and homonyms. Illustrate with
examples (quoting from your project) how such problems can be resolved.
[E-R diagram. Entities: ROOM, PATIENT, ITEM, PHYSICIAN. Relationship labels: May be assigned to,
Is billed for, Attenda. Attribute labels: Location, Accom, Extension, Patient no, Patient name,
Patient address, (Other patient attributes), Item code, Description, Charge, Procedure,
Physician ID, Physician phone.]