2. Normalization
Normalization: the process of converting
complex data structures into simple, stable
data structures.
The main idea is to avoid storing the same
data more than once.
Why normalization?
The relation derived from the user view or data
store will most likely be unnormalized.
The problem usually arises when an existing
system stores its data in an unstructured file, e.g. in MS Excel.
3. The Three Steps of
Normalization
Standard normalization theory defines more
than three normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain/Key Normal Form (DKNF)
However, the first three steps (1NF, 2NF,
3NF) are usually sufficient in practice.
4. I. First Normal Form (1NF)
The official qualifications for 1NF are:
1. Each attribute must have a unique name.
2. Each attribute must have a single value.
3. Rows cannot be duplicated.
4. There are no repeating groups.
Additional:
1. Choose a primary key. The primary key
can be an attribute or combined attributes.
5. Name DOB Course Payment
Sok 11/5/1990 IT 450 Dollars
Sao 4/4/1989 Mgt 400 Dollars
Chan 7/7/1991 IT, Mgt IT: 450 Dollars; Mgt: 400 Dollars
Sok 11/5/1990 Mgt 400 Dollars
Sao 4/4/1989 Tour 1) 200 Dollars; 2) 200 Dollars
1. Each attribute has a unique name -> Good
2. Payment mixes data types (currency & string) -> Bad
3. No rows are duplicated -> Good
4. Course and Payment have repeating groups -> Bad
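The repeating groups flagged above can be removed by emitting one row per course and payment. The following is a minimal sketch, assuming each multi-valued cell is represented as a list of (course, amount) pairs; the names `unnormalized` and `flatten` are illustrative and do not come from the slides.

```python
# Rows from the slide. The list of (course, amount) pairs stands in for the
# multi-valued Course and Payment cells (this representation is an assumption).
unnormalized = [
    ("Sok", "11/5/1990", [("IT", 450)]),
    ("Sao", "4/4/1989", [("Mgt", 400)]),
    ("Chan", "7/7/1991", [("IT", 450), ("Mgt", 400)]),
    ("Sok", "11/5/1990", [("Mgt", 400)]),
    ("Sao", "4/4/1989", [("Tour", 200), ("Tour", 200)]),
]


def flatten(rows):
    """Return one row per (course, amount) pair so every cell holds one value."""
    flat = []
    for name, dob, enrolments in rows:
        for course, amount in enrolments:
            flat.append((name, dob, course, amount))
    return flat


flat_rows = flatten(unnormalized)
for row in flat_rows:
    print(row)
```

Storing the amount as a plain number also removes the mixed currency and string values from Payment.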
6. Name DOB Course Payment ($)
Sok 11/5/1990 IT 450
Sao 4/4/1989 Mgt 400
Chan 7/7/1991 IT 450
Chan 7/7/1991 Mgt 400
Sok 11/5/1990 Mgt 400
Sao 4/4/1989 Tour 200
Sao 4/4/1989 Tour 200
All correct?
Not yet. Choose a primary key.
Name? No. Name has duplicated values.
DOB, Course, or Payment? No. Each one has duplicated values.
Name and DOB? No. The combination still has duplicated values.
Name, DOB, and Course? No. Still duplicated.
All attributes combined? Still no. The last two rows are duplicates.
So what else can we do?
Add a new attribute to serve as the
primary key. Let’s call it ID.
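The search for a primary key on this slide can be sketched as a uniqueness test over every combination of columns. This is an illustrative sketch only; `unique_column_sets` is a hypothetical helper, and it reports any column set whose values are unique in the sample data, not only minimal keys.

```python
from itertools import combinations

columns = ("Name", "DOB", "Course", "Payment")
rows = [
    ("Sok", "11/5/1990", "IT", 450),
    ("Sao", "4/4/1989", "Mgt", 400),
    ("Chan", "7/7/1991", "IT", 450),
    ("Chan", "7/7/1991", "Mgt", 400),
    ("Sok", "11/5/1990", "Mgt", 400),
    ("Sao", "4/4/1989", "Tour", 200),
    ("Sao", "4/4/1989", "Tour", 200),
]


def unique_column_sets(rows, columns):
    """List every combination of columns whose values are unique across rows."""
    found = []
    for size in range(1, len(columns) + 1):
        for idxs in combinations(range(len(columns)), size):
            projected = [tuple(row[i] for i in idxs) for row in rows]
            if len(set(projected)) == len(projected):
                found.append(tuple(columns[i] for i in idxs))
    return found


# No combination works, because the last two rows are identical.
print(unique_column_sets(rows, columns))

# Adding a surrogate ID column makes a single-attribute key available.
rows_with_id = [(i,) + row for i, row in enumerate(rows, start=1)]
columns_with_id = ("ID",) + columns
print(unique_column_sets(rows_with_id, columns_with_id)[0])
```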
7. ID Name DOB Course Payment
1 Sok 11/5/1990 IT 450
2 Sao 4/4/1989 Mgt 400
3 Chan 7/7/1991 IT 450
4 Chan 7/7/1991 Mgt 400
5 Sok 11/5/1990 Mgt 300
6 Sao 4/4/1989 Tour 200
7 Sao 4/4/1989 Tour 200
Now it is completely in 1NF.
Next, check whether it is in 2NF.
8. II. Second Normal Form (2NF)
The official qualifications for 2NF are:
1. A table is already in 1NF.
2. All nonkey attributes are fully dependent
on the primary key.
All partial dependencies are removed and
placed in another table.
9. CourseID Semester Num Student Course Name
IT101 2013-1 25 Database
IT101 2013-2 25 Database
IT102 2013-1 30 Web Prog
IT102 2013-2 35 Web Prog
IT103 2014-1 20 Networking
Assume the table above has a composite primary key (CourseID + Semester).
Course Name depends only on CourseID, which is part of the primary key,
not on the whole primary key (CourseID + Semester). This is called a partial dependency.
Solution:
Move Course Name, together with a copy of CourseID,
into a new table.
10. CourseID Course Name
IT101 Database
IT101 Database
IT102 Web Prog
IT102 Web Prog
IT103 Networking
Done?
Not yet. The new table contains duplicate
rows, so it is not in 1NF.
Remove the duplicate rows too.
CourseID Course Name
IT101 Database
IT102 Web Prog
IT103 Networking
CourseID Semester Num Student
IT101 2013-1 25
IT101 2013-2 25
IT102 2013-1 30
IT102 2013-2 35
IT103 2014-1 20
Relationship: one (1) course to many (M) course-semester rows, linked by CourseID.
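The 2NF split on slides 9 and 10 can be sketched in a few lines. The variable names (`offerings`, `courses`, `course_semesters`) are assumptions for illustration; the data is taken from the slide.

```python
# Original table with composite key (CourseID, Semester).
offerings = [
    ("IT101", "2013-1", 25, "Database"),
    ("IT101", "2013-2", 25, "Database"),
    ("IT102", "2013-1", 30, "Web Prog"),
    ("IT102", "2013-2", 35, "Web Prog"),
    ("IT103", "2014-1", 20, "Networking"),
]

# Course Name depends only on CourseID, so it moves to its own table.
# Building a set removes the duplicate rows in the same step.
courses = sorted({(course_id, name) for course_id, _, _, name in offerings})

# What remains depends on the whole key (CourseID, Semester).
course_semesters = [(course_id, sem, num) for course_id, sem, num, _ in offerings]

# Joining the two tables on CourseID gives back the original rows.
name_of = dict(courses)
rebuilt = [(cid, sem, num, name_of[cid]) for cid, sem, num in course_semesters]

print(courses)
print(rebuilt == offerings)
```

The final comparison is a quick check that the decomposition loses no information.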
11. III. Third Normal Form (3NF)
The official qualifications for 3NF are:
1. A table is already in 2NF.
2. Nonprimary key attributes do not depend
on other nonprimary key attributes (i.e. no
transitive dependencies)
All transitive dependencies are removed and
placed in another table.
12. StudyID Course Name Teacher Name Teacher Tel
1 Database Sok Piseth 012 123 456
2 Database Sao Kanha 0977 322 111
3 Web Prog Chan Veasna 012 412 333
4 Web Prog Chan Veasna 012 412 333
5 Networking Pou Sambath 077 545 221
Assume the table above has the primary key StudyID.
Teacher Tel is a nonkey attribute, and
Teacher Name is also a nonkey attribute.
But Teacher Tel depends on Teacher Name.
This is called a transitive dependency.
Solution:
Move Teacher Name and Teacher Tel together
into a new table.
13. Teacher Name Teacher Tel
Sok Piseth 012 123 456
Sao Kanha 0977 322 111
Chan Veasna 012 412 333
Chan Veasna 012 412 333
Pou Sambath 077 545 221
Done?
Not yet. The new table contains duplicate
rows, so it is not in 1NF.
Remove the duplicate rows
and add a primary key.
Teacher Name Teacher Tel
Sok Piseth 012 123 456
Sao Kanha 0977 322 111
Chan Veasna 012 412 333
Pou Sambath 077 545 221
Note about primary key:
- In theory, you can choose
Teacher Name to be a primary key.
- But in practice, names are not guaranteed to be unique, so you should add
Teacher ID as the primary key.
T.ID Teacher Name Teacher Tel
T1 Sok Piseth 012 123 456
T2 Sao Kanha 0977 322 111
T3 Chan Veasna 012 412 333
T4 Pou Sambath 077 545 221
StudyID Course Name T.ID
1 Database T1
2 Database T2
3 Web Prog T3
4 Web Prog T3
5 Networking T4
Relationship: one (1) teacher to many (M) study rows, linked by T.ID.
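The 3NF split on slides 12 and 13 can be sketched as follows. The generated IDs (T1, T2, ...) follow the slide; the variable names and the order-of-first-appearance numbering are assumptions made for this illustration.

```python
studies = [
    (1, "Database", "Sok Piseth", "012 123 456"),
    (2, "Database", "Sao Kanha", "0977 322 111"),
    (3, "Web Prog", "Chan Veasna", "012 412 333"),
    (4, "Web Prog", "Chan Veasna", "012 412 333"),
    (5, "Networking", "Pou Sambath", "077 545 221"),
]

teacher_ids = {}   # (name, tel) -> generated teacher ID
teachers = []      # new Teacher table: (T.ID, Teacher Name, Teacher Tel)
study_rows = []    # remaining Study table: (StudyID, Course Name, T.ID)

for study_id, course, t_name, t_tel in studies:
    key = (t_name, t_tel)
    if key not in teacher_ids:
        # First time this teacher appears: assign the next surrogate ID.
        teacher_ids[key] = f"T{len(teacher_ids) + 1}"
        teachers.append((teacher_ids[key], t_name, t_tel))
    study_rows.append((study_id, course, teacher_ids[key]))

print(teachers)
print(study_rows)
```

Each teacher's telephone number is now stored once, so changing it touches a single row.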
14. ID Name DOB Course Payment
1 Sok 11/5/1990 IT 450
2 Sao 4/4/1989 Mgt 400
3 Chan 7/7/1991 IT 450
4 Chan 7/7/1991 Mgt 400
5 Sok 11/5/1990 Mgt 300
6 Sao 4/4/1989 Tour 200
7 Sao 4/4/1989 Tour 200
What about this table?
In this table, 2NF is satisfied automatically because the primary key
is a single attribute, not a combination of attributes, so no partial dependency can exist.
Therefore, you can skip 2NF and move to 3NF.
In 3NF, you must remove transitive dependencies.
Name and DOB describe the student, not the individual row identified by ID. So move them to their own table.
Course describes the course, not the individual row. So move it to its own table; Payment stays with the row that links a student to a course.
15. Student
ID Name DOB
S1 Sok 11/5/1990
S2 Chan 7/7/1991
S3 Sao 4/4/1989
Course
CourseID Course
C1 IT
C2 Mgt
C3 Tour
Payment
PID SID Course Payment
1 S1 C1 $450
2 S3 C2 $400
3 S2 C1 $450
4 S2 C2 $400
5 S1 C2 $300
6 S3 C3 $200
7 S3 C3 $200
Relationships: Student (1) to Payment (M) via SID; Course (1) to Payment (M) via Course.
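The three-table design can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column names (`sid`, `course_id`, `amount`) are illustrative, and the payment rows are derived from the slide 14 table, so row 3 is Chan's IT payment and rows 6 and 7 are Sao's Tour payments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (sid TEXT PRIMARY KEY, name TEXT NOT NULL, dob TEXT NOT NULL);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, course TEXT NOT NULL);
CREATE TABLE payment (
    pid       INTEGER PRIMARY KEY,
    sid       TEXT NOT NULL REFERENCES student(sid),
    course_id TEXT NOT NULL REFERENCES course(course_id),
    amount    INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("S1", "Sok", "11/5/1990"), ("S2", "Chan", "7/7/1991"), ("S3", "Sao", "4/4/1989"),
])
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("C1", "IT"), ("C2", "Mgt"), ("C3", "Tour"),
])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", [
    (1, "S1", "C1", 450), (2, "S3", "C2", 400), (3, "S2", "C1", 450),
    (4, "S2", "C2", 400), (5, "S1", "C2", 300), (6, "S3", "C3", 200),
    (7, "S3", "C3", 200),
])

# Two JOINs rebuild the single wide table from slide 14.
rebuilt = conn.execute("""
    SELECT p.pid, s.name, s.dob, c.course, p.amount
    FROM payment AS p
    JOIN student AS s ON s.sid = p.sid
    JOIN course  AS c ON c.course_id = p.course_id
    ORDER BY p.pid
""").fetchall()
for row in rebuilt:
    print(row)
```

The query also shows the cost of the split: reading the original view back now takes two JOINs.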
17. Stop at 3NF
The most commonly used normal forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
The highest level of normalization is not always desirable:
More JOINs are required
Data retrieval can become slower, with higher response
time
For most business database design purposes, 3NF is
as high as we need to go in the normalization process
18. Normalization in Real-World
When you create a new table in a database
tool, e.g. MS Access, SQL Server, MySQL, or
Oracle, you won’t need all the steps.
These tools already help you satisfy much of
1NF, for example by requiring uniquely named columns.
2NF only comes into play when the primary key
combines attributes, e.g. StudentName + DOB.
But such a key is impractical.
Mostly, you only apply 3NF, because it
removes all transitive dependencies.
21. Functional Dependencies
Functional dependency can be divided
into two types:
Full functional dependency/Partial
dependency (PD)
• Used to transform 1NF → 2NF
Transitive dependency (TD)
• Used to transform 2NF → 3NF
22. Functional Dependencies
Multivalued attributes (or repeating groups): non-key
attributes, or groups of non-key attributes, whose
values are not uniquely identified by (that is, not
functionally dependent on, directly or indirectly)
the value of the primary key or any part of it.
Relational Schema
STUDENT(Stud_ID, Name, (Course_ID, Units))
23. Functional Dependencies
Partial dependency: when a non-key attribute is
determined by a part, but not the whole, of a
composite primary key (so the primary key must be a
composite key).
Cust_ID → Name
25. Functional Dependencies
Consider a relation R with attributes A and B, where attribute B is
functionally dependent on attribute A. Let us say A is the PK of R.
Another way to describe the relationship between A and B is to say
that “A functionally determines B”.
R(A, B)
A → B
B is functionally
dependent on A
26. Functional Dependencies
When a functional dependency exists, the attribute or
group of attributes on the left-hand side of the
arrow is called the determinant.
A → B
A functionally
determines B
27. Functional Dependencies
staff
staffNO sName position salary branchNO
S21 Johan Manager 3000 B005
S37 Ana Assistant 1200 B003
S14 Daud Supervisor 1800 B003
S9 Mary Assistant 900 B007
S5 Siti Manager 2400 B003
S41 Jani Assistant 900 B005
branch
branchNO bAddress
B005 123, Kepong
B007 456, Nilai
B003 789, PTP
Determinant: staffNO (staff), branchNO (branch)
28. Functional Dependencies
Consider the attributes staffNO and position of the
staff relation.
For a specific staffNO (S21), we can determine the
position of that member of staff as Manager.
staffNO functionally determines position.
Staff number (S21) → Position (Manager)
staffNO → position
position is functionally
dependent on staffNO
29. Functional Dependencies
However, the following illustrates that the opposite is
not true, as position does not functionally
determine staffNO.
A member of staff holds one position; however, there
may be several members of staff with the same
position.
Position (Manager) → staff number (S21), staff number (S5)
position ↛ staffNO
staffNO is not functionally
dependent on position
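The two claims about staffNO and position can be checked against the sample staff data. This is a minimal sketch; `holds` is a hypothetical helper, and a check on sample rows can only refute a dependency, never prove that it holds for all possible data.

```python
staff_columns = ("staffNO", "sName", "position", "salary", "branchNO")
staff = [
    ("S21", "Johan", "Manager", 3000, "B005"),
    ("S37", "Ana", "Assistant", 1200, "B003"),
    ("S14", "Daud", "Supervisor", 1800, "B003"),
    ("S9", "Mary", "Assistant", 900, "B007"),
    ("S5", "Siti", "Manager", 2400, "B003"),
    ("S41", "Jani", "Assistant", 900, "B005"),
]


def holds(rows, columns, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    lhs_idx = [columns.index(c) for c in lhs]
    rhs_idx = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        value = tuple(row[i] for i in rhs_idx)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(staff, staff_columns, ["staffNO"], ["position"]))  # True
print(holds(staff, staff_columns, ["position"], ["staffNO"]))  # False
```

The second check fails because Manager maps to both S21 and S5.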
30. Functional Dependencies
Partial Dependencies:
Full functional dependency: if A and B are
attributes of a relation, B is fully functionally dependent on
A if B is functionally dependent on A, but not on any proper
subset of A.
staff(staffNO, sName, position, salary, branchNO)
(staffNO, sName) → branchNO
True: each value of (staffNO, sName) is associated with
a single value of branchNO.
However, branchNO is also functionally dependent
on staffNO alone, a proper subset of (staffNO, sName),
so this is a partial dependency, not a full one.
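The definition of full functional dependency can be tested on the staff data by checking every proper subset of the determinant. This is an illustrative sketch; `holds` and `is_full_dependency` are hypothetical helpers, and the result reflects only the sample rows.

```python
from itertools import combinations

staff_columns = ("staffNO", "sName", "position", "salary", "branchNO")
staff = [
    ("S21", "Johan", "Manager", 3000, "B005"),
    ("S37", "Ana", "Assistant", 1200, "B003"),
    ("S14", "Daud", "Supervisor", 1800, "B003"),
    ("S9", "Mary", "Assistant", 900, "B007"),
    ("S5", "Siti", "Manager", 2400, "B003"),
    ("S41", "Jani", "Assistant", 900, "B005"),
]


def holds(rows, columns, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    lhs_idx = [columns.index(c) for c in lhs]
    rhs_idx = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        value = tuple(row[i] for i in rhs_idx)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_dependency(rows, columns, lhs, rhs):
    """True if lhs determines rhs and no proper, non-empty subset of lhs does."""
    if not holds(rows, columns, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, columns, list(subset), rhs):
                return False  # a part of lhs is enough: partial dependency
    return True


print(is_full_dependency(staff, staff_columns, ["staffNO", "sName"], ["branchNO"]))  # False
print(is_full_dependency(staff, staff_columns, ["staffNO"], ["branchNO"]))           # True
```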
32. Normalization Process
A formal technique for analyzing relations based on their primary key
(or candidate keys) and functional dependencies.
The technique is executed as a series of steps (stages). Each step
corresponds to a specific normal form that has specific
characteristics.
As normalization proceeds, the relations become progressively more
restricted (stronger) in format and less vulnerable to anomalies.
Data redundancy decreases at each stage: 0NF/UNF, 1NF, 2NF, 3NF.
33. Normalization Process
UNF
-Has repeating groups
-PK not defined
1NF
-No repeating groups (repeating groups removed)
-PK defined (may be a composite PK)
-Test for partial dependency; if one exists, remove it to reach 2NF
2NF
-No repeating groups
-PK defined
-No partial dependency
-Test for transitive dependency; if one exists, remove it to reach 3NF
3NF
-No repeating groups
-PK defined
-No partial dependency
-No transitive dependency
Example dependencies:
(a, b → x, y)
(a → c, d)
(b → z)
(c → d)
Table counts shown, from UNF to 3NF: 1 table; 1 or 2 tables; 2 or 3 tables (more than 1 table); 3 or 4 tables.
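The partial and transitive dependency tests can be sketched in code using the example dependencies on this slide. The sketch reads them as a, b → x, y; a → c, d; b → z; c → d with (a, b) as the composite primary key; that reading, and the helper names `closure` and `classify`, are assumptions made for illustration.

```python
# Each dependency is (determinant, dependent attributes).
fds = [
    (frozenset("ab"), frozenset("xy")),
    (frozenset("a"), frozenset("cd")),
    (frozenset("b"), frozenset("z")),
    (frozenset("c"), frozenset("d")),
]
primary_key = frozenset("ab")


def closure(attrs, fds):
    """All attributes that can be derived from attrs using the dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify(lhs, key):
    """Label a dependency by how its determinant relates to the primary key."""
    if lhs == key:
        return "full"        # whole key: allowed in 2NF and 3NF
    if lhs < key:
        return "partial"     # part of the key: removed when moving to 2NF
    if not (lhs & key):
        return "transitive"  # non-key determinant: removed when moving to 3NF
    return "other"


kinds = [classify(lhs, primary_key) for lhs, _ in fds]
print(kinds)
print(sorted(closure(primary_key, fds)))
```

Under this reading, the closure of (a, b) covers every attribute, which is consistent with (a, b) being the key.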
34. End of Chapter
Re-edited by:
Oum Saokosal
Master of Engineering in Information
Systems,
Jeonju University, South Korea
012-252-752
oum_saokosal@yahoo.com