Normalization 1
Introduction
In this exercise we are looking at the
optimisation of data structure. The example
system we are going to use as a model is a
database to keep track of employees of an
organisation working on different projects.
Objectives
By the end of the exercise you should be able to:
    Show understanding of why we normalize
    data
    Give formal definitions of 1NF, 2NF & 3NF
    Apply the process of normalization to your
    own work
Normalization 2
The data we would want to store could be
expressed as:
Project No   Project Name             Employee No   Employee Name     Rate Category   Rate
1203         Madagascar travel site   11            Jessica Brookes   A               £90
                                      12            Andy Evans        B               £80
                                      16            Max Fat           C               £70
1506         Online estate agency     11            Jessica Brookes   A               £90
                                      17            Alex Branton      B               £80
Normalization 3
Three problems become apparent with our
current model:
Tables in an RDBMS use a simple grid structure
Each project has a set of employees, so we can’t
even enter the data into a table in this format.
How would you construct a query to find the
employees working on each project?
All tables in an RDBMS need a key
Each record in an RDBMS must have a unique
identity. Which field should be the primary key?
Data entry should be kept to a minimum
Our main problem is that each project contains
repeating groups, which lead to redundancy and
inconsistency.
Normalization 4
We could place the data into a table called:
tblProjects_Employees
Project No   Project Name             Employee No   Employee Name     Rate Category   Rate
1203         Madagascar travel site   11            Jessica Brookes   A               £90
1203         Madagascar travel site   12            Andy Evans        B               £80
1203         Madagascat travel site   16            Max Fat           C               £70
1506         Online estate agency     11            Jessica Brookes   A               £90
1506         Online estate agency     17            Alex Branton      B               £70
Normalization 5
Addressing our three problems:
Tables in an RDBMS use a simple grid structure
We can find the members of each project using a
simple SQL or QBE search on either Project
Number or Project Name.
All tables in an RDBMS need a key
We CAN uniquely identify each record. Although
no single field can act as the primary key, we can
combine two or more fields to create a composite key.
Data entry should be kept to a minimum
Our main problem, that each project contains
repeating groups, still remains. To create an
RDBMS we have to eliminate these groups or
sets.
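The composite key and the project search described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column names, the helper functions members_by_number, members_by_name and try_insert, and the choice of SQLite are assumptions, not part of the original exercise. The rows are copied from the flat table on the previous slide, spelling mistake included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tblProjects_Employees (
        project_no    INTEGER NOT NULL,
        project_name  TEXT    NOT NULL,
        employee_no   INTEGER NOT NULL,
        employee_name TEXT    NOT NULL,
        rate_category TEXT    NOT NULL,
        rate          INTEGER NOT NULL,          -- rate in pounds
        PRIMARY KEY (project_no, employee_no)    -- composite key
    )
    """
)

# Rows as shown in the flat table (hypothetical column names).
rows = [
    (1203, "Madagascar travel site", 11, "Jessica Brookes", "A", 90),
    (1203, "Madagascar travel site", 12, "Andy Evans", "B", 80),
    (1203, "Madagascat travel site", 16, "Max Fat", "C", 70),
    (1506, "Online estate agency", 11, "Jessica Brookes", "A", 90),
    (1506, "Online estate agency", 17, "Alex Branton", "B", 70),
]
conn.executemany(
    "INSERT INTO tblProjects_Employees VALUES (?, ?, ?, ?, ?, ?)", rows
)


def members_by_number(project_no):
    """Names of the employees on a project, searched by project number."""
    cursor = conn.execute(
        "SELECT employee_name FROM tblProjects_Employees "
        "WHERE project_no = ? ORDER BY employee_no",
        (project_no,),
    )
    return [name for (name,) in cursor]


def members_by_name(project_name):
    """Names of the employees on a project, searched by project name."""
    cursor = conn.execute(
        "SELECT employee_name FROM tblProjects_Employees "
        "WHERE project_name = ? ORDER BY employee_no",
        (project_name,),
    )
    return [name for (name,) in cursor]


def try_insert(row):
    """Insert one row; return False if its composite key already exists."""
    try:
        conn.execute(
            "INSERT INTO tblProjects_Employees VALUES (?, ?, ?, ?, ?, ?)", row
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

Searching by number finds all three members of project 1203, while searching by the correctly spelled name finds only two, because the third row holds the misspelled name. Inserting a second row for project 1203 and employee 11 is rejected by the composite key.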
Normalization 6
Did you notice that Madagascar was misspelled
in the 3rd record? Imagine trying to spot this error
in thousands of records. By using this structure
(flat filing) we create:
Redundant data
Duplicate copies of data: we would have to key
in “Madagascar travel site” 3 times. Not only do we
waste storage space, we also risk creating:
Inconsistent data
The more often we have to key in data, the more
likely we are to make mistakes (see IT01 notes
on the importance of accurate data).
Normalization 7
The solution is simply to take out the duplication.
We do this by:
Identifying a key
In this case we can use the project no and
employee no to uniquely identify each row

Project No   Employee No   Unique Identifier
1203         11            120311
1203         12            120312
1203         16            120316

Note: Project 1506 is not shown for reasons of space
Normalization 8
We look for partial dependencies
We look for fields that depend on only part of the
key and not the entire key.
Field           Depends on Project No   Depends on Employee No
Project Name    ✓
Employee Name                           ✓
Rate Category                           ✓
Rate                                    ✓

We remove partial dependencies
The fields listed are only dependent on part of
the key so we remove them from the table.
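The search for partial dependencies can be sketched as a small check over sample rows. This is a minimal illustration, assuming the field names below and the correctly spelled data from the first table; depends_on and partial_dependencies are hypothetical helpers. Note that sample data can only disprove a dependency. Whether a dependency really holds is a decision about what the data means.

```python
KEY_PARTS = ["project_no", "employee_no"]
NON_KEY_FIELDS = ["project_name", "employee_name", "rate_category", "rate"]

# Sample rows with the project name spelled correctly (assumed field names).
rows = [
    {"project_no": 1203, "project_name": "Madagascar travel site",
     "employee_no": 11, "employee_name": "Jessica Brookes",
     "rate_category": "A", "rate": 90},
    {"project_no": 1203, "project_name": "Madagascar travel site",
     "employee_no": 12, "employee_name": "Andy Evans",
     "rate_category": "B", "rate": 80},
    {"project_no": 1203, "project_name": "Madagascar travel site",
     "employee_no": 16, "employee_name": "Max Fat",
     "rate_category": "C", "rate": 70},
    {"project_no": 1506, "project_name": "Online estate agency",
     "employee_no": 11, "employee_name": "Jessica Brookes",
     "rate_category": "A", "rate": 90},
    {"project_no": 1506, "project_name": "Online estate agency",
     "employee_no": 17, "employee_name": "Alex Branton",
     "rate_category": "B", "rate": 80},
]


def depends_on(rows, determinant, dependent):
    """True if every value of `determinant` goes with one value of `dependent`."""
    seen = {}
    for row in rows:
        # setdefault returns the value stored first for this determinant.
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True


def partial_dependencies(rows):
    """Map each non-key field to the key parts that determine it on their own."""
    return {
        field: [part for part in KEY_PARTS if depends_on(rows, part, field)]
        for field in NON_KEY_FIELDS
    }
```

On this sample, partial_dependencies(rows) reports that the project name depends on the project number alone, and that the employee name, rate category and rate depend on the employee number alone, which matches the table on this slide.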
Normalization 9
We create new tables
Clearly we can’t take the data out and leave it out
of our database. We put it into a new table
consisting of the field that has the partial
dependency and the field it is dependent on.
Looking at our example we will need to create
two new tables:
Dependent On   Partially Dependent
Project No     Project Name
Employee No    Employee Name, Rate Category, Rate
Normalization 10
We now have 3 tables:
tblProjects_Employees
Project No   Employee No
1203         11
1203         12
1203         16
1506         11
1506         17

tblProjects
Project No   Project Name
1203         Madagascar travel site
1506         Online estate agency

tblEmployees
Employee No   Employee Name     Rate Category   Rate
11            Jessica Brookes   A               £90
12            Andy Evans        B               £80
16            Max Fat           C               £70
17            Alex Branton      A               £80
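Splitting the flat table into these three tables can be sketched in plain Python. This is an illustration under assumptions: the tuple layout and the decompose helper are hypothetical, and the employee rates follow the tblEmployees table on this slide.

```python
# Flat rows: (project_no, project_name, employee_no, employee_name,
#             rate_category, rate). Layout is an assumption for this sketch.
flat_rows = [
    (1203, "Madagascar travel site", 11, "Jessica Brookes", "A", 90),
    (1203, "Madagascar travel site", 12, "Andy Evans", "B", 80),
    (1203, "Madagascar travel site", 16, "Max Fat", "C", 70),
    (1506, "Online estate agency", 11, "Jessica Brookes", "A", 90),
    (1506, "Online estate agency", 17, "Alex Branton", "A", 80),
]


def decompose(flat_rows):
    """Split flat rows into projects, employees and the linking table."""
    projects = {}      # project_no -> project_name
    employees = {}     # employee_no -> (employee_name, rate_category, rate)
    assignments = []   # (project_no, employee_no) pairs, the composite key
    for project_no, project_name, employee_no, name, category, rate in flat_rows:
        # Each project and employee is stored once; if the flat rows
        # disagree, the last value seen wins.
        projects[project_no] = project_name
        employees[employee_no] = (name, category, rate)
        assignments.append((project_no, employee_no))
    return projects, employees, assignments


projects, employees, assignments = decompose(flat_rows)
```

Five flat rows become two project records, four employee records and five key pairs. The project name is now held once per project rather than once per employee.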
Normalization 11
Looking at the project table, note the reduction in:
Redundant data
The text “Madagascar travel site” is stored once
only, not for each occurrence of an employee
working on the project.
Inconsistent data
Because we only store the project name once, we
are less likely to enter “Madagascat”.

The link is made through the key, Project No.
Obviously there is no way to remove this
duplication without losing the relation altogether,
but it is far more efficient storing a short number
repeatedly, than a large chunk of text.
Normalization 12
Our model has improved but is still far from
perfect. There is still room for inconsistency.
Employee No   Employee Name     Rate Category   Rate
11            Jessica Brookes   A               £90
12            Andy Evans        B               £80
16            Max Fat           C               £70
17            Alex Branton      A               £80

Alex Branton is being paid £80 while Jessica
Brookes gets £90, but they’re in the same rate
category!

Again, we have stored redundant data: the rate
category to hourly rate relationship is stored in
full on every row, i.e. we have to key in both the
rate category AND the hourly rate.
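A check for this kind of inconsistency can be sketched in a few lines of Python. The tuple layout and the conflicting_categories helper are assumptions made for illustration; the data is the employee table on this slide.

```python
from collections import defaultdict

# (employee_no, employee_name, rate_category, hourly_rate in pounds)
employees = [
    (11, "Jessica Brookes", "A", 90),
    (12, "Andy Evans", "B", 80),
    (16, "Max Fat", "C", 70),
    (17, "Alex Branton", "A", 80),
]


def conflicting_categories(employees):
    """Return the rate categories that are stored with more than one rate."""
    rates_by_category = defaultdict(set)
    for _employee_no, _name, category, rate in employees:
        rates_by_category[category].add(rate)
    return {
        category: sorted(rates)
        for category, rates in rates_by_category.items()
        if len(rates) > 1
    }
```

Here conflicting_categories(employees) returns {"A": [80, 90]}: category A has been keyed in with two different rates.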
Normalization 13
The solution, as before, is to remove this excess
data to another table. We do this by:
Looking for Transitive Relationships
These are relationships where a non-key attribute
is dependent on another non-key attribute. Hourly
rate depends on rate category, BUT rate
category is not a key.
Removing Transitive Relationships
As before we remove the redundant data and
place it in a separate table. In this case we create
a new table tblRates and add the fields rate
category and hourly rate. We then delete hourly
rate from the employees table.
Normalization 14
We now have 4 tables:
tblProjects_Employees
Project No   Employee No
1203         11
1203         12
1203         16
1506         11
1506         17

tblProjects
Project No   Project Name
1203         Madagascar travel site
1506         Online estate agency

tblEmployees
Employee No   Employee Name     Rate Category
11            Jessica Brookes   A
12            Andy Evans        B
16            Max Fat           C
17            Alex Branton      A

tblRates
Rate Category   Rate
A               £90
B               £80
C               £70
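The four tables could be declared and joined back together as follows. This is a sketch using Python's built-in sqlite3 module; the column names, the foreign key declarations and the flat_view helper are assumptions, since the slides only name the tables and their fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript(
    """
    CREATE TABLE tblRates (
        rate_category TEXT PRIMARY KEY,
        hourly_rate   INTEGER NOT NULL
    );
    CREATE TABLE tblEmployees (
        employee_no   INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        rate_category TEXT NOT NULL REFERENCES tblRates (rate_category)
    );
    CREATE TABLE tblProjects (
        project_no   INTEGER PRIMARY KEY,
        project_name TEXT NOT NULL
    );
    CREATE TABLE tblProjects_Employees (
        project_no  INTEGER NOT NULL REFERENCES tblProjects (project_no),
        employee_no INTEGER NOT NULL REFERENCES tblEmployees (employee_no),
        PRIMARY KEY (project_no, employee_no)
    );
    """
)

# Parent tables are filled first so the foreign keys are satisfied.
conn.executemany("INSERT INTO tblRates VALUES (?, ?)",
                 [("A", 90), ("B", 80), ("C", 70)])
conn.executemany("INSERT INTO tblEmployees VALUES (?, ?, ?)",
                 [(11, "Jessica Brookes", "A"), (12, "Andy Evans", "B"),
                  (16, "Max Fat", "C"), (17, "Alex Branton", "A")])
conn.executemany("INSERT INTO tblProjects VALUES (?, ?)",
                 [(1203, "Madagascar travel site"),
                  (1506, "Online estate agency")])
conn.executemany("INSERT INTO tblProjects_Employees VALUES (?, ?)",
                 [(1203, 11), (1203, 12), (1203, 16), (1506, 11), (1506, 17)])


def flat_view():
    """Rebuild the original flat rows by joining the four tables."""
    return conn.execute(
        """
        SELECT pe.project_no, p.project_name, e.employee_no,
               e.employee_name, e.rate_category, r.hourly_rate
        FROM tblProjects_Employees AS pe
        JOIN tblProjects  AS p ON p.project_no = pe.project_no
        JOIN tblEmployees AS e ON e.employee_no = pe.employee_no
        JOIN tblRates     AS r ON r.rate_category = e.rate_category
        ORDER BY pe.project_no, e.employee_no
        """
    ).fetchall()
```

Nothing is lost by the split: flat_view() returns five rows in the shape of the original flat table, and every employee in category A picks up the single £90 rate held in tblRates.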
Normalization 15
Again, we have cut down on redundancy, and it is
now impossible to record Rate category A against
anything but £90.

Our model is now in its most efficient format
with:

Minimum REDUNDANCY

Minimum INCONSISTENCY
Normalization 16
What we have formally done is NORMALIZE the
database:
At the beginning we had a data structure:
Project No
Project Name
Employee No (1n)
Employee name (1n)
Rate Category (1n)
Hourly Rate (1n)
(1n indicates there are many occurrences of the
field – it is a repeating group).
To begin the normalization process we start by
moving from zero normal form to 1st normal form.
Normalization 17
The definition of 1st normal form
There are no repeating groups
All the key attributes are defined
All attributes are dependent on the primary key
So far, we have no keys, and there are repeating
groups. So we remove the repeating groups and
define the keys and are left with:
Employee Project table
Project number – part of key
Project name
Employee number – part of key
Employee name
Rate category
Hourly rate
This table is in first normal form (1NF)
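Removing the repeating group can be sketched as flattening a nested structure into one row per project and employee. The dictionary layout and the helper names to_first_normal_form and key_is_unique are assumptions made for illustration; the data is the first table in this exercise.

```python
# Unnormalised form: each project carries a repeating group of employees.
# Employee tuples: (employee_no, employee_name, rate_category, hourly_rate).
unnormalised = [
    {"project_no": 1203, "project_name": "Madagascar travel site",
     "employees": [(11, "Jessica Brookes", "A", 90),
                   (12, "Andy Evans", "B", 80),
                   (16, "Max Fat", "C", 70)]},
    {"project_no": 1506, "project_name": "Online estate agency",
     "employees": [(11, "Jessica Brookes", "A", 90),
                   (17, "Alex Branton", "B", 80)]},
]


def to_first_normal_form(projects):
    """Emit one flat row per (project, employee) pair."""
    rows = []
    for project in projects:
        for employee_no, name, category, hourly_rate in project["employees"]:
            rows.append((project["project_no"], project["project_name"],
                         employee_no, name, category, hourly_rate))
    return rows


def key_is_unique(rows):
    """True if (project_no, employee_no) identifies every row."""
    keys = [(row[0], row[2]) for row in rows]
    return len(keys) == len(set(keys))


rows_1nf = to_first_normal_form(unnormalised)
```

The two nested project records become five flat rows, and key_is_unique(rows_1nf) confirms that project number plus employee number works as the key.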
Normalization 18
A table is in 2nd normal form if
It’s already in first normal form
It includes no partial dependencies (where an
attribute is dependent on only part of the key)

We look through the fields:
Project name is dependent only on project
number
Employee name, rate category and hourly rate
are dependent only on employee number.

So we remove them, and place these fields in a
separate table, with the key being that part of the
original key they are dependent on. We are left
with the following three tables:
Normalization 19
Employee Project table
Project number – part of key
Employee number – part of key

Employee table
Employee number - primary key
Employee name
Rate category
Hourly rate

Project table
Project number - primary key
Project name
The tables are now in 2nd normal form (2NF). Are
they in 3rd normal form?
Normalization 20
A table is in 3rd normal form if
It’s already in second normal form
It includes no transitive dependencies (where a
non-key attribute is dependent on another non-
key attribute)

We can narrow our search down to the Employee
table, which is the only one with more than one
non-key attribute. Employee name is not
dependent on either Rate category or Hourly
rate, and the same applies to Rate category, but
Hourly rate is dependent on Rate category. So,
as before, we remove it, placing it in its own
table, with the attribute it was dependent on as
key, as follows:
Normalization 21
Employee project table
Project number – part of key
Employee number – part of key
Employee table
Employee number - primary key
Employee name
Rate Category
Rate table
Rate category - primary key
Hourly rate
Project table
Project number - primary key
Project name
These tables are all now in 3rd normal form, and
ready to be implemented.
Normalization 22
There are other normal forms - Boyce-Codd
normal form, and 4th normal form, but these are
very rarely used for business applications. In
most cases, tables in 3rd normal form are already
in these normal forms anyway.

Before you start normalizing everything, a word
of warning. No process is better than common
sense. Take a look at this example.
Customer table
Customer Number - primary key
Name
Address
Postcode
Town
Normalization 23
What normal form is this table in? Giving it a
quick glance, we see:

no repeating groups, and a primary key defined,
so it's at least in 1st normal form.
There's only one key, so we needn't even look
for partial dependencies, so it's at least in 2nd
normal form.
How about transitive dependencies? Well, it
looks like Town might be determined by
Postcode. And in most parts of the world that's
usually the case.

So we should remove Town, and place it in a
separate table, with Postcode as the key?
Normalization 24
No! Although this table is not technically in 3rd
normal form, removing this information is not
worth it. Creating more tables increases the load
slightly, slowing processing down. This is often
counteracted by the reduction in table sizes, and
redundant data. But in this case, where the town
would almost always be referenced as part of the
address, it isn't worth it. Perhaps a company that
uses the data to produce regular mailing lists of
thousands of customers should normalize fully.
It always comes down to how the data is going to
be used. Normalization is just a helpful process
that usually results in the most efficient table
structure, and not a rule for database design.

Weitere ähnliche Inhalte

Andere mochten auch

Introduction to Dreamweaver
Introduction to DreamweaverIntroduction to Dreamweaver
Introduction to DreamweaverSarah Bombich
 
Penerbitan video korporat
Penerbitan video korporatPenerbitan video korporat
Penerbitan video korporatHazrul Halim
 
DHTML - Events & Buttons
DHTML - Events  & ButtonsDHTML - Events  & Buttons
DHTML - Events & ButtonsDeep Patel
 
Dreamweaver - Introduction AND WALKTHROUGH
Dreamweaver - Introduction AND WALKTHROUGHDreamweaver - Introduction AND WALKTHROUGH
Dreamweaver - Introduction AND WALKTHROUGHSahil Bansal
 
Database normalization
Database normalizationDatabase normalization
Database normalizationJignesh Jain
 
Web designp pt
Web designp ptWeb designp pt
Web designp ptBizzyb09
 
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
FUNCTION DEPENDENCY  AND TYPES & EXAMPLEFUNCTION DEPENDENCY  AND TYPES & EXAMPLE
FUNCTION DEPENDENCY AND TYPES & EXAMPLEVraj Patel
 
Lecture 04 normalization
Lecture 04 normalization Lecture 04 normalization
Lecture 04 normalization emailharmeet
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Jargalsaikhan Alyeksandr
 

Andere mochten auch (12)

Normalization
NormalizationNormalization
Normalization
 
Introduction to Dreamweaver
Introduction to DreamweaverIntroduction to Dreamweaver
Introduction to Dreamweaver
 
Penerbitan video korporat
Penerbitan video korporatPenerbitan video korporat
Penerbitan video korporat
 
DHTML - Events & Buttons
DHTML - Events  & ButtonsDHTML - Events  & Buttons
DHTML - Events & Buttons
 
Dreamweaver - Introduction AND WALKTHROUGH
Dreamweaver - Introduction AND WALKTHROUGHDreamweaver - Introduction AND WALKTHROUGH
Dreamweaver - Introduction AND WALKTHROUGH
 
Dhtml sohaib ch
Dhtml sohaib chDhtml sohaib ch
Dhtml sohaib ch
 
Database normalization
Database normalizationDatabase normalization
Database normalization
 
Web designp pt
Web designp ptWeb designp pt
Web designp pt
 
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
FUNCTION DEPENDENCY  AND TYPES & EXAMPLEFUNCTION DEPENDENCY  AND TYPES & EXAMPLE
FUNCTION DEPENDENCY AND TYPES & EXAMPLE
 
DBMS - Normalization
DBMS - NormalizationDBMS - Normalization
DBMS - Normalization
 
Lecture 04 normalization
Lecture 04 normalization Lecture 04 normalization
Lecture 04 normalization
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)
 

Mehr von JTHSICT

Year 8 sow 2012 2013
Year 8 sow 2012 2013Year 8 sow 2012 2013
Year 8 sow 2012 2013JTHSICT
 
E book example
E book exampleE book example
E book exampleJTHSICT
 
Literacy glossary
Literacy glossaryLiteracy glossary
Literacy glossaryJTHSICT
 
Lookup table
Lookup tableLookup table
Lookup tableJTHSICT
 
Doublebookings2
Doublebookings2Doublebookings2
Doublebookings2JTHSICT
 
Doublebookings1
Doublebookings1Doublebookings1
Doublebookings1JTHSICT
 
Password
PasswordPassword
PasswordJTHSICT
 
Naming conventions
Naming conventionsNaming conventions
Naming conventionsJTHSICT
 
Validation
ValidationValidation
ValidationJTHSICT
 
Calendar
CalendarCalendar
CalendarJTHSICT
 
It4 Coursework Help
It4 Coursework HelpIt4 Coursework Help
It4 Coursework HelpJTHSICT
 

Mehr von JTHSICT (19)

Year 8 sow 2012 2013
Year 8 sow 2012 2013Year 8 sow 2012 2013
Year 8 sow 2012 2013
 
E book example
E book exampleE book example
E book example
 
Literacy glossary
Literacy glossaryLiteracy glossary
Literacy glossary
 
Lookup table
Lookup tableLookup table
Lookup table
 
Doublebookings2
Doublebookings2Doublebookings2
Doublebookings2
 
Doublebookings1
Doublebookings1Doublebookings1
Doublebookings1
 
Password
PasswordPassword
Password
 
Macros
MacrosMacros
Macros
 
Naming conventions
Naming conventionsNaming conventions
Naming conventions
 
Access1
Access1Access1
Access1
 
Access2
Access2Access2
Access2
 
Access3
Access3Access3
Access3
 
Access2
Access2Access2
Access2
 
Access1
Access1Access1
Access1
 
Access5
Access5Access5
Access5
 
Access4
Access4Access4
Access4
 
Validation
ValidationValidation
Validation
 
Calendar
CalendarCalendar
Calendar
 
It4 Coursework Help
It4 Coursework HelpIt4 Coursework Help
It4 Coursework Help
 

Kürzlich hochgeladen

Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...HetalPathak10
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroomSamsung Business USA
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxryandux83rd
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptxmary850239
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6Vanessa Camilleri
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17Celine George
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfChristalin Nelson
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 

Kürzlich hochgeladen (20)

Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom6 ways Samsung’s Interactive Display powered by Android changes the classroom
6 ways Samsung’s Interactive Display powered by Android changes the classroom
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Employablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptxEmployablity presentation and Future Career Plan.pptx
Employablity presentation and Future Career Plan.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17How to Manage Buy 3 Get 1 Free in Odoo 17
How to Manage Buy 3 Get 1 Free in Odoo 17
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdf
 
CARNAVAL COM MAGIA E EUFORIA _
CARNAVAL COM MAGIA E EUFORIA            _CARNAVAL COM MAGIA E EUFORIA            _
CARNAVAL COM MAGIA E EUFORIA _
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 

Normalization

  • 1. Normalization 1 Introduction In this exercise we are looking at the optimisation of data structure. The example system we are going to use as a model is a database to keep track of employees of an organisation working on different projects. Objectives By the end of the exercise you should be able to: Show understanding of why we normalize data Give formal definitions of 1NF, 2NF & 3NF Apply the process of normalization to your own work
  • 2. Normalization 2 The data we would want to store could be expressed as: Project Project Employee Employee Rate Rate No Name No Name category 1203 Madagascar 11 Jessica A £90 travel site Brookes 12 Andy B £80 Evans 16 Max Fat C £70 1506 Online 11 Jessica A £90 estate Brookes agency 17 Alex B £80 Branton
  • 3. Normalization 3 Three problems become apparent with our current model: Tables in a RDBMS use a simple grid structure Each project has a set of employees so we can’t even use this format to enter data into a table. How would you construct a query to find the employees working on each project? All tables in an RDBMS need a key Each record in a RDBMS must have a unique identity. Which field should be the primary key? Data entry should be kept to a minimum Our main problem is that each project contains repeating groups, which lead to redundancy and inconsistency.
  • 4. Normalization 4 We could place the data into a table called: tblProjects_Employees Project Project Employee Employee Rate Rate No. Name No. Name category 1203 Madagascar 11 Jessica A £90 travel site Brookes 1203 Madagascar 12 Andy B £80 travel site Evans 1203 Madagascat 16 Max Fat C £70 travel site 1506 Online 11 Jessica A £90 estate Brookes agency 1506 Online 17 Alex B £70 estate Branton agency
  • 5. Normalization 5 Addressing our three problems: Tables in a RDBMS use a simple grid structure We can find members of each project using a simple SQL or QBE search on either Project Number or Project Name All tables in an RDBMS need a key We CAN uniquely identify each record. Although no primary key exists we can use two or more fields to create a composite key. Data entry should be kept to a minimum Our main problem that each project contains repeating groups still remains. To create a RDBMS we have to eliminate these groups or sets.
  • 6. Normalization 6 Did you notice that Madagascar was misspelled in the 3rd record! Imagine trying to spot this error in thousands of records. By using this structure (flat filing) we create: Redundant data Duplicate copies of data – we would have to key in Madagascar travel site 3 times. Not only do we waste storage space we risk creating; Inconsistent data The more often we have to key in data the more likely we are to make mistakes. (see IT01 notes on the importance of accurate data).
  • 7. Normalization 7 The solution is simply to take out the duplication. We do this by: Identifying a key In this case we can use the project no and employee no to uniquely identify each row Project No Employee Unique Identifier No 1203 11 120311 1203 12 120312 1203 16 120316 Note: Project 1056 is not shown for reasons of space
  • 8. Normalization 8 We look for partial dependencies We look for fields that depend on only part of the key and not the entire key. Field Project No Employee No Project Name  Employee  Rate Category  Rate  We remove partial dependencies The fields listed are only dependent on part of the key so we remove them from the table.
  • 9. Normalization 9 We create new tables Clearly we can’t take the data out and leave it out of our database. We put it into a new table consisting of the field that has the partial dependency and the field it is dependent on. Looking at our example we will need to create two new tables: Dependent Partially Dependent Partially On Dependent On Dependent Project No Project Name Employee Employee Name No Rate category Rate
  • 10. Normalization 10 We now have 3 tables: tblProjects tblProjects_Employees Project No Project Name Project Employee 1023 Madagascar No No travel site 1023 11 tblEmployees 1056 Online estate agency 1023 12 Employee Employee Rate Rate No Name Category 1023 16 11 Jessica A £90 Brookes 1056 11 12 Andy B £80 Evans 1056 17 16 Max Fat C £70 17 Alex A £80 Branton
  • 11. Normalization 11 Looking at the project note the reduction in: Redundant data The text “Madagascar travel site” is stored once only, not for each occurrence of an employee working on the project. Inconsistent data Because we only store the project name once we are less likely to enter “Madagascat” The link is made through the key, Project No. Obviously there is no way to remove this duplication without losing the relation altogether, but it is far more efficient storing a short number repeatedly, than a large chunk of text.
  • 12. Normalization 12 Our model has improved but is still far from perfect. There is still room for inconsistency. Employee Employee Rate Rate No Name Category Alex Branton is 11 Jessica A £90 being paid £80 Brookes while Jessica Brookes gets £90 – 12 Andy B £80 but they’re in the Evans same rate category! 16 Max Fat C £70 17 Alex A £80 Branton Again, we have stored redundant data: the hourly rate- rate category relationship is being stored in its entirety i.e. We have to key in both the rate category AND the hourly rate.
  • 13. Normalization 13
    The solution, as before, is to remove this excess data to another table. We do this by:

    Looking for Transitive Relationships
    Relationships where a non-key attribute is dependent on another non-key attribute. Hourly rate should depend on rate category, BUT rate category is not a key.

    Removing Transitive Relationships
    As before, we remove the redundant data and place it in a separate table. In this case we create a new table, tblRates, and add the fields rate category and hourly rate. We then delete hourly rate from the employees table.
  • 14. Normalization 14
    We now have 4 tables:

    tblProjects
    Project No   Project Name
    1023         Madagascar travel site
    1056         Online estate agency

    tblProjects_Employees
    Project No   Employee No
    1023         11
    1023         12
    1023         16
    1056         11
    1056         17

    tblEmployees
    Employee No   Employee Name     Rate Category
    11            Jessica Brookes   A
    12            Andy Evans        B
    16            Max Fat           C
    17            Alex Branton      A

    tblRates
    Rate Category   Rate
    A               £90
    B               £80
    C               £70
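One way to try out the four-table design is with Python's built-in sqlite3 module. This is a minimal sketch, not the slides' own implementation: the column names, column types and the in-memory database are assumptions. The join at the end rebuilds the original flat view, listing the employees working on each project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblRates (
    RateCategory TEXT PRIMARY KEY,
    Rate         INTEGER NOT NULL          -- hourly rate in pounds
);
CREATE TABLE tblEmployees (
    EmployeeNo   INTEGER PRIMARY KEY,
    EmployeeName TEXT NOT NULL,
    RateCategory TEXT NOT NULL REFERENCES tblRates(RateCategory)
);
CREATE TABLE tblProjects (
    ProjectNo    INTEGER PRIMARY KEY,
    ProjectName  TEXT NOT NULL
);
CREATE TABLE tblProjects_Employees (
    ProjectNo    INTEGER NOT NULL REFERENCES tblProjects(ProjectNo),
    EmployeeNo   INTEGER NOT NULL REFERENCES tblEmployees(EmployeeNo),
    PRIMARY KEY (ProjectNo, EmployeeNo)     -- composite key
);
""")

conn.executemany("INSERT INTO tblRates VALUES (?, ?)",
                 [("A", 90), ("B", 80), ("C", 70)])
conn.executemany("INSERT INTO tblEmployees VALUES (?, ?, ?)",
                 [(11, "Jessica Brookes", "A"), (12, "Andy Evans", "B"),
                  (16, "Max Fat", "C"), (17, "Alex Branton", "A")])
conn.executemany("INSERT INTO tblProjects VALUES (?, ?)",
                 [(1023, "Madagascar travel site"), (1056, "Online estate agency")])
conn.executemany("INSERT INTO tblProjects_Employees VALUES (?, ?)",
                 [(1023, 11), (1023, 12), (1023, 16), (1056, 11), (1056, 17)])

# Rebuild the flat view by following the keys between the tables.
rows = conn.execute("""
SELECT p.ProjectNo, p.ProjectName, e.EmployeeNo, e.EmployeeName,
       e.RateCategory, r.Rate
FROM tblProjects_Employees AS pe
JOIN tblProjects  AS p ON p.ProjectNo = pe.ProjectNo
JOIN tblEmployees AS e ON e.EmployeeNo = pe.EmployeeNo
JOIN tblRates     AS r ON r.RateCategory = e.RateCategory
ORDER BY p.ProjectNo, e.EmployeeNo
""").fetchall()

for row in rows:
    print(row)
```

Because the rate is looked up through tblRates, Alex Branton (category A) now comes back with the same £90 as Jessica Brookes.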
  • 15. Normalization 15
    Again, we have cut down on redundancy, and it is no longer possible for rate category A to be stored with any rate other than £90.
    Our model is now in its most efficient format with:
        Minimum REDUNDANCY
        Minimum INCONSISTENCY
  • 16. Normalization 16
    What we have formally done is NORMALIZE the database.
    At the beginning we had a data structure:
        Project No
        Project Name
        Employee No (1n)
        Employee name (1n)
        Rate Category (1n)
        Hourly Rate (1n)
    (1n indicates there are many occurrences of the field: it is a repeating group.)
    To begin the normalization process, we start by moving from zero normal form to 1st normal form.
  • 17. Normalization 17
    The definition of 1st normal form
        There are no repeating groups
        All the key attributes are defined
        All attributes are dependent on the primary key
    So far, we have no keys, and there are repeating groups. So we remove the repeating groups, define the keys, and are left with:

    Employee Project table
        Project number - part of key
        Project name
        Employee number - part of key
        Employee name
        Rate category
        Hourly rate

    This table is in first normal form (1NF).
  • 18. Normalization 18
    A table is in 2nd normal form if
        It's already in first normal form
        It includes no partial dependencies (where an attribute is dependent on only part of the key)
    We look through the fields:
        Project name is dependent only on project number.
        Employee name, rate category and hourly rate are dependent only on employee number.
    So we remove them and place these fields in separate tables, with the key of each being the part of the original key the fields are dependent on. We are left with the following three tables:
  • 19. Normalization 19
    Employee Project table
        Project number - part of key
        Employee number - part of key

    Employee table
        Employee number - primary key
        Employee name
        Rate category
        Hourly rate

    Project table
        Project number - primary key
        Project name

    The tables are now in 2nd normal form (2NF). Are they in 3rd normal form?
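The search for partial dependencies can also be sketched in code. The helper names `determines` and `partial_dependencies` are hypothetical, and the check runs only against the sample rows: data can show that a dependency does not hold, but agreeing rows do not prove that it always will, so the result is a hint rather than a proof.

```python
from itertools import combinations

FIELDS = ("project_no", "project_name", "employee_no",
          "employee_name", "rate_category", "hourly_rate")
KEY = ("project_no", "employee_no")  # composite key of the 1NF table

# Sample rows in first normal form, taken from the example data.
rows = [dict(zip(FIELDS, values)) for values in [
    (1023, "Madagascar travel site", 11, "Jessica Brookes", "A", 90),
    (1023, "Madagascar travel site", 12, "Andy Evans", "B", 80),
    (1023, "Madagascar travel site", 16, "Max Fat", "C", 70),
    (1056, "Online estate agency", 11, "Jessica Brookes", "A", 90),
    (1056, "Online estate agency", 17, "Alex Branton", "A", 80),
]]

def determines(rows, lhs, rhs):
    """True if rows with equal values in the lhs fields always share the same rhs value."""
    seen = {}
    for row in rows:
        left = tuple(row[f] for f in lhs)
        if seen.setdefault(left, row[rhs]) != row[rhs]:
            return False
    return True

def partial_dependencies(rows, key, fields):
    """Map each non-key field to the proper subsets of the key that determine it."""
    found = {}
    for field in fields:
        if field in key:
            continue
        for size in range(1, len(key)):
            for part in combinations(key, size):
                if determines(rows, part, field):
                    found.setdefault(field, []).append(part)
    return found

result = partial_dependencies(rows, KEY, FIELDS)
for field, parts in result.items():
    print(field, "depends on", parts)
```

On the sample data this reports project_name against project_no alone, and the three employee fields against employee_no alone, which matches the split into the Project and Employee tables.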
  • 20. Normalization 20
    A table is in 3rd normal form if
        It's already in second normal form
        It includes no transitive dependencies (where a non-key attribute is dependent on another non-key attribute)
    We can narrow our search down to the Employee table, which is the only one with more than one non-key attribute. Employee name is not dependent on either Rate category or Hourly rate, and Rate category is not dependent on either of the other two, but Hourly rate is dependent on Rate category. So, as before, we remove it, placing it in its own table, with the attribute it was dependent on as the key, as follows:
  • 21. Normalization 21
    Employee project table
        Project number - part of key
        Employee number - part of key

    Employee table
        Employee number - primary key
        Employee name
        Rate category

    Rate table
        Rate category - primary key
        Hourly rate

    Project table
        Project number - primary key
        Project name

    These tables are all now in 3rd normal form, and ready to be implemented.
  • 22. Normalization 22
    There are other normal forms, such as Boyce-Codd normal form and 4th normal form, but these are very rarely used for business applications. In most cases, tables in 3rd normal form are already in these normal forms anyway.
    Before you start normalizing everything, a word of warning: no process is better than common sense. Take a look at this example.

    Customer table
        Customer Number - primary key
        Name
        Address
        Postcode
        Town
  • 23. Normalization 23
    What normal form is this table in? Giving it a quick glance, we see no repeating groups and a primary key defined, so it's at least in 1st normal form. The key is a single field, so we needn't even look for partial dependencies, and it's at least in 2nd normal form.
    How about transitive dependencies? Well, it looks like Town might be determined by Postcode, and in most parts of the world that's usually the case. So should we remove Town and place it in a separate table, with Postcode as the key?
  • 24. Normalization 24
    No! Although this table is not technically in 3rd normal form, removing this information is not worth it. Creating more tables increases the load slightly, because the tables have to be joined again when the data is read, and this slows processing down. That cost is often offset by the reduction in table sizes and redundant data. But in this case, where the town would almost always be referenced as part of the address, it isn't worth it.
    Perhaps a company that uses the data to produce regular mailing lists of thousands of customers should normalize fully. It always comes down to how the data is going to be used. Normalization is just a helpful process that usually results in the most efficient table structure, and not a rule for database design.