4. Database Management System (DBMS)
Definitions:
Data: Known facts that can be recorded and that have implicit meaning
Database: Collection of related data
Ex. the names, telephone numbers and addresses of all the people you know
Database Management System: A computerized record-keeping system
raghu@theoracletrainer.com
www.theoracletrainer.com
5. DBMS (Contd.)
Goals of a Database Management System:
To provide an efficient as well as convenient environment for accessing data in a database
To enforce information security: database security, concurrency control, crash recovery
It is a general-purpose facility for:
Defining database
Constructing database
Manipulating database
6. Benefits of database approach
Redundancy can be reduced
Inconsistency can be avoided
Data can be shared
Standards can be enforced
Security restrictions can be applied
Integrity can be maintained
Data independence can be provided
7. DBMS Functions
Data Definition
Data Manipulation
Data Security and Integrity
Data Recovery and Concurrency
Data Dictionary
Performance
9. Data Model
A set of concepts used to describe the structure of a database
By structure, we mean the data types, relationships, and constraints that should hold for the data
Categories of Data Models:
Conceptual
Physical
Representational
11. An example of the three levels
External View 1: SNo, FName, LName, Age, Salary
External View 2: SNo, LName, BranchNo
Conceptual View: SNo, FName, LName, Age, Salary, BranchNo
Internal View:
struct STAFF {
    int staffNo;
    int branchNo;
    char fName[15];
    char lName[15];
    struct date dateOfBirth;
    float salary;
    struct STAFF *next;   /* pointer to next Staff record */
};
index staffNo; index branchNo;   /* define indexes for staff */
12. Schema
Schema: Description of data in terms of a data model
Three-level DB Architecture defines the following schemas:
External Schema (or sub-schema): written using external DDL
Conceptual Schema (or schema): written using conceptual DDL
Internal Schema: written using internal DDL or storage structure definition
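To make the three schemas concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes that a base table can stand in for the conceptual schema, views for the external schemas, and an index for part of the internal schema. SQLite has no separate internal DDL, so the mapping is only approximate. The table, view and index names and the sample row are hypothetical and loosely follow the staff example of the three levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: one description of the whole database (hypothetical Staff table).
conn.execute("""
    CREATE TABLE Staff (
        SNo      INTEGER PRIMARY KEY,
        FName    TEXT,
        LName    TEXT,
        Age      INTEGER,
        Salary   REAL,
        BranchNo INTEGER
    )""")

# External schemas: each user group sees only the part of the data it needs.
conn.execute("CREATE VIEW StaffView1 AS SELECT SNo, FName, LName, Age, Salary FROM Staff")
conn.execute("CREATE VIEW StaffView2 AS SELECT SNo, LName, BranchNo FROM Staff")

# Internal schema (approximation): an access path on the stored data.
conn.execute("CREATE INDEX idx_staff_branch ON Staff (BranchNo)")

conn.execute("INSERT INTO Staff VALUES (1, 'Ann', 'Beech', 35, 12000, 3)")  # made-up row
rows = conn.execute("SELECT * FROM StaffView2").fetchall()
print(rows)  # [(1, 'Beech', 3)]
```

Each view lists its columns explicitly. This keeps the external/conceptual mapping visible in the DDL.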
13. Data Independence
The ability to change the schema at one level of a database system without having to change the schema at the next higher level
Logical data independence: Refers to the immunity of the external schemas to changes in the conceptual schema, e.g., adding a new record type or field
Physical data independence: Refers to the immunity of the conceptual schema to changes in the internal schema, e.g., adding a new index should not require changes to the conceptual schema or to existing applications
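Both kinds of data independence can be observed in a small sqlite3 sketch that uses a hypothetical table, view and data. An "application" query reads only through a view. It returns the same result after an internal-level change (a new index) and after a conceptual-level change (a new field). This illustrates the idea rather than a full three-schema DBMS, because SQLite views only stand in for external schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (SNo INTEGER PRIMARY KEY, LName TEXT, BranchNo INTEGER)")
conn.execute("CREATE VIEW BranchList AS SELECT SNo, LName, BranchNo FROM Staff")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)", [(1, "Beech", 3), (2, "White", 5)])

# The "application": it only ever talks to the external view.
def branch_members(branch_no):
    return conn.execute(
        "SELECT LName FROM BranchList WHERE BranchNo = ? ORDER BY SNo", (branch_no,)
    ).fetchall()

before = branch_members(3)

# Internal-level change: add an access path. The conceptual schema and view are untouched.
conn.execute("CREATE INDEX idx_branch ON Staff (BranchNo)")
after_index = branch_members(3)

# Conceptual-level change: add a new field. The external view is unaffected.
conn.execute("ALTER TABLE Staff ADD COLUMN Salary REAL")
after_new_field = branch_members(3)

print(before, after_index, after_new_field)  # [('Beech',)] [('Beech',)] [('Beech',)]
```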
14. TYPES OF DATABASE MODELS
HIERARCHICAL
NETWORK
RELATIONAL
[Figure: a relational TABLE with a ROW, a COLUMN and a VALUE labelled]
17. Some Important Terms
Relation : a table
Tuple : a row in a table
Attribute : a Column in a table
Degree : number of attributes
Cardinality : number of tuples
Primary Key : a unique identifier for the table
Domain : a pool of values from which specific attributes of specific relations draw their values
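A relation can be pictured as a heading plus a set of tuples. The plain-Python sketch below uses the Student rows from the Keys and Referential Integrity slide. It computes degree and cardinality and checks one attribute against a domain. The age domain of 16 to 99 is an assumption made only for illustration.

```python
# A relation modelled as a heading (attribute names) plus a set of tuples.
# Using a set also reflects that a relation contains no duplicate tuples.
heading = ("sid", "name", "age")
tuples = {
    ("53666", "Jones", 18),
    ("53688", "Smith", 18),
    ("53650", "Smith", 19),
}

degree = len(heading)       # number of attributes
cardinality = len(tuples)   # number of tuples

# Domain: the pool of values an attribute may draw from (assumed here: ages 16..99).
age_domain = range(16, 100)
age_index = heading.index("age")
all_in_domain = all(t[age_index] in age_domain for t in tuples)

print(degree, cardinality, all_in_domain)  # 3 3 True
```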
19. Keys and Referential Integrity
Enrolled (sid: foreign key referring to sid of STUDENT relation)
sid    cid          grade
53666  carnatic101  C
53688  reggae203    B
53650  topology112  A
53666  history105   B
Student (primary key: sid)
sid    name   login       age  gpa
53666  Jones  Jones@cs    18   3.4
53688  Smith  Smith@eecs  18   3.2
53650  Smith  Smith@math  19   3.8
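The Enrolled/Student example can be reproduced with Python's sqlite3 module to show the foreign key at work. This sketch relies on two assumptions. First, SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. Second, the primary key (sid, cid) for Enrolled is assumed, because the slide only marks sid as the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""CREATE TABLE Students (
    sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL)""")
conn.execute("""CREATE TABLE Enrolled (
    sid TEXT REFERENCES Students (sid), cid TEXT, grade TEXT,
    PRIMARY KEY (sid, cid))""")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "Jones@cs", 18, 3.4),
    ("53688", "Smith", "Smith@eecs", 18, 3.2),
    ("53650", "Smith", "Smith@math", 19, 3.8),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "carnatic101", "C"),
    ("53688", "reggae203", "B"),
    ("53650", "topology112", "A"),
    ("53666", "history105", "B"),
])

# A dangling reference: no student 99999 exists, so the insert is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'history105', 'A')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

print(dangling_rejected)  # True
```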
22. Overview of Database Design
Conceptual design : (ER Model is used at this stage.)
Schema Refinement : (Normalization)
Physical Database Design and Tuning
23. Design Phases…
Requirements Collection & Analysis
  Data Requirements
  Functional Requirements: User Defined Operations, Data Flow Diagrams, Sequence Diagrams, Scenarios
Conceptual Design
  Entity Types, Constraints, Relationships
  No Implementation Details.
Logical Design
  Ensures the Design Meets the Requirements
  Data Model Mapping: Type of Database is identified
Physical Design
  Internal Storage Structures / Access Path / File Organizations
24. E-R Modeling
Entity: anything that exists and is distinguishable
Entity Set: a group of similar entities
Attribute: properties that describe an entity
Relationship: an association between entities
25. Notations
ENTITY TYPE ( REGULAR )
WEAK ENTITY TYPE
RELATIONSHIP TYPE
WEAK RELATIONSHIP TYPE
26. Entity
[Figure: entity type Employee with attributes ssn, name and lot]
Entity Set:
SSN          NAME       LOT
123-22-3666  Attishoo   48
231-31-5368  Smiley     22
131-24-3650  Smethurst  35
CREATE TABLE Employees
  (ssn  CHAR(11),
   name CHAR(20),
   lot  INTEGER,
   PRIMARY KEY (ssn));
31. Key Constraints for Ternary Relationships
[Figure: ternary relationship Works_in (attribute since) connecting Employee (ssn, name, lot), Department (did, dname, budget) and Location (address, capacity)]
34. ISA (‘is a’) Hierarchies
[Figure: Employee (ssn, name, lot) with IsA subclasses Hourly_Emp (Hrly_wages, Hrs_worked) and Contract_Emp (contractid)]
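One common way to map this ISA hierarchy to relations is one table per entity set, where each subclass table carries the superclass key. The sqlite3 sketch below assumes that mapping. Another option is to store only Hourly_Emps and Contract_Emps, each holding all inherited attributes. The table names, the ON DELETE CASCADE choice and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Employees (
    ssn  TEXT PRIMARY KEY,
    name TEXT,
    lot  INTEGER
);
-- Each subclass table stores only its own attributes plus the inherited key.
CREATE TABLE Hourly_Emps (
    ssn        TEXT PRIMARY KEY REFERENCES Employees (ssn) ON DELETE CASCADE,
    hrly_wages REAL,
    hrs_worked INTEGER
);
CREATE TABLE Contract_Emps (
    ssn        TEXT PRIMARY KEY REFERENCES Employees (ssn) ON DELETE CASCADE,
    contractid TEXT
);
""")

conn.execute("INSERT INTO Employees VALUES ('123-22-3666', 'Attishoo', 48)")
conn.execute("INSERT INTO Hourly_Emps VALUES ('123-22-3666', 10.0, 40)")

# Reassemble the full Hourly_Emp entity by joining subclass and superclass.
row = conn.execute("""
    SELECT e.ssn, e.name, e.lot, h.hrly_wages, h.hrs_worked
    FROM Employees e JOIN Hourly_Emps h ON e.ssn = h.ssn""").fetchone()
print(row)  # ('123-22-3666', 'Attishoo', 48, 10.0, 40)

# Deleting the superclass entity removes its subclass row as well.
conn.execute("DELETE FROM Employees WHERE ssn = '123-22-3666'")
print(conn.execute("SELECT COUNT(*) FROM Hourly_Emps").fetchone())  # (0,)
```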
36. Entity vs. Attribute
Works_In does not allow an employee to work in a department for two or more periods (why?)
[Figure: Employee (ssn, name, lot) related to Department (did, dname, budget) through Works_in, which has attributes from and to]
37. Entity vs. Attribute (Contd.)
[Figure: Employee (ssn, name, lot) related to Department (did, dname, budget) through Works_in, which also connects to a Duration entity (from, to)]
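The "(why?)" has a short answer. A relationship instance is identified by the keys of the entities it connects. Works_In with attributes from and to is therefore keyed on (ssn, did) and can hold only one period per employee-department pair. Making Duration an entity adds its key to the relationship key. The sqlite3 sketch below shows the difference. The columns from and to are renamed from_date and to_date because FROM is an SQL keyword, and the department number and dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Slide 36: from/to are attributes of Works_In, so (ssn, did) is the key.
conn.execute("""CREATE TABLE Works_In_v1 (
    ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did))""")

# Slide 37: Duration is an entity, so its key becomes part of the relationship key.
conn.execute("""CREATE TABLE Works_In_v2 (
    ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
    PRIMARY KEY (ssn, did, from_date, to_date))""")

periods = [
    ("123-22-3666", 51, "2019-01-01", "2020-06-30"),
    ("123-22-3666", 51, "2022-03-01", "2023-12-31"),  # same employee and department, later period
]

def load(table):
    """Insert both periods into the given table; return how many rows were accepted."""
    accepted = 0
    for p in periods:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", p)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # primary-key violation: this key value already exists
    return accepted

print(load("Works_In_v1"), load("Works_In_v2"))  # 1 2
```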
40. Binary vs. Ternary Relationships
[Figure: ternary relationship Covers connecting Employee (ssn, name, lot), Dependent (pname, age) and Policy (policyid, cost)]
41. Binary vs. Ternary Relationships
Better Design
[Figure: weak entity Dependent (pname, age) linked to Policy (policyid, cost) through Beneficiary; Policy linked to Employee (ssn, name, lot) through Purchaser]
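Here is one possible relational translation of the better design, sketched with sqlite3. It rests on two assumptions. Purchaser is captured by a NOT NULL foreign key from Policies to Employees. The weak entity Dependent takes policyid into its primary key, with ON DELETE CASCADE so that dependents disappear with their owning policy. The table names, the dependent's name and the policy cost are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);

-- Purchaser: every policy is bought by exactly one employee (NOT NULL + FK).
CREATE TABLE Policies (
    policyid INTEGER PRIMARY KEY,
    cost     REAL,
    ssn      TEXT NOT NULL REFERENCES Employees (ssn) ON DELETE CASCADE
);

-- Beneficiary: Dependent is a weak entity identified through its policy,
-- so the owner's key is part of the primary key and deletions cascade.
CREATE TABLE Dependents (
    pname    TEXT,
    age      INTEGER,
    policyid INTEGER NOT NULL REFERENCES Policies (policyid) ON DELETE CASCADE,
    PRIMARY KEY (pname, policyid)
);
""")

conn.execute("INSERT INTO Employees VALUES ('231-31-5368', 'Smiley', 22)")
conn.execute("INSERT INTO Policies VALUES (1, 250.0, '231-31-5368')")
conn.execute("INSERT INTO Dependents VALUES ('Alex', 8, 1)")

# Removing the employee removes the policy, which in turn removes the dependent.
conn.execute("DELETE FROM Employees WHERE ssn = '231-31-5368'")
remaining = conn.execute("SELECT COUNT(*) FROM Dependents").fetchone()[0]
print(remaining)  # 0
```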
42. Constraints Beyond the ER Model
• Some constraints cannot be captured in ER diagrams:
• Functional dependencies
• Inclusion dependencies
• General constraints
44. Example to Start with…
An example database application called COMPANY serves to illustrate the ER model concepts and their use in schema design.
The following requirements were collected from the client.
45. Analysis…
Company :
The company is organized into departments. Each department has a name, a number and a manager who manages the department. The company keeps track of the date on which that employee started managing the department. A department may have several locations.
46. Analysis…
Department :
A department controls a number of projects, each of which has a unique name and number, and a single location.
Employee :
Name, Age, Gender, BirthDate, SSN, Address, Salary.
An employee is assigned to one department but may work on several projects, which are not necessarily controlled by that department. The number of hours per week that an employee works on each project is also tracked.
47. Analysis…
Keep track of the dependents of each employee for insurance policies: for each dependent we keep the first name, gender, date of birth and relationship to the employee.
48. DEPARTMENT
( Name , Number , { Locations } , Manager, Start Date )
PROJECT
( Name, Number, Location , Controlling Department )
EMPLOYEE
( Name (Fname, Lname), SSN, Gender, Address, Salary, Birthdate, Department, Supervisor, { WorksOn (Project, Hrs) } )
DEPENDENT
( Employee, Name, Gender, Birthdate , Relationship )
49. Example …
Manage:
Department and Employee
Partial Participation
Relationship Attribute: StartDate.
Works For:
Department and Employee
Total Participation
50. Example…
Control :
Department , Project
Partial Participation from Department
Total Participation from Project
Control Department is a RKA.
Supervisor :
Employee, Employee
Partial and Recursive
51. Example …
Works-On:
Project, Employee
Total Participation
Hours Worked is a RKA.
Dependents of:
Employee, Dependent
Dependent is a weak entity type
Dependent is Total, Employee is Partial.
52. One Possible mapping of the Problem Statement
[Figure: ER diagram for COMPANY. Entity types: Department (Name, No, Loc), Employee (SSN, Name (Fname, Lname), Sal, Sex, Address, Bdate), Project (Name, No, Loc) and Dependent (Name, Sex, Bdate, Relationship). Relationship types: Works For, Manages (Sdate), Controls, Works On (Hours), Supervises and Depend On.]
58. Normalization and Normal Forms
Normalization:
Decomposing a larger, complex table into several smaller, simpler ones.
Moving from a lower normal form to a higher normal form.
Normal Forms:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
*Higher Normal Forms (BCNF, 4NF, 5NF ....)
In practice, 3NF is often good enough.
59. Why Normal Forms
The first question to ask is whether any refinement is needed!
If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems are avoided or minimized. This can be used to help us decide whether decomposing the relation will help.
60. The Evils of Redundancy
Redundancy is at the root of several problems associated with relational schemas:
Wastage of storage.
More seriously, data redundancy causes several anomalies: insert, update and delete anomalies.
Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
61. Refining an ER Diagram - Before
[Figure: Employee (ssn, name, lot) related to Department (did, dname, budget) through Works_in (since)]
62. Refining an ER Diagram - After
[Figure: Employee (ssn, name) related to Department (did, dname, budget, lot) through Works_in (since)]
63. First Normal Form
A table is in 1NF if every row contains exactly one value for each attribute.
Disallow multivalued attributes, composite attributes and their combinations.
1NF states that:
domains of attributes must include only atomic (simple, indivisible) values, and the value of any attribute in a tuple must be a single value from the domain of that attribute.
By definition, any relational table must be in 1NF.
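The plain-Python snippet below sketches what the 1NF rule means in practice. It starts from a hypothetical DEPARTMENT record with a multivalued locations attribute, like { Locations } in the COMPANY example, and moves the locations into a separate relation so that every value is atomic. Another way to reach 1NF is to repeat the department in one row per location, at the cost of redundancy.

```python
# Hypothetical DEPARTMENT data in which 'locations' is multivalued (not 1NF).
departments = [
    {"number": 1, "name": "Research", "locations": ["Site A", "Site B"]},
    {"number": 2, "name": "Sales", "locations": ["Site C"]},
]

# 1NF version: every attribute value is atomic, so the multivalued attribute
# moves to its own relation with one row per (department, location) pair.
department_1nf = [(d["number"], d["name"]) for d in departments]
dept_locations_1nf = [(d["number"], loc) for d in departments for loc in d["locations"]]

print(department_1nf)      # [(1, 'Research'), (2, 'Sales')]
print(dept_locations_1nf)  # [(1, 'Site A'), (1, 'Site B'), (2, 'Site C')]
```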
64. Functional Dependencies (FDs)
Provide a formal mechanism to express constraints between attributes
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R.
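The definition can be turned into a check over a relation instance. Note the direction of the test. A single instance can show that an FD is violated, but it cannot prove that the FD holds, because an FD is a constraint on every legal instance. The function name fd_holds is hypothetical, and the rows are the Student rows from the Keys and Referential Integrity slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each combination of lhs values is
    associated with exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same X-value, different Y-value
            return False
    return True

students = [
    {"sid": "53666", "name": "Jones", "login": "Jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "Smith@eecs", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "Smith@math", "age": 19, "gpa": 3.8},
]

print(fd_holds(students, ["sid"], ["name"]))  # True (not violated in this instance)
print(fd_holds(students, ["name"], ["gpa"]))  # False: 'Smith' has two gpa values
```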
65. Full Dependency
Concept of full functional dependency
An FD X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
66. Partial Dependency
An FD X → Y is a partial dependency if there is some attribute A ∈ X that can be removed from X and the dependency will still hold.
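Whether X → Y is full or partial can be tested with the attribute closure of X, which is the set of attributes that X determines under a given set of FDs. Drop each attribute of X in turn and see whether Y still follows. Here is a minimal Python sketch with hypothetical function names, using the FDs of the 2NF example:

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_full(lhs, rhs, fds):
    """X -> Y is full if no single attribute can be dropped from X while Y still follows."""
    return all(not set(rhs) <= closure(set(lhs) - {a}, fds) for a in lhs)

# FDs from the 2NF example: {Eno} -> {Dept}, {Eno, ProjCode} -> {Hours}
fds = [({"Eno"}, {"Dept"}), ({"Eno", "ProjCode"}, {"Hours"})]

print(is_full({"Eno", "ProjCode"}, {"Hours"}, fds))  # True: full dependency
print(is_full({"Eno", "ProjCode"}, {"Dept"}, fds))   # False: Eno alone gives Dept
```

Removing one attribute at a time is enough. If a larger part of X could be dropped, then dropping any single attribute from that part would also leave Y derivable.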
67. Example: Constraints on Entity Set
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40

R  W
5  7
8  10
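The two layouts above can be compared in a few lines of Python. Two things are assumptions here. The letters are read as in the usual textbook version of this example (S = ssn, N = name, L = lot, R = rating, W = hourly wages, H = hours worked). The constraint R → W is also assumed, and it does hold in these rows. The sketch decomposes the first table into SNLRH and RW. It then checks that a natural join on R gives back exactly the original rows.

```python
# Rows of the first table: (S, N, L, R, W, H)
hourly_emps = {
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
}

# Decompose into SNLRH and RW (set projections drop duplicate rows automatically).
snlrh = {(s, n, l, r, h) for (s, n, l, r, w, h) in hourly_emps}
rw = {(r, w) for (s, n, l, r, w, h) in hourly_emps}

# Natural join on R rebuilds the original table.
rejoined = {(s, n, l, r, w, h) for (s, n, l, r, h) in snlrh for (r2, w) in rw if r == r2}

print(sorted(rw))               # [(5, 7), (8, 10)]
print(rejoined == hourly_emps)  # True: no information lost
```

Each wage is now stored once per rating instead of once per employee. This removes the update anomaly of the first table.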
68. Second Normal Form (2NF)
A relation schema R is in 2NF if:
it is in 1NF and
every non-prime attribute A in R is fully functionally
dependent on the primary key of R.
2NF prohibits partial dependencies.
69. 2NF: An Example
Emp{Eno, Dept, ProjCode, Hours}
Primary key: {Eno, ProjCode}
{Eno} -> {Dept}, {Eno, ProjCode} -> {Hours}
Test of 2NF
{Eno} -> {Dept}: partial dependency.
Emp is in 1NF, but not in 2NF.
Decomposition:
Emp {Eno, Dept}
Proj {Eno, ProjCode, Hours}
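The 2NF test can be automated for small examples. For each proper subset of the primary key, compute its attribute closure and report any non-prime attribute it determines. This Python sketch assumes the primary key is the only candidate key, so "non-prime" here simply means "not in the primary key". The function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the FD set fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """List (subset, attribute) pairs where a proper subset of the key
    determines a non-prime attribute of the relation."""
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for a in sorted(non_prime & closure(set(subset), fds)):
                found.append((subset, a))
    return found

emp_fds = [({"Eno"}, {"Dept"}), ({"Eno", "ProjCode"}, {"Hours"})]

# Emp{Eno, Dept, ProjCode, Hours}: not in 2NF
print(partial_dependencies({"Eno", "ProjCode"}, {"Eno", "Dept", "ProjCode", "Hours"}, emp_fds))
# [(('Eno',), 'Dept')]

# Proj{Eno, ProjCode, Hours} after decomposition: in 2NF
print(partial_dependencies({"Eno", "ProjCode"}, {"Eno", "ProjCode", "Hours"}, emp_fds))
# []
```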
70. Transitive Dependency
An FD X → Y in a relation schema R is a
transitive dependency if
there is a set of attributes Z that is not a subset of
any key of R, and
both X → Z and Z → Y hold.
71. Third Normal Form
A relation schema R is in 3NF if
It is in 2NF and
No nonprime attribute of R is transitively dependent
on the primary key.
3NF means that each non-key attribute value in any tuple depends only on the Primary Key and not, even partially, on other non-key attributes.
3NF prohibits transitive dependencies.
72. 3NF: An Example
Emp{Eno, Dept, Dept_Head}
Primary key: {Eno}
{Eno} -> {Dept}, {Dept} -> {Dept_Head}
Test of 3NF
{Eno} -> {Dept} -> {Dept_Head}: Transitive dependency.
Emp is in 2NF, but not in 3NF.
Decomposition:
Emp {Eno, Dept}
Dept {Dept, Dept_Head}
73. Boyce-Codd Normal Form
BCNF was introduced because 3NF does not satisfactorily handle the case of a relation possessing two or more composite and overlapping candidate keys.
74. BCNF (Boyce Codd Normal Form)
A Relation is said to be in Boyce Codd Normal
Form (BCNF) if and only if every determinant
is a candidate key.
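BCNF can be checked mechanically with the attribute closure, using the usual superkey form of the rule: no set of attributes may determine another attribute of the relation unless it determines all of them. The Python sketch below uses hypothetical function names and tries every attribute subset. That search is exponential, but it is fine for the small relations on these slides. Applied to the FDs of the 3NF example, it shows that Emp violates BCNF through Dept, while both decomposed relations pass.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the FD set fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return attribute sets X (within relation) that determine some other attribute
    of the relation without being a superkey of it. Empty list means BCNF."""
    relation = set(relation)
    violations = []
    for size in range(1, len(relation)):
        for x in combinations(sorted(relation), size):
            determined = closure(set(x), fds) & relation
            if determined != set(x) and determined != relation:
                violations.append(x)
    return violations

# FDs from the 3NF example: {Eno} -> {Dept}, {Dept} -> {Dept_Head}
fds = [({"Eno"}, {"Dept"}), ({"Dept"}, {"Dept_Head"})]

print(bcnf_violations({"Eno", "Dept", "Dept_Head"}, fds))  # [('Dept',)]
print(bcnf_violations({"Eno", "Dept"}, fds))               # []
print(bcnf_violations({"Dept", "Dept_Head"}, fds))         # []
```

Intersecting the closure with the relation's own attributes lets the same function test the decomposed relations, without working out their projected FDs by hand.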
Editor's Notes
Table of Contents
1. Introduction to Database Management Systems (DBMS)
1.1 Database Management System: Definitions
1.2 DBMS
1.3 Benefits of database approach
1.4 DBMS functions
1.5 Database System
1.6 Data Model
1.7 Database Architecture
1.8 An Example of the Three Levels
1.9 Schema
1.10 Data Independence
1.11 Types Of Database Models
1.12 Database Design Phases
2. Introduction to RDBMS
2.1 Definition: RDBMS
2.2 Features Of an RDBMS
2.3 Some Important Terms
2.4 Properties of Relations
2.5 Keys
2.6 Referential Integrity
2.10 Summary
3. Relational Algebra
3.1 Relational Query Languages
3.2 Example Instances
3.3 Relational Algebra
3.4 Projection
3.5 Selection
3.6 Union, Intersection, Set Difference
3.7 Cross Product
3.8 Joins
3.9 Equi-Joins
3.10 Division
3.11 Summary
4. Introduction to Query Optimization
4.1 Processing a high-level query
4.2 Techniques for Query Optimization
4.3 Motivating Examples
4.4 Summary
5. Conceptual Design Using the Entity-Relationship Model
5.1 Overview Of Database Design
5.2 E-R Modeling
5.3 Graphical Representation
5.4 Types Of Relationships
5.5 E-R Diagram: Some Examples
5.6 Summary and Case Studies
6. Schema Refinement and Normalization
6.1 Normalization and Normal Forms
6.2 Why Normal Forms
6.3 The Evils Of Redundancy
6.4 Refining an ER Diagram
6.5 First Normal Form
6.6 Functional Dependencies
6.7 Example: Constraints On Entity Set
6.8 Second Normal Form
6.9 Transitive Dependency
6.10 Third Normal Form
6.11 Boyce Codd Normal Form (BCNF)
6.12 Decomposition of a Relation Scheme
6.13 Lossless Join Decompositions
6.14 Summary and Examples
7. Transaction, Concurrency Control and Recovery
7.1 Transactions
7.2 The ACID Properties
7.3 Why Have Concurrent Processes?
7.4 Schedules
7.5 Serializable Schedules
7.6 Serializability Violations
7.7 Cascading Aborts
7.8 Recoverable Schedules
7.9 Locking: A Technique For Concurrency Control
7.10 Two-Phase Locking
7.11 Handling A Lock Request
7.12 Recovery
7.13 Logging
7.14 Handling the Buffer Pool
7.15 Write-Ahead Logging
7.16 Checkpoints in the System Log
7.17 Summary
Bibliographic Reference
Topics Covered :
Database Management System: Definitions
DBMS
Benefits of database approach
DBMS functions
Database System
Data Model
Database Architecture
An Example of the Three Levels
Schema
Data Independence
Types Of Database Models
Database Design Phases
Modern-day Computer-based Information Systems (IS) are capable of serving a variety of complex tasks in a coordinated manner. Such systems handle large volumes of data, multiple users and several applications for activities occurring in a central and/or distributed environment.
The heart of an IS is database management, because most IS have to handle massive amounts of data. This core module of an IS is called a Database Management System (DBMS). A DBMS provides for the storage, retrieval and updating of data in an organized manner.
An Example: Consider the situation in a library. Here, we have data corresponding to books, authors, suppliers, borrowers, etc. The total volume of data stored and handled in a library may be quite large. The Library DBMS may require several operations such as issue, return or purchase of books, and must handle queries relating to book information, borrowing information, etc. Moreover, there are different types of users who perform various activities. For example, a borrower may merely view certain information, whereas an issuer may be allowed to update the status of a book during issue or return. The library staff, on the other hand, may add new books, their suppliers, prices and other information to the database. Each user category has different access rights to both the data and the processing capabilities. Multiple users may operate the Library DBMS concurrently, performing several tasks at the same time, and may even try to access the same data simultaneously. It is the job of a DBMS to handle the data and its processing in an integrated, coordinated and consistent manner. Finally, the Library DBMS must have mechanisms to handle system failures (e.g., power failure, disk crash, etc.) so that the database can be recovered to a consistent state.
A database management system (DBMS) is a collection of programs that facilitates the process of defining, constructing and manipulating databases.
Defining a database involves specifying the types of data to be stored in the database.
Constructing the database is the process of storing the data.
Manipulating a database includes querying the database, updating the database and generating reports from the data.
A DBMS does the following:
Adding new, empty files to the database
Inserting new data into existing files
Retrieving data from existing files
Updating data in existing files
Deleting data from existing files
Removing existing files, empty or otherwise, from the database
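Each of the operations listed above corresponds to a standard SQL statement. Here is a minimal sketch with Python's sqlite3 module, using a hypothetical Books table in the spirit of the library example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Adding a new, empty file (table) to the database: data definition.
conn.execute("CREATE TABLE Books (isbn TEXT PRIMARY KEY, title TEXT, status TEXT)")

# Inserting new data into the existing file: constructing the database.
conn.execute("INSERT INTO Books VALUES ('0000000001', 'Sample Title', 'available')")

# Updating data in the existing file, e.g. when a book is issued.
conn.execute("UPDATE Books SET status = 'issued' WHERE isbn = '0000000001'")

# Retrieving data from the existing file.
rows = conn.execute("SELECT title, status FROM Books").fetchall()
print(rows)  # [('Sample Title', 'issued')]

# Deleting data from the existing file.
conn.execute("DELETE FROM Books WHERE isbn = '0000000001'")

# Removing the file itself from the database.
conn.execute("DROP TABLE Books")
```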
DBMS Functions :
Data Definition
Data Manipulation
Data Security and Integrity
Data Recovery and Concurrency
Data Dictionary
Performance
A database management system is a complex piece of software that usually consists of a number of modules. The DBMS may be considered as an agent that allows communication between the various types of users with the physical database and the operating system without the users being aware of every detail of how it is done. To enable the DBMS to fulfil its tasks, the database management system must maintain information about the data itself that is stored in the system. This information would normally include what data is stored, how it is stored, who has access to what parts of it and so on.
The information (data) about the data in a database is called the metadata. In addition to information listed above, some information regarding the use of a database is often collected to monitor the system's performance. This metadata helps management in maintaining an effective and efficient database system.
Three broad classes of users
Application programmers: Responsible for writing application programs that use the database
End users: Interact with the system from workstations or terminals. A given end user can access the database via one of the applications, or can use an interface provided as an integral part of the database system software (such interfaces are also supported by means of applications, of course, but those applications are built-in, not user-written, e.g., query language processor)
Database Administrator (DBA): Creates the actual database and implements technical controls needed to enforce various policy decisions. The DBA is also responsible for ensuring that the system operates with adequate performance and for providing a variety of other related technical services
One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are not needed by most database users. A data model is the main tool for providing this abstraction. A data model is a set of concepts that can be used to describe the structure of a database. It is a collection of high-level data description constructs that hide many low-level storage details.
Categories of Data Models :
Many data models have been proposed. We can categorize data models based on the types of concepts they provide to describe the data structure.
High-level or conceptual data models: provide concepts that are close to the way many users perceive data. They use concepts such as entities, attributes and relationships, where an entity represents a real-world object (e.g., student, employee) or concept (e.g., course, company), an attribute represents a property that describes an object (e.g., color, name), and a relationship represents an interaction or link among entities (e.g., works-on, is-a, has, etc.)
Low-level or physical data models: provide concepts that describe the details of how data is stored in the computer. Concepts provided by low-level data models are generally meant for computer specialists, not for typical end users. They represent information such as record formats, record orderings and access paths (structures that make the search for particular database records efficient, e.g., indexes).
Representational or implementation: Between above two extremes is a class of representational (or implementation) data models, which provide concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer. Representational data models hide some details of data storage but can be implemented on a computer system in a direct way.
Three important characteristics of the database approach are
(a) Insulation of programs and data (program-data and program-operation independence).
(b) Support of multiple user views.
(c) Use of a catalog to store database description.
The three-schema architecture was proposed to achieve these characteristics.
The Three levels of architecture :
The goal of the three schema architecture is to separate the user applications and the physical database.
The internal level is the one closest to the physical storage, i.e., it is the one concerned with the way data is physically stored
The external level is the one closest to the user, i.e., it is the one concerned with the way data is viewed by individual users
The conceptual level is a level of indirection between the other two
There will be many distinct external views, each consisting of a more or less abstract representation of some portion of the total database, and there will be one conceptual view, consisting of a similarly abstract representation of the database in its entirety. Likewise there will be precisely one internal view, representing the total Database as physically stored.
Mappings
The conceptual/internal mapping : defines the correspondence between the conceptual view and the stored database; it specifies how conceptual records and fields are represented at the internal level
The external/conceptual mapping : defines the correspondence between a particular external view and the conceptual view
A description of data in terms of a data model is called a schema. The description of a database is called database schema, which is specified during database design and is not expected to change frequently.
The Internal View/ Schema :
The internal view (or stored database) is a low-level representation of the entire database. The internal view is defined by the internal schema, which defines the various stored record types and specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in, etc.
The Conceptual View / Schema :
The conceptual view is a representation of the entire content of the database, in a form that is somewhat abstract in comparison with the way in which the data is physically stored.
The conceptual view is defined by means of the conceptual schema, which includes definitions of each of the various conceptual record types.
The External View / Schema :
Each external view is defined by means of an external schema.
External schema consists of definitions of each of the various external record types in that external view.
There must be a definition of the mapping between the external schema and the underlying conceptual schema.
The three level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is data independence.
Physical data independence: The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence.
Logical data independence: Separating the external views from the conceptual view enables us to change the conceptual view without affecting the external views. This separation is sometimes called logical data independence.
Functions of the DBA (Database Administrator):
Defining the conceptual schema -- conceptual database design
Defining the internal schema -- physical database design and define the associated mapping between the internal and conceptual schemas
Liaison with users
Defining security and integrity rules
Defining backup and recovery procedures
Monitoring performance and responding to changing requirements
The most well-known record-based models are the relational model, the network model and the hierarchical model.
Relational model: In this model, each database item is viewed as a record with attributes. A set of records with similar attributes is called a table. Most of the popular commercial DBMS products like Oracle, Sybase, MySQL, etc. are based on the relational model.
Network model: represents data as record types. However, unlike the relational model, here we have explicit linkages (expressed in the form of pointers) which relate various records. Each record has a link field corresponding to every relationship in which it participates. IDS (Integrated Data Store) is one of the DBMS products based on the network model.
Hierarchical Model: represents data as a hierarchical tree. This is a special kind of network model in which the relationships form a tree-like structure, where one parent may have many children but a child cannot have more than one parent. The relationship from borrower to books in a library system satisfies this condition. One of the popular DBMSs based on the hierarchical model is Information Management System (IMS) from IBM.
Object Oriented model: represents DB in terms of objects, their attributes, and their behaviors.
THE FOUR PHASES TO DESIGN ANY DATA BASE SYSTEM ARE:
1. FORMULATION OF INFORMATION REQUIREMENT & ANALYSIS PHASE: This phase is also called Feasibility phase. In this phase, through the interviews and reviewing all related documents and policies in the organization, the following items are identified:
a. Clear and concise definition of the problem
b. Local dependency lists
c. local dependency diagrams
d. Local Schema
2. LOGICAL SCHEMA DESIGN PHASE:
In this phase the following items are performed:
a. Consolidation of dependency lists.
b. Consolidation of logical schema.
The output of this phase is a logical schema that is independent of all computer hardware and software systems.
3. IMPLEMENTATION DESIGN PHASE:
In this phase the logical schema which was designed in the Logical Design Phase is modified to fit the specific data model, hardware and software system that the designer wants to use. This new schema is called IMPLEMENTATION SCHEMA.
4. PHYSICAL DESIGN PHASE:
In this phase the Implementation Schema which was designed in the Implementation Phase is programmed using the DDL (Data Definition Language) or any other software language which is available for the programmer.
Topics Covered :
Definition: RDBMS
Features of an RDBMS
Some Important Terms
Properties of Relations
Keys
Referential Integrity
Summary
Domain :
An attribute of an entity set has a particular value. The set of possible values that a given attribute can have is called its domain.
For example, the set of values that the attribute EMPLOYEE.id can assume is the set of positive integers of 5 digits.
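A domain like this can be enforced declaratively. The sqlite3 sketch below reads "positive integers of 5 digits" as the range 10000 to 99999, which is an assumption. It expresses that range as a CHECK constraint so that a value outside the domain is rejected. The sample names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain of EMPLOYEE.id (assumed): positive integers with exactly 5 digits, i.e. 10000..99999.
conn.execute("""
    CREATE TABLE EMPLOYEE (
        id   INTEGER PRIMARY KEY CHECK (id BETWEEN 10000 AND 99999),
        name TEXT
    )""")

conn.execute("INSERT INTO EMPLOYEE VALUES (12345, 'Attishoo')")  # inside the domain

try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (123, 'Smiley')")  # only 3 digits
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

print(out_of_domain_rejected)  # True
```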
Primary Key :
A unique identifier for the table (a column or a column combination with the property that at any given time no two rows of the table contain the same value in that column or column combination)
Key: An attribute or set of attributes whose values uniquely identify each entity in an entity set is called a key for that entity set.
Super Key: If we add additional attributes to a key, the resulting combination would still uniquely identify an instance of the entity set. Such augmented keys are called super keys.
Primary key: It is a minimal super key.
Candidate Keys : There may be two or more attributes or combinations of attributes that uniquely identify an instance of an entity set. These attributes or combinations of attributes are called candidate keys.
In such a case we must decide which of the candidate keys will be used as the primary key. The remaining candidate keys would be considered alternate keys.
Secondary Key: A secondary key is an attribute or combination of attributes that may not be a candidate key but that classifies the entity set on a particular characteristic.
A case in point is the entity set EMPLOYEE having the attribute department, which identifies by its value all instances of EMPLOYEE that belong to a given department.
Any key consisting of a single attribute is called a simple key while that consisting of a combination of attributes is called a composite key.
A set of fields is a key for a relation if :
1. No two distinct tuples can have same values in all key fields, and
2. This is not true for any subset of the key.
If there is more than one key for a relation, one of the keys is chosen (by the DBA) to be the primary key. E.g., sid is a key for Students. (What about name?) The set {sid, gpa} is a superkey.
Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
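The two conditions for a key translate directly into checks on an instance. The Python sketch below uses hypothetical function names and the three Student rows from the Keys and Referential Integrity slide. As with FDs, an instance can only disprove a key. For example, gpa happens to be unique in these three rows, yet nothing makes it a key in general.

```python
from itertools import combinations

students = [
    {"sid": "53666", "name": "Jones", "login": "Jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "Smith@eecs", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "Smith@math", "age": 19, "gpa": 3.8},
]

def is_superkey(rows, fields):
    """Condition 1: no two distinct tuples agree on all of the given fields."""
    values = [tuple(r[f] for f in fields) for r in rows]
    return len(values) == len(set(values))

def is_key(rows, fields):
    """Condition 2 as well: no proper subset of the fields is itself a superkey."""
    if not is_superkey(rows, fields):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )

print(is_key(students, ["sid"]))              # True
print(is_superkey(students, ["sid", "gpa"]))  # True
print(is_key(students, ["sid", "gpa"]))       # False: {sid} alone already works
print(is_superkey(students, ["name"]))        # False: two students named Smith
```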
Foreign key: A set of fields in one relation that is used to ‘refer’ to a tuple in another relation. (It must correspond to the primary key of the second relation.) Like a ‘logical pointer’.
E.g., sid is a foreign key referring to Students:
– Enrolled (sid: string, cid: string, grade: string)
– If all foreign key constraints are enforced, referential integrity is achieved, i.e., there are no dangling references.
Enforcing Referential Integrity
Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students. What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)
What should be done if a Students tuple is deleted?
– Also delete all Enrolled tuples that refer to it.
– Disallow deletion of a Students tuple that is referred to.
– Set sid in Enrolled tuples that refer to it to a default sid
– (In SQL, also: set sid in Enrolled tuples that refer to it to the special value null, denoting 'unknown' or 'inapplicable'.)
Similar options apply if the primary key of a Students tuple is updated.
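A minimal sketch of these policies, assuming SQLite (through Python's built-in sqlite3 module) as the DBMS and hypothetical sample rows. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The options listed above correspond roughly to `ON DELETE CASCADE`, `ON DELETE NO ACTION`, `ON DELETE SET DEFAULT` and `ON DELETE SET NULL`; this example picks CASCADE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE Students (
    sid  CHAR(20) PRIMARY KEY,
    name CHAR(30)
);
CREATE TABLE Enrolled (
    sid   CHAR(20),
    cid   CHAR(20),
    grade CHAR(2),
    PRIMARY KEY (sid, cid),
    -- Policy chosen here: delete Enrolled tuples that refer to a deleted student.
    FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE
);
INSERT INTO Students VALUES ('53666', 'Jones');
INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C');
""")

# Inserting an Enrolled tuple with a non-existent student id is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'Reggae203', 'B')")
    dangling_insert = "accepted"
except sqlite3.IntegrityError:
    dangling_insert = "rejected"

# Deleting the student cascades to the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
remaining = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(dangling_insert, remaining)  # rejected 0
```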
Summary :
A tabular representation of data.
Simple and intuitive, currently the most widely used.
Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations.
– Two important Integrity Constraints: primary and foreign keys
– In addition, we always have domain constraints.
Powerful and natural query languages exist.
Topics Covered :
Database Design
E-R Modeling
Example E-R Diagrams
Summary
Case Studies
The database design can be divided into the following steps:
Requirement Analysis: First of all, we should be clear about what the users want from the database, what data is to be stored, and what operations are to be performed.
Conceptual Design: The information gathered in the requirements analysis step is used to develop a high level description of the data to be stored in the database. In this step we have to address the following:
-What are the entities and relationships in the enterprise?
-What information about these entities and relationships should we store in the database?
-What are the integrity constraints or business rules that hold?
This step is often carried out using the ER model, or a similar high-level model. A database 'schema' in the ER model can be represented pictorially (ER diagrams).
Logical Database Design: We must choose a DBMS to implement our database design, and convert the conceptual database design into a database schema in the data model of the chosen DBMS. For example, we can map an ER diagram into a relational database schema.
Schema Refinement (Normalization): Check relational schema for redundancies and related anomalies.
Physical Database Design and Tuning : Consider typical workloads and further refine the database design.
The basic design process is divided into the following phases:
1. Requirement Collection & Analysis :
- The database designers interview prospective database users to understand and document their data requirements. The result of this step is a concisely written set of user requirements. The user-defined operations (transactions) that will be applied to the database, including both retrievals and updates, are also specified in this step.
2. Conceptual Design : A concise description of the data requirements of the users, including detailed descriptions of the entity types, relationships and constraints, expressed using the concepts provided by the high-level data model.
3. Logical Design :
The conceptual schema is mapped into the data model of the chosen DBMS (e.g., a relational DBMS or an object model); this step is also called data model mapping.
4. Physical Design :
The internal storage structures, access paths and file organizations for the database files are specified. In parallel with these activities, application programs are designed and implemented as database transactions corresponding to the high-level transaction specifications.
Entity :
An Entity is a thing that exists and is distinguishable.
For example, each chair is an entity. So is each person and each automobile.
Entities can have concrete existence or constitute ideas or concepts.
Concepts like love and hate are entities.
Entity Set :
A group of similar entities forms an entity set.
Examples of entity sets are:
1. All Persons
2. All Automobiles
3. All Emotions
Attributes :
Attributes are the properties that characterize an entity set.
For Example, employees of an organization are modeled by the entity set EMPLOYEE. We must include in the model the properties of the employees that may be useful to the organization. Some of these properties are name, address, skill etc.
Relationship: It is an association between two or more entities.
For example, we may have the relationship that an employee works in a department.
There are two types of entities: regular and weak.
A regular (independent) entity does not depend on any other entity for its existence. For example, Employee is a regular entity. A regular entity is depicted using a rectangle.
An entity whose existence depends on the existence of another entity is called a weak (or dependent) entity. For example, the dependent of an employee is a weak entity, whose existence depends on the entity Employee. A dependent entity is depicted in a double-lined box, or a darkened rectangle.
Similarly, relationships can also be regular or weak.
Entity: Real-world object distinguishable from other objects. It could be an object, place, person, concept or activity about which an enterprise records data. To qualify something as an entity, it should
– Have an independent existence
– Be of interest to us.
An entity is described (in DB) using a set of attributes .
Entity Set : A collection of similar entities. Eg., all employees.
– All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)
– Each entity set has a key .
– Each attribute has a domain .
– Can map entity set to a relation easily
A relationship is defined as an association among entities. For example, there is a relationship between students and course, which can be named as ‘enrols in’.
A relationship set is an association of entity sets (e.g., student-course) while a relationship instance is an association of entity instances (e.g., Ravi-DBMS).
An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
Same entity set could participate in different relationship sets, or in different “roles” in same set.
A relationship is depicted by a diamond, with the name of the relationship type.
There are three types of relationships:
- One-to-one: One student is issued only one card (and vice-versa).
- One-to-many (or many-to-one): One Student can enrol for only one course, but one course can be offered to many students.
- Many-to-many: One Student can take many tests, and one test can be taken by many Students.
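The three cardinality types can be read off a relationship instance by counting how often each entity appears. This is a minimal Python sketch assuming a relationship instance is given as (left, right) pairs; the function name `cardinality` and the sample pairs are hypothetical. A single instance can only illustrate a cardinality; the constraint itself must hold for every allowable instance.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a binary relationship instance given as (left, right) entity pairs.

    The label is read from the left entity set to the right one.
    """
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    left_many = any(n > 1 for n in left_counts.values())    # a left entity has several partners
    right_many = any(n > 1 for n in right_counts.values())  # a right entity has several partners
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical instances for the three examples above.
issued = [("Ravi", "Card-1"), ("Asha", "Card-2")]
enrols = [("Ravi", "DBMS"), ("Asha", "DBMS")]
takes = [("Ravi", "Test-1"), ("Ravi", "Test-2"), ("Asha", "Test-1")]

print(cardinality(issued))  # one-to-one
print(cardinality(enrols))  # many-to-one (many students, one course)
print(cardinality(takes))   # many-to-many
```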
In the figure above, we show the relationship set Works_in, in which each relationship indicates a department in which an employee works.
The entities are described by a set of attributes and identified by primary keys (PK).
Employee:
Attributes: ssn, name, lot
PK: ssn
Department:
Attributes: did, dname, budget
PK: did
The entity sets that participate in a relationship set need not be distinct; sometimes a relationship might involve two entities in the same entity set. For example, in Reports_To relationship set, every relationship is of the form (emp1, emp2).
An instance of a relationship set is a set of relationships. Intuitively, an instance can be thought of as a 'snapshot' of the relationship set at some instant in time.
Relationship sets can also have descriptive attributes (e.g., the since attribute of Works_In).
A relationship must be uniquely identified by the participating entities, without reference to the descriptive attributes. In the Works_in relationship set, for example, each Works_in relationship must be uniquely identified by the combination of employee ssn and department did. Thus, for a given employee-department pair, we cannot have more than one associated since value.
Thus, in translating a relationship set to a relation, attributes of the relation must include:
Keys for each participating entity set (as foreign keys). This set of attributes forms a superkey for the relation.
All descriptive attributes.
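A sketch of this translation for Works_In, run through Python's sqlite3 module; the employee and department rows are hypothetical. The composite primary key (ssn, did) is what enforces the rule above that one employee-department pair cannot have two since values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees   (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL);
-- Keys of the participating entity sets (as foreign keys) plus the descriptive attribute.
CREATE TABLE Works_In (
    ssn   CHAR(11),
    did   INTEGER,
    since DATE,
    PRIMARY KEY (ssn, did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments
);
INSERT INTO Employees   VALUES ('123-22-3666', 'Attishoo', 48);
INSERT INTO Departments VALUES (51, 'Toys', 100000);
INSERT INTO Works_In    VALUES ('123-22-3666', 51, '1991-01-01');
""")

# A second 'since' value for the same employee-department pair violates the key.
try:
    conn.execute("INSERT INTO Works_In VALUES ('123-22-3666', 51, '1995-06-01')")
    second_since = "accepted"
except sqlite3.IntegrityError:
    second_since = "rejected"
print(second_since)  # rejected
```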
A key constraint between an entity set S and a relationship set restricts instances of the relationship set by requiring that each entity of S participate in at most one relationship.
Consider Manages: Each dept has at most one manager, according to the key constraint on ‘Manages’ relationship (In contrast, Works_In relationship of earlier slide shows that an employee can work in many departments and a dept can have many employees). The arrow from Department to Manages indicates that each Department entity appears in at most one Manages relationship in any allowable instance of Manages. Thus given a Department entity, we can uniquely determine the Manages relationship in which it appears.
Translating ER Diagrams with Key Constraints:
Map relationship to a table: Note that did is the key now!
– Separate tables for Employees and Departments.
Since each department has a unique manager, we could instead combine Manages and Departments.
Manages table with the key constraint (did alone is the primary key):
CREATE TABLE Manages(
ssn CHAR(11),
did INTEGER,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn)
REFERENCES Employees,
FOREIGN KEY (did)
REFERENCES Departments)
Ternary Relationship: A relationship set involving three entity sets is known as a ternary Relationship.
Eg. Works_in relationship involving Employee, Department and Location Entity sets.
In the slide above, we show a ternary relationship with a key constraint. The key constraint indicates that each employee works in at most one department, and at a single location. Notice that each department can be associated with several employees and locations, and each location can be associated with several departments and employees; however, each employee is associated with a single department and location.
The key constraint on Manages tells us that a Department has at most one Manager (indicated by arrow). Let us now ask: Does every department have a manager? If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial ). The total participation is indicated by a dark line between entity and relationship. A participation that is not total is said to be partial. Eg. participation of Employee in Manages is partial.
The participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type.
A participation constraint between an entity set S and a relationship set restricts instances of the relationship set by requiring that each entity of S participate in at least one relationship.
Every did value in Department table must appear in a row of the Manages table (with a non-null ssn value!).
Similarly, every ssn value in Employee table must appear in a row of the Works_in table.
Participation Constraints in SQL: We can capture participation constraints involving one entity set in a binary relationship, but little else (without resorting to CHECK constraints).
CREATE TABLE Dept_Mgr(
did INTEGER,
dname CHAR(20),
budget REAL,
ssn CHAR(11) NOT NULL,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE NO ACTION )
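The Dept_Mgr table can be exercised with Python's sqlite3 module to see which constraint does what. This is a sketch with hypothetical rows and a hypothetical helper `attempt`. NOT NULL on ssn captures total participation, PRIMARY KEY (did) captures the key constraint, and NO ACTION blocks deleting a manager who is still referenced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Dept_Mgr (
    did    INTEGER,
    dname  CHAR(20),
    budget REAL,
    ssn    CHAR(11) NOT NULL,   -- total participation: every department has a manager
    since  DATE,
    PRIMARY KEY (did),          -- key constraint: at most one manager per department
    FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION
);
INSERT INTO Employees VALUES ('231-31-5368', 'Smiley', 22);
INSERT INTO Dept_Mgr  VALUES (51, 'Toys', 100000, '231-31-5368', '1993-03-01');
""")

def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

no_manager = attempt("INSERT INTO Dept_Mgr VALUES (56, 'Garden', 50000, NULL, NULL)")
drop_manager = attempt("DELETE FROM Employees WHERE ssn = '231-31-5368'")
print(no_manager, drop_manager)  # rejected rejected
```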
A weak entity’s existence is dependent on another (owner) entity. Hence a weak entity does not have a key of its own. It can be identified uniquely only by considering the primary key of its owner entity.
– Owner entity set and weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities).
– Weak entity set must have total participation in this identifying relationship set.
Translating Weak Entity Sets:
Weak entity set and identifying relationship set are translated into a single table.
– When the owner entity is deleted, all owned weak entities must also be deleted.
Eg. If the employee quits, any policy owned by the employee is terminated. All the relevant policy and dependent information is also deleted from the database.
To indicate that Dependent is a weak entity and policy is its identifying relationship, we draw both with dark lines.
CREATE TABLE Dep_Policy (
pname CHAR(20),
age INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (pname, ssn),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE )
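A short sqlite3 sketch of the weak entity translation, with hypothetical employees and dependents. Because the key is (pname, ssn), two owners may each have a dependent with the same name; ON DELETE CASCADE removes an employee's dependents when the employee is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Dep_Policy (
    pname CHAR(20),
    age   INTEGER,
    cost  REAL,
    ssn   CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),   -- partial key pname plus the owner's key ssn
    FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE
);
INSERT INTO Employees  VALUES ('123-22-3666', 'Attishoo', 48);
INSERT INTO Employees  VALUES ('231-31-5368', 'Smiley', 22);
INSERT INTO Dep_Policy VALUES ('Asha', 10, 500.0, '123-22-3666');
INSERT INTO Dep_Policy VALUES ('Ravi',  8, 500.0, '123-22-3666');
-- The same pname is allowed under a different owner.
INSERT INTO Dep_Policy VALUES ('Asha', 12, 400.0, '231-31-5368');
""")

def count_policies():
    return conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]

before = count_policies()
conn.execute("DELETE FROM Employees WHERE ssn = '123-22-3666'")  # the employee quits
after = count_policies()
print(before, after)  # 3 1
```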
As in C++ or other programming languages, attributes are inherited.
If we declare A ISA B, every A entity is also considered to be a B entity. (Query answers should reflect this: unlike C++!)
Overlap constraints : Can Joe be an Hourly_Emp as well as a Contract_Emp entity? (Allowed/disallowed)
Covering constraints : Does every Employee entity also have to be an Hourly_Emp or a Contract_Emp entity? (Yes/no)
Reasons for using ISA :
– To add descriptive attributes specific to a subclass.
– To identify the subset of entities that participate in a relationship.
Translating ISA Hierarchies to Relations:
General approach:
– 3 relations: Employee, Hourly_Emp and Contract_Emp.
Hourly_Emp : Every employee is recorded in Employee.
For hourly emps, extra info is recorded in
Hourly_Emp (hourly_wages, hours_worked, ssn);
we must delete the Hourly_Emp tuple if the referenced Employee tuple is deleted.
Queries involving all employees are easy; those involving just Hourly_Emp require a join to get some attributes.
Alternative: Just Hourly_Emp and Contract_Emp.
– Hourly_Emp : ssn, name, lot, hourly_wages, hours_worked.
– Contract_Emp : ssn, name, lot, contractid.
– This works only if each employee must be in one of these two subclasses (covering constraint).
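A sketch of the general approach (one relation per class) in sqlite3. It assumes the table names Employee and Hourly_Emp from the text and hypothetical sample data. Contract_Emp would be declared the same way and is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
-- Subclass relation: only the extra attributes plus the inherited key.
CREATE TABLE Hourly_Emp (
    ssn          CHAR(11) PRIMARY KEY,
    hourly_wages REAL,
    hours_worked INTEGER,
    FOREIGN KEY (ssn) REFERENCES Employee ON DELETE CASCADE
);
INSERT INTO Employee   VALUES ('123-22-3666', 'Attishoo', 48);
INSERT INTO Employee   VALUES ('231-31-5368', 'Smiley', 22);
INSERT INTO Hourly_Emp VALUES ('123-22-3666', 10.0, 40);
""")

# Queries over all employees need only Employee ...
all_emps = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
# ... but hourly-employee queries need a join to get inherited attributes such as name.
pay = conn.execute("""
    SELECT E.name, H.hourly_wages * H.hours_worked
    FROM Employee E JOIN Hourly_Emp H ON E.ssn = H.ssn
""").fetchall()

# Deleting the Employee tuple also deletes the referencing Hourly_Emp tuple.
conn.execute("DELETE FROM Employee WHERE ssn = '123-22-3666'")
hourly_left = conn.execute("SELECT COUNT(*) FROM Hourly_Emp").fetchone()[0]
print(all_emps, pay, hourly_left)  # 2 [('Attishoo', 400.0)] 0
```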
Aggregation
Aggregation is meant to represent a relationship between a whole object and its component parts.
Used when we have to model a relationship involving (entity sets and) a relationship set.
– Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
– Eg. A Project is sponsored by a Department. This is a simple relationship.
An Employee monitors this Sponsorship (and not Project or Department). This is aggregation.
– Monitors mapped to table like any other relationship set.
Aggregation vs. ternary relationship:
Can we express relationships involving other relationships without using aggregation?
– The use of aggregation vs. ternary relationship may be guided by certain integrity constraints.
– Eg. we can impose a constraint that each sponsorship is monitored by at most one employee (not possible without aggregation).
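A sketch of how aggregation might be translated, using sqlite3. Monitors references the Sponsors relationship through its key (pid, did), as if Sponsors were an entity set. Making (pid, did) the primary key of Monitors imposes the "at most one monitoring employee per sponsorship" constraint. The column names pbudget and until_date and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees   (ssn CHAR(11) PRIMARY KEY, name CHAR(20));
CREATE TABLE Projects    (pid INTEGER PRIMARY KEY, pbudget REAL);
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname CHAR(20));
CREATE TABLE Sponsors (
    pid   INTEGER REFERENCES Projects,
    did   INTEGER REFERENCES Departments,
    since DATE,
    PRIMARY KEY (pid, did)
);
-- Monitors treats the Sponsors relationship as an entity: it references (pid, did).
-- PRIMARY KEY (pid, did): each sponsorship has at most one monitoring employee.
CREATE TABLE Monitors (
    pid        INTEGER,
    did        INTEGER,
    ssn        CHAR(11) NOT NULL REFERENCES Employees,
    until_date DATE,
    PRIMARY KEY (pid, did),
    FOREIGN KEY (pid, did) REFERENCES Sponsors (pid, did)
);
INSERT INTO Employees   VALUES ('123-22-3666', 'Attishoo');
INSERT INTO Employees   VALUES ('231-31-5368', 'Smiley');
INSERT INTO Projects    VALUES (1, 50000);
INSERT INTO Departments VALUES (51, 'Toys');
INSERT INTO Sponsors    VALUES (1, 51, '2020-01-01');
INSERT INTO Monitors    VALUES (1, 51, '123-22-3666', '2021-12-31');
""")

def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

second_monitor = attempt("INSERT INTO Monitors VALUES (1, 51, '231-31-5368', NULL)")
no_sponsorship = attempt("INSERT INTO Monitors VALUES (2, 51, '231-31-5368', NULL)")
print(second_monitor, no_sponsorship)  # rejected rejected
```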
Conceptual Design Using the ER Model
Design choices:
– Should a concept be modelled as an entity or an attribute?
– Should a concept be modelled as an entity or a relationship?
– Identifying relationships: Binary or ternary? Aggregation?
Entity vs. Attribute
Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address information, and the semantics of the data:
If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modelled as an entity (since attribute values are atomic).
Otherwise, address can be used as an attribute of Employee.
This is similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship.
Consider that an employee works in a given department over more than one period. This possibility is ruled out by the ER diagram’s semantics of previous slide. The problem is that we want to record several values for descriptive attributes for each instance of Works_in relationship. We can address this problem by introducing an entity set called Duration, with attributes from and to.
ER diagram OK if a manager gets a separate discretionary budget for each dept.
What if a manager gets a discretionary budget that covers all managed depts?
– Redundancy of dbudget, which is stored for each dept managed by the manager.
– Misleading: suggests dbudget tied to managed dept.
One of the possible designs to resolve the two issues of the previous ER diagram:
We model the appointment as an entity set, say Mgr_appt, and use a ternary relationship, say manages, to relate a manager, an appointment, and a department. The dbudget is now associated with the appointment of the employee as manager of a group of departments. The details of an appointment (such as the discretionary budget) are not repeated for each department that is included in the appointment now, although there is still one Manages relationship instance per such Department.
The figure above models a situation in which an employee can own several policies, each policy can be owned by several employees, and each dependent can be covered by several policies.
Suppose we have following constraint:
Each policy is owned by just 1 employee
– Key constraint on Policy would mean policy can only cover 1 dependent!
The key constraints allow us to combine Purchaser with Policy and Beneficiary with Dependent.
Participation constraints lead to NOT NULL constraints.
CREATE TABLE Policy (
policyid INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (policyid),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE )
CREATE TABLE Dependent (
pname CHAR(20),
age INTEGER,
policyid INTEGER,
PRIMARY KEY (pname, policyid),
FOREIGN KEY (policyid) REFERENCES Policy
ON DELETE CASCADE )
Constraints in the ER Model:
– A lot of data semantics can (and should) be captured.
– But some constraints cannot be captured in ER diagrams.
Need for further refining the schema:
– Relational schema obtained from ER diagram is a good first step. But ER design subjective & can’t express certain constraints; so this relational schema may need refinement.
Functional dependencies:
– e.g., A dept can’t order two distinct parts from the same supplier.
Can’t express this with respect to the ternary Contracts relationship.
– Normalization refines ER design by considering FDs.
Inclusion dependencies:
– Special case: Foreign keys (ER model can express these).
– e.g., At least 1 person must report to each manager. (Set of ssn values in Manages must be subset of supervisor_ssn values in Reports_To.) Foreign key? Expressible in ER model?
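An inclusion dependency is a subset test between two column value sets. A minimal Python sketch, assuming hypothetical Manages and Reports_To instances and a made-up helper `inclusion_holds`. Since supervisor_ssn repeats in Reports_To (it is not a key there), this dependency cannot be declared as an ordinary foreign key, which must reference a key.

```python
def inclusion_holds(child_rows, child_attr, parent_rows, parent_attr):
    """Check that child_attr values are a subset of parent_attr values; also return violators."""
    child_values = {row[child_attr] for row in child_rows}
    parent_values = {row[parent_attr] for row in parent_rows}
    missing = child_values - parent_values
    return not missing, missing

# Hypothetical instances: manager 131-24-3650 has nobody reporting to them.
manages = [{"ssn": "231-31-5368", "did": 51}, {"ssn": "131-24-3650", "did": 56}]
reports_to = [
    {"supervisor_ssn": "231-31-5368", "subordinate_ssn": "123-22-3666"},
    {"supervisor_ssn": "231-31-5368", "subordinate_ssn": "223-32-6316"},
]

ok, missing = inclusion_holds(manages, "ssn", reports_to, "supervisor_ssn")
print(ok, missing)  # False {'131-24-3650'}
```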
General constraints:
– e.g., Manager’s discretionary budget less than 10% of the combined budget of all departments he or she manages.
Regular Entities : Each regular entity type maps into a base relation
The database will thus contain 5 base relations : DEPT, EMP, Supplier, Part and Project; the primary keys for these relations being : DEPT#, EMP#, S#, P# and J#
Weak Entities :
The relationship from a weak entity type to the entity type on which it depends is of course a many-to-one relationship.
The foreign key rules for that relationship should be as follows:
DELETE CASCADES
UPDATE CASCADES
An entity type Department has attributes Name, Number, Location, Manager and Manager Start Date. Location is a multivalued attribute. Name and Number are key attributes, since each was specified to be unique.
An entity type Project has attributes Name, Number, Location and Controlling Department. Both Name and Number are key attributes.
An entity type Employee has attributes Name, SSN (Social Security Number), Gender, Address, Birth Date, Salary and Supervisor. Both Name and Address are composite attributes.
Dependent is a weak entity type with attributes Employee SSN, Dependent Name, Gender, Date of Birth and Relationship (to the employee).
NOTE : Identifying the entities in this way before drawing the ER diagram is called the Chen design approach.
The number of people working in a location can be a derived attribute.
Summary of Conceptual Design
Conceptual design follows requirements analysis,
– Yields a high-level description of data to be stored
ER model popular for conceptual design
– Constructs are expressive, close to the way people think about their applications.
Basic constructs: entities, relationships, and attributes (of entities and relationships).
Some additional constructs: weak entities, ISA hierarchies, and aggregation.
Note: There are many variations on ER model.
Summary of ER
Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set.
– Some of these constraints can be expressed in SQL only if we use general CHECK constraints or assertions.
– Some constraints (notably, functional dependencies) cannot be expressed in the ER model.
– Constraints play an important role in determining the best database design for an enterprise.
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation.
Ensuring good database design: resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.
Case Studies:
1. Prescriptions-R-X chain
The Prescriptions-R-X chain of pharmacies has offered to give you a free lifetime supply of medicines if you design its database. Given the rising cost of health care, you agree. Here's the information that you gather:
Patients are identified by an SSN, and their names, addresses, and ages must be recorded.
Doctors are identified by an SSN. For each doctor, the name, specialty, and years of experience must be recorded.
Each pharmaceutical company is identified by name and has a phone number.
For each drug, the trade name and formula must be recorded. Each drug is sold by a given pharmaceutical company, and the trade name identifies a drug uniquely from among the products of that company. If a pharmaceutical company is deleted, you need not keep track of its products any longer.
Each pharmacy has a name, address, and phone number.
Every patient has a primary physician. Every doctor has at least one patient.
Each pharmacy sells several drugs and has a price for each. A drug could be sold at several pharmacies, and the price could vary from one pharmacy to another.
Doctors prescribe drugs for patients. A doctor could prescribe one or more drugs for several patients, and a patient could obtain prescriptions from several doctors. Each prescription has a date and a quantity associated with it. You can assume that if a doctor prescribes the same drug for the same patient more than once, only the last such prescription needs to be stored.
Pharmaceutical companies have long-term contracts with pharmacies. A pharmaceutical company can contract with several pharmacies, and a pharmacy can contract with several pharmaceutical companies. For each contract, you have to store a start date, an end date, and the text of the contract.
Pharmacies appoint a supervisor for each contract. There must always be a supervisor for each contract, but the contract supervisor can change over the lifetime of the contract.
1. Draw an ER diagram that captures the above information. Identify any constraints that are not captured by the ER diagram.
2. How would your design change if each drug must be sold at a fixed price by all pharmacies?
3. How would your design change if the design requirements change as follows: If a doctor prescribes the same drug for the same patient more than once, several such prescriptions may have to be stored.
2. Dane County Airport
Computer Sciences Department frequent fliers have been complaining to Dane County Airport officials about the poor organization at the airport. As a result, the officials have decided that all information related to the airport should be organized using a DBMS, and you've been hired to design the database. Your first task is to organize the information about all the airplanes that are stationed and maintained at the airport.
The relevant information is as follows:
Every airplane has a registration number, and each airplane is of a specific model.
The airport accommodates a number of airplane models, and each model is identified by a model number (e.g., DC-10) and has a capacity and a weight.
A number of technicians work at the airport. You need to store the name, SSN, address, phone number, and salary of each technician.
Each technician is an expert on one or more plane model(s), and his or her expertise may overlap with that of other technicians. This information about technicians must also be recorded.
Traffic controllers must have an annual medical examination. For each traffic controller, you must store the date of the most recent exam.
All airport employees (including technicians) belong to a union. You must store the union membership number of each employee. You can assume that each employee is uniquely identified by the social security number.
The airport has a number of tests that are used periodically to ensure that airplanes are still airworthy. Each test has a Federal Aviation Administration (FAA) test number, a name, and a maximum possible score.
The FAA requires the airport to keep track of each time that a given airplane is tested by a given technician using a given test. For each testing event, the information needed is the date, the number of hours the technician spent doing the test, and the score that the airplane received on the test.
1. Draw an ER diagram for the airport database. Be sure to indicate the various attributes of each entity and relationship set; also specify the key and participation constraints for each relationship set. Specify any necessary overlap and covering constraints as well (in English).
2. The FAA passes a regulation that tests on a plane must be conducted by a technician who is an expert on that model. How would you express this constraint in the ER diagram? If you cannot express it, explain briefly.
3. University Database:
Consider the following information about a university database:
Professors have an SSN, a name, an age, a rank, and a research specialty.
Projects have a project number, a sponsor name (e.g., NSF), a starting date, an ending date, and a budget.
Graduate students have an SSN, a name, an age, and a degree program (e.g., M.S. or Ph.D.).
Each project is managed by one professor (known as the project's principal investigator).
Each project is worked on by one or more professors (known as the project's co-investigators).
Professors can manage and/or work on multiple projects.
Each project is worked on by one or more graduate students (known as the project's research assistants).
When graduate students work on a project, a professor must supervise their work on the project. Graduate students can work on multiple projects, in which case they will have a (potentially different) supervisor for each one.
Departments have a department number, a department name, and a main office. Departments have a professor (known as the chairman) who runs the department.
Professors work in one or more departments, and for each department that they work in, a time percentage is associated with their job.
Graduate students have one major department in which they are working on their degree.
Each graduate student has another, more senior graduate student (known as a student advisor) who advises him or her on what courses to take.
Design and draw an ER diagram that captures the information about the university.
Use only the basic ER model here, that is, entities, relationships, and attributes. Be sure to indicate any key and participation constraints.
Topics Covered :
Normalization and Normal Forms
Why Normal Forms
The Evils Of Redundancy
Refining an ER Diagram
First Normal Form
Functional Dependencies
Example: Constraints On Entity Set
Second Normal Form
Transitive Dependency
Third Normal Form
Boyce Codd Normal Form (BCNF)
Decomposition of a Relation Scheme
Lossless Join Decompositions
Summary and Examples
Normalization is a step-by-step decomposition of complex records into simple records. It results in the formation of tables that satisfy certain specified constraints, and represent certain normal forms. Normalization reduces redundancy using the principle of non-loss decomposition. A fully normalized record consists of
- A primary key that identifies an entity
-A set of attributes that describe the entity
Several normal forms have been identified, the most important and widely used of which are
first normal form
second normal form
third normal form and
Boyce-Codd normal form.
In order to produce good database design, we should ask questions like:
1) Does this design ensure that all database operations will be efficiently performed and that the design does not make the DBMS perform expensive consistency checks which could be avoided?
2) Is the information unnecessarily replicated?
Unless these issues are properly handled several difficulties like redundancy and loss of information may arise. There are several methods to avoid the above mentioned problems. One such method is database decomposition through normalization, which tries to minimize redundancy and the efforts of checking of constraints and dependencies.
Redundancy problems associated with relational schemas:
– redundant storage, insert/delete/update anomalies
Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
Decomposition should be used judiciously:
– Is there reason to decompose a relation?
– What problems (if any) does the decomposition cause?
Consider the above ER diagram, with the Works_in relation having a Key constraint indicating that an employee can work in at most one department.
ER diagram can be translated into two relations:
Worker (ssn, name, lot, since, did)
Department (did, dname, budget)
– Lots associated with workers.
Suppose all workers in a dept are assigned the same lot: D → L, i.e., did functionally determines lot. This leads to redundancy, since the same lot is stored again for every worker in the department.
The redundancy in earlier slide can be fixed by breaking the relation Worker as:
Workers (ssn, name, since, did)
Dept_Lots(did, lot)
Can fine-tune this:
Workers (ssn, name, since, did)
Department (did, dname, budget, lot)
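The decomposition can be checked on a small instance. The Python sketch below assumes a hypothetical Worker instance in which did → lot holds. It projects the instance onto Workers and Dept_Lots and then joins the two back on did, confirming that each lot is stored once per department and that no tuples are lost or invented.

```python
# Hypothetical Worker(ssn, name, lot, since, did) instance in which did -> lot holds.
worker = {
    ("123-22-3666", "Attishoo", 48, "1991-01-01", 51),
    ("231-31-5368", "Smiley", 48, "1993-03-01", 51),
    ("131-24-3650", "Smethurst", 35, "1995-06-01", 56),
}

# Decompose into Workers(ssn, name, since, did) and Dept_Lots(did, lot).
workers = {(ssn, name, since, did) for ssn, name, lot, since, did in worker}
dept_lots = {(did, lot) for ssn, name, lot, since, did in worker}

# did is a key of Dept_Lots, so each lot is now stored once per department.
assert len({did for did, _ in dept_lots}) == len(dept_lots)

# The natural join on did gives back exactly the original Worker instance.
rejoined = {
    (ssn, name, lot, since, did)
    for ssn, name, since, did in workers
    for d, lot in dept_lots
    if d == did
}
print(len(dept_lots), rejoined == worker)  # 2 True
```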
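EMP_PROJ(eno, ename, {pnumber, hours}), where {pnumber, hours} is a multivalued (nested) attribute
eno is the primary key
The above relation is not in 1NF, since 1NF requires every attribute value to be atomic (no multivalued or nested attributes).
pnumber is the partial key of each nested relation:
within each tuple, the nested relation must have unique values of pnumber.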
Break EMP_PROJ as:
EMP_PROJ1(eno, ename)
EMP_PROJ2(eno, pnumber, hours)
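The split can be pictured as flattening a nested record. A minimal Python sketch, assuming a hypothetical EMP_PROJ instance stored as dictionaries with a list of (pnumber, hours) pairs. The owner's eno is copied into every flattened project tuple, so {eno, pnumber} becomes the key of EMP_PROJ2.

```python
# Hypothetical nested EMP_PROJ instance: each employee carries a set of (pnumber, hours).
emp_proj = [
    {"eno": 1, "ename": "Smith", "projects": [(1, 32.5), (2, 7.5)]},
    {"eno": 2, "ename": "Wong", "projects": [(2, 10.0), (3, 10.0)]},
]

# EMP_PROJ1(eno, ename): the atomic, single-valued part.
emp_proj1 = [(row["eno"], row["ename"]) for row in emp_proj]

# EMP_PROJ2(eno, pnumber, hours): one flat tuple per nested (pnumber, hours) pair,
# with the owner's eno copied in; its key is {eno, pnumber}.
emp_proj2 = [
    (row["eno"], pnumber, hours)
    for row in emp_proj
    for pnumber, hours in row["projects"]
]

print(emp_proj1)  # [(1, 'Smith'), (2, 'Wong')]
print(emp_proj2)  # [(1, 1, 32.5), (1, 2, 7.5), (2, 2, 10.0), (2, 3, 10.0)]
```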
Given a relation R, attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A.
We say B functionally determines A and represent it as B → A.
This means that there can be no two tuples which have the same value of attribute B and different values of attribute A.
An FD is a statement about all allowable relations.
– Must be identified based on semantics of application.
– Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R!
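Checking an instance for an FD violation is straightforward, as the Python sketch below shows. The function name `fd_holds` and the sample Worker rows are hypothetical. A True result only means this instance does not violate the FD; it does not prove the FD holds over R.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every attribute in lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical instance of Worker(ssn, did, lot).
rows = [
    {"ssn": "123-22-3666", "did": 51, "lot": 48},
    {"ssn": "231-31-5368", "did": 51, "lot": 48},
    {"ssn": "131-24-3650", "did": 56, "lot": 48},
]

print(fd_holds(rows, ["did"], ["lot"]))  # True: not violated by this instance
print(fd_holds(rows, ["lot"], ["did"]))  # False: lot 48 appears with did 51 and 56
```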
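K is a candidate key for R means that K → R (K functionally determines all attributes of R) and no proper subset of K does.
– However, K → R alone does not require K to be minimal: a K that satisfies only K → R is a superkey.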
Role of FDs in detecting redundancy:
– Consider a relation R with 3 attributes, ABC.
No FDs hold: There is no redundancy here.
Given A → B: Several tuples could have the same A value, and if so, they’ll all have the same B value!
Reasoning About FDs
Given some FDs, we can usually infer additional FDs:
ssn → did and did → lot imply ssn → lot (transitivity).
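A standard way to infer FDs mechanically is the attribute closure: start from a set of attributes and keep adding the right-hand side of any FD whose left-hand side is already covered. A minimal Python sketch, assuming FDs are given as (lhs, rhs) pairs of attribute sets; the name `closure` is our own. If the closure of K contains every attribute of R, then K → R, i.e., K is a superkey.

```python
def closure(attrs, fds):
    """Return attrs+, the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole lhs is already determined, so is the rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"ssn"}, {"did"}), ({"did"}, {"lot"})]
attributes = {"ssn", "did", "lot"}

print(sorted(closure({"ssn"}, fds)))     # ['did', 'lot', 'ssn']: so ssn -> lot
print(closure({"ssn"}, fds) == attributes)  # True: ssn is a superkey of this schema
```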
Full Dependency:
An attribute B of a relation R is fully functionally dependent on a set of attributes A of R if it is functionally dependent on A and not functionally dependent on any proper subset of A.
Full functional dependency:
{Eno, Pnumber} → Hours is full, because neither Eno → Hours nor Pnumber → Hours holds.
Partial dependency:
{Eno, Pnumber} → Ename is partial, because Eno → Ename holds.
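Whether a dependency is full or partial can be decided with attribute closures: test every proper subset of the left-hand side. The Python sketch assumes the FDs {Eno, Pnumber} → Hours and Eno → Ename for the example above. The helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds [(lhs, rhs), ...]."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_dependency(lhs, attr, fds):
    """lhs -> attr is full if no proper subset of lhs also determines attr."""
    if attr not in closure(lhs, fds):
        raise ValueError(f"{sorted(lhs)} does not determine {attr}")
    return not any(
        attr in closure(set(subset), fds)
        for size in range(len(lhs))  # proper subsets only
        for subset in combinations(sorted(lhs), size)
    )

# FDs assumed for EMP_PROJ(Eno, Pnumber, Hours, Ename).
fds = [({"Eno", "Pnumber"}, {"Hours"}), ({"Eno"}, {"Ename"})]
key = {"Eno", "Pnumber"}
print(is_full_dependency(key, "Hours", fds))  # True: full dependency
print(is_full_dependency(key, "Ename", fds))  # False: partial, since Eno -> Ename
```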
Consider the relation obtained from Hourly_Emps:
– Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
Notation : We will denote this relation schema by listing the attributes: SNLRWH
– This is really the set of attributes {S, N, L, R, W, H}.
– Sometimes, we will refer to all attributes of a relation by using the relation name (e.g., Hourly_Emps for SNLRWH).
Some FDs on Hourly_ Emps:
– ssn is the key: S → SNLRWH
– rating determines hrly_wages: R → W
Problems due to R → W:
– Update anomaly : Can we change W in just the 1st tuple of SNLRWH?
– Insertion anomaly : What if we want to insert an employee and don’t know the hourly wage for his rating?
– Deletion anomaly : If we delete all employees with rating 5, we lose the information about the wage for rating 5!
General Definition of 2NF:
A table is said to be in 2NF when it is in 1NF and every non-prime attribute is fully functionally dependent on the whole of every candidate key, not just on part of a key.
The steps for converting a database to 2NF are:
Find and remove attributes that depend on only part of the key
Group the removed items in another table
Assign the new table a key that consists of that part of the old composite key
If a relation is not in 2NF, it can be further normalized into a number of 2NF relations.
EP1{Eno, Pnumber, Hours}
EP2{Eno, Ename}
EP3{Pnumber, Pname, Plocation}
EP1, EP2 and EP3 satisfy 2NF.
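The three conversion steps listed above can be sketched in Python and applied to this example. The sketch assumes the composite key {Eno, Pnumber} is the only candidate key (so every other attribute is non-prime) and assumes the FD Pnumber → {Pname, Plocation}, which EP3 suggests but the text does not state. `decompose_2nf` is a hypothetical name.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(schema: Set[str], key: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Split off attributes that depend on only part of a composite key.

    Assumes `key` is the only candidate key, so non-key means non-prime.
    """
    remaining = schema - key
    split_off: List[Set[str]] = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            # Step 1: find attributes determined by only this part of the key
            dependent = closure(part, fds) & remaining
            if dependent:
                # Steps 2 and 3: group them in a new table keyed by that part
                split_off.append(set(part) | dependent)
                remaining -= dependent
    # The original table keeps the full key and the fully dependent attributes.
    return [key | remaining] + split_off


emp_proj = {"Eno", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Eno", "Pnumber"}, {"Hours"}),
    ({"Eno"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),  # assumed FD
]
tables = decompose_2nf(emp_proj, {"Eno", "Pnumber"}, fds)
for table in tables:
    print(sorted(table))
# ['Eno', 'Hours', 'Pnumber']
# ['Ename', 'Eno']
# ['Plocation', 'Pname', 'Pnumber']
```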
The data stored in the table
Emp{Eno, Dept, ProjCode, Hours}
is in 1NF. The primary key here is composite: {Eno, ProjCode}.
Not every attribute of this table depends on the whole primary key:
{Eno, ProjCode} functionally determines Hours.
Eno alone functionally determines Dept. Attribute Dept has no dependency on ProjCode.
The situation could lead to the following problems:
Insertion: An employee’s record cannot be entered until the employee is assigned a project, because ProjCode is part of the primary key.
Update: For a given employee, the employee code and department are repeated several times. Hence, if an employee is transferred to another department, the change will have to be recorded in every record of that employee. Any omission will lead to inconsistencies.
Deletion: If an employee completes work on a project, that record will be deleted. If it was the employee’s only project, the information about the department the employee belongs to will also be lost.
This table should therefore be decomposed, without any loss of information, into:
Emp {Eno, Dept}
Proj {Eno, ProjCode, Hours}
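Whether a split into two relations loses information can be checked from the FDs alone: a decomposition of R into R1 and R2 is lossless when the shared attributes R1 ∩ R2 functionally determine all of R1 or all of R2. A Python sketch of that test (helper names are illustrative), applied to the Emp/Proj split and to a deliberately bad, hypothetical split that shares only Dept:

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    """Lossless-join test for a decomposition into exactly two relations:
    the shared attributes must determine all of R1 or all of R2."""
    shared = closure(r1 & r2, fds)
    return r1 <= shared or r2 <= shared


fds = [({"Eno", "ProjCode"}, {"Hours"}), ({"Eno"}, {"Dept"})]
emp = {"Eno", "Dept"}
proj = {"Eno", "ProjCode", "Hours"}
print(is_lossless(emp, proj, fds))  # True: Eno -> Dept, so Eno determines Emp

# Hypothetical bad split: the two parts share only Dept
bad_left = {"Eno", "Dept"}
bad_right = {"Dept", "ProjCode", "Hours"}
print(is_lossless(bad_left, bad_right, fds))  # False
```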
EMP_DEPT{Ename, Eno, Bdate, Addr, Dnumber, Dname, DMgrNo}
Eno → DMgrNo is a transitive dependency.
The dependency of DMgrNo on the key attribute Eno is transitive via Dnumber, because Eno → Dnumber and Dnumber → DMgrNo both hold.
Dnumber is neither a key itself nor a subset of the key of EMP_DEPT.
General Definition of 3NF:
A relation schema R is in 3NF if, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R or (b) A is a prime attribute of R.
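The 3NF definition translates directly into a check over a set of FDs. The Python sketch below is applied to EMP_DEPT; it assumes Eno is the only candidate key (so Eno determines every other attribute) and assumes Dnumber → Dname in addition to the FDs stated above. It examines only the FDs it is given, and the candidate keys must be supplied by the caller. The function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(
    schema: Set[str], keys: List[Set[str]], fds: List[FD]
) -> List[Tuple[List[str], str]]:
    """Return (X, A) pairs for nontrivial FDs X -> A that break 3NF."""
    prime = set().union(*keys)  # attributes that belong to some candidate key
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # condition (a): X is a superkey
        for a in sorted(rhs - lhs):  # nontrivial part of X -> A only
            if a not in prime:  # condition (b) fails as well
                violations.append((sorted(lhs), a))
    return violations


emp_dept = {"Ename", "Eno", "Bdate", "Addr", "Dnumber", "Dname", "DMgrNo"}
fds = [
    ({"Eno"}, {"Ename", "Bdate", "Addr", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "DMgrNo"}),  # Dnumber -> Dname is assumed
]
print(third_nf_violations(emp_dept, [{"Eno"}], fds))
# [(['Dnumber'], 'DMgrNo'), (['Dnumber'], 'Dname')]
```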
Equivalently, R is in 3NF if every nonprime attribute of R is
(a) fully functionally dependent on every key of R, and
(b) nontransitively dependent on every key of R.
If 3NF is violated by X → A, one of the following holds:
X is a subset of some key K:
– We store (X, A) pairs redundantly.
X is not a proper subset of any key:
– There is a chain of FDs K → X → A, which means that we cannot associate an X value with a K value unless we also associate an A value with an X value.
Consider the table Emp:
Emp{Eno, Dept, Dept_Head}
The primary key here is Eno. The attribute Dept depends on Eno, and the attribute Dept_Head depends on Dept.
Dept_Head therefore depends on the primary key only indirectly, through Dept.
Emp is in 2NF but not in 3NF because of the transitive dependency of Dept_Head on Eno via Dept.
The problems with dependencies of this kind are:
Insertion: The head of a new department that does not have any employees yet cannot be entered.
Update: For a given department, the head’s code is repeated in every employee record of that department. Hence, if a department head moves to another department, the change will have to be made consistently across all of those records.
Deletion: If the last remaining employee of a department is deleted, the information about that department’s head is lost as well.
The relation is therefore decomposed into the following two relations:
Emp{Eno, Dept}
Dept{Dept, Dept_Head}
Emp and Dept are in 3NF. The natural join of Emp and Dept recovers the original Emp table.
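The lossless-join claim can be demonstrated on a small instance: project the original Emp onto the two 3NF schemas, join the projections back together, and compare. The Python sketch below uses an invented three-row instance; `project` and `natural_join` are simple illustrative helpers, not a database API.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen: Set[Tuple[Hashable, ...]] = set()
    out: List[Row] = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Nested-loop natural join on all attribute names the two sides share."""
    out: List[Row] = []
    for lrow in left:
        for rrow in right:
            common = lrow.keys() & rrow.keys()
            if all(lrow[a] == rrow[a] for a in common):
                out.append({**lrow, **rrow})
    return out


def as_set(rows: List[Row], attrs: Sequence[str]) -> Set[Tuple[Hashable, ...]]:
    return {tuple(row[a] for a in attrs) for row in rows}


# Hypothetical instance of the original Emp{Eno, Dept, Dept_Head}
emp_original = [
    {"Eno": 1, "Dept": "Sales", "Dept_Head": 10},
    {"Eno": 2, "Dept": "Sales", "Dept_Head": 10},
    {"Eno": 3, "Dept": "HR", "Dept_Head": 20},
]
emp = project(emp_original, ["Eno", "Dept"])
dept = project(emp_original, ["Dept", "Dept_Head"])  # Sales's head stored once
joined = natural_join(emp, dept)
attrs = ["Eno", "Dept", "Dept_Head"]
print(as_set(joined, attrs) == as_set(emp_original, attrs))  # True
```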