3. DATA MODELS
Data models in DBMS help define how the logical structure of a database is
modelled.
Data models are the fundamental entities that introduce abstraction in a DBMS.
They also define how data items are connected to each other and
how they are processed and stored inside the system.
Types of Data model
Conceptual
Physical
Logical
4. WHY DO WE NEED DATA MODELS?
A data model ensures that all data objects required by the database are accurately represented.
The omission of data can lead to faulty reports and incorrect results.
A data model helps in designing the Database at the conceptual, physical and logical levels.
The structure helps to define the relational tables, primary and foreign keys, and stored
procedures.
It is also helpful to identify missing and redundant data.
5. CONCEPTUAL DATA MODELS
This type of Data Model defines what the system contains. The Conceptual model is created by Data
Architects in general. The purpose is to organize, scope and define business concepts and rules.
There are 3 basic elements of a Conceptual Data model:
o Entity
o Attribute
o Relationship
This can be referred to as the Entity-Relationship Model.
Entity-Relationship (ER) Model is based on the idea of real-world entities and relationships among
them. This ER Model is best used for the Conceptual Design of a Database.
6. CONTINUE…
Entity: An Entity in an ER Model is a real-world entity having properties called Attributes.
Every attribute is defined by its set of allowed values, called its Domain.
For example, consider the details of a Student. Details like the name, age, class and section are
attributes of the Student entity.
Relationship: The logical association among entities is called a Relationship. Relationships
are mapped with entities in different ways. The Mapping (one-to-one, one-to-many, many-to-many)
defines the number of associations between two entities.
[Figure: ER diagram showing two entities connected by a relationship, with attributes attached to the entities]
7. PHYSICAL DATA MODEL
A Physical Data Model helps in describing the database-specific implementation of the Data
model. The Physical Data model offers an abstraction of the Database and helps to
generate the Schema.
This Physical Data model also helps to visualize the Database structure. It also helps to
model Database columns, keys, constraints, indexes, triggers, and other RDBMS features.
[Figure: physical data model of two tables linked by a customer-product relationship]
Customer: name (varchar), Customer number (int); primary key: Customer number
Product: name (varchar), Product price (int); unique key: Product name
8. LOGICAL DATA MODEL
Logical data models help to add further information to the Conceptual model elements.
This model defines the structure of the data elements and also sets the corresponding
relationships between them.
In this model, no primary or secondary key is defined, and you need to verify and adjust the
connector details that were set earlier for relationships.
The main advantage of the Logical data model is that it provides a foundation
for the Physical model.
[Figure: logical data model of two entities linked by a product-customer relationship, with no keys defined]
Customer: name (varchar), Customer number (int)
Product: name (varchar), Product price (int)
9. DBMS KEYS
A DBMS Key is an attribute or a set of attributes which helps you uniquely identify a record or
a row of data in a relation (table).
Keys are among the most important concepts of databases and play a vital role in the Relational
Database. They are used for identifying unique rows in a table. They also establish the
relationships among tables.
A DBMS has different Keys with different functionalities:
super key
candidate key
primary key
foreign key
composite and compound key
alternate key
surrogate key
10. EXAMPLE
STUDENT TABLE
SID  SNAME  REGID        BRANCHCODE  SEMAIL
1    AKASH  VIT-2018-04  VIT         akash@gmail.com
2    SUMIT  VIT-2018-24  VIT         sumit1@gmail.com
3    SUMIT  AT-2018-75   AT          sumit2@gmail.com
4    SAHIL  ME-2018-02   ME          gaurav@gmail.com
5    ROHIT  EL-2018-09   EL          rohit@gmail.com
SID and SEMAIL are keys for the Student table.
We can say for sure that each row will be identifiable using SID, because it has a unique value in every row.
SID is mandatory.
11. WHY DO WE NEED DBMS KEYS?
To identify any row of data in a table uniquely.
To enforce the identity of data and ensure that the integrity of data is maintained.
To establish and identify relationships between tables.
12. SUPER KEY
A Super Key is defined as a set of attributes within a table that can uniquely identify each record
within the table. A Super Key is a superset of a Candidate key.
In the table defined above, the super keys include SID, REGID, SEMAIL, SID+SNAME, SID+REGID,
REGID+SEMAIL, SEMAIL+SID and SID+REGID+SEMAIL.
Confused? The first one is pretty simple: SID is unique for every row of data, hence it can be
used to identify each row uniquely.
Next comes (SID, SNAME): the names of two students can be the same, but their SIDs can't be the same,
hence this combination can also be a key.
Similarly, SEMAIL for every student will be unique, hence SEMAIL can also be a key.
So they are all super keys.
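As a rough illustration of the idea above, the following Python sketch tests whether a set of columns has distinct values in every row of the example STUDENT table. The function name `is_super_key` is an assumption made for this example, and a check on sample rows can only show that the data does not contradict a key; it cannot prove that the key is intended by the design.

```python
# Rows of the STUDENT table from the example slide.
STUDENT = [
    {"SID": 1, "SNAME": "AKASH", "REGID": "VIT-2018-04", "BRANCHCODE": "VIT", "SEMAIL": "akash@gmail.com"},
    {"SID": 2, "SNAME": "SUMIT", "REGID": "VIT-2018-24", "BRANCHCODE": "VIT", "SEMAIL": "sumit1@gmail.com"},
    {"SID": 3, "SNAME": "SUMIT", "REGID": "AT-2018-75", "BRANCHCODE": "AT", "SEMAIL": "sumit2@gmail.com"},
    {"SID": 4, "SNAME": "SAHIL", "REGID": "ME-2018-02", "BRANCHCODE": "ME", "SEMAIL": "gaurav@gmail.com"},
    {"SID": 5, "SNAME": "ROHIT", "REGID": "EL-2018-09", "BRANCHCODE": "EL", "SEMAIL": "rohit@gmail.com"},
]


def is_super_key(rows, columns):
    """Return True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        value = tuple(row[column] for column in columns)
        if value in seen:
            return False  # two rows collide, so these columns cannot identify a row
        seen.add(value)
    return True


print(is_super_key(STUDENT, ["SID"]))           # SID alone
print(is_super_key(STUDENT, ["SID", "SNAME"]))  # SID plus a non-unique column
print(is_super_key(STUDENT, ["SNAME"]))         # SNAME alone (SUMIT appears twice)
```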
13. CANDIDATE KEY
Candidate keys are defined as the minimal sets of fields which can uniquely identify each
record in a table. In other words, a candidate key is a minimal super key.
If any proper subset of a super key is itself a super key, then that super key cannot be a candidate key.
A candidate key is an attribute or a set of attributes that can act as a Primary Key for a table to uniquely
identify each record in that table.
In our example, SID, REGID and SEMAIL are all candidate keys for the Student table.
A candidate key can never be NULL or empty, and its value should be unique.
There can be more than one candidate key for a table.
A candidate key can be a combination of more than one column (attribute).
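The "minimal super key" rule can be sketched in a few lines of Python. The list `SUPER_KEYS` below simply restates the super keys built from SID, REGID and SEMAIL in the Student example, and `candidate_keys` is a hypothetical helper name, not part of any library.

```python
# Super keys of the Student table built from SID, REGID and SEMAIL.
SUPER_KEYS = [
    {"SID"}, {"REGID"}, {"SEMAIL"},
    {"SID", "REGID"}, {"REGID", "SEMAIL"}, {"SEMAIL", "SID"},
    {"SID", "REGID", "SEMAIL"},
]


def candidate_keys(super_keys):
    """Keep only the super keys that have no proper subset which is also a super key."""
    return [
        key for key in super_keys
        if not any(other < key for other in super_keys)  # `<` is the proper-subset test
    ]


print(candidate_keys(SUPER_KEYS))
```

Only the three single-attribute keys survive the filter, because every larger super key in the list contains one of them.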
14. PRIMARY KEY & ALTERNATE KEY
Primary Key is the candidate key chosen to uniquely identify each row of data in a table.
or
Primary key is the candidate key that is most appropriate to become the main key for a table.
It is a key that can uniquely identify each record in a table.
No two rows can have the same primary key value, a primary key value cannot be NULL, and
every row must have a primary key.
For the Student table we can make any one of the columns SID, REGID or SEMAIL the primary
key.
If we choose REGID (more appropriate) as the Primary Key, then SID and SEMAIL become
Alternate Keys.
The candidate keys which are not selected as the primary key are known as secondary keys or
alternate keys.
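One possible way to express this choice in SQL is sketched below with Python's built-in `sqlite3` module. The table layout follows the Student example; the lowercase column names, the `try_insert` helper and the use of an in-memory SQLite database are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        regid      TEXT NOT NULL PRIMARY KEY,  -- candidate key chosen as primary key
        sid        INTEGER NOT NULL UNIQUE,    -- alternate key
        sname      TEXT NOT NULL,
        branchcode TEXT,
        semail     TEXT NOT NULL UNIQUE        -- alternate key
    )
""")
conn.execute(
    "INSERT INTO student VALUES ('VIT-2018-04', 1, 'AKASH', 'VIT', 'akash@gmail.com')"
)


def try_insert(row):
    """Insert one row and report whether a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Same REGID as an existing row: violates the primary key.
duplicate_pk = try_insert(("VIT-2018-04", 2, "SUMIT", "VIT", "sumit1@gmail.com"))
# New REGID but an e-mail that already exists: violates an alternate key.
duplicate_alt = try_insert(("VIT-2018-24", 2, "SUMIT", "VIT", "akash@gmail.com"))
# All key values are new: accepted.
valid = try_insert(("VIT-2018-24", 2, "SUMIT", "VIT", "sumit1@gmail.com"))

print(duplicate_pk, duplicate_alt, valid, sep="\n")
```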
15. FOREIGN KEY
A foreign key is an attribute in a table which is used to define its relationship with another table.
Using a foreign key helps in maintaining data integrity for tables in a relationship.
Let's consider the BRANCH TABLE:
BRANCHCODE  BRANCHNAME                         HOD
CS          Computer Science                   Mr. CS
VIT         Vocational Information Technology  Mr. VIT
ME          Mechanical Engineering             Mr. ME
EL          Electronics Engineering            Mr. EL
16. FOREIGN KEY
In the STUDENT table, the BRANCHCODE column is the foreign key that defines the Student-Branch table relationship.
17. CONTINUE…
If we update or delete an entry in the BRANCH table which is referred to in the STUDENT table,
the database reports an error.
Why? Because data integrity is maintained in the relationship.
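The behaviour described here can be tried out with a small sketch using Python's built-in `sqlite3` module. The reduced column set, the `attempt` helper and the made-up branch code 'XX' are assumptions for this example. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE branch (branchcode TEXT PRIMARY KEY, branchname TEXT, hod TEXT)"
)
conn.execute("""
    CREATE TABLE student (
        sid        INTEGER PRIMARY KEY,
        sname      TEXT,
        branchcode TEXT REFERENCES branch(branchcode)  -- foreign key
    )
""")
conn.execute("INSERT INTO branch VALUES ('VIT', 'Vocational Information Technology', 'Mr. VIT')")
conn.execute("INSERT INTO branch VALUES ('CS', 'Computer Science', 'Mr. CS')")
conn.execute("INSERT INTO student VALUES (1, 'AKASH', 'VIT')")


def attempt(sql):
    """Run one statement and report whether the database rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"DB error: {exc}"


# 'VIT' is referred to by a student row, so deleting it is refused.
delete_referenced = attempt("DELETE FROM branch WHERE branchcode = 'VIT'")
# 'CS' is not referred to by any student row, so deleting it succeeds.
delete_unreferenced = attempt("DELETE FROM branch WHERE branchcode = 'CS'")
# A student cannot point at a branch that does not exist.
insert_unknown_branch = attempt("INSERT INTO student VALUES (2, 'SUMIT', 'XX')")

print(delete_referenced, delete_unreferenced, insert_unknown_branch, sep="\n")
```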
18. COMPOSITE KEY
Any key with more than one attribute is called a Composite Key.
OR
A key that consists of two or more attributes that together uniquely identify any record in a table is
called a Composite Key.
Under the stricter definition, the attributes which together form the Composite key are not keys
independently or individually.
Under the first, looser definition, (SID, REGID), (REGID, SEMAIL), (SEMAIL, SID), (SID, REGID, SEMAIL) etc.
in the Student table are all composite keys.
19. COMPOUND KEY AND SURROGATE KEY
COMPOUND KEY
If a composite key has at least one attribute which is a foreign key, then it is called a
Compound Key.
In the given tables, if we have a composite key (REGID, BRANCHCODE), then it is known
as a Compound Key because the BRANCHCODE attribute is a Foreign Key.
SURROGATE KEY
If a relation has no attribute which can be used to identify the data stored in it, then we
create an attribute for this purpose.
It adds no meaning to the data but serves the sole purpose of identifying rows uniquely in a
table.
20. NORMALIZATION
Normalization is the process of reducing the redundancy of data in the table and also
improving data integrity.
or
Normalization is the process of organizing data to avoid data duplication and redundancy.
The goals of Normalization are:
To minimize or eliminate duplicate data
To minimize or avoid data modification issues
To ensure that data dependencies make logical sense
To simplify queries
21. CONTINUE…
So why is this required? Without Normalization in SQL, we may face many issues due to
redundancy:
Insertion anomaly:
It occurs when we cannot insert data to the table without the presence of another attribute
Example: Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot
be inserted, or else we will have to set the branch information as NULL. Also, if we have to insert data of 100
students of same branch, then the branch information will be repeated for all those 100 students.
Update anomaly:
It is a data inconsistency that results from data redundancy and a partial update of data.
Example: What if Mr. CS leaves the college? or is no longer the HOD of computer science department? In that
case all the student records will have to be updated, and if by mistake we miss any record, it will lead to data
inconsistency.
Deletion Anomaly:
It occurs when certain attributes are lost because of the deletion of other attributes.
Example: Examine our Student table, where two different kinds of information are kept together: Student
information and Branch information. Hence, at the end of the academic year, if student records are deleted,
we will also lose the branch information.
22. ADVANTAGES AND DISADVANTAGES
ADVANTAGES
i. Better database organization
ii. More tables with smaller rows
iii. Efficient data access
iv. Greater flexibility for queries
v. Quickly find the information
vi. Easier to implement security
vii. Allows easy modification
viii. Reduction of redundant and duplicate data
ix. More compact database
x. Ensures consistent data after modification
DISADVANTAGES
i. Requires an experienced database designer
ii. Difficult and expensive
iii. Requires detailed database design
24. FIRST NORMAL FORM (1NF)
In this Normal Form, we tackle the problem of atomicity. Here atomicity means values in the table
should not be further divided. In simple terms, a single cell cannot hold multiple values. If a table
contains a composite or multi-valued attribute, it violates the First Normal Form.
For a table to be in the First Normal Form, it should follow the following 4 rules:
1. It should only have single(atomic) valued attributes/columns.
2. A column should contain values that are of the same type. Do not inter-mix different types
of values in any column.
3. All the columns in a table should have unique names. Duplicate names lead to confusion at
the time of data retrieval.
4. And the order in which data is stored does not matter. Using a SQL query, you can easily
fetch data in any order from a table.
25. FIRST NORMAL FORM (1NF)
If your table is not even in 1st Normal Form, it is considered poor DB design.
Violation of 1NF:
Sid  Sname   Subject
101  Akash   PHP, Python
103  Amit    Java
102  Bhavya  C, C++
In 1NF:
Sid  Sname   Subject
101  Akash   PHP
101  Akash   Python
103  Amit    Java
102  Bhavya  C
102  Bhavya  C++
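The conversion shown in the two tables can be sketched in Python as follows. The function name `to_first_normal_form` and the assumption that subjects are separated by commas are illustrative choices for this example only.

```python
# The table that violates 1NF: the Subject cell may hold several values.
unnormalized = [
    (101, "Akash", "PHP, Python"),
    (103, "Amit", "Java"),
    (102, "Bhavya", "C, C++"),
]


def to_first_normal_form(rows):
    """Emit one row per (Sid, Sname, single subject) so that every cell is atomic."""
    result = []
    for sid, sname, subjects in rows:
        for subject in subjects.split(","):
            result.append((sid, sname, subject.strip()))
    return result


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```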
26. SECOND NORMAL FORM (2NF)
For a table to be in the Second Normal Form, it must satisfy two conditions:
1. The table should be in the First Normal Form.
2. There should be no Partial Dependency.
Here partial dependency means that a proper subset of a candidate key determines a non-prime
attribute.
For example, consider a table with a composite primary key (TeacherID, DepartmentID).
The non-key attribute is Location. In this case, Location only depends on DepartmentID, which
is only part of the primary key. Therefore, this table does not satisfy the Second Normal Form.
To bring this table to Second Normal Form, we need to break it into two tables:
one relating TeacherID to DepartmentID, and one holding (DepartmentID, Location).
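The split into two tables can be sketched in Python. The sample rows below are hypothetical (the slide's own table is not reproduced in the text), and `project` is a helper written for this example.

```python
# Hypothetical sample rows: (TeacherID, DepartmentID, Location).
teacher_department = [
    ("T1", "D1", "Block A"),
    ("T1", "D2", "Block B"),
    ("T2", "D1", "Block A"),
    ("T3", "D2", "Block B"),
]


def project(rows, indexes):
    """Project rows onto the given column positions, dropping duplicates but keeping order."""
    seen, result = set(), []
    for row in rows:
        value = tuple(row[i] for i in indexes)
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Table 1: which teacher belongs to which department.
teacher_dept = project(teacher_department, (0, 1))
# Table 2: Location now depends on the whole key (DepartmentID) of its own table.
dept_location = project(teacher_department, (1, 2))

# Joining the two tables on DepartmentID gives back the original rows.
location_of = dict(dept_location)
rejoined = [(teacher, dept, location_of[dept]) for teacher, dept in teacher_dept]

print(dept_location)
print(rejoined == teacher_department)
```

Each Location is stored once per department after the split, instead of once per teacher-department pair.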
28. THIRD NORMAL FORM (3NF)
A table is said to be in the Third Normal Form when,
1. It is in the Second Normal form.
2. And, it doesn't have Transitive Dependency.
3NF is used to reduce data duplication. It is also used to achieve data integrity.
If there is no transitive dependency for non-prime attributes, then the relation is in
third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
29. TRANSITIVE DEPENDENCY
StudentID  StudentName  SubjectID  Subject  Address
1805361    A            DEE 431    SQL      Agra
1805362    B            DEE 330    JAVA     Delhi
1805363    C            VIT 451    CPP      Punjab
1805364    D            DEE 330    JAVA     Pune
In the above table,
StudentID → SubjectID, and
SubjectID → Subject.
Therefore, StudentID → Subject via SubjectID.
This implies that we have a transitive functional dependency,
and this structure does not satisfy the third normal form.
Advantage of removing Transitive Dependency
The advantages of removing transitive dependency are:
Amount of data duplication is reduced.
Data integrity is achieved.
30. CONTINUE…
Solution: in order to achieve third normal form, we need to divide the table as shown below:
StudentID  StudentName  SubjectID  Address
1805361    A            DEE 431    Agra
1805362    B            DEE 330    Delhi
1805363    C            VIT 451    Punjab
1805364    D            DEE 330    Pune
SubjectID  Subject
DEE 431    SQL
DEE 330    JAVA
VIT 451    CPP
As you can see from the above tables, all the non-key attributes are now fully functionally
dependent only on the primary key. In the first table, StudentID → {StudentName, SubjectID,
Address}. In the second table, SubjectID → Subject.
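The transitive dependency and its removal can be checked on the sample rows with a short Python sketch. `fd_holds` is a hypothetical helper that tests a functional dependency against the data at hand only; it does not prove that the dependency is part of the design.

```python
rows = [
    # (StudentID, StudentName, SubjectID, Subject, Address)
    (1805361, "A", "DEE 431", "SQL", "Agra"),
    (1805362, "B", "DEE 330", "JAVA", "Delhi"),
    (1805363, "C", "VIT 451", "CPP", "Punjab"),
    (1805364, "D", "DEE 330", "JAVA", "Pune"),
]
STUDENT_ID, SUBJECT_ID, SUBJECT = 0, 2, 3


def fd_holds(rows, lhs, rhs):
    """True if equal values in column `lhs` always come with equal values in column `rhs`."""
    mapping = {}
    for row in rows:
        if mapping.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# StudentID -> SubjectID and SubjectID -> Subject together give StudentID -> Subject.
transitive = fd_holds(rows, STUDENT_ID, SUBJECT_ID) and fd_holds(rows, SUBJECT_ID, SUBJECT)

# Decomposition: Subject moves to its own table keyed by SubjectID.
student = [(r[0], r[1], r[2], r[4]) for r in rows]
subject = sorted({(r[2], r[3]) for r in rows})

print(transitive)
print(subject)
```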
31. BOYCE AND CODD NORMAL FORM (BCNF) : 3.5 NF
It is a stricter version of 3NF and was developed by Raymond F. Boyce and
Edgar F. Codd to address certain types of anomalies which were not dealt
with by 3NF.
BCNF does not allow dependencies between attributes that belong to candidate keys.
BCNF is a refinement of the third normal form that drops the exception for prime
attributes: for every non-trivial functional dependency X → Y, X must be a super key.
3NF and BCNF can differ only when the following conditions are true:
o The table has two or more candidate keys
o At least two of the candidate keys are composed of more than one attribute
o The keys are not disjoint, i.e. the composite candidate keys share some attributes
32. EXAMPLE OF TABLE NOT IN BCNF
Student  Course    Teacher
Shivam   Database  Mr. DB
Shristi  Database  Mr. SQL
Chaya    JAVA      Mr. JV
Shivam   JAVA      Mr. JV
Chaya    Database  Mr. SQL
Key: {Student, Course}
Functional Dependencies:
{Student, Course} → Teacher
Teacher → Course
Problem: Teacher is not a super key but determines Course.
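33. CONTINUE…
Solution: Separate a table that contains Teacher and Course from the original table,
leaving (Student, Course). Finally, connect the new and old tables to a third table
that contains Course.
Student  Course
Shivam   Database
Shristi  Database
Chaya    JAVA
Shivam   JAVA
Chaya    Database
Course    Teacher
Database  Mr. DB
Database  Mr. SQL
JAVA      Mr. JV
Course
Database
JAVA
Note that joining (Student, Course) with (Course, Teacher) on Course alone cannot tell which
teacher teaches which student; a decomposition into (Student, Teacher) and (Teacher, Course)
keeps that information.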
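The BCNF problem in the example (a determinant that is not a super key) can be detected on the sample rows with the following Python sketch. The helpers `determines` and `is_super_key` are assumptions made for this example and test the given data only.

```python
rows = [
    # (Student, Course, Teacher)
    ("Shivam", "Database", "Mr. DB"),
    ("Shristi", "Database", "Mr. SQL"),
    ("Chaya", "JAVA", "Mr. JV"),
    ("Shivam", "JAVA", "Mr. JV"),
    ("Chaya", "Database", "Mr. SQL"),
]
STUDENT, COURSE, TEACHER = 0, 1, 2


def determines(rows, lhs, rhs):
    """True if the columns in `lhs` functionally determine the columns in `rhs` in this data."""
    mapping = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if mapping.setdefault(key, value) != value:
            return False
    return True


def is_super_key(rows, lhs):
    """A super key determines every column of the table."""
    return determines(rows, lhs, (STUDENT, COURSE, TEACHER))


teacher_determines_course = determines(rows, (TEACHER,), (COURSE,))
teacher_is_super_key = is_super_key(rows, (TEACHER,))

# BCNF is violated when a determinant (here Teacher) is not a super key.
violates_bcnf = teacher_determines_course and not teacher_is_super_key
print(violates_bcnf)
```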
34. FOURTH NORMAL FORM (4NF)
Fourth Normal Form comes into the picture when a Multi-valued Dependency occurs in a
relation.
A table is said to be in the Fourth Normal Form when,
1. It is in the Boyce-Codd Normal Form.
2. And, it doesn't have a Multi-Valued Dependency.
A table is said to have a multi-valued dependency if the following conditions are true:
For a dependency A → B, if for a single value of A multiple values of B exist, then the
relation has a multi-valued dependency.
Also, a table should have at least 3 columns for it to have a multi-valued dependency.
And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B
and C should be independent of each other.
35. EXAMPLE OF TABLE NOT IN 4 NF
SID  COURSE  HOBBY
21   CN      Chess
21   DBMS    Singing
34   WD      Chess
74   CPP     Cricket
59   OOP     Football
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and HOBBY.
Key: {SID, COURSE, HOBBY}
Multi-valued Dependencies:
SID →→ COURSE
SID →→ HOBBY
36. CONTINUE…
In the STUDENT relation, the student with SID 21 has two courses, CN and DBMS,
and two hobbies, Chess and Singing.
So there is a multi-valued dependency on SID,
which leads to unnecessary repetition of data.
Solution: To reach 4NF, decompose the table so that each new table contains one
multi-valued dependency. Finally, connect each of them to a third table that contains SID.
SID  COURSE
21   CN
21   DBMS
34   WD
74   CPP
59   OOP
SID  HOBBY
21   Chess
21   Singing
34   Chess
74   Cricket
59   Football
SID
21
34
74
59
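To see where the repetition comes from, the Python sketch below starts from the two independent facts (courses per student and hobbies per student) and builds the single combined table by joining on SID. `join_on_sid` is a helper name assumed for this example.

```python
sid_course = [(21, "CN"), (21, "DBMS"), (34, "WD"), (74, "CPP"), (59, "OOP")]
sid_hobby = [(21, "Chess"), (21, "Singing"), (34, "Chess"), (74, "Cricket"), (59, "Football")]


def join_on_sid(courses, hobbies):
    """Natural join of the two tables on SID."""
    return [
        (sid, course, hobby)
        for sid, course in courses
        for hobby_sid, hobby in hobbies
        if sid == hobby_sid
    ]


combined = join_on_sid(sid_course, sid_hobby)
rows_for_21 = [row for row in combined if row[0] == 21]

# SID 21 has 2 courses and 2 hobbies, so the combined table needs 2 x 2 rows for it,
# while the two separate tables need only 2 + 2 rows.
print(rows_for_21)
print(len(combined))
```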
37. FIFTH NORMAL FORM (5NF)
5NF is also known as Project-join normal form (PJ/NF).
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
5NF is satisfied when all the tables are broken into as many tables as possible in order to
avoid redundancy.
If we decompose a table further to eliminate redundancy and anomalies, then when we re-join
the decomposed tables by means of candidate keys, we should not lose the
original data, and no new records should arise.
In simple words, joining two or more decomposed tables should neither lose records nor create
new records.
38. EXAMPLE OF TABLE NOT IN 5 NF
Theatre  Company    Movie
T1       Paramount  A Walk to remember
T2       Marvel     The Avengers
T2       Marvel     Age of Ultron
T2       Marvel     Dr. Strange
T3       DCEU       Batman Vs Superman
T4       Sony       Spiderman Homecoming
Key: {Theatre, Company, Movie}
Multi-valued Dependency:
Theatre →→ Company, Movie
Movie is related to Company.
39. CONTINUE…
Solution: After decomposition into Fifth Normal Form it looks like:
Theatre  Movie
T1       A Walk to remember
T2       The Avengers
T2       Age of Ultron
T2       Dr. Strange
T3       Batman Vs Superman
T4       Spiderman Homecoming
Company    Movie
Paramount  A Walk to remember
Marvel     The Avengers
Marvel     Age of Ultron
Marvel     Dr. Strange
DCEU       Batman Vs Superman
Sony       Spiderman Homecoming
Theatre  Company
T1       Paramount
T2       Marvel
T3       DCEU
T4       Sony
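Whether such a three-way decomposition is lossless can be checked by re-joining the projections and comparing the result with the original rows. The Python sketch below does this for the example data; the variable names are chosen for illustration.

```python
original = {
    ("T1", "Paramount", "A Walk to remember"),
    ("T2", "Marvel", "The Avengers"),
    ("T2", "Marvel", "Age of Ultron"),
    ("T2", "Marvel", "Dr. Strange"),
    ("T3", "DCEU", "Batman Vs Superman"),
    ("T4", "Sony", "Spiderman Homecoming"),
}

# The three projections of the decomposition.
theatre_movie = {(t, m) for t, c, m in original}
company_movie = {(c, m) for t, c, m in original}
theatre_company = {(t, c) for t, c, m in original}

# Re-join: match on Movie, then keep only (Theatre, Company) pairs that exist.
rejoined = {
    (t, c, m)
    for t, m in theatre_movie
    for c, m2 in company_movie
    if m == m2 and (t, c) in theatre_company
}

# Lossless means no record is lost and no new record appears.
lossless = rejoined == original
print(lossless)
```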
40. DENORMALISATION
Denormalization is a technique which moves a database
from a higher to a lower normal form.
It can increase read performance because it
introduces redundancy into a table.
It adds the redundant data by incorporating database
queries that combine data from various tables into a single table.
41. CONTINUE…
ADVANTAGES
Retrieving data is faster since we do fewer joins.
Queries to retrieve data can be simpler (and therefore less likely to have bugs),
since we need to look at fewer tables.
DISADVANTAGES
Updates and inserts are more expensive.
Denormalization can make update and insert code harder to write.
Data may be inconsistent: which is the "correct" value for a piece of data?
Data redundancy necessitates more storage.
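The trade-off can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table `student_report`, the reduced column set and the replacement HOD name 'Mr. NEW' are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (branchcode TEXT PRIMARY KEY, branchname TEXT, hod TEXT);
    CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, branchcode TEXT);
    INSERT INTO branch VALUES
        ('VIT', 'Vocational Information Technology', 'Mr. VIT'),
        ('ME', 'Mechanical Engineering', 'Mr. ME');
    INSERT INTO student VALUES (1, 'AKASH', 'VIT'), (2, 'SUMIT', 'VIT'), (4, 'SAHIL', 'ME');

    -- Denormalized copy: branch data is repeated on every student row.
    CREATE TABLE student_report AS
        SELECT s.sid, s.sname, s.branchcode, b.branchname, b.hod
        FROM student AS s JOIN branch AS b ON b.branchcode = s.branchcode;
""")

# Normalized read needs a join; denormalized read looks at a single table.
normalized_read = conn.execute(
    "SELECT s.sname, b.hod FROM student AS s "
    "JOIN branch AS b ON b.branchcode = s.branchcode ORDER BY s.sid"
).fetchall()
denormalized_read = conn.execute(
    "SELECT sname, hod FROM student_report ORDER BY sid"
).fetchall()

# The cost: changing one HOD must now touch every copy of that value.
updated = conn.execute(
    "UPDATE student_report SET hod = 'Mr. NEW' WHERE branchcode = 'VIT'"
).rowcount

print(normalized_read == denormalized_read)
print(updated)
```

Both reads return the same rows, but the denormalized table has to update two rows for a single HOD change, which is where inconsistency can creep in if one copy is missed.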
42. DIFFERENCE BETWEEN NORMALIZATION AND DENORMALIZATION
1. Normalization: Non-redundant and consistent data are stored in a set schema.
   Denormalization: Data are combined to execute the query quickly.
2. Normalization: Data redundancy and inconsistency are reduced.
   Denormalization: Redundancy is added for quick execution of queries.
3. Normalization: Data integrity is maintained.
   Denormalization: Data integrity is not maintained.
4. Normalization: Redundancy is reduced or eliminated.
   Denormalization: Redundancy is added instead of being reduced or eliminated.
5. Normalization: The number of tables is increased.
   Denormalization: The number of tables is decreased.
6. Normalization: Optimizes the use of disk space.
   Denormalization: Does not optimize the use of disk space.