2. Relational Model
• Data are organized in two-dimensional tables called relations.
• The tables are related to each other.
• The most popular model.
3. • The relational data model was first introduced by Ted Codd of IBM in 1970.
• It attracted attention due to its simplicity and mathematical
foundation.
• The first commercial implementations of the relational model became
available in the early 1980s, such as the SQL/DS system on the MVS
operating system by IBM and the Oracle DBMS.
• Current popular DBMSs include DB2 and Informix Dynamic Server
(from IBM), Oracle and Rdb (from Oracle), and SQL Server and Access
(from Microsoft).
4. • The relational model represents the database as a collection of
relations.
• A relation is a table of values; each row in the table represents a
collection of related data values.
• Each row in the table represents a fact that typically corresponds to a
real-world entity or relationship.
• The table name and column names help to interpret the
meaning of the values in each row.
5. • RDBMS (Relational Database Management System)
• external view
• The data are represented as a set of relations.
• A relation is a two-dimensional table.
• This does not mean that the data are stored as tables; the physical storage
of the data is independent of the way the data are logically organized.
6. • In the formal relational model terminology, a row is called a
tuple, a column header is called an attribute, and the table is called a
relation.
• The data type describing the types of values that can appear in
each column is represented by a domain of possible values.
• For example:
• Name: the set of character strings that represent names of persons.
• Employee_age: the possible ages of employees of a company; each must be
an integer between 18 and 60.
• Grade_point: possible values of computed grade point average; each must be a
real number between 0 and 4.
7. Properties of relation
• A relation, or table, in a relational database has certain properties. First, its name must be
unique in the database, i.e. a database cannot contain multiple tables with the same name. Next,
each relation must have a set of columns or attributes, and it must have a set of rows to contain
the data. As with table names, no two attributes of the same relation can have the same name.
• Next, no tuple (or row) can be a duplicate of another. In practice, a database might actually contain duplicate
rows, but there should be practices in place to avoid this, such as the use of unique primary keys
(described next).
• Given that no two tuples can be identical, it follows that a relation must contain at least one
attribute, or combination of attributes, that identifies each tuple (or row) uniquely. This is usually the primary key.
Primary key values cannot be duplicated: no two tuples can share the same
primary key value. The key also cannot have a NULL value, which simply means that the value must be
known.
• Further, each cell, or field, must contain a single (atomic) value. For example, if you enter
"Tom Smith" in one cell, the database does not understand that you have a first and a last name;
rather, it treats the value of that cell as exactly what has been entered.
• Finally, all values of an attribute (column) must come from the same domain, meaning that they must have
the same data type. You cannot mix strings and numbers in a single column.
• All these properties, or constraints, serve to ensure data integrity, which is important for maintaining the
accuracy of the data.
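The primary-key properties above (unique and never NULL) can be checked mechanically. The following is a minimal Python sketch, assuming rows are held as dictionaries and NULL is modelled as `None`; the function name `primary_key_violations` and the sample rows are hypothetical and are not part of the slides.

```python
def primary_key_violations(rows, key):
    """Return rows whose key value is NULL (None) or repeats an earlier row's key."""
    seen = set()
    violations = []
    for row in rows:
        key_value = tuple(row[attr] for attr in key)
        if any(v is None for v in key_value) or key_value in seen:
            violations.append(row)
        else:
            seen.add(key_value)
    return violations


# Hypothetical sample rows, not taken from the slides.
employees = [
    {"Ssn": "111", "Lname": "Smith"},
    {"Ssn": "222", "Lname": "Wong"},
    {"Ssn": "111", "Lname": "Smyth"},   # repeats key value "111"
    {"Ssn": None, "Lname": "Zelaya"},   # NULL key value
]

bad_rows = primary_key_violations(employees, key=("Ssn",))
print([row["Lname"] for row in bad_rows])
```

A real DBMS enforces this when the key is declared, rather than by scanning rows afterwards; the scan here only illustrates the rule.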
8. Schemas
• A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data are organized and how the relations among them are associated.
It formulates all the constraints that are to be applied to the data.
• A database schema defines its entities and the relationships among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. Database
designers design the schema to help programmers understand the database and make it
useful.
• A database schema can be divided broadly into two categories:
• Physical Database Schema − This schema pertains to the actual storage of data and its form of
storage, such as files and indices. It defines how the data will be stored in secondary storage.
• Logical Database Schema − This schema defines all the logical constraints that need to be applied
to the stored data. It defines tables, views, and integrity constraints.
10. Tuples
• A relation r of the relation schema R(A1, A2, A3, ..., An), also denoted
by r(R), is a set of n-tuples, r = {t1, t2, t3, ..., tm}.
• Each n-tuple t is an ordered list of n values, t = <v1, v2, v3, ..., vn>.
• The i-th value of n-tuple t corresponds to the attribute Ai.
• The terms relation intension for a schema R and relation extension for a
relation state r(R) are commonly used.
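To make the schema and state distinction concrete, here is a small Python sketch that models a schema as an ordered tuple of attribute names and a relation state as a set of tuples. The `STUDENT_SCHEMA` name, its attributes, and the sample values are assumptions chosen for illustration only.

```python
# Schema R(A1, ..., An): an ordered tuple of attribute names (the intension).
STUDENT_SCHEMA = ("Name", "Age", "Gpa")  # hypothetical schema

# State r(R): a set of n-tuples (the extension). A Python set, like a
# relation, cannot hold the same tuple twice.
student_state = {
    ("Ana", 19, 3.2),
    ("Ben", 21, 3.8),
    ("Ana", 19, 3.2),  # written twice, stored once
}


def value_of(t, attribute, schema=STUDENT_SCHEMA):
    """Return the value in tuple t that corresponds to the named attribute."""
    return t[schema.index(attribute)]


degree = len(STUDENT_SCHEMA)        # number of attributes
cardinality = len(student_state)    # number of tuples in the state
print(degree, cardinality, value_of(("Ben", 21, 3.8), "Gpa"))
```

The position of a value inside a tuple is what ties it to its attribute, which is why the schema is kept as an ordered sequence.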
11. Domain constraints
• Domain constraints specify that within each tuple, the value of each
attribute A must be an atomic value from the domain dom(A).
• The domains are specified as data types, which include the standard numeric
types (integers and real numbers), characters, Booleans, fixed-length
strings, variable-length strings, etc.
• These are specified in DDL statements.
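A domain can be thought of as a membership test on a single atomic value. The sketch below encodes the three example domains mentioned earlier (Name, Employee_age, Grade_point) as Python predicates; the `DOMAINS` mapping and the `domain_violations` helper are hypothetical names, and a real DBMS would express these rules in DDL instead.

```python
def _is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)


def _is_real(v):
    return _is_int(v) or isinstance(v, float)


# dom(A) for each attribute A, written as a predicate on one value.
DOMAINS = {
    "Name": lambda v: isinstance(v, str) and len(v) > 0,
    "Employee_age": lambda v: _is_int(v) and 18 <= v <= 60,
    "Grade_point": lambda v: _is_real(v) and 0 <= v <= 4,
}


def domain_violations(row):
    """Return the names of attributes in row whose value is outside its domain."""
    return sorted(attr for attr, value in row.items() if not DOMAINS[attr](value))


print(domain_violations({"Name": "Ana", "Employee_age": 30, "Grade_point": 3.5}))
print(domain_violations({"Name": "Bo", "Employee_age": 17, "Grade_point": 4.5}))
```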
12. Relational Algebra
• Relational algebra is a procedural query language that takes instances of
relations as input and yields instances of relations as output. It uses operators to
perform queries, and each operator is either unary or binary. Because
intermediate results are also relations, operations can be applied
recursively to the results of other operations.
• The fundamental operations of relational algebra are as follows −
• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename
13. Select
• The SELECT operation is used to choose a subset of the tuples from a
relation that satisfies a selection condition.
• One can consider the SELECT operation to be a filter that keeps only
those tuples that satisfy a qualifying condition.
• For example, to select the EMPLOYEE tuples whose department is 4,
or those whose salary is greater than $30,000, we can individually
specify each of these two conditions with a SELECT operation as
follows:
• σ Dno=4 (EMPLOYEE)
• σ Salary>30000 (EMPLOYEE)
14. • In general, the SELECT operation is denoted by
• σ <selection condition> (R)
• where the symbol σ (sigma) is used to denote the SELECT operator and the
selection condition is a Boolean expression (condition) specified on the
attributes of relation R.
• Notice that R is generally a relational algebra expression whose result is a
relation—the simplest such expression is just the name of a database
relation.
• The relation resulting from the SELECT operation has the same attributes as
R.
• For example, to select the tuples for all employees who either work in
department 4 and make over $25,000 per year, or work in department 5 and
make over $30,000, we can specify the following SELECT operation:
• σ (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
15. • A unary operation.
• It is applied to a single relation and creates another relation.
• The tuples in the resulting relation are a subset of the tuples in the
original relation.
• It uses some criteria to select the tuples.
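SELECT behaves like a filter over the tuples of a relation. The following Python sketch mirrors the three σ expressions from the SELECT slides, assuming tuples are represented as dictionaries and the selection condition as a Boolean function; the `select` helper and the sample EMPLOYEE rows are illustrative assumptions, not data from the slides.

```python
def select(relation, condition):
    """σ_condition(relation): keep only the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


# Hypothetical EMPLOYEE rows for illustration.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Fname": "Franklin", "Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Fname": "Alicia", "Lname": "Zelaya", "Dno": 4, "Salary": 25000},
    {"Fname": "Jennifer", "Lname": "Wallace", "Dno": 4, "Salary": 43000},
]

# σ Dno=4 (EMPLOYEE)
dept4 = select(EMPLOYEE, lambda t: t["Dno"] == 4)

# σ Salary>30000 (EMPLOYEE)
high_paid = select(EMPLOYEE, lambda t: t["Salary"] > 30000)

# σ (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
mixed = select(
    EMPLOYEE,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)

print([t["Lname"] for t in mixed])
```

Every result keeps all four attributes of EMPLOYEE; only the number of tuples changes.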
16. Project
• The PROJECT operation selects certain columns from the table and discards
the other columns.
• If we are interested in only certain attributes of a relation, we use the
PROJECT operation to project the relation over these attributes only.
• Therefore, the result of the PROJECT operation can be visualized as a
vertical partition of the relation into two relations:
• One has the needed columns (attributes) and contains the result of the
operation, and the other contains the discarded columns.
• For example, to list each employee’s first and last name and salary, we can
use the PROJECT operation as follows:
• π Lname, Fname, Salary (EMPLOYEE)
17. • The general form of the PROJECT operation is
• π <attribute list> (R)
• where π (pi) is the symbol used to represent the PROJECT operation, and
<attribute list> is the desired sublist of attributes from the attributes of
relation R.
• Again, notice that R is, in general, a relational algebra expression whose
result is a relation, which in the simplest case is just the name of a
database relation.
• The result of the PROJECT operation has only the attributes specified in
<attribute list> in the same order as they appear in the list.
• Hence, its degree is equal to the number of attributes in <attribute list>.
18. • A unary operation.
• It is applied to a single relation and creates another relation.
• The attributes in the resulting relation are a subset of the attributes in
the original relation.
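PROJECT can be sketched in Python as picking the listed attributes from every tuple, in the listed order. Because the result is itself a relation (a set of tuples), this sketch also drops duplicate result tuples. The `project` helper and the sample rows are assumptions made for illustration.

```python
def project(relation, attributes):
    """π_attributes(relation): keep the listed columns, in the listed order."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in result:   # a relation is a set, so drop duplicates
            result.append(row)
    return result


# Hypothetical EMPLOYEE rows for illustration.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Fname": "Franklin", "Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Fname": "Alicia", "Lname": "Zelaya", "Dno": 4, "Salary": 25000},
]

# π Lname, Fname, Salary (EMPLOYEE)
names_and_pay = project(EMPLOYEE, ["Lname", "Fname", "Salary"])

# π Dno (EMPLOYEE): two employees share Dno 5, so one tuple is dropped
departments = project(EMPLOYEE, ["Dno"])

print(names_and_pay[0], departments)
```

The degree of each result equals the length of the attribute list: 3 for the first query and 1 for the second.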
19. Rename
• In general, for most queries, we need to apply several relational algebra operations one
after the other. Either we can write the operations as a single relational algebra
expression by nesting the operations, or we can apply one operation at a time and
create intermediate result relations.
• In the latter case, we must give names to the relations that hold the intermediate results.
• For example, to retrieve the first name, last name, and salary of all employees who work in
department number 5, we must apply a SELECT and a PROJECT operation.
• We can write a single relational algebra expression, also known as an in-line expression, as
follows:
• π Fname, Lname, Salary (σ Dno=5 (EMPLOYEE))
• Alternatively, we can explicitly show the sequence of operations, giving a name to each
intermediate relation, as follows:
• DEP5_EMPS ← σ Dno=5 (EMPLOYEE)
• RESULT ← π Fname, Lname, Salary (DEP5_EMPS)
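The two ways of writing the department 5 query, nested in-line or as a sequence of named intermediate relations, can be compared in a short Python sketch. The `select` and `project` helpers and the sample rows are hypothetical stand-ins for σ and π.

```python
def select(relation, condition):
    """σ: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """π: keep the listed attributes, dropping duplicate result tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result


# Hypothetical EMPLOYEE rows for illustration.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Fname": "Franklin", "Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Fname": "Alicia", "Lname": "Zelaya", "Dno": 4, "Salary": 25000},
]

# Step by step, naming the intermediate relation.
DEP5_EMPS = select(EMPLOYEE, lambda t: t["Dno"] == 5)
RESULT = project(DEP5_EMPS, ["Fname", "Lname", "Salary"])

# The same query as a single nested (in-line) expression.
INLINE = project(
    select(EMPLOYEE, lambda t: t["Dno"] == 5),
    ["Fname", "Lname", "Salary"],
)

print(RESULT == INLINE)
```

Both forms compute the same relation; naming intermediate results mainly helps readability when a query has many steps.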
20. Union, Intersection and Difference
• UNION: The result of this operation, denoted by R ∪ S, is a relation
that includes all tuples that are either in R or in S or in both R and S.
Duplicate tuples are eliminated.
• INTERSECTION: The result of this operation, denoted by R ∩ S, is a
relation that includes all tuples that are in both R and S.
• DIFFERENCE (or MINUS): The result of this operation, denoted by R − S,
is a relation that includes all tuples that are in R but not in S.
21. • For example, to retrieve the Social Security numbers of all employees
who either work in department 5 or directly supervise an employee
who works in department 5, we can use the UNION operation as
follows:
• DEP5_EMPS ← σ Dno=5 (EMPLOYEE)
• RESULT1 ← π Ssn (DEP5_EMPS)
• RESULT2 ← π Super_ssn (DEP5_EMPS)
• RESULT ← RESULT1 ∪ RESULT2
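The four steps of this query map directly onto Python set operations. The sketch below assumes a small, made-up EMPLOYEE relation with only the attributes the query needs; the Ssn values are placeholders.

```python
# Hypothetical EMPLOYEE rows; Ssn values are placeholders.
EMPLOYEE = [
    {"Ssn": "101", "Super_ssn": "102", "Dno": 5},
    {"Ssn": "102", "Super_ssn": "105", "Dno": 5},
    {"Ssn": "103", "Super_ssn": "104", "Dno": 4},
    {"Ssn": "104", "Super_ssn": "105", "Dno": 4},
    {"Ssn": "105", "Super_ssn": None, "Dno": 1},
]

# DEP5_EMPS ← σ Dno=5 (EMPLOYEE)
DEP5_EMPS = [t for t in EMPLOYEE if t["Dno"] == 5]

# RESULT1 ← π Ssn (DEP5_EMPS)
RESULT1 = {t["Ssn"] for t in DEP5_EMPS}

# RESULT2 ← π Super_ssn (DEP5_EMPS)
RESULT2 = {t["Super_ssn"] for t in DEP5_EMPS}

# RESULT ← RESULT1 ∪ RESULT2
RESULT = RESULT1 | RESULT2

print(sorted(RESULT))
```

Employee 102 both works in department 5 and supervises someone there, yet appears only once in the result because union eliminates duplicates.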
22. Union
• A binary operation.
• Creates a new relation in which each tuple is either in the first relation, in
the second, or in both.
• The two relations must have the same attributes.
23. Intersection
• A binary operation.
• Creates a new relation in which each tuple is a member in both
relations.
• The two relations must have the same attributes.
24. Difference
• A binary operation.
• Creates a new relation in which each tuple is in the first relation but
not the second.
• The two relations must have the same attributes.
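The requirement that both operands have the same attributes can be made explicit in code. Below is a minimal Python sketch in which a relation is a dictionary holding an attribute tuple and a set of rows; this representation, the helper names, and the STUDENT and INSTRUCTOR sample data are assumptions for illustration.

```python
def _check_same_attributes(r, s):
    """Raise ValueError unless both relations list the same attributes."""
    if r["attrs"] != s["attrs"]:
        raise ValueError("both relations must have the same attributes")


def union(r, s):
    _check_same_attributes(r, s)
    return {"attrs": r["attrs"], "rows": r["rows"] | s["rows"]}


def intersection(r, s):
    _check_same_attributes(r, s)
    return {"attrs": r["attrs"], "rows": r["rows"] & s["rows"]}


def difference(r, s):
    _check_same_attributes(r, s)
    return {"attrs": r["attrs"], "rows": r["rows"] - s["rows"]}


# Hypothetical relations with identical attribute lists.
STUDENT = {
    "attrs": ("Fname", "Lname"),
    "rows": {("Susan", "Yao"), ("Ramesh", "Shah")},
}
INSTRUCTOR = {
    "attrs": ("Fname", "Lname"),
    "rows": {("Susan", "Yao"), ("John", "Smith")},
}

print(sorted(union(STUDENT, INSTRUCTOR)["rows"]))
print(sorted(intersection(STUDENT, INSTRUCTOR)["rows"]))
print(sorted(difference(STUDENT, INSTRUCTOR)["rows"]))
```

Note that difference is not symmetric: `difference(STUDENT, INSTRUCTOR)` and `difference(INSTRUCTOR, STUDENT)` generally return different relations.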
25. Cartesian Product
• This is also a binary set operation, but the relations on which it is applied
do not have to be union compatible.
• In its binary form, this set operation produces a new element by combining
every member (tuple) from one relation (set) with every member (tuple)
from the other relation (set).
• In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q
of degree n + m, with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
• The resulting relation Q has one tuple for each combination of tuples, one
from R and one from S.
• Hence, if R has n_R tuples (denoted as |R| = n_R), and S has n_S tuples, then
R × S will have n_R * n_S tuples.
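The degree and size of a Cartesian product are easy to confirm with a short Python sketch built on the standard library's `itertools.product`. The relations `R` and `S` below hold placeholder values, and `cartesian_product` is a hypothetical helper name.

```python
from itertools import product


def cartesian_product(r, s):
    """R × S: concatenate every tuple of r with every tuple of s."""
    return [t_r + t_s for t_r, t_s in product(r, s)]


# Placeholder relations: R has 3 tuples of degree 2, S has 2 tuples of degree 1.
R = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
S = [("x",), ("y",)]

Q = cartesian_product(R, S)

print(len(Q))      # number of tuples: 3 * 2
print(len(Q[0]))   # degree: 2 + 1
print(Q[0])
```

The attributes of R come first in every combined tuple, followed by those of S, which matches the attribute order stated for Q.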
26. Join Operation
• The JOIN operation, denoted by ⋈, is used to combine related tuples from
two relations into single “longer” tuples.
• The JOIN operation can be specified as a CARTESIAN PRODUCT operation
followed by a SELECT operation.
• Suppose that we want to retrieve the name of the manager of each
department. To get the manager’s name, we need to combine each
department tuple with the employee tuple whose Ssn value matches the
Mgr_ssn value in the department tuple.
• Note that Mgr_ssn is a foreign key of the DEPARTMENT relation that
references Ssn, the primary key of the EMPLOYEE relation. This referential
integrity constraint plays a role in having matching tuples in the referenced
relation EMPLOYEE.
27. • The general form of a JOIN operation on two relations R(A1, A2, ..., An) and
S(B1, B2, ..., Bm) is
• R ⋈ <join condition> S
• The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, ..., An,
B1, B2, ... , Bm) in that order; Q has one tuple for each combination of
tuples—one from R and one from S—whenever the combination satisfies
the join condition.
• This is the main difference between CARTESIAN PRODUCT and JOIN. In
JOIN, only combinations of tuples satisfying the join condition appear in
the result, whereas in the CARTESIAN PRODUCT all combinations of tuples
are included in the result.
• The join condition is specified on attributes from the two relations R and S
and is evaluated for each combination of tuples.
• Each tuple combination for which the join condition evaluates to TRUE is
included in the resulting relation Q as a single combined tuple.
28. • A binary operation.
• Combines two relations based on common attributes.
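Since a JOIN can be specified as a CARTESIAN PRODUCT followed by a SELECT, it can be sketched in Python as exactly that: form every pair of tuples, then keep the pairs that satisfy the join condition. The example answers the manager query from the JOIN slides; the `join` helper and the sample DEPARTMENT and EMPLOYEE rows are assumptions for illustration.

```python
from itertools import product


def join(r, s, condition):
    """R ⋈_condition S: Cartesian product filtered by the join condition.

    Assumes the two relations use distinct attribute names, so merging
    the two dictionaries loses nothing.
    """
    return [{**t_r, **t_s} for t_r, t_s in product(r, s) if condition(t_r, t_s)]


# Hypothetical rows; Ssn values are placeholders.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "102"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "104"},
]
EMPLOYEE = [
    {"Ssn": "101", "Fname": "John", "Lname": "Smith"},
    {"Ssn": "102", "Fname": "Franklin", "Lname": "Wong"},
    {"Ssn": "104", "Fname": "Jennifer", "Lname": "Wallace"},
]

# DEPARTMENT ⋈ Mgr_ssn=Ssn EMPLOYEE
DEPT_MGR = join(DEPARTMENT, EMPLOYEE, lambda d, e: d["Mgr_ssn"] == e["Ssn"])

managers = [(t["Dname"], t["Fname"], t["Lname"]) for t in DEPT_MGR]
print(managers)
```

The product has 2 * 3 = 6 candidate pairs, but only the two pairs whose Mgr_ssn matches an Ssn survive. Production systems avoid materialising the full product; this sketch follows the definition rather than an efficient algorithm.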