Dr. B . RAGHU
Professor & Dean
Department of Computer Science and Engineering
Sri Ramanujar Engineering College
What’s a database?
 A collection of logically-related information stored
in a consistent fashion
 Phone book
 Bank records (checking statements, etc)
 Library card catalog
 Soccer team roster
 The storage format typically appears to users as
some kind of tabular list (table, spreadsheet)
What Does a Database Do?
 Stores information in a highly organized manner
 Manipulates information in various ways, some of
which are not available in other applications or are
easier to accomplish with a database
 Models some real world process or activity through
electronic means
 Often called modeling a business process
 Often replicates the process only in appearance or end
result
Databases and the Systems which manage them
 Modern electronic databases are created and managed
by means of an RDBMS: Relational DataBase
Management System
 An individual data storage structure created with an
RDBMS is typically called a “database”
 A database and its attendant views, reports, and
procedures is called an “application”
Database Applications
 Database (the actual DB with its attendant storage
structure)
 SQL Engine - interprets between the database and the
interface/application
 Interface or application – the part the user gets to see
and use
Relational Database Management Systems
 Low-end, proprietary, specific purpose
 Email: Outlook, Eudora, Mulberry
 Bibliographic: Ref. Mgr., EndNote, ProCite
 Mid-level
 Microsoft Access, Lotus Approach, Borland’s Paradox
 More or less total control of design allows custom builds
 High-end
 Oracle, Microsoft SQL Server, Sybase, IBM DB2
 Professional level DBs: Banks, e-commerce, secure
 Amazon.com, Ebay.com, Yahoo.com
Problems with Bad Design
 Early computers were slow and had limited storage
capacity
 Redundant or repeating data slowed operations and
took up too much precious storage space
 Poor design increased chance of data errors, lost or
orphaned information
Benefits of Good Design
 Computers today are faster and possess much larger
storage devices
 Rigid structure of modern relational databases helped
codify problems and solutions
 Design problems are still possible, because the DBMS
software won’t protect you from poor practices
 Good design still increases efficiency of data processes,
reduces waste of storage, and helps eliminate data
entry errors
The Design Process
1) Identify the purpose of the database
2) Review existing data
3) Make a preliminary list of fields
4) Make a preliminary list of tables and enter fields
5) Identify the key fields
6) Draft the table relationships
7) Enter sample data and normalize the data/tables
8) Review and finalize the design
Database Modeling
 Refers to various, more-or-less formal methods for
designing a database
 Some provide precision steps and tools
 Ex.: Entity-Relationship (E-R) Modeling
 Widely used, especially by high-end database
designers who can’t afford to miss things
 Fairly complex process
 Extremely precise
1. Identify purpose of the DB
Clients can tell you what information they want but have
no idea what data they need.
 “We need to keep track of inventory”
 “We need an order entry system”
 “I need monthly sales reports”
 “We need to provide our product catalog on the
Web”
Be sure to Limit the Scope of the database.
2. Review Existing Data
 Electronic
 Legacy database(s)
 Spreadsheets
 Web forms
 Manual
 Paper forms
 Receipts and other printed output
3. Make Preliminary Field List
 Make sure fields exist to support needs
 Ex. if client wants monthly sales reports, you need a
date field for orders.
 Ex. To group employees by division, you need a
division identifier
 Make sure values are atomic
 Ex. First and Last names stored separately
 Ex. Addresses broken down to Street, City, State, etc.
 Do not store values that can be calculated from other
values
 Ex. “Age” can be calculated from “Date of Birth”
4. Make Preliminary Tables
(and insert the fields into them)
 Each table holds info about one subject
 Don’t worry about the quantity of tables
 Look for logical groupings of information
 Use a consistent naming convention
Naming Conventions
Rules of thumb
 Table names must be unique in DB; should be plural
 Field names must be unique in the table(s)
 Clearly identify table subject or field data
 Be as brief as possible
 Avoid abbreviations and acronyms
 Use fewer than 30 characters
 Use letters, numbers, underscores (_)
 Do not use spaces or other special characters
5. Identify the Key Fields
 Primary Key(s)
 Can never be Null; must hold unique values
 Automatically indexed in most RDBMSs
 Values rarely (if ever) change
 Try to include as few fields as possible
 Multi-field Primary Key
 Combination of two or more fields that uniquely
identify an individual record
 Candidate Key
 Field or fields that qualify as a primary key
 Important in Third and Boyce-Codd Normal Forms
6. Identify Table Relationships
Based on business rules being modeled
Examples:
 “each customer can place many orders”
 “all employees belong to a department”
 “each TA is assigned to one course”
Relationship Terminology
 Relationship Type
 One-to-one: expressed as 1:1
 One-to-Many: expressed as 1:N or 1:M or 1:∞
 Many-to-Many: expressed as N:N or M:M
 Primary or Parent Table
 Table on the left side of 1:N relationship
 Related or Child Table
 Table on the right side of 1:N relationship
 Relational Schema
 Diagram of table relationships in database
Relationship Terminology (cont’d)
 Join
 Definition of how related records are returned
 Join Line
 Visual relationship indicators in schema
 Key fields
 Primary Key: the linking field on the one side of a 1:N
relationship
 Foreign Key: the primary key from one table that is
added to another table so the records can be related
 Non-Key Fields: any field that is not part of a primary
key, multi-field primary key, or foreign key
One-to-One (1:1)
 Each record in Table A relates to one, and only one,
record in Table B, and vice versa.
 Either table can be considered the Primary, or Parent
Table
 Can usually be combined into one table, although may
not be most efficient design
One-to-Many (1:N)
 Each record in Table A may relate to zero, one or
many records in Table B, but each record in Table
B relates to only one record in Table A.
 The potential relationship is what’s important:
there might be no related records, or only one, but
there could be many.
 The table on the One (or left) side of a 1:N
relationship is considered the Primary Table.
Many-to-Many (N:N)
 A record in Table A can relate to many records in
Table B, and a record in Table B can relate to many
records in Table A.
 Most RDBMSs do not support N:N relationships,
requiring the use of a linking (or intersection or
bridge) table that breaks the N:N relationship
down into two 1:N relationships with the linking
table being on the Many side of both new
relationships.
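The linking-table idea can be sketched with Python's built-in sqlite3 module. The table and column names (students, courses, enrollments) and the sample rows are illustrative assumptions, not taken from the slides.

```python
import sqlite3

# Hypothetical schema: students and courses are in an N:N relationship,
# resolved by the linking table `enrollments`, which sits on the Many
# side of two 1:N relationships.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)  -- multi-field primary key
);
""")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany("INSERT INTO courses VALUES (?, ?)", [(10, "Databases"), (20, "Networks")])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])


def courses_for(student_id):
    """Titles of the courses one student is enrolled in, sorted by title."""
    rows = conn.execute(
        "SELECT c.title FROM enrollments e "
        "JOIN courses c ON c.course_id = e.course_id "
        "WHERE e.student_id = ? ORDER BY c.title",
        (student_id,),
    ).fetchall()
    return [title for (title,) in rows]


def students_in(course_id):
    """Names of the students enrolled in one course, sorted by name."""
    rows = conn.execute(
        "SELECT s.name FROM enrollments e "
        "JOIN students s ON s.student_id = e.student_id "
        "WHERE e.course_id = ? ORDER BY s.name",
        (course_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(courses_for(1))
print(students_in(10))
```

Each side of the N:N relationship is read through the linking table, so neither students nor courses needs a repeating group of foreign keys.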
Relational Schema
Table 1
Field1_1
Field1_2
Field1_3
Field1_4
Table 2
Field2_1
Field1_1
Field2_2
Field2_3
1
N
Normalization
 Normal Forms (NF): design standards based on
database design theory
 Normalization is the process of applying the NFs to
table design to eliminate redundancy and create a
more efficient organization of DB storage.
 Each successive NF applies an increasingly stringent
set of rules
Introduction to Normalization
 Normalization: Process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
 Normal form: Condition using keys and FDs of a
relation to certify whether a relation schema is in a
particular normal form
 2NF, 3NF, BCNF based on keys and FDs of a relation
schema
 4NF based on keys, multi-valued dependencies
First Normal Form (1NF)
 A table is in first normal form if there are no repeating
groups.
 Repeating Groups : a set of logically related fields or
values that occur multiple times in one record
 1: non-atomic value, or multiple values, stored in a field
 2: multiple fields in the same table that hold logically
similar values
Second Normal Form (2NF)
 A table is in 2NF if it is in 1NF and each non-key field is
functionally dependent on the entire primary key.
 Functional dependency: a relationship between fields such
that the value in one field determines the one value that
can be contained in the other field.
 Determinant: a field in which the value determines the
value in another field.
Example
Airport – City
Dulles – Washington, DC
Third Normal Form (3NF)
 A table is in 3NF when it is in 2NF and there are no
transitive dependencies.
 Transitive Dependency: a type of functional
dependency in which the value of a non-key field is
determined by the value in another non-key field and
that field is not a candidate key.
Boyce-Codd Normal Form (BCNF)
 A table is in BCNF when it is in 3NF and all
determinants are candidate keys.
 Developed to cover situations that 3NF did not
address.
 Applies to situations where you have overlapping
candidate keys.
BCNF
 {Student, Course} → Instructor
 Instructor → Course
 Decomposing into 2 schemas
 {Student, Instructor} {Student, Course}
 {Course, Instructor} {Student, Course}
 {Course, Instructor} {Instructor, Student}
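One way to compare the three candidate decompositions is the standard binary lossless-join test: a split into R1 and R2 is lossless when the attributes they share functionally determine all of R1 or all of R2. The sketch below is an illustration added here, not the slides' own method; the function names closure and is_lossless are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: the shared attributes must determine
    every attribute of r1 or every attribute of r2."""
    shared = closure(r1 & r2, fds)
    return r1 <= shared or r2 <= shared


FDS = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
CANDIDATES = [
    ({"Student", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Instructor", "Student"}),
]
results = [is_lossless(r1, r2, FDS) for r1, r2 in CANDIDATES]
print(results)
```

Under these two FDs only the third split passes the test, because Instructor is the shared attribute and Instructor determines Course.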
Example
 Given the relation
Book(Book_title, Authorname, Book_type, Listprice,
Author_affil, Publisher)
The FDs are
Book_title → Publisher, Book_type
Book_type → Listprice
Authorname → Author_affil
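A minimal sketch of how the key of Book could be found mechanically from the FDs above, by computing attribute closures and keeping the minimal sets that determine every attribute. The helper names closure and candidate_keys are assumptions made for this illustration.

```python
from itertools import combinations

ATTRS = {"Book_title", "Authorname", "Book_type", "Listprice",
         "Author_affil", "Publisher"}
FDS = [
    ({"Book_title"}, {"Publisher", "Book_type"}),
    ({"Book_type"}, {"Listprice"}),
    ({"Authorname"}, {"Author_affil"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation.
    Brute force over subsets, so only suitable for small schemas."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attrs:
                keys.append(combo)
    return keys


print(candidate_keys(ATTRS, FDS))
print(sorted(closure({"Book_title"}, FDS)))
```

The only candidate key reported is {Book_title, Authorname}. Publisher, Book_type, and Author_affil each depend on just one part of that key, which is the kind of partial dependency 2NF rules out, and Book_type → Listprice is a transitive dependency of the kind 3NF rules out.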
Fourth Normal Form (4NF)
 A table is in 4NF when it is in BCNF and there are no
multi-valued dependencies.
 Multi-valued Dependency: occurs when, for each value
in field A, there is a set of values for field B and a set of
values for field C, but B and C are not related.
 Occurs when the table contains fields that are not
logically related.
Fifth Normal Form (5NF)
 A table is in 5NF when it is in 4NF and there are no
cyclic dependencies.
 Cyclic Dependency: occurs when there is a multi-field
primary key with three or more fields (ex. A, B, C) and
those fields are related in pairs AB, BC and AC.
 Can occur only with a multi-field primary key of three
or more fields
8. Finalizing the Design
 Double-check to ensure good, principle-based design
 Evaluate design in light of business model and
determine desired deviations from design principles
 Process efficiency
 Security concerns
Functional Dependencies
 Functional dependencies (FDs) are used to specify
formal measures of the "goodness" of relational
designs
 FDs and keys are used to define normal forms for
relations
 FDs are constraints that are derived from the
meaning and interrelationships of the data attributes
Definition
A functional dependency is defined as a constraint
between two sets of attributes in a relation from a
database.
Given a relation R, a set of attributes X in R is said to
functionally determine another attribute Y, also in
R, (written X → Y) if and only if each X value is
associated with at most one Y value.
Functional Dependencies (2)
 A set of attributes X functionally determines a set of
attributes Y if the value of X determines a unique value
for Y
 X → Y holds if whenever two tuples have the same value
for X, they must have the same value for Y
If t1[X] = t2[X], then t1[Y] = t2[Y] in any relation instance r(R)
 X → Y in R specifies a constraint on all relation instances
r(R)
 FDs are derived from the real-world constraints on the
attributes
Examples of FD constraints
 Social Security Number determines employee
name
SSN → ENAME
 Project Number determines project name and
location
PNUMBER → {PNAME, PLOCATION}
 Employee SSN and project number determine the
hours per week that the employee works on the
project
{SSN, PNUMBER} → HOURS
Functional Dependencies (3)
 An FD is a property of the attributes in the schema R
 The constraint must hold on every relation instance
r(R)
 If K is a key of R, then K functionally determines all
attributes in R (since we never have two distinct tuples
with t1[K]=t2[K])
Inference Rules for FDs
 Given a set of FDs F, we can infer additional FDs
that hold whenever the FDs in F hold
 Armstrong's inference rules
A1. (Reflexive) If Y ⊆ X, then X → Y
A2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X ∪ Z)
A3. (Transitive) If X → Y and Y → Z, then X → Z
 A1, A2, A3 form a sound and complete set of
inference rules
Additional Useful Inference Rules
 Decomposition
 If X → YZ, then X → Y and X → Z
 Union
 If X → Y and X → Z, then X → YZ
 Pseudotransitivity
 If X → Y and WY → Z, then WX → Z
 Closure of a set F of FDs is the set F+ of all FDs
that can be inferred from F
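As a worked illustration (added here, not part of the original slides), pseudotransitivity follows from Armstrong's rules A2 and A3 alone:

```latex
\begin{align*}
1.\quad & X \to Y   && \text{given} \\
2.\quad & WX \to WY && \text{A2: augment (1) with } W \\
3.\quad & WY \to Z  && \text{given} \\
4.\quad & WX \to Z  && \text{A3: transitivity on (2) and (3)}
\end{align*}
```

The other two rules can be derived in the same step-by-step way, which is why only A1 to A3 are needed as axioms.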
Functional Dependencies
 Constraints on the set of legal relations.
 Require that the value for a certain set of attributes
determines uniquely the value for another set of
attributes.
 A functional dependency is a generalization of the
notion of a key.
Functional Dependencies (Cont.)
 Let R be a relation schema
α ⊆ R and β ⊆ R
 The functional dependency
α → β
holds on R if and only if for any legal relations r(R), whenever any two tuples t1
and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
 Example: Consider r(A,B) with the following instance of r.
A B
1 4
1 5
3 7
 On this instance, A → B does NOT hold, but B → A does hold.
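The definition can be checked directly against a relation instance. The sketch below tests the three-tuple instance of r(A, B) from this slide; the function name fd_holds is an assumption made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the `lhs` attributes but differ on `rhs`.

    This checks one instance only; it says nothing about whether the FD
    holds on the schema, i.e. on every legal instance.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(fd_holds(r, ["A"], ["B"]))  # A = 1 appears with both B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # every B value appears with a single A value
```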
Functional Dependencies (Cont.)
 K is a superkey for relation schema R if and only if K → R
 K is a candidate key for R if and only if
 K → R, and
 for no α ⊂ K, α → R
 Functional dependencies allow us to express constraints
that cannot be expressed using superkeys. Consider the
schema:
Loan-info-schema = (customer-name, loan-number,
branch-name, amount).
We expect this set of functional dependencies to hold:
loan-number → amount
loan-number → branch-name
but would not expect the following to hold:
loan-number → customer-name
Use of Functional Dependencies
 We use functional dependencies to:
 test relations to see if they are legal under a given set of
functional dependencies.
 If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
 specify constraints on the set of legal relations
 We say that F holds on R if all legal relations on R satisfy the
set of functional dependencies F.
 Note: A specific instance of a relation schema may
satisfy a functional dependency even if the functional
dependency does not hold on all legal instances. For
example, a specific instance of Loan-info-schema may, by
chance, satisfy
loan-number → customer-name.
Functional Dependencies (Cont.)
 A functional dependency is trivial if it is satisfied by all
instances of a relation
 E.g.
 customer-name, loan-number → customer-name
 customer-name → customer-name
 In general, α → β is trivial if β ⊆ α
PARALLEL AND DISTRIBUTED DATABASES
 A Parallel database system is one that seeks to
improve performance through parallel
implementation of various operations such as loading
data, building indexes, and evaluating queries.
 In a Distributed database system, data is physically
stored across several sites, and each site is typically
managed by a DBMS that is capable of running
independently of the other sites.
ARCHITECTURES FOR PARALLEL DATABASES
Three main architectures are proposed for building
parallel databases:
1. Shared - memory system, where multiple CPUs are
attached to an interconnection network and can access
a common region of main memory.
2. Shared - disk system, where each CPU has a private
memory and direct access to all disks through an
interconnection network.
3. Shared - nothing system, where each CPU has local
main memory and disk space, but no two CPUs can
access the same storage area; all communication
between CPUs is through a network connection.
Cont’d
 Scaling the system is an issue with shared memory and
shared disk architectures because as more CPUs are
added, existing CPUs are slowed down because of the
increased contention for memory accesses and
network bandwidth.
Cont’d
The Shared Nothing Architecture has shown:
a) Linear Speed Up: the time taken to execute
operations decreases in proportion to the increase in
the number of CPUs and disks
b) Linear Scale Up: the performance is sustained if the
number of CPUs and disks is increased in
proportion to the amount of data.
PARALLEL QUERY EVALUATION
 This section discusses parallel evaluation of a relational
query in a DBMS with a shared-nothing architecture, with
the emphasis on parallel execution of a single query.
 A relational query execution plan is a graph of relational
algebra operators, and the operators in the graph can be
executed in parallel. If an operator consumes the output
of a second operator, we have pipelined parallelism.
 Each individual operator can also be executed in parallel
by partitioning the input data, working on each partition
in parallel, and then combining the results of the
partitions. This approach is called Data Partitioned Parallel
Evaluation.
Data Partitioning:
Here large datasets are partitioned horizontally across
several disks. This enables us to exploit the I/O
bandwidth of the disks by reading and writing them in
parallel. This can be done in the following ways:
a. Round Robin Partitioning
b. Hash Partitioning
c. Range Partitioning
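The three schemes can be sketched as plain functions that assign each row to one of n partitions. Function names, the use of crc32 as the hash, and the sample values are assumptions made for this illustration, not something the slides prescribe.

```python
import bisect
import zlib


def round_robin_partition(rows, n):
    """The i-th row goes to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """A row goes to partition hash(key(row)) mod n.
    crc32 is used here only because it is deterministic across runs."""
    parts = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(key(row)).encode("utf-8"))
        parts[h % n].append(row)
    return parts


def range_partition(rows, boundaries, key):
    """With sorted boundaries [b0, b1, ...]: partition 0 holds keys <= b0,
    partition 1 holds b0 < key <= b1, and the last partition holds keys
    above the final boundary."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_left(boundaries, key(row))].append(row)
    return parts


values = [86, 10, 570, 1123, 19, 56, 18, 8000]  # made-up sample keys
rr = round_robin_partition(values, 4)
hp = hash_partition(values, 4, key=lambda v: v)
rp = range_partition(values, [50, 600], key=lambda v: v)
print(rr)
print(rp)
```

Round robin spreads rows evenly regardless of their values, hash partitioning sends equal keys to the same partition, and range partitioning keeps nearby keys together, which is what the parallel sort relies on.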
Parallelizing Sequential Operator
Evaluation Code
 Input data streams are divided into parallel data
streams. The outputs of these streams are merged as
needed to provide the inputs for a relational operator,
and the output may again be split as needed to
parallelize subsequent processing.
PARALLELIZING INDIVIDUAL OPERATIONS
Bulk Loading and Scanning:
Pages can be read in parallel while scanning a relation
and the retrieved tuples can then be merged, if the
relation is partitioned across several disks.
If a relation has associated indexes, any sorting of data
entries required for building the indexes during bulk
loading can also be done in parallel.
Sorting:
Sorting can be done by redistributing all tuples in the
relation using range partitioning.
Ex. Sorting a collection of employee tuples by salary,
where the salary values lie in a known range.
With N processors, each processor gets the tuples that
lie in the range assigned to it. For example, processor 1
holds all tuples in the range 10 to 20, and so on.
Each processor sorts its own tuples. The sorted runs
can then be combined by visiting the processors in the
order of their assigned ranges and collecting the
tuples.
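A minimal sketch of this range-partitioned sort, with a thread pool standing in for the N processors. The function name parallel_range_sort, the boundaries, and the salary values are all made up for illustration.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def parallel_range_sort(values, boundaries):
    """Sort `values` by range-partitioning them, sorting each partition
    independently, and concatenating the partitions in range order."""
    # Redistribution step: each value goes to the partition owning its range.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        parts[bisect.bisect_left(boundaries, v)].append(v)
    # Local sort step: one worker per partition (threads stand in for CPUs).
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        sorted_parts = list(pool.map(sorted, parts))
    # Combine step: no merge is needed, because every value in partition i
    # is smaller than every value in partition i + 1.
    return [v for part in sorted_parts for v in part]


salaries = [35, 12, 48, 27, 19, 41, 10, 30]  # hypothetical salary values
result = parallel_range_sort(salaries, boundaries=[20, 30, 40])
print(result)
```

How evenly the work is shared depends on the boundaries: if most salaries fall into one range, one processor does most of the sorting.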
PARALLELIZING INDIVIDUAL OPERATIONS
Joins
Here we consider how the join operation can be parallelized.
Consider two relations A and B to be joined on the age
attribute. A and B are initially distributed across several
disks in a way that is not useful for the join operation.
So we decompose the join into a collection of k
smaller joins by partitioning both A and B into k logical
partitions.
If the same partitioning function is used for both A and B, then
the union of the k smaller joins computes the join of A
and B.
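The decomposition can be sketched as follows: both relations are partitioned with the same function on age, each pair of matching partitions is joined locally, and the local results are unioned. Rows are (id, age) tuples; the data and function names are assumptions made for this illustration, and the k local joins run one after another here although each could run on its own processor.

```python
from collections import defaultdict


def partition_on_age(rows, k):
    """Assign each (id, age) row to partition hash(age) mod k."""
    parts = [[] for _ in range(k)]
    for row in rows:
        parts[hash(row[1]) % k].append(row)
    return parts


def local_join(a_rows, b_rows):
    """Hash join of two partitions on age; returns (a_id, b_id) pairs."""
    by_age = defaultdict(list)
    for a_id, age in a_rows:
        by_age[age].append(a_id)
    return [(a_id, b_id) for b_id, age in b_rows for a_id in by_age.get(age, [])]


def partitioned_join(a, b, k):
    """Union of the k smaller joins over matching partitions."""
    a_parts, b_parts = partition_on_age(a, k), partition_on_age(b, k)
    result = []
    for i in range(k):
        result.extend(local_join(a_parts[i], b_parts[i]))
    return result


A = [("a1", 30), ("a2", 25), ("a3", 30)]  # made-up rows
B = [("b1", 30), ("b2", 40), ("b3", 25)]

joined = sorted(partitioned_join(A, B, k=3))
reference = sorted((a_id, b_id) for a_id, a_age in A for b_id, b_age in B if a_age == b_age)
print(joined)
```

Because equal ages always hash to the same partition number, a matching pair can never end up in different partitions, so the union loses nothing compared with joining A and B whole.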
DISTRIBUTED DATABASES
A Distributed Database should exhibit the following
properties:
1) Distributed Data Independence: The user
should be able to access the database without
needing to know the location of the data.
2) Distributed Transaction Atomicity: A transaction
that reads or updates data at several sites should
remain atomic: either all of its changes take
effect at every site, or none do.
Types of Distributed Databases are:-
a) Homogeneous Distributed Database is where the data
stored across multiple sites is managed by same DBMS
software at all the sites.
b) Heterogeneous Distributed Database is where
multiple sites which may be autonomous are under the
control of different DBMS software.
There are 3 architectures:
1. Client-Server
2. Collaborating Server
3. Middleware
Architecture of DDBs
Client-Server:
A Client-Server system has one or more client processes and
one or more server processes, and a client process can send a
query to any one server process. Clients are responsible for
user-interface issues, and servers manage data and execute
transactions.
Thus, a client process could run on a personal computer and
send queries to a server running on a mainframe.
In the client-server architecture a single query cannot be split
and executed across multiple servers, because the client
process would have to be complex enough to break a query
into subqueries, have them executed at different sites, and
put their results together. The client's capabilities would
then overlap with the server's, making it hard to distinguish
between the client and the server.
What is a distributed database?
Why distribute a database
 Scalability and performance
 Resilience to failures
[Figure labels: Throughput, Datasize]
Why distribute a database
 Data is already distributed
 Or needs to be distributed
 Data is in multiple systems
Why not distribute a database
You must earn your complexity!
 Communication needed
 Must build a complex infrastructure
 Unpredictable latencies must be masked
 More types of failures
 More components to fail
 Network failures
 Congestion, timeouts
 More complex planning
 Communication cost plus I/O cost
 May have to deal with heterogeneity
 Different types of systems
 Different schemas, possibly incompatible
 Different administrative domains
Client-Server:
Advantages: -
1. Simple to implement because of the centralized server
and separation of functionality.
2. Expensive server machines are not tied up with
simple user interactions, which are pushed on to
inexpensive client machines.
3. The users can have a familiar and friendly client side
user interface rather than unfamiliar and unfriendly
server interface
Collaborating Server
In a Collaborating Server system, we have a collection
of database servers, each capable of running
transactions against local data, which cooperatively
execute transactions spanning multiple servers.
When a server receives a query that requires access to data
at other servers, it generates appropriate sub queries to be
executed by other servers and puts the results together to
compute answers to the original query.
Middleware
A Middleware system is a special server, a layer of software
that coordinates the execution of queries and transactions
across one or more independent database servers.
The Middleware architecture is designed to allow a single
query to span multiple servers, without requiring all
database servers to be capable of managing such multi site
execution strategies. It is especially attractive when trying
to integrate several legacy systems, whose basic capabilities
cannot be extended.
We need just one database server that is capable of managing
queries and transactions spanning multiple servers; the
remaining servers only need to handle local queries and
transactions.
STORING DATA IN DDBS
Data storage involves 2 concepts
1. Fragmentation
2. Replication
Fragmentation:
It is the process in which a relation is broken into smaller
relations called fragments and possibly stored at different
sites. It is of 2 types
1. Horizontal Fragmentation where the original relation is
broken into a number of fragments, where each fragment
is a subset of rows. The union of the horizontal fragments
should reproduce the original relation.
2. Vertical Fragmentation where the original relation is
broken into a number of fragments, where each fragment
consists of a subset of columns. The system often assigns
a unique tuple id to each tuple in the original relation so
that the fragments, when joined again, form a
lossless join. The collection of all vertical fragments
should reproduce the original relation.
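Both kinds of fragmentation, and the way each reproduces the original relation, can be sketched on a small in-memory relation. The rows, the tid column name, and the function names are hypothetical and chosen only for this illustration.

```python
# Hypothetical relation; `tid` plays the role of the system-assigned tuple id.
relation = [
    {"tid": 1, "name": "Asha", "city": "Chennai", "salary": 50},
    {"tid": 2, "name": "Ben", "city": "Delhi", "salary": 40},
    {"tid": 3, "name": "Chitra", "city": "Chennai", "salary": 60},
]


def horizontal_fragments(rows, predicate):
    """Split rows into those that satisfy the predicate and those that do not."""
    return ([r for r in rows if predicate(r)],
            [r for r in rows if not predicate(r)])


def vertical_fragment(rows, columns):
    """Keep only `columns`, plus the tuple id needed to rejoin later."""
    return [{"tid": r["tid"], **{c: r[c] for c in columns}} for r in rows]


def rejoin(left, right):
    """Join two vertical fragments on the tuple id."""
    right_by_tid = {r["tid"]: r for r in right}
    return [{**l, **right_by_tid[l["tid"]]} for l in left]


# Horizontal: the union of the fragments gives back the relation.
chennai, elsewhere = horizontal_fragments(relation, lambda r: r["city"] == "Chennai")
union = sorted(chennai + elsewhere, key=lambda r: r["tid"])

# Vertical: joining the fragments on tid gives back the relation.
personal = vertical_fragment(relation, ["name", "city"])
payroll = vertical_fragment(relation, ["salary"])
joined = rejoin(personal, payroll)

print(union == relation, joined == relation)
```

Without the shared tid column the two vertical fragments could not be matched up row by row, which is why the tuple id is carried in every fragment.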
Replication:
Replication occurs when we store more than one copy of a
relation or its fragment at multiple sites.
Advantages:-
1. Increased availability of data: If a site that contains a replica
goes down, we can find the same data at other sites.
Similarly, if local copies of remote relations are available,
we are less vulnerable to failure of communication links.
2. Faster query evaluation: Queries can execute faster by
using a local copy of a relation instead of going to a remote
site.
[Figure: Client-server (user interaction and data processing joined by a network); Parallel database; Primary/secondary; Multidatabase]
How do they work?
 What is shared?
 How to distribute the data?
 How to process the data?
 How to update the data?
How to distribute the data?
Rows are spread across Server 1, Server 2, Server 3, and Server 4:
Bike     $86    6/2/07   636353
Chair    $10    6/5/07   662113
Couch    $570   6/1/07   424252
Car      $1123  6/1/07   256623
Lamp     $19    6/7/07   121113
Bike     $56    6/9/07   887734
Scooter  $18    6/11/07  252111
Hammer   $8000  6/11/07  116458
How to distribute the data?
Hash partitioning: (key,value) → Hash() → server
Range partitioning: (key,value) → <= X or > X → server
How to distribute the data?
Server 1  Server 2  Server 3  Server 4
Bike      $86       6/2/07    636353
Chair     $10       6/5/07    662113
Couch     $570      6/1/07    424252
Car       $1123     6/1/07    256623
Lamp      $19       6/7/07    121113
Bike      $56       6/9/07    887734
Scooter   $18       6/11/07   252111
Hammer    $8000     6/11/07   116458
Query processing
 Intra-operator parallelism
 Inter-operator parallelism
[Figure: Parallel scanning, with several filter operators feeding one Result]
[Figure: Sorting]
[Figure: Parallel hash join, with Hash() feeding Join]
[Figure: Semi-join]
[Figure: Inter-operator parallelism]
Updating distributed data
 Synchronous: read-any-write-all
 Reads are fast
 Synchronous: voting
 Writes tolerant to disconnection
Consistency of distributed data
 Should provide ACID
Primary/secondary
Two-phase commit
 PREPARE; replies PREPARED, PREPARED; then COMMIT
 PREPARE; replies PREPARED, ABORT; then ABORT
 PREPARE; reply PREPARED only; then ABORT
 PREPARE; replies PREPARED, PREPARED; then a failure (X)
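The decision rule behind the first three message exchanges can be sketched as below. The terms coordinator and participant, the function name, and the use of None for a reply that never arrives are assumptions made for this illustration; the sketch does not model the last case, where a failure happens after every participant has replied PREPARED.

```python
def two_phase_commit(votes):
    """Decide the outcome from the replies to PREPARE.

    `votes` maps a participant name to "PREPARED", "ABORT", or None,
    where None models a reply that never arrived before a timeout.
    Returns the decision and the message sent to each reachable participant.
    """
    # Phase 1: the coordinator has sent PREPARE and collected the replies.
    all_prepared = all(reply == "PREPARED" for reply in votes.values())
    # Phase 2: commit only if every participant replied PREPARED.
    decision = "COMMIT" if all_prepared else "ABORT"
    # The decision goes to every participant that replied at all.
    messages = {name: decision for name, reply in votes.items() if reply is not None}
    return decision, messages


case_commit = two_phase_commit({"s1": "PREPARED", "s2": "PREPARED"})
case_vote_abort = two_phase_commit({"s1": "PREPARED", "s2": "ABORT"})
case_no_reply = two_phase_commit({"s1": "PREPARED", "s2": None})
print(case_commit[0], case_vote_abort[0], case_no_reply[0])
```

A single ABORT vote or a single missing reply is enough to abort everywhere, which is what keeps the sites consistent with one another.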
Conclusion
 Parallelism and distribution very useful
 Performance
 Fault tolerance
 Scale
 But complex!
 Rethink lots of aspects of the system
 Must earn the complexity

More Related Content

What's hot

Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 
1. Introduction to DBMS
1. Introduction to DBMS1. Introduction to DBMS
1. Introduction to DBMSkoolkampus
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraintsmadhav bansal
 
data modeling and models
data modeling and modelsdata modeling and models
data modeling and modelssabah N
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity ConstraintsMegha yadav
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Jargalsaikhan Alyeksandr
 
All data models in dbms
All data models in dbmsAll data models in dbms
All data models in dbmsNaresh Kumar
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architecturesPooja Dixit
 
Dbms classification according to data models
Dbms classification according to data modelsDbms classification according to data models
Dbms classification according to data modelsABDUL KHALIQ
 
7. Relational Database Design in DBMS
7. Relational Database Design in DBMS7. Relational Database Design in DBMS
7. Relational Database Design in DBMSkoolkampus
 
Database Normalization
Database NormalizationDatabase Normalization
Database NormalizationArun Sharma
 
Types Of Keys in DBMS
Types Of Keys in DBMSTypes Of Keys in DBMS
Types Of Keys in DBMSPadamNepal1
 

What's hot (20)

Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
1. Introduction to DBMS
1. Introduction to DBMS1. Introduction to DBMS
1. Introduction to DBMS
 
Relational model
Relational modelRelational model
Relational model
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
 
data modeling and models
data modeling and modelsdata modeling and models
data modeling and models
 
Integrity Constraints
Integrity ConstraintsIntegrity Constraints
Integrity Constraints
 
Dbms architecture
Dbms architectureDbms architecture
Dbms architecture
 
Data models
Data modelsData models
Data models
 
Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)Database design & Normalization (1NF, 2NF, 3NF)
Database design & Normalization (1NF, 2NF, 3NF)
 
All data models in dbms
All data models in dbmsAll data models in dbms
All data models in dbms
 
Entity relationship modelling
Entity relationship modellingEntity relationship modelling
Entity relationship modelling
 
Distributed dbms architectures
Distributed dbms architecturesDistributed dbms architectures
Distributed dbms architectures
 
Dbms classification according to data models
Dbms classification according to data modelsDbms classification according to data models
Dbms classification according to data models
 
Normalization in DBMS
Normalization in DBMSNormalization in DBMS
Normalization in DBMS
 
ER-Model-ER Diagram
ER-Model-ER DiagramER-Model-ER Diagram
ER-Model-ER Diagram
 
7. Relational Database Design in DBMS
7. Relational Database Design in DBMS7. Relational Database Design in DBMS
7. Relational Database Design in DBMS
 
ER Model in DBMS
ER Model in DBMSER Model in DBMS
ER Model in DBMS
 
Data Models
Data ModelsData Models
Data Models
 
Database Normalization
Database NormalizationDatabase Normalization
Database Normalization
 
Types Of Keys in DBMS
Types Of Keys in DBMSTypes Of Keys in DBMS
Types Of Keys in DBMS
 

Similar to Relational Database Design

AB Database Assignment 1 –FOR STUDENTS TO COMPLETEFirst create .docx
AB Database Assignment 1 –FOR STUDENTS TO COMPLETEFirst create .docxAB Database Assignment 1 –FOR STUDENTS TO COMPLETEFirst create .docx
AB Database Assignment 1 –FOR STUDENTS TO COMPLETEFirst create .docxbartholomeocoombs
 
Database Design Process
Database Design ProcessDatabase Design Process
Database Design Processmussawir20
 
Dependencies in various topics like normalisation and its types
Dependencies in various topics like normalisation and its typesDependencies in various topics like normalisation and its types
Dependencies in various topics like normalisation and its typesnsrChowdary1
 
Understanding about relational database m-square systems inc
Understanding about relational database m-square systems incUnderstanding about relational database m-square systems inc
Understanding about relational database m-square systems incMuthu Natarajan
 
Data resource management
Data resource managementData resource management
Data resource managementNirajan Silwal
 
Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016Relational Theory for Budding Einsteins -- LonestarPHP 2016
Relational Theory for Budding Einsteins -- LonestarPHP 2016Dave Stokes
 
ICT NOTESmbjbujbhbuhuhhipv;ihsjhis 7.pptx
ICT NOTESmbjbujbhbuhuhhipv;ihsjhis 7.pptxICT NOTESmbjbujbhbuhuhhipv;ihsjhis 7.pptx
ICT NOTESmbjbujbhbuhuhhipv;ihsjhis 7.pptxAmanda783100
 
COMPUTERS Database
COMPUTERS Database COMPUTERS Database
COMPUTERS Database Rc Os
 

Relational Database Design

  • 1. Dr. B . RAGHU Professor & Dean Department of Computer Science and Engineering Sri Ramanujar Engineering College
  • 2. What’s a database?  A collection of logically-related information stored in a consistent fashion  Phone book  Bank records (checking statements, etc)  Library card catalog  Soccer team roster  The storage format typically appears to users as some kind of tabular list (table, spreadsheet)
  • 3. What Does a Database Do?  Stores information in a highly organized manner  Manipulates information in various ways, some of which are not available in other applications or are easier to accomplish with a database  Models some real world process or activity through electronic means  Often called modeling a business process  Often replicates the process only in appearance or end result
  • 4. Databases and the Systems which manage them: Modern electronic databases are created and managed by means of an RDBMS (Relational Database Management System). An individual data storage structure created with an RDBMS is typically called a “database”. A database and its attendant views, reports, and procedures are called an “application”.
  • 5. Database Applications  Database (the actual DB with its attendant storage structure)  SQL Engine - interprets between the database and the interface/application  Interface or application – the part the user gets to see and use
  • 6. Relational Database Management Systems  Low-end, proprietary, specific purpose  Email: Outlook, Eudora, Mulberry  Bibliographic: Ref. Mgr., EndNote, ProCite  Mid-level  Microsoft Access, Lotus Approach, Borland’s Paradox  More or less total control of design allows custom builds  High-end  Oracle, Microsoft SQL Server, Sybase, IBM DB2  Professional level DBs: Banks, e-commerce, secure  Amazon.com, Ebay.com, Yahoo.com
  • 7. Problems with Bad Design  Early computers were slow and had limited storage capacity  Redundant or repeating data slowed operations and took up too much precious storage space  Poor design increased chance of data errors, lost or orphaned information
  • 8. Benefits of Good Design  Computers today are faster and possess much larger storage devices  Rigid structure of modern relational databases helped codify problems and solutions  Design problems are still possible, because the DBMS software won’t protect you from poor practices  Good design still increases efficiency of data processes, reduces waste of storage, and helps eliminate data entry errors
  • 9. The Design Process 1) Identify the purpose of the database 2) Review existing data 3) Make a preliminary list of fields 4) Make a preliminary list of tables and enter fields 5) Identify the key fields 6) Draft the table relationships 7) Enter sample data and normalize the data/tables 8) Review and finalize the design
  • 10. Database Modeling  Refers to various, more-or-less formal methods for designing a database  Some provide precision steps and tools  Ex.: Entity-Relationship (E-R) Modeling  Widely used, especially by high-end database designers who can’t afford to miss things  Fairly complex process  Extremely precise
  • 11. 1. Identify purpose of the DB Clients can tell you what information they want but have no idea what data they need.  “We need to keep track of inventory”  “We need an order entry system”  “I need monthly sales reports”  “We need to provide our product catalog on the Web” Be sure to Limit the Scope of the database.
  • 12. 2. Review Existing Data  Electronic  Legacy database(s)  Spreadsheets  Web forms  Manual  Paper forms  Receipts and other printed output
  • 13. 3. Make Preliminary Field List  Make sure fields exist to support needs  Ex. if client wants monthly sales reports, you need a date field for orders.  Ex. To group employees by division, you need a division identifier  Make sure values are atomic  Ex. First and Last names stored separately  Ex. Addresses broken down to Street, City, State, etc.  Do not store values that can be calculated from other values  Ex. “Age” can be calculated from “Date of Birth”
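The rule against storing calculated values can be illustrated with the slide's own "Age" example. The following is a minimal sketch, assuming a hypothetical helper named `age_on` that derives the age from a stored date of birth at query time; the function name and sample dates are illustrative only.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Return the number of completed years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Only the date of birth is stored; the age is derived when needed.
print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # 33
print(age_on(date(1990, 6, 15), date(2024, 6, 15)))  # 34
```

Because the age is computed on demand, it never goes stale the way a stored "Age" field would.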
  • 14. 4. Make Preliminary Tables (and insert the fields into them)  Each table holds info about one subject  Don’t worry about the quantity of tables  Look for logical groupings of information  Use a consistent naming convention
  • 15. Naming Conventions: Rules of thumb. Table names must be unique in the DB and should be plural. Field names must be unique in the table(s). Clearly identify table subject or field data. Be as brief as possible. Avoid abbreviations and acronyms. Use fewer than 30 characters. Use letters, numbers, underscores (_). Do not use spaces or other special characters.
  • 16. 5. Identify the Key Fields  Primary Key(s)  Can never be Null; must hold unique values  Automatically indexed in most RDBMSs  Values rarely (if ever) change  Try to include as few fields as possible  Multi-field Primary Key  Combination of two or more fields that uniquely identify an individual record  Candidate Key  Field or fields that qualify as a primary key  Important in Third and Boyce-Codd Normal Forms
  • 17. 6. Identify Table Relationships Based on business rules being modeled Examples:  “each customer can place many orders”  “all employees belong to a department”  “each TA is assigned to one course”
  • 18. Relationship Terminology  Relationship Type  One-to-one: expressed as 1:1  One-to-Many: expressed as 1:N or 1:M or 1:∞  Many-to-Many: expressed as N:N or M:M  Primary or Parent Table  Table on the left side of 1:N relationship  Related or Child Table  Table on the right side of 1:N relationship  Relational Schema  Diagram of table relationships in database
  • 19. Relationship Terminology (cont’d)  Join  Definition of how related records are returned  Join Line  Visual relationship indicators in schema  Key fields  Primary Key: the linking field on the one side of a 1:N relationship  Foreign Key: the primary key from one table that is added to another table so the records can be related  Non-Key Fields: any field that is not part of a primary key, multi-field primary key, or foreign key
  • 20. One-to-One (1:1)  Each record in Table A relates to one, and only one, record in Table B, and vice versa.  Either table can be considered the Primary, or Parent Table  Can usually be combined into one table, although may not be most efficient design
  • 21. One-to-Many (1:N)  Each record in Table A may relate to zero, one or many records in Table B, but each record in Table B relates to only one record in Table A.  The potential relationship is what’s important: there might be no related records, or only one, but there could be many.  The table on the One (or left) side of a 1:N relationship is considered the Primary Table.
  • 22. Many-to-Many (N:N)  A record in Table A can relate to many records in Table B, and a record in Table B can relate to many records in Table A.  Most RDBMSs do not support N:N relationships, requiring the use of a linking (or intersection or bridge) table that breaks the N:N relationship down into two 1:N relationships with the linking table being on the Many side of both new relationships.
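The linking-table idea can be sketched with Python's built-in `sqlite3` module. The table and column names below (`students`, `courses`, `enrollments`) are assumptions chosen for illustration and do not come from the slides; the point is only that the linking table sits on the Many side of two 1:N relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Linking table: the Many side of students 1:N enrollments and courses 1:N enrollments.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    course_id  INTEGER NOT NULL REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)  -- multi-field primary key
);
INSERT INTO students VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO courses  VALUES (10, 'Databases'), (20, 'Networks');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# Resolve the N:N relationship by joining through the linking table.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    JOIN courses  c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)
```

Each row of `enrollments` pairs one student with one course, so a student with many courses and a course with many students are both represented without repeating groups.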
  • 23. Relational Schema (diagram): Table 1 (Field1_1, Field1_2, Field1_3, Field1_4) is related 1:N to Table 2 (Field2_1, Field1_1, Field2_2, Field2_3) through the shared field Field1_1.
  • 24. Normalization  Normal Forms (NF): design standards based on database design theory  Normalization is the process of applying the NFs to table design to eliminate redundancy and create a more efficient organization of DB storage.  Each successive NF applies an increasingly stringent set of rules
  • 25. Introduction to Normalization  Normalization: Process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations  Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form  2NF, 3NF, BCNF based on keys and FDs of a relation schema  4NF based on keys, multi-valued dependencies
  • 26. First Normal Form (1NF)  A table is in first normal form if there are no repeating groups.  Repeating Groups : a set of logically related fields or values that occur multiple times in one record  1: non-atomic value, or multiple values, stored in a field  2: multiple fields in the same table that hold logically similar values
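As a rough illustration of the second kind of repeating group (multiple fields holding logically similar values), the sketch below splits hypothetical `phone1`/`phone2`/`phone3` columns into a separate one-row-per-phone structure. The record layout and the function name `to_first_normal_form` are assumptions made for this example.

```python
# Hypothetical un-normalized records: phone1..phone3 form a repeating group.
contacts = [
    {"contact_id": 1, "name": "Asha", "phone1": "555-0100", "phone2": "555-0101", "phone3": None},
    {"contact_id": 2, "name": "Ben", "phone1": "555-0200", "phone2": None, "phone3": None},
]


def to_first_normal_form(records):
    """Split the repeating phone columns into one row per phone number."""
    people, phones = [], []
    for rec in records:
        people.append({"contact_id": rec["contact_id"], "name": rec["name"]})
        for col in ("phone1", "phone2", "phone3"):
            if rec[col] is not None:
                # contact_id acts as the foreign key back to the people table.
                phones.append({"contact_id": rec["contact_id"], "phone": rec[col]})
    return people, phones


people, phones = to_first_normal_form(contacts)
print(people)
print(phones)
```

After the split, a contact can have any number of phone numbers, and no field is left empty just because a contact has fewer than three.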
  • 27. Second Normal Form (2NF)  A table is in 2NF if it is in 1NF and each non-key field is functionally dependent on the entire primary key.  Functional dependency: a relationship between fields such that the value in one field determines the one value that can be contained in the other field.  Determinant: a field in which the value determines the value in another field. Example Airport – City Dulles – Washington, DC
  • 28. Third Normal Form (3NF)  A table is in 3NF when it is in 2NF and there are no transitive dependencies.  Transitive Dependency: a type of functional dependency in which the value of a non-key field is determined by the value in another non-key field and that field is not a candidate key.
  • 29.
  • 30.
  • 31. Boyce-Codd Normal Form (BCNF)  A table is in BCNF when it is in 3NF and all determinants are candidate keys.  Developed to cover situations that 3NF did not address.  Applies to situations where you have overlapping candidate keys.
  • 32.
  • 33.
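  • 34. BCNF: The FDs are {Student, Course} → Instructor and Instructor → Course. Three possible decompositions into 2 schemas: (1) {Student, Instructor} and {Student, Course}; (2) {Course, Instructor} and {Student, Course}; (3) {Course, Instructor} and {Instructor, Student}. Only the third is a lossless decomposition, because the shared attribute Instructor is a key of {Course, Instructor}.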
  • 35. Example: Given the relation Book(Book_title, Authorname, Book_type, Listprice, Author_affil, Publisher), the FDs are: Book_title → Publisher, Book_type; Book_type → Listprice; Authorname → Author_affil.
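One way to analyse the Book example is to compute attribute closures under the three FDs. The sketch below is an assumption about how one might do this in code, not something taken from the slides; `closure` is a hypothetical helper implementing the standard attribute-closure loop.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute on the left, we learn the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


BOOK = {"Book_title", "Authorname", "Book_type", "Listprice", "Author_affil", "Publisher"}
FDS = [
    (frozenset({"Book_title"}), frozenset({"Publisher", "Book_type"})),
    (frozenset({"Book_type"}), frozenset({"Listprice"})),
    (frozenset({"Authorname"}), frozenset({"Author_affil"})),
]

title_closure = closure({"Book_title"}, FDS)
key_closure = closure({"Book_title", "Authorname"}, FDS)
print(sorted(title_closure))
print(key_closure == BOOK)  # True: {Book_title, Authorname} determines every attribute
```

Neither Book_title nor Authorname appears on the right-hand side of any FD, so both must belong to every key, and {Book_title, Authorname} is the only candidate key. Publisher, Book_type, and Author_affil then depend on only part of that key, which violates 2NF, and Book_type → Listprice is a transitive dependency of the kind 3NF rules out.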
  • 36. Fourth Normal Form (4NF)  A table is in 4NF when it is in BCNF and there are no multi-valued dependencies.  Multi-valued Dependency: occurs when, for each value in field A, there is a set of values for field B and a set of values for field C, but B and C are not related.  Occurs when the table contains fields that are not logically related.
  • 37. Fifth Normal Form (5NF)  A table is in 5NF when it is in 4NF and there are no cyclic dependencies.  Cyclic Dependency: occurs when there is a multi-field primary key with three or more fields (ex. A, B, C) and those fields are related in pairs AB, BC and AC.  Can occur only with a multi-field primary key of three or more fields
  • 38. 8. Finalizing the Design  Double-check to ensure good, principle-based design  Evaluate design in light of business model and determine desired deviations from design principles  Process efficiency  Security concerns
  • 39. Functional Dependencies  Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs  FDs and keys are used to define normal forms for relations  FDs are constraints that are derived from the meaning and interrelationships of the data attributes
  • 40. Definition A functional dependency is defined as a constraint between two sets of attributes in a relation from a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, (written X → Y) if and only if each X value is associated with at most one Y value.
  • 41. Functional Dependencies (2): A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y. X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y: if t1[X] = t2[X], then t1[Y] = t2[Y] in any relation instance r(R). X → Y in R specifies a constraint on all relation instances r(R). FDs are derived from the real-world constraints on the attributes.
  • 42. Examples of FD constraints: Social Security Number determines employee name: SSN → ENAME. Project Number determines project name and location: PNUMBER → {PNAME, PLOCATION}. Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS.
  • 43. Functional Dependencies (3)  An FD is a property of the attributes in the schema R  The constraint must hold on every relation instance r(R)  If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])
  • 44. Inference Rules for FDs: Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules: A1. (Reflexive) If Y ⊆ X, then X → Y. A2. (Augmentation) If X → Y, then XZ → YZ (Notation: XZ stands for X ∪ Z). A3. (Transitive) If X → Y and Y → Z, then X → Z. A1, A2, A3 form a sound and complete set of inference rules.
  • 45. Additional Useful Inference Rules: Decomposition: If X → YZ, then X → Y and X → Z. Union: If X → Y and X → Z, then X → YZ. Pseudotransitivity: If X → Y and WY → Z, then WX → Z. The closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
  • 46. Functional Dependencies  Constraints on the set of legal relations.  Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.  A functional dependency is a generalization of the notion of a key.
  • 47. Functional Dependencies (Cont.): Let R be a relation schema, α ⊆ R and β ⊆ R. The functional dependency α → β holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] ⇒ t1[β] = t2[β]. Example: Consider r(A,B) with the following instance of r: (1, 4), (1, 5), (3, 7). On this instance, A → B does NOT hold, but B → A does hold.
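The definition can be checked mechanically against a relation instance. The following sketch uses a hypothetical helper `fd_holds` (not from the slides) and the three-tuple instance of r(A, B) given in the example: (1, 4), (1, 5), (3, 7).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by rows, a list of dicts."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: every B value appears with a single A value
```

A check like this only shows that one instance satisfies a dependency; whether the dependency holds on the schema is a statement about every legal instance and comes from the real-world constraints.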
  • 48. Functional Dependencies (Cont.): K is a superkey for relation schema R if and only if K → R. K is a candidate key for R if and only if K → R, and for no α ⊂ K, α → R. Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). We expect this set of functional dependencies to hold: loan-number → amount and loan-number → branch-name, but would not expect the following to hold: loan-number → customer-name.
  • 49. Use of Functional Dependencies: We use functional dependencies to: (1) test relations to see if they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. (2) specify constraints on the set of legal relations. We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F. Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loan-schema may, by chance, satisfy loan-number → customer-name.
  • 50. Functional Dependencies (Cont.): A functional dependency is trivial if it is satisfied by all instances of a relation. E.g. customer-name, loan-number → customer-name and customer-name → customer-name. In general, α → β is trivial if β ⊆ α.
  • 51. PARALLEL AND DISTRIBUTED DATABASES  A Parallel database system is one that seeks to improve performance through parallel implementation of various operations such as loading data, building indexes, and evaluating queries.  In a Distributed database system, data is physically stored across several sites, and each site is typically managed by a DBMS that is capable of running independently of the other sites.
  • 52. ARCHITECTURES FOR PARALLEL DATABASES Three main architectures are proposed for building parallel databases: 1. Shared - memory system, where multiple CPUs are attached to an interconnection network and can access a common region of main memory. 2. Shared - disk system, where each CPU has a private memory and direct access to all disks through an interconnection network. 3. Shared - nothing system, where each CPU has local main memory and disk space, but no two CPUs can access the same storage area; all communication between CPUs is through a network connection.
  • 53. Cont’d Shared - memory system, where multiple CPUs are attached to an interconnection network and can access a common region of main memory.
  • 54. Cont’d Shared – disk system, where each CPU has a private memory and direct access to all disks through an interconnection network.
  • 55. Cont’d Shared – nothing system, where each CPU has local main memory and disk space, but no two CPUs can access the same storage area; all communication between CPUs is through a network connection.
  • 56. Cont’d  Scaling the system is an issue with shared memory and shared disk architectures because as more CPUs are added, existing CPUs are slowed down because of the increased contention for memory accesses and network bandwidth.
  • 57. Cont’d The Shared Nothing Architecture has shown: a) Linear Speed Up: the time taken to execute operations decreases in proportion to the increase in the number of CPUs and disks b) Linear Scale Up: the performance is sustained if the number of CPUs and disks is increased in proportion to the amount of data.
  • 58. PARALLEL QUERY EVALUATION  Parallel evaluation of a relational query in a DBMS with a shared-nothing architecture is discussed. Parallel execution of a single query has been emphasized.  A relational query execution plan is a graph of relational algebra operators and the operators in a graph can be executed in parallel. If an operator consumes the output of a second operator, we have pipelined parallelism.  Each individual operator can also be executed in parallel by partitioning the input data and then working on each partition in parallel and then combining the result of each partition. This approach is called Data Partitioned parallel Evaluation.
  • 59. Data Partitioning: Large datasets are partitioned horizontally across several disks, which lets us exploit the I/O bandwidth of the disks by reading and writing them in parallel. This can be done in the following ways: a. Round Robin Partitioning b. Hash Partitioning c. Range Partitioning
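The three partitioning schemes can be sketched in a few lines. This is an illustration under assumptions: Python lists stand in for disks, the tuple layout `(employee_id, salary)` and all function names are hypothetical, and Python's built-in `hash` is used as the partitioning function (it is stable for small integers, but randomized per process for strings).

```python
from bisect import bisect_left


def round_robin_partition(tuples, n):
    """Send the i-th tuple to partition i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """Send each tuple to the partition chosen by hashing its key."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts


def range_partition(tuples, boundaries, key):
    """Partition i holds keys up to boundaries[i]; the last partition holds the rest."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect_left(boundaries, key(t))].append(t)
    return parts


# Hypothetical (employee_id, salary) tuples.
employees = [(1, 15), (2, 35), (3, 55), (4, 20), (5, 41), (6, 8)]
rr = round_robin_partition(employees, 3)
hp = hash_partition(employees, 3, key=lambda e: e[0])
rp = range_partition(employees, [20, 40], key=lambda e: e[1])
print(rr)
print(hp)
print(rp)
```

Round robin spreads tuples evenly regardless of their values, hash partitioning places all tuples with equal keys in the same partition, and range partitioning keeps tuples with nearby keys together.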
  • 60. Parallelizing Sequential Operator Evaluation Code: Input data streams are divided into parallel data streams. The outputs of these streams are merged as needed to provide the inputs for a relational operator, and the output may again be split as needed to parallelize subsequent processing.
  • 61. PARALLELIZING INDIVIDUAL OPERATIONS Bulk Loading and Scanning: Pages can be read in parallel while scanning a relation and the retrieved tuples can then be merged, if the relation is partitioned across several disks. If a relation has associated indexes, any sorting of data entries required for building the indexes during bulk loading can also be done in parallel.
  • 62. PARALLELIZING INDIVIDUAL OPERATIONS: Sorting. Sorting can be done by redistributing all tuples in the relation using range partitioning. Ex.: sorting a collection of employee tuples by salary. With N processors, each processor receives the tuples whose salary lies in the range assigned to it; for example, processor 1 holds all tuples in the range 10 to 20, and so on. Each processor sorts its tuples locally, and the sorted result is obtained by visiting the processors in the order of their assigned ranges and collecting the tuples.
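A small sketch of sorting by range partitioning follows. It assumes threads from `concurrent.futures.ThreadPoolExecutor` as stand-ins for the N processors, and the function name `parallel_range_sort`, the salary values, and the range boundaries are all hypothetical.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def parallel_range_sort(salaries, boundaries):
    """Sort by range-partitioning the values, sorting each range, and concatenating."""
    # Redistribute: partition i receives the values in its assigned range.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for s in salaries:
        parts[bisect_left(boundaries, s)].append(s)
    # Each worker sorts its own partition; map() returns results in partition order.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        sorted_parts = list(pool.map(sorted, parts))
    # Visiting partitions in range order yields a globally sorted sequence.
    return [s for part in sorted_parts for s in part]


result = parallel_range_sort([55, 12, 38, 20, 41, 8, 27], boundaries=[20, 40])
print(result)  # [8, 12, 20, 27, 38, 41, 55]
```

No merge step is needed at the end because every value in one range is smaller than every value in the next range.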
  • 63. PARALLELIZING INDIVIDUAL OPERATIONS: Joins. Consider two relations A and B to be joined on the age attribute. A and B are initially distributed across several disks in a way that is not useful for the join operation, so the join is decomposed into a collection of k smaller joins by partitioning both A and B into k logical partitions. If the same partitioning function is used for both A and B, then the union of the k smaller joins computes the join of A and B.
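The partitioned join can be sketched as follows. The relations, the field names other than `age`, and the helper names are assumptions for illustration; the k smaller joins run one after another here, whereas in a parallel DBMS each would run on its own processor.

```python
def partition(rows, k, key):
    """Assign each row to one of k partitions by hashing its join attribute."""
    parts = [[] for _ in range(k)]
    for row in rows:
        parts[hash(row[key]) % k].append(row)
    return parts


def local_join(a_rows, b_rows, key):
    """Join two partitions with a simple in-memory hash join."""
    index = {}
    for b in b_rows:
        index.setdefault(b[key], []).append(b)
    return [(a, b) for a in a_rows for b in index.get(a[key], [])]


def partitioned_join(a_rows, b_rows, key, k):
    """Union of k smaller joins; both inputs use the same partitioning function."""
    a_parts = partition(a_rows, k, key)
    b_parts = partition(b_rows, k, key)
    result = []
    for a_part, b_part in zip(a_parts, b_parts):
        result.extend(local_join(a_part, b_part, key))
    return result


# Hypothetical relations A and B, joined on the age attribute.
A = [{"name": "Asha", "age": 30}, {"name": "Ben", "age": 41}, {"name": "Chen", "age": 30}]
B = [{"plan": "basic", "age": 30}, {"plan": "senior", "age": 65}, {"plan": "plus", "age": 41}]
joined = partitioned_join(A, B, "age", k=3)
pairs = {(a["name"], b["plan"]) for a, b in joined}
print(sorted(pairs))
```

Because matching tuples have equal age values, the shared partitioning function sends them to the same partition index, so no matches are lost by joining partition i of A only with partition i of B.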
  • 64. DISTRIBUTED DATABASES A Distributed Database should exhibit the following properties: 1) Distributed Data Independence: The user should be able to access the database without needing to know the location of the data. 2) Distributed Transaction Atomicity: A transaction that reads or updates data at several sites should remain atomic: either its changes take effect at every site involved or at none of them.
  • 65. Types of Distributed Databases are:- a) Homogeneous Distributed Database is where the data stored across multiple sites is managed by same DBMS software at all the sites. b) Heterogeneous Distributed Database is where multiple sites which may be autonomous are under the control of different DBMS software.
  • 66. Architecture of DDBs: There are 3 architectures: Client-Server, Collaborating Server, and Middleware.
  • 67. Client-Server: A Client-Server system has one or more client processes and one or more server processes, and a client process can send a query to any one server process. Clients are responsible for user-interface issues, and servers manage data and execute transactions. Thus, a client process could run on a personal computer and send queries to a server running on a mainframe. In the client-server architecture, a single query cannot be split and executed across multiple servers: the client process would have to be complex enough to break a query into subqueries, have them executed at different sites, and then combine their results. The client's capabilities would then overlap with the server's, making it hard to distinguish between the client and the server.
  • 68. What is a distributed database?
  • 69. Why distribute a database  Scalability and performance  Resilience to failures [Figure: throughput versus data size]
  • 70. Why distribute a database  Data is already distributed  Or needs to be distributed  Data is in multiple systems
  • 71. Why not distribute a database You must earn your complexity!  Communication needed  Must build a complex infrastructure  Unpredictable latencies must be masked  More types of failures  More components to fail  Network failures  Congestion, timeouts  More complex planning  Communication cost plus I/O cost  May have to deal with heterogeneity  Different types of systems  Different schemas, possibly incompatible  Different administrative domains
  • 72. Client-Server: Advantages: - 1. Simple to implement because of the centralized server and separation of functionality. 2. Expensive server machines are not tied up with simple user interactions, which are pushed on to inexpensive client machines. 3. The users can have a familiar and friendly client-side user interface rather than an unfamiliar and unfriendly server interface.
  • 73. Collaborating Server In a Collaborating Server system, we have a collection of database servers, each capable of running transactions against local data, which cooperatively execute transactions spanning multiple servers. When a server receives a query that requires access to data at other servers, it generates appropriate subqueries to be executed by the other servers and puts the results together to compute the answer to the original query.
  • 74. Middleware A Middleware system is a special server, a layer of software that coordinates the execution of queries and transactions across one or more independent database servers. The Middleware architecture is designed to allow a single query to span multiple servers without requiring all database servers to be capable of managing such multisite execution strategies. It is especially attractive when trying to integrate several legacy systems whose basic capabilities cannot be extended. We need just one database server that is capable of managing queries and transactions spanning multiple servers; the remaining servers only need to handle local queries and transactions.
  • 75. STORING DATA IN DDBS Data storage involves 2 concepts: 1. Fragmentation 2. Replication
  • 76. Fragmentation: It is the process in which a relation is broken into smaller relations called fragments, possibly stored at different sites. It is of 2 types: 1. Horizontal Fragmentation, where the original relation is broken into a number of fragments and each fragment is a subset of rows. The union of the horizontal fragments should reproduce the original relation. 2. Vertical Fragmentation, where the original relation is broken into a number of fragments and each fragment consists of a subset of columns. The system often assigns a unique tuple id to each tuple in the original relation so that the fragments, when joined again, form a lossless join. The join of all vertical fragments should reproduce the original relation.
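Both kinds of fragmentation, and the way the original relation is recovered from each, can be sketched as follows. This is an illustrative sketch only: the function names and the `tid` column are hypothetical, rows are assumed to be dictionaries, and the original relation is assumed not to already have a column called `tid`.

```python
def horizontal_fragments(rows, predicate):
    """Split a relation into two row subsets; their union is the original."""
    matching = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matching, rest


def vertical_fragments(rows, column_groups):
    """Project the relation onto each column group, tagging every piece
    with a tuple id so the pieces can be joined back losslessly."""
    fragments = []
    for cols in column_groups:
        fragments.append(
            [{"tid": tid, **{c: row[c] for c in cols}} for tid, row in enumerate(rows)]
        )
    return fragments


def rejoin_vertical(fragments):
    """Join the vertical fragments on the tuple id and drop the id again."""
    merged = {}
    for fragment in fragments:
        for piece in fragment:
            merged.setdefault(piece["tid"], {}).update(piece)
    return [
        {c: v for c, v in merged[tid].items() if c != "tid"}
        for tid in sorted(merged)
    ]
```

Without the tuple id, two rows that happen to agree on the shared columns could be paired incorrectly when the fragments are joined, which is why the id is what makes the join lossless.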
  • 77. Replication: Replication occurs when we store more than one copy of a relation or its fragment at multiple sites. Advantages:- 1. Increased availability of data: If a site that contains a replica goes down, we can find the same data at other sites. Similarly, if local copies of remote relations are available, we are less vulnerable to failure of communication links. 2. Faster query evaluation: Queries can execute faster by using a local copy of a relation instead of going to a remote site.
  • 82. How do they work?  What is shared?  How to distribute the data?  How to process the data?  How to update the data?
  • 83. How to distribute the data? Server 1 Server 2 Server 3 Server 4 Bike $86 6/2/07 636353; Chair $10 6/5/07 662113; Couch $570 6/1/07 424252; Car $1123 6/1/07 256623; Lamp $19 6/7/07 121113; Bike $56 6/9/07 887734; Scooter $18 6/11/07 252111; Hammer $8000 6/11/07 116458
  • 84. How to distribute the data?  Hash partitioning: (key, value) is routed by Hash(key)  Range partitioning: (key, value) is routed by whether key <= X or key > X
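The two routing rules on this slide can be sketched as small Python functions. The names `hash_partition` and `range_partition` are hypothetical, and the choice of CRC32 as the hash function is an assumption made only so that the result is stable across runs; the slide does not say which hash a real system would use.

```python
import bisect
import zlib


def hash_partition(key, num_servers):
    """Route a key to a server by hashing it (CRC32 is an arbitrary choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_servers


def range_partition(key, boundaries):
    """Route a key by comparing it with sorted split points.

    With boundaries [X], keys <= X go to server 0 and keys > X go to
    server 1; more boundaries give more ranges.
    """
    return bisect.bisect_left(boundaries, key)
```

Hash partitioning tends to spread keys evenly but scatters neighbouring keys, while range partitioning keeps neighbouring keys together, which helps range queries but can overload one server if the keys are skewed.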
  • 85. How to distribute the data? Server 1 Server 2 Server 3 Server 4 Bike, Chair, Couch, Car, Lamp, Bike, Scooter, Hammer; $86, $10, $570, $1123, $19, $56, $18, $8000; 6/2/07, 6/5/07, 6/1/07, 6/1/07, 6/7/07, 6/9/07, 6/11/07, 6/11/07; 636353, 662113, 424252, 256623, 121113, 887734, 252111, 116458
  • 86. Query processing  Intra-operator parallelism  Inter-operator parallelism
  • 87. Parallel scanning [Figure: several filter operators run in parallel and feed a single Result]
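A parallel scan of this kind can be sketched with Python's standard `concurrent.futures` module. This is only an illustration under stated assumptions: `scan_partition` and `parallel_scan` are hypothetical names, each in-memory list stands in for the data held by one server, and worker threads stand in for the servers themselves.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate):
    """Filter one partition; in a real system this runs where the data lives."""
    return [row for row in rows if predicate(row)]


def parallel_scan(partitions, predicate):
    """Apply the same filter to every partition concurrently and merge."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = pool.map(
            lambda part: scan_partition(part, predicate), partitions
        )
        # Collect the per-partition outputs into the single Result.
        return [row for chunk in partial_results for row in chunk]
```

For example, `parallel_scan([[1, 8], [12, 3], [20]], lambda x: x > 5)` filters the three partitions independently and concatenates the survivors in partition order.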
  • 91. Join
  • 94. Updating distributed data  Synchronous: read-any-write-all Reads are fast
  • 95. Updating distributed data  Synchronous: voting
  • 96. Updating distributed data  Synchronous: voting Writes tolerant to disconnection
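The slides do not spell out the voting scheme, so the following is a sketch of one common form, quorum voting, under stated assumptions: there are n replicas, a write must reach w of them, a read must consult r of them, and r + w > n and 2w > n so that every read quorum and every write quorum overlaps the most recent write. The class `QuorumStore` and all of its members are hypothetical names.

```python
class QuorumStore:
    """Toy single-item store replicated at n sites with quorum voting."""

    def __init__(self, n, read_quorum, write_quorum):
        if read_quorum + write_quorum <= n or 2 * write_quorum <= n:
            raise ValueError("quorums must overlap: need r + w > n and 2w > n")
        self.replicas = [(0, None)] * n  # (version, value) at each site
        self.up = [True] * n             # which sites are reachable
        self.r = read_quorum
        self.w = write_quorum

    def _reachable(self):
        return [i for i, ok in enumerate(self.up) if ok]

    def write(self, value):
        sites = self._reachable()
        if len(sites) < self.w:
            raise RuntimeError("write quorum unavailable")
        chosen = sites[: self.w]
        # Any write quorum overlaps the previous one, so it sees the latest version.
        version = max(self.replicas[i][0] for i in chosen) + 1
        for i in chosen:
            self.replicas[i] = (version, value)

    def read(self):
        sites = self._reachable()
        if len(sites) < self.r:
            raise RuntimeError("read quorum unavailable")
        chosen = sites[-self.r:]  # deliberately not the sites a write prefers
        # The copy with the highest version among the quorum is the current one.
        return max((self.replicas[i] for i in chosen), key=lambda vv: vv[0])[1]
```

With n = 5 and r = w = 3, a write still succeeds when two sites are disconnected, which is the tolerance the slide refers to; under read-any-write-all, a single unreachable site would block every write.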
  • 97. Consistency of distributed data  Should provide ACID
  • 103. Conclusion  Parallelism and distribution very useful  Performance  Fault tolerance  Scale  But complex!  Rethink lots of aspects of the system  Must earn the complexity