2. What is a Transaction?
A transaction is a unit of program execution that accesses and possibly
updates various data items.
It is either completed in its entirety or not done at all.
This logical unit of database processing includes one or more access
operations (read for retrieval; write for insert, update, or delete).
E.g., a transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
3. Cont’d.
A successful transaction changes the database from one consistent state to
another.
A consistent database state is one in which all data integrity constraints are
satisfied.
To ensure consistency of the database, every transaction must begin with the
database in a known consistent state and end in a consistent state.
Two main issues to deal with:
Failures of various kinds, such as hardware failures and system crashes
Concurrent execution of multiple transactions
4. Cont’d.
A transaction may consist of a single SQL statement or a collection of
related SQL statements.
If the database operations in a transaction do not update the database but only
retrieve data, the transaction is called a read-only transaction; otherwise it is
known as a read-write transaction.
A transaction can have one of two outcomes:
Committed: the transaction completes successfully and the database reaches a
new consistent state.
Aborted: the transaction does not execute successfully. The database must be
restored to the consistent state it was in before the transaction started. Such a
transaction is said to be rolled back or undone.
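The two outcomes can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a definitive implementation: the `account` table, the `transfer` and `balances` helpers, and the "no negative balance" rule are all hypothetical and chosen only to force an abort.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    The negative-balance check is an assumed integrity constraint.
    """
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

print(transfer(conn, "A", "B", 50))   # committed
print(transfer(conn, "A", "B", 500))  # aborted: the debit of A is undone
print(balances(conn))
```

After the aborted transfer the balances are the same as before it started, which is the "restored to the consistent state" behaviour described above.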
5. Cont’d.
According to the number of users who can use the system concurrently, a
DBMS is classified as one of two types:
1. Single-user: at most one user at a time can use the system.
2. Multiuser: many users can use the system and access the database
concurrently.
Database systems used in banks, insurance agencies, stock exchanges,
supermarkets, and many other applications are multiuser systems.
In these systems, hundreds or thousands of users are typically operating on the
database by submitting transactions concurrently to the system.
6. Concurrency Control
When two or more users are accessing the database simultaneously and at least
one is updating data, there may be interference that can result in
inconsistencies.
Concurrency control is the process of managing simultaneous operations on
the database without having them interfere with one another.
Why is concurrency control needed?
A major objective in developing a database is to enable many users to access
shared data concurrently.
The objective of concurrency control is to ensure the serializability of
transactions in a multiuser database environment.
7. Problems of Concurrent Sharing
The simultaneous execution of transactions over a shared database can create
several data integrity and consistency problems.
The three main problems are lost updates, uncommitted data, and
inconsistent retrievals.
1. Lost Update
The lost update problem occurs when two concurrent transactions, T1 and T2,
are updating the same data element and one of the updates is lost
(overwritten by the other transaction).
8. Cont’d.
Assume that you have a product whose current quantity value is 35. Also
assume that two concurrent transactions, T1 and T2, occur that update the
quantity value for some item in the database. The transactions are as below:
Transaction          Computation
T1: Buy 100 units    quantity = quantity + 100
T2: Sell 30 units    quantity = quantity - 30
9. Cont’d.
The following table shows the serial execution of those transactions under
normal circumstances, yielding the correct answer quantity = 105.
Time  Transaction  Step                   Stored Value
1     T1           Read quantity          35
2     T1           quantity = 35 + 100
3     T1           Write quantity         135
4     T2           Read quantity          135
5     T2           quantity = 135 - 30
6     T2           Write quantity         105
10. Cont’d.
The sequence depicted below shows how the lost update problem can arise.
Note that the first transaction (T1) has not yet been committed when the
second transaction (T2) is executed.
Time  Transaction  Step                   Stored Value
1     T1           Read quantity          35
2     T2           Read quantity          35
3     T1           quantity = 35 + 100
4     T2           quantity = 35 - 30
5     T1           Write quantity         135
6     T2           Write quantity         5 (lost update)
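The two orderings can be replayed with a small simulation. This is only a sketch: the `replay` function and the `(transaction, action)` step encoding are hypothetical, and each transaction is assumed to work on a private copy of the value between its read and its write.

```python
def replay(schedule, stored, deltas):
    """Replay (transaction, action) steps against one stored value.

    'read' copies the stored value into the transaction's private copy,
    'compute' applies that transaction's delta to the copy, and
    'write' stores the copy back.
    """
    local = {}
    for txn, action in schedule:
        if action == "read":
            local[txn] = stored
        elif action == "compute":
            local[txn] += deltas[txn]
        elif action == "write":
            stored = local[txn]
        else:
            raise ValueError(f"unknown action: {action}")
    return stored

DELTAS = {"T1": +100, "T2": -30}  # T1 buys 100 units, T2 sells 30 units

serial = [("T1", "read"), ("T1", "compute"), ("T1", "write"),
          ("T2", "read"), ("T2", "compute"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "compute"),
               ("T2", "compute"), ("T1", "write"), ("T2", "write")]

print(replay(serial, 35, DELTAS))       # 105
print(replay(interleaved, 35, DELTAS))  # 5: T1's update is overwritten
```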
11. 2. Uncommitted Data (The Temporary Update (or
Dirty Read) Problem)
The uncommitted data problem occurs when two transactions, T1 and T2, are
executed concurrently and the first transaction (T1) is rolled back after the
second transaction (T2) has already accessed the uncommitted data.
Example: use the same transactions described in the lost update discussion,
but now T1 is forced to roll back due to an error. The rollback eliminates the
addition of the 100 units. Because T2 then subtracts 30 from the original
35 units, the correct answer should be 5.
12. Cont’d.
The following table shows the serial execution of those transactions under
normal circumstances, with T1 rolled back before T2 starts, yielding the
correct answer quantity = 5.
Time  Transaction  Step                   Stored Value
1     T1           Read quantity          35
2     T1           quantity = 35 + 100
3     T1           Write quantity         135
4     T1           Rollback               35
5     T2           Read quantity          35
6     T2           quantity = 35 - 30
7     T2           Write quantity         5
13. Cont’d.
The following table shows how the uncommitted data problem can arise when
the ROLLBACK is completed after T2 has begun its execution.
Time  Transaction  Step                                        Stored Value
1     T1           Read quantity                               35
2     T1           quantity = 35 + 100
3     T1           Write quantity                              135
4     T2           Read quantity (reading uncommitted data)    135
5     T2           quantity = 135 - 30
6     T1           ROLLBACK                                    35
7     T2           Write quantity                              105
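The dirty read can be replayed in the same style. This is a sketch under stated assumptions: the `replay_with_rollback` function and its step encoding are hypothetical, and a rollback is modelled simply as restoring the value the transaction saw just before its first write.

```python
def replay_with_rollback(schedule, stored, deltas):
    """Replay (transaction, action) steps, supporting a 'rollback' action."""
    local = {}
    before_image = {}  # value to restore if a transaction rolls back
    for txn, action in schedule:
        if action == "read":
            local[txn] = stored
        elif action == "compute":
            local[txn] += deltas[txn]
        elif action == "write":
            before_image.setdefault(txn, stored)  # remember the old value once
            stored = local[txn]
        elif action == "rollback":
            stored = before_image.pop(txn, stored)
        else:
            raise ValueError(f"unknown action: {action}")
    return stored

DELTAS = {"T1": +100, "T2": -30}

# T1 is rolled back before T2 starts.
correct = [("T1", "read"), ("T1", "compute"), ("T1", "write"), ("T1", "rollback"),
           ("T2", "read"), ("T2", "compute"), ("T2", "write")]
# T2 reads T1's uncommitted write, then T1 rolls back.
dirty = [("T1", "read"), ("T1", "compute"), ("T1", "write"),
         ("T2", "read"), ("T2", "compute"), ("T1", "rollback"), ("T2", "write")]

print(replay_with_rollback(correct, 35, DELTAS))  # 5
print(replay_with_rollback(dirty, 35, DELTAS))    # 105: based on a value that never committed
```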
14. 3. Inconsistent Retrievals (The Incorrect
Summary Problem)
Inconsistent retrievals occur when a transaction accesses data before and
after one or more other transactions finish working with that data.
For example, an inconsistent retrieval would occur if transaction T1
calculated some summary (aggregate) function over a set of data while another
transaction (T2) was updating the same data.
The problem is that the transaction might read some data before they are
changed and other data after they are changed, thereby yielding inconsistent
results.
15. Cont’d.
To illustrate that problem, assume the following conditions:
1. T1 calculates the total quantity of products stored in the PRODUCT table.
2. At the same time, T2 updates the quantity for two of the items in the
PRODUCT table.
The two transactions are shown as:
Transaction 1:  SELECT SUM(quantity) FROM product;
Transaction 2:  UPDATE product SET quantity = quantity + 10 WHERE pid = 1003;
                UPDATE product SET quantity = quantity - 10 WHERE pid = 1004;
                COMMIT;
16. Cont’d.
The following table shows the quantities before and after T2's updates when
the transactions execute serially under normal circumstances. The total is 86
in both cases.
Product_id (pid)  Before update  After update
1001              8              8
1002              32             32
1003              15             25 (15 + 10)
1004              23             13 (23 - 10)
1005              8              8
Total             86             86
17. Cont’d.
The table below demonstrates that inconsistent retrievals are possible during
the transaction execution, making the result of T1's execution incorrect.
Time  Transaction  Action                         Value  Total
1     T1           Read quantity for pid=1001     8      8
2     T1           Read quantity for pid=1002     32     40
3     T2           Read quantity for pid=1003     15
4     T2           quantity = quantity + 10
5     T2           Write quantity for pid=1003    25
6     T1           Read quantity for pid=1003     25     65
7     T1           Read quantity for pid=1004     23     88
8     T2           Read quantity for pid=1004     23
9     T2           quantity = quantity - 10
10    T2           Write quantity for pid=1004    13
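The interleaving can be replayed as a short simulation. This is a sketch only: the `interleaved_sum` function and the step encoding are hypothetical, and the final read of pid=1005 by T1 is an assumption, since the table stops before T1 finishes its summation.

```python
def interleaved_sum(product, steps):
    """Run T1's running SUM while T2's updates are applied in between.

    Each step is (transaction, pid, delta): T1 adds the current quantity of
    pid to its total; T2 adds delta to the stored quantity of pid.
    """
    table = dict(product)  # work on a copy of the PRODUCT quantities
    total = 0
    for txn, pid, delta in steps:
        if txn == "T1":
            total += table[pid]
        else:
            table[pid] += delta
    return total, table

PRODUCT = {1001: 8, 1002: 32, 1003: 15, 1004: 23, 1005: 8}
steps = [
    ("T1", 1001, None), ("T1", 1002, None),
    ("T2", 1003, +10),                       # T2 updates 1003 before T1 reads it
    ("T1", 1003, None), ("T1", 1004, None),
    ("T2", 1004, -10),                       # T2 updates 1004 after T1 read it
    ("T1", 1005, None),                      # assumed final read by T1
]

total, table = interleaved_sum(PRODUCT, steps)
print(total)                # 96: T1 saw the +10 but not the -10
print(sum(table.values()))  # 86: the true total after T2 commits
```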
18. 4. The Unrepeatable Read Problem
A transaction T1 reads the same item twice and the item is changed by
another transaction T2 between the two reads. Hence, T1 receives different
values for its two reads of the same item.
For example, during an airline reservation transaction, a customer inquires
about seat availability on several flights.
When the customer decides on a particular flight, the transaction then reads
the number of seats on that flight a second time before completing the
reservation, and it may end up reading a different value for the item.
19. Concepts of Serializability
Schedule: a sequence of instructions that specifies the chronological
order in which the instructions of concurrent transactions are executed.
A schedule for a set of transactions must consist of all instructions of
those transactions.
The scheduler is a special DBMS process that establishes the order
in which the operations within concurrent transactions are executed.
The scheduler interleaves the execution of database operations to ensure
serializability and isolation of transactions.
20. Cont’d.
To determine the appropriate order, the scheduler bases its actions on
concurrency control algorithms, such as locking or timestamping
methods.
Additionally, the scheduler facilitates data isolation to ensure that two
transactions do not update the same data element at the same time.
21. Cont.
Two operations are said to conflict if they:
1. Belong to different transactions
2. Access the same database item and
3. At least one of the two operations is a write operation.
22. Cont.
Operation of T1   Operation of T2   Result
Read              Read              No conflict
Read              Write             Conflict
Write             Read              Conflict
Write             Write             Conflict
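The three conditions translate directly into a predicate. This is a minimal sketch; the `Op` tuple and the `conflicts` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str   # transaction name, e.g. "T1"
    kind: str  # "R" for read, "W" for write
    item: str  # data item, e.g. "X"

def conflicts(a: Op, b: Op) -> bool:
    """True if the two operations conflict under the three conditions."""
    return (a.txn != b.txn                 # 1. different transactions
            and a.item == b.item           # 2. same database item
            and "W" in (a.kind, b.kind))   # 3. at least one write

print(conflicts(Op("T1", "R", "X"), Op("T2", "R", "X")))  # False
print(conflicts(Op("T1", "R", "X"), Op("T2", "W", "X")))  # True
print(conflicts(Op("T1", "W", "X"), Op("T2", "W", "Y")))  # False: different items
```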
23. Characterizing Schedules Based on Serializability
Serial Schedule: A schedule S is serial if, for every transaction T participating
in the schedule, all the operations of T are executed consecutively in the
schedule.
No interleaving occurs in a serial schedule.
E.g., let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from
A to B.
A serial schedule can run T1 followed by T2, or T2 followed by T1.
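The two serial orders can be sketched as follows. The starting balances (A = 1000, B = 2000), the function names, and the use of integer division for the 10% are assumptions made for illustration only.

```python
def t1(db):
    """T1: transfer $50 from A to B."""
    db["A"] -= 50
    db["B"] += 50

def t2(db):
    """T2: transfer 10% of A's balance from A to B (integer arithmetic)."""
    temp = db["A"] // 10
    db["A"] -= temp
    db["B"] += temp

def run_serial(order, balances):
    """Run whole transactions one after another on a copy of the balances."""
    db = dict(balances)
    for txn in order:
        txn(db)
    return db

start = {"A": 1000, "B": 2000}  # hypothetical starting balances
print(run_serial([t1, t2], start))  # {'A': 855, 'B': 2145}
print(run_serial([t2, t1], start))  # {'A': 850, 'B': 2150}
```

The two orders end in different states, but both preserve A + B = 3000, so each leaves the database consistent.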
25. Cont.
In serial execution there is no interference between transactions,
because only one transaction is executing at any given time. So, it
prevents the occurrence of concurrency problems.
A serial schedule never leaves the database in an inconsistent state,
therefore every serial schedule is considered correct.
26. Problems of Serial Schedules:
They limit concurrency or interleaving of operations.
If a transaction waits for an I/O operation to complete, we cannot
switch the CPU to another transaction.
If some transaction T is long, the other transactions must wait for T to
complete all its operations.
27. Non-Serial Schedule (Concurrent Schedule):
A schedule that is not serial is called a non-serial schedule.
Here each sequence interleaves operations from the two transactions.
E.g., let T1 transfer $50 from A to B, and T2 transfer 10% of the
balance from A to B.
Two non-serial schedules with interleaved operations of T1 and T2 are
listed below.
28. Cont.
Non-Serial Schedule I
T1: Read(A); A := A - 50; Write(A); Read(B); B := B + 50; Write(B)
T2: Read(A); temp := A * 0.1; A := A - temp; Write(A); Read(B); B := B + temp; Write(B)
Non-Serial Schedule II
T1: Read(A); A := A - 50; Write(A); Read(B); B := B + 50; Write(B)
T2: Read(A); temp := A * 0.1; A := A - temp; Write(A); Read(B); B := B + temp; Write(B)
29. When are two schedules considered equivalent?
There are several ways to define schedule equivalence.
1. Result equivalence
2. Conflict equivalence
3. View equivalence
30. Cont.
1. Result Equivalent
Two schedules are called result equivalent if they produce the same final
state of the database.
However, two different schedules may accidentally produce the same final
state. Hence result equivalence is not used to define equivalence of
schedules.
31. Cont.
2. Conflict Equivalent
Two schedules are said to be conflict equivalent if the order of any two
conflicting operations is the same in both schedules.
If a schedule S can be transformed into a schedule S´ by a series of swaps of
non-conflicting instructions, we say that S and S´ are conflict equivalent.
32. Cont.
Conflict Serializable Schedule
A concurrent schedule S is said to be conflict serializable if it is
conflict equivalent to some serial schedule S'.
Two schedules S1 and S2 are conflict equivalent if the order of every pair of
conflicting operations is the same in both.
33. Cont.
Note:
Being Serializable is distinct from being Serial.
A Serial Schedule leads to inefficient utilization of CPU because of
no interleaving of operations from different transactions.
A Serializable Schedule gives the benefits of concurrent execution
without giving up any correctness.
34. Cont.
Schedule I (S1), concurrent schedule:
T1      T2
R(A)
        W(A)
R(B)
        W(B)
Existing conflicting operation: R-W
Schedule II (S2), serial schedule:
T2      T1
W(A)
W(B)
        R(A)
        R(B)
Existing conflicting operation: W-R
Schedule III (S3), serial schedule:
T1      T2
R(A)
R(B)
        W(A)
        W(B)
Existing conflicting operation: R-W
So S1 and S3 are conflict equivalent.
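This comparison can be checked mechanically by collecting, for each schedule, the ordered pairs of conflicting operations. The sketch below assumes that T1 performs R(A), R(B) and T2 performs W(A), W(B), and that each operation appears at most once per schedule; the function names and the tuple encoding are hypothetical.

```python
from itertools import combinations

def conflict_pairs(schedule):
    """Return the set of ordered conflicting pairs (earlier op, later op)."""
    pairs = set()
    for a, b in combinations(schedule, 2):  # a always precedes b
        (ta, ka, xa), (tb, kb, xb) = a, b
        if ta != tb and xa == xb and "W" in (ka, kb):
            pairs.add((a, b))
    return pairs

def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair ordered the same way."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)

# Operations are (transaction, kind, item) tuples.
S1 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "R", "B"), ("T2", "W", "B")]
S2 = [("T2", "W", "A"), ("T2", "W", "B"), ("T1", "R", "A"), ("T1", "R", "B")]
S3 = [("T1", "R", "A"), ("T1", "R", "B"), ("T2", "W", "A"), ("T2", "W", "B")]

print(conflict_equivalent(S1, S3))  # True
print(conflict_equivalent(S1, S2))  # False
```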
35. Precedence Graph
A precedence graph is a directed graph used to test the conflict
serializability of a schedule.
The algorithm looks only at the read_item and write_item operations in a
schedule to construct the precedence graph.
It is a directed graph G = (N, E) that consists of a set of nodes N = {T1,
T2, ..., Tn } and a set of directed edges E = {e1,e2, ..., em }.
There is one node in the graph for each transaction Ti in the schedule.
36. Algorithm For Testing Conflict Serializability of a Schedule
1. For each transaction Ti participating in schedule S, create a node labelled Ti in the
precedence graph.
2. For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create
an edge (Ti → Tj) in the precedence graph.
3. For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X), create
an edge (Ti → Tj) in the precedence graph.
4. For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X),
create an edge (Ti → Tj) in the precedence graph.
5. The schedule S is conflict serializable if and only if the precedence graph has no cycles.
(A cycle is a sequence of edges that starts and ends at the same node.)
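The steps above can be sketched in a few lines. The function names, the `(transaction, kind, item)` encoding, and the two sample schedules are hypothetical; the cycle test uses a depth-first search, which is one common choice rather than anything prescribed here.

```python
def precedence_graph(schedule):
    """Build (nodes, edges) from a list of (transaction, kind, item) operations."""
    nodes = {txn for txn, _, _ in schedule}                      # step 1
    edges = set()
    for i, (ti, ki, xi) in enumerate(schedule):
        for tj, kj, xj in schedule[i + 1:]:
            # steps 2-4: a later operation of Tj conflicts with an earlier one of Ti
            if ti != tj and xi == xj and "W" in (ki, kj):
                edges.add((ti, tj))
    return nodes, edges

def has_cycle(nodes, edges):
    """Depth-first search for a cycle in the directed graph."""
    succ = {n: [b for a, b in edges if a == n] for n in nodes}
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on current path, 2 = done

    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)

def is_conflict_serializable(schedule):
    return not has_cycle(*precedence_graph(schedule))            # step 5

ok = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X")]
bad = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
print(is_conflict_serializable(ok))   # True: only edge is T1 -> T2
print(is_conflict_serializable(bad))  # False: T1 -> T2 and T2 -> T1
```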
37. Exercise
Consider the following concurrent schedule for three transactions and
specify whether it is a conflict serializable schedule.
T1      T2      T3
                R(X)
        R(X)
                W(X)
R(X)
W(X)
38. Cont.
From T2 → T3 there is an R-W conflict.
From T3 → T1 there is a W-R conflict.
From T2 → T1 there is an R-W conflict.
There is no cycle or loop in the precedence graph. Therefore the given schedule is a
conflict serializable schedule.
39. Topological Sorting
The process of ordering the nodes of a directed acyclic graph is known as
topological sorting.
We use topological sorting to find the serial schedule equivalent to a
conflict serializable schedule.
40. Steps of Topological Sorting
Step 1: Consider the in-degree of the nodes. Find the nodes with in-degree
zero.
In-degree = number of edges coming into the node.
Out-degree = number of edges going out of the node.
For the given example, the in-degree of T1 is 0 and the in-degree of T2 is 1.
41. Steps of Topological Sorting
Step 2: Note down the node whose in-degree is zero (here T1) and remove it from the graph.
Step 3: Remove all the edges connected to T1.
Step 4: Repeat the steps above on the remaining nodes.
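The steps above can be sketched as a small function. The name `topological_order` and the sample edges are hypothetical; when several nodes have in-degree zero, this sketch arbitrarily picks the alphabetically first one so that the output is deterministic.

```python
def topological_order(nodes, edges):
    """Return the nodes in topological order, or None if the graph has a cycle."""
    remaining = set(nodes)
    live = set(edges)
    order = []
    while remaining:
        targets = {b for _, b in live}  # nodes that still have an incoming edge
        ready = sorted(n for n in remaining if n not in targets)
        if not ready:
            return None                 # every remaining node has in-degree > 0
        node = ready[0]
        order.append(node)              # note the node down
        remaining.remove(node)
        live = {(a, b) for a, b in live if a != node}  # drop its outgoing edges
    return order

# Hypothetical precedence graph: T1 -> T2, T1 -> T3, T2 -> T3.
print(topological_order({"T1", "T2", "T3"},
                        {("T1", "T2"), ("T1", "T3"), ("T2", "T3")}))
# ['T1', 'T2', 'T3']
print(topological_order({"T1", "T2"}, {("T1", "T2"), ("T2", "T1")}))
# None
```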
42. Exercise 1
Consider the following graph for three transactions. Specify whether it
corresponds to a conflict serializable schedule and, if so, write the
equivalent serial schedule.
43. Step 1:
In-degree of T1 = 0
In-degree of T2 = 1
In-degree of T3 = 2
Therefore consider T1 first and delete/ignore T1 and all edges from T1.
44. Step 2:
In-degree of T2 = 0
In-degree of T3 = 1
Therefore consider T2 next and delete/ignore T2 and all edges from T2.
45. Step 3:
In-degree of T3 = 0
Therefore consider T3 last.
Hence the serial schedule equivalent to the conflict serializable schedule is
T1, T2, T3.
46. Exercise:2
Consider the following concurrent schedule for three transactions and specify whether it
is a conflict serializable schedule. If it is, find the equivalent
serial schedule.
T1      T2      T3
R(X)
        W(X)
W(X)
                W(X)
47. Cont.
From T1 → T2 there is an R-W conflict.
From T2 → T1 there is a W-W conflict.
From T2 → T3 there is a W-W conflict.
From T1 → T3 there is a W-W conflict.
The edges T1 → T2 and T2 → T1 form a cycle in the precedence graph. Therefore the given
schedule is NOT a conflict serializable schedule.
48. Class Work
Consider the following two schedules S1 and S2 for the transactions T1
and T2. Determine which of the schedules are conflict
serializable.
Concurrent Schedule (S1) Concurrent Schedule (S2)
T1 T2 T1 T2
R(x) R(x)
R(y) R(x) R(x)
W(y) W(y)
W(x) R(y)
W(x)
49. View Serializable Schedule
A concurrent schedule is said to be view serializable if it is view
equivalent to one of the possible serial schedules.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if the following three conditions hold:
1. Initial Reads: If T1 reads the initial value of data item X in S1, then T1 also reads the initial value of X in S2.
2. W-R Conflict: If T1 reads a value written by T2 in S1, then T1 also reads the value written by
T2 in S2.
3. Final Write: If T1 performs the final write on a data item in S1, then it also performs the final
write on that data item in S2.
50. Cont.
Consider the following example:
Schedule 1 (S1), concurrent schedule:
T1      T2      T3
                R(X)
        R(X)
                W(X)
R(X)
W(X)
Schedule 2 (S2), serial schedule:
T2      T3      T1
R(X)
        R(X)
        W(X)
                R(X)
                W(X)
51. Cont.
Initial Reads: In S1 there are only two initial reads of X, one by T2 and one by
T3. The serial schedule S2 also has exactly two initial reads of X, one by T2 and
one by T3. So the initial read condition is satisfied.
W-R Conflict: In S1 there is one write-read conflict, from T3 to T1. In S2
there is the same write-read conflict from T3 to T1. So the second condition is
also satisfied.
Final Write: In S1 the final write on X is by T1, and in S2 the final write on X
is also by T1. So the final condition is also satisfied.
Hence the given schedule S1 is view serializable.
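The three conditions can be checked with a short sketch. The names `view_profile` and `view_equivalent` are hypothetical, and S1 is encoded under the assumption that T3's first read precedes T2's read; the order of those two reads does not change the result, because both read the initial value of X.

```python
from collections import Counter

def view_profile(schedule):
    """Summarize a schedule by who reads from whom and who writes last.

    reads_from counts (reader, item, writer) triples, with writer None for a
    read of the initial value; last_writer ends up holding the final writer
    of each item.
    """
    last_writer = {}
    reads_from = Counter()
    for txn, kind, item in schedule:
        if kind == "R":
            reads_from[(txn, item, last_writer.get(item))] += 1
        else:
            last_writer[item] = txn
    return reads_from, last_writer

def view_equivalent(s1, s2):
    """Conditions 1-3: same initial reads, same reads-from pairs, same final writes."""
    return view_profile(s1) == view_profile(s2)

S1 = [("T3", "R", "X"), ("T2", "R", "X"), ("T3", "W", "X"),
      ("T1", "R", "X"), ("T1", "W", "X")]
S2 = [("T2", "R", "X"),                      # serial order T2, T3, T1
      ("T3", "R", "X"), ("T3", "W", "X"),
      ("T1", "R", "X"), ("T1", "W", "X")]
other = [("T1", "R", "X"), ("T1", "W", "X"),  # serial order T1, T2, T3
         ("T2", "R", "X"),
         ("T3", "R", "X"), ("T3", "W", "X")]

print(view_equivalent(S1, S2))     # True
print(view_equivalent(S1, other))  # False: T1 would read the initial value
```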