Concurrency Control Techniques
    Concurrency control
    Locking Techniques for concurrency control
        Pessimistic Locking
        Optimistic Locking
    Lock Problems
        Deadlock
        Live lock
    Basic Time stamping
    Time stamping protocols for concurrency control
        Time-stamp ordering protocol
        Thomas' Write rule
    Validation based protocol
    Multiple Granularities
        Gray, et al.: Granularity of Locks
    Multi version schemes
    Recovery with concurrent transactions
Concurrency Control Techniques
Concurrency control
Concurrency control is a database management system (DBMS) concept used to address the
conflicts that arise when multiple users simultaneously access or alter the same data. Applied to
a DBMS, concurrency control coordinates simultaneous transactions while preserving data
integrity. In short, concurrency control governs multi-user access to the database.
Locking Techniques for concurrency control
Pessimistic Locking
This concurrency control strategy involves keeping an entity in a database locked the entire time
it exists in the database's memory. This limits or prevents users from altering the data entity that
is locked. There are two types of locks that fall under the category of pessimistic locking: write
lock and read lock. With write lock, everyone but the holder of the lock is prevented from
reading, updating, or deleting the entity. With read lock, other users can read the entity, but no
one except for the lock holder can update or delete it.
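The two pessimistic lock types above can be illustrated as a small readers–writer lock. This is a minimal sketch in Python, not any particular DBMS's lock manager; the class and method names are invented for illustration:

```python
import threading

class RWLock:
    """Sketch of pessimistic locking: many transactions may hold the read
    lock at once, but the write lock requires exclusive access."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of read-lock holders
        self._writer = False    # whether a write lock is held

    def acquire_read(self):
        with self._cond:
            while self._writer:          # block while a write lock is held
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()        # wait until access is exclusive
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Note how a read lock lets other readers in, while a write lock excludes everyone but its holder, matching the two lock types described above.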
Optimistic Locking
This strategy can be used when instances of simultaneous transactions, or collisions, are
expected to be infrequent. In contrast with pessimistic locking, optimistic locking doesn't try to
prevent the collisions from occurring. Instead, it aims to detect these collisions and resolve them
on the chance occasions when they occur. Pessimistic locking provides a guarantee that database
changes are made safely. However, it becomes less viable as the number of simultaneous users
or the number of entities involved in a transaction increases, because the potential for having to
wait for a lock to release will increase. Optimistic locking can alleviate the problem of waiting
for locks to release, but then users have the potential to experience collisions when attempting to
update the database.
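A common way to realize optimistic locking is a per-row version number: reads never block, and a write succeeds only if the version it originally read is still current. This is an illustrative Python sketch with invented names, not a specific database's API:

```python
class VersionConflict(Exception):
    """Raised when another transaction updated the row first (a collision)."""

class OptimisticStore:
    """Sketch of optimistic locking with a per-row version number."""

    def __init__(self):
        self._rows = {}   # key -> (version, value)

    def read(self, key):
        # Readers never wait; they just see the current version and value.
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, new_value):
        version, _ = self._rows.get(key, (0, None))
        if version != expected_version:      # someone else got there first
            raise VersionConflict(key)       # detect and report the collision
        self._rows[key] = (version + 1, new_value)
```

On a `VersionConflict`, the application typically re-reads the row and retries, which is cheap exactly when collisions are infrequent.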
Lock Problems
Deadlock
When dealing with locks two problems can arise, the first of which being deadlock. Deadlock
refers to a particular situation where two or more processes are each waiting for another to
release a resource, or more than two processes are waiting for resources in a circular chain.
Deadlock is a common problem in multiprocessing where many processes share a specific type
of mutually exclusive resource. Some computers, usually those intended for the time-sharing
and/or real-time markets, are often equipped with a hardware lock, or hard lock, which
guarantees exclusive access to processes, forcing serialization. Deadlocks are particularly
disconcerting because there is no general solution to avoid them. A fitting analogy of the
deadlock problem could be a situation like when you go to unlock your car door and your
passenger pulls the handle at the exact same time, leaving the door still locked. If you have ever
been in a situation where the passenger is impatient and keeps trying to open the door, it can be
very frustrating. You can get stuck in an endless cycle, and since both actions can never be
satisfied at the same time, deadlock occurs.
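Although there is no general solution, a standard way to rule out deadlock in code is to impose a single global order on lock acquisition, so the circular wait described above can never form. The ordering-by-`id` trick below is an illustrative Python sketch of that technique, not something from the text:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer(first, second, done):
    """Acquire both locks in one global order (here, by object id) so that
    no two threads can ever hold each other's next lock."""
    l1, l2 = sorted((first, second), key=id)
    with l1:
        with l2:
            done.append(True)   # critical section using both resources

# Two threads request the locks in opposite order; without the sorting
# step this interleaving could deadlock.
done = []
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, done))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, done))
t1.start(); t2.start()
t1.join(); t2.join()
```

Both threads finish because they agree on which lock comes "first", breaking the circular-chain condition for deadlock.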
Live lock
Live lock is a special case of resource starvation. A live lock is similar to a deadlock, except that
the states of the processes involved constantly change with regard to one another while never
progressing. The general definition only states that a specific process is not progressing. For
example, the system keeps selecting the same transaction for rollback causing the transaction to
never finish executing. Another live lock situation can come about when the system is deciding
which transaction gets a lock and which waits in a conflict situation. An illustration of live lock
occurs when numerous people arrive at a four way stop, and are not quite sure who should
proceed next. If no one makes a solid decision to go, and all the cars just keep creeping into the
intersection afraid that someone else will possibly hit them, then a kind of live lock can happen.
Basic Time stamping
Basic time stamping is a concurrency control mechanism that eliminates deadlock. This method
doesn’t use locks to control concurrency, so it is impossible for deadlock to occur. According to
this method a unique timestamp is assigned to each transaction, usually showing when it was
started. This effectively allows an age to be assigned to transactions and an order to be assigned.
Data items have both a read-timestamp and a write-timestamp. These timestamps are updated
each time the data item is read or updated respectively. Problems arise in this system when a
transaction tries to read a data item which has been written by a younger transaction. This is
called a late read. This means the data item has changed since the transaction started, and the
solution is to roll the transaction back and restart it with a new timestamp. Another problem
occurs when a transaction tries to write a data item which has been read by a younger
transaction. This is called a late write. This means that the data item has been read by another
transaction since the start time of the transaction that is altering it. The solution is the same as
for the late read problem: the transaction must be rolled back and restarted with a new
timestamp. Adhering to the rules of the basic time stamping process allows the transactions to be
serialized and a chronological schedule of transactions can then be created. Time stamping may
not be practical in the case of larger databases with high levels of transactions. A large amount of
storage space would have to be dedicated to storing the timestamps in these cases.
Time stamping protocols for concurrency control
The most commonly used concurrency protocols are time-stamp based. Such a protocol uses
either the system time or a logical counter as the time-stamp. Lock-based protocols manage the
order between conflicting pairs of transactions at execution time, whereas time-stamp based
protocols start working as soon as a transaction is created. Every transaction has a time-stamp
associated with it, and ordering is determined by the age of the transaction. A transaction created
at clock time 0002 would be older than every transaction that comes after it; for example, a
transaction 'y' entering the system at 0004 is two seconds younger, and priority may be given to
the older one. In addition, every data item carries the latest read and write timestamps, which let
the system know when the last read and write operations were made on the item.
Time-stamp ordering protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting
read and write operations. It is the responsibility of the protocol system that each conflicting
pair of operations is executed according to the timestamp values of the transactions.
• Time-stamp of Transaction Ti is denoted as TS(Ti).
• Read time-stamp of data-item X is denoted by R-timestamp(X).
• Write time-stamp of data-item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows:
• If a transaction Ti issues a read(X) operation:
o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
o If TS(Ti) >= W-timestamp(X), the operation is executed and R-timestamp(X) is set to
max(R-timestamp(X), TS(Ti)).
• If a transaction Ti issues a write(X) operation:
o If TS(Ti) < R-timestamp(X), the operation is rejected and Ti is rolled back.
o If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
o Otherwise, the operation is executed and W-timestamp(X) is set to TS(Ti).
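The read and write rules above can be sketched for a single data item X. This minimal Python sketch uses invented names (`Rejected`, `TODataItem`); a real DBMS would also roll the transaction back and restart it with a new timestamp on rejection:

```python
class Rejected(Exception):
    """Signals that an operation violates timestamp ordering."""

class TODataItem:
    """One data item X under the timestamp-ordering protocol.
    `ts` is the issuing transaction's timestamp TS(Ti)."""

    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0   # R-timestamp(X): timestamp of the youngest reader
        self.w_ts = 0   # W-timestamp(X): timestamp of the youngest writer

    def read(self, ts):
        if ts < self.w_ts:                  # X already written by a younger xact
            raise Rejected("late read")
        self.r_ts = max(self.r_ts, ts)      # update R-timestamp(X)
        return self.value

    def write(self, ts, value):
        if ts < self.r_ts:                  # a younger xact already read X
            raise Rejected("late write")
        if ts < self.w_ts:                  # a younger xact already wrote X
            raise Rejected("late write")
        self.w_ts = ts                      # update W-timestamp(X)
        self.value = value
```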
Thomas' Write rule
This rule modifies the timestamp-ordering check for the case where a transaction Ti issues
write(X) and TS(Ti) < W-timestamp(X). Under basic timestamp ordering the operation would be
rejected and Ti rolled back. The rule can be relaxed to make the schedule view serializable:
instead of rolling Ti back, the obsolete 'write' operation itself is simply ignored.
Validation based protocol
A validation phase checks whether any of the transaction’s updates violate serializability. Certain
information needed by the validation phase must be kept by the system. If serializability is not
violated, the transaction is committed and the database is updated from the local copies;
otherwise, the transaction is aborted and then restarted later.
There are three phases for this concurrency control protocol:
1. Read phase: A transaction can read values of committed data items from the database.
However, updates are applied only to local copies (versions) of the data items kept in the
transaction workspace.
2. Validation phase: Checking is performed to ensure that serializability will not be violated if
the transaction updates are applied to the database.
3. Write phase: If the validation phase is successful, the transaction updates are applied to the
database; otherwise, the updates are discarded and the transaction is restarted.
The idea behind optimistic concurrency control is to do all the checks at once; hence, transaction
execution proceeds with a minimum of overhead until the validation phase is reached. If there is
little interference among transactions, most will be validated successfully. However, if there is
much interference, many transactions that execute to completion will have their results discarded
and must be restarted later. Under these circumstances, optimistic techniques do not work well.
The techniques are called "optimistic" because they assume that little interference will occur and
hence that there is no need to do checking during transaction execution. The optimistic protocol
we describe uses transaction timestamps and also requires that the write sets and read sets of the
transactions be kept by the system. In addition, start and end times for some of the three phases
need to be kept for each transaction. Recall that the write set of a transaction is the set of items it
writes, and the read set is the set of items it reads. In the validation phase for transaction Ti, the
protocol checks that Ti does not interfere with any committed transactions or with any other
transactions currently in their validation phase. The validation phase for Ti checks that, for each
such transaction Tj that is either committed or is in its validation phase, one of the following
conditions holds:
1. Transaction Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the read_set of Ti has no items
in common with the write_set of Tj.
3. Both the read_set and write_set of Ti have no items in common with the write_set of Tj, and
Tj completes its read phase before Ti completes its read phase.
When validating transaction Ti, the first condition is checked first for each transaction Tj, since
(1) is the simplest condition to check. Only if condition (1) is false is condition (2) checked, and
only if (2) is false is condition (3)—the most complex to evaluate—checked. If any one of these
three conditions holds, there is no interference and Ti is validated successfully. If none of these
three conditions holds, the validation of transaction Ti fails and it is aborted and restarted later
because interference may have occurred.
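The three conditions, checked in order of increasing cost, can be sketched directly. The dictionary field names below (`read_set`, `write_set`, `read_start`, `read_end`, `write_start`, `write_end`) are assumptions introduced for illustration, not names from the text:

```python
def validates(ti, others):
    """Validation phase for transaction ti against every committed or
    currently-validating transaction tj. Returns True if ti passes."""
    for tj in others:
        # (1) Tj completed its write phase before Ti started its read phase.
        if tj["write_end"] < ti["read_start"]:
            continue
        # (2) Ti starts writing after Tj finished writing, and Ti's read set
        #     shares no items with Tj's write set.
        if (tj["write_end"] < ti["write_start"]
                and not (ti["read_set"] & tj["write_set"])):
            continue
        # (3) Neither Ti's read set nor its write set overlaps Tj's write
        #     set, and Tj finished its read phase before Ti finished its own.
        if (not ((ti["read_set"] | ti["write_set"]) & tj["write_set"])
                and tj["read_end"] < ti["read_end"]):
            continue
        return False    # interference may have occurred: abort and restart Ti
    return True
```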
Multiple Granularities
• Locks vs. Latches
o Locks assure logical (i.e. xactional) consistency of data. They are implemented
via a lock table, held for a long time (e.g. 2PL), and part of the deadlock detection
mechanism.
o Latches are like semaphores. They assure physical consistency of data and
resources, which are not visible at the transactional level (e.g. latch a frame in a
buffer pool). They are not subject to 2PL (usually held for a very short time), and
the deadlock detector doesn't know about them (and therefore latch acquisition must be
deadlock-free by design).
o Acquiring a latch is much cheaper than acquiring a lock (10s vs. 100s of
instructions in the no-conflict case)
latch control info is always in VM in a fixed place
lock tables are dynamically managed (can have a lock per tuple!), so data
structure mgmt on lock/release is more complex
• Lock table is a hashed main-mem structure
• Lock/Unlock must be atomic operations (protected by latches or critical sections)
• typically costs several hundred instructions to lock/unlock an item
• Lock Upgrades
o Suppose T1 has an S lock on P, T2 is waiting to get X lock on P, and now T3
wants S lock on P. Do we grant T3 an S lock?
No! (Starvation, unfair, etc.) So
• Manage FCFS queue for each locked object with outstanding requests
• all xacts that are adjacent and compatible form a compatible group
• The front group is the granted group
• group mode is most restrictive mode amongst group members
• Conversions: often want to convert (e.g. S to X for "test and modify" actions). Should
conversions go to back of queue?
• No! Instant deadlock. So put conversions right after granted group.
• More on deadlocks below
• More notes for the curious...
• We will read more in a couple weeks about non-2PL locking schemes, and the
consistency guarantees they provide
• Most DBMSs have multiple lock request types, in part to help with that. E.g. DB2 has:
• Conditional vs. unconditional lock requests (are you willing to wait?)
• instant vs. manual vs. commit duration lock requests (wait, but plan to hold for different
amounts of time)
Gray, et al.: Granularity of Locks
Theme: Correctness and performance
• Granularity tradeoff: small granularity (e.g. field of a tuple) means high concurrency but
high overhead. Large granularity (e.g. file) means low overhead but low concurrency.
• Possible granularities:
o DB
o Areas
o Files
o Pages
o Tuples (records)
o fields of tuples
• Want hierarchical locking, to allow "large" xacts to set large locks, "small" xacts to set
small locks
• Problem: T1 S-locks a record in a file, then T2 X-locks the whole file. How can T2
discover that T1 has locked the record?
• Solution: "Intention" locks
      NL    IS    IX    S     SIX   X
NL    ✓     ✓     ✓     ✓     ✓     ✓
IS    ✓     ✓     ✓     ✓     ✓     –
IX    ✓     ✓     ✓     –     –     –
S     ✓     ✓     –     ✓     –     –
SIX   ✓     ✓     –     –     –     –
X     ✓     –     –     –     –     –
(✓ = the two modes are compatible; – = they conflict)
• IS and IX locks
• T1 obtains S lock on record in question, but first gets IS lock on file.
• Now T2 cannot get X lock on file
• However, T3 can get IS or S lock on file (the reason for distinguishing IS and IX: if there
were only I, T3 couldn’t get an S lock on file)
• For higher concurrency, one more mode: SIX. Intuitively, you read all of the object but
only lock some subparts. Allows concurrent IS locks (IX alone would not). Note: gives S
access, so disallows IX to others.
• requires that xacts lock items root to leaf in the hierarchy, and unlock leaf to root
• Generalization to DAG of resources: X locks all paths to a node, S locks at least one.
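The compatibility matrix and the grant decision can be expressed as a small table plus one rule: a requested mode is granted only if it is compatible with every mode already held. This is an illustrative Python sketch of that check (`can_grant` is an invented name, not from the paper):

```python
# Lock-mode compatibility matrix from the hierarchical-locking scheme:
# COMPAT[held][requested] is True when the requested mode can coexist
# with a lock already held in `held` mode.
COMPAT = {
    "NL":  {"NL": True, "IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": True},
    "IS":  {"NL": True, "IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"NL": True, "IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"NL": True, "IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"NL": True, "IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"NL": True, "IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(held_modes, requested):
    """Grant the request only if it conflicts with no lock already held."""
    return all(COMPAT[held][requested] for held in held_modes)
```

This reproduces the scenarios above: T1's IS on the file blocks T2's X, yet still admits another transaction's IS or S, and SIX admits concurrent IS but nothing stronger.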
Multi version schemes
Multiversion concurrency control (MVCC) is a concurrency control method commonly
used by database management systems to provide concurrent access to the database and in
programming languages to implement transactional memory. If someone is reading from a
database at the same time as someone else is writing to it, it is possible that the reader will see a
half-written or inconsistent piece of data. There are several ways of solving this problem, known
as concurrency control methods. The simplest way is to make all readers wait until the writer is
done, which is known as a lock. This can be very slow, so MVCC takes a different approach:
each user connected to the database sees a snapshot of the database at a particular instant in time.
Any changes made by a writer will not be seen by other users of the database until the changes
have been completed (or, in database terms: until the transaction has been committed.) When an
MVCC database needs to update an item of data, it will not overwrite the old data with new data,
but instead mark the old data as obsolete and add the newer version elsewhere. Thus there are
multiple versions stored, but only one is the latest. This allows readers to access the data that was
there when they began reading, even if it was modified or deleted part way through by someone
else. It also allows the database to avoid the overhead of filling in holes in memory or disk
structures but requires (generally) the system to periodically sweep through and delete the old,
obsolete data objects. For a document-oriented database it also allows the system to optimize
documents by writing entire documents onto contiguous sections of disk—when updated, the
entire document can be re-written rather than bits and pieces cut out or maintained in a linked,
non-contiguous database structure. MVCC provides point in time consistent views. Read
transactions under MVCC typically use a timestamp or transaction ID to determine what state of
the DB to read, and read these versions of the data. This avoids managing locks for read
transactions because writes can be isolated by virtue of the old versions being maintained, rather
than through a process of locks or mutexes. Writes affect a future version, but at the transaction
ID that the read is working at, everything is guaranteed to be consistent because the writes are
occurring at a later transaction ID. MVCC uses timestamps or increasing transaction IDs to
achieve transactional consistency. MVCC ensures a transaction never has to wait for a database
object by maintaining several versions of an object. Each version would have a write timestamp
and it would let a transaction (Ti) read the most recent version of an object which precedes the
transaction timestamp (TS(Ti)).
If a transaction (Ti) wants to write to an object, and if there is another transaction (Tk), the
timestamp of Ti must precede the timestamp of Tk (i.e., TS(Ti) < TS(Tk)) for the object write
operation to succeed. Every object would also have a read timestamp, and if a transaction Ti
wanted to write to object P, and the timestamp of that transaction is earlier than the object's read
timestamp (TS(Ti) < RTS(P)), the transaction Ti is aborted and restarted. Otherwise, Ti creates a
new version of P and sets the read/write timestamps of P to the timestamp of the transaction TS
(Ti). The obvious drawback to this system is the cost of storing multiple versions of objects in
the database. On the other hand reads are never blocked, which can be important for workloads
mostly involving reading values from the database. MVCC is particularly adept at implementing
true snapshot isolation, something which other methods of concurrency control frequently do
either incompletely or with high performance costs.
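The core mechanism (keep every version with its write timestamp; a reader sees the most recent version preceding its own timestamp) can be sketched in a few lines. This is an illustrative Python sketch with invented names, not a real MVCC engine; garbage collection of obsolete versions is omitted:

```python
class MVStore:
    """Sketch of multiversion storage: writes append a (write_ts, value)
    version instead of overwriting, and reads never block."""

    def __init__(self):
        self._versions = {}   # key -> list of (write_ts, value), oldest first

    def write(self, key, ts, value):
        versions = self._versions.setdefault(key, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0])    # keep versions ordered by write_ts

    def read(self, key, ts):
        """Return the most recent version whose write timestamp does not
        exceed the reader's timestamp, or None if no such version exists."""
        best = None
        for write_ts, value in self._versions.get(key, []):
            if write_ts <= ts:
                best = value
            else:
                break
        return best
```

A reader at timestamp 10 keeps seeing the snapshot that existed at 10 even while later writers add new versions, which is exactly the non-blocking behavior described above.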
Recovery with concurrent transactions
Remote backup is one way to protect data against catastrophic failure. Alternatively, whole-
database backups can be taken on magnetic tapes and stored at a safer place; such a backup can
later be restored on a freshly installed database, bringing it to the state it was in at the point of
backup. Mature databases are often too large to be backed up frequently. Instead, there are
techniques for restoring a database just by replaying its logs, so backing up the logs frequently is
more feasible than backing up the entire database: the database can be backed up once a week,
while the logs, being very small, can be backed up every day or even every hour.