This document proposes an approach called RDB that can render databases intrusion resilient by tracking transaction dependencies and enabling selective transaction rollback. It discusses challenges in determining which transactions were malicious after a compromise and repairing databases. RDB inserts a proxy that intercepts SQL statements to track dependencies by adding transaction IDs to rows and logs. It then analyzes logs to reconstruct dependencies and generate compensating transactions to selectively rollback malicious transactions and repair the database. Experiments show the proxy overhead is between 6-13% for typical loads.
1. RDB: Repairable Database Systems
Alexey Smirnov and Tzi-cker Chiueh
Experimental Computer Systems Lab, Department of Computer Science, SUNY Stony Brook
Motivation Transaction Dependency Tracking SQL Statements Rewriting
Suppose you are a DBA and you have just noticed that
your database has been compromised 24 hours ago. How To track dependencies successfully, the tracking As a part of the dependency tracking mechanism, SQL proxy rewrites certain classes of SQL statements coming from a client.
would you repair the database? mechanism should be able to intercept both read and
write actions performed by the database server. Possible
Original statement Modified statement(s)
Currently, the only way to do this is to restore a database ways of interception are:
SELECT t1.a1,…,t1.an,…,tk.ank FROM t1,…,tk WHERE c SELECT t1.a1,…,t1.an,…,tk.ank, t1.trid,…,tk.trid FROM t1,…,tk WHERE c
backup and recommit all benign transactions manually. •Database triggers (online) – cannot add a trigger for
SELECT statements; SELECT t.trid FROM t WHERE c
SELECT SUM(t.a) FROM t WHERE c GROUP BY t.b
Challenges: (1) How to tell which transactions are benign SELECT SUM(t.a) FROM t WHERE c GROUP BY t.b
and which are malicious? Identifying the initial set of •Database log analysis (offline) – read operations are not
logged; no run-time overhead is its big advantage. UPDATE t SET a1=v1,…,an=vn WHERE c UPDATE t SET a1=v1,…,an=vn, trid=curTrID WHERE c
malicious transactions is not enough because initial
damage can spread over the database by subsequent INSERT INTO t(a1,…,an) VALUES (v1,…,vn) INSERT INTO t(a1,…,an,trid) VALUES (v1,…,vn,curTrID)
•Tracking proxy (online) – a small program sitting between
benign transactions. (2) The amount of data can be huge INSERT INTO trans_dep(curTrID,…)
and the repair process is very error-prone. There is a need a client and a server that intercepts all SQL statements COMMIT
a way to automate it. sent by the client and results sent back by the server. COMMIT
RDB uses both second and third approaches to
Ideally, an intrusion-resilient DBMS should be able to implement dependency tracking.
Track inter-transaction dependencies;
Perform a selective transaction rollback.
Database Repair Performance Results
We propose an implementation framework called RDB
that can render an off-the-self DBMS intrusion resilient The database is repaired by compensating malicious We used TPC-C benchmark to evaluate the run-time
without modifying its internals. RDB has two major transactions. When using RDB, the repair process overhead of JDBC proxy. The size of the test database was
components: tracking subsystem which runs at run-time consists of the following steps: about 4GB.
and recovery subsystem which runs offline.
• Database log analysis to reconstruct complete We varied the following parameters:
RDB inserts a proxy JDBC driver between the DB dependency information and generate compensating Transaction mix (read intensive and read/write intensive);
server and a client that transparently intercepts all transactions; Connection type (local or over a network);
queries and results. The proxy can be either on the
Total footprint size W (effect of database cache);
client side or on the server side. • Dependency graph visualization;
Our results suggest that the overhead of the proxy is between 6% and
Definition of Transaction Dependency • Repairing database by committing compensating 13% for a typical load.
A read set of an SQL statement S is the set of rows transactions.
fetched by this statement. Different DBMSs provide different facilities for log
analysis. We have studied three database servers: Oracle
We will say that statement S2 depends on statement S1 if 9.2.0, PostgreSQL 7.2.2, and Sybase ASE 12.5.
at least one row from the read set of S2 was modified by Eventually, all of them provide enough information to
S1. We will say that transaction T2 depends on transaction The following changes are made to the database at the generate compensating transactions.
T1 if at least one statement of T2 depends on a statement time of its creation:
We used GraphViz – a free
from T1. This definition is prone to both false positives and
graph drawing software
false negatives. Example of a false positive dependency: A new field tr_id is added to each table. It from AT&T.
contains the ID of last transaction that modified a
A1 A2 A3 particular row;
T1: SET A2=5 WHERE The application allows the
100 5 5 Table trans_dep(tr_id:INTEGER,
A1<250 user to select an initial set
200 5 6 T2: SELECT A3 WHERE dep_ids:VARCHAR) – stores IDs of of malicious transactions
A3>3 transactions that depend on transaction tr_id; and computes its transitive
300 1 7 Table annot(tr_id:INTEGER, closure. Then the result
descr:VARCHAR) – stores annotations for can be refined by the user
Also, in general it is impossible to determine all transaction tr_id; to build the final set of Contact Information
transaction dependencies by looking at the traffic transactions to be
compensated. ECSL Lab at SUNY Stony Brook:
between a client and the DB server only because part of The proxy uses its own transaction Ids because there is http://www.ecsl.cs.sunysb.edu
the logic may be inside the application itself. no standard way to access the internal transaction ID of
a database. E-mail: {alexey,chiueh}@cs.sunysb.edu