The document discusses several solutions for performing queries on encrypted data while preserving privacy. It begins by describing a basic solution where all data is encrypted and queries return precisely the encrypted records matching the query conditions. It then presents an enhanced solution using a pseudorandom permutation to hide which attributes are queried. Finally, a high performance solution is introduced using metadata to allow binary searching and improve efficiency, particularly for queries returning few records. The solutions aim to reveal only the minimum necessary information to process queries securely.
1. Zhiqian Yang -
Sheng Zong -
Rebecca N. Wright -
Rohit Ainapure
2. SQL Tables referenced during
presentation
• NURSERY [1]
• NURSERY has eight input attributes:
• parents, has_nurs, form, children, housing, finance, social, health
• The Authors have added ID, so that the table has a Primary Key.
These are the values the attributes can take:
• ID: 1,2,3, more
• parents: usual, pretentious, great_pret
• has_nurs: proper, less_proper, improper, critical, very_crit
• form: complete, completed, incomplete, foster
• children: 1, 2, 3, more
• housing: convenient, less_conv, critical
• finance: convenient, inconv
• social: non-prob, slightly_prob, problematic
• health: recommended, priority, not_recom
• TECHNICAL_PAPER AS paper
• ID, intro, key_aspects, tna_model, table_structure, encryption_method
3. SELECT key_aspects FROM paper
• Data Confidentiality
• Encryption
• Performance of Database (Queries)
• Trust and Attack Model
• Basic Solution
• Enhanced Solution
• High Performance Solution
• Benchmark (Minimal Information Revelation)
• Performance of the system
4. CREATE reVIEW intro AS
SELECT introduction FROM paper
• Security breaches in 2005 – 06 brought to light that the existing
techniques could not ensure that a database is fully immune to
intrusion & unauthorized access. [2]
• Encryption for protection of sensitive data in databases. [3]
• Simpler and obvious solution : – if database server can be fully
trusted, send the encryption key together with the query.
• Impractical Solution : – retrieve all encrypted tables from the
database, decrypt the tables, and then perform queries on the
cleartext.
• There were many others[4][5][6][7][8] – but this one was the
closest : - research titled “Search on encrypted data” [9] – finding
keywords in existing files.
5. From the key_aspects Query
• Basic Solution
• All data is stored and processed in the “SR/sX7uzuUk=“ed format.
• It returns to the user precisely the encryptions in the database that
match the query - R(Q) {set of coordinates of the cells satisfying
the condition of the query Q.
• Trade off privacy for efficiency.
• Enhanced Solution
• For a broad class of tables this solution reveals nothing but the
Minimum Information Revelation.
• Performance Efficient Solution
• Solution adds metadata to further speed up queries
7. more ….
The model assumes that an intruder who has complete access to the
database server for some time should learn very little about the data
stored in the database.
The model is based on the following:
•Database server is vulnerable to intrusion.
•A reasonable assumption for an Intruder – once he has access to the
database, he can not only observe the encrypted data but also control
the whole database system – for a time interval.
•Assume the communication channel between the user and the
database is secure (using the likes of SSL & IPsec).
•User’s front end can be trusted.
•All the data and metadata including user logs and scheme metadata
are to be stored in an encrypted format.
8. SELECT table_structure,encryption_method
FROM paper
• Ti,j - Cell at the intersection of ith row and jth column.
• Jth attribute – Aj has domain Dj
• T’ is the encrypted table where each T’i,j is an encryption
of Ti,j
• Each cell Ti,j of plaintext table is a bit string of exactly k1
bits -> ∀j ∈ [1,m] , Dj⊆ {0,1}^k1
• The Encryption algorithm appends a random string of k2
bits to the plaintext.
k0 = k1 + k2; => k0 is the i/p to the encryption algorithm
using symmetric key encryption, o/p of encryption is
also k0 bits long.
9. Privacy Preserving Queries
• Consider that an intruder may observe upto t Queries -
Q1,...,Qt, where t is a polynomially bounded function of
k0.
• quantify the information leaked by the protocol using a
random variable α. the protocol only reveals α beyond
the minimum information revelation if, after these
queries are processed, what the database intruder has
observed can be simulated by a probabilistic polynomial-
time algorithm using only α, R(Q), and the encrypted
table
• |Pr[Ak0(Q1,...,Qt,q1,...,qt,T′) = 1]−
Pr[Ak0(Q1,...,Qt,S(α,R(Q1),...,R(Qt),T′)) = 1]| < 1/p(k0).
10. SELECT * FROM paper WHERE
key_aspects = “BASIC SOLUTION”
Queries of the format
SELECT * FROM T WHERE Aj = v; v ⊆ Dj
•Encode each cell in a special redundant format for each cell Ti,j,
the encrypted cell T′i,j= (T′i,j<1>,T′i,j<2>) has two parts.
•The first part T′i,j<1>, is a simple encryption of Ti,j using a block
cipher E(); the second part, T′i,j<2>, is a “checksum” that together
with the first part enables the database to check whether this cell
satisfies the condition of the query or not.
•T′i,j<1> and T′i,j<2> satisfy a secret equation determined by the
value of Ti,j.
•? What equation to use as a secret equation ?
11. ekFZ8JZMmdpMaAnLdNeyyM/pX/MKOvMa
aka
The Secret Equation
Ef(Ti,j)(T′i,j<1>) = T′i,j<2>
where f is a function. When the user has a query with condition Aj = v,
the user only needs to send f(v) to the database so that the database
can check, for each i, whether
E f(v) (T′i,j<1>) = T′i,j<2>
•v should not be easily derivable from f(v).
•Hence, define f(v) as an Encryption of v using a Block Cipher – E(.)
•Append a random string to Ti,j before applying E(.) to obtain T’i,j<1>.
(Why ?)
•To avoid having the same f(v) for different attributes, we append j to
f(v) before applying E(.).
12. CREATE TABLE nursery_encrypted
• The user first picks two secret keys s1,s2 from {0,1}^k0 independently
and uniformly. The user keeps s1,s2 secret. For each cell Ti,j, the user
picks ri,j from {0,1}^k2 uniformly at random and stores
• T′ (i,j) △ = (T′ (i,j)<1>,T′ (i,j)<2>)
= (Es1(Ti,j , ri,j), EEs2(Ti,j,j)(Es1(Ti,j,ri,j)))
The above equation is just a Mathematical representation of all that we
have seen so far.
• Denote by Aj the jth attribute of T. Suppose there is a query
select Aj1,..., Ajℓ from T where Aj0= v. To carry out this
query, the user computes q = Es2(v,j0) and sends j0, q, and (j1,...,jℓ) to the
database.
• For i = 1,...,n, the database tests whether T′i,j0<2> = Eq(T′i,j0<1>) holds. For
any i such that the above equation holds, the database returns T
′i,j1<1>,...,T′i,jℓ<1> to the user. The user decrypts each received cell using
secret key s1 and discards the k2-bit tail of the cleartext.
13. nursery_encrypted has been created;
COMMIT; operation took 25 seconds;
but how secure is this table ?
Claim:
The only information revealed beyond the Minimal Information Revelation is the
attributes tested in the “where” condition.
Proof:
Theorem 1: If the block cipher E is a pseudorandom permutation (with the
encryption key as the random seed), the basic protocol reveals only j1,0,...,jt,0
beyond the minimum information revelation.
(Left as an exercise for the enthusiastic minds)
Conclude:
Even if the intruder has access to the whole database, the intruder can learn nothing
about the encrypted data. By combining j1,0,··· ,jt,o with the minimum information
revelation, an intruder can derive some statistical information about the underlying
data or the queries.
14. Performance Evaluations
The Table
•Table now has 9 attributes
•12,960 records
•100,000 data cells
The Environmental setup
•NetBSD Operating System
•AMD Athlon 2GHz
•512M Memory
The Encryption Algorithm
•Blowfish Symmetric Encryption Algorithm
•K0 = 64 bit block size
16. SELECT * FROM nursery WHERE …
Query 1:
SELECT ∗ FROM nursery_encrypted WHERE Parent=usual;
(2 records returned)
Query 2:
SELECT ∗ FROM nursery_encrypted WHERE Class=recommend;
Query 3:
SELECT ∗ FROM nursery_encrypted WHERE Class=very recom;
(4320 records returned)
Query 4:
SELECT ∗ FROM nursery_encrypted WHERE Class=priority;
(3266 records returned)
17. SELECT * FROM paper WHERE key_aspects
= “ENHANCED SECURITY SOLUTION”
Drawback of the Basic Solution
•Reveals which attributes are tested in the “where” clause.
A straightforward solution to this is to randomly permute the attributes
in the encrypted table, in order to make it difficult for a database
intruder to determine which attributes are tested.
Such a random permutation provides an Enhanced Security Solution.
What permutation to choose ?
•Uniform Distribution
•Pseudorandom Permutation
18. more …
• If the permutation is chosen uniformly from all permutations of the
attributes, the user needs to “memorize” where each attribute is
after the permutation. When the number of attributes is large, this is
a heavy burden for the user.
• To eliminate this problem, we use a pseudorandom permutation,
which is by definition indistinguishable from a uniformly random
permutation. [10]
• We do not need to permute all the attributes in the encrypted table.
• For each (i,j), we can keep T′i,j<1> as defined in the basic solution; we
only need to permute the equations satisfied by T′i,j<1> and T′i,j<2>
because only these equations are tested when there is a query.
19. SELECT * FROM paper WHERE key_aspects
= “HIGH PERFORMANCE SOLUTION”
• In the two solutions presented, performing a query on the encrypted
table requires testing each row of the table which is very inefficient
in large-sized databases.
• We can significantly improve the efficiency if we are able to replace
the sequential search in the basic solution with a binary search.
However, our basic solution finds the appropriate rows by testing an
equation, while a binary search cannot be used to find the items that
satisfy an equation.
• To sidestep this difficulty, we add some metadata to eliminate the
need for testing “ekFZ8JZMmdpMaAnLdNeyyM/pX/MKOvMa”
which was our secret equation.
20. Example of
Metadata
• Specifically, for each cell in the column, add a tag
and a link.
• The tag is decided by the value of the cell; the link
points to the cell.
• Sort the metadata according to the order of the
tags.
• When there is a query on the attribute, the user
sends the appropriate tag to the database so that
the database can perform a binary search on the
tags.
21. Query Time: solution
with metadata v.s
basic solution
Clearly, the trend is that the solution with metadata
gains more in efficiency if there are fewer records in
the query result.
However, even for a query with a large number of
records in the result, the solution with metadata is
much faster than the basic solution.
22. COMMIT;
• Investigated privacy-preserving queries on encrypted data.
• Present privacy-preserving protocols for certain types of queries.
(limited support)
• Provided a rigorous, quantitative (and cryptographically strong)
security.
• Provided a high performance, secured solution.
23. SELECT references FROM paper
[1] C. Blake and C. Merz. UCI repository, 1998. http://archive.ics.uci.edu/ml/datasets/Nursery
[2] Eric Dash. Lost credit data improperly kept, company admits. New York Times, June 20 2005.
[3] G. I. Davida, D. L. Wells, and J. B. Kam. A database encryption system with subkeys. ACM
TODS, 6(2):312–328, 1981.
[4] L. Bouganim and P. Pucheral. Chip-secured data access: Confidential data on untrusted
servers. In VLDB, 2002.
[5] Hakan Hacigumus, Balakrishna R. Iyer, and Sharad Mehrotra. Efficient execution of
aggregation queries over encrypted relational databases. In DASFAA, 2004.
[6] J. He and J. Wang. Cryptography and relational database management systems. In Int.
Database Engineering and Application Symposium, 2001
[7] J. Karlsson. Using encryption for secure data storage in mobile database systems. Friedrich-
Schiller-Universitat Jena, 2002.
[8] Gultekin Ozsoyoglu, David Singer, and Sun Chung. Anti-tamper databases: Query-ing
encrypted databases. In Proc. of the 17th Annual IFIP WG 11.3 Working Conference on Database
and Applications Security, 2003.
24. more …
[9] Dawn Xiaodong Song, David Wagner, and Adrian Perrig. Practical techniques for
searches on encrypted data. In IEEE Symposium on Security and Privacy, 2000.
[10] O. Goldreich. Foundations of Cryptography, volume 1. Cambridge University
Press,2001.