Privacy and integrity-preserving range queries in sensor networks
P2 Project
1. BY: MOHAMMED ATHEEQ SHARIEFF
HARSHA VAIDYANATH
AMITH B.K
UNDER GUIDANCE OF: Mr.RAJESH
A project on
Privacy-Preserving Detection
of Sensitive Data Exposure
2. Abstract
The exposure of sensitive data in storage and
transmission poses a serious threat to organizational
and personal security.
Data leak detection aims at scanning content for
exposed sensitive data.
3. In this project, we propose a data-leak
detection (DLD) system.
It can be outsourced and deployed in a semi-honest
detection environment.
This approach works especially well in the case where
consecutive data blocks are leaked.
4. INTRODUCTION
Current applications tend to use sensitive personal
information to improve the quality of their services.
Since third parties are not trusted, the data must
be protected so that individual data privacy is not
compromised while operations on the data remain
possible.
5. We implement and evaluate a new privacy-
preserving data-leak detection system that enables the
data owner to safely deploy it locally, or to delegate the
traffic-inspection task to DLD providers without
exposing the sensitive data.
6. In our model, the data owner computes a special
set of digests or fingerprints from the sensitive data,
and then discloses only a small amount of digest
information to the DLD provider.
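As a rough illustration of how a set of digests could be computed from sensitive data, the sketch below hashes every sliding window (shingle) of the input and keeps a truncated digest of each. The 8-byte window, the 32-bit truncation, the choice of SHA-1 from Python's hashlib, and the function name shingle_fingerprints are assumptions made for this example; the slides do not specify them.

```python
import hashlib

def shingle_fingerprints(data: bytes, window: int = 8, bits: int = 32) -> set[int]:
    """Return truncated SHA-1 fingerprints of every `window`-byte shingle.

    Illustrative only: window size and truncation width are assumptions.
    """
    fingerprints = set()
    for i in range(max(len(data) - window + 1, 0)):
        digest = hashlib.sha1(data[i:i + window]).digest()
        # Keep only the top `bits` bits of the 160-bit digest as an integer.
        fingerprints.add(int.from_bytes(digest, "big") >> (160 - bits))
    return fingerprints

sensitive = b"card number 4111-1111-1111-1111"  # made-up sample data
fps = shingle_fingerprints(sensitive)
print(len(fps), "fingerprints computed")
```

Only such fingerprints, not the sensitive data itself, would need to leave the data owner's machine.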
7. Existing system
The existing system uses the MD5 algorithm.
The MD5 message-digest algorithm is a widely used
cryptographic hash function producing a 128-bit (16-byte) hash
value, typically expressed in text format as a 32-digit
hexadecimal number.
MD5 has been utilized in a wide variety of cryptographic
applications, and is also commonly used to verify data integrity.
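The digest sizes quoted above can be checked with a short sketch using Python's standard hashlib module. The input string is an arbitrary placeholder.

```python
import hashlib

message = b"sensitive data"  # arbitrary placeholder input

digest = hashlib.md5(message).digest()         # raw bytes
hex_digest = hashlib.md5(message).hexdigest()  # text form

print(len(digest) * 8)   # digest size in bits
print(len(digest))       # digest size in bytes
print(len(hex_digest))   # number of hexadecimal digits
```

This prints 128, 16 and 32, matching the figures on the slide.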
8. Disadvantages
The customer or data owner does not need to fully
trust the DLD provider using our approach.
Keywords usually do not cover enough sensitive data
segments for data-leak detection.
It does not aim to provide a remote service.
9. Proposed system
We propose a privacy-preserving data-leak
detection model for preventing inadvertent data leaks
in network traffic.
The DLD provider may learn sensitive information
from the traffic, which is inevitable for all deep
packet inspection approaches.
10. The proposed system uses the Secure Hash Algorithm
(SHA) to generate short and hard-to-reverse digests
through the fast polynomial modulus operation.
11. Advantages
This strong privacy guarantee yields a powerful
application of the fuzzy fingerprint method in the cloud
computing environment.
It provides high detection accuracy.
It has a very low false positive rate.
The privacy guarantee of this approach is much higher.
19. Data Owner
The system enables the data owner to securely
delegate the content-inspection task to DLD providers
without exposing the sensitive data.
The data owner computes a special set of digests or
fingerprints from the sensitive data and then discloses
only a small number of them to the DLD provider.
20. It is the data owner who post-processes the potential
leaks sent back by the DLD provider and determines
whether there is any real data leak.
Sensitive data may also be sent by a legitimate user
for legitimate purposes. The data owner is aware of
legitimate data transfers and permits such transfers.
21. So the data owner can tell whether a piece of
sensitive data in the network traffic is a leak using
legitimate data transfer policies.
23. Fuzzy Fingerprint
To achieve the privacy goal, the data owner
generates a special type of digest.
These digests are called fuzzy fingerprints.
25. • DES works by encrypting groups of 64 message bits
using a 64-bit key,
• Out of which 56 are key bits and the remaining 8 are
check (parity) bits.
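The split between key bits and check bits can be illustrated with a short sketch. It assumes the usual DES convention that the least significant bit of each of the 8 key bytes is an odd-parity bit. The function names set_odd_parity and effective_key_bits are hypothetical, and no encryption is performed here.

```python
def set_odd_parity(key: bytes) -> bytes:
    """Set the lowest bit of each key byte so that every byte has odd parity."""
    out = bytearray()
    for b in key:
        seven = b & 0xFE                  # the 7 real key bits of this byte
        ones = bin(seven).count("1")
        # Add a 1 in the parity position only if the count of ones is even.
        out.append(seven | (0 if ones % 2 else 1))
    return bytes(out)

def effective_key_bits(key: bytes) -> int:
    """Each byte contributes 7 key bits; the eighth bit is a check bit."""
    return len(key) * 7

key = set_odd_parity(bytes.fromhex("0123456789abcdef"))  # example 64-bit key
print(len(key) * 8, "bits stored,", effective_key_bits(key), "key bits")
```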
26. • 2. Secure Hash Algorithm
• The message digest is 160 bits (20 bytes), written as
40 hexadecimal digits.
• It has 80 rounds.
• It produces a short and hard-to-reverse hash
value.
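A 160-bit digest corresponds to SHA-1 in the SHA family. The sketch below uses Python's standard hashlib to confirm the sizes listed above. The input string is an arbitrary placeholder.

```python
import hashlib

h = hashlib.sha1(b"sensitive data")  # arbitrary placeholder input

print(h.digest_size * 8)    # digest size in bits
print(h.digest_size)        # digest size in bytes
print(len(h.hexdigest()))   # number of hexadecimal digits
```

This prints 160, 20 and 40.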
27. • Algorithm structure:
• Step 1: Padding bits
• Step 2: Appending the message length as a 64-bit unsigned integer
• Step 3: Buffer initialization
• Step 4: Processing of message
• Step 5: Output
• For example, the SHA-256 hash code for “www.mytecbits.com” is
575f62a15889fa8ca55514a10754d2f98e30c57c4538f0f3e39dc53114533857.
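Steps 1 and 2 above can be sketched as follows for the SHA variants that use 512-bit (64-byte) blocks, such as SHA-1 and SHA-256. The function name sha_pad is hypothetical. The sketch shows only the padding, not the compression rounds.

```python
import struct

def sha_pad(message: bytes) -> bytes:
    """Pad a message as SHA-1/SHA-256 do before block processing."""
    bit_length = len(message) * 8
    # Step 1: append a single 1 bit (0x80), then zero bytes until the
    # length is 56 modulo 64, leaving room for the 8-byte length field.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: append the original length in bits as a 64-bit big-endian integer.
    padded += struct.pack(">Q", bit_length)
    return padded

padded = sha_pad(b"abc")
print(len(padded))  # total length in bytes, always a multiple of 64
```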
28. Fuzzification prevents the DLD provider from learning
the exact value of a fingerprint.
The data owner transforms each fingerprint into a
fuzzy fingerprint.
All fuzzy fingerprints are collected and form the
output of this operation.
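One way such a transformation could work is to randomize the low-order bits of each fingerprint, so that only the high-order bits remain meaningful to the DLD provider. This is a minimal sketch under that assumption. The slides do not give the exact construction, and the names fuzzify, fuzzy_fingerprints and fuzzy_bits are hypothetical.

```python
import secrets

def fuzzify(fingerprint: int, fuzzy_bits: int = 8) -> int:
    """Hide the low `fuzzy_bits` bits of a fingerprint behind random noise."""
    mask = (1 << fuzzy_bits) - 1
    # XOR with a random value restricted to the low bits; high bits are unchanged.
    return fingerprint ^ (secrets.randbits(fuzzy_bits) & mask)

def fuzzy_fingerprints(fingerprints, fuzzy_bits: int = 8):
    """Transform every fingerprint and collect the results."""
    return [fuzzify(f, fuzzy_bits) for f in fingerprints]

exact = [0xDEADBEEF, 0x12345678]          # made-up 32-bit fingerprints
fuzzy = fuzzy_fingerprints(exact)
print([hex(f >> 8) for f in fuzzy])       # high bits survive, low bits do not
```

A larger fuzzy_bits value hides more of each fingerprint but also makes more unrelated traffic look like a potential leak.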
30. DLD
The DLD provider computes fingerprints from
network traffic and identifies potential leaks in
it.
To prevent the DLD provider from gathering
exact knowledge about the sensitive data,
the collection of potential leaks is composed of
real leaks and noise.
31. It is the data owner who post-processes the
potential leaks sent back by the DLD provider and
determines whether there is any real data leak.
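The division of work between the two parties can be sketched as follows. The example assumes that fuzzy fingerprints share their high-order bits with the exact fingerprints and differ only in the low 8 bits. The slides do not specify this, and the function names and sample values are hypothetical.

```python
def provider_match(traffic_fps, fuzzy_fps, fuzzy_bits: int = 8):
    """DLD provider: flag traffic fingerprints whose high bits match a fuzzy fingerprint."""
    prefixes = {f >> fuzzy_bits for f in fuzzy_fps}
    return [t for t in traffic_fps if (t >> fuzzy_bits) in prefixes]

def owner_postprocess(potential_leaks, exact_fps):
    """Data owner: keep only the candidates that equal a real fingerprint."""
    exact = set(exact_fps)
    return [p for p in potential_leaks if p in exact]

exact_fps = [0xABCD12, 0x55AA34]               # known only to the data owner
fuzzy_fps = [0xABCD99, 0x55AA00]               # what the provider receives
traffic_fps = [0xABCD12, 0xABCD77, 0x123456]   # fingerprints seen in traffic

candidates = provider_match(traffic_fps, fuzzy_fps)
real_leaks = owner_postprocess(candidates, exact_fps)
print([hex(c) for c in candidates])
print([hex(r) for r in real_leaks])
```

Here the provider reports two candidates, and the data owner's post-processing keeps one as a real leak and discards the other as noise.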
32. DLD
The DLD server detects the sensitive data within
each packet on the basis of a stateless filtering
system.
The DLD provider inspects the network traffic for
potential data leaks.
33. The inspection can be performed offline without
causing any real-time delay in routing the packets.
However, the DLD provider may attempt to gain
knowledge about the sensitive data.
35. Data receiver
This operation is run by the data receiver on
each piece of sensitive data.
The data receiver receives the data in
encrypted form.
The data is then decrypted to obtain the plaintext.
41. Title: Data leak detection as a service
Year: 2012
Author: Xiaokui Shu, Danfeng (Daphne) Yao
Methodology: Proposes a network-based data-leak detection (DLD) technique, the main feature of which is that the detection does not require the data owner to reveal the content of the sensitive data. Instead, only a small number of specialized digests are needed.
Advantages: Provides a quantifiable method to measure the privacy guarantee offered by the fuzzy fingerprint framework.
Disadvantages: It is not efficient enough for practical data leak inspection in this setting.
42. Title: Quantifying Information Leaks in Outbound Web Traffic
Year: 2009
Author: Kevin Borders, Atul Prakash
Methodology: Presents an approach for quantifying information leak capacity in network traffic. Instead of trying to detect the presence of sensitive data (an impossible task in the general case), the goal is to measure and constrain its maximum volume.
Advantages: Makes it possible to identify smaller leaks.
Disadvantages: Traffic measurement does not completely stop information leaks from slipping by undetected.
43. Title: Panorama: Capturing system-wide information flow for malware detection and analysis
Year: 2007
Author: H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda
Methodology: We propose a system, Panorama, to detect and analyze malware by capturing this fundamental trait. In our extensive experiments, Panorama successfully detected all the malware samples and had very few false positives.
Advantages: It does send back sensitive information to remote servers in certain settings.
Disadvantages: Detecting malware and analyzing unknown code samples are insufficient and have significant shortcomings.
44. Title: Protecting confidential data on personal computers with storage capsules
Year: 2009
Author: K. Borders, E. V. Weele, B. Lau, and A. Prakash
Methodology: This paper introduces Storage Capsules, a new approach for protecting confidential files on a personal computer. Storage Capsules are encrypted file containers that allow a compromised machine to securely view and edit sensitive files without malware being able to steal confidential data.
Advantages: The system achieves this goal by taking a checkpoint of the current system state and disabling device output before allowing access to a Storage Capsule.
Disadvantages: It does not rely on high integrity.
45. Title: Preventing accidental data disclosure in modern operating systems
Year: 2013
Author: A. Nadkarni and W. Enck
Methodology: This paper presents Aquifer as a policy framework and system for preventing accidental information disclosure in modern operating systems. In Aquifer, application developers define secrecy restrictions that protect the entire user interface workflow defining the user task.
Advantages: The lack of application separation did not expose it as a concern.
Disadvantages: It may not be trusted with that data.
46. Title: Revolver: An automated approach to the detection of evasive web-based malware
Year: 2013
Author: A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna
Methodology: In this paper, we present Revolver, a novel approach to automatically detect evasive behavior in malicious JavaScript. Revolver uses efficient techniques to identify similarities between a large number of JavaScript programs (despite their use of obfuscation techniques, such as packing, polymorphism
Advantages: Revolver has identified several techniques that attackers use to evade existing detection tools by continuously running in parallel with a honeyclient.
Disadvantages: This approach was defeated by static detection of the malicious code using signatures.
47. Title: Gyrus: A framework for user-intent monitoring of text-based networked applications
Year: 2014
Author: Y. Jang, S. P. Chung, B. D. Payne, and W. Lee
Methodology: In this paper, we propose a way to break this cycle by ensuring that a system’s behavior matches the user’s intent. Since our approach is attack agnostic, it will scale better than traditional security systems.
Advantages: Gyrus is very efficient and introduces no noticeable delay to a user’s interaction with the protected applications.
Disadvantages: Gyrus solves the problem by relying on the semantics, but not the timing, of user-generated events.
48. Title: Privacy-preserving scanning of big content for sensitive data exposure with MapReduce
Year: 2015
Author: F. Liu, X. Shu, D. Yao, and A. R. Butt
Methodology: Our solution uses the MapReduce framework for detecting exposed sensitive content, because it has the ability to arbitrarily scale and utilize public resources for the task, such as Amazon EC2. We design new MapReduce algorithms for computing collection intersection for data
Advantages: This transformation supports the secure outsourcing of the data leak detection to untrusted MapReduce and cloud providers.
Disadvantages: A significant portion of the incidents are caused by unintentional mistakes of employees or data owners.
49. Title: Fuzzy keyword search over encrypted data in cloud computing
Year: 2010
Author: J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou
Methodology: In this paper, for the first time we formalize and solve the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy.
Advantages: The proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search.
Disadvantages: Unsuitable in Cloud Computing as it greatly affects system usability, rendering user searching experiences very frustrating and system efficacy very
50. Title: Towards practical avoidance of information leakage in enterprise networks
Year: 2011
Author: J. Croft and M. Caesar
Methodology: In this paper, we propose a network-wide method of confining and controlling the flow of sensitive data within a network. Our approach is based on black-box differencing: we run two logical copies of the network, one with private data scrubbed, and compare outputs of the two to determine if and when
Advantages: purpose schemes that leverage black-box differencing to mitigate leakage of private data.
Disadvantages: It may not be able to monitor encrypted traffic without encryption keys or information flows that are intentionally obfuscated by attackers.
51. Conclusion
Preventing sensitive data from being compromised is an
important and practical research problem.
The proposed system uses the Secure Hash Algorithm (SHA) to
generate short and hard-to-reverse digests through the fast
polynomial modulus operation.
52. Using special digests, the exposure of the sensitive
data is kept to a minimum during detection.
53. References
[1] X. Shu and D. Yao, “Data leak detection as a service,”
in Proc. 8th Int. Conf. Secur. Privacy Commun. Netw.,
2012, pp. 222–240.
[2] K. Borders and A. Prakash, “Quantifying information
leaks in outbound web traffic,” in Proc. 30th IEEE Symp.
Secur. Privacy, May 2009, pp. 129–140.
[3] H. Yin, D. Song, M. Egele, C. Kruegel, and E.
Kirda, “Panorama: Capturing system-wide
information flow for malware detection and analysis,”
in Proc. 14th ACM Conf. Comput. Commun. Secur.,
2007, pp. 116–127.