Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14)
30 – 31, December 2014, Ernakulam, India
That is, the higher the Rscore value, the more redundant the record, and it can be filtered out. The metadata related to the filtered data is then examined for evidence of the intrusion attempt. If any fraudulent activity is identified, a notification is sent to the database owner to alert him about the intrusion. The database owner recovers the details of the attempt using the metadata and performs the reverse action to cancel its effect.
2. RELATED WORK
Database forensics is an important area of digital forensic analysis. Analyzing large amounts of data [3] is a tiresome job and a major challenge in almost all fields, such as engineering, medicine, and attack detection. Many data reduction methods are already available, but they are complex in nature. Besides handling bulk amounts of data, there are many other challenges in database forensics. One is that the technologies used in digital forensics cannot simply be copied over when conducting database forensics. The next challenge is determining whether a modification has occurred in a database [4], how to overcome it if it has, where the investigation should start, and so on. Another is the variety of file formats used by databases. Next, most of the tools used in database forensics are DBMS dependent. Yet another is the anti-forensic attack, which disrupts the forensic investigation process [5]. While a forensic investigation is being carried out, attackers or intruders may attack or disturb the investigation process itself; this is known as an anti-forensic attack. Trail obfuscation and artifact wiping are examples of anti-forensic attacks. Using metadata, we can detect these anti-forensic attacks as well.
Overall, the database forensics process [4] can be divided into three stages: data acquisition and preservation, collection and analysis of artifacts, and database forensic investigation.
Different methods can be used in each of these phases. For the first phase, there are three methods: dead data acquisition, live data acquisition, and hybrid data acquisition. In dead data acquisition [6], the system is turned off and its hardware is removed; the hardware is then attached to a forensic tool to make a copy of it. The disadvantage is that this approach cannot deal with encrypted data. In live data acquisition [6], by contrast, there is no need to go offline: the data is retrieved from RAM while the system is on, which defeats hardware as well as software encryption. The problem with this method is that data may be modified during the acquisition. Hybrid acquisition [6] combines the advantages of dead and live acquisition and can deal with different data formats such as ASCII, binary, etc.
In the second phase, the artifacts for analysis can be collected from different logs such as webserver logs, transaction logs, trace files, etc. The final database forensic investigation can then be carried out using DBMS-dependent tools such as Oracle LogMiner for Oracle and SQL Trace for SQL Server, using Olivier's method, which divides the DBMS into four different layers, or using Fowler's method, which analyzes the volatile as well as non-volatile artifacts of the database.
In the case of data reduction, most of the available methods are based on false-alert reduction. They include the K-Nearest-Neighbors classifier [7] put forward by Law and Kwok; the Naïve Bayes method, based on statistical theory, which assumes that each attribute is independent of the others; ALAC [8], put forward by Pietraszek, which reduces false alerts by classifying the alerts into true and false; and methods based on data clustering, in which the available data are grouped into different classes and their similarity or dissimilarity measure is calculated from their attribute values.
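As a rough illustration of the classification idea surveyed above, a nearest-neighbor vote over labelled alerts might look like the following sketch. The features, labels, and training data here are invented for illustration only and are not taken from the cited works:

```python
# Hypothetical KNN-style false-alert classification sketch.
# Each alert is reduced to two invented features: how frequently its
# source appears in the logs, and its severity.
from collections import Counter

def knn_classify(sample, training, k=3):
    """Label a new alert by majority vote of its k nearest labelled alerts."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(training, key=lambda fl: dist(sample, fl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# (source frequency, severity) -> label; in this toy data, frequent
# low-severity alerts tend to be false alarms.
training = [
    ((0.9, 0.1), "false"), ((0.8, 0.2), "false"), ((0.7, 0.1), "false"),
    ((0.1, 0.9), "true"),  ((0.2, 0.8), "true"),  ((0.1, 0.7), "true"),
]
label = knn_classify((0.85, 0.15), training)   # classified as "false"
```

A real system would extract such features from the alert records themselves; the point here is only the true/false vote over labelled neighbors.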
Compared to these methods, the reduction method presented in this paper, based on relevant key attributes, is much simpler, and the metadata provides explanatory information about the actions performed that other database forensic mechanisms do not. Also, since we use metadata, the method is independent of the database.
3. AN ALGORITHM FOR THE SYSTEM
The overall process of the system can be described as below (fig 1):
• The users of the system perform some action and submit or commit it.
• Filter the log evidence to avoid redundant data.
• Using the metadata of the filtered data, check for an intrusion attempt.
• If an intrusion happens, inform the database owner so that its effect can be cancelled.
The system works as follows. A normal user authenticates into the system and performs some actions. When he submits or commits the actions performed, a query is generated and passed to the database, and this query may contain an intrusion attempt. So, after a submit operation or at definite intervals of time, the logs are automatically filtered using the Rscore algorithm [1] to remove redundant data. The log contains routine data as well as intrusion data, and the routine information must be removed: routine data dominates the overall available data and occurs frequently, whereas intrusion data is scarce and occurs infrequently.
After this, we retrieve the metadata of the filtered data and perform pattern matching against a library of known attacks. If a match occurs, a notification about the attempt is given to the database owner, who performs the reverse action to cancel the effect of the intruder's action.
Fig 1: Algorithm for the system
4. FRAMEWORK OF THE SYSTEM
The framework of the system mainly consists of three stages (fig 2):
• Data Reduction
• Analysis based on metadata
• Intrusion Notification
4.1. Data Reduction Mechanism
There are multiple logs in a database system, and these logs have to be analyzed [9] to carry out the forensic activity. To reduce the data to be analyzed in the forensic analysis, we first need to filter out the unwanted contents. For that, we should understand the difference between intrusion data and routine data: routine data occurs repeatedly and in large quantity, whereas intrusion data occurs infrequently. So we need to remove the redundant routine data.
For that, the key attribute of each transaction in a transaction set is taken and its support value [1] is calculated. The Frequent Itemset Redundant Factor (FIRF) is calculated from this support value, and from it a redundant score is calculated which indicates how redundant the data is. If the FIRF value is high, that particular data is redundant, i.e., it does not show any intrusion behavior, so it is filtered out. This is done using the Rscore algorithm, which can be summarized as follows. We have a database D of n transactions and a set RF consisting of some redundant features.
1. For each itemset X in each transaction t, do the following:
i. If RF contains the itemset X,
ii. totalsupport = totalsupport + support(X)
2. Calculate FIRF as the average of the totalsupport
3. Calculate the redundant score as the sum of totalsupport and FIRF
4. Add the redundant score to the Redundant List
After sorting the Redundant List in descending order, a threshold is set, and based on it the redundant data is filtered out.
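The steps above can be sketched in Python. This is a minimal illustration under assumed data structures (transactions and itemsets as frozensets, and an invented redundant-feature set and threshold); the paper itself fixes none of these:

```python
# Sketch of the Rscore-based log reduction: routine (frequent) transactions
# score high and are filtered out; rare transactions are kept for analysis.

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rscore(transaction, redundant_features, transactions):
    """Redundant score of one transaction: totalsupport plus FIRF."""
    matched = [x for x in redundant_features if x <= transaction]
    totalsupport = sum(support(x, transactions) for x in matched)
    firf = totalsupport / len(matched) if matched else 0.0  # average support
    return totalsupport + firf

def filter_redundant(transactions, redundant_features, threshold):
    """Sort transactions by descending Rscore, keep those below the threshold."""
    scored = [(rscore(t, redundant_features, transactions), t) for t in transactions]
    scored.sort(key=lambda st: st[0], reverse=True)   # the Redundant List
    return [t for score, t in scored if score < threshold]

# Invented toy log: three routine SELECTs and one suspicious DROP.
logs = [frozenset(t) for t in (
    {"SELECT", "users"}, {"SELECT", "users"}, {"SELECT", "users"},
    {"DROP", "audit_log"},
)]
rf = [frozenset({"SELECT", "users"})]      # assumed redundant-feature set RF
kept = filter_redundant(logs, rf, threshold=1.0)   # only the DROP survives
```

With this data the routine transactions get Rscore 1.5 (totalsupport 0.75 plus FIRF 0.75) and are filtered, while the infrequent DROP scores 0 and is retained for the metadata analysis stage.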
Fig 2: Framework of the system
4.2 Using Metadata
In a database, every action performed on it is recorded in different logs such as the audit log, cache, trace files, webserver logs, etc. (fig 3). These logs also contain metadata [10] that describes the data residing in them; the term metadata refers to "data about the data".
By analyzing the metadata we get information such as who performed a particular action, when it occurred, and how it was done [11]. So when any abnormal behavior is identified, this metadata can be used as evidence, and using it we can replay the events that occurred. The metadata is retrieved from the log files [12] using the corresponding utility program of the DBMS in use and is represented as an XML file, which is then used for pattern matching.
Fig 3: Log metadata file — trace files, server logs, cache files, and binary files are parsed into a single log metadata file
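A minimal sketch of the metadata-driven pattern matching described above might look as follows. The XML layout, attribute names, and attack-library entries here are all assumptions, since the paper does not specify a schema:

```python
# Hypothetical log-metadata XML and attack library for illustration.
import xml.etree.ElementTree as ET

METADATA_XML = """
<logmetadata>
  <event user="alice" time="2014-12-30T10:02:00" action="SELECT" object="orders"/>
  <event user="guest" time="2014-12-30T10:05:13" action="DROP" object="audit_log"/>
</logmetadata>
"""

# Assumed attack library: (action, object) patterns regarded as intrusions.
ATTACK_LIBRARY = {("DROP", "audit_log"), ("DELETE", "users")}

def detect_intrusions(xml_text):
    """Return who/when/how details of events that match the attack library."""
    root = ET.fromstring(xml_text)
    alerts = []
    for event in root.iter("event"):
        key = (event.get("action"), event.get("object"))
        if key in ATTACK_LIBRARY:
            alerts.append(dict(event.attrib))   # evidence for the DB owner
    return alerts

alerts = detect_intrusions(METADATA_XML)
for a in alerts:
    print(f"ALERT: {a['user']} performed {a['action']} on "
          f"{a['object']} at {a['time']}")
```

The who/when/how attributes captured in each alert are exactly the evidence the database owner needs to reverse the intruder's action.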
5. CONCLUSION AND FUTURE WORK
In this paper, we present a concept for the prevention of database intrusion through interaction. The log data is first filtered using a reduction algorithm to remove redundant records, and the intrusion is then detected using the metadata of the filtered data. After that, a notification is given to the database owner so that the effect of the attempt can be cancelled. The concept presented here is also independent of the DBMS.
Here, the intrusion is detected by comparison with inference rules for known attacks. Since threats are evolving rapidly, the existing inference rules will not be sufficient; a mechanism to overcome this can be considered as future work.
REFERENCES
[1] Z. He, X. Xu, J.Z. Huang, et al., "FP-Outlier: Frequent Pattern Based Outlier Detection", Computer Science and Information Systems, 2005, 2(1), pp. 103-118.
[2] Jian Zhang, Xiao Fu, Xiaojiang Du, Bin Luo, Zhihing Zhao, "A Method to Automatically Filter Log Evidence for Intrusion Forensics", 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops, pp. 39-44.
[3] Ali Reza Arastch, Mourad Debbabi, Assaad Sakha, Mohamed Saleh, "Analyzing Multiple Logs for Forensic Evidence", Digital Investigation 4S (2007), pp. S82-S91.
[4] O.M. Fasan and M.S. Olivier, "On Dimensions of Reconstruction in Database Forensics", Seventh International Workshop on Digital Forensics & Incident Analysis (WDFIA), 2012.
[5] Slim Rekhis and Noureddine Boudriga, "A System for Formal Digital Forensic Investigation Aware of Anti-forensic Attacks", IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, April 2012.
[6] Seema Yadav, "Analysis of Digital Forensics and Investigation", VSRD-IJCSIT, Vol. 1(3), 2011, pp. 171-178.
[7] Pietraszek, T., "Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection", in: Jonsson, E., Valdes, A., Almgren, M. (eds.), RAID 2004, LNCS, vol. 3325, pp. 102-124, Springer, Heidelberg (2004).
[8] Pietraszek, T., "Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection", in: Jonsson, E., Valdes, A., Almgren, M. (eds.), RAID 2004, LNCS, vol. 3325, pp. 102-124, Springer, Heidelberg (2004).
[9] Florian Buchholz, Eugene Spafford, "On the Role of File System Metadata in Digital Forensics", Digital Investigation 1 (2004), pp. 298-309.
[10] Harmeet Khanuja, Shraddha S. Suratkar, "Role of Metadata in Forensic Analysis of Database Attacks", 2014 IEEE International Advance Computing Conference, pp. 457-462.
[11] Nitin Agrawal, William J. Bolosky, John R. Douceur, and Jacob R. Lorch, "A Five-Year Study of File-System Metadata", ACM Transactions on Storage, 3(3), 2007.
[12] Martin S. Olivier, "On Metadata Context in Database Forensics", Digital Investigation 5 (2009), pp. 115-123.
[13] Dr. Narayan A. Joshi and Dr. D. B. Choksi, "Implementation of Process Forensic for System Calls", International Journal of Advanced Research in Engineering & Technology (IJARET), Volume 5, Issue 6, 2014, pp. 77-82, ISSN Print: 0976-6480, ISSN Online: 0976-6499.