Extracting Authoring Information Based on Keywords and Semantic Search

Faisal Alkhateeb, Amal Alzubi, Iyad Abu Doush
Computer Sciences Department, Yarmouk University, Irbid, Jordan
{alkhateebf,iyad.doush}@yu.edu.jo

Shadi Aljawarneh
Faculty of Science and Information Technology, Al-Isra University, Amman, Jordan
[email protected]

Eslam Al Maghayreh
Computer Sciences Department, Yarmouk University, Irbid, Jordan
[email protected]
ABSTRACT
Many people, in particular researchers, are interested in
searching and retrieving authoring information from online
authoring databases to be cited in their research projects.
In this paper, we propose a novel approach for retrieving authoring information that combines keyword-based and semantic-based approaches. In this approach, the user is interested only in retrieving authoring information matching some specified keywords and ignores how the internal semantic search is processed. Additionally, the approach exploits the semantics and relationships between different resources for better knowledge-based inference.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process
Keywords
Semantic web, RDF, SPARQL, Authoring Information, Keyword Search, Semantic Search
1. INTRODUCTION
The World Wide Web (or simply the web) has become the first source of knowledge for all life domains. It can be seen as an extensive information system that allows the exchange of resources as well as documents. The semantic web is an evolving extension of the web, aiming at giving well-defined form and semantics to web resources (e.g., the content of an HTML web page) [4].
Due to the growth of the semantic web, semantic search has become an attractive approach. The term refers to methods of searching web documents beyond the syntactic level of matching keywords. Exposing metadata is an essential point for a semantic search approach associated with the semantic web. The most important recent development is in the area of embedding metadata directly into web documents. RDF (Resource Description Framework) [15] is a knowledge representation language dedicated to the annotation of resources within the semantic web. Currently, many documents are annotated via RDF due to its simple data model and its formal semantics. For example, it is embedded in (X)HTML web pages using the RDFa language [1], in SMIL documents [7] using RDF/XML [3], etc. SPARQL [17] is a W3C recommendation language developed to query RDF knowledge bases, e.g., to retrieve nodes from RDF graphs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISWSA'10, June 14-16, 2010, Amman, Jordan.
Copyright 2010 ACM 978-1-4503-0475-7/0 /2010 ...$10.00.
Another approach, found in search engines, is based on keywords. More precisely, both queries and documents are typically treated at a word or n-gram level (as in information retrieval). The search engine lacks a semantic-level understanding of the query and can only characterize the content of a document by picking out the most commonly occurring keywords.
The objective of this paper is to provide a novel approach for retrieving authoring information that combines keyword-based and semantic-based approaches. In this approach, the user is interested only in retrieving authoring information matching some specified keywords and ignores how the internal semantic search is processed. In particular, the user is interested in searching authoring information from online authoring information portals (such as DBLP,1 ACM,2 IEEE,3 etc.). For instance: show me all documents of the author "faisal alkhateeb" or the author "jerome euzenat" with a title containing "SPARQL". In the proposed approach, keywords are used for collecting authoring information about the authors, which is then filtered with semantic search (using RDF and SPARQL) based on the semantic relations of the query.
The remainder of the paper is organized as follows: we introduce the research background in Section 2. The combined approach is presented in Section 3, along with a test case illustrating it. A review of related work is given in Section 4. Discussion issues drawn from this study are presented in Section 5.
2. RESEARCH BACKGROUND
This section provides an overview of the elements that are necessary for presenting the proposed approach, namely BibTeX, RDF, and SPARQL.

1 http://www.informatik.uni-trier.de/~ley/db/
2 http://portal.acm.org/portal.cfm
3 http://www.ieee.org/portal/site6
2.1 BibTeX
BibTeX4 [16, 10] is a tool and a file format used to describe and process lists of references, mostly in conjunction with LaTeX documents. BibTeX makes it easy to cite sources in a consistent manner, by separating bibliographic information from its presentation. BibTeX uses a style-independent, text-based file format for lists of bibliography items, such as articles, books, and theses. Each bibliography entry contains some subset of standard data entries: author, booktitle, number, organization, pages, title, type, volume, year, institution, and others. Bibliography entries included in a .bib file are split by type. The following types are understood by virtually all BibTeX styles: article, book, booklet, conference, inproceedings, phdthesis, etc.
Example 1. The following is an instance of a BibTeX
element:
@article{DBLP:AlkhateebBE09,
author = {Faisal Alkhateeb and Jean-Francois
Baget and Jerome Euzenat},
title = {Extending SPARQL with regular expression patterns (for querying RDF)},
journal = {J. web Sem.},
volume = {7},
number = {2},
year = {2009},
pages = {57-73},
}
2.2 RDF
RDF is a language for describing resources. In its abstract
syntax, an RDF document is a set of triples of the form
<subject, predicate, object>.
Example 2. The assertion of the following RDF triples:
{ <ex:person1 foaf:name "Faisal Alkhateeb">,
<ex:document1 BibTeX:author ex:person1>,
<ex:document1 rdf:type BibTeX:inproceedings>,
<ex:document1 BibTeX:title "PSPARQL">,
<ex:person1 foaf:knows ex:person2>,
<ex:person2 foaf:name "Jerome Euzenat">,
<ex:document1 BibTeX:author ex:person2>,
}
means that there exists an inproceedings document, coauthored by two persons named "Faisal Alkhateeb" and "Jerome Euzenat", whose title is "PSPARQL".
An RDF document can be represented by a directed labeled graph, as shown in Figure 1, where the set of nodes is the set of terms appearing as a subject or object in a triple and the set of arcs is the set of predicates (i.e., if <s, p, o> is a triple, then there is an arc labeled p from s to o).
2.3 SPARQL
SPARQL is the query language developed by the W3C for
querying RDF graphs. A simple SPARQL query is expressed
using a form resembling the SQL SELECT query:
4 http://www.bibtex.org/
Figure 1: An RDF graph (the nodes ex:person1, ex:person2, ex:document1, BibTeX:inproceedings, and the literals "Faisal Alkhateeb", "Jerome Euzenat", and "PSPARQL", connected by foaf:name, foaf:knows, BibTeX:author, BibTeX:title, and rdf:type arcs).
SELECT B FROM u WHERE P

where u is the URL of an RDF graph G to be queried, P is a SPARQL graph pattern (i.e., a pattern constructed over RDF graphs with variables), and B is a tuple of variables appearing in P. Intuitively, an answer to a SPARQL query is an instantiation of the variables of B by the terms of the RDF graph G such that substituting these values for the variables of P yields a subset of the graph G.5
Example 3. Consider the RDF graph of Figure 1, representing some possible authoring information. For instance, the existence of the triples {<ex:document1, rdf:type, BibTeX:inproceedings>, <ex:document1, BibTeX:title, "PSPARQL">} asserts that there exists an inproceedings document whose title is "PSPARQL".
The following SPARQL query modeling this information:
SELECT *
FROM <Figure1>
WHERE {
?document BibTeX:author ?author .
?document BibTeX:title "PSPARQL" .
?author foaf:name ?name .
}
could be used, when evaluated against the RDF graph of Fig-
ure 1, to return the following answers:
#   ?document      ?author      ?name
1   ex:document1   ex:person1   "Faisal Alkhateeb"
2   ex:document1   ex:person2   "Jerome Euzenat"
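To make the evaluation concrete, the matching behind Example 3 can be sketched in a few lines of plain Python (an illustrative toy, not the paper's evaluator; the graph below re-encodes the triples of Example 2 as tuples, and `evaluate` is our own name):

```python
# Toy graph: the triples of Example 2, encoded as plain tuples.
GRAPH = {
    ("ex:person1", "foaf:name", "Faisal Alkhateeb"),
    ("ex:document1", "BibTeX:author", "ex:person1"),
    ("ex:document1", "rdf:type", "BibTeX:inproceedings"),
    ("ex:document1", "BibTeX:title", "PSPARQL"),
    ("ex:person1", "foaf:knows", "ex:person2"),
    ("ex:person2", "foaf:name", "Jerome Euzenat"),
    ("ex:document1", "BibTeX:author", "ex:person2"),
}

def evaluate(patterns, graph):
    """Match a list of triple patterns against the graph, joining
    compatible variable bindings. Terms starting with '?' are variables."""
    answers = [{}]
    for pattern in patterns:
        next_answers = []
        for binding in answers:
            for triple in graph:
                b = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        if b.get(term, value) != value:
                            ok = False
                            break
                        b[term] = value
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_answers.append(b)
        answers = next_answers
    return answers

# The WHERE clause of the Example 3 query:
rows = evaluate([
    ("?document", "BibTeX:author", "?author"),
    ("?document", "BibTeX:title", "PSPARQL"),
    ("?author", "foaf:name", "?name"),
], GRAPH)
```

Running the three patterns produces the two answer rows of the table above, one per author of ex:document1.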
RDF has a set of reserved words (called RDF Schema, or simply RDFS [6]) designed to describe the relationships between resources and properties, e.g., classA rdfs:subClassOf classB. RDFS adds constraints to the resources associated with its terms, and thus permits more consequences to be drawn (reasoning).
Example 4. Using the RDF graph presented in Figure 1, we can deduce the triple <ex:document1 rdf:type BibTeX:publications> from the triples <ex:document1 rdf:type BibTeX:inproceedings> and <BibTeX:inproceedings rdfs:subClassOf BibTeX:publications>. Hence, the following SPARQL query:
SELECT *
FROM <Figure1>
WHERE {
?document rdf:type BibTeX:publications .
?document BibTeX:author ?author .
?document BibTeX:title "PSPARQL" .
?author foaf:name ?name .
}

5 When using RDFS semantics [6], this intuitive definition is irrelevant, and one could apply RDFS reasoning rules to calculate answers over RDFS documents.
returns the same set of answers described in Example 3, because inproceedings is a subclass of publications.
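The RDFS entailment used in this example can be sketched as a small fixed-point computation (an illustrative sketch, not the system's reasoner; the function name and the tuple encoding are ours):

```python
# Sketch of RDFS subclass reasoning: a triple <x rdf:type C> plus
# <C rdfs:subClassOf D> entails <x rdf:type D>. We apply the rule
# repeatedly until no new triples are inferred.
def rdfs_type_closure(triples):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        inferred = {
            (x, "rdf:type", d)
            for (x, p1, c) in triples if p1 == "rdf:type"
            for (c2, p2, d) in triples if p2 == "rdfs:subClassOf" and c2 == c
        }
        if not inferred <= triples:
            triples |= inferred
            changed = True
    return triples

g = {
    ("ex:document1", "rdf:type", "BibTeX:inproceedings"),
    ("BibTeX:inproceedings", "rdfs:subClassOf", "BibTeX:publications"),
}
closed = rdfs_type_closure(g)
# closed now also contains ("ex:document1", "rdf:type", "BibTeX:publications")
```

Evaluating the query of Example 4 over the closed graph, rather than the original one, is what yields the extra answers.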
SPARQL provides several result forms other than SELECT that can be used for formatting query results. For example, a CONSTRUCT query can be used for building an RDF graph from the set of answers to the query. More precisely, an RDF graph pattern (i.e., an RDF graph involving variables) is specified in the CONSTRUCT clause. For each answer to the query, the variable values are substituted into the RDF graph pattern, and the merge of the resulting RDF graphs is computed.6 This feature can be viewed as rules over RDF, permitting new relations to be built from the linked data.
Example 5. The following CONSTRUCT query:
CONSTRUCT {?author BibTeX:coauthorof ?document .}
FROM <Figure1>
WHERE {
?document BibTeX:author ?author .
?document BibTeX:title "PSPARQL" .
?author foaf:name ?name .
}
constructs an RDF graph (containing the coauthor relation) by substituting, for each answer found, the values of the variables ?author and ?document, yielding the following graph (encoded in the Turtle language7):
@prefix ex: <http://ex.org/> .
ex:person1 BibTeX:coauthorof ex:document1 .
ex:person2 BibTeX:coauthorof ex:document1 .
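The substitution-and-merge step behind CONSTRUCT can be sketched as follows (illustrative only; `construct` and the binding encoding are our own, not a SPARQL engine's API):

```python
# Sketch of CONSTRUCT semantics: for each answer (a dict of variable
# bindings), substitute into the template triple; merging the results
# as a set removes duplicate triples.
def construct(template, bindings_list):
    out = set()
    for b in bindings_list:
        out.add(tuple(b.get(term, term) for term in template))
    return out

template = ("?author", "BibTeX:coauthorof", "?document")
answers = [
    {"?author": "ex:person1", "?document": "ex:document1"},
    {"?author": "ex:person2", "?document": "ex:document1"},
]
new_graph = construct(template, answers)
# new_graph holds the two coauthorof triples shown in the Turtle above
```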
3. METHODOLOGY
The Extracting Authoring Information system that we have implemented achieves the following:
Given: - A user query in the form of textual keywords.
Find: - A set of BibTeX elements that are relevant to the
query.
The proposed methodology consists of the following major phases: connecting to the Google search engine, connecting to the DBLP page and extracting BibTeX elements, converting the BibTeX elements to RDF and the keywords to a SPARQL query, and then evaluating the SPARQL query against the RDF document. The first two phases deal with extracting author information based on keyword search, while the third and fourth represent the semantic search. In the following, we present the basic workflow of the system as well as its main components.
3.1 System Work Flow
As shown in Figure 2, the system works as follows: the user first enters the keywords to be searched, such as keywords from the author name, the title of the paper, the year of publication, etc. Then, the system uses the Google search engine to correct misspelled keywords (in particular, author names) as well as to find the pages for the corrected keywords (for instance, the DBLP pages of the author). After that, BibTeX elements are extracted and converted to an RDF document. The corrected keywords are transformed into a SPARQL query to be used for querying the RDF document corresponding to the extracted BibTeX elements.

6 A definition of the RDF merge operation can be found at http://www.w3.org/TR/2001/WD-rdf-mt-20010925/#merging.
7 http://www.dajobe.org/2004/01/turtle/
Figure 2: The Basic Flow of the System.
3.2 System Components
The following are the main components of the system:
• Google Search: after the keywords are entered in the corresponding fields, they are passed to a component that connects to the Google engine. That is, the URL "http://www.google.com/search?hl=ar&q=" + "searchParameters" of the Google search engine is used to search for the specified keywords. This search returns one of two cases:

– a correct author name; or
– a misspelled author name.

In the second case, the new search path from the "did you mean" structure is used to reconnect to the Google search engine. This process is repeated until the corresponding author page in the specified authoring database (DBLP, ACM, IEEE, etc.) is found.
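The repeated "did you mean" correction loop can be sketched as follows. Since the paper queries the live Google engine, the suggestion source is injected here as a function so the loop itself can be shown offline; `correct_keywords` and the lookup table are our own illustrative names:

```python
# Sketch of the spelling-correction loop: keep asking the suggestion
# source ("did you mean") until it accepts the keywords or a round
# limit is reached. `suggest` returns a corrected string, or None if
# the spelling is accepted as-is.
def correct_keywords(keywords, suggest, max_rounds=5):
    for _ in range(max_rounds):
        suggestion = suggest(keywords)
        if suggestion is None or suggestion == keywords:
            return keywords
        keywords = suggestion
    return keywords

# Hypothetical suggestion table standing in for Google's responses:
fake_google = {"faisal alkhateb": "faisal alkhateeb"}.get
corrected = correct_keywords("faisal alkhateb", fake_google)
```

In the real system the injected function would issue the Google search and scrape the suggestion; the loop structure is the same.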
• BibTeX extractor: this component is responsible for extracting the BibTeX elements and saving them in a file for later use. It should be noted that this component contains several methods, each specific to one bibliography database, because each bibliography database has its own style of including BibTeX elements in authoring web pages. Therefore, we suggest including BibTeX elements in web pages as RDFa annotations.8

8 http://www.w3.org/TR/xhtml-rdfa-primer/
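A minimal extractor along these lines might scan a page's text for balanced @type{...} records (a sketch under the assumption that entries use balanced braces; it is not the system's per-database routines):

```python
import re

# Sketch of a generic BibTeX extractor: walk the '@', '{' and '}'
# characters of the page text, tracking brace depth, and emit each
# top-level @type{...} record found.
def extract_bibtex(text):
    records, depth, start = [], 0, None
    for m in re.finditer(r"[@{}]", text):
        ch = m.group()
        if ch == "@" and depth == 0:
            start = m.start()
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                records.append(text[start:m.end()])
                start = None
    return records

page = 'noise @article{DBLP:AlkhateebBE09, title = {PSPARQL}} noise'
entries = extract_bibtex(page)
```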
Figure 3: The user interface of the system as well as the found results.
• BibTeX parser: BibTeX elements are then converted to RDF documents using results from the BibTeX parser that we have implemented in the system. Note that if RDFa is used to annotate BibTeX elements, then there is no need for this parser. In that case, the online RDF distiller9 could be used to extract the RDF documents corresponding to the annotated BibTeX elements from web pages. In addition to the RDF triples that correspond to the BibTeX entries, RDF triples corresponding to RDFS relationships (such as <BibTeX:inproceedings rdfs:subClassOf BibTeX:proceedings> and <BibTeX:booklet rdfs:subClassOf BibTeX:book>) are added to the RDF document to allow reasoning to derive more results.
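The BibTeX-to-RDF conversion can be sketched as follows (illustrative field handling with regular expressions; the actual parser implemented in the system is not shown in the paper, and the ex: key naming is our assumption):

```python
import re

# Sketch: each field of a BibTeX entry becomes a
# <doc, BibTeX:field, value> triple, and the entry type becomes an
# rdf:type triple. Nested braces in field values are not handled.
def bibtex_to_triples(entry):
    head = re.match(r"@(\w+)\{([^,]+),", entry)
    etype, key = head.group(1), head.group(2)
    doc = "ex:" + key
    triples = {(doc, "rdf:type", "BibTeX:" + etype)}
    for field, value in re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", entry):
        triples.add((doc, "BibTeX:" + field, value))
    return triples

entry = "@article{AlkhateebBE09, title = {PSPARQL}, year = {2009}}"
t = bibtex_to_triples(entry)
```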
• Keywords to SPARQL query: the entered keywords are also used to build a SPARQL query automatically. This query is then used to filter the results obtained by the keyword-based search. More precisely, when entering keywords, the user selects the type of the data entry, such as "Title", "Author", "Publication", "Pages", and so on. Note that the user can enter multiple authors. If a keyword begins with an underscore "_", the entered keyword is treated as part of the BibTeX data entry; in this case, the "regex" function is used in a FILTER constraint when building the SPARQL query. Otherwise, an exact match on the keyword is required. Moreover, the user can specify the relationship between the entered keywords (i.e., "or" or "and"). When building the SPARQL query, these relationships correspond to the SPARQL "UNION" operator and the conjunction of graph patterns, respectively.

9 http://www.w3.org/2007/08/pyRDFa/
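A simplified version of this query-building step might look as follows (the function and its parameters are our own illustrative names; the generated pattern mirrors the test case in Section 3.3):

```python
# Sketch of keywords-to-SPARQL: exact keywords become triple patterns
# with literal objects; a keyword prefixed with "_" becomes a regex
# FILTER; "or" between authors becomes UNION, "and" plain conjunction.
def build_query(authors, title=None, connector="or"):
    blocks = ['{ ?doc BibTeX:author "%s" . }' % a for a in authors]
    joiner = "\n  UNION\n  " if connector == "or" else "\n  "
    where = "  " + joiner.join(blocks)
    if title:
        if title.startswith("_"):   # substring search on the title
            where += '\n  ?doc BibTeX:title ?title .'
            where += '\n  FILTER (regex(?title, "%s", "i"))' % title[1:]
        else:                        # exact title match
            where += '\n  ?doc BibTeX:title "%s" .' % title
    return "SELECT *\nWHERE {\n%s\n}" % where

q = build_query(["Faisal Alkhateeb", "Jerome Euzenat"], "_sparql")
```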
• Query evaluator: this component evaluates the SPARQL query (i.e., the query obtained from the entered keywords) against the RDF document (i.e., the one obtained from the file containing the BibTeX elements) to find and construct the precise results. Any query evaluator could be used at this stage;10 we have used Jena.11
It should be noted that DBLP provides search capability by allowing users to pose keyword-based queries over its bibliography dataset only. For instance, one can pose the query "alkhateeb|jerome euzenat", which searches for documents matching the keyword "alkhateeb" or "jerome euzenat". The search process in DBLP offers good features: a search is triggered after each keystroke, with instant response times if the network connection is not slow, and matching is case-insensitive [2]. However, a misspelled keyword such as "alkhateb" has no hits, while "alkhateeb" returns five documents. Additionally, the semantic relations are neither fully preserved nor well defined. In particular, the query "alkhateeb|euzenat" returns 79 documents, while putting a space after the pipe, as in "alkhateeb| euzenat", returns only 2 documents. Semantic reasoning is not provided (see Example 4). We avoided these limitations in the proposed methodology.
3.3 Test Case

10 http://esw.w3.org/topic/SparqlImplementations
11 http://jena.sourceforge.net/
Suppose that the user entered "faisal alkhateb" as an author, "jerome euzenat" as another author, and "_sparql" as a title in the interface shown in Figure 3, selected DBLP as the search database, and chose "or" and "and" as the connections between the authors and the title keywords, respectively. Then the query equation is ((Author1 or Author2) and Title) = ((faisal alkhateeb or jerome euzenat) and sparql).

A Google search is performed to check whether the author name exists in DBLP. In this test case, the Google engine corrects the misspelled author name "faisal alkhateb" and uses "faisal alkhateeb" instead to connect to DBLP with the correct name. Then the BibTeX elements corresponding to the keywords "faisal alkhateeb", "jerome euzenat", and "sparql" are extracted from DBLP:
@article{DBLP:AlkhateebBE09,
author = {Faisal Alkhateeb and Jean-Francois
Baget and Jerome Euzenat},
title = {Extending SPARQL with regular expression patterns (for querying RDF)},
journal = {J. web Sem.},
volume = {7},
number = {2},
year = {2009},
pages = {57-73},}
...
The BibTeX elements are then converted to an RDF document such as the one in Example 2. Also, the corrected keywords are used to build the following SPARQL query, which filters the results:

CONSTRUCT { ?doc BibTeX:author "Faisal Alkhateeb" .
?doc BibTeX:author "Jerome Euzenat" .
...
}
FROM <RDF document corresponding to the BibTeX elements>
WHERE {
{ ?doc BibTeX:author "Faisal Alkhateeb" .
?doc BibTeX:title ?title .
?doc BibTeX:year ?year .
?doc BibTeX:pages ?pages . }
UNION
{ ?doc BibTeX:author "Jerome Euzenat" .
?doc BibTeX:title ?title .
?doc BibTeX:year ?year .
?doc BibTeX:pages ?pages . }
FILTER (regex(?title, "sparql", "i"))
}
Note that the keyword "_sparql" begins with an underscore "_" and so is considered part of the title, while other keywords such as "faisal alkhateeb" do not, and are considered full author names. Note also that the user can specify a range for the publication years. For instance: show me the authoring information between "2004" and "2008". In this case, he/she can enter "2004-2008" in the year field, which is in turn converted to the following part of a SPARQL query:

?document BibTeX:hasyear ?year .
FILTER ((?year >= 2004) && (?year <= 2008))
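The year-range conversion just described can be sketched as follows (hypothetical helper name; the property BibTeX:hasyear is taken from the fragment above):

```python
# Sketch: turn a "lo-hi" year field into the SPARQL range fragment.
def year_filter(field):
    lo, hi = field.split("-")
    return ("?document BibTeX:hasyear ?year .\n"
            "FILTER ((?year >= %s) && (?year <= %s))" % (lo, hi))

fragment = year_filter("2004-2008")
```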
4. RELATED WORK
The literature on combining keyword search with semantic search is rich; in this section we provide a brief overview of some relevant proposals.
Semantic web languages (i.e., RDF and OWL) can be used for knowledge encoding and can be used by services, tools, and applications [11]. The semantic web will enable not only humans but also machines to process web content. This can help in creating intelligent services, customized web experiences, and more powerful search engines [9].
Traditional search engines use keywords as their search basis. Semantic search applies semantic processing to keywords for better retrieval. Hybrid search combines the keyword search of regular search engines with the ability of semantic search to query and reason using metadata. Using ontologies, search engines can find pages that have different syntax but similar semantics [9].
Hybrid search provides users with more capabilities for searching and reasoning to get better results. According to Bhagdev et al. [5], three types of queries are possible using hybrid search:

• semantic search using the defined metadata and the relations between instances;
• regular search using keywords;
• search for keywords within specific contents.
Kiryakov et al. [14] proposed a system in which the user can select between keyword-based search and ontology-based search, but he/she cannot merge them to obtain search results using the two approaches together.
Another work by Bhagdev et al. [5] introduced a search method that combines ontology-based and keyword-based methods. Their results show that hybrid search performs better than keyword search or semantic search alone in real-world cases.

Rocha et al. [18] combined ontology-based information retrieval with regular search in a semantic search technique. They used a spreading-activation algorithm to compute an activation value for the relevance of search results to keywords. The links in the ontology are given weights according to certain properties. The proposed method does not promptly identify the unique concepts and relations.
In another work, Gilardoni et al. [12] integrated keyword-based search with ontology search, but with no capability for Boolean queries.
Hybrid search is also implemented by some large companies in industry. Google Product Search12 is a semantic search service from Google that searches for products by linking different attributes in the knowledge base to retrieve a product. Sheth et al. [19] use keyword queries to perform multi-domain search by automatically classifying and extracting information, along with ontology and metadata information.
Guha et al. [13] used a semantic search approach that combines traditional search with other data from distributed sources to answer the user query in more detail. In the work of Davies et al. [8], QuizRDF is introduced: a system that combines the traditional search method with the ability to query and navigate RDF. The system falls short when there is chaining in the query.
5. DISCUSSION
We have presented in this paper an approach for searching and extracting authoring information. The approach combines keyword and semantic search. In the keyword search part, the entered keywords are used to collect authoring information; here, the Google search engine is used to correct misspelled keywords, in particular author names, which allows more results to be returned. Additionally, ad-hoc routines are used to extract bibliography elements from online databases, so we suggest including BibTeX elements in web pages as RDFa annotations so that standard extraction methods can be exploited. In the semantic part, the SPARQL query obtained from the entered keywords is evaluated against the metadata corresponding to the authoring information, which allows more precise results.

12 http://www.google.com/products
6. REFERENCES
[1] Adida, B., and Birbeck, M. RDFa primer - bridging
the human and data webs. Working draft, W3C, 2008.
http://www.w3.org/TR/xhtml-rdfa-primer/.
[2] Bast, H., Mortensen, C. W., and Weber, I.
Output-sensitive autocompletion search. Inf. Retr. 11,
4 (2008), 269–286.
[3] Beckett, D., and McBride, B. RDF/XML syntax
specification (revised). Recommendation, W3C, 2004.
http://www.w3.org/TR/rdf-syntax-grammar/.
[4] Berners-Lee, T., Hendler, J., and Lassila, O.
The semantic web, 2001.
http://www.sciam.com/article.cfm?articleID=
00048144-10D2-1C70-84A9809EC588EF21.
[5] Bhagdev, R., Chapman, S., Ciravegna, F.,
Lanfranchi, V., and Petrelli, D. Hybrid search:
Effectively combining keywords and semantic searches.
In ESWC (2008), pp. 554–568.
[6] Brickley, D., and Guha, R. RDF vocabulary
description language 1.0: RDF schema.
Recommendation, W3C, 2004.
http://www.w3.org/TR/rdf-schema/.
[7] Bulterman, D., Grassel, G., Jansen, J., Koivisto, A., Layaïda, N., Michel, T., Mullender, S., and Zucker, D. Synchronized Multimedia Integration Language (SMIL 2.1). Recommendation, W3C, 2005. http://www.w3.org/TR/SMIL/.
[8] Davies, J., and Weeks, R. QuizRDF: Search technology for the semantic web. In HICSS '04: Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 4 (Washington, DC, USA, 2004), IEEE Computer Society, p. 40112.
[9] Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., and Horrocks, I. The semantic web: the roles of XML and RDF. IEEE Internet Computing (2000), 63–73.
[10] Fenn, J. Managing citations and your bibliography
with BibTeX. The PracTeX Journal 4 (2006).
http://www.tug.org/pracjourn/2006-4/fenn/.
[11] Finin, T., and Ding, L. Search Engines for Semantic
Web Knowledge. In Proceedings of XTech 2006:
Building Web 2.0 (May 2006).
[12] Gilardoni, L., Biasuzzi, C., Ferraro, M., Fonti,
R., and Slavazza, P. Lkms - a legal knowledge
management system exploiting semantic web
technologies. In International Semantic Web
Conference (2005), Y. Gil, E. Motta, V. R. Benjamins,
and M. A. Musen, Eds., vol. 3729 of Lecture Notes in
Computer Science, Springer, pp. 872–886.
[13] Guha, R., McCool, R., and Miller, E. Semantic
search. In WWW ’03: Proceedings of the 12th
international conference on World Wide Web (New
York, NY, USA, 2003), ACM, pp. 700–709.
[14] Kiryakov, A., Popov, B., Terziev, I., Manov, D.,
and Ognyanoff, D. Semantic annotation, indexing,
and retrieval. Web Semantics: Science, Services and
Agents on the World Wide Web 2, 1 (2004), 49 – 79.
[15] Manola, F., and Miller, E. RDF primer.
Recommendation, W3C, 2004.
http://www.w3.org/TR/rdf-primer/.
[16] Patashnik, O. BibTeXing, 1988.
http://ftp.ntua.gr/mirror/ctan/biblio/bibtex/
contrib/doc/btxdoc.pdf.
[17] Prud’hommeaux, E., and Seaborne, A. SPARQL
query language for RDF. Recommendation, W3C,
January 2008.
http://www.w3.org/TR/rdf-sparql-query/.
[18] Rocha, C., Schwabe, D., and Aragao, M. P. A
hybrid approach for searching in the semantic web. In
WWW ’04: Proceedings of the 13th international
conference on World Wide Web (New York, NY, USA,
2004), ACM, pp. 374–383.
[19] Sheth, A., Bertram, C., Avant, D., Hammond,
B., Kochut, K., and Warke, Y. Managing semantic
content for the web. IEEE Internet Computing 6, 4
(2002), 80–87.
contributed articles
Communications of the ACM, March 2010, Vol. 53, No. 3, p. 121
DOI: 10.1145/1666420.1666452
by fabio arduini and Vincenzo morabito
S i n c e t h e S e p t e m b e r 1 1 t h a t ta c k S on the
World
Trade Center,8 tsunami disaster, and hurricane
Katrina, there has been renewed interest in emergency
planning in both the private and public sectors. In
particular, as managers realize the size of potential
exposure to unmanaged risk, insuring “business
continuity” (BC) is becoming a key task within all
industrial and financial sectors (Figure 1).
Aside from terrorism and natural disasters, two
main reasons for developing the BC approach in the
finance sector have been identified as unique to it:
regulations and business specificities.
Regulatory norms are key factors for all financial
sectors in every country. Every organization is required
to comply with federal/national law in addition to
national and international governing bodies. Referring
to business decisions, more and more organizations
recognize that Business Continuity could be and
should be strategic for the good of the business. The
finance sector is, as a matter of fact, a sector in which
the development of information technology (IT) and
information systems (IS) have had a dramatic effect
upon competitiveness. In this sector, organizations
have become dependent upon tech-
nologies that they do not fully compre-
hend. In fact, banking industry IT and
IS are considered production not sup-
port technologies. As such, IT and IS
have supported massive changes in the
ways in which business is conducted
with consumers at the retail level. In-
novations in direct banking would have
been unthinkable without appropriate
IS. As a consequence business continu-
ity planning at banks is essential as the
industry develops in order to safeguard
consumers and to comply with interna-
tional regulatory norms. Furthermore,
in the banking industry, BC planning
is important and at the same time dif-
ferent from other industries, for three
other specific reasons as highlighted
by the Bank of Japan in 2003:
Maintaining the economic activity of ˲
residents in disaster areas2 by enabling
the continuation of financial services
during and after disasters, thereby sus-
taining business activities in the dam-
aged area;
Preventing widespread payment and ˲
settlement disorder2 or preventing sys-
temic risks, by bounding the inability
of financial institutions in a disaster
area to execute payment transactions;
Reduce managerial risks ˲ 2 for example,
by limiting the difficulties for banks
to take profit opportunities and lower
their customer reputation.
Business specificities, rather than
regulatory considerations, should be
the primary drivers of all processes.
Even if European (EU) and US markets
differ, BC is closing the gap. Progres-
sive EU market consolidation neces-
sitates common rules and is forcing
major institutions to share common
knowledge both on organizational and
technological issues.
The financial sector sees business
continuity not only as a technical or
risk management issue, but as a driver
towards any discussion on mergers
and acquisitions; the ability to manage
BC should also be considered a strate-
gic weapon to reduce the acquisition
timeframe and shorten the data center
business
continuity and
the banking
industry
122 c o m m u n i c at i o n s o f t h e a c m | m a r c h 2
0 1 0 | v o l . 5 3 | n o . 3
contributed articles
differences in preparing and imple-
menting strategies that enhance busi-
ness process security. Two approaches
seem to be prevalent. Firstly, there are
those disaster recovery (DR) strate-
gies that are internally and hardware-
focused9 and secondly, there are those
strategies that treat the issues of IT and
IS security within a wider internal-ex-
ternal, hardware-software framework.
The latter deals with IS as an integrat-
ing business function rather than as a
stand-alone operation. We have labeled
this second type of business continuity
approach (BCA).
As a consequence, we define BCA as
a framework of disciplines, processes,
and techniques aiming to provide
continuous operation for “essential
business functions” under all circum-
stances.
More specifically, business continu-
ity planning (BCP) can be defined as “a
collection of procedures and informa-
tion” that have been “developed, com-
piled and maintained” and are “ready
to use - in the event of an emergency
or disaster.”6 BCP has been addressed
by different contributions to the litera-
ture. Noteworthy studies include Julia
Allen’s contribution on Cert’s Octave
methoda1 the activities of the Business
Continuity Institute (BCI) in defining
certification standards and practice
guidelines, the EDS white paper on
Business Continuity Management4 and
merge, often considered one of the top
issues in quick wins and information
and communication technology (ICT)
budget savings.
business continuity concepts
The evolution of IT and IS have chal-
lenged the traditional ways of conduct-
ing business within the finance sector.
These changes have largely represented
improvements to business processes
and efficiency but are not without their
flaws, in as much as business disrup-
tion can occur due to IT and IS sources.
The greater complexity of new IT and IS
operating environments requires that
organizations continually reassess how
best they may keep abreast of changes
and exploit those for organizational ad-
vantage. In particular, this paper seeks
to investigate how companies in the fi-
nancial sector understand and manage
their business continuity problems.
BC has become one of the most im-
portant issues in the banking industry.
Furthermore, there still appears to be
some discrepancy as to the formal defi-
nitions of what precisely constitutes a
disaster and there are difficulties in as-
sessing the size of claims in the crises
and disaster areas.
One definition of what constitutes
a disaster is an incident that leads to
the formal invocation of contingency/
continuity plans or any incident which
leads to a loss of revenue; in other
words, it is any accidental, natural or malicious event which threatens or disrupts normal operations or services for a time long enough to cause the failure of the enterprise. It follows
then that when referring to the size of
claims in the area of organizational cri-
ses and disasters, the degree to which
a company has been affected by such
interruptions is the defining factor.
The definition of these concepts is
important because 80% of those orga-
nizations which face a significant crisis
without either a contingency/recovery
or a business continuity plan, fail to
survive a further year (Business Con-
tinuity Institute estimate). Moreover,
the BCI believes that only a small num-
ber of organizations have disaster and
recovery plans and, of those, few have
been renewed to reflect the changing
nature of the organization.
In observing Italian banking industry practices, there seems to be a similar picture. Finally, referring to banking, there is Business Continuity Planning at Financial Institutions by the Bank of Japan.2 This last study illustrates the process and activities for successful business continuity planning in the following steps:
1. Formulating a framework for robust
project management, where banks
should:
a. Develop basic policy and guidelines for BC planning (basic policy);
b. Develop a study of firm-wide aspects (firm-wide control section);
c. Implement appropriate progress control (project management procedures);
2. Identifying assumptions and condi-
tions for business continuity plan-
ning, where banks should:
a. Recognize and identify the poten-
tial threats, analyze the frequency
of potential threats and identify
the specific scenarios with mate-
rial risk (Disaster scenarios);
b. Focus on continuing prioritized
critical operations (Critical opera-
tions);
c. Target times for the resumption of
operations (Recovery time objec-
tives);
3. Introducing action plans, where
banks should:
a. Study specific measures for busi-
ness continuity planning (BC
measures);
b. Acquire and maintain back-up data (Robust back-up data);
c. Determine the managerial resources and infrastructure availability and capacity required (Procurement of managerial resources);
figure 1. 2004 top business priorities in industrial and financial sectors (source: Gartner).
a The Operationally Critical Threat, Asset, and Vulnerability Evaluation Method of CERT. CERT is a center of Internet security expertise, located at the Software Engineering Institute, a federally funded research and development center operated by Carnegie Mellon University.
contributed articles | March 2010 | Vol. 53 | No. 3 | Communications of the ACM
d. Determine strong time constraints, a contact list and a means of communication for emergency decisions (Decision-making procedures and communication arrangements);
e. Realize practical operational procedures for each department and level (Practical manual);
4. Implement a test/training program on a regular basis (Testing and reviewing).
business continuity aspects
The business continuity approach has
three fundamental aspects that can be
viewed in a systemic way: technology,
people and process.
Firstly, technology refers to the re-
covery of mission-critical data and
applications contained in the disas-
ter recovery plan (DRP). It establishes
technical and organizational measures
in order to face events or incidents with
potentially huge impact that in a worst
case scenario could lead to the unavail-
ability of data centers. Its development
ought to ensure IT emergency proce-
dures intervene and protect the data in
question at company facilities. In the
past, this was, whenever it even existed,
the only part of the BCP.
Secondly, people refers to the recov-
ery of the employees and physical work-
space. In particular, BCP teams should
be drawn from a variety of company departments, including those from personnel, marketing and internal consultants. Also, the managers of these teams should possess general skills, and they should be partially drawn from business areas other than IT departments.
Nowadays this is perceived as essential
to real survival with more emphasis on
human assets and value rather than on
those hardware and software resources
that in most cases are probably protect-
ed by backup systems.
Finally, the term process here refers
to the development of a strategy for the
deployment, testing and maintenance
of the plan. All BCP should be regularly
updated and modified in order to take
into consideration the latest kinds of
threats, both physical as well as tech-
nological.
Whereas a simple DR approach aims
at salvaging those facilities that are sal-
vageable, a BCP approach should have
different foci. One of these ought to be
treating IT and IS security with a wider
internal-external, hardware-software
framework where all processes are nei-
ther in-house nor subcontracted-out
but are a mix of the two so as to be an
integrating business function rather
than a stand alone operation. From
this point of view the BCP constitutes
a dual approach where management
and technology function together.
In addition, the BCP as a global approach must also consider all existing relationships, giving value to clients and suppliers across the total value chain and protecting the business both in-house and out.
The BCP proper incorporates the di-
saster recovery (DR) approach but rejects
its exclusive focus upon facilities. It de-
fines the process as essentially business-
wide and one which enables competitive
and/or organizational advantages.
IT focus versus business focus as a starting point
The starting point for the planning processes that an organization will use as its BCP must include an assessment of the likely impact that different types of ‘incidents’ would make on the business. As far as financial companies are
concerned, IT focus is critical since, as
mentioned, new technologies continue
to become more and more integral to
on going financial activities. In addition
to assessing the likely impact upon the
entire organization, banks must con-
sider the likely effects upon their differ-
ent business areas. The “vulnerability
& business impact matrix” (Figure 2) is
a tool that can be used to summarize
the inter-linkages between the various
information system services, their vul-
nerability and the impact on business
activities. It is useful in different ways.
To start, the BC approach doesn’t fo-
cus solely upon IT problems but rather
uses a business-wide approach. Given
the strategic focus of BCP, an under-
standing of the relationships between
value-creating activities is a key deter-
minant of the effectiveness of any such
process. In this way we can define cor-
rect BC perimeter (Figure 2) by trying to
extract the maximum value from BCP
within a context of bounded rationality
and limited resources. What the BCP
teams in these organizations have done
is focus upon how resources were uti-
lized and how they were added to value-
creation rather than merely being “sup-
port activity” which consumes financial
resources unproductively. In addition,
the convergence of customer with client
technologies also demands that those
managing the BCP process are aware of
the need to “... expand the contingency
role to not merely looking inward but
actually looking out.” Such a dual focus
uncovers the linkages between customer
and client which create competitive ad-
vantage. Indeed, in cases where clients’
business fundamentally depends upon
information exchange, for instance
many banks today provide online equity
brokerage services, it might be argued
that there is a ‘virtual value chain’ which
the BCP team protects thereby provid-
ing the ‘market-space’ for value creation
to take place. Finally, another benefit is
that vulnerability and business impact
can aid the prioritization of particular
key areas.
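The prioritization logic that the matrix supports can be sketched in a few lines. The services and scores below are hypothetical, purely for illustration; the article does not specify a scoring scheme:

```python
# Hypothetical vulnerability & business impact matrix: each information
# system service is scored on vulnerability and business impact
# (1 = low, 3 = high). Ranking by the product of the two scores suggests
# which services belong inside the BC perimeter first.
services = {
    # service: (vulnerability, business_impact) -- illustrative scores only
    "payment processing": (3, 3),
    "online brokerage":   (2, 3),
    "branch e-mail":      (3, 1),
    "intranet portal":    (1, 1),
}

def priority(scores):
    """Combine vulnerability and impact into a single ranking score."""
    vulnerability, impact = scores
    return vulnerability * impact

# Highest-priority services first.
ranked = sorted(services, key=lambda s: priority(services[s]), reverse=True)
print(ranked[0])  # -> payment processing
```

Multiplying (rather than adding) the two scores reflects the intuition that a service must be both vulnerable and business-critical to dominate the BC perimeter.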
figure 2. Vulnerability & business impact matrix
Yet their functions are just as vital to achieving the overall objectives of the football team. The value chain provides an opportunity to examine the connection between the exciting and the humdrum links that deliver
customer value. The evolution of crisis
preparations from the IT focused di-
saster recovery (DR) solutions towards
the BC approach reflects a growing un-
derstanding that business continuity
depends upon the maintenance of all
elements which provide organizational
efficiency-effectiveness and customer
value, whether directly or indirectly.
Prevention focus of
business continuity
A final key characteristic of the BC ap-
proach concerns its primary role in
prevention. A number of authors have
identified that the potential for crises
is normal for organizations.7,11 Crisis
avoidance requires a strategic approach
and requires a good understanding of
both the organization’s operating pro-
cesses, systems and the environment
in which it operates.
In the BC approach, an organization should develop a BCP culture to eliminate the barriers to the development of crisis prevention strategies. In
particular, these organizations should recognize that incidents, such as the New York terrorist attack or the City of London bombings, are merely triggered by external technical causes and that their effects are largely determined by internal factors within the control of their organizations. In these cases a cluster of crises should be identified.
new and obsolete technologies
Today’s approach to BCP is focused on
well-structured process management and
business-driven paradigms. Even if some technology systems seem to be “business as usual,” some considerations must be made to avoid any misleading conjecture from an analytical standpoint.
When considering large institutions with systemic impact (not only on their own but on their clients' businesses as well), two key objectives need to be considered when facing an event. These have been named RPO (Recovery Point Objective) and RTO (Recovery Time Objective), as shown in Figure 3. RPO deals with how far in the past you have to go to resume a consistent situation; RTO considers how long it takes to resume a standard or regular situation. The definitions of RPO and RTO can change according to data center organization and how high a level a company wants its own security and continuity to be.
For instance a dual site recovery sys-
tem organization must consider and
evaluate three points of view (Figure
3). These are: application’s availability,
BC process and data perspective.
Data are first impacted (RPO) before the crisis event (CE), due to the closest “consistent point” from which to restart. The crisis opening (CO), or declaration, occurs after the crisis event (CE).
“RTO_s,” or computing environ-
ment restored point, considers the
length of time the computing environ-
ment needs in order to be restored (for
example, when servers, network etc.
are once again available); “RTO_rc,” or
mission critical application restarted
point, indicates the “critical or vital ap-
plications” (in rank order) are working
once again; “RTO_r,” or applications
and data restored point, is the point
from which all applications and data
are restored, but (and it is a big but)
“RTO_end,” or previous environment
restored point, is the true end point
when the previous environment is fully
restored (all BC solutions are properly
working). Of the utmost importance
is that during the period between
“RTO_r” and “RTO_end” a second di-
saster event could be fatal!
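The sequence of milestones above can be modeled as offsets from the crisis event. The class and the figures below are hypothetical, a minimal sketch rather than anything specified in the article:

```python
from dataclasses import dataclass

@dataclass
class RecoveryTimeline:
    """Milestones of a crisis, in hours relative to the crisis event (CE) at t = 0.

    rpo:     last consistent data point before CE (negative offset)
    rto_s:   computing environment restored
    rto_rc:  mission-critical applications restarted
    rto_r:   all applications and data restored
    rto_end: previous environment fully restored (BC solutions working again)
    """
    rpo: float
    rto_s: float
    rto_rc: float
    rto_r: float
    rto_end: float

    def data_loss_window(self) -> float:
        # Work done between the last consistent point and CE is lost.
        return -self.rpo

    def exposure_window(self) -> float:
        # Between RTO_r and RTO_end a second disaster could be fatal.
        return self.rto_end - self.rto_r

# Hypothetical dual-site recovery scenario (illustrative numbers only).
t = RecoveryTimeline(rpo=-2.0, rto_s=4.0, rto_rc=8.0, rto_r=24.0, rto_end=72.0)
print(t.data_loss_window())  # -> 2.0 (hours of data to re-enter)
print(t.exposure_window())   # -> 48.0 (hours of heightened exposure)
```

Note how the "big but" in the text shows up as the exposure window: the gap between RTO_r and RTO_end during which a second event would find the BC solutions not yet rearmed.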
Natural risks are also increasing in scope and frequency, both in terms of floods (central Europe, 2002) and hurricanes (U.S., 2005); hence the coining of a recommended geographical recovery distance, today considered to be more than 500 miles. Such distances are forcing businesses and institutions alike to consider new technological approaches and to undertake critical discussion of synchronous-asynchronous data replication: its intervals and quality. Therefore, more complex analysis of RPO and RTO is required.
However the most important issue,
from a business point of view when
faced with an imminent and unfore-
seen disaster, is how to reduce restore
or restart time, trying to shrink this window to mere seconds or less. New emerging technologies (SATA, or Serial ATA, and MAID, or Massive Array of Idle Disks) are beginning to make some progress in reducing the time problem.
business focus versus value chain focus
The business area selected by the “vul-
nerability and business impact analy-
sis matrix” should be treated in accor-
dance with the value chain and value
system. In addition to assessing the
likely disaster impact upon IT depart-
ments, organizations should consider
disaster impacts over all company de-
partments and their likely effects upon
customers. Organizations should avoid
the so-called Soccer Star Syndrome.6
In drawing an analogy with the football
industry, one recognizes that greater
management attention is often focused
on the playing field rather than the un-
glamorous, but very necessary, locker
room and stadium management support activities. Defenders and goalkeepers, let alone the stadium manager, do not get paid at the same level as the star players.
figure 3. RPO & RTO
Such clusters should be categorized along the axis of internal-external
and human/social-technical/economic
causes and effects. By adopting a strate-
gic approach, decisions could be made
about the extent of exposure in particu-
lar product markets or geographical
sites. An ongoing change management
program could contribute to real com-
mitment from middle managers who,
from our first investigation, emerged
as key determinants of the success of
the BC approach.
management support
and sponsorship
BCP success requires the commitment
of middle managers. Hence manag-
ers need to avoid considering BCP as
a costly, administrative inconvenience
that diverts time away from money-
making activities. All organizational
levels should be aware of the fact that
BCP was developed in partnership be-
tween the BCP team and front line op-
eratives. As a result, strategic business
units should own BCP plans. In addi-
tion, CEO involvement is key in rallying
support for the BCP process.
Two other key elements support
the BC approach. Firstly, there is the
recognition that responsibility for the
process rests with business managers
and this is reinforced through a formal
appraisal and other reward systems.
Secondly, peer pressure is deemed important in getting laggards to assume responsibility and so effect a more receptive culture.
Finally, BCP teams need to regard
BCP as a process rather than as a spe-
cific end-point.
conclusion
Although the risk of terrorism and
regulations are identified as two key
factors for developing a business con-
tinuity perspective, we see that orga-
nizations need to adopt the BC ap-
proach for strategic reasons. The trend
to adopt a BC approach is also a proxy
for organizational change in terms of
culture, structure and communica-
tions. The BC approach is increasingly
viewed as a driver to generate competi-
tive advantage in the form of resilient
information systems and as an impor-
tant marketing characteristic to attract
and maintain customers.
Referring to organizational change
and culture, the BC approach should
be a business-wide approach and not
an IT-focused one. It needs supportive
measures to be introduced to encour-
age managers to adhere to the BC idea.
Management as a whole should also be
confident that the BC approach is an
ongoing process and not only an end
point that remains static upon comple-
tion. It requires changes of key assump-
tions and values within the organiza-
tional structure and culture that lead to
a real cultural and organizational shift.
This has implications for the role that
the BC approach has to play within the
strategic management processes of the
organization as well as within the levels
of strategic risk that an organization
may wish to undertake in its efforts to secure a sustainable competitive or so-called first-mover advantage.
References
1. Allen J.H. CERT® Guide to System and Network
Security Practices. Addison Wesley Professional, 2001.
2. Bank of Japan, Business Continuity Planning at
Financial Institutions, July 2003. http://www.boj.or.jp/
en/type/release/zuiji/kako03/fsk0307a.htm
3. Cerullo, V. and Cerullo, J. Business continuity planning: A comprehensive approach. Information Systems Management Journal, Summer 2004.
4. Decker A. Business continuity management: A model
for survival. EDS White Paper, 2004.
5. Dhillon, G. The challenge of managing information
security. In International Journal of Information
Management 1, 1(2004), 243–244.
6. Elliott, D. and Swartz, E. Just waiting for the next big bang: Business continuity planning in the UK finance sector. Journal of Applied Management Studies 8, 1 (1999), 45-60.
7. Greiner, L. Evolution and revolution as organisations
grow. In Harvard Business Review (July/August)
reprinted in Asch, D. & Bowman, C. (Eds) (1989)
Readings in Strategic Management (London,
Macmillan), 373-387.
8. Lam, W. Ensuring business continuity. IT Professional 4, 3 (2002), 19-25.
9. Lewis, W., Watson, R.T., and Pickren, A. An empirical assessment of IT disaster risk. Comm. ACM 46, 9 (2003), 201-206.
10. McAdams, A.C. Security and risk management:
A fundamental business issue. Information
Management Journal 38, 4 (2004), 36–44.
11. Pauchant, T.C. and Mitroff, I. Crisis prone versus crisis
avoiding organisations: is your company’s culture its
own worst enemy in creating crises?. Industrial Crisis
Quarterly 2, 4 (1998), 53-63.
12. Quirchmayr, G. Survivability and business continuity
management. In Proceedings of the 2nd Workshop on
Australasian Information Security, Data Mining and
Web Intelligence, and Software Internationalisation.
ACSW Frontiers (2004).
Vincenzo Morabito ([email protected]) is an assistant professor of Organization and Information Systems at Bocconi University in Milan, where he teaches management information systems, information management and organization. He is also Director of the Master of Management Information Systems at Bocconi University.
Fabio Arduini ([email protected]) is responsible for IT architecture and business continuity, defining the technological and business continuity statements for the Group within the ICT department.
© 2010 ACM 0001-0782/10/0300 $10.00
The Anti-Forensics Challenge
Kamal Dahbur
[email protected]
Bassil Mohammad
[email protected]
School of Engineering and Computing Sciences
New York Institute of Technology
Amman, Jordan
ABSTRACT
Computer and Network Forensics has emerged as a new field in IT that is aimed at acquiring and analyzing digital evidence for the purpose of solving cases that involve the use, or more accurately misuse, of computer systems. Many scientific techniques, procedures, and technological tools have evolved and been effectively applied in this field. On the opposite side, Anti-Forensics has recently surfaced as a field that aims at circumventing the efforts and objectives of the field of computer and network forensics. The purpose of this paper is to highlight the challenges introduced by Anti-Forensics, explore the various Anti-Forensics mechanisms, tools and techniques, provide a coherent classification for them, and discuss their effectiveness thoroughly. Moreover, this paper highlights the challenges in implementing effective countermeasures against these techniques. Finally, a set of recommendations is presented along with further research opportunities.
Categories and Subject Descriptors
K.6.1 [Management of Computing and Information
Systems]: Projects and People Management – System Analysis
and Design, System Development.
General Terms
Management, Security, Standardization.
Keywords
Computer Forensics (CF), Computer Anti-Forensics (CAF),
Digital Evidence, Data Hiding.
1. INTRODUCTION
The use of technology is increasingly spreading, covering various aspects of our daily lives. An equal, if not greater, increase is realized in the methods and techniques created with the intention to misuse these technologies, serving varying objectives, be they political, personal or anything else. This has clearly been reflected in our terminology as well, where new terms like cyber warfare, cyber security, and cyber crime, amongst others, were introduced. It is also noticeable that such attacks are getting increasingly more sophisticated, and are utilizing novel methodologies and techniques. Fortunately, these attacks leave traces on the victim systems that, if successfully recovered and analyzed, might help identify the offenders and consequently resolve the case(s) justly and in accordance with applicable laws. For this purpose, new areas of research emerged addressing Network Forensics and Computer Forensics in order to define the foundation, practices and acceptable frameworks for scientifically acquiring and analyzing digital evidence to be presented in support of filed cases. In response to forensics efforts, Anti-Forensics tools and techniques were created with the main objective of frustrating forensics efforts and tainting their credibility and reliability.
This paper attempts to provide a clear definition for Computer
Anti-Forensics and consolidates various aspects of the topic. It
also presents a clear listing of seen challenges and possible
countermeasures that can be used. The lack of clear and
comprehensive classification for existing techniques and
technologies is highlighted and a consolidation of all current
classifications is presented.
Please note that the scope of this paper is limited to Computer Forensics. Even though it is a related field, Network Forensics is not discussed in this paper and can be tackled in future work. Also, this paper is not intended to cover specific Anti-Forensics tools; however, several tools are mentioned to clarify the concepts.
After this brief introduction, the remainder of this paper is
organized as follows: section 2 provides a description of the
problem space, introduces computer forensics and computer
anti-forensics, and provides an overview of the current issues
concerning this field; section 3 provides an overview of related
work with emphasis on Anti-Forensics goals and classifications;
section 4 provides detailed discussion of Anti-Forensics
challenges and recommendations; section 5 provides our
conclusion, and suggested future work.
2. THE PROBLEM SPACE
Rapid changes and advances in technology are impacting every
aspect of our lives because of our increased dependence on such
systems to perform many of our daily tasks. The achievements in the area of computer technology, in terms of increased capabilities of machines, high-speed communication channels, and reduced costs, have made it attainable by the public. The popularity of the Internet, and consequently the technology associated with it, has skyrocketed in the last decade (see Table 1 and Figure 1). Internet usage statistics for 2010 clearly show the huge increase in Internet users, who may not necessarily be computer experts or even technology savvy [1].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISWSA’11, April 18–20, 2011, Amman, Jordan.
Copyright 2011 ACM 978-1-4503-0474-0/04/2011…$10.00.
WORLD INTERNET USAGE AND POPULATION STATISTICS

World Regions           | Population (2010 Est.) | Internet Users Dec. 31, 2000 | Internet Users Latest Data | Growth 2000-2010
Africa                  | 1,013,779,050          | 4,514,400                    | 110,931,700                | 2357%
Asia                    | 3,834,792,852          | 114,304,000                  | 825,094,396                | 622%
Europe                  | 813,319,511            | 105,096,093                  | 475,069,448                | 352%
Middle East             | 212,336,924            | 3,284,800                    | 63,240,946                 | 1825%
North America           | 344,124,450            | 108,096,800                  | 266,224,500                | 146%
Latin America/Caribbean | 592,556,972            | 18,068,919                   | 204,689,836                | 1033%
Oceania/Australia       | 34,700,201             | 7,620,480                    | 21,263,990                 | 179%
WORLD TOTAL             | 6,845,609,960          | 360,985,492                  | 1,966,514,816              | 445%

Table 1. World Internet Usage – 2010 (Reproduced from [1]).
Figure 1. World Internet Usage – 2010 (Based on Data from [1])
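As a quick sanity check on Table 1, the Growth column is simply the percentage increase from the Dec. 2000 user count to the latest count. The helper below is illustrative only, not part of the cited statistics source:

```python
# Growth 2000-2010 in Table 1 = (latest - 2000 count) / 2000 count * 100,
# rounded to the nearest whole percent.
def growth_pct(users_2000: int, users_latest: int) -> int:
    """Percentage growth from the 2000 user count to the latest count."""
    return round((users_latest - users_2000) / users_2000 * 100)

print(growth_pct(4_514_400, 110_931_700))      # Africa -> 2357
print(growth_pct(360_985_492, 1_966_514_816))  # World  -> 445
```

Both values reproduce the table's Africa (2357%) and world total (445%) figures.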
Unfortunately, some technology users will not use it in a legitimate manner; instead, some users may deliberately misuse it. Such misuse can result in many harmful consequences including, but not limited to, major damage to others' systems or prevention of service for legitimate users. Regardless of the objectives that such “bad guys” might be aiming for from such misuse (e.g., personal, financial, political or religious purposes), one common goal for such users is the need to avoid detection (i.e., source determination). Therefore, these offenders will exert thought and effort to cover their tracks to avoid any liabilities or accountability for their damaging actions. Illegal actions (or crimes) that involve a computing system, either as a means to carry out the attack or as a target, are referred to as Cybercrimes [2]. Computer crime and Cybercrime are two terms that are used interchangeably to refer to the same thing. A Distributed Denial of Service (DDoS) attack is a good example of a computer crime where the computing system is used as a means as well as a target. Fortunately, cybercrimes leave fingerprints
that investigators can collect, correlate and analyze to
understand what, why, when and how a crime was committed;
and consequently, and most importantly, build a good case that
can bring the criminals to justice. In this sense, computers can be seen as a great source of evidence. For this purpose Computer Forensics (CF) emerged as a major area of interest, research and development, driven by the legislative need for a scientifically reliable framework, practices, guidelines, and techniques for forensics activities, from evidence acquisition and preservation through analysis and, finally, presentation. Computer Forensics can be defined as the process of scientifically obtaining, examining and analyzing digital information so that it can be used as evidence in civil, criminal or administrative cases [2]. A more formal definition of Computer Forensics is “the discipline that combines elements of law and computer science to collect and analyse data from computer systems, networks, wireless communications, and storage devices in a way that is admissible as evidence in a court of law” [3].
To hinder the efforts of Computer Forensics, criminals work doggedly to instigate, develop and promote counter techniques and methodologies, or what is commonly referred to as Anti-Forensics. If we adopt the definition of Computer Forensics (CF) as scientifically obtaining, examining, and analysing digital information to be used as evidence in a court of law, then Anti-Forensics can be defined similarly but in the opposite direction. In Computer Anti-Forensics (CAF), scientific methods are used simply to frustrate forensics efforts at all forensics stages. This includes preventing, impeding, and/or corrupting the acquisition of the needed evidence, its examination, its analysis, or its credibility; in other words, whatever is necessary to ensure that computer evidence cannot get to, or will not be admissible in, a court of law.
The use of Computer Anti-Forensics tools and techniques is evident and far from being an illusion; criminals' reliance on technology to cover their tracks is not a mere claim, as clearly reflected in recent research conducted on reported and investigated incidents. Based on the 2009-2010 Data Breach Investigations Reports [4][5], investigators found signs of anti-forensics usage in over one third of cases in 2009 and 2010, with the most common forms being the same for both years. The results show that the overall use of anti-forensics remained relatively flat, with slight movement among the techniques themselves. Figure 2 below shows the types of anti-forensics techniques used (data wiping, data hiding and data corruption) by percentage of breaches. As shown in Figure 2, data wiping is still the most common, because it is supported by many commercial off-the-shelf products, available even as freeware, that are easy to install, learn and use; data hiding and data corruption remain a distant second.
Figure 2. Types of Anti-Forensics – 2010 (Reproduced from [5])
It is important to note that a lack of understanding of what CAF is and what it is capable of may lead to underestimating or even overlooking CAF's impact on the legitimate efforts of CF. Therefore, when dealing with computer forensics, it is important that we address the following questions, among others, that are related to CAF: Do we really have everything? Is the collected evidence really what was left behind, or only what was intentionally left for us to find? How do we know that the CF tool used was not misleading us due to certain weaknesses in the tool itself? Are these CF tools developed according to proper secure software engineering methodologies? Are these CF tools immune against attacks? What are the recent CAF methods and techniques? This paper attempts to provide some answers to such questions that can assist in developing a proper understanding of the issue.
3. RELATED WORK, CAF GOALS AND
CLASSIFICATIONS
Even though computer forensics and computer anti-forensics are tightly related, as if they are two faces of the same coin, the amount of research they have received is not the same. CF has received more focus over the past ten years or so because of its relation to other areas like data recovery, incident management and information systems risk assessment. CF is a little older, and therefore more mature, than CAF. It has a consistent definition, a well-defined systematic approach and a complete set of leading best practices and technology.
CAF, on the other side, is still a new field, and is expected to mature over time and become closer to CF. In this effort, recent research papers have attempted to introduce several definitions and various classifications and to suggest some solutions and countermeasures. Some researchers have concentrated more on the technical aspects of CF and CAF software in terms of vulnerabilities and coding techniques, while others have focused primarily on understanding file systems, hardware capabilities, and operating systems. A few other researchers chose to address the issue from an ethical or social angle, such as privacy concerns. Despite the criticality of CAF, it is hard to find a comprehensive study that addresses the subject in a holistic manner by providing a consistent definition, structured taxonomies, and an inclusive view of CAF.
3.1. CAF Goals
As stated in the previous section, CAF is a collection of tools and techniques intended to frustrate CF tools and investigators' efforts. This field is increasingly receiving interest and attention as it continues to expose the limitations of currently available computer forensics techniques as well as challenge the presumed reliability of common CF tools.
believe, along with other researchers, that the advancements in
the CAF field will eventually put the necessary pressure on CF
developers and vendors to be more proactive in identifying
possible vulnerabilities or weaknesses in their products, which
consequently should lead to enhanced and more reliable tools.
CAF can have a broad range of goals, including: avoiding
detection of event(s), disrupting the collection of information,
increasing the time an examiner needs to spend on a case, and
casting doubt on a forensic report or testimony. In addition,
these goals may also include: forcing the forensic tool to reveal
its presence, using the forensic tool to attack the organization
in which it is running, and leaving no evidence that an
anti-forensic tool has been run [6].
3.2. CAF Classifications
Several classifications for CAF have been introduced in the
literature. These taxonomies differ in the criteria used for
classification. The following are the most common approaches:
1. Categories Based on the Attacked Target
• Attacking Data: The acquisition of evidentiary data is a
primary goal of the forensics process. In this category, CAF
tools seek to complicate this step by wiping, hiding or
corrupting evidentiary data.
• Attacking CF Tools: The major focus of this category
is the examination step of the forensics process. The
objective of this category is to make the examination
results questionable, not trustworthy, and/or
misleading by manipulating essential information
like hashes and timestamps.
• Attacking the Investigator: This category is aimed at
exhausting the investigator’s time and resources,
leading eventually to the termination of the
investigation.
2. CAF Techniques vs. Tactics
This categorization makes a clear distinction between the terms
anti-forensics and counter-forensics [7], even though the two
terms have been used interchangeably by many others, as the
emphasis is usually on technology rather than on tactics.
• Counter-Forensics: This category includes all techniques that
target the forensics tools directly to cause them to crash,
erase collected evidence, and/or break completely (thus
preventing the investigator from using them). Compression
bombs are a good example of this category.
• Anti-Forensics: This category includes all technology-related
techniques, including encryption, steganography, and alternate
data streams (ADS).
3. Traditional vs. Non-Traditional
• Traditional Techniques: This category includes techniques
involving overwriting data, cryptography, steganography, and
other generic data hiding approaches.
• Non-Traditional Techniques: As opposed to traditional
techniques, these techniques are more creative and pose more
risk, as they are harder to detect. These include:
o Memory injections, where all malicious activities are
carried out in volatile memory.
o Anonymous storage, which utilizes available web-based
storage to hide data so that it cannot be found on local
machines.
o Exploitation of CF software bugs, including Denial of
Service (DoS) attacks and crashers, amongst others.
4. Categories Based on Functionality
This categorization includes data hiding, data wiping and
obfuscation. Attacks against CF processes and tools are
considered a separate category under this scheme.
4. CAF CHALLENGES
Because Computer Anti-Forensics (CAF) is a relatively new
discipline, the field faces many challenges that need to be
considered and addressed. In this section, we attempt to identify
the most pressing challenges surrounding this area, highlight the
research needed to address them, and provide perceptive answers
to some of the concerns.
4.1. Ambiguity
Aside from having no industry-accepted definition for CAF,
studies in this area view anti-forensics differently; this leads
to the absence of a clear set of standards or frameworks for this
critical area. Consequently, misunderstanding may be an
unavoidable end result that could lead to improperly addressing
the associated concerns. The current classification schemes
stated above, which mostly reflect each author's viewpoint and
probably background, both confirm and contribute to the
ambiguity in this field. A classification can only be beneficial
if it has clear criteria that assist not only in categorizing the
currently known techniques and methodologies but also in enabling
proper understanding and categorization of new ones. The attempt
to distinguish between the two terms, anti-forensics and
counter-forensics, based on technology versus tactics is a good
initiative, but it still requires more elaboration to avoid
unnecessary confusion.
To address the definition issue, we suggest adopting a definition
for CAF that is built on our clear understanding of CF. The
classification issue can be addressed by narrowing the gaps
amongst the different viewpoints in the current classifications
and excluding the odd ones.
4.2. Investigation Constraints
A CF investigation has three main constraints/challenges, namely:
time, cost and resources. Every CF investigation case should be
approached as a separate project that requires proper planning,
scoping, budgeting and resourcing. If these elements are not
properly accounted for, the investigation will eventually fail,
with most efforts up to the point of failure being wasted. In
this regard, CAF techniques and methodologies attempt to attack
the time, cost and resource constraints of an investigation
project. An investigator may not be able to afford the additional
costs or allocate the additional necessary resources. Most
importantly, the time factor might play a critical role in the
investigation, as evidentiary data might lose value with time,
and delays may allow the suspect(s) the opportunity to cover
their tracks or escape. Most, if not all, CAF techniques and
methodologies (including data wiping, data hiding, and data
corruption) attempt to exploit this weakness. Therefore, proper
project management is imperative before and during every CF
investigation.
4.3. Integration of Anti-Forensics into Other
Attacks
Recent research shows an increased adoption of CAF techniques
into other typical attacks. The primary purposes of integrating
CAF into other attacks are undetectability and deletion of
evidence. Two major areas for this threatening integration are
malware and botnets [8][9]. Malware and botnets, when armed with
these techniques, make investigative efforts labour- and
time-intensive, which can lead to overlooking critical evidence,
if not abandoning the entire investigation.
4.4. Breaking the Forensics Software
CF tools are, of course, created by humans, just like other
software systems. Rushing to release their products to the
market before their competition, companies tend to,
unintentionally, introduce vulnerabilities into their products. In
such cases, software development best practices, which are
intended to ensure the quality of the product, might be
overlooked leading to the end product being exposed to many
known vulnerabilities, such as buffer overflow and code
injection. Because CF software is ultimately used to present
evidence in courts, the existence of such weaknesses is not
tolerable. Hence, all CF software, before being used, must be
subjected to thorough security testing that focuses on robustness
against data hiding and accurate reproduction of evidence.
The Common Vulnerabilities and Exposures (CVE) database is
a great source for getting updates on vulnerabilities in existing
products [10]. Some studies have reported several weaknesses
that may result in crashes during runtime leaving no chance for
interpreting the evidence [11]. Although some of these weaknesses
are still being disputed [12], it is important to be aware that
CF tools are not immune to vulnerabilities, and that CAF tools
would most likely take advantage of such weaknesses. A good
example of a common technique that can cause a CF tool to fail or
crash is the "Compression Bomb", where files are compressed
hundreds of times such that when a CF tool tries to decompress
them, it consumes so many resources that the computer or the tool
hangs or crashes.
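The decompression-bomb threat can be countered by capping how much output an examining tool is willing to produce. The following is a minimal sketch in Python, assuming zlib-compressed input; the 10 MB cap and the sample payload size are arbitrary illustrations, not recommendations:

```python
import zlib

def safe_decompress(data: bytes, max_output: int = 10 * 1024 * 1024) -> bytes:
    """Decompress zlib data, refusing to expand past max_output bytes.

    A bare zlib.decompress(data) has no such cap, so a tiny "compression
    bomb" payload can exhaust memory and hang or crash the examining tool.
    """
    d = zlib.decompressobj()
    out = d.decompress(data, max_output)
    # Leftover input, or an unfinished stream, means more output was pending.
    if d.unconsumed_tail or not d.eof:
        raise ValueError("possible compression bomb: output cap exceeded")
    return out

# A bomb-like payload: 30 MB of zeros shrinks to a few tens of kilobytes.
bomb = zlib.compress(b"\x00" * (30 * 1024 * 1024))
```

With this guard, `safe_decompress(bomb)` raises an error instead of expanding the full payload, while small legitimate inputs decompress normally.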
4.5. Privacy Concerns
Increasingly, users are becoming more aware that simply deleting
a file does not make it truly disappear from the computer and
that it can be retrieved by several means. This awareness is
driving the market for software solutions that provide safe and
secure means of file deletion. Such tools are marketed as
"privacy protection" software and claim the ability to completely
remove all traces of information concerning a user's activity on
a system, websites, images and downloaded files. Some of these
tools not only provide protection through secure deletion but
also offer encryption and compression. Moreover, these tools are
easy to use, and some can even be downloaded for free. WinZip is
a popular tool that offers encryption, password protection, and
compression. Such tools will most definitely complicate the
search for and acquisition of evidence in any CF investigation
because they make the whole process more time- and
resource-consuming.
Privacy issues in relation to CF have been the subject of
detailed research in an attempt to define appropriate policies
and procedures that would maintain users' privacy when excessive
data is acquired for forensics purposes [13].
4.6. Nature of Digital Evidence
CF investigations rely on two main assumptions to be
successful: (1) the data can be acquired and used as evidence,
and (2) the results of the CF tools are authentic, reliable, and
believable. The first assumption highlights the importance of
digital evidence as the basis for any CF investigation; while the
second assumption highlights the critical role of the
trustworthiness of the CF tools in order for the results to stand
solid in courts.
Digital evidence is more challenging than physical evidence
because it is more susceptible to being altered, hidden, removed,
or simply made unreadable. Several techniques can be utilized to
achieve such undesirable objectives, complicating the acquisition
of evidentiary digital data and thus compromising the first
assumption.
CF tools rely on many techniques that can attest to their
trustworthiness, including but not limited to hashing,
timestamps, and signatures during the examination, analysis and
inspection of source files. CAF tools can in turn utilize new
advances in technology to break such authentication measures, and
thus compromise the second assumption.
The following is a brief explanation of some of the techniques
that are used to compromise these two assumptions:
• Encryption is used to make the data unreadable. This is one
of the most challenging techniques, as advances in encryption
algorithms and tools allow it to be applied to an entire hard
drive, selected partitions, or specific directories and files.
In all cases, an encryption key, usually unknown to the
investigator, is needed to reverse the process and decrypt the
desired data. To complicate matters, decryption using
brute-force techniques becomes infeasible when long keys are
used. More success in this regard might be achieved with
keyloggers or volatile memory content acquisition.
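The brute-force infeasibility point can be made concrete with a little keyspace arithmetic. The sketch below assumes a hypothetical attacker testing one billion keys per second; both that rate and the chosen key sizes are illustrative assumptions:

```python
ASSUMED_RATE = 1e9  # keys tried per second -- a hypothetical attacker

def years_to_exhaust(key_bits: int, rate: float = ASSUMED_RATE) -> float:
    """Years needed to try every key of the given length at the given rate."""
    seconds = (2 ** key_bits) / rate
    return seconds / (60 * 60 * 24 * 365)

for bits in (40, 56, 128, 256):
    print(f"{bits:3d}-bit key: ~{years_to_exhaust(bits):.3g} years to exhaust")
```

At this rate a 56-bit keyspace falls in a couple of years, while a 128-bit keyspace already requires on the order of 10^22 years, which is why investigators turn to keyloggers or memory acquisition instead.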
• Steganography aims at hiding data by embedding it into
another digital form, such as images or videos. Commercial
steganalysis tools that can detect hidden data exist and can
be utilized to counter steganography. Encryption and
steganography can be combined to both obscure data and make it
unreadable, which can extremely complicate a CF investigation.
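The embedding idea can be illustrated with the classic least-significant-bit (LSB) scheme. The toy functions below hide one secret bit in the low bit of each cover byte; this is a deliberate simplification of what real image-based tools do:

```python
def embed_lsb(cover: bytes, secret: bytes) -> bytes:
    """Hide the secret's bits in the least-significant bit of cover bytes."""
    bits = [(byte >> i) & 1 for byte in secret for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for secret")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear low bit, set it to secret bit
    return bytes(out)

def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Recover n_bytes of hidden data from the low bits of stego bytes."""
    bits = [b & 1 for b in stego[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k : k + 8]))
        for k in range(0, len(bits), 8)
    )
```

Because only the low bit of each cover byte changes, the carrier looks essentially unchanged to casual inspection, while statistical steganalysis can still flag the altered bit distribution.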
• Secure-Deletion removes the target data completely from
the source system, by overwriting it with random data, and
thus rendering the target data unrecoverable. Fortunately,
most of the available commercial secure-deletion tools tend
to underperform and thus miss some data [14]. More
research is needed in this area to understand the weaknesses
and identify the signatures of such tools. Such information
is needed to detect the operations and minimize the impact
of these tools.
• Hashing is used by CF tools to validate the integrity of
data. A hashing algorithm accepts a variable-size input, such
as a file, and generates a fixed-size value that corresponds
to the given input. The generated output can be used as a
fingerprint for the input file. Any change in the original
file, no matter how minor, will result in a considerable
change in the hash value produced by the hashing algorithm. A
key feature of hashing algorithms is "irreversibility": having
the hash value in hand does not allow recovery of the original
input. Another key feature is "uniqueness", which in practice
means that the hash values of two files will be equal only if
the files are identical (collisions are theoretically possible
but should be computationally infeasible to find). Many
hashing algorithms have been developed, and some, such as MD5
and SHA-1, have already been infiltrated or cracked by
collision attacks, while others, such as SHA-2, remain harder
to break. However, all are vulnerable to being infiltrated as
technology and research advance [15]. Research is also
necessary in the other direction to enhance the capabilities
of CF tools in this regard and maintain their credibility.
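The fingerprint and avalanche properties are easy to demonstrate with Python's standard hashlib; SHA-256 is used here as an example of a currently strong algorithm:

```python
import hashlib

a = hashlib.sha256(b"The quick brown fox").hexdigest()
b = hashlib.sha256(b"The quick brown foy").hexdigest()  # one letter changed

print(a)
print(b)
# The two digests share almost nothing: a one-byte edit to the input
# flips roughly half of the output bits (the "avalanche" effect), which
# is exactly why a hash works as a fingerprint of evidentiary data.
```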
• Timestamps are associated with files and are critical for
establishing the chain of events during a CF investigation.
The timeline of events is contingent on the accuracy of
timestamps. CAF tools provide the capability to modify the
timestamps of files or logs, which can mislead an
investigation and consequently skew its conclusion. Many tools
currently exist on the market, some even freely available,
that make it easy to manipulate timestamps, such as Timestamp
Modifier and SKTimeStamp [16].
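Timestamp manipulation requires nothing exotic; Python's standard library alone can back-date a file, as the sketch below shows. Note that on POSIX systems the call itself refreshes the inode change time (ctime), one residual artifact an examiner can still inspect:

```python
import os
import tempfile
import time

# Create a scratch file, then back-date its access/modification times a year.
fd, path = tempfile.mkstemp()
os.close(fd)

year_ago = time.time() - 365 * 24 * 60 * 60
os.utime(path, (year_ago, year_ago))  # set (atime, mtime) explicitly

print(time.ctime(os.stat(path).st_mtime))  # now reports last year's date
```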
• File Signatures, also known as Magic Numbers, are constant
known values at the beginning of each file that identify the
file type (e.g. image file, word document, etc.). Hexadecimal
editors, such as WinHex, can be used to view and inspect these
values. Forensics investigators rely on these values to search
for evidence of a certain type. When a file extension is
changed, the actual file type is not changed, and thus the
file signature remains unchanged. CAF tools intentionally
change file signatures in an attempt to mislead investigations
so that some evidence files are overlooked or dismissed.
Complete listings of file signatures, or magic numbers, can be
found on the web [17].
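A hypothetical type-sniffing check along the lines described above; the signature table here is a small excerpt (full listings are referenced in [17]):

```python
from typing import Optional

# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip (also docx/xlsx)",
}

def sniff_type(header: bytes) -> Optional[str]:
    """Identify a file by its leading bytes, ignoring the file extension."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None

# Renaming evidence.png to notes.txt leaves these leading bytes untouched:
print(sniff_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # -> png
```

A check of this kind is exactly why an investigator can trust content over extensions, and why CAF tools that rewrite the signature bytes themselves are more dangerous than a simple rename.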
• CF Detection is simply the capability of CAF tools to detect
the presence of CF software and its activities or
functionality. The Self-Monitoring, Analysis and Reporting
Technology (SMART) built into most hard drives reports the
total number of power cycles (Power_Cycle_Count), the total
time that a hard drive has been in use (Power_On_Hours or
Power_On_Minutes), a log of high temperatures that the drive
has reached, and other manufacturer-determined attributes.
These counters can be reliably read by user programs and
cannot be reset. Although the SMART specification includes a
DISABLE command (SMART 96), experimentation indicates that the
few drives that actually implement it continue to keep track
of the time-in-use and power cycle count and make this
information available after the next power cycle. CAF tools
can read SMART counters to detect attempts at forensic
analysis and alter their behavior accordingly. For example, a
dramatic increase in Power_On_Minutes might indicate that the
computer's hard drive has been imaged [18].
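A sketch of how a CAF tool (or an examiner checking for one) might watch these counters. The report lines below are fabricated for illustration, mimicking the tabular attribute rows printed by common SMART reporting utilities; the 12-hour threshold is an arbitrary assumption:

```python
def power_on_hours(smart_report: str) -> int:
    """Pull the raw Power_On_Hours value out of a SMART attribute listing."""
    for line in smart_report.splitlines():
        if "Power_On_Hours" in line:
            return int(line.split()[-1])  # raw value is the last column
    raise ValueError("Power_On_Hours attribute not found")

# Fabricated attribute rows, before and after a suspected imaging session:
before = "  9 Power_On_Hours  0x0032  099  099  000  Old_age  Always  -  1201"
after = "  9 Power_On_Hours  0x0032  099  099  000  Old_age  Always  -  1219"

jump = power_on_hours(after) - power_on_hours(before)
if jump > 12:  # many hours of unaccounted activity since the last check
    print(f"drive ran {jump} unexplained hours -- possible imaging session")
```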
• Business Needs: Cloud Computing (CC) is a business model
typically suited for small and medium enterprises (SMEs) that
do not have enough resources to invest in building their own
IT infrastructure. Hence, they tend to outsource this to third
parties, who in turn lease their infrastructure, and possibly
applications, as services. This new model introduces more
challenges to CF investigations, mainly due to the fact that
the data is in the cloud (i.e. hosted somewhere in the
Internet space), transferred across countries with different
regulations, and, most importantly, might reside on a machine
that hosts data instances of other enterprises. In some
instances, the data of a single enterprise might even be
stored across multiple data centres [19][20]. These issues
make the CF's primary functions (i.e. data acquisition,
examination, and analysis) needed to build a good case
extremely hard.
4.7 Recommendations
Based on our findings, we see room for improvement in the field
of CAF that can address some of the issues surrounding it. We
believe that the following recommendations, when adopted and/or
implemented properly, can add value and consolidate the efforts
to advance this field. Below is a brief explanation of each
recommendation:
a) Spend More Effort to Understand CAF
More effort should be spent in order to reach an agreed-upon,
comprehensive definition for CAF that would assist in gaining a
better understanding of the concepts in the field. These efforts
should also extend to developing acceptable best practices,
procedures and processes that constitute a proper framework, or
standard, that professionals can use and build on. CAF
classifications also need to be integrated, clarified, and
formulated on well-defined criteria. Such foundational efforts
would eventually assist researchers and experts in addressing the
issues and mitigating the associated risks.
Awareness of CAF techniques and their capabilities will prevent,
or at least reduce, their success and consequently their impact
on CF investigations. Knowledge in this area should encompass
both techniques and tactics. Continued education and research are
necessary to stay atop the latest developments in the field, and
to be ready with appropriate countermeasures when and as
necessary.
b) Define Laws that Prohibit Unjustified Use of CAF
The existence of strict and clear laws that detail the
obligations and consequences of violations can play a key
deterrent role against the use of these tools in a destructive
manner. When someone knows in advance that having certain CAF
tools on one's machine might be questioned and possibly pose some
liability, one would probably have second thoughts about
installing such tools.
Commercial, non-specialized CAF tools, which are the more
commonly used, always leave easily detectable fingerprints and
signatures. They sometimes also fail to fulfil their developers'
promises of deleting all traces of data. This can later be used
as evidence against a suspected criminal and can lead to an
indictment. The proven unjustified use of CAF tools can be used
as supporting incriminatory evidence in courts in some countries
[21].
To address privacy concerns, such as users' need to protect
personal data like family pictures or videos, an approved list of
authorized software can be compiled with known fingerprints,
signatures and special recovery keys. Such information,
especially the recovery keys, would then be safeguarded in the
possession of the proper authorities and strictly used to reverse
the process of CAF tools through the appropriate judicial
processes.
c) Utilize Weaknesses of CAF Software
In some cases, digital evidence can still be recovered if a data
wiping tool is poorly used or functions improperly. Hence, each
CAF software package must be carefully examined and continuously
analyzed in order to fully understand its exact behaviour and
determine its weaknesses and vulnerabilities [14][22]. This can
help in developing the appropriate courses of action for
different possible scenarios and circumstances, which could prove
valuable in saving time and resources during an investigation.
d) Harden CF Software
CAF and CF thrive on each other's weaknesses. To ensure justice,
CF must always strive to be more advanced than its counterpart.
This can be achieved by conducting security and penetration tests
to verify that the software is immune to external attacks. It is
also imperative not to submit to market pressure and demand by
rapidly releasing products without proper validation. The best
practices of software development must not be overlooked at any
rate. When vulnerabilities are identified, proper fixes and
patches must be tested, verified and deployed promptly in order
to avoid zero-day attacks.
5. CONCLUSION AND FUTURE WORK
5.1. Conclusion
Computer Anti-Forensics (CAF) is an important developing area of
technology. Because CAF success means that digital evidence will
not be admissible in courts, Computer Forensics (CF) must
evaluate its techniques and tactics very carefully. CF efforts
must also be integrated and expedited to narrow the currently
existing gap with CAF. It is important to agree on an acceptable
definition and classification for CAF, which will assist in
implementing proper countermeasures. Current definitions and
classifications all seem to concentrate on specific aspects of
CAF without truly providing the needed holistic view.
It is very important to realize that CAF is not only about tools
that are used to delete, corrupt, or hide evidence. CAF is a
blend of techniques and tactics that utilize technological
advancements in areas like encryption and data overwriting,
amongst others, to obstruct investigators' efforts.
Many challenges exist and need to be carefully analyzed and
addressed. In this paper we attempted to identify some of these
challenges and suggested some recommendations that might, if
applied properly, mitigate the risks.
5.2. Future Work
This paper provides a solid foundation for future work that can
further elaborate on the various highlighted areas. It suggests a
definition for CAF that is closely aligned with CF and presents
several classifications that we deem acceptable. It also
discusses several challenges that can be further addressed in
future research. CAF technologies, techniques, and tactics need
to receive more research attention, especially in the areas of
hashes, timestamps, and file signatures. Research opportunities
in computer forensics, network forensics, and anti-forensics can
use the work presented in this paper as a base. Privacy concerns
and other issues related to the forensics field introduce a raw
domain that requires serious consideration and analysis. Cloud
computing, virtualization, and concerns about related laws and
regulations are topics that can be considered in future research.
6. REFERENCES
[1] Corey Thuen, University of Idaho, "Understanding
Counter-Forensics to Ensure a Successful Investigation".
DOI=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.2196
[2] Internet Usage Statistics, "The Internet Big Picture: World
Internet Users and Population Stats".
DOI=http://www.internetworldstats.com/stats.htm
[3] Bill Nelson, Amelia Phillips, and Steuart, "Guide to Computer
Forensics and Investigations", 4th Edition, pp 2-3.
[4] US Computer Emergency Readiness Team (US-CERT), "Computer
Forensics", 2008.
[5] Verizon Business, "2009 Data Breach Investigations Report". A
study conducted by the Verizon RISK Team in cooperation with the
United States Secret Service.
DOI=http://www.verizonbusiness.com/about/news/podcasts/1008a1a3-111=129947--Verizon+Business+2009+Data+Breach+Investigations+Report.xml
[6] Verizon Business, "2010 Data Breach Investigations Report". A
study conducted by the Verizon RISK Team in cooperation with the
United States Secret Service.
DOI=http://www.verizonbusiness.com/resources/reports/rp_2010-data-breach-report_en_xg.pdf?&src=/worldwide/resources/index.xml&id=
[7] Simson Garfinkel, "Anti-Forensics: Techniques, Detection and
Countermeasures", 2nd International Conference on i-Warfare and
Security, pp 77, 2007.
[8] W. Matthew Hartley, "Current and Future Threats to Digital
Forensics", ISSA Journal, August 2007.
[9] Murray Brand, "Forensics Analysis Avoidance Techniques of
Malware", Edith Cowan University, Australia, 2007.
[10] "Security 101: Botnets".
DOI=http://www.secureworks.com/research/newsletter/2008/05/
[11] Common Vulnerabilities and Exposures (CVE) database,
http://cve.mitre.org/
[12] Tim Newsham, Chris Palmer, Alex Stamos, "Breaking Forensics
Software: Weaknesses in Critical Evidence Collection", iSEC
Partners, http://www.isecpartners.com, 2007.
[13] Guidance Software: Computer Forensics Solutions and Digital
Investigations, http://www.guidancesoftware.com/
[14] S. Srinivasan, "Security and Privacy vs. Computer Forensics
Capabilities", ISACA Online Journal, 2007.
[15] Matthew Geiger, Carnegie Mellon University, "Evaluating
Commercial Counter-Forensic Tools", Digital Forensic Research
Workshop (DFRWS), 2005.
[16] Xiaoyun Wang and Hongbo Yu, Shandong University, China, "How
to Break MD5 and Other Hash Functions", EUROCRYPT 2005, pp 19-35,
May 2005.
[17] How to Change the Timestamp of a File in Windows.
DOI=http://www.trickyways.com/2009/08/how-to-change-timestamp-of-a-file-in-windows-file-created-modified-and-accessed/
[18] File Signature Table.
DOI=http://www.garykessler.net/library/file_sigs.html
[19] S. McLeod, "SMART Anti-Forensics".
DOI=http://www.forensicfocus.com/smart-anti-forensics
[20] Stephen Biggs and Stilianos, "Cloud Computing Storms",
International Journal of Intelligent Computing Research (IJICR),
Volume 1, Issue 1, March 2010.
[21] U. Gurav, R. Shaikh, "Virtualization - A Key Feature of
Cloud Computing", International Conference and Workshop on
Emerging Trends in Technology (ICWET 2010), Mumbai, India.
[22] U.S. v. Robert Johnson - Child Pornography Indictment.
DOI=http://news.findlaw.com/hdocs/docs/chldprn/usjhnsn62805ind.pdf
[23] United States of America v. H. Marc Watzman.
DOI=http://www.justice.gov/usao/iln/.../2003/watzman.pdf
[24] Mark Whitteker, "Anti-Forensics: Breaking the Forensics
Process", ISSA Journal, November 2008.
[25] Gary C. Kessler, "Anti-Forensics and the Digital
Investigator", Champlain College, USA.
[26] Ryan Harris, "Arriving at an anti-forensics consensus:
examining how to define and control the anti-forensics problem".
DOI=www.elsevier.com/locate/dinn
Appendix A: Anti-Forensics Tools
The following is a list of some commercial CAF software packages
available on the market. The tools listed below are intended as
examples; none of these tools were purchased or tested as part of
this work.
Privacy and Secure Deletion: Privacy Expert; SecureClean;
PrivacyProtection; Evidence Eliminator; Internet Cleaner
File and Disk Encryption: TrueCrypt; PointSec; WinZip 14
Timestamp Modifiers: SKTimeStamp; Timestamp Modifier; Timestomp
Others: The Defiler's Toolkit (Necrofile and Klismafile);
Metasploit Anti-Forensic Investigation Arsenal (known
affectionately as MAFIA)
Download and read the following articles available in the ACM
Digital Library:
Arduini, F., & Morabito, V. (2010, March). Business continuity
and the banking industry. Communications of the ACM, 53(3),
121-125
Dahbur, K., & Mohammad, B. (2011). The anti-forensics
challenge. Proceedings from ISWSA '11: International
Conference on Intelligent Semantic Web-Services and
Applications. Amman, Jordan.
Write a five to seven (5-7) page paper in which you:
1. Consider that Data Security and Policy Assurance methods
are important to the overall success of IT and Corporate data
security.
a. Determine how defined roles of technology, people, and
processes are necessary to ensure resource allocation for
business
continuity.
b. Explain how computer security policies and data retention
policies help maintain user expectations of levels of business
continuity that could be achieved.
c. Determine how acceptable use policies, remote access policies,
and email policies could help minimize any anti-forensics
efforts. Give an example with your response.
2. Suggest at least two (2) models that could be used to ensure
business continuity and ensure the integrity of corporate
forensic
efforts. Describe how these could be implemented.
3. Explain the essentials of defining a digital forensics process
and provide two (2) examples on how a forensic recovery and
analysis
plan could assist in improving the Recovery Time Objective
(RTO) as described in the first article.
4. Provide a step-by-step process that could be used to develop
and sustain an enterprise continuity process.
5. Describe the role of incident response teams and how these
accommodate business continuity.
6. There are several awareness and training efforts that could be
adopted in order to prevent anti-forensic efforts.
a. Suggest two (2) awareness and training efforts that could
assist in preventing anti-forensic efforts.
b. Determine how having a knowledgeable workforce could
provide a greater level of secure behavior. Provide a rationale
with
your response.
c. Outline the steps that could be performed to ensure
continuous effectiveness.
7. Use at least three (3) quality resources in this assignment.
Note: Wikipedia and similar Websites do not qualify as quality
resources.
Your assignment must follow these formatting requirements:
· Be typed, double spaced, using Times New Roman font (size
12), with one-inch margins on all sides; citations and references
must follow APA or school-specific format. Check with your
professor for any additional instructions.
· Include a cover page containing the title of the assignment, the
student’s name, the professor’s name, the course title, and the
date. The cover page and the reference page are not included in
the required assignment page length.
The specific course learning outcomes associated with this
assignment are:
· Describe and apply the 14 areas of common practice in the
Department of Homeland Security (DHS) Essential Body of
Knowledge.
· Describe best practices in cybersecurity.
· Explain data security competencies to include turning policy
into practice.
· Describe digital forensics and process management.
· Evaluate the ethical concerns inherent in cybersecurity and
how these concerns affect organizational policies.
· Create an enterprise continuity plan.
· Describe and create an incident management and response
plan.
· Describe system, application, network, and
telecommunications security policies and response.
· Use technology and information resources to research issues in
cybersecurity.
· Write clearly and concisely about topics associated with
cybersecurity using proper writing mechanics and technical
style conventions.
Extracting Authoring Information Based on Keywords and Semantic Search

  • 1. Extracting Authoring Information Based on Keywords and Semantic Search

     Faisal Alkhateeb, Amal Alzubi, Iyad Abu Doush
     Computer Sciences Department, Yarmouk University, Irbid, Jordan
     {alkhateebf,iyad.doush}@yu.edu.jo

     Shadi Aljawarneh
     Faculty of Science and Information Technology, Al-Isra University, Amman, Jordan
     [email protected]

     Eslam Al Maghayreh
     Computer Sciences Department, Yarmouk University, Irbid, Jordan
     [email protected]

     ABSTRACT
     Many people, in particular researchers, are interested in

  • 2. searching and retrieving authoring information from online authoring
     databases to be cited in their research projects. In this paper, we
     propose a novel approach for retrieving authoring information that
     combines keyword-based and semantic-based approaches. In this approach,
     the user is interested only in retrieving authoring information for some
     specified keywords and ignores how the internal semantic search is
     processed. Additionally, this approach exploits the semantics of and
     relationships between different resources for better knowledge-based
     inference.

     Categories and Subject Descriptors
     H.3.3 [Information Search and Retrieval]: Search process

     Keywords
     Semantic web, RDF, SPARQL, Authoring Information, Keyword Search,
     Semantic Search

     1. INTRODUCTION
     The world wide web (or simply the web) has become the first source of
     knowledge for all life domains. It can be seen as an extensive
     information system that allows exchanging resources as well as
     documents. The semantic web is an evolving extension of the web that
     aims at giving well-defined form and semantics to web resources (e.g.,
     the content of an HTML web page) [4].

     Due to the growth of the semantic web, semantic search has become an
     attractive approach. The term refers to methods of searching web
     documents beyond the syntactic level of matching keywords. Exposing
     metadata is an essential point for a semantic search approach
     associated with the semantic web. The most important recent development is
  • 3. Permission to make digital or hard copies of all or part of this work
     for personal or classroom use is granted without fee provided that
     copies are not made or distributed for profit or commercial advantage
     and that copies bear this notice and the full citation on the first
     page. To copy otherwise, to republish, to post on servers or to
     redistribute to lists, requires prior specific permission and/or a fee.
     ISWSA'10, June 14-16, 2010, Amman, Jordan.
     Copyright 2010 ACM 978-1-4503-0475-7/0 /2010 ...$10.00.

     in the area of embedding metadata directly into web documents. RDF
     (Resource Description Framework) [15] is a knowledge representation
     language dedicated to the annotation of resources within the semantic
     web. Currently, many documents are annotated via RDF due to its simple
     data model and its formal semantics. For example, it is embedded in
     (X)HTML web pages using the RDFa language [1], in SMIL documents [7]
     using RDF/XML [3], etc. SPARQL [17] is a W3C recommendation language
     developed to query RDF knowledge bases, e.g., to retrieve nodes from
     RDF graphs.

     Another approach, found in search engines, is based on keywords. More
     precisely, both queries and documents are typically treated at a word
     or gram level (as in information retrieval). The search engine lacks a
     semantic-level understanding of the query and can only relate the
     content of a document to it by picking out documents with the most
     commonly occurring keywords.

     The objective of this paper is to provide a novel approach for
     retrieving authoring information that combines keyword-based and
     semantic-based approaches.

  • 4. In this approach, the user is interested only in retrieving authoring
     information for some specified keywords and ignores how the internal
     semantic search is processed. In particular, the user is interested in
     searching authoring information from online authoring information
     portals (such as DBLP1, ACM2, IEEE3, etc.). For instance: show me all
     documents of the author "faisal alkhateeb" or the author "jerome
     euzenat" with a title containing "SPARQL". In the proposed approach,
     keywords are used for collecting authoring information about the
     authors, which is then filtered with semantic search (using RDF and
     SPARQL) based on the semantic relations of the query.

     The remainder of the paper is organized as follows: we introduce the
     research background in Section 2. The combined approach is presented in
     Section 3, together with a test case illustrating it. A review of
     related work is discussed in Section 4. Discussion issues drawn from
     this study are presented in Section 5.

     2. RESEARCH BACKGROUND
     This section provides an overview of the elements that are necessary
     for presenting the proposed approach, namely:

     1 http://www.informatik.uni-trier.de/~ley/db/
     2 http://portal.acm.org/portal.cfm
     3 http://www.ieee.org/portal/site6
  • 5. BibTeX, RDF, and SPARQL.

     2.1 BibTeX
     BibTeX4 [16, 10] is a tool and a file format used to describe and
     process lists of references, mostly in conjunction with LaTeX
     documents. BibTeX makes it easy to cite sources in a consistent manner
     by separating bibliographic information from the presentation of this
     information. BibTeX uses a style-independent, text-based file format
     for lists of bibliography items, such as articles, books, and theses.
     Each bibliography entry contains some subset of standard data fields:
     author, booktitle, number, organization, pages, title, type, volume,
     year, institution, and others. Bibliography entries included in a .bib
     file are split by type. The following types are understood by virtually
     all BibTeX styles: article, book, booklet, conference, inproceedings,
     phdthesis, etc.

     Example 1. The following is an instance of a BibTeX element:

     @article{DBLP:AlkhateebBE09,
       author  = {Faisal Alkhateeb and Jean-Francois Baget
                  and Jerome Euzenat},
       title   = {Extending SPARQL with regular expression
                  patterns (for querying RDF)},
       journal = {J. Web Sem.},

  • 6.  volume  = {7},
       number  = {2},
       year    = {2009},
       pages   = {57-73},
     }

     2.2 RDF
     RDF is a language for describing resources. In its abstract syntax, an
     RDF document is a set of triples of the form
     <subject, predicate, object>.

     Example 2. The assertion of the following RDF triples:

     { <ex:person1 foaf:name "Faisal Alkhateeb">,
       <ex:document1 BibTeX:author ex:person1>,
       <ex:document1 rdf:type BibTeX:inproceedings>,
       <ex:document1 BibTeX:title "PSPARQL">,
       <ex:person1 foaf:knows ex:person2>,
       <ex:person2 foaf:name "Jerome Euzenat">,
       <ex:document1 BibTeX:author ex:person2> }

     means that there exists an inproceedings document, coauthored by two
     persons named "Faisal Alkhateeb" and "Jerome Euzenat", whose title is
     "PSPARQL".

     An RDF document can be represented by a directed labeled graph, as
     shown in Figure 1, where the set of nodes is the set of terms appearing
     as a subject or object in a triple
  • 7. and the set of arcs is the set of predicates (i.e., if <s, p, o> is a
     triple, then s --p--> o).

     2.3 SPARQL
     SPARQL is the query language developed by the W3C for querying RDF
     graphs. A simple SPARQL query is expressed using a form resembling the
     SQL SELECT query:

     SELECT ~B FROM u WHERE P

     4 http://www.bibtex.org/

     Figure 1: An RDF graph. (The figure shows the nodes ex:person1,
     ex:person2, ex:document1, BibTeX:inproceedings, "Faisal Alkhateeb",
     "Jerome Euzenat" and "PSPARQL", connected by arcs labeled foaf:knows,
     foaf:name, BibTeX:author, rdf:type and BibTeX:title.)

  • 8. where u is the URL of an RDF graph G to be queried, P is a SPARQL
     graph pattern (i.e., a pattern constructed over RDF graphs with
     variables) and ~B is a tuple of variables appearing in P. Intuitively,
     an answer to a SPARQL query is an instantiation of the variables of ~B
     by the terms of the RDF graph G such that substituting the values for
     the variables of P yields a subset of the graph G.5

     Example 3. Consider the RDF graph of Figure 1, representing some
     possible authoring information. For instance, the existence of the
     following triples

     {<ex:document1, rdf:type, BibTeX:inproceedings>,
      <ex:document1, BibTeX:title, "PSPARQL">}

     asserts that there exists an inproceedings document whose title is
     "PSPARQL". The following SPARQL query modeling this information:

     SELECT *
     FROM <Figure1>
     WHERE { ?document BibTeX:author ?author .
             ?document BibTeX:title "PSPARQL" .
             ?author foaf:name ?name . }

     could be used, when evaluated against the RDF graph of Figure 1, to
     return the following answers:

     #  ?document     ?author     ?name
     1  ex:document1  ex:person1  "Faisal Alkhateeb"
     2  ex:document1  ex:person2  "Jerome Euzenat"
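The evaluation of Example 3 can be sketched without any RDF toolkit. The snippet below is our own illustrative code, not the authors' implementation: the Figure 1 graph is hand-encoded as Python tuples (quotation marks around literals dropped for brevity), and the three triple patterns of the query are matched against it by a naive nested join, with no RDFS reasoning.

```python
# The RDF graph of Figure 1, encoded as (subject, predicate, object) tuples.
triples = [
    ("ex:person1",   "foaf:name",     "Faisal Alkhateeb"),
    ("ex:document1", "BibTeX:author", "ex:person1"),
    ("ex:document1", "rdf:type",      "BibTeX:inproceedings"),
    ("ex:document1", "BibTeX:title",  "PSPARQL"),
    ("ex:person1",   "foaf:knows",    "ex:person2"),
    ("ex:person2",   "foaf:name",     "Jerome Euzenat"),
    ("ex:document1", "BibTeX:author", "ex:person2"),
]

def match(pattern, binding):
    """Yield every extension of `binding` that matches one triple pattern."""
    for triple in triples:
        b = dict(binding)
        ok = True
        for p, t in zip(pattern, triple):
            if p.startswith("?"):          # variable: bind, or check old value
                if b.get(p, t) != t:
                    ok = False
                    break
                b[p] = t
            elif p != t:                   # constant: must match exactly
                ok = False
                break
        if ok:
            yield b

def query(patterns):
    """Join the patterns left to right, as a SPARQL basic graph pattern."""
    bindings = [{}]
    for pat in patterns:
        bindings = [b2 for b in bindings for b2 in match(pat, b)]
    return bindings

# The WHERE clause of Example 3.
answers = query([
    ("?document", "BibTeX:author", "?author"),
    ("?document", "BibTeX:title",  "PSPARQL"),
    ("?author",   "foaf:name",     "?name"),
])
for a in answers:
    print(a["?author"], a["?name"])
```

Each pattern restricts the set of candidate bindings in turn, so the result is exactly the two rows of the answer table above.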
  • 9. In RDF there exists a set of reserved words (called RDF Schema, or
     simply RDFS [6]) designed to describe the relationships between
     resources and properties, e.g., classA subClassOf classB. RDFS adds
     additional constraints to the resources associated with the RDFS
     terms, thus permitting more consequences (reasoning).

     Example 4. Using the RDF graph presented in Figure 1, we can deduce
     the triple

     <ex:document1 rdf:type BibTeX:publications>

     from the triples

     <ex:document1 rdf:type BibTeX:inproceedings>
     <BibTeX:inproceedings rdfs:subClassOf BibTeX:publications>.

     Hence, the following SPARQL query:

     SELECT *
     FROM <Figure1>
     WHERE { ?document rdf:type BibTeX:publications .
             ?document BibTeX:author ?author .
             ?document BibTeX:title "PSPARQL" .
             ?author foaf:name ?name . }

     returns the same set of answers described in Example 3, because
     inproceedings is a subclass of publications.

     5 When using RDFS semantics [6], this intuitive definition no longer
     suffices, and one could apply RDFS reasoning rules to calculate
     answers over RDFS documents.

  • 10. SPARQL provides several result forms other than SELECT that can be
     used for formatting the query results. For example, a CONSTRUCT query
     can be used for building an RDF graph from the set of answers to the
     query. More precisely, an RDF graph pattern (i.e., an RDF graph
     involving variables) is specified in the CONSTRUCT clause. For each
     answer to the query, the variable values are substituted in the RDF
     graph pattern and the merge of the resulting RDF graphs is computed.6
     This feature can be viewed as rules over RDF, permitting new relations
     to be built from the linked data.

     Example 5. The following CONSTRUCT query:

     CONSTRUCT { ?author BibTeX:coauthorof ?document . }
     FROM <Figure1>
     WHERE { ?document BibTeX:author ?author .
             ?document BibTeX:title "PSPARQL" .
             ?author foaf:name ?name . }

     constructs the RDF graph (containing the coauthor relation) obtained
     by substituting, for each answer, the values of the variables ?author
     and ?document, which gives the following graph (as before, we encode
     the resulting graph in the Turtle language7):

     @prefix ex: <http://ex.org/> .
     ex:person1 BibTeX:coauthorof ex:document1 .
     ex:person2 BibTeX:coauthorof ex:document1 .

     6 A definition of the RDF merge operation can be found at
     http://www.w3.org/TR/2001/WD-rdf-mt-20010925/#merging.
     7 http://www.dajobe.org/2004/01/turtle/

     3. METHODOLOGY
     The Extracting Authoring Information system which we
have implemented is used to achieve the following: Given: - A user query in the form of textual keywords. Find: - A set of BibTeX elements that are relevant to the query. The proposed methodology consists of the following major phases: connecting to the Google search engine, connecting to the DBLP page and extracting BibTeX elements, converting the BibTeX to RDF and the keywords to a SPARQL query, and then evaluating the SPARQL query against the RDF document. The first two phases deal with extracting author information based on keyword search, while the third and the fourth represent the semantic search. In the following, we present the basic work flow of the system as well as its main components. 3.1 System Work Flow As shown in Figure 2, the system works as follows: the user first enters the keywords to be searched, such as keywords from the author name, the title of the paper, the year of publication, etc. Then, the system uses the Google search engine to correct misspelled keywords (in particular, names of the authors) as well as to find the pages for the corrected keywords (for instance, the DBLP pages of the author). 6 A definition of the RDF merge operation can be found at http://www.w3.org/TR/2001/WD-rdf-mt-20010925/#merging. 7 http://www.dajobe.org/2004/01/turtle/. After that, BibTeX elements will be extracted and converted to an RDF document. The corrected keywords will be transformed to a SPARQL query
to be used for querying the RDF document corresponding to the extracted BibTeX elements. Figure 2: The Basic Flow of the System. 3.2 System Components The following are the main components of the system: • Google Search: after the keywords are entered in the corresponding positions, they are passed to a component that connects to the Google engine. That is, the magic URL "http://www.google.com/search?hl=ar&q=" + "searchParameters" of the Google search engine is used to search for the specified keywords. There are two possible cases returned from this search: – a correct author name; or – a misspelled author name. In the second case, the "did you mean" suggestion structure is used to reconnect to the Google search engine. This process is repeated until the corresponding author page is found in the specified authoring database (DBLP, ACM, IEEE, etc.). • BibTeX extractor: this component is responsible for extracting the BibTeX elements and saving them in a file for later use. It should be noticed that this component contains several methods, each of them specific to a bibliography database. This is due to the fact that each bibliography database has its own style of including BibTeX elements in the authoring web pages. Therefore, we suggest including BibTeX elements in web pages as RDFa annotations8.
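The Google Search step above can be sketched in a few lines. This is an illustrative sketch only: build_search_url and correct_keywords are hypothetical names, and the suggest callback stands in for parsing Google's "did you mean" suggestion out of the result page, which the real system does over the network.

```python
from urllib.parse import quote_plus

# The "magic URL" prefix quoted in the text above.
GOOGLE_SEARCH_PREFIX = "http://www.google.com/search?hl=ar&q="

def build_search_url(keywords):
    """Build the search URL by appending the URL-encoded keywords."""
    return GOOGLE_SEARCH_PREFIX + quote_plus(" ".join(keywords))

def correct_keywords(keywords, suggest):
    """Repeatedly apply "did you mean" suggestions until none remain.

    `suggest` maps a query string to a corrected string, or returns None
    when no correction is offered (standing in for scraping the
    suggestion from Google's result page).
    """
    query = " ".join(keywords)
    while True:
        corrected = suggest(query)
        if corrected is None or corrected == query:
            return query.split()
        query = corrected
```

For example, with a toy suggestion table mapping "faisal alkhateb" to "faisal alkhateeb", correct_keywords(["faisal", "alkhateb"], table.get) yields the corrected author name, which would then be used to locate the DBLP page.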
8 http://www.w3.org/TR/xhtml-rdfa-primer/ Figure 3: The user interface of the system as well as the found results. • BibTeX parser: BibTeX elements are then converted to RDF documents using the BibTeX parser that we have implemented in the system. Note that if RDFa is used to annotate BibTeX elements, then there is no need for this parser. In this case, the online RDF distiller9 could be used to extract the RDF documents corresponding to the annotated BibTeX elements from web pages. In addition to the RDF triples that correspond to the BibTeX entries, RDF triples corresponding to RDFS relationships (such as <BibTeX:inproceedings rdfs:subClassOf BibTeX:proceedings> and <BibTeX:booklet rdfs:subClassOf BibTeX:book>) are added to the RDF document to allow more results to be inferred. • Keywords to SPARQL query: the entered keywords are also used to build a SPARQL query automatically. The query will then be used to filter the results obtained by the keyword-based search. More precisely, when entering keywords, the user selects the type of the data entry, such as "Title", "Author", "Publication", "Pages", and so on. Note that the user can enter multiple authors. If the keyword begins with an underscore "_", the entered keyword is taken to be part of the BibTeX data entry; in this case, the "regex" function is used in a FILTER constraint when building the SPARQL query. Otherwise, it is considered to be an exact search for that keyword.
Moreover, the user can specify the relationship between the entered keywords (i.e., "or" or "and"). When building the SPARQL query, these relationships correspond to the "UNION" graph pattern and to the conjunction of graph patterns in SPARQL, respectively. • Query evaluator: this component evaluates the SPARQL query (i.e., the query obtained from the entered keywords) against the RDF document (i.e., the RDF document obtained from the file containing the BibTeX elements) to find and construct the precise results. Any query evaluator could be used at this stage10, but we have used Jena11. It should be noticed that DBLP provides a search capability by allowing users to pose keyword-based queries, but only over its own bibliography dataset. For instance, one can pose the query "alkhateeb|jerome euzenat", which searches for documents matching the keyword "alkhateeb" or "jerome euzenat". The search process in DBLP offers good features: a search is triggered after each keystroke, with instant response times if the network connection is not slow, and the search is case-insensitive [2]. However, a misspelled keyword such as "alkhateb" has no hits, while "alkhateeb" returns five documents. Additionally, the semantic relations are neither fully preserved nor well defined. In particular, one can pose the query "alkhateeb|euzenat", which returns 79 documents, while putting a space after the pipe, "alkhateeb| euzenat", returns only 2 documents. Semantic reasoning is not provided (see Example 4). We avoid these limitations in the proposed methodology.
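The keywords-to-SPARQL translation described above can be sketched roughly as follows. This is an illustration under stated assumptions: build_sparql, its (field, value) entry format, and the emitted SELECT form are hypothetical names and shapes; the actual system builds CONSTRUCT queries with richer graph patterns.

```python
# Assumed keyword entries: (field, value) pairs; a value starting with "_"
# means a substring match (regex FILTER), otherwise an exact match.
def build_sparql(entries, connector="and", year_range=None):
    """Sketch of the 'Keywords to SPARQL query' component."""
    groups = []
    filters = []
    for i, (field, value) in enumerate(entries):
        if value.startswith("_"):
            # Underscore prefix: match the keyword as part of the entry.
            var = "?v%d" % i
            groups.append("?doc BibTeX:%s %s ." % (field, var))
            filters.append('FILTER (regex(%s, "%s", "i"))' % (var, value[1:]))
        else:
            # No underscore: exact match on the full value.
            groups.append('?doc BibTeX:%s "%s" .' % (field, value))
    if connector == "or":
        # "or" between keywords becomes a UNION of graph patterns.
        body = " UNION ".join("{ %s }" % g for g in groups)
    else:
        # "and" becomes plain conjunction of triple patterns.
        body = " ".join(groups)
    if year_range:
        lo, hi = year_range
        body += " ?doc BibTeX:year ?year ."
        filters.append("FILTER ((?year >= %d) && (?year <= %d))" % (lo, hi))
    return "SELECT * WHERE { %s %s }" % (body, " ".join(filters))
```

For instance, build_sparql([("author", "Faisal Alkhateeb"), ("title", "_sparql")]) emits an exact author triple plus a case-insensitive regex filter on the title, mirroring the underscore convention above.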
3.3 Test Case 10 http://esw.w3.org/topic/SparqlImplementations 11 http://jena.sourceforge.net/ Suppose that the user has entered "faisal alkhateb" as an author, "jerome euzenat" as another author, and "_sparql" as a title in the interface shown in Figure 3, selected DBLP as the search database, and chosen "or" and "and" as the connections between the authors and the title keywords, respectively. Then the query equation will be: ((Author1 or Author2) and Title) = ((faisal alkhateeb or jerome euzenat) and sparql). A search is performed in Google to check whether the author name exists in DBLP or not. In this test case, the Google engine corrects the misspelled author name "faisal alkhateb" and uses "faisal alkhateeb" instead to connect to DBLP with the correct name. Then the BibTeX elements corresponding to the keywords "faisal alkhateeb", "jerome euzenat", and "sparql" are extracted from DBLP: @article{DBLP:AlkhateebBE09, author = {Faisal Alkhateeb and Jean-Francois Baget and Jerome Euzenat}, title = {Extending SPARQL with regular expression patterns (for querying RDF)}, journal = {J. Web Sem.}, volume = {7}, number = {2}, year = {2009},
pages = {57-73},} ... The BibTeX elements will then be converted to an RDF document such as the one in Example 2. Also, the corrected keywords will be used to build the following SPARQL query, which is used to filter the results: CONSTRUCT { ?doc BibTeX:author "Faisal Alkhateeb" . ?doc BibTeX:author "Jerome Euzenat" . ... } FROM <RDF document corresponding to the BibTeX> WHERE { { ?doc BibTeX:author "Faisal Alkhateeb" . ?doc BibTeX:title ?title . ?doc BibTeX:year ?year . ?doc BibTeX:pages ?pages . } UNION { ?doc BibTeX:author "Jerome Euzenat" . ?doc BibTeX:title ?title . ?doc BibTeX:year ?year . ?doc BibTeX:pages ?pages . } FILTER (regex(?title, "sparql", "i")) } Note that the keyword "_sparql" begins with an underscore "_" and so it is considered to be part of the title, while other keywords such as "faisal alkhateeb" do not and are considered to be full author names. Note also that the user can specify a range for the publishing years. For instance, show me the authoring information between "2004" and "2008". In this
case, s/he can enter "2004-2008" in the year field, which is in turn converted to the following part of a SPARQL query: ?document BibTeX:hasyear ?year . FILTER ((?year >= 2004) && (?year <= 2008)) 4. RELATED WORK The literature on combining keyword search with semantic search is rich; in this section we provide a brief overview of some relevant proposals. Semantic web languages (i.e., RDF and OWL) can be used for knowledge encoding and can be used by services, tools, and applications [11]. The semantic web will enable not only humans but also machines to process web content. This can help in creating intelligent services, a customized web, and more powerful search engines [9]. Traditional search engines use keywords as their search basis. Semantic search applies semantic processing to keywords for better retrieval. Hybrid search combines the keyword search of regular search engines with the ability of semantic search to query and reason over metadata. Using ontologies, search engines can find pages that have different syntax but similar semantics [9]. Hybrid search provides users with more capabilities for searching and reasoning to get better results. According to Bhagdev et al. [5], there are three types of queries that are possible using hybrid search: • Semantic search using the defined metadata and the
relations between instances. • Regular search using keywords. • Search for keywords within specific contents. Kiryakov et al. [14] proposed a system in which the user can select between keyword-based search and ontology-based search, but s/he cannot merge them to obtain search results using the two approaches together. Another work by Bhagdev et al. [5] introduced a search method that combines ontology-based and keyword-based search methods. Their results show that hybrid search gives better performance than keyword search or semantic search alone in real-world cases. Rocha et al. [18] combined ontology-based information retrieval with regular search in a semantic search technique. They used a spreading activation algorithm to compute an activation value for the relevance of search results to keywords. The links in the ontology are given weights according to certain properties. The proposed method does not promptly identify the unique concepts and relations. In another work, Gilardoni et al. [12] provided an integration of keyword-based search with ontology search, but with no capability for Boolean queries. Hybrid search is implemented by some large companies in industry. Google Product Search12 is a semantic search service from Google that searches for products by linking different attributes in the knowledge base to retrieve a product. Sheth et al. [19] use keyword queries to apply multi-domain search by automatically classifying and extracting information along with ontology and metadata information.
Guha et al. [13] proposed a semantic search approach that combines traditional search with other data from distributed sources to answer the user query in more detail. In the work of Davies et al. [8], QuizRDF is introduced: a system that combines the traditional search method with the ability to query and navigate RDF. The system falls short when there is chaining in the query. 5. DISCUSSION We have presented in this paper an approach for searching and extracting authoring information. The approach combines keyword and semantic search. In the keyword search part, the entered keywords are used to collect authoring information. In this part, the Google search 12 http://www.google.com/products engine is used to correct misspelled keywords, in particular the author's name, which allows more results to be found. Additionally, ad-hoc routines are used to extract bibliography elements from online databases; hence, we suggest including BibTeX elements in web pages as RDFa annotations so that standard methods can be exploited. In the semantic part, the SPARQL query obtained from the entered keywords is evaluated against the metadata corresponding to the authoring information, which yields more precise results. 6. REFERENCES [1] Adida, B., and Birbeck, M. RDFa primer - bridging the human and data webs. Working draft, W3C, 2008. http://www.w3.org/TR/xhtml-rdfa-primer/.
[2] Bast, H., Mortensen, C. W., and Weber, I. Output-sensitive autocompletion search. Inf. Retr. 11, 4 (2008), 269–286. [3] Beckett, D., and McBride, B. RDF/XML syntax specification (revised). Recommendation, W3C, 2004. http://www.w3.org/TR/rdf-syntax-grammar/. [4] Berners-Lee, T., Hendler, J., and Lassila, O. The semantic web, 2001. http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21. [5] Bhagdev, R., Chapman, S., Ciravegna, F., Lanfranchi, V., and Petrelli, D. Hybrid search: Effectively combining keywords and semantic searches. In ESWC (2008), pp. 554–568. [6] Brickley, D., and Guha, R. RDF vocabulary description language 1.0: RDF schema. Recommendation, W3C, 2004. http://www.w3.org/TR/rdf-schema/. [7] Bulterman, D., Grassel, G., Jansen, J., Koivisto, A., Layaïda, N., Michel, T., Mullender, S., and Zucker, D. Synchronized Multimedia Integration Language (SMIL 2.1). Recommendation, W3C, 2005. http://www.w3.org/TR/SMIL/. [8] Davies, J., and Weeks, R. QuizRDF: Search technology for the semantic web. In HICSS '04: Proceedings of the 37th Annual Hawaii International Conference on System Sciences
(HICSS '04) - Track 4 (Washington, DC, USA, 2004), IEEE Computer Society, p. 40112. [9] Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., and Horrocks, I. The semantic web: the roles of XML and RDF. 63–73. [10] Fenn, J. Managing citations and your bibliography with BibTeX. The PracTeX Journal 4 (2006). http://www.tug.org/pracjourn/2006-4/fenn/. [11] Finin, T., and Ding, L. Search Engines for Semantic Web Knowledge. In Proceedings of XTech 2006: Building Web 2.0 (May 2006). [12] Gilardoni, L., Biasuzzi, C., Ferraro, M., Fonti, R., and Slavazza, P. LKMS - a legal knowledge management system exploiting semantic web technologies. In International Semantic Web Conference (2005), Y. Gil, E. Motta, V. R. Benjamins, and M. A. Musen, Eds., vol. 3729 of Lecture Notes in Computer Science, Springer, pp. 872–886. [13] Guha, R., McCool, R., and Miller, E. Semantic search. In WWW '03: Proceedings of the 12th international conference on World Wide Web (New York, NY, USA, 2003), ACM, pp. 700–709. [14] Kiryakov, A., Popov, B., Terziev, I., Manov, D., and Ognyanoff, D. Semantic annotation, indexing, and retrieval. Web Semantics: Science, Services and Agents on the World Wide Web 2, 1 (2004), 49–79. [15] Manola, F., and Miller, E. RDF primer. Recommendation, W3C, 2004.
http://www.w3.org/TR/rdf-primer/. [16] Patashnik, O. BibTeXing, 1988. http://ftp.ntua.gr/mirror/ctan/biblio/bibtex/contrib/doc/btxdoc.pdf. [17] Prud'hommeaux, E., and Seaborne, A. SPARQL query language for RDF. Recommendation, W3C, January 2008. http://www.w3.org/TR/rdf-sparql-query/. [18] Rocha, C., Schwabe, D., and Aragao, M. P. A hybrid approach for searching in the semantic web. In WWW '04: Proceedings of the 13th international conference on World Wide Web (New York, NY, USA, 2004), ACM, pp. 374–383. [19] Sheth, A., Bertram, C., Avant, D., Hammond, B., Kochut, K., and Warke, Y. Managing semantic content for the web. IEEE Internet Computing 6, 4 (2002), 80–87. contributed articles, Communications of the ACM, March 2010, Vol. 53, No. 3. DOI: 10.1145/1666420.1666452. By Fabio Arduini and Vincenzo Morabito. Since the September 11th attacks on the
World Trade Center,8 the tsunami disaster, and Hurricane Katrina, there has been renewed interest in emergency planning in both the private and public sectors. In particular, as managers realize the size of potential exposure to unmanaged risk, ensuring "business continuity" (BC) is becoming a key task within all industrial and financial sectors (Figure 1). Aside from terrorism and natural disasters, two main reasons for developing the BC approach in the finance sector have been identified as unique to it: regulations and business specificities. Regulatory norms are key factors for all financial sectors in every country. Every organization is required to comply with federal/national law in addition to national and international governing bodies. Referring to business decisions, more and more organizations recognize that business continuity could be, and should be, strategic for the good of the business. The finance sector is, as a matter of fact, a sector in which the development of information technology (IT) and information systems (IS) has had a dramatic effect upon competitiveness. In this sector, organizations have become dependent upon technologies that they do not fully comprehend. In fact, banking industry IT and IS are considered production, not support, technologies. As such, IT and IS have supported massive changes in the ways in which business is conducted with consumers at the retail level. Innovations in direct banking would have been unthinkable without appropriate
IS. As a consequence, business continuity planning at banks is essential as the industry develops, in order to safeguard consumers and to comply with international regulatory norms. Furthermore, in the banking industry, BC planning is important and at the same time different from other industries, for three other specific reasons, as highlighted by the Bank of Japan in 2003: ˲ Maintaining the economic activity of residents in disaster areas2 by enabling the continuation of financial services during and after disasters, thereby sustaining business activities in the damaged area; ˲ Preventing widespread payment and settlement disorder,2 or preventing systemic risks, by bounding the inability of financial institutions in a disaster area to execute payment transactions; ˲ Reducing managerial risks,2 for example, by limiting the difficulties for banks in taking profit opportunities and the risk of lowering their customer reputation. Business specificities, rather than regulatory considerations, should be the primary drivers of all processes. Even if European (EU) and US markets differ, BC is closing the gap. Progressive EU market consolidation necessitates common rules and is forcing
major institutions to share common knowledge both on organizational and technological issues. The financial sector sees business continuity not only as a technical or risk management issue, but as a driver towards any discussion on mergers and acquisitions; the ability to manage BC should also be considered a strategic weapon to reduce the acquisition timeframe and shorten the data center merge, often considered one of the top issues in quick wins and information and communication technology (ICT) budget savings. business continuity and the banking industry business continuity concepts The evolution of IT and IS has challenged the traditional ways of conducting business within the finance sector. These changes have largely represented improvements to business processes and efficiency but are not without their flaws, in as much as business disruption can occur due to IT and IS sources. The greater complexity of new IT and IS operating environments requires that organizations continually reassess how best they may keep abreast of changes and exploit those for organizational advantage. In particular, this paper seeks to investigate how companies in the financial sector understand and manage their business continuity problems. BC has become one of the most important issues in the banking industry. Furthermore, there still appears to be some discrepancy as to the formal definitions of what precisely constitutes a disaster, and there are difficulties in assessing the size of claims in the crises and disaster areas. One definition of what constitutes a disaster is an incident that leads to the formal invocation of contingency/continuity plans, or any incident which leads to a loss of revenue; in other words, it is any accidental, natural or malicious event which threatens or disrupts normal operations or services for as long a time as to significantly cause the failure of the enterprise. It follows then that when referring to the size of claims in the area of organizational crises and disasters, the degree to which a company has been affected by such interruptions is the defining factor. The definition of these concepts is important because 80% of those organizations which face a significant crisis without either a contingency/recovery or a business continuity plan fail to survive a further year (Business Continuity Institute estimate). Moreover, the BCI believes that only a small number of organizations have disaster and recovery plans and, of those, few have been renewed to reflect the changing nature of the organization. In observing Italian banking industry practices, there seem to be major differences in preparing and implementing strategies that enhance business process security. Two approaches seem to be prevalent. Firstly, there are those disaster recovery (DR) strategies that are internally and hardware-focused,9 and secondly, there are those strategies that treat the issues of IT and IS security within a wider internal-external, hardware-software framework. The latter deals with IS as an integrating business function rather than as a stand-alone operation. We have labeled this second type the business continuity approach (BCA). As a consequence, we define BCA as a framework of disciplines, processes, and techniques aiming to provide continuous operation for "essential business functions" under all circumstances. More specifically, business continuity planning (BCP) can be defined as "a collection of procedures and information" that have been "developed, compiled and maintained" and are "ready to use - in the event of an emergency or disaster."6 BCP has been addressed by different contributions to the literature. Noteworthy studies include Julia Allen's contribution on CERT's Octave method,a1 the activities of the Business Continuity Institute (BCI) in defining certification standards and practice guidelines, the EDS white paper on Business Continuity Management4 and
finally, referring to banking, Business Continuity Planning at Financial Institutions by the Bank of Japan.2 This last study illustrates the process and activities for successful business continuity planning in three steps: 1. Formulating a framework for robust project management, where banks should: a. Develop basic policy and guidelines for BC planning (basic policy); b. Study firm-wide aspects (firm-wide control section); c. Implement appropriate progress control (project management procedures). 2. Identifying assumptions and conditions for business continuity planning, where banks should: a. Recognize and identify the potential threats, analyze the frequency of potential threats and identify the specific scenarios with material risk (Disaster scenarios); b. Focus on continuing prioritized critical operations (Critical operations); c. Target times for the resumption of operations (Recovery time objectives). 3. Introducing action plans, where banks should: a. Study specific measures for business continuity planning (BC measures); b. Acquire and maintain back-up data (Robust back-up data); c. Determine the managerial resources and infrastructure availability capacity required (Procurement of managerial resources); figure 1. 2004 top business priorities in industrial and financial sectors (source Gartner) a The Operationally Critical Threat, Asset, and Vulnerability Evaluation Method of CERT. CERT is a center of Internet security expertise, located at the Software Engineering Institute, a federally funded research and development center operated by Carnegie Mellon University. d. Determine strong time constraints, a contact list and a means of communication on emergency decisions (Decision-making procedures and communication arrangements); e. Realize practical operational procedures for each department and level (Practical manual). 4. Implement a test/training program on a regular basis (Testing and reviewing). business continuity aspects The business continuity approach has
three fundamental aspects that can be viewed in a systemic way: technology, people and process. Firstly, technology refers to the recovery of mission-critical data and applications contained in the disaster recovery plan (DRP). It establishes technical and organizational measures in order to face events or incidents with potentially huge impact that in a worst-case scenario could lead to the unavailability of data centers. Its development ought to ensure IT emergency procedures intervene and protect the data in question at company facilities. In the past, this was, whenever it even existed, the only part of the BCP. Secondly, people refers to the recovery of the employees and physical workspace. In particular, BCP teams should be drawn from a variety of company departments including those from personnel, marketing and internal consultants. Also, the managers of these teams should possess general skills and they should be partially drawn from business areas other than IT departments. Nowadays this is perceived as essential to real survival, with more emphasis on human assets and value rather than on those hardware and software resources that in most cases are probably protected by backup systems.
Finally, the term process here refers to the development of a strategy for the deployment, testing and maintenance of the plan. All BCPs should be regularly updated and modified in order to take into consideration the latest kinds of threats, both physical as well as technological. Whereas a simple DR approach aims at salvaging those facilities that are salvageable, a BCP approach should have different foci. One of these ought to be treating IT and IS security within a wider internal-external, hardware-software framework where all processes are neither in-house nor subcontracted-out but are a mix of the two, so as to be an integrating business function rather than a stand-alone operation. From this point of view the BCP constitutes a dual approach where management and technology function together. In addition, the BCP as a global approach must also consider all existing relationships, thus giving value to clients and suppliers, considering the total value chain for business, and protecting business both in-house and out. The BCP proper incorporates the disaster recovery (DR) approach but rejects its exclusive focus upon facilities. It defines the process as essentially business-wide and one which enables competitive and/or organizational advantages. it focus Versus business focus as a starting Point The starting point for the planning processes that an organization will use as its BCP must include an assessment of the likely impact different types of 'incidents' will/would make on the business. As far as financial companies are concerned, IT focus is critical since, as mentioned, new technologies continue to become more and more integral to ongoing financial activities. In addition to assessing the likely impact upon the entire organization, banks must consider the likely effects upon their different business areas. The "vulnerability & business impact matrix" (Figure 2) is a tool that can be used to summarize the inter-linkages between the various information system services, their vulnerability and the impact on business activities. It is useful in different ways. To start, the BC approach doesn't focus solely upon IT problems but rather uses a business-wide approach. Given the strategic focus of BCP, an understanding of the relationships between value-creating activities is a key determinant of the effectiveness of any such process. In this way we can define the correct BC perimeter (Figure 2) by trying to extract the maximum value from BCP
within a context of bounded rationality and limited resources. What the BCP teams in these organizations have done is focus upon how resources were utilized and how they added to value creation rather than merely being "support activity" which consumes financial resources unproductively. In addition, the convergence of customer with client technologies also demands that those managing the BCP process are aware of the need to "... expand the contingency role to not merely looking inward but actually looking out." Such a dual focus uncovers the linkages between customer and client which create competitive advantage. Indeed, in cases where clients' business fundamentally depends upon information exchange, for instance many banks today provide online equity brokerage services, it might be argued that there is a 'virtual value chain' which the BCP team protects, thereby providing the 'market-space' for value creation to take place. Finally, another benefit is that vulnerability and business impact can aid the prioritization of particular key areas. figure 2. Vulnerability & business impact matrix
  • 35. contributed articles player, yet their functions are just as vital to achieving the overall objectives of the football team. The value chain provides an opportunity to examine the connection between the exciting and the hum drum links that deliver customer value. The evolution of crisis preparations from the IT focused di- saster recovery (DR) solutions towards the BC approach reflects a growing un- derstanding that business continuity depends upon the maintenance of all elements which provide organizational efficiency-effectiveness and customer value, whether directly or indirectly. Prevention focus of business continuity A final key characteristic of the BC ap- proach concerns its primary role in prevention. A number of authors have identified that the potential for crises is normal for organizations.7,11 Crisis avoidance requires a strategic approach and requires a good understanding of both the organization’s operating pro- cesses, systems and the environment in which it operates. In the BC approach, a practice orga- nization should develop a BCP culture to eliminate the barriers to the develop- ment of crisis prevention strategies. In particular, these organizations should
  • 36. recognize that incidents, such as the New York terrorist attach or the City of London bombings are merely triggered by external technical causes and that their effects are largely determined by internal factors that were within the control of their organizations. In these cases a cluster of crises should be iden- new and obsolete technologies Today’s approach to BCP is focused on well-structured process management and business-driven paradigms. Even if some technology systems seem to be “business as usual,” some considerations must be made to avoid any misleading conjecture from an analytical side. When considering large institutions with systemic impact- not only on their own but on clients businesses as well- two key objectives need to be consid- ered when facing an event. These have been named RPO (Recovery Point Ob- jective) and RTO (Recovery Time Ob- jective) as shown in Figure 3. RPO deals with how far in the past you have to go to resume a consistent situation; RTO considers how long it takes to resume a standard or regular situation. The defi- nitions of RPO and RTO can change ac- cording to data center organization and how high a level a company wants to its own security and continuity to be. For instance a dual site recovery sys-
  • 37. tem organization must consider and evaluate three points of view (Figure 3). These are: the application's availability, the BC process and the data perspective. Data are first impacted (RPO) before the crisis event (CE) due to the closest "consistent point" from which to restart. The crisis opening (CO) or declaration occurs after the crisis event (CE). "RTO_s," or computing environment restored point, considers the length of time the computing environment needs in order to be restored (for example, when servers, network, etc. are once again available); "RTO_rc," or mission-critical application restarted point, indicates when the "critical or vital applications" (in rank order) are working once again; "RTO_r," or applications and data restored point, is the point from which all applications and data are restored; but (and it is a big but) "RTO_end," or previous environment restored point, is the true end point, when the previous environment is fully restored (all BC solutions are properly working). Of the utmost importance is that during the period between "RTO_r" and "RTO_end" a second disaster event could be fatal! Natural risks are also increasing in scope and frequency, both in terms of floods (central Europe 2002) and hurri-
  • 38. canes (U.S. 2005); thus an actual geographical recovery distance has been coined, today considered to be more than 500 miles. Such distance is forcing businesses and institutions alike to consider a new technological approach and to undertake critical discussion of synchronous versus asynchronous data replication: their intervals and quality. Therefore, more complex analysis of RPO and RTO is required. However, the most important issue, from a business point of view when faced with an imminent and unforeseen disaster, is how to reduce restore or restart time, trying to shrink this window to mere seconds or less. New pushing technologies (SATA, Serial ATA, and MAID, Massive Array of Idle Disks) are beginning to make some progress in reducing the time problem. business focus Versus Value chain focus The business area selected by the "vulnerability and business impact analysis matrix" should be treated in accordance with the value chain and value system. In addition to assessing the likely disaster impact upon IT departments, organizations should consider disaster impacts over all company departments and their likely effects upon customers. Organizations should avoid the so-called Soccer Star Syndrome.6
  • 39. In drawing an analogy with the football industry, one recognizes that greater management attention is often focused on the playing field rather than the unglamorous, but very necessary, locker room and stadium management support activities. Defenders and goalkeepers, let alone the stadium manager, do not get paid at the same level as the star Figure 3. RPO & RTO contributed articles March 2010 | Vol. 53 | No. 3 | Communications of the ACM 125 tified. Such clusters should be categorized along the axes of internal-external and human/social-technical/economic causes and effects. By adopting a strategic approach, decisions could be made about the extent of exposure in particular product markets or geographical sites. An ongoing change management program could contribute to real commitment from middle managers who, from our first investigation, emerged as key determinants of the success of the BC approach. management support and sponsorship BCP success requires the commitment
  • 40. of middle managers. Hence managers need to avoid considering BCP as a costly, administrative inconvenience that diverts time away from money-making activities. All organizational levels should be aware of the fact that BCP was developed in partnership between the BCP team and front-line operatives. As a result, strategic business units should own BCP plans. In addition, CEO involvement is key in rallying support for the BCP process. Two other key elements support the BC approach. Firstly, there is the recognition that responsibility for the process rests with business managers, and this is reinforced through formal appraisal and other reward systems. Secondly, peer pressure is deemed important in getting laggards to assume responsibility and so effect a more receptive culture. Finally, BCP teams need to regard BCP as a process rather than as a specific end-point. conclusion Although the risk of terrorism and regulations are identified as two key factors for developing a business continuity perspective, we see that organizations need to adopt the BC approach for strategic reasons. The trend to adopt a BC approach is also a proxy
  • 41. for organizational change in terms of culture, structure and communications. The BC approach is increasingly viewed as a driver to generate competitive advantage in the form of resilient information systems and as an important marketing characteristic to attract and retain customers. Referring to organizational change and culture, the BC approach should be a business-wide approach and not an IT-focused one. Supportive measures need to be introduced to encourage managers to adhere to the BC idea. Management as a whole should also be confident that the BC approach is an ongoing process and not only an end point that remains static upon completion. It requires changes of key assumptions and values within the organizational structure and culture that lead to a real cultural and organizational shift. This has implications for the role that the BC approach has to play within the strategic management processes of the organization, as well as within the levels of strategic risk that an organization may wish to undertake in its efforts to secure a sustainable competitive or so-called first-mover advantage. References 1. Allen, J.H. CERT® Guide to System and Network
  • 42. Security Practices. Addison-Wesley Professional, 2001. 2. Bank of Japan. Business Continuity Planning at Financial Institutions, July 2003. http://www.boj.or.jp/en/type/release/zuiji/kako03/fsk0307a.htm 3. Cerullo, V. and Cerullo, J. Business continuity planning: A comprehensive approach. Information Systems Management Journal (Summer 2004). 4. Decker, A. Business continuity management: A model for survival. EDS White Paper, 2004. 5. Dhillon, G. The challenge of managing information security. International Journal of Information Management 1, 1 (2004), 243-244. 6. Elliott, D. and Swartz, E. Just waiting for the next big bang: Business continuity planning in the UK finance sector. Journal of Applied Management Studies 8, 1 (1999), 45-60. 7. Greiner, L. Evolution and revolution as organisations grow. Harvard Business Review (July/August); reprinted in Asch, D. and Bowman, C. (Eds.), Readings in Strategic Management (1989), London, Macmillan, 373-387. 8. Lam, W. Ensuring business continuity. IT Professional 4, 3 (2002), 19-25. 9. Lewis, W., Watson, R.T. and Pickren, A. An empirical assessment of IT disaster risk. Comm. ACM 46, 9 (2003), 201-206. 10. McAdams, A.C. Security and risk management:
  • 43. A fundamental business issue. Information Management Journal 38, 4 (2004), 36-44. 11. Pauchant, T.C. and Mitroff, I. Crisis prone versus crisis avoiding organisations: Is your company's culture its own worst enemy in creating crises? Industrial Crisis Quarterly 2, 4 (1988), 53-63. 12. Quirchmayr, G. Survivability and business continuity management. In Proceedings of the 2nd Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation, ACSW Frontiers (2004). Vincenzo Morabito ([email protected]) is an assistant professor of Organization and Information Systems at Bocconi University in Milan, where he teaches management information systems, information management and organization. He is also Director of the Master of Management Information Systems at Bocconi University. Fabio Arduini ([email protected]) is responsible for IT architecture and business continuity, defining the technological and business continuity statements for the Group within the ICT department. © 2010 ACM 0001-0782/10/0300 $10.00 The Anti-Forensics Challenge
  • 44. Kamal Dahbur [email protected] Bassil Mohammad [email protected] School of Engineering and Computing Sciences New York Institute of Technology Amman, Jordan ABSTRACT Computer and Network Forensics has emerged as a new field in IT that is aimed at acquiring and analyzing digital evidence for the purpose of solving cases that involve the use, or more accurately the misuse, of computer systems. Many scientific techniques, procedures, and technological tools have evolved and been effectively applied in this field. On the opposite side, Anti-Forensics has recently surfaced as a field that aims at circumventing the efforts and objectives of the field of computer and network forensics. The purpose of this paper is to highlight the challenges introduced by Anti-Forensics, explore the various Anti-Forensics mechanisms, tools and techniques, provide a coherent classification for them, and discuss their effectiveness thoroughly. Moreover, this paper will highlight the challenges involved in implementing effective countermeasures against these techniques. Finally, a set of recommendations is presented along with further research opportunities. Categories and Subject Descriptors K.6.1 [Management of Computing and Information Systems]: Projects and People Management – System Analysis and Design, System Development.
  • 45. General Terms Management, Security, Standardization. Keywords Computer Forensics (CF), Computer Anti-Forensics (CAF), Digital Evidence, Data Hiding. 1. INTRODUCTION The use of technology is spreading increasingly, covering various aspects of our daily lives. An equal increase, if not a greater one, is realized in the methods and techniques created with the intention to misuse these technologies, serving varying objectives, be they political, personal or anything else. This has clearly been reflected in our terminology as well, where new terms like cyber warfare, cyber security, and cyber crime, amongst others, were introduced. It is also noticeable that such attacks are getting increasingly more sophisticated, and are utilizing novel methodologies and techniques. Fortunately, these attacks leave traces on the victim systems that, if successfully recovered and analyzed, might help identify the offenders and consequently resolve the case(s) justly and in accordance with applicable laws. For this purpose, new areas of research emerged addressing Network Forensics and Computer Forensics in order to define the foundation, practices and acceptable frameworks for scientifically acquiring and analyzing digital evidence to be presented in support of filed cases. In response to forensics efforts, Anti-Forensics tools and techniques were created with the main objective of frustrating forensics efforts and tainting their credibility and reliability. This paper attempts to provide a clear definition for Computer Anti-Forensics and consolidates various aspects of the topic. It
  • 46. also presents a clear listing of the challenges observed and possible countermeasures that can be used. The lack of a clear and comprehensive classification for existing techniques and technologies is highlighted, and a consolidation of all current classifications is presented. Please note that the scope of this paper is limited to Computer Forensics. Even though it is a related field, Network Forensics is not discussed in this paper and can be tackled in future work. Also, this paper is not intended to cover specific Anti-Forensics tools; however, several tools are mentioned to clarify the concepts. After this brief introduction, the remainder of this paper is organized as follows: section 2 provides a description of the problem space, introduces computer forensics and computer anti-forensics, and provides an overview of the current issues concerning this field; section 3 provides an overview of related work with emphasis on Anti-Forensics goals and classifications; section 4 provides a detailed discussion of Anti-Forensics challenges and recommendations; section 5 provides our conclusion and suggested future work. 2. THE PROBLEM SPACE Rapid changes and advances in technology are impacting every aspect of our lives because of our increased dependence on such systems to perform many of our daily tasks. The achievements in the area of computer technology in terms of increased capabilities of machines, high-speed communication channels, and reduced costs have made it attainable by the public. The popularity of the Internet, and consequently the technology associated with it, has skyrocketed in the last decade (see Table 1 and Figure 1). Internet usage statistics for 2010 clearly show the huge increase in Internet users, who may not necessarily be computer experts or even technology savvy [1].
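The growth percentages reported in Table 1 follow from a simple computation over the user counts in [1]. As a minimal illustration, using the world totals from the table, a short Python sketch:

```python
# Growth in Internet users, 2000-2010, computed from the world
# totals reported in Table 1 (source [1]).
users_2000 = 360_985_492    # Internet users, Dec. 31, 2000
users_2010 = 1_966_514_816  # Internet users, latest 2010 data

growth_pct = round((users_2010 - users_2000) / users_2000 * 100)
print(f"World growth 2000-2010: {growth_pct}%")  # 445%, matching the table
```

The same formula, (new - old) / old * 100, reproduces the per-region figures in the table.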
  • 47. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISWSA’11, April 18–20, 2011, Amman, Jordan. Copyright 2011 ACM 978-1-4503-0474-0/04/2011…$10.00.
WORLD INTERNET USAGE AND POPULATION STATISTICS
World Regions            Population (2010 Est.)  Internet Users Dec. 31, 2000  Internet Users Latest Data  Growth 2000-2010
Africa                   1,013,779,050           4,514,400                     110,931,700                 2357%
Asia                     3,834,792,852           114,304,000                   825,094,396                 622%
Europe                   813,319,511             105,096,093                   475,069,448                 352%
Middle East              212,336,924             3,284,800                     63,240,946                  1825%
North America            344,124,450             108,096,800                   266,224,500                 146%
Latin America/Caribbean  592,556,972             18,068,919                    204,689,836                 1033%
Oceania/Australia        34,700,201              7,620,480                     21,263,990                  179%
WORLD TOTAL              6,845,609,960           360,985,492                   1,966,514,816               445%
Table 1. World Internet Usage – 2010 (Reproduced from [1]).
Figure 1. World Internet Usage – 2010 (Based on Data from [1]).
Unfortunately, some technology users will not use it in a legitimate manner; instead, some users may deliberately misuse it. Such misuse can result in many harmful consequences including, but not limited to, major damage to others' systems or prevention of service for legitimate users. Regardless of the objectives that such "bad guys" might be aiming for from such
  • 49. misuse (e.g. personal, financial, political or religious purposes), one common goal for such users is the need to avoid detection (i.e. source determination). Therefore, these offenders will exert thought and effort to cover their tracks to avoid any liability or accountability for their damaging actions. Illegal actions (or crimes) that involve a computing system, either as a means to carry out the attack or as a target, are referred to as Cybercrimes [2]. Computer crime and Cybercrime are two terms that are used interchangeably to refer to the same thing. A Distributed Denial of Service (DDoS) attack is a good example of a computer crime where the computing system is used as a means as well as a target. Fortunately, cybercrimes leave fingerprints that investigators can collect, correlate and analyze to understand what, why, when and how a crime was committed; and consequently, and most importantly, build a good case that can bring the criminals to justice. In this sense, computers can be seen as a great source of evidence. For this purpose Computer Forensics (CF) emerged as a major area of interest, research and development, driven by the legislative need for a scientific, reliable framework, practices, guidelines, and techniques for forensics activities, from evidence acquisition and preservation through analysis and, finally, presentation. Computer Forensics can be defined as the process of scientifically obtaining, examining and analyzing digital information so that it can be used as evidence in civil, criminal or administrative cases [2]. A more formal definition of Computer Forensics is "the discipline that combines elements of law and computer science to collect and analyse data from computer systems, networks, wireless communications, and storage devices in a way that is admissible as evidence in a court of law" [3].
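Admissibility hinges on showing that acquired data has not changed between acquisition and presentation, which is typically done by fingerprinting the evidence with a cryptographic hash. A minimal sketch of the idea, assuming a hypothetical evidence file and the common choice of SHA-256 (neither is prescribed by the paper):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical evidence image, created here only so the sketch runs.
with open("evidence.img", "wb") as f:
    f.write(b"disk image contents")

acquired = sha256_of("evidence.img")  # recorded at acquisition time
# ... examination and analysis happen in between ...
assert sha256_of("evidence.img") == acquired, "evidence integrity violated"
```

Any modification of the file between the two hash computations, however small, would change the digest and trip the assertion.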
  • 50. To hinder the efforts of Computer Forensics, criminals work doggedly to instigate, develop and promote counter techniques and methodologies, or what is commonly referred to as Anti-Forensics. If we adopt the definition of Computer Forensics (CF) as scientifically obtaining, examining, and analysing digital information to be used as evidence in a court of law, then Anti-Forensics can be defined similarly but in the opposite direction. In Computer Anti-Forensics (CAF), scientific methods are used simply to frustrate forensics efforts at all forensics stages. This includes preventing, impeding, and/or corrupting the acquisition of the needed evidence, its examination, its analysis, or its credibility; in other words, whatever is necessary to ensure that computer evidence cannot get to, or will not be admissible in, a court of law. The use of Computer Anti-Forensics tools and techniques is evident and far from being an illusion. Criminals' reliance on technology to cover their tracks is not a mere claim, as clearly reflected in recent research conducted on reported and investigated incidents. Based on the 2009-2010 Data Breach Investigations Reports [4][5], investigators found signs of anti-forensics usage in over one third of cases in 2009 and 2010, with the most common forms being the same for both years. The results show that the overall use of anti-forensics remained relatively flat, with slight movement among the techniques themselves. Figure 2 shows the types of anti-forensics techniques used (data wiping, data hiding and data corruption) by percentage of breaches. Data wiping is still the most common, because it is supported by many commercial off-the-shelf products, available even as freeware, that are easy to install, learn and use; data hiding and data corruption remain a distant second.
  • 51. Figure 2. Types of Anti-Forensics – 2010 (Reproduced from [5]). It is important to note that a lack of understanding of what CAF is and what it is capable of may lead to underestimating or probably overlooking CAF's impact on the legitimate efforts of CF. Therefore, when dealing with computer forensics, it is important that we address the following questions, among others, that are related to CAF: Do we really have everything? Are the collected evidences really what were left behind, or are they only those intentionally left for us to find? How do we know whether the CF tool used was misleading us due to certain weaknesses in the tool itself? Are these CF tools developed according to proper secure software engineering methodologies? Are these CF tools immune against attacks? What are the recent CAF methods and techniques? This paper attempts to provide some answers to such questions that can assist in developing the proper understanding of the issue. 3. RELATED WORK, CAF GOALS AND CLASSIFICATIONS Even though computer forensics and computer anti-forensics are tightly related, as if they are two faces of the same coin, the amount of research they have received is not the same. CF received more focus over the past ten years or so because of its relation to other areas like data recovery, incident management and information systems risk assessment. CF is a little older, and therefore more mature, than CAF. It has a consistent definition, a well-defined systematic approach and a complete set of leading best practices and technology. CAF, on the other side, is still a new field, and is expected to mature over time and become closer to CF. In this effort, recent
  • 52. research papers have attempted to introduce several definitions and various classifications and to suggest some solutions and countermeasures. Some researchers have concentrated more on the technical aspects of CF and CAF software in terms of vulnerabilities and coding techniques, while others have focused primarily on understanding file systems, hardware capabilities, and operating systems. A few other researchers chose to address the issue from an ethical or social angle, such as privacy concerns. Despite the criticality of CAF, it is hard to find a comprehensive study that addresses the subject in a holistic manner by providing a consistent definition, structured taxonomies, and an inclusive view of CAF. 3.1. CAF Goals As stated in the previous section, CAF is a collection of tools and techniques that are intended to frustrate CF tools and CF investigators' efforts. This field is receiving growing interest and attention as it continues to expose the limitations of currently available computer forensics techniques as well as challenge the presumed reliability of common CF tools. We believe, along with other researchers, that advancements in the CAF field will eventually put the necessary pressure on CF developers and vendors to be more proactive in identifying possible vulnerabilities or weaknesses in their products, which consequently should lead to enhanced and more reliable tools. CAF can have a broad range of goals, including: avoiding detection of event(s), disrupting the collection of information, increasing the time an examiner needs to spend on a case, and casting doubt on a forensic report or testimony. In addition, these goals may also include: forcing the forensic tool to reveal its presence, using the forensic tool to attack the organization in which it is running, and leaving no evidence that an anti-forensic
  • 53. tool has been run [6]. 3.2. CAF Classifications Several classifications for CAF have been introduced in the literature. These taxonomies differ in the criteria used to perform the classification. The following are the most common approaches used: 1. Categories Based on the Attacked Target • Attacking Data: The acquisition of evidentiary data is a primary goal of the forensics process. In this category, CAF techniques seek to complicate this step by wiping, hiding or corrupting evidentiary data. • Attacking CF Tools: The major focus of this category is the examination step of the forensics process. The objective is to make the examination results questionable, untrustworthy, and/or misleading by manipulating essential information like hashes and timestamps. • Attacking the Investigator: This category is aimed at exhausting the investigator's time and resources, leading eventually to the termination of the investigation. 2. CAF Techniques vs. Tactics This categorization makes a clear distinction between the terms anti-forensics and counter-forensics [7], even though the two terms have been used interchangeably by many others, as the emphasis is usually on technology rather than on tactics.
  • 54. • Counter-Forensics: This category includes all techniques that target the forensics tools directly to cause them to crash, erase collected evidence, and/or break completely (thus preventing the investigator from using them). Compression bombs are a good example of this category. • Anti-Forensics: This category includes all technology-related techniques, including encryption, steganography, and alternate data streams (ADS). 3. Traditional vs. Non-Traditional • Traditional Techniques: This category includes techniques involving overwriting data, cryptography, steganography, and other generic data-hiding approaches. • Non-Traditional Techniques: As opposed to traditional techniques, these techniques are more creative and pose more risk, as they are harder to detect. These include: o Memory injections, where all malicious activities are carried out in volatile memory. o Anonymous storage, which utilizes available web-based storage to hide data so that it is not found on local machines. o Exploitation of CF software bugs, including Denial of Service (DoS) attacks and crashers, amongst others.
  • 55. 4. Categories Based on Functionality This categorization includes data hiding, data wiping and obfuscation. Attacks against CF processes and tools are considered a separate category under this scheme. 4. CAF CHALLENGES Because Computer Anti-Forensics (CAF) is a relatively new discipline, the field faces many challenges that need to be considered and addressed. In this section, we attempt to identify the most pressing challenges surrounding this area, highlight the research needed to address such challenges, and attempt to provide perceptive answers to some of the concerns. 4.1. Ambiguity Aside from having no industry-accepted definition for CAF, studies in this area view anti-forensics differently; this leads to the absence of a clear set of standards or frameworks for this critical area. Consequently, misunderstanding may be an unavoidable end result that could lead to improperly addressing the associated concerns. The current classification schemes, stated above, which mostly reflect each author's viewpoint and probably background, confirm as well as contribute to the ambiguity in this field. A classification can only be beneficial if it has clear criteria that assist not only in categorizing the currently known techniques and methodologies but also in enabling proper understanding and categorization of new ones. The attempt to distinguish between the two terms, anti-forensics and counter-forensics, based on technology and tactics is a good initiative, but it requires more elaboration to avoid any unnecessary
  • 56. confusion. To address the definition issue, we suggest adopting a definition for CAF that is built from our clear understanding of CF. The classification issue can be addressed by narrowing the gaps amongst the different viewpoints in the current classifications and excluding the odd ones. 4.2. Investigation Constraints A CF investigation has three main constraints/challenges, namely: time, cost and resources. Every CF investigation case should be approached as a separate project that requires proper planning, scoping, budgeting and resourcing. If these elements are not properly accounted for, the investigation will eventually fail, with most efforts up to the point of failure being wasted. In this regard, CAF techniques and methodologies attempt to attack the time, cost and resource constraints of an investigation project. An investigator may not be able to afford the additional costs or allocate the additional necessary resources. Most importantly, the time factor might play a critical role in the investigation, as evidentiary data might lose value with time, and/or allow the suspect(s) the opportunity to cover their tracks or escape. Most, if not all, CAF techniques and methodologies (including data wiping, data hiding, and data corruption) attempt to exploit this weakness. Therefore, proper project management is imperative before and during every CF investigation. 4.3. Integration of Anti-Forensics into Other Attacks Recent research shows an increased adoption of CAF techniques into other typical attacks. The primary purposes of
  • 57. integrating CAF into other attacks are undetectability and deletion of evidence. Two major areas for this threatening integration are malware and botnets [8][9]. Malware and botnets, when armed with these techniques, will make investigative efforts labour- and time-intensive, which can lead to overlooking critical evidence, if not abandoning the entire investigation. 4.4. Breaking the Forensics Software CF tools are, of course, created by humans, just like other software systems. Rushing to release their products to the market before their competition, companies tend to unintentionally introduce vulnerabilities into their products. In such cases, software development best practices, which are intended to ensure the quality of the product, might be overlooked, leaving the end product exposed to many known vulnerabilities, such as buffer overflow and code injection. Because CF software is ultimately used to present evidence in courts, the existence of such weaknesses is not tolerable. Hence, all CF software, before being used, must be subjected to thorough security testing that focuses on robustness against data hiding and accurate reproduction of evidence. The Common Vulnerabilities and Exposures (CVE) database is a great source for updates on vulnerabilities in existing products [10]. Some studies have reported several weaknesses that may result in crashes during runtime, leaving no chance for interpreting the evidence [11]. Regardless of the fact that some of these weaknesses are still being disputed [12], it is important to be aware that these CF tools are not immune to vulnerabilities, and that CAF tools would most likely take advantage of such weaknesses. A good example of a common technique that can cause a CF tool to fail or crash is the "Compression Bomb", where files are compressed hundreds of times such that when a CF tool tries to decompress them, it uses up so many resources that it causes the computer or the tool to hang
  • 58. or crash. 4.5. Privacy Concerns Increasingly, users are becoming more aware of the fact that just deleting a file does not make it really disappear from the computer and that it can be retrieved by several means. This awareness is driving the market for software solutions that provide safe and secure means for file deletion. Such tools are marketed as "privacy protection" software and claim to have the ability to completely remove all traces of information concerning a user's activity on a system, websites, images and downloaded files. Some of these tools do not only provide protection through secure deletion, but also offer encryption and compression. Moreover, these tools are easy to use, and some can even be downloaded for free. WinZip is a popular tool that offers encryption, password protection, and compression. Such tools will most definitely complicate the search for and acquisition of evidence in any CF investigation because they make the whole process more time- and resource-consuming. Privacy issues in relation to CF have been the subject of detailed research in an attempt to define appropriate policies and procedures that would maintain users' privacy when excessive data is acquired for forensics purposes [13]. 4.6. Nature of Digital Evidence CF investigations rely on two main assumptions to be
  • 59. successful: (1) the data can be acquired and used as evidence, and (2) the results of the CF tools are authentic, reliable, and believable. The first assumption highlights the importance of digital evidence as the basis for any CF investigation, while the second highlights the critical role of the trustworthiness of the CF tools in order for the results to stand solid in courts. Digital evidence is more challenging than physical evidence because it is more susceptible to being altered, hidden, removed, or simply made unreadable. Several techniques can be utilized to achieve such undesirable objectives, complicating the acquisition of evidentiary digital data and thus compromising the first assumption. CF tools rely on many techniques that can attest to their trustworthiness, including but not limited to hashing, timestamps, and signatures during examination, analysis and inspection of source files. CAF tools can in turn utilize new advances in technology to break such authentication measures, and thus compromise the second assumption. The following is a brief explanation of some of the techniques that are used to compromise these two assumptions: • Encryption is used to make the data unreadable. This is one of the most challenging techniques, as advances in encryption algorithms and tools have made it possible to apply encryption to an entire hard drive, selected partitions, or specific directories and files. In all cases, an encryption key is usually needed to reverse the process and decrypt the desired data, and that key is usually unknown to the investigator. To complicate matters, decryption using brute-force techniques becomes infeasible when long keys are used. More success in this regard might be achieved with keyloggers or volatile memory content acquisition.
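The brute-force infeasibility claim can be made concrete with back-of-the-envelope arithmetic. Assuming, purely for illustration, an attacker who can test 10^12 keys per second (an optimistic figure not taken from the paper), exhausting a 128-bit key space takes on the order of 10^19 years:

```python
# Rough brute-force cost for a 128-bit key. The guess rate is an
# assumed illustrative figure, not a benchmark from the paper.
keyspace = 2 ** 128            # number of possible 128-bit keys
guesses_per_second = 10 ** 12  # assumed attacker throughput
seconds_per_year = 365 * 24 * 3600

years = keyspace / guesses_per_second / seconds_per_year
print(f"~{years:.2e} years to exhaust the full key space")
```

Even dividing by two for the average-case search leaves a figure astronomically beyond any investigation timeline, which is why the text points to keyloggers and memory acquisition instead.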
  • 60. • Steganography aims at hiding the data by embedding it into another digital form, such as images or videos. Commercial steganalysis tools that can detect hidden data exist and can be utilized to counter steganography. Encryption and steganography can be combined to both obscure data and make it unreadable, which can greatly complicate a CF investigation. • Secure Deletion removes the target data completely from the source system by overwriting it with random data, thus rendering the target data unrecoverable. Fortunately, most of the available commercial secure-deletion tools tend to underperform and thus miss some data [14]. More research is needed in this area to understand the weaknesses and identify the signatures of such tools. Such information is needed to detect the operations and minimize the impact of these tools. • Hashing is used by CF tools to validate the integrity of data. A hashing algorithm accepts a variable-size input, such as a file, and generates a fixed-size value that corresponds to the given input. The generated output is unique and can be used as a fingerprint for the input file. Any change in the original file, no matter how minor, will result in a considerable change in the hash value produced by the hashing algorithm. A key feature of hashing algorithms is “irreversibility”: having the hash value in hand will not allow the recovery of the original input. Another key feature is “uniqueness”, which basically means that the hash values of two files will be equal if and only if the files are absolutely identical. Many hashing algorithms have been developed, and some, such as MD5, have already been infiltrated or cracked. Other algorithms, like MD6 and the Secure Hash Algorithm family (SHA-1, SHA-2), are harder to break. However, all are vulnerable to being
  • 61. infiltrated as technology and research advance [15]. Research is also necessary in the other direction to enhance the capabilities of CF tools in this regard and maintain their credibility. • Timestamps are associated with files and are critical for the task of establishing the chain of events during a CF investigation. The timeline of the events is contingent on the accuracy of timestamps. CAF tools have provided the capability to modify the timestamps of files or logs, which can mislead an investigation and consequently distort its conclusions. Many tools currently exist on the market, some even freely available, that make it easy to manipulate timestamps, such as Timestamp Modifier and SKTimeStamp [16]. • File Signatures, also known as Magic Numbers, are constant known values that exist at the beginning of each file to identify the file type (e.g. image file, word document, etc.). Hexadecimal editors, such as WinHex, can be used to view and inspect these values. Forensics investigators rely on these values to search for evidence of a certain type. When a file extension is changed, the actual file type does not change, and thus the file signature remains unchanged. ACF tools intentionally change file signatures in their attempt to mislead investigations, causing some evidence files to be overlooked or dismissed. A complete listing of file signatures, or magic numbers, can be found on the web [17]. • CF Detection is simply the capability of ACF tools to detect the presence of CF software and their activities or functionalities. Self-Monitoring, Analysis and Reporting Technology (SMART), built into most hard drives, reports the total number of power cycles (Power_Cycle_Count), the total time that a hard drive has been in use
  • 62. (Power_On_Hours or Power_On_Minutes), a log of high temperatures that the drive has reached, and other manufacturer-determined attributes. These counters can be reliably read by user programs and cannot be reset. Although the SMART specification implements a DISABLE command (SMART 96), experimentation indicates that the few drives that actually implement the DISABLE command continue to keep track of the time-in-use and power cycle count and make this information available after the next power cycle. CAF tools can read SMART counters to detect attempts at forensic analysis and alter their behavior accordingly. For example, a dramatic increase in Power_On_Minutes might indicate that the computer’s hard drive has been imaged [18]. • Business Needs: Cloud Computing (CC) is a business model typically suited for small and medium enterprises (SMEs) that do not have enough resources to invest in building their own IT infrastructure. Hence, they tend to outsource this to third parties, who in turn lease their infrastructure and possibly applications as services. This new model introduces more challenges to CF investigations, mainly due to the fact that the data is on the cloud (i.e. hosted somewhere in the Internet space), being transferred across countries with different regulations, and, most importantly, might reside on a machine that hosts data instances of other enterprises. In some instances, the data for the same enterprise might even be stored across multiple data centres [19][20]. These issues make the CF’s primary functions (i.e. data acquisition, examination, and analysis) needed to build a good case extremely hard to carry out. 4.7. Recommendations
  • 63. Based on our findings, we see room for improvement in the field of ACF that can address some of the issues surrounding this field. We believe that such recommendations, when adopted and/or implemented properly, can add value and consolidate the efforts for advancing this field. Below is a list and brief explanation of the recommendations: a) Spend More Effort to Understand ACF More effort should be spent in order to reach an agreed-upon, comprehensive definition for ACF that would assist in gaining a better understanding of the concepts in the field. These efforts should also extend to developing acceptable best practices, procedures and processes that constitute a proper framework, or standard, that professionals can use and build upon. ACF classifications also need to be integrated, clarified, and formulated on well-defined criteria. Such foundational efforts would eventually assist researchers and experts in addressing the issues and mitigating the associated risks. Awareness of ACF techniques and their capabilities will prevent, or at least reduce, their success and consequently their impact on CF investigations. Knowledge in this area should encompass both techniques and tactics. Continued education and research are necessary to stay atop the latest developments in the field and be ready with appropriate countermeasures when and as necessary. b) Define Laws that Prohibit Unjustified Use of ACF The existence of strict and clear laws that detail the obligations and consequences of violations can play a key deterrent role against the use of these tools in a destructive manner. When someone knows in advance that having certain ACF tools on one’s machine might be questioned and possibly pose some liabilities, one would probably have second
  • 64. thoughts about installing such tools. Commercial non-specialized ACF tools, which are more commonly used, always leave easily detectable fingerprints and signatures. They sometimes also fail to fulfil their developers’ promises of deleting all traces of data. This can later be used as evidence against a suspected criminal and can lead to an indictment. The proven unjustified use of ACF tools can be used as supporting incriminatory evidence in courts in some countries [21]. To address privacy concerns, such as users’ need to protect personal data like family pictures or videos, an approved list of authorized software can be compiled, with known fingerprints, signatures and special recovery keys. Such information, especially the recovery keys, would then be safeguarded in the possession of the proper authorities. It would strictly be used to reverse the process of ACF tools through the appropriate judicial processes. c) Utilize Weaknesses of ACF Software In some cases, digital evidence can still be recovered if a data-wiping tool is poorly used or is functioning improperly. Hence, each ACF software package must be carefully examined and continuously analyzed in order to fully understand its exact behaviour and determine its weaknesses and vulnerabilities [14][22]. This can help in developing the appropriate course of action given the different possible scenarios and circumstances, which could prove valuable in saving time and resources during an investigation. d) Harden CF Software CAF and CF thrive on the weaknesses of each other. To ensure justice, CF must always strive to be more advanced
  • 65. than its counterpart. This can be achieved by conducting security and penetration tests to verify that the software is immune to external attacks. Also, it is imperative not to submit to market pressure and demand for tools by rapidly releasing products without proper validation. The best practices of software development must not be overlooked at any rate. When vulnerabilities are identified, proper fixes and patches must be tested, verified and deployed promptly in order to avoid zero-day attacks. 5. CONCLUSION AND FUTURE WORK 5.1. Conclusion Computer Anti-Forensics (CAF) is an important developing area of technology. Because CAF success means that digital evidence will not be admissible in courts, Computer Forensics (CF) must evaluate its techniques and tactics very carefully. Also, CF efforts must be integrated and expedited to narrow the currently existing gap with CAF. It is important to agree on an acceptable definition and classification for CAF, which will assist in implementing proper countermeasures. Current definitions and classifications all seem to concentrate on specific aspects of CAF without truly providing the needed holistic view. It is very important to realize that CAF is not only about tools that are used to delete, corrupt, or hide evidence. CAF is a blend of techniques and tactics that utilize technological advancements in areas like encryption and data overwriting, amongst other techniques, to obstruct investigators’ efforts. Many challenges exist and need to be carefully analyzed and
  • 66. addressed. In this paper, we attempted to identify some of these challenges and suggested some recommendations that might, if applied properly, mitigate the risks. 5.2. Future Work This paper provides a solid foundation for future work that can further elaborate on the various highlighted areas. It suggests a definition for CAF that is closely aligned with CF and presents several classifications that we deem acceptable. It also discusses several challenges that can be further addressed in future research. CAF technologies, techniques, and tactics need to receive more attention in research, especially in the areas of ongoing debate around hashes, timestamps, and file signatures. Research opportunities in Computer Forensics, Network Forensics, and Anti-Forensics can use the work presented in this paper as a base. Privacy concerns and other issues related to the forensics field introduce a raw domain that requires serious consideration and analysis. Cloud computing, virtualization, and related laws and regulations are topics that can be considered in future research. 6. REFERENCES [1] Corey Thuen, University of Idaho: “Understanding Counter-Forensics to Ensure a Successful Investigation”. DOI=http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.2196 [2] Internet Usage Statistics, “The Internet Big Picture, World Internet Users and Population Stats”. DOI=http://www.internetworldstats.com/stats.htm
  • 67. [3] Bill Nelson, Amelia Phillips, and Steuart, “Guide to Computer Forensics and Investigations”, pp. 2-3, 4th Edition. [4] US-Computer Emergency Readiness Team (CERT), a government organization, “Computer Forensics”, 2008. [5] Verizon Business, “2009 Data Breach Investigations Report”. A study conducted by the Verizon RISK Team in cooperation with the United States Secret Service. DOI=http://www.verizonbusiness.com/about/news/podcasts/1008a1a3-111=129947--Verizon+Business+2009+Data+Breach+Investigations+Report.xml [6] Verizon Business, “2010 Data Breach Investigations Report”. A study conducted by the Verizon RISK Team in cooperation with the United States Secret Service. DOI=http://www.verizonbusiness.com/resources/reports/rp_2010-data-breach-report_en_xg.pdf?&src=/worldwide/resources/index.xml&id= [7] Simson Garfinkel, “Anti-Forensics: Techniques, Detection and Countermeasures”, 2nd International Conference on i-Warfare and Security, pp. 77, 2007 [8] W. Matthew Hartley, “Current and Future Threats to
  • 68. Digital Forensics”, ISSA Journal, August 2007 [9] Murray Brand, “Forensics Analysis Avoidance Techniques of Malware”, Edith Cowan University, Australia, 2007 [10] “Security 101: Botnets”. DOI=http://www.secureworks.com/research/newsletter/2008/05/ [11] Common Vulnerabilities and Exposures (CVE) database, http://cve.mitre.org/ [12] Tim Newsham, Chris Palmer, Alex Stamos, “Breaking Forensics Software: Weaknesses in Critical Evidence Collection”, iSEC Partners, http://www.isecpartners.com, 2007 [13] Guidance Software: Computer Forensics Solutions and Digital Investigations (http://www.guidancesoftware.com/) [14] S. Srinivasan, “Security and Privacy vs. Computer Forensics Capabilities”, ISACA Online Journal, 2007
  • 69. [15] Matthew Geiger, Carnegie Mellon University, “Evaluating Commercial Counter-Forensic Tools”, Digital Forensic Research Workshop (DFRWS), 2005 [16] Xiaoyun Wang and Hongbo Yu, Shandong University, China, “How to Break MD5 and Other Hash Functions”, EUROCRYPT 2005, pp. 19-35, May 2005 [17] How to Change the Timestamp of a File in Windows. DOI=http://www.trickyways.com/2009/08/how-to-change-timestamp-of-a-file-in-windows-file-created-modified-and-accessed/ [18] File Signature Table. DOI=http://www.garykessler.net/library/file_sigs.html [19] S. McLeod, “SMART Anti-Forensics”. DOI=http://www.forensicfocus.com/smart-anti-forensics [20] Stephen Biggs and Stilianos, “Cloud Computing Storms”, International Journal of Intelligent Computing Research (IJICR), Volume 1, Issue 1, March 2010 [21] U. Gurav, R. Shaikh, “Virtualization – A key feature of
  • 70. cloud computing”, International Conference and Workshop on Emerging Trends in Technology (ICWET 2010), Mumbai, India [22] U.S. v. Robert Johnson - Child Pornography Indictment. DOI=http://news.findlaw.com/hdocs/docs/chldprn/usjhnsn62805ind.pdf [23] United States of America v. H. Marc Watzman. DOI=http://www.justice.gov/usao/iln/.../2003/watzman.pdf [24] Mark Whitteker, “Anti-Forensics: Breaking the Forensics Process”, ISSA Journal, November 2008 [25] Gary C. Kessler, “Anti-Forensics and the Digital Investigator”, Champlain College, USA [26] Ryan Harris, “Arriving at an anti-forensics consensus: examining how to define and control the anti-forensics problem”. DOI=www.elsevier.com/locate/dinn Appendix A: Anti-Forensics Tools
  • 71. The following is a list of some commercial CAF software packages available on the market. The tools listed below are intended as examples; none of these tools were purchased or tested as part of this paper’s work.

Privacy and Secure Deletion: Privacy Expert; SecureClean; PrivacyProtection; Evidence Eliminator; Internet Cleaner
File and Disk Encryption: TrueCrypt; PointSec; WinZip 14
Timestamp Modifiers: SKTimeStamp; Timestamp Modifier; Timestomp
Others: The Defiler’s Toolkit (Necrofile and Klismafile); Metasploit Anti-Forensic Investigation Arsenal (known affectionately as MAFIA)

Download and read the following articles available in the ACM
  • 72. Digital Library: Arduini, F., & Morabito, V. (2010, March). Business continuity and the banking industry. Communications of the ACM, 53(3), 121-125. Dahbur, K., & Mohammad, B. (2011). The anti-forensics challenge. Proceedings from ISWSA '11: International Conference on Intelligent Semantic Web-Services and Applications. Amman, Jordan. Write a five to seven (5-7) page paper in which you: 1. Consider that Data Security and Policy Assurance methods are important to the overall success of IT and corporate data security. a. Determine how defined roles of technology, people, and processes are necessary to ensure resource allocation for business continuity. b. Explain how computer security policies and data retention policies help maintain user expectations of the levels of business continuity that could be achieved. c. Determine how acceptable use policies, remote access policies, and email policies could help minimize any anti-forensics efforts. Give an example with your response. 2. Suggest at least two (2) models that could be used to ensure business continuity and ensure the integrity of corporate
  • 73. forensic efforts. Describe how these could be implemented. 3. Explain the essentials of defining a digital forensics process and provide two (2) examples on how a forensic recovery and analysis plan could assist in improving the Recovery Time Objective (RTO) as described in the first article. 4. Provide a step-by-step process that could be used to develop and sustain an enterprise continuity process. 5. Describe the role of incident response teams and how these accommodate business continuity. 6. There are several awareness and training efforts that could be adopted in order to prevent anti-forensic efforts. a. Suggest two (2) awareness and training efforts that could assist in preventing anti-forensic efforts. b. Determine how having a knowledgeable workforce could provide a greater level of secure behavior. Provide a rationale with your response. c. Outline the steps that could be performed to ensure continuous effectiveness. 7. Use at least three (3) quality resources in this assignment. Note: Wikipedia and similar Websites do not qualify as quality resources. Your assignment must follow these formatting requirements:
  • 74. · Be typed, double-spaced, using Times New Roman font (size 12), with one-inch margins on all sides; citations and references must follow APA or school-specific format. Check with your professor for any additional instructions. · Include a cover page containing the title of the assignment, the student’s name, the professor’s name, the course title, and the date. The cover page and the reference page are not included in the required assignment page length. The specific course learning outcomes associated with this assignment are: · Describe and apply the 14 areas of common practice in the Department of Homeland Security (DHS) Essential Body of Knowledge. · Describe best practices in cybersecurity. · Explain data security competencies to include turning policy into practice. · Describe digital forensics and process management. · Evaluate the ethical concerns inherent in cybersecurity and how these concerns affect organizational policies. · Create an enterprise continuity plan. · Describe and create an incident management and response plan. · Describe system, application, network, and telecommunications security policies and response. · Use technology and information resources to research issues in
  • 75. cybersecurity. · Write clearly and concisely about topics associated with cybersecurity using proper writing mechanics and technical style conventions.