Logs are one of the most important pieces of analytical data in a cloud-based service infrastructure. At any point in time, service owners and operators need to understand the sta- tus of each infrastructure component for fault monitoring, to assess feature usage, and to monitor business processes. Application developers, as well as security personnel, need access to historic information for debugging and forensic in- vestigations.
This paper discusses a logging framework and guidelines that provide a proactive approach to logging to ensure that the data needed for forensic investigations has been gener- ated and collected. The standardized framework eliminates the need for logging stakeholders to reinvent their own stan- dards. These guidelines make sure that critical information associated with cloud infrastructure and software as a ser- vice (SaaS) use-cases are collected as part of a defense in depth strategy. In addition, they ensure that log consumers can effectively and easily analyze, process, and correlate the emitted log records. The theoretical foundations are em- phasized in the second part of the paper that covers the im- plementation of the framework in an example SaaS offering running on a public cloud service.
While the framework is targeted towards and requires the buy-in from application developers, the data collected is crit- ical to enable comprehensive forensic investigations. In ad- dition, it helps IT architects and technical evaluators of log- ging architectures build a business oriented logging frame- work.
1. Cloud Application Logging for Forensics
Raffael Marty
Loggly, Inc.
78 First Street
San Francisco, CA 94105
rmarty@loggly.com
ABSTRACT Keywords
Logs are one of the most important pieces of analytical data cloud, logging, computer forensics, software as a service, log-
in a cloud-based service infrastructure. At any point in time, ging guidelines
service owners and operators need to understand the sta-
tus of each infrastructure component for fault monitoring, 1. INTRODUCTION
to assess feature usage, and to monitor business processes.
The ”cloud”[5, 11] is increasingly used to deploy and run
Application developers, as well as security personnel, need
end-user services, also known as software as a service (SaaS)[28].
access to historic information for debugging and forensic in-
Running any application requires insight into each infras-
vestigations.
tructure layer for various technical, security, and business
This paper discusses a logging framework and guidelines
reasons. This section outlines some of these problems and
that provide a proactive approach to logging to ensure that
use-cases that can benefit from log analysis and manage-
the data needed for forensic investigations has been gener-
ment. If we look at the software development life cycle, the
ated and collected. The standardized framework eliminates
use-cases surface in the following order:
the need for logging stakeholders to reinvent their own stan-
dards. These guidelines make sure that critical information
• Debugging and Forensics
associated with cloud infrastructure and software as a ser-
vice (SaaS) use-cases are collected as part of a defense in • Fault monitoring
depth strategy. In addition, they ensure that log consumers • Troubleshooting
can effectively and easily analyze, process, and correlate the
emitted log records. The theoretical foundations are em- • Feature usage
phasized in the second part of the paper that covers the im- • Performance monitoring
plementation of the framework in an example SaaS offering • Security / incident detection
running on a public cloud service.
While the framework is targeted towards and requires the • Regulatory and standards compliance
buy-in from application developers, the data collected is crit-
ical to enable comprehensive forensic investigations. In ad- Each of these use-cases can leverage log analysis to either
dition, it helps IT architects and technical evaluators of log- completely solve or at least help drastically speed up and
ging architectures build a business oriented logging frame- simplify the solution to the use-case.
work. The rest of this paper is organized as follows: In Section
2 we discuss the challenges associated with logging in cloud-
based application infrastructures. Section 3 shows how these
challenges can be addressed by a logging architecture. In
Categories and Subject Descriptors section 4 we will see that a logging architecture alone is
C.2.4 [Distributed Systems]: Distributed applications not enough. It needs to be accompanied by use-case driven
logging guidelines. The second part of this paper (Section
5) covers a references setup of a cloud-based application
General Terms that shows how logging was architected and implemented
throughout the infrastructure layers.
Log Forensics
2. LOG ANALYSIS CHALLENGES
If log analysis is the solution to many of our needs in
cloud application development and delivery, we need to have
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are a closer look at the challenges that are associated with it.
not made or distributed for profit or commercial advantage and that copies The following is a list of challenges associated with cloud-
bear this notice and the full citation on the first page. To copy otherwise, to based log analysis and forensics:
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. • Decentralization of logs
SAC’11 March 21-25, 2011, TaiChung, Taiwan.
Copyright 2011 ACM 978-1-4503-0113-8/11/03 ...$10.00. • Volatility of logs
2. • Multiple tiers and layers 2.2 Log Records
• Archival and retention What happens if there are no common guidelines or stan-
dards defined for logging? In a lot of cases, application de-
• Accessibility of logs
velopers do not log much. Sometimes, when they do log,
• Non existence of logs the log records are incomplete, as the following example il-
• Absence of critical information in logs lustrates:
• Non compatible / random log formats Mar 16 08:09:58 kernel: [ 0.000000]
Normal 1048576 -> 1048576
A cloud-based application stores logs on multiple servers
and in multiple log files. The volatile nature of these re- There is not much information in this log to determine
sources1 causes log files to be available only for a certain what actually happened and what is Normal?
period of time. Each layer in the cloud application stack A general rule of thumb states that a log record should be
generates logs, the network, the operating system, the ap- both understandable by a human and easily machine pro-
plications, databases, network services, etc. Once logs are cessable. This also means that every log entry should, if
collected, they need to be kept around for a specific time possible, log what happened, when it happened, who trig-
either for regulatory reasons or to support forensic investi- gered the event, and why it happened. We will later discuss
gations. We need to make the logs available to a number of in more detail what these rules mean and how good log-
constituencies; application developers, system administra- ging guidelines cover these requirements (see Sections 4.2
tors, security analysts, etc. They all need access, but only and 4.3).
to a subset and not always all of the logs. Platform as a In the next section we will discuss how we need to instru-
service (PaaS) providers often do not make logs available to ment our infrastructure to collect all the logs. After that we
their platform users at all. This can be a significant problem will see how logging guidelines help address issues related to
when trying to analyze application problems. For example, missing and incomplete log records.
Amazon[5] does not make the load balancer logs available to
their users. And finally, critical components cannot or are
not instrumented correctly to generate the logs necessary to
3. LOGGING ARCHITECTURE
answer specific questions. Even if logs are available, they A log management system is the basis for enabling log
come in all kinds of different formats that are often hard to analysis and solving the goals introduced in the previous
process and analyze. sections. Setting up a logging framework involves the fol-
The first five challenges can be solved through log man- lowing steps:
agement. The remaining three are more intrinsic problems
and have to be addressed through defining logging guidelines • Enable logging in each infrastructure and application
and standards2 . component
• Setup and configure log transport
2.1 Log Management
• Tune logging configurations
Solving the cloud logging problems outlined in the last
section requires a log management solution or architecture 3.1 Enable Logging
to support the following list of features:
As a first step, we need to enable logging on all infrastruc-
ture components that we need to collect logs from. Note that
• Centralization of all logs
this might sound straight forward, but it is not always easy
• Scalable log storage to do so. Operating systems are mostly simple to configure.
• Fast data access and retrieval In the case of UNIX, syslog[31] is generally already setup
and logs can be found in /var/log. The hard part with
• Support for any log format OS logs is tuning. For example, how do you configure the
• Running data analysis jobs (e.g., map reduce[16]) logging of password changes on a UNIX system[22]? Log-
• Retention of log records ging in databases is a level harder than logging in operating
systems. Configuration can be very tricky and complicated.
• Archival of old logs and restoring on demand For example, Oracle[24] has at least three different logging
• Segregated data access through access control mechanisms. Each of them has their own sets of features, ad-
vantages, and disadvantages. It gets worse though; logging
• Preservation of log integrity
from within your applications is most likely non-existent. Or
• Audit trail for access to logs if it exists, it is likely not configured the way your use-cases
demand; log records are likely missing and the existing log
These requirements match up with the challenges defined records are missing critical information.
in the last section. However, they do not address the last
three challenges of missing and non standardized log records. 3.2 Log Transport
1 Setting up log transport covers issues related to how logs
For example, if machines are pegging at a very high load,
new machines can be booted up or machines are terminated are transfered from the sources to a central log collector.
if they are not needed anymore without prior warning. Here are issues to consider when setting up the infrastruc-
2
Note that in some cases, it is not possible to change any- ture:
thing about the logging behavior as we cannot control the
code of third-party applications. • Synchronized clocks across components
3. • Reliable transport protocol • Errors are problems that impact a single application
• Compressed communication to preserve bandwidth user and not the entire platform.
• Encrypted transport to preserve confidentiality and in- • Critical conditions are situations that impacts all users
tegrity of the application. They demand immediate atten-
tion3 .
3.3 Log Tuning • System and application start, stop, and restart. Each
Log data is now centralized and we have to tune log sources of these events could indicate a possible problem. There
to make sure we get the right type of logs and the right details is always a reason why a machine stopped or was restarted.
collected. Each logging component needs to be visited and
• Changes to objects track problems and attribute changes
tuned based on the use-cases. Some things to think about
to an activity. Objects are entities in the application,
are where to collect individual logs, what logs to store in
such as users, invoices, or goods. Other examples of
the same place, and whether to collect certain log records at
changes that should be logged are:
all. For example, if you are running an Apache Web server,
do you collect all the log records in the same file; all the – Installation of a new application (generally logged
media file access, the errors, and regular accesses? Or are on the operating system level).
you going to disregard some log records? – Configuration change4 .
Depending on the use-cases you might need to log addi-
tional details in the log records. For example, in Apache it – Logging program code updates enable attribution of
is possible to log the processing time for each request. That changes to developers.
way, it is possible to identify performance degradations by – Backup runs need to be logged to audit successful or
monitoring how long Apache takes to process a request. failed backups.
– Audit of log access (especially change attempts).
4. LOGGING GUIDELINES
To address the challenges associated with the information 4.1.3 Security
in log records, we need to establish a set of guidelines and we Security logging in cloud application is concerned with
need to have our applications instrumented to follow these authentication and authorization, as well as forensics sup-
guidelines. These guidelines were developed based on exist- port5 . In addition to these three cases, security tools (e.g.,
ing logging standards and research conducted at a number intrusion detection or prevention system or anti virus tools)
of log management companies[4, 15, 20, 30, 33]. will log all kinds of other security-related issues, such as at-
tempted attacks or the detection of a virus on a system.
4.1 When to Log Cloud-applications should focus on the following use-cases:
When do applications generate log records? Making the
decision when to write log records needs to be driven by • Login / logout (local and remote)
use-cases. These use-cases in cloud applications surface in • Password changes / authorization changes
four areas:
• Failed resource access (denied authorization)
• Business relevant logging • All activity executed by a privileged account
• Operations based logging
Privileged accounts, admins, or root users are the ones
• Security (forensics) related logging that have control of a system or application. They have
• Regulatory and standards mandates privileges to change most of the parameters in the applica-
tion. It therefore is crucial for security purposes to monitor
As a rule of thumb, at every return call in an application, very closely what these accounts are doing6 .
the status should be logged, whether success or failure. That
way errors are logged and activity throughout the applica- 4.1.4 Compliance
tion can be tracked. Compliance and regulatory demands are one more group
of use-cases that demand logging. The difference the other
4.1.1 Business use-cases is that it is often required by law or by business
Business relevant logging covers features used and busi- partners to comply with these regulations. For example,
ness metrics being tracked. Tracking features in a cloud the payment card industry’s data security standard (PCI
application is extremely crucial for product management. It DSS[25]) demands a set of actions with regards to logging
helps not only determine what features are currently used, (see Section 10 of PCI DSS). The interesting part about
it can also be used to make informed decisions about the the PCI DSS is that it demands that someone reviews the
future direction of the product. Other business metrics that
3
you want to log in a cloud application are outlined in [8]. Exceptions should be logged automatically through the ap-
Monitoring service level agreements (SLAs) fall under the plication framework.
4
topic of business relevant logging as well. Although some of For example a reconfiguration of the logging setup is im-
portant to determine why specific log entries are not logged
the metrics are more of operational origin, such as applica- anymore.
tion latencies. 5
Note that any type of log can be important for forensics,
not just security logs.
4.1.2 Operational 6
Note also that this has an interesting effect on what user
Operational logging should be implemented for the follow- should be used on a daily basis. Normal activity should not
ing instances: be executed with a privileged account!
4. logs and not just that they are generated. Note that most (CEE)[10] is a new standard that is going to be based on
of the regulations and standards will cover use-cases that the following syntax:
we discussed earlier in this section. For example logging
time=2010-05-13 13:03:47.123231PDT,
privileged activity is a central piece of any regulatory logging
session_id=08BaswoAAQgAADVDG3IAAAAD,
effort.
severity=ERROR,user=pixlcloud_zrlram,
object=customer,action=delete,status=failure,
4.2 What to Log reason=does not exist
We are now leaving the high-level use-cases and infras-
tructure setup to dive into the individual log records. How There are a couple of important properties to note about
does an individual record have to look? the example log record:
At a minimum, the following fields need to be present in First, each field is represented as a key-value pair. This is
every log record: Timestamp, Application, User, Session ID, the most important property that every records follows. It
Severity, Reason, Categorization. These fields help answer makes the log files easy to parse by a consumer and helps
the questions: when, what, who, and why. Furthermore, with interpreting the log records. Also note that all of the
they are responsible for providing all the information de- field names are lower case and do not contain any punctua-
manded by our use-cases. tions. You could use underscores for legibility.
A timestamp is necessary to identify when a log record Second, the log record uses three fields to establish a cat-
or the recorded event happened. Timestamps are logged in egorization schema or taxonomy: object, action, and status.
a standard format[18]. The application field identifies the Each log record is assigned exactly one value for each of these
producer of the log entry. A user field is necessary to iden- fields. The category entries should be established upfront
tify which user has triggered an activity. Use unique user for a given environment. Standards like CEE are trying to
names or IDs to distinguish users from each other. A session standardize an overarching taxonomy that can be adopted
ID helps track a single request across different applications by all log producers. Establishing a taxonomy needs to be
and tiers. The challenge is to share the same ID across based on the use-cases and allow for classifying log records
all components. A severity is logged to filter logs based on in an easy way. For example, by using the object value of
their importance. A severity system needs to be established. ’customer’, one can filter by customer related events. Or by
For example: debug, info, warn, error, and crit. The same querying for status=failure, it is possible to search for all
schema should be used across all applications and tiers. A failure indicating log records across all of the infrastructure
reason is often necessary to identify why something has hap- logs.
pened. For example, access was denied due to insufficient These are recommendations for how log entries should be
privileges or a wrong password. The reason identifies why. structured. To define a complete syntax, issues like encoding
As a last set of mandatory field, category or taxonomy fields have to be addressed. For the scope of this paper, those
should be logged. issues are left to a standard like CEE[10].
Categorization is a method commonly used to augment
information in log records to allow addressing similar events
4.4 Don’t forget the infrastructure
in a common way. This is highly useful in, for example, We talked a lot about application logging needs. However,
reporting. Think about a report that shows all failed logins. do not forget the infrastructure. Infrastructure logs can pro-
One could try to build a really complicated search pattern vide very useful context for application logs. Some examples
that finds failed login events across all kinds of different are firewalls, intrusion detection and prevention systems, or
applications7 or one could use a common category field to web content filter logs which can help identify why an ap-
address all those records. plication request was not completed. Either of these devices
might have blocked or altered the request. Other exam-
4.3 How to log ples are load balancers that rewrite Web requests or modify
them. Additionally, infrastructure issues, such as high laten-
What fields to log in a log record is the first piece in the
cies might be related to overloaded machines, materializing
logging puzzle. The next piece we need is a syntax. The
in logs kept on the operating systems of the Web servers.
basis for a syntax is rooted in normalization, which is the
There are many more infrastructure components that can
process of taking a log entry and identifying what each of
be used to correlate against application behavior.
the pieces represent. For example, to report on the top users
accessing a system we need to know which part of each log
record represents the user name. The problems that can be 5. REFERENCE SETUP
solved with normalized data are also called structured data In the first part of this paper, we have covered the theoret-
analysis. Sometimes additional processes are classified under ical and architectural basis for cloud application log manage-
normalization. For example, normalizing numerical values ment. The second part of this paper discusses how we have
to fall in a predefined range can be seen as normalization. implemented a application logging infrastructure at a soft-
There are a number of problems with normalization[21]. Es- ware as a service (SaaS) company. This section presents a
pecially if the log entries are not self-descriptive. More on number of tips and practical considerations that help overall
that in Section 4.3. logging use-cases but most importantly support and enable
The following is a syntax recommendation that is based forensic analysis in case of an incident.
on standards work and the study of many existing logging Part of the SaaS architecture that we are using here is
standards (e.g., [9, 17, 29, 35]). Common event expression a traditional three-tiered setup that is deployed on top of
the Amazon AWS cloud. The following is an overview of
7 the application components and a list of activities that are
Each application logs failed logins in completely different
formats[34]. logged in each component:
5. • Django[13] : Object changes, authentication and au- the server along with the other features. We therefore built
thorization, exceptions, errors, feature usage a little logging library that can be included in the HTML
• JavaScript: AJAX requests, feature usage code. Calling the library results in an Ajax call to an end
point on the server that will log the message which is passed
• Apache: User requests (successes and attempts) as a payload to the call. In order to not spawn too many
• MySQL: Database activity HTTP requests, the library can be used in batch mode to
bundle multiple log records into a single call.
• Operating system: System status, operational failures
• Java Backend : Exceptions, errors, performance 5.3 Apache
Our Apache logging setup is based on Apache’s defaults.
In the following, we are going to briefly touch upon each However, we tuned a number of parameters in order to en-
of the components and talk in more detail about what it able some of our use-cases. To satisfy our logging guidelines,
means to log in these components and what some of the we need to collect a timestamp, the originating server, the
pitfalls were that we ran into. URL accessed, the session ID from the application, and the
HTTP return code to identify what the server has done with
5.1 Django the request.
Django assumes that developers are using the standard The motivation for some of these fields you find in the
python logging libraries[26]. There is no additional support first part of this paper, for example the session ID. Here is
for logging built into Django. Due to this lack of intrinsic a sample log entry that shows how our Apache logs look.
logging support we had to implement our own logging solu- Note specifically the last two fields:
tion for Django and instrument Django, or more precisely,
the Django authentication methods to write log entries[14]. 76.191.189.15 - - [29/Jan/2010:11:15:54 -0800]
In addition, we wrote a small logging library that can be in- "GET / HTTP/1.1" 200 3874 "http://pixlcloud.service.ch/"
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us)
cluded in any code. Once included, it exports logging calls AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4
for each severity level, such as debug(), error(), info(), Safari/531.21.10" duvpqQqgOxAAABruAPYAAAAE
warn(), crit(), and feature() The first five calls all work
similar; they require a set of key-value pairs. For example: 5.3.1 Apache Configuration
This section is going to outline how we configured our
error({’object’:’customer’,’action’:’delete’, Apache instances. The first step is to configure the LogFor-
’reason’:’does not exists’,’id’:’22’}) mat to contain the extra information that we are interested
in. The item we added is {%UNIQUE_ID}e. The former adds
The corresponding log entry looks as follows:
a unique ID into every request and the latter logs the latency
2010 Jan 28 13:03:47 127.0.0.1 severity=ERROR, for every request:
user=pixlcloud_zrlram,object=customer,action=delete,
status=failure,reason=does not exist,id=22, LogFormat "%h %l %u %t "%r" %>s %b
request_id=dxlsEwqgOxAAABrrhZgAAAAB "%{Referer}i" "%{User-Agent}i"
%{UNIQUE_ID}e" service
The extra key-values in the log record are automatically
added to the log entries by our logging library without bur- Make sure you enable the mod_unique_id[3] module such
dening the developer to explicitly include them. The unique id that Apache includes a session ID in every log entry.
is extracted from the HTTP request object. This ID is In a next instance, we turned off an annoying log record
unique for each user request and is used on each applica- that you will see if you have Apache running with sub pro-
tion tier to allow correction of messages. The user is also cesses. When an Apache server manages its child processes,
extracted through the request object. The severity is in- it needs a way to wake up those processes to handle them
cluded automatically based on the logging command. These new connections. Each of these actions creates a log entry.
different levels of logging calls (i.e., severities) are used to We are not at all interested in those. Here is the Apache
filter messages, and also to log debug messages only in de- logging configuration we ended up with:
velopment. In production, there is a configuration setting
SetEnvIf Remote_Addr "127.0.0.1" loopback=1
that turns debug logging off. SetEnvIf loopback 1 accesslog
Note that we are trying to always log the category fields:
an object, an action, a status, and if possible a reason. This CustomLog /var/log/apache2/access.log service env=!accesslog
enables us to do very powerful queries on the logs, like look-
ing for specific objects or finding all failed calls. 5.3.2 Load Balancing
In addition to the regular logging calls, we implemented In our infrastructure, we are using a load balancer, which
a separate call to log feature usage. This goes back to the turned out to significantly impact the Apache logs. Load
use-cases where we are interested in how much each of our balancers make requests look like they came from the load
features in the product is used. balancer’s IP; the client IP is logged wrong, making it impos-
sible to track requests back to the actual source and correlate
5.2 JavaScript other log recrods with it. An easy fix would be to change
Logging in JavaScript is not very developer friendly. The the LogFormat statement to use the X-Forwarded_For field
problem is that the logs end up on the client side. They instead of %h. However, this won’t always work. There are
cannot be collected in a server log to correlate them with two Apache modules that are meant to address this issue:
other log records. Some of the service’s features are triggered mod remoteip and mod rpaf. Both of these modules allow
solely through Ajax[2] calls and we would like to log them on us to keep the original LogFormat statement in the Apache
6. configuration. The modules replace the remote ip (%h) field logs contain the same session ID as the original request that
with the value of the X-Forwarded-For header in the HTTP triggered the backend process.
request if the request came from a load balancer. We ended Another measure we are tracking very closely is the num-
up using mod rpaf with the following configuration: ber of exceptions in the logs. In a first instance, we are
monitoring them to fix them. However, not all of them can
LoadModule rpaf_module mod_rpaf.so be fixed or make sense to be fixed. What we are doing how-
RPAFenable On ever, is monitoring the number of exceptions over time. An
RPAFsethostname On increase shows that our code quality is degrading and we
RPAFproxy_ips 10.162.65.208 10.160.137.226 can take action to prioritize work on fixing them. This has
shown to be a good measure to balance feature vs. bug-
This worked really well, until we realized that on Amazon
related work and often proofs invaluable when investigating
AWS, the load balancer constantly changes. After defining
security issues.
a very long chain of RPAFproxy ips, we started looking into
another solution, which was to patch mod rpaf to work as
desired[23]. 6. FUTURE TOPICS
There are a number of issues and topics around cloud-
5.4 MySQL based application logging that we haven’t been able to dis-
Our MySQL setup is a such that we are using Amazon’s cuss in this paper. However, we are interested in addressing
Relational Database Service[7]. The problem with this ap- them at a future point in time are: security visualization[1],
proach being that we do not get MySQL logs. We are looking forensic timeline analysis, log review, log correlation, and
into configuring MySQL to send logs to a separate database policy monitoring.
table and then exporting the information from there. We
haven’t done so yet. 7. CONTRIBUTIONS
One of the challenges with setting up MySQL logging will
To date, there exists no simple and easy to implement
be to include the common session ID in the log messages to
framework for application logging. This paper presents a
enable correlation of the database logs with application logs
basis for application developers to guide the implementa-
and Web requests.
tion of efficient and use-case oriented logging. The paper
5.5 Operating system contributes guidelines for when to log, where to log, and
exactly what to log in order to enable the three main log
Our infrastructure heavily relies on servers handling re-
analysis uses: forensic investigation, reporting, and correla-
quests and processing customer data in the back end. We
tion. Without following these guidelines it is impossible to
are using collectd[12] on each machine to monitor individual
forensically recreate the precise actions that an actor under-
metrics. The data from all machines is centrally collected
took. Log collections and logging guidelines are an essential
and a number of alerts and scripted actions are triggered
building block of any forensic process.
based on predefined thresholds.
The logging guidelines in this paper are tailored to to-
We are also utilizing a number of other log files on the
day’s infrastructures that are often running in cloud envi-
operating systems for monitoring. For example, some of our
ronments, where asynchronous operations and a variety of
servers have very complicated firewall rules deployed. By
different components are involved in single user interactions.
logging failed requests we can identify both mis-configurations
They are designed for investigators, application developers,
and potential attacks. We are able to identify users that are
and operations teams to be more efficient and to better sup-
trying to access their services on the platform, but are using
port their business processes.
the wrong ports to do so. We mine the logs and then alert
the users of their mis-configurations. To assess the attacks,
we are not just logging blocked connections, but also se- 8. BIOGRAPHY
lected passed ones. We can for example monitor our servers Raffael Marty is an expert and author in the areas of
for strange outbound requests that haven’t been seen be- data analysis and visualization. His interests span anything
fore. Correlating those with observed blocked connections related to information security, big data analysis, and infor-
gives us clues as to the origin of these new outbound con- mation visualization. Previously, he has held various posi-
nections. We can then use the combined information to tions in the SIEM and log management space at companies
determine whether the connections are benign or should be such as Splunk, ArcSight, IBM research, and PriceWater-
of concern. house Coopers. Nowadays, he is frequently consulted as an
industry expert in all aspects of log analysis and data visu-
5.6 Backend alization. As the founder of Loggly, a logging as a service
We are operating a number of components in our SaaS’s company, Raffy spends a lot of time re-inventing the log-
backend. The most important piece is a Java-based server ging space and - when not surfing the California waves - he
component we instrumented with logging code to monitor can be found teaching classes and giving lectures at security
a number of metrics and events. First and foremost we conferences around the world.
are looking for errors. We instrumented Log4J[19] to log
through syslog. The severity levels defined earlier in this
paper are used to classify the log entries.
In order to correlate the backend logs with any of the
other logs, we are using the session ID from the original
Web requests and pass them down in any query made to the
back end as part of the request. That way, all the backend
7. References [21] Event Processing ˆ
A£ Normalization,
[1] Raffael Marty, Applied Security Visualization, Addi- http://raffy.ch/blog/2007/08/25/
son Wesley, August 2008. event-processing-normalization/, accessed June
4, 2010.
[2] Asynchronous JavaScript Technology, https:
//developer.mozilla.org/en/AJAX. [22] Linux/UNIX Audit Logs, http://raffy.ch/blog/
2006/07/24/linux-unix-audit-logs.
[3] Apache Module mod unique id, http://httpd.
[23] Fixing Client IPs in Apache Logs with Ama-
apache.org/docs/2.1/mod/mod_unique_id.html.
zon Load Balancers, www.loggly.com/2010/03/
[4] ArcSight - Common Event Format, www.arcsight. fixing-client-ips-in-apache-logs-with-amazon-load-balance
com/solutions/solutions-cef. Accessed June 11, 2010.
[5] Amazon Web Services, http://aws.amazon.com. [24] Pete Finnigan, Introduction to Simple Oracle
Auditing, www.symantec.com/connect/articles/
[6] AWS Elastic Load Balancing http://aws.amazon. introduction-simple-oracle-auditing.
com/elasticloadbalancing.
[25] PCI security standards council, Payment Card Indus-
[7] Amazon Relational Database Service http://aws. try (PCI), Data Security Standard, Version 1.2.1, July
amazon.com/rds. 2009.
[8] Bessemer cloud computing law Number 2: Get [26] Python Logging, www.python.org/dev/peps/
Instrument rated, and trust the 6C’s of Cloud pep-0282.
Finance, www.bvp.com/About/Investment_Practice/
Default.aspx?id=3988. Accessed June 6, 2010. [27] rsyslgo, www.rsyslog.com.
[9] Bridgewater, David, Standardize messages with the [28] Software as a Service in Wikipedia. Retrieved
Common Base Event model, IBM DeveloperWorks, 21 June 2, 2010 from http://en.wikipedia.org/wiki/
Oct 2004. Software_as_a_service.
[10] Common Event Expression http://cee.mitre.org. [29] , ICSA Labs. The Security Device Event Exchange
(SDEE), www.icsalabs.com/html/communities/ids/
[11] Cloud Computing. in Wikipedia. Retrieved June 2, sdee/index.shtml
2010 from http://en.wikipedia.org/wiki/Cloud_
[30] Splunk Wiki - Common Information Model, www.
computing.
splunk.com/wiki/Apps:Common_Information_Model.
ˆ
[12] collectd A£ The system statistics collection daemon, [31] Syslog(3) - Man page, http://linux.die.net/man/3/
www.collectd.org. syslog.
[13] Django, Web framework www.djangoproject.com. [32] Syslog-ng logging system, www.balabit.com/
[14] Django 1.2 logging patch www.loggly.com/ network-security/syslog-ng.
wp-content/uploads/2010/04/django_logging_ [33] Thor: A tool to Test Intrusion Detection System by
1.2.patch Variations of Attacks, http://thor.cryptojail.net/
[15] M. Dacier et al. Design of an intrusion-tolerant intru- thor.
sion detection system, Maftia Project, deliverable 10, [34] Common Dictionary and Event Taxonomy, http://
2005. cee.mitre.org/ceelanguage.html#event.
[16] Jeffrey Dean and Sanjay Ghemawat, MapRe- [35] OpenXDAS, http://openxdas.sourceforge.net/.
duce: Simplified Data Processing on Large Clusters,
OSDI’04: Sixth Symposium on Operating System De-
sign and Implementation, San Francisco, CA, Decem-
ber, 2004.
[17] IDWG (Intrusion Detection Working Group), In-
trusion Detection Exchange Format, www.ietf.org/
html.charters/idwg-charter.html.
ˆ
[18] ISO 8601 - Data elements and interchange formats A£
ˆ Representation of dates
Information interchange A£
and times, International Organization for Standard-
ization.
[19] Apache log4j, Java Logging, http://logging.apache.
org/log4j.
[20] Loggly Inc., www.loggly.com.