Distributed Sniffing and Scanning
Project Report
Rishu Seth
Soham Kulkarni
Philipp Orekhov
Natalia Katyuzhanskaya
Mohammad Tarique Abdullah
15th February 2011
0.1 Our group's initial thoughts about the distributed sniffing and scanning project
• Initially we decided to cover a number of network interconnection scenarios and draw observations to make our approach more systematic.
• We thought it would be important to consider the different aspects of
information:
– What information can be gained?
– How can it be gained?
– How can it be analyzed?
– How can it be used?
• We also wanted to figure out in advance what kinds of problems we might encounter with information gathering, information content, and centralised and distributed approaches.
• We tried to discuss the problems related to choosing good locations to place sniffers and scanners in order to improve the quality/quantity of the information gathering process, based on the network interconnection scenarios we considered.
• We have tried to come up with possible subspecializations within the
project’s scope:
– Firewalls
– Routers
– Intrusion Detection Systems
– Virtual Networks
– Network protocol based (e.g. detailed ICMP, UDP+TFTP, ...)
– Task based (producing a map of the network, learning about different network layers through detailed analysis of example networks, gaining network workload statistics)
• Finally, we wanted to estimate the timing, level of detail and area of specialisation: all to make the project fruitful and feasible at the same time.
0.2 General network model and observations
We drew a number of network topologies that differ in multiple aspects, and from those made the following observations:
1. There exist 1-to-1, 1-to-many and many-to-many connections across all kinds of networks.
2. From point 1: there can be single or multiple points of access to one
resource: different network interfaces.
3. There can be a number of different equipment types on common networks: routers, switches, bridges, repeaters, gateways, firewalls, clients and servers, printers, dial-up modems, PDAs, ...
4. Throughout the different network layers, the similarity commonly lies in layered encapsulation, and the difference in specific protocol details.
5. There are a number of common wired and wireless connection options: serial and parallel ports, Ethernet ports, dial-up ports, 802.11 a/b/g/n wireless, infrared, Bluetooth: not only TCP/IP-based networking can be considered.
6. Networks are often segmented into different logical sub-blocks for various purposes: splitting very large networks to simplify monitoring, physical devices located in different rooms, intended use (e.g. a subnetwork for the HR department), ...
0.3 What information can be collected?
Firstly, some common network data that can be acquired: IP addresses, MAC addresses, hostnames, OS versions, and running services (protocol, port, type, version).
Now, slightly more specific data types and points:
• number of hops: can/cannot be monitored (e.g. a NAT that changes the hop count): can identify NATs.
• chains of usage: can/cannot observe a common path for a common traffic type, e.g. internal webserver traffic within the local area network.
• manual or automatic network reorganisations: can those be easily detected?
• unusual sequences/types of network activity: likelihood/signature of attacks. For example, a combination of a shell connection + IRC + FTP at 3 AM might be suspicious.
• covert channels: detection/guessing.
• backdoor detection. For example: a listening UDP port, within the unreserved range, open for a lengthy period of time?
• hidden traffic detection. For example: FTP packets over the Ethernet layer.
• detection of distributed traffic from a single application. For example: FTP client-server communication that takes place partially over a wireless interface and partially over a wired one.
• common: header analysis + payload analysis.
0.4 How can information be collected?
• place a hardware-based sniffer at (or in between) some well-chosen network points.
• place software-based sniffer(s) on different machines.
• set up software bots to travel around the network gathering info.
• place and run scanners on one or more machines:
– perform thorough scans from each machine and cross-reference
the data.
– perform distributed partial scans and combine the results.
• consider multistaged scanning options. For example:
– scan TCP port 80: open
– connect to TCP port 80: webserver status + version + modules
installed + OS details?
– cross-reference to scans from other locations
– search/scan for vulnerabilities...
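The first two stages above might be sketched as follows (a Python illustration only, not part of the project's toolset; the host to be probed is a placeholder, and the probe is a plain HTTP/1.0 HEAD request):

```python
import socket

def probe_http(host: str, port: int = 80, timeout: float = 3.0) -> bytes:
    """Stage 2: after a scan reports TCP/80 open, connect and issue a HEAD
    request to elicit the webserver's response headers (status, version)."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def parse_server_header(response: bytes) -> str:
    """Extract the Server: header (webserver name/version) from a raw response."""
    for line in response.split(b"\r\n"):
        if line.lower().startswith(b"server:"):
            return line.split(b":", 1)[1].strip().decode("latin-1")
    return ""
```

The extracted banner can then be cross-referenced with results from other scan locations, as in the stages listed above.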
0.5 How and where can information be analyzed?
Three location possibilities:
• on a central server.
• on distributed communicating machines.
• hybrid of the above two.
Types of analysis:
• Statistical analysis
– quantity of traffic: filtered by protocol or unfiltered
– pattern identification: for example, roughly 50 percent of traffic per day is HTTP.
• Semantics/logical/rule based analysis and pattern recognition
– Example 1: Encrypted Ethernet frames over a TCP connection:
VPN traffic?
– Example 2: Deep packet inspection: a source address in the TCP
packet containing an FTP packet differs from the source address
contained in the FTP packet: NAT/Router/Firewall?
• How to identify own (scanner/sniffer/analyzer) traffic to NOT get
recognised as an intruder?
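Example 2 above can be expressed as a simple rule. A minimal Python sketch, with the packet fields supplied as plain strings (real input would come from a packet parser, which is assumed here):

```python
def parse_port_command(payload: str):
    """Parse an FTP 'PORT h1,h2,h3,h4,p1,p2' command into (ip, port)."""
    parts = payload.strip().split()
    if len(parts) != 2 or parts[0].upper() != "PORT":
        return None
    nums = parts[1].split(",")
    if len(nums) != 6:
        return None
    h1, h2, h3, h4, p1, p2 = (int(n) for n in nums)
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

def nat_suspected(outer_src_ip: str, ftp_payload: str) -> bool:
    """Rule: the address advertised inside the FTP payload differs from the
    source address of the enclosing packet -> NAT/router/firewall suspected."""
    parsed = parse_port_command(ftp_payload)
    return parsed is not None and parsed[0] != outer_src_ip
```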
0.6 How can information be used?
• In case an irregularity is detected on the network, the network administrator can be informed by one or more means: by email, by phone.
• If the level of risk posed by an irregularity is judged as high, it may be appropriate to fix or block the issue automatically.
• A visual representation of collected data can be produced:
– detailed network diagrams.
– statistical probabilities: of attacks, of 'bad' configurations, of possible damage.
– identified 'potentially insecure' network zones and associated expected paths/vectors of attacks.
• Gathered information can be provided to a hired team of penetration
testers:
– to evaluate the quality/usefulness of that information.
– to generate detailed attack/security plans/policies.
• Gathered information can be compared to that collected by the penetration testing team to decide on what can be improved in the automatic data collection.
0.7 Advantages and disadvantages of centralised
and distributed data gathering
Main disadvantages of centralized data gathering:
• Only one perspective on the network.
• Monitoring traffic changes in real time is difficult and sometimes not possible.
• All traffic analysis has to happen on only one machine: slower.
Advantages of distributed data gathering:
• Multiple 'points of view' of the network.
• With well placed sniffers it becomes possible to monitor traffic changes
throughout the network in real-time.
• Scan results from different points in the network can be different:
more/better information.
• Distributed work delegation and division becomes possible.
• Distributed analysis is faster.
• Data from different perspectives is more complete.
• There are possibilities to efficiently distribute the network traffic to avoid/reduce congestion.
Disadvantages of distributed data gathering:
• The additional network traffic from scans may be unwanted and can cause congestion during normal operation.
• Impact from network reorganisations can be significant.
• Choosing where best to perform the analysis is an open problem.
• What information is to be shared in a distributed environment, and how, is uncertain and can pose additional risks.
• Identifying own traffic is problematic.
• How to make sure that no one tampers with the traffic becomes an important question.
• Should the distributed machines be centrally managed or partially/totally
autonomous?
Our group approached distributed sniffing and scanning by investigating existing open source software packages that allow various simple and advanced types of sniffing and scanning to be performed. We chose the Linux operating system as the platform due to the availability of many networking utilities, the possibility of automation using the command line interfaces common to most Linux-based packages, and the flexible environment of the OS.
We have decided to split our project into three main components:
• Java based client and server software written by our group to distribute
the tools and the commands to multiple clients.
• Tools to perform different types of sniffing and scanning.
• Analysis tools and approaches to be able to make security decisions
based on collected raw data.
As the project is composed of multiple components, a stepwise description of the operation follows:
1. Java based client and server software is installed onto the client and
server machines.
2. First the clients are run; then the server is started on the server machine.
3. An archive of the binary tools and a file with the commands to be run on each client are distributed to all the available clients.
4. Clients run the commands from the file and produce log files. The
status of the command execution is sent to the server.
5. The log files are considered as the raw data and are to be further
decomposed and analysed by the analysis tools.
6. Final data sets can be cross-referenced and analysed manually to be
able to make some security decisions.
The detailed stages of operation of the client and server based programs are described here:
1. When the clients and servers start up they read the initial settings
from a file. Settings include predefined TCP and UDP (multicast)
port numbers, multicast address and various predefined messages for
client-server communication.
2. When the clients are run, they first expect to receive a multicast message from the server to find out the IP address of the server.
3. When the server is run, it first starts a TCP message server for communicating with the clients and a TCP file server to transfer the tools archive and the commands file to the clients. Then a multicast message with the IP address of the server is transmitted to the clients.
4. Once the clients receive the IP of the server, they request the tools archive and the commands file. Client-server communication, including the requests for file transmissions, is handled by the TCP message server; the file transfer, by the TCP file server.
5. On successful receipt of the tools and the commands file, clients execute each command on their respective machines and report the progress back to the server.
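Although the project's client and server are written in Java, the multicast announcement of steps 2-3 can be sketched briefly in Python. The group address, port and "DSS-SERVER" message format below are illustrative assumptions, not the project's actual settings:

```python
import socket
import struct

# Assumed values; the real ones would come from the settings file.
MCAST_GRP, MCAST_PORT = "239.1.2.3", 5007

def build_announcement(server_ip: str, tcp_port: int) -> bytes:
    """Server side: announcement carrying the server's IP and message port."""
    return f"DSS-SERVER {server_ip} {tcp_port}".encode()

def parse_announcement(msg: bytes):
    """Client side: recover (server_ip, tcp_port) from an announcement, or None."""
    fields = msg.decode(errors="replace").split()
    if len(fields) == 3 and fields[0] == "DSS-SERVER":
        return fields[1], int(fields[2])
    return None

def announce(server_ip: str, tcp_port: int) -> None:
    """Send one multicast announcement (step 3 above)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 1))
    s.sendto(build_announcement(server_ip, tcp_port), (MCAST_GRP, MCAST_PORT))
    s.close()
```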
For the project there are two types of tools used: data gathering tools and analysis tools. These tools are described in the following sections.
0.8 General classification of tools used
Our group tried to produce a general classification:
• Passive tools
– Sniffers of different types and scope.
• Active tools
– Scanners of different types and scope.
– Packet generators.
– Vulnerability scanners.
– Visualizers.
• Analysis and parsing tools
– Parsers.
– Packet analyzers and filters.
– Network forensic analysis tools.
0.9 ngrep (multitype sniffing)
At the moment ngrep is the only sniffing tool used in the project. It allows
for listening on one or more network interfaces, different network layers
and protocols, and supports searches for specific information, similar to the
standard utility grep. Additional details about the tool can be found in the
supporting tool description file.
0.10 hping (packet generation)
This tool is commonly used for crafting specific packets to test the various
features of the network. There are multiple possibilities for the use of this
tool in our project.
0.11 netcat (general purpose clients and servers)
Very common multipurpose networking tool. Creating clients and servers for
message interchange or file transfer and testing services running on various
machines are common examples of use.
0.12 nmap (scanning)
The main scanner used in the project. With a very large choice of scan
types, it might be the tool gathering the most information for analysis.
0.13 bzip2 (compression/decompression)
As a common archiving tool, it may be needed for reducing the used space
and efficient file transfers.
0.14 p0f (passive OS fingerprinting)
Used as an alternative to active OS identification, it might improve the reliability of the gathered data.
0.15 traceroute (identifying the network structure)
On its own or in combination with other tools, it might be helpful in mapping the network.
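For instance, the per-hop router addresses can be pulled out of traceroute's textual output for map building. A minimal Python sketch, assuming the common Linux traceroute output format:

```python
import re

# Matches lines such as " 1  gw.local (192.168.1.1)  0.412 ms ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+.*?\((\d+\.\d+\.\d+\.\d+)\)")

def parse_traceroute(output: str):
    """Extract (hop_number, router_ip) pairs from traceroute output.
    Unanswered hops ('* * *') carry no address and are skipped."""
    hops = []
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2)))
    return hops
```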
0.16 Decomposition and analysis tools
Various decomposition, restructuring and searching tools are used for producing the final data for analysis. To date, the tools used include: grep, sed, awk, perl, cut, head, tail, wc, uniq, sort, comm, diff.
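As an example of such decomposition, the shell pipeline `cut -d' ' -f1 | sort | uniq -c | sort -rn` can be mirrored in Python to summarise a raw sniffer log. The assumption that the first whitespace-separated field of each log line is the protocol name is illustrative, not a fixed log format:

```python
from collections import Counter

def protocol_histogram(log_lines):
    """Count occurrences of the first field of each non-empty log line,
    returning (field, count) pairs sorted by descending count."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return counts.most_common()
```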
0.17 Scenarios
• OS identification.
• Network mapping.
• Gathering information about the running services.
• Multilayered protocol analysis.
• Uncommon network pattern identification.
• Traffic size evaluation.
• Identification of own traffic.
• Reduction of network traffic congestion.
The following steps can be performed:
1. Setup the Java server and client on the machines.
2. Run the server and clients to transfer the tools and the commands to
client machines.
3. Gather initial (locally available) information about each client.
4. For each available network interface (e.g. eth0, wlan0, ...) and each alive host on the network: perform traceroute.
5. On each client machine: use the information gathered (into files) from 'traceroute to different hosts' to construct a network map from that client machine's perspective (an agreed-upon plain text file format can be used).
6. Naming conventions for files are the same across clients, so the files with the network map will have the same filenames. By using a checksum calculation program (the same binary across clients), the checksums for the network map files will be calculated. If the checksums from some machines are the same, it means the maps from the different perspectives of those machines are also the same.
7. All of the checksums from clients can then be sent to the server and
compared (this will help avoid network congestion and large amounts
of duplicate data from being sent).
8. On the server, only the files with differing checksums can be requested. Those files can be compressed on the client machines before being sent over. Once received by the server, they can be used to construct either a single 'full-at-the-moment' network map or multiple different network maps (which can be converted to easily viewable graphical representations) that are to be manually compared to identify/guess reasons for the differences.
9. Repeat steps 2 to 8 once or twice a day (for example, as cron daemon tasks) to account for switched off, new or differently configured equipment and to avoid unnecessary traffic interfering with normal network operation.
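Steps 6 to 8 can be sketched as follows; SHA-256 stands in here for whatever checksum binary is actually shipped identically to all clients:

```python
import hashlib

def map_checksum(map_bytes: bytes) -> str:
    """Checksum of one client's network-map file (step 6)."""
    return hashlib.sha256(map_bytes).hexdigest()

def clients_to_fetch(checksums: dict) -> list:
    """Server side (steps 7-8): given {client_name: checksum}, keep one
    client per distinct checksum, so that only map files that actually
    differ are requested over the network."""
    seen = {}
    for client, digest in sorted(checksums.items()):
        seen.setdefault(digest, client)
    return sorted(seen.values())
```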
The intended use steps are as follows:
1. Produce a compressed archive containing the binaries of the tools, various scripts and other required components (organised into a convenient directory structure). The archive can be called 'tools.tar.gz', for example.
2. Produce a list of commands to be executed on each client machine
(assume the tools archive has already been received by the clients).
Save it as ’commands.txt’, for example.
3. Produce one or more (same format) configuration files to be used by the client and server Java components. Name it 'dss_settings.txt', for example.
4. Place all of the above-mentioned components, as well as the executable versions of the Java server and clients (according to the configuration file or files, and the requirements of the user), onto the machines in the network.
5. Then, first, run the client side application on each of the client machines:
user@machine $ java -jar dsscli.jar dss_settings[same_or_other].txt
6. Now, second, run the server side application on one server machine:
user@machine $ java -jar dsssrv.jar dss_settings.txt
7. Wait for and monitor the results.
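As an illustration of how such a settings file might be consumed, a minimal Python parser follows; the key=value line format and the key names shown in the test are assumptions, since the report does not fix the file format:

```python
def load_settings(text: str) -> dict:
    """Parse a simple settings file: one key=value pair per line,
    blank lines and '#' comment lines ignored."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings
```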