TOR Packet Analysis - Locating Identifying Markers

TOR PACKET ANALYSIS:
LOCATING IDENTIFYING MARKERS

BRENT MUIR

1
TOR Packet Analysis

2010

ABSTRACT
This paper examines the traffic analysis of “The Onion Router” (TOR) network in
order to identify any markers of TOR usage on the network packets. A historical
overview of anonymity systems on the internet is provided. A detailed examination
of the TOR system is also conducted discussing its development, its features, its
limitations and its weaknesses. The methodology utilised to locate any TOR
identifying markers is via a packet comparison of TOR and non-TOR identical network
packets. A high-level and a low-level traffic analysis are conducted resulting in some
TOR markers being identified. These results are put into a law enforcement context
in order for a forensic analysis of TOR network packets to take place.
Recommendations are given regarding the usage of TOR to mitigate the behavioural
actions of users that have inadvertently violated their anonymity.

TABLE OF CONTENTS
ABSTRACT
INTRODUCTION
ANONYMITY ON THE INTERNET
TOR (THE ONION ROUTER)
METHODOLOGY
    TOR CLIENT
    PACKET COMPARISON
    PACKET CAPTURE & ANALYSIS
RESULTS
CONCLUSION/RECOMMENDATIONS
REFERENCE LIST

INTRODUCTION
Various technologies exist that assist internet users in maintaining their anonymity
while online. One of the most common technologies that allows for anonymity is
“TOR” (The Onion Router). This paper will examine the extent of anonymity that TOR
provides when the network traffic is subject to traffic analysis techniques.
Specifically, through the analysis of network packets, it is the hypothesis that TOR
traffic will be distinguishable from regular internet traffic. Through traffic analysis
and social engineering it is theorised that the originating IP address can still be learnt
from the remnants of the TOR network traffic.

Before discussing the analysis of TOR traffic, firstly anonymity on the internet will be
explained providing a brief background into the different techniques that have been
used since the internet was invented. Secondly, TOR itself will be discussed
explaining the technology behind the onion router and how it provides anonymity.
Next, the methodology utilised to test TOR’s ability to provide anonymity will be
explained, including traffic capture and analysis techniques. The results of the traffic
analysis will be detailed next providing an insight into TOR’s use as an anonymity tool
on the internet. Lastly, some recommendations will be given regarding the use of
TOR as an anonymity tool, and the analysis of TOR traffic in a law enforcement
context.

ANONYMITY ON THE INTERNET
The development of inter-networking that led to the “internet” was not designed for
the mass-usage that it currently facilitates1. Many features that users take for
granted were not specified, for example encryption, and were subsequently “tacked-on” to a system that couldn’t initially support them, rather than the system being built from the
ground-up with these features included2. One feature that was never envisaged was
anonymity for its users3. With the exclusion of anonymity by default on the internet
many systems have been designed to fill this gap, for example proxy servers and MIX
networks, yet the network traffic these systems attempt to anonymise still relies on
the same network infrastructure to send and receive packets4, thus leading to a
misconception that these systems provide full-anonymity on the internet.
Before discussing these anonymity systems any further it is important to define the
term “anonymity”. Danezis and Diaz define anonymity as “the state of being not
identifiable within a set of subjects, the anonymity set”5. This definition implies that
a user on the internet should not be identifiable through their network traffic any
more than any other user of the internet, that is, the network traffic should have no
identifying characteristics. This is not the case with network traffic as it is necessary
for network traffic to contain identification characteristics such as IP headers and
port numbers so that the computers receiving this traffic are able to correctly
interpret the data contained within. A more suitable term for these systems might
be “unlinkability”, which is defined in the ISO/IEC 15408 standard as follows:
[Unlinkability] ensures that a user may make multiple uses of resources or services
without others being able to link these uses together. [...] Unlinkability requires that
users and/or subjects are unable to determine whether the same user caused certain
specific operations in the system6.

1

(Hafner & Lyon, 2000)
3
4
(Danezis & Diaz, 2008)
5
6
In (Danezis & Diaz, 2008)
2

The term “unlinkability” describes a scenario where it is impossible to pin certain
network traffic to a particular user (or computer) which is more accurate for
describing the level of service that the anonymity systems provide on the internet.
There are numerous types of anonymity systems (anonymisers) designed to function
over the internet. These can be broken down into three main categories: proxy
servers, anonymous email clients, and MIX or Crowd systems.
The idea of proxy servers is to route the network traffic through a proxy to hide the
original IP address of the internet connection. Examples of these simple proxy
servers are “Anonymizer” and “SafeWeb”:
The Anonymizer product acts as a web proxy through which all web requests and
replies are relayed. The web servers accessed, should therefore not be able to extract
any information about the address of the requesting user7.

The purpose of these systems is to provide basic “unlinkability” and to allow users
to access IP location-specific content, such as videos hosted on
http://www.hulu.com/. These types of systems are known as “one-hop” proxy
servers as the network traffic is only routed through one proxy server at a time. This
makes them distinct from MIX networks, which route their traffic through multiple
nodes (or “hops”) before reaching their destination. Due to the nature of the “one-hop” system, back-tracing can be conducted on the internet traffic to determine the
originating IP address of the packets.
Anonymous email clients were originally devised in the early 1980s by Chaum, who
designed an email communication system that used Public Key Cryptography (PKC)
to not only hide the contents of the email message, but also the sender and receiver
of the email message8. The purpose of anonymous email clients is to provide a
communication channel where the author of an email message can communicate
with another without revealing their identity. An example of an anonymous email

7
8

(Chaum, 1981)

client is the “Anon.penet.fi” relay system. This system utilised pseudonyms, or
fictitious names, to facilitate an anonymous communication system:
The technical principle behind the service was a table of correspondences between
real email addresses and pseudonymous addresses, kept by the server. Email to a
pseudonym would be forwarded to the real user. Email from a pseudonym was
stripped of all identifying information and forwarded to the recipient9.

These types of systems were designed with email communication in mind; however,
email is only one form of communication over the internet, and so these systems are
not suitable for providing anonymity (or unlinkability for that matter) for the
majority of internet traffic.
MIX or Crowd systems are similar in design to proxy servers; however they relay the
network traffic through multiple MIXes (or nodes) rather than through only one
proxy server.
Each user contacts a central server and receives the list of participants, the crowd. A
user then relays her web requests by passing it to another randomly selected node in
the crowd. Upon receiving a request each node tosses a biased coin and decides if it
should relay it further through the crowd or send it to the final recipient 10.
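The forwarding rule described in the quotation above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name `crowds_path`, the member names and the forwarding probability of 0.75 are hypothetical choices made for illustration and are not taken from the Crowds design or from any implementation.

```python
import random


def crowds_path(members, forward_prob, rng):
    """Return the crowd members a request visits before it is submitted.

    members      -- list of participant names (hypothetical)
    forward_prob -- probability that a member forwards instead of submitting
    rng          -- a random.Random instance, passed in for repeatability
    """
    # The initiator always hands the request to one randomly chosen member.
    path = [rng.choice(members)]
    # Each member then tosses a biased coin: forward again, or submit.
    while rng.random() < forward_prob:
        path.append(rng.choice(members))
    return path


if __name__ == "__main__":
    rng = random.Random(2010)
    crowd = ["node%d" % i for i in range(10)]
    route = crowds_path(crowd, 0.75, rng)
    print(" -> ".join(route + ["web server"]))
```

A higher forwarding probability produces longer paths on average, which increases the uncertainty about the initiator at the cost of additional latency.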

The principal idea is that messages to be anonymized are relayed through a node,
called a mix. The mix has a well-known RSA public key, and messages are divided into
blocks and encrypted using this key. The first few blocks are conceptually the
“header” of the message, and contain the address of the next mix. Upon receiving a
message, a mix decrypts all the blocks, strips out the first block that contains the
address of the recipient, and appends a block of random bits (the junk) at the end of
the message. 11

9

11
10

Another purpose of MIX systems is to “actually mix together many messages, to
make it difficult for an adversary to follow messages through it, on a first-in, first-out
basis”12.
Like the previous systems, Crowds and MIXes are also susceptible to various attacks
that undermine the level of anonymity that they provide. One such attack is outlined
by Reiter and Rubin who explain that these systems “can be undermined by
executable web content that, if downloaded into the user's browser, can open
network connections directly from the browser to web servers, thus bypassing
Crowds altogether and exposing the user to the end server”13. By utilising end-to-end
traffic analysis techniques other inadequacies of these systems can also be
highlighted:
Another attack tries to correlate events at the endpoints of the system: if a user
makes an HTTP request, it is reasonable to assume that this request leaves the last
MIX towards a web server shortly later. Similarly, the response sent from the web
server to the last MIX will appear on the link between first MIX and user within some
seconds14.

Onion routers, such as TOR, work similarly to Crowds in providing anonymity; in fact
their goal has been described as “to protect communication so that the recipients
and the sender cannot be linked by an adversary analyzing the network traffic”15.
Onion Routing is similar to Crowds in that an initial message forms a path of proxies
through which the initiator sends its future messages. The protocol gets its name
from its method of encrypting the initial packet and the address of the proxies at
each hop on the path with the public key of the previous step. This scheme results in
layers of encryption that are peeled off at each step in order to determine the next

12

(Reiter & Rubin, 1998)
14
(Rennhard & Plattner, 2002)
15
(Gomułkiewicz, Klonowski, & Kutyłowski, 2004)
13

address to send to on the path. This requires the initiator to predetermine the entire
path16.
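The layering described in the quotation above can be sketched as follows. This is a toy illustration only: a hash-based XOR keystream with one shared key per hop stands in for the public-key encryption used by real onion routers, and it offers no security. The names `keystream_xor`, `build_onion` and `peel`, the one-byte length prefix and the relay labels are all assumptions introduced for this example.

```python
import hashlib


def keystream_xor(key, data):
    """Toy cipher: XOR data with a SHA-256 counter keystream (NOT secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    # zip() stops at the end of data, so surplus keystream is ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(message, hops, destination):
    """Wrap message in one layer per hop; hops is [(address, key), ...] in path order."""
    onion = message
    next_addr = destination
    # Build from the innermost layer (last hop) outwards to the first hop.
    for addr, key in reversed(hops):
        layer = len(next_addr).to_bytes(1, "big") + next_addr + onion
        onion = keystream_xor(key, layer)
        next_addr = addr
    return onion


def peel(onion, key):
    """Remove one layer; return (next address, remaining onion)."""
    layer = keystream_xor(key, onion)
    n = layer[0]
    return layer[1:1 + n], layer[1 + n:]


if __name__ == "__main__":
    path = [(b"R1", b"key-one"), (b"R2", b"key-two"), (b"R3", b"key-three")]
    onion = build_onion(b"GET / HTTP/1.0", path, b"DEST")
    for _, key in path:
        next_addr, onion = peel(onion, key)
        print("next hop:", next_addr)
    print("payload:", onion)
```

Each relay learns only the address of the next hop, which is why the initiator has to choose the whole path before the first layer is built.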

One issue with onion routing is highlighted by Danezis and Diaz who state that
“onion routing aims at providing anonymous web browsing, and therefore would
become too slow if proper mixing was to be implemented”17. This means that
network traffic utilising onion routing does not mix various messages together unlike
MIX networks. Another weakness with onion routing is described by Wright et al.:
Onion Routing has generally been implemented with the onion routers being placed
in the network outside of the control of the individual users. While it can be argued
that this reduces the possibility of corruption of any particular onion router, it
requires that the users trust the operators of the onion router to maintain their
anonymity 18.

The level of trust required in the people or organisations which host these nodes is
possibly the biggest weakness in onion routing. Any corrupt node along the route
can compromise the entire anonymity of the network packet.

…strong anonymity against traffic analysis requires cooperation by and implicit trust
in many different parties. Any single entity, no matter how trustworthy it appears,
can be subverted, whether by technical means, corrupt personnel, or so-called
“subpoena attacks”19.

16

(Wright, Adler, Levine, & Shields, 2002)
18
(Wright et al., 2002)
19
(Androulaki, Raykova, Srivatsan, Stavrou, & Bellovin, 2008)
17

TOR (THE ONION ROUTER)
One of the most widely deployed onion-routing anonymising systems is TOR (The
Onion Router). TOR is known as a "second-generation onion router"20 and was
originally funded by the United States Naval Research Laboratory and the United States
Defense Advanced Research Projects Agency (DARPA)21. The history of the TOR
project will be briefly outlined before a detailed examination of the TOR design is
discussed. After this the design limitations and weaknesses of TOR will be scrutinised
providing an overview of the attacks that have been proposed to target the TOR
system.

HISTORY
The beginnings of the TOR project date back to the mid-1990s when the US Office of
Naval Research (ONR) began developing onion routing techniques 22. The
development of the second generation of onion routers did not begin until 2002
when funding was provided by DARPA and ONR23. In 2003 the TOR network was
publicly deployed with nodes spanning two continents, and the following year
“hidden services” went online24. The funding from DARPA and ONR ceased in 2004
and the Electronic Frontier Foundation stepped up to continue funding the TOR
project25. One of the main purposes of TOR has been stated as being to “defend
against a form of network surveillance that threatens personal freedom and
privacy”26. It is widely acknowledged that TOR is often used by journalists and people
who wish to remain anonymous while browsing online, as well as people who have

20

(The Tor Project Inc, 2009b)
22
(Naval Research Laboratory)
23
24
25
26
(The Tor Project Inc, 2009a)
21

restricted internet access, for example people living in China27. The US military also
utilise TOR to host hidden services for intelligence gathering purposes28.

DESIGN
The TOR system follows on from traditional onion routing services, that is, it utilises
proxy servers in order to hide an IP address so that the originating IP address
remains unknown. TOR can be seen as a mix between onion routing and crowd
systems. The TOR system “tunnels everything over TCP Port 80” 29 “over a network of
relays, and is particularly well tuned to work for web traffic, with the help of the
‘Privoxy’ content sanitizer”30. Privoxy is a web proxy service which modifies “web
page data and HTTP headers”31 and is commonly used for “removing ads and other
obnoxious Internet junk”32. In the case of TOR this web proxy assists in removing
web traffic that could reveal the true IP address of the user, such as JavaScript or
Flash content. Unlike traditional onion routing services, TOR does not send the traffic
through in its original packet format; instead TOR uses fixed-length “Cells” to
transfer data. Each Cell consists of a header and a payload (see Diagram 1). As stated
by Fraser et al., “TOR operates using fixed 512 byte cells (or packets) for stronger
anonymity and the Transport Layer Security (TLS) protocol for authentication and
privacy”33. Coupled with this Cell-based design, TOR utilises “Circuits” to choose the
path that the data will take as well as which protocol layer to anonymise: “they may
intercept IP packets directly, and relay them whole (stripping the source address)
along the circuit”34.
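The fixed-length Cell layout can be sketched in a few lines of Python. The 512-byte size is the figure quoted above. The header layout used here, a 2-byte circuit identifier followed by a 1-byte command, is an assumption based on the published TOR design of the period, and the helper names `pack_cell` and `unpack_cell` are hypothetical.

```python
import struct

CELL_LEN = 512                         # fixed cell size quoted in the text
HEADER = struct.Struct(">HB")          # assumed header: 2-byte CircID, 1-byte command
PAYLOAD_LEN = CELL_LEN - HEADER.size   # remaining bytes carry the payload


def pack_cell(circ_id, command, payload):
    """Build one fixed-length cell, zero-padding a short payload."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload does not fit in a single cell")
    return HEADER.pack(circ_id, command) + payload.ljust(PAYLOAD_LEN, b"\x00")


def unpack_cell(cell):
    """Split a cell into (circ_id, command, payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cells must be exactly %d bytes" % CELL_LEN)
    circ_id, command = HEADER.unpack_from(cell)
    return circ_id, command, cell[HEADER.size:]


if __name__ == "__main__":
    cell = pack_cell(circ_id=7, command=3, payload=b"example data")
    print(len(cell), unpack_cell(cell)[:2])
```

Because every cell has the same length regardless of its contents, an observer cannot use individual cell sizes to distinguish one kind of payload from another. On the wire the cells are additionally wrapped in TLS, so these fields are not visible in a capture taken outside the TOR client.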

27

(The Tor Project Inc, 2009c)
(The Tor Project Inc, 2009c)
29
30
31
(Privoxy Developers 2010)
32
33
(Fraser, Raines, & Baldwin, 2005)
34
(Dingledine, Mathewson, & Syverson, 2005)
28

Diagram 1 – TOR Cells (Packets)35

TOR uses a traditional network architecture: a list of volunteer servers is downloaded
from a directory service. Then, clients can create paths by choosing three random
nodes, over which their communication is relayed. Instead of an `onion' being sent to
distribute the cryptographic material, Tor uses an iterative mechanism. The client
connects to the first node, then it request this node to connect to the next one. The
bi-directional channel is used at each stage to perform an authenticated Diffie-Hellman key exchange. This guarantees forward secrecy and compulsion resistance:
only short term encryption keys are ever needed36.
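The iterative construction described in the quotation above can be illustrated as follows. This is a minimal sketch under several assumptions: the relay names are hypothetical, the Diffie-Hellman parameters are toy values far too small for real use, and the exchange shown is unauthenticated, whereas TOR authenticates it. Both sides of each exchange are computed in one function purely to show that they arrive at the same short-term key.

```python
import random

P = 2**127 - 1   # toy prime modulus (assumption; not a real-world parameter)
G = 3            # toy generator (assumption)


def choose_path(relays, rng, hops=3):
    """Pick distinct relays at random from the downloaded directory list."""
    return rng.sample(relays, hops)


def dh_exchange(rng):
    """Toy Diffie-Hellman: return the key derived by the client and by the relay."""
    a = rng.randrange(2, P - 1)            # client's ephemeral secret
    b = rng.randrange(2, P - 1)            # relay's ephemeral secret
    client_key = pow(pow(G, b, P), a, P)   # client combines relay's public value
    relay_key = pow(pow(G, a, P), b, P)    # relay combines client's public value
    return client_key, relay_key


def build_circuit(relays, rng):
    """Extend the circuit one hop at a time, negotiating a key with each hop."""
    circuit = []
    for relay in choose_path(relays, rng):
        client_key, relay_key = dh_exchange(rng)
        if client_key != relay_key:
            raise RuntimeError("key agreement failed")
        circuit.append((relay, client_key))
    return circuit


if __name__ == "__main__":
    directory = ["relayA", "relayB", "relayC", "relayD", "relayE"]
    for name, key in build_circuit(directory, random.Random(1)):
        print(name, hex(key)[:12] + "...")
```

Since the secrets are generated afresh for every circuit and discarded afterwards, a relay that is later compelled to hand over its long-term key cannot decrypt previously recorded traffic.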

Diagram 2 – The TOR Network37

35

(Dingledine, Mathewson, & Syverson, 2004)
37
(Bauer, McCoy, Grunwald, Kohno, & Sicker, 2007)
36

HIDDEN SERVICES
Another benefit of the TOR system over traditional onion routers is that it allows
users to host content on the internet that can only be accessed via the use of the
TOR system; these are known as “Hidden Services”. These hidden services are
denoted by the use of the virtual Top Level Domain (TLD) “.onion” which is the
address entered by the user to connect to this type of service. When connecting to a
hidden service a user creates a new circuit to the hidden service’s rendezvous point
which adds an extra layer of protection38 (see Diagram 3). As claimed by Dingledine
et al. “this type of anonymity protects against Distributed-Denial-of-Service attacks:
attackers are forced to attack the onion routing network because they do not know
the host’s IP address”39.

Diagram 3 - Normal use of hidden services and rendezvous servers 40

38

(Dingledine et al., 2004)
40
(Øverlier & Syverson, 2006)
39

LIMITATIONS and WEAKNESSES
The TOR system is not without its share of limitations; Danezis and Diaz raise the
point that “one notable difference between TOR and previous attempts at
anonymizing streams of traffic, is that it does not claim to offer security against even
passive global observers”41. In fact Lemos states that “the problem is known to both
the Tor Project, which advises everyone to use end-to-end encryption, and to
security researchers”42. This limitation amounts to the following point: “an
adversary, who can observe a stream at two different points, can trivially realize it is
the same traffic”43.
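The point made in the quotation above can be illustrated with a simple correlation of packet counts observed at two vantage points. This is a sketch only: the timestamps are invented, the one-second window is an arbitrary choice, and Pearson correlation is used as a convenient stand-in for whatever statistic a real adversary would apply.

```python
from math import sqrt


def bin_counts(timestamps, window, n_bins):
    """Count packets per time window, ignoring timestamps outside the range."""
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // window)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0   # a constant series carries no timing signal
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical packet arrival times (seconds) seen near the client.
    entry = [0.1, 0.2, 0.3, 2.1, 2.2, 5.5]
    # The same flow seen near the destination, delayed by 50 ms.
    exit_same = [t + 0.05 for t in entry]
    # An unrelated flow observed during the same period.
    exit_other = [1.1, 1.2, 3.3, 4.4, 4.5, 4.6]
    a = bin_counts(entry, 1.0, 6)
    print(pearson(a, bin_counts(exit_same, 1.0, 6)))
    print(pearson(a, bin_counts(exit_other, 1.0, 6)))
```

The matching flow scores far higher than the unrelated one, even though neither observation reveals any packet contents.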
This limitation leads to weaknesses that can be exploited to undermine the
anonymity of the TOR system. As outlined in Table 1 there are two types of attacks
against the TOR network: active attacks and passive attacks.
PASSIVE ATTACKS
– Packet and connection timing correlation
– Fingerprinting of traffic/usage patterns
– “Intersection Attacks” of multiple attributes of users

ACTIVE ATTACKS
– Lying about bandwidth to get more traffic
– Failing circuits to bias node selection
– Modifying application layer traffic at exit

Table 1 – Attacks Against TOR44

Passive attacks involve collecting network packets for later analysis and are
often hard to detect45. Fu et al. state that “passive traffic analysis attacks may, at first
sight, appear innocuous since those attacks do not actively alter the traffic (e.g.,

41

(Lemos, 2007)
43
44
(Perry, 2007)
45
(Fu, Graham, Bettati, & Zhao, 2003)
42

drop, insert, and modify packets during a communication session)”46. Active
attacks, by contrast, use probing methods to collect packet information, which may alter the
traffic on the network. The various types of attacks against TOR, and their position
in the TOR network, are detailed in Diagram 4. As stated by Sun et al.:

Even when multiple proxies are used, however, the first link between the user and
the first proxy is the most vulnerable to attack, since the attacker (whether the first
proxy itself, the user's ISP, or perhaps an eavesdropper (say, on a wireless link) can
immediately determine the user's network address47.

Diagram 4 - TOR Attack Points48

One common attack against the TOR system is known as a “Timing Correlation
Attack”. This type of attack uses timing analysis methods to determine the network
latency of the TOR system. As observed by Murdoch:

…the load on the Tor node affects the latency of all connection streams that are
routed through this node. A similar increase in latency is introduced at all layers. As
expected, the higher the load on the node, the higher the latency49.

46

(Fu et al., 2003)
(Sun et al., 2002)
48
(Perry, 2007)
47

An attacker relays traffic over all routers, and measures their latency: this latency is
affected by the other streams transported over the router. Long term correlations
between known signals injected by a malicious server and the measurements are
possible. This allows an adversary to trace a connection up to the first router used to
anonymize it50.

Diagram 5 - How Much Anonymity Does Network Latency Leak?51
(Measuring TOR circuit time without application-layer ACKs: the estimate for T_AX
is t_3 - t_1. We abuse notation and write T_XY for the one-way delay from X to Y 52)

Website fingerprinting is another passive attack against the TOR system. In this
type of attack an adversary “fingerprints” commonly visited websites to determine
their file sizes; these file sizes are then compared to the network packets to determine if
there are any matches. As stated by Hintz:

49

(Murdoch & Danezis, 2005)
51
(Hopper, Vasserman, & Chan-Tin, 2007)
52
(Hopper et al., 2007)
50

When a user visits a typical webpage, they download several files. A user downloads
the HTML file for the webpage, images included in the page, and the referenced
stylesheets. Each of these... files has a specific file size which is for the most part
constant.53
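The comparison step of such an attack can be sketched as a set-similarity lookup. This is an illustration only: the site names and file sizes are invented, and Jaccard similarity is an assumed scoring function chosen for brevity rather than the measure used by Hintz.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of file sizes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(observed_sizes, fingerprints):
    """Return (site, score) for the fingerprint closest to the observed sizes."""
    scores = {site: jaccard(observed_sizes, sizes)
              for site, sizes in fingerprints.items()}
    site = max(scores, key=scores.get)
    return site, scores[site]


if __name__ == "__main__":
    # Hypothetical fingerprints: file sizes (bytes) recorded per site in advance.
    fingerprints = {
        "site-a.example": [10240, 2048, 512],
        "site-b.example": [8000, 2048, 300],
    }
    # Hypothetical transfer sizes recovered from a capture.
    observed = [10240, 512, 2048, 999]
    print(best_match(observed, fingerprints))
```

A set-based score tolerates a few extra or missing files, which matters because the observed traffic rarely matches a stored fingerprint exactly.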

Attacks against TOR Hidden Services have also been devised. Øverlier and Syverson
discuss an attack which is used to locate the address of the Hidden Service. To carry
out this attack a compromised TOR node and a malicious client machine are used to
make repeated connections to the Hidden Service (see Diagram 6).

The main idea is to make many connections to the hidden server, so that it eventually
builds a circuit to the rendezvous point using the malicious Tor node as an entry
point. The malicious Tor node uses a simple timing analysis (packet counting) to
54

discover when this has happened .

Diagram 6 - Vulnerable location of Attacker in communication channel to the
Hidden Server 55

Although the design specifications of the TOR system negate traditional DDoS
attacks, Fraser et al. have proposed a mutated DDoS attack on TOR based on TOR’s
use of TLS.

53

(Hintz, 2003)
In (Hopper et al., 2007)
55
(Øverlier & Syverson, 2006)
54

DDoS attacks targeting an Onion Router’s CPU are possible due to TOR’s dependence
on TLS. Such attacks force an Onion Router to execute so many public key
decryptions that it can no longer route messages56.

Another weakness in the TOR system is due to the fact that any user may host a TOR
server (node) which means that any person wishing to host a compromised node is
able to do so without any major hurdles. TOR designers have developed a formula
for determining the probability of using a compromised node:

…the probability of choosing a compromised entrance node is m/N and the
probability of choosing a compromised exit node is the same, thus, the combinatorial
model is expressed as (m/N)^2, where m > 1 is the number of malicious nodes and N is
the network size…57
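The combinatorial model quoted above is straightforward to evaluate. The following sketch assumes uniform node selection, as the quoted formula does; the function name and the example figures are hypothetical.

```python
def compromise_probability(m, n):
    """Probability that both the entry and exit nodes are malicious.

    m -- number of malicious nodes
    n -- total number of nodes in the network
    """
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("require 0 <= m <= n and n > 0")
    # Entry and exit are each malicious with probability m/n.
    return (m / n) ** 2


if __name__ == "__main__":
    # Hypothetical network: 10 malicious nodes out of 1000.
    print(compromise_probability(10, 1000))
```

Because the probability grows with the square of the malicious fraction, doubling the number of compromised nodes quadruples the chance of controlling both ends of a circuit.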

Another passive attack can be achieved by hosting a compromised TOR node and
collecting the unencrypted packets exiting this node. In this type of attack high-level
information about the network traffic can be learnt. Egerstad conducted an attack
against TOR using this method and was able to intercept email messages
“discussing military and national-security issues between embassies and sensitive
corporate e-mail messages”58. This highlights another limitation of the TOR system,
or of any anonymity system: if users enter their real logins and email addresses while
using TOR, then their perceived anonymity is compromised. TOR is not designed to be used
with “real” identities, due to the lack of end-to-end encryption; instead it is recommended
that people utilise anonymous email clients and logins59.

56

(Fraser et al., 2005)
In (Bauer et al., 2007)
58
In (Lemos, 2007)
59
(The Tor Project Inc, 2009a)
57

METHODOLOGY
A gap exists in the research regarding how the weaknesses in TOR can be utilised
from a law enforcement perspective. In order to establish what information can be
gathered from the analysis of TOR packets, a packet comparison is necessary. This
comparison will examine the TOR packets as well as identical non-TOR (or standard)
internet packets. There are three stages to the methodology: Setting up the TOR
system; Packet selection, and Analysis. The setup of the TOR system will be discussed
to detail how the packets will be intercepted. This will be followed by an explanation
of the types of internet traffic examined. Finally the analysis stage will be outlined
discussing the various tools utilised to examine the TOR packets.

TOR CLIENT
To ensure that the network traffic was generated from identical machines, virtual
machines (VMs) were utilised: one with TOR installed and the other without TOR.
Originally it had been planned to run a TOR exit node on a local server in order to
capture the unencrypted network traffic as it left the exit node; however, it was
determined that propagating realistic network traffic locally would produce
undesired results. Instead a standard TOR client was installed on the TOR-VM.
The full specifications for the two VMs were as follows:

Component                     TOR VM                                    Non-TOR VM
CPU                           Intel Dual core E6550 @ 2.33GHz           Intel Dual core E6550 @ 2.33GHz
RAM                           1 GB RAM                                  1 GB RAM
Operating System              Microsoft Windows XP SP3                  Microsoft Windows XP SP3
Web Browser                   Mozilla Firefox version 3.5.6             Mozilla Firefox version 3.5.6
TOR/Vidalia                   TOR version 0.2.1.21                      N/A
                              Vidalia version 0.2.6
WireShark (traffic capture)   WireShark version 1.2.5 (SVN Rev 31296)   WireShark version 1.2.5 (SVN Rev 31296)
Eeye IRIS (traffic analysis)  Eeye IRIS version 5                       Eeye IRIS version 5

Table 2 – VM Comparison
Running a fully functioning TOR client allows the packets to be
generated on-the-fly over the internet rather than propagating traffic to simulate
the internet. Rather than capture the TOR packets on the local machine, where they
would be unencrypted, the TOR traffic was captured by observing the traffic
entering the LAN (as depicted in Diagram 7). The TOR installation provides a
Mozilla Firefox plug-in that can be switched on and off. It is for this reason that
Mozilla Firefox was utilised for the web browsing aspect of this research. Windows
XP was chosen as the operating system for the VMs due to the full
compatibility of the TOR system with Windows XP.

Diagram 7 – TOR and Non-TOR Network Setup

PACKET COMPARISON
The types of internet traffic chosen for this analysis were based on the
most-visited web parent companies of December 2009, as compiled by Nielsen60 (see Table 3). By
examining these statistics the following information about web usage can be
gathered: the internet is used as a source of information (for example Google or
News Corp); the internet is used as a communication medium (for example Facebook
or Yahoo); the internet is used as a source for shopping (for example eBay or
Amazon).
Using this knowledge the following web browsing usage was established:
1. Yahoo was selected as the user’s homepage. The user would log on to their
Yahoo webmail account.
2. The user would read their emails as well as write an email.
3. Following-on from reading their email, the user would click a link inside a
www.news.com.au email and read a few news articles, including one
regarding the 2010 Winter Olympics.
4. The user would then visit www.google.com and search for “winter Olympics”.
5. This search would result in a www.wikipedia.com link which the user would
click on.
6. From the original “winter Olympics” Wikipedia entry the user would click on
the 2010 winter Olympics link.
7. The user would then enter www.ebay.com into the web browser and search
for “winter Olympics tickets”.
8. Following this search the user would then browse a few of the resulting links.
9. The user would then enter www.amazon.com into the web browser and
search for “winter Olympics tickets”.
10. The user would then search for “ice hockey” under the “movies” category and
click on the first link.
11. The user would then enter www.facebook.com into the web browser and
login to their account.
60

(Nielsen, 2010)

12. On the Facebook site the user would search for “winter Olympics” under the
“groups” category.
13. The user would then join a “winter Olympics” group and add a message to the
group’s Facebook “wall”.
14. Then the user would enter www.bing.com into the web browser and search
for “what is my ip”.
15. Following the above search the user would click on the link
www.whatismyip.com and recover their IP address (it is to be noted that
when TOR is installed the original homepage is always an IP address providing
link).

RANK  PARENT                  UNIQUE AUDIENCE (000)  ACTIVE REACH %  TIME PER PERSON (HH:MM:SS)
1     GOOGLE                  353,851                83.91           2:38:50
2     MICROSOFT               315,490                74.81           3:01:38
3     YAHOO!                  228,711                54.23           2:12:36
4     FACEBOOK                206,878                49.06           5:57:17
5     EBAY                    163,844                38.85           1:41:31
6     WIKIMEDIA FOUNDATION    141,239                33.49           0:16:01
7     AMAZON                  137,364                32.57           0:32:11
8     AOL LLC                 129,360                30.67           2:21:03
9     NEWS CORP. ONLINE       120,316                28.53           0:59:17
10    INTERACTIVECORP         115,131                27.30           0:11:36

Table 3 – Top 10 Global Web Parent Companies, Home & Work, December 2009 61

61

(Nielsen, 2010)

PACKET CAPTURE & ANALYSIS
The analysis phase of the methodology has two stages: the packet capture stage;
and the packet analysis stage. To capture the network packets WireShark was
selected as it is a robust network capture tool based on the “pcap” library. The first
stage of the analysis involves capturing the identical network traffic from the two
VMs via WireShark. WireShark can then be utilised to conduct the first form of traffic
analysis to examine the low-level protocol information of the network traffic. This
initial analysis will focus on IP header information as well as connection types and
port information.
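The kind of low-level information targeted by this initial analysis can be extracted from a raw packet with a few lines of Python. This is a sketch only and is not the procedure used in this research, which relied on WireShark. It assumes the capture has already been reduced to a raw IPv4 packet with no link-layer header, and the function name `parse_ipv4_tcp` is hypothetical.

```python
import socket
import struct


def parse_ipv4_tcp(packet):
    """Return addresses and ports from a raw IPv4 packet carrying TCP."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (version_ihl & 0x0F) * 4      # IP header length in bytes
    if packet[9] != 6:                  # protocol number 6 is TCP
        raise ValueError("not a TCP segment")
    if len(packet) < ihl + 4:
        raise ValueError("truncated TCP header")
    # The TCP header starts right after the IP header; ports come first.
    src_port, dst_port = struct.unpack_from(">HH", packet, ihl)
    return {
        "src": socket.inet_ntoa(packet[12:16]),
        "dst": socket.inet_ntoa(packet[16:20]),
        "src_port": src_port,
        "dst_port": dst_port,
    }


if __name__ == "__main__":
    # Hand-built sample packet with invented addresses and ports.
    ip_header = (bytes([0x45, 0]) + struct.pack(">H", 40) + bytes(4)
                 + bytes([64, 6]) + bytes(2)
                 + socket.inet_aton("192.168.1.10")
                 + socket.inet_aton("203.0.113.5"))
    tcp_header = struct.pack(">HH", 49152, 443) + bytes(16)
    print(parse_ipv4_tcp(ip_header + tcp_header))
```

Comparing these fields between the TOR and non-TOR captures is the most direct way to spot differences in connection endpoints and port usage.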
For the next stage of analysis Eeye’s IRIS will be utilised to rebuild the HTML traffic.
IRIS is a commercial network traffic monitoring and analysis tool that works on all
IPv4 internet traffic. It is able to rebuild HTML traffic as well as provide statistical
information about the network traffic.
By utilising WireShark and IRIS it will be possible to drill down into the network
packets and apply social engineering strategies to locate personally identifiable
information. The social engineering analysis will attempt to discover personally
identifiable information from various sources, including email and social networking
sites. As stated by Cohen, “the forensic examiner is more interested in high level
information obtained from the traffic rather than low level protocol information” [62].
As the TOR traffic will be captured on the LAN, the most important question to
answer will be whether there is any way to tell from packet information that network
packets are utilising TOR. That is, can traffic analysis be used to “fingerprint” the
network packets in order to identify the usage of TOR?

[62] (Cohen, 2008)

RESULTS
As the network traffic is already known, the purpose of the analysis was not to
identify the websites visited by the user. Instead, the analysis sought to determine
what, if any, TOR-specific traffic fragments can be identified in order to violate the
anonymity properties of the TOR system.

LOW-LEVEL ANALYSIS
By observing the network packets through WireShark the low-level packet properties
were examined and compared. It is evident through the analysis of these packets
that TOR packets do not contain any property that can be utilised to “fingerprint” the
header of the packet; that is, there is no recurring hex header in the network traffic
that can be associated with TOR traffic. This is because the first part of the TOR “cell”
is the 2-byte CircID (Circuit ID), which is unlikely to repeat because numerous
circuits can be multiplexed over a single TLS connection.
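The following is a minimal sketch of why a leading CircID rules out a constant hex header. It assumes the fixed 512-byte cell layout of the TOR link protocol of that period (2-byte CircID, 1-byte command, 509-byte payload); the cell size and command field come from the TOR protocol specification rather than from this paper, and the sample CircID values are hypothetical. Cells travel inside TLS, so a parser like this only applies to already-decrypted data.

```python
import struct

CELL_LEN = 512                 # assumed fixed cell size (early TOR link protocol)
HEADER = struct.Struct(">HB")  # CircID (2 bytes, big-endian) + command (1 byte)

def parse_cell(cell: bytes):
    """Split one decrypted fixed-size cell into (circ_id, command, payload)."""
    if len(cell) != CELL_LEN:
        raise ValueError("expected a 512-byte cell")
    circ_id, command = HEADER.unpack_from(cell)
    return circ_id, command, cell[HEADER.size:]

def leading_bytes(cells):
    """Collect the distinct first two bytes seen across a run of cells."""
    return {cell[:2] for cell in cells}

# Hypothetical cells from three circuits multiplexed on one connection.
# Command value 3 is RELAY; the payload is zero-filled for illustration.
cells = [struct.pack(">HB", cid, 3) + bytes(509) for cid in (0x1A2B, 0x0007, 0xBEEF)]

# Three circuits give three different leading byte pairs, so no single
# recurring header value exists to match on.
print(len(leading_bytes(cells)))
```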
This is not to say that TOR traffic cannot be recognised on-the-fly, just that a hex
header for packet fingerprinting is not possible. One way that TOR traffic can be
distinguished from standard internet traffic is through the default port number
that TOR utilises, port number 9001. By applying a TCP port filter in WireShark the
TOR traffic can be easily monitored (see Screenshot 1). Officially port 9001 is
reserved for traffic related to the “Microsoft Sharepoint Authoring Environment”;
however, TOR is set up by default to take advantage of this port number as both a
source port and a destination port. This is not to say that TOR cannot be re-configured
to use other TCP port numbers, only that a default installation of TOR will utilise port
9001.
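The port filter described above can be sketched outside WireShark as well. In WireShark itself the display filter `tcp.port == 9001` matches the port on either side of a connection. The Python below applies the same test to packet summaries; the `Packet` record and the sample addresses are hypothetical, and a non-default TOR configuration would not be caught by this check.

```python
from typing import Iterable, List, NamedTuple

TOR_DEFAULT_PORT = 9001  # default port named in the text; TOR can be re-configured

class Packet(NamedTuple):
    """Hypothetical per-packet summary, e.g. exported from a capture."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def on_port(packets: Iterable[Packet], port: int = TOR_DEFAULT_PORT) -> List[Packet]:
    """Keep packets whose TCP source or destination port equals `port`."""
    return [p for p in packets if port in (p.src_port, p.dst_port)]

# Illustrative capture: two packets of one port-9001 conversation and one web request.
capture = [
    Packet("192.168.1.5", 49152, "198.51.100.7", 9001),
    Packet("192.168.1.9", 49153, "198.51.100.8", 80),
    Packet("198.51.100.7", 9001, "192.168.1.5", 49152),
]
tor_candidates = on_port(capture)
print(len(tor_candidates))
```

A port match is only a candidate indicator, since any other service may also be bound to port 9001.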

Screenshot 1 – WireShark TOR Port Filter

Screenshot 2 – TOR Traffic on Port 9001
By filtering for port 9001 on the LAN the TOR traffic could be observed. Once
the TOR traffic had been identified, the IP source and destination addresses could
be learnt through analysis in WireShark. As seen in Screenshot 3, the destination
address for this packet is 192.168.1.5. Knowing this IP address allows for future
analysis and capture of the unencrypted packets from the local machine hosting the
TOR client. In this way the identification of TOR packets can be used to determine
which user has TOR installed on their machine.
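As a rough illustration of this step, the sketch below tallies which LAN addresses take part in port-9001 conversations. The LAN range 192.168.1.0/24 is an assumption inferred from the address shown above, the flow tuples are hypothetical, and the function name is illustrative only.

```python
import ipaddress
from collections import Counter

LAN = ipaddress.ip_network("192.168.1.0/24")  # assumed LAN range, not stated in the paper

def lan_hosts_on_port(flows, port=9001, lan=LAN):
    """Count LAN addresses seen on either end of a flow that uses `port`."""
    hits = Counter()
    for src_ip, src_port, dst_ip, dst_port in flows:
        if port not in (src_port, dst_port):
            continue  # not a candidate TOR flow
        for ip in (src_ip, dst_ip):
            if ipaddress.ip_address(ip) in lan:
                hits[ip] += 1
    return hits

# Hypothetical flows as (src_ip, src_port, dst_ip, dst_port).
flows = [
    ("192.168.1.5", 49152, "198.51.100.7", 9001),
    ("198.51.100.7", 9001, "192.168.1.5", 49152),
    ("192.168.1.9", 49153, "198.51.100.8", 80),
]
print(lan_hosts_on_port(flows).most_common(1))
```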

Screenshot 3 - IP Address Identification

HIGH-LEVEL ANALYSIS
The TOR packets, when encrypted, do not allow the HTML data to be rebuilt.
Although this may obstruct a high-level traffic analysis of these TOR packets, there
are workarounds which allow the SSL-encrypted TOR packets to be rebuilt. There is a
WireShark plug-in called “TOR Dissector” which, when run on a local machine running
a TOR client, captures the user’s TOR SSL keys and decrypts the TOR packets on the
fly (see Screenshot 4). This raises the question of whether someone would be able to
access a user’s local machine and run WireShark in conjunction with TOR Dissector
without the user’s knowledge. There is, however, an alternative method to decrypt
the TOR traffic without the user suspecting anything: conducting a
Man-In-The-Middle (MITM) attack. A MITM attack positioned between the user’s
computer and the TOR server will allow an attacker to decrypt the user’s TOR packets
in real-time, and either rebuild the HTML or filter for plain text.

Screenshot 4 – TOR Dissector

Private LAN/Corporations
By using a network capture tool such as WireShark and filtering the internet
connection of the LAN it is easy to recognise when TOR is used. By observing the TOR
packets on the LAN a corporation would be able to pinpoint the local host computer
utilising the TOR network. The use of TOR in many organisations is in itself likely to
breach their internet usage policies, and once the local host is determined any future
TOR packets could be captured and rebuilt using the WireShark plug-in “TOR
Dissector”, or by conducting a TOR MITM attack.

Government/ISP
If a corporation, or a Government, does not have access to the local machine running
TOR then the TOR MITM attack can still be performed to decrypt the TOR traffic.
Similarly it is possible to establish a compromised TOR exit node to capture
unencrypted TOR traffic. It must be stated that in order to conduct a targeted TOR
MITM attack the adversary must have prior knowledge that the user has utilised TOR
and be aware of their IP address.
In 2007 Egerstad hosted a TOR exit node in an effort to capture unencrypted TOR
packets and investigate the types of internet traffic people were accessing through
the TOR service [63]. Among the captured packets were highly confidential emails
regarding foreign military issues sent by embassy staff members [64]. This highlights
one of the biggest misconceptions about TOR, or any anonymity service: many users
assume that these services will provide complete anonymity even when sending
emails from their own accounts. Similarly in 2009 Vea conducted research into the
anonymity-breaching properties of hosting a TOR service and stated:
...no matter how many anonymizing tools a user employs, or how well they are put
into play, that same user lets the cat out of the bag when their web posts, emails or
chats leave traces back to themselves... [65]

This research broke down the TOR traffic into categories of usage, as depicted in the
following graph:

Graph 1 – TOR Packet Distribution [66]

[63] In (Lemos, 2007)
[64] In (Lemos, 2007)
[65] (Vea, 2009)
[66] (Vea, 2009)

The fact that anyone may host a TOR server is another concern and major security
risk, which may be mitigated by educating TOR users about which aspects of their
internet usage are really anonymous. This leads to an important question: how many
compromised TOR nodes are there? It only takes one compromised node along the
TOR relay path to compromise the traffic of the entire path. This issue has propelled
some into investigating whether certain TOR nodes are in fact compromised and
acting maliciously [67].
Since the TOR exit-nodes can decide what traffic (or rather, what ports) it wants to
relay it’s easy to set up a rogue exit-node that relays only cleartext traffic (and of
course sniffs it on the fly)... [68]
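The port-selective behaviour described in the quotation can be screened for with a simple check, sketched below. The function takes the set of ports an exit node accepts and reports whether every one of them belongs to a protocol that is normally sent in cleartext. The port numbers are the customary ones for each protocol and are an assumption here, as is the idea of representing an exit policy as a plain set of ports.

```python
# Customary cleartext ports (assumed; not listed in the paper).
CLEARTEXT_PORTS = {
    143: "IMAP",
    5190: "AOL Instant Messenger",
    1863: "MSN Messenger",
    5050: "Yahoo Messenger",
}

def cleartext_only(accepted_ports) -> bool:
    """True when a node accepts at least one port and all of them are cleartext."""
    accepted = set(accepted_ports)
    return bool(accepted) and accepted.issubset(CLEARTEXT_PORTS)

# Hypothetical exit policies, reduced to their accepted ports.
print(cleartext_only({143, 5190, 1863, 5050}))  # only cleartext services
print(cleartext_only({80, 443, 143}))           # mixed policy
```

A positive result does not prove that a node is sniffing traffic; it only marks the node as worth a closer look.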

This research resulted in the identification of numerous TOR exit nodes restricting
traffic based upon port numbers. For example, one node was identified as
accepting only unencrypted IMAP, AOL Instant Messenger, MSN Messenger and
Yahoo Messenger traffic and rejecting all other forms of internet traffic [69]. It is
possible that the person hosting this server is doing so to help people communicate
over TOR, yet it is equally possible that the node is compromised and capturing
unencrypted packets. Even if this node is not compromised it could become
compromised as easily as turning on WireShark. As well as being selective with
internet traffic, TOR nodes can act maliciously by conducting MITM attacks. By
running an SSL-enabled server, the same researcher connected to his own website
through TOR to check whether any exit nodes were modifying the website’s SSL
certificate [70]. One TOR exit node was found to have modified the certificate,
indicating that a MITM attack was being carried out through this particular exit
node [71]. It was unclear what the MITM attack was being used for, but it is
important to be aware of the potential dangers when using the TOR service.
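The certificate check described above can be approximated by comparing fingerprints: the certificate as served directly by the website against the certificate received through a given exit node. The sketch below is not the researcher's actual tooling. It assumes both certificates are already available as DER-encoded bytes, and the byte strings used are placeholders rather than real certificates.

```python
import hashlib

def fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as a hex string."""
    return hashlib.sha256(der_bytes).hexdigest()

def certificate_modified(expected_der: bytes, observed_der: bytes) -> bool:
    """True when the certificate seen through an exit node differs from the known one."""
    return fingerprint(expected_der) != fingerprint(observed_der)

# Placeholder byte strings standing in for real DER certificates.
known_cert = b"certificate served directly by the website"
via_honest_exit = b"certificate served directly by the website"
via_rogue_exit = b"certificate substituted by the exit node"

print(certificate_modified(known_cert, via_honest_exit))
print(certificate_modified(known_cert, via_rogue_exit))
```

In practice the DER bytes could be obtained with Python's `ssl.get_server_certificate` followed by `ssl.PEM_cert_to_DER_cert`, once over a direct connection and once over a connection routed through TOR.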

[67] (Team Furry, 2007a)
[68] (Team Furry, 2007a)
[69] (Team Furry, 2007a)
[70] (Team Furry, 2007b)
[71] (Team Furry, 2007b)

Man-In-The-Middle attacks against TOR are not new. In fact there is a tool designed
to facilitate these types of attacks against SSL traffic called “SSLStrip” [72]. This tool
has been designed to work on a proxy server, such as TOR, between the user and the
internet. Whenever a user attempts to access an SSL website, “a program on the
proxy server sends the request to the website, handles any redirect to an
SSL-encrypted page and returns an exact duplicate to the user, without the
encryption” [73]. To the end user the website looks legitimate; even the ubiquitous
SSL “padlock” symbol can be spoofed with this tool [74]. When the tool was run on a
TOR node, its creator was able to capture and decrypt packets relating to
account logins, including 114 Yahoo credentials and 50 Gmail credentials, as well as
packets containing credit-card numbers [75]. This research indicates that TOR users
are sending traffic relating to login details and credit card numbers and assuming
that the TOR system will ensure that these packets are secure and anonymous.
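From the user's side, the effect of this kind of stripping can be spotted by checking whether links that should point to encrypted pages arrive as plain HTTP. The sketch below is a detection aid under stated assumptions, not a reimplementation of SSLStrip: the list of hosts known to use HTTPS is supplied by the caller, the host names are hypothetical, and the regular expression only handles simple quoted `href` and `action` attributes.

```python
import re
from urllib.parse import urlsplit

# Matches simple quoted href="..." and action="..." attribute values.
LINK = re.compile(r'''(?:href|action)\s*=\s*["']([^"']+)["']''', re.IGNORECASE)

def downgraded_links(html: str, https_hosts) -> list:
    """Return http:// links whose host is known to serve its pages over HTTPS."""
    flagged = []
    for url in LINK.findall(html):
        parts = urlsplit(url)
        if parts.scheme == "http" and parts.hostname in https_hosts:
            flagged.append(url)
    return flagged

# Hypothetical page as received through a proxy.
page = (
    '<form action="http://mail.example.com/login">'
    '<a href="https://shop.example.com/">shop</a>'
)
print(downgraded_links(page, {"mail.example.com", "shop.example.com"}))
```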
By analysing TOR traffic captured from the TOR exit nodes it is evident that users are
misguided in their understanding of the abilities of TOR, specifically users who utilise
TOR to “anonymously” log into their own email accounts or other websites using
their own personally identifiable information (for example social networking or
blogging sites). In fact the TOR developers clearly state that, for security, users
should incorporate end-to-end encryption [76].
The TOR developers also state that TOR does not guarantee protection against global
adversaries, for example those operating corrupt or compromised nodes [77].
Currently TOR users are able to manually select which exit nodes they wish to utilise.
Although good in theory, this raises another issue: how a user is to choose
trustworthy nodes. One possibility is a system similar to a Certificate Authority (CA)
system, which would allow users to verify the integrity of the exit nodes they are
using. This would, however, violate the anonymity of the people or organisations
hosting these exit nodes. This violation would most likely lead to a reduction in the
number of privately operated exit nodes, which in turn would result in fewer onion
layers and slower connections.

[72] (Marlinspike, 2009)
[73] (Security Focus, 2009)
[74] (Marlinspike, 2009)
[75] (Security Focus, 2009)
[76] (Lemos, 2007)
[77]

CONCLUSION/RECOMMENDATIONS
Traffic analysis did not recover the originating IP addresses from TOR packets,
except where the TOR packets were captured over a local area network. Social
engineering, on the other hand, has had success in identifying users of the TOR
network through insecure and non-anonymous logins. Although this method does
not always result in recovering the originating IP address, recovering the real
identity of the user is much more important from a law enforcement perspective.
This paper has shown that through traffic analysis techniques TOR traffic can be
distinguished from regular internet traffic. Specifically, the port numbers that TOR
utilises, along with the frequent usage of SSL traffic, assist in locating packets
belonging to the TOR network. Having this knowledge greatly assists network
observers, either law enforcement or corporations, in recognising TOR and then
subsequently implementing suitable measures to further conduct traffic analysis on
these types of packets.
Although at first glance it may appear that the TOR system provides adequate
protection of the users’ anonymity, and to a certain degree their security, the
weaknesses exhibited by the TOR system can be easily exploited. From a law
enforcement perspective these weaknesses can be exploited in order to capture
these packets and conduct a forensic analysis of their content.
A few practices are recommended in order to minimise the loss of anonymity while
using TOR. Firstly, it is fundamentally flawed to use TOR with a user’s real email
address or account logins, as this undermines any anonymity provided by the TOR
service. Instead it is highly recommended that only anonymous, or temporary, email
addresses and logins are used within the TOR network. Secondly, TOR should not be
utilised to make any purchases over the internet. Using a credit card number or a
user’s physical shipping address will also undermine any anonymity provided by
TOR. To preserve their anonymity, users should not mention their physical location
or any other personally identifiable information whilst utilising the TOR service.
If, for example, a user or computer had been identified as utilising TOR and a law
enforcement agency wanted to know what TOR was being used for, then the agency
could instigate a MITM attack using a tool such as “SSLStrip”. SSLStrip could be run
either on a compromised TOR node or in between the user’s internet connection and
the TOR system itself. Using either of these attack points a law enforcement agency
would be able to “tap” the user’s network packets and view the content in clear text
(see Diagram 8).

Diagram 8 – TOR MITM Attack

By utilising open-source tools, such as WireShark and SSLStrip, a law enforcement
agency would be able to effectively capture and analyse a user’s TOR packets. In
order for this type of capture and analysis to be successful the law enforcement
agency would need to have prior knowledge of the person who is utilising the TOR
service. Without knowledge of the person’s IP address the MITM attack would not
be feasible due to the requirement of positioning the attack in between the user’s
computer and the TOR system. If a law enforcement agency were to run a MITM
attack on a compromised TOR node they would not be able to determine which TOR
users were connected to that node. In a law enforcement context, therefore,
knowing the target is a necessity.
This paper has demonstrated that the TOR system is not immune to traffic analysis
techniques. Indeed traffic analysis plays an important part in locating TOR packets
and subsequently implementing attacks that compromise the anonymity of the TOR
network. The attacks presented in this paper allow law enforcement agencies to
implement systems that will decrypt TOR packets to gain high-level access to the
original HTML of the packets. When used in a network forensic context these attacks
change TOR from an anonymity system into nothing more than a slight
inconvenience.

REFERENCE LIST
Androulaki, E., Raykova, M., Srivatsan, S., Stavrou, A., & Bellovin, S. M. (2008). Par: Payment
for anonymous routing. Lecture notes in computer science, 5134, 219-236.
Bauer, K., McCoy, D., Grunwald, D., Kohno, T., & Sicker, D. (2007). Low-resource routing
attacks against anonymous systems. Paper presented at the Proceedings of the 2007
ACM workshop on Privacy in electronic society.
Chaum, D. L. (1981). Untraceable electronic mail, return addresses, and digital pseudonyms.
Communications of the ACM.
Cohen, M. I. (2008). PyFlag–An advanced network forensic framework. Digital Investigation,
5, 112-120.
Danezis, G., & Diaz, C. (2008). A survey of anonymous communication channels. Journal of
Privacy Technology.
Dingledine, R., Mathewson, N., & Syverson, P. (2004). Tor: The second-generation onion
router. Paper presented at the Proceedings of the 13th USENIX Security Symposium.
Dingledine, R., Mathewson, N., & Syverson, P. (2005). Challenges in deploying low-latency
anonymity. NRL CHACS Report, 5540-5265.
Fraser, N. A., Raines, R. A., & Baldwin, R. O. (2005). Tor: An Anonymous Routing Network for
Covert On-line Operations. IOSphere: the Professional Journal of Joint Information
Operations, 44–47.
Fu, X., Graham, B., Bettati, R., & Zhao, W. (2003). Active traffic analysis attacks and
countermeasures.
Gomułkiewicz, M., Klonowski, M., & Kutylowski, M. (2004). Onions Based on Universal
Re-Encryption - Anonymous Communication Immune Against Repetitive Attack.
Hafner, K., & Lyon, M. (2000). Where wizards stay up late: The origins of the Internet:
Touchstone Books.
Hintz, A. (2003). Fingerprinting websites using traffic analysis. Lecture notes in computer
science, 171-178.
Hopper, N., Vasserman, E. Y., & Chan-Tin, E. (2007). How much anonymity does network
latency leak?
Lemos, R. (2007). Embassy leaks highlight pitfalls of Tor [Electronic Version]. SecurityFocus.
Retrieved 09/10/2009, from http://www.securityfocus.com/news/11486?ref=rss
Marlinspike, M. (2009). SSLSTRIP [Electronic Version]. Retrieved 05/02/2010, from
http://www.thoughtcrime.org/software/sslstrip/
Murdoch, S. J., & Danezis, G. (2005). Low-cost traffic analysis of tor. Paper presented at the
IEEE Symposium on Security and Privacy.
Naval Research Laboratory. Onion Routing - Brief Selected History. Retrieved 09/10/2009,
from http://www.onion-router.net/History.html
Nielsen. (2010). Top 10 Global Web Parent Companies. Retrieved 22/01/2010, from
http://en-us.nielsen.com/rankings/insights/rankings/internet
Øverlier, L., & Syverson, P. (2006). Locating hidden servers. Paper presented at the IEEE
Symposium on Security and Privacy.
Perry, M. (2007). Securing the Tor Network: Defcon.
Privoxy Developers (2010). Privoxy 3.0.16 User Manual. Retrieved 04/01/2010, from
http://www.privoxy.org/user-manual/index.html
Reiter, M. K., & Rubin, A. D. (1998). Crowds: Anonymity for web transactions. ACM
Transactions on Information and System Security (TISSEC), 1(1), 66-92.
Rennhard, M., & Plattner, B. (2002). Introducing morphmix: Peer-to-peer based anonymous
internet usage with collusion detection.
Security Focus. (2009). Man-in-the-middle attack sidesteps SSL [Electronic Version].
Retrieved 05/02/2010, from http://www.securityfocus.com/brief/910
Sun, Q., Simon, D. R., Wang, Y. M., Russell, W., Padmanabhan, V. N., & Qiu, L. (2002).
Statistical Identification of Encrypted Web Browsing Traffic. Paper presented at the
Proceedings of IEEE Symposium on Security and Privacy.
Team Furry. (2007a). On TOR. MW-Blog Retrieved 05/02/2010, from
http://www.teamfurry.com/wordpress/2007/11/19/on-tor/#more-177
Team Furry. (2007b). TOR Exit Nodes Doing MITM Attacks. MW-Blog Retrieved 05/02/2010,
from http://www.teamfurry.com/wordpress/2007/11/20/tor-exit-node-doing-mitmattacks
The Tor Project Inc. (2009a). Tor: anonymity online. Retrieved 09/10/2009, from
https://www.torproject.org/index.html.en
The Tor Project Inc. (2009b). Tor: Sponsors. Retrieved 09/10/2009, from
https://www.torproject.org/sponsors
The Tor Project Inc. (2009c). Tor: Users. Retrieved 09/10/2009, from
https://www.torproject.org/torusers.html.en
Vea, M. (2009). What Traffic is on a TOR Relay? Retrieved 04/01/2010, from
http://www.omninerd.com/articles/What_Traffic_is_on_a_TOR_Relay
Wright, M., Adler, M., Levine, B. N., & Shields, C. (2002). An analysis of the degradation of
anonymous protocols.
