SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
A Survey of Peer-to-Peer Content Distribution Technologies
STEPHANOS ANDROUTSELLIS-THEOTOKIS AND DIOMIDIS SPINELLIS
Athens University of Economics and Business




             Distributed computer architectures labeled “peer-to-peer” are designed for the sharing
             of computer resources (content, storage, CPU cycles) by direct exchange, rather than
             requiring the intermediation or support of a centralized server or authority.
             Peer-to-peer architectures are characterized by their ability to adapt to failures and
             accommodate transient populations of nodes while maintaining acceptable connectivity
             and performance.
                Content distribution is an important peer-to-peer application on the Internet that
             has received considerable research attention. Content distribution applications
             typically allow personal computers to function in a coordinated manner as a distributed
             storage medium by contributing, searching, and obtaining digital content.
                In this survey, we propose a framework for analyzing peer-to-peer content
             distribution technologies. Our approach focuses on nonfunctional characteristics such
             as security, scalability, performance, fairness, and resource management potential, and
             examines the way in which these characteristics are reflected in—and affected by—the
             architectural design decisions adopted by current peer-to-peer systems.
                We study current peer-to-peer systems and infrastructure technologies in terms of
             their distributed object location and routing mechanisms, their approach to content
             replication, caching and migration, their support for encryption, access control,
             authentication and identity, anonymity, deniability, accountability and reputation, and
             their use of resource trading and management schemes.

             Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]:
             Network Architecture and Design—Network topology; C.2.2 [Computer-
             Communication Networks]: Network Protocols—Routing protocols; C.2.4
             [Computer-Communication Networks]: Distributed Systems—Distributed
             databases; H.2.4 [Database Management]: Systems—Distributed databases; H.3.4
             [Information Storage and Retrieval]: Systems and Software—Distributed systems
             General Terms: Algorithms, Design, Performance, Reliability, Security
             Additional Key Words and Phrases: Content distribution, DOLR, DHT, grid computing,
             p2p, peer-to-peer


1. INTRODUCTION                                           systems such as [Gnutella 2003], Seti@
                                                          Home [SetiAtHome 2003], OceanStore
A new wave of network architectures                       [Kubiatowicz et al. 2000], and many
labeled peer-to-peer is the basis of                      others. Such architectures are gener-
operation of distributed computing                        ally characterized by the direct sharing

Authors’ address: Athens University of Economics and Business, 76 Patission St., GR-104 34, Athens, Greece;
email: stheotok@aueb.gr.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or direct commercial advantage and
that copies show this notice on the first page or initial screen of a display along with the full citation.
Copyrights for components of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any
component of this work in other works requires prior specific permission and/or a fee. Permissions may be
requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212)
869-0481, or permissions@acm.org.
 c 2004 ACM 0360-0300/04/1200-0335 $5.00


                                                ACM Computing Surveys, Vol. 36, No. 4, December 2004, pp. 335–371.
336                                            S. Androutsellis-Theotokis and D. Spinellis

of computer resources (CPU cycles, stor-          We then present the main attributes of
age, content) rather than requiring the in-     peer-to-peer content distribution systems,
termediation of a centralized server.           and the aspects of their architectural de-
   The motivation behind basing appli-          sign which affect these attributes are an-
cations on peer-to-peer architectures           alyzed with reference to specific existing
derives to a large extent from their ability    peer-to-peer content distribution systems
to function, scale, and self-organize in        and technologies.
the presence of a highly transient popu-          Throughout this report the terms
lation of nodes, network, and computer          “node”, “peer” and “user” are used inter-
failures, without the need of a central         changeably, according to the context, to re-
server and the overhead of its adminis-         fer to the entities that are connected in a
tration. Such architectures typically have      peer-to-peer network.
as inherent characteristics scalability,
resistance to censorship and centralized
                                                1.1. Defining Peer-to-Peer Computing
control, and increased access to resources.
Administration, maintenance, respon-            A quick look at the literature reveals a con-
sibility for the operation, and even the        siderable number of different definitions
notion of “ownership” of peer-to-peer sys-      of “peer-to-peer”, mainly distinguished
tems are also distributed among the users,      by the “broadness” they attach to the
instead of being handled by a single com-       term.
pany, institution or person (see also Agre         The strictest definitions of “pure” peer-
[2003] for an interesting discussion of in-     to-peer refer to totally distributed sys-
stitutional change through decentralized        tems, in which all nodes are completely
architectures). Finally, peer-to-peer archi-    equivalent in terms of functionality and
tectures have the potential to accelerate       tasks they perform. These definitions fail
communication processes and reduce              to encompass, for example, systems that
collaboration costs through the ad hoc ad-      employ the notion of “supernodes” (nodes
ministration of working groups [Schoder         that function as dynamically assigned
and Fischbach 2003].                            localized mini-servers) such as Kazaa
   This report surveys peer-to-peer con-        [2003], which are, however, widely ac-
tent distribution technologies, aiming to       cepted as peer-to-peer, or systems that
provide a comprehensive account of ap-          rely on some centralized server infras-
plications, features, and implementation        tructure for a subset of noncore tasks
techniques. As this is a new and (thank-        (e.g. bootstrapping, maintaining reputa-
fully) rapidly evolving field, and advances      tion ratings, etc).
and new capabilities are constantly being          According to a broader and widely ac-
introduced, this article will be present-       cepted definition in Shirky [2000], “peer-
ing what essentially constitutes a “snap-       to-peer is a class of applications that take
shot” of the state of the art around the        advantage of resources—storage, cycles,
time of its writing—as is unavoidably the       content, human presence—available at
case for any survey of a thriving research      the edges of the internet”. This defini-
field. We do, however, believe that the          tion, however, encompasses systems that
core information and principles presented       completely rely upon centralized servers
will remain relevant and useful for the         for their operation (such as seti@home,
reader.                                         various instant messaging systems, or
   In the next section, we define the basic      even the notorious Napster), as well as
concepts of peer-to-peer computing. We          various applications from the field of Grid
classify peer-to-peer systems into three        computing.
categories (communication and collab-              Overall, it is fair to say that there is
oration, distributed computation, and           no general agreement about what “is” and
content distribution). Content distribu-        what “is not” peer-to-peer.
tion systems are further discussed and             We feel that this lack of agreement on
categorized.                                    a definition—or rather the acceptance of

                                                 ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                           337

various different definitions—is, to a large                nections fail or recover, in order to main-
extent, due to the fact that systems or                    tain its connectivity and performance.
applications are labeled “peer-to-peer” not
because of their internal operation or ar-                We therefore propose the following defi-
chitecture, but rather as a result of how               nition:
they are perceived “externally”, that is,                    Peer-to-peer systems are distributed systems
whether they give the impression of pro-                  consisting of interconnected nodes able to self-
viding direct interaction between comput-                 organize into network topologies with the purpose
ers. As a result, different definitions of                 of sharing resources such as content, CPU cycles,
“peer-to-peer” are applied to accommodate                 storage and bandwidth, capable of adapting to
the various different cases of such systems               failures and accommodating transient popula-
or applications.                                          tions of nodes while maintaining acceptable con-
  From our perspective, we believe that                   nectivity and performance, without requiring the
the two defining characteristics of peer-to-               intermediation or support of a global centralized
                                                          server or authority.
peer architectures are the following:
—The sharing of computer resources by                     This definition is meant to encompass
 direct exchange, rather than requir-                   “degrees of centralization” ranging from
 ing the intermediation of a centralized                the pure, completely decentralized sys-
 server. Centralized servers can some-                  tems such as Gnutella, to “partially cen-
 times be used for specific tasks (sys-                  tralized” systems1 such as Kazaa. How-
 tem bootstrapping, adding new nodes                    ever, for the purposes of this survey, we
 to the network, obtain global keys for                 shall not restrict our presentation and dis-
 data encryption), however, systems that                cussion of architectures and systems to
 rely on one or more global centralized                 our own proposed definition, and we will
 servers for their basic operation (e.g. for            take into account systems that are consid-
 maintaining a global index and search-                 ered peer-to-peer by other definitions as
 ing through it—Napster, Publius) are                   well, including systems that employ a cen-
 clearly stretching the definition of peer-              tralized server (such as Napster, instant
 to-peer.                                               messaging applications, and others).
 As the nodes of a peer-to-peer network                   The focus of our study is content distri-
 cannot rely on a central server coordi-                bution, a significant area of peer-to-peer
 nating the exchange of content and the                 systems that has received considerable re-
 operation of the entire network, they are              search attention.
 required to actively participate by inde-
 pendently and unilaterally performing                  1.2. Peer-to-Peer and Grid Computing
 tasks such as searching for other nodes,               Peer-to-peer and Grid computing are two
 locating or caching content, routing in-               approaches to distributed computing, both
 formation and messages, connecting to                  concerned with the organization of re-
 or disconnecting from other neighboring                source sharing in large-scale computa-
 nodes, encrypting, introducing, retriev-               tional societies.
 ing, decrypting and verifying content, as                 Grids are distributed systems that en-
 well as others.                                        able the large-scale coordinated use and
—Their ability to treat instability and                 sharing of geographically distributed re-
 variable connectivity as the norm, au-                 sources, based on persistent, standards-
 tomatically adapting to failures in both               based service infrastructures, often with
 network connections and computers, as                  a high-performance orientation [Foster
 well as to a transient population of                   et al. 2001].
 nodes.
 This fault-tolerant, self-organizing ca-               1 In our definition, we refer to “global” servers, to
 pacity suggests the need for an adap-                  make a distinction from the dynamically assigned
 tive network topology that will change                 “supernodes” of partially centralized systems (see
 as nodes enter or leave and network con-               also Section 3.3.3)


ACM Computing Surveys, Vol. 36, No. 4, December 2004.
338                                             S. Androutsellis-Theotokis and D. Spinellis

   As Grid systems increase in scale, they       Chat/Irc, Instant Messaging (Aol, Icq,
begin to require solutions to issues of self-    Yahoo, Msn), and Jabber [Jabber 2003].
configuration, fault tolerance, and scala-           Distributed Computation. This category
bility, for which peer-to-peer research has      includes systems whose aim is to take ad-
much to offer.                                   vantage of the available peer computer
   Peer-to-peer systems, on the other hand,      processing power (CPU cycles). This is
focus on dealing with instability, tran-         achieved by breaking down a computer-
sient populations, fault tolerance, and self-    intensive task into small work units and
adaptation. To date, however, peer-to-peer       distributing them to different peer com-
developers have worked mainly on verti-          puters, that execute their corresponding
cally integrated applications, rather than       work unit and return the results. Cen-
seeking to define common protocols and            tral coordination is invariably required,
standardized infrastructures for interop-        mainly for breaking up and distributing
erability.                                       the tasks and collecting the results. Exam-
   In summary, one can say that “Grid com-       ples of such systems include projects such
puting addresses infrastructure, but not         as Seti@home [Sullivan III et al. 1997;
yet failure, while peer-to-peer addresses        SetiAtHome 2003], genome@home [Lar-
failure, but not yet infrastructure” [Foster     son et al. 2003; GenomeAtHome 2003],
and Iamnitchi 2003].                             and others.
   In addition to this, the form of sharing         Internet Service Support. A number of
initially targeted by peer-to-peer has been      different applications based on peer-to-
of limited functionality, providing a global     peer infrastructures have emerged for
content distribution and filesharing space        supporting a variety of Internet ser-
lacking any form of access control.              vices. Examples of such applications
   As peer-to-peer technologies move into        include peer-to-peer multicast systems
more sophisticated and complex applica-          [VanRenesse et al. 2003; Castro et al.
tions, such as structured content distri-        2002], Internet indirection infrastruc-
bution, desktop collaboration, and net-          tures [Stoica et al. 2002], and secu-
work computation, it is expected that            rity applications, providing protection
there will be a strong convergence be-           against denial of service or virus attacks
tween peer-to-peer and Grid computing            [Keromytis et al. 2002; Janakiraman et al.
[Foster 2000]. The result will be a new          2003; Vlachos et al. 2004].
class of technologies combining elements            Database Systems. Considerable work
of both peer-to-peer and Grid comput-            has been done on designing distributed
ing, which will address scalability, self-       database systems based on peer-to-peer
adaptation, and failure recovery, while,         infrastructures. Bernstein et al. [2002]
at the same time, providing a persis-            propose the Local Relational Model
tent and standardized infrastructure for         (LRM), in which the set of all data stored
interoperability.                                in a peer-to-peer network is assumed to
                                                 be comprised of inconsistent local rela-
                                                 tional databases interconnected by sets
1.3. Classification of Peer-to-Peer
                                                 of “acquaintances” that define translation
     Applications
                                                 rules and semantic dependencies between
Peer-to-peer architectures have been em-         them. PIER [Huebsch et al. 2003] is a
ployed for a variety of different application    scalable distributed query engine built
categories, which include the following.         on top of a peer-to-peer overlay network
   Communication and Collaboration.              topology that allows relational queries
This category includes systems that              to run across thousands of computers.
provide the infrastructure for facilitating      The Piazza system [Halevy et al. 2003]
direct, usually real-time, communica-            provides an infrastructure for building
tion and collaboration between peer              semantic Web [Berners-Lee et al. 2001]
computers. Examples include chat and             applications, consisting of nodes that can
instant messaging applications, such as          supply either source data (e.g. from a

                                                  ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                   339

relational database), schemas (or ontolo-                 An examination of current peer-to-peer
gies) or both. Piazza nodes are transitively            technologies in this context suggests that
connected by chains of mappings between                 they can be grouped as follows (with ex-
pairs of nodes, allowing queries to be                  amples in Tables I and II).
distributed across the Piazza network.
Finally, Edutella [Nejdl et al. 2003] is an             —Peer-to-Peer Applications. This category
open-source project that builds on the                   includes content distribution systems
W3C metadata standard RDF, to provide                    that are based on peer-to-peer tech-
a metadata infrastructure and querying                   nology. We attempt to further subdi-
capability for peer-to-peer applications.                vide them into the following two groups,
  Content Distribution. Most of the cur-                 based on their application goals and per-
rent peer-to-peer systems fall within the                ceived complexity:
category of content distribution, which in-              Peer-to-peer “file exchange” systems.
cludes systems and infrastructures de-                   These systems are targeted towards
signed for the sharing of digital media                  simple, one-off file exchanges between
and other data between users. Peer-to-                   peers. They are used for setting up
peer content distribution systems range                  a network of peers and providing fa-
from relatively simple direct fileshar-                   cilities for searching and transferring
ing applications, to more sophisticated                  files between them. These are typically
systems that create a distributed stor-                  light-weight applications that adopt a
age medium for securely and efficiently                   best-effort approach without addressing
publishing, organizing, indexing, search-                security, availability, and persistence. It
ing, updating, and retrieving data. There                is mainly systems in this category that
are numerous such systems and in-                        are responsible for spawning the repu-
frastructures. Some examples are: the                    tation (and in some cases notoriety) of
late Napster, Publius [Waldman et al.                    peer-to-peer technologies.
2000], Gnutella [Gnutella 2003], Kazaa                   Peer-to-peer content publishing and stor-
[Kazaa 2003], Freenet [Clarke et al.                     age systems. These systems are targeted
2000], MojoNation [MojoNation 2003],                     towards creating a distributed storage
Oceanstore [Kubiatowicz et al. 2000],                    medium in—and through—which users
PAST [Druschel and Rowstron 2001],                       will be able to publish, store, and dis-
Chord [Stoica et al. 2001], Scan [Chen                   tribute content in a secure and persis-
et al. 2000], FreeHaven [Dingledine                      tent manner. Such content is meant to
et al. 2000], Groove [Groove 2003], and                  be accessible in a controlled manner by
Mnemosyne [Hand and Roscoe 2002].                        peers with appropriate privileges. The
  This survey will focus on content distri-              main focus of such systems is security
bution, one of the most prominent appli-                 and persistence, and often the aim is
cation areas of peer-to-peer systems.                    to incorporate provisions for account-
                                                         ability, anonymity and censorship resis-
                                                         tance, as well as persistent content man-
1.4. Peer-to-Peer Content Distribution                   agement (updating, removing, version
In its most basic form, a peer-to-peer con-              control) facilities.
tent distribution system creates a dis-                 —Peer-to-Peer Infrastructures. This cate-
tributed storage medium that allows for                  gory includes peer-to-peer based infras-
the publishing, searching, and retrieval                 tructures that do not constitute working
of files by members of its network. As                    applications, but provide peer-to-peer
systems become more sophisticated, non-                  based services and application frame-
functional features may be provided, in-                 works. The following infrastructure ser-
cluding provisions for security, anonymity,              vices are identified:
fairness, increased scalability and perfor-              Routing and location. Any peer-to-peer
mance, as well as resource management                    content distribution system relies on
and organization capabilities. All of these              a network of peers within which re-
will be discussed in the following sections.             quests and messages must be routed

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
340                                                 S. Androutsellis-Theotokis and D. Spinellis

                        Table I. A Classification of Current Peer-to-Peer Systems
  (RM: Resource Management; CR: Censorship Resistance; PS: Performance and Scalability; SPE: Security,
      Privacy and Encryption; A: Anonymity; RA: Reputation and Accountability; RT: Resource Trading.)

Peer-to-Peer File Exchange Systems

System                                  Brief Description
Napster                                 Distributed file sharing—hybrid decentralized.
Kazaa [2003]                            Distributed file sharing—partially centralized.
Gnutella [2003]                         Distributed file sharing—purely decentralized.

Peer-to-Peer Content Publishing and Storage Systems

System                                 Brief Description                                    Main Focus
Scan [Chen et al. 2000]                A dynamic, scalable, efficient content distribution       PS
                                       network. Provides dynamic content replication.
Publius [Waldman et al. 2000]          A censorship-resistant system for publishing            RM
                                       content. Static list of servers. Enhanced content
                                       management (update and delete).
Groove [Groove 2003]                   Internet communications software for direct          RM,PS,SPE
                                       real-time peer-to-peer interaction.
FreeHaven [Dingledine et al. 2000]     A flexible system for anonymous storage.                A,RA
Freenet [Clarke et al. 2000]           Distributed anonymous information storage and          A,RA
                                       retrieval system.
MojoNation [MojoNation 2003]           Distributed file storage. Fairness through the use     SPE,RT
                                       of currency mojo.
Oceanstore [Kubiatowicz et al. 2000]   An architecture for global scale persistent storage. RM,PS,SPE
                                       Scalable, provides security and access control.
Intermemory [Chen et al. 1999]         System of networked computers. Donate storage            RT
                                       in exchange for the right to publish data.
Mnemosyne [Hand and Roscoe 2002]       Peer-to-peer steganographic storage system.             SPE
                                       Provides privacy and plausible deniability.
PAST [Druschel and Rowstron 2001]      Large scale persistent peer-to-peer storage utility.  PS,SPE
Dagster [Stubblefield and Wallach 2001] A censorship-resistant document publishing            CR,SPE
                                       system.
Tangler [Waldman and Mazi 2001]        A content publishing system based on document         CR,SPE
                                       entanglements.



  with efficiency and fault tolerance, and             2. ANALYSIS FRAMEWORK
  through which peers and content can
                                                      In this survey, we present a description
  be efficiently located. Different infras-
                                                      and analysis of applications, systems, and
  tructures and algorithms have been
                                                      infrastructures that are based on peer-
  developed to provide such services.
                                                      to-peer architectures and aim at either
  Anonymity. Peer-to-peer based infras-
                                                      offering content distribution solutions, or
  tructure systems have been designed
                                                      supporting content distribution related
  with the explicit aim of providing user
                                                      activities.
  anonymity.
                                                         Our approach is based on:
  Reputation Management. In a peer-
  to-peer network, there is no central                —identifying the feature space of nonfunc-
  organization to maintain reputation                   tional properties and characteristics of
  information for users and their behavior.             content distribution systems,
  Reputation information is, therefore,               —determining the way in which the non-
  hosted in the various network nodes. In               functional properties depend on, and
  order for such reputation information to              can be affected by, various design fea-
  be kept secure, up-to-date, and avail-                tures, and
  able throughout the network, complex                —providing an account, analysis, and
  reputation management infrastructures                 evaluation of the design features and
  need to be employed.                                  solutions that have been adopted by

                                                       ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                              341

                       Table II. A Classification of Current Peer-to-Peer Infrastructures

Infrastructures for routing and location

Chord [Stoica et al. 2001]                              A scalable peer-to-peer lookup service. Given a key, it
                                                        maps the key to a node.
CAN [Ratnasamy et al. 2001]                             Scalable content addressable network. A distributed
                                                        infrastructure that provides hash-table functionality
                                                        for mapping file names to their locations.
Pastry [Rowstron and Druschel 2001]                     Infrastructure for fault-tolerant wide-area location and
                                                        routing.
Tapestry [Zhao et al. 2001]                             Infrastructure for fault-tolerant wide-area location and
                                                        routing.
Kademlia [Mayamounkov and Mazieres 2002]                A scalable peer-to-peer lookup service based on the
                                                        XOR metric.

Infrastructures for anonymity

Anonymous remailer mixnet [Berthold et al. 1998] Infrastructure for anonymous connection.
Onion Routing [Goldschlag et al. 1999]           Infrastructure for anonymous connection.
ZeroKnowledge Freedom [Freedom 2003]             Infrastructure for anonymous connection.
Tarzan [Freedman et al. 2002]                    A peer-to-peer decentralized anonymous network layer.

Infrastructures for reputation management

Eigentrust [Kamvar et al. 2003]                         A Distributed reputation management algorithm.
A Partially distributed reputation management           A partially distributed approach based on a
  system [Gupta et al. 2003]                            debit/credit or debit-only scheme.
PeerTrust [Xiong and Liu 2002]                          A decentralized, feedback-based reputation
                                                        management system using satisfaction and number of
                                                        interaction metrics.



  current peer-to-peer systems, as well as                   data and processing methods. Unautho-
  their shortcomings, potential improve-                     rized entities cannot change data; ad-
  ments, and proposed alternatives.                          versaries cannot substitute a forged doc-
                                                             ument for a requested one.
   For the identification of the nonfunc-                     Privacy and confidentiality. Ensuring
tional characteristics on which our study                    that data is accessible only to those
is based, we used the work of Shaw and                       authorized to have access, and that
Garlan [1995], which we adapted to the                       there is control over what data is col-
field of peer-to-peer content distribution.                   lected, how it is used, and how it is
   The various design features and solu-                     maintained.
tions that we examine, as well as their
                                                             Availability and persistence. Ensuring
relationship to the relevant nonfunctional
                                                             that authorized users have access to
characteristics, were assembled through
                                                             data and associated assets when re-
a detailed analysis and study of current
                                                             quired. For a peer-to-peer content dis-
peer-to-peer content distribution systems
                                                             tribution system this often means al-
that are either deployed, researched, or
                                                             ways. This property entails stability in
proposed.
                                                             the presence of failure, or changing node
   The resulting analysis framework is il-
                                                             populations.
lustrated in Figure 1. The boxes in the
periphery show the most important at-                      Scalability. Maintaining the system’s per-
tributes of peer-to-peer content distribu-                   formance attributes independent of the
tion systems, described as follows.                          number of nodes or documents in its
                                                             network. A dramatic increase in the
Security. Further analyzed in terms of:                      number of nodes or documents will
  Integrity and authenticity. Safeguard-                     have minimal effect on performance and
  ing the accuracy and completeness of                       availability.

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
342                                                   S. Androutsellis-Theotokis and D. Spinellis




  Fig. 1. Illustration of the way in which various design features affect the main characteristics of peer-
  to-peer content distribution systems.

Performance. The time required for per-                   grouping, based on the content itself;
  forming the operations allowed by the                   grouping, based on locality or network
  system, typically publication, searching,               distance; grouping, based on organiza-
  and retrieval of documents.                             tion ties, as well as others.
Fairness. Ensuring that users offer and                    The relationships between nonfunc-
  consume resources in a fair and bal-                  tional characteristics are depicted as a
  anced manner. May rely on account-                    UML diagram in Figure 1. The Figure
  ability, reputation, and resource trading             schematically illustrates the relationship
  mechanisms.                                           between the various design features and
Resource Management Capabilities. In                    the main characteristics of peer-to-peer
  their most basic form, peer-to-peer con-              content distribution systems.
  tent distribution systems allow the pub-                 The boxes in the center show the design
  lishing, searching, and retrieval of docu-            decisions that affect these attributes. We
  ments. More sophisticated systems may                 note that these design decisions are mostly
  afford more advanced resource manage-                 independent and orthogonal.
  ment capabilities, such as editing or                    We see, for example, that the perfor-
  removal of documents, management of                   mance of a peer-to-peer content distribu-
  storage space, and operations on meta-                tion system is affected by the distributed
  data.                                                 object location and routing mechanisms,
Semantic Grouping of Information. An                    as well as by the data replication,
  area of research that has attracted con-              caching, and migration algorithms. Fair-
  siderable attention recently is the se-               ness, on the other hand, depends on
  mantic grouping and organization of                   the system’s provisions for accountabil-
  content and information in peer-to-peer               ity and reputation, as well as on any re-
  networks. Various grouping schemes                    source trading mechanisms that may be
  are encountered, such as semantic                     implemented.

                                                          ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                    343

  In the following sections of this survey,             such networks are often termed “servents”
the different design decisions and features             (SERVers+clieENTS).
will be presented and discussed, based on                  Partially Centralized Architectures. The
an examination of existing peer-to-peer                 basis is the same as with purely decentral-
content distribution systems, and with                  ized systems. Some of the nodes, however,
reference to the way in which they affect               assume a more important role, acting as
the main attributes of these systems,                   local central indexes for files shared by lo-
as presented. A relevant presentation                   cal peers. The way in which these supern-
and discussion of various desired prop-                 odes are assigned their role by the network
erties of peer-to-peer systems can also                 varies between different systems. It is im-
be found in Kubiatowics [2003], while                   portant, however, to note that these su-
an analysis of peer-to-peer systems from                pernodes do not constitute single points
the end-user perspective can be found in                of failure for a peer-to-peer network, since
Lee [2003].                                             they are dynamically assigned and, if they
                                                        fail, the network will automatically take
3. PEER-TO-PEER DISTRIBUTED OBJECT                      action to replace them with others.
   LOCATION AND ROUTING                                    Hybrid Decentralized Architectures. In
                                                        these systems, there is a central server fa-
The operation of any peer-to-peer content               cilitating the interaction between peers by
distribution system relies on a network                 maintaining directories of metadata, de-
of peer computers (nodes), and connec-                  scribing the shared files stored by the peer
tions (edges) between them. This network                nodes. Although the end-to-end interac-
is formed on top of—and independently                   tion and file exchanges may take place di-
from—the underlying physical computer                   rectly between two peer nodes, the central
(typically IP) network, and is thus referred            servers facilitate this interaction by per-
to as an “overlay” network. The topology,               forming the lookups and identifying the
structure, and degree of centralization of              nodes storing the files. The terms “peer-
the overlay network, and the routing and                through-peer” or “broker mediated” are
location mechanisms it employs for mes-                 sometimes used for such systems [Kim
sages and content are crucial to the oper-              2001].
ation of the system, as they affect its fault              Obviously, in these architectures, there
tolerance, self-maintainability, adaptabil-             is a single point of failure (the central
ity to failures, performance, scalability,              server). This typically renders them inher-
and security; in short almost all of the sys-           ently unscalable and vulnerable to censor-
tem’s attributes as laid out in Section 2               ship, technical failure, or malicious attack.
(see also Figure 1).
   Overlay networks can be distinguished
in terms of their centralization and                    3.2. Network Structure
structure.                                              By structure, we refer to whether the over-
                                                        lay network is created nondeterministi-
3.1. Overlay Network Centralization
                                                        cally (ad hoc) as nodes and content are
Although in their purest form peer-to-peer              added, or whether its creation is based
overlay networks are supposed to be to-                 on specific rules. We categorize peer-to-
tally decentralized, in practice this is not            peer networks as follows, in terms of their
always true, and systems with various de-               structure:
grees of centralization are encountered.                  Unstructured. The placement of content
Specifically, the following three categories             (files) is completely unrelated to the over-
are identified.                                          lay topology.
   Purely Decentralized Architectures. All                In an unstructured network, content
nodes in the network perform exactly the                typically needs to be located. Searching
same tasks, acting both as servers and                  mechanisms range from brute force meth-
clients, and there is no central coordi-                ods, such as flooding the network with
nation of their activities. The nodes of                propagating queries in a breadth-first

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
344                                              S. Androutsellis-Theotokis and D. Spinellis

or depth-first manner until the desired              Table III. A classification of Peer-to-Peer Content
content is located, to more sophisticated            Distribution Systems and Location and Routing
                                                   Infrastructures in Terms of Their Network Structure,
and resource-preserving strategies that                        With Some Typical Examples
include the use of random walks and rout-
ing indices (discussed in more detail in                                      Centralization
Section 3.3.4). The searching mechanisms
employed in unstructured networks have                               Hybrid    Partial    None
obvious implications, particularly in re-          Unstructured      Napster, Kazaa,    Gnutella,
gards to matters of availability, scalability,                       Publius Mor-       FreeHaven
and persistence.                                                              pheus,
  Unstructured systems are generally                                          Gnutella,
                                                                              Edutella
more appropriate for accommodating
highly-transient node populations. Some            Structured                             Chord,
representative examples of unstructured            Infrastructures                        CAN,
systems are Napster, Publius [Waldman                                                     Tapestry,
                                                                                          Pastry
et al. 2000], Gnutella [Gnutella 2003],
Kazaa [Kazaa 2003], Edutella [Nejdl et al.         Structured                             OceanStore,
2003], FreeHaven [Dingledine et al. 2000],         Systems                                Mnemosyne,
                                                                                          Scan, PAST,
as well as others.                                                                        Kademlia,
  Structured. These have emerged mainly                                                   Tarzan
in an attempt to address the scalability
issues that unstructured systems were               A category of networks that are in be-
originally faced with. In structured net-         tween structured and unstructured are re-
works, the overlay topology is tightly            ferred to as loosely structured networks.
controlled and files (or pointers to them)         Although the location of content is not
are placed at precisely specified locations.       completely specified, it is affected by rout-
These systems essentially provide a map-          ing hints. A typical example is Freenet
ping between content (e.g. file identifier)         [Clarke et al. 2000; Clake et al. 2002].
and location (e.g. node address), in the            Table III summarizes the categories we
form of a distributed routing table, so that      outlined, with examples of peer-to-peer
queries can be efficiently routed to the           content distribution systems and architec-
node with the desired content [Lv et al.          tures. Note that all structured and loosely
2002].                                            structured systems are inherently purely
  Structured systems offer a scalable so-         decentralized; form follows function.
lution for exact-match queries, that is,            In the following sections, the overlay
queries where the exact identifier of the          network topology and operation of differ-
requested data object is known (as com-           ent peer-to-peer systems is discussed, fol-
pared to keyword queries). Using exact-           lowing the above classification, according
match queries as a substrate for keyword          to degree of centralization and network
queries remains an open research problem          structure.
for distributed environments [Witten et al.
1999].                                            3.3. Unstructured Architectures
  A disadvantage of structured systems is           3.3.1. Hybrid Decentralized. Figure 2 il-
that it is hard to maintain the structure         lustrates the architecture of a typical hy-
required for efficiently routing messages          brid decentralized peer-to-peer system.
in the face of a very transient node popu-          Each client computer stores content
lation, in which nodes are joining and leav-      (files) shared with the rest of the network.
ing at a high rate [Lv et al. 2002].              All clients connect to a central directory
  Typical examples of structured systems          server that maintains:
include Chord [Stoica et al. 2001], CAN
[Ratnasamy et al. 2001], PAST [Druschel           —A table of registered user connection in-
and Rowstron 2001], Tapestry [Zhao et al.          formation (IP address, connection band-
2001] among others.                                width etc.)

                                                   ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                   345

                                                        torious Napster and Publius systems
                                                        [Waldman et al. 2000] that rely on a static,
                                                        system-wide list of servers. Their architec-
                                                        ture does not provide any smooth, decen-
                                                        tralized support for adding a new server,
                                                        or removing dead or malicious servers. A
                                                        comprehensive study of the behavior of
                                                        hybrid peer-to-peer systems and a com-
                                                        parison of their query characteristics and
                                                        their performance in terms of bandwidth
                                                        and CPU cycle consumption is presented
                                                        in Yang and Garcia-Molina [2001].
                                                           It should be noted that systems that
                                                        do not fall under the hybrid decentral-
                                                        ized category may still use some central
                                                        administration server to a limited extent,
Fig. 2. Typical hybrid decentralized peer-to-peer       for example, for initial system boot-
architecture. A central directory server maintains      strapping (e.g. Mojonation [MojoNation
an index of the metadata for all files in the network.
                                                        2003]), or for allowing new users to
—A table listing the files that each user                join the network by providing them
 holds and shares in the network, along                 with access to a list of current users
 with metadata descriptions of the files                 (e.g. gnutellahosts.com for the gnutella
 (e.g. filename, time of creation, etc.)                 network).

   A computer that wishes to join the                      3.3.2. Purely Decentralized. In this sec-
network contacts the central server and                 tion, we examine the Gnutella network
reports the files it maintains. Client com-              [Gnutella 2003], an interesting and rep-
puters send requests for files to the server.            resentative member of purely decentral-
The server searches for matches in its in-              ized peer-to-peer architectures, due to its
dex, returning a list of users that hold the            open architecture, achieved scale, and self-
matching file. The user then opens direct                organizing structure. FreeHaven [Dingle-
connections with one or more of the peers               dine et al. 2000] is another system using
that hold the requested file, and down-                  routing and search mechanisms similar to
loads it (see Figure 2).                                those of Gnutella.
   The advantage of hybrid decentralized                   Like most peer-to-peer systems,
systems is that they are simple to imple-               Gnutella builds a virtual overlay net-
ment, and they locate files quickly and ef-              work with its own routing mechanisms
ficiently. Their main disadvantage is that               [Ripeanu and Foster 2002], allowing its
they are vulnerable to censorship, legal ac-            users to share files with other peers.
tion, surveillance, malicious attack, and                  There is no central coordination of the
technical failure, since the content shared,            activities in the network and users connect
or at least descriptions of it, and the abil-           to each other directly through a software
ity to access it are controlled by the single           application that functions both as a client
institution, company, or user maintain-                 and a server (users are referred to as a
ing the central server. Furthermore, these              servents).
systems are considered inherently unscal-                  Gnutella uses IP as its underlying net-
able, as there are bound to be limitations              work service, while the communication be-
to the size of the server database and its              tween servents is specified in a form of
capacity to respond to queries. Large Web               application level protocol supporting four
search engines have, however, repeatedly                types of messages [Jovanovich et al. 2001]:
provided counterexamples to this notion.
   Examples of hybrid decentralized con-                  Ping. A request for a certain host to an-
tent distribution systems include the no-               nounce itself.

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
346                                           S. Androutsellis-Theotokis and D. Spinellis

   Pong. Reply to a Ping message. It con-         Once a node receives a QueryHit mes-
tains the IP and port of the responding        sage, indicating that the target file has
host and number and size of the files being     been identified at a certain node, it ini-
shared.                                        tiates a download by establishing a di-
   Query. A search request. It contains a      rect connection between the two nodes.
search string and the minimum speed re-        Figure 3 illustrates an example of the
quirements of the responding host.             Gnutella search mechanism.
   Query Hits. Reply to a Query message.          Scalability issues in the original purely
It contains the IP, port, and speed of the     decentralized systems arose from the fact
responding host, the number of match-          that the use of the TTL effectively seg-
ing files found, and their indexed result       mented the network into “sub-networks”,
set.                                           imposing on each user a virtual “hori-
                                               zon” beyond which their messages could
   After joining the Gnutella network (by      not reach [Jovanovic 2000]. Removing the
connecting to nodes found in databases         TTL limit, on the other hand, would re-
such as gnutellahosts.com), a node sends       sult in the network being swamped with
out a Ping message to any node it is           messages.
connected to. The nodes send back a               Significant research has been carried
Pong message identifying themselves, and       out to address the above issues, and
also propagate the Ping message to their       various solutions have been proposed.
neighbors.                                     These will be discussed in more detail in
   In order to locate a file in unstructured    Section 3.3.4.
systems such as gnutella, nondeterminis-
tic searches are the only option since the       3.3.3. Partially Centralized. Partially cen-
nodes have no way of guessing where (at        tralized systems are similar to purely
which nodes) the files may lie.                 decentralized, but they use the con-
   The original Gnutella architecture uses     cept of supernodes: nodes that are dy-
a flooding (or broadcast) mechanism to dis-     namically assigned the task of servic-
tribute Ping and Query messages: each          ing a small subpart of the peer network
Gnutella node forwards the received mes-       by indexing and caching files contained
sages to all of its neighbors. The response    therein.
messages received are routed back along          Peers are automatically elected to be-
the opposite path through which the origi-     come supernodes if they have sufficient
nal request arrived. To limit the spread of    bandwidth and processing power (al-
messages through the network, each mes-        though a configuration parameter may
sage header contains a time-to-live (TTL)      allow users to disable this feature)
field. At each hop, the value of this field      [FastTrack 2003].
is decremented, and when it reaches zero,        Supernodes index the files shared by
the message is dropped.                        peers connected to them, and proxy search
   The above mechanism is implemented          requests on behalf of these peers. All
by assigning each message a unique iden-       queries are therefore initially directed to
tifier and equipping each host with a dy-       supernodes.
namic routing table of message identifiers        Two major advantages of partially cen-
and node addresses. Since the response         tralized systems are that:
messages contain the same ID as the orig-
inal messages, the host checks its routing     —Discovery time is reduced in compari-
table to determine along which link the re-     son with purely decentralized systems,
sponse message should be forwarded. In          while there still is no unique point of
order to avoid loops, the nodes use the         failure. If one or more supernodes go
unique message identifiers to detect and         down, the nodes connected to them can
drop duplicate messages, to improve effi-        open new connections with other su-
ciency, and preserve network bandwidth          pernodes, and the network will continue
[Jovanovic 2000].                               to operate.

                                                ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                     347




                 Fig. 3. An example of the Gnutella search mechanism. Solid lines between
                 the nodes represent connections of the Gnutella network. The search orig-
                 inates at the “requestor” node, for a file maintained by another node. Re-
                 quest messages are dispatched to all neighboring nodes, and propagated
                 from node-to-node as shown in the four consecutive steps (a) to (d). When
                 a node receives the same request message (based on its unique identifier)
                 multiple times, it replies with a failure message to avoid loops and minimize
                 traffic. When the file is identified, a success message is returned.


—The inherent heterogeneity of peer-to-                  mentation (as it is a proprietary system,
 peer networks is taken advantage of,                    there is no detailed documentation on its
 and exploited. In a purely decentralized                structure and operation). Edutella [Nejdl
 network, all of the nodes will be equally               et al. 2003] is another partially centralized
 (and usually heavily) loaded, regard-                   architecture.
 less of their CPU power, bandwidth, or                     Yang and Garcia-Molina [2002a, 2002b]
 storage capabilities. In partially central-             present research addressing the design
 ized systems, however, the supernodes                   of, and searching techniques for, partially
 will undertake a large portion of the                   centralized peer-to-peer networks.
 entire network load, while most of the                     The concept of supernodes has also been
 other (so called “normal”) nodes will be                proposed in a more recent version of the
 very lightly loaded, in comparison (see                 Gnutella protocol. A mechanism for dy-
 also [Lv et al. 2002; Zhichen et al. 2002]).            namically selecting supernodes organizes
                                                         the Gnutella network into an interconnec-
  Kazaa is a typical, widely used instance               tion of superpeers (as they are referred to)
of a partially centralized system imple-                 and client nodes.

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
348                                                S. Androutsellis-Theotokis and D. Spinellis

   When a node with enough CPU power                sues by means of:
joins the network, it immediately be-
comes a superpeer and establishes con-              —A dynamic topology adaptation protocol
nections with other superpeers, forming a            that ensures that most nodes will be at a
flat unstructured network of superpeers.              short distance from high capacity nodes.
If it establishes a minimum required num-            This protocol, coupled with replicating
ber of connections to client nodes within a          pointers to the content of neighbors, en-
specified time, it remains a superpeer. Oth-          sures that high capacity nodes will be
erwise, it turns into a regular client node.         able to provide answers to a very large
                                                     number of queries.
   3.3.4. Shortcomings and Evolution of Unstruc-    —An active flow control scheme that ex-
tured Architectures. A number of methods             ploits network heterogeneity to avoid
have been proposed to overcome the origi-            hotspots, and
nal unscalability of unstructured peer-to-          —A search protocol that is based on ran-
peer systems. These have been shown to               dom walks directed towards high capac-
drastically improve their performance.               ity nodes.
   Lv et al. [2002] proposed to replace
the original flooding approach by multiple              Simulations of Gia were found to in-
parallel random walks, where each node              crease overall system capacity by three
chooses a neighbor at random, and propa-            to five orders of magnitude. Similar ap-
gates the request only to it. The use of ran-       proaches, based on taking advantage of
dom walks, combined with proactive object           the underlying network heterogeneity, are
replication (discussed in Section 4), was           described in Lv et al. [2002] and Zhichen
found to significantly improve the perfor-           et al. [2002].
mance of the system as measured by query               In another approach, Crespo and
resolution time (in terms of numbers of             Garcia-Molina [2002] use routing indices
hops), per-node query load, and message             to address the searching and scalability
traffic generated. Proactive data replica-           issues. Routing indices are tables of infor-
tion schemes are further examined in Co-            mation about other nodes, stored within
hen and Shenker [2001].                             each node. They provide a list of neighbors
   Yang and Garcia-Molina [2002b] sug-              that are most likely to be “in the direction”
gested the use of more sophisticated                of the content corresponding to the query.
broadcast policies, selecting which neigh-          These tables contain information about
bors to forward search queries to based on          the total number of files maintained by
their past history, as well as the use of local     nearby nodes, as well as the number
indices: data structures where each node            of files corresponding to various topics.
maintains an index of the data stored at            Three different variations of the approach
nodes located within a radius from itself.          are presented, and simulations are shown
   A similar solution to the information re-        to improve performance by 1-2 orders
trieval problem is proposed by Kalogeraki           of magnitude with respect to flooding
et al. [2002] in the form of an Intelli-            techniques.
gent Search Mechanism built on top of                  Finally, the connectivity properties and
a Modified Random BFS Search Mecha-                  reliability of unstructured peer-to-peer
nism. Each peer forwards queries to a sub-          networks such as Gnutella have been
set of its neighbors, selecting them based          studied in Ripeanu and Foster [2002]. In
on a profile mechanism that maintains in-            particular, emphasis was placed on the
formation about their performance in re-            fact that peer-to-peer networks exhibit
cent queries. Neighbours are ranked ac-             the properties of power-law networks,
cording to their profiles and queries are            in which the number of nodes with L
forwarded selectively to the most appro-            links is proportional to L−k , where k is
priate ones only.                                   a network dependent constant. In other
   Chawathe et al. [2003] presented the             words, most nodes have few links (thus, a
Gia System, which addresses the above is-           large fraction of them can be taken away

                                                     ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                    349

without seriously damaging the network                  —CAN is a system using an n-dimensional
connectivity), while there are a few highly-             Cartesian coordinate space to imple-
connected nodes which, if taken away, are                ment the distributed location and rout-
likely to cause the whole network to be                  ing table, whereby each node is respon-
broken down in disconnected pieces. One                  sible for a zone in the coordinate space.
implication of this is that such networks               —Tapestry (and the similar Pastry and
are robust when facing random node                       Kademlia) are based on the plaxton
attacks, yet vulnerable to well-planned                  mesh data structure, which maintains
attacks. The topology mismatch between                   pointers to nodes in the network whose
the Gnutella network and the underlying                  IDs match the elements of a tree-like
physical network infrastructure was also                 structure of ID prefixes up to a digit
documented in Ripeanu and Foster [2002].                 position.
Tsoumakos and Roussopoulos [2003]
present a more comprehensive analysis                     3.4.1. Freenet—A Loosely Structured Sys-
and comparison of the above methods.                    tem. The defining characteristic of loosely
   Overall, unstructured peer-to-peer con-              structured systems is that the nodes of the
tent distribution systems might be the                  peer-to-peer network can produce an esti-
preferred choice for applications where the             mate (not with certainty) of which node is
following assumptions hold:                             most likely to store certain content. This
                                                        affords them the possibility of avoiding
—keyword searching is the common oper-                  blindly broadcasting request messages to
 ation,                                                 all (or a random subset) of their neighbors.
—most content is typically replicated at a              Instead, they use a chain mode propaga-
 fair fraction of participating sites,                  tion approach, where each node makes a
—the node population is highly transient,               local decision about which node to send the
—users will accept a best-effort content re-            request message to next.
 trieval approach, and                                     Freenet [Clarke et al. 2000] is a typical,
                                                        purely decentralized loosely-structured
—the network size is not so large as to in-
                                                        content distribution system. It operates
 cur scalability problems [Lv et al. 2002]
                                                        as a self-organizing peer-to-peer network,
 (The last issue is alleviated through
                                                        pooling unused disk space in peer com-
 the various methods described in this
                                                        puters to create a collaborative virtual
 Section).
                                                        file system. Important features of Freenet
                                                        include its focus on security, publisher
3.4. Structured Architectures                           anonymity, deniability, and data replica-
The various structured content distribu-                tion for availability and performance.
tion systems and infrastructures employ                    Files in Freenet are identified by unique
different mechanisms for routing mes-                   binary keys. Three types of keys are sup-
sages and locating data. Four of the most               ported, the simplest is based on applying
interesting and representative mecha-                   a hash function on a short descriptive text
nisms and their corresponding systems                   string that accompanies each file as it is
are examined in the following sections.                 stored in the network by its original owner.
                                                           Each Freenet node maintains its own lo-
—Freenet is a loosely structured system                 cal data store, that it makes available to
 that uses file and node identifier similar-              the network for reading and writing, as
 ity to produce an estimate of where a file              well as a dynamic routing table contain-
 may be located, and a chain mode propa-                ing the addresses of other nodes and the
 gation approach to forward queries from                files they are thought to hold. To search
 node-to-node.                                          for a file, the user sends a request message
—Chord is a system whose nodes maintain                 specifying the key and a timeout (hops-to-
 a distributed routing table in the form of             live) value.
 an identifier circle on which all nodes are                Freenet uses the following types of mes-
 mapped, and an associated finger table.                 sages, which all include the node identifier

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
350                                             S. Androutsellis-Theotokis and D. Spinellis

(for loop detection), a hops-to-live value          If a node receives a request for a locally-
(similar to the Gnutella TTL, see Sec-           stored file, the search stops and the data
tion 3.3.2), and the source and destination      is forwarded back to the requestor.
node identifiers:                                    If the node does not store the file that the
                                                 requestor is looking for, it forwards the re-
  Data insert. A node inserts new data
                                                 quest to its neighbor that is most likely to
in the network. A key and the actual data
                                                 have the file, by searching for the file key
(file) are included.
                                                 in its local routing table that is closest to
  Data request. A request for a certain
                                                 the one requested. The messages, there-
file. The key of the file requested is also
                                                 fore, form a chain, as they propagate from
included.
                                                 node-to-node. To avoid huge chains, mes-
  Data reply. A reply initiated when the
                                                 sages are deleted after passing through
requested file is located. The actual file is
                                                 a certain number of nodes, based on the
also included in the reply message.
                                                 hops-to-live value they carry. Nodes also
  Data failed. A failure to locate a file.
                                                 store the ID and other information of the
The location (node) of the failure and the
                                                 requests they have seen, in order to handle
reason are also included.
                                                 “data reply” and “data failed” messages.
   New nodes join the Freenet network               If a node receives a backtracking “data
by first discovering the address of one           failed” message from a downstream node,
or more existing nodes, and then start-          it selects the next best node from its rout-
ing to send Data Insert messages. To in-         ing stack and forwards the request to it.
sert a new file in the network, the node          If all nodes in the routing table have been
first calculates a binary key for the file,        explored in this way and failed, it sends
and then sends a data insert message to          back a “data failed” message to the node
itself. Any node that receives the insert        from which it originally received the data
message, first checks to see if the key is        request message.
already taken. If the key is not found, the         If the requested file is eventually found
node looks up the closest key (in terms          at a certain node, a reply is passed back
of lexicographic distance) in its routing        through each node that forwarded the re-
table, and forwards the insert message           quest to the original node that started
to the corresponding node. By this mech-         the chain. This data reply message will
anism, newly inserted files are placed            include the actual data, which is cached
at nodes possessing files with similar            in all intermediate nodes for future re-
keys.                                            quests. A subsequent request with the
   This continues as long as the hops-to-        same key will be served immediately with
live limit is not reached. In this way, more     the cached data. A request for a similar
than one node will store the new file. At         key will be forwarded to the node that pre-
the same time, all the participating nodes       viously provided the data.
will update their routing tables with the           To address the problem of obtaining
new information (this is the mechanism           the key that corresponds to a specific file,
through which the new nodes announce             Freenet recommends the use of a special
their presence to the rest of the network).      class of lightweight files called “indirect
If the hops-to-live limit is reached without     files”. When a real file is inserted, the au-
any key collision, an “all clear” result         thor also inserts a number of indirect files
will be propagated back to the original          that are named according to search key-
inserter, informing that the insert was          words and contain pointers to the real file.
successful.                                      These indirect files differ from normal files
   If the key is found to be taken, the node     in that multiple files with the same key
returns the preexisting file as if a request      (i.e. search keyword) are permitted to ex-
were made for it. In this way, malicious         ist, and requests for such keys keep going
attempts to supplant existing files by in-        until a specified number of results is ac-
serting junk will result in the existing files    cumulated, instead of stopping at the first
being spread further.                            file found. The problem of managing the

                                                  ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                            351




Fig. 4. The use of indirect files in Freenet.
The diagram illustrates a regular file with key
“A652D17D88”, which is supposed to contain a tu-
torial about photography. The author chose to in-
sert the file itself, and then a set of indirect files
(marked with an i), named according to search key-      Fig. 5. A Chord identifier circle consisting of the
words she considered relevant. The indirect files        three nodes 0,1 and 3. In this example, key 1 is lo-
are distributed among the nodes of the network;         cated at node 1, key 2 at node 3, and key 6 at node 0.
they do not contain the actual document, simply a
“pointer” to the location of the regular file contain-
ing the document.                                       [Karger et al. 1997]. All node identifiers
                                                        are ordered in an “identifier circle” modulo
large volume of such indirect files remains              2m (Figure 5 shows an identifier circle with
open. Figure 4 illustrates the use of indi-             m = 3). Key k is assigned to the first node
rect files.                                              whose identifier is equal to, or follows k, in
   The following properties of Freenet are              the identifier space. This node is called the
a result of its routing and location algo-              successor node of key k. The use of consis-
rithms:                                                 tent hashing tends to balance load, as each
                                                        node receives roughly the same number of
—Nodes tend to specialize in searching                  keys.
 for similar keys over time, as they get                   The only routing information required
 queries from other nodes for similar                   is for each node to be aware of its succes-
 keys.                                                  sor node on the circle. Queries for a given
—Nodes store similar keys over time, due                key are passed around the circle via these
 to the caching of files as a result of suc-             successor pointers until a node that con-
 cessful queries.                                       tains the key is encountered. This is the
—Similarity of keys does not reflect simi-               node the query maps to.
 larity of files.                                           When a new node n joins the network,
—Routing does not reflect the underlying                 certain keys previously assigned to n’s suc-
 network topology.                                      cessor will become assigned to n. When
                                                        node n leaves the network, all keys as-
   3.4.2. Chord. Chord [Stoica et al. 2001]             signed to it will be reassigned to its suc-
is a peer-to-peer routing and location in-              cessor. These are the only changes in key
frastructure that performs a mapping of                 assignments that need to take place in or-
file identifiers onto node identifiers. Data               der to maintain load balance.
location can be implemented on top of                      As discussed, only one data element
Chord by identifying data items (files)                  per node needs to be correct for Chord to
with keys and storing the (key, data item)              guarantee correct (though slow) routing of
pairs at the node that the keys map to.                 queries. Performance degrades gracefully
   In Chord, nodes are also identified by                when routing information becomes out of
keys. The keys are assigned both to files                date due to nodes joining and leaving the
and nodes by means of a deterministic                   system, and availability remains high only
function, a variant of consistent hashing               as long as nodes fail independently. Since

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
352                                               S. Androutsellis-Theotokis and D. Spinellis




              Fig. 6. CAN: (a) Example 2-d [0, 1] × [0, 1] coordinate space partitioned
              between 5 CAN nodes; (b) Example 2-d space after node F joins.

the overlay topology is not based on the            by supporting the insertion, lookup, and
underlying physical IP network topology, a          deletion of (key,value) pairs in the table.
single failure in the IP network may man-              Each individual node of the CAN net-
ifest itself as multiple, scattered link fail-      work stores a part (referred to as a “zone”)
ures in the overlay [Saroiu et al. 2002].           of the hash table, as well as information
   To increase the efficiency of the loca-           about a small number of adjacent zones
tion mechanism described previously, that           in the table. Requests to insert, lookup, or
may, in the worst case, require traversing          delete a particular key are routed via in-
all N nodes to find a certain key, Chord             termediate zones to the node that main-
maintains additional routing information,           tains the zone containing the key.
in the form of a “finger table”. In this table,         CAN uses a virtual d -dimensional
each entry i points to the successor of node        Cartesian coordinate space (see Figure 6)
n + 2i. For a node n to perform a lookup            to store (key K ,value V ) pairs. The zone
for key k, the finger table is consulted to          of the hash table that a node is respon-
identify the highest node n whose ID is             sible for corresponds to a segment of this
between n and k. If such a node exists, the         coordinate space. Any key K is, therefore,
lookup is repeated starting from n . Oth-           deterministically mapped onto a point P
erwise, the successor of n is returned. Us-         in the coordinate space. The (K , V ) pair is
ing the finger table, both the amount of             then stored at the node that is responsible
routing information maintained by each              for the zone within which point P lies. For
node and the time required for resolving            example, in the case of Figure 6(a), a key
lookups are O(logN ) for an N -node sys-            that maps to coordinate (0.1,0.2) would be
tem in the steady state.                            stored at the node responsible for zone B.
   Achord [Hazel and Wiley 2002] is pro-               To retrieve the entry corresponding to
posed as a variant of Chord that pro-               K , any node can apply the same determin-
vides censorship resistance by limiting             istic function to map K to P , and then re-
each node’s knowledge of the network in             trieve the corresponding value V from the
ways similar to Freenet (see Section 3.4.1).        node covering P . Unless P happens to lie
                                                    in the requesting node’s zone, the request
  3.4.3. CAN. The      CAN      (“Content           must be routed from node-to-node until it
Addressable     Network”)     [Ratnasamy            reaches the node covering P .
et al. 2001] is essentially a distributed,             CAN nodes maintain a routing table
Internet-scale hash table that maps file             containing the IP addresses of nodes that
names to their location in the network,             hold zones adjoining their own, to enable

                                                     ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                     353

routing between arbitrary points in space.              advanced routing metrics, such as connec-
Intuitively, routing in CAN works by fol-               tion latency and underlying IP topology,
lowing the straight line path through the               alongside the Cartesian distance between
Cartesian space from source to destination              source and destination; allowing multiple
coordinates. For example, in Figure 6(a), a             nodes to share the same zone, mapping the
request from node A for a key mapping to                same key onto different points; and the ap-
point p would be routed though nodes A,                 plication of caching and replication tech-
B, E, along the straight line represented               niques [Ratnasamy et al. 2001].
by the arrow.
  A new node that joins the CAN system is
allocated its own portion of the coordinate                3.4.4. Tapestry. Tapestry [Zhao et al.
space by splitting the allocated zone of an             2001] supports the location of objects and
existing node in half, as follows:                      the routing of messages to objects (or the
                                                        closest copy of them, if more than one copy
(1) The new node identifies a node al-                   exist in the network) in a distributed, self-
    ready in CAN network, using a boot-                 administering, and fault-tolerant manner,
    strap mechanism as described in Fran-               offering system-wide stability by bypass-
    cis [2000].                                         ing failed routes and nodes, and rapidly
(2) Using the CAN routing mechanism,                    adapting communication topologies to cir-
    the node randomly chooses a point P                 cumstances.
    in the coordinate space and sends a                    The topology of the network is self-
    JOIN request to the node covering P .               organizing as nodes come and go, and net-
    The zone is then split, and half of it is           work latencies vary. The routing and loca-
    assigned to the new node.                           tion information is distributed among the
(3) The new node builds its routing table               network nodes; the topology’s consistency
    with the IP addresses of its new neigh-             is checked on-the-fly, and if it is lost due to
    bors, and the neighbors of the split                failures or destroyed, it is easily rebuilt or
    zone are also notified to update their               refreshed.
    routing tables to include the new node.                Tapestry is based on the location and
                                                        routing mechanisms introduced by Plax-
   When nodes gracefully leave CAN, the                 ton, Rajamaran and Richa [1997], in which
zones they occupy and the associated                    they present the Plaxton mesh, a dis-
hash table entries are explicitly handed                tributed data structure that allows nodes
over to one of their neighbors. Under                   to locate objects and route messages to
normal conditions, a node sends periodic                them across an arbitrarily-sized overlay
update messages to each of its neighbors                network, while using routing maps of
reporting its zone coordinates, its list of             small and constant size. In the original
neighbors, and their zone coordinates. If               Plaxton mesh, the nodes can take on the
there is prolonged absence of such an up-               role of servers (where objects are stored),
date message, the neighbor nodes realize                routers (which forward messages), and
there has been a failure, and initiate a con-           clients (origins of requests).
trolled takeover mechanism. If many of                     Each node maintains a neighbor map,
the neighbors of a failed node also fail, an            as shown in the example in Table IV. The
expanding ring search mechanism is ini-                 neighbor map has multiple levels, each
tiated by one of the neighboring nodes, to              level l containing pointers to nodes whose
identify any functioning nodes outside the              ID must be matched with l digits (the x’s
failure region.                                         represent wildcards). Each entry in the
   A list of design improvements is pro-                neighbor map corresponds to a pointer to
posed over the basic CAN design described               the closest node in the network whose ID
including the use of multi-dimensional                  matches the number in the neighbor map,
coordinate space, or multiple coordinate                up to a digit position.
spaces, for improving network latency and                  For example, the 5th entry for the 3rd
fault tolerance; the employment of more                 level for node 67493 points to the node

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
354                                                      S. Androutsellis-Theotokis and D. Spinellis

Table IV. The Neighbor Map Held by Tapestry Node
                 With ID 67493
          Level 5 Level 4 Level 3 Level 2 Level 1
 Entry 0 07493       x0493 xx093 xxx03 xxxx0
 Entry 1 17493       x1493 xx193 xxx13 xxxx1
 Entry 2 27493       x2493 xx293 xxx23 xxxx2
 Entry 3 37493       x3493 xx393 xxx33 xxxx3
 Entry 4 47493       x4493 xx493 xxx43 xxxx4
 Entry 5 57493       x5493 xx593 xxx53 xxxx5
 Entry 6 67493       x6493 xx693 xxx63 xxxx6
 Entry 7 77493       x7493 xx793 xxx73 xxxx7
 Entry 8 87493       x8493 xx893 xxx83 xxxx8
 Entry 9 97493       x9493 xx993 xxx93 xxxx9
Each entry in this   table corresponds to a pointer to
another node.


closest to 67493 in network distance whose
ID ends in ..593. Table IV shows an exam-
                                                             Fig. 7. Tapestry: Plaxton mesh routing ex-
ple neighbor map maintained by a node                        ample, showing the path taken by a message
with ID 67493.                                               originating from node 67493, and destined for
  Messages are, therefore, incrementally                     node 34567 in a Plaxton mesh, using decimal
routed to the destination node digit-by-                     digits of length 5.
digit, from the right to the left. Figure 7
shows an example path taken by a mes-                     required for assigning and identifying root
sage from node with I D = 67493, to node                  nodes, and (2) the vulnerability of the root
I D = 34567. The digits are resolved right                nodes.
to left as follows:                                         The Plaxton mesh assumes a static node
                                                          population. Tapestry extends its design to
  xxxx7 → xxx67 → xx567 → x4567                           adapt it to the transient populations of
                                                          peer-to-peer networks and provide adapt-
                        → 34567                           ability, fault tolerance, as well as vari-
                                                          ous optimizations described in Zhao et al.
   The Plaxton mesh uses a root node for                  [2001] and outlined below:
each object, that serves to provide a guar-
anteed node from which the object can                     —Each node additionally maintains a list
be located. When an object o is inserted                   of back-pointers, which point to nodes
in the network and stored at node ns ,                     where it is referred to as a neighbor.
a root node nr is assigned to it by us-                    These are used in dynamic node inser-
ing a globally consistent deterministic al-                tion algorithms to generate the appro-
gorithm. A message is then routed from                     priate neighbor maps for new nodes. Dy-
ns to nr , storing data in the form of a                   namic algorithms are employed for node
mapping (object id o, storer id ns ) at all                insertion, populating neighbor maps,
nodes along the way. During a location                     and notifying neighbors of new node in-
query, messages destined for o are ini-                    sertions.
tially routed towards nr , until a node is                —The concept of distance between nodes
encountered containing the (o, ns ) location               becomes semantically more flexible, and
mapping.                                                   locations of more than one replica of
   The Plaxton mesh offers: (1) simple                     an object are stored, allowing the ap-
fault-handling by its potential to route                   plication architecture to define how the
around a single link or node by choos-                     “closest” node will be interpreted. For
ing a node with a similar suffix, and (2)                   example, in the Oceanstore architecture
scalability (with the only bottleneck exist-               [Kubiatowicz et al. 2000], a “freshness”
ing at the root nodes). Its limitations in-                metric is incorporated in the concept
clude: (1) the need for global knowledge                   of distance, that is taken into account

                                                           ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A Survey of Content Distribution Technologies                                                  355

  when finding the closest replica of a doc-             availability to adapt to regional outages
  ument.                                                and denial of service attacks.
—The use of soft-state to maintain cached                 Pastry [Rowstron and Druschel 2001]
 content, based on the announce/listen                  is a scheme very similar to Tapestry,
 approach [Deering 1998], is adopted by                 differing mainly in its approach to
 Tapestry to detect, circumvent, and re-                achieving network locality and object
 cover from failures in routing or ob-                  replication. It is employed by the PAST
 ject location. Caches are periodically                 [Druschel and Rowstron 2001] large-
 updated by refreshment messages, or                    scale persistent peer-to-peer storage
 purged if no such messages are re-                     utility.
 ceived. Additionally, the neighbor map                   Finally Kademlia [Mayamounkov and
 is extended to maintain two backup                     Mazieres 2002] is proposed as an improved
 neighbors, in addition to the closest (pri-            XOR-topology-based routing algorithm
 mary) neighbor. Furthermore, to avoid                  similar to Tapestry and Pastry, focusing
 costly reinsertions of nodes after fail-               on consistency and performance. It intro-
 ures, when a node realizes that a neigh-               duces the use of a concurrence parameter,
 bor is unreachable, instead of removing                that lets users trade bandwidth for better
 its pointer, it temporarily marks it as in-            latency and fault recovery.
 valid in the hope that the failure will be
 repaired, and, in the meantime, routes                    3.4.5. Comparison, Shortcomings and Evo-
 messages through alternative paths.                    lution of Structured Architectures. In the
—To avoid the single point of failure that              previous sections, we have presented
 root nodes constitute, Tapestry assigns                characteristic examples of structured
 multiple roots to each object. This en-                peer-to-peer object location and routing
 ables a trade-off between reliability and              systems, and their main characteristics.
 redundancy. A distributed algorithm                    A comparison of these (and similar) sys-
 called surrogate routing is employed to                tems can be based on the size of the
 compute a unique root node for an object               routing information maintained by each
 in a globally consistent fashion, given                node (the routing table), their search
 the nonstatic set of nodes in the network              and retrieval performance (measured in
—A set of optimizations improve per-                    number of hops), and the flexibility with
 formance by adapting to environment                    which they can adapt to changing network
 changes. Tapestry nodes tune their                     topologies.
 neighbor pointers by running refresher                    It turns out that in their basic de-
 threads that update network latency                    sign, most of these systems are equiva-
 values. Algorithms are implemented to                  lent in terms of routing table space cost,
 detect query hotspots and offer sugges-                which is O(log N ), where N is the size of
 tions as to where additional copies of                 network. Similarly, their performance is
 objects can be placed to significantly im-              again mostly O(log N ), with the exception
 prove query response time. A “hotspot                  of CAN, where the performance is given by
                                                                1
 cache” is also maintained at each node.                O( d N d ), d being the number of employed
                                                            4
                                                        dimensions.
   Tapestry is used by several systems                     Maintaining the routing table in the
such as Oceanstore [Kubiatowicz et al.                  face of transient node populations is rela-
2000; Rhea et al. 2001], Mnemosyne                      tively costly for all systems, perhaps more
[Hand and Roscoe 2002], and Scan [Chen                  for Chord, as nodes joining or leaving in-
et al. 2000].                                           duce changes to all other nodes, and less
   Oceanstore further enhances the per-                 so in Kademlia which maintains a more
formance and fault tolerance of Tapestry                flexible routing table. Liben-Nowell et al.
by applying an additional “introspec-                   [2002a] introduce the notion of the “half-
tion layer”, where nodes monitor usage                  life” of a system as an approach to this
patterns, network activity, and resource                issue.

ACM Computing Surveys, Vol. 36, No. 4, December 2004.
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies
A survey of peer-to-peer content distribution technologies

Weitere ähnliche Inhalte

Was ist angesagt?

Implementation of Agent Based Dynamic Distributed Service
Implementation of Agent Based Dynamic Distributed ServiceImplementation of Agent Based Dynamic Distributed Service
Implementation of Agent Based Dynamic Distributed ServiceCSCJournals
 
10.1.1.21.5883
10.1.1.21.588310.1.1.21.5883
10.1.1.21.5883paserv
 
Effects of power conservation, wireless coverage and cooperation on data diss...
Effects of power conservation, wireless coverage and cooperation on data diss...Effects of power conservation, wireless coverage and cooperation on data diss...
Effects of power conservation, wireless coverage and cooperation on data diss...marwaeng
 
Addressing the Challenges of Tactical Information Management in Net-Centric S...
Addressing the Challenges of Tactical Information Management in Net-Centric S...Addressing the Challenges of Tactical Information Management in Net-Centric S...
Addressing the Challenges of Tactical Information Management in Net-Centric S...Angelo Corsaro
 
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...ijp2p
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceAIRCC Publishing Corporation
 
An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...
An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...
An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...ijcsit
 
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...ijcsit
 
A Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data BanksA Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data Banksrenguzi
 
Birthof Relation Database
Birthof Relation DatabaseBirthof Relation Database
Birthof Relation DatabaseRaj Bhat
 
A MALICIOUS USERS DETECTING MODEL BASED ON FEEDBACK CORRELATIONS
A MALICIOUS USERS DETECTING MODEL BASED  ON FEEDBACK CORRELATIONSA MALICIOUS USERS DETECTING MODEL BASED  ON FEEDBACK CORRELATIONS
A MALICIOUS USERS DETECTING MODEL BASED ON FEEDBACK CORRELATIONSIJCNC
 
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUESBIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUESijwmn
 
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUESBIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUESijwmn
 
Bt9002 grid computing 1
Bt9002 grid computing 1Bt9002 grid computing 1
Bt9002 grid computing 1Techglyphs
 

Was ist angesagt? (18)

Implementation of Agent Based Dynamic Distributed Service
Implementation of Agent Based Dynamic Distributed ServiceImplementation of Agent Based Dynamic Distributed Service
Implementation of Agent Based Dynamic Distributed Service
 
10.1.1.21.5883
10.1.1.21.588310.1.1.21.5883
10.1.1.21.5883
 
Effects of power conservation, wireless coverage and cooperation on data diss...
Effects of power conservation, wireless coverage and cooperation on data diss...Effects of power conservation, wireless coverage and cooperation on data diss...
Effects of power conservation, wireless coverage and cooperation on data diss...
 
Addressing the Challenges of Tactical Information Management in Net-Centric S...
Addressing the Challenges of Tactical Information Management in Net-Centric S...Addressing the Challenges of Tactical Information Management in Net-Centric S...
Addressing the Challenges of Tactical Information Management in Net-Centric S...
 
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
ANALYSIS STUDY ON CACHING AND REPLICA PLACEMENT ALGORITHM FOR CONTENT DISTRIB...
 
Www2008 Restws Pautasso Zimmermann Leymann
Www2008 Restws Pautasso Zimmermann LeymannWww2008 Restws Pautasso Zimmermann Leymann
Www2008 Restws Pautasso Zimmermann Leymann
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for Mapreduce
 
An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...
An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...
An Sna-Bi Based System for Evaluating Virtual Teams: A Software Development P...
 
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
 
Intro ds 1
Intro ds 1Intro ds 1
Intro ds 1
 
A Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data BanksA Relational Model of Data for Large Shared Data Banks
A Relational Model of Data for Large Shared Data Banks
 
Birthof Relation Database
Birthof Relation DatabaseBirthof Relation Database
Birthof Relation Database
 
A MALICIOUS USERS DETECTING MODEL BASED ON FEEDBACK CORRELATIONS
A MALICIOUS USERS DETECTING MODEL BASED  ON FEEDBACK CORRELATIONSA MALICIOUS USERS DETECTING MODEL BASED  ON FEEDBACK CORRELATIONS
A MALICIOUS USERS DETECTING MODEL BASED ON FEEDBACK CORRELATIONS
 
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUESBIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
 
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUESBIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
 
Towards a New Architectural Framework – The Nth Stratum Concept Mobimedia 08
Towards a New Architectural Framework – The Nth Stratum Concept Mobimedia 08Towards a New Architectural Framework – The Nth Stratum Concept Mobimedia 08
Towards a New Architectural Framework – The Nth Stratum Concept Mobimedia 08
 
Bt9002 grid computing 1
Bt9002 grid computing 1Bt9002 grid computing 1
Bt9002 grid computing 1
 
Grid computing
Grid computingGrid computing
Grid computing
 

Andere mochten auch

CV Lucas Madrid Barbotto
CV Lucas Madrid BarbottoCV Lucas Madrid Barbotto
CV Lucas Madrid BarbottoLucas Madrid
 
XI Grande Prémio Atletismo Vila Mozelos
XI Grande Prémio Atletismo Vila MozelosXI Grande Prémio Atletismo Vila Mozelos
XI Grande Prémio Atletismo Vila Mozelosjamozelense
 
How to do business in Pharma Industry in Brazil
How to do business in Pharma Industry in BrazilHow to do business in Pharma Industry in Brazil
How to do business in Pharma Industry in BrazilLadina Consulting
 
Sports Geek Sports Social Media Rankings April 2013
Sports Geek Sports Social Media Rankings April 2013Sports Geek Sports Social Media Rankings April 2013
Sports Geek Sports Social Media Rankings April 2013Sports Geek
 
4900 Piazza Center Vraag 4 Visualisatie (lowres)
4900 Piazza Center Vraag 4 Visualisatie (lowres)4900 Piazza Center Vraag 4 Visualisatie (lowres)
4900 Piazza Center Vraag 4 Visualisatie (lowres)twansibon
 
IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...
IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...
IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...Cedric Manara
 
Instala y Configura Aplicaciones y Servicios
Instala y Configura Aplicaciones y ServiciosInstala y Configura Aplicaciones y Servicios
Instala y Configura Aplicaciones y ServiciosJennifer Amador Martinez
 
Moje Bambino: Kreator dobrej przestrzeni edukacyjnej
Moje Bambino: Kreator dobrej przestrzeni edukacyjnejMoje Bambino: Kreator dobrej przestrzeni edukacyjnej
Moje Bambino: Kreator dobrej przestrzeni edukacyjnejmojebambino
 

Andere mochten auch (20)

CV Lucas Madrid Barbotto
CV Lucas Madrid BarbottoCV Lucas Madrid Barbotto
CV Lucas Madrid Barbotto
 
El Universitario del Chocó No 15
El Universitario del Chocó No 15 El Universitario del Chocó No 15
El Universitario del Chocó No 15
 
Winning with agile Networks
Winning with agile NetworksWinning with agile Networks
Winning with agile Networks
 
Webprosjekt Bodø
Webprosjekt BodøWebprosjekt Bodø
Webprosjekt Bodø
 
XI Grande Prémio Atletismo Vila Mozelos
XI Grande Prémio Atletismo Vila MozelosXI Grande Prémio Atletismo Vila Mozelos
XI Grande Prémio Atletismo Vila Mozelos
 
NOMINA VINI V
NOMINA VINI  VNOMINA VINI  V
NOMINA VINI V
 
Manuales taller-cecsa-volkswagen (1) (1)
Manuales taller-cecsa-volkswagen (1) (1)Manuales taller-cecsa-volkswagen (1) (1)
Manuales taller-cecsa-volkswagen (1) (1)
 
Flowing society
Flowing societyFlowing society
Flowing society
 
Resume-Murugan
Resume-MuruganResume-Murugan
Resume-Murugan
 
How to do business in Pharma Industry in Brazil
How to do business in Pharma Industry in BrazilHow to do business in Pharma Industry in Brazil
How to do business in Pharma Industry in Brazil
 
Sports Geek Sports Social Media Rankings April 2013
Sports Geek Sports Social Media Rankings April 2013Sports Geek Sports Social Media Rankings April 2013
Sports Geek Sports Social Media Rankings April 2013
 
4900 Piazza Center Vraag 4 Visualisatie (lowres)
4900 Piazza Center Vraag 4 Visualisatie (lowres)4900 Piazza Center Vraag 4 Visualisatie (lowres)
4900 Piazza Center Vraag 4 Visualisatie (lowres)
 
Agenda Semanal Cultural de la Municipalidad de Asunción 26 mayo al 01 junio
Agenda Semanal Cultural de la Municipalidad de Asunción 26 mayo al 01 junioAgenda Semanal Cultural de la Municipalidad de Asunción 26 mayo al 01 junio
Agenda Semanal Cultural de la Municipalidad de Asunción 26 mayo al 01 junio
 
IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...
IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...
IDN & UDRP : tour d'horizon des litiges relatifs aux noms de domaine internat...
 
Instala y Configura Aplicaciones y Servicios
Instala y Configura Aplicaciones y ServiciosInstala y Configura Aplicaciones y Servicios
Instala y Configura Aplicaciones y Servicios
 
Medio ambiente oficina alba
Medio ambiente oficina albaMedio ambiente oficina alba
Medio ambiente oficina alba
 
Taller ecuador
Taller ecuadorTaller ecuador
Taller ecuador
 
Topicos
TopicosTopicos
Topicos
 
Moje Bambino: Kreator dobrej przestrzeni edukacyjnej
Moje Bambino: Kreator dobrej przestrzeni edukacyjnejMoje Bambino: Kreator dobrej przestrzeni edukacyjnej
Moje Bambino: Kreator dobrej przestrzeni edukacyjnej
 
Erasmus Kotka (Finlandia)
Erasmus Kotka (Finlandia)Erasmus Kotka (Finlandia)
Erasmus Kotka (Finlandia)
 

Ähnlich wie A survey of peer-to-peer content distribution technologies

A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsEditor IJCATR
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsEditor IJCATR
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)Dinesh Modak
 
Distributed computing
Distributed computingDistributed computing
Distributed computingshivli0769
 
16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSNDhaya kanthavel
 
Distributed Computing Report
Distributed Computing ReportDistributed Computing Report
Distributed Computing ReportIIT Kharagpur
 
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
Java Abs   Peer To Peer Design & Implementation Of A Tuple SpaceJava Abs   Peer To Peer Design & Implementation Of A Tuple Space
Java Abs Peer To Peer Design & Implementation Of A Tuple Spacencct
 
Java Abs Peer To Peer Design & Implementation Of A Tuple S
Java Abs   Peer To Peer Design & Implementation Of A Tuple SJava Abs   Peer To Peer Design & Implementation Of A Tuple S
Java Abs Peer To Peer Design & Implementation Of A Tuple Sncct
 
Katasonov icinco08
Katasonov icinco08Katasonov icinco08
Katasonov icinco08cg19920128
 
Distributed Computing system
Distributed Computing system Distributed Computing system
Distributed Computing system Sarvesh Meena
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.pptNileshkuGiri
 
A Review Grid Computing
A Review  Grid ComputingA Review  Grid Computing
A Review Grid ComputingBecky Gilbert
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)irjes
 
F233842
F233842F233842
F233842irjes
 
communication in distributed systems
communication in distributed systemscommunication in distributed systems
communication in distributed systemsmohammed alrekabe
 

Ähnlich wie A survey of peer-to-peer content distribution technologies (20)

Grid computing
Grid computingGrid computing
Grid computing
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Ijcatr04071003
Ijcatr04071003Ijcatr04071003
Ijcatr04071003
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Distributed operating system(os)
Distributed operating system(os)Distributed operating system(os)
Distributed operating system(os)
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN
 
Distributed Computing Report
Distributed Computing ReportDistributed Computing Report
Distributed Computing Report
 
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
Java Abs   Peer To Peer Design & Implementation Of A Tuple SpaceJava Abs   Peer To Peer Design & Implementation Of A Tuple Space
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
 
Java Abs Peer To Peer Design & Implementation Of A Tuple S
Java Abs   Peer To Peer Design & Implementation Of A Tuple SJava Abs   Peer To Peer Design & Implementation Of A Tuple S
Java Abs Peer To Peer Design & Implementation Of A Tuple S
 
7- Grid Computing.Pdf
7- Grid Computing.Pdf7- Grid Computing.Pdf
7- Grid Computing.Pdf
 
Katasonov icinco08
Katasonov icinco08Katasonov icinco08
Katasonov icinco08
 
Distributed Computing system
Distributed Computing system Distributed Computing system
Distributed Computing system
 
B1802030511
B1802030511B1802030511
B1802030511
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.ppt
 
A Review Grid Computing
A Review  Grid ComputingA Review  Grid Computing
A Review Grid Computing
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
F233842
F233842F233842
F233842
 
communication in distributed systems
communication in distributed systemscommunication in distributed systems
communication in distributed systems
 
Grid computing
Grid computingGrid computing
Grid computing
 

Kürzlich hochgeladen

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Kürzlich hochgeladen (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

A survey of peer-to-peer content distribution technologies

  • 1. A Survey of Peer-to-Peer Content Distribution Technologies STEPHANOS ANDROUTSELLIS-THEOTOKIS AND DIOMIDIS SPINELLIS Athens University of Economics and Business Distributed computer architectures labeled “peer-to-peer” are designed for the sharing of computer resources (content, storage, CPU cycles) by direct exchange, rather than requiring the intermediation or support of a centralized server or authority. Peer-to-peer architectures are characterized by their ability to adapt to failures and accommodate transient populations of nodes while maintaining acceptable connectivity and performance. Content distribution is an important peer-to-peer application on the Internet that has received considerable research attention. Content distribution applications typically allow personal computers to function in a coordinated manner as a distributed storage medium by contributing, searching, and obtaining digital content. In this survey, we propose a framework for analyzing peer-to-peer content distribution technologies. Our approach focuses on nonfunctional characteristics such as security, scalability, performance, fairness, and resource management potential, and examines the way in which these characteristics are reflected in—and affected by—the architectural design decisions adopted by current peer-to-peer systems. We study current peer-to-peer systems and infrastructure technologies in terms of their distributed object location and routing mechanisms, their approach to content replication, caching and migration, their support for encryption, access control, authentication and identity, anonymity, deniability, accountability and reputation, and their use of resource trading and management schemes. Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network Architecture and Design—Network topology; C.2.2 [Computer- Communication Networks]: Network Protocols—Routing protocols; C.2.4 [Computer-Communication Networks]: Distributed Systems—Distributed databases; H.2.4 [Database Management]: Systems—Distributed databases; H.3.4 [Information Storage and Retrieval]: Systems and Software—Distributed systems General Terms: Algorithms, Design, Performance, Reliability, Security Additional Key Words and Phrases: Content distribution, DOLR, DHT, grid computing, p2p, peer-to-peer 1. INTRODUCTION systems such as [Gnutella 2003], Seti@ Home [SetiAtHome 2003], OceanStore A new wave of network architectures [Kubiatowicz et al. 2000], and many labeled peer-to-peer is the basis of others. Such architectures are gener- operation of distributed computing ally characterized by the direct sharing Authors’ address: Athens University of Economics and Business, 76 Patission St., GR-104 34, Athens, Greece; email: stheotok@aueb.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org. c 2004 ACM 0360-0300/04/1200-0335 $5.00 ACM Computing Surveys, Vol. 36, No. 4, December 2004, pp. 335–371.
  • 2. 336 S. Androutsellis-Theotokis and D. Spinellis of computer resources (CPU cycles, stor- We then present the main attributes of age, content) rather than requiring the in- peer-to-peer content distribution systems, termediation of a centralized server. and the aspects of their architectural de- The motivation behind basing appli- sign which affect these attributes are an- cations on peer-to-peer architectures alyzed with reference to specific existing derives to a large extent from their ability peer-to-peer content distribution systems to function, scale, and self-organize in and technologies. the presence of a highly transient popu- Throughout this report the terms lation of nodes, network, and computer “node”, “peer” and “user” are used inter- failures, without the need of a central changeably, according to the context, to re- server and the overhead of its adminis- fer to the entities that are connected in a tration. Such architectures typically have peer-to-peer network. as inherent characteristics scalability, resistance to censorship and centralized 1.1. Defining Peer-to-Peer Computing control, and increased access to resources. Administration, maintenance, respon- A quick look at the literature reveals a con- sibility for the operation, and even the siderable number of different definitions notion of “ownership” of peer-to-peer sys- of “peer-to-peer”, mainly distinguished tems are also distributed among the users, by the “broadness” they attach to the instead of being handled by a single com- term. pany, institution or person (see also Agre The strictest definitions of “pure” peer- [2003] for an interesting discussion of in- to-peer refer to totally distributed sys- stitutional change through decentralized tems, in which all nodes are completely architectures). Finally, peer-to-peer archi- equivalent in terms of functionality and tectures have the potential to accelerate tasks they perform. These definitions fail communication processes and reduce to encompass, for example, systems that collaboration costs through the ad hoc ad- employ the notion of “supernodes” (nodes ministration of working groups [Schoder that function as dynamically assigned and Fischbach 2003]. localized mini-servers) such as Kazaa This report surveys peer-to-peer con- [2003], which are, however, widely ac- tent distribution technologies, aiming to cepted as peer-to-peer, or systems that provide a comprehensive account of ap- rely on some centralized server infras- plications, features, and implementation tructure for a subset of noncore tasks techniques. As this is a new and (thank- (e.g. bootstrapping, maintaining reputa- fully) rapidly evolving field, and advances tion ratings, etc). and new capabilities are constantly being According to a broader and widely ac- introduced, this article will be present- cepted definition in Shirky [2000], “peer- ing what essentially constitutes a “snap- to-peer is a class of applications that take shot” of the state of the art around the advantage of resources—storage, cycles, time of its writing—as is unavoidably the content, human presence—available at case for any survey of a thriving research the edges of the internet”. This defini- field. We do, however, believe that the tion, however, encompasses systems that core information and principles presented completely rely upon centralized servers will remain relevant and useful for the for their operation (such as seti@home, reader. various instant messaging systems, or In the next section, we define the basic even the notorious Napster), as well as concepts of peer-to-peer computing. We various applications from the field of Grid classify peer-to-peer systems into three computing. categories (communication and collab- Overall, it is fair to say that there is oration, distributed computation, and no general agreement about what “is” and content distribution). Content distribu- what “is not” peer-to-peer. tion systems are further discussed and We feel that this lack of agreement on categorized. a definition—or rather the acceptance of ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 3. A Survey of Content Distribution Technologies 337 various different definitions—is, to a large nections fail or recover, in order to main- extent, due to the fact that systems or tain its connectivity and performance. applications are labeled “peer-to-peer” not because of their internal operation or ar- We therefore propose the following defi- chitecture, but rather as a result of how nition: they are perceived “externally”, that is, Peer-to-peer systems are distributed systems whether they give the impression of pro- consisting of interconnected nodes able to self- viding direct interaction between comput- organize into network topologies with the purpose ers. As a result, different definitions of of sharing resources such as content, CPU cycles, “peer-to-peer” are applied to accommodate storage and bandwidth, capable of adapting to the various different cases of such systems failures and accommodating transient popula- or applications. tions of nodes while maintaining acceptable con- From our perspective, we believe that nectivity and performance, without requiring the the two defining characteristics of peer-to- intermediation or support of a global centralized server or authority. peer architectures are the following: —The sharing of computer resources by This definition is meant to encompass direct exchange, rather than requir- “degrees of centralization” ranging from ing the intermediation of a centralized the pure, completely decentralized sys- server. Centralized servers can some- tems such as Gnutella, to “partially cen- times be used for specific tasks (sys- tralized” systems1 such as Kazaa. How- tem bootstrapping, adding new nodes ever, for the purposes of this survey, we to the network, obtain global keys for shall not restrict our presentation and dis- data encryption), however, systems that cussion of architectures and systems to rely on one or more global centralized our own proposed definition, and we will servers for their basic operation (e.g. for take into account systems that are consid- maintaining a global index and search- ered peer-to-peer by other definitions as ing through it—Napster, Publius) are well, including systems that employ a cen- clearly stretching the definition of peer- tralized server (such as Napster, instant to-peer. messaging applications, and others). As the nodes of a peer-to-peer network The focus of our study is content distri- cannot rely on a central server coordi- bution, a significant area of peer-to-peer nating the exchange of content and the systems that has received considerable re- operation of the entire network, they are search attention. required to actively participate by inde- pendently and unilaterally performing 1.2. Peer-to-Peer and Grid Computing tasks such as searching for other nodes, Peer-to-peer and Grid computing are two locating or caching content, routing in- approaches to distributed computing, both formation and messages, connecting to concerned with the organization of re- or disconnecting from other neighboring source sharing in large-scale computa- nodes, encrypting, introducing, retriev- tional societies. ing, decrypting and verifying content, as Grids are distributed systems that en- well as others. able the large-scale coordinated use and —Their ability to treat instability and sharing of geographically distributed re- variable connectivity as the norm, au- sources, based on persistent, standards- tomatically adapting to failures in both based service infrastructures, often with network connections and computers, as a high-performance orientation [Foster well as to a transient population of et al. 2001]. nodes. This fault-tolerant, self-organizing ca- 1 In our definition, we refer to “global” servers, to pacity suggests the need for an adap- make a distinction from the dynamically assigned tive network topology that will change “supernodes” of partially centralized systems (see as nodes enter or leave and network con- also Section 3.3.3) ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 4. 338 S. Androutsellis-Theotokis and D. Spinellis As Grid systems increase in scale, they Chat/Irc, Instant Messaging (Aol, Icq, begin to require solutions to issues of self- Yahoo, Msn), and Jabber [Jabber 2003]. configuration, fault tolerance, and scala- Distributed Computation. This category bility, for which peer-to-peer research has includes systems whose aim is to take ad- much to offer. vantage of the available peer computer Peer-to-peer systems, on the other hand, processing power (CPU cycles). This is focus on dealing with instability, tran- achieved by breaking down a computer- sient populations, fault tolerance, and self- intensive task into small work units and adaptation. To date, however, peer-to-peer distributing them to different peer com- developers have worked mainly on verti- puters, that execute their corresponding cally integrated applications, rather than work unit and return the results. Cen- seeking to define common protocols and tral coordination is invariably required, standardized infrastructures for interop- mainly for breaking up and distributing erability. the tasks and collecting the results. Exam- In summary, one can say that “Grid com- ples of such systems include projects such puting addresses infrastructure, but not as Seti@home [Sullivan III et al. 1997; yet failure, while peer-to-peer addresses SetiAtHome 2003], genome@home [Lar- failure, but not yet infrastructure” [Foster son et al. 2003; GenomeAtHome 2003], and Iamnitchi 2003]. and others. In addition to this, the form of sharing Internet Service Support. A number of initially targeted by peer-to-peer has been different applications based on peer-to- of limited functionality, providing a global peer infrastructures have emerged for content distribution and filesharing space supporting a variety of Internet ser- lacking any form of access control. vices. Examples of such applications As peer-to-peer technologies move into include peer-to-peer multicast systems more sophisticated and complex applica- [VanRenesse et al. 2003; Castro et al. tions, such as structured content distri- 2002], Internet indirection infrastruc- bution, desktop collaboration, and net- tures [Stoica et al. 2002], and secu- work computation, it is expected that rity applications, providing protection there will be a strong convergence be- against denial of service or virus attacks tween peer-to-peer and Grid computing [Keromytis et al. 2002; Janakiraman et al. [Foster 2000]. The result will be a new 2003; Vlachos et al. 2004]. class of technologies combining elements Database Systems. Considerable work of both peer-to-peer and Grid comput- has been done on designing distributed ing, which will address scalability, self- database systems based on peer-to-peer adaptation, and failure recovery, while, infrastructures. Bernstein et al. [2002] at the same time, providing a persis- propose the Local Relational Model tent and standardized infrastructure for (LRM), in which the set of all data stored interoperability. in a peer-to-peer network is assumed to be comprised of inconsistent local rela- tional databases interconnected by sets 1.3. Classification of Peer-to-Peer of “acquaintances” that define translation Applications rules and semantic dependencies between Peer-to-peer architectures have been em- them. PIER [Huebsch et al. 2003] is a ployed for a variety of different application scalable distributed query engine built categories, which include the following. on top of a peer-to-peer overlay network Communication and Collaboration. topology that allows relational queries This category includes systems that to run across thousands of computers. provide the infrastructure for facilitating The Piazza system [Halevy et al. 2003] direct, usually real-time, communica- provides an infrastructure for building tion and collaboration between peer semantic Web [Berners-Lee et al. 2001] computers. Examples include chat and applications, consisting of nodes that can instant messaging applications, such as supply either source data (e.g. from a ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 5. A Survey of Content Distribution Technologies 339 relational database), schemas (or ontolo- An examination of current peer-to-peer gies) or both. Piazza nodes are transitively technologies in this context suggests that connected by chains of mappings between they can be grouped as follows (with ex- pairs of nodes, allowing queries to be amples in Tables I and II). distributed across the Piazza network. Finally, Edutella [Nejdl et al. 2003] is an —Peer-to-Peer Applications. This category open-source project that builds on the includes content distribution systems W3C metadata standard RDF, to provide that are based on peer-to-peer tech- a metadata infrastructure and querying nology. We attempt to further subdi- capability for peer-to-peer applications. vide them into the following two groups, Content Distribution. Most of the cur- based on their application goals and per- rent peer-to-peer systems fall within the ceived complexity: category of content distribution, which in- Peer-to-peer “file exchange” systems. cludes systems and infrastructures de- These systems are targeted towards signed for the sharing of digital media simple, one-off file exchanges between and other data between users. Peer-to- peers. They are used for setting up peer content distribution systems range a network of peers and providing fa- from relatively simple direct fileshar- cilities for searching and transferring ing applications, to more sophisticated files between them. These are typically systems that create a distributed stor- light-weight applications that adopt a age medium for securely and efficiently best-effort approach without addressing publishing, organizing, indexing, search- security, availability, and persistence. It ing, updating, and retrieving data. There is mainly systems in this category that are numerous such systems and in- are responsible for spawning the repu- frastructures. Some examples are: the tation (and in some cases notoriety) of late Napster, Publius [Waldman et al. peer-to-peer technologies. 2000], Gnutella [Gnutella 2003], Kazaa Peer-to-peer content publishing and stor- [Kazaa 2003], Freenet [Clarke et al. age systems. These systems are targeted 2000], MojoNation [MojoNation 2003], towards creating a distributed storage Oceanstore [Kubiatowicz et al. 2000], medium in—and through—which users PAST [Druschel and Rowstron 2001], will be able to publish, store, and dis- Chord [Stoica et al. 2001], Scan [Chen tribute content in a secure and persis- et al. 2000], FreeHaven [Dingledine tent manner. Such content is meant to et al. 2000], Groove [Groove 2003], and be accessible in a controlled manner by Mnemosyne [Hand and Roscoe 2002]. peers with appropriate privileges. The This survey will focus on content distri- main focus of such systems is security bution, one of the most prominent appli- and persistence, and often the aim is cation areas of peer-to-peer systems. to incorporate provisions for account- ability, anonymity and censorship resis- tance, as well as persistent content man- 1.4. Peer-to-Peer Content Distribution agement (updating, removing, version In its most basic form, a peer-to-peer con- control) facilities. tent distribution system creates a dis- —Peer-to-Peer Infrastructures. This cate- tributed storage medium that allows for gory includes peer-to-peer based infras- the publishing, searching, and retrieval tructures that do not constitute working of files by members of its network. As applications, but provide peer-to-peer systems become more sophisticated, non- based services and application frame- functional features may be provided, in- works. The following infrastructure ser- cluding provisions for security, anonymity, vices are identified: fairness, increased scalability and perfor- Routing and location. Any peer-to-peer mance, as well as resource management content distribution system relies on and organization capabilities. All of these a network of peers within which re- will be discussed in the following sections. quests and messages must be routed ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 6. 340 S. Androutsellis-Theotokis and D. Spinellis Table I. A Classification of Current Peer-to-Peer Systems (RM: Resource Management; CR: Censorship Resistance; PS: Performance and Scalability; SPE: Security, Privacy and Encryption; A: Anonymity; RA: Reputation and Accountability; RT: Resource Trading.) Peer-to-Peer File Exchange Systems System Brief Description Napster Distributed file sharing—hybrid decentralized. Kazaa [2003] Distributed file sharing—partially centralized. Gnutella [2003] Distributed file sharing—purely decentralized. Peer-to-Peer Content Publishing and Storage Systems System Brief Description Main Focus Scan [Chen et al. 2000] A dynamic, scalable, efficient content distribution PS network. Provides dynamic content replication. Publius [Waldman et al. 2000] A censorship-resistant system for publishing RM content. Static list of servers. Enhanced content management (update and delete). Groove [Groove 2003] Internet communications software for direct RM,PS,SPE real-time peer-to-peer interaction. FreeHaven [Dingledine et al. 2000] A flexible system for anonymous storage. A,RA Freenet [Clarke et al. 2000] Distributed anonymous information storage and A,RA retrieval system. MojoNation [MojoNation 2003] Distributed file storage. Fairness through the use SPE,RT of currency mojo. Oceanstore [Kubiatowicz et al. 2000] An architecture for global scale persistent storage. RM,PS,SPE Scalable, provides security and access control. Intermemory [Chen et al. 1999] System of networked computers. Donate storage RT in exchange for the right to publish data. Mnemosyne [Hand and Roscoe 2002] Peer-to-peer steganographic storage system. SPE Provides privacy and plausible deniability. PAST [Druschel and Rowstron 2001] Large scale persistent peer-to-peer storage utility. PS,SPE Dagster [Stubblefield and Wallach 2001] A censorship-resistant document publishing CR,SPE system. Tangler [Waldman and Mazi 2001] A content publishing system based on document CR,SPE entanglements. with efficiency and fault tolerance, and 2. ANALYSIS FRAMEWORK through which peers and content can In this survey, we present a description be efficiently located. Different infras- and analysis of applications, systems, and tructures and algorithms have been infrastructures that are based on peer- developed to provide such services. to-peer architectures and aim at either Anonymity. Peer-to-peer based infras- offering content distribution solutions, or tructure systems have been designed supporting content distribution related with the explicit aim of providing user activities. anonymity. Our approach is based on: Reputation Management. In a peer- to-peer network, there is no central —identifying the feature space of nonfunc- organization to maintain reputation tional properties and characteristics of information for users and their behavior. content distribution systems, Reputation information is, therefore, —determining the way in which the non- hosted in the various network nodes. In functional properties depend on, and order for such reputation information to can be affected by, various design fea- be kept secure, up-to-date, and avail- tures, and able throughout the network, complex —providing an account, analysis, and reputation management infrastructures evaluation of the design features and need to be employed. solutions that have been adopted by ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 7. A Survey of Content Distribution Technologies 341 Table II. A Classification of Current Peer-to-Peer Infrastructures Infrastructures for routing and location Chord [Stoica et al. 2001] A scalable peer-to-peer lookup service. Given a key, it maps the key to a node. CAN [Ratnasamy et al. 2001] Scalable content addressable network. A distributed infrastructure that provides hash-table functionality for mapping file names to their locations. Pastry [Rowstron and Druschel 2001] Infrastructure for fault-tolerant wide-area location and routing. Tapestry [Zhao et al. 2001] Infrastructure for fault-tolerant wide-area location and routing. Kademlia [Mayamounkov and Mazieres 2002] A scalable peer-to-peer lookup service based on the XOR metric. Infrastructures for anonymity Anonymous remailer mixnet [Berthold et al. 1998] Infrastructure for anonymous connection. Onion Routing [Goldschlag et al. 1999] Infrastructure for anonymous connection. ZeroKnowledge Freedom [Freedom 2003] Infrastructure for anonymous connection. Tarzan [Freedman et al. 2002] A peer-to-peer decentralized anonymous network layer. Infrastructures for reputation management Eigentrust [Kamvar et al. 2003] A Distributed reputation management algorithm. A Partially distributed reputation management A partially distributed approach based on a system [Gupta et al. 2003] debit/credit or debit-only scheme. PeerTrust [Xiong and Liu 2002] A decentralized, feedback-based reputation management system using satisfaction and number of interaction metrics. current peer-to-peer systems, as well as data and processing methods. Unautho- their shortcomings, potential improve- rized entities cannot change data; ad- ments, and proposed alternatives. versaries cannot substitute a forged doc- ument for a requested one. For the identification of the nonfunc- Privacy and confidentiality. Ensuring tional characteristics on which our study that data is accessible only to those is based, we used the work of Shaw and authorized to have access, and that Garlan [1995], which we adapted to the there is control over what data is col- field of peer-to-peer content distribution. lected, how it is used, and how it is The various design features and solu- maintained. tions that we examine, as well as their Availability and persistence. Ensuring relationship to the relevant nonfunctional that authorized users have access to characteristics, were assembled through data and associated assets when re- a detailed analysis and study of current quired. For a peer-to-peer content dis- peer-to-peer content distribution systems tribution system this often means al- that are either deployed, researched, or ways. This property entails stability in proposed. the presence of failure, or changing node The resulting analysis framework is il- populations. lustrated in Figure 1. The boxes in the periphery show the most important at- Scalability. Maintaining the system’s per- tributes of peer-to-peer content distribu- formance attributes independent of the tion systems, described as follows. number of nodes or documents in its network. A dramatic increase in the Security. Further analyzed in terms of: number of nodes or documents will Integrity and authenticity. Safeguard- have minimal effect on performance and ing the accuracy and completeness of availability. ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 8. 342 S. Androutsellis-Theotokis and D. Spinellis Fig. 1. Illustration of the way in which various design features affect the main characteristics of peer- to-peer content distribution systems. Performance. The time required for per- grouping, based on the content itself; forming the operations allowed by the grouping, based on locality or network system, typically publication, searching, distance; grouping, based on organiza- and retrieval of documents. tion ties, as well as others. Fairness. Ensuring that users offer and The relationships between nonfunc- consume resources in a fair and bal- tional characteristics are depicted as a anced manner. May rely on account- UML diagram in Figure 1. The Figure ability, reputation, and resource trading schematically illustrates the relationship mechanisms. between the various design features and Resource Management Capabilities. In the main characteristics of peer-to-peer their most basic form, peer-to-peer con- content distribution systems. tent distribution systems allow the pub- The boxes in the center show the design lishing, searching, and retrieval of docu- decisions that affect these attributes. We ments. More sophisticated systems may note that these design decisions are mostly afford more advanced resource manage- independent and orthogonal. ment capabilities, such as editing or We see, for example, that the perfor- removal of documents, management of mance of a peer-to-peer content distribu- storage space, and operations on meta- tion system is affected by the distributed data. object location and routing mechanisms, Semantic Grouping of Information. An as well as by the data replication, area of research that has attracted con- caching, and migration algorithms. Fair- siderable attention recently is the se- ness, on the other hand, depends on mantic grouping and organization of the system’s provisions for accountabil- content and information in peer-to-peer ity and reputation, as well as on any re- networks. Various grouping schemes source trading mechanisms that may be are encountered, such as semantic implemented. ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 9. A Survey of Content Distribution Technologies 343 In the following sections of this survey, such networks are often termed “servents” the different design decisions and features (SERVers+clieENTS). will be presented and discussed, based on Partially Centralized Architectures. The an examination of existing peer-to-peer basis is the same as with purely decentral- content distribution systems, and with ized systems. Some of the nodes, however, reference to the way in which they affect assume a more important role, acting as the main attributes of these systems, local central indexes for files shared by lo- as presented. A relevant presentation cal peers. The way in which these supern- and discussion of various desired prop- odes are assigned their role by the network erties of peer-to-peer systems can also varies between different systems. It is im- be found in Kubiatowics [2003], while portant, however, to note that these su- an analysis of peer-to-peer systems from pernodes do not constitute single points the end-user perspective can be found in of failure for a peer-to-peer network, since Lee [2003]. they are dynamically assigned and, if they fail, the network will automatically take 3. PEER-TO-PEER DISTRIBUTED OBJECT action to replace them with others. LOCATION AND ROUTING Hybrid Decentralized Architectures. In these systems, there is a central server fa- The operation of any peer-to-peer content cilitating the interaction between peers by distribution system relies on a network maintaining directories of metadata, de- of peer computers (nodes), and connec- scribing the shared files stored by the peer tions (edges) between them. This network nodes. Although the end-to-end interac- is formed on top of—and independently tion and file exchanges may take place di- from—the underlying physical computer rectly between two peer nodes, the central (typically IP) network, and is thus referred servers facilitate this interaction by per- to as an “overlay” network. The topology, forming the lookups and identifying the structure, and degree of centralization of nodes storing the files. The terms “peer- the overlay network, and the routing and through-peer” or “broker mediated” are location mechanisms it employs for mes- sometimes used for such systems [Kim sages and content are crucial to the oper- 2001]. ation of the system, as they affect its fault Obviously, in these architectures, there tolerance, self-maintainability, adaptabil- is a single point of failure (the central ity to failures, performance, scalability, server). This typically renders them inher- and security; in short almost all of the sys- ently unscalable and vulnerable to censor- tem’s attributes as laid out in Section 2 ship, technical failure, or malicious attack. (see also Figure 1). Overlay networks can be distinguished in terms of their centralization and 3.2. Network Structure structure. By structure, we refer to whether the over- lay network is created nondeterministi- 3.1. Overlay Network Centralization cally (ad hoc) as nodes and content are Although in their purest form peer-to-peer added, or whether its creation is based overlay networks are supposed to be to- on specific rules. We categorize peer-to- tally decentralized, in practice this is not peer networks as follows, in terms of their always true, and systems with various de- structure: grees of centralization are encountered. Unstructured. The placement of content Specifically, the following three categories (files) is completely unrelated to the over- are identified. lay topology. Purely Decentralized Architectures. All In an unstructured network, content nodes in the network perform exactly the typically needs to be located. Searching same tasks, acting both as servers and mechanisms range from brute force meth- clients, and there is no central coordi- ods, such as flooding the network with nation of their activities. The nodes of propagating queries in a breadth-first ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 10. 344 S. Androutsellis-Theotokis and D. Spinellis or depth-first manner until the desired Table III. A classification of Peer-to-Peer Content content is located, to more sophisticated Distribution Systems and Location and Routing Infrastructures in Terms of Their Network Structure, and resource-preserving strategies that With Some Typical Examples include the use of random walks and rout- ing indices (discussed in more detail in Centralization Section 3.3.4). The searching mechanisms employed in unstructured networks have Hybrid Partial None obvious implications, particularly in re- Unstructured Napster, Kazaa, Gnutella, gards to matters of availability, scalability, Publius Mor- FreeHaven and persistence. pheus, Unstructured systems are generally Gnutella, Edutella more appropriate for accommodating highly-transient node populations. Some Structured Chord, representative examples of unstructured Infrastructures CAN, systems are Napster, Publius [Waldman Tapestry, Pastry et al. 2000], Gnutella [Gnutella 2003], Kazaa [Kazaa 2003], Edutella [Nejdl et al. Structured OceanStore, 2003], FreeHaven [Dingledine et al. 2000], Systems Mnemosyne, Scan, PAST, as well as others. Kademlia, Structured. These have emerged mainly Tarzan in an attempt to address the scalability issues that unstructured systems were A category of networks that are in be- originally faced with. In structured net- tween structured and unstructured are re- works, the overlay topology is tightly ferred to as loosely structured networks. controlled and files (or pointers to them) Although the location of content is not are placed at precisely specified locations. completely specified, it is affected by rout- These systems essentially provide a map- ing hints. A typical example is Freenet ping between content (e.g. file identifier) [Clarke et al. 2000; Clake et al. 2002]. and location (e.g. node address), in the Table III summarizes the categories we form of a distributed routing table, so that outlined, with examples of peer-to-peer queries can be efficiently routed to the content distribution systems and architec- node with the desired content [Lv et al. tures. Note that all structured and loosely 2002]. structured systems are inherently purely Structured systems offer a scalable so- decentralized; form follows function. lution for exact-match queries, that is, In the following sections, the overlay queries where the exact identifier of the network topology and operation of differ- requested data object is known (as com- ent peer-to-peer systems is discussed, fol- pared to keyword queries). Using exact- lowing the above classification, according match queries as a substrate for keyword to degree of centralization and network queries remains an open research problem structure. for distributed environments [Witten et al. 1999]. 3.3. Unstructured Architectures A disadvantage of structured systems is 3.3.1. Hybrid Decentralized. Figure 2 il- that it is hard to maintain the structure lustrates the architecture of a typical hy- required for efficiently routing messages brid decentralized peer-to-peer system. in the face of a very transient node popu- Each client computer stores content lation, in which nodes are joining and leav- (files) shared with the rest of the network. ing at a high rate [Lv et al. 2002]. All clients connect to a central directory Typical examples of structured systems server that maintains: include Chord [Stoica et al. 2001], CAN [Ratnasamy et al. 2001], PAST [Druschel —A table of registered user connection in- and Rowstron 2001], Tapestry [Zhao et al. formation (IP address, connection band- 2001] among others. width etc.) ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 11. A Survey of Content Distribution Technologies 345 torious Napster and Publius systems [Waldman et al. 2000] that rely on a static, system-wide list of servers. Their architec- ture does not provide any smooth, decen- tralized support for adding a new server, or removing dead or malicious servers. A comprehensive study of the behavior of hybrid peer-to-peer systems and a com- parison of their query characteristics and their performance in terms of bandwidth and CPU cycle consumption is presented in Yang and Garcia-Molina [2001]. It should be noted that systems that do not fall under the hybrid decentral- ized category may still use some central administration server to a limited extent, Fig. 2. Typical hybrid decentralized peer-to-peer for example, for initial system boot- architecture. A central directory server maintains strapping (e.g. Mojonation [MojoNation an index of the metadata for all files in the network. 2003]), or for allowing new users to —A table listing the files that each user join the network by providing them holds and shares in the network, along with access to a list of current users with metadata descriptions of the files (e.g. gnutellahosts.com for the gnutella (e.g. filename, time of creation, etc.) network). A computer that wishes to join the 3.3.2. Purely Decentralized. In this sec- network contacts the central server and tion, we examine the Gnutella network reports the files it maintains. Client com- [Gnutella 2003], an interesting and rep- puters send requests for files to the server. resentative member of purely decentral- The server searches for matches in its in- ized peer-to-peer architectures, due to its dex, returning a list of users that hold the open architecture, achieved scale, and self- matching file. The user then opens direct organizing structure. FreeHaven [Dingle- connections with one or more of the peers dine et al. 2000] is another system using that hold the requested file, and down- routing and search mechanisms similar to loads it (see Figure 2). those of Gnutella. The advantage of hybrid decentralized Like most peer-to-peer systems, systems is that they are simple to imple- Gnutella builds a virtual overlay net- ment, and they locate files quickly and ef- work with its own routing mechanisms ficiently. Their main disadvantage is that [Ripeanu and Foster 2002], allowing its they are vulnerable to censorship, legal ac- users to share files with other peers. tion, surveillance, malicious attack, and There is no central coordination of the technical failure, since the content shared, activities in the network and users connect or at least descriptions of it, and the abil- to each other directly through a software ity to access it are controlled by the single application that functions both as a client institution, company, or user maintain- and a server (users are referred to as a ing the central server. Furthermore, these servents). systems are considered inherently unscal- Gnutella uses IP as its underlying net- able, as there are bound to be limitations work service, while the communication be- to the size of the server database and its tween servents is specified in a form of capacity to respond to queries. Large Web application level protocol supporting four search engines have, however, repeatedly types of messages [Jovanovich et al. 2001]: provided counterexamples to this notion. Examples of hybrid decentralized con- Ping. A request for a certain host to an- tent distribution systems include the no- nounce itself. ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 12. 346 S. Androutsellis-Theotokis and D. Spinellis Pong. Reply to a Ping message. It con- Once a node receives a QueryHit mes- tains the IP and port of the responding sage, indicating that the target file has host and number and size of the files being been identified at a certain node, it ini- shared. tiates a download by establishing a di- Query. A search request. It contains a rect connection between the two nodes. search string and the minimum speed re- Figure 3 illustrates an example of the quirements of the responding host. Gnutella search mechanism. Query Hits. Reply to a Query message. Scalability issues in the original purely It contains the IP, port, and speed of the decentralized systems arose from the fact responding host, the number of match- that the use of the TTL effectively seg- ing files found, and their indexed result mented the network into “sub-networks”, set. imposing on each user a virtual “hori- zon” beyond which their messages could After joining the Gnutella network (by not reach [Jovanovic 2000]. Removing the connecting to nodes found in databases TTL limit, on the other hand, would re- such as gnutellahosts.com), a node sends sult in the network being swamped with out a Ping message to any node it is messages. connected to. The nodes send back a Significant research has been carried Pong message identifying themselves, and out to address the above issues, and also propagate the Ping message to their various solutions have been proposed. neighbors. These will be discussed in more detail in In order to locate a file in unstructured Section 3.3.4. systems such as gnutella, nondeterminis- tic searches are the only option since the 3.3.3. Partially Centralized. Partially cen- nodes have no way of guessing where (at tralized systems are similar to purely which nodes) the files may lie. decentralized, but they use the con- The original Gnutella architecture uses cept of supernodes: nodes that are dy- a flooding (or broadcast) mechanism to dis- namically assigned the task of servic- tribute Ping and Query messages: each ing a small subpart of the peer network Gnutella node forwards the received mes- by indexing and caching files contained sages to all of its neighbors. The response therein. messages received are routed back along Peers are automatically elected to be- the opposite path through which the origi- come supernodes if they have sufficient nal request arrived. To limit the spread of bandwidth and processing power (al- messages through the network, each mes- though a configuration parameter may sage header contains a time-to-live (TTL) allow users to disable this feature) field. At each hop, the value of this field [FastTrack 2003]. is decremented, and when it reaches zero, Supernodes index the files shared by the message is dropped. peers connected to them, and proxy search The above mechanism is implemented requests on behalf of these peers. All by assigning each message a unique iden- queries are therefore initially directed to tifier and equipping each host with a dy- supernodes. namic routing table of message identifiers Two major advantages of partially cen- and node addresses. Since the response tralized systems are that: messages contain the same ID as the orig- inal messages, the host checks its routing —Discovery time is reduced in compari- table to determine along which link the re- son with purely decentralized systems, sponse message should be forwarded. In while there still is no unique point of order to avoid loops, the nodes use the failure. If one or more supernodes go unique message identifiers to detect and down, the nodes connected to them can drop duplicate messages, to improve effi- open new connections with other su- ciency, and preserve network bandwidth pernodes, and the network will continue [Jovanovic 2000]. to operate. ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 13. A Survey of Content Distribution Technologies 347 Fig. 3. An example of the Gnutella search mechanism. Solid lines between the nodes represent connections of the Gnutella network. The search orig- inates at the “requestor” node, for a file maintained by another node. Re- quest messages are dispatched to all neighboring nodes, and propagated from node-to-node as shown in the four consecutive steps (a) to (d). When a node receives the same request message (based on its unique identifier) multiple times, it replies with a failure message to avoid loops and minimize traffic. When the file is identified, a success message is returned. —The inherent heterogeneity of peer-to- mentation (as it is a proprietary system, peer networks is taken advantage of, there is no detailed documentation on its and exploited. In a purely decentralized structure and operation). Edutella [Nejdl network, all of the nodes will be equally et al. 2003] is another partially centralized (and usually heavily) loaded, regard- architecture. less of their CPU power, bandwidth, or Yang and Garcia-Molina [2002a, 2002b] storage capabilities. In partially central- present research addressing the design ized systems, however, the supernodes of, and searching techniques for, partially will undertake a large portion of the centralized peer-to-peer networks. entire network load, while most of the The concept of supernodes has also been other (so called “normal”) nodes will be proposed in a more recent version of the very lightly loaded, in comparison (see Gnutella protocol. A mechanism for dy- also [Lv et al. 2002; Zhichen et al. 2002]). namically selecting supernodes organizes the Gnutella network into an interconnec- Kazaa is a typical, widely used instance tion of superpeers (as they are referred to) of a partially centralized system imple- and client nodes. ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 14. 348 S. Androutsellis-Theotokis and D. Spinellis When a node with enough CPU power sues by means of: joins the network, it immediately be- comes a superpeer and establishes con- —A dynamic topology adaptation protocol nections with other superpeers, forming a that ensures that most nodes will be at a flat unstructured network of superpeers. short distance from high capacity nodes. If it establishes a minimum required num- This protocol, coupled with replicating ber of connections to client nodes within a pointers to the content of neighbors, en- specified time, it remains a superpeer. Oth- sures that high capacity nodes will be erwise, it turns into a regular client node. able to provide answers to a very large number of queries. 3.3.4. Shortcomings and Evolution of Unstruc- —An active flow control scheme that ex- tured Architectures. A number of methods ploits network heterogeneity to avoid have been proposed to overcome the origi- hotspots, and nal unscalability of unstructured peer-to- —A search protocol that is based on ran- peer systems. These have been shown to dom walks directed towards high capac- drastically improve their performance. ity nodes. Lv et al. [2002] proposed to replace the original flooding approach by multiple Simulations of Gia were found to in- parallel random walks, where each node crease overall system capacity by three chooses a neighbor at random, and propa- to five orders of magnitude. Similar ap- gates the request only to it. The use of ran- proaches, based on taking advantage of dom walks, combined with proactive object the underlying network heterogeneity, are replication (discussed in Section 4), was described in Lv et al. [2002] and Zhichen found to significantly improve the perfor- et al. [2002]. mance of the system as measured by query In another approach, Crespo and resolution time (in terms of numbers of Garcia-Molina [2002] use routing indices hops), per-node query load, and message to address the searching and scalability traffic generated. Proactive data replica- issues. Routing indices are tables of infor- tion schemes are further examined in Co- mation about other nodes, stored within hen and Shenker [2001]. each node. They provide a list of neighbors Yang and Garcia-Molina [2002b] sug- that are most likely to be “in the direction” gested the use of more sophisticated of the content corresponding to the query. broadcast policies, selecting which neigh- These tables contain information about bors to forward search queries to based on the total number of files maintained by their past history, as well as the use of local nearby nodes, as well as the number indices: data structures where each node of files corresponding to various topics. maintains an index of the data stored at Three different variations of the approach nodes located within a radius from itself. are presented, and simulations are shown A similar solution to the information re- to improve performance by 1-2 orders trieval problem is proposed by Kalogeraki of magnitude with respect to flooding et al. [2002] in the form of an Intelli- techniques. gent Search Mechanism built on top of Finally, the connectivity properties and a Modified Random BFS Search Mecha- reliability of unstructured peer-to-peer nism. Each peer forwards queries to a sub- networks such as Gnutella have been set of its neighbors, selecting them based studied in Ripeanu and Foster [2002]. In on a profile mechanism that maintains in- particular, emphasis was placed on the formation about their performance in re- fact that peer-to-peer networks exhibit cent queries. Neighbours are ranked ac- the properties of power-law networks, cording to their profiles and queries are in which the number of nodes with L forwarded selectively to the most appro- links is proportional to L−k , where k is priate ones only. a network dependent constant. In other Chawathe et al. [2003] presented the words, most nodes have few links (thus, a Gia System, which addresses the above is- large fraction of them can be taken away ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 15. A Survey of Content Distribution Technologies 349 without seriously damaging the network —CAN is a system using an n-dimensional connectivity), while there are a few highly- Cartesian coordinate space to imple- connected nodes which, if taken away, are ment the distributed location and rout- likely to cause the whole network to be ing table, whereby each node is respon- broken down in disconnected pieces. One sible for a zone in the coordinate space. implication of this is that such networks —Tapestry (and the similar Pastry and are robust when facing random node Kademlia) are based on the plaxton attacks, yet vulnerable to well-planned mesh data structure, which maintains attacks. The topology mismatch between pointers to nodes in the network whose the Gnutella network and the underlying IDs match the elements of a tree-like physical network infrastructure was also structure of ID prefixes up to a digit documented in Ripeanu and Foster [2002]. position. Tsoumakos and Roussopoulos [2003] present a more comprehensive analysis 3.4.1. Freenet—A Loosely Structured Sys- and comparison of the above methods. tem. The defining characteristic of loosely Overall, unstructured peer-to-peer con- structured systems is that the nodes of the tent distribution systems might be the peer-to-peer network can produce an esti- preferred choice for applications where the mate (not with certainty) of which node is following assumptions hold: most likely to store certain content. This affords them the possibility of avoiding —keyword searching is the common oper- blindly broadcasting request messages to ation, all (or a random subset) of their neighbors. —most content is typically replicated at a Instead, they use a chain mode propaga- fair fraction of participating sites, tion approach, where each node makes a —the node population is highly transient, local decision about which node to send the —users will accept a best-effort content re- request message to next. trieval approach, and Freenet [Clarke et al. 2000] is a typical, purely decentralized loosely-structured —the network size is not so large as to in- content distribution system. It operates cur scalability problems [Lv et al. 2002] as a self-organizing peer-to-peer network, (The last issue is alleviated through pooling unused disk space in peer com- the various methods described in this puters to create a collaborative virtual Section). file system. Important features of Freenet include its focus on security, publisher 3.4. Structured Architectures anonymity, deniability, and data replica- The various structured content distribu- tion for availability and performance. tion systems and infrastructures employ Files in Freenet are identified by unique different mechanisms for routing mes- binary keys. Three types of keys are sup- sages and locating data. Four of the most ported, the simplest is based on applying interesting and representative mecha- a hash function on a short descriptive text nisms and their corresponding systems string that accompanies each file as it is are examined in the following sections. stored in the network by its original owner. Each Freenet node maintains its own lo- —Freenet is a loosely structured system cal data store, that it makes available to that uses file and node identifier similar- the network for reading and writing, as ity to produce an estimate of where a file well as a dynamic routing table contain- may be located, and a chain mode propa- ing the addresses of other nodes and the gation approach to forward queries from files they are thought to hold. To search node-to-node. for a file, the user sends a request message —Chord is a system whose nodes maintain specifying the key and a timeout (hops-to- a distributed routing table in the form of live) value. an identifier circle on which all nodes are Freenet uses the following types of mes- mapped, and an associated finger table. sages, which all include the node identifier ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 16. 350 S. Androutsellis-Theotokis and D. Spinellis (for loop detection), a hops-to-live value If a node receives a request for a locally- (similar to the Gnutella TTL, see Sec- stored file, the search stops and the data tion 3.3.2), and the source and destination is forwarded back to the requestor. node identifiers: If the node does not store the file that the requestor is looking for, it forwards the re- Data insert. A node inserts new data quest to its neighbor that is most likely to in the network. A key and the actual data have the file, by searching for the file key (file) are included. in its local routing table that is closest to Data request. A request for a certain the one requested. The messages, there- file. The key of the file requested is also fore, form a chain, as they propagate from included. node-to-node. To avoid huge chains, mes- Data reply. A reply initiated when the sages are deleted after passing through requested file is located. The actual file is a certain number of nodes, based on the also included in the reply message. hops-to-live value they carry. Nodes also Data failed. A failure to locate a file. store the ID and other information of the The location (node) of the failure and the requests they have seen, in order to handle reason are also included. “data reply” and “data failed” messages. New nodes join the Freenet network If a node receives a backtracking “data by first discovering the address of one failed” message from a downstream node, or more existing nodes, and then start- it selects the next best node from its rout- ing to send Data Insert messages. To in- ing stack and forwards the request to it. sert a new file in the network, the node If all nodes in the routing table have been first calculates a binary key for the file, explored in this way and failed, it sends and then sends a data insert message to back a “data failed” message to the node itself. Any node that receives the insert from which it originally received the data message, first checks to see if the key is request message. already taken. If the key is not found, the If the requested file is eventually found node looks up the closest key (in terms at a certain node, a reply is passed back of lexicographic distance) in its routing through each node that forwarded the re- table, and forwards the insert message quest to the original node that started to the corresponding node. By this mech- the chain. This data reply message will anism, newly inserted files are placed include the actual data, which is cached at nodes possessing files with similar in all intermediate nodes for future re- keys. quests. A subsequent request with the This continues as long as the hops-to- same key will be served immediately with live limit is not reached. In this way, more the cached data. A request for a similar than one node will store the new file. At key will be forwarded to the node that pre- the same time, all the participating nodes viously provided the data. will update their routing tables with the To address the problem of obtaining new information (this is the mechanism the key that corresponds to a specific file, through which the new nodes announce Freenet recommends the use of a special their presence to the rest of the network). class of lightweight files called “indirect If the hops-to-live limit is reached without files”. When a real file is inserted, the au- any key collision, an “all clear” result thor also inserts a number of indirect files will be propagated back to the original that are named according to search key- inserter, informing that the insert was words and contain pointers to the real file. successful. These indirect files differ from normal files If the key is found to be taken, the node in that multiple files with the same key returns the preexisting file as if a request (i.e. search keyword) are permitted to ex- were made for it. In this way, malicious ist, and requests for such keys keep going attempts to supplant existing files by in- until a specified number of results is ac- serting junk will result in the existing files cumulated, instead of stopping at the first being spread further. file found. The problem of managing the ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 17. A Survey of Content Distribution Technologies 351 Fig. 4. The use of indirect files in Freenet. The diagram illustrates a regular file with key “A652D17D88”, which is supposed to contain a tu- torial about photography. The author chose to in- sert the file itself, and then a set of indirect files (marked with an i), named according to search key- Fig. 5. A Chord identifier circle consisting of the words she considered relevant. The indirect files three nodes 0,1 and 3. In this example, key 1 is lo- are distributed among the nodes of the network; cated at node 1, key 2 at node 3, and key 6 at node 0. they do not contain the actual document, simply a “pointer” to the location of the regular file contain- ing the document. [Karger et al. 1997]. All node identifiers are ordered in an “identifier circle” modulo large volume of such indirect files remains 2m (Figure 5 shows an identifier circle with open. Figure 4 illustrates the use of indi- m = 3). Key k is assigned to the first node rect files. whose identifier is equal to, or follows k, in The following properties of Freenet are the identifier space. This node is called the a result of its routing and location algo- successor node of key k. The use of consis- rithms: tent hashing tends to balance load, as each node receives roughly the same number of —Nodes tend to specialize in searching keys. for similar keys over time, as they get The only routing information required queries from other nodes for similar is for each node to be aware of its succes- keys. sor node on the circle. Queries for a given —Nodes store similar keys over time, due key are passed around the circle via these to the caching of files as a result of suc- successor pointers until a node that con- cessful queries. tains the key is encountered. This is the —Similarity of keys does not reflect simi- node the query maps to. larity of files. When a new node n joins the network, —Routing does not reflect the underlying certain keys previously assigned to n’s suc- network topology. cessor will become assigned to n. When node n leaves the network, all keys as- 3.4.2. Chord. Chord [Stoica et al. 2001] signed to it will be reassigned to its suc- is a peer-to-peer routing and location in- cessor. These are the only changes in key frastructure that performs a mapping of assignments that need to take place in or- file identifiers onto node identifiers. Data der to maintain load balance. location can be implemented on top of As discussed, only one data element Chord by identifying data items (files) per node needs to be correct for Chord to with keys and storing the (key, data item) guarantee correct (though slow) routing of pairs at the node that the keys map to. queries. Performance degrades gracefully In Chord, nodes are also identified by when routing information becomes out of keys. The keys are assigned both to files date due to nodes joining and leaving the and nodes by means of a deterministic system, and availability remains high only function, a variant of consistent hashing as long as nodes fail independently. Since ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 18. 352 S. Androutsellis-Theotokis and D. Spinellis Fig. 6. CAN: (a) Example 2-d [0, 1] × [0, 1] coordinate space partitioned between 5 CAN nodes; (b) Example 2-d space after node F joins. the overlay topology is not based on the by supporting the insertion, lookup, and underlying physical IP network topology, a deletion of (key,value) pairs in the table. single failure in the IP network may man- Each individual node of the CAN net- ifest itself as multiple, scattered link fail- work stores a part (referred to as a “zone”) ures in the overlay [Saroiu et al. 2002]. of the hash table, as well as information To increase the efficiency of the loca- about a small number of adjacent zones tion mechanism described previously, that in the table. Requests to insert, lookup, or may, in the worst case, require traversing delete a particular key are routed via in- all N nodes to find a certain key, Chord termediate zones to the node that main- maintains additional routing information, tains the zone containing the key. in the form of a “finger table”. In this table, CAN uses a virtual d -dimensional each entry i points to the successor of node Cartesian coordinate space (see Figure 6) n + 2i. For a node n to perform a lookup to store (key K ,value V ) pairs. The zone for key k, the finger table is consulted to of the hash table that a node is respon- identify the highest node n whose ID is sible for corresponds to a segment of this between n and k. If such a node exists, the coordinate space. Any key K is, therefore, lookup is repeated starting from n . Oth- deterministically mapped onto a point P erwise, the successor of n is returned. Us- in the coordinate space. The (K , V ) pair is ing the finger table, both the amount of then stored at the node that is responsible routing information maintained by each for the zone within which point P lies. For node and the time required for resolving example, in the case of Figure 6(a), a key lookups are O(logN ) for an N -node sys- that maps to coordinate (0.1,0.2) would be tem in the steady state. stored at the node responsible for zone B. Achord [Hazel and Wiley 2002] is pro- To retrieve the entry corresponding to posed as a variant of Chord that pro- K , any node can apply the same determin- vides censorship resistance by limiting istic function to map K to P , and then re- each node’s knowledge of the network in trieve the corresponding value V from the ways similar to Freenet (see Section 3.4.1). node covering P . Unless P happens to lie in the requesting node’s zone, the request 3.4.3. CAN. The CAN (“Content must be routed from node-to-node until it Addressable Network”) [Ratnasamy reaches the node covering P . et al. 2001] is essentially a distributed, CAN nodes maintain a routing table Internet-scale hash table that maps file containing the IP addresses of nodes that names to their location in the network, hold zones adjoining their own, to enable ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 19. A Survey of Content Distribution Technologies 353 routing between arbitrary points in space. advanced routing metrics, such as connec- Intuitively, routing in CAN works by fol- tion latency and underlying IP topology, lowing the straight line path through the alongside the Cartesian distance between Cartesian space from source to destination source and destination; allowing multiple coordinates. For example, in Figure 6(a), a nodes to share the same zone, mapping the request from node A for a key mapping to same key onto different points; and the ap- point p would be routed though nodes A, plication of caching and replication tech- B, E, along the straight line represented niques [Ratnasamy et al. 2001]. by the arrow. A new node that joins the CAN system is allocated its own portion of the coordinate 3.4.4. Tapestry. Tapestry [Zhao et al. space by splitting the allocated zone of an 2001] supports the location of objects and existing node in half, as follows: the routing of messages to objects (or the closest copy of them, if more than one copy (1) The new node identifies a node al- exist in the network) in a distributed, self- ready in CAN network, using a boot- administering, and fault-tolerant manner, strap mechanism as described in Fran- offering system-wide stability by bypass- cis [2000]. ing failed routes and nodes, and rapidly (2) Using the CAN routing mechanism, adapting communication topologies to cir- the node randomly chooses a point P cumstances. in the coordinate space and sends a The topology of the network is self- JOIN request to the node covering P . organizing as nodes come and go, and net- The zone is then split, and half of it is work latencies vary. The routing and loca- assigned to the new node. tion information is distributed among the (3) The new node builds its routing table network nodes; the topology’s consistency with the IP addresses of its new neigh- is checked on-the-fly, and if it is lost due to bors, and the neighbors of the split failures or destroyed, it is easily rebuilt or zone are also notified to update their refreshed. routing tables to include the new node. Tapestry is based on the location and routing mechanisms introduced by Plax- When nodes gracefully leave CAN, the ton, Rajamaran and Richa [1997], in which zones they occupy and the associated they present the Plaxton mesh, a dis- hash table entries are explicitly handed tributed data structure that allows nodes over to one of their neighbors. Under to locate objects and route messages to normal conditions, a node sends periodic them across an arbitrarily-sized overlay update messages to each of its neighbors network, while using routing maps of reporting its zone coordinates, its list of small and constant size. In the original neighbors, and their zone coordinates. If Plaxton mesh, the nodes can take on the there is prolonged absence of such an up- role of servers (where objects are stored), date message, the neighbor nodes realize routers (which forward messages), and there has been a failure, and initiate a con- clients (origins of requests). trolled takeover mechanism. If many of Each node maintains a neighbor map, the neighbors of a failed node also fail, an as shown in the example in Table IV. The expanding ring search mechanism is ini- neighbor map has multiple levels, each tiated by one of the neighboring nodes, to level l containing pointers to nodes whose identify any functioning nodes outside the ID must be matched with l digits (the x’s failure region. represent wildcards). Each entry in the A list of design improvements is pro- neighbor map corresponds to a pointer to posed over the basic CAN design described the closest node in the network whose ID including the use of multi-dimensional matches the number in the neighbor map, coordinate space, or multiple coordinate up to a digit position. spaces, for improving network latency and For example, the 5th entry for the 3rd fault tolerance; the employment of more level for node 67493 points to the node ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 20. 354 S. Androutsellis-Theotokis and D. Spinellis Table IV. The Neighbor Map Held by Tapestry Node With ID 67493 Level 5 Level 4 Level 3 Level 2 Level 1 Entry 0 07493 x0493 xx093 xxx03 xxxx0 Entry 1 17493 x1493 xx193 xxx13 xxxx1 Entry 2 27493 x2493 xx293 xxx23 xxxx2 Entry 3 37493 x3493 xx393 xxx33 xxxx3 Entry 4 47493 x4493 xx493 xxx43 xxxx4 Entry 5 57493 x5493 xx593 xxx53 xxxx5 Entry 6 67493 x6493 xx693 xxx63 xxxx6 Entry 7 77493 x7493 xx793 xxx73 xxxx7 Entry 8 87493 x8493 xx893 xxx83 xxxx8 Entry 9 97493 x9493 xx993 xxx93 xxxx9 Each entry in this table corresponds to a pointer to another node. closest to 67493 in network distance whose ID ends in ..593. Table IV shows an exam- Fig. 7. Tapestry: Plaxton mesh routing ex- ple neighbor map maintained by a node ample, showing the path taken by a message with ID 67493. originating from node 67493, and destined for Messages are, therefore, incrementally node 34567 in a Plaxton mesh, using decimal routed to the destination node digit-by- digits of length 5. digit, from the right to the left. Figure 7 shows an example path taken by a mes- required for assigning and identifying root sage from node with I D = 67493, to node nodes, and (2) the vulnerability of the root I D = 34567. The digits are resolved right nodes. to left as follows: The Plaxton mesh assumes a static node population. Tapestry extends its design to xxxx7 → xxx67 → xx567 → x4567 adapt it to the transient populations of peer-to-peer networks and provide adapt- → 34567 ability, fault tolerance, as well as vari- ous optimizations described in Zhao et al. The Plaxton mesh uses a root node for [2001] and outlined below: each object, that serves to provide a guar- anteed node from which the object can —Each node additionally maintains a list be located. When an object o is inserted of back-pointers, which point to nodes in the network and stored at node ns , where it is referred to as a neighbor. a root node nr is assigned to it by us- These are used in dynamic node inser- ing a globally consistent deterministic al- tion algorithms to generate the appro- gorithm. A message is then routed from priate neighbor maps for new nodes. Dy- ns to nr , storing data in the form of a namic algorithms are employed for node mapping (object id o, storer id ns ) at all insertion, populating neighbor maps, nodes along the way. During a location and notifying neighbors of new node in- query, messages destined for o are ini- sertions. tially routed towards nr , until a node is —The concept of distance between nodes encountered containing the (o, ns ) location becomes semantically more flexible, and mapping. locations of more than one replica of The Plaxton mesh offers: (1) simple an object are stored, allowing the ap- fault-handling by its potential to route plication architecture to define how the around a single link or node by choos- “closest” node will be interpreted. For ing a node with a similar suffix, and (2) example, in the Oceanstore architecture scalability (with the only bottleneck exist- [Kubiatowicz et al. 2000], a “freshness” ing at the root nodes). Its limitations in- metric is incorporated in the concept clude: (1) the need for global knowledge of distance, that is taken into account ACM Computing Surveys, Vol. 36, No. 4, December 2004.
  • 21. A Survey of Content Distribution Technologies 355 when finding the closest replica of a doc- availability to adapt to regional outages ument. and denial of service attacks. —The use of soft-state to maintain cached Pastry [Rowstron and Druschel 2001] content, based on the announce/listen is a scheme very similar to Tapestry, approach [Deering 1998], is adopted by differing mainly in its approach to Tapestry to detect, circumvent, and re- achieving network locality and object cover from failures in routing or ob- replication. It is employed by the PAST ject location. Caches are periodically [Druschel and Rowstron 2001] large- updated by refreshment messages, or scale persistent peer-to-peer storage purged if no such messages are re- utility. ceived. Additionally, the neighbor map Finally Kademlia [Mayamounkov and is extended to maintain two backup Mazieres 2002] is proposed as an improved neighbors, in addition to the closest (pri- XOR-topology-based routing algorithm mary) neighbor. Furthermore, to avoid similar to Tapestry and Pastry, focusing costly reinsertions of nodes after fail- on consistency and performance. It intro- ures, when a node realizes that a neigh- duces the use of a concurrence parameter, bor is unreachable, instead of removing that lets users trade bandwidth for better its pointer, it temporarily marks it as in- latency and fault recovery. valid in the hope that the failure will be repaired, and, in the meantime, routes 3.4.5. Comparison, Shortcomings and Evo- messages through alternative paths. lution of Structured Architectures. In the —To avoid the single point of failure that previous sections, we have presented root nodes constitute, Tapestry assigns characteristic examples of structured multiple roots to each object. This en- peer-to-peer object location and routing ables a trade-off between reliability and systems, and their main characteristics. redundancy. A distributed algorithm A comparison of these (and similar) sys- called surrogate routing is employed to tems can be based on the size of the compute a unique root node for an object routing information maintained by each in a globally consistent fashion, given node (the routing table), their search the nonstatic set of nodes in the network and retrieval performance (measured in —A set of optimizations improve per- number of hops), and the flexibility with formance by adapting to environment which they can adapt to changing network changes. Tapestry nodes tune their topologies. neighbor pointers by running refresher It turns out that in their basic de- threads that update network latency sign, most of these systems are equiva- values. Algorithms are implemented to lent in terms of routing table space cost, detect query hotspots and offer sugges- which is O(log N ), where N is the size of tions as to where additional copies of network. Similarly, their performance is objects can be placed to significantly im- again mostly O(log N ), with the exception prove query response time. A “hotspot of CAN, where the performance is given by 1 cache” is also maintained at each node. O( d N d ), d being the number of employed 4 dimensions. Tapestry is used by several systems Maintaining the routing table in the such as Oceanstore [Kubiatowicz et al. face of transient node populations is rela- 2000; Rhea et al. 2001], Mnemosyne tively costly for all systems, perhaps more [Hand and Roscoe 2002], and Scan [Chen for Chord, as nodes joining or leaving in- et al. 2000]. duce changes to all other nodes, and less Oceanstore further enhances the per- so in Kademlia which maintains a more formance and fault tolerance of Tapestry flexible routing table. Liben-Nowell et al. by applying an additional “introspec- [2002a] introduce the notion of the “half- tion layer”, where nodes monitor usage life” of a system as an approach to this patterns, network activity, and resource issue. ACM Computing Surveys, Vol. 36, No. 4, December 2004.