SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Author: Xiuqi Li and Jie Wu
Presenter: Zia Ush Shamszaman
              ANLAB, ICE, HUFS
   Survey of major searching techniques in peer-to-peer (P2P)
    networks.
   Concept of P2P networks and the methods for classifying
    different P2P networks .
   Various searching techniques in unstructured P2P systems, strictly
    structured P2P systems, and loosely structured P2P systems has
    been discussed.
   Searching in unstructured P2Ps covers both blind search schemes
    and informed search scheme.
   Searching in Strictly structured P2Ps focuses on hierarchical
    Distributed Hash Table (DHT) P2Ps and
   Non-DHT P2Ps and        non-hierarchical DHT P2Ps is briefly
    overviewed.




                                                             2
   P2P networks are overlay networks on top of Internet, where
    nodes are end systems in the Internet and maintain information
    about a set of other nodes (called neighbors) in the P2P.


   P2P networks offer the following benefits
       They do not     require   any   special   administration   or    financial
        arrangements.
       They are self-organized and adaptive. Peers may come and go freely.
        P2P systems handle these events automatically.
       They can gather and harness the tremendous computation and storage
        resources on computers across the Internet.
       They are distributed and decentralized.         Therefore,      they   are
        potentially fault-tolerant and load-balanced.




                                                                        3
   P2P networks can be classified based on the control over
    data location and network topology.
   There are three categories:
       Unstructured:     In an unstructured P2P network such as
        Gnutella, no rule exists which defines where data is stored and
        the network topology is arbitrary.

       Loosely structured: In a loosely structured network such as
        Freenet and Symphony, the overlay structure and the data
        location are not precisely determined.

       Highly structured: In a highly structured P2P network such as
        Chord, both the network architecture and the data placement
        are precisely specified.



                                                              4
   P2P networks can also be classified as centralized and decentralized
       In a centralized P2P such as Napster, a central directory of object location, ID
        assignment, etc. is maintained in a single location.
       Decentralized P2Ps adopt a distributed directory structure.
        These systems can be further divided into
           Purely decentralized systems, such as Gnutella and Chord, peers are totally equal.
           Hybrid systems, some peers called dominating nodes or super-peers serve the search
            request of other regular peers.


   Another classification of P2P systems is hierarchical & non-hierarchical
    based on whether the overlay structure is a hierarchy or not.
       All hybrid systems and few purely decentralized systems such as Kelips, are
        hierarchical systems. Hierarchical systems provide good scalability,
        opportunity to take advantage of node heterogeneity, and high routing
        efficiency
       Most purely decentralized systems have flat overlays and are non-hierarchical
        systems. Non-hierarchical systems offer load-balance and highresilience


                                                                               5
   Searching means locating desired data.
   Most existing P2P systems support thesimple object lookup by key
    or identifier.
   Some existing P2P systems can handle more complex keyword
    queries, which find documents containing keywords in queries.
   More than one copy of an object may exist in a P2P system.
    There may be more than one document that contains desired
    keywords.
   Some P2P systems are interested in a single data item; others are
    interested in all data items or as many data items as possible
    that satisfy a given condition.
   Most searching techniques are forwarding-based. Starting with
    the requesting node, a query is forwarded (or routed) to the
    desired node/s.



                                                            6
 High-quality query results
 Minimal routing state maintained per node
 High routing efficiency
 Load balance
 Resilience to node failures
 Support of complex queries




                                     7
   The quality of query results is application dependent.
   Generally, it is measured by the number of results and
    relevance.

   The routing state refers to the number of neighbors each
    node maintains.
   The routing efficiency is generally measured by the
    number of overlay hops per query.
   In some systems, it is also evaluated using the number of
    messages per query.
   Different searching techniques make different trade-offs
    between these desired characteristics.




                                                     8
Yang and Garcia-Molina borrowed the idea of
iterative deepening from artificial intelligence.

   The querying node periodically issues a sequence of BFS
    searches with increasing depth limits D1 < D2 < … < Di.

   The query is terminated when the query result is satisfied
    or when the maximum depth limit D has been reached.

   All nodes use the same sequence of depth limits called
    Policy: set of depths {0,2,4,5} and,



                                                      11
 Iterative       Deepening – {0, 2, 4, 5}, 3




 Holding Frozen Query    Already Processed Query   Unaware of Query
 Iterative       Deepening – {0, 2, 4, 5}, 3




 Holding Frozen Query    Already Processed Query   Unaware of Query
 Iterative       Deepening – {0, 2, 4, 5}, 3




 Holding Frozen Query    Already Processed Query   Unaware of Query
 Iterative       Deepening – {0, 2, 4, 5}, 3




 Holding Frozen Query    Already Processed Query   Unaware of Query
Standard Random Walker
 Forward the query to a randomly chosen neighbor at each
  step Each message a walker.
     Cut message overhead
     Increase query searching delay(#hops)
k-walkers
 The requesting node sends k query messages and each
  query message takes its own random walk
 Periodically, when a node receives a query, it checks with
  source node to see if query has been satisfied
 k walkers after T steps should reach roughly the same
  number of nodes as 1 walker after kT steps
     So cut delay by a factor of k.
      To decrease delay, increase walkers


                                                    16
Why
 shouldn’t I
find a song? A sends a walker
              to find song.mp3
              that is stored on B
   TTL-based or Hop Count
   Checking: the walker periodically checks with the original
    requestor before walking to the next node (again use a
    large TTL, just to prevent loops)


Experiments show that
    checking once at every 4th step strikes a good balance
    between the overhead of the checking message and the
    benefits of checking




                                                      18
 Directed         Breadth First Search
    Source Only sends queries to good neighbors
    Good neighbors might have
        Produced results in the past
        Low latency
        Lowest hop count for results
            They have good neighbors
        Highest traffic neighbors
            They’re stable
        Shortest message queue
    Routed as normal BFS after first hop
 Directed   Breadth First Search
 Directed   Breadth First Search
Efficient Search - Methods

 Directed   Breadth First Search
 Directed   Breadth First Search
 Directed   Breadth First Search
 Idea is that a node maintains information about
  what files neighbors, and possibly neighbor’s
  neighbors store. “radius of knowledge”
 All nodes know about radius, and know about
  policy
     Policy lists which levels will respond and which will
      ignore messages
     Servents look at TTL/Hops to determine if they
      process or ignore.
   Memory issue of maintaining lists
       Size is far below a megabyte for radius < 5
   Network issue of building list
       Hit from extra packets
Nodes at level 1
                                                         Each node has a radius of
respond with
                                                         2 – knows about the files
information about
                                                         on its neighbors and
levels 1, 2 and 3,                        …              neighbor’s neighbors
and forward to next
level                                                    Policy is that levels 1, 4
                                                   …     and 7 respond


                                                                 …
                               …
                          Searches move to
                      …   levels 2 and 3, which
                          ignore and forward reaches
                                   When search
                                   level 4, it responds with
                                   information about Layers 5 level67
                                                        Finally,
                                                                 and         …
                                   levels 4, 5, and 6, simply ignore and own
                                                       then
                                                        responds with its
                                                       forward.
                                   forwards the messages. and terminates
                                                        data,
                                                        the query.
   Joining a new node: sends a join message with TTL=r and
    all the nodes within r hops update their indices.
   Join message contains the metadata about the joining
    node.
   When a node receives this join message it, in turn, send
    join message containing its meta data directly to the new
    node. New node updates its indices.
   Node dies: Other nodes update their indices based on the
    timeouts.
   Updating the node: When a node updates its collection, his
    node will send out a small update message with TTL= r,
    containing the metadata of the affected item. All nodes
    receiving this message subsequently update their index.
   The objective of a Routing Index (RI) is to allow
    a node to select the “best” neighbors to send a
    query.

   A RI is a data structure that, given a query, retur
    ns a list of neighbors, ranked according to their
    goodness for the query.

   Each node has a local index for quickly finding lo
    cal documents when a query is received. Nodes
    also have a CRI containing
       the number of documents along each path
       the number of documents on each topic



                                                  28
   For A, there are
     100 documents available from B (and its descendents)
       20 belong to Database category

       10 belong to Theory category

       30 belong to Languages category




   “Goodness” of a neighbor
                                     CRI ( s i )
Number   Of Documents
                        i   Number    Of Documents


CRI(si) is the value for the cell at the column for topic si and at the
row for a neighbor
   For documents of “databases” and “languages”
                            20        30
    Goodness ( B )   100                         6
                            100       100
                                  0         50
    Goodness ( C )   1000                                 0
                             1000       1000
                            100       150
    Goodness ( D )   200                             75
                            200        200
New connection




RI propagation                D+A+J




                  D+A+I
   Attenuated Bloom Filters are extensions to bloom filters.
   Bloom filters are often used to approximately and
    efficiently summarize elements in a set.
   Assumes that each stored document has many replicas
    spread over the P2P network.
   Documents are queried by names.
   It intends to quickly find replicas close to the query source
    with high probability.
   This is achieved by approximately
   Summarizing the documents that likely exist in nearby
    nodes.
   However, the approach alone fails to find replicas far away
    from the query source.

                                                        32
   In a strictly structured system, the neighbor relationship
    between peers and data locations is strictly defined.
   Searching in such systems is therefore determined by the
    particular network architecture.
   Among the strictly structured systems, some implement a
    distributed hash table (DHT) using different data
    structures. Others do not provide a DHT interface.
   Some DHT P2Psystems have flat overlay structures; others
    have hierarchical overlay structures.
       A DHT is a hash table whose table entries are distributed among
        different p eers located in arbitrary locations.
       Each data item is hashed to a unique numeric key. Each node is
        also hashed to a unique ID in the same key space



                                                                 33
   Different non-hierarchical DHT P2Ps use different flat data
    structures to implement the DHT.
   These flat data structures include ring, mesh, hypercube, and
    other special graphs such as de Bruijn graph.
   Chord uses a ring data structure
       Node IDs form a ring.
       Each node keeps a finger table that contains the IP addresses of nodes


   Pastry uses a tree-based data structure which can be
    considered as a generalization of a hypercube.
       To shorten the routing latency, each pastry node also keeps a routing
        table of pointers to other nodes in the ID space
   Some other examples are Koorde, Viceroy, and Cycloid etc.


                                                                 34
   All hierarchical DHT P2Ps organize peers into different
    groups or clusters.
   Each group forms its own overlay.
   All groups together form the entire hierarchical overlay.
   Typically the overlay hierarchies are two-tier or three-tier.
   They differ mainly in the number of groups in each tier,
    the overlay structure formed by each group.
   Superpeers/dominating nodes generally contribute more
   computing resources, are more stable, and take more
    responsibility in routing than regular peers.

   Example of this category is Kelips and Coral.


                                                         35
   Kelips is composed of k virtual affinity groups with group
    IDs.
   IP address and port number of a node n is hashed to a
    group ID of the group to which the node n belongs.
   The consistent hashing function SHA-1 provides a good
    balance of group members with high probability.
   Each file name is mapped to a group using the same SHA-1
    function.
   Inside a group, a file is stored in a randomly chosen group
    member, called the file’s homenode.
   Kelips offers load balance in the same group and among
    different groups



                                                       36
Each node n in an affinity group “g” keeps in the memory
the following routing state:
 View of the belonging affinity group g:
       This is the information about the set of nodes in the same group.
        The data includes the roundtrip time estimate, the heartbeat
        count, etc.
   Contacts of all other affinity groups :
       This is the information about a small constant number of nodes in
        all other groups.
   Filetuples:
       This is the intra-group index about the set of files whose
        homenodes are in the same
       affinity group.
       A file tuple consists of a file name and the IP address of the file’s
        homenode. A heartbeat count is also associated with a file tuple.



                                                                   37
   The total number of routing table entries per
    node is-
           N/k+c*(k- 1 )+F/k

Where,
         N: refers to the total number of nodes,
         c: the number of contacts per group,
         F: total number of files in the system, and
         k: the number of affinity groups

F is proportional to N and c is fixed


                                             38
   To look up a file f, the querying node A in the group G
    hashes the file to the file’s belonging group G'.
   If G' is the same as G, the query is resolved by checking the
    node A’s local data store and local intra-group data index.
   Otherwise, A forwards the query to the topologically closest
    contact in group G'.
   On receiving a query request, the contact in the group G'
    searches its local data store and local intra-group data index.
    The IP address of f’s homenode is then returned to the
    querying node directly.
   In case of a file lookup failure, the querying node retries
    using different contacts in the group G’
   Using a random walk in the group G' , or a random walk in
    the group G.


                                                        39
   Coral in is an indexing scheme. It does not dictate how to
    store or replicate data items.
   Objectives of Coral are to avoid hot spots and to find
    nearby data without querying distant nodes.
   A distributed sloppy hash table (DSHT) has been proposed
    to eliminate hot spots.
   In DHT, a key is associated with a single value which is a
    data item or a pointer to a data item.
   In a DSHT, a key is associated with a number of values
    which are pointers to replicas of data items.
   DSHT provides the interface: put(key, value) and get(key).
       put(key,value) stores a value under a key.
       get(key) returns a subset of values under a key.



                                                           40
   When a file replica is stored locally on a node A,
   The node A hashes the file name to a key k and inserts a
    pointer nodeaddr (A’s address) to that file into the DSHT
    by calling put(k,nodeaddr).
   To query for a list of values for a key k, get(k) is forwarded
    in the identifier space.
   Coral organizes nodes into a hierarchy of clusters and puts
    nearby nodes in the same cluster.
       Coral consists of three levels of clusters. In the lowest-level, Level
        2, cover peers located in the same region and have the cluster
        diameter (round-trip time) 30msecs.
       Level 1, cover peers located in the same continent and have the
        cluster diameter 100msecs.
       Level 0, is a single cluster for the entire planet and the cluster
        diameter is infinite.

                                                                    41
   DHT balance load among different nodes, but hashing
    destroys data locality.
   The non-DHT P2Ps try to solve the problems of DHT P2Ps
    by avoiding hashing.
   Hashing does not keep data locality and is not amenable to
    range queries.
   Some non-DHT P2Ps are SkipNet, SkipGraph, and TerraDir.
       SkipNet is designed for storing data close to users.
       SkipGraph is intended for supporting range queries.
       TerraDir is targeted for hierarchical name searches.
   Searching in such systems follows the specified neighboring
   relationships between nodes.



                                                               42
   The overlay SkipNet in supports Content Locality & Path
    Locality. by using a hierarchical naming structure.
       Content locality refers to the fact that a data item is stored close
        to its users.
       Path locality means that routing between the querying node and
        the node responsible for the queried data are within their
        organization.




              Figure: The peer name order and sample routing tables

                                                                      43
   This chapter discusses various searching techniques in peer-
    to-peer networks (P2Ps).
   Clearly, significant progress has been made in the P2P
    research field. But still long way to go..
       First, good benchmarks need to be developed to evaluate the actual
        performance of various techniques.
       Secondly, schemes amenable to complex queries supporting relevance
        ranking, aggregates, or SQL are needed to satisfy the practical
        requirements of P2P users
       Thirdly, security issues have not been addressed by most current
        searching techniques.
       Fourthly, P2P systems are dynamic in nature. Unfortunately existing
        searching techniques can not handle concurrent node join-leave
        gracefully.
       Fifthly, good strategies are needed to form overlays that consider the
        underlying network proximity.
       Sixthly,almost all existing techniques are forwarding-based techniques.
        Recently, a study on non More effort is required to develop good non-
        forwarding techniques and to compare non-forwarding techniques to
        various forwarding techniques

                                                                     45
Thank You
For Your Attention


                46

Weitere ähnliche Inhalte

Was ist angesagt? (12)

10.1.1.150.595
10.1.1.150.59510.1.1.150.595
10.1.1.150.595
 
Iaetsd a secured based information sharing scheme via
Iaetsd a secured based information sharing scheme viaIaetsd a secured based information sharing scheme via
Iaetsd a secured based information sharing scheme via
 
E42062126
E42062126E42062126
E42062126
 
Replica
ReplicaReplica
Replica
 
Ac24204209
Ac24204209Ac24204209
Ac24204209
 
Bcs 052 solved assignment
Bcs 052 solved assignmentBcs 052 solved assignment
Bcs 052 solved assignment
 
Peer to peer Paradigms
Peer to peer ParadigmsPeer to peer Paradigms
Peer to peer Paradigms
 
Nd
NdNd
Nd
 
Kademlia introduction
Kademlia introductionKademlia introduction
Kademlia introduction
 
Bc32356359
Bc32356359Bc32356359
Bc32356359
 
Z02417321735
Z02417321735Z02417321735
Z02417321735
 
Bx32903907
Bx32903907Bx32903907
Bx32903907
 

Andere mochten auch

Day1 search-1
Day1 search-1Day1 search-1
Day1 search-1jmurali
 
Presentazione _agricoltura_intensiva/monocoltura
Presentazione _agricoltura_intensiva/monocolturaPresentazione _agricoltura_intensiva/monocoltura
Presentazione _agricoltura_intensiva/monocolturamartina.scalia
 
Detailed Study of CCTV Cameras
Detailed Study of CCTV CamerasDetailed Study of CCTV Cameras
Detailed Study of CCTV CamerasHans Khanna
 
Google searching techniques
Google searching techniquesGoogle searching techniques
Google searching techniquesabbas mohd
 
Library Automation A - Z Guide: A Hands on Module
Library Automation A - Z Guide: A Hands on ModuleLibrary Automation A - Z Guide: A Hands on Module
Library Automation A - Z Guide: A Hands on ModuleAshok Kumar Satapathy
 
Data Structures - Searching & sorting
Data Structures - Searching & sortingData Structures - Searching & sorting
Data Structures - Searching & sortingKaushal Shah
 
applications of computer graphics
applications of computer graphicsapplications of computer graphics
applications of computer graphicsAaina Katyal
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsLinkedIn
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

Andere mochten auch (10)

Real world internet zia
Real world internet ziaReal world internet zia
Real world internet zia
 
Day1 search-1
Day1 search-1Day1 search-1
Day1 search-1
 
Presentazione _agricoltura_intensiva/monocoltura
Presentazione _agricoltura_intensiva/monocolturaPresentazione _agricoltura_intensiva/monocoltura
Presentazione _agricoltura_intensiva/monocoltura
 
Detailed Study of CCTV Cameras
Detailed Study of CCTV CamerasDetailed Study of CCTV Cameras
Detailed Study of CCTV Cameras
 
Google searching techniques
Google searching techniquesGoogle searching techniques
Google searching techniques
 
Library Automation A - Z Guide: A Hands on Module
Library Automation A - Z Guide: A Hands on ModuleLibrary Automation A - Z Guide: A Hands on Module
Library Automation A - Z Guide: A Hands on Module
 
Data Structures - Searching & sorting
Data Structures - Searching & sortingData Structures - Searching & sorting
Data Structures - Searching & sorting
 
applications of computer graphics
applications of computer graphicsapplications of computer graphics
applications of computer graphics
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Ähnlich wie 20120412 searching techniques in peer to peer networks

A scalabilty and mobility resilient data search system
A  scalabilty and mobility resilient data search systemA  scalabilty and mobility resilient data search system
A scalabilty and mobility resilient data search systemAleesha Noushad
 
A scalabilty and mobility resilient data search system
A  scalabilty and mobility resilient data search systemA  scalabilty and mobility resilient data search system
A scalabilty and mobility resilient data search systemAleesha Noushad
 
02 - Topologies of Distributed Systems
02 - Topologies of Distributed Systems02 - Topologies of Distributed Systems
02 - Topologies of Distributed SystemsDilum Bandara
 
Leveraging social networks for p2 p content based file sharing in disconnecte...
Leveraging social networks for p2 p content based file sharing in disconnecte...Leveraging social networks for p2 p content based file sharing in disconnecte...
Leveraging social networks for p2 p content based file sharing in disconnecte...Papitha Velumani
 
Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...Papitha Velumani
 
QuaP2P P2P Tutorial 2006
QuaP2P P2P Tutorial 2006QuaP2P P2P Tutorial 2006
QuaP2P P2P Tutorial 2006Kalman Graffi
 
An overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology newAn overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology newchizhangufl
 
Textual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networksTextual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networksUvaraj Shan
 
Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...Papitha Velumani
 
P2P DOMAIN CLASSIFICATION USING DECISION TREE
P2P DOMAIN CLASSIFICATION USING DECISION TREE P2P DOMAIN CLASSIFICATION USING DECISION TREE
P2P DOMAIN CLASSIFICATION USING DECISION TREE ijp2p
 
Anon p2p slides
Anon p2p slidesAnon p2p slides
Anon p2p slideschintaan
 
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATIONSECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATIONIJNSA Journal
 
Troytech 640 407 ccna edt.2
Troytech 640 407 ccna edt.2Troytech 640 407 ccna edt.2
Troytech 640 407 ccna edt.2rickybcool
 
A QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERS
A QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERSA QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERS
A QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERSijait
 
Peer To Peer File Sharing
Peer To Peer File SharingPeer To Peer File Sharing
Peer To Peer File Sharingsyifa nurjanah
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)theijes
 
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...Tristan Penman
 
An efficient hybrid peer to-peersystemfordistributeddatasharing
An efficient hybrid peer to-peersystemfordistributeddatasharingAn efficient hybrid peer to-peersystemfordistributeddatasharing
An efficient hybrid peer to-peersystemfordistributeddatasharingambitlick
 

Ähnlich wie 20120412 searching techniques in peer to peer networks (20)

A scalabilty and mobility resilient data search system
A  scalabilty and mobility resilient data search systemA  scalabilty and mobility resilient data search system
A scalabilty and mobility resilient data search system
 
A scalabilty and mobility resilient data search system
A  scalabilty and mobility resilient data search systemA  scalabilty and mobility resilient data search system
A scalabilty and mobility resilient data search system
 
02 - Topologies of Distributed Systems
02 - Topologies of Distributed Systems02 - Topologies of Distributed Systems
02 - Topologies of Distributed Systems
 
Leveraging social networks for p2 p content based file sharing in disconnecte...
Leveraging social networks for p2 p content based file sharing in disconnecte...Leveraging social networks for p2 p content based file sharing in disconnecte...
Leveraging social networks for p2 p content based file sharing in disconnecte...
 
Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...
 
QuaP2P P2P Tutorial 2006
QuaP2P P2P Tutorial 2006QuaP2P P2P Tutorial 2006
QuaP2P P2P Tutorial 2006
 
An overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology newAn overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology new
 
Textual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networksTextual based retrieval system with bloom in unstructured Peer-to-Peer networks
Textual based retrieval system with bloom in unstructured Peer-to-Peer networks
 
Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...Leveraging social networks for p2p content based file sharing in disconnected...
Leveraging social networks for p2p content based file sharing in disconnected...
 
P2P DOMAIN CLASSIFICATION USING DECISION TREE
P2P DOMAIN CLASSIFICATION USING DECISION TREE P2P DOMAIN CLASSIFICATION USING DECISION TREE
P2P DOMAIN CLASSIFICATION USING DECISION TREE
 
new-Modified
new-Modifiednew-Modified
new-Modified
 
Anon p2p slides
Anon p2p slidesAnon p2p slides
Anon p2p slides
 
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATIONSECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
SECURITY CONSIDERATION IN PEER-TO-PEER NETWORKS WITH A CASE STUDY APPLICATION
 
Troytech 640 407 ccna edt.2
Troytech 640 407 ccna edt.2Troytech 640 407 ccna edt.2
Troytech 640 407 ccna edt.2
 
A QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERS
A QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERSA QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERS
A QUERY LEARNING ROUTING APPROACH BASED ON SEMANTIC CLUSTERS
 
Peer To Peer File Sharing
Peer To Peer File SharingPeer To Peer File Sharing
Peer To Peer File Sharing
 
The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)The International Journal of Engineering and Science (IJES)
The International Journal of Engineering and Science (IJES)
 
P2p Peer To Peer Introduction
P2p Peer To Peer IntroductionP2p Peer To Peer Introduction
P2p Peer To Peer Introduction
 
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
 
An efficient hybrid peer to-peersystemfordistributeddatasharing
An efficient hybrid peer to-peersystemfordistributeddatasharingAn efficient hybrid peer to-peersystemfordistributeddatasharing
An efficient hybrid peer to-peersystemfordistributeddatasharing
 

20120412 searching techniques in peer to peer networks

  • 1. Author: Xiuqi Li and Jie Wu Presenter: Zia Ush Shamszaman ANLAB, ICE, HUFS
  • 2. Survey of major searching techniques in peer-to-peer (P2P) networks.  Concept of P2P networks and the methods for classifying different P2P networks .  Various searching techniques in unstructured P2P systems, strictly structured P2P systems, and loosely structured P2P systems has been discussed.  Searching in unstructured P2Ps covers both blind search schemes and informed search scheme.  Searching in Strictly structured P2Ps focuses on hierarchical Distributed Hash Table (DHT) P2Ps and  Non-DHT P2Ps and non-hierarchical DHT P2Ps is briefly overviewed. 2
  • 3. P2P networks are overlay networks on top of Internet, where nodes are end systems in the Internet and maintain information about a set of other nodes (called neighbors) in the P2P.  P2P networks offer the following benefits  They do not require any special administration or financial arrangements.  They are self-organized and adaptive. Peers may come and go freely. P2P systems handle these events automatically.  They can gather and harness the tremendous computation and storage resources on computers across the Internet.  They are distributed and decentralized. Therefore, they are potentially fault-tolerant and load-balanced. 3
  • 4. P2P networks can be classified based on the control over data location and network topology.  There are three categories:  Unstructured: In an unstructured P2P network such as Gnutella, no rule exists which defines where data is stored and the network topology is arbitrary.  Loosely structured: In a loosely structured network such as Freenet and Symphony, the overlay structure and the data location are not precisely determined.  Highly structured: In a highly structured P2P network such as Chord, both the network architecture and the data placement are precisely specified. 4
  • 5. P2P networks can also be classified as centralized and decentralized  In a centralized P2P such as Napster, a central directory of object location, ID assignment, etc. is maintained in a single location.  Decentralized P2Ps adopt a distributed directory structure. These systems can be further divided into  Purely decentralized systems, such as Gnutella and Chord, peers are totally equal.  Hybrid systems, some peers called dominating nodes or super-peers serve the search request of other regular peers.  Another classification of P2P systems is hierarchical & non-hierarchical based on whether the overlay structure is a hierarchy or not.  All hybrid systems and few purely decentralized systems such as Kelips, are hierarchical systems. Hierarchical systems provide good scalability, opportunity to take advantage of node heterogeneity, and high routing efficiency  Most purely decentralized systems have flat overlays and are non-hierarchical systems. Non-hierarchical systems offer load-balance and highresilience 5
  • 6. Searching means locating desired data.  Most existing P2P systems support thesimple object lookup by key or identifier.  Some existing P2P systems can handle more complex keyword queries, which find documents containing keywords in queries.  More than one copy of an object may exist in a P2P system. There may be more than one document that contains desired keywords.  Some P2P systems are interested in a single data item; others are interested in all data items or as many data items as possible that satisfy a given condition.  Most searching techniques are forwarding-based. Starting with the requesting node, a query is forwarded (or routed) to the desired node/s. 6
  • 7.  High-quality query results  Minimal routing state maintained per node  High routing efficiency  Load balance  Resilience to node failures  Support of complex queries 7
  • 8. The quality of query results is application dependent.  Generally, it is measured by the number of results and relevance.  The routing state refers to the number of neighbors each node maintains.  The routing efficiency is generally measured by the number of overlay hops per query.  In some systems, it is also evaluated using the number of messages per query.  Different searching techniques make different trade-offs between these desired characteristics. 8
  • 9. Yang and Garcia-Molina borrowed the idea of iterative deepening from artificial intelligence.  The querying node periodically issues a sequence of BFS searches with increasing depth limits D1 < D2 < … < Di.  The query is terminated when the query result is satisfied or when the maximum depth limit D has been reached.  All nodes use the same sequence of depth limits called Policy: set of depths {0,2,4,5} and, 11
  • 10.  Iterative Deepening – {0, 2, 4, 5}, 3 Holding Frozen Query Already Processed Query Unaware of Query
  • 11.  Iterative Deepening – {0, 2, 4, 5}, 3 Holding Frozen Query Already Processed Query Unaware of Query
  • 12.  Iterative Deepening – {0, 2, 4, 5}, 3 Holding Frozen Query Already Processed Query Unaware of Query
  • 13.  Iterative Deepening – {0, 2, 4, 5}, 3 Holding Frozen Query Already Processed Query Unaware of Query
  • 14. Standard Random Walker  Forward the query to a randomly chosen neighbor at each step Each message a walker.  Cut message overhead  Increase query searching delay(#hops) k-walkers  The requesting node sends k query messages and each query message takes its own random walk  Periodically, when a node receives a query, it checks with source node to see if query has been satisfied  k walkers after T steps should reach roughly the same number of nodes as 1 walker after kT steps  So cut delay by a factor of k.  To decrease delay, increase walkers 16
  • 15. Why shouldn’t I find a song? A sends a walker to find song.mp3 that is stored on B
  • 16. TTL-based or Hop Count  Checking: the walker periodically checks with the original requestor before walking to the next node (again use a large TTL, just to prevent loops) Experiments show that checking once at every 4th step strikes a good balance between the overhead of the checking message and the benefits of checking 18
  • 17.  Directed Breadth First Search  Source Only sends queries to good neighbors  Good neighbors might have  Produced results in the past  Low latency  Lowest hop count for results  They have good neighbors  Highest traffic neighbors  They’re stable  Shortest message queue  Routed as normal BFS after first hop
  • 18.  Directed Breadth First Search
  • 19.  Directed Breadth First Search
  • 20. Efficient Search - Methods  Directed Breadth First Search
  • 21.  Directed Breadth First Search
  • 22.  Directed Breadth First Search
  • 23.  Idea is that a node maintains information about what files neighbors, and possibly neighbor’s neighbors store. “radius of knowledge”  All nodes know about radius, and know about policy  Policy lists which levels will respond and which will ignore messages  Servents look at TTL/Hops to determine if they process or ignore.  Memory issue of maintaining lists  Size is far below a megabyte for radius < 5  Network issue of building list  Hit from extra packets
  • 24. Nodes at level 1 Each node has a radius of respond with 2 – knows about the files information about on its neighbors and levels 1, 2 and 3, … neighbor’s neighbors and forward to next level Policy is that levels 1, 4 … and 7 respond … … Searches move to … levels 2 and 3, which ignore and forward reaches When search level 4, it responds with information about Layers 5 level67 Finally, and … levels 4, 5, and 6, simply ignore and own then responds with its forward. forwards the messages. and terminates data, the query.
  • 25. Joining a new node: sends a join message with TTL=r and all the nodes within r hops update their indices.  Join message contains the metadata about the joining node.  When a node receives this join message it, in turn, send join message containing its meta data directly to the new node. New node updates its indices.  Node dies: Other nodes update their indices based on the timeouts.  Updating the node: When a node updates its collection, his node will send out a small update message with TTL= r, containing the metadata of the affected item. All nodes receiving this message subsequently update their index.
  • 26. The objective of a Routing Index (RI) is to allow a node to select the “best” neighbors to send a query.  A RI is a data structure that, given a query, retur ns a list of neighbors, ranked according to their goodness for the query.  Each node has a local index for quickly finding lo cal documents when a query is received. Nodes also have a CRI containing  the number of documents along each path  the number of documents on each topic 28
  • 27. For A, there are  100 documents available from B (and its descendents)  20 belong to Database category  10 belong to Theory category  30 belong to Languages category  “Goodness” of a neighbor CRI ( s i ) Number Of Documents i Number Of Documents CRI(si) is the value for the cell at the column for topic si and at the row for a neighbor
  • 28. For documents of “databases” and “languages” 20 30 Goodness ( B ) 100 6 100 100 0 50 Goodness ( C ) 1000 0 1000 1000 100 150 Goodness ( D ) 200 75 200 200
  • 30. Attenuated Bloom Filters are extensions to bloom filters.  Bloom filters are often used to approximately and efficiently summarize elements in a set.  Assumes that each stored document has many replicas spread over the P2P network.  Documents are queried by names.  It intends to quickly find replicas close to the query source with high probability.  This is achieved by approximately  Summarizing the documents that likely exist in nearby nodes.  However, the approach alone fails to find replicas far away from the query source. 32
  • 31. In a strictly structured system, the neighbor relationship between peers and data locations is strictly defined.  Searching in such systems is therefore determined by the particular network architecture.  Among the strictly structured systems, some implement a distributed hash table (DHT) using different data structures. Others do not provide a DHT interface.  Some DHT P2Psystems have flat overlay structures; others have hierarchical overlay structures.  A DHT is a hash table whose table entries are distributed among different p eers located in arbitrary locations.  Each data item is hashed to a unique numeric key. Each node is also hashed to a unique ID in the same key space 33
  • 32. Different non-hierarchical DHT P2Ps use different flat data structures to implement the DHT.  These flat data structures include ring, mesh, hypercube, and other special graphs such as de Bruijn graph.  Chord uses a ring data structure  Node IDs form a ring.  Each node keeps a finger table that contains the IP addresses of nodes  Pastry uses a tree-based data structure which can be considered as a generalization of a hypercube.  To shorten the routing latency, each pastry node also keeps a routing table of pointers to other nodes in the ID space  Some other examples are Koorde, Viceroy, and Cycloid etc. 34
  • 33. All hierarchical DHT P2Ps organize peers into different groups or clusters.  Each group forms its own overlay.  All groups together form the entire hierarchical overlay.  Typically the overlay hierarchies are two-tier or three-tier.  They differ mainly in the number of groups in each tier, the overlay structure formed by each group.  Superpeers/dominating nodes generally contribute more  computing resources, are more stable, and take more responsibility in routing than regular peers.  Example of this category is Kelips and Coral. 35
  • 34. Kelips is composed of k virtual affinity groups with group IDs.  IP address and port number of a node n is hashed to a group ID of the group to which the node n belongs.  The consistent hashing function SHA-1 provides a good balance of group members with high probability.  Each file name is mapped to a group using the same SHA-1 function.  Inside a group, a file is stored in a randomly chosen group member, called the file’s homenode.  Kelips offers load balance in the same group and among different groups 36
  • 35. Each node n in an affinity group “g” keeps in the memory the following routing state:  View of the belonging affinity group g:  This is the information about the set of nodes in the same group. The data includes the roundtrip time estimate, the heartbeat count, etc.  Contacts of all other affinity groups :  This is the information about a small constant number of nodes in all other groups.  Filetuples:  This is the intra-group index about the set of files whose homenodes are in the same  affinity group.  A file tuple consists of a file name and the IP address of the file’s homenode. A heartbeat count is also associated with a file tuple. 37
  • 36. The total number of routing table entries per node is- N/k+c*(k- 1 )+F/k Where, N: refers to the total number of nodes, c: the number of contacts per group, F: total number of files in the system, and k: the number of affinity groups F is proportional to N and c is fixed 38
  • 37. To look up a file f, the querying node A in the group G hashes the file to the file’s belonging group G'.  If G' is the same as G, the query is resolved by checking the node A’s local data store and local intra-group data index.  Otherwise, A forwards the query to the topologically closest contact in group G'.  On receiving a query request, the contact in the group G' searches its local data store and local intra-group data index. The IP address of f’s homenode is then returned to the querying node directly.  In case of a file lookup failure, the querying node retries using different contacts in the group G’  Using a random walk in the group G' , or a random walk in the group G. 39
  • 38. Coral in is an indexing scheme. It does not dictate how to store or replicate data items.  Objectives of Coral are to avoid hot spots and to find nearby data without querying distant nodes.  A distributed sloppy hash table (DSHT) has been proposed to eliminate hot spots.  In DHT, a key is associated with a single value which is a data item or a pointer to a data item.  In a DSHT, a key is associated with a number of values which are pointers to replicas of data items.  DSHT provides the interface: put(key, value) and get(key).  put(key,value) stores a value under a key.  get(key) returns a subset of values under a key. 40
  • 39. When a file replica is stored locally on a node A,  The node A hashes the file name to a key k and inserts a pointer nodeaddr (A’s address) to that file into the DSHT by calling put(k,nodeaddr).  To query for a list of values for a key k, get(k) is forwarded in the identifier space.  Coral organizes nodes into a hierarchy of clusters and puts nearby nodes in the same cluster.  Coral consists of three levels of clusters. In the lowest-level, Level 2, cover peers located in the same region and have the cluster diameter (round-trip time) 30msecs.  Level 1, cover peers located in the same continent and have the cluster diameter 100msecs.  Level 0, is a single cluster for the entire planet and the cluster diameter is infinite. 41
  • 40. DHT balance load among different nodes, but hashing destroys data locality.  The non-DHT P2Ps try to solve the problems of DHT P2Ps by avoiding hashing.  Hashing does not keep data locality and is not amenable to range queries.  Some non-DHT P2Ps are SkipNet, SkipGraph, and TerraDir.  SkipNet is designed for storing data close to users.  SkipGraph is intended for supporting range queries.  TerraDir is targeted for hierarchical name searches.  Searching in such systems follows the specified neighboring  relationships between nodes. 42
  • 41. The overlay SkipNet in supports Content Locality & Path Locality. by using a hierarchical naming structure.  Content locality refers to the fact that a data item is stored close to its users.  Path locality means that routing between the querying node and the node responsible for the queried data are within their organization. Figure: The peer name order and sample routing tables 43
  • 42. This chapter discusses various searching techniques in peer- to-peer networks (P2Ps).  Clearly, significant progress has been made in the P2P research field. But still long way to go..  First, good benchmarks need to be developed to evaluate the actual performance of various techniques.  Secondly, schemes amenable to complex queries supporting relevance ranking, aggregates, or SQL are needed to satisfy the practical requirements of P2P users  Thirdly, security issues have not been addressed by most current searching techniques.  Fourthly, P2P systems are dynamic in nature. Unfortunately existing searching techniques can not handle concurrent node join-leave gracefully.  Fifthly, good strategies are needed to form overlays that consider the underlying network proximity.  Sixthly,almost all existing techniques are forwarding-based techniques. Recently, a study on non More effort is required to develop good non- forwarding techniques and to compare non-forwarding techniques to various forwarding techniques 45
  • 43. Thank You For Your Attention 46

Hinweis der Redaktion

  1. * In Freenet, both the overlay topology and the data location are determined based on hints. The network topology eventually evolves into some intended structure.* In Symphony, the overlay topology is determined probabilistically but the data location is defined precisely* Highly structured: The neighbors of a node are well-defined. The data is stored in a welldefined location
  2. Centralized P2Ps do not scale well and the central directory server causes single point of failure.-In Hybrid dominating nodes and superpeers have to be carefully selected to avoid single points of failure and service bottlenecks.
  3. highly structured P2P systems provide guarantees on finding existing data and bounded data lookup efficiency in terms of the number of overlay hops; however, the strict network structure imposes high overhead for handling frequent node join-leave.
  4. In an unstructured P2P system, no rule exists that strictly defines where data is stored and which nodes are neighbors of each other.Gnutella [15] used flooding, which is the Breadth First Search (BFS).
  5. 16 to 64 walkers give good results