P2PIR is one application of peer-to-peer networks. P2PIR combines key elements of file sharing and federated information retrieval. No single technique solves every P2PIR problem. Recall and precision are used to evaluate P2PIR systems.
Information retrieval is the field dealing with the structure, analysis, organization, storage, searching and retrieval of information. Searching in peer-to-peer networks is called peer-to-peer information retrieval.
Peer to Peer Information Retrieval
1. Peer to Peer Information Retrieval
By Chetan K. Sundarde
@CHETANSUNDARDE
https://www.linkedin.com/in/chetansundarde
29-Oct-15 1P2PIR
2. Outline
Peer to Peer Network
Information Retrieval
Peer to Peer Information Retrieval (P2PIR)
Peer to peer IR system architectures
Techniques used in IR in P2P networks
Basic algorithms used in P2PIR
Evaluation techniques used in P2PIR
Challenges
Conclusion
References
3. Peer to Peer Network
A collection of distributed systems
Computers leave and join the network frequently
Each computer acts as a server and a client simultaneously
Three tasks that every peer-to-peer network performs:
o Searching: querying and getting a list of document references.
o Locating: resolving a document reference to a concrete location of the full document.
o Transferring: downloading the document.
4. Applications of P2P
Information Retrieval
File Sharing
Gnutella, Napster, BitTorrent, etc.
5. Information Retrieval
Information retrieval is the field dealing with the structure, analysis, organization, storage, searching and retrieval of information.
It searches for relevant documents on the basis of user input.
[Figure: a document collection and an information need as inputs to IR, with retrieval as the output.]
6. Comparison between File Sharing and Information Retrieval

                  File Sharing        Information Retrieval
Application       Locating            Searching
Index
 - Content        File identifiers    Document content
 - Size           Small               Large
Data Exchange
 - Unit           File                Search result
 - Size           Megabyte+           Kilobyte (small)

P2PIR: file-sharing networks and federated information retrieval
7. Peer to Peer Information Retrieval (P2PIR)
Searching in peer-to-peer networks
Each peer shares its information with other peers
A peer searches for information by sending queries to its peers
Queries are routed to one or many other peers
Query results are provided in the form of an index
8. Peer to peer IR system architectures
Based on relationship between peers:
o Cooperative system
o Uncooperative system
Based on the network structure
o Centralized network
o Structured architecture
o Unstructured architecture
Based on the task performed in the P2P network
o Centralized Global Index
o Distributed Global Index
o Strict Local Indices
o Aggregated Local Indices
9. Peer-to-Peer architectures used in IR
[Figure: four panels showing Central Global Index, Distributed Global Index, Aggregated Local Index and Strict Local Index. Peers marked G store part of a global index; peers marked L hold a local index.]
10. Algorithms used in P2PIR
Statistical IR algorithms
Vector Space Model (VSM)
Document A: “books on computer networks”
Document B: “network routing in P2P networks”
Query Q: “computer network”
Each element of a vector corresponds to the importance of that term in the document.
Retrieved documents are ranked by the similarity between the document vector and the query vector.

Vocabulary   VA    VB    VQ
book         0.5   0     0
computer     0.5   0     0.5
network      0.8   0.9   0.8
routing      0     0.6   0

Similarity to VQ (inner product): VA = 0.89, VB = 0.72
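The ranking on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the presenter's implementation: the vectors are copied from the slide, while the function names are made up for illustration. The slide's scores of 0.89 and 0.72 match the plain inner product; cosine similarity is included as a common alternative that also normalizes for vector length.

```python
import math

# Term order follows the slide's vocabulary: book, computer, network, routing.
VA = [0.5, 0.5, 0.8, 0.0]  # Document A: "books on computer networks"
VB = [0.0, 0.0, 0.9, 0.6]  # Document B: "network routing in P2P networks"
VQ = [0.0, 0.5, 0.8, 0.0]  # Query Q: "computer network"


def inner_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Inner product normalized by both vector lengths (0.0 for a zero vector)."""
    norm = math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v))
    return inner_product(u, v) / norm if norm else 0.0


def rank(query, documents, similarity=inner_product):
    """Return (name, score) pairs, most similar document first."""
    scored = [(name, similarity(vec, query)) for name, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(VQ, {"A": VA, "B": VB})
for name, score in ranking:
    print(name, round(score, 2))
```

With either similarity function, Document A ranks above Document B for this query.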
11. Algorithms used in P2PIR
Statistical IR algorithms
Latent Semantic Indexing (LSI)
[Figure: the term-document matrix with document vectors Va, Vb is reduced by SVD to semantic vectors V’a, V’b.]
SVD: singular value decomposition
– Reduces dimensionality
– Discovers word semantics, e.g. Cat <-> Pet, Bus <-> Travel
12. Algorithms used in P2PIR…
Distributed Hash Table (DHT)
A method of hash-table lookup over a decentralized distributed network.
Key-value pairs are stored in the DHT at a parent node (structured architecture).
Kd = hash(“books on computer networks”)
Kq = hash(“computer network”)
Any node in the DHT can then efficiently retrieve the value by providing its key.
Motivated in part by P2P systems such as Napster and BitTorrent.
Well-known DHTs: CAN, Chord, etc.
Can be extended with content-based search:
o Full-Text Retrieval
o Content-Based Image Retrieval
o Content-Based Music Retrieval, etc.
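A toy example shows why a plain DHT needs the content-based extensions listed on this slide. The sketch below is an illustration under stated assumptions, not CAN or Chord: `ToyDHT`, `key_of` and the peer names are hypothetical, all nodes live in one process, and keys are placed on a simple hash ring where each key belongs to the first node clockwise from it.

```python
import bisect
import hashlib


def key_of(text, bits=32):
    """Hash a string to an integer key in [0, 2**bits)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """Single-process stand-in for a DHT; a real one routes between machines."""

    def __init__(self, node_names):
        # Place every node on the ring at the hash of its name.
        self.ring = sorted((key_of(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key):
        # First node clockwise from the key, wrapping around the ring.
        index = bisect.bisect_left(self.ids, key) % len(self.ring)
        return self.ring[index][1]

    def put(self, text, value):
        key = key_of(text)
        self.storage[self.responsible_node(key)][key] = value

    def get(self, text):
        key = key_of(text)
        return self.storage[self.responsible_node(key)].get(key)


dht = ToyDHT(["peer-1", "peer-2", "peer-3", "peer-4"])
dht.put("books on computer networks", "document A")

print(dht.get("books on computer networks"))  # exact key: the stored value
print(dht.get("computer network"))            # different key: nothing found
```

Because Kd and Kq are hashes of different strings, the query "computer network" lands on an unrelated key and misses document A. Exact-match lookup alone therefore does not give full-text retrieval.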
13. P2P Information Retrieval Techniques
Unstructured
o Blind search: BFS, RBFS (e.g. Gnutella)
o Blind search: Random Walk
o Indexing: Routing Indices
o Clustering: Semantic Searching (e.g. SON)
Structured
o Clustering: pSearch
14. Evaluation in P2P IR
Recall (Are all the relevant documents retrieved?)
The fraction of the documents relevant to the query that are successfully retrieved.
Recall = number of relevant documents retrieved / total number of relevant documents in the collection
Precision (Are the retrieved documents relevant?)
The fraction of retrieved documents that are relevant to the search query.
Precision = number of relevant documents retrieved / total number of documents retrieved
[Figure: the Relevant and Retrieved document sets; their overlap is labelled "retrieved relevant".]
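Both measures follow directly from the two sets in the figure. The sketch below is a minimal illustration; the function name and the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents the system returned
    relevant:  documents in the collection that are relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    retrieved_relevant = retrieved & relevant
    precision = len(retrieved_relevant) / len(retrieved) if retrieved else 0.0
    recall = len(retrieved_relevant) / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 5 relevant in the collection, 2 in both sets.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5", "d6", "d7"],
)
print(precision, recall)  # 0.5 0.4
```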
15. Evaluation Techniques in P2P IR…
F-Score / F-measure
The harmonic mean of precision and recall.
Hits per Query
The average number of distinct relevant documents discovered per search query.
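Both measures on this slide are short enough to write out. The sketch below assumes the balanced F-measure (equal weight on precision and recall) and treats "hits per query" exactly as the slide defines it; the function names and sample data are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hits_per_query(results_by_query, relevant_by_query):
    """Average number of distinct relevant documents found per query."""
    if not results_by_query:
        return 0.0
    total_hits = sum(
        len(set(results) & set(relevant_by_query.get(query, ())))
        for query, results in results_by_query.items()
    )
    return total_hits / len(results_by_query)


print(f_score(0.5, 0.5))  # 0.5

# Duplicate results ("d1" twice) count once because hits are distinct.
results = {"q1": ["d1", "d1", "d2"], "q2": ["d3"]}
relevant = {"q1": {"d1", "d2"}, "q2": {"d9"}}
print(hits_per_query(results, relevant))  # 1.0
```

The harmonic mean stays low unless both precision and recall are high, which is why it is preferred over the arithmetic mean here.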
16. Applications of P2P Information Retrieval in the Real World
YaCy (www.yacy.net)
YaCy uses no centralized servers; local index entries are injected into a distributed global index.
The resulting decentralized web search has about 1.4 billion documents in its index, and more than 600 peer operators contribute each month. About 130,000 search queries are performed with this network each day (Feb 2015).
Faroo (www.faroo.com)
A proprietary peer-to-peer search engine that uses a distributed global index.
It performs distributed crawling and ranking.
Faroo encrypts queries and results for privacy protection.
2 million peers.
Some other P2PIR systems: Sixearch, ODISSEA, MINERVA, Seeks, etc.
17. Challenges
Cross-Language Information Retrieval
Maintaining index freshness
Security features
Quality of service
Efficient use of resources
Increasing the range of peer-to-peer networks
18. Conclusion
P2PIR is one application of peer-to-peer networks
P2PIR combines key elements of file sharing and federated information retrieval
No single technique solves every P2PIR problem
Recall and precision are used to evaluate P2PIR systems
Based on the relationship between peers, P2PIR architectures are divided into cooperative and uncooperative systems. In a cooperative system, information about documents, such as resource descriptions, collection statistics and the collection index, is stored in a central place. Peers can use this information to help their search. In an uncooperative system, each peer is independent of the others and does not share any such information.
Based on network structure, P2P system architectures are further classified into centralized networks and decentralized network architectures such as structured and de
Centralized:
A centralized network is a mix of the traditional client-server architecture and a pure P2P architecture.
A single point of failure and limited scalability are the main issues in centralized networks.
Napster and BitTorrent fall into this category.
Unstructured:
In an unstructured architecture, all peers are equal. Each can issue requests, respond to other peers' requests and route requests to other nodes to locate information.
Gnutella uses this architecture.
[Flooding-based approaches are effective for finding popular items, but their performance is quite poor for rare items.]
Structured:
In this architecture peers are grouped or clustered.
They make use of the distributed hash table (DHT) abstraction to resolve queries efficiently.
Based on the task performed in the P2P network and on the index:
Centralized global index, distributed global index, aggregated local indices, strict local indices.
Query processing: generating results for a specific query based on an index.
The techniques used in peer-to-peer information retrieval are adapted from file-sharing networks.
Peers with double borders are involved in storing index information
and processing queries. A G symbol indicates a peer stores a part of a global index,
whereas an L symbol indicates a local index
                     Global Index                         Local Indices
                     Centralised    Distributed           Aggregated     Strict
Index
 - Construction      Central Peer   All Peers             All Peers      All Peers
 - Storage           Central Peer   All Peers (Shared)    Super Peers    All Peers (Indiv.)
 - Mutation Cost     Low            High                  Low            None
Query Routing
 - Method            Direct         Forwarding            Forwarding     Forwarding
 - Parties           Central Peer   Intermediate Peers    Super Peers    Neighbour Peers
 - Complexity        O(1)           O(log N)              O(Ns - 1)      O(N - 1)
Query Processing
 - Peer Subset       Central Only   Small                 Medium         Large
 - Latency           Low            Medium                Medium         High
 - Result Set Unit   Query          Term                  Query          Query
 - Result Fusion     -              Intersect             Merge          Merge
 - Exhaustive        Yes            Yes                   No             No
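The "Result Fusion" row can be illustrated with a small sketch. It rests on one reading of the table, stated here as an assumption: in a distributed global index each responsible peer returns the list of documents for a single query term, so the querying peer intersects the lists; with local indices each peer returns scored results for the whole query, so the querying peer merges them by score. The function names and sample data are hypothetical.

```python
def fuse_by_intersection(postings_per_term):
    """Distributed global index: keep documents present in every term's list."""
    lists = [set(postings) for postings in postings_per_term.values()]
    return sorted(set.intersection(*lists)) if lists else []


def fuse_by_merge(results_per_peer, top_k=3):
    """Local indices: merge per-peer (doc, score) lists, best score per doc."""
    best = {}
    for results in results_per_peer:
        for doc, score in results:
            best[doc] = max(score, best.get(doc, float("-inf")))
    # Highest score first; ties broken by document name for stable output.
    ranked = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_k]


# One posting list per query term, each fetched from a different peer.
print(fuse_by_intersection({"computer": ["A", "C"], "network": ["A", "B"]}))

# One scored result list per peer, each answering the whole query.
print(fuse_by_merge([[("A", 0.89)], [("B", 0.72), ("A", 0.5)]], top_k=2))
```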
Here are some basic algorithms used in query routing.
Most new query routing algorithms use statistical information retrieval algorithms.
P2PIR borrows some elements from file-sharing networks and federated information retrieval.
Gnutella floods queries (e.g. BFS).
CAN (Content-Addressable Network): the coordinate space is divided into zones, each zone is assigned to a computer, and an object's key determines the zone that stores it.
Chord: nodes must communicate in ways that support systematically locating the node responsible for a particular key.
Different types of P2P systems employ different retrieval methods.
Newer techniques are improvements over older ones.
These techniques can again be divided into blind and informed search.
- Flooding (BFS, e.g. Gnutella): the query is flooded to the whole neighbourhood with a fixed TTL.
- Random selection (RBFS, random walk): K nodes are selected randomly to forward the query with a fixed TTL.
- Semantic Overlay Networks (SON): the query is first forwarded to the most suited SON and then flooded within that SON. SONs are formed on the semantic content a peer shares.
- Routing indices: the best set of neighbours is selected based on a goodness score computed from routing indices.
- pSearch: a query is routed through CAN, based on its semantics, to the relevant peer. Upon reaching the destination, the query is flooded within radius r.
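The two blind techniques differ only in how many neighbours receive the query at each hop. The sketch below is a single-process simulation under stated assumptions: the overlay graph, the peer names and the `forward_query` function are invented for illustration, and duplicate queries are simply dropped by peers that have already seen them.

```python
import random
from collections import deque


def forward_query(graph, start, has_doc, ttl, k=None, rng=None):
    """TTL-limited query forwarding on an unstructured overlay.

    k=None forwards to every neighbour (flooding); an integer k forwards to
    at most k randomly chosen neighbours per hop.
    Returns (peers holding a match, number of messages sent).
    """
    rng = rng or random.Random(0)
    visited = {start}
    hits = [start] if has_doc(start) else []
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL expired, the query is not forwarded further
        neighbours = graph[peer]
        if k is not None and len(neighbours) > k:
            neighbours = rng.sample(neighbours, k)
        for nxt in neighbours:
            messages += 1
            if nxt in visited:
                continue  # duplicate query, dropped by the receiver
            visited.add(nxt)
            if has_doc(nxt):
                hits.append(nxt)
            frontier.append((nxt, remaining - 1))
    return hits, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holders = {"E"}  # only peer E shares a matching document

print(forward_query(graph, "A", holders.__contains__, ttl=2))  # ([], 6)
print(forward_query(graph, "A", holders.__contains__, ttl=3))  # (['E'], 9)
print(forward_query(graph, "A", holders.__contains__, ttl=3, k=1))
```

With TTL 2 the flood stops one hop short of peer E; raising the TTL to 3 finds it at the cost of more messages. Setting `k=1` sends far fewer messages but may miss the document, which is the trade-off between flooding and random forwarding.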
Evaluation techniques measure the effectiveness of the system.
Precision and recall can be computed for the top 1, 2, ..., N answers.
a. High precision means few false alarms.
b. High recall means few false dismissals.