P2P search engine
1. • ↓ ↓
• http://code.google.com/p/fujene/
2011-05-30
3. •[ ]
(→ )
•
4. •
•
•
•
• etc...
6. • Namazu
• Senna
• Lucene
• Solr
• Hyper Estraier
• ...
8. Fujene( )
• :
•
• P2P
•
•
•
9. • →FARE system
• Fast → ( )
• Autonomous →
• Retrieval →
• Engine →
• system
10. •
•
•
•
•
•
12. •
Content = Title
Content = Body
Appendix = ID
Appendix = URL
Fujene --primary SettingFile
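A settings file of the shape shown on this slide could be read roughly as follows. This is only a minimal sketch, assuming each line has the form `Kind = FieldName` with `Kind` being `Content` or `Appendix`; the function name `parse_settings` and the handling of blank lines are hypothetical, and the slide does not show how Fujene itself parses the file.

```python
from collections import defaultdict

# The four lines shown on the slide.
SAMPLE = """\
Content = Title
Content = Body
Appendix = ID
Appendix = URL
"""

def parse_settings(text):
    """Group field names by kind, keeping their order of appearance."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        kind, _, name = line.partition("=")
        fields[kind.strip()].append(name.strip())
    return dict(fields)

settings = parse_settings(SAMPLE)
print(settings)
```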
13. • IP
•
10.0.1.5
Fujene --secondary 10.0.1.5
15-17. [Figure: nodes A, B, C, D, E, F]
20. Chord chain
[Figure: nodes A, B, C, D, E, F; label: Hash: 0xEF459AB...]
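The idea of placing nodes and keys on one hash ring, as in Chord (see the bibliography), can be sketched as below. The sketch makes several assumptions: SHA-1 as the hash, a 32-bit ring, and the helper names `ring_hash`, `build_ring`, and `successor` are all hypothetical. It shows only which node owns a key; Chord's finger tables for fast routing are left out.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # assumption: a small ring for readability

def ring_hash(key):
    """Map a string onto the ring using SHA-1 (the hash choice is an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)

def build_ring(node_names):
    """Return (position, name) pairs sorted by position on the ring."""
    return sorted((ring_hash(name), name) for name in node_names)

def successor(ring, position):
    """Return the first node at or after `position`, wrapping around the ring."""
    positions = [pos for pos, _ in ring]
    index = bisect_left(positions, position)
    return ring[index % len(ring)][1]

ring = build_ring(["A", "B", "C", "D", "E", "F"])
owner = successor(ring, ring_hash("search"))
print(owner)
```

Because a key belongs to the first node at or after its position, adding or removing one node only moves the keys between that node and its neighbor.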
22-23. :
Node 1: 56%
Node 2: 20%
Node 3: 24%
[Figure: blocks labeled 1, 2, 3]
24-25. :
Node 1   56%   42%
Node 2   20%   32%
Node 3   24%   26%
[Figure: blocks labeled 1, 2, 3]
26. Indexing
•
•
• ( )
• : Sen(=MeCab)
• : Bi-gram, Uni-gram
•
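The bi-gram and uni-gram analysis named in the list above can be illustrated with plain character n-grams. This is a minimal sketch: the function name `char_ngrams` and the sample string are assumptions, and the slide does not show how Fujene combines the two n-gram sizes or when it uses Sen instead.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

sample = "検索エンジン"  # "search engine"
unigrams = char_ngrams(sample, 1)
bigrams = char_ngrams(sample, 2)
print(unigrams)
print(bigrams)
```

N-gram analysis needs no dictionary, at the cost of producing more terms than a morphological analyzer would.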
28. Indexing
ID: 12345
Title: ...
Body: ...
URL: ...
[Figure: nodes A, B, C, D, E, F; labels: RPC/API( ), Content, Term, Term]
29. Indexing
[Figure: nodes A, B, C, D, E, F; labels: Content, Hash, Hash]
30. Indexing
[Figure: nodes A, B, C, D, E, F; label: (replication=3)]
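One way to read "replication=3" is that each item is stored on three nodes. The sketch below assumes the replicas go to consecutive nodes in chain order, which is a common choice in Chord-style systems but is not confirmed by the slides; the function name `replica_nodes` is hypothetical.

```python
def replica_nodes(ring, start_index, replication=3):
    """Return the node at start_index plus the nodes that follow it on the ring."""
    count = min(replication, len(ring))  # never list the same node twice
    return [ring[(start_index + k) % len(ring)] for k in range(count)]

ring = ["A", "B", "C", "D", "E", "F"]
replicas = replica_nodes(ring, ring.index("E"))
print(replicas)
```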
31. Skip pointer …
Dictionary …
Invert index …
ID
Skip pointer …
Content … Appendix …
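The relationship between the dictionary, the inverted index, and the per-ID content and appendix stores can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `TinyIndex`, whitespace tokenization, and the in-memory dictionaries are all hypothetical; skip pointers are omitted; and each document ID is assumed to be added only once.

```python
from bisect import insort
from collections import defaultdict

class TinyIndex:
    def __init__(self):
        self.dictionary = defaultdict(list)  # term -> sorted list of document IDs
        self.content = {}                    # document ID -> indexed text
        self.appendix = {}                   # document ID -> stored-only fields

    def add(self, doc_id, content, appendix):
        """Store a document and record its ID under every term it contains."""
        self.content[doc_id] = content
        self.appendix[doc_id] = appendix
        for term in set(content.lower().split()):
            insort(self.dictionary[term], doc_id)  # keep posting lists sorted

    def lookup(self, term):
        """Return the sorted document IDs whose content contains the term."""
        return list(self.dictionary.get(term.lower(), []))

index = TinyIndex()
index.add(12345, "P2P search engine", {"URL": "http://example.com/a"})
index.add(7, "search", {"URL": "http://example.com/b"})
print(index.lookup("search"))
```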
32. [Figure: the Skip pointer, Dictionary, Invert index, Content, and Appendix blocks annotated with steps (1) to (16) and Lookup labels]
33. • :
• Contents ... (1) to (7)
• Dictionary ... (8) to (12)
• Invert index ... (13) to (16)
•
35. Searching
[Figure: nodes A, B, C, D, E, F; labels: Query, Analyze, Term, Term, Term]
36. Searching
Intersection
ID: 12, 24, 35, 49, ...
ID: 12, 30, 49, 55, ...
ID: 7, 12, 30, 49, ...
[Figure: nodes A, B, C, D, E, F]
37. Searching
Output
ID: 12
ID: 49
[Figure: nodes A, B, C, D, E, F]
38. Query
Skip pointer …
Dictionary …
Invert index …
ID
Skip pointer …
Content … Appendix …
Output Output
40. [Figure: nodes A, B, C, D, E, F; label: beacon]
41. [Figure: nodes A, B, C, D, E, F; label: “live”]
42. [Figure: nodes A, B, C, D, E, F; three × marks]
43. [Figure: nodes A, B, C, D, E, F; labels 1 to 6; three × marks]
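The slides here show a "beacon" message, nodes marked "live", and × marks, which suggests heartbeat-style failure detection. The sketch below is only one plausible reading: the class name `LivenessTable`, the timeout rule, and the explicit `now` argument are assumptions, and Fujene's actual protocol is not recoverable from the slides.

```python
class LivenessTable:
    def __init__(self, timeout):
        self.timeout = timeout  # seconds without a beacon before a node counts as failed
        self.last_seen = {}     # node name -> time of the most recent beacon

    def on_beacon(self, node, now):
        """Record that a beacon from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def live_nodes(self, now):
        """Nodes whose last beacon is recent enough."""
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)

    def failed_nodes(self, now):
        """Nodes whose last beacon is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

table = LivenessTable(timeout=10)
table.on_beacon("A", now=0)
table.on_beacon("B", now=0)
table.on_beacon("A", now=8)  # B stays silent after time 0
print(table.live_nodes(now=15), table.failed_nodes(now=15))
```

Passing the clock value in explicitly keeps the example deterministic; a real node would read it from a monotonic clock.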
44. A B C D E F
4 5 6 1 2 3
5 6 1 2 3 4
6 1 2 3 4 5
3 4 5
47. Topic:
Index Server Search Server
Node Manager /
Search Gather
Store/Lookup, Query Parser
Memory/Disk Blocks
48. Topic: Intersection
• : r1, r2, ..., rn O(∑ r)
•
r1 1 4 6 10 12 16 22 29 30 31 37 40 43 47
r2 2 4 6 9 12 14 26 30 32 37 43 44 47 50
r3 4 5 6 10 11 12 23 27 30 37 39 41 43 47
49. Topic: Intersection
1.
2.
2.1.
2.2.
r1 1 4 6 10 12 16 22 29 30 31 37 40 43 47
r2 2 4 6 9 12 14 26 30 32 37 43 44 47 50
r3 4 5 6 10 11 12 23 27 30 37 39 41 43 47
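The text of the numbered steps on this slide is not recoverable, so the following is a sketch of a standard multi-pointer merge over sorted posting lists, not necessarily the procedure the slide described. The function name `intersect` is hypothetical, and the lists are assumed to be sorted in ascending order without duplicates. Every pointer only moves forward, so the number of loop iterations is bounded by the total length of the lists.

```python
def intersect(lists):
    """Return the values that appear in every sorted list."""
    if not lists:
        return []
    positions = [0] * len(lists)
    result = []
    while all(p < len(lst) for p, lst in zip(positions, lists)):
        heads = [lst[p] for p, lst in zip(positions, lists)]
        target = max(heads)
        if all(h == target for h in heads):
            result.append(target)                  # present in every list
            positions = [p + 1 for p in positions]
        else:
            # Advance only the lists whose head is behind the largest head.
            positions = [p + 1 if h < target else p
                         for p, h in zip(positions, heads)]
    return result

r1 = [1, 4, 6, 10, 12, 16, 22, 29, 30, 31, 37, 40, 43, 47]
r2 = [2, 4, 6, 9, 12, 14, 26, 30, 32, 37, 43, 44, 47, 50]
r3 = [4, 5, 6, 10, 11, 12, 23, 27, 30, 37, 39, 41, 43, 47]
common = intersect([r1, r2, r3])
print(common)  # [4, 6, 12, 30, 37, 43, 47]
```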
50. Topic:
MemoryBlockPool
withdraw deposit
…
Skip Pointer Invert Index Content
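The names `MemoryBlockPool`, `withdraw`, and `deposit` come from the slide; their behavior is not described there. The sketch below assumes a pool of fixed-size blocks that are handed out and later returned for reuse. The constructor parameters, the `available` helper, and the zeroing of returned blocks are hypothetical.

```python
class MemoryBlockPool:
    def __init__(self, block_size, block_count):
        self.block_size = block_size
        # Allocate every block up front so later requests need no new allocation.
        self._free = [bytearray(block_size) for _ in range(block_count)]

    def withdraw(self):
        """Take one free block out of the pool."""
        if not self._free:
            raise MemoryError("pool exhausted")
        return self._free.pop()

    def deposit(self, block):
        """Return a block to the pool, clearing its contents first."""
        if len(block) != self.block_size:
            raise ValueError("block does not belong to this pool")
        block[:] = bytes(self.block_size)
        self._free.append(block)

    def available(self):
        return len(self._free)

pool = MemoryBlockPool(block_size=16, block_count=2)
block = pool.withdraw()
block[0:3] = b"abc"
remaining = pool.available()
pool.deposit(block)
print(remaining, pool.available())
```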
51. Bibliography(1)
(1) I. Stoica, et al.; Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications; SIGCOMM 2001; October 2001
(2) D. Karger, et al.; Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web; STOC ’97; 1997
52. Bibliography(2)
(3) C. D. Manning, et al.; An Introduction to Information Retrieval; Cambridge UP; 2009
(4) T. Luu, et al.; ALVIS Peers: A Scalable Full-text Peer-to-Peer Retrieval Engine; P2PIR ’06; Nov. 2006