9. Introduction
2008
Twitter acquires Summize (MySQL-based RT search engine)
2009
2010
Modified Lucene (Earlybird) ships and replaces MySQL indexes
2011
New Earlybird features: image/video search; index compression;
efficient relevance search in time-sorted index
2012
2013
2014
Tweet archive search on SSD with vanilla Lucene
New RT posting list format that supports arbitrary document
lengths, but keeps performance optimizations for tweets
19. Search Architecture
RT index (Earlybird)
• Modified Lucene index implementation optimized for realtime search
• IndexWriter buffer is searchable (no need to flush to allow searching)
• In-memory
• Hash-partitioned, static layout
31. Search Architecture
Archive index
• Two tiers: in-memory and on SSD
• SSD index: much bigger index with more tweets; lower max. QPS, limited by SSD IOPS
• The SSD index only needs to be queried if the in-memory index did not yield enough results
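The fall-through between the two archive tiers can be sketched as follows. This is only an illustration of the idea on the slide, not Twitter's implementation: `search_archive` and the two tier callables are hypothetical names introduced here.

```python
def search_archive(query, num_results, search_memory_tier, search_ssd_tier):
    """Query the in-memory tier first and touch the SSD tier only if needed.

    Both tier arguments are hypothetical callables of the form
    (query, limit) -> list of hits; they stand in for the real indexes.
    """
    hits = list(search_memory_tier(query, num_results))
    if len(hits) < num_results:
        # Not enough results yet: spend SSD IOPS only on the remainder.
        hits += search_ssd_tier(query, num_results - len(hits))
    return hits[:num_results]
```

Because most queries are satisfied by the in-memory tier, the SSD tier sees only a fraction of the traffic, which is what makes its lower maximum QPS acceptable.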
38. Inverted Index 101
Table with 6 documents:
1  The old night keeper keeps the keep in the town
2  In the big old house in the big old gown.
3  The house in the town had the big old keep
4  Where the old night keeper never did sleep.
5  The night keeper keeps the keep in the night
6  And keeps in the dark and sleeps in the light.
Example from: Justin Zobel, Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR), v.38 n.2, p.6-es, 2006
39. Inverted Index 101
Dictionary and posting lists for the 6 documents above:
term    freq  posting list
and     1     <6>
big     2     <2> <3>
dark    1     <6>
did     1     <4>
gown    1     <2>
had     1     <3>
house   2     <2> <3>
in      5     <1> <2> <3> <5> <6>
keep    3     <1> <3> <5>
keeper  3     <1> <4> <5>
keeps   3     <1> <5> <6>
light   1     <6>
never   1     <4>
night   3     <1> <4> <5>
old     4     <1> <2> <3> <4>
sleep   1     <4>
sleeps  1     <6>
the     6     <1> <2> <3> <4> <5> <6>
town    2     <1> <3>
where   1     <4>
40. Inverted Index 101
Query: keeper
Matching dictionary entry and posting list:
term    freq  posting list
keeper  3     <1> <4> <5>
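As a rough sketch of the example above, the following builds the dictionary and posting lists for the six documents and answers the query. The tokenization (lowercasing, letters only) and the names `build_index` and `search` are assumptions made for this illustration.

```python
import re
from collections import defaultdict

DOCUMENTS = {
    1: "The old night keeper keeps the keep in the town",
    2: "In the big old house in the big old gown.",
    3: "The house in the town had the big old keep",
    4: "Where the old night keeper never did sleep.",
    5: "The night keeper keeps the keep in the night",
    6: "And keeps in the dark and sleeps in the light.",
}

def build_index(documents):
    """Map each term to the sorted list of IDs of the documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        # set() keeps one posting per document even if the term repeats.
        terms = set(re.findall(r"[a-z]+", documents[doc_id].lower()))
        for term in terms:
            index[term].append(doc_id)
    return dict(index)

def search(index, term):
    """Return the posting list for a single-term query."""
    return index.get(term.lower(), [])

index = build_index(DOCUMENTS)
print(search(index, "keeper"))  # [1, 4, 5]
```

The `freq` column of the dictionary corresponds to `len(index[term])`, the number of documents containing the term.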
44. Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Delta encoding: 5, 10, 8985, 2, 90998, 90
VInt compression of the first delta (5): 00000101
• Values 0 <= delta <= 127 need one byte
45. Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Delta encoding: 5, 10, 8985, 2, 90998, 90
VInt compression of the third delta (8985): 11000110 00011001
• Values 128 <= delta <= 16383 need two bytes
46. Posting list encoding
VInt compression of 8985: 11000110 00011001
• The first bit of each byte indicates whether the next byte belongs to the same value
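The two steps can be sketched in a few lines. This is an illustration only: the function names are made up here, and the byte order follows the slide (most significant 7-bit group first). Lucene's own `writeVInt` is understood to emit the low-order group first, so treat the exact layout as illustrative.

```python
def delta_encode(doc_ids):
    """Replace each docID by its difference to the previous one."""
    deltas, previous = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas

def vint_encode(value):
    """Encode a non-negative int with 7 payload bits per byte.

    The first bit of a byte is 1 if another byte of the same value follows.
    """
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()  # most significant group first, as drawn on the slide
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])

def vint_decode_all(data):
    """Decode a concatenation of VInts back into a list of ints."""
    values, current = [], 0
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        if not byte & 0x80:  # continuation bit clear: value is complete
            values.append(current)
            current = 0
    return values

deltas = delta_encode([5, 15, 9000, 9002, 100000, 100090])
encoded = b"".join(vint_encode(d) for d in deltas)
print(deltas)        # [5, 10, 8985, 2, 90998, 90]
print(len(encoded))  # 9
```

The six postings take 9 bytes instead of 24 bytes as plain 32-bit ints, but each value can only be recovered by decoding from the start of the list.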
47. Posting list encoding
• Variable number of bytes: a VInt-encoded posting cannot be written as a primitive Java type; therefore it cannot be written atomically
48. Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Delta encoding: 5, 10, 8985, 2, 90998, 90
Read direction: old to new
• Each posting depends on the previous one; decoding is only possible in old-to-new direction
• With recency ranking (new-to-old) no early termination is possible
49. Posting list encoding
• By default Lucene uses a combination of delta encoding and VInt
compression
• VInts are expensive to decode
• Problem 1: How to traverse posting lists backwards?
• Problem 2: How to write a posting atomically?
52. Posting list encoding in Earlybird v1
Each posting is one int (32 bits):
  docID         24 bits  max. 16.7M
  textPosition   8 bits  max. 255
• Tweet text can only have 140 chars, so 8 bits are enough for the text position
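A minimal sketch of this layout, assuming the docID occupies the upper 24 bits and the text position the lower 8 bits (the slide gives the field widths but not the bit order). `pack_posting` and `unpack_posting` are names invented for this example.

```python
DOC_ID_BITS = 24
POSITION_BITS = 8
MAX_DOC_ID = (1 << DOC_ID_BITS) - 1      # 16,777,215 (about 16.7M)
MAX_POSITION = (1 << POSITION_BITS) - 1  # 255

def pack_posting(doc_id, text_position):
    """Combine docID and text position into one 32-bit value."""
    if not 0 <= doc_id <= MAX_DOC_ID:
        raise ValueError("docID does not fit into 24 bits")
    if not 0 <= text_position <= MAX_POSITION:
        raise ValueError("text position does not fit into 8 bits")
    # Assumed bit order: docID in the high bits, position in the low bits.
    return (doc_id << POSITION_BITS) | text_position

def unpack_posting(posting):
    """Split a packed posting back into (docID, text position)."""
    return posting >> POSITION_BITS, posting & MAX_POSITION
```

With the docID in the high bits, packed postings sort in docID order, and every posting can be decoded on its own without reading its neighbours. In Java the same value would live in a signed `int`, so an unsigned shift (`>>>`) would be needed when unpacking.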
53. Posting list encoding in Earlybird v1
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Earlybird encoding: 5, 15, 9000, 9002, 100000, 100090 (absolute docIDs, no deltas)
Read direction: new to old
54. Early query termination
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Earlybird encoding: 5, 15, 9000, 9002, 100000, 100090
Read direction: new to old
• E.g. if 3 results are requested, we can terminate after reading 3 postings
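Early termination on a list of absolute docIDs can be sketched like this. The flat Python list and the name `newest_first` are simplifications for illustration; they stand in for the real posting list and are not part of Earlybird.

```python
def newest_first(postings, num_results):
    """Walk the posting list from newest to oldest and stop early.

    Returns the collected docIDs and the number of postings read.
    """
    if num_results <= 0:
        return [], 0
    results, reads = [], 0
    for doc_id in reversed(postings):
        reads += 1
        results.append(doc_id)
        if len(results) == num_results:
            break  # enough results: older postings are never touched
    return results, reads

print(newest_first([5, 15, 9000, 9002, 100000, 100090], 3))
# ([100090, 100000, 9002], 3)
```

With delta encoding the same query would first have to decode the whole list from the oldest posting onward.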
55. Inverted index components
• Dictionary
• Parallel arrays: pointer to the most recently indexed posting for a term
• Posting list storage: ?
57. Posting lists storage - Objectives
• Store many singly linked lists of different lengths space-efficiently
• The number of Java objects should be independent of the number of lists or the number of items in the lists
• Every item should be a possible entry point into the lists for iterators, i.e. items should not depend on other items (e.g. no delta encoding)
• Append and read must be possible from multiple threads in a lock-free fashion (single append thread, multiple reader threads)
• Traversal in backwards order
60. Memory management
4 int[] pools
• For simplicity we can forget about the blocks for now and think of the pools as contiguous, unbounded int[] arrays
• Small total number of Java objects (each 32K block is one object)
62. Adding and appending to a list
[Figure: four pools with slice sizes 2^11, 2^7, 2^4 and 2^1; legend: available, allocated, current list]
63. Adding and appending to a list
[Figure: four pools with slice sizes 2^11, 2^7, 2^4 and 2^1; legend: available, allocated, current list]
Store the first two postings in this slice (the 2^1 slice)
64. Adding and appending to a list
[Figure: four pools with slice sizes 2^11, 2^7, 2^4 and 2^1; legend: available, allocated, current list]
When the first slice is full, allocate another one in the second pool
65. Adding and appending to a list
[Figure: four pools with slice sizes 2^11, 2^7, 2^4 and 2^1; legend: available, allocated, current list]
Allocate a slice on each level as the list grows
66. Adding and appending to a list
[Figure: four pools with slice sizes 2^11, 2^7, 2^4 and 2^1; legend: available, allocated, current list]
On the uppermost level one list can own multiple slices
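The growth pattern across the four pools can be sketched as follows. This is a simplified model: it ignores any slots a slice may reserve for pointers or headers, and `slices_for` is a name made up for this example.

```python
SLICE_SIZES = [2**1, 2**4, 2**7, 2**11]  # one slice size per pool

def slices_for(num_postings):
    """Return the sizes of the slices a list with num_postings items owns."""
    slices = []
    remaining = num_postings
    level = 0
    while remaining > 0:
        size = SLICE_SIZES[level]
        slices.append(size)
        remaining -= size
        # Move up one pool per allocation; stay on the uppermost level.
        level = min(level + 1, len(SLICE_SIZES) - 1)
    return slices

print(slices_for(2))     # [2]
print(slices_for(3))     # [2, 16]
print(slices_for(2195))  # [2, 16, 128, 2048, 2048]
```

Short lists, which are the vast majority, waste little space in small slices, while long lists quickly reach the large slices and need few allocations.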
67. Posting list format v1
Each posting is one int (32 bits):
  docID         24 bits  max. 16.7M
  textPosition   8 bits  max. 255
• Tweet text can only have 140 chars
68. Addressing items
• Use 32 bit (int) pointers to address any item in any list unambiguously:
  poolIndex         2 bits      0-3
  sliceIndex       19-29 bits   depends on pool
  offset in slice   1-11 bits   depends on pool
• Nice symmetry: Postings and address pointers both fit into a 32 bit int
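A sketch of such a pointer, assuming the fields are laid out from high to low bits in the order listed (poolIndex, sliceIndex, offset) and that the offset width equals log2 of the pool's slice size. The function names are hypothetical.

```python
POOL_BITS = 2
OFFSET_BITS = [1, 4, 7, 11]  # log2 of the slice sizes 2^1, 2^4, 2^7, 2^11

def slice_index_bits(pool):
    """Bits left for the slice index once pool and offset are accounted for."""
    return 32 - POOL_BITS - OFFSET_BITS[pool]

def encode_pointer(pool, slice_index, offset):
    """Pack (pool, slice, offset) into one 32-bit value."""
    if not 0 <= pool < len(OFFSET_BITS):
        raise ValueError("pool index must be 0-3")
    if not 0 <= offset < (1 << OFFSET_BITS[pool]):
        raise ValueError("offset outside slice")
    if not 0 <= slice_index < (1 << slice_index_bits(pool)):
        raise ValueError("slice index too large for this pool")
    return (pool << 30) | (slice_index << OFFSET_BITS[pool]) | offset

def decode_pointer(pointer):
    """Recover (pool, slice, offset); the pool decides the field widths."""
    pool = pointer >> 30
    offset = pointer & ((1 << OFFSET_BITS[pool]) - 1)
    slice_index = (pointer & ((1 << 30) - 1)) >> OFFSET_BITS[pool]
    return pool, slice_index, offset
```

The pool index has to be read first because it determines how the remaining 30 bits are split, giving 29, 26, 23 and 19 bits for the slice index in pools 0 to 3.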
70. Linking the slices
[Figure: four pools with slice sizes 2^11, 2^7, 2^4 and 2^1; legend: available, allocated, current list]
• Dictionary
• Parallel arrays: pointer to the last posting indexed for a term
71. Posting list encoding - Summary
• ints can be written atomically in Java
• Backwards traversal is easy with absolute docIDs (not deltas)
• Every posting is a possible entry point for a searcher
• Skipping can be done by binary search without additional data structures, though there are better approaches (skip lists)
• Repeating the docID when a term occurs multiple times in the same document only works for small docs
• Max. segment size: 2^24 = 16.7M tweets
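Skipping by binary search can be sketched on packed postings. The sketch assumes a flat, ascending list of 32-bit postings with the docID in the upper 24 bits; the real structure is spread over linked slices, and `skip_to` is a name invented here.

```python
from bisect import bisect_left

POSITION_BITS = 8  # low bits hold the text position (assumed bit order)

def skip_to(postings, target_doc_id):
    """Index of the first posting whose docID is >= target_doc_id.

    Works directly on the packed ints: with the docID in the high bits,
    the smallest possible posting for a docID is docID << 8.
    """
    return bisect_left(postings, target_doc_id << POSITION_BITS)

postings = [(5 << 8) | 3, (15 << 8) | 0, (9000 << 8) | 7,
            (9002 << 8) | 1, (100000 << 8) | 2, (100090 << 8) | 9]
print(skip_to(postings, 9001))  # 3
```

No decoding state is needed to land on an arbitrary posting, which is what the fixed-width, absolute encoding buys over delta-encoded VInts.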
72. New posting list encoding
• Objectives:
• 32 bit positions and variable-length payloads
• Store term frequency (TF) instead of repeating docIDs
• Keep:
• Concurrency model
• Space-efficiency for short documents
• Performance
79. New posting list encoding
[Figure: two parallel streams]
DocID/TF stream: ... | DocID, termFreq | DocID, termFreq | DocID, termFreq | ... (fixed length for each posting: 32 bits)
Position/payload stream: Position, Payload | Position, Payload, Position | ... | Position, Payload
• Store TF instead of repeating the same DocID
• Store DocID/TF pairs separately from positions/payloads
• Find a way to decode the two streams in sync without storing a pointer for each posting (expensive)
80. New posting list encoding
• Idea: Use the entries of an embedded skip list as periodic “synchronization points”
• Keeps memory overhead for pointers low and improves search performance
81. New posting list encoding
[Figure: four pools with slice sizes 2^11, 2^7, 2^4 and 2^1; legend: available, allocated, current list]
82. New posting list encoding
Slice header
• Header contains:
• Back-pointer to previous slice (as before)
• Skip list
• Slice id
83. New posting list encoding
Previous format, one int (32 bits) per posting:
  docID         24 bits  max. 16.7M
  textPosition   8 bits  max. 255
• Observation: Most tweets don’t need all 8 bits for text position
• Idea: Use the position “inlining” approach for short documents, but support Lucene’s 32-bit positions and variable-length payloads
84. New posting list encoding
int (32 bits):
  docID                      24 bits  max. 16.7M
  textPosition or termFreq    7 bits  max. 127
  flag                        1 bit   0 = textPosition, 1 = termFreq
As a storage optimization, the text position is stored with the docID if all of the following hold:
• termFreq == 1 (the term occurs only once in the doc)
• textPosition <= 127
• The posting has no payload
• The posting is not at a skip point of the docID posting list (see later)
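The inlining rule can be sketched as follows. The bit order (docID high, 7-bit value in the middle, flag in the lowest bit) is an assumption, as are the function names; the sketch also does not model term frequencies above 127 or the separate position/payload stream.

```python
def encode_doc_posting(doc_id, term_freq, text_position,
                       has_payload=False, at_skip_point=False):
    """Pack docID plus either an inlined position or the term frequency.

    text_position is the position of the first occurrence of the term.
    """
    if not 0 <= doc_id < (1 << 24):
        raise ValueError("docID does not fit into 24 bits")
    inline = (term_freq == 1 and 0 <= text_position <= 127
              and not has_payload and not at_skip_point)
    if inline:
        value, flag = text_position, 0  # flag 0: value is the text position
    else:
        if not 1 <= term_freq <= 127:
            raise ValueError("this sketch only covers termFreq 1..127")
        value, flag = term_freq, 1      # flag 1: value is the term frequency
    return (doc_id << 8) | (value << 1) | flag

def decode_doc_posting(posting):
    """Return (docID, value, is_term_freq)."""
    return posting >> 8, (posting >> 1) & 0x7F, bool(posting & 1)
```

For a typical tweet, where a term occurs once near the start and carries no payload, the whole posting still fits into a single int, so the common case costs no more than the previous format.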
85. New posting list encoding - Summary
• Support for 32 bit positions and arbitrary-length payloads, stored in a separate data structure
• Performance and space consumption for tweet search are very similar to the previous encoding
• Skip lists are used for speed and as synchronization points
• For short documents positions can still be inlined