Search @twitter 
Michael Busch 
@michibusch 
michael@twitter.com 
buschmi@apache.org
Search @twitter 
Agenda 
‣ Introduction 
- Search Architecture 
- Lucene Extensions 
- Outlook
Introduction
Introduction 
Twitter has more than 284 million 
monthly active users.
Introduction 
500 million tweets are sent per day.
Introduction 
More than 300 billion tweets have been 
sent since company founding in 2006.
Introduction 
Tweets-per-second record: 
one-second peak of 143,199 TPS.
Introduction 
More than 2 billion search queries per 
day.
Search @twitter 
Agenda 
- Introduction 
‣ Search Architecture 
- Lucene Extensions 
- Outlook
Search Architecture
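Search Architecture 
[Diagram: raw tweets from the RT stream pass through the Analyzer/Partitioner, which writes analyzed tweets to the RT index (Earlybird) partitions. Raw tweets are also kept in the tweet archive on HDFS, where a Mapreduce Analyzer produces analyzed tweets for the Archive index. Search requests go to the Blender, which searches the RT index and the Archive index.]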
Search Architecture 
[Diagram: the same architecture with an update path. Tweets enter through a queue and reach the Analyzer/Partitioner and HDFS. Updates (deletes and engagement, e.g. retweets/favs) are written to the RT index (Earlybird). The Mapreduce Analyzer feeds the Archive index, and the Blender serves search requests.]
Search Architecture 
• Blender is our Thrift service aggregator 
• Queries multiple Earlybirds, merges results 
[Diagram: search requests reach the Blender, which searches the RT index (Earlybird), the Archive index, User search, and the Social graph.]
Search Architecture 
[Diagram: RT index (Earlybird), Archive index, and User search, each built on Lucene.]
• For historic reasons, these used to be entirely different codebases, but had similar features/technologies 
• Over time cross-dependencies were introduced to share code
Search Architecture 
[Diagram: RT index (Earlybird), Archive index, and User search on top of a shared Lucene Extensions layer, which sits on Lucene.]
• New Lucene extension package 
• This package is truly generic and has no dependency on an actual product/index 
• It contains Twitter’s extensions for real-time search, a thin segment management layer and other features
Search @twitter 
Agenda 
- Introduction 
- Search Architecture 
‣ Lucene Extensions 
- Outlook
Lucene Extensions
Lucene Extension Library 
• Abstraction layer for Lucene index segments 
• Real-time writer for in-memory index segments 
• Schema-based Lucene document factory 
• Real-time faceting
Lucene Extension Library 
• API layer for Lucene segments 
• *IndexSegmentWriter 
• *IndexSegmentAtomicReader 
• Two implementations 
• In-memory: RealtimeIndexSegmentWriter (and reader) 
• On-disk: LuceneIndexSegmentWriter (and reader)
Lucene Extension Library 
• IndexSegments can be built ... 
• in realtime 
• on Mesos or Hadoop (Mapreduce) 
• locally on serving machines 
• Cluster-management code that deals with IndexSegments 
• Share segments across serving machines using HDFS 
• Can rebuild segments (e.g. to upgrade Lucene version, change data 
schema, etc.)
Lucene Extension Library 
[Diagram: Earlybird instances, Mesos, Hadoop (MR), and the RT pipeline all connected to HDFS.]
RealtimeIndexSegmentWriter 
• Modified Lucene index implementation optimized for realtime search 
• IndexWriter buffer is searchable (no need to flush to allow searching) 
• In-memory 
• Lock-free concurrency model for best performance
Concurrency - Definitions 
• Pessimistic locking 
• A thread holds an exclusive lock on a resource, while an action is 
performed [mutual exclusion] 
• Usually used when conflicts are expected to be likely 
• Optimistic locking 
• Operations are attempted atomically without holding a lock; 
conflicts can be detected, and retry logic is often used when they occur 
• Usually used when conflicts are expected to be the exception
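The two styles defined above can be contrasted with a small counter. This is a minimal sketch and not code from the talk: the class names LockingStyles, PessimisticCounter, and OptimisticCounter are hypothetical, and the optimistic variant assumes a compare-and-set retry loop on java.util.concurrent.atomic.AtomicLong as the conflict detection mechanism.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters used only to contrast the two locking styles.
final class LockingStyles {

    // Pessimistic: every caller takes an exclusive lock before touching the value.
    static final class PessimisticCounter {
        private long value;

        synchronized long addAndGet(long delta) {
            value += delta;
            return value;
        }
    }

    // Optimistic: read, compute, then try to publish with compare-and-set.
    // If another thread changed the value in between, the CAS fails and we retry.
    static final class OptimisticCounter {
        private final AtomicLong value = new AtomicLong();

        long addAndGet(long delta) {
            while (true) {
                long current = value.get();
                long next = current + delta;
                if (value.compareAndSet(current, next)) {
                    return next;
                }
                // Conflict detected: loop and try again with the fresh value.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticCounter optimistic = new OptimisticCounter();
        PessimisticCounter pessimistic = new PessimisticCounter();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                optimistic.addAndGet(1);
                pessimistic.addAndGet(1);
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Both counters end at the same total; neither loses an update.
        System.out.println(optimistic.addAndGet(0) + " " + pessimistic.addAndGet(0));
    }
}
```

The retry loop only repeats when another thread won the race, which is why this style suits workloads where conflicts are the exception.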
Concurrency - Definitions 
• Non-blocking algorithm 
Ensures that threads competing for shared resources do not have their 
execution indefinitely postponed by mutual exclusion. 
• Lock-free algorithm 
A non-blocking algorithm is lock-free if there is guaranteed system-wide 
progress. 
• Wait-free algorithm 
A non-blocking algorithm is wait-free if there is guaranteed per-thread 
progress. 
* Source: Wikipedia
Concurrency 
• Having a single writer thread simplifies our problem: no locks have to be used 
to protect data structures from corruption (only one thread modifies data) 
• But: we have to make sure that all readers always see a consistent state of 
all data structures -> this is much harder than it sounds! 
• In Java, it is not guaranteed that one thread will see changes that another 
thread makes in program execution order, unless the same memory barrier is 
crossed by both threads -> safe publication 
• Safe publication can be achieved in different, subtle ways. Read the great 
book “Java Concurrency in Practice” by Brian Goetz for more information!
Java Memory Model 
• Program order rule 
Each action in a thread happens-before every action in that thread that comes 
later in the program order. 
• Volatile variable rule 
A write to a volatile field happens-before every subsequent read of that same 
field. 
• Transitivity 
If A happens-before B, and B happens-before C, then A happens-before C. 
* Source: Brian Goetz: Java Concurrency in Practice
Concurrency 
[Diagram: a timeline for Thread 1 and Thread 2, with a cache between the threads and RAM. RAM holds int x with the value 0.]
Thread 1: x = 5; 
Thread 1 writes x=5 to its cache; RAM still holds 0. 
Thread 2: while(x != 5); 
This condition will likely 
never become false!
Concurrency 
[Diagram: the same timeline with an additional field, volatile int b. RAM holds int x with the value 0 at the start and the values 5 (for x) and 1 (for b) after Thread 1 has run.]
Thread 1: x = 5; b = 1; 
Thread 2: int dummy = b; while(x != 5); 
• Thread 1 writes b=1 to RAM, because b is volatile 
• Thread 2 reads volatile b (int dummy = b;) before it reads x 
happens-before 
• Program order rule: Each action in a thread happens-before every action in 
that thread that comes later in the program order. Here: x = 5 happens-before b = 1, and int dummy = b happens-before while(x != 5). 
• Volatile variable rule: A write to a volatile field happens-before every 
subsequent read of that same field. Here: b = 1 happens-before a later int dummy = b. 
• Transitivity: If A happens-before B, and B happens-before C, then A 
happens-before C. Here: x = 5 happens-before while(x != 5). 
• If Thread 2's read of b comes after Thread 1's write of b, the loop condition will be 
false, i.e. x==5 
Memory barrier 
• The volatile write and the volatile read of b act as the memory barrier that both threads cross. 
• Note: x itself doesn’t have to be volatile. There can be many variables like x, 
but we need only a single volatile field.
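The happens-before chain above can be written out as a small Java class. This is a sketch under assumptions: the class name VolatilePiggyback and its methods are hypothetical, and unlike the slide the reader spins on the volatile field b rather than reading it once, so the example terminates regardless of which thread runs first.

```java
// Minimal sketch of the slide's example; class and method names are hypothetical.
final class VolatilePiggyback {
    private int x;            // plain field, deliberately not volatile
    private volatile int b;   // the single volatile field that acts as the barrier

    // Thread 1: plain write first, volatile write second.
    void writer() {
        x = 5;
        b = 1; // volatile write: everything written before it becomes visible with it
    }

    // Thread 2: wait until the volatile write is observed, then read the plain field.
    int reader() {
        while (b != 1) {
            // busy-wait on the volatile field; each iteration is a volatile read
        }
        // Program order + volatile rule + transitivity: x = 5 happens-before this read.
        return x;
    }

    // Runs one writer and one reader and returns the value of x the reader saw.
    static int runOnce() {
        VolatilePiggyback shared = new VolatilePiggyback();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = shared.reader());
        reader.start();
        shared.writer();
        try {
            reader.join(); // join also establishes happens-before for seen[0]
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 5
    }
}
```

If b were declared without volatile, the Java Memory Model would give no guarantee that the reader ever leaves the loop or that it sees x == 5.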
Demo
Concurrency 
[Diagram: a timeline for IndexWriter and IndexReader.]
IndexWriter: write 100 docs; maxDoc = 100; write more docs 
IndexReader: in IR.open(): read maxDoc; search up to maxDoc 
maxDoc is volatile 
happens-before 
• Only maxDoc is volatile. All other fields that IW writes to and IR reads from 
don’t need to be!
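The maxDoc pattern can be sketched as a tiny append-only structure with one writer thread and any number of readers. This is an illustration under assumptions, not Twitter's implementation: AppendOnlySegment, addDocument, and sumVisibleDocs are hypothetical names, the capacity is fixed, and each "document" is reduced to a single int.

```java
// Hypothetical append-only segment: one writer thread, many reader threads.
final class AppendOnlySegment {
    private final int[] docValues;    // plain array, written only by the writer thread
    private volatile int maxDoc = 0;  // the only volatile field

    AppendOnlySegment(int capacity) {
        docValues = new int[capacity];
    }

    // Writer thread only: fill the slot first, then publish it through maxDoc.
    // Throws ArrayIndexOutOfBoundsException once the fixed capacity is exceeded.
    void addDocument(int value) {
        int docId = maxDoc;           // single writer, so nobody else changes maxDoc
        docValues[docId] = value;     // plain write
        maxDoc = docId + 1;           // volatile write: makes the slot visible to readers
    }

    // Reader threads: snapshot maxDoc once (like IR.open()), then only touch
    // documents below that snapshot. Later writes are simply not looked at.
    long sumVisibleDocs() {
        int limit = maxDoc;           // volatile read
        long sum = 0;
        for (int docId = 0; docId < limit; docId++) {
            sum += docValues[docId];
        }
        return sum;
    }

    int maxDoc() {
        return maxDoc;
    }

    public static void main(String[] args) {
        AppendOnlySegment segment = new AppendOnlySegment(1_000);
        for (int i = 0; i < 100; i++) {
            segment.addDocument(1);
        }
        System.out.println(segment.maxDoc() + " docs, sum " + segment.sumVisibleDocs());
    }
}
```

The ordering inside addDocument matters: if maxDoc were incremented before the slot is written, a reader could search a document that is not fully there yet.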
Wait-free 
• Not a single exclusive lock 
• Writer thread can always make progress 
• Optimistic locking (retry-logic) in a few places for searcher thread 
• Retry logic very simple and guaranteed to always make progress
In-memory Real-time Index 
• Highly optimized for GC - all data is stored in blocked native arrays 
• v1: Optimized for tweets with a term position limit of 255 
• v2: Support for 32 bit positions without performance degradation 
• v2: Basic support for out-of-order posting list inserts
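One way to read "blocked native arrays" is a pool that stores all values in a few large int[] blocks instead of one object per entry, so the garbage collector has very few objects to trace. The sketch below illustrates that idea only; IntBlockPool, its block size of 32768 ints, and its addressing scheme are assumptions and are not taken from the talk. It is single-threaded as written.

```java
import java.util.Arrays;

// Hypothetical pool that stores ints in fixed-size blocks instead of one object per value.
final class IntBlockPool {
    private static final int BLOCK_BITS = 15;              // assumed block size: 32768 ints
    private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
    private static final int BLOCK_MASK = BLOCK_SIZE - 1;

    private int[][] blocks = new int[16][];
    private int size = 0;  // number of ints stored so far

    // Appends a value and returns its global address.
    int add(int value) {
        int blockIndex = size >>> BLOCK_BITS;
        if (blockIndex == blocks.length) {
            // Only the small array of block references is copied, never the data itself.
            blocks = Arrays.copyOf(blocks, blocks.length * 2);
        }
        if (blocks[blockIndex] == null) {
            blocks[blockIndex] = new int[BLOCK_SIZE];
        }
        blocks[blockIndex][size & BLOCK_MASK] = value;
        return size++;
    }

    // Reads the value stored at a global address returned by add().
    int get(int address) {
        return blocks[address >>> BLOCK_BITS][address & BLOCK_MASK];
    }

    public static void main(String[] args) {
        IntBlockPool pool = new IntBlockPool();
        int address = pool.add(42);
        System.out.println(pool.get(address)); // prints 42
    }
}
```

Existing blocks are never moved or resized, so an address stays valid for the lifetime of the pool.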
In-memory Real-time Index 
• RT term dictionary 
• Term lookups using a lock-free hashtable in O(1) 
• v2: Additional probabilistic, lock-free skip list maintains ordering on terms 
• Perfect skip list not an option: out-of-order inserts would require 
rebalancing, which is impractical with our lock-free index 
• In a probabilistic skip list the tower height of a new (out-of-order) item can 
be determined without knowing its insert position by simply rolling a die
In-memory Real-time Index 
• Perfect skip list 
Inserting a new element in the middle of this 
skip list requires re-balancing the towers.
In-memory Real-time Index 
• Probabilistic skip list 
Tower height is determined by rolling a die 
BEFORE knowing the insert location; tower height 
never has to change for an element, simplifying 
memory allocation and concurrency.
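"Rolling a die" for the tower height is commonly done with repeated fair coin flips, which is what the sketch below assumes. TowerHeights, randomHeight, and the cap of 16 levels are hypothetical choices for illustration; the talk does not state the probability or the maximum height used.

```java
import java.util.Random;

// Hypothetical helper that picks a skip list tower height without looking at the list.
final class TowerHeights {
    static final int MAX_HEIGHT = 16; // assumed cap on tower height

    // Flip a fair coin until it comes up tails: height 1 with probability 1/2,
    // height 2 with probability 1/4, and so on, capped at MAX_HEIGHT.
    static int randomHeight(Random random) {
        int height = 1;
        while (height < MAX_HEIGHT && random.nextBoolean()) {
            height++;
        }
        return height;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int[] histogram = new int[MAX_HEIGHT + 1];
        for (int i = 0; i < 1_000_000; i++) {
            histogram[randomHeight(random)]++;
        }
        // Each level should hold roughly half as many towers as the level below it.
        for (int height = 1; height <= 5; height++) {
            System.out.println("height " + height + ": " + histogram[height]);
        }
    }
}
```

Because the height depends only on the random source, memory for the tower can be allocated before the insert position is known, and no existing tower ever has to grow or shrink.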
Schema-based Document factory 
• Apps provide one ThriftSchema per index and create a ThriftDocument for 
each document 
• SchemaDocumentFactory translates ThriftDocument -> Lucene Document 
using the Schema 
• Default field values 
• Extended field settings 
• Type-system on top of DocValues 
• Validation
Schema-based Document factory 
[Diagram: a Thrift Document and the Schema are the inputs of the SchemaDocumentFactory, which outputs a Lucene Document.]
• Validation 
• Fill in default values 
• Apply correct Lucene field settings 
Decouples core package from specific product/index. Similar to Solr/ElasticSearch.
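The validation and default-value steps can be sketched without any Lucene or Thrift dependency. Everything in this example is hypothetical: SchemaFactorySketch, FieldSpec, and FieldType are stand-ins, plain maps replace ThriftDocument and the Lucene Document, and the step that applies Lucene field settings is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a schema-driven document factory.
final class SchemaFactorySketch {

    enum FieldType { INT, STRING }

    // One schema entry: the expected type and an optional default value.
    static final class FieldSpec {
        final FieldType type;
        final Object defaultValue; // null means the field is required

        FieldSpec(FieldType type, Object defaultValue) {
            this.type = type;
            this.defaultValue = defaultValue;
        }
    }

    // Validates the input against the schema and fills in default values.
    static Map<String, Object> createDocument(Map<String, FieldSpec> schema,
                                              Map<String, Object> input) {
        for (String name : input.keySet()) {
            if (!schema.containsKey(name)) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
        }
        Map<String, Object> document = new LinkedHashMap<>();
        for (Map.Entry<String, FieldSpec> entry : schema.entrySet()) {
            String name = entry.getKey();
            FieldSpec spec = entry.getValue();
            Object value = input.containsKey(name) ? input.get(name) : spec.defaultValue;
            if (value == null) {
                throw new IllegalArgumentException("Missing required field: " + name);
            }
            boolean typeOk = spec.type == FieldType.INT
                    ? value instanceof Integer
                    : value instanceof String;
            if (!typeOk) {
                throw new IllegalArgumentException("Wrong type for field: " + name);
            }
            document.put(name, value);
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, FieldSpec> schema = new LinkedHashMap<>();
        schema.put("text", new FieldSpec(FieldType.STRING, null));
        schema.put("lang", new FieldSpec(FieldType.STRING, "en"));
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("text", "hello");
        System.out.println(createDocument(schema, input)); // prints {text=hello, lang=en}
    }
}
```

Keeping all field knowledge in the schema is what lets the factory stay generic: a new index only needs a new schema, not new factory code.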
Search @twitter 
Agenda 
- Introduction 
- Search Architecture 
- Lucene Extensions 
‣ Outlook
Outlook
Outlook 
• Support for parallel (sliced) segments to support partial segment rebuilds 
and other cool posting list update patterns 
• Add remaining missing Lucene features to RT index 
• Index term statistics for ranking 
• Term vectors 
• Stored fields
Questions? 
Michael Busch 
@michibusch 
michael@twitter.com 
buschmi@apache.org
Backup Slides
Searching for top entities within Tweets 
• Task: Find the best photos in a subset of tweets 
• We could use a Lucene index, where each photo is a document 
• Problem: How to update existing documents when the same photos are 
tweeted again? 
• In-place posting list updates are hard 
• Lucene’s updateDocument() is a delete/add operation - expensive and not 
order-preserving
Searching for top entities within Tweets 
• Task: Find the best photos in a subset of tweets 
• Could we use our existing time-ordered tweet index? 
• Facets!
Searching for top entities within Tweets 
[Diagram: three index structures.]
• Inverted index: Query -> Doc ids; Term id -> Term label 
• Forward index: Doc id -> Document metadata 
• Facet index: Doc id -> Term ids
Storing tweet metadata 
Facet index: Doc id -> Term ids
Searching for top entities within Tweets 
[Diagram: the Query produces matching doc ids (5, 15, 9000, 9002, 100000, 100090). Each matching doc id is looked up in the Facet index to get its term ids, which are counted in a top-k heap.]
Top-k heap, early in the scan: 
Id      Count 
48239   8 
31241   2 
Top-k heap, later in the scan: 
Id      Count 
48239   15 
31241   12 
85932   8 
6748    3 
• Weighted counts (from engagement features) used for relevance scoring 
• All query operators can be used. E.g. find best photos in San Francisco tweeted by people I follow
Searching for top entities within Tweets 
Inverted index: Term id -> Term label 
Id      Count   Label 
48239   45      pic.twitter.com/jknui4w 
31241   23      pic.twitter.com/dslkfj83 
85932   15      pic.twitter.com/acm3ps 
6748    11      pic.twitter.com/948jdsd 
74294   8       pic.twitter.com/dsjkf15h 
3728    5       pic.twitter.com/irnsoa32
Summary 
• Indexing tweet entities (e.g. photos) as facets makes it possible to search and rank top entities 
using a tweets index 
• All query operators supported 
• Documents don’t need to be reindexed 
• Approach reusable for different use cases, e.g.: best vines, hashtags, 
@mentions, etc.
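The facet counting flow from these slides (matching doc ids -> facet index -> term ids -> top-k heap) can be sketched as follows. This is an illustration under assumptions: FacetTopK and topK are hypothetical names, a HashMap stands in for the facet index, and plain occurrence counts are used where the slides mention weighted counts from engagement features.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of top-k facet counting over the doc ids matched by a query.
final class FacetTopK {

    // facetIndex maps doc id -> term ids of the entities (e.g. photos) in that tweet.
    // Returns up to k (term id, count) entries, highest count first.
    static List<Map.Entry<Integer, Integer>> topK(int[] matchingDocIds,
                                                  Map<Integer, int[]> facetIndex,
                                                  int k) {
        // Step 1: count how often each term id occurs in the matching documents.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int docId : matchingDocIds) {
            int[] termIds = facetIndex.get(docId);
            if (termIds == null) {
                continue; // document has no facets
            }
            for (int termId : termIds) {
                counts.merge(termId, 1, Integer::sum);
            }
        }
        // Step 2: keep only the k largest counts in a min-heap.
        Comparator<Map.Entry<Integer, Integer>> byCount =
                (a, b) -> Integer.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<Integer, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest count
            }
        }
        // Step 3: sort the survivors by descending count.
        List<Map.Entry<Integer, Integer>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> facetIndex = new HashMap<>();
        facetIndex.put(5, new int[] {48239});
        facetIndex.put(15, new int[] {48239, 31241});
        facetIndex.put(9000, new int[] {31241, 48239});
        facetIndex.put(9002, new int[] {85932});
        System.out.println(topK(new int[] {5, 15, 9000, 9002}, facetIndex, 2));
    }
}
```

The term ids that survive in the heap would then be resolved to labels (such as photo URLs) through the inverted index, as on the preceding slide. Since the counts are computed at query time, re-tweeting a photo only adds a new tweet document and never requires updating an existing one.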

Weitere ähnliche Inhalte

Was ist angesagt?

Dictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit PalDictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit PalSpark Summit
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchSigmoid
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and SparkLucidworks
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...Lucidworks
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Shalin Shekhar Mangar
 
Improved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert MuirImproved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert Muirlucenerevolution
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 
Understanding Lucene Search Performance
Understanding Lucene Search PerformanceUnderstanding Lucene Search Performance
Understanding Lucene Search PerformanceLucidworks (Archived)
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Lucidworks
 
LuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity LinkageLuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity Linkagezouzias
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...Lucidworks
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedBeyondTrees
 
Dictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLP
Dictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLPDictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLP
Dictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLPSujit Pal
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Spark Summit
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Spark Summit
 
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...Databricks
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksLucidworks
 

Was ist angesagt? (20)

Dictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit PalDictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit Pal
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and Elasticsearch
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Improved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert MuirImproved Search with Lucene 4.0 - Robert Muir
Improved Search with Lucene 4.0 - Robert Muir
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
Understanding Lucene Search Performance
Understanding Lucene Search PerformanceUnderstanding Lucene Search Performance
Understanding Lucene Search Performance
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
LuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity LinkageLuceneRDD for (Geospatial) Search and Entity Linkage
LuceneRDD for (Geospatial) Search and Entity Linkage
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Dictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLP
Dictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLPDictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLP
Dictionary based Annotation at Scale with Spark, SolrTextTagger and OpenNLP
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
 
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
Spark DataFrames: Simple and Fast Analytics on Structured Data at Spark Summi...
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
 

Andere mochten auch

Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Realtime Search at Twitter - Michael Busch
Realtime Search at Twitter - Michael BuschRealtime Search at Twitter - Michael Busch
Realtime Search at Twitter - Michael Buschlucenerevolution
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Lucidworks
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Type-Safe MongoDB query (Lift Rogue query)
Type-Safe MongoDB query (Lift Rogue query)Type-Safe MongoDB query (Lift Rogue query)
Type-Safe MongoDB query (Lift Rogue query)Knoldus Inc.
 
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014Grid Dynamics
 
This Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, Lucidworks
This Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, LucidworksThis Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, Lucidworks
This Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, LucidworksLucidworks
 
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...Lucidworks
 
Lucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David SmileyLucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David SmileyLucidworks
 
SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...
SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...
SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...Lucidworks
 
Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
Search Architecture at Evernote: Presented by Christian Kohlschütter, EvernoteSearch Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
Search Architecture at Evernote: Presented by Christian Kohlschütter, EvernoteLucidworks
 
Lucene/Solr Revolution 2015 Opening Keynote with Lucidworks CEO Will Hayes
Lucene/Solr Revolution 2015 Opening Keynote with Lucidworks CEO Will HayesLucene/Solr Revolution 2015 Opening Keynote with Lucidworks CEO Will Hayes
Lucene/Solr Revolution 2015 Opening Keynote with Lucidworks CEO Will HayesLucidworks
 
A Survey of Elasticsearch Usage
A Survey of Elasticsearch UsageA Survey of Elasticsearch Usage
A Survey of Elasticsearch UsageGreg Brown
 
Evolving Search Relevancy: Presented by James Strassburg, Direct Supply
Evolving Search Relevancy: Presented by James Strassburg, Direct SupplyEvolving Search Relevancy: Presented by James Strassburg, Direct Supply
Evolving Search Relevancy: Presented by James Strassburg, Direct SupplyLucidworks
 
MongoDB: Queries and Aggregation Framework with NBA Game Data
MongoDB: Queries and Aggregation Framework with NBA Game DataMongoDB: Queries and Aggregation Framework with NBA Game Data
MongoDB: Queries and Aggregation Framework with NBA Game DataValeri Karpov
 
The Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik SeeleyThe Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik Seeleylucenerevolution
 
Webinar: Ecommerce, Rules, and Relevance
Webinar: Ecommerce, Rules, and RelevanceWebinar: Ecommerce, Rules, and Relevance
Webinar: Ecommerce, Rules, and RelevanceLucidworks
 
Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco
Parallel Computing with SolrCloud: Presented by Joel Bernstein, AlfrescoParallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco
Parallel Computing with SolrCloud: Presented by Joel Bernstein, AlfrescoLucidworks
 
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...Lucidworks
 

Andere mochten auch (20)

Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Realtime Search at Twitter - Michael Busch
Realtime Search at Twitter - Michael BuschRealtime Search at Twitter - Michael Busch
Realtime Search at Twitter - Michael Busch
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Type-Safe MongoDB query (Lift Rogue query)
Type-Safe MongoDB query (Lift Rogue query)Type-Safe MongoDB query (Lift Rogue query)
Type-Safe MongoDB query (Lift Rogue query)
 
11 lucene
11 lucene11 lucene
11 lucene
 
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
Faceting with Lucene Block Join Query - Lucene/Solr Revolution 2014
 
This Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, Lucidworks
This Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, LucidworksThis Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, Lucidworks
This Ain't Your Parent's Search Engine: Presented by Grant Ingersoll, Lucidworks
 
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
Building Smarter Search Applications Using Built-In Knowledge Graphs and Quer...
 
Lucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David SmileyLucene/Solr Spatial in 2015: Presented by David Smiley
Lucene/Solr Spatial in 2015: Presented by David Smiley
 
SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...
SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...
SearchHub - How to Spend Your Summer Keeping it Real: Presented by Grant Inge...
 
Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote
Search at Twitter: Presented by Michael Busch, Twitter

  • 1. Search @twitter Michael Busch @michibusch michael@twitter.com buschmi@apache.org
  • 2. Search @twitter Agenda ‣ Introduction - Search Architecture - Lucene Extensions - Outlook
  • 5. Introduction Twitter has more than 284 million monthly active users.
  • 6. Introduction 500 million tweets are sent per day.
  • 7. Introduction More than 300 billion tweets have been sent since company founding in 2006.
  • 8. Introduction Tweets-per-second record: one-second peak of 143,199 TPS.
  • 9. Introduction More than 2 billion search queries per day.
  • 10. Search @twitter Agenda - Introduction ‣ Search Architecture - Lucene Extensions - Outlook
  • 13. Search Architecture [diagram: RT stream, Analyzer/Partitioner, RT index (Earlybird, several instances), Blender, Archive index, Mapreduce Analyzer, Tweet archive on HDFS; edge labels: raw tweets, analyzed tweets, writes, searches, search requests]
  • 14. Search Architecture [diagram: Tweets, queue, Analyzer/Partitioner, RT index (Earlybird, several instances), Blender, Archive index, Mapreduce Analyzer, HDFS; edge labels: Updates, Deletes/Engagement (e.g. retweets/favs), writes, searches, search requests]
  • 15. Search Architecture [diagram: RT index (Earlybird, several instances), Archive index, User search, Social graph, Blender; edge labels: writes, searches, search requests] • Blender is our Thrift service aggregator • Queries multiple Earlybirds, merges results
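The aggregator role described on this slide (query several index servers, merge their results) can be illustrated with a small scatter-gather sketch. Everything here is an assumption made for illustration: `BlenderSketch`, `Hit`, the score-based merge and the modelling of each Earlybird partition as a plain function are hypothetical and are not taken from Twitter's Thrift interfaces.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical names throughout; this is not Twitter's Blender API.
final class BlenderSketch {
    /** A scored hit returned by one index partition. */
    static final class Hit {
        final long tweetId;
        final double score;

        Hit(long tweetId, double score) {
            this.tweetId = tweetId;
            this.score = score;
        }

        double score() {
            return score;
        }
    }

    /** Sends the query to every partition in parallel and keeps the topK best-scoring hits. */
    static List<Hit> search(String query,
                            List<Function<String, List<Hit>>> partitions,
                            int topK) {
        // Scatter: one asynchronous request per partition.
        List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
        for (Function<String, List<Hit>> partition : partitions) {
            pending.add(CompletableFuture.supplyAsync(() -> partition.apply(query)));
        }
        // Gather: wait for each partial result, then merge by descending score.
        return pending.stream()
                .flatMap(future -> future.join().stream())
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }
}
```

A production aggregator would also need timeouts and handling of partial failures, which this sketch leaves out.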
  • 16. Search Architecture [diagram: RT index (Earlybird), Archive index, User search]
  • 17. Search Architecture [diagram: RT index (Earlybird), Archive index, User search; Lucene] • For historic reasons, these used to be entirely different codebases, but had similar features/technologies • Over time, cross-dependencies were introduced to share code
  • 18. Search Architecture [diagram: RT index (Earlybird), Archive index, User search; Lucene Extensions; Lucene] • New Lucene extension package • This package is truly generic and has no dependency on an actual product/index • It contains Twitter’s extensions for real-time search, a thin segment management layer and other features
  • 19. Search @twitter Agenda - Introduction - Search Architecture ‣ Lucene Extensions - Outlook
  • 22. Lucene Extension Library • Abstraction layer for Lucene index segments • Real-time writer for in-memory index segments • Schema-based Lucene document factory • Real-time faceting
  • 23. Lucene Extension Library • API layer for Lucene segments • *IndexSegmentWriter • *IndexSegmentAtomicReader • Two implementations • In-memory: RealtimeIndexSegmentWriter (and reader) • On-disk: LuceneIndexSegmentWriter (and reader)
  • 24. Lucene Extension Library • IndexSegments can be built ... • in realtime • on Mesos or Hadoop (Mapreduce) • locally on serving machines • Cluster-management code that deals with IndexSegments • Share segments across serving machines using HDFS • Can rebuild segments (e.g. to upgrade Lucene version, change data schema, etc.)
  • 25. Lucene Extension Library [diagram: HDFS, Earlybird (×3), Mesos, Hadoop (MR), RT pipeline]
  • 26. RealtimeIndexSegmentWriter • Modified Lucene index implementation optimized for realtime search • IndexWriter buffer is searchable (no need to flush to allow searching) • In-memory • Lock-free concurrency model for best performance
  • 27. Concurrency - Definitions • Pessimistic locking • A thread holds an exclusive lock on a resource while an action is performed [mutual exclusion] • Usually used when conflicts are expected to be likely • Optimistic locking • Operations are attempted atomically without holding a lock; conflicts can be detected, and retry logic is often used when they occur • Usually used when conflicts are expected to be the exception
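The two definitions above can be contrasted with a minimal Java sketch. The class names are invented for illustration: the pessimistic variant uses `synchronized` for mutual exclusion, and the optimistic variant uses `AtomicInteger.compareAndSet` with a retry loop. It is not code from the talk.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative names only; a sketch of the two locking styles.
final class LockingStyles {
    /** Pessimistic: every access holds an exclusive lock (mutual exclusion). */
    static final class PessimisticCounter {
        private int value;

        synchronized int increment() {
            return ++value;
        }

        synchronized int get() {
            return value;
        }
    }

    /** Optimistic: no lock is held; a conflict is detected by compareAndSet and retried. */
    static final class OptimisticCounter {
        private final AtomicInteger value = new AtomicInteger();

        int increment() {
            while (true) {
                int current = value.get();
                int next = current + 1;
                if (value.compareAndSet(current, next)) {
                    return next;
                }
                // Another thread updated the value in between: retry with the fresh value.
            }
        }

        int get() {
            return value.get();
        }
    }

    /** Starts `threads` threads that each run `increment` `perThread` times, then waits for them. */
    static void hammer(Runnable increment, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment.run();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Both counters end up with the same total under contention; they differ in whether a thread blocks waiting for a lock or retries after a detected conflict.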
  • 28. Concurrency - Definitions • Non-blocking algorithm: ensures that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion. • Lock-free algorithm: a non-blocking algorithm is lock-free if there is guaranteed system-wide progress. • Wait-free algorithm: a non-blocking algorithm is wait-free if there is guaranteed per-thread progress. * Source: Wikipedia
  • 29. Concurrency • Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data) • But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds! • In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication • Safe publication can be achieved in different, subtle ways. Read the great book “Java Concurrency in Practice” by Brian Goetz for more information!
  • 30. Java Memory Model • Program order rule: each action in a thread happens-before every action in that thread that comes later in the program order. • Volatile variable rule: a write to a volatile field happens-before every subsequent read of that same field. • Transitivity: if A happens-before B, and B happens-before C, then A happens-before C. * Source: Brian Goetz: Java Concurrency in Practice
  • 31. Concurrency RAM 0 int x; Cache Thread 1 Thread 2 time
  • 32. Concurrency Cache 5 RAM 0 int x; Thread 1 Thread 2 x = 5; Thread A writes x=5 to cache time
  • 33. Concurrency Cache 5 RAM 0 int x; Thread 1 Thread 2 x = 5; time while(x != 5); This condition will likely never become false!
  • 34. Concurrency RAM 0 int x; Cache Thread 1 Thread 2 time
  • 35. Concurrency RAM 0 int x; Thread A writes b=1 to RAM, because b is volatile 5 x = 5; 1 Cache Thread 1 Thread 2 time volatile int b; b = 1;
  • 36. Concurrency RAM 0 int x; 5 x = 5; 1 Cache Thread 1 Thread 2 time volatile int b; b = 1; Read volatile b int dummy = b; while(x != 5);
  • 37. Concurrency [Diagram: as on slide 36, with a happens-before annotation] • Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
  • 38. Concurrency [Diagram: as on slide 36, with a happens-before annotation] • Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
  • 39. Concurrency [Diagram: as on slide 36, with a happens-before annotation] • Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
  • 40. Concurrency [Diagram: as on slide 36] The condition in while(x != 5); will be false, i.e. x==5 • Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
  • 41. Concurrency [Diagram: as on slide 36, annotated with “Memory barrier”] • Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
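The happens-before chain walked through on these slides can be written down as a small Java sketch. The class and method names (`Piggyback`, `publish`, `awaitAndRead`) are made up for illustration; only `x` and `b` come from the slides. One deliberate difference from the slides: the reader here spins on the volatile `b` instead of reading it once into a dummy variable, so that it is certain to have observed the volatile write before it reads `x`.

```java
// Sketch: a plain field made visible to another thread through one volatile field.
class Piggyback {
    int x;            // plain (non-volatile) data, as on the slides
    volatile int b;   // the single volatile field that acts as the barrier

    // Writer (Thread 1): write the data first, then the volatile field.
    void publish() {
        x = 5;
        b = 1;        // volatile write: happens-before any later read of b that sees 1
    }

    // Reader (Thread 2): wait until the volatile write is observed, then read the data.
    int awaitAndRead() {
        while (b != 1) {
            Thread.onSpinWait();   // spin hint only; the volatile read provides the ordering
        }
        return x;     // program order + volatile rule + transitivity make x = 5 visible here
    }

    public static void main(String[] args) throws InterruptedException {
        Piggyback p = new Piggyback();
        Thread reader = new Thread(() -> System.out.println("x = " + p.awaitAndRead()));
        reader.start();
        p.publish();
        reader.join();
    }
}
```

Because `x = 5` comes before `b = 1` in the writer, and the reader sees `b == 1` before it reads `x`, the three rules from the Java Memory Model slide combine so that the reader cannot see the old value of `x`.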
  • 43. Demo
  • 45. Concurrency [Diagram: IndexWriter and IndexReader on a time axis] IndexWriter: write 100 docs, maxDoc = 100 -> IndexReader: in IR.open(): read maxDoc, search up to maxDoc -> IndexWriter: write more docs. maxDoc is volatile
  • 46. Concurrency [Diagram: as on slide 45, with a happens-before annotation] IndexWriter: write 100 docs, maxDoc = 100 -> IndexReader: in IR.open(): read maxDoc, search up to maxDoc -> IndexWriter: write more docs. maxDoc is volatile • Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!
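The same pattern applied to an append-only structure might look like the following sketch. It is not Lucene's or Earlybird's actual code: `AppendOnlyDocs`, its fixed-capacity `int[]` and the method names are assumptions chosen to keep the example small. It shows a single writer that fills a slot with a plain write and then advances a volatile `maxDoc`, and a reader that snapshots `maxDoc` once and only looks at slots below it.

```java
// Hypothetical single-writer, multi-reader store; the names are illustrative only.
class AppendOnlyDocs {
    private final int[] docs;          // plain array, written only by the writer thread
    private volatile int maxDoc = 0;   // the only volatile field

    AppendOnlyDocs(int capacity) {
        docs = new int[capacity];      // fixed capacity: this sketch never resizes
    }

    // Must be called by the single writer thread only.
    void add(int value) {
        int next = maxDoc;
        docs[next] = value;            // plain write into the next free slot
        maxDoc = next + 1;             // volatile write publishes the slot written above
    }

    // May be called by any reader thread.
    long sumVisible() {
        int limit = maxDoc;            // volatile read: take a snapshot once
        long sum = 0;
        for (int i = 0; i < limit; i++) {
            sum += docs[i];            // slots below the snapshot are fully visible
        }
        return sum;
    }

    int maxDoc() {
        return maxDoc;
    }
}
```

A reader that took its snapshot at `maxDoc = 100` never touches slot 100 or higher, so documents the writer adds afterwards cannot be seen half-written.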
  • 47. Wait-free • Not a single exclusive lock • Writer thread can always make progress • Optimistic locking (retry-logic) in a few places for searcher thread • Retry logic very simple and guaranteed to always make progress
  • 48. In-memory Real-time Index • Highly optimized for GC - all data is stored in blocked native arrays • v1: Optimized for tweets with a term position limit of 255 • v2: Support for 32-bit positions without performance degradation • v2: Basic support for out-of-order posting list inserts
  • 50. In-memory Real-time Index • RT term dictionary • Term lookups in O(1) using a lock-free hashtable • v2: Additional probabilistic, lock-free skip list maintains ordering on terms • Perfect skip list not an option: out-of-order inserts would require rebalancing, which is impractical with our lock-free index • In a probabilistic skip list the tower height of a new (out-of-order) item can be determined without knowing its insert position by simply rolling a die
  • 51. In-memory Real-time Index • Perfect skip list
  • 52. In-memory Real-time Index • Perfect skip list Inserting a new element in the middle of this skip list requires re-balancing the towers.
  • 53. In-memory Real-time Index • Probabilistic skip list
  • 54. In-memory Real-time Index • Probabilistic skip list Tower height is determined by rolling a die BEFORE knowing the insert location; the tower height never has to change for an element, which simplifies memory allocation and concurrency.
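A minimal sketch of how a tower height can be drawn without looking at the list, assuming the classic fair coin flip (p = 1/2) and a fixed maximum height. The slides do not say which distribution or cap the real index uses, and `TowerHeight` and `randomHeight` are hypothetical names.

```java
import java.util.Random;

class TowerHeight {
    // Keep flipping a fair coin while it comes up heads; stop at maxHeight.
    // The result depends only on the random source, never on the insert position.
    static int randomHeight(Random rnd, int maxHeight) {
        int height = 1;
        while (height < maxHeight && rnd.nextBoolean()) {
            height++;
        }
        return height;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int[] histogram = new int[9];
        for (int i = 0; i < 1_000_000; i++) {
            histogram[randomHeight(rnd, 8)]++;
        }
        // Each level should hold roughly half as many towers as the level below it.
        for (int h = 1; h <= 8; h++) {
            System.out.println("height " + h + ": " + histogram[h]);
        }
    }
}
```

Since the height is fixed before the element is linked in, the memory for the tower can be allocated once and never has to be resized by a later insert.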
  • 55. Schema-based Document factory • Apps provide one ThriftSchema per index and create a ThriftDocument for each document • SchemaDocumentFactory translates ThriftDocument -> Lucene Document using the Schema • Default field values • Extended field settings • Type-system on top of DocValues • Validation
  • 56. Schema-based Document factory [Diagram: Thrift Document and Schema go into the SchemaDocumentFactory, which produces a Lucene Document] • Validation • Fill in default values • Apply correct Lucene field settings
  • 57. Schema-based Document factory [Diagram: Thrift Document and Schema go into the SchemaDocumentFactory, which produces a Lucene Document] • Validation • Fill in default values • Apply correct Lucene field settings Decouples core package from specific product/index. Similar to Solr/Elasticsearch.
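The validate-then-fill-defaults step can be illustrated with plain Java collections. This is only a sketch of the idea: `SchemaSketch`, `FieldSpec` and `build` are hypothetical names, plain maps stand in for the Thrift and Lucene document types, and the "apply correct Lucene field settings" step is left out entirely.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SchemaSketch {
    // Hypothetical field description; not the real ThriftSchema.
    static final class FieldSpec {
        final String name;
        final boolean required;
        final Object defaultValue;   // null means "no default"

        FieldSpec(String name, boolean required, Object defaultValue) {
            this.name = name;
            this.required = required;
            this.defaultValue = defaultValue;
        }
    }

    // Validates the input against the schema and fills in default values.
    static Map<String, Object> build(List<FieldSpec> schema, Map<String, Object> input) {
        // Validation: reject fields the schema does not declare.
        Set<String> known = new HashSet<>();
        for (FieldSpec f : schema) {
            known.add(f.name);
        }
        for (String name : input.keySet()) {
            if (!known.contains(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
        }
        // Defaults: use the supplied value if present, else the schema default.
        Map<String, Object> out = new LinkedHashMap<>();
        for (FieldSpec f : schema) {
            Object value = input.containsKey(f.name) ? input.get(f.name) : f.defaultValue;
            if (value == null && f.required) {
                throw new IllegalArgumentException("missing required field: " + f.name);
            }
            if (value != null) {
                out.put(f.name, value);
            }
        }
        return out;
    }
}
```

The point of the design is that the calling application only describes its fields; the factory owns the rules, so the core package does not need to know about any particular index.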
  • 58. Search @twitter Agenda - Introduction - Search Architecture - Lucene Extensions ‣ Outlook
  • 61. Outlook • Support for parallel (sliced) segments to support partial segment rebuilds and other cool posting list update patterns • Add remaining missing Lucene features to RT index • Index term statistics for ranking • Term vectors • Stored fields
  • 62. Questions? Michael Busch @michibusch michael@twitter.com buschmi@apache.org
  • 65. Searching for top entities within Tweets • Task: Find the best photos in a subset of tweets • We could use a Lucene index, where each photo is a document • Problem: How to update existing documents when the same photos are tweeted again? • In-place posting list updates are hard • Lucene’s updateDocument() is a delete/add operation - expensive and not order-preserving
  • 66. Searching for top entities within Tweets • Task: Find the best photos in a subset of tweets • Could we use our existing time-ordered tweet index? • Facets!
  • 67. Searching for top entities within Tweets [Diagram: three index types] • Inverted index: Query -> Doc ids, and Term id -> Term label • Forward index: Doc id -> Document metadata • Facet index: Doc id -> Term ids
  • 68. Storing tweet metadata [Diagram: Facet index: Doc id -> Term ids]
  • 69. Searching for top entities within Tweets [Diagram: Query -> matching doc ids (5, 15, 9000, 9002, 100000, 100090) -> Facet index -> Term ids -> Top-k heap] Top-k heap (Id: Count): 48239: 8, 31241: 2
  • 70. Searching for top entities within Tweets [Diagram: as on slide 69] Top-k heap (Id: Count): 48239: 15, 31241: 12, 85932: 8, 6748: 3
  • 71. Searching for top entities within Tweets [Diagram: as on slide 69] Top-k heap (Id: Count): 48239: 15, 31241: 12, 85932: 8, 6748: 3 • Weighted counts (from engagement features) used for relevance scoring
  • 72. Searching for top entities within Tweets [Diagram: as on slide 69] Top-k heap (Id: Count): 48239: 15, 31241: 12, 85932: 8, 6748: 3 • All query operators can be used. E.g. find best photos in San Francisco tweeted by people I follow
  • 73. Searching for top entities within Tweets [Diagram: Inverted index: Term id -> Term label]
  • 74. Searching for top entities within Tweets [Diagram: the inverted index maps each Id in the (Id, Count) list to its Label] 48239 (count 45) -> pic.twitter.com/jknui4w; 31241 (count 23) -> pic.twitter.com/dslkfj83; 85932 (count 15) -> pic.twitter.com/acm3ps; 6748 (count 11) -> pic.twitter.com/948jdsd; 74294 (count 8) -> pic.twitter.com/dsjkf15h; 3728 (count 5) -> pic.twitter.com/irnsoa32
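The counting loop shown on these slides can be sketched in a few lines of Java. Everything here is a simplification under stated assumptions: `FacetTopK` and `topK` are hypothetical names, a `Map<Integer, int[]>` stands in for the facet index (doc id -> term ids), and each occurrence counts as 1, whereas the slides mention weighted counts derived from engagement features.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class FacetTopK {
    // Returns the k term ids with the highest counts, highest count first.
    static List<Map.Entry<Integer, Integer>> topK(int[] matchingDocIds,
                                                  Map<Integer, int[]> facetIndex,
                                                  int k) {
        // Step 1: for every matching doc, look up its term ids and count them.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int docId : matchingDocIds) {
            int[] termIds = facetIndex.get(docId);
            if (termIds == null) {
                continue;   // doc carries no facet terms
            }
            for (int termId : termIds) {
                counts.merge(termId, 1, Integer::sum);
            }
        }
        // Step 2: keep only the k largest counts in a min-heap.
        PriorityQueue<Map.Entry<Integer, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            heap.add(e);
            if (heap.size() > k) {
                heap.poll();   // evict the current smallest count
            }
        }
        // Step 3: order the survivors by descending count.
        List<Map.Entry<Integer, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

The returned term ids would then be translated into labels (for example photo URLs) through the term id -> term label mapping of the inverted index, which this sketch leaves out.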
  • 75. Summary • Indexing tweet entities (e.g. photos) as facets makes it possible to search and rank top entities using a tweet index • All query operators supported • Documents don’t need to be reindexed • Approach reusable for different use cases, e.g.: best vines, hashtags, @mentions, etc.