SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Downloaden Sie, um offline zu lesen
Beyond TF-IDF
Stephen Murtagh
etsy.com
Beyond tf idf why, what & how
20,000,000 items
1,000,000 sellers
Beyond tf idf why, what & how
15,000,000 daily
searches
80,000,000 daily calls to Solr
Etsy Engineering
• Code as Craft - our engineering blog
• http://codeascraft.etsy.com/
• Continuous Deployment
• https://github.com/etsy/deployinator
• Experiment-driven culture
• Hybrid engineering roles
• Dev-Ops
• Data-Driven Products
Etsy Search
• 2 search clusters: Flip and Flop
• Master -> 20 slaves
• Only one cluster takes traffic
• Thrift (no HTTP endpoint)
• BitTorrent for index replication
• Solr 4.1
• Incremental index every 12 minutes
Beyond TF-IDF
•Why?
•What?
•How?
Beyond tf idf why, what & how
Luggage tags
“unique bag”
q = unique+bag
q = unique+bag
>
Scoring in Lucene
Scoring in Lucene
Fixed for any given query
constant
Scoring in Lucene
f(term, document)
f(term)
Scoring in Lucene
User content
Only measure rarity
IDF(“unique”)
4.429547
IDF(“bag”)
4.32836>
q = unique+bag
“unique unique bag” “unique bag bag”
>
“unique” tells us
nothing...
Stop words
• Add “unique” to stop word list?
• What about “handmade” or “blue”?
• Low-information words can still be useful
for matching
• ... but harmful for ranking
Why not replace IDF?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
•How?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
•How?
What do we replace it
with?
Benefits of IDF
I1 =





doc1 doc2 doc3 . . . docn
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





Benefits of IDF
I1 =





doc1 doc2 doc3 . . . docn
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





IDF(jewelry) = 1 + log(
n

d id,jewelry
)
Sharding
I1 =





doc1 doc2 doc3 . . . dock
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





I2 =





dock+1 dock+2 dock+3 . . . docn
art 6 1 0 . . . 1
jewelry 0 1 3 . . . 0
...
...
termm 0 1 1 . . . 0





Sharding
I1 =





doc1 doc2 doc3 . . . dock
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





I2 =





dock+1 dock+2 dock+3 . . . docn
art 6 1 0 . . . 1
jewelry 0 1 3 . . . 0
...
...
termm 0 1 1 . . . 0





IDF(jewelry) = 1 + log(
n

d id,jewelry
)
Sharding
I1 =





doc1 doc2 doc3 . . . dock
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





I2 =





dock+1 dock+2 dock+3 . . . docn
art 6 1 0 . . . 1
jewelry 0 1 3 . . . 0
...
...
termm 0 1 1 . . . 0





IDF1(jewelry) = IDF2(jewelry) = IDF(jewelry)
Sharded IDF options
• Ignore it - Shards score differently
• Shards exchange stats - Messy
• Central source distributes IDF to shards
Information Gain
• P(x) - Probability of x appearing in a listing
• P(x|y) - Probability of x appearing given y appears
info(y) = D(P(X|y)||P(X))
info(y) = Σx∈X log(
P(x|y)
P(x)
) ∗ P(x|y)
Term Info(x) IDF
unique 0.26 4.43
bag 1.24 4.33
pattern 1.20 4.38
original 0.85 4.38
dress 1.31 4.42
man 0.64 4.41
photo 0.74 4.37
stone 0.92 4.35
Similar IDF
Term Info(x) IDF
unique 0.26 4.39
black 0.22 3.32
red 0.22 3.52
handmade 0.20 3.26
two 0.32 5.64
white 0.19 3.32
three 0.37 6.19
for 0.21 3.59
Similar Info Gain
q = unique+bag
Using IDF
score(“unique unique bag”)

score(“unique bag bag”)
Using information gain
score(“unique unique bag”)

score(“unique bag bag”)
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
•How?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
• Information gain accounts for term quality
•How?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
• Information gain accounts for term quality
•How?
Listing Quality
• Performance relative
to rank
• Hadoop: logs - hdfs
• cron: hdfs - master
• bash: master - slave
• Loaded as external file
field
Computing info gain
I1 =





doc1 doc2 doc3 . . . docn
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





info(y) = D(P(X|y)||P(X))
info(y) = Σx∈X log(
P(x|y)
P(x)
) ∗ P(x|y)
Hadoop
• Brute-force
• Count all terms
• Count all co-occuring terms
• Construct distributions
• Compute info gain for all terms
File Distribution
• cron copies score file to master
• master replicates file to slaves
infogain=`find /search/data/ -maxdepth 1 -type f -
name info_gain.* -print | sort | tail -n 1`
scp $infogain user@$slave:$infogain
File Distribution
schema.xml
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
• Information gain accounts for term quality
•How?
• Hadoop + similarity factory = win
Fast Deploys,
Careful Testing
• Idea
• Proof of Concept
• Side-By-Side
• A/B test
• 100% Live
Side-by-Side
Beyond tf idf why, what & how
Beyond tf idf why, what & how
Beyond tf idf why, what & how
Relevant != High quality
A/B Test
• Users are randomly assigned to A or B
• A sees IDF-based results
• B sees info gain-based results
A/B Test
• Users are randomly assigned to A or B
• A sees IDF-based results
• B sees info gain-based results
• Small but significant decrease in clicks,
page views, etc.
More homogeneous results
Lower average quality score
Next Steps
Parameter Tweaking...
Rebalance relevancy and quality signals in score
The Future
Latent Semantic
Indexing in Solr/Lucene
Latent Semantic
Indexing
• In TF-IDF, documents are sparse vectors in
term space
• LSI re-maps these to dense vectors in
“concept” space
• Construct transformation matrix:
• Load file at index and query time
• Re-map query and documents
Rm
+
Rr
Tr×m
CONTACT
Stephen Murtagh
smurtagh@etsy.com

Más contenido relacionado

Was ist angesagt?

EuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To GolangEuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To GolangMax Tepkeev
 
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from..."PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...Edge AI and Vision Alliance
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Qiangning Hong
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorchJun Young Park
 
Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Platonov Sergey
 
Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011Andrey Breslav
 
Introduction to functional programming using Ocaml
Introduction to functional programming using OcamlIntroduction to functional programming using Ocaml
Introduction to functional programming using Ocamlpramode_ce
 
The Next Great Functional Programming Language
The Next Great Functional Programming LanguageThe Next Great Functional Programming Language
The Next Great Functional Programming LanguageJohn De Goes
 
Introduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersIntroduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersKimikazu Kato
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow규영 허
 
Machine learning with py torch
Machine learning with py torchMachine learning with py torch
Machine learning with py torchRiza Fahmi
 
The best language in the world
The best language in the worldThe best language in the world
The best language in the worldDavid Muñoz Díaz
 
Humble introduction to category theory in haskell
Humble introduction to category theory in haskellHumble introduction to category theory in haskell
Humble introduction to category theory in haskellJongsoo Lee
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanWei-Yuan Chang
 

Was ist angesagt? (20)

EuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To GolangEuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To Golang
 
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from..."PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
 
Dive Into PyTorch
Dive Into PyTorchDive Into PyTorch
Dive Into PyTorch
 
MTL Versus Free
MTL Versus FreeMTL Versus Free
MTL Versus Free
 
Scala @ TomTom
Scala @ TomTomScala @ TomTom
Scala @ TomTom
 
Hammurabi
HammurabiHammurabi
Hammurabi
 
Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”
 
Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011
 
Introduction to functional programming using Ocaml
Introduction to functional programming using OcamlIntroduction to functional programming using Ocaml
Introduction to functional programming using Ocaml
 
The Next Great Functional Programming Language
The Next Great Functional Programming LanguageThe Next Great Functional Programming Language
The Next Great Functional Programming Language
 
Introduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersIntroduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning Programmers
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
 
Machine learning with py torch
Machine learning with py torchMachine learning with py torch
Machine learning with py torch
 
The best language in the world
The best language in the worldThe best language in the world
The best language in the world
 
Python tour
Python tourPython tour
Python tour
 
Humble introduction to category theory in haskell
Humble introduction to category theory in haskellHumble introduction to category theory in haskell
Humble introduction to category theory in haskell
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuan
 
Java Class Design
Java Class DesignJava Class Design
Java Class Design
 

Ähnlich wie Beyond tf idf why, what & how

More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...Alex Sadovsky
 
Gpu programming with java
Gpu programming with javaGpu programming with java
Gpu programming with javaGary Sieling
 
Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)Peter Kofler
 
Yin Yangs of Software Development
Yin Yangs of Software DevelopmentYin Yangs of Software Development
Yin Yangs of Software DevelopmentNaveenkumar Muguda
 
Multidimensional Indexing
Multidimensional IndexingMultidimensional Indexing
Multidimensional IndexingDigvijay Singh
 
Apriori algorithm
Apriori algorithm Apriori algorithm
Apriori algorithm DHIVYADEVAKI
 
Hash - A probabilistic approach for big data
Hash - A probabilistic approach for big dataHash - A probabilistic approach for big data
Hash - A probabilistic approach for big dataLuca Mastrostefano
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...GeeksLab Odessa
 
A Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep LearningA Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep LearningSuntae Kim
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchhypto
 
RedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedis Labs
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Ponies and Unicorns With Scala
Ponies and Unicorns With ScalaPonies and Unicorns With Scala
Ponies and Unicorns With ScalaTomer Gabel
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course PROIDEA
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!Daniel Cousineau
 
Python Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and GeventPython Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and Geventemptysquare
 

Ähnlich wie Beyond tf idf why, what & how (20)

More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...
 
Gpu programming with java
Gpu programming with javaGpu programming with java
Gpu programming with java
 
Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)
 
Yin Yangs of Software Development
Yin Yangs of Software DevelopmentYin Yangs of Software Development
Yin Yangs of Software Development
 
Multidimensional Indexing
Multidimensional IndexingMultidimensional Indexing
Multidimensional Indexing
 
Apriori algorithm
Apriori algorithm Apriori algorithm
Apriori algorithm
 
Hash - A probabilistic approach for big data
Hash - A probabilistic approach for big dataHash - A probabilistic approach for big data
Hash - A probabilistic approach for big data
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
 
Taint analysis
Taint analysisTaint analysis
Taint analysis
 
A Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep LearningA Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep Learning
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
RedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with Redis
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Ponies and Unicorns With Scala
Ponies and Unicorns With ScalaPonies and Unicorns With Scala
Ponies and Unicorns With Scala
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 
Python Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and GeventPython Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and Gevent
 

Mehr von lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Último

BBA 205 BE UNIT 2 economic systems prof dr kanchan.pptx
BBA 205 BE UNIT 2 economic systems prof dr kanchan.pptxBBA 205 BE UNIT 2 economic systems prof dr kanchan.pptx
BBA 205 BE UNIT 2 economic systems prof dr kanchan.pptxProf. Kanchan Kumari
 
2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...
2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...
2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...Marlene Maheu
 
Alamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptxAlamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptxDhatriParmar
 
Plant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptxPlant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptxHimansu10
 
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...gdgsurrey
 
Riti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptxRiti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptxDhatriParmar
 
DLL Catch Up Friday March 22.docx CATCH UP FRIDAYS
DLL Catch Up Friday March 22.docx CATCH UP FRIDAYSDLL Catch Up Friday March 22.docx CATCH UP FRIDAYS
DLL Catch Up Friday March 22.docx CATCH UP FRIDAYSTeacherNicaPrintable
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...Nguyen Thanh Tu Collection
 
POST ENCEPHALITIS case study Jitendra bhargav
POST ENCEPHALITIS case study  Jitendra bhargavPOST ENCEPHALITIS case study  Jitendra bhargav
POST ENCEPHALITIS case study Jitendra bhargavJitendra Bhargav
 
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...Subham Panja
 
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...AKSHAYMAGAR17
 
Awards Presentation 2024 - March 12 2024
Awards Presentation 2024 - March 12 2024Awards Presentation 2024 - March 12 2024
Awards Presentation 2024 - March 12 2024bsellato
 
VIT336 – Recommender System - Unit 3.pdf
VIT336 – Recommender System - Unit 3.pdfVIT336 – Recommender System - Unit 3.pdf
VIT336 – Recommender System - Unit 3.pdfArthyR3
 
2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...Sandy Millin
 
Auchitya Theory by Kshemendra Indian Poetics
Auchitya Theory by Kshemendra Indian PoeticsAuchitya Theory by Kshemendra Indian Poetics
Auchitya Theory by Kshemendra Indian PoeticsDhatriParmar
 
The First National K12 TUG March 6 2024.pdf
The First National K12 TUG March 6 2024.pdfThe First National K12 TUG March 6 2024.pdf
The First National K12 TUG March 6 2024.pdfdogden2
 
Metabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptxMetabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptxDr. Santhosh Kumar. N
 
LEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudLEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudDr. Bruce A. Johnson
 
ICS2208 Lecture4 Intelligent Interface Agents.pdf
ICS2208 Lecture4 Intelligent Interface Agents.pdfICS2208 Lecture4 Intelligent Interface Agents.pdf
ICS2208 Lecture4 Intelligent Interface Agents.pdfVanessa Camilleri
 

Último (20)

BBA 205 BE UNIT 2 economic systems prof dr kanchan.pptx
BBA 205 BE UNIT 2 economic systems prof dr kanchan.pptxBBA 205 BE UNIT 2 economic systems prof dr kanchan.pptx
BBA 205 BE UNIT 2 economic systems prof dr kanchan.pptx
 
2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...
2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...
2024 March 11, Telehealth Billing- Current Telehealth CPT Codes & Telehealth ...
 
Alamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptxAlamkara theory by Bhamaha Indian Poetics (1).pptx
Alamkara theory by Bhamaha Indian Poetics (1).pptx
 
Plant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptxPlant Tissue culture., Plasticity, Totipotency, pptx
Plant Tissue culture., Plasticity, Totipotency, pptx
 
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
 
Riti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptxRiti theory by Vamana Indian poetics.pptx
Riti theory by Vamana Indian poetics.pptx
 
DLL Catch Up Friday March 22.docx CATCH UP FRIDAYS
DLL Catch Up Friday March 22.docx CATCH UP FRIDAYSDLL Catch Up Friday March 22.docx CATCH UP FRIDAYS
DLL Catch Up Friday March 22.docx CATCH UP FRIDAYS
 
ANOVA Parametric test: Biostatics and Research Methodology
ANOVA Parametric test: Biostatics and Research MethodologyANOVA Parametric test: Biostatics and Research Methodology
ANOVA Parametric test: Biostatics and Research Methodology
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
 
POST ENCEPHALITIS case study Jitendra bhargav
POST ENCEPHALITIS case study  Jitendra bhargavPOST ENCEPHALITIS case study  Jitendra bhargav
POST ENCEPHALITIS case study Jitendra bhargav
 
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
THYROID HORMONE.pptx by Subham Panja,Asst. Professor, Department of B.Sc MLT,...
 
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
DNA and RNA , Structure, Functions, Types, difference, Similarities, Protein ...
 
Awards Presentation 2024 - March 12 2024
Awards Presentation 2024 - March 12 2024Awards Presentation 2024 - March 12 2024
Awards Presentation 2024 - March 12 2024
 
VIT336 – Recommender System - Unit 3.pdf
VIT336 – Recommender System - Unit 3.pdfVIT336 – Recommender System - Unit 3.pdf
VIT336 – Recommender System - Unit 3.pdf
 
2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...2024.03.16 How to write better quality materials for your learners ELTABB San...
2024.03.16 How to write better quality materials for your learners ELTABB San...
 
Auchitya Theory by Kshemendra Indian Poetics
Auchitya Theory by Kshemendra Indian PoeticsAuchitya Theory by Kshemendra Indian Poetics
Auchitya Theory by Kshemendra Indian Poetics
 
The First National K12 TUG March 6 2024.pdf
The First National K12 TUG March 6 2024.pdfThe First National K12 TUG March 6 2024.pdf
The First National K12 TUG March 6 2024.pdf
 
Metabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptxMetabolism , Metabolic Fate& disorders of cholesterol.pptx
Metabolism , Metabolic Fate& disorders of cholesterol.pptx
 
LEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced StudLEAD6001 - Introduction to Advanced Stud
LEAD6001 - Introduction to Advanced Stud
 
ICS2208 Lecture4 Intelligent Interface Agents.pdf
ICS2208 Lecture4 Intelligent Interface Agents.pdfICS2208 Lecture4 Intelligent Interface Agents.pdf
ICS2208 Lecture4 Intelligent Interface Agents.pdf
 

Beyond tf idf why, what & how