Lucene Performance Workshop




Lucid Imagination, Inc.




Intro


About the speaker and Lucid Imagination
Agenda
 Lucene and performance
 Lucid Gaze for Lucene: UI and API
 Key statistics
 Examples
 Q & A session




Lucene and performance
  Perceived performance issues can have different causes
  Classic JVM problems, classic solutions
   heap size
   garbage collection
   stack size
   HotSpot
  Lucene/Search-related issues: beyond JVM tuning
    Indexing performance: indexing too slow, strange
    slowdowns during indexing
    Search performance: search too slow in general, or for
    certain types of queries


Common Lucene performance issues


   Indexing:
     Too many segments being created
     Too many Token-s / TokenStream-s
     Too many Documents / Fields
   Searching:
     Too many IndexReader-s / IndexSearcher-s
     High RAM usage of IndexReader
     Slow response times for certain queries
   Application-level logging may not be up to the task
   Profiler is too low-level and too intrusive

   We need a lightweight probe to peek at vital Lucene statistics


Lucid Gaze for Lucene (LG4L)
Target audience and applications
  Tool for developers
  Performance monitoring
  Statistics collection
  Drop-in replacement for lucene-core-2.4.1.jar




Available information

Statistics (per time unit):
  IndexReader / IndexWriter:
    Number of documents and fields retrieved / created
    Number of IndexWriter / IndexReader / Directory instances created
      And the number of live instances!
    Memory consumption of IndexWriter / IndexReader instances
  Analysis:
    Number of Analyzers / TokenFilters / Tokenizers
    Number of TokenStream-s and Token-s
  Search:
    Number of searches and their average time
    Number of opened IndexSearcher-s
  Storage:
      Number of Lucene Directory instances created
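
The gap between instances created and instances still live is what points to leaked readers or writers. A minimal sketch of how such a pair of counters could be kept, assuming a hypothetical `InstanceTracker` class with `onOpen` / `onClose` hooks (illustrative names, not LG4L's API):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical probe: counts instances created and instances still open. */
final class InstanceTracker {
    private final AtomicLong created = new AtomicLong();
    private final AtomicLong live = new AtomicLong();

    /** Call when a monitored object (e.g. a reader) is opened. */
    void onOpen() {
        created.incrementAndGet();
        live.incrementAndGet();
    }

    /** Call when a monitored object is closed. */
    void onClose() {
        live.decrementAndGet();
    }

    long created() { return created.get(); }

    long live() { return live.get(); }

    public static void main(String[] args) {
        InstanceTracker readers = new InstanceTracker();
        for (int i = 0; i < 5; i++) readers.onOpen();   // five readers opened
        for (int i = 0; i < 3; i++) readers.onClose();  // only three closed
        // A live count that keeps growing suggests instances are never closed.
        System.out.println("created=" + readers.created() + " live=" + readers.live());
    }
}
```

A created count that climbs steadily while the live count stays flat would instead suggest that instances are reopened too often rather than leaked.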
Available information: metrics

Lists and histograms
  Count and a list of Analyzer, Tokenizer, TokenFilter
  instances
  Directory implementations
  Top-N queries:
    Queries with largest numbers of hits
    Queries that took longest to execute
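
A top-N list of slowest queries can be kept in constant memory with a bounded min-heap. The sketch below assumes a hypothetical `SlowQueryTracker` class; it illustrates the general technique and does not describe how LG4L implements its top-N lists:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical bounded tracker that keeps the N slowest queries seen so far. */
final class SlowQueryTracker {
    /** One observed query and its execution time. */
    static final class Entry {
        final String query;
        final long millis;

        Entry(String query, long millis) {
            this.query = query;
            this.millis = millis;
        }
    }

    private final int capacity;
    // Min-heap on elapsed time: the head is the fastest of the retained entries.
    private final PriorityQueue<Entry> heap =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.millis));

    SlowQueryTracker(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Records one query; keeps it only if it ranks among the N slowest. */
    synchronized void record(String query, long millis) {
        if (heap.size() < capacity) {
            heap.add(new Entry(query, millis));
        } else if (heap.peek().millis < millis) {
            heap.poll(); // evict the fastest retained query
            heap.add(new Entry(query, millis));
        }
    }

    /** Returns the retained queries, slowest first. */
    synchronized List<Entry> slowest() {
        List<Entry> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingLong((Entry e) -> e.millis).reversed());
        return out;
    }

    public static void main(String[] args) {
        SlowQueryTracker tracker = new SlowQueryTracker(2);
        tracker.record("title:lucene", 12);
        tracker.record("body:perf*", 340);
        tracker.record("+a +b", 95);
        for (Entry e : tracker.slowest()) {
            System.out.println(e.millis + " ms  " + e.query);
        }
    }
}
```

The same structure works for the largest hit counts by storing the hit count in place of the elapsed time.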


  All of this data is available in the log, in a persistent database, and through the API

In-memory and RRD storage

Retaining historical values of collected statistics
  In-memory
    No files, no configuration hassles
    Concise overview periodically written to log (optional)
    Uses Java logging
  RRD (Round-Robin Database)
    Persistent round-robin database
        Single database of a constant size
        E.g. hourly, daily, weekly, monthly, yearly statistics
    Suitable for long-term monitoring
    Many more metrics and statistics tracked
    Can be accessed concurrently from other applications
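
The constant size of a round-robin database comes from overwriting the oldest slot once the archive is full, with several raw samples consolidated into each slot. A minimal sketch of one such archive, assuming a hypothetical `RoundRobinArchive` class that averages a fixed number of samples per slot (a real RRD typically keeps several archives at different resolutions and uses its own file format):

```java
import java.util.Arrays;

/** Hypothetical fixed-size archive: old slots are overwritten, so storage never grows. */
final class RoundRobinArchive {
    private final double[] slots;
    private final int samplesPerSlot;
    private long written;       // total consolidated slots written so far
    private double pendingSum;  // samples accumulated for the slot in progress
    private int pendingCount;

    RoundRobinArchive(int slotCount, int samplesPerSlot) {
        if (slotCount <= 0 || samplesPerSlot <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.slots = new double[slotCount];
        this.samplesPerSlot = samplesPerSlot;
    }

    /** Adds one raw sample; every samplesPerSlot samples are averaged into one slot. */
    void add(double sample) {
        pendingSum += sample;
        if (++pendingCount == samplesPerSlot) {
            slots[(int) (written % slots.length)] = pendingSum / pendingCount;
            written++;
            pendingSum = 0;
            pendingCount = 0;
        }
    }

    /** Returns the retained consolidated values, oldest first. */
    double[] values() {
        int n = (int) Math.min(written, slots.length);
        double[] out = new double[n];
        long first = written - n;
        for (int i = 0; i < n; i++) {
            out[i] = slots[(int) ((first + i) % slots.length)];
        }
        return out;
    }

    public static void main(String[] args) {
        RoundRobinArchive archive = new RoundRobinArchive(3, 2);
        for (double sample : new double[] {1, 3, 5, 7, 9, 11, 13, 15}) {
            archive.add(sample);
        }
        // Four slots were produced but only the newest three are retained.
        System.out.println(Arrays.toString(archive.values()));
    }
}
```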




Configuration

Java properties or gaze.properties
  List of properties supplied as -Dlucid.gaze...
  gaze.properties on classpath
  Configurability:
    Turning on/off selected monitors
    Producing debug output
    Using in-memory or RRD log retention
    Configuring RRD archives (to scale historical data over different
    periods)
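
The two configuration sources can be pictured as a properties file on the classpath overlaid with `-D` system properties. The sketch below is an assumption-laden illustration: the `GazeConfigSketch` class, the rule that system properties win over the file, and the key `lucid.gaze.example.enabled` are all hypothetical; only the `gaze.properties` file name and the `lucid.gaze` prefix come from the slide.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical loader; the precedence rule and the key names are assumptions. */
final class GazeConfigSketch {
    static final String PREFIX = "lucid.gaze.";

    /** Loads gaze.properties from the classpath if present, then applies -D overrides. */
    static Properties load(Properties systemProps) {
        Properties merged = new Properties();
        try (InputStream in = GazeConfigSketch.class.getClassLoader()
                .getResourceAsStream("gaze.properties")) {
            if (in != null) {
                merged.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Copy only the properties that carry the tool's prefix.
        for (String name : systemProps.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                merged.setProperty(name, systemProps.getProperty(name));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Run with e.g. -Dlucid.gaze.example.enabled=false (a made-up key).
        Properties config = load(System.getProperties());
        config.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```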


API

Facade with static methods: LuceneCore
  Programmatic access to all statistics groups
  Retrieving top-N queries
  Retrieving additional metrics (e.g. histograms of analyzers,
  tokenizers, directory implementations, tracking of IndexReader /
  IndexWriter instances and their memory consumption)
  Enabling / disabling monitors
  Resetting statistics (useful for creating snapshots)
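
Resetting is what turns cumulative counters into per-interval snapshots: read the values, zero them, and the next read covers only the new interval. A minimal sketch of that pattern behind a static facade, assuming a hypothetical `SearchStatsFacade` class (it is a stand-in written for illustration and is not the real `LuceneCore` API, whose method names are not shown in these slides):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical static facade; NOT the real LuceneCore API. */
final class SearchStatsFacade {
    private static final AtomicLong searchCount = new AtomicLong();
    private static final AtomicLong searchNanos = new AtomicLong();

    private SearchStatsFacade() {}

    /** Records one completed search and its elapsed time. */
    static void recordSearch(long elapsedNanos) {
        searchCount.incrementAndGet();
        searchNanos.addAndGet(elapsedNanos);
    }

    /**
     * Returns {count, averageNanos} for the interval since the last reset,
     * then resets both counters. The two resets are not one atomic step, so
     * a search recorded in between may be split across two snapshots.
     */
    static long[] snapshotAndReset() {
        long count = searchCount.getAndSet(0);
        long nanos = searchNanos.getAndSet(0);
        return new long[] {count, count == 0 ? 0 : nanos / count};
    }

    public static void main(String[] args) {
        recordSearch(1_000_000);
        recordSearch(3_000_000);
        long[] snap = snapshotAndReset();
        System.out.println("searches=" + snap[0] + " avgNanos=" + snap[1]);
    }
}
```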


Example: indexing performance tuning

 Based on the contrib/benchmark suite
   Test impact of number of buffered docs
   Other interesting observations
     Number of documents / fields
     Number of tokens / token streams
     Number of IndexReader / Directory instances
     Number of IndexSearchers




Example: console output




Example: RRD Inspector




RRD Inspector (2)




RRD Inspector (3)




Performance impact of performance monitoring
Overhead of using LG4L
  Benchmarks (in contrib/benchmark) slower by ~10-15% on
  average, memory consumption higher by ~10%


  Remember: you can turn off some monitors!




Conclusions


Lucene can perform fantastically
   ... but it can't outmaneuver sub-optimal design or weak
  configuration
LG4L helps to understand the causes of poor performance
  Insight into high-level statistics that relate to Lucene API
  Round-robin database for tracking historical data
  LG4L is lightweight!




Q&A




Download and documentation:
  http://www.lucidimagination.com/Downloads/LucidGaze-for-Lucene
Example: LG4L with Solr


INFO: * AnalysisStats:
INFO:    counters: {toks=258}
INFO:    metrics: {tns={=2, WhitespaceTokenizer=8},
tfs={WordDelimiterFilter=8, StopFilter=8, SynonymFilter=8, LowerCaseFilter=8,
EnglishPorterFilter=8},
ans={org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer=1,
org.apache.solr.analysis.TokenizerChain=6,
org.apache.solr.schema.FieldType$DefaultAnalyzer=13,
org.apache.lucene.analysis.WhitespaceAnalyzer=1,
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer=1}}
INFO: * DocumentStats:
INFO:    counters: {docs=1, fields=20}
INFO: * IndexStats:
INFO:    counters: {ir_isdC=1, ir_C=4, iw_C=0, ir_newC=7, ir_tpC=12,
iw_segs=0, iw_buf=0, ir_tdC=10, ir_ram=1343504, iw_ram=0}
INFO: * SearchStats:
INFO:    counters: {dfC=11, rwrC=3, rwrT=30265, srchrC=14, srchT=90133076,
srchC=6}
INFO: * StoreStats:
INFO:    counters: {dirC=8}
INFO:    metrics: {dir_t={FSDirectory=8}}


 … but of course you should use LucidGaze for Solr instead!
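
The flat `counters: {...}` lines in log output of this shape are easy to turn into numbers for further processing. A minimal sketch, assuming a hypothetical `CounterLineParser` helper and assuming that a counters block wrapped over two log lines has already been joined into one string; it does not handle the nested `metrics: {...}` lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for flat "counters: {name=value, ...}" log lines. */
final class CounterLineParser {
    /** Parses a line such as "INFO:    counters: {docs=1, fields=20}" into name -> value. */
    static Map<String, Long> parse(String line) {
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no {...} block: " + line);
        }
        Map<String, Long> counters = new LinkedHashMap<>();
        String body = line.substring(open + 1, close).trim();
        if (body.isEmpty()) {
            return counters;
        }
        for (String pair : body.split(",")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("bad pair: " + pair);
            }
            counters.put(kv[0].trim(), Long.parseLong(kv[1].trim()));
        }
        return counters;
    }

    public static void main(String[] args) {
        Map<String, Long> doc = parse("INFO:    counters: {docs=1, fields=20}");
        System.out.println(doc);
    }
}
```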


More Related Content

What's hot

Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
Vikrant Chauhan
 

What's hot (20)

Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
Postgres index types
Postgres index typesPostgres index types
Postgres index types
 
Log analysis using elk
Log analysis using elkLog analysis using elk
Log analysis using elk
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
20150219 初めての「embulk」
20150219 初めての「embulk」20150219 初めての「embulk」
20150219 初めての「embulk」
 
Oracle Coherence: in-memory datagrid
Oracle Coherence: in-memory datagridOracle Coherence: in-memory datagrid
Oracle Coherence: in-memory datagrid
 
Elastic Search
Elastic SearchElastic Search
Elastic Search
 
RESTful API Design Best Practices Using ASP.NET Web API
RESTful API Design Best Practices Using ASP.NET Web APIRESTful API Design Best Practices Using ASP.NET Web API
RESTful API Design Best Practices Using ASP.NET Web API
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
ElasticSearch
ElasticSearchElasticSearch
ElasticSearch
 
関連記事レコメンドエンジン@Yahoo! JAPAN
関連記事レコメンドエンジン@Yahoo! JAPAN関連記事レコメンドエンジン@Yahoo! JAPAN
関連記事レコメンドエンジン@Yahoo! JAPAN
 
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법[오픈소스컨설팅] EFK Stack 소개와 설치 방법
[오픈소스컨설팅] EFK Stack 소개와 설치 방법
 
Graylog
GraylogGraylog
Graylog
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
The Elastic ELK Stack
The Elastic ELK StackThe Elastic ELK Stack
The Elastic ELK Stack
 
Benchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on SparkBenchmark MinHash+LSH algorithm on Spark
Benchmark MinHash+LSH algorithm on Spark
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
 
dm-writeboost-kernelvm
dm-writeboost-kernelvmdm-writeboost-kernelvm
dm-writeboost-kernelvm
 
大量時空間データの処理 ~ 現状の課題と今後OSSが解決すべきこと。(Open Source Conference 2021 Online/Osaka講演資料)
大量時空間データの処理 ~ 現状の課題と今後OSSが解決すべきこと。(Open Source Conference 2021 Online/Osaka講演資料)大量時空間データの処理 ~ 現状の課題と今後OSSが解決すべきこと。(Open Source Conference 2021 Online/Osaka講演資料)
大量時空間データの処理 ~ 現状の課題と今後OSSが解決すべきこと。(Open Source Conference 2021 Online/Osaka講演資料)
 

Viewers also liked

Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovations
Lucidworks (Archived)
 
The mobile as a health hub, and how bluetooth low energy enables the market
The mobile as a health hub, and how bluetooth low energy enables the marketThe mobile as a health hub, and how bluetooth low energy enables the market
The mobile as a health hub, and how bluetooth low energy enables the market
Paul Williamson
 
Lucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lr
Lucidworks (Archived)
 
Hellosong
HellosongHellosong
Hellosong
tanica
 

Viewers also liked (20)

Query Latency Optimization with Lucene
Query Latency Optimization with LuceneQuery Latency Optimization with Lucene
Query Latency Optimization with Lucene
 
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
 
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid DynamicsFaceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
Faceting with Lucene Block Join Query: Presented by Oleg Savrasov, Grid Dynamics
 
Block join toranomaki
Block join toranomakiBlock join toranomaki
Block join toranomaki
 
学術コンテンツサービスでの活用事例@Lucene/Solr勉強会(2015.5.13)
学術コンテンツサービスでの活用事例@Lucene/Solr勉強会(2015.5.13)学術コンテンツサービスでの活用事例@Lucene/Solr勉強会(2015.5.13)
学術コンテンツサービスでの活用事例@Lucene/Solr勉強会(2015.5.13)
 
第16回Lucene/Solr勉強会 – ランキングチューニングと定量評価 #SolrJP
第16回Lucene/Solr勉強会 – ランキングチューニングと定量評価 #SolrJP第16回Lucene/Solr勉強会 – ランキングチューニングと定量評価 #SolrJP
第16回Lucene/Solr勉強会 – ランキングチューニングと定量評価 #SolrJP
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
 
Nlp4 l intro-20150513
Nlp4 l intro-20150513Nlp4 l intro-20150513
Nlp4 l intro-20150513
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovations
 
The mobile as a health hub, and how bluetooth low energy enables the market
The mobile as a health hub, and how bluetooth low energy enables the marketThe mobile as a health hub, and how bluetooth low energy enables the market
The mobile as a health hub, and how bluetooth low energy enables the market
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Coterie 9 11
Coterie 9 11Coterie 9 11
Coterie 9 11
 
Ecma 262 5th Edition を読む #5 第9条
Ecma 262 5th Edition を読む #5 第9条Ecma 262 5th Edition を読む #5 第9条
Ecma 262 5th Edition を読む #5 第9条
 
Customized Navigation Using SOLR
Customized Navigation Using SOLRCustomized Navigation Using SOLR
Customized Navigation Using SOLR
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValuesColumn Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
 
Lucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lr
 
Hellosong
HellosongHellosong
Hellosong
 
Ob12 01st
Ob12 01stOb12 01st
Ob12 01st
 
Already, just, still, yet
Already, just, still, yetAlready, just, still, yet
Already, just, still, yet
 

Similar to Understanding Lucene Search Performance

SplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's PizzaSplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's Pizza
Splunk
 
SplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.comSplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.com
Splunk
 

Similar to Understanding Lucene Search Performance (20)

Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for Solr
 
December 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over HadoopDecember 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over Hadoop
 
SplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's PizzaSplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's Pizza
 
Integrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsIntegrating Splunk into your Spring Applications
Integrating Splunk into your Spring Applications
 
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
 
dlux - Splunk Technical Overview
dlux - Splunk Technical Overviewdlux - Splunk Technical Overview
dlux - Splunk Technical Overview
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK Stack
 
2015 03-16-elk at-bsides
2015 03-16-elk at-bsides2015 03-16-elk at-bsides
2015 03-16-elk at-bsides
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
SplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.comSplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.com
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult Steps
 
Stabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out CeilometerStabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out Ceilometer
 
Stabilising the jenga tower
Stabilising the jenga towerStabilising the jenga tower
Stabilising the jenga tower
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid Meetup
 
Splunk in Nordstrom: IT Operations
Splunk in Nordstrom: IT OperationsSplunk in Nordstrom: IT Operations
Splunk in Nordstrom: IT Operations
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
Using the Splunk Java SDK
Using the Splunk Java SDKUsing the Splunk Java SDK
Using the Splunk Java SDK
 
Geode Meetup Apachecon
Geode Meetup ApacheconGeode Meetup Apachecon
Geode Meetup Apachecon
 

More from Lucidworks (Archived)

Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Lucidworks (Archived)
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Lucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Lucidworks (Archived)
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Lucidworks (Archived)
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Lucidworks (Archived)
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Lucidworks (Archived)
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Lucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
Lucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Lucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Lucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Lucidworks (Archived)
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinar
Lucidworks (Archived)
 

More from Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinar
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

Understanding Lucene Search Performance

  • 1. Lucene Performance Workshop
      Lucid Imagination, Inc.
  • 2. Intro
      About the speaker and Lucid Imagination
      Agenda:
        Lucene and performance
        Lucid Gaze for Lucene: UI and API
        Key statistics
        Examples
        Q & A session
  • 3. Lucene and performance
      Perceived performance issues can have different causes
      Classic JVM problems, classic solutions:
        heap size
        garbage collection
        stack size
        HotSpot
      Lucene/Search-related issues: beyond JVM tuning
        Indexing performance: indexing too slow, strange slowdowns during indexing
        Search performance: search too slow in general, or for certain types of queries
  • 4. Common Lucene performance issues
      Indexing:
        Too many segments being created
        Too many Token-s / TokenStream-s
        Too many Documents / Fields
      Searching:
        Too many IndexReader-s / IndexSearcher-s
        High RAM usage of IndexReader
        Slow response times for certain queries
      Application-level logging may not be up to the task
      Profiler is too low-level and too intrusive
      We need a lightweight probe to peek at vital Lucene statistics
  • 5. Lucid Gaze for Lucene (LG4L)
      Target audience and applications:
        Tool for developers
        Performance monitoring
        Statistics collection
        Drop-in replacement for lucene-core-2.4.1.jar
  • 6. Available information
      Statistics (per time unit):
        IndexReader / IndexWriter:
          Number of documents and fields retrieved / created
          Number of IndexWriter / IndexReader / Directory instances created
            And the number of live instances!
          Memory consumption of IndexWriter / IndexReader instances
        Analysis:
          Number of Analyzers / TokenFilters / Tokenizers
          Number of TokenStream-s and Token-s
        Search:
          Number of searches and their average time
          Number of opened IndexSearcher-s
        Storage:
          Number of Lucene Directory instances created
  • 7. Available information: metrics
      Lists and histograms:
        Count and a list of Analyzer, Tokenizer, TokenFilter instances
        Directory implementations
      Top-N queries:
        Queries with largest numbers of hits
        Queries that took longest to execute
      All this data is available as log, persistent DB and through the API
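The slides do not show how LG4L tracks its top-N queries. As a rough illustration of the idea only, the following sketch keeps the N slowest queries in bounded memory with a min-heap. The class `TopNQueries`, its `Entry` type and its method names are hypothetical and are not part of the LG4L API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not LG4L code): remembers the N slowest queries seen so far. */
final class TopNQueries {

    /** One recorded query and how long it took. */
    static final class Entry {
        final String query;
        final long elapsedNanos;

        Entry(String query, long elapsedNanos) {
            this.query = query;
            this.elapsedNanos = elapsedNanos;
        }
    }

    private final int capacity;

    // Min-heap on elapsed time: the head is the fastest of the retained entries,
    // so it is the one to evict when a slower query arrives.
    private final PriorityQueue<Entry> heap =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.elapsedNanos));

    TopNQueries(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records a query; memory use stays bounded by the capacity. */
    synchronized void record(String query, long elapsedNanos) {
        if (heap.size() < capacity) {
            heap.add(new Entry(query, elapsedNanos));
        } else if (elapsedNanos > heap.peek().elapsedNanos) {
            heap.poll();
            heap.add(new Entry(query, elapsedNanos));
        }
    }

    /** Returns the retained entries, slowest first. */
    synchronized List<Entry> slowestFirst() {
        List<Entry> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingLong((Entry e) -> e.elapsedNanos).reversed());
        return out;
    }
}
```

The same structure would work for "queries with largest numbers of hits" by ordering on a hit count instead of elapsed time.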
  • 8. In-memory and RRD storage
      Retaining historical values of collected statistics
      In-memory:
        No files, no configuration hassles
        Concise overview periodically written to log (optional)
        Uses Java logging
      RRD (Round-Robin Database):
        Persistent round-robin database
        Single database of a constant size
        E.g. hourly, daily, weekly, monthly, yearly statistics
        Suitable for long-term monitoring
        Many more metrics and statistics tracked
        Can be accessed concurrently from other applications
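The "constant size" property of a round-robin database comes from overwriting the oldest sample once the archive is full. The sketch below shows that one idea in memory only. It is not the RRD implementation LG4L uses, the class name `RoundRobinArchive` is hypothetical, and real round-robin databases additionally consolidate samples (for example by averaging) into coarser archives, which is omitted here.

```java
/** Hypothetical fixed-size archive (not LG4L code): new samples overwrite the oldest. */
final class RoundRobinArchive {
    private final double[] slots;
    private long written; // total number of samples ever added

    RoundRobinArchive(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.slots = new double[capacity];
    }

    /** Stores a sample, wrapping around to the start when the array is full. */
    void add(double value) {
        slots[(int) (written % slots.length)] = value;
        written++;
    }

    /** Number of samples currently retained (never more than the capacity). */
    int size() {
        return (int) Math.min(written, slots.length);
    }

    /** Copies the retained samples in chronological order, oldest first. */
    double[] oldestFirst() {
        int n = size();
        double[] out = new double[n];
        long start = written - n; // sequence number of the oldest retained sample
        for (int i = 0; i < n; i++) {
            out[i] = slots[(int) ((start + i) % slots.length)];
        }
        return out;
    }
}
```

With a capacity of 3, adding the samples 1, 2, 3, 4, 5 leaves 3, 4, 5 in the archive; storage never grows beyond three slots.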
  • 9. Configuration
      Java properties or gaze.properties:
        List of properties supplied as -Dlucid.gaze...
        gaze.properties on classpath
      Configurability:
        Turning on/off selected monitors
        Producing debug output
        Using in-memory or RRD log retention
        Configuring RRD archives (to scale historical data over different periods)
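The two configuration sources named above can be combined with plain JDK classes. The sketch below is an assumption about how such a lookup could work, not LG4L's actual loader: the class `GazeConfigSketch` is hypothetical, no real LG4L property names are used, and the choice that -D system properties override values from the file is also an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical loader (not LG4L code): merges gaze.properties with -Dlucid.gaze... settings. */
final class GazeConfigSketch {

    static Properties load(ClassLoader loader) {
        Properties merged = new Properties();

        // 1. Optional gaze.properties found on the classpath.
        try (InputStream in = loader.getResourceAsStream("gaze.properties")) {
            if (in != null) {
                merged.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // 2. System properties passed as -Dlucid.gaze...=value.
        //    Assumption: these take precedence over the file.
        for (String name : System.getProperties().stringPropertyNames()) {
            if (name.startsWith("lucid.gaze")) {
                String value = System.getProperty(name);
                if (value != null) {
                    merged.setProperty(name, value);
                }
            }
        }
        return merged;
    }
}
```

Calling `GazeConfigSketch.load(GazeConfigSketch.class.getClassLoader())` returns an empty `Properties` object when neither source is present.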
  • 10. API
      Facade with static methods: LuceneCore
      Programmatic access to all statistics groups
      Retrieve top-N queries
      Retrieving additional metrics (e.g. histograms of analyzers, tokenizers, directory implementations, tracking of IndexReader / IndexWriter instances and their memory consumption)
      Enabling / disabling monitors
      Resetting statistics (useful for creating snapshots)
  • 11. Example: indexing performance tuning
      Based on the contrib/benchmark suite
      Test impact of number of buffered docs
      Other interesting observations:
        Number of documents / fields
        Number of tokens / token streams
        Number of IndexReader / Directory instances
        Number of IndexSearchers
  • 12. Example: console output
  • 13. Example: RRD Inspector
  • 14. RRD Inspector (2)
  • 15. RRD Inspector (3)
  • 16. Performance impact of performance monitoring
      Overhead of using LG4L:
        Benchmarks (in contrib/benchmark) slower by ~10-15% on average, memory consumption higher by ~10%
      Remember: you can turn off some monitors!
  • 17. Conclusions
      Lucene can perform fantastically ...
        but it can't outmaneuver sub-optimal design or weak configuration
      LG4L helps to understand the causes of poor performance:
        Insight into high-level statistics that relate to Lucene API
        Round-robin database for tracking historical data
      LG4L is lightweight!
  • 18. Q&A
      Download and documentation: http://www.lucidimagination.com/Downloads/LucidGaze-for-Lucene
  • 19. Example: LG4L with Solr
      INFO: * AnalysisStats:
      INFO:   counters: {toks=258}
      INFO:   metrics: {tns={=2, WhitespaceTokenizer=8}, tfs={WordDelimiterFilter=8, StopFilter=8, SynonymFilter=8, LowerCaseFilter=8, EnglishPorterFilter=8}, ans={org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer=1, org.apache.solr.analysis.TokenizerChain=6, org.apache.solr.schema.FieldType$DefaultAnalyzer=13, org.apache.lucene.analysis.WhitespaceAnalyzer=1, org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer=1}}
      INFO: * DocumentStats:
      INFO:   counters: {docs=1, fields=20}
      INFO: * IndexStats:
      INFO:   counters: {ir_isdC=1, ir_C=4, iw_C=0, ir_newC=7, ir_tpC=12, iw_segs=0, iw_buf=0, ir_tdC=10, ir_ram=1343504, iw_ram=0}
      INFO: * SearchStats:
      INFO:   counters: {dfC=11, rwrC=3, rwrT=30265, srchrC=14, srchT=90133076, srchC=6}
      INFO: * StoreStats:
      INFO:   counters: {dirC=8}
      INFO:   metrics: {dir_t={FSDirectory=8}}
      … but of course you should use LucidGaze for Solr instead!
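Log lines of the `counters: {...}` form shown above are easy to post-process. The sketch below parses one such line into a map and derives a per-search average. The class `GazeCounters` is hypothetical, and reading `srchT` as total search time and `srchC` as the number of searches is an assumption based on the names only; the slides do not define the counters or their units.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not LG4L code): parses flat "counters: {k=v, ...}" log lines. */
final class GazeCounters {
    // Matches name=number pairs such as "srchC=6".
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    /** Extracts the numeric counters between the outermost braces; nested metrics are not handled. */
    static Map<String, Long> parse(String line) {
        Map<String, Long> counters = new LinkedHashMap<>();
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            return counters; // not a counters line
        }
        Matcher m = PAIR.matcher(line.substring(open + 1, close));
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    /** Returns total / count, or NaN if either counter is missing or the count is zero. */
    static double ratio(Map<String, Long> counters, String totalKey, String countKey) {
        Long total = counters.get(totalKey);
        Long count = counters.get(countKey);
        if (total == null || count == null || count == 0L) {
            return Double.NaN;
        }
        return (double) total / count;
    }
}
```

For the SearchStats line above, `GazeCounters.ratio(counters, "srchT", "srchC")` divides 90133076 by 6, in whatever time unit LG4L reports.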